* [PATCH 0/6] Use virtual page class key protection mechanism for speeding up guest page fault
@ 2014-06-29 11:17 ` Aneesh Kumar K.V
From: Aneesh Kumar K.V @ 2014-06-29 11:17 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

Hi,

With the current code we do an expensive hash page table lookup on every
page fault resulting from a missing hash page table entry. A NO_HPTE
page fault can happen for the following reasons:

1) The hash pte is missing as far as the guest is concerned. This fault
   should be forwarded to the guest.
2) MMIO hash pte. The address against which the load/store is performed
   should be emulated as an MMIO operation.
3) The hash pte is missing because the host swapped out the guest page.

We want to differentiate (1) from (2) and (3) so that we can speed up
page faults due to (1). Optimizing (1) helps overall performance because
it covers a large percentage of the page faults.

To achieve this we use the virtual page class key protection mechanism to
cover (2) and (3). For both of those cases we mark the hpte valid, but
associate the page with virtual page class index 30 or 31. The authority
mask register (AMR) is configured so that class indices 30 and 31 have
read and write access denied. This change results in a key fault for (2)
and (3), which allows us to forward a NO_HPTE fault directly to the guest
without doing the expensive hash page table lookup.
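
As a rough sketch of the mechanism (illustrative only -- the constant and
helper below are made up, and the real AMR setup is presumably in patch 6,
which is not quoted in full here): each of the 32 virtual page classes
owns a two-bit read/write-disable pair in the authority mask register,
with class N at bit position (62 - 2 * N), so denying both kinds of
access for classes 30 and 31 amounts to setting the low four bits of the
register:

#define AMR_CLASS_PAIR(class)	(0x3UL << (62 - 2 * (class)))

static unsigned long host_fault_amr_bits(void)
{
	/* deny read and write for virtual page classes 30 and 31 */
	return AMR_CLASS_PAIR(30) | AMR_CLASS_PAIR(31);	/* == 0xfUL */
}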

For the test below:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define PAGES (40*1024)

int main()
{
	unsigned long size = getpagesize();
	unsigned long length = size * PAGES;
	unsigned long i, j, k = 0;

	for (j = 0; j < 10; j++) {
		char *c = mmap(NULL, length, PROT_READ|PROT_WRITE,
			       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
		if (c == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		/* touch every page so a hash pte gets inserted for it */
		for (i = 0; i < length; i += size)
			c[i] = 0;

		/* flush hptes */
		mprotect(c, length, PROT_WRITE);

		for (i = 0; i < length; i += size)
			c[i] = 10;

		mprotect(c, length, PROT_READ);

		for (i = 0; i < length; i += size)
			k += c[i];

		munmap(c, length);
	}
	return 0;
}
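
(The build command isn't shown in the original; the timings below were
presumably taken from something like "cc -o pfault pfault.c", i.e. an
unoptimized build, since the loops exist only to generate page faults.)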

Without Fix:
----------
[root@qemu-pr-host ~]# time ./pfault

real    0m8.438s
user    0m0.855s
sys     0m7.540s
[root@qemu-pr-host ~]#


With Fix:
--------
[root@qemu-pr-host ~]# time ./pfault

real    0m7.833s
user    0m0.782s
sys     0m7.038s
[root@qemu-pr-host ~]#



Aneesh Kumar K.V (6):
  KVM: PPC: BOOK3S: HV: Clear hash pte bits from do_h_enter callers
  KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
  KVM: PPC: BOOK3S: HV: Remove dead code
  KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in
    host
  KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte
    during an hpte update
  KVM: PPC: BOOK3S: HV: Use virtual page class protection mechanism for
    host fault and mmio

 arch/powerpc/include/asm/kvm_book3s_64.h |  97 +++++++++++++++++-
 arch/powerpc/include/asm/kvm_host.h      |   1 +
 arch/powerpc/include/asm/reg.h           |   1 +
 arch/powerpc/kernel/asm-offsets.c        |   1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  99 ++++++++++++------
 arch/powerpc/kvm/book3s_hv.c             |   1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 166 +++++++++++++++++++++----------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 100 +++++++++++++++++--
 8 files changed, 371 insertions(+), 95 deletions(-)

-- 
1.9.1

* [PATCH 1/6] KVM: PPC: BOOK3S: HV: Clear hash pte bits from do_h_enter callers
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
From: Aneesh Kumar K.V @ 2014-06-29 11:17 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

We will use this to set the HPTE_V_VRMA bit in a later patch. This also
makes sure we clear the hpte bits only when called via hcall.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 15 +++++++++++++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |  8 ++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 09a47aeb5b63..1c137f45dd55 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -371,8 +371,6 @@ long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
 	if (!psize)
 		return H_PARAMETER;
 
-	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
-
 	/* Find the memslot (if any) for this address */
 	gpa = (ptel & HPTE_R_RPN) & ~(psize - 1);
 	gfn = gpa >> PAGE_SHIFT;
@@ -408,6 +406,12 @@ long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 			     long pte_index, unsigned long pteh,
 			     unsigned long ptel)
 {
+	/*
+	 * Clear a few bits when called via hcall
+	 */
+	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+	ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
 	return kvmppc_virtmode_do_h_enter(vcpu->kvm, flags, pte_index,
 					  pteh, ptel, &vcpu->arch.gpr[4]);
 }
@@ -1560,6 +1564,13 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT))
 				kvmppc_do_h_remove(kvm, 0, i, 0, tmp);
 			err = -EIO;
+			/*
+			 * Clear a few bits we got via read_htab which we
+			 * don't need to carry forward.
+			 */
+			v &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+			r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
 			ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, i, v, r,
 							 tmp);
 			if (ret != H_SUCCESS) {
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 084ad54c73cd..157a5f35edfa 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -182,8 +182,6 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	if (!psize)
 		return H_PARAMETER;
 	writing = hpte_is_writable(ptel);
-	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
-	ptel &= ~HPTE_GR_RESERVED;
 	g_ptel = ptel;
 
 	/* used later to detect if we might have been invalidated */
@@ -367,6 +365,12 @@ EXPORT_SYMBOL_GPL(kvmppc_do_h_enter);
 long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 		    long pte_index, unsigned long pteh, unsigned long ptel)
 {
+	/*
+	 * Clear a few bits when called via hcall.
+	 */
+	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+	ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
 	return kvmppc_do_h_enter(vcpu->kvm, flags, pte_index, pteh, ptel,
 				 vcpu->arch.pgdir, true, &vcpu->arch.gpr[4]);
 }
-- 
1.9.1


* [PATCH] KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
From: Aneesh Kumar K.V @ 2014-06-29 11:17 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

When calculating the lower bits of the AVA field, use the shift
count based on the base page size. Also add the missing segment
size (B field) and remove a stale comment.
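
A quick worked example of the B-field change (my arithmetic, based only
on the hunk below): the segment-size field sits in the top two bits of
the HPTE first doubleword (HPTE_V_1TB_SEG is bit 62), and it needs to
land at bits 9:8 of the tlbie RB operand, so

	rb |= v >> (62 - 8);	/* i.e. v >> 54 */

moves v[63:62] down to rb[9:8]; for a 1TB segment this sets bit 8 of rb,
and for a 256MB segment it adds nothing.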

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 6 ++++--
 arch/powerpc/kvm/book3s_hv.c             | 6 ------
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 66a0a44b62a8..ca7c1688a7b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -158,6 +158,8 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	 */
 	/* This covers 14..54 bits of va*/
 	rb = (v & ~0x7fUL) << 16;		/* AVA field */
+
+	rb |= v >> (62 - 8);			/*  B field */
 	/*
 	 * AVA in v had cleared lower 23 bits. We need to derive
 	 * that from pteg index
@@ -188,10 +190,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	{
 		int aval_shift;
 		/*
-		 * remaining 7bits of AVA/LP fields
+		 * remaining bits of AVA/LP fields
 		 * Also contain the rr bits of LP
 		 */
-		rb |= (va_low & 0x7f) << 16;
+		rb |= (va_low << mmu_psize_defs[b_psize].shift) & 0x7ff000;
 		/*
 		 * Now clear not needed LP bits based on actual psize
 		 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cbf46eb3f59c..328416f28a55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1917,12 +1917,6 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
 	(*sps)->page_shift = def->shift;
 	(*sps)->slb_enc = def->sllp;
 	(*sps)->enc[0].page_shift = def->shift;
-	/*
-	 * Only return base page encoding. We don't want to return
-	 * all the supporting pte_enc, because our H_ENTER doesn't
-	 * support MPSS yet. Once they do, we can start passing all
-	 * support pte_enc here
-	 */
 	(*sps)->enc[0].pte_enc = def->penc[linux_psize];
 	/*
 	 * Add 16MB MPSS support if host supports it
-- 
1.9.1

* [PATCH 2/6] KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
From: Aneesh Kumar K.V @ 2014-06-29 11:17 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

This makes h_protect consistent with h_enter, where we clear the key
bits. We also want to use the virtual page class key protection mechanism
for indicating a host page fault. For that we will be using key class
indices 30 and 31. So prevent the guest from updating the key bits until
we add proper support for the virtual page class protection mechanism in
the guest. This has no impact on PAPR Linux guests, because Linux guests
currently don't use the virtual page class key protection model.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 157a5f35edfa..f908845f7379 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -658,13 +658,17 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	}
 
 	v = pte;
+	/*
+	 * We ignore key bits here. We use classes 31 and 30 for
+	 * hypervisor purposes. We still don't track the page
+	 * class separately. Until then don't allow h_protect
+	 * to change key bits.
+	 */
 	bits = (flags << 55) & HPTE_R_PP0;
-	bits |= (flags << 48) & HPTE_R_KEY_HI;
-	bits |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
+	bits |= flags & (HPTE_R_PP | HPTE_R_N);
 
 	/* Update guest view of 2nd HPTE dword */
-	mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
-		HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+	mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N;
 	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 	if (rev) {
 		r = (rev->guest_rpte & ~mask) | bits;
-- 
1.9.1


* [PATCH 3/6] KVM: PPC: BOOK3S: HV: Remove dead code
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
From: Aneesh Kumar K.V @ 2014-06-29 11:17 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

Since we don't support the virtual page class key protection mechanism
in the guest, we should never find a key fault that needs to be forwarded
to the guest. So remove the dead code.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 9 ---------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 9 ---------
 2 files changed, 18 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1c137f45dd55..590e07b1a43f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -499,15 +499,6 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	gpte->may_write = hpte_write_permission(pp, key);
 	gpte->may_execute = gpte->may_read && !(gr & (HPTE_R_N | HPTE_R_G));
 
-	/* Storage key permission check for POWER7 */
-	if (data && virtmode && cpu_has_feature(CPU_FTR_ARCH_206)) {
-		int amrfield = hpte_get_skey_perm(gr, vcpu->arch.amr);
-		if (amrfield & 1)
-			gpte->may_read = 0;
-		if (amrfield & 2)
-			gpte->may_write = 0;
-	}
-
 	/* Get the guest physical address */
 	gpte->raddr = kvmppc_mmu_get_real_addr(v, gr, eaddr);
 	return 0;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index f908845f7379..1884bff3122c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -925,15 +925,6 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 			return status | DSISR_PROTFAULT;
 	}
 
-	/* Check storage key, if applicable */
-	if (data && (vcpu->arch.shregs.msr & MSR_DR)) {
-		unsigned int perm = hpte_get_skey_perm(gr, vcpu->arch.amr);
-		if (status & DSISR_ISSTORE)
-			perm >>= 1;
-		if (perm & 1)
-			return status | DSISR_KEYFAULT;
-	}
-
 	/* Save HPTE info for virtual-mode handler */
 	vcpu->arch.pgfault_addr = addr;
 	vcpu->arch.pgfault_index = index;
-- 
1.9.1


* [PATCH 4/6] KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
From: Aneesh Kumar K.V @ 2014-06-29 11:17 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

We want to use the virtual page class key protection mechanism for
indicating an MMIO mapped hpte entry or a guest hpte entry that is swapped
out in the host. Those hptes will be marked valid, but have the virtual
page class key set to 30 or 31. These virtual page class numbers are
configured in the AMR to deny read/write access. To accommodate such a
change, add new functions that map, unmap and check whether a hpte is
mapped in the host. This patch still uses HPTE_V_VALID and HPTE_V_ABSENT
and doesn't use virtual page class keys, but we want to differentiate the
places in the code that explicitly check for HPTE_V_VALID from the places
where we want to check whether the hpte is host mapped. This patch enables
a closer review of such a change.
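
For concreteness (my reading of the HPTE key-bit layout, not something
this patch introduces): class 31 is what the existing MMIO path in
kvmppc_do_h_enter() below already encodes by setting every key bit, and
class 30 would differ only in the least-significant key bit, along the
lines of:

/* illustrative only -- these macro names are made up, and the 0x200
 * value assumes HPTE_R_KEY_LO is the 0xe00 field of the second dword */
#define HPTE_R_KEY_CLASS_31	(HPTE_R_KEY_HI | HPTE_R_KEY_LO)
#define HPTE_R_KEY_CLASS_30	(HPTE_R_KEY_HI | (HPTE_R_KEY_LO & ~0x200UL))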

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 36 ++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 24 +++++++++++----------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 30 ++++++++++++++------------
 3 files changed, 66 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0aa817933e6a..da00b1f05ea1 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -400,6 +400,42 @@ static inline int is_vrma_hpte(unsigned long hpte_v)
 		(HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)));
 }
 
+static inline void __kvmppc_unmap_host_hpte(struct kvm *kvm,
+					    unsigned long *hpte_v,
+					    unsigned long *hpte_r,
+					    bool mmio)
+{
+	*hpte_v |= HPTE_V_ABSENT;
+	if (mmio)
+		*hpte_r |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+}
+
+static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
+{
+	/*
+	 * We will never call this for MMIO
+	 */
+	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+}
+
+static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
+					unsigned long *hpte_r)
+{
+	*hpte_v |= HPTE_V_VALID;
+	*hpte_v &= ~HPTE_V_ABSENT;
+}
+
+static inline bool kvmppc_is_host_mapped_hpte(struct kvm *kvm, __be64 *hpte)
+{
+	unsigned long v;
+
+	v = be64_to_cpu(hpte[0]);
+	if (v & HPTE_V_VALID)
+		return true;
+	return false;
+}
+
+
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
  * Note modification of an HPTE; set the HPTE modified bit
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 590e07b1a43f..8ce5e95613f8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -752,7 +752,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) {
 		/* HPTE was previously valid, so we need to invalidate it */
 		unlock_rmap(rmap);
-		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+		/* Always mark HPTE_V_ABSENT before invalidating */
+		kvmppc_unmap_host_hpte(kvm, hptep);
 		kvmppc_invalidate_hpte(kvm, hptep, index);
 		/* don't lose previous R and C bits */
 		r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
@@ -897,11 +898,12 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		/* Now check and modify the HPTE */
 		ptel = rev[i].guest_rpte;
 		psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
-		if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
+		if (kvmppc_is_host_mapped_hpte(kvm, hptep) &&
 		    hpte_rpn(ptel, psize) == gfn) {
 			if (kvm->arch.using_mmu_notifiers)
-				hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+				kvmppc_unmap_host_hpte(kvm, hptep);
 			kvmppc_invalidate_hpte(kvm, hptep, i);
+
 			/* Harvest R and C */
 			rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
 			*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
@@ -990,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		}
 
 		/* Now check and modify the HPTE */
-		if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
+		if (kvmppc_is_host_mapped_hpte(kvm, hptep) &&
 		    (be64_to_cpu(hptep[1]) & HPTE_R_R)) {
 			kvmppc_clear_ref_hpte(kvm, hptep, i);
 			if (!(rev[i].guest_rpte & HPTE_R_R)) {
@@ -1121,11 +1123,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		}
 
 		/* Now check and modify the HPTE */
-		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID)))
+		if (!kvmppc_is_host_mapped_hpte(kvm, hptep))
 			continue;
-
-		/* need to make it temporarily absent so C is stable */
-		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+		/*
+		 * need to make it temporarily absent so C is stable
+		 */
+		kvmppc_unmap_host_hpte(kvm, hptep);
 		kvmppc_invalidate_hpte(kvm, hptep, i);
 		v = be64_to_cpu(hptep[0]);
 		r = be64_to_cpu(hptep[1]);
@@ -1141,9 +1144,8 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 				npages_dirty = n;
 			eieio();
 		}
-		v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
-		v |= HPTE_V_VALID;
-		hptep[0] = cpu_to_be64(v);
+		kvmppc_map_host_hpte(kvm, &v, &r);
+		hptep[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 1884bff3122c..e8458c0d1336 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -177,6 +177,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	unsigned int writing;
 	unsigned long mmu_seq;
 	unsigned long rcbits;
+	unsigned int host_unmapped_hpte = 0;
 
 	psize = hpte_page_size(pteh, ptel);
 	if (!psize)
@@ -199,9 +200,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		/* PPC970 can't do emulated MMIO */
 		if (!cpu_has_feature(CPU_FTR_ARCH_206))
 			return H_PARAMETER;
-		/* Emulated MMIO - mark this with key=31 */
-		pteh |= HPTE_V_ABSENT;
-		ptel |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+		/*
+		 * Mark the hpte as host unmapped
+		 */
+		host_unmapped_hpte = 2;
 		goto do_insert;
 	}
 
@@ -241,7 +243,8 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 			pa = pte_pfn(pte) << PAGE_SHIFT;
 			pa |= hva & (pte_size - 1);
 			pa |= gpa & ~PAGE_MASK;
-		}
+		} else
+			host_unmapped_hpte = 1;
 	}
 
 	if (pte_size < psize)
@@ -252,8 +255,6 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
 	if (pa)
 		pteh |= HPTE_V_VALID;
-	else
-		pteh |= HPTE_V_ABSENT;
 
 	/* Check WIMG */
 	if (is_io != ~0ul && !hpte_cache_flags_ok(ptel, is_io)) {
@@ -330,16 +331,17 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	}
 
 	/* Link HPTE into reverse-map chain */
-	if (pteh & HPTE_V_VALID) {
+	if (!host_unmapped_hpte) {
 		if (realmode)
 			rmap = real_vmalloc_addr(rmap);
 		lock_rmap(rmap);
 		/* Check for pending invalidations under the rmap chain lock */
 		if (kvm->arch.using_mmu_notifiers &&
 		    mmu_notifier_retry(kvm, mmu_seq)) {
-			/* inval in progress, write a non-present HPTE */
-			pteh |= HPTE_V_ABSENT;
-			pteh &= ~HPTE_V_VALID;
+			/*
+			 * inval in progress in host, write host unmapped pte.
+			 */
+			host_unmapped_hpte = 1;
 			unlock_rmap(rmap);
 		} else {
 			kvmppc_add_revmap_chain(kvm, rev, rmap, pte_index,
@@ -350,8 +352,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		}
 	}
 
+	if (host_unmapped_hpte)
+		__kvmppc_unmap_host_hpte(kvm, &pteh, &ptel,
+					 (host_unmapped_hpte == 2));
 	hpte[1] = cpu_to_be64(ptel);
-
 	/* Write the first HPTE dword, unlocking the HPTE and making it valid */
 	eieio();
 	hpte[0] = cpu_to_be64(pteh);
@@ -593,7 +597,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 			note_hpte_modification(kvm, rev);
 
-			if (!(hp0 & HPTE_V_VALID)) {
+			if (!kvmppc_is_host_mapped_hpte(kvm, hp)) {
 				/* insert R and C bits from PTE */
 				rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
 				args[j] |= rcbits << (56 - 5);
@@ -678,7 +682,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	r = (be64_to_cpu(hpte[1]) & ~mask) | bits;
 
 	/* Update HPTE */
-	if (v & HPTE_V_VALID) {
+	if (kvmppc_is_host_mapped_hpte(kvm, hpte)) {
 		rb = compute_tlbie_rb(v, r, pte_index);
 		hpte[0] = cpu_to_be64(v & ~HPTE_V_VALID);
 		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
-- 
1.9.1

* [PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
From: Aneesh Kumar K.V @ 2014-06-29 11:17 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

As per the ISA, we first need to mark the hpte invalid (V=0) before we
update the hpte lower half bits. With the virtual page class key
protection mechanism we want to send any fault other than a key fault
directly to the guest, without searching the hash page table. But then we
can get a NO_HPTE fault while we are updating the hpte. To track that, add
a VM-specific atomic variable that we check in the fault path so that such
faults are always sent to the host.
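
In outline (condensed from the hunks below, with my own comments; not a
complete function), the invalidate-and-update path becomes:

	atomic_inc(&kvm->arch.hpte_update_in_progress);
	/* make the counter visible before the HPTE turns invalid */
	smp_wmb();
	/* per the ISA, V must be 0 before the rest of the HPTE changes */
	hptep[0] &= ~cpu_to_be64(HPTE_V_VALID);
	rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
			      pte_index);
	do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
	/* ... rewrite hptep[1], make hptep[0] valid again ... */
	asm volatile("ptesync" : : : "memory");
	atomic_dec(&kvm->arch.hpte_update_in_progress);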

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  1 +
 arch/powerpc/include/asm/kvm_host.h      |  1 +
 arch/powerpc/kernel/asm-offsets.c        |  1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 19 ++++++----
 arch/powerpc/kvm/book3s_hv.c             |  1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 40 +++++++++++++++++++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 60 +++++++++++++++++++++++++++++---
 7 files changed, 109 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index da00b1f05ea1..a6bf41865a66 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -416,6 +416,7 @@ static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
 	 * We will never call this for MMIO
 	 */
 	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+	atomic_dec(&kvm->arch.hpte_update_in_progress);
 }
 
 static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f9ae69682ce1..0a9ff60fae4c 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -254,6 +254,7 @@ struct kvm_arch {
 	atomic_t hpte_mod_interest;
 	spinlock_t slot_phys_lock;
 	cpumask_t need_tlb_flush;
+	atomic_t hpte_update_in_progress;
 	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
 	int hpt_cma_alloc;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index f5995a912213..54a36110f8f2 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -496,6 +496,7 @@ int main(void)
 	DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
 	DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
 	DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+	DEFINE(KVM_HPTE_UPDATE, offsetof(struct kvm, arch.hpte_update_in_progress));
 	DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
 	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
 	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8ce5e95613f8..cb7a616aacb1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -592,6 +592,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	unsigned int writing, write_ok;
 	struct vm_area_struct *vma;
 	unsigned long rcbits;
+	bool hpte_invalidated = false;
 
 	/*
 	 * Real-mode code has already searched the HPT and found the
@@ -750,13 +751,15 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	r &= rcbits | ~(HPTE_R_R | HPTE_R_C);
 
 	if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) {
-		/* HPTE was previously valid, so we need to invalidate it */
+		/*
+		 * If we had mapped this hpte before, we now need to
+		 * invalidate that.
+		 */
 		unlock_rmap(rmap);
-		/* Always mark HPTE_V_ABSENT before invalidating */
-		kvmppc_unmap_host_hpte(kvm, hptep);
 		kvmppc_invalidate_hpte(kvm, hptep, index);
 		/* don't lose previous R and C bits */
 		r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
+		hpte_invalidated = true;
 	} else {
 		kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
 	}
@@ -765,6 +768,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	eieio();
 	hptep[0] = cpu_to_be64(hpte[0]);
 	asm volatile("ptesync" : : : "memory");
+	if (hpte_invalidated)
+		atomic_dec(&kvm->arch.hpte_update_in_progress);
+
 	preempt_enable();
 	if (page && hpte_is_writable(r))
 		SetPageDirty(page);
@@ -1128,10 +1134,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		/*
 		 * need to make it temporarily absent so C is stable
 		 */
-		kvmppc_unmap_host_hpte(kvm, hptep);
-		kvmppc_invalidate_hpte(kvm, hptep, i);
 		v = be64_to_cpu(hptep[0]);
 		r = be64_to_cpu(hptep[1]);
+		kvmppc_invalidate_hpte(kvm, hptep, i);
 		if (r & HPTE_R_C) {
 			hptep[1] = cpu_to_be64(r & ~HPTE_R_C);
 			if (!(rev[i].guest_rpte & HPTE_R_C)) {
@@ -1144,8 +1149,8 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 				npages_dirty = n;
 			eieio();
 		}
-		kvmppc_map_host_hpte(kvm, &v, &r);
-		hptep[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
+		hptep[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
+		atomic_dec(&kvm->arch.hpte_update_in_progress);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 47d3f20ad10f..328416f28a55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2291,6 +2291,7 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 
 	kvm->arch.using_mmu_notifiers = !!cpu_has_feature(CPU_FTR_ARCH_206);
 	spin_lock_init(&kvm->arch.slot_phys_lock);
+	atomic_set(&kvm->arch.hpte_update_in_progress, 0);
 
 	/*
 	 * Track that we now have a HV mode VM active. This blocks secondary
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index e8458c0d1336..d628d2810c93 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -647,6 +647,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	struct revmap_entry *rev;
 	unsigned long v, r, rb, mask, bits;
 	u64 pte;
+	bool hpte_invalidated = false;
 
 	if (pte_index >= kvm->arch.hpt_npte)
 		return H_PARAMETER;
@@ -683,8 +684,26 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 
 	/* Update HPTE */
 	if (kvmppc_is_host_mapped_hpte(kvm, hpte)) {
-		rb = compute_tlbie_rb(v, r, pte_index);
+		/*
+		 * We use this in the fault path to check whether a
+	 * NO_HPTE fault should be sent to the guest. In this
+	 * case we don't want to send it to the guest.
+		 */
+		atomic_inc(&kvm->arch.hpte_update_in_progress);
+		/*
+		 * We want to ensure that other cpus see the hpte_update_in_progress
+		 * change before taking a page fault due to us clearing the valid
+		 * bit below
+		 */
+		smp_wmb();
+		hpte_invalidated = true;
+		/*
+		 * We need to mark V = 0 before doing a tlb invalidate.
+		 * Refer Page table entry modifying section in ISA
+		 */
 		hpte[0] = cpu_to_be64(v & ~HPTE_V_VALID);
+
+		rb = compute_tlbie_rb(v, r, pte_index);
 		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
 		/*
 		 * If the host has this page as readonly but the guest
@@ -714,6 +733,8 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	eieio();
 	hpte[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
 	asm volatile("ptesync" : : : "memory");
+	if (hpte_invalidated)
+		atomic_dec(&kvm->arch.hpte_update_in_progress);
 	return H_SUCCESS;
 }
 
@@ -755,7 +776,22 @@ void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
 			unsigned long pte_index)
 {
 	unsigned long rb;
-
+	/*
+	 * We use this in the fault path to check whether a
+	 * NO_HPTE fault should be sent to the guest. In this
+	 * case we don't want to send it to the guest.
+	 */
+	atomic_inc(&kvm->arch.hpte_update_in_progress);
+	/*
+	 * We want to ensure that other cpus see the hpte_update_in_progress
+	 * change before taking a page fault due to us clearing the valid
+	 * bit below
+	 */
+	smp_wmb();
+	/*
+	 * We need to mark V = 0 before doing a tlb invalidate.
+	 * Refer Page table entry modifying section in ISA
+	 */
 	hptep[0] &= ~cpu_to_be64(HPTE_V_VALID);
 	rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
 			      pte_index);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 1ff3ebdd2ab0..0b425da9f8db 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1790,11 +1790,33 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 kvmppc_hdsi:
 	mfspr	r4, SPRN_HDAR
 	mfspr	r6, SPRN_HDSISR
+	/*
+	 * first check whether we have an hpte update in progress.
+	 * If so, we go to host directly
+	 */
+	ld	r0, VCPU_KVM(r9)
+	lwz	r0, KVM_HPTE_UPDATE(r0)
+	/*
+	 * If update is in progress go to host directly
+	 */
+	cmpwi	r0, 0
+	bne	5f
+
+	andi.	r0, r11, MSR_DR		/* data relocation enabled? */
+	beq	3f
+
 	/* HPTE not found fault or protection fault? */
 	andis.	r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
 	beq	1f			/* if not, send it to the guest */
-	andi.	r0, r11, MSR_DR		/* data relocation enabled? */
-	beq	3f
+	b	8f
+5:
+	/*
+	 * real mode access to be sent to the host because
+	 * of an hpte update in progress
+	 */
+	andi.	r0, r11, MSR_DR
+	beq	7f
+8:
 	clrrdi	r0, r4, 28
 	PPC_SLBFEE_DOT(R5, R0)		/* if so, look up SLB */
 	bne	1f			/* if no SLB entry found */
@@ -1833,7 +1855,14 @@ fast_interrupt_c_return:
 	mr	r4, r9
 	b	fast_guest_return
 
-3:	ld	r5, VCPU_KVM(r9)	/* not relocated, use VRMA */
+3:	/*
+	 * Check whether we can send the fault directly to
+	 * guest.
+	 */
+	andis.	r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
+	beq	1b
+7:
+	ld	r5, VCPU_KVM(r9)	/* not relocated, use VRMA */
 	ld	r5, KVM_VRMA_SLB_V(r5)
 	b	4b
 
@@ -1865,10 +1894,27 @@ fast_interrupt_c_return:
  * it is an HPTE not found fault for a page that we have paged out.
  */
 kvmppc_hisi:
+	/* first check whether we have an hpte update in progress.
+	 * If so, we go to host directly
+	 */
+	ld	r0, VCPU_KVM(r9)	/* not relocated, use VRMA */
+	lwz	r0, KVM_HPTE_UPDATE(r0)
+	/*
+	 * If update is in progress go to host directly
+	 */
+	cmpwi	r0, 0
+	bne	5f
+
+	andi.	r0, r11, MSR_IR		/* instruction relocation enabled? */
+	beq	3f
+
 	andis.	r0, r11, SRR1_ISI_NOPT@h
 	beq	1f
+	b	8f
+5:
 	andi.	r0, r11, MSR_IR		/* instruction relocation enabled? */
-	beq	3f
+	beq	7f
+8:
 	clrrdi	r0, r10, 28
 	PPC_SLBFEE_DOT(R5, R0)		/* if so, look up SLB */
 	bne	1f			/* if no SLB entry found */
@@ -1896,7 +1942,11 @@ kvmppc_hisi:
 	bl	kvmppc_msr_interrupt
 	b	fast_interrupt_c_return
 
-3:	ld	r6, VCPU_KVM(r9)	/* not relocated, use VRMA */
+3:	/* Check whether we should send this to guest */
+	andis.	r0, r11, SRR1_ISI_NOPT@h
+	beq	1b
+7:
+	ld	r6, VCPU_KVM(r9)	/* not relocated, use VRMA */
 	ld	r5, KVM_VRMA_SLB_V(r6)
 	b	4b
 
-- 
1.9.1

+		kvmppc_invalidate_hpte(kvm, hptep, i);
 		if (r & HPTE_R_C) {
 			hptep[1] = cpu_to_be64(r & ~HPTE_R_C);
 			if (!(rev[i].guest_rpte & HPTE_R_C)) {
@@ -1144,8 +1149,8 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 				npages_dirty = n;
 			eieio();
 		}
-		kvmppc_map_host_hpte(kvm, &v, &r);
-		hptep[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
+		hptep[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
+		atomic_dec(&kvm->arch.hpte_update_in_progress);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 47d3f20ad10f..328416f28a55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2291,6 +2291,7 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 
 	kvm->arch.using_mmu_notifiers = !!cpu_has_feature(CPU_FTR_ARCH_206);
 	spin_lock_init(&kvm->arch.slot_phys_lock);
+	atomic_set(&kvm->arch.hpte_update_in_progress, 0);
 
 	/*
 	 * Track that we now have a HV mode VM active. This blocks secondary
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index e8458c0d1336..d628d2810c93 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -647,6 +647,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	struct revmap_entry *rev;
 	unsigned long v, r, rb, mask, bits;
 	u64 pte;
+	bool hpte_invalidated = false;
 
 	if (pte_index >= kvm->arch.hpt_npte)
 		return H_PARAMETER;
@@ -683,8 +684,26 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 
 	/* Update HPTE */
 	if (kvmppc_is_host_mapped_hpte(kvm, hpte)) {
-		rb = compute_tlbie_rb(v, r, pte_index);
+		/*
+		 * We use this in the fault path to check whether a
+		 * NO_HPTE fault should be sent to the guest. In this
+		 * case we don't want to send it to the guest.
+		 */
+		atomic_inc(&kvm->arch.hpte_update_in_progress);
+		/*
+		 * We want to ensure that other cpus see the hpte_update_in_progress
+		 * change before taking a page fault due to us clearing the valid
+		 * bit below
+		 */
+		smp_wmb();
+		hpte_invalidated = true;
+		/*
+		 * We need to mark V = 0 before doing a tlb invalidate.
+		 * Refer to the page table entry modifying section in the ISA.
+		 */
 		hpte[0] = cpu_to_be64(v & ~HPTE_V_VALID);
+
+		rb = compute_tlbie_rb(v, r, pte_index);
 		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
 		/*
 		 * If the host has this page as readonly but the guest
@@ -714,6 +733,8 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	eieio();
 	hpte[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
 	asm volatile("ptesync" : : : "memory");
+	if (hpte_invalidated)
+		atomic_dec(&kvm->arch.hpte_update_in_progress);
 	return H_SUCCESS;
 }
 
@@ -755,7 +776,22 @@ void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
 			unsigned long pte_index)
 {
 	unsigned long rb;
-
+	/*
+	 * We use this in the fault path to check whether a
+	 * NO_HPTE fault should be sent to the guest. In this
+	 * case we don't want to send it to the guest.
+	 */
+	atomic_inc(&kvm->arch.hpte_update_in_progress);
+	/*
+	 * We want to ensure that other cpus see the hpte_update_in_progress
+	 * change before taking a page fault due to us clearing the valid
+	 * bit below
+	 */
+	smp_wmb();
+	/*
+	 * We need to mark V = 0 before doing a tlb invalidate.
+	 * Refer Page table entry modifying section in ISA
+	 * Refer to the page table entry modifying section in the ISA.
 	hptep[0] &= ~cpu_to_be64(HPTE_V_VALID);
 	rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
 			      pte_index);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 1ff3ebdd2ab0..0b425da9f8db 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1790,11 +1790,33 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 kvmppc_hdsi:
 	mfspr	r4, SPRN_HDAR
 	mfspr	r6, SPRN_HDSISR
+	/*
+	 * first check whether we have an hpte update in progress.
+	 * If so, we go to host directly
+	 */
+	ld	r0, VCPU_KVM(r9)
+	lwz	r0, KVM_HPTE_UPDATE(r0)
+	/*
+	 * If update is in progress go to host directly
+	 */
+	cmpwi	r0, 0
+	bne	5f
+
+	andi.	r0, r11, MSR_DR		/* data relocation enabled? */
+	beq	3f
+
 	/* HPTE not found fault or protection fault? */
 	andis.	r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
 	beq	1f			/* if not, send it to the guest */
-	andi.	r0, r11, MSR_DR		/* data relocation enabled? */
-	beq	3f
+	b	8f
+5:
+	/*
+	 * Real mode access to be sent to the host because
+	 * an hpte update is in progress.
+	 */
+	andi.	r0, r11, MSR_DR
+	beq	7f
+8:
 	clrrdi	r0, r4, 28
 	PPC_SLBFEE_DOT(R5, R0)		/* if so, look up SLB */
 	bne	1f			/* if no SLB entry found */
@@ -1833,7 +1855,14 @@ fast_interrupt_c_return:
 	mr	r4, r9
 	b	fast_guest_return
 
-3:	ld	r5, VCPU_KVM(r9)	/* not relocated, use VRMA */
+3:	/*
+	 * Check whether we can send the fault directly to
+	 * guest.
+	 */
+	andis.	r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
+	beq	1b
+7:
+	ld	r5, VCPU_KVM(r9)	/* not relocated, use VRMA */
 	ld	r5, KVM_VRMA_SLB_V(r5)
 	b	4b
 
@@ -1865,10 +1894,27 @@ fast_interrupt_c_return:
  * it is an HPTE not found fault for a page that we have paged out.
  */
 kvmppc_hisi:
+	/* first check whether we have an hpte update in progress.
+	 * If so, we go to host directly
+	 */
+	ld	r0, VCPU_KVM(r9)
+	lwz	r0, KVM_HPTE_UPDATE(r0)
+	/*
+	 * If update is in progress go to host directly
+	 */
+	cmpwi	r0, 0
+	bne	5f
+
+	andi.	r0, r11, MSR_IR		/* instruction relocation enabled? */
+	beq	3f
+
 	andis.	r0, r11, SRR1_ISI_NOPT@h
 	beq	1f
+	b	8f
+5:
 	andi.	r0, r11, MSR_IR		/* instruction relocation enabled? */
-	beq	3f
+	beq	7f
+8:
 	clrrdi	r0, r10, 28
 	PPC_SLBFEE_DOT(R5, R0)		/* if so, look up SLB */
 	bne	1f			/* if no SLB entry found */
@@ -1896,7 +1942,11 @@ kvmppc_hisi:
 	bl	kvmppc_msr_interrupt
 	b	fast_interrupt_c_return
 
-3:	ld	r6, VCPU_KVM(r9)	/* not relocated, use VRMA */
+3:	/* Check whether we should send this to guest */
+	andis.	r0, r11, SRR1_ISI_NOPT@h
+	beq	1b
+7:
+	ld	r6, VCPU_KVM(r9)	/* not relocated, use VRMA */
 	ld	r5, KVM_VRMA_SLB_V(r6)
 	b	4b
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 6/6] KVM: PPC: BOOK3S: HV: Use virtual page class protection mechanism for host fault and mmio
  2014-06-29 11:17 ` Aneesh Kumar K.V
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 51+ messages in thread
From: Aneesh Kumar K.V @ 2014-06-29 11:17 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

With this patch we use AMR class 30 and 31 to indicate a page fault
that should be handled by the host. This covers MMIO accesses and page
faults resulting from guest RAM being swapped out by the host. It
enables us to forward the fault to the guest without doing the
expensive hash page table search to find the hpte entry. With this
patch we always mark the hash pte valid and use class index 30 and 31
for key-based faults. These virtual page class indexes are configured
in the AMR to deny read/write access. Since the access class protection
mechanism doesn't work with the VRMA region, we handle those HPTEs
separately: we mark them invalid and use the software-defined bit,
HPTE_V_VRMA, to differentiate them.

NOTE: We still need to handle protection faults in the host so that a
write to a KSM shared page is handled there.
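
For reference, a small stand-alone C sketch of the AMR bit arithmetic
this relies on, assuming two AMR bits per class with class 31 in the
least significant pair (the amr_deny_rw() helper is made up; the
resulting constants mirror the ori/rldicr immediates used in the
book3s_hv_rmhandlers.S hunk of this patch):

#include <stdint.h>
#include <stdio.h>

/* illustrative helper: two AMR bits per class, class 31 in the low pair */
static inline uint64_t amr_deny_rw(unsigned int class_index)
{
	return 3ULL << (2 * (31 - class_index));
}

int main(void)
{
	uint64_t amr_bits   = amr_deny_rw(30) | amr_deny_rw(31);	/* 0xf */
	uint64_t uamor_mask = ~amr_bits;	/* keep classes 30/31 locked */

	/* matches "ori r5, r5, 0xf" and "rldicr r6, r6, 0, 59" below */
	printf("AMR |= %#llx, UAMOR &= %#llx\n",
	       (unsigned long long)amr_bits,
	       (unsigned long long)uamor_mask);
	return 0;
}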

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 80 +++++++++++++++++++++++++++-----
 arch/powerpc/include/asm/reg.h           |  1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 48 ++++++++++++++-----
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 69 ++++++++++++++++++---------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 52 ++++++++++++++++-----
 5 files changed, 194 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index a6bf41865a66..4aa9c3601fe8 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -48,7 +48,18 @@ extern unsigned long kvm_rma_pages;
  * HPTEs.
  */
 #define HPTE_V_HVLOCK	0x40UL
-#define HPTE_V_ABSENT	0x20UL
+/*
+ * VRMA mapping
+ */
+#define HPTE_V_VRMA	0x20UL
+
+#define HPTE_R_HOST_UNMAP_KEY	0x3000000000000e00UL
+/*
+ * We use this to differentiate between an MMIO key fault and
+ * a key fault resulting from the host swapping out the page.
+ */
+#define HPTE_R_MMIO_UNMAP_KEY	0x3000000000000c00UL
+
 
 /*
  * We use this bit in the guest_rpte field of the revmap entry
@@ -405,35 +416,82 @@ static inline void __kvmppc_unmap_host_hpte(struct kvm *kvm,
 					    unsigned long *hpte_r,
 					    bool mmio)
 {
-	*hpte_v |= HPTE_V_ABSENT;
-	if (mmio)
-		*hpte_r |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+	/*
+	 * We unmap on the host by adding the page to AMR class 31,
+	 * which has both read and write access denied.
+	 *
+	 * For the VRMA area we mark the HPTE invalid instead.
+	 *
+	 * If we are not using mmu_notifiers we don't use access
+	 * class protection.
+	 *
+	 * Since we are not changing the hpt directly we don't
+	 * worry about update ordering.
+	 */
+	if ((*hpte_v & HPTE_V_VRMA) || !kvm->arch.using_mmu_notifiers)
+		*hpte_v &= ~HPTE_V_VALID;
+	else if (!mmio) {
+		*hpte_r |= HPTE_R_HOST_UNMAP_KEY;
+		*hpte_v |= HPTE_V_VALID;
+	} else {
+		*hpte_r |= HPTE_R_MMIO_UNMAP_KEY;
+		*hpte_v |= HPTE_V_VALID;
+	}
 }
 
 static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
 {
+	unsigned long pte_v, pte_r;
+
+	pte_v = be64_to_cpu(hptep[0]);
+	pte_r = be64_to_cpu(hptep[1]);
 	/*
 	 * We will never call this for MMIO
 	 */
-	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+	__kvmppc_unmap_host_hpte(kvm, &pte_v, &pte_r, 0);
+	hptep[1] = cpu_to_be64(pte_r);
+	eieio();
+	hptep[0] = cpu_to_be64(pte_v);
+	asm volatile("ptesync" : : : "memory");
+	/*
+	 * we have now successfully marked the hpte using key bits
+	 */
 	atomic_dec(&kvm->arch.hpte_update_in_progress);
 }
 
 static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
 					unsigned long *hpte_r)
 {
-	*hpte_v |= HPTE_V_VALID;
-	*hpte_v &= ~HPTE_V_ABSENT;
+	/*
+	 * We will never try to map an MMIO region
+	 */
+	if ((*hpte_v & HPTE_V_VRMA) || !kvm->arch.using_mmu_notifiers)
+		*hpte_v |= HPTE_V_VALID;
+	else {
+		/*
+		 * When we allow guest keys we should set this to the
+		 * key for this page.
+		 */
+		*hpte_r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
+	}
 }
 
 static inline bool kvmppc_is_host_mapped_hpte(struct kvm *kvm, __be64 *hpte)
 {
-	unsigned long v;
+	unsigned long v, r;
 
 	v = be64_to_cpu(hpte[0]);
-	if (v & HPTE_V_VALID)
-		return true;
-	return false;
+	if ((v & HPTE_V_VRMA) || !kvm->arch.using_mmu_notifiers)
+		return v & HPTE_V_VALID;
+
+	r = be64_to_cpu(hpte[1]);
+	if (!(v & HPTE_V_VALID))
+		return false;
+	if ((r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) == HPTE_R_HOST_UNMAP_KEY)
+		return false;
+	if ((r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) == HPTE_R_MMIO_UNMAP_KEY)
+		return false;
+	return true;
 }
 
 
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1f34ef7ec4a8..ca9a7aebc9ce 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -605,6 +605,7 @@
 #define   SRR1_ISI_NOPT		0x40000000 /* ISI: Not found in hash */
 #define   SRR1_ISI_N_OR_G	0x10000000 /* ISI: Access is no-exec or G */
 #define   SRR1_ISI_PROT		0x08000000 /* ISI: Other protection fault */
+#define   SRR1_ISI_KEYFAULT	0x00200000	/* Key fault */
 #define   SRR1_WAKEMASK		0x00380000 /* reason for wakeup */
 #define   SRR1_WAKESYSERR	0x00300000 /* System error */
 #define   SRR1_WAKEEE		0x00200000 /* External interrupt */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index cb7a616aacb1..06f860d84d69 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -214,6 +214,11 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 		hash = (hash << 3) + 7;
 		hp_v = hp0 | ((addr >> 16) & ~0x7fUL);
 		hp_r = hp1 | addr;
+		/*
+		 * The VRMA mapping doesn't work with the access class
+		 * protection mechanism, hence don't use keys for it.
+		 */
+		hp_v |= HPTE_V_VRMA;
 		ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, hash, hp_v, hp_r,
 						 &idx_ret);
 		if (ret != H_SUCCESS) {
@@ -409,7 +414,7 @@ long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 	/*
 	 * Clear few bits, when called via hcall
 	 */
-	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_VRMA | HPTE_V_VALID);
 	ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
 
 	return kvmppc_virtmode_do_h_enter(vcpu->kvm, flags, pte_index,
@@ -472,7 +477,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	preempt_disable();
 	/* Find the HPTE in the hash table */
 	index = kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
-					 HPTE_V_VALID | HPTE_V_ABSENT);
+					 HPTE_V_VALID | HPTE_V_VRMA);
 	if (index < 0) {
 		preempt_enable();
 		return -ENOENT;
@@ -733,7 +738,13 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		rev->guest_rpte != hpte[2])
 		/* HPTE has been changed under us; let the guest retry */
 		goto out_unlock;
-	hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
+	/*
+	 * Mark this hpte host mapped. We will use this value
+	 * to update the actual hpte later. We don't need to
+	 * clear the key bits, because we use the rev->guest_rpte
+	 * value for the lower half.
+	 */
+	hpte[0] |= HPTE_V_VALID;
 
 	/* Always put the HPTE in the rmap chain for the page base address */
 	rmap = &memslot->arch.rmap[gfn_base - memslot->base_gfn];
@@ -906,8 +917,9 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
 		if (kvmppc_is_host_mapped_hpte(kvm, hptep) &&
 		    hpte_rpn(ptel, psize) == gfn) {
-			if (kvm->arch.using_mmu_notifiers)
-				kvmppc_unmap_host_hpte(kvm, hptep);
+			/*
+			 * For hpte update always invalidate first
+			 */
 			kvmppc_invalidate_hpte(kvm, hptep, i);
 
 			/* Harvest R and C */
@@ -917,6 +929,11 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 				rev[i].guest_rpte = ptel | rcbits;
 				note_hpte_modification(kvm, &rev[i]);
 			}
+			/*
+			 * Mark the hpte unmapped so that the host can
+			 * handle the faults.
+			 */
+			kvmppc_unmap_host_hpte(kvm, hptep);
 		}
 		unlock_rmap(rmapp);
 		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
@@ -1345,7 +1362,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 		return 0;
 
 	valid = 0;
-	if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)) {
+	if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_VRMA)) {
 		valid = 1;
 		if ((flags & KVM_GET_HTAB_BOLTED_ONLY) &&
 		    !(be64_to_cpu(hptp[0]) & HPTE_V_BOLTED))
@@ -1362,7 +1379,11 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 			cpu_relax();
 		v = be64_to_cpu(hptp[0]);
 
-		/* re-evaluate valid and dirty from synchronized HPTE value */
+		/*
+		 * Re-evaluate valid and dirty from the synchronized HPTE value.
+		 * We don't need to worry about host-unmapped HPTEs; we keep the valid
+		 * bit set even if we move the hpte to class 31.
+		 */
 		valid = !!(v & HPTE_V_VALID);
 		dirty = !!(revp->guest_rpte & HPTE_GR_MODIFIED);
 
@@ -1374,8 +1395,11 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 			dirty = 1;
 		}
 
-		if (v & HPTE_V_ABSENT) {
-			v &= ~HPTE_V_ABSENT;
+		if (v & HPTE_V_VRMA) {
+			/*
+			 * Consider unmapped VRMA entries mapped, and also
+			 * retain the HPTE_V_VRMA bit.
+			 */
 			v |= HPTE_V_VALID;
 			valid = 1;
 		}
@@ -1559,14 +1583,14 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 			lbuf += 2;
 			nb += HPTE_SIZE;
 
-			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT))
+			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_VRMA))
 				kvmppc_do_h_remove(kvm, 0, i, 0, tmp);
 			err = -EIO;
 			/*
 			 * Clear few bits we got via read_htab which we
 			 * don't need to carry forward.
 			 */
-			v &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+			v &= ~(HPTE_V_HVLOCK | HPTE_V_VALID);
 			r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
 
 			ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, i, v, r,
@@ -1592,7 +1616,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 		}
 
 		for (j = 0; j < hdr.n_invalid; ++j) {
-			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT))
+			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_VRMA))
 				kvmppc_do_h_remove(kvm, 0, i, 0, tmp);
 			++i;
 			hptp += 2;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index d628d2810c93..f907be908b28 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -252,7 +252,13 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
 	ptel &= ~(HPTE_R_PP0 - psize);
 	ptel |= pa;
-
+	/*
+	 * We mark the pte valid if it is valid from the guest's point of view.
+	 * For the VRMA area we need to keep it invalid, because the access
+	 * class protection mechanism doesn't work with guest real mode
+	 * access.
+	 * A non-VRMA area is always valid, and VRMA is valid only if pa is set.
+	 */
 	if (pa)
 		pteh |= HPTE_V_VALID;
 
@@ -278,7 +284,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		for (i = 0; i < 8; ++i) {
 			if ((be64_to_cpu(*hpte) & HPTE_V_VALID) == 0 &&
 			    try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID |
-					  HPTE_V_ABSENT))
+					  HPTE_V_VRMA))
 				break;
 			hpte += 2;
 		}
@@ -295,7 +301,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 				while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 					cpu_relax();
 				pte = be64_to_cpu(*hpte);
-				if (!(pte & (HPTE_V_VALID | HPTE_V_ABSENT)))
+				if (!(pte & (HPTE_V_VALID | HPTE_V_VRMA)))
 					break;
 				*hpte &= ~cpu_to_be64(HPTE_V_HVLOCK);
 				hpte += 2;
@@ -307,14 +313,14 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	} else {
 		hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
 		if (!try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID |
-				   HPTE_V_ABSENT)) {
+				   HPTE_V_VRMA)) {
 			/* Lock the slot and check again */
 			u64 pte;
 
 			while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 				cpu_relax();
 			pte = be64_to_cpu(*hpte);
-			if (pte & (HPTE_V_VALID | HPTE_V_ABSENT)) {
+			if (pte & (HPTE_V_VALID | HPTE_V_VRMA)) {
 				*hpte &= ~cpu_to_be64(HPTE_V_HVLOCK);
 				return H_PTEG_FULL;
 			}
@@ -372,7 +378,7 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 	/*
 	 * Clear a few bits when called via hcall.
 	 */
-	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_VRMA | HPTE_V_VALID);
 	ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
 
 	return kvmppc_do_h_enter(vcpu->kvm, flags, pte_index, pteh, ptel,
@@ -492,7 +498,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 		cpu_relax();
 	pte = be64_to_cpu(hpte[0]);
-	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
+	if ((pte & (HPTE_V_VRMA | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
 	    ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
 		hpte[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
@@ -503,9 +509,12 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	v = pte & ~HPTE_V_HVLOCK;
 	if (v & HPTE_V_VALID) {
 		u64 pte1;
-
 		pte1 = be64_to_cpu(hpte[1]);
-		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
+		/*
+		 * Remove the valid and VRMA bits
+		 */
+		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID | HPTE_V_VRMA);
+
 		rb = compute_tlbie_rb(v, pte1, pte_index);
 		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
 		/* Read PTE low word after tlbie to get final R/C values */
@@ -572,7 +581,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			}
 			found = 0;
 			hp0 = be64_to_cpu(hp[0]);
-			if (hp0 & (HPTE_V_ABSENT | HPTE_V_VALID)) {
+			if (hp0 & (HPTE_V_VRMA | HPTE_V_VALID)) {
 				switch (flags & 3) {
 				case 0:		/* absolute */
 					found = 1;
@@ -606,7 +615,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			}
 
 			/* leave it locked */
-			hp[0] &= ~cpu_to_be64(HPTE_V_VALID);
+			hp[0] &= ~cpu_to_be64(HPTE_V_VALID | HPTE_V_VRMA);
 			tlbrb[n] = compute_tlbie_rb(be64_to_cpu(hp[0]),
 				be64_to_cpu(hp[1]), pte_index);
 			indexes[n] = j;
@@ -656,7 +665,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 		cpu_relax();
 	pte = be64_to_cpu(hpte[0]);
-	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
+	if ((pte & (HPTE_V_VRMA | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
 		hpte[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
 		return H_NOT_FOUND;
@@ -758,10 +767,17 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 		hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
 		v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
 		r = be64_to_cpu(hpte[1]);
-		if (v & HPTE_V_ABSENT) {
-			v &= ~HPTE_V_ABSENT;
+		if (v & HPTE_V_VRMA) {
+			/*
+			 * don't share vrma bits back to guest
+			 */
+			v &= ~HPTE_V_VRMA;
 			v |= HPTE_V_VALID;
 		}
+		/*
+		 * Clear the AMR class bits
+		 */
+		r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
 		if (v & HPTE_V_VALID) {
 			r = rev[i].guest_rpte | (r & (HPTE_R_R | HPTE_R_C));
 			r &= ~HPTE_GR_RESERVED;
@@ -871,8 +887,13 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 			/* Read the PTE racily */
 			v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
 
-			/* Check valid/absent, hash, segment size and AVPN */
-			if (!(v & valid) || (v & mask) != val)
+			/*
+			 * Check hash, segment size and AVPN.
+			 * We can't check for valid here without the lock, since we
+			 * mark the hpte not valid while an hpte update is
+			 * in progress.
+			 */
+			if ((v & mask) != val)
 				continue;
 
 			/* Lock the PTE and read it under the lock */
@@ -927,7 +948,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	/* For protection fault, expect to find a valid HPTE */
 	valid = HPTE_V_VALID;
 	if (status & DSISR_NOHPTE)
-		valid |= HPTE_V_ABSENT;
+		valid |= HPTE_V_VRMA;
 
 	index = kvmppc_hv_find_lock_hpte(kvm, addr, slb_v, valid);
 	if (index < 0) {
@@ -942,10 +963,17 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	gr = rev->guest_rpte;
 
 	unlock_hpte(hpte, v);
-
-	/* For not found, if the HPTE is valid by now, retry the instruction */
+	/*
+	 * For not found, if the HPTE is valid by now, retry the instruction
+	 */
 	if ((status & DSISR_NOHPTE) && (v & HPTE_V_VALID))
 		return 0;
+	/*
+	 * For a key fault, if the HPTE is host mapped by now, retry the instruction
+	 */
+	if ((status & DSISR_KEYFAULT) &&
+	    kvmppc_is_host_mapped_hpte(kvm, hpte))
+		return 0;
 
 	/* Check access permissions to the page */
 	pp = gr & (HPTE_R_PP0 | HPTE_R_PP);
@@ -973,8 +1001,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 
 	/* Check the storage key to see if it is possibly emulated MMIO */
 	if (data && (vcpu->arch.shregs.msr & MSR_IR) &&
-	    (r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) ==
-	    (HPTE_R_KEY_HI | HPTE_R_KEY_LO))
+	    ((r & HPTE_R_MMIO_UNMAP_KEY) == HPTE_R_MMIO_UNMAP_KEY))
 		return -2;	/* MMIO emulation - load instr word */
 
 	return -1;		/* send fault up to host kernel mode */
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 0b425da9f8db..f1af0d24e2b5 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -834,7 +834,11 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	mtmsrd	r8
 
 	/* Load up POWER8-specific registers */
+	/*
+	 * Always disable read/write w.r.t. class index 31
+	 */
 	ld	r5, VCPU_IAMR(r4)
+	ori	r5, r5, 0x3
 	lwz	r6, VCPU_PSPB(r4)
 	ld	r7, VCPU_FSCR(r4)
 	mtspr	SPRN_IAMR, r5
@@ -901,10 +905,25 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_DSISR, r6
 
 BEGIN_FTR_SECTION
-	/* Restore AMR and UAMOR, set AMOR to all 1s */
-	ld	r5,VCPU_AMR(r4)
+	/* Restore AMR and UAMOR */
+	/*
+	 * Always disable read/write w.r.t. class indexes
+	 * 30 and 31
+	 */
+	ld	r5, VCPU_AMR(r4)
+	ori	r5, r5, 0xf
+	/*
+	 * UAMOR set so that mask bits for class index 30
+	 * and 31 cannot be updated
+	 */
+
 	ld	r6,VCPU_UAMOR(r4)
-	li	r7,-1
+	rldicr  r6, r6, 0, 59
+	/*
+	 * AMOR set so that mask bits for class index 30
+	 * and 31 cannot be updated
+	 */
+	li	r7, ~0xf
 	mtspr	SPRN_AMR,r5
 	mtspr	SPRN_UAMOR,r6
 	mtspr	SPRN_AMOR,r7
@@ -1801,13 +1820,22 @@ kvmppc_hdsi:
 	 */
 	cmpwi	r0, 0
 	bne	5f
-
-	andi.	r0, r11, MSR_DR		/* data relocation enabled? */
+	/*
+	 * If data relocation is disabled, virtual page class
+	 * key protection mechanism does not apply.
+	 */
+	andi.	r0, r11, MSR_DR
 	beq	3f
-
-	/* HPTE not found fault or protection fault? */
-	andis.	r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
-	beq	1f			/* if not, send it to the guest */
+	/*
+	 * If access is not permitted by virtual page class
+	 * key protection, handle it in the host; otherwise
+	 * send it to the guest.
+	 */
+	andis.	r0, r6, (DSISR_KEYFAULT | DSISR_PROTFAULT)@h
+	beq	1f
+	/*
+	 * skip the real mode check below
+	 */
 	b	8f
 5:
 	/*
@@ -1857,7 +1885,8 @@ fast_interrupt_c_return:
 
 3:	/*
 	 * Check whether we can send the fault directly to
-	 * guest.
+	 * guest. We don't need to worry about a key fault if
+	 * the fault happens in real mode.
 	 */
 	andis.	r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
 	beq	1b
@@ -1907,8 +1936,7 @@ kvmppc_hisi:
 
 	andi.	r0, r11, MSR_IR		/* instruction relocation enabled? */
 	beq	3f
-
-	andis.	r0, r11, SRR1_ISI_NOPT@h
+	andis.	r0, r11, SRR1_ISI_KEYFAULT@h
 	beq	1f
 	b	8f
 5:
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/6] Use virtual page class key protection mechanism for speeding up guest page fault
  2014-06-29 11:17 ` Aneesh Kumar K.V
@ 2014-06-29 11:26   ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 51+ messages in thread
From: Benjamin Herrenschmidt @ 2014-06-29 11:26 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: agraf, paulus, linuxppc-dev, kvm-ppc, kvm

On Sun, 2014-06-29 at 16:47 +0530, Aneesh Kumar K.V wrote:

> To achieve the above we use virtual page calss protection mechanism for
> covering (2) and (3). For both the above case we mark the hpte
> valid, but associate the page with virtual page class index 30 and 31.
> The authority mask register is configured such that class index 30 and 31
> will have read/write denied. The above change results in a key fault
> for (2) and (3). This allows us to forward a NO_HPTE fault directly to guest
> without doing the expensive hash pagetable lookup.

So we have a measurable performance benefit (about half a second out of
8) but you didn't explain the drawback here which is to essentially make
it impossible for guests to exploit virtual page class keys, or did you
find a way to still make that possible ?

As it is, it's not a huge issue for Linux but we might have to care with
other OSes that do care...

Do we have a way in PAPR to signify to the guest that the keys are not
available ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 1/6] KVM: PPC: BOOK3S: HV: Clear hash pte bits from do_h_enter callers
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 51+ messages in thread
From: Aneesh Kumar K.V @ 2014-06-29 11:29 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

We will use this to set the HPTE_V_VRMA bit in a later patch. This also
makes sure we clear the hpte bits only when called via an hcall.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 15 +++++++++++++--
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |  8 ++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 09a47aeb5b63..1c137f45dd55 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -371,8 +371,6 @@ long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
 	if (!psize)
 		return H_PARAMETER;
 
-	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
-
 	/* Find the memslot (if any) for this address */
 	gpa = (ptel & HPTE_R_RPN) & ~(psize - 1);
 	gfn = gpa >> PAGE_SHIFT;
@@ -408,6 +406,12 @@ long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 			     long pte_index, unsigned long pteh,
 			     unsigned long ptel)
 {
+	/*
+	 * Clear few bits, when called via hcall
+	 */
+	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+	ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
 	return kvmppc_virtmode_do_h_enter(vcpu->kvm, flags, pte_index,
 					  pteh, ptel, &vcpu->arch.gpr[4]);
 }
@@ -1560,6 +1564,13 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT))
 				kvmppc_do_h_remove(kvm, 0, i, 0, tmp);
 			err = -EIO;
+			/*
+			 * Clear few bits we got via read_htab which we
+			 * don't need to carry forward.
+			 */
+			v &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+			r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
 			ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, i, v, r,
 							 tmp);
 			if (ret != H_SUCCESS) {
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 084ad54c73cd..157a5f35edfa 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -182,8 +182,6 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	if (!psize)
 		return H_PARAMETER;
 	writing = hpte_is_writable(ptel);
-	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
-	ptel &= ~HPTE_GR_RESERVED;
 	g_ptel = ptel;
 
 	/* used later to detect if we might have been invalidated */
@@ -367,6 +365,12 @@ EXPORT_SYMBOL_GPL(kvmppc_do_h_enter);
 long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 		    long pte_index, unsigned long pteh, unsigned long ptel)
 {
+	/*
+	 * Clear few bits. when called via hcall.
+	 */
+	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+	ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
+
 	return kvmppc_do_h_enter(vcpu->kvm, flags, pte_index, pteh, ptel,
 				 vcpu->arch.pgdir, true, &vcpu->arch.gpr[4]);
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH] KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 51+ messages in thread
From: Aneesh Kumar K.V @ 2014-06-29 11:29 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

When calculating the lower bits of the AVA field, use the shift
count based on the base page size. Also add the missing segment
size and remove a stale comment.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 6 ++++--
 arch/powerpc/kvm/book3s_hv.c             | 6 ------
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 66a0a44b62a8..ca7c1688a7b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -158,6 +158,8 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	 */
 	/* This covers 14..54 bits of va*/
 	rb = (v & ~0x7fUL) << 16;		/* AVA field */
+
+	rb |= v >> (62 - 8);			/*  B field */
 	/*
 	 * AVA in v had cleared lower 23 bits. We need to derive
 	 * that from pteg index
@@ -188,10 +190,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	{
 		int aval_shift;
 		/*
-		 * remaining 7bits of AVA/LP fields
+		 * remaining bits of AVA/LP fields
 		 * Also contain the rr bits of LP
 		 */
-		rb |= (va_low & 0x7f) << 16;
+		rb |= (va_low << mmu_psize_defs[b_psize].shift) & 0x7ff000;
 		/*
 		 * Now clear not needed LP bits based on actual psize
 		 */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cbf46eb3f59c..328416f28a55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1917,12 +1917,6 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
 	(*sps)->page_shift = def->shift;
 	(*sps)->slb_enc = def->sllp;
 	(*sps)->enc[0].page_shift = def->shift;
-	/*
-	 * Only return base page encoding. We don't want to return
-	 * all the supporting pte_enc, because our H_ENTER doesn't
-	 * support MPSS yet. Once they do, we can start passing all
-	 * support pte_enc here
-	 */
 	(*sps)->enc[0].pte_enc = def->penc[linux_psize];
 	/*
 	 * Add 16MB MPSS support if host supports it
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 2/6] KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 51+ messages in thread
From: Aneesh Kumar K.V @ 2014-06-29 11:29 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

This makes it consistent with h_enter, where we clear the key
bits. We also want to use the virtual page class key protection
mechanism for indicating a host page fault. For that we will be using
key class indices 30 and 31. So prevent the guest from updating the key
bits until we add proper support for the virtual page class protection
mechanism in the guest. This has no impact on PAPR Linux guests because
Linux guests currently don't use the virtual page class key protection
model.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 157a5f35edfa..f908845f7379 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -658,13 +658,17 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	}
 
 	v = pte;
+	/*
+	 * We ignore key bits here. We use class 31 and 30 for
+	 * hypervisor purpose. We still don't track the page
+	 * class separately. Until then don't allow h_protect
+	 * to change key bits.
+	 */
 	bits = (flags << 55) & HPTE_R_PP0;
-	bits |= (flags << 48) & HPTE_R_KEY_HI;
-	bits |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
+	bits |= flags & (HPTE_R_PP | HPTE_R_N);
 
 	/* Update guest view of 2nd HPTE dword */
-	mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
-		HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+	mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N;
 	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 	if (rev) {
 		r = (rev->guest_rpte & ~mask) | bits;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 3/6] KVM: PPC: BOOK3S: HV: Remove dead code
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 51+ messages in thread
From: Aneesh Kumar K.V @ 2014-06-29 11:29 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

Since we don't support the virtual page class key protection mechanism
in the guest, we should never see a key fault that needs to be forwarded
to the guest. So remove the dead code.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 9 ---------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 9 ---------
 2 files changed, 18 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 1c137f45dd55..590e07b1a43f 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -499,15 +499,6 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	gpte->may_write = hpte_write_permission(pp, key);
 	gpte->may_execute = gpte->may_read && !(gr & (HPTE_R_N | HPTE_R_G));
 
-	/* Storage key permission check for POWER7 */
-	if (data && virtmode && cpu_has_feature(CPU_FTR_ARCH_206)) {
-		int amrfield = hpte_get_skey_perm(gr, vcpu->arch.amr);
-		if (amrfield & 1)
-			gpte->may_read = 0;
-		if (amrfield & 2)
-			gpte->may_write = 0;
-	}
-
 	/* Get the guest physical address */
 	gpte->raddr = kvmppc_mmu_get_real_addr(v, gr, eaddr);
 	return 0;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index f908845f7379..1884bff3122c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -925,15 +925,6 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 			return status | DSISR_PROTFAULT;
 	}
 
-	/* Check storage key, if applicable */
-	if (data && (vcpu->arch.shregs.msr & MSR_DR)) {
-		unsigned int perm = hpte_get_skey_perm(gr, vcpu->arch.amr);
-		if (status & DSISR_ISSTORE)
-			perm >>= 1;
-		if (perm & 1)
-			return status | DSISR_KEYFAULT;
-	}
-
 	/* Save HPTE info for virtual-mode handler */
 	vcpu->arch.pgfault_addr = addr;
 	vcpu->arch.pgfault_index = index;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 4/6] KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 51+ messages in thread
From: Aneesh Kumar K.V @ 2014-06-29 11:29 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

We want to use the virtual page class key protection mechanism for
indicating an MMIO mapped hpte entry or a guest hpte entry that is
swapped out in the host. Those hptes will be marked valid, but have the
virtual page class key set to 30 or 31. These virtual page class numbers
are configured in the AMR to deny read/write. To accommodate such a
change, add new functions that map, unmap and check whether a hpte is
mapped in the host. This patch still uses HPTE_V_VALID and HPTE_V_ABSENT
and doesn't use virtual page class keys. But we want to separate the
places where we explicitly check for HPTE_V_VALID from the places where
we want to check whether the hpte is host mapped. This patch enables a
closer review of such a change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 36 ++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 24 +++++++++++----------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 30 ++++++++++++++------------
 3 files changed, 66 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0aa817933e6a..da00b1f05ea1 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -400,6 +400,42 @@ static inline int is_vrma_hpte(unsigned long hpte_v)
 		(HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)));
 }
 
+static inline void __kvmppc_unmap_host_hpte(struct kvm *kvm,
+					    unsigned long *hpte_v,
+					    unsigned long *hpte_r,
+					    bool mmio)
+{
+	*hpte_v |= HPTE_V_ABSENT;
+	if (mmio)
+		*hpte_r |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+}
+
+static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
+{
+	/*
+	 * We will never call this for MMIO
+	 */
+	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+}
+
+static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
+					unsigned long *hpte_r)
+{
+	*hpte_v |= HPTE_V_VALID;
+	*hpte_v &= ~HPTE_V_ABSENT;
+}
+
+static inline bool kvmppc_is_host_mapped_hpte(struct kvm *kvm, __be64 *hpte)
+{
+	unsigned long v;
+
+	v = be64_to_cpu(hpte[0]);
+	if (v & HPTE_V_VALID)
+		return true;
+	return false;
+}
+
+
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
  * Note modification of an HPTE; set the HPTE modified bit
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 590e07b1a43f..8ce5e95613f8 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -752,7 +752,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) {
 		/* HPTE was previously valid, so we need to invalidate it */
 		unlock_rmap(rmap);
-		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+		/* Always mark HPTE_V_ABSENT before invalidating */
+		kvmppc_unmap_host_hpte(kvm, hptep);
 		kvmppc_invalidate_hpte(kvm, hptep, index);
 		/* don't lose previous R and C bits */
 		r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
@@ -897,11 +898,12 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		/* Now check and modify the HPTE */
 		ptel = rev[i].guest_rpte;
 		psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
-		if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
+		if (kvmppc_is_host_mapped_hpte(kvm, hptep) &&
 		    hpte_rpn(ptel, psize) == gfn) {
 			if (kvm->arch.using_mmu_notifiers)
-				hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+				kvmppc_unmap_host_hpte(kvm, hptep);
 			kvmppc_invalidate_hpte(kvm, hptep, i);
+
 			/* Harvest R and C */
 			rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
 			*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
@@ -990,7 +992,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		}
 
 		/* Now check and modify the HPTE */
-		if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
+		if (kvmppc_is_host_mapped_hpte(kvm, hptep) &&
 		    (be64_to_cpu(hptep[1]) & HPTE_R_R)) {
 			kvmppc_clear_ref_hpte(kvm, hptep, i);
 			if (!(rev[i].guest_rpte & HPTE_R_R)) {
@@ -1121,11 +1123,12 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		}
 
 		/* Now check and modify the HPTE */
-		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID)))
+		if (!kvmppc_is_host_mapped_hpte(kvm, hptep))
 			continue;
-
-		/* need to make it temporarily absent so C is stable */
-		hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+		/*
+		 * need to make it temporarily absent so C is stable
+		 */
+		kvmppc_unmap_host_hpte(kvm, hptep);
 		kvmppc_invalidate_hpte(kvm, hptep, i);
 		v = be64_to_cpu(hptep[0]);
 		r = be64_to_cpu(hptep[1]);
@@ -1141,9 +1144,8 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 				npages_dirty = n;
 			eieio();
 		}
-		v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
-		v |= HPTE_V_VALID;
-		hptep[0] = cpu_to_be64(v);
+		kvmppc_map_host_hpte(kvm, &v, &r);
+		hptep[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 1884bff3122c..e8458c0d1336 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -177,6 +177,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	unsigned int writing;
 	unsigned long mmu_seq;
 	unsigned long rcbits;
+	unsigned int host_unmapped_hpte = 0;
 
 	psize = hpte_page_size(pteh, ptel);
 	if (!psize)
@@ -199,9 +200,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		/* PPC970 can't do emulated MMIO */
 		if (!cpu_has_feature(CPU_FTR_ARCH_206))
 			return H_PARAMETER;
-		/* Emulated MMIO - mark this with key1 */
-		pteh |= HPTE_V_ABSENT;
-		ptel |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+		/*
+		 * Mark the hpte as host unmapped
+		 */
+		host_unmapped_hpte = 2;
 		goto do_insert;
 	}
 
@@ -241,7 +243,8 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 			pa = pte_pfn(pte) << PAGE_SHIFT;
 			pa |= hva & (pte_size - 1);
 			pa |= gpa & ~PAGE_MASK;
-		}
+		} else
+			host_unmapped_hpte = 1;
 	}
 
 	if (pte_size < psize)
@@ -252,8 +255,6 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
 	if (pa)
 		pteh |= HPTE_V_VALID;
-	else
-		pteh |= HPTE_V_ABSENT;
 
 	/* Check WIMG */
 	if (is_io != ~0ul && !hpte_cache_flags_ok(ptel, is_io)) {
@@ -330,16 +331,17 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	}
 
 	/* Link HPTE into reverse-map chain */
-	if (pteh & HPTE_V_VALID) {
+	if (!host_unmapped_hpte) {
 		if (realmode)
 			rmap = real_vmalloc_addr(rmap);
 		lock_rmap(rmap);
 		/* Check for pending invalidations under the rmap chain lock */
 		if (kvm->arch.using_mmu_notifiers &&
 		    mmu_notifier_retry(kvm, mmu_seq)) {
-			/* inval in progress, write a non-present HPTE */
-			pteh |= HPTE_V_ABSENT;
-			pteh &= ~HPTE_V_VALID;
+			/*
+			 * inval in progress in host, write host unmapped pte.
+			 */
+			host_unmapped_hpte = 1;
 			unlock_rmap(rmap);
 		} else {
 			kvmppc_add_revmap_chain(kvm, rev, rmap, pte_index,
@@ -350,8 +352,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		}
 	}
 
+	if (host_unmapped_hpte)
+		__kvmppc_unmap_host_hpte(kvm, &pteh, &ptel,
+					 (host_unmapped_hpte == 2));
 	hpte[1] = cpu_to_be64(ptel);
-
 	/* Write the first HPTE dword, unlocking the HPTE and making it valid */
 	eieio();
 	hpte[0] = cpu_to_be64(pteh);
@@ -593,7 +597,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
 			note_hpte_modification(kvm, rev);
 
-			if (!(hp0 & HPTE_V_VALID)) {
+			if (!kvmppc_is_host_mapped_hpte(kvm, hp)) {
 				/* insert R and C bits from PTE */
 				rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
 				args[j] |= rcbits << (56 - 5);
@@ -678,7 +682,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	r = (be64_to_cpu(hpte[1]) & ~mask) | bits;
 
 	/* Update HPTE */
-	if (v & HPTE_V_VALID) {
+	if (kvmppc_is_host_mapped_hpte(kvm, hpte)) {
 		rb = compute_tlbie_rb(v, r, pte_index);
 		hpte[0] = cpu_to_be64(v & ~HPTE_V_VALID);
 		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 51+ messages in thread
From: Aneesh Kumar K.V @ 2014-06-29 11:29 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

As per the ISA, we first need to mark the hpte invalid (V=0) before we
update the hpte lower half bits. With the virtual page class key
protection mechanism we want to send any fault other than a key fault
directly to the guest, without searching the hash page table. But then
we can get a NO_HPTE fault while we are updating the hpte. To track
that, add a VM-specific atomic variable that we check in the fault path
so that such faults are always sent to the host.
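
To make the ordering easier to review, here is a rough sketch of the
sequence this patch aims at (illustrative only; hpte_update_example() is
made up, the other helpers are the existing real-mode ones):

static void hpte_update_example(struct kvm *kvm, __be64 *hptep,
				unsigned long pte_index,
				unsigned long new_v, unsigned long new_r)
{
	unsigned long rb;

	/* tell the fault path that an update is in flight */
	atomic_inc(&kvm->arch.hpte_update_in_progress);
	smp_wmb();

	/* per the ISA, make the entry invalid before the tlbie */
	hptep[0] &= ~cpu_to_be64(HPTE_V_VALID);
	rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
			      pte_index);
	do_tlbies(kvm, &rb, 1, 1, true);

	/* publish the new contents and make the entry valid again */
	hptep[1] = cpu_to_be64(new_r);
	eieio();
	hptep[0] = cpu_to_be64(new_v);
	asm volatile("ptesync" : : : "memory");

	/* faults can again be forwarded straight to the guest */
	atomic_dec(&kvm->arch.hpte_update_in_progress);
}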

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  1 +
 arch/powerpc/include/asm/kvm_host.h      |  1 +
 arch/powerpc/kernel/asm-offsets.c        |  1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 19 ++++++----
 arch/powerpc/kvm/book3s_hv.c             |  1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 40 +++++++++++++++++++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 60 +++++++++++++++++++++++++++++---
 7 files changed, 109 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index da00b1f05ea1..a6bf41865a66 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -416,6 +416,7 @@ static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
 	 * We will never call this for MMIO
 	 */
 	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+	atomic_dec(&kvm->arch.hpte_update_in_progress);
 }
 
 static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f9ae69682ce1..0a9ff60fae4c 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -254,6 +254,7 @@ struct kvm_arch {
 	atomic_t hpte_mod_interest;
 	spinlock_t slot_phys_lock;
 	cpumask_t need_tlb_flush;
+	atomic_t hpte_update_in_progress;
 	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
 	int hpt_cma_alloc;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index f5995a912213..54a36110f8f2 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -496,6 +496,7 @@ int main(void)
 	DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
 	DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
 	DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
+	DEFINE(KVM_HPTE_UPDATE, offsetof(struct kvm, arch.hpte_update_in_progress));
 	DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
 	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
 	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 8ce5e95613f8..cb7a616aacb1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -592,6 +592,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	unsigned int writing, write_ok;
 	struct vm_area_struct *vma;
 	unsigned long rcbits;
+	bool hpte_invalidated = false;
 
 	/*
 	 * Real-mode code has already searched the HPT and found the
@@ -750,13 +751,15 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	r &= rcbits | ~(HPTE_R_R | HPTE_R_C);
 
 	if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) {
-		/* HPTE was previously valid, so we need to invalidate it */
+		/*
+		 * If we had mapped this hpte before, we now need to
+		 * invalidate that.
+		 */
 		unlock_rmap(rmap);
-		/* Always mark HPTE_V_ABSENT before invalidating */
-		kvmppc_unmap_host_hpte(kvm, hptep);
 		kvmppc_invalidate_hpte(kvm, hptep, index);
 		/* don't lose previous R and C bits */
 		r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
+		hpte_invalidated = true;
 	} else {
 		kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
 	}
@@ -765,6 +768,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	eieio();
 	hptep[0] = cpu_to_be64(hpte[0]);
 	asm volatile("ptesync" : : : "memory");
+	if (hpte_invalidated)
+		atomic_dec(&kvm->arch.hpte_update_in_progress);
+
 	preempt_enable();
 	if (page && hpte_is_writable(r))
 		SetPageDirty(page);
@@ -1128,10 +1134,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		/*
 		 * need to make it temporarily absent so C is stable
 		 */
-		kvmppc_unmap_host_hpte(kvm, hptep);
-		kvmppc_invalidate_hpte(kvm, hptep, i);
 		v = be64_to_cpu(hptep[0]);
 		r = be64_to_cpu(hptep[1]);
+		kvmppc_invalidate_hpte(kvm, hptep, i);
 		if (r & HPTE_R_C) {
 			hptep[1] = cpu_to_be64(r & ~HPTE_R_C);
 			if (!(rev[i].guest_rpte & HPTE_R_C)) {
@@ -1144,8 +1149,8 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 				npages_dirty = n;
 			eieio();
 		}
-		kvmppc_map_host_hpte(kvm, &v, &r);
-		hptep[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
+		hptep[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
+		atomic_dec(&kvm->arch.hpte_update_in_progress);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 47d3f20ad10f..328416f28a55 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2291,6 +2291,7 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 
 	kvm->arch.using_mmu_notifiers = !!cpu_has_feature(CPU_FTR_ARCH_206);
 	spin_lock_init(&kvm->arch.slot_phys_lock);
+	atomic_set(&kvm->arch.hpte_update_in_progress, 0);
 
 	/*
 	 * Track that we now have a HV mode VM active. This blocks secondary
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index e8458c0d1336..d628d2810c93 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -647,6 +647,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	struct revmap_entry *rev;
 	unsigned long v, r, rb, mask, bits;
 	u64 pte;
+	bool hpte_invalidated = false;
 
 	if (pte_index >= kvm->arch.hpt_npte)
 		return H_PARAMETER;
@@ -683,8 +684,26 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 
 	/* Update HPTE */
 	if (kvmppc_is_host_mapped_hpte(kvm, hpte)) {
-		rb = compute_tlbie_rb(v, r, pte_index);
+		/*
+		 * We use this in the fault path to check whether a
+		 * NO_HPTE fault should be send to guest. In this
+		 * case we don't want to send it to guest.
+		 */
+		atomic_inc(&kvm->arch.hpte_update_in_progress);
+		/*
+		 * We want to ensure that other cpus see the hpte_update_in_progress
+		 * change before taking a page fault due to us clearing the valid
+		 * bit below
+		 */
+		smp_wmb();
+		hpte_invalidated = true;
+		/*
+		 * We need to mark V = 0 before doing a tlb invalidate.
+		 * Refer Page table entry modifying section in ISA
+		 */
 		hpte[0] = cpu_to_be64(v & ~HPTE_V_VALID);
+
+		rb = compute_tlbie_rb(v, r, pte_index);
 		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
 		/*
 		 * If the host has this page as readonly but the guest
@@ -714,6 +733,8 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	eieio();
 	hpte[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
 	asm volatile("ptesync" : : : "memory");
+	if (hpte_invalidated)
+		atomic_dec(&kvm->arch.hpte_update_in_progress);
 	return H_SUCCESS;
 }
 
@@ -755,7 +776,22 @@ void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
 			unsigned long pte_index)
 {
 	unsigned long rb;
-
+	/*
+	 * We use this in the fault path to check whether a
+	 * NO_HPTE fault should be send to guest. In this
+	 * case we don't want to send it to guest.
+	 */
+	atomic_inc(&kvm->arch.hpte_update_in_progress);
+	/*
+	 * We want to ensure that other cpus see the hpte_update_in_progress
+	 * change before taking a page fault due to us clearing the valid
+	 * bit below
+	 */
+	smp_wmb();
+	/*
+	 * We need to mark V = 0 before doing a tlb invalidate.
+	 * Refer Page table entry modifying section in ISA
+	 */
 	hptep[0] &= ~cpu_to_be64(HPTE_V_VALID);
 	rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
 			      pte_index);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 1ff3ebdd2ab0..0b425da9f8db 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1790,11 +1790,33 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 kvmppc_hdsi:
 	mfspr	r4, SPRN_HDAR
 	mfspr	r6, SPRN_HDSISR
+	/*
+	 * first check whether we have an hpte update in progress.
+	 * If so, we go to host directly
+	 */
+	ld	r0, VCPU_KVM(r9)
+	lwz	r0, KVM_HPTE_UPDATE(r0)
+	/*
+	 * If update is in progress go to host directly
+	 */
+	cmpwi	r0, 0
+	bne	5f
+
+	andi.	r0, r11, MSR_DR		/* data relocation enabled? */
+	beq	3f
+
 	/* HPTE not found fault or protection fault? */
 	andis.	r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
 	beq	1f			/* if not, send it to the guest */
-	andi.	r0, r11, MSR_DR		/* data relocation enabled? */
-	beq	3f
+	b	8f
+5:
+	/*
+	 * real mode access to be sent to host. because
+	 * of an hpte update in progress
+	 */
+	andi.	r0, r11, MSR_DR
+	beq	7f
+8:
 	clrrdi	r0, r4, 28
 	PPC_SLBFEE_DOT(R5, R0)		/* if so, look up SLB */
 	bne	1f			/* if no SLB entry found */
@@ -1833,7 +1855,14 @@ fast_interrupt_c_return:
 	mr	r4, r9
 	b	fast_guest_return
 
-3:	ld	r5, VCPU_KVM(r9)	/* not relocated, use VRMA */
+3:	/*
+	 * Check whether we can send the fault directly to
+	 * guest.
+	 */
+	andis.	r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
+	beq	1b
+7:
+	ld	r5, VCPU_KVM(r9)	/* not relocated, use VRMA */
 	ld	r5, KVM_VRMA_SLB_V(r5)
 	b	4b
 
@@ -1865,10 +1894,27 @@ fast_interrupt_c_return:
  * it is an HPTE not found fault for a page that we have paged out.
  */
 kvmppc_hisi:
+	/* first check whether we have an hpte update in progress.
+	 * If so, we go to host directly
+	 */
+	ld	r0, VCPU_KVM(r9)	/* not relocated, use VRMA */
+	lwz	r0, KVM_HPTE_UPDATE(r0)
+	/*
+	 * If update is in progress go to host directly
+	 */
+	cmpwi	r0, 0
+	bne	5f
+
+	andi.	r0, r11, MSR_IR		/* instruction relocation enabled? */
+	beq	3f
+
 	andis.	r0, r11, SRR1_ISI_NOPT@h
 	beq	1f
+	b	8f
+5:
 	andi.	r0, r11, MSR_IR		/* instruction relocation enabled? */
-	beq	3f
+	beq	7f
+8:
 	clrrdi	r0, r10, 28
 	PPC_SLBFEE_DOT(R5, R0)		/* if so, look up SLB */
 	bne	1f			/* if no SLB entry found */
@@ -1896,7 +1942,11 @@ kvmppc_hisi:
 	bl	kvmppc_msr_interrupt
 	b	fast_interrupt_c_return
 
-3:	ld	r6, VCPU_KVM(r9)	/* not relocated, use VRMA */
+3:	/* Check whether we should send this to guest */
+	andis.	r0, r11, SRR1_ISI_NOPT@h
+	beq	1b
+7:
+	ld	r6, VCPU_KVM(r9)	/* not relocated, use VRMA */
 	ld	r5, KVM_VRMA_SLB_V(r6)
 	b	4b
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 6/6] KVM: PPC: BOOK3S: HV: Use virtual page class protection mechanism for host fault and mmio
@ 2014-06-29 11:17   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 51+ messages in thread
From: Aneesh Kumar K.V @ 2014-06-29 11:29 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm-ppc, kvm, Aneesh Kumar K.V

With this patch we use AMR classes 30 and 31 to indicate a page fault
that should be handled by the host. This includes MMIO accesses and the
page faults resulting from guest RAM being swapped out by the host. This
enables us to forward the fault to the guest without doing the expensive
hash page table search to find the hpte entry. With this patch, we
always mark hash ptes valid and use class indices 30 and 31 for
key-based faults. These virtual class indices are configured in the AMR
to deny read/write. Since the access class protection mechanism doesn't
work with the VRMA region, we need to handle VRMA hptes separately. We
mark those HPTEs invalid and use a software-defined bit, HPTE_V_VRMA, to
differentiate them.

NOTE: We still need to handle protection faults in the host so that a
write to a KSM shared page is handled in the host.
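
For reference, the two new key values decode as below, using the
standard HPTE_R_KEY_HI/HPTE_R_KEY_LO bit positions (a sketch for review
only; the helper name is made up and is not part of the patch):

/*
 * Assemble the 5-bit storage key from the second HPTE doubleword.
 * HPTE_R_HOST_UNMAP_KEY (0x3000000000000e00) gives key 31, used for a
 * guest page the host has swapped out; HPTE_R_MMIO_UNMAP_KEY
 * (0x3000000000000c00) gives key 30, used for emulated MMIO. With the
 * low AMR bits set (AMR |= 0xf) both classes have read and write
 * denied, so any access through such an hpte raises a key fault
 * instead of a NO_HPTE fault.
 */
static inline unsigned int hpte_r_to_skey(unsigned long hpte_r)
{
	return ((hpte_r & HPTE_R_KEY_HI) >> 57) |
	       ((hpte_r & HPTE_R_KEY_LO) >> 9);
}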

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 80 +++++++++++++++++++++++++++-----
 arch/powerpc/include/asm/reg.h           |  1 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 48 ++++++++++++++-----
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 69 ++++++++++++++++++---------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 52 ++++++++++++++++-----
 5 files changed, 194 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index a6bf41865a66..4aa9c3601fe8 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -48,7 +48,18 @@ extern unsigned long kvm_rma_pages;
  * HPTEs.
  */
 #define HPTE_V_HVLOCK	0x40UL
-#define HPTE_V_ABSENT	0x20UL
+/*
+ * VRMA mapping
+ */
+#define HPTE_V_VRMA	0x20UL
+
+#define HPTE_R_HOST_UNMAP_KEY	0x3000000000000e00UL
+/*
+ * We use this to differentiate between an MMIO key fault and
+ * and a key fault resulting from host swapping out the page.
+ */
+#define HPTE_R_MMIO_UNMAP_KEY	0x3000000000000c00UL
+
 
 /*
  * We use this bit in the guest_rpte field of the revmap entry
@@ -405,35 +416,82 @@ static inline void __kvmppc_unmap_host_hpte(struct kvm *kvm,
 					    unsigned long *hpte_r,
 					    bool mmio)
 {
-	*hpte_v |= HPTE_V_ABSENT;
-	if (mmio)
-		*hpte_r |= HPTE_R_KEY_HI | HPTE_R_KEY_LO;
+	/*
+	 * We unmap on host by adding the page to AMR class 31
+	 * which have both read/write access denied.
+	 *
+	 * For VRMA area we mark them invalid.
+	 *
+	 * If we are not using mmu_notifiers we don't use Access
+	 * class protection.
+	 *
+	 * Since we are not changing the hpt directly we don't
+	 * Worry about update ordering.
+	 */
+	if ((*hpte_v & HPTE_V_VRMA) || !kvm->arch.using_mmu_notifiers)
+		*hpte_v &= ~HPTE_V_VALID;
+	else if (!mmio) {
+		*hpte_r |= HPTE_R_HOST_UNMAP_KEY;
+		*hpte_v |= HPTE_V_VALID;
+	} else {
+		*hpte_r |= HPTE_R_MMIO_UNMAP_KEY;
+		*hpte_v |= HPTE_V_VALID;
+	}
 }
 
 static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
 {
+	unsigned long pte_v, pte_r;
+
+	pte_v = be64_to_cpu(hptep[0]);
+	pte_r = be64_to_cpu(hptep[1]);
 	/*
 	 * We will never call this for MMIO
 	 */
-	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+	__kvmppc_unmap_host_hpte(kvm, &pte_v, &pte_r, 0);
+	hptep[1] = cpu_to_be64(pte_r);
+	eieio();
+	hptep[0] = cpu_to_be64(pte_v);
+	asm volatile("ptesync" : : : "memory");
+	/*
+	 * we have now successfully marked the hpte using key bits
+	 */
 	atomic_dec(&kvm->arch.hpte_update_in_progress);
 }
 
 static inline void kvmppc_map_host_hpte(struct kvm *kvm, unsigned long *hpte_v,
 					unsigned long *hpte_r)
 {
-	*hpte_v |= HPTE_V_VALID;
-	*hpte_v &= ~HPTE_V_ABSENT;
+	/*
+	 * We will never try to map an MMIO region
+	 */
+	if ((*hpte_v & HPTE_V_VRMA) || !kvm->arch.using_mmu_notifiers)
+		*hpte_v |= HPTE_V_VALID;
+	else {
+		/*
+		 * When we allow guest keys we should set this with key
+		 * for this page.
+		 */
+		*hpte_r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
+	}
 }
 
 static inline bool kvmppc_is_host_mapped_hpte(struct kvm *kvm, __be64 *hpte)
 {
-	unsigned long v;
+	unsigned long v, r;
 
 	v = be64_to_cpu(hpte[0]);
-	if (v & HPTE_V_VALID)
-		return true;
-	return false;
+	if ((v & HPTE_V_VRMA) || !kvm->arch.using_mmu_notifiers)
+		return v & HPTE_V_VALID;
+
+	r = be64_to_cpu(hpte[1]);
+	if (!(v & HPTE_V_VALID))
+		return false;
+	if ((r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) == HPTE_R_HOST_UNMAP_KEY)
+		return false;
+	if ((r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) == HPTE_R_MMIO_UNMAP_KEY)
+		return false;
+	return true;
 }
 
 
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1f34ef7ec4a8..ca9a7aebc9ce 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -605,6 +605,7 @@
 #define   SRR1_ISI_NOPT		0x40000000 /* ISI: Not found in hash */
 #define   SRR1_ISI_N_OR_G	0x10000000 /* ISI: Access is no-exec or G */
 #define   SRR1_ISI_PROT		0x08000000 /* ISI: Other protection fault */
+#define   SRR1_ISI_KEYFAULT	0x00200000	/* Key fault */
 #define   SRR1_WAKEMASK		0x00380000 /* reason for wakeup */
 #define   SRR1_WAKESYSERR	0x00300000 /* System error */
 #define   SRR1_WAKEEE		0x00200000 /* External interrupt */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index cb7a616aacb1..06f860d84d69 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -214,6 +214,11 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
 		hash = (hash << 3) + 7;
 		hp_v = hp0 | ((addr >> 16) & ~0x7fUL);
 		hp_r = hp1 | addr;
+		/*
+		 * VRMA mapping doesn't work with access class protection
+		 * mechanism, hence don't use keys for them
+		 */
+		hp_v |= HPTE_V_VRMA;
 		ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, hash, hp_v, hp_r,
 						 &idx_ret);
 		if (ret != H_SUCCESS) {
@@ -409,7 +414,7 @@ long kvmppc_virtmode_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 	/*
 	 * Clear few bits, when called via hcall
 	 */
-	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_VRMA | HPTE_V_VALID);
 	ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
 
 	return kvmppc_virtmode_do_h_enter(vcpu->kvm, flags, pte_index,
@@ -472,7 +477,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	preempt_disable();
 	/* Find the HPTE in the hash table */
 	index = kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
-					 HPTE_V_VALID | HPTE_V_ABSENT);
+					 HPTE_V_VALID | HPTE_V_VRMA);
 	if (index < 0) {
 		preempt_enable();
 		return -ENOENT;
@@ -733,7 +738,13 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		rev->guest_rpte != hpte[2])
 		/* HPTE has been changed under us; let the guest retry */
 		goto out_unlock;
-	hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
+	/*
+	 * mark this hpte host mapped. We will use this value
+	 * to update actual hpte later. We don't need to clear
+	 * clear key bits, because we use rev->guest_rpte values
+	 * for the lower half.
+	 */
+	hpte[0] |= HPTE_V_VALID;
 
 	/* Always put the HPTE in the rmap chain for the page base address */
 	rmap = &memslot->arch.rmap[gfn_base - memslot->base_gfn];
@@ -906,8 +917,9 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 		psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
 		if (kvmppc_is_host_mapped_hpte(kvm, hptep) &&
 		    hpte_rpn(ptel, psize) == gfn) {
-			if (kvm->arch.using_mmu_notifiers)
-				kvmppc_unmap_host_hpte(kvm, hptep);
+			/*
+			 * For hpte update always invalidate first
+			 */
 			kvmppc_invalidate_hpte(kvm, hptep, i);
 
 			/* Harvest R and C */
@@ -917,6 +929,11 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 				rev[i].guest_rpte = ptel | rcbits;
 				note_hpte_modification(kvm, &rev[i]);
 			}
+			/*
+			 * Mark the hpte umapped so that host can
+			 * handle the faults.
+			 */
+			kvmppc_unmap_host_hpte(kvm, hptep);
 		}
 		unlock_rmap(rmapp);
 		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
@@ -1345,7 +1362,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 		return 0;
 
 	valid = 0;
-	if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)) {
+	if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_VRMA)) {
 		valid = 1;
 		if ((flags & KVM_GET_HTAB_BOLTED_ONLY) &&
 		    !(be64_to_cpu(hptp[0]) & HPTE_V_BOLTED))
@@ -1362,7 +1379,11 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 			cpu_relax();
 		v = be64_to_cpu(hptp[0]);
 
-		/* re-evaluate valid and dirty from synchronized HPTE value */
+		/*
+		 * re-evaluate valid and dirty from synchronized HPTE value
+		 * We don't need to worry of host unmapped. We keep the valid
+		 * bit set even if we move the hpte to class 31.
+		 */
 		valid = !!(v & HPTE_V_VALID);
 		dirty = !!(revp->guest_rpte & HPTE_GR_MODIFIED);
 
@@ -1374,8 +1395,11 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 			dirty = 1;
 		}
 
-		if (v & HPTE_V_ABSENT) {
-			v &= ~HPTE_V_ABSENT;
+		if (v & HPTE_V_VRMA) {
+			/*
+			 * unmapped vrma consider them mapped and also
+			 * retain the HPTE_V_VRMA bit.
+			 */
 			v |= HPTE_V_VALID;
 			valid = 1;
 		}
@@ -1559,14 +1583,14 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 			lbuf += 2;
 			nb += HPTE_SIZE;
 
-			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT))
+			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_VRMA))
 				kvmppc_do_h_remove(kvm, 0, i, 0, tmp);
 			err = -EIO;
 			/*
 			 * Clear few bits we got via read_htab which we
 			 * don't need to carry forward.
 			 */
-			v &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+			v &= ~(HPTE_V_HVLOCK | HPTE_V_VALID);
 			r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
 
 			ret = kvmppc_virtmode_do_h_enter(kvm, H_EXACT, i, v, r,
@@ -1592,7 +1616,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 		}
 
 		for (j = 0; j < hdr.n_invalid; ++j) {
-			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT))
+			if (be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_VRMA))
 				kvmppc_do_h_remove(kvm, 0, i, 0, tmp);
 			++i;
 			hptp += 2;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index d628d2810c93..f907be908b28 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -252,7 +252,13 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
 	ptel &= ~(HPTE_R_PP0 - psize);
 	ptel |= pa;
-
+	/*
+	 * We mark the pte valid, if it is valid from guest point of view
+	 * For VRMA area we need to mark it still invalid, because access
+	 * class protection mechanism doesn't work with guest real mode
+	 * access.
+	 * Non vrma area is always valid, and vrma is valid only if pa is set.
+	 */
 	if (pa)
 		pteh |= HPTE_V_VALID;
 
@@ -278,7 +284,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		for (i = 0; i < 8; ++i) {
 			if ((be64_to_cpu(*hpte) & HPTE_V_VALID) == 0 &&
 			    try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID |
-					  HPTE_V_ABSENT))
+					  HPTE_V_VRMA))
 				break;
 			hpte += 2;
 		}
@@ -295,7 +301,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 				while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 					cpu_relax();
 				pte = be64_to_cpu(*hpte);
-				if (!(pte & (HPTE_V_VALID | HPTE_V_ABSENT)))
+				if (!(pte & (HPTE_V_VALID | HPTE_V_VRMA)))
 					break;
 				*hpte &= ~cpu_to_be64(HPTE_V_HVLOCK);
 				hpte += 2;
@@ -307,14 +313,14 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 	} else {
 		hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
 		if (!try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID |
-				   HPTE_V_ABSENT)) {
+				   HPTE_V_VRMA)) {
 			/* Lock the slot and check again */
 			u64 pte;
 
 			while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 				cpu_relax();
 			pte = be64_to_cpu(*hpte);
-			if (pte & (HPTE_V_VALID | HPTE_V_ABSENT)) {
+			if (pte & (HPTE_V_VALID | HPTE_V_VRMA)) {
 				*hpte &= ~cpu_to_be64(HPTE_V_HVLOCK);
 				return H_PTEG_FULL;
 			}
@@ -372,7 +378,7 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
 	/*
 	 * Clear few bits. when called via hcall.
 	 */
-	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_ABSENT | HPTE_V_VALID);
+	pteh &= ~(HPTE_V_HVLOCK | HPTE_V_VRMA | HPTE_V_VALID);
 	ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO | HPTE_GR_RESERVED);
 
 	return kvmppc_do_h_enter(vcpu->kvm, flags, pte_index, pteh, ptel,
@@ -492,7 +498,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 		cpu_relax();
 	pte = be64_to_cpu(hpte[0]);
-	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
+	if ((pte & (HPTE_V_VRMA | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
 	    ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
 		hpte[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
@@ -503,9 +509,12 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	v = pte & ~HPTE_V_HVLOCK;
 	if (v & HPTE_V_VALID) {
 		u64 pte1;
-
 		pte1 = be64_to_cpu(hpte[1]);
-		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
+		/*
+		 * Remove the valid and VRMA bits
+		 */
+		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID | HPTE_V_VRMA);
+
 		rb = compute_tlbie_rb(v, pte1, pte_index);
 		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
 		/* Read PTE low word after tlbie to get final R/C values */
@@ -572,7 +581,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			}
 			found = 0;
 			hp0 = be64_to_cpu(hp[0]);
-			if (hp0 & (HPTE_V_ABSENT | HPTE_V_VALID)) {
+			if (hp0 & (HPTE_V_VRMA | HPTE_V_VALID)) {
 				switch (flags & 3) {
 				case 0:		/* absolute */
 					found = 1;
@@ -606,7 +615,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 			}
 
 			/* leave it locked */
-			hp[0] &= ~cpu_to_be64(HPTE_V_VALID);
+			hp[0] &= ~cpu_to_be64(HPTE_V_VALID | HPTE_V_VRMA);
 			tlbrb[n] = compute_tlbie_rb(be64_to_cpu(hp[0]),
 				be64_to_cpu(hp[1]), pte_index);
 			indexes[n] = j;
@@ -656,7 +665,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 		cpu_relax();
 	pte = be64_to_cpu(hpte[0]);
-	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
+	if ((pte & (HPTE_V_VRMA | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
 		hpte[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
 		return H_NOT_FOUND;
@@ -758,10 +767,17 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 		hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
 		v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
 		r = be64_to_cpu(hpte[1]);
-		if (v & HPTE_V_ABSENT) {
-			v &= ~HPTE_V_ABSENT;
+		if (v & HPTE_V_VRMA) {
+			/*
+			 * don't share vrma bits back to guest
+			 */
+			v &= ~HPTE_V_VRMA;
 			v |= HPTE_V_VALID;
 		}
+		/*
+		 * Clear the AMR class bits
+		 */
+		r &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
 		if (v & HPTE_V_VALID) {
 			r = rev[i].guest_rpte | (r & (HPTE_R_R | HPTE_R_C));
 			r &= ~HPTE_GR_RESERVED;
@@ -871,8 +887,13 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 			/* Read the PTE racily */
 			v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
 
-			/* Check valid/absent, hash, segment size and AVPN */
-			if (!(v & valid) || (v & mask) != val)
+			/*
+			 * Check hash, segment size and AVPN.
+			 * We can't check for valid here without lock. We do
+			 * mark the hpte not valid while an hpte update is
+			 * in progress.
+			 */
+			if ((v & mask) != val)
 				continue;
 
 			/* Lock the PTE and read it under the lock */
@@ -927,7 +948,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	/* For protection fault, expect to find a valid HPTE */
 	valid = HPTE_V_VALID;
 	if (status & DSISR_NOHPTE)
-		valid |= HPTE_V_ABSENT;
+		valid |= HPTE_V_VRMA;
 
 	index = kvmppc_hv_find_lock_hpte(kvm, addr, slb_v, valid);
 	if (index < 0) {
@@ -942,10 +963,17 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 	gr = rev->guest_rpte;
 
 	unlock_hpte(hpte, v);
-
-	/* For not found, if the HPTE is valid by now, retry the instruction */
+	/*
+	 * For not found, if the HPTE is valid by now, retry the instruction
+	 */
 	if ((status & DSISR_NOHPTE) && (v & HPTE_V_VALID))
 		return 0;
+	/*
+	 * For key fault if HPTE is host mapped by now retry the instruction
+	 */
+	if ((status & DSISR_KEYFAULT) &&
+	    kvmppc_is_host_mapped_hpte(kvm, hpte))
+		return 0;
 
 	/* Check access permissions to the page */
 	pp = gr & (HPTE_R_PP0 | HPTE_R_PP);
@@ -973,8 +1001,7 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
 
 	/* Check the storage key to see if it is possibly emulated MMIO */
 	if (data && (vcpu->arch.shregs.msr & MSR_IR) &&
-	    (r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) ==
-	    (HPTE_R_KEY_HI | HPTE_R_KEY_LO))
+	    ((r & HPTE_R_MMIO_UNMAP_KEY) == HPTE_R_MMIO_UNMAP_KEY))
 		return -2;	/* MMIO emulation - load instr word */
 
 	return -1;		/* send fault up to host kernel mode */
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 0b425da9f8db..f1af0d24e2b5 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -834,7 +834,11 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	mtmsrd	r8
 
 	/* Load up POWER8-specific registers */
+	/*
+	 * Always disable read/write w.r.t to class index 31
+	 */
 	ld	r5, VCPU_IAMR(r4)
+	ori	r5, r5, 0x3
 	lwz	r6, VCPU_PSPB(r4)
 	ld	r7, VCPU_FSCR(r4)
 	mtspr	SPRN_IAMR, r5
@@ -901,10 +905,25 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_DSISR, r6
 
 BEGIN_FTR_SECTION
-	/* Restore AMR and UAMOR, set AMOR to all 1s */
-	ld	r5,VCPU_AMR(r4)
+	/* Restore AMR and UAMOR */
+	/*
+	 * Always disable read/write wr.t. to class index
+	 * 30 and 31
+	 */
+	ld	r5, VCPU_AMR(r4)
+	ori	r5, r5, 0xf
+	/*
+	 * UAMOR set so that mask bits for class index 30
+	 * and 31 cannot be updated
+	 */
+
 	ld	r6,VCPU_UAMOR(r4)
-	li	r7,-1
+	rldicr  r6, r6, 0, 59
+	/*
+	 * AMOR set so that mask bits for class index 30
+	 * and 31 cannot be updated
+	 */
+	li	r7, ~0xf
 	mtspr	SPRN_AMR,r5
 	mtspr	SPRN_UAMOR,r6
 	mtspr	SPRN_AMOR,r7
@@ -1801,13 +1820,22 @@ kvmppc_hdsi:
 	 */
 	cmpwi	r0, 0
 	bne	5f
-
-	andi.	r0, r11, MSR_DR		/* data relocation enabled? */
+	/*
+	 * If data relocation is disabled, virtual page class
+	 * key protection mechanism does not apply.
+	 */
+	andi.	r0, r11, MSR_DR
 	beq	3f
-
-	/* HPTE not found fault or protection fault? */
-	andis.	r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
-	beq	1f			/* if not, send it to the guest */
+	/*
+	 * If access is not permitted by virtual page class
+	 * key protection, handle it in the host. If not
+	 * send it to the guest.
+	 */
+	andis.	r0, r6, (DSISR_KEYFAULT | DSISR_PROTFAULT)@h
+	beq	1f
+	/*
+	 * skip the real mode check below
+	 */
 	b	8f
 5:
 	/*
@@ -1857,7 +1885,8 @@ fast_interrupt_c_return:
 
 3:	/*
 	 * Check whether we can send the fault directly to
-	 * guest.
+	 * guest. We don't need to worry about keyfault if
+	 * the fault happens in real mode.
 	 */
 	andis.	r0, r6, (DSISR_NOHPTE | DSISR_PROTFAULT)@h
 	beq	1b
@@ -1907,8 +1936,7 @@ kvmppc_hisi:
 
 	andi.	r0, r11, MSR_IR		/* instruction relocation enabled? */
 	beq	3f
-
-	andis.	r0, r11, SRR1_ISI_NOPT@h
+	andis.	r0, r11, SRR1_ISI_KEYFAULT@h
 	beq	1f
 	b	8f
 5:
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 0/6] Use virtual page class key protection mechanism for speeding up guest page fault
  2014-06-29 11:26   ` Benjamin Herrenschmidt
  (?)
@ 2014-06-29 16:57     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 51+ messages in thread
From: Aneesh Kumar K.V @ 2014-06-29 16:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: agraf, paulus, linuxppc-dev, kvm-ppc, kvm

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Sun, 2014-06-29 at 16:47 +0530, Aneesh Kumar K.V wrote:
>
>> To achieve the above we use virtual page calss protection mechanism for
>> covering (2) and (3). For both the above case we mark the hpte
>> valid, but associate the page with virtual page class index 30 and 31.
>> The authority mask register is configured such that class index 30 and 31
>> will have read/write denied. The above change results in a key fault
>> for (2) and (3). This allows us to forward a NO_HPTE fault directly to guest
>> without doing the expensive hash pagetable lookup.
>
> So we have a measurable performance benefit (about half a second out of
> 8).

I was able to measure a performance benefit of 2 seconds earlier. But
once I applied the patch below, that got reduced. I am trying to find
out how that patch is helping the performance. Maybe it is avoiding some
unnecessary invalidation?

http://mid.gmane.org/1403876103-32459-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com

I also believe the benefit depends on how much impact a hash table
lookup has. I did try to access the addresses linearly so that I could
make sure we do take a cache miss on the hash page table access.

>but you didn't explain the drawback here which is to essentially make
> it impossible for guests to exploit virtual page class keys, or did you
> find a way to still make that possible ?

I am now making PROTFAULT go to the host. That means a KSM-shared page
is represented as a read-only page, and an attempt to write to it will
get to the host via PROTFAULT. With that we can implement keys for the
guest if we want to. So irrespective of what restrictions the guest has
put in place, if the host swaps out the page, we will deny read/write.
And if the key fault needs to go to the guest, we will find that out by
looking at the key index.

>
> As it-is, it's not a huge issue for Linux but we might have to care with
> other OSes that do care...
>
> Do we have a way in PAPR to signify to the guest that the keys are not
> available ?

Right now Qemu doesn't provide the device tree node
ibm,processor-storage-keys. That means the guest cannot use keys, so we
are good there. If we want to support guest keys, we need to fill that
node with values that indicate how many keys can be used for data and
for instructions.

-aneesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page
  2014-06-29 11:17   ` Aneesh Kumar K.V
  (?)
@ 2014-07-02  4:00     ` Paul Mackerras
  -1 siblings, 0 replies; 51+ messages in thread
From: Paul Mackerras @ 2014-07-02  4:00 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: agraf, benh, linuxppc-dev, kvm-ppc, kvm

On Sun, Jun 29, 2014 at 04:47:30PM +0530, Aneesh Kumar K.V wrote:
> When calculating the lower bits of AVA field, use the shift
> count based on the base page size. Also add the missing segment
> size and remove stale comment.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Acked-by: Paul Mackerras <paulus@samba.org>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/6] KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host
  2014-06-29 11:17   ` Aneesh Kumar K.V
  (?)
@ 2014-07-02  4:28     ` Paul Mackerras
  -1 siblings, 0 replies; 51+ messages in thread
From: Paul Mackerras @ 2014-07-02  4:28 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: agraf, benh, linuxppc-dev, kvm-ppc, kvm

On Sun, Jun 29, 2014 at 04:47:33PM +0530, Aneesh Kumar K.V wrote:
> We want to use virtual page class key protection mechanism for
> indicating a MMIO mapped hpte entry or a guest hpte entry that is swapped out
> in the host. Those hptes will be marked valid, but have virtual page
> class key set to 30 or 31. These virtual page class numbers are
> configured in AMR to deny read/write. To accomodate such a change, add
> new functions that map, unmap and check whether a hpte is mapped in the
> host. This patch still use HPTE_V_VALID and HPTE_V_ABSENT and don't use
> virtual page class keys. But we want to differentiate in the code
> where we explicitly check for HPTE_V_VALID with places where we want to
> check whether the hpte is host mapped. This patch enables a closer
> review for such a change.

[...]

>  		/* Check for pending invalidations under the rmap chain lock */
>  		if (kvm->arch.using_mmu_notifiers &&
>  		    mmu_notifier_retry(kvm, mmu_seq)) {
> -			/* inval in progress, write a non-present HPTE */
> -			pteh |= HPTE_V_ABSENT;
> -			pteh &= ~HPTE_V_VALID;
> +			/*
> +			 * inval in progress in host, write host unmapped pte.
> +			 */
> +			host_unmapped_hpte = 1;

This isn't right.  We already have HPTE_V_VALID set here, and you now
don't clear it here, and it doesn't get cleared by the
__kvmppc_unmap_host_hpte() call below either.

Paul.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/6] KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
  2014-06-29 11:17   ` Aneesh Kumar K.V
  (?)
@ 2014-07-02  4:50     ` Paul Mackerras
  -1 siblings, 0 replies; 51+ messages in thread
From: Paul Mackerras @ 2014-07-02  4:50 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: agraf, benh, linuxppc-dev, kvm-ppc, kvm

On Sun, Jun 29, 2014 at 04:47:31PM +0530, Aneesh Kumar K.V wrote:
> This makes it consistent with h_enter where we clear the key
> bits. We also want to use virtual page class key protection mechanism
> for indicating host page fault. For that we will be using key class
> index 30 and 31. So prevent the guest from updating key bits until
> we add proper support for virtual page class protection mechanism for
> the guest. This will not have any impact for PAPR linux guest because
> Linux guest currently don't use virtual page class key protection model

As things stand, without this patch series, we do actually have
everything we need in the kernel for guests to use virtual page class
keys.  Arguably we should have a capability to tell userspace how many
storage keys the guest can use, but that's the only missing piece as
far as I can see.

If we add such a capability, I can't see any reason why we should need
to disable guest use of storage keys in this patchset.
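
(For illustration, userspace would then simply query it with
KVM_CHECK_EXTENSION; the capability name and number below are
hypothetical, not existing ABI.)

#include <sys/ioctl.h>
#include <linux/kvm.h>

#define KVM_CAP_PPC_STORAGE_KEYS	0x1000	/* placeholder number */

static int num_guest_storage_keys(int kvm_fd)
{
	/*
	 * KVM_CHECK_EXTENSION returns 0 when a capability is not
	 * supported; a positive return here could carry the number of
	 * storage keys the guest may use.
	 */
	return ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_STORAGE_KEYS);
}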

Paul.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update
  2014-06-29 11:17   ` [PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update Aneesh Kumar K.V
  (?)
@ 2014-07-02  5:41     ` Paul Mackerras
  -1 siblings, 0 replies; 51+ messages in thread
From: Paul Mackerras @ 2014-07-02  5:41 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: agraf, benh, linuxppc-dev, kvm-ppc, kvm

On Sun, Jun 29, 2014 at 04:47:34PM +0530, Aneesh Kumar K.V wrote:
> As per ISA, we first need to mark hpte invalid (V=0) before we update
> the hpte lower half bits. With virtual page class key protection mechanism we want
> to send any fault other than key fault to guest directly without
> searching the hash page table. But then we can get NO_HPTE fault while
> we are updating the hpte. To track that add a vm specific atomic
> variable that we check in the fault path to always send the fault
> to host.

[...]

> @@ -750,13 +751,15 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  	r &= rcbits | ~(HPTE_R_R | HPTE_R_C);
>  
>  	if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) {
> -		/* HPTE was previously valid, so we need to invalidate it */
> +		/*
> +		 * If we had mapped this hpte before, we now need to
> +		 * invalidate that.
> +		 */
>  		unlock_rmap(rmap);
> -		/* Always mark HPTE_V_ABSENT before invalidating */
> -		kvmppc_unmap_host_hpte(kvm, hptep);
>  		kvmppc_invalidate_hpte(kvm, hptep, index);
>  		/* don't lose previous R and C bits */
>  		r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
> +		hpte_invalidated = true;

So now we're not setting the ABSENT bit before invalidating the HPTE.
That means that another guest vcpu could do an H_ENTER which could
think that this HPTE is free and use it for another unrelated guest
HPTE, which would be bad...
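
(Roughly, the free-slot test in H_ENTER is of the shape below --
simplified sketch, not the exact code; the real path also folds
HPTE_V_HVLOCK in via try_lock_hpte().)

static inline bool hpte_slot_looks_free(__be64 *hptep)
{
	/*
	 * If neither VALID nor ABSENT is set, another vcpu can claim
	 * this slot for an unrelated guest HPTE -- which is the race
	 * described above.
	 */
	return !(be64_to_cpu(hptep[0]) & (HPTE_V_VALID | HPTE_V_ABSENT));
}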

> @@ -1144,8 +1149,8 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
>  				npages_dirty = n;
>  			eieio();
>  		}
> -		kvmppc_map_host_hpte(kvm, &v, &r);
> -		hptep[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
> +		hptep[0] = cpu_to_be64(v & ~HPTE_V_LOCK);
> +		atomic_dec(&kvm->arch.hpte_update_in_progress);

Why are we using LOCK rather than HVLOCK now?  (And why didn't you
mention this change and its rationale in the patch description?)

Paul.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 4/6] KVM: PPC: BOOK3S: HV: Use new functions for mapping/unmapping hpte in host
  2014-07-02  4:28     ` Paul Mackerras
  (?)
@ 2014-07-02 11:49       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 51+ messages in thread
From: Aneesh Kumar K.V @ 2014-07-02 11:49 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: agraf, benh, linuxppc-dev, kvm-ppc, kvm

Paul Mackerras <paulus@samba.org> writes:

> On Sun, Jun 29, 2014 at 04:47:33PM +0530, Aneesh Kumar K.V wrote:
>> We want to use virtual page class key protection mechanism for
>> indicating a MMIO mapped hpte entry or a guest hpte entry that is swapped out
>> in the host. Those hptes will be marked valid, but have virtual page
>> class key set to 30 or 31. These virtual page class numbers are
>> configured in AMR to deny read/write. To accomodate such a change, add
>> new functions that map, unmap and check whether a hpte is mapped in the
>> host. This patch still use HPTE_V_VALID and HPTE_V_ABSENT and don't use
>> virtual page class keys. But we want to differentiate in the code
>> where we explicitly check for HPTE_V_VALID with places where we want to
>> check whether the hpte is host mapped. This patch enables a closer
>> review for such a change.
>
> [...]
>
>>  		/* Check for pending invalidations under the rmap chain lock */
>>  		if (kvm->arch.using_mmu_notifiers &&
>>  		    mmu_notifier_retry(kvm, mmu_seq)) {
>> -			/* inval in progress, write a non-present HPTE */
>> -			pteh |= HPTE_V_ABSENT;
>> -			pteh &= ~HPTE_V_VALID;
>> +			/*
>> +			 * inval in progress in host, write host unmapped pte.
>> +			 */
>> +			host_unmapped_hpte = 1;
>
> This isn't right.  We already have HPTE_V_VALID set here, and you now
> don't clear it here, and it doesn't get cleared by the
> __kvmppc_unmap_host_hpte() call below either.


Ok, I missed that. I will fix it in the next update. In the earlier
version I had kvmppc_unmap_host_hpte() always clearing HPTE_V_VALID.
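
That earlier version was essentially the minimal sketch below
(illustration only; the final patch in the series switches this over to
the reserved virtual page class keys instead):

static inline void kvmppc_unmap_host_hpte(struct kvm *kvm, __be64 *hptep)
{
	/*
	 * Keep the entry findable via ABSENT but drop the hardware
	 * valid bit so an access faults to the host.
	 */
	hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
	hptep[0] &= ~cpu_to_be64(HPTE_V_VALID);
}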

-aneesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update
  2014-07-02  5:41     ` [PATCH 5/6] KVM: PPC: BOOK3S: Use hpte_update_in_progress to track invalid hpte during an hpte update Paul Mackerras
  (?)
@ 2014-07-02 11:57       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 51+ messages in thread
From: Aneesh Kumar K.V @ 2014-07-02 11:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: agraf, benh, linuxppc-dev, kvm-ppc, kvm

Paul Mackerras <paulus@samba.org> writes:

> On Sun, Jun 29, 2014 at 04:47:34PM +0530, Aneesh Kumar K.V wrote:
>> As per ISA, we first need to mark hpte invalid (V=0) before we update
>> the hpte lower half bits. With virtual page class key protection mechanism we want
>> to send any fault other than key fault to guest directly without
>> searching the hash page table. But then we can get NO_HPTE fault while
>> we are updating the hpte. To track that add a vm specific atomic
>> variable that we check in the fault path to always send the fault
>> to host.
>
> [...]
>
>> @@ -750,13 +751,15 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>  	r &= rcbits | ~(HPTE_R_R | HPTE_R_C);
>>  
>>  	if (be64_to_cpu(hptep[0]) & HPTE_V_VALID) {
>> -		/* HPTE was previously valid, so we need to invalidate it */
>> +		/*
>> +		 * If we had mapped this hpte before, we now need to
>> +		 * invalidate that.
>> +		 */
>>  		unlock_rmap(rmap);
>> -		/* Always mark HPTE_V_ABSENT before invalidating */
>> -		kvmppc_unmap_host_hpte(kvm, hptep);
>>  		kvmppc_invalidate_hpte(kvm, hptep, index);
>>  		/* don't lose previous R and C bits */
>>  		r |= be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
>> +		hpte_invalidated = true;
>
> So now we're not setting the ABSENT bit before invalidating the HPTE.
> That means that another guest vcpu could do an H_ENTER which could
> think that this HPTE is free and use it for another unrelated guest
> HPTE, which would be bad...

But H_ENTER looks at HPTE_V_HVLOCK, and we keep that set throughout. I
will double-check the code again to make sure it is safe in the above
scenario.
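
For reference, the sequence I have in mind is roughly the sketch below
(simplified: tlbie and the surrounding barriers are elided, new_r is
the updated low word, and HPTE_V_HVLOCK is assumed to be already set
in v):

atomic_inc(&kvm->arch.hpte_update_in_progress);

v = be64_to_cpu(hptep[0]);			/* HVLOCK already set in v */
hptep[0] = cpu_to_be64(v & ~HPTE_V_VALID);	/* per ISA: V=0 first */
eieio();
hptep[1] = cpu_to_be64(new_r);			/* update the low word */
eieio();
/* make it valid again and drop the lock in a single store */
hptep[0] = cpu_to_be64((v | HPTE_V_VALID) & ~HPTE_V_HVLOCK);

atomic_dec(&kvm->arch.hpte_update_in_progress);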

>
>> @@ -1144,8 +1149,8 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
>>  				npages_dirty = n;
>>  			eieio();
>>  		}
>> -		kvmppc_map_host_hpte(kvm, &v, &r);
>> -		hptep[0] = cpu_to_be64(v & ~HPTE_V_HVLOCK);
>> +		hptep[0] = cpu_to_be64(v & ~HPTE_V_LOCK);
>> +		atomic_dec(&kvm->arch.hpte_update_in_progress);
>
> Why are we using LOCK rather than HVLOCK now?  (And why didn't you
> mention this change and its rationale in the patch description?)

Sorry, that is a typo; I intended to use HPTE_V_HVLOCK.

-aneesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 2/6] KVM: PPC: BOOK3S: HV: Deny virtual page class key update via h_protect
  2014-07-02  4:50     ` Paul Mackerras
  (?)
@ 2014-07-02 12:12       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 51+ messages in thread
From: Aneesh Kumar K.V @ 2014-07-02 12:12 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: agraf, benh, linuxppc-dev, kvm-ppc, kvm

Paul Mackerras <paulus@samba.org> writes:

> On Sun, Jun 29, 2014 at 04:47:31PM +0530, Aneesh Kumar K.V wrote:
>> This makes it consistent with h_enter where we clear the key
>> bits. We also want to use virtual page class key protection mechanism
>> for indicating host page fault. For that we will be using key class
>> index 30 and 31. So prevent the guest from updating key bits until
>> we add proper support for virtual page class protection mechanism for
>> the guest. This will not have any impact for PAPR linux guest because
>> Linux guest currently don't use virtual page class key protection model
>
> As things stand, without this patch series, we do actually have
> everything we need in the kernel for guests to use virtual page class
> keys.  Arguably we should have a capability to tell userspace how many
> storage keys the guest can use, but that's the only missing piece as
> far as I can see.

yes.

>
> If we add such a capability, I can't see any reason why we should need
> to disable guest use of storage keys in this patchset.

With this patchset, we would need additional changes to find out
whether a key fault happened because of the guest's own use of keys. I
was planning to do that as an add-on series to keep the changes here
minimal. Also, since Linux guests don't use keys, I was not sure
whether guest support for keys is an important item.

-aneesh


^ permalink raw reply	[flat|nested] 51+ messages in thread
