* [PATCH 00/16] Remove hash page table slot tracking from linux PTE
@ 2017-10-27  4:08 Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 01/16] powerpc/mm/hash: Remove the superfluous bitwise operation when find hpte group Aneesh Kumar K.V
                   ` (16 more replies)
  0 siblings, 17 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

Hi,

With hash translation mode we have always tracked the hash pte slot details in the linux
page table. This occupies space in the linux page table and also limits our ability to
support linux features that require additional PTE bits. This series attempts to lift that
limitation by not tracking the slot number in the linux page table. We still track slot
details for Transparent Hugepage entries, because an invalidate there requires us to go
through all the 256 hash pte slots; tracking whether the hash page table entry is valid
helps us avoid a lot of hcalls in that path. With THP entries we don't keep the slot
details in the primary linux page table entry but in the second half of the page table,
so tracking slot details for THP doesn't take up space in the PTE.

Even though we no longer track the slot, the PAPR hcalls for removing/updating a hash page
table entry still expect the hash page table slot. On pseries we now find the slot with the
H_READ hcall using the H_READ_4 flag. This implies an additional two hcalls in the updatepp
and remove paths. The series also attempts to limit that impact by adding new hcalls that
remove/update a hash page table entry using the hash value instead of the slot.
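
For reference, the slot lookup on pseries boils down to scanning the primary (and, if
needed, the secondary) bucket with H_READ_4, four ptes per hcall. This is a condensed
sketch of the __pSeries_lpar_hpte_find() helper reworked in patch 3:

static long __pSeries_lpar_hpte_find(unsigned long hash, unsigned long want_v)
{
	long lpar_rc;
	unsigned long i, j, hpte_group;
	bool secondary_search = false;
	struct {
		unsigned long pteh;
		unsigned long ptel;
	} ptes[4];

	/* first check the primary group */
	hpte_group = (hash & htab_hash_mask) * HPTES_PER_GROUP;
search_again:
	for (i = 0; i < HPTES_PER_GROUP; i += 4, hpte_group += 4) {
		lpar_rc = plpar_pte_read_4(0, hpte_group, (void *)ptes);
		if (lpar_rc != H_SUCCESS)
			continue;
		for (j = 0; j < 4; j++) {
			if (HPTE_V_COMPARE(ptes[j].pteh, want_v) &&
			    (ptes[j].pteh & HPTE_V_VALID))
				return hpte_group + j;
		}
	}
	if (!secondary_search) {
		hpte_group = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
		secondary_search = true;
		goto search_again;
	}
	return -1;
}

Since each H_READ_4 reads four entries, scanning a full 8-entry group costs two hcalls,
which is where the extra overhead mentioned above comes from.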

Below are the performance numbers observed when running a workload that performs the following sequence:

for (5000 iterations) {
	mmap(128M)
	touch every one of the 2048 pages
	munmap()
}
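
A minimal, self-contained userspace sketch of that loop (assuming a 64K base page size,
so 128M gives the 2048 pages mentioned above; the harness actually used for the numbers
below is not part of this posting):

#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

#define MAP_SIZE (128UL << 20)	/* 128M */
#define PAGE_SZ  (64UL << 10)	/* 64K base pages -> 2048 pages per map */

int main(void)
{
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int iter = 0; iter < 5000; iter++) {
		char *p = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		/* touch every page so each one takes a hash fault */
		for (unsigned long off = 0; off < MAP_SIZE; off += PAGE_SZ)
			p[off] = 1;
		munmap(p, MAP_SIZE);
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	printf("%.6f\n", (end.tv_sec - start.tv_sec) +
			 (end.tv_nsec - start.tv_nsec) / 1e9);
	return 0;
}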

The test was run with address randomization off and swap disabled in both host and guest.


|------------+----------+---------------+--------------------------+-----------------------|
| iterations | platform | Without patch | With series and no hcall | With series and hcall |
|------------+----------+---------------+--------------------------+-----------------------|
|          1 | powernv  |               |                50.818343 |                       |
|          2 | powernv  |               |                50.744123 |                       |
|          3 | powernv  |               |                50.721603 |                       |
|          4 | powernv  |               |                50.739922 |                       |
|          5 | powernv  |               |                50.638555 |                       |
|          1 | powernv  |     51.388249 |                          |                       |
|          2 | powernv  |     51.789701 |                          |                       |
|          3 | powernv  |     52.240394 |                          |                       |
|          4 | powernv  |     51.432255 |                          |                       |
|          5 | powernv  |     51.392947 |                          |                       |
|------------+----------+---------------+--------------------------+-----------------------|
|          1 | pseries  |               |                          |            123.154394 |
|          2 | pseries  |               |                          |            122.253956 |
|          3 | pseries  |               |                          |            117.666344 |
|          4 | pseries  |               |                          |            117.681479 |
|          5 | pseries  |               |                          |            117.735808 |
|          1 | pseries  |               |               119.424940 |                       |
|          2 | pseries  |               |               117.663078 |                       |
|          3 | pseries  |               |               118.345584 |                       |
|          4 | pseries  |               |               119.620934 |                       |
|          5 | pseries  |               |               119.463185 |                       |
|          1 | pseries  |    122.810867 |                          |                       |
|          2 | pseries  |    115.760801 |                          |                       |
|          3 | pseries  |    115.257030 |                          |                       |
|          4 | pseries  |    116.617884 |                          |                       |
|          5 | pseries  |    117.247036 |                          |                       |
|------------+----------+---------------+--------------------------+-----------------------|

-aneesh

Aneesh Kumar K.V (16):
  powerpc/mm/hash: Remove the superfluous bitwise operation when find
    hpte group
  powerpc/mm: Update native_hpte_find to return hash pte
  powerpc/pseries: Update hpte find helper to take hash value
  powerpc/mm: Add hash invalidate callback
  powerpc/mm: use hash_invalidate for __kernel_map_pages()
  powerpc/mm: Switch flush_hash_range to not use slot
  powerpc/mm: Add hash updatepp callback
  powerpc/mm/hash: Don't track hash pte slot number in linux page table.
  powerpc/mm: Add new firmware feature HASH API
  powerpc/kvm/hash: Implement HASH_REMOVE hcall
  powerpc/kvm/hash: Implement HASH_PROTECT hcall
  powerpc/kvm/hash: Implement HASH_BULK_REMOVE hcall
  powerpc/mm/pseries: Use HASH_PROTECT hcall in guest
  powerpc/mm/pseries: Use HASH_REMOVE hcall in guest
  powerpc/mm/pseries: Move slot based bulk remove to helper
  powerpc/mm/pseries: Use HASH_BULK_REMOVE hcall in guest

 arch/powerpc/include/asm/book3s/64/hash-4k.h       |  16 +-
 arch/powerpc/include/asm/book3s/64/hash-64k.h      |  44 +--
 arch/powerpc/include/asm/book3s/64/hash.h          |   5 +-
 arch/powerpc/include/asm/book3s/64/mmu-hash.h      |  12 +
 arch/powerpc/include/asm/book3s/64/pgtable.h       |  26 --
 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h |   3 +-
 arch/powerpc/include/asm/firmware.h                |   3 +-
 arch/powerpc/include/asm/hvcall.h                  |   5 +-
 arch/powerpc/include/asm/pgtable-be-types.h        |  10 -
 arch/powerpc/include/asm/pgtable-types.h           |   9 -
 arch/powerpc/include/asm/plpar_wrappers.h          |  23 ++
 arch/powerpc/kvm/book3s_hv.c                       |   3 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c                | 297 ++++++++++++++++++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S            |   4 +
 arch/powerpc/kvm/powerpc.c                         |   4 +
 arch/powerpc/mm/dump_hashpagetable.c               |   2 +-
 arch/powerpc/mm/dump_linuxpagetables.c             |  10 -
 arch/powerpc/mm/hash64_4k.c                        |  17 +-
 arch/powerpc/mm/hash64_64k.c                       | 124 +++------
 arch/powerpc/mm/hash_native_64.c                   | 175 ++++++++----
 arch/powerpc/mm/hash_utils_64.c                    |  75 ++----
 arch/powerpc/mm/hugepage-hash64.c                  |   9 +-
 arch/powerpc/mm/hugetlbpage-hash64.c               |  13 +-
 arch/powerpc/mm/tlb_hash64.c                       |   9 +-
 arch/powerpc/platforms/ps3/htab.c                  |  88 ++++++
 arch/powerpc/platforms/pseries/firmware.c          |   1 +
 arch/powerpc/platforms/pseries/lpar.c              | 196 +++++++++++---
 include/uapi/linux/kvm.h                           |   1 +
 28 files changed, 760 insertions(+), 424 deletions(-)

-- 
2.13.6


* [PATCH 01/16] powerpc/mm/hash: Remove the superfluous bitwise operation when find hpte group
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 02/16] powerpc/mm: Update native_hpte_find to return hash pte Aneesh Kumar K.V
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

When computing the starting slot number for a hash page table group we used
to do this:

hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;

Multiplying by 8 (HPTES_PER_GROUP) already leaves the last three bits at 0, so we
don't need to clear them separately.
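
For example, with a hypothetical hash value where (hash & htab_hash_mask) == 0x1235:

	0x1235 * HPTES_PER_GROUP == 0x1235 * 8 == 0x91a8

whose low three bits are already zero, so the trailing "& ~0x7UL" can never change
the result.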

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/dump_hashpagetable.c |  2 +-
 arch/powerpc/mm/hash64_4k.c          |  8 ++++----
 arch/powerpc/mm/hash64_64k.c         | 16 ++++++++--------
 arch/powerpc/mm/hash_utils_64.c      | 10 ++++------
 arch/powerpc/mm/hugepage-hash64.c    |  9 ++++-----
 5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/mm/dump_hashpagetable.c b/arch/powerpc/mm/dump_hashpagetable.c
index 5c4c93dcff19..2384d40bfcf4 100644
--- a/arch/powerpc/mm/dump_hashpagetable.c
+++ b/arch/powerpc/mm/dump_hashpagetable.c
@@ -260,7 +260,7 @@ static int pseries_find(unsigned long ea, int psize, bool primary, u64 *v, u64 *
 	/* to check in the secondary hash table, we invert the hash */
 	if (!primary)
 		hash = ~hash;
-	hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
+	hpte_group = (hash & htab_hash_mask) * HPTES_PER_GROUP;
 	/* see if we can find an entry in the hpte with this hash */
 	for (i = 0; i < HPTES_PER_GROUP; i += 4, hpte_group += 4) {
 		lpar_rc = plpar_pte_read_4(0, hpte_group, (void *)ptes);
diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
index 6fa450c12d6d..975793de0914 100644
--- a/arch/powerpc/mm/hash64_4k.c
+++ b/arch/powerpc/mm/hash64_4k.c
@@ -81,7 +81,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 		hash = hpt_hash(vpn, shift, ssize);
 
 repeat:
-		hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
+		hpte_group = (hash & htab_hash_mask) * HPTES_PER_GROUP;
 
 		/* Insert into the hash table, primary slot */
 		slot = mmu_hash_ops.hpte_insert(hpte_group, vpn, pa, rflags, 0,
@@ -90,7 +90,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 		 * Primary is full, try the secondary
 		 */
 		if (unlikely(slot == -1)) {
-			hpte_group = ((~hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
+			hpte_group = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
 			slot = mmu_hash_ops.hpte_insert(hpte_group, vpn, pa,
 							rflags,
 							HPTE_V_SECONDARY,
@@ -98,8 +98,8 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 							MMU_PAGE_4K, ssize);
 			if (slot == -1) {
 				if (mftb() & 0x1)
-					hpte_group = ((hash & htab_hash_mask) *
-						      HPTES_PER_GROUP) & ~0x7UL;
+					hpte_group = (hash & htab_hash_mask) *
+							HPTES_PER_GROUP;
 				mmu_hash_ops.hpte_remove(hpte_group);
 				/*
 				 * FIXME!! Should be try the group from which we removed ?
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 1a68cb19b0e3..f1eb538721fc 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -163,7 +163,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 	}
 	hash = hpt_hash(vpn, shift, ssize);
 repeat:
-	hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
+	hpte_group = (hash & htab_hash_mask) * HPTES_PER_GROUP;
 
 	/* Insert into the hash table, primary slot */
 	slot = mmu_hash_ops.hpte_insert(hpte_group, vpn, pa, rflags, 0,
@@ -172,15 +172,15 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 	 * Primary is full, try the secondary
 	 */
 	if (unlikely(slot == -1)) {
-		hpte_group = ((~hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
+		hpte_group = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
 		slot = mmu_hash_ops.hpte_insert(hpte_group, vpn, pa,
 						rflags, HPTE_V_SECONDARY,
 						MMU_PAGE_4K, MMU_PAGE_4K,
 						ssize);
 		if (slot == -1) {
 			if (mftb() & 0x1)
-				hpte_group = ((hash & htab_hash_mask) *
-					      HPTES_PER_GROUP) & ~0x7UL;
+				hpte_group = (hash & htab_hash_mask) *
+						HPTES_PER_GROUP;
 			mmu_hash_ops.hpte_remove(hpte_group);
 			/*
 			 * FIXME!! Should be try the group from which we removed ?
@@ -285,7 +285,7 @@ int __hash_page_64K(unsigned long ea, unsigned long access,
 		hash = hpt_hash(vpn, shift, ssize);
 
 repeat:
-		hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
+		hpte_group = (hash & htab_hash_mask) * HPTES_PER_GROUP;
 
 		/* Insert into the hash table, primary slot */
 		slot = mmu_hash_ops.hpte_insert(hpte_group, vpn, pa, rflags, 0,
@@ -295,7 +295,7 @@ int __hash_page_64K(unsigned long ea, unsigned long access,
 		 * Primary is full, try the secondary
 		 */
 		if (unlikely(slot == -1)) {
-			hpte_group = ((~hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
+			hpte_group = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
 			slot = mmu_hash_ops.hpte_insert(hpte_group, vpn, pa,
 							rflags,
 							HPTE_V_SECONDARY,
@@ -303,8 +303,8 @@ int __hash_page_64K(unsigned long ea, unsigned long access,
 							MMU_PAGE_64K, ssize);
 			if (slot == -1) {
 				if (mftb() & 0x1)
-					hpte_group = ((hash & htab_hash_mask) *
-						      HPTES_PER_GROUP) & ~0x7UL;
+					hpte_group = (hash & htab_hash_mask) *
+							HPTES_PER_GROUP;
 				mmu_hash_ops.hpte_remove(hpte_group);
 				/*
 				 * FIXME!! Should be try the group from which we removed ?
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 655a5a9a183d..4d4662a77c14 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1723,8 +1723,7 @@ long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
 	long slot;
 
 repeat:
-	hpte_group = ((hash & htab_hash_mask) *
-		       HPTES_PER_GROUP) & ~0x7UL;
+	hpte_group = (hash & htab_hash_mask) * HPTES_PER_GROUP;
 
 	/* Insert into the hash table, primary slot */
 	slot = mmu_hash_ops.hpte_insert(hpte_group, vpn, pa, rflags, vflags,
@@ -1732,15 +1731,14 @@ long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
 
 	/* Primary is full, try the secondary */
 	if (unlikely(slot == -1)) {
-		hpte_group = ((~hash & htab_hash_mask) *
-			      HPTES_PER_GROUP) & ~0x7UL;
+		hpte_group = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
 		slot = mmu_hash_ops.hpte_insert(hpte_group, vpn, pa, rflags,
 						vflags | HPTE_V_SECONDARY,
 						psize, psize, ssize);
 		if (slot == -1) {
 			if (mftb() & 0x1)
-				hpte_group = ((hash & htab_hash_mask) *
-					      HPTES_PER_GROUP)&~0x7UL;
+				hpte_group = (hash & htab_hash_mask) *
+						HPTES_PER_GROUP;
 
 			mmu_hash_ops.hpte_remove(hpte_group);
 			goto repeat;
diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
index f20d16f849c5..01f213d2bcb9 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -128,7 +128,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
 		new_pmd |= H_PAGE_HASHPTE;
 
 repeat:
-		hpte_group = ((hash & htab_hash_mask) * HPTES_PER_GROUP) & ~0x7UL;
+		hpte_group = (hash & htab_hash_mask) * HPTES_PER_GROUP;
 
 		/* Insert into the hash table, primary slot */
 		slot = mmu_hash_ops.hpte_insert(hpte_group, vpn, pa, rflags, 0,
@@ -137,16 +137,15 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
 		 * Primary is full, try the secondary
 		 */
 		if (unlikely(slot == -1)) {
-			hpte_group = ((~hash & htab_hash_mask) *
-				      HPTES_PER_GROUP) & ~0x7UL;
+			hpte_group = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
 			slot = mmu_hash_ops.hpte_insert(hpte_group, vpn, pa,
 							rflags,
 							HPTE_V_SECONDARY,
 							psize, lpsize, ssize);
 			if (slot == -1) {
 				if (mftb() & 0x1)
-					hpte_group = ((hash & htab_hash_mask) *
-						      HPTES_PER_GROUP) & ~0x7UL;
+					hpte_group = (hash & htab_hash_mask) *
+							HPTES_PER_GROUP;
 
 				mmu_hash_ops.hpte_remove(hpte_group);
 				goto repeat;
-- 
2.13.6


* [PATCH 02/16] powerpc/mm: Update native_hpte_find to return hash pte
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 01/16] powerpc/mm/hash: Remove the superfluous bitwise operation when find hpte group Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 03/16] powerpc/pseries: Update hpte find helper to take hash value Aneesh Kumar K.V
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

The helper now returns a (locked) hash pte instead of a slot number and also searches
the secondary hash group, so that it can be used from other functions.
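
Callers now compute the hash themselves and must drop the per-HPTE lock once they are
done with the entry; a condensed sketch of the pattern used by the updateboltedpp and
removebolted changes below:

	hash = hpt_hash(vpn, mmu_psize_defs[psize].shift, ssize);
	hptep = native_hpte_find(hash, vpn, psize, ssize);
	if (!hptep)
		return -ENOENT;	/* no matching HPTE in either group */
	/* ... update hptep->r / hptep->v while holding the lock ... */
	native_unlock_hpte(hptep);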

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hash_native_64.c | 73 ++++++++++++++++++++++++----------------
 1 file changed, 44 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 3848af167df9..496b1680ba24 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -351,32 +351,49 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
 	return ret;
 }
 
-static long native_hpte_find(unsigned long vpn, int psize, int ssize)
+/* returns a locked hash pte */
+struct hash_pte *native_hpte_find(unsigned long hash, unsigned long vpn,
+				  unsigned long bpsize, unsigned long ssize)
 {
+	int i;
+	unsigned long hpte_v;
 	struct hash_pte *hptep;
-	unsigned long hash;
-	unsigned long i;
-	long slot;
-	unsigned long want_v, hpte_v;
-
-	hash = hpt_hash(vpn, mmu_psize_defs[psize].shift, ssize);
-	want_v = hpte_encode_avpn(vpn, psize, ssize);
+	unsigned long want_v, slot;
+	bool secondary_search = false;
 
-	/* Bolted mappings are only ever in the primary group */
+	want_v = hpte_encode_avpn(vpn, bpsize, ssize);
 	slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-	for (i = 0; i < HPTES_PER_GROUP; i++) {
-		hptep = htab_address + slot;
+
+	/*
+	 * search for hpte in the primary group
+	 */
+search_again:
+	hptep = htab_address + slot;
+	for (i = 0; i < HPTES_PER_GROUP; i++, hptep++) {
+		/* check locklessly first */
 		hpte_v = be64_to_cpu(hptep->v);
 		if (cpu_has_feature(CPU_FTR_ARCH_300))
 			hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
+		if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID))
+			continue;
 
-		if (HPTE_V_COMPARE(hpte_v, want_v) && (hpte_v & HPTE_V_VALID))
-			/* HPTE matches */
-			return slot;
-		++slot;
+		native_lock_hpte(hptep);
+		hpte_v = be64_to_cpu(hptep->v);
+		if (cpu_has_feature(CPU_FTR_ARCH_300))
+			hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
+		if (unlikely(!HPTE_V_COMPARE(hpte_v, want_v) ||
+			     !(hpte_v & HPTE_V_VALID)))
+			native_unlock_hpte(hptep);
+		else
+			return hptep;
 	}
-
-	return -1;
+	if (!secondary_search) {
+		/* Search for hpte in the secondary group */
+		slot = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
+		secondary_search = true;
+		goto search_again;
+	}
+	return NULL;
 }
 
 /*
@@ -389,23 +406,22 @@ static long native_hpte_find(unsigned long vpn, int psize, int ssize)
 static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
 				       int psize, int ssize)
 {
-	unsigned long vpn;
-	unsigned long vsid;
-	long slot;
+	unsigned long hash;
+	unsigned long vpn, vsid;
 	struct hash_pte *hptep;
 
 	vsid = get_kernel_vsid(ea, ssize);
 	vpn = hpt_vpn(ea, vsid, ssize);
-
-	slot = native_hpte_find(vpn, psize, ssize);
-	if (slot == -1)
+	hash = hpt_hash(vpn, mmu_psize_defs[psize].shift, ssize);
+	hptep = native_hpte_find(hash, vpn, psize, ssize);
+	if (!hptep)
 		panic("could not find page to bolt\n");
-	hptep = htab_address + slot;
 
 	/* Update the HPTE */
 	hptep->r = cpu_to_be64((be64_to_cpu(hptep->r) &
 				~(HPTE_R_PPP | HPTE_R_N)) |
 			       (newpp & (HPTE_R_PPP | HPTE_R_N)));
+	native_unlock_hpte(hptep);
 	/*
 	 * Ensure it is out of the tlb too. Bolted entries base and
 	 * actual page size will be same.
@@ -422,18 +438,17 @@ static int native_hpte_removebolted(unsigned long ea, int psize, int ssize)
 {
 	unsigned long vpn;
 	unsigned long vsid;
-	long slot;
+	unsigned long hash;
 	struct hash_pte *hptep;
 
 	vsid = get_kernel_vsid(ea, ssize);
 	vpn = hpt_vpn(ea, vsid, ssize);
+	hash = hpt_hash(vpn, mmu_psize_defs[psize].shift, ssize);
 
-	slot = native_hpte_find(vpn, psize, ssize);
-	if (slot == -1)
+	hptep = native_hpte_find(hash, vpn, psize, ssize);
+	if (!hptep)
 		return -ENOENT;
 
-	hptep = htab_address + slot;
-
 	VM_WARN_ON(!(be64_to_cpu(hptep->v) & HPTE_V_BOLTED));
 
 	/* Invalidate the hpte */
-- 
2.13.6


* [PATCH 03/16] powerpc/pseries: Update hpte find helper to take hash value
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 01/16] powerpc/mm/hash: Remove the superfluous bitwise operation when find hpte group Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 02/16] powerpc/mm: Update native_hpte_find to return hash pte Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 04/16] powerpc/mm: Add hash invalidate callback Aneesh Kumar K.V
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

The helper now takes the hash value instead of a precomputed hpte group and also searches
the secondary hash group, so that it can be used from other functions.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/lpar.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 495ba4e7336d..edab68d9f9f3 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -328,15 +328,21 @@ static long pSeries_lpar_hpte_updatepp(unsigned long slot,
 	return 0;
 }
 
-static long __pSeries_lpar_hpte_find(unsigned long want_v, unsigned long hpte_group)
+static long __pSeries_lpar_hpte_find(unsigned long hash, unsigned long want_v)
 {
 	long lpar_rc;
 	unsigned long i, j;
+	unsigned long hpte_group;
+	bool secondary_search = false;
 	struct {
 		unsigned long pteh;
 		unsigned long ptel;
 	} ptes[4];
 
+	/* first check primary */
+	hpte_group = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+
+search_again:
 	for (i = 0; i < HPTES_PER_GROUP; i += 4, hpte_group += 4) {
 
 		lpar_rc = plpar_pte_read_4(0, hpte_group, (void *)ptes);
@@ -346,31 +352,31 @@ static long __pSeries_lpar_hpte_find(unsigned long want_v, unsigned long hpte_gr
 		for (j = 0; j < 4; j++) {
 			if (HPTE_V_COMPARE(ptes[j].pteh, want_v) &&
 			    (ptes[j].pteh & HPTE_V_VALID))
-				return i + j;
+				return hpte_group + j;
 		}
 	}
-
+	if (!secondary_search) {
+		hpte_group = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
+		secondary_search = true;
+		goto search_again;
+	}
 	return -1;
 }
 
 static long pSeries_lpar_hpte_find(unsigned long vpn, int psize, int ssize)
 {
 	long slot;
-	unsigned long hash;
-	unsigned long want_v;
-	unsigned long hpte_group;
+	unsigned long hash, want_v;
 
 	hash = hpt_hash(vpn, mmu_psize_defs[psize].shift, ssize);
 	want_v = hpte_encode_avpn(vpn, psize, ssize);
-
-	/* Bolted entries are always in the primary group */
-	hpte_group = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-	slot = __pSeries_lpar_hpte_find(want_v, hpte_group);
+	slot = __pSeries_lpar_hpte_find(hash, want_v);
 	if (slot < 0)
 		return -1;
-	return hpte_group + slot;
+	return slot;
 }
 
+
 static void pSeries_lpar_hpte_updateboltedpp(unsigned long newpp,
 					     unsigned long ea,
 					     int psize, int ssize)
-- 
2.13.6


* [PATCH 04/16] powerpc/mm: Add hash invalidate callback
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 03/16] powerpc/pseries: Update hpte find helper to take hash value Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 05/16] powerpc/mm: use hash_invalidate for __kernel_map_pages() Aneesh Kumar K.V
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

Add a hash-based invalidate callback and use it in flush_hash_page().

Note: in a later patch we will drop the slot tracking completely. At that point we will
also lose the __rpte_sub_valid() check in pte_iterate_hashed_subpages(), which means we
will call the invalidate for all subpages irrespective of whether we took a hash fault
on them or not.
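
With the callback in place flush_hash_page() only needs the hash value; condensed from
the hash_utils_64.c hunk below:

	pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
		hash = hpt_hash(vpn, shift, ssize);
		/* same base and actual psize: this path is not used for hugepages */
		mmu_hash_ops.hash_invalidate(hash, vpn, psize, psize,
					     ssize, local);
	} pte_iterate_hashed_end();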

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  4 ++
 arch/powerpc/mm/hash_native_64.c              | 27 ++++++++++++
 arch/powerpc/mm/hash_utils_64.c               | 11 ++---
 arch/powerpc/platforms/ps3/htab.c             | 59 +++++++++++++++++++++++++++
 arch/powerpc/platforms/pseries/lpar.c         | 26 ++++++++++++
 5 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 508275bb05d5..79f141e721ee 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -136,6 +136,10 @@ struct mmu_hash_ops {
 					   unsigned long vpn,
 					   int bpsize, int apsize,
 					   int ssize, int local);
+	void            (*hash_invalidate)(unsigned long hash,
+					   unsigned long vpn,
+					   int bpsize, int apsize,
+					   int ssize, int local);
 	long		(*hpte_updatepp)(unsigned long slot,
 					 unsigned long newpp,
 					 unsigned long vpn,
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 496b1680ba24..f473a78baab7 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -497,6 +497,32 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
 	local_irq_restore(flags);
 }
 
+static void native_hash_invalidate(unsigned long hash, unsigned long vpn,
+				   int bpsize, int apsize, int ssize, int local)
+{
+	unsigned long flags;
+	struct hash_pte *hptep;
+
+	DBG_LOW("    invalidate(vpn=%016lx, hash: %lx)\n", vpn, hash);
+	local_irq_save(flags);
+	hptep = native_hpte_find(hash, vpn, bpsize, ssize);
+	if (hptep) {
+		/*
+		 * Invalidate the hpte. NOTE: this also unlocks it
+		 */
+		hptep->v = 0;
+	}
+	/*
+	 * We need to invalidate the TLB always because hpte_remove doesn't do
+	 * a tlb invalidate. If a hash bucket gets full, we "evict" a more/less
+	 * random entry from it. When we do that we don't invalidate the TLB
+	 * (hpte_remove) because we assume the old translation is still
+	 * technically "valid".
+	 */
+	tlbie(vpn, bpsize, apsize, ssize, local);
+	local_irq_restore(flags);
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static void native_hugepage_invalidate(unsigned long vsid,
 				       unsigned long addr,
@@ -776,6 +802,7 @@ static int native_register_proc_table(unsigned long base, unsigned long page_siz
 void __init hpte_init_native(void)
 {
 	mmu_hash_ops.hpte_invalidate	= native_hpte_invalidate;
+	mmu_hash_ops.hash_invalidate	= native_hash_invalidate;
 	mmu_hash_ops.hpte_updatepp	= native_hpte_updatepp;
 	mmu_hash_ops.hpte_updateboltedpp = native_hpte_updateboltedpp;
 	mmu_hash_ops.hpte_removebolted = native_hpte_removebolted;
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 4d4662a77c14..b197fe57547e 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1598,23 +1598,18 @@ static inline void tm_flush_hash_page(int local)
 void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize,
 		     unsigned long flags)
 {
-	unsigned long hash, index, shift, hidx, slot;
+	unsigned long hash, index, shift;
 	int local = flags & HPTE_LOCAL_UPDATE;
 
 	DBG_LOW("flush_hash_page(vpn=%016lx)\n", vpn);
 	pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
 		hash = hpt_hash(vpn, shift, ssize);
-		hidx = __rpte_to_hidx(pte, index);
-		if (hidx & _PTEIDX_SECONDARY)
-			hash = ~hash;
-		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-		slot += hidx & _PTEIDX_GROUP_IX;
-		DBG_LOW(" sub %ld: hash=%lx, hidx=%lx\n", index, slot, hidx);
+		DBG_LOW(" sub %ld: hash=%lx\n", index, hash);
 		/*
 		 * We use same base page size and actual psize, because we don't
 		 * use these functions for hugepage
 		 */
-		mmu_hash_ops.hpte_invalidate(slot, vpn, psize, psize,
+		mmu_hash_ops.hash_invalidate(hash, vpn, psize, psize,
 					     ssize, local);
 	} pte_iterate_hashed_end();
 
diff --git a/arch/powerpc/platforms/ps3/htab.c b/arch/powerpc/platforms/ps3/htab.c
index cc2b281a3766..813c2f77f75d 100644
--- a/arch/powerpc/platforms/ps3/htab.c
+++ b/arch/powerpc/platforms/ps3/htab.c
@@ -193,9 +193,68 @@ static void ps3_hpte_clear(void)
 	ps3_mm_vas_destroy();
 }
 
+static long ps3_hpte_find(unsigned long hash, unsigned long want_v)
+{
+	unsigned long i, j, result;
+	unsigned long hpte_group;
+	bool secondary_search = false;
+	u64 hpte_v_array[4], hpte_rs;
+
+
+	/* first check primary */
+	hpte_group = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+
+search_again:
+	for (i = 0; i < HPTES_PER_GROUP; i += 4, hpte_group += 4) {
+
+		result = lv1_read_htab_entries(PS3_LPAR_VAS_ID_CURRENT,
+					       hpte_group & ~0x3UL, &hpte_v_array[0],
+					       &hpte_v_array[1], &hpte_v_array[2],
+					       &hpte_v_array[3], &hpte_rs);
+		/* ignore failures ? */
+		if (result)
+			continue;
+
+		for (j = 0; j < 4; j++) {
+			if (HPTE_V_COMPARE(hpte_v_array[j], want_v) &&
+			    (hpte_v_array[j] & HPTE_V_VALID)) {
+				return hpte_group + j;
+			}
+		}
+	}
+	if (!secondary_search) {
+		hpte_group = (~hash & htab_hash_mask) * HPTES_PER_GROUP;
+		secondary_search = true;
+		goto search_again;
+	}
+	return -1;
+}
+
+static void ps3_hash_invalidate(unsigned long hash, unsigned long vpn,
+				int psize, int apsize, int ssize, int local)
+{
+	long slot;
+	unsigned long flags;
+	unsigned long want_v;
+
+	want_v = hpte_encode_avpn(vpn, psize, ssize);
+
+	spin_lock_irqsave(&ps3_htab_lock, flags);
+	slot = ps3_hpte_find(hash, want_v);
+	if (slot < 0)
+		/* HPTE not found */
+		goto err_out;
+	/* invalidate the entry */
+	lv1_write_htab_entry(PS3_LPAR_VAS_ID_CURRENT, slot, 0, 0);
+err_out:
+	spin_unlock_irqrestore(&ps3_htab_lock, flags);
+	return;
+}
+
 void __init ps3_hpte_init(unsigned long htab_size)
 {
 	mmu_hash_ops.hpte_invalidate = ps3_hpte_invalidate;
+	mmu_hash_ops.hash_invalidate = ps3_hash_invalidate;
 	mmu_hash_ops.hpte_updatepp = ps3_hpte_updatepp;
 	mmu_hash_ops.hpte_updateboltedpp = ps3_hpte_updateboltedpp;
 	mmu_hash_ops.hpte_insert = ps3_hpte_insert;
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index edab68d9f9f3..e366252e0e93 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -419,6 +419,31 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
 	BUG_ON(lpar_rc != H_SUCCESS);
 }
 
+static void pSeries_lpar_hash_invalidate(unsigned long hash, unsigned long vpn,
+					 int psize, int apsize,
+					 int ssize, int local)
+{
+	long slot;
+	unsigned long want_v;
+	unsigned long lpar_rc;
+	unsigned long dummy1, dummy2;
+
+	pr_devel("    inval : hash=%lx, vpn=%016lx, psize: %d, local: %d\n",
+		 hash, vpn, psize, local);
+
+	want_v = hpte_encode_avpn(vpn, psize, ssize);
+	slot = __pSeries_lpar_hpte_find(hash, want_v);
+	if (slot < 0)
+		/* HPTE not found */
+		return;
+	lpar_rc = plpar_pte_remove(H_AVPN, slot, want_v, &dummy1, &dummy2);
+	if (lpar_rc == H_NOT_FOUND)
+		return;
+
+	BUG_ON(lpar_rc != H_SUCCESS);
+}
+
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need
@@ -758,6 +783,7 @@ static int pseries_lpar_register_process_table(unsigned long base,
 void __init hpte_init_pseries(void)
 {
 	mmu_hash_ops.hpte_invalidate	 = pSeries_lpar_hpte_invalidate;
+	mmu_hash_ops.hash_invalidate	 = pSeries_lpar_hash_invalidate;
 	mmu_hash_ops.hpte_updatepp	 = pSeries_lpar_hpte_updatepp;
 	mmu_hash_ops.hpte_updateboltedpp = pSeries_lpar_hpte_updateboltedpp;
 	mmu_hash_ops.hpte_insert	 = pSeries_lpar_hpte_insert;
-- 
2.13.6


* [PATCH 05/16] powerpc/mm: use hash_invalidate for __kernel_map_pages()
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (3 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 04/16] powerpc/mm: Add hash invalidate callback Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 06/16] powerpc/mm: Switch flush_hash_range to not use slot Aneesh Kumar K.V
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hash_utils_64.c | 32 +++++---------------------------
 1 file changed, 5 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index b197fe57547e..8635b241e2d5 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -119,11 +119,6 @@ EXPORT_SYMBOL_GPL(mmu_slb_size);
 #ifdef CONFIG_PPC_64K_PAGES
 int mmu_ci_restrictions;
 #endif
-#ifdef CONFIG_DEBUG_PAGEALLOC
-static u8 *linear_map_hash_slots;
-static unsigned long linear_map_hash_count;
-static DEFINE_SPINLOCK(linear_map_hash_lock);
-#endif /* CONFIG_DEBUG_PAGEALLOC */
 struct mmu_hash_ops mmu_hash_ops;
 EXPORT_SYMBOL(mmu_hash_ops);
 
@@ -1744,7 +1739,7 @@ long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
 }
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
-static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
+static void kernel_map_linear_page(unsigned long vaddr)
 {
 	unsigned long hash;
 	unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
@@ -1761,12 +1756,7 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
 	ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode,
 				    HPTE_V_BOLTED,
 				    mmu_linear_psize, mmu_kernel_ssize);
-
 	BUG_ON (ret < 0);
-	spin_lock(&linear_map_hash_lock);
-	BUG_ON(linear_map_hash_slots[lmi] & 0x80);
-	linear_map_hash_slots[lmi] = ret | 0x80;
-	spin_unlock(&linear_map_hash_lock);
 }
 
 static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
@@ -1776,35 +1766,23 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
 	unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
 
 	hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
-	spin_lock(&linear_map_hash_lock);
-	BUG_ON(!(linear_map_hash_slots[lmi] & 0x80));
-	hidx = linear_map_hash_slots[lmi] & 0x7f;
-	linear_map_hash_slots[lmi] = 0;
-	spin_unlock(&linear_map_hash_lock);
-	if (hidx & _PTEIDX_SECONDARY)
-		hash = ~hash;
-	slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-	slot += hidx & _PTEIDX_GROUP_IX;
-	mmu_hash_ops.hpte_invalidate(slot, vpn, mmu_linear_psize,
+	mmu_hash_ops.hash_invalidate(hash, vpn, mmu_linear_psize,
 				     mmu_linear_psize,
 				     mmu_kernel_ssize, 0);
 }
 
 void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
-	unsigned long flags, vaddr, lmi;
+	unsigned long flags, vaddr;
 	int i;
 
 	local_irq_save(flags);
 	for (i = 0; i < numpages; i++, page++) {
 		vaddr = (unsigned long)page_address(page);
-		lmi = __pa(vaddr) >> PAGE_SHIFT;
-		if (lmi >= linear_map_hash_count)
-			continue;
 		if (enable)
-			kernel_map_linear_page(vaddr, lmi);
+			kernel_map_linear_page(vaddr);
 		else
-			kernel_unmap_linear_page(vaddr, lmi);
+			kernel_unmap_linear_page(vaddr);
 	}
 	local_irq_restore(flags);
 }
-- 
2.13.6


* [PATCH 06/16] powerpc/mm: Switch flush_hash_range to not use slot
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (4 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 05/16] powerpc/mm: use hash_invalidate for __kernel_map_pages() Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 07/16] powerpc/mm: Add hash updatepp callback Aneesh Kumar K.V
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/hash_native_64.c      | 28 ++++++++--------------------
 arch/powerpc/platforms/pseries/lpar.c | 17 ++++++++---------
 2 files changed, 16 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index f473a78baab7..8e2e6b92aa27 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -707,10 +707,8 @@ static void native_hpte_clear(void)
 static void native_flush_hash_range(unsigned long number, int local)
 {
 	unsigned long vpn;
-	unsigned long hash, index, hidx, shift, slot;
+	unsigned long hash, index, shift;
 	struct hash_pte *hptep;
-	unsigned long hpte_v;
-	unsigned long want_v;
 	unsigned long flags;
 	real_pte_t pte;
 	struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
@@ -730,23 +728,13 @@ static void native_flush_hash_range(unsigned long number, int local)
 
 		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
 			hash = hpt_hash(vpn, shift, ssize);
-			hidx = __rpte_to_hidx(pte, index);
-			if (hidx & _PTEIDX_SECONDARY)
-				hash = ~hash;
-			slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-			slot += hidx & _PTEIDX_GROUP_IX;
-			hptep = htab_address + slot;
-			want_v = hpte_encode_avpn(vpn, psize, ssize);
-			native_lock_hpte(hptep);
-			hpte_v = be64_to_cpu(hptep->v);
-			if (cpu_has_feature(CPU_FTR_ARCH_300))
-				hpte_v = hpte_new_to_old_v(hpte_v,
-						be64_to_cpu(hptep->r));
-			if (!HPTE_V_COMPARE(hpte_v, want_v) ||
-			    !(hpte_v & HPTE_V_VALID))
-				native_unlock_hpte(hptep);
-			else
-				hptep->v = 0;
+			hptep = native_hpte_find(hash, vpn, psize, ssize);
+			if (!hptep)
+				continue;
+			/*
+			 * Invalidate the hpte. NOTE: this also unlocks it
+			 */
+			hptep->v = 0;
 		} pte_iterate_hashed_end();
 	}
 
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index e366252e0e93..d32469e40bbc 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -580,14 +580,14 @@ static int pSeries_lpar_hpte_removebolted(unsigned long ea,
 static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 {
 	unsigned long vpn;
-	unsigned long i, pix, rc;
+	unsigned long i, rc;
 	unsigned long flags = 0;
 	struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
 	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
 	unsigned long param[PLPAR_HCALL9_BUFSIZE];
-	unsigned long hash, index, shift, hidx, slot;
+	unsigned long index, shift;
 	real_pte_t pte;
-	int psize, ssize;
+	int psize, ssize, pix;
 
 	if (lock_tlbie)
 		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
@@ -599,12 +599,11 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 		vpn = batch->vpn[i];
 		pte = batch->pte[i];
 		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
-			hash = hpt_hash(vpn, shift, ssize);
-			hidx = __rpte_to_hidx(pte, index);
-			if (hidx & _PTEIDX_SECONDARY)
-				hash = ~hash;
-			slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-			slot += hidx & _PTEIDX_GROUP_IX;
+			long slot;
+
+			slot = pSeries_lpar_hpte_find(vpn, psize, ssize);
+			if (slot < 0)
+				continue;
 			if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
 				/*
 				 * lpar doesn't use the passed actual page size
-- 
2.13.6


* [PATCH 07/16] powerpc/mm: Add hash updatepp callback
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (5 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 06/16] powerpc/mm: Switch flush_hash_range to not use slot Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 08/16] powerpc/mm/hash: Don't track hash pte slot number in linux page table Aneesh Kumar K.V
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

Add a hash-based updatepp callback and use it during the hash pte fault path.
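
Condensed from the hash64_4k.c hunk below: when a pte already has H_PAGE_HASHPTE set,
the fault path now updates the HPTE by hash and falls back to a fresh insert if no
matching entry is found:

	if (unlikely(old_pte & H_PAGE_HASHPTE)) {
		/* There MIGHT be an HPTE for this pte */
		hash = hpt_hash(vpn, shift, ssize);
		if (mmu_hash_ops.hash_updatepp(hash, rflags, vpn, MMU_PAGE_4K,
					       MMU_PAGE_4K, ssize, flags) == -1)
			/* not found: clear the HPTE flags and take the insert path */
			old_pte &= ~_PAGE_HPTEFLAGS;
	}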

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  6 +++++
 arch/powerpc/mm/hash64_4k.c                   |  7 +----
 arch/powerpc/mm/hash64_64k.c                  | 19 +++-----------
 arch/powerpc/mm/hash_native_64.c              | 37 +++++++++++++++++++++++++++
 arch/powerpc/mm/hugetlbpage-hash64.c          |  9 ++-----
 arch/powerpc/platforms/ps3/htab.c             | 29 +++++++++++++++++++++
 arch/powerpc/platforms/pseries/lpar.c         | 31 ++++++++++++++++++++++
 7 files changed, 110 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 79f141e721ee..8b1d924a2f85 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -145,6 +145,12 @@ struct mmu_hash_ops {
 					 unsigned long vpn,
 					 int bpsize, int apsize,
 					 int ssize, unsigned long flags);
+	long		(*hash_updatepp)(unsigned long hash,
+					 unsigned long newpp,
+					 unsigned long vpn,
+					 int bpsize, int apsize,
+					 int ssize, unsigned long flags);
+
 	void            (*hpte_updateboltedpp)(unsigned long newpp,
 					       unsigned long ea,
 					       int psize, int ssize);
diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
index 975793de0914..afb79100f0ce 100644
--- a/arch/powerpc/mm/hash64_4k.c
+++ b/arch/powerpc/mm/hash64_4k.c
@@ -65,12 +65,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 		 * There MIGHT be an HPTE for this pte
 		 */
 		hash = hpt_hash(vpn, shift, ssize);
-		if (old_pte & H_PAGE_F_SECOND)
-			hash = ~hash;
-		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-		slot += (old_pte & H_PAGE_F_GIX) >> H_PAGE_F_GIX_SHIFT;
-
-		if (mmu_hash_ops.hpte_updatepp(slot, rflags, vpn, MMU_PAGE_4K,
+		if (mmu_hash_ops.hash_updatepp(hash, rflags, vpn, MMU_PAGE_4K,
 					       MMU_PAGE_4K, ssize, flags) == -1)
 			old_pte &= ~_PAGE_HPTEFLAGS;
 	}
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index f1eb538721fc..096fdfaf6f1c 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -53,7 +53,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 	unsigned long *hidxp;
 	unsigned long hpte_group;
 	unsigned int subpg_index;
-	unsigned long rflags, pa, hidx;
+	unsigned long rflags, pa;
 	unsigned long old_pte, new_pte, subpg_pte;
 	unsigned long vpn, hash, slot;
 	unsigned long shift = mmu_psize_defs[MMU_PAGE_4K].shift;
@@ -127,17 +127,11 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 		int ret;
 
 		hash = hpt_hash(vpn, shift, ssize);
-		hidx = __rpte_to_hidx(rpte, subpg_index);
-		if (hidx & _PTEIDX_SECONDARY)
-			hash = ~hash;
-		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-		slot += hidx & _PTEIDX_GROUP_IX;
-
-		ret = mmu_hash_ops.hpte_updatepp(slot, rflags, vpn,
+		ret = mmu_hash_ops.hash_updatepp(hash, rflags, vpn,
 						 MMU_PAGE_4K, MMU_PAGE_4K,
 						 ssize, flags);
 		/*
-		 *if we failed because typically the HPTE wasn't really here
+		 * if we failed because typically the HPTE wasn't really here
 		 * we try an insertion.
 		 */
 		if (ret == -1)
@@ -268,12 +262,7 @@ int __hash_page_64K(unsigned long ea, unsigned long access,
 		 * There MIGHT be an HPTE for this pte
 		 */
 		hash = hpt_hash(vpn, shift, ssize);
-		if (old_pte & H_PAGE_F_SECOND)
-			hash = ~hash;
-		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-		slot += (old_pte & H_PAGE_F_GIX) >> H_PAGE_F_GIX_SHIFT;
-
-		if (mmu_hash_ops.hpte_updatepp(slot, rflags, vpn, MMU_PAGE_64K,
+		if (mmu_hash_ops.hash_updatepp(hash, rflags, vpn, MMU_PAGE_64K,
 					       MMU_PAGE_64K, ssize,
 					       flags) == -1)
 			old_pte &= ~_PAGE_HPTEFLAGS;
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 8e2e6b92aa27..3b061844929c 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -396,6 +396,42 @@ struct hash_pte *native_hpte_find(unsigned long hash, unsigned long vpn,
 	return NULL;
 }
 
+static long native_hash_updatepp(unsigned long hash, unsigned long newpp,
+				 unsigned long vpn, int bpsize,
+				 int apsize, int ssize, unsigned long flags)
+{
+	int ret = 0;
+	struct hash_pte *hptep;
+	int local = 0;
+
+
+	DBG_LOW("    update(vpn=%016lx, newpp=%lx)", vpn, newpp);
+
+	hptep = native_hpte_find(hash, vpn, bpsize, ssize);
+	if (hptep) {
+		DBG_LOW(" -> hit\n");
+		/* Update the HPTE */
+		hptep->r = cpu_to_be64((be64_to_cpu(hptep->r) &
+					~(HPTE_R_PPP | HPTE_R_N)) |
+				       (newpp & (HPTE_R_PPP | HPTE_R_N |
+						 HPTE_R_C)));
+		native_unlock_hpte(hptep);
+	} else {
+		DBG_LOW(" -> miss\n");
+		ret = -1;
+	}
+	/*
+	 * Ensure it is out of the tlb too if it is not a nohpte fault
+	 */
+	if (!(flags & HPTE_NOHPTE_UPDATE)) {
+		if (flags & HPTE_LOCAL_UPDATE)
+			local = 1;
+		tlbie(vpn, bpsize, apsize, ssize, local);
+	}
+	return ret;
+}
+
+
 /*
  * Update the page protection bits. Intended to be used to create
  * guard pages for kernel data structures on pages which are bolted
@@ -792,6 +828,7 @@ void __init hpte_init_native(void)
 	mmu_hash_ops.hpte_invalidate	= native_hpte_invalidate;
 	mmu_hash_ops.hash_invalidate	= native_hash_invalidate;
 	mmu_hash_ops.hpte_updatepp	= native_hpte_updatepp;
+	mmu_hash_ops.hash_updatepp	= native_hash_updatepp;
 	mmu_hash_ops.hpte_updateboltedpp = native_hpte_updateboltedpp;
 	mmu_hash_ops.hpte_removebolted = native_hpte_removebolted;
 	mmu_hash_ops.hpte_insert	= native_hpte_insert;
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
index a84bb44497f9..4eb8c9d2f452 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -71,15 +71,10 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 	/* Check if pte already has an hpte (case 2) */
 	if (unlikely(old_pte & H_PAGE_HASHPTE)) {
 		/* There MIGHT be an HPTE for this pte */
-		unsigned long hash, slot;
+		unsigned long hash;
 
 		hash = hpt_hash(vpn, shift, ssize);
-		if (old_pte & H_PAGE_F_SECOND)
-			hash = ~hash;
-		slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-		slot += (old_pte & H_PAGE_F_GIX) >> H_PAGE_F_GIX_SHIFT;
-
-		if (mmu_hash_ops.hpte_updatepp(slot, rflags, vpn, mmu_psize,
+		if (mmu_hash_ops.hash_updatepp(hash, rflags, vpn, mmu_psize,
 					       mmu_psize, ssize, flags) == -1)
 			old_pte &= ~_PAGE_HPTEFLAGS;
 	}
diff --git a/arch/powerpc/platforms/ps3/htab.c b/arch/powerpc/platforms/ps3/htab.c
index 813c2f77f75d..4e82f7cbd124 100644
--- a/arch/powerpc/platforms/ps3/htab.c
+++ b/arch/powerpc/platforms/ps3/htab.c
@@ -251,11 +251,40 @@ static void ps3_hash_invalidate(unsigned long hash, unsigned long vpn,
 	return;
 }
 
+static long ps3_hash_updatepp(unsigned long hash,
+			      unsigned long newpp, unsigned long vpn,
+			      int psize, int apsize, int ssize,
+			      unsigned long inv_flags)
+{
+	long slot;
+	unsigned long flags;
+	unsigned long want_v;
+
+	want_v = hpte_encode_avpn(vpn, psize, ssize);
+	spin_lock_irqsave(&ps3_htab_lock, flags);
+
+	slot = ps3_hpte_find(hash, want_v);
+	if (slot < 0)
+		goto err_out;
+	/*
+	 * entry found, just invalidate it
+	 */
+	lv1_write_htab_entry(PS3_LPAR_VAS_ID_CURRENT, slot, 0, 0);
+	/*
+	 * We just invalidate instead of updating pp. Hence
+	 * return -1;
+	 */
+err_out:
+	spin_unlock_irqrestore(&ps3_htab_lock, flags);
+	return -1;
+}
+
 void __init ps3_hpte_init(unsigned long htab_size)
 {
 	mmu_hash_ops.hpte_invalidate = ps3_hpte_invalidate;
 	mmu_hash_ops.hash_invalidate = ps3_hash_invalidate;
 	mmu_hash_ops.hpte_updatepp = ps3_hpte_updatepp;
+	mmu_hash_ops.hash_updatepp = ps3_hash_updatepp;
 	mmu_hash_ops.hpte_updateboltedpp = ps3_hpte_updateboltedpp;
 	mmu_hash_ops.hpte_insert = ps3_hpte_insert;
 	mmu_hash_ops.hpte_remove = ps3_hpte_remove;
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index d32469e40bbc..511a2e9ed9a0 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -376,6 +376,36 @@ static long pSeries_lpar_hpte_find(unsigned long vpn, int psize, int ssize)
 	return slot;
 }
 
+static long pSeries_lpar_hash_updatepp(unsigned long hash,
+				       unsigned long newpp,
+				       unsigned long vpn,
+				       int psize, int apsize,
+				       int ssize, unsigned long inv_flags)
+{
+	long slot;
+	unsigned long lpar_rc;
+	unsigned long flags = (newpp & 7) | H_AVPN;
+	unsigned long want_v;
+
+	want_v = hpte_encode_avpn(vpn, psize, ssize);
+
+	pr_devel("    update: avpnv=%016lx, hash=%016lx, f=%lx, psize: %d ...",
+		 want_v, hash, flags, psize);
+
+	slot = __pSeries_lpar_hpte_find(hash, want_v);
+	if (slot < 0)
+		return -1;
+
+	lpar_rc = plpar_pte_protect(flags, slot, want_v);
+	if (lpar_rc == H_NOT_FOUND) {
+		pr_devel("not found !\n");
+		return -1;
+	}
+	pr_devel("ok\n");
+	BUG_ON(lpar_rc != H_SUCCESS);
+
+	return 0;
+}
 
 static void pSeries_lpar_hpte_updateboltedpp(unsigned long newpp,
 					     unsigned long ea,
@@ -784,6 +814,7 @@ void __init hpte_init_pseries(void)
 	mmu_hash_ops.hpte_invalidate	 = pSeries_lpar_hpte_invalidate;
 	mmu_hash_ops.hash_invalidate	 = pSeries_lpar_hash_invalidate;
 	mmu_hash_ops.hpte_updatepp	 = pSeries_lpar_hpte_updatepp;
+	mmu_hash_ops.hash_updatepp	 = pSeries_lpar_hash_updatepp;
 	mmu_hash_ops.hpte_updateboltedpp = pSeries_lpar_hpte_updateboltedpp;
 	mmu_hash_ops.hpte_insert	 = pSeries_lpar_hpte_insert;
 	mmu_hash_ops.hpte_remove	 = pSeries_lpar_hpte_remove;
-- 
2.13.6


* [PATCH 08/16] powerpc/mm/hash: Don't track hash pte slot number in linux page table.
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (6 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 07/16] powerpc/mm: Add hash updatepp callback Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 09/16] powerpc/mm: Add new firmware feature HASH API Aneesh Kumar K.V
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

Now that we have updated all the MMU hash operations to work with the hash value instead
of the slot, remove slot tracking completely. We also remove real_pte, because without
slot tracking the 4k, 64k and 64k-subpage cases all share a similar pte format.

One side effect of this is that we no longer track whether we have taken a fault on a 4k
subpage of a 64k page. That means an invalidate will try to invalidate all the 4k
subpages.

To minimize the impact of the above, THP still tracks the slot details: with THP we have
4096 subpages and we want to avoid calling invalidate on all of them. For THP the slot
details are not tracked as part of the linux page table but in the deposited page table.
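
With slot tracking gone, the only hash-related software bits left in the pte are
H_PAGE_BUSY and H_PAGE_HASHPTE, and the 64K iterator visits every 4K subpage
unconditionally. A condensed sketch of the flush path after this change (building on
the hash_invalidate callback from patch 4):

	/* _PAGE_HPTEFLAGS is now just (H_PAGE_BUSY | H_PAGE_HASHPTE) */
	pte_iterate_hashed_subpages(vpn, psize, index, shift) {
		hash = hpt_hash(vpn, shift, ssize);
		/* no __rpte_sub_valid() filter any more: every subpage is tried */
		mmu_hash_ops.hash_invalidate(hash, vpn, psize, psize,
					     ssize, local);
	} pte_iterate_hashed_end();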

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h       | 16 +++-
 arch/powerpc/include/asm/book3s/64/hash-64k.h      | 44 +---------
 arch/powerpc/include/asm/book3s/64/hash.h          |  5 +-
 arch/powerpc/include/asm/book3s/64/pgtable.h       | 26 ------
 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h |  3 +-
 arch/powerpc/include/asm/pgtable-be-types.h        | 10 ---
 arch/powerpc/include/asm/pgtable-types.h           |  9 ---
 arch/powerpc/mm/dump_linuxpagetables.c             | 10 ---
 arch/powerpc/mm/hash64_4k.c                        |  2 -
 arch/powerpc/mm/hash64_64k.c                       | 93 +++++-----------------
 arch/powerpc/mm/hash_native_64.c                   | 12 +--
 arch/powerpc/mm/hash_utils_64.c                    | 22 +----
 arch/powerpc/mm/hugetlbpage-hash64.c               |  4 -
 arch/powerpc/mm/tlb_hash64.c                       |  9 +--
 arch/powerpc/platforms/pseries/lpar.c              |  4 +-
 15 files changed, 49 insertions(+), 220 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 0c4e470571ca..d65dcb5826ff 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -17,8 +17,7 @@
 #define H_PGD_TABLE_SIZE	(sizeof(pgd_t) << H_PGD_INDEX_SIZE)
 
 /* PTE flags to conserve for HPTE identification */
-#define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | \
-			 H_PAGE_F_SECOND | H_PAGE_F_GIX)
+#define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE)
 /*
  * Not supported by 4k linux page size
  */
@@ -27,6 +26,19 @@
 #define H_PAGE_COMBO	0x0
 #define H_PTE_FRAG_NR	0
 #define H_PTE_FRAG_SIZE_SHIFT  0
+
+#define pte_iterate_hashed_subpages(vpn, psize, index, shift)	\
+	do {							\
+	index = 0;						\
+	shift = mmu_psize_defs[psize].shift;			\
+
+#define pte_iterate_hashed_end() } while(0)
+/*
+ * We expect this to be called only for user addresses or kernel virtual
+ * addresses other than the linear mapping.
+ */
+#define pte_pagesize_index(mm, addr, pte)	MMU_PAGE_4K
+
 /*
  * On all 4K setups, remap_4k_pfn() equates to remap_pfn_range()
  */
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 9732837aaae8..ab36323b8a3e 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -25,8 +25,7 @@
 #define H_PAGE_COMBO_VALID	(H_PAGE_F_GIX | H_PAGE_F_SECOND)
 
 /* PTE flags to conserve for HPTE identification */
-#define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_F_SECOND | \
-			 H_PAGE_F_GIX | H_PAGE_HASHPTE | H_PAGE_COMBO)
+#define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | H_PAGE_COMBO)
 /*
  * we support 16 fragments per PTE page of 64K size.
  */
@@ -40,55 +39,16 @@
 
 #ifndef __ASSEMBLY__
 #include <asm/errno.h>
-
-/*
- * With 64K pages on hash table, we have a special PTE format that
- * uses a second "half" of the page table to encode sub-page information
- * in order to deal with 64K made of 4K HW pages. Thus we override the
- * generic accessors and iterators here
- */
-#define __real_pte __real_pte
-static inline real_pte_t __real_pte(pte_t pte, pte_t *ptep)
-{
-	real_pte_t rpte;
-	unsigned long *hidxp;
-
-	rpte.pte = pte;
-	rpte.hidx = 0;
-	if (pte_val(pte) & H_PAGE_COMBO) {
-		/*
-		 * Make sure we order the hidx load against the H_PAGE_COMBO
-		 * check. The store side ordering is done in __hash_page_4K
-		 */
-		smp_rmb();
-		hidxp = (unsigned long *)(ptep + PTRS_PER_PTE);
-		rpte.hidx = *hidxp;
-	}
-	return rpte;
-}
-
-static inline unsigned long __rpte_to_hidx(real_pte_t rpte, unsigned long index)
-{
-	if ((pte_val(rpte.pte) & H_PAGE_COMBO))
-		return (rpte.hidx >> (index<<2)) & 0xf;
-	return (pte_val(rpte.pte) >> H_PAGE_F_GIX_SHIFT) & 0xf;
-}
-
-#define __rpte_to_pte(r)	((r).pte)
-extern bool __rpte_sub_valid(real_pte_t rpte, unsigned long index);
 /*
  * Trick: we set __end to va + 64k, which happens works for
  * a 16M page as well as we want only one iteration
  */
-#define pte_iterate_hashed_subpages(rpte, psize, vpn, index, shift)	\
+#define pte_iterate_hashed_subpages(vpn, psize, index, shift)		\
 	do {								\
 		unsigned long __end = vpn + (1UL << (PAGE_SHIFT - VPN_SHIFT));	\
-		unsigned __split = (psize == MMU_PAGE_4K ||		\
-				    psize == MMU_PAGE_64K_AP);		\
 		shift = mmu_psize_defs[psize].shift;			\
 		for (index = 0; vpn < __end; index++,			\
 			     vpn += (1L << (shift - VPN_SHIFT))) {	\
-			if (!__split || __rpte_sub_valid(rpte, index))	\
 				do {
 
 #define pte_iterate_hashed_end() } while(0); } } while(0)
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index f88452019114..d95a3d41d8d0 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -8,11 +8,8 @@
  *
  */
 #define H_PTE_NONE_MASK		_PAGE_HPTEFLAGS
-#define H_PAGE_F_GIX_SHIFT	56
 #define H_PAGE_BUSY		_RPAGE_RSV1 /* software: PTE & hash are busy */
-#define H_PAGE_F_SECOND		_RPAGE_RSV2	/* HPTE is in 2ndary HPTEG */
-#define H_PAGE_F_GIX		(_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
-#define H_PAGE_HASHPTE		_RPAGE_RPN43	/* PTE has associated HPTE */
+#define H_PAGE_HASHPTE		_RPAGE_RSV2	/* PTE has associated HPTE */
 
 #ifdef CONFIG_PPC_64K_PAGES
 #include <asm/book3s/64/hash-64k.h>
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index b9aff515b4de..9c2ffaaa5b80 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -316,32 +316,6 @@ extern unsigned long pci_io_base;
 
 #ifndef __ASSEMBLY__
 
-/*
- * This is the default implementation of various PTE accessors, it's
- * used in all cases except Book3S with 64K pages where we have a
- * concept of sub-pages
- */
-#ifndef __real_pte
-
-#define __real_pte(e,p)		((real_pte_t){(e)})
-#define __rpte_to_pte(r)	((r).pte)
-#define __rpte_to_hidx(r,index)	(pte_val(__rpte_to_pte(r)) >> H_PAGE_F_GIX_SHIFT)
-
-#define pte_iterate_hashed_subpages(rpte, psize, va, index, shift)       \
-	do {							         \
-		index = 0;					         \
-		shift = mmu_psize_defs[psize].shift;		         \
-
-#define pte_iterate_hashed_end() } while(0)
-
-/*
- * We expect this to be called only for user addresses or kernel virtual
- * addresses other than the linear mapping.
- */
-#define pte_pagesize_index(mm, addr, pte)	MMU_PAGE_4K
-
-#endif /* __real_pte */
-
 static inline unsigned long pte_update(struct mm_struct *mm, unsigned long addr,
 				       pte_t *ptep, unsigned long clr,
 				       unsigned long set, int huge)
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
index 99c99bb04353..6fd4b5682056 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
@@ -14,7 +14,6 @@ struct ppc64_tlb_batch {
 	int			active;
 	unsigned long		index;
 	struct mm_struct	*mm;
-	real_pte_t		pte[PPC64_TLB_BATCH_NR];
 	unsigned long		vpn[PPC64_TLB_BATCH_NR];
 	unsigned int		psize;
 	int			ssize;
@@ -51,7 +50,7 @@ static inline void arch_leave_lazy_mmu_mode(void)
 #define arch_flush_lazy_mmu_mode()      do {} while (0)
 
 
-extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize,
+extern void flush_hash_page(unsigned long vpn, int psize,
 			    int ssize, unsigned long flags);
 extern void flush_hash_range(unsigned long number, int local);
 extern void flush_hash_hugepage(unsigned long vsid, unsigned long addr,
diff --git a/arch/powerpc/include/asm/pgtable-be-types.h b/arch/powerpc/include/asm/pgtable-be-types.h
index 67e7e3d990f4..367a6662e05e 100644
--- a/arch/powerpc/include/asm/pgtable-be-types.h
+++ b/arch/powerpc/include/asm/pgtable-be-types.h
@@ -72,16 +72,6 @@ typedef struct { unsigned long pgprot; } pgprot_t;
 #define pgprot_val(x)	((x).pgprot)
 #define __pgprot(x)	((pgprot_t) { (x) })
 
-/*
- * With hash config 64k pages additionally define a bigger "real PTE" type that
- * gathers the "second half" part of the PTE for pseudo 64k pages
- */
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
-#else
-typedef struct { pte_t pte; } real_pte_t;
-#endif
-
 static inline bool pte_xchg(pte_t *ptep, pte_t old, pte_t new)
 {
 	unsigned long *p = (unsigned long *)ptep;
diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
index 369a164b545c..baa49eccff20 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -45,15 +45,6 @@ typedef struct { unsigned long pgprot; } pgprot_t;
 #define pgprot_val(x)	((x).pgprot)
 #define __pgprot(x)	((pgprot_t) { (x) })
 
-/*
- * With hash config 64k pages additionally define a bigger "real PTE" type that
- * gathers the "second half" part of the PTE for pseudo 64k pages
- */
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC_STD_MMU_64)
-typedef struct { pte_t pte; unsigned long hidx; } real_pte_t;
-#else
-typedef struct { pte_t pte; } real_pte_t;
-#endif
 
 #ifdef CONFIG_PPC_STD_MMU_64
 #include <asm/cmpxchg.h>
diff --git a/arch/powerpc/mm/dump_linuxpagetables.c b/arch/powerpc/mm/dump_linuxpagetables.c
index c9282d27b203..af98ad112c56 100644
--- a/arch/powerpc/mm/dump_linuxpagetables.c
+++ b/arch/powerpc/mm/dump_linuxpagetables.c
@@ -214,16 +214,6 @@ static const struct flag_info flag_array[] = {
 		.set	= "4K_pfn",
 	}, {
 #endif
-		.mask	= H_PAGE_F_GIX,
-		.val	= H_PAGE_F_GIX,
-		.set	= "f_gix",
-		.is_val	= true,
-		.shift	= H_PAGE_F_GIX_SHIFT,
-	}, {
-		.mask	= H_PAGE_F_SECOND,
-		.val	= H_PAGE_F_SECOND,
-		.set	= "f_second",
-	}, {
 #endif
 		.mask	= _PAGE_SPECIAL,
 		.val	= _PAGE_SPECIAL,
diff --git a/arch/powerpc/mm/hash64_4k.c b/arch/powerpc/mm/hash64_4k.c
index afb79100f0ce..68ae99ea6bcf 100644
--- a/arch/powerpc/mm/hash64_4k.c
+++ b/arch/powerpc/mm/hash64_4k.c
@@ -113,8 +113,6 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 			return -1;
 		}
 		new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | H_PAGE_HASHPTE;
-		new_pte |= (slot << H_PAGE_F_GIX_SHIFT) &
-			(H_PAGE_F_SECOND | H_PAGE_F_GIX);
 	}
 	*ptep = __pte(new_pte & ~H_PAGE_BUSY);
 	return 0;
diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
index 096fdfaf6f1c..3beb3063202f 100644
--- a/arch/powerpc/mm/hash64_64k.c
+++ b/arch/powerpc/mm/hash64_64k.c
@@ -15,42 +15,12 @@
 #include <linux/mm.h>
 #include <asm/machdep.h>
 #include <asm/mmu.h>
-/*
- * index from 0 - 15
- */
-bool __rpte_sub_valid(real_pte_t rpte, unsigned long index)
-{
-	unsigned long g_idx;
-	unsigned long ptev = pte_val(rpte.pte);
-
-	g_idx = (ptev & H_PAGE_COMBO_VALID) >> H_PAGE_F_GIX_SHIFT;
-	index = index >> 2;
-	if (g_idx & (0x1 << index))
-		return true;
-	else
-		return false;
-}
-/*
- * index from 0 - 15
- */
-static unsigned long mark_subptegroup_valid(unsigned long ptev, unsigned long index)
-{
-	unsigned long g_idx;
-
-	if (!(ptev & H_PAGE_COMBO))
-		return ptev;
-	index = index >> 2;
-	g_idx = 0x1 << index;
-
-	return ptev | (g_idx << H_PAGE_F_GIX_SHIFT);
-}
 
 int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 		   pte_t *ptep, unsigned long trap, unsigned long flags,
 		   int ssize, int subpg_prot)
 {
-	real_pte_t rpte;
-	unsigned long *hidxp;
+	int ret;
 	unsigned long hpte_group;
 	unsigned int subpg_index;
 	unsigned long rflags, pa;
@@ -99,7 +69,6 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 
 	subpg_index = (ea & (PAGE_SIZE - 1)) >> shift;
 	vpn  = hpt_vpn(ea, vsid, ssize);
-	rpte = __real_pte(__pte(old_pte), ptep);
 	/*
 	 *None of the sub 4k page is hashed
 	 */
@@ -110,37 +79,31 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 	 * as a 64k HW page, and invalidate the 64k HPTE if so.
 	 */
 	if (!(old_pte & H_PAGE_COMBO)) {
-		flush_hash_page(vpn, rpte, MMU_PAGE_64K, ssize, flags);
-		/*
-		 * clear the old slot details from the old and new pte.
-		 * On hash insert failure we use old pte value and we don't
-		 * want slot information there if we have a insert failure.
-		 */
-		old_pte &= ~(H_PAGE_HASHPTE | H_PAGE_F_GIX | H_PAGE_F_SECOND);
-		new_pte &= ~(H_PAGE_HASHPTE | H_PAGE_F_GIX | H_PAGE_F_SECOND);
+		flush_hash_page(vpn, MMU_PAGE_64K, ssize, flags);
+		old_pte &= ~H_PAGE_HASHPTE;
+		new_pte &= ~H_PAGE_HASHPTE;
 		goto htab_insert_hpte;
 	}
 	/*
-	 * Check for sub page valid and update
+	 * We are not tracking the validity of 4k entries separately. Hence,
+	 * if H_PAGE_HASHPTE is set, we always try an update.
 	 */
-	if (__rpte_sub_valid(rpte, subpg_index)) {
-		int ret;
-
-		hash = hpt_hash(vpn, shift, ssize);
-		ret = mmu_hash_ops.hash_updatepp(hash, rflags, vpn,
-						 MMU_PAGE_4K, MMU_PAGE_4K,
-						 ssize, flags);
-		/*
-		 * if we failed because typically the HPTE wasn't really here
-		 * we try an insertion.
-		 */
-		if (ret == -1)
-			goto htab_insert_hpte;
-
+	hash = hpt_hash(vpn, shift, ssize);
+	ret = mmu_hash_ops.hash_updatepp(hash, rflags, vpn,
+					 MMU_PAGE_4K, MMU_PAGE_4K,
+					 ssize, flags);
+	/*
+	 * if we failed because typically the HPTE wasn't really here
+	 * we try an insertion.
+	 */
+	if (ret != -1) {
 		*ptep = __pte(new_pte & ~H_PAGE_BUSY);
 		return 0;
 	}
-
+	/*
+	 * updatepp failed, hash table doesn't have an entry for this,
+	 * insert a new entry
+	 */
 htab_insert_hpte:
 	/*
 	 * handle H_PAGE_4K_PFN case
@@ -192,21 +155,7 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 				   MMU_PAGE_4K, MMU_PAGE_4K, old_pte);
 		return -1;
 	}
-	/*
-	 * Insert slot number & secondary bit in PTE second half,
-	 * clear H_PAGE_BUSY and set appropriate HPTE slot bit
-	 * Since we have H_PAGE_BUSY set on ptep, we can be sure
-	 * nobody is undating hidx.
-	 */
-	hidxp = (unsigned long *)(ptep + PTRS_PER_PTE);
-	rpte.hidx &= ~(0xfUL << (subpg_index << 2));
-	*hidxp = rpte.hidx  | (slot << (subpg_index << 2));
-	new_pte = mark_subptegroup_valid(new_pte, subpg_index);
 	new_pte |=  H_PAGE_HASHPTE;
-	/*
-	 * check __real_pte for details on matching smp_rmb()
-	 */
-	smp_wmb();
 	*ptep = __pte(new_pte & ~H_PAGE_BUSY);
 	return 0;
 }
@@ -311,9 +260,7 @@ int __hash_page_64K(unsigned long ea, unsigned long access,
 					   MMU_PAGE_64K, MMU_PAGE_64K, old_pte);
 			return -1;
 		}
-		new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | H_PAGE_HASHPTE;
-		new_pte |= (slot << H_PAGE_F_GIX_SHIFT) &
-			(H_PAGE_F_SECOND | H_PAGE_F_GIX);
+		new_pte = new_pte |  H_PAGE_HASHPTE;
 	}
 	*ptep = __pte(new_pte & ~H_PAGE_BUSY);
 	return 0;
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 3b061844929c..a268d3a62425 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -746,7 +746,6 @@ static void native_flush_hash_range(unsigned long number, int local)
 	unsigned long hash, index, shift;
 	struct hash_pte *hptep;
 	unsigned long flags;
-	real_pte_t pte;
 	struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
 	unsigned long psize = batch->psize;
 	int ssize = batch->ssize;
@@ -760,9 +759,8 @@ static void native_flush_hash_range(unsigned long number, int local)
 
 	for (i = 0; i < number; i++) {
 		vpn = batch->vpn[i];
-		pte = batch->pte[i];
 
-		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+		pte_iterate_hashed_subpages(vpn, psize, index, shift) {
 			hash = hpt_hash(vpn, shift, ssize);
 			hptep = native_hpte_find(hash, vpn, psize, ssize);
 			if (!hptep)
@@ -778,10 +776,8 @@ static void native_flush_hash_range(unsigned long number, int local)
 		asm volatile("ptesync":::"memory");
 		for (i = 0; i < number; i++) {
 			vpn = batch->vpn[i];
-			pte = batch->pte[i];
 
-			pte_iterate_hashed_subpages(pte, psize,
-						    vpn, index, shift) {
+			pte_iterate_hashed_subpages(vpn, psize, index, shift) {
 				__tlbiel(vpn, psize, psize, ssize);
 			} pte_iterate_hashed_end();
 		}
@@ -795,10 +791,8 @@ static void native_flush_hash_range(unsigned long number, int local)
 		asm volatile("ptesync":::"memory");
 		for (i = 0; i < number; i++) {
 			vpn = batch->vpn[i];
-			pte = batch->pte[i];
 
-			pte_iterate_hashed_subpages(pte, psize,
-						    vpn, index, shift) {
+			pte_iterate_hashed_subpages(vpn, psize, index, shift) {
 				__tlbie(vpn, psize, psize, ssize);
 			} pte_iterate_hashed_end();
 		}
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 8635b241e2d5..e700660459c4 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -974,21 +974,8 @@ void __init hash__early_init_devtree(void)
 
 void __init hash__early_init_mmu(void)
 {
-	/*
-	 * We have code in __hash_page_64K() and elsewhere, which assumes it can
-	 * do the following:
-	 *   new_pte |= (slot << H_PAGE_F_GIX_SHIFT) & (H_PAGE_F_SECOND | H_PAGE_F_GIX);
-	 *
-	 * Where the slot number is between 0-15, and values of 8-15 indicate
-	 * the secondary bucket. For that code to work H_PAGE_F_SECOND and
-	 * H_PAGE_F_GIX must occupy four contiguous bits in the PTE, and
-	 * H_PAGE_F_SECOND must be placed above H_PAGE_F_GIX. Assert that here
-	 * with a BUILD_BUG_ON().
-	 */
-	BUILD_BUG_ON(H_PAGE_F_SECOND != (1ul  << (H_PAGE_F_GIX_SHIFT + 3)));
 
 	htab_init_page_sizes();
-
 	/*
 	 * initialize page table size
 	 */
@@ -1590,14 +1577,13 @@ static inline void tm_flush_hash_page(int local)
 /* WARNING: This is called from hash_low_64.S, if you change this prototype,
  *          do not forget to update the assembly call site !
  */
-void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize,
-		     unsigned long flags)
+void flush_hash_page(unsigned long vpn, int psize, int ssize, unsigned long flags)
 {
 	unsigned long hash, index, shift;
 	int local = flags & HPTE_LOCAL_UPDATE;
 
 	DBG_LOW("flush_hash_page(vpn=%016lx)\n", vpn);
-	pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+	pte_iterate_hashed_subpages(vpn, psize, index, shift) {
 		hash = hpt_hash(vpn, shift, ssize);
 		DBG_LOW(" sub %ld: hash=%lx\n", index, hash);
 		/*
@@ -1679,8 +1665,8 @@ void flush_hash_range(unsigned long number, int local)
 			this_cpu_ptr(&ppc64_tlb_batch);
 
 		for (i = 0; i < number; i++)
-			flush_hash_page(batch->vpn[i], batch->pte[i],
-					batch->psize, batch->ssize, local);
+			flush_hash_page(batch->vpn[i], batch->psize,
+					batch->ssize, local);
 	}
 }
 
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
index 4eb8c9d2f452..8aff8d17d91c 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -100,11 +100,7 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 					   mmu_psize, mmu_psize, old_pte);
 			return -1;
 		}
-
-		new_pte |= (slot << H_PAGE_F_GIX_SHIFT) &
-			(H_PAGE_F_SECOND | H_PAGE_F_GIX);
 	}
-
 	/*
 	 * No need to use ldarx/stdcx here
 	 */
diff --git a/arch/powerpc/mm/tlb_hash64.c b/arch/powerpc/mm/tlb_hash64.c
index 881ebd53ffc2..39ebb0bf4694 100644
--- a/arch/powerpc/mm/tlb_hash64.c
+++ b/arch/powerpc/mm/tlb_hash64.c
@@ -50,7 +50,6 @@ void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
 	unsigned long vsid;
 	unsigned int psize;
 	int ssize;
-	real_pte_t rpte;
 	int i;
 
 	i = batch->index;
@@ -91,14 +90,13 @@ void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
 	}
 	WARN_ON(vsid == 0);
 	vpn = hpt_vpn(addr, vsid, ssize);
-	rpte = __real_pte(__pte(pte), ptep);
 
 	/*
 	 * Check if we have an active batch on this CPU. If not, just
 	 * flush now and return.
 	 */
 	if (!batch->active) {
-		flush_hash_page(vpn, rpte, psize, ssize, mm_is_thread_local(mm));
+		flush_hash_page(vpn, psize, ssize, mm_is_thread_local(mm));
 		put_cpu_var(ppc64_tlb_batch);
 		return;
 	}
@@ -123,7 +121,6 @@ void hpte_need_flush(struct mm_struct *mm, unsigned long addr,
 		batch->psize = psize;
 		batch->ssize = ssize;
 	}
-	batch->pte[i] = rpte;
 	batch->vpn[i] = vpn;
 	batch->index = ++i;
 	if (i >= PPC64_TLB_BATCH_NR)
@@ -145,8 +142,8 @@ void __flush_tlb_pending(struct ppc64_tlb_batch *batch)
 	i = batch->index;
 	local = mm_is_thread_local(batch->mm);
 	if (i == 1)
-		flush_hash_page(batch->vpn[0], batch->pte[0],
-				batch->psize, batch->ssize, local);
+		flush_hash_page(batch->vpn[0], batch->psize,
+				batch->ssize, local);
 	else
 		flush_hash_range(i, local);
 	batch->index = 0;
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 511a2e9ed9a0..52d2e3038c05 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -616,7 +616,6 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
 	unsigned long param[PLPAR_HCALL9_BUFSIZE];
 	unsigned long index, shift;
-	real_pte_t pte;
 	int psize, ssize, pix;
 
 	if (lock_tlbie)
@@ -627,8 +626,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	pix = 0;
 	for (i = 0; i < number; i++) {
 		vpn = batch->vpn[i];
-		pte = batch->pte[i];
-		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+		pte_iterate_hashed_subpages(vpn, psize, index, shift) {
 			long slot;
 
 			slot = pSeries_lpar_hpte_find(vpn, psize, ssize);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 09/16] powerpc/mm: Add new firmware feature HASH API
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (7 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 08/16] powerpc/mm/hash: Don't track hash pte slot number in linux page table Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 10/16] powerpc/kvm/hash: Implement HASH_REMOVE hcall Aneesh Kumar K.V
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

We will use this feature to check whether the hypervisor implements the
hash-based remove and protect hcalls.

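As an illustrative sketch (not part of this patch), guest code that prefers the
new hash-based hcalls when available could gate on this feature and fall back
to the existing slot-based path otherwise; use_hash_hcalls() and
use_slot_hcalls() below are hypothetical placeholders:

	if (firmware_has_feature(FW_FEATURE_HASH_API)) {
		/* hypervisor implements the hash-based hcalls */
		use_hash_hcalls();	/* hypothetical helper */
	} else {
		/* fall back to H_READ4 slot lookup + slot-based hcalls */
		use_slot_hcalls();	/* hypothetical helper */
	}
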
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/firmware.h       | 3 ++-
 arch/powerpc/kvm/powerpc.c                | 4 ++++
 arch/powerpc/platforms/pseries/firmware.c | 1 +
 include/uapi/linux/kvm.h                  | 1 +
 4 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 8645897472b1..152d704ac3c3 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -51,6 +51,7 @@
 #define FW_FEATURE_BEST_ENERGY	ASM_CONST(0x0000000080000000)
 #define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0000000100000000)
 #define FW_FEATURE_PRRN		ASM_CONST(0x0000000200000000)
+#define FW_FEATURE_HASH_API	ASM_CONST(0x0000000400000000)
 
 #ifndef __ASSEMBLY__
 
@@ -67,7 +68,7 @@ enum {
 		FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
 		FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
 		FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
-		FW_FEATURE_HPT_RESIZE,
+		FW_FEATURE_HPT_RESIZE | FW_FEATURE_HASH_API,
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
 	FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 3480faaf1ef8..6fb91198dc90 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -637,6 +637,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		/* Disable this on POWER9 until code handles new HPTE format */
 		r = !!hv_enabled && !cpu_has_feature(CPU_FTR_ARCH_300);
 		break;
+	case KVM_CAP_SPAPR_HASH_API:
+		/* Only enable for HV kvm */
+		r = is_kvmppc_hv_enabled(kvm);
+		break;
 #endif
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	case KVM_CAP_PPC_FWNMI:
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index 63cc82ad58ac..32081d4406e8 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
 	{FW_FEATURE_SET_MODE,		"hcall-set-mode"},
 	{FW_FEATURE_BEST_ENERGY,	"hcall-best-energy-1*"},
 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
+	{FW_FEATURE_HASH_API,		"hcall-hash-api"},
 };
 
 /* Build up the firmware features bitmask using the contents of
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 838887587411..780433b1f179 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -930,6 +930,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_SMT_POSSIBLE 147
 #define KVM_CAP_HYPERV_SYNIC2 148
 #define KVM_CAP_HYPERV_VP_INDEX 149
+#define KVM_CAP_SPAPR_HASH_API 150
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 10/16] powerpc/kvm/hash: Implement HASH_REMOVE hcall
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (8 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 09/16] powerpc/mm: Add new firmware feature HASH API Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 11/16] powerpc/kvm/hash: Implement HASH_PROTECT hcall Aneesh Kumar K.V
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

This is equivalent to the H_REMOVE hcall, but takes the hash value as the
argument instead of the hash PTE slot number. We will use this later to speed
up the invalidate operation in the guest: instead of finding the slot number
with an H_READ4 hcall, we can pass the hash value directly to this hcall.

The only supported flag value for the operation is H_AVPN.

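For illustration, a minimal guest-side sketch of invalidating an HPTE by hash
value, assuming the plpar_pte_hash_remove() wrapper added below and with vpn,
psize and ssize already known to the caller (a later patch in this series uses
essentially this sequence):

	unsigned long hash, want_v, dummy1, dummy2;
	long rc;

	/* hash of the virtual page number, as used to select the HPTE group */
	hash = hpt_hash(vpn, mmu_psize_defs[psize].shift, ssize);
	/* AVPN that the hypervisor matches against within that group */
	want_v = hpte_encode_avpn(vpn, psize, ssize);

	rc = plpar_pte_hash_remove(H_AVPN, hash, want_v, &dummy1, &dummy2);
	if (rc != H_NOT_FOUND)
		BUG_ON(rc != H_SUCCESS);
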
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h         |   3 +-
 arch/powerpc/include/asm/plpar_wrappers.h |  16 ++++
 arch/powerpc/kvm/book3s_hv.c              |   1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c       | 134 ++++++++++++++++++++++++++----
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |   2 +
 5 files changed, 138 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 3d34dc0869f6..92980217a076 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -291,7 +291,8 @@
 #define H_INT_ESB               0x3C8
 #define H_INT_SYNC              0x3CC
 #define H_INT_RESET             0x3D0
-#define MAX_HCALL_OPCODE	H_INT_RESET
+#define H_HASH_REMOVE		0x3D4
+#define MAX_HCALL_OPCODE	H_HASH_REMOVE
 
 /* H_VIOCTL functions */
 #define H_GET_VIOA_DUMP_SIZE	0x01
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index c7b164836bc3..8160fea9b5bc 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -124,6 +124,22 @@ static inline long plpar_pte_remove(unsigned long flags, unsigned long ptex,
 	return rc;
 }
 
+static inline long plpar_pte_hash_remove(unsigned long flags, unsigned long hash,
+				    unsigned long avpn, unsigned long *old_pteh_ret,
+				    unsigned long *old_ptel_ret)
+{
+	long rc;
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+
+	rc = plpar_hcall(H_HASH_REMOVE, retbuf, flags, hash, avpn);
+
+	*old_pteh_ret = retbuf[0];
+	*old_ptel_ret = retbuf[1];
+
+	return rc;
+}
+
+
 /* plpar_pte_remove_raw can be called in real mode. It calls plpar_hcall_raw */
 static inline long plpar_pte_remove_raw(unsigned long flags, unsigned long ptex,
 		unsigned long avpn, unsigned long *old_pteh_ret,
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 73bf1ebfa78f..56e7f52ed324 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4171,6 +4171,7 @@ static unsigned int default_hcall_list[] = {
 	H_XIRR,
 	H_XIRR_X,
 #endif
+	H_HASH_REMOVE,
 	0
 };
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index d80240ba6de4..7ebeb1be8380 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -465,34 +465,21 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
 	}
 }
 
-long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
-			unsigned long pte_index, unsigned long avpn,
-			unsigned long *hpret)
+static long __kvmppc_do_hash_remove(struct kvm *kvm, __be64 *hpte,
+				    unsigned long pte_index,
+				    unsigned long *hpret)
 {
-	__be64 *hpte;
+
 	unsigned long v, r, rb;
 	struct revmap_entry *rev;
 	u64 pte, orig_pte, pte_r;
 
-	if (kvm_is_radix(kvm))
-		return H_FUNCTION;
-	if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
-		return H_PARAMETER;
-	hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
-	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
-		cpu_relax();
 	pte = orig_pte = be64_to_cpu(hpte[0]);
 	pte_r = be64_to_cpu(hpte[1]);
 	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
 		pte = hpte_new_to_old_v(pte, pte_r);
 		pte_r = hpte_new_to_old_r(pte_r);
 	}
-	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
-	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
-	    ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
-		__unlock_hpte(hpte, orig_pte);
-		return H_NOT_FOUND;
-	}
 
 	rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
 	v = pte & ~HPTE_V_HVLOCK;
@@ -525,6 +512,35 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	hpret[1] = r;
 	return H_SUCCESS;
 }
+
+long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
+			unsigned long pte_index, unsigned long avpn,
+			unsigned long *hpret)
+{
+	__be64 *hpte;
+	u64 pte, orig_pte, pte_r;
+
+	if (kvm_is_radix(kvm))
+		return H_FUNCTION;
+	if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
+		return H_PARAMETER;
+	hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
+	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+		cpu_relax();
+	pte = orig_pte = be64_to_cpu(hpte[0]);
+	pte_r = be64_to_cpu(hpte[1]);
+	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+		pte = hpte_new_to_old_v(pte, pte_r);
+		pte_r = hpte_new_to_old_r(pte_r);
+	}
+	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
+	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
+	    ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
+		__unlock_hpte(hpte, orig_pte);
+		return H_NOT_FOUND;
+	}
+	return __kvmppc_do_hash_remove(kvm, hpte, pte_index, hpret);
+}
 EXPORT_SYMBOL_GPL(kvmppc_do_h_remove);
 
 long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
@@ -534,6 +550,90 @@ long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
 				  &vcpu->arch.gpr[4]);
 }
 
+/* Return the matching HPTE locked, or NULL if not found */
+static __be64 *kvmppc_find_hpte_slot(struct kvm *kvm, unsigned long hash,
+				     unsigned long avpn, unsigned long *pte_index)
+{
+	int i;
+	__be64 *hpte;
+	unsigned long slot;
+	u64 pte_v, orig_pte, pte_r;
+	bool secondary_search = false;
+
+	/*
+	 * search for the hpte in primary group
+	 */
+	slot = (hash & kvmppc_hpt_mask(&kvm->arch.hpt)) * HPTES_PER_GROUP;
+
+search_again:
+	hpte = (__be64 *)(kvm->arch.hpt.virt + (slot << 4));
+	for (i = 0; i < HPTES_PER_GROUP; i++ , hpte += 2) {
+		/* lockless search */
+		pte_v = orig_pte = be64_to_cpu(hpte[0]);
+		pte_r = be64_to_cpu(hpte[1]);
+		if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+			pte_v = hpte_new_to_old_v(pte_v, pte_r);
+			pte_r = hpte_new_to_old_r(pte_r);
+		}
+		if ((pte_v & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0)
+			continue;
+
+		if ((pte_v & ~0x7FUL) == avpn) {
+			while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+				cpu_relax();
+			pte_v = orig_pte = be64_to_cpu(hpte[0]);
+			pte_r = be64_to_cpu(hpte[1]);
+			if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+				pte_v = hpte_new_to_old_v(pte_v, pte_r);
+				pte_r = hpte_new_to_old_r(pte_r);
+			}
+			if ((pte_v & ~0x7FUL) != avpn) {
+				/* unlock and continue */
+				__unlock_hpte(hpte, orig_pte);
+				continue;
+			}
+			*pte_index = slot + i;
+			return hpte;
+		}
+	}
+	if (!secondary_search) {
+		secondary_search = true;
+		slot = (~hash & kvmppc_hpt_mask(&kvm->arch.hpt)) * HPTES_PER_GROUP;
+		goto search_again;
+	}
+	return NULL;
+}
+
+/* Only the H_AVPN flag is supported, and it is mandatory */
+long kvmppc_do_h_hash_remove(struct kvm *kvm, unsigned long flags,
+			     unsigned long hash, unsigned long avpn,
+			     unsigned long *hpret)
+{
+	__be64 *hpte;
+	unsigned long pte_index;
+
+
+	if (kvm_is_radix(kvm))
+		return H_FUNCTION;
+
+	if ((flags & H_AVPN) != H_AVPN)
+		return H_PARAMETER;
+
+	hpte = kvmppc_find_hpte_slot(kvm, hash, avpn, &pte_index);
+	if (!hpte)
+		return H_NOT_FOUND;
+
+	return __kvmppc_do_hash_remove(kvm, hpte, pte_index, hpret);
+}
+EXPORT_SYMBOL_GPL(kvmppc_do_h_hash_remove);
+
+long kvmppc_h_hash_remove(struct kvm_vcpu *vcpu, unsigned long flags,
+			  unsigned long hash, unsigned long avpn)
+{
+	return kvmppc_do_h_hash_remove(vcpu->kvm, flags, hash, avpn,
+				  &vcpu->arch.gpr[4]);
+}
+
 long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 {
 	struct kvm *kvm = vcpu->kvm;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index ec69fa45d5a2..238ecf5d0ed8 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2375,6 +2375,8 @@ hcall_real_table:
 	.long	0		/* 0x2fc - H_XIRR_X*/
 #endif
 	.long	DOTSYM(kvmppc_h_random) - hcall_real_table
+	.space	((H_HASH_REMOVE - 4) - H_RANDOM), 0
+	.long	DOTSYM(kvmppc_h_hash_remove) - hcall_real_table
 	.globl	hcall_real_table_end
 hcall_real_table_end:
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 11/16] powerpc/kvm/hash: Implement HASH_PROTECT hcall
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (9 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 10/16] powerpc/kvm/hash: Implement HASH_REMOVE hcall Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 12/16] powerpc/kvm/hash: Implement HASH_BULK_REMOVE hcall Aneesh Kumar K.V
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

This is equivalent to the H_PROTECT hcall, but takes the hash value as the
argument instead of the hash PTE slot number. We will use this later to speed
up the updatepp operation in the guest: instead of finding the slot number
with an H_READ4 hcall, we can pass the hash value directly to this hcall.

The H_AVPN flag is required; the hcall returns an error otherwise.

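For illustration, a minimal guest-side fragment of an updatepp by hash value,
assuming the plpar_pte_hash_protect() wrapper added below, that it sits in an
updatepp-style helper returning -1 when no HPTE is found, and that vpn, psize,
ssize, hash and the protection flags are already known (a later patch in this
series wires this into the pseries updatepp path):

	unsigned long want_v;
	long lpar_rc;

	/* flags carry the new pp bits; H_AVPN must be set for this hcall */
	want_v = hpte_encode_avpn(vpn, psize, ssize);
	lpar_rc = plpar_pte_hash_protect(flags | H_AVPN, hash, want_v);
	if (lpar_rc == H_NOT_FOUND)
		return -1;	/* no HPTE there: nothing to update */
	BUG_ON(lpar_rc != H_SUCCESS);
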
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h         |  3 +-
 arch/powerpc/include/asm/plpar_wrappers.h |  7 +++
 arch/powerpc/kvm/book3s_hv.c              |  1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c       | 74 ++++++++++++++++++++++---------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  1 +
 5 files changed, 63 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 92980217a076..725d4fadec82 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -292,7 +292,8 @@
 #define H_INT_SYNC              0x3CC
 #define H_INT_RESET             0x3D0
 #define H_HASH_REMOVE		0x3D4
-#define MAX_HCALL_OPCODE	H_HASH_REMOVE
+#define H_HASH_PROTECT		0x3D8
+#define MAX_HCALL_OPCODE	H_HASH_PROTECT
 
 /* H_VIOCTL functions */
 #define H_GET_VIOA_DUMP_SIZE	0x01
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index 8160fea9b5bc..27e30ca6105d 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -226,6 +226,13 @@ static inline long plpar_pte_protect(unsigned long flags, unsigned long ptex,
 	return plpar_hcall_norets(H_PROTECT, flags, ptex, avpn);
 }
 
+static inline long plpar_pte_hash_protect(unsigned long flags,
+					  unsigned long hash,
+					  unsigned long avpn)
+{
+	return plpar_hcall_norets(H_HASH_PROTECT, flags, hash, avpn);
+}
+
 static inline long plpar_resize_hpt_prepare(unsigned long flags,
 					    unsigned long shift)
 {
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 56e7f52ed324..822e91ba1dbe 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4171,6 +4171,7 @@ static unsigned int default_hcall_list[] = {
 	H_XIRR,
 	H_XIRR_X,
 #endif
+	H_HASH_PROTECT,
 	H_HASH_REMOVE,
 	0
 };
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 7ebeb1be8380..d6782fab2584 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -752,33 +752,14 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
-long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
-		      unsigned long pte_index, unsigned long avpn,
-		      unsigned long va)
+long __kvmppc_do_hash_protect(struct kvm *kvm, __be64 *hpte,
+			      unsigned long flags, unsigned long pte_index)
 {
-	struct kvm *kvm = vcpu->kvm;
-	__be64 *hpte;
+	u64 pte_v, pte_r;
 	struct revmap_entry *rev;
 	unsigned long v, r, rb, mask, bits;
-	u64 pte_v, pte_r;
-
-	if (kvm_is_radix(kvm))
-		return H_FUNCTION;
-	if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
-		return H_PARAMETER;
 
-	hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
-	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
-		cpu_relax();
 	v = pte_v = be64_to_cpu(hpte[0]);
-	if (cpu_has_feature(CPU_FTR_ARCH_300))
-		v = hpte_new_to_old_v(v, be64_to_cpu(hpte[1]));
-	if ((v & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
-	    ((flags & H_AVPN) && (v & ~0x7fUL) != avpn)) {
-		__unlock_hpte(hpte, pte_v);
-		return H_NOT_FOUND;
-	}
-
 	pte_r = be64_to_cpu(hpte[1]);
 	bits = (flags << 55) & HPTE_R_PP0;
 	bits |= (flags << 48) & HPTE_R_KEY_HI;
@@ -823,6 +804,55 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	return H_SUCCESS;
 }
 
+long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
+		      unsigned long pte_index, unsigned long avpn,
+		      unsigned long va)
+{
+	__be64 *hpte;
+	u64 v, pte_v;
+	struct kvm *kvm = vcpu->kvm;
+
+	if (kvm_is_radix(kvm))
+		return H_FUNCTION;
+	if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
+		return H_PARAMETER;
+
+	hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
+	while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
+		cpu_relax();
+	v = pte_v = be64_to_cpu(hpte[0]);
+	if (cpu_has_feature(CPU_FTR_ARCH_300))
+		v = hpte_new_to_old_v(v, be64_to_cpu(hpte[1]));
+	if ((v & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
+	    ((flags & H_AVPN) && (v & ~0x7fUL) != avpn)) {
+		__unlock_hpte(hpte, pte_v);
+		return H_NOT_FOUND;
+	}
+	return __kvmppc_do_hash_protect(kvm, hpte, flags, pte_index);
+}
+
+/* The H_AVPN flag is mandatory */
+long kvmppc_h_hash_protect(struct kvm_vcpu *vcpu, unsigned long flags,
+			   unsigned long hash, unsigned long avpn,
+			   unsigned long va)
+{
+	__be64 *hpte;
+	unsigned long pte_index;
+	struct kvm *kvm = vcpu->kvm;
+
+	if (kvm_is_radix(kvm))
+		return H_FUNCTION;
+
+	if (!(flags & H_AVPN))
+		return H_PARAMETER;
+
+	hpte = kvmppc_find_hpte_slot(kvm, hash, avpn, &pte_index);
+	if (!hpte)
+		return H_NOT_FOUND;
+
+	return __kvmppc_do_hash_protect(kvm, hpte, flags, pte_index);
+}
+
 long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
 		   unsigned long pte_index)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 238ecf5d0ed8..8e190eb8b26d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2377,6 +2377,7 @@ hcall_real_table:
 	.long	DOTSYM(kvmppc_h_random) - hcall_real_table
 	.space	((H_HASH_REMOVE - 4) - H_RANDOM), 0
 	.long	DOTSYM(kvmppc_h_hash_remove) - hcall_real_table
+	.long	DOTSYM(kvmppc_h_hash_protect) - hcall_real_table
 	.globl	hcall_real_table_end
 hcall_real_table_end:
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 12/16] powerpc/kvm/hash: Implement HASH_BULK_REMOVE hcall
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (10 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 11/16] powerpc/kvm/hash: Implement HASH_PROTECT hcall Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 13/16] powerpc/mm/pseries: Use HASH_PROTECT hcall in guest Aneesh Kumar K.V
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

This is equivalent to the H_BULK_REMOVE hcall, but takes hash values as the
arguments instead of hash PTE slot numbers. We will use this later to speed up
the bulk remove operation in the guest: instead of finding each slot number
with an H_READ4 hcall, we can pass the hash values directly to this hcall.

Only the H_AVPN flag is supported.

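For illustration, a sketch of how a guest might queue a single request for this
hcall, assuming vpn, psize and ssize are known and using the parameter encoding
consumed above (hash value in the low bits, request/flag bits in the top byte,
followed by the AVPN); a later patch in this series does this in the flush
path:

	unsigned long param[PLPAR_HCALL9_BUFSIZE];
	unsigned long hash, rc;
	int pix = 0;

	hash = hpt_hash(vpn, mmu_psize_defs[psize].shift, ssize);
	hash &= MAX_HTAB_MASK;		/* keep the top byte free for flags */
	param[pix]     = HBR_REQUEST | HBR_AVPN | hash;
	param[pix + 1] = hpte_encode_avpn(vpn, psize, ssize);
	pix += 2;

	/* a partial batch must be terminated with HBR_END */
	if (pix < 8)
		param[pix] = HBR_END;
	rc = plpar_hcall9(H_HASH_BULK_REMOVE, param,
			  param[0], param[1], param[2], param[3],
			  param[4], param[5], param[6], param[7]);
	BUG_ON(rc != H_SUCCESS);
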
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +
 arch/powerpc/include/asm/hvcall.h             |  3 +-
 arch/powerpc/kvm/book3s_hv.c                  |  1 +
 arch/powerpc/kvm/book3s_hv_rm_mmu.c           | 95 +++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S       |  1 +
 5 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 8b1d924a2f85..c24157fa200c 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -68,6 +68,8 @@
  */
 
 #define HPTES_PER_GROUP 8
+/* The ISA defines the max HTAB size as 46 bits */
+#define MAX_HTAB_MASK ((1UL << 46) - 1)
 
 #define HPTE_V_SSIZE_SHIFT	62
 #define HPTE_V_AVPN_SHIFT	7
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 725d4fadec82..c4feb950dd9f 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -293,7 +293,8 @@
 #define H_INT_RESET             0x3D0
 #define H_HASH_REMOVE		0x3D4
 #define H_HASH_PROTECT		0x3D8
-#define MAX_HCALL_OPCODE	H_HASH_PROTECT
+#define H_HASH_BULK_REMOVE	0x3DC
+#define MAX_HCALL_OPCODE	H_HASH_BULK_REMOVE
 
 /* H_VIOCTL functions */
 #define H_GET_VIOA_DUMP_SIZE	0x01
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 822e91ba1dbe..9c6db0cb8a1c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4173,6 +4173,7 @@ static unsigned int default_hcall_list[] = {
 #endif
 	H_HASH_PROTECT,
 	H_HASH_REMOVE,
+	H_HASH_BULK_REMOVE,
 	0
 };
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index d6782fab2584..24668e499a01 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -752,6 +752,101 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+long kvmppc_h_hash_bulk_remove(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	unsigned long *args = &vcpu->arch.gpr[4];
+	__be64 *hp, *hptes[4];
+	unsigned long tlbrb[4];
+	long int i, j, k, n, pte_index[4];
+	unsigned long flags, req, hash, rcbits;
+	int global;
+	long int ret = H_SUCCESS;
+	struct revmap_entry *rev, *revs[4];
+	u64 hp0, hp1;
+
+	if (kvm_is_radix(kvm))
+		return H_FUNCTION;
+
+	global = global_invalidates(kvm);
+	for (i = 0; i < 4 && ret == H_SUCCESS; ) {
+		n = 0;
+		for (; i < 4; ++i) {
+			j = i * 2;
+			hash = args[j];
+			flags = hash >> 56;
+			hash &= ((1ul << 56) - 1);
+			req = flags >> 6;
+			flags &= 3;
+			if (req == 3) {		/* no more requests */
+				i = 4;
+				break;
+			}
+			/* only support avpn flag */
+			if (req != 1 || flags != 2) {
+				/* parameter error */
+				args[j] = ((0xa0 | flags) << 56) + hash;
+				ret = H_PARAMETER;
+				break;
+			}
+			/*
+			 * We wait here to take the lock for all hash values.
+			 * FIXME!! will that deadlock?
+			 */
+			hp = kvmppc_find_hpte_slot(kvm, hash,
+						   args[j + 1], &pte_index[n]);
+			if (!hp) {
+				args[j] = ((0x90 | flags) << 56) + hash;
+				continue;
+			}
+			hp0 = be64_to_cpu(hp[0]);
+			hp1 = be64_to_cpu(hp[1]);
+			if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+				hp0 = hpte_new_to_old_v(hp0, hp1);
+				hp1 = hpte_new_to_old_r(hp1);
+			}
+			args[j] = ((0x80 | flags) << 56) + hash;
+			rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index[n]]);
+			note_hpte_modification(kvm, rev);
+
+			if (!(hp0 & HPTE_V_VALID)) {
+				/* insert R and C bits from PTE */
+				rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
+				args[j] |= rcbits << (56 - 5);
+				hp[0] = 0;
+				if (is_mmio_hpte(hp0, hp1))
+					atomic64_inc(&kvm->arch.mmio_update);
+				continue;
+			}
+			/* leave it locked */
+			hp[0] &= ~cpu_to_be64(HPTE_V_VALID);
+			tlbrb[n] = compute_tlbie_rb(hp0, hp1, pte_index[n]);
+			hptes[n] = hp;
+			revs[n] = rev;
+			++n;
+		}
+
+		if (!n)
+			break;
+
+		/* Now that we've collected a batch, do the tlbies */
+		do_tlbies(kvm, tlbrb, n, global, true);
+
+		/* Read PTE low words after tlbie to get final R/C values */
+		for (k = 0; k < n; ++k) {
+			hp = hptes[k];
+			rev = revs[k];
+			remove_revmap_chain(kvm, pte_index[k], rev,
+				be64_to_cpu(hp[0]), be64_to_cpu(hp[1]));
+			rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
+			args[j] |= rcbits << (56 - 5);
+			__unlock_hpte(hp, 0);
+		}
+	}
+
+	return ret;
+}
+
 long __kvmppc_do_hash_protect(struct kvm *kvm, __be64 *hpte,
 			      unsigned long flags, unsigned long pte_index)
 {
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 8e190eb8b26d..c2fe9851f4dc 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2378,6 +2378,7 @@ hcall_real_table:
 	.space	((H_HASH_REMOVE - 4) - H_RANDOM), 0
 	.long	DOTSYM(kvmppc_h_hash_remove) - hcall_real_table
 	.long	DOTSYM(kvmppc_h_hash_protect) - hcall_real_table
+	.long	DOTSYM(kvmppc_h_hash_bulk_remove) - hcall_real_table
 	.globl	hcall_real_table_end
 hcall_real_table_end:
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 13/16] powerpc/mm/pseries: Use HASH_PROTECT hcall in guest
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (11 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 12/16] powerpc/kvm/hash: Implement HASH_BULK_REMOVE hcall Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 14/16] powerpc/mm/pseries: Use HASH_REMOVE " Aneesh Kumar K.V
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/lpar.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 52d2e3038c05..cd5cf5bd53f1 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -388,15 +388,20 @@ static long pSeries_lpar_hash_updatepp(unsigned long hash,
 	unsigned long want_v;
 
 	want_v = hpte_encode_avpn(vpn, psize, ssize);
-
 	pr_devel("    update: avpnv=%016lx, hash=%016lx, f=%lx, psize: %d ...",
 		 want_v, hash, flags, psize);
 
+	if (firmware_has_feature(FW_FEATURE_HASH_API)) {
+		lpar_rc = plpar_pte_hash_protect(flags, hash, want_v);
+		goto err_out;
+	}
 	slot = __pSeries_lpar_hpte_find(hash, want_v);
 	if (slot < 0)
 		return -1;
 
 	lpar_rc = plpar_pte_protect(flags, slot, want_v);
+
+err_out:
 	if (lpar_rc == H_NOT_FOUND) {
 		pr_devel("not found !\n");
 		return -1;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 14/16] powerpc/mm/pseries: Use HASH_REMOVE hcall in guest
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (12 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 13/16] powerpc/mm/pseries: Use HASH_PROTECT hcall in guest Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 15/16] powerpc/mm/pseries: Move slot based bulk remove to helper Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/lpar.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index cd5cf5bd53f1..41512aaa8c8e 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -454,6 +454,27 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
 	BUG_ON(lpar_rc != H_SUCCESS);
 }
 
+static void __pseries_lpar_hash_invalidate(unsigned long hash, unsigned long vpn,
+					   int psize, int apsize,
+					   int ssize, int local)
+{
+	unsigned long want_v;
+	unsigned long lpar_rc;
+	unsigned long dummy1, dummy2;
+
+	pr_devel("    inval : hash=%lx, vpn=%016lx, psize: %d, local: %d\n",
+		 hash, vpn, psize, local);
+
+	want_v = hpte_encode_avpn(vpn, psize, ssize);
+	lpar_rc = plpar_pte_hash_remove(H_AVPN, hash, want_v, &dummy1, &dummy2);
+	if (lpar_rc == H_NOT_FOUND)
+		return;
+
+	BUG_ON(lpar_rc != H_SUCCESS);
+
+}
+
+
 static void pSeries_lpar_hash_invalidate(unsigned long hash, unsigned long vpn,
 					 int psize, int apsize,
 					 int ssize, int local)
@@ -466,6 +487,9 @@ static void pSeries_lpar_hash_invalidate(unsigned long hash, unsigned long vpn,
 	pr_devel("    inval : hash=%lx, vpn=%016lx, psize: %d, local: %d\n",
 		 hash, vpn, psize, local);
 
+	if (firmware_has_feature(FW_FEATURE_HASH_API))
+		return __pseries_lpar_hash_invalidate(hash, vpn, psize,
+						      apsize, ssize, local);
 	want_v = hpte_encode_avpn(vpn, psize, ssize);
 	slot = __pSeries_lpar_hpte_find(hash, want_v);
 	if (slot < 0)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 15/16] powerpc/mm/pseries: Move slot based bulk remove to helper
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (13 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 14/16] powerpc/mm/pseries: Use HASH_REMOVE " Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:08 ` [PATCH 16/16] powerpc/mm/pseries: Use HASH_BULK_REMOVE hcall in guest Aneesh Kumar K.V
  2017-10-27  4:34 ` [PATCH 00/16] Remove hash page table slot tracking from linux PTE Paul Mackerras
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

---
 arch/powerpc/platforms/pseries/lpar.c | 51 +++++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 41512aaa8c8e..4ea9224cbeb6 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -632,6 +632,34 @@ static int pSeries_lpar_hpte_removebolted(unsigned long ea,
 	return 0;
 }
 
+static int plpar_bulk_remove(unsigned long *param, int index, unsigned long slot,
+			     unsigned long vpn, unsigned long psize,
+			     unsigned long ssize, int local)
+{
+	unsigned long rc;
+	if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
+		/*
+		 * lpar doesn't use the passed actual page size
+		 */
+		pSeries_lpar_hpte_invalidate(slot, vpn, psize,
+					     0, ssize, local);
+	} else {
+		param[index] = HBR_REQUEST | HBR_AVPN | slot;
+		param[index+1] = hpte_encode_avpn(vpn, psize,
+						ssize);
+		index += 2;
+		if (index == 8) {
+			rc = plpar_hcall9(H_BULK_REMOVE, param,
+					  param[0], param[1], param[2],
+					  param[3], param[4], param[5],
+					  param[6], param[7]);
+			BUG_ON(rc != H_SUCCESS);
+			index = 0;
+		}
+	}
+	return index;
+}
+
 /*
  * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
  * lock.
@@ -661,29 +689,12 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 			slot = pSeries_lpar_hpte_find(vpn, psize, ssize);
 			if (slot < 0)
 				continue;
-			if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
-				/*
-				 * lpar doesn't use the passed actual page size
-				 */
-				pSeries_lpar_hpte_invalidate(slot, vpn, psize,
-							     0, ssize, local);
-			} else {
-				param[pix] = HBR_REQUEST | HBR_AVPN | slot;
-				param[pix+1] = hpte_encode_avpn(vpn, psize,
-								ssize);
-				pix += 2;
-				if (pix == 8) {
-					rc = plpar_hcall9(H_BULK_REMOVE, param,
-						param[0], param[1], param[2],
-						param[3], param[4], param[5],
-						param[6], param[7]);
-					BUG_ON(rc != H_SUCCESS);
-					pix = 0;
-				}
-			}
+			pix = plpar_bulk_remove(param, pix, slot, vpn,
+						psize, ssize, local);
 		} pte_iterate_hashed_end();
 	}
 	if (pix) {
+		/* We have a flush pending */
 		param[pix] = HBR_END;
 		rc = plpar_hcall9(H_BULK_REMOVE, param, param[0], param[1],
 				  param[2], param[3], param[4], param[5],
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 16/16] powerpc/mm/pseries: Use HASH_BULK_REMOVE hcall in guest
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (14 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 15/16] powerpc/mm/pseries: Move slot based bulk remove to helper Aneesh Kumar K.V
@ 2017-10-27  4:08 ` Aneesh Kumar K.V
  2017-10-27  4:34 ` [PATCH 00/16] Remove hash page table slot tracking from linux PTE Paul Mackerras
  16 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  4:08 UTC (permalink / raw)
  To: benh, paulus, mpe; +Cc: linuxppc-dev, Aneesh Kumar K.V

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/lpar.c | 40 ++++++++++++++++++++++++++++-------
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 4ea9224cbeb6..6dffdf654a28 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -684,19 +684,43 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	for (i = 0; i < number; i++) {
 		vpn = batch->vpn[i];
 		pte_iterate_hashed_subpages(vpn, psize, index, shift) {
-			long slot;
-
-			slot = pSeries_lpar_hpte_find(vpn, psize, ssize);
-			if (slot < 0)
-				continue;
-			pix = plpar_bluk_remove(param, pix, slot, vpn,
-						psize, ssize, local);
+			if (!firmware_has_feature(FW_FEATURE_HASH_API)) {
+				long slot;
+
+				slot = pSeries_lpar_hpte_find(vpn, psize, ssize);
+				if (slot < 0)
+					continue;
+				pix = plpar_bulk_remove(param, pix, slot, vpn,
+							  psize, ssize, local);
+			} else {
+				unsigned long hash;
+				hash = hpt_hash(vpn, mmu_psize_defs[psize].shift, ssize);
+				/* trim the top bits, we overload them below */
+				hash &= MAX_HTAB_MASK;
+				param[pix] = HBR_REQUEST | HBR_AVPN | hash;
+				param[pix+1] = hpte_encode_avpn(vpn, psize, ssize);
+				pix += 2;
+				if (pix == 8) {
+					rc = plpar_hcall9(H_HASH_BULK_REMOVE, param,
+							  param[0], param[1], param[2],
+							  param[3], param[4], param[5],
+							  param[6], param[7]);
+					BUG_ON(rc != H_SUCCESS);
+					pix = 0;
+				}
+			}
 		} pte_iterate_hashed_end();
 	}
 	if (pix) {
+		unsigned long hcall;
+
 		/* We have a flush pending */
 		param[pix] = HBR_END;
-		rc = plpar_hcall9(H_BULK_REMOVE, param, param[0], param[1],
+		if (!firmware_has_feature(FW_FEATURE_HASH_API))
+			hcall = H_BULK_REMOVE;
+		else
+			hcall = H_HASH_BULK_REMOVE;
+		rc = plpar_hcall9(hcall, param, param[0], param[1],
 				  param[2], param[3], param[4], param[5],
 				  param[6], param[7]);
 		BUG_ON(rc != H_SUCCESS);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
                   ` (15 preceding siblings ...)
  2017-10-27  4:08 ` [PATCH 16/16] powerpc/mm/pseries: Use HASH_BULK_REMOVE hcall in guest Aneesh Kumar K.V
@ 2017-10-27  4:34 ` Paul Mackerras
  2017-10-27  5:27   ` Aneesh Kumar K.V
  16 siblings, 1 reply; 31+ messages in thread
From: Paul Mackerras @ 2017-10-27  4:34 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: benh, mpe, linuxppc-dev

On Fri, Oct 27, 2017 at 09:38:17AM +0530, Aneesh Kumar K.V wrote:
> Hi,
> 
> With hash translation mode we always tracked the hash pte slot details in linux page table.
> This occupied space in the linux page table and also limitted our ability to support
> linux features that require additional PTE bits. This series attempt to lift this
> limitation by not tracking slot number in linux page table. We still track slot details
> w.r.t Transparent Hugepage entries because an invalidate there requires us to go through
> all the 256 hash pte slots. So tracking whether hash page table entry is valid helps us in
> avoiding a lot of hcalls there. With THP entries we don't keep slot details in the primary
> linux page table entry but in the second half of page table. Hence tracking slot details
> for THP doesn't take up space in PTE.
> 
> Even though we don't track slot, for removing/updating hash page table entry, PAPR hcalls expect
> hash page table slot details. On pseries we find slot using H_READ hcall using H_READ_4 flags.
> This implies an additional 2 hcalls in the updatepp and remove paths. The patch series also
> attempt to limit the impact of this by adding new hcalls that does remove/update of hash page table
> entry using hash value instead of hash page table slot.
> 
> Below is the performance numbers observed when running a workload that does the below sequence
> 
> for(5000) {
> mmap(128M)
> touch every page of 2048 page
> munmap()
> }
> 
> The test is run with address randomization off, swap disabled in both host and guest.
> 
> 
> |------------+----------+---------------+--------------------------+-----------------------|
> | iterations | platform | without patch | With series and no hcall | With series and hcall |
> |------------+----------+---------------+--------------------------+-----------------------|
> |          1 | powernv  |               |                50.818343 |                       |
> |          2 | powernv  |               |                50.744123 |                       |
> |          3 | powernv  |               |                50.721603 |                       |
> |          4 | powernv  |               |                50.739922 |                       |
> |          5 | powernv  |               |                50.638555 |                       |
> |          1 | powernv  |     51.388249 |                          |                       |
> |          2 | powernv  |     51.789701 |                          |                       |
> |          3 | powernv  |     52.240394 |                          |                       |
> |          4 | powernv  |     51.432255 |                          |                       |
> |          5 | powernv  |     51.392947 |                          |                       |
> |------------+----------+---------------+--------------------------+-----------------------|
> |          1 | pseries  |               |                          |            123.154394 |
> |          2 | pseries  |               |                          |            122.253956 |
> |          3 | pseries  |               |                          |            117.666344 |
> |          4 | pseries  |               |                          |            117.681479 |
> |          5 | pseries  |               |                          |            117.735808 |
> |          1 | pseries  |               |               119.424940 |                       |
> |          2 | pseries  |               |               117.663078 |                       |
> |          3 | pseries  |               |               118.345584 |                       |
> |          4 | pseries  |               |               119.620934 |                       |
> |          5 | pseries  |               |               119.463185 |                       |
> |          1 | pseries  |    122.810867 |                          |                       |
> |          2 | pseries  |    115.760801 |                          |                       |
> |          3 | pseries  |    115.257030 |                          |                       |
> |          4 | pseries  |    116.617884 |                          |                       |
> |          5 | pseries  |    117.247036 |                          |                       |
> |------------+----------+---------------+--------------------------+-----------------------|
>

How do we interpret these numbers?  Are they times, or speed?  Is
larger better or worse?

Can you give us the mean and standard deviation for each set of 5
please?

Paul.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-27  4:34 ` [PATCH 00/16] Remove hash page table slot tracking from linux PTE Paul Mackerras
@ 2017-10-27  5:27   ` Aneesh Kumar K.V
  2017-10-27  5:41     ` Paul Mackerras
  2017-10-28 22:35     ` Ram Pai
  0 siblings, 2 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-27  5:27 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: benh, mpe, linuxppc-dev



On 10/27/2017 10:04 AM, Paul Mackerras wrote:
> On Fri, Oct 27, 2017 at 09:38:17AM +0530, Aneesh Kumar K.V wrote:
>> Hi,
>>
>> With hash translation mode we always tracked the hash pte slot details in linux page table.
>> This occupied space in the linux page table and also limitted our ability to support
>> linux features that require additional PTE bits. This series attempt to lift this
>> limitation by not tracking slot number in linux page table. We still track slot details
>> w.r.t Transparent Hugepage entries because an invalidate there requires us to go through
>> all the 256 hash pte slots. So tracking whether hash page table entry is valid helps us in
>> avoiding a lot of hcalls there. With THP entries we don't keep slot details in the primary
>> linux page table entry but in the second half of page table. Hence tracking slot details
>> for THP doesn't take up space in PTE.
>>
>> Even though we don't track slot, for removing/updating hash page table entry, PAPR hcalls expect
>> hash page table slot details. On pseries we find slot using H_READ hcall using H_READ_4 flags.
>> This implies an additional 2 hcalls in the updatepp and remove paths. The patch series also
>> attempt to limit the impact of this by adding new hcalls that does remove/update of hash page table
>> entry using hash value instead of hash page table slot.
>>
>> Below is the performance numbers observed when running a workload that does the below sequence
>>
>> for(5000) {
>> mmap(128M)
>> touch every page of 2048 page
>> munmap()
>> }
>>
>> The test is run with address randomization off, swap disabled in both host and guest.
>>
>>
>> |------------+----------+---------------+--------------------------+-----------------------|
>> | iterations | platform | without patch | With series and no hcall | With series and hcall |
>> |------------+----------+---------------+--------------------------+-----------------------|
>> |          1 | powernv  |               |                50.818343 |                       |
>> |          2 | powernv  |               |                50.744123 |                       |
>> |          3 | powernv  |               |                50.721603 |                       |
>> |          4 | powernv  |               |                50.739922 |                       |
>> |          5 | powernv  |               |                50.638555 |                       |
>> |          1 | powernv  |     51.388249 |                          |                       |
>> |          2 | powernv  |     51.789701 |                          |                       |
>> |          3 | powernv  |     52.240394 |                          |                       |
>> |          4 | powernv  |     51.432255 |                          |                       |
>> |          5 | powernv  |     51.392947 |                          |                       |
>> |------------+----------+---------------+--------------------------+-----------------------|
>> |          1 | pseries  |               |                          |            123.154394 |
>> |          2 | pseries  |               |                          |            122.253956 |
>> |          3 | pseries  |               |                          |            117.666344 |
>> |          4 | pseries  |               |                          |            117.681479 |
>> |          5 | pseries  |               |                          |            117.735808 |
>> |          1 | pseries  |               |               119.424940 |                       |
>> |          2 | pseries  |               |               117.663078 |                       |
>> |          3 | pseries  |               |               118.345584 |                       |
>> |          4 | pseries  |               |               119.620934 |                       |
>> |          5 | pseries  |               |               119.463185 |                       |
>> |          1 | pseries  |    122.810867 |                          |                       |
>> |          2 | pseries  |    115.760801 |                          |                       |
>> |          3 | pseries  |    115.257030 |                          |                       |
>> |          4 | pseries  |    116.617884 |                          |                       |
>> |          5 | pseries  |    117.247036 |                          |                       |
>> |------------+----------+---------------+--------------------------+-----------------------|
>>
> 
> How do we interpret these numbers?  Are they times, or speed?  Is
> larger better or worse?

Sorry for not including the details. They are times in seconds. The test
case is a modified mmap_bench from the powerpc selftests.
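
Just to make the workload concrete, here is a minimal standalone sketch
of that sequence (this is not the actual selftest; it assumes a 64K base
page size, so 128M is 2048 pages, and simply touches one byte per page):

#include <stdio.h>
#include <sys/mman.h>

#define MAP_SIZE	(128UL << 20)	/* 128M == 2048 x 64K pages */
#define PAGE_SZ		(64UL << 10)	/* assumed 64K base page size */
#define ITERATIONS	5000

int main(void)
{
	unsigned long i, off;
	char *p;

	for (i = 0; i < ITERATIONS; i++) {
		p = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		/* touch one byte in every page to force a hash fault */
		for (off = 0; off < MAP_SIZE; off += PAGE_SZ)
			p[off] = 1;
		munmap(p, MAP_SIZE);
	}
	return 0;
}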

> 
> Can you give us the mean and standard deviation for each set of 5
> please?
> 

powernv without patch
median= 51.432255
stdev = 0.370835

with patch
median = 50.739922
stdev = 0.06419662

pseries without patch
median = 116.617884
stdev = 3.04531023

with patch no hcall
median = 119.42494
stdev = 0.85874552

with patch and hcall
median = 117.735808
stdev = 2.7624151
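
For reference, a small sketch of how the median and (sample) standard
deviation for one set of five runs can be reproduced; the values below
are the powernv without-patch numbers from the table:

/* build: cc stats.c -lm */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

int main(void)
{
	/* powernv, without patch, time in seconds */
	double t[] = { 51.388249, 51.789701, 52.240394, 51.432255, 51.392947 };
	int i, n = sizeof(t) / sizeof(t[0]);
	double sum = 0.0, var = 0.0, mean;

	qsort(t, n, sizeof(t[0]), cmp_double);
	for (i = 0; i < n; i++)
		sum += t[i];
	mean = sum / n;
	for (i = 0; i < n; i++)
		var += (t[i] - mean) * (t[i] - mean);
	var /= n - 1;				/* sample variance */

	printf("median = %f\n", t[n / 2]);	/* odd n: 51.432255 */
	printf("stdev  = %f\n", sqrt(var));	/* ~0.370835 */
	return 0;
}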

-aneesh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-27  5:27   ` Aneesh Kumar K.V
@ 2017-10-27  5:41     ` Paul Mackerras
  2017-10-30  7:57       ` Aneesh Kumar K.V
                         ` (2 more replies)
  2017-10-28 22:35     ` Ram Pai
  1 sibling, 3 replies; 31+ messages in thread
From: Paul Mackerras @ 2017-10-27  5:41 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: benh, mpe, linuxppc-dev

On Fri, Oct 27, 2017 at 10:57:13AM +0530, Aneesh Kumar K.V wrote:
> 
> 
> On 10/27/2017 10:04 AM, Paul Mackerras wrote:
> >How do we interpret these numbers?  Are they times, or speed?  Is
> >larger better or worse?
> 
> Sorry for not including the details. They are times in seconds. The test
> case is a modified mmap_bench from the powerpc selftests.
> 
> >
> >Can you give us the mean and standard deviation for each set of 5
> >please?
> >
> 
> powernv without patch
> median= 51.432255
> stdev = 0.370835
> 
> with patch
> median = 50.739922
> stdev = 0.06419662
> 
> pseries without patch
> median = 116.617884
> stdev = 3.04531023
> 
> with patch no hcall
> median = 119.42494
> stdev = 0.85874552
> 
> with patch and hcall
> median = 117.735808
> stdev = 2.7624151

So on powernv, the patch set *improves* performance by about 1.3%
(almost 2 standard deviations).  Do we know why that is?

On pseries, performance is about 2.4% worse without new hcalls, but
that is less than 1 standard deviation.  With new hcalls, performance
is 0.95% worse, only a third of a standard deviation.  I think we need
to do more measurements to try to get a more accurate picture here.

Were the pseries numbers done on KVM or PowerVM?  Could you do a set
of measurements on the other one too please?  (I assume the numbers
with the new hcall were done on KVM, and can't be done on PowerVM.)

Paul.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-27  5:27   ` Aneesh Kumar K.V
  2017-10-27  5:41     ` Paul Mackerras
@ 2017-10-28 22:35     ` Ram Pai
  2017-10-29 14:05       ` Aneesh Kumar K.V
  2017-10-29 22:04       ` Paul Mackerras
  1 sibling, 2 replies; 31+ messages in thread
From: Ram Pai @ 2017-10-28 22:35 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: Paul Mackerras, linuxppc-dev

On Fri, Oct 27, 2017 at 10:57:13AM +0530, Aneesh Kumar K.V wrote:
> 
> 
> On 10/27/2017 10:04 AM, Paul Mackerras wrote:
> >On Fri, Oct 27, 2017 at 09:38:17AM +0530, Aneesh Kumar K.V wrote:
> >>Hi,
> >>
> >>With hash translation mode we always tracked the hash pte slot details in linux page table.
> >>This occupied space in the linux page table and also limitted our ability to support
> >>linux features that require additional PTE bits. This series attempt to lift this
> >>limitation by not tracking slot number in linux page table. We still track slot details
> >>w.r.t Transparent Hugepage entries because an invalidate there requires us to go through
> >>all the 256 hash pte slots. So tracking whether hash page table entry is valid helps us in
> >>avoiding a lot of hcalls there. With THP entries we don't keep slot details in the primary
> >>linux page table entry but in the second half of page table. Hence tracking slot details
> >>for THP doesn't take up space in PTE.
> >>
> >>Even though we don't track slot, for removing/updating hash page table entry, PAPR hcalls expect
> >>hash page table slot details. On pseries we find slot using H_READ hcall using H_READ_4 flags.
> >>This implies an additional 2 hcalls in the updatepp and remove paths. The patch series also
> >>attempt to limit the impact of this by adding new hcalls that does remove/update of hash page table
> >>entry using hash value instead of hash page table slot.
> >>
> >>Below is the performance numbers observed when running a workload that does the below sequence
> >>
> >>for(5000) {
> >>mmap(128M)
> >>touch every page of 2048 page
> >>munmap()
> >>}
> >>

I like the idea of not tracking the slots at all. It is something the
guest should not know or track.

> >>The test is run with address randomization off, swap disabled in both host and guest.
> >>
> >>
> >>|------------+----------+---------------+--------------------------+-----------------------|
> >>| iterations | platform | without patch | With series and no hcall | With series and hcall |
> >>|------------+----------+---------------+--------------------------+-----------------------|
> >>|          1 | powernv  |               |                50.818343 |                       |
> >>|          2 | powernv  |               |                50.744123 |                       |
> >>|          3 | powernv  |               |                50.721603 |                       |
> >>|          4 | powernv  |               |                50.739922 |                       |
> >>|          5 | powernv  |               |                50.638555 |                       |
> >>|          1 | powernv  |     51.388249 |                          |                       |
> >>|          2 | powernv  |     51.789701 |                          |                       |
> >>|          3 | powernv  |     52.240394 |                          |                       |
> >>|          4 | powernv  |     51.432255 |                          |                       |
> >>|          5 | powernv  |     51.392947 |                          |                       |
> >>|------------+----------+---------------+--------------------------+-----------------------|
> >>|          1 | pseries  |               |                          |            123.154394 |
> >>|          2 | pseries  |               |                          |            122.253956 |
> >>|          3 | pseries  |               |                          |            117.666344 |
> >>|          4 | pseries  |               |                          |            117.681479 |
> >>|          5 | pseries  |               |                          |            117.735808 |
> >>|          1 | pseries  |               |               119.424940 |                       |
> >>|          2 | pseries  |               |               117.663078 |                       |
> >>|          3 | pseries  |               |               118.345584 |                       |
> >>|          4 | pseries  |               |               119.620934 |                       |
> >>|          5 | pseries  |               |               119.463185 |                       |
> >>|          1 | pseries  |    122.810867 |                          |                       |
> >>|          2 | pseries  |    115.760801 |                          |                       |
> >>|          3 | pseries  |    115.257030 |                          |                       |
> >>|          4 | pseries  |    116.617884 |                          |                       |
> >>|          5 | pseries  |    117.247036 |                          |                       |
> >>|------------+----------+---------------+--------------------------+-----------------------|
> >>
> >

What does 'With series and no hcall' mean? Does it mean no calls to the
new hcalls, and instead using H_READ_4 followed by the old hcalls?

And I am assuming the code is not using any of my slot-move-to-secondary-pte changes.

RP

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-28 22:35     ` Ram Pai
@ 2017-10-29 14:05       ` Aneesh Kumar K.V
  2017-10-29 22:04       ` Paul Mackerras
  1 sibling, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-29 14:05 UTC (permalink / raw)
  To: Ram Pai; +Cc: Paul Mackerras, linuxppc-dev

Ram Pai <linuxram@us.ibm.com> writes:

> On Fri, Oct 27, 2017 at 10:57:13AM +0530, Aneesh Kumar K.V wrote:
>> 
>> 
> >>
>> >
>
> What does 'With series and no hcall' mean? Does it mean no calls to the
> new hcalls, and instead using H_READ_4 followed by the old hcalls?

That is correct.

>
> And I am assuming the code is not using any of my slot-move-to-secondary-pte changes.
>

That is correct.

-aneesh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-28 22:35     ` Ram Pai
  2017-10-29 14:05       ` Aneesh Kumar K.V
@ 2017-10-29 22:04       ` Paul Mackerras
  2017-10-30  0:51         ` Ram Pai
  1 sibling, 1 reply; 31+ messages in thread
From: Paul Mackerras @ 2017-10-29 22:04 UTC (permalink / raw)
  To: Ram Pai; +Cc: Aneesh Kumar K.V, linuxppc-dev

On Sat, Oct 28, 2017 at 03:35:32PM -0700, Ram Pai wrote:
> 
> I like the idea of not tracking the slots at all. It is something the
> guest should not know or track.

Why do you say that?

Paul.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-29 22:04       ` Paul Mackerras
@ 2017-10-30  0:51         ` Ram Pai
  2017-11-01  4:46           ` Michael Ellerman
  2017-11-01 11:02           ` Paul Mackerras
  0 siblings, 2 replies; 31+ messages in thread
From: Ram Pai @ 2017-10-30  0:51 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Aneesh Kumar K.V

On Mon, Oct 30, 2017 at 09:04:17AM +1100, Paul Mackerras wrote:
> On Sat, Oct 28, 2017 at 03:35:32PM -0700, Ram Pai wrote:
> > 
> > I like the idea of not tracking the slots at all. It is something the
> > guest should not know or track.
> 
> Why do you say that?

'slot' is an internal mechanism used by the hash table to accelerate
mapping an address to a hash page. If the hash-table implementation
chooses to use a different mechanism to accelerate the mapping, it
can't, because the current mechanism is baked into the logic of all
the consumers.

RP

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-27  5:41     ` Paul Mackerras
@ 2017-10-30  7:57       ` Aneesh Kumar K.V
  2017-10-30 11:49       ` Aneesh Kumar K.V
  2017-11-21  8:41       ` Aneesh Kumar K.V
  2 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-30  7:57 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: benh, mpe, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:

> On Fri, Oct 27, 2017 at 10:57:13AM +0530, Aneesh Kumar K.V wrote:
>> 
>> 
>> On 10/27/2017 10:04 AM, Paul Mackerras wrote:
>> >How do we interpret these numbers?  Are they times, or speed?  Is
>> >larger better or worse?
>> 
>> Sorry for not including the details. They are times in seconds. The test
>> case is a modified mmap_bench from the powerpc selftests.
>> 
>> >
>> >Can you give us the mean and standard deviation for each set of 5
>> >please?
>> >
>> 
>> powernv without patch
>> median= 51.432255
>> stdev = 0.370835
>> 
>> with patch
>> median = 50.739922
>> stdev = 0.06419662
>> 
>> pseries without patch
>> median = 116.617884
>> stdev = 3.04531023
>> 
>> with patch no hcall
>> median = 119.42494
>> stdev = 0.85874552
>> 
>> with patch and hcall
>> median = 117.735808
>> stdev = 2.7624151
>
> So on powernv, the patch set *improves* performance by about 1.3%
> (almost 2 standard deviations).  Do we know why that is?

I haven't looked at that closely. I was treating it as within runtime
variance (i.e. no impact from the patch series). I will collect perf
record data and see whether it points to anything.

>
> On pseries, performance is about 2.4% worse without new hcalls, but
> that is less than 1 standard deviation.  With new hcalls, performance
> is 0.95% worse, only a third of a standard deviation.  I think we need
> to do more measurements to try to get a more accurate picture here.
>
> Were the pseries numbers done on KVM or PowerVM?  Could you do a set
> of measurements on the other one too please?  (I assume the numbers
> with the new hcall were done on KVM, and can't be done on PowerVM.)
>


The above pseries numbers were collected on KVM.

PowerVM numbers on a different machine:
Without patch
31.194165                           
31.372913                           
31.253494
31.416198
31.199180
MEDIAN = 31.253494
STDEV = 0.1018900

With patch series
31.538281
31.385996
31.492737
31.452514
31.259461
MEDIAN = 31.452514
STDEV  = 0.108511

-aneesh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-27  5:41     ` Paul Mackerras
  2017-10-30  7:57       ` Aneesh Kumar K.V
@ 2017-10-30 11:49       ` Aneesh Kumar K.V
  2017-10-30 13:14         ` Aneesh Kumar K.V
  2017-11-21  8:41       ` Aneesh Kumar K.V
  2 siblings, 1 reply; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-30 11:49 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: benh, mpe, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:

> On Fri, Oct 27, 2017 at 10:57:13AM +0530, Aneesh Kumar K.V wrote:
>> 
>> 
>> On 10/27/2017 10:04 AM, Paul Mackerras wrote:
>> >How do we interpret these numbers?  Are they times, or speed?  Is
>> >larger better or worse?
>> 
>> Sorry for not including the details. They are times in seconds. The test
>> case is a modified mmap_bench from the powerpc selftests.
>> 
>> >
>> >Can you give us the mean and standard deviation for each set of 5
>> >please?
>> >
>> 
>> powernv without patch
>> median= 51.432255
>> stdev = 0.370835
>> 
>> with patch
>> median = 50.739922
>> stdev = 0.06419662
>> 
>> pseries without patch
>> median = 116.617884
>> stdev = 3.04531023
>> 
>> with patch no hcall
>> median = 119.42494
>> stdev = 0.85874552
>> 
>> with patch and hcall
>> median = 117.735808
>> stdev = 2.7624151
>
> So on powernv, the patch set *improves* performance by about 1.3%
> (almost 2 standard deviations).  Do we know why that is?
>

I looked at the perf data. With the test, we are doing a large number
of hash faults and then around 10k flush_hash_range calls. Could the
small improvement in the numbers be due to the fact that we are no
longer storing the slot number when doing an insert? Also, in the flush
path we are now not using real_pte_t.

aneesh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-30 11:49       ` Aneesh Kumar K.V
@ 2017-10-30 13:14         ` Aneesh Kumar K.V
  2017-10-30 13:49           ` Aneesh Kumar K.V
  0 siblings, 1 reply; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-30 13:14 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: benh, mpe, linuxppc-dev

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:


> I looked at the perf data. With the test, we are doing a large number
> of hash faults and then around 10k flush_hash_range calls. Could the
> small improvement in the numbers be due to the fact that we are no
> longer storing the slot number when doing an insert? Also, in the flush
> path we are now not using real_pte_t.
>

With THP disabled I am seeing the profiles below.

Without patch

    35.62%  a.out    [kernel.vmlinux]            [k] clear_user_page
     8.54%  a.out    [kernel.vmlinux]            [k] __lock_acquire
     3.86%  a.out    [kernel.vmlinux]            [k] native_flush_hash_range
     3.38%  a.out    [kernel.vmlinux]            [k] save_context_stack
     2.98%  a.out    a.out                       [.] main
     2.59%  a.out    [kernel.vmlinux]            [k] lock_acquire
     2.29%  a.out    [kernel.vmlinux]            [k] mark_lock
     2.23%  a.out    [kernel.vmlinux]            [k] native_hpte_insert
     1.87%  a.out    [kernel.vmlinux]            [k] get_mem_cgroup_from_mm
     1.71%  a.out    [kernel.vmlinux]            [k] rcu_lockdep_current_cpu_online
     1.68%  a.out    [kernel.vmlinux]            [k] lock_release
     1.47%  a.out    [kernel.vmlinux]            [k] __handle_mm_fault
     1.41%  a.out    [kernel.vmlinux]            [k] validate_sp


With patch
    35.40%  a.out    [kernel.vmlinux]            [k] clear_user_page
     8.82%  a.out    [kernel.vmlinux]            [k] __lock_acquire
     3.66%  a.out    a.out                       [.] main
     3.49%  a.out    [kernel.vmlinux]            [k] save_context_stack
     2.77%  a.out    [kernel.vmlinux]            [k] lock_acquire
     2.45%  a.out    [kernel.vmlinux]            [k] mark_lock
     1.80%  a.out    [kernel.vmlinux]            [k] get_mem_cgroup_from_mm
     1.80%  a.out    [kernel.vmlinux]            [k] native_hpte_insert
     1.79%  a.out    [kernel.vmlinux]            [k] rcu_lockdep_current_cpu_online
     1.78%  a.out    [kernel.vmlinux]            [k] lock_release
     1.73%  a.out    [kernel.vmlinux]            [k] native_flush_hash_range
     1.53%  a.out    [kernel.vmlinux]            [k] __handle_mm_fault

That is, we are now spending less time in native_flush_hash_range.

-aneesh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-30 13:14         ` Aneesh Kumar K.V
@ 2017-10-30 13:49           ` Aneesh Kumar K.V
  0 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-10-30 13:49 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: benh, mpe, linuxppc-dev

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:
>
>
>> I looked at the perf data. With the test, we are doing a large number
>> of hash faults and then around 10k flush_hash_range calls. Could the
>> small improvement in the numbers be due to the fact that we are no
>> longer storing the slot number when doing an insert? Also, in the flush
>> path we are now not using real_pte_t.
>>
>
> With THP disabled I am seeing the profiles below.
>
> Without patch
>
>     35.62%  a.out    [kernel.vmlinux]            [k] clear_user_page
>      8.54%  a.out    [kernel.vmlinux]            [k] __lock_acquire
>      3.86%  a.out    [kernel.vmlinux]            [k] native_flush_hash_range
>      3.38%  a.out    [kernel.vmlinux]            [k] save_context_stack
>      2.98%  a.out    a.out                       [.] main
>      2.59%  a.out    [kernel.vmlinux]            [k] lock_acquire
>      2.29%  a.out    [kernel.vmlinux]            [k] mark_lock
>      2.23%  a.out    [kernel.vmlinux]            [k] native_hpte_insert
>      1.87%  a.out    [kernel.vmlinux]            [k] get_mem_cgroup_from_mm
>      1.71%  a.out    [kernel.vmlinux]            [k] rcu_lockdep_current_cpu_online
>      1.68%  a.out    [kernel.vmlinux]            [k] lock_release
>      1.47%  a.out    [kernel.vmlinux]            [k] __handle_mm_fault
>      1.41%  a.out    [kernel.vmlinux]            [k] validate_sp
>
>
> With patch
>     35.40%  a.out    [kernel.vmlinux]            [k] clear_user_page
>      8.82%  a.out    [kernel.vmlinux]            [k] __lock_acquire
>      3.66%  a.out    a.out                       [.] main
>      3.49%  a.out    [kernel.vmlinux]            [k] save_context_stack
>      2.77%  a.out    [kernel.vmlinux]            [k] lock_acquire
>      2.45%  a.out    [kernel.vmlinux]            [k] mark_lock
>      1.80%  a.out    [kernel.vmlinux]            [k] get_mem_cgroup_from_mm
>      1.80%  a.out    [kernel.vmlinux]            [k] native_hpte_insert
>      1.79%  a.out    [kernel.vmlinux]            [k] rcu_lockdep_current_cpu_online
>      1.78%  a.out    [kernel.vmlinux]            [k] lock_release
>      1.73%  a.out    [kernel.vmlinux]            [k] native_flush_hash_range
>      1.53%  a.out    [kernel.vmlinux]            [k] __handle_mm_fault
>
> That is, we are now spending less time in native_flush_hash_range.
>
> -aneesh

One possible explanation is that, with slot tracking, we do

	slot += hidx & _PTEIDX_GROUP_IX;
	hptep = htab_address + slot;
	want_v = hpte_encode_avpn(vpn, psize, ssize);
	native_lock_hpte(hptep);
	hpte_v = be64_to_cpu(hptep->v);
	if (cpu_has_feature(CPU_FTR_ARCH_300))
		hpte_v = hpte_new_to_old_v(hpte_v,
				be64_to_cpu(hptep->r));
	if (!HPTE_V_COMPARE(hpte_v, want_v) ||
	    !(hpte_v & HPTE_V_VALID))
		native_unlock_hpte(hptep);


and without slot tracking we do

	for (i = 0; i < HPTES_PER_GROUP; i++, hptep++) {
		/* check locklessly first */
		hpte_v = be64_to_cpu(hptep->v);
		if (cpu_has_feature(CPU_FTR_ARCH_300))
			hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
		if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID))
			continue;

		native_lock_hpte(hptep);

That is, without the patch series we always take the hpte lock, even if
the hpte doesn't match. Hence in perf annotate we find the lock to be
highly contended without the patch series.

I will change that to compare the pte without taking the lock and see
if that has any impact.
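
For illustration only, a rough sketch of what that could look like in
the slot-tracked path (this is not the actual change; it reuses the
helpers from the snippets above and assumes the rest of the invalidate
logic stays as is):

	slot += hidx & _PTEIDX_GROUP_IX;
	hptep = htab_address + slot;
	want_v = hpte_encode_avpn(vpn, psize, ssize);

	/* check locklessly first, as the no-slot loop already does */
	hpte_v = be64_to_cpu(hptep->v);
	if (cpu_has_feature(CPU_FTR_ARCH_300))
		hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
	if (HPTE_V_COMPARE(hpte_v, want_v) && (hpte_v & HPTE_V_VALID)) {
		native_lock_hpte(hptep);
		/* re-check under the lock before invalidating */
		hpte_v = be64_to_cpu(hptep->v);
		if (cpu_has_feature(CPU_FTR_ARCH_300))
			hpte_v = hpte_new_to_old_v(hpte_v,
						   be64_to_cpu(hptep->r));
		if (!HPTE_V_COMPARE(hpte_v, want_v) ||
		    !(hpte_v & HPTE_V_VALID))
			native_unlock_hpte(hptep);
		/* otherwise invalidate the entry and unlock as before */
	}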

-aneesh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-30  0:51         ` Ram Pai
@ 2017-11-01  4:46           ` Michael Ellerman
  2017-11-01 11:02           ` Paul Mackerras
  1 sibling, 0 replies; 31+ messages in thread
From: Michael Ellerman @ 2017-11-01  4:46 UTC (permalink / raw)
  To: Ram Pai, Paul Mackerras; +Cc: linuxppc-dev, Aneesh Kumar K.V

Ram Pai <linuxram@us.ibm.com> writes:

> On Mon, Oct 30, 2017 at 09:04:17AM +1100, Paul Mackerras wrote:
>> On Sat, Oct 28, 2017 at 03:35:32PM -0700, Ram Pai wrote:
>> > 
>> > I like the idea of not tracking the slots at all. It is something the
>> > guest should not know or track.
>> 
>> Why do you say that?
>
> 'slot' is an internal mechanism used by the hash table to accelerate
> mapping an address to a hash page. If the hash-table implementation
> chooses to use a different mechanism to accelerate the mapping, it
> can't, because the current mechanism is baked into the logic of all
> the consumers.

The slots are defined in the architecture, so it's entirely fine for
guests to track them.

cheers

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-30  0:51         ` Ram Pai
  2017-11-01  4:46           ` Michael Ellerman
@ 2017-11-01 11:02           ` Paul Mackerras
  1 sibling, 0 replies; 31+ messages in thread
From: Paul Mackerras @ 2017-11-01 11:02 UTC (permalink / raw)
  To: Ram Pai; +Cc: linuxppc-dev, Aneesh Kumar K.V

On Sun, Oct 29, 2017 at 05:51:06PM -0700, Ram Pai wrote:
> On Mon, Oct 30, 2017 at 09:04:17AM +1100, Paul Mackerras wrote:
> > On Sat, Oct 28, 2017 at 03:35:32PM -0700, Ram Pai wrote:
> > > 
> > > I like the idea of not tracking the slots at all. It is something the
> > > guest should not know or track.
> > 
> > Why do you say that?
> 
> 'slot' is an internal mechanism used by the hash table to accelerate
> mapping an address to a hash page. If the hash-table implementation
> chooses to use a different mechanism to accelerate the mapping, it
> can't, because the current mechanism is baked into the logic of all
> the consumers.

Not all operating systems use the HPT as a cache of translations that
are also stored somewhere, as Linux does.  Those OSes are perfectly
entitled to control the slot allocation for their own purposes in
whatever way they want.  So having the interface use the slot number
is fine; just as having alternative interfaces that don't need to
specify the slot number for the kind of usage that Linux does is also
fine.

Paul.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
  2017-10-27  5:41     ` Paul Mackerras
  2017-10-30  7:57       ` Aneesh Kumar K.V
  2017-10-30 11:49       ` Aneesh Kumar K.V
@ 2017-11-21  8:41       ` Aneesh Kumar K.V
  2 siblings, 0 replies; 31+ messages in thread
From: Aneesh Kumar K.V @ 2017-11-21  8:41 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: benh, mpe, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:
> On pseries, performance is about 2.4% worse without new hcalls, but
> that is less than 1 standard deviation.  With new hcalls, performance
> is 0.95% worse, only a third of a standard deviation.  I think we need
> to do more measurements to try to get a more accurate picture here.
>
> Were the pseries numbers done on KVM or PowerVM?  Could you do a set
> of measurements on the other one too please?  (I assume the numbers
> with the new hcall were done on KVM, and can't be done on PowerVM.)
>

I ran ebizzy and a kernel compile on powernv and powervm configurations.
You can find the numbers below. I did 10 iterations and have only
included the stdev and median. I do find that powernv does better with
the patch series.

ebizzy run
-----------
PowerNV (ebizzy -m -n 1000 -P -s 512000 -S 100 -t 100):
With patches (10 iterations, results in records/sec):
stdev =	37.60
median = 7411.5

Without patch:
stdev = 23.071
median = 7350

PowerVM numbers (./ebizzy -m -n 1000 -P -s 512000 -S 100 -t 30):
With patch (no new hcalls):
stdev = 20.721
median = 6955.5
	
Without patch	
stdev =	35.049
median = 7081

kernel compile (time -p):
---------------------------
PowerNV:
With patches:
Real	
----	
stdev = 1.624
median = 61.56
	
User:	
stdev =	61.204
median = 4816.73
	
Sys:	
stdev =	4.367
median = 387.575

Without patches:
Real:	
stdev =	1.318
median = 63.635
	
User:	
stdev =	50.531
median = 4820.51
	
	
Sys:	
stdev =	6.409
median = 389.765

PowerVM numbers:
-------------------
With patches (no new hcalls):
Real:	
stdev =	3.016
median = 442.745
	
	
User:	
stdev =	9.738
median = 5507.87
	
Sys:	
stdev =	0.223
median = 176.455

Without patches:
Real:
stdev = 0.720
median = 442.445

User:
stdev = 8.621
median = 5501.615

Sys:
stdev = 0.189
median = 173.3

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2017-11-21  8:41 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-27  4:08 [PATCH 00/16] Remove hash page table slot tracking from linux PTE Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 01/16] powerpc/mm/hash: Remove the superfluous bitwise operation when find hpte group Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 02/16] powerpc/mm: Update native_hpte_find to return hash pte Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 03/16] powerpc/pseries: Update hpte find helper to take hash value Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 04/16] powerpc/mm: Add hash invalidate callback Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 05/16] powerpc/mm: use hash_invalidate for __kernel_map_pages() Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 06/16] powerpc/mm: Switch flush_hash_range to not use slot Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 07/16] powerpc/mm: Add hash updatepp callback Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 08/16] powerpc/mm/hash: Don't track hash pte slot number in linux page table Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 09/16] powerpc/mm: Add new firmware feature HASH API Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 10/16] powerpc/kvm/hash: Implement HASH_REMOVE hcall Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 11/16] powerpc/kvm/hash: Implement HASH_PROTECT hcall Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 12/16] powerpc/kvm/hash: Implement HASH_BULK_REMOVE hcall Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 13/16] powerpc/mm/pseries: Use HASH_PROTECT hcall in guest Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 14/16] powerpc/mm/pseries: Use HASH_REMOVE " Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 15/16] powerpc/mm/pseries: Move slot based bulk remove to helper Aneesh Kumar K.V
2017-10-27  4:08 ` [PATCH 16/16] powerpc/mm/pseries: Use HASH_BULK_REMOVE hcall in guest Aneesh Kumar K.V
2017-10-27  4:34 ` [PATCH 00/16] Remove hash page table slot tracking from linux PTE Paul Mackerras
2017-10-27  5:27   ` Aneesh Kumar K.V
2017-10-27  5:41     ` Paul Mackerras
2017-10-30  7:57       ` Aneesh Kumar K.V
2017-10-30 11:49       ` Aneesh Kumar K.V
2017-10-30 13:14         ` Aneesh Kumar K.V
2017-10-30 13:49           ` Aneesh Kumar K.V
2017-11-21  8:41       ` Aneesh Kumar K.V
2017-10-28 22:35     ` Ram Pai
2017-10-29 14:05       ` Aneesh Kumar K.V
2017-10-29 22:04       ` Paul Mackerras
2017-10-30  0:51         ` Ram Pai
2017-11-01  4:46           ` Michael Ellerman
2017-11-01 11:02           ` Paul Mackerras
