linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE
@ 2018-07-27 13:51 Laurent Dufour
  2018-07-27 13:51 ` [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE Laurent Dufour
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Laurent Dufour @ 2018-07-27 13:51 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, mpe, benh, paulus, npiggin

[Resending so everyone is getting the cover letter]

On very large system we could see soft lockup fired when a process is exiting

watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
NIP:  c0000000000b995c LR: c0000000000b8f64 CTR: 000000000000aa18
REGS: c00006b0645b7610 TRAP: 0901   Not tainted  (4.17.0)
MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22042082  XER: 00000000
CFAR: 00000000006cf8f0 SOFTE: 0 
GPR00: 0010000000000000 c00006b0645b7890 c000000000f99200 0000000000000000 
GPR04: 8e000001a5a4de58 400249cf1bfd5480 8e000001a5a4de50 400249cf1bfd5480 
GPR08: 8e000001a5a4de48 400249cf1bfd5480 8e000001a5a4de40 400249cf1bfd5480 
GPR12: ffffffffffffffff c00000001e690800 
NIP [c0000000000b995c] plpar_hcall9+0x44/0x7c
LR [c0000000000b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
Call Trace:
[c00006b0645b7890] [8e000001a5a4dd20] 0x8e000001a5a4dd20 (unreliable)
[c00006b0645b7a00] [c00000000006d5b0] flush_hash_range+0x60/0x110
[c00006b0645b7a50] [c000000000072a2c] __flush_tlb_pending+0x4c/0xd0
[c00006b0645b7a80] [c0000000002eaf44] unmap_page_range+0x984/0xbd0
[c00006b0645b7bc0] [c0000000002eb594] unmap_vmas+0x84/0x100
[c00006b0645b7c10] [c0000000002f8afc] exit_mmap+0xac/0x1f0
[c00006b0645b7cd0] [c0000000000f2638] mmput+0x98/0x1b0
[c00006b0645b7d00] [c0000000000fc9d0] do_exit+0x330/0xc00
[c00006b0645b7dc0] [c0000000000fd384] do_group_exit+0x64/0x100
[c00006b0645b7e00] [c0000000000fd44c] sys_exit_group+0x2c/0x30
[c00006b0645b7e30] [c00000000000b960] system_call+0x58/0x6c
Instruction dump:
60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378 
e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008

This happens when removing the PTE by calling the hypervisor using the
H_BULK_REMOVE call. This call is processing up to 4 PTEs but is doing a
tlbie for each PTE it is processing. This could lead to long time spent in
the hypervisor (sometimes up to 4s) and soft lockup being raised because
the scheduler is not called in zap_pte_range().

Since the Power7's time, the hypervisor is providing a new hcall
H_BLOCK_REMOVE allowing processing up to 8 PTEs with one call to
tlbie. By limiting the amount of tlbie generated, this reduces the time
spent invalidating the PTEs.

This hcall requires that the pages are "all within the same naturally
aligned 8 page virtual address block".

With this patch series applied, I couldn't see any soft lockup raised on
the victim LPAR I was running the test one.

This series is covering both normal pages and huge pages.

Laurent Dufour (3):
  powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
  powerpc/pseries/mm: factorize PTE slot computation
  powerpc/pseries/mm: call H_BLOCK_REMOVE

 arch/powerpc/include/asm/firmware.h       |   3 +-
 arch/powerpc/include/asm/hvcall.h         |   1 +
 arch/powerpc/platforms/pseries/firmware.c |   1 +
 arch/powerpc/platforms/pseries/lpar.c     | 250 ++++++++++++++++++++++++++----
 4 files changed, 228 insertions(+), 27 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
  2018-07-27 13:51 [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour
@ 2018-07-27 13:51 ` Laurent Dufour
  2018-07-30 14:18   ` Aneesh Kumar K.V
  2018-07-27 13:51 ` [PATCH 2/3] powerpc/pseries/mm: factorize PTE slot computation Laurent Dufour
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Laurent Dufour @ 2018-07-27 13:51 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, mpe, benh, paulus, npiggin

This feature tells if the hcall H_BLOCK_REMOVE is available.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/firmware.h       | 3 ++-
 arch/powerpc/platforms/pseries/firmware.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 535add3f7791..360ba197f9d2 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -53,6 +53,7 @@
 #define FW_FEATURE_PRRN		ASM_CONST(0x0000000200000000)
 #define FW_FEATURE_DRMEM_V2	ASM_CONST(0x0000000400000000)
 #define FW_FEATURE_DRC_INFO	ASM_CONST(0x0000000800000000)
+#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0000001000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -70,7 +71,7 @@ enum {
 		FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
 		FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
 		FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
-		FW_FEATURE_DRC_INFO,
+		FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE,
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
 	FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index a3bbeb43689e..1624501386f4 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
 	{FW_FEATURE_SET_MODE,		"hcall-set-mode"},
 	{FW_FEATURE_BEST_ENERGY,	"hcall-best-energy-1*"},
 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
+	{FW_FEATURE_BLOCK_REMOVE,	"hcall-block-remove"},
 };
 
 /* Build up the firmware features bitmask using the contents of
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] powerpc/pseries/mm: factorize PTE slot computation
  2018-07-27 13:51 [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour
  2018-07-27 13:51 ` [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE Laurent Dufour
@ 2018-07-27 13:51 ` Laurent Dufour
  2018-07-30 14:19   ` Aneesh Kumar K.V
  2018-07-27 13:51 ` [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE Laurent Dufour
  2018-08-01  1:55 ` [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE Nicholas Piggin
  3 siblings, 1 reply; 13+ messages in thread
From: Laurent Dufour @ 2018-07-27 13:51 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, mpe, benh, paulus, npiggin

This part of code will be called also when dealing with H_BLOCK_REMOVE.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/lpar.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 52eeff1297f4..96b8cd8a802d 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -547,6 +547,24 @@ static int pSeries_lpar_hpte_removebolted(unsigned long ea,
 	return 0;
 }
 
+
+static inline unsigned long compute_slot(real_pte_t pte,
+					 unsigned long vpn,
+					 unsigned long index,
+					 unsigned long shift,
+					 int ssize)
+{
+	unsigned long slot, hash, hidx;
+
+	hash = hpt_hash(vpn, shift, ssize);
+	hidx = __rpte_to_hidx(pte, index);
+	if (hidx & _PTEIDX_SECONDARY)
+		hash = ~hash;
+	slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+	slot += hidx & _PTEIDX_GROUP_IX;
+	return slot;
+}
+
 /*
  * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
  * lock.
@@ -559,7 +577,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
 	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
 	unsigned long param[PLPAR_HCALL9_BUFSIZE];
-	unsigned long hash, index, shift, hidx, slot;
+	unsigned long index, shift, slot;
 	real_pte_t pte;
 	int psize, ssize;
 
@@ -573,12 +591,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 		vpn = batch->vpn[i];
 		pte = batch->pte[i];
 		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
-			hash = hpt_hash(vpn, shift, ssize);
-			hidx = __rpte_to_hidx(pte, index);
-			if (hidx & _PTEIDX_SECONDARY)
-				hash = ~hash;
-			slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-			slot += hidx & _PTEIDX_GROUP_IX;
+			slot = compute_slot(pte, vpn, index, shift, ssize);
 			if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
 				/*
 				 * lpar doesn't use the passed actual page size
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE
  2018-07-27 13:51 [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour
  2018-07-27 13:51 ` [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE Laurent Dufour
  2018-07-27 13:51 ` [PATCH 2/3] powerpc/pseries/mm: factorize PTE slot computation Laurent Dufour
@ 2018-07-27 13:51 ` Laurent Dufour
  2018-07-30 13:47   ` Michael Ellerman
  2018-08-01  1:55 ` [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE Nicholas Piggin
  3 siblings, 1 reply; 13+ messages in thread
From: Laurent Dufour @ 2018-07-27 13:51 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, mpe, benh, paulus, npiggin

This hypervisor call allows to remove up to 8 ptes with only call to tlbie.

The virtual pages must be all within the same naturally aligned 8 page
virtual address block and have the same page and segment size encodings.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h     |   1 +
 arch/powerpc/platforms/pseries/lpar.c | 223 +++++++++++++++++++++++++++++++---
 2 files changed, 205 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 662c8347d699..e403d574651d 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -278,6 +278,7 @@
 #define H_COP			0x304
 #define H_GET_MPP_X		0x314
 #define H_SET_MODE		0x31C
+#define H_BLOCK_REMOVE		0x328
 #define H_CLEAR_HPT		0x358
 #define H_REQUEST_VMC		0x360
 #define H_RESIZE_HPT_PREPARE	0x36C
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 96b8cd8a802d..41ed03245eb4 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
 	BUG_ON(lpar_rc != H_SUCCESS);
 }
 
+
+/*
+ * As defined in the PAPR's section 14.5.4.1.8
+ * The control mask doesn't include the returned reference and change bit from
+ * the processed PTE.
+ */
+#define HBLKR_AVPN		0x0100000000000000UL
+#define HBLKR_CTRL_MASK		0xf800000000000000UL
+#define HBLKR_CTRL_SUCCESS	0x8000000000000000UL
+#define HBLKR_CTRL_ERRNOTFOUND	0x8800000000000000UL
+#define HBLKR_CTRL_ERRBUSY	0xa000000000000000UL
+
+/**
+ * H_BLOCK_REMOVE caller.
+ * @idx should point to the latest @param entry set with a PTEX.
+ * If PTE cannot be processed because another CPUs has already locked that
+ * group, those entries are put back in @param starting at index 1.
+ * If entries has to be retried and @retry_busy is set to true, these entries
+ * are retried until success. If @retry_busy is set to false, the returned
+ * is the number of entries yet to process.
+ */
+static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
+				       bool retry_busy)
+{
+	unsigned long i, rc, new_idx;
+	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+
+again:
+	new_idx = 0;
+	BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
+	if (idx < PLPAR_HCALL9_BUFSIZE)
+		param[idx] = HBR_END;
+
+	rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
+			  param[0], /* AVA */
+			  param[1],  param[2],  param[3],  param[4], /* TS0-7 */
+			  param[5],  param[6],  param[7],  param[8]);
+	if (rc == H_SUCCESS)
+		return 0;
+
+	BUG_ON(rc != H_PARTIAL);
+
+	/* Check that the unprocessed entries were 'not found' or 'busy' */
+	for (i = 0; i < idx-1; i++) {
+		unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
+
+		if (ctrl == HBLKR_CTRL_ERRBUSY) {
+			param[++new_idx] = param[i+1];
+			continue;
+		}
+
+		BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
+		       && ctrl != HBLKR_CTRL_ERRNOTFOUND);
+	}
+
+	/*
+	 * If there were entries found busy, retry these entries if requested,
+	 * of if all the entries have to be retried.
+	 */
+	if (new_idx && (retry_busy || new_idx == (PLPAR_HCALL9_BUFSIZE-1))) {
+		idx = new_idx + 1;
+		goto again;
+	}
+
+	return new_idx;
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need
@@ -425,17 +492,59 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
  */
 #define PPC64_HUGE_HPTE_BATCH 12
 
-static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
-					     unsigned long *vpn, int count,
-					     int psize, int ssize)
+static void hugepage_block_invalidate(unsigned long *slot, unsigned long *vpn,
+				      int count, int psize, int ssize)
 {
 	unsigned long param[PLPAR_HCALL9_BUFSIZE];
-	int i = 0, pix = 0, rc;
-	unsigned long flags = 0;
-	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+	unsigned long shift, current_vpgb, vpgb;
+	int i, pix = 0;
 
-	if (lock_tlbie)
-		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+	shift = mmu_psize_defs[psize].shift;
+
+	for (i = 0; i < count; i++) {
+		/*
+		 * Shifting 3 bits more on the right to get a
+		 * 8 pages aligned virtual addresse.
+		 */
+		vpgb = (vpn[i] >> (shift - VPN_SHIFT + 3));
+		if (!pix || vpgb != current_vpgb) {
+			/*
+			 * Need to start a new 8 pages block, flush
+			 * the current one if needed.
+			 */
+			if (pix)
+				(void)call_block_remove(pix, param, true);
+			current_vpgb = vpgb;
+			param[0] = hpte_encode_avpn(vpn[i], psize, ssize);
+			pix = 1;
+		}
+
+		param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot[i];
+		if (pix == PLPAR_HCALL9_BUFSIZE) {
+			pix = call_block_remove(pix, param, false);
+			/*
+			 * pix = 0 means that all the entries were
+			 * removed, we can start a new block.
+			 * Otherwise, this means that there are entries
+			 * to retry, and pix points to latest one, so
+			 * we should increment it and try to continue
+			 * the same block.
+			 */
+			if (!pix)
+				current_vpgb = 0;
+			else
+				pix++;
+		}
+	}
+	if (pix)
+		(void)call_block_remove(pix, param, true);
+}
+
+static void hugepage_bulk_invalidate(unsigned long *slot, unsigned long *vpn,
+				     int count, int psize, int ssize)
+{
+	unsigned long param[PLPAR_HCALL9_BUFSIZE];
+	int i = 0, pix = 0, rc;
 
 	for (i = 0; i < count; i++) {
 
@@ -443,17 +552,6 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
 			pSeries_lpar_hpte_invalidate(slot[i], vpn[i], psize, 0,
 						     ssize, 0);
 		} else {
-			param[pix] = HBR_REQUEST | HBR_AVPN | slot[i];
-			param[pix+1] = hpte_encode_avpn(vpn[i], psize, ssize);
-			pix += 2;
-			if (pix == 8) {
-				rc = plpar_hcall9(H_BULK_REMOVE, param,
-						  param[0], param[1], param[2],
-						  param[3], param[4], param[5],
-						  param[6], param[7]);
-				BUG_ON(rc != H_SUCCESS);
-				pix = 0;
-			}
 		}
 	}
 	if (pix) {
@@ -463,6 +561,23 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
 				  param[6], param[7]);
 		BUG_ON(rc != H_SUCCESS);
 	}
+}
+
+static inline void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
+						      unsigned long *vpn,
+						      int count, int psize,
+						      int ssize)
+{
+	unsigned long flags = 0;
+	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+
+	if (lock_tlbie)
+		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+
+	if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
+		hugepage_block_invalidate(slot, vpn, count, psize, ssize);
+	else
+		hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);
 
 	if (lock_tlbie)
 		spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
@@ -565,6 +680,70 @@ static inline unsigned long compute_slot(real_pte_t pte,
 	return slot;
 }
 
+/**
+ * The hcall H_BLOCK_REMOVE implies that the virtual pages to processed are
+ * "all within the same naturally aligned 8 page virtual address block".
+ */
+static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
+			    unsigned long *param)
+{
+	unsigned long vpn;
+	unsigned long i, pix = 0;
+	unsigned long index, shift, slot, current_vpgb, vpgb;
+	real_pte_t pte;
+	int psize, ssize;
+
+	psize = batch->psize;
+	ssize = batch->ssize;
+
+	for (i = 0; i < number; i++) {
+		vpn = batch->vpn[i];
+		pte = batch->pte[i];
+		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+			/*
+			 * Shifting 3 bits more on the right to get a
+			 * 8 pages aligned virtual addresse.
+			 */
+			vpgb = (vpn >> (shift - VPN_SHIFT + 3));
+			if (!pix || vpgb != current_vpgb) {
+				/*
+				 * Need to start a new 8 pages block, flush
+				 * the current one if needed.
+				 */
+				if (pix)
+					(void)call_block_remove(pix, param,
+								true);
+				current_vpgb = vpgb;
+				param[0] = hpte_encode_avpn(vpn, psize,
+							    ssize);
+				pix = 1;
+			}
+
+			slot = compute_slot(pte, vpn, index, shift, ssize);
+			param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot;
+
+			if (pix == PLPAR_HCALL9_BUFSIZE) {
+				pix = call_block_remove(pix, param, false);
+				/*
+				 * pix = 0 means that all the entries were
+				 * removed, we can start a new block.
+				 * Otherwise, this means that there are entries
+				 * to retry, and pix points to latest one, so
+				 * we should increment it and try to continue
+				 * the same block.
+				 */
+				if (!pix)
+					current_vpgb = 0;
+				else
+					pix++;
+			}
+		} pte_iterate_hashed_end();
+	}
+
+	if (pix > 1)
+		(void)call_block_remove(pix, param, true);
+}
+
 /*
  * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
  * lock.
@@ -584,6 +763,11 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	if (lock_tlbie)
 		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
 
+	if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
+		do_block_remove(number, batch, param);
+		goto out;
+	}
+
 	psize = batch->psize;
 	ssize = batch->ssize;
 	pix = 0;
@@ -622,6 +806,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 		BUG_ON(rc != H_SUCCESS);
 	}
 
+out:
 	if (lock_tlbie)
 		spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE
  2018-07-27 13:51 ` [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE Laurent Dufour
@ 2018-07-30 13:47   ` Michael Ellerman
  2018-07-30 14:22     ` Aneesh Kumar K.V
  2018-08-16  9:41     ` Laurent Dufour
  0 siblings, 2 replies; 13+ messages in thread
From: Michael Ellerman @ 2018-07-30 13:47 UTC (permalink / raw)
  To: Laurent Dufour, linuxppc-dev, linux-kernel
  Cc: aneesh.kumar, benh, paulus, npiggin

Hi Laurent,

Just one comment below.

Laurent Dufour <ldufour@linux.vnet.ibm.com> writes:
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 96b8cd8a802d..41ed03245eb4 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
>  	BUG_ON(lpar_rc != H_SUCCESS);
>  }
>  
> +
> +/*
> + * As defined in the PAPR's section 14.5.4.1.8
> + * The control mask doesn't include the returned reference and change bit from
> + * the processed PTE.
> + */
> +#define HBLKR_AVPN		0x0100000000000000UL
> +#define HBLKR_CTRL_MASK		0xf800000000000000UL
> +#define HBLKR_CTRL_SUCCESS	0x8000000000000000UL
> +#define HBLKR_CTRL_ERRNOTFOUND	0x8800000000000000UL
> +#define HBLKR_CTRL_ERRBUSY	0xa000000000000000UL
> +
> +/**
> + * H_BLOCK_REMOVE caller.
> + * @idx should point to the latest @param entry set with a PTEX.
> + * If PTE cannot be processed because another CPUs has already locked that
> + * group, those entries are put back in @param starting at index 1.
> + * If entries has to be retried and @retry_busy is set to true, these entries
> + * are retried until success. If @retry_busy is set to false, the returned
> + * is the number of entries yet to process.
> + */
> +static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
> +				       bool retry_busy)
> +{
> +	unsigned long i, rc, new_idx;
> +	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
> +
> +again:
> +	new_idx = 0;
> +	BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));

I count 1 ..

> +	if (idx < PLPAR_HCALL9_BUFSIZE)
> +		param[idx] = HBR_END;
> +
> +	rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
> +			  param[0], /* AVA */
> +			  param[1],  param[2],  param[3],  param[4], /* TS0-7 */
> +			  param[5],  param[6],  param[7],  param[8]);
> +	if (rc == H_SUCCESS)
> +		return 0;
> +
> +	BUG_ON(rc != H_PARTIAL);

2 ...

> +	/* Check that the unprocessed entries were 'not found' or 'busy' */
> +	for (i = 0; i < idx-1; i++) {
> +		unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
> +
> +		if (ctrl == HBLKR_CTRL_ERRBUSY) {
> +			param[++new_idx] = param[i+1];
> +			continue;
> +		}
> +
> +		BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
> +		       && ctrl != HBLKR_CTRL_ERRNOTFOUND);

3 ...

BUG_ON()s.

I know the code in this file is already pretty liberal with the use of
BUG_ON() but I'd prefer if we don't make it any worse.

Given this is an optimisation it seems like we should be able to fall
back to the existing implementation in the case of error (which will
probably then BUG_ON() 😂)

If there's some reason we can't then I guess I can live with it.

cheers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
  2018-07-27 13:51 ` [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE Laurent Dufour
@ 2018-07-30 14:18   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2018-07-30 14:18 UTC (permalink / raw)
  To: Laurent Dufour, linuxppc-dev, linux-kernel; +Cc: mpe, benh, paulus, npiggin

Laurent Dufour <ldufour@linux.vnet.ibm.com> writes:

> This feature tells if the hcall H_BLOCK_REMOVE is available.
>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

> ---
>  arch/powerpc/include/asm/firmware.h       | 3 ++-
>  arch/powerpc/platforms/pseries/firmware.c | 1 +
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
> index 535add3f7791..360ba197f9d2 100644
> --- a/arch/powerpc/include/asm/firmware.h
> +++ b/arch/powerpc/include/asm/firmware.h
> @@ -53,6 +53,7 @@
>  #define FW_FEATURE_PRRN		ASM_CONST(0x0000000200000000)
>  #define FW_FEATURE_DRMEM_V2	ASM_CONST(0x0000000400000000)
>  #define FW_FEATURE_DRC_INFO	ASM_CONST(0x0000000800000000)
> +#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0000001000000000)
>  
>  #ifndef __ASSEMBLY__
>  
> @@ -70,7 +71,7 @@ enum {
>  		FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
>  		FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
>  		FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
> -		FW_FEATURE_DRC_INFO,
> +		FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE,
>  	FW_FEATURE_PSERIES_ALWAYS = 0,
>  	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
>  	FW_FEATURE_POWERNV_ALWAYS = 0,
> diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
> index a3bbeb43689e..1624501386f4 100644
> --- a/arch/powerpc/platforms/pseries/firmware.c
> +++ b/arch/powerpc/platforms/pseries/firmware.c
> @@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
>  	{FW_FEATURE_SET_MODE,		"hcall-set-mode"},
>  	{FW_FEATURE_BEST_ENERGY,	"hcall-best-energy-1*"},
>  	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
> +	{FW_FEATURE_BLOCK_REMOVE,	"hcall-block-remove"},
>  };
>  
>  /* Build up the firmware features bitmask using the contents of
> -- 
> 2.7.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] powerpc/pseries/mm: factorize PTE slot computation
  2018-07-27 13:51 ` [PATCH 2/3] powerpc/pseries/mm: factorize PTE slot computation Laurent Dufour
@ 2018-07-30 14:19   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 13+ messages in thread
From: Aneesh Kumar K.V @ 2018-07-30 14:19 UTC (permalink / raw)
  To: Laurent Dufour, linuxppc-dev, linux-kernel; +Cc: mpe, benh, paulus, npiggin

Laurent Dufour <ldufour@linux.vnet.ibm.com> writes:

> This part of code will be called also when dealing with H_BLOCK_REMOVE.
>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

> ---
>  arch/powerpc/platforms/pseries/lpar.c | 27 ++++++++++++++++++++-------
>  1 file changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 52eeff1297f4..96b8cd8a802d 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -547,6 +547,24 @@ static int pSeries_lpar_hpte_removebolted(unsigned long ea,
>  	return 0;
>  }
>  
> +
> +static inline unsigned long compute_slot(real_pte_t pte,
> +					 unsigned long vpn,
> +					 unsigned long index,
> +					 unsigned long shift,
> +					 int ssize)
> +{
> +	unsigned long slot, hash, hidx;
> +
> +	hash = hpt_hash(vpn, shift, ssize);
> +	hidx = __rpte_to_hidx(pte, index);
> +	if (hidx & _PTEIDX_SECONDARY)
> +		hash = ~hash;
> +	slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
> +	slot += hidx & _PTEIDX_GROUP_IX;
> +	return slot;
> +}
> +
>  /*
>   * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
>   * lock.
> @@ -559,7 +577,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
>  	struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
>  	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
>  	unsigned long param[PLPAR_HCALL9_BUFSIZE];
> -	unsigned long hash, index, shift, hidx, slot;
> +	unsigned long index, shift, slot;
>  	real_pte_t pte;
>  	int psize, ssize;
>  
> @@ -573,12 +591,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
>  		vpn = batch->vpn[i];
>  		pte = batch->pte[i];
>  		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
> -			hash = hpt_hash(vpn, shift, ssize);
> -			hidx = __rpte_to_hidx(pte, index);
> -			if (hidx & _PTEIDX_SECONDARY)
> -				hash = ~hash;
> -			slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
> -			slot += hidx & _PTEIDX_GROUP_IX;
> +			slot = compute_slot(pte, vpn, index, shift, ssize);
>  			if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
>  				/*
>  				 * lpar doesn't use the passed actual page size
> -- 
> 2.7.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE
  2018-07-30 13:47   ` Michael Ellerman
@ 2018-07-30 14:22     ` Aneesh Kumar K.V
  2018-08-16 17:27       ` Laurent Dufour
  2018-08-16  9:41     ` Laurent Dufour
  1 sibling, 1 reply; 13+ messages in thread
From: Aneesh Kumar K.V @ 2018-07-30 14:22 UTC (permalink / raw)
  To: Michael Ellerman, Laurent Dufour, linuxppc-dev, linux-kernel
  Cc: benh, paulus, npiggin

Michael Ellerman <mpe@ellerman.id.au> writes:

> Hi Laurent,
>
> Just one comment below.
>
> Laurent Dufour <ldufour@linux.vnet.ibm.com> writes:
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>> index 96b8cd8a802d..41ed03245eb4 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
>>  	BUG_ON(lpar_rc != H_SUCCESS);
>>  }
>>  
>> +
>> +/*
>> + * As defined in the PAPR's section 14.5.4.1.8
>> + * The control mask doesn't include the returned reference and change bit from
>> + * the processed PTE.
>> + */
>> +#define HBLKR_AVPN		0x0100000000000000UL
>> +#define HBLKR_CTRL_MASK		0xf800000000000000UL
>> +#define HBLKR_CTRL_SUCCESS	0x8000000000000000UL
>> +#define HBLKR_CTRL_ERRNOTFOUND	0x8800000000000000UL
>> +#define HBLKR_CTRL_ERRBUSY	0xa000000000000000UL
>> +
>> +/**
>> + * H_BLOCK_REMOVE caller.
>> + * @idx should point to the latest @param entry set with a PTEX.
>> + * If PTE cannot be processed because another CPUs has already locked that
>> + * group, those entries are put back in @param starting at index 1.
>> + * If entries has to be retried and @retry_busy is set to true, these entries
>> + * are retried until success. If @retry_busy is set to false, the returned
>> + * is the number of entries yet to process.
>> + */
>> +static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
>> +				       bool retry_busy)
>> +{
>> +	unsigned long i, rc, new_idx;
>> +	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
>> +
>> +again:
>> +	new_idx = 0;
>> +	BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
>
> I count 1 ..
>
>> +	if (idx < PLPAR_HCALL9_BUFSIZE)
>> +		param[idx] = HBR_END;
>> +
>> +	rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
>> +			  param[0], /* AVA */
>> +			  param[1],  param[2],  param[3],  param[4], /* TS0-7 */
>> +			  param[5],  param[6],  param[7],  param[8]);
>> +	if (rc == H_SUCCESS)
>> +		return 0;
>> +
>> +	BUG_ON(rc != H_PARTIAL);
>
> 2 ...
>
>> +	/* Check that the unprocessed entries were 'not found' or 'busy' */
>> +	for (i = 0; i < idx-1; i++) {
>> +		unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
>> +
>> +		if (ctrl == HBLKR_CTRL_ERRBUSY) {
>> +			param[++new_idx] = param[i+1];
>> +			continue;
>> +		}
>> +
>> +		BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
>> +		       && ctrl != HBLKR_CTRL_ERRNOTFOUND);
>
> 3 ...
>
> BUG_ON()s.
>
> I know the code in this file is already pretty liberal with the use of
> BUG_ON() but I'd prefer if we don't make it any worse.
>
> Given this is an optimisation it seems like we should be able to fall
> back to the existing implementation in the case of error (which will
> probably then BUG_ON() 😂)
>
> If there's some reason we can't then I guess I can live with it.

It would be nice to log the error in case we are not expecting the
error return. We recently did
https://marc.info/?i=20180629083904.29250-1-aneesh.kumar@linux.ibm.com

-aneesh


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE
  2018-07-27 13:51 [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour
                   ` (2 preceding siblings ...)
  2018-07-27 13:51 ` [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE Laurent Dufour
@ 2018-08-01  1:55 ` Nicholas Piggin
  2018-08-03  2:29   ` Michael Ellerman
  3 siblings, 1 reply; 13+ messages in thread
From: Nicholas Piggin @ 2018-08-01  1:55 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: linuxppc-dev, linux-kernel, aneesh.kumar, mpe, benh, paulus

On Fri, 27 Jul 2018 15:51:30 +0200
Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:

> [Resending so everyone is getting the cover letter]
> 
> On very large system we could see soft lockup fired when a process is exiting
> 
> watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
> Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
> CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
> NIP:  c0000000000b995c LR: c0000000000b8f64 CTR: 000000000000aa18
> REGS: c00006b0645b7610 TRAP: 0901   Not tainted  (4.17.0)
> MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22042082  XER: 00000000
> CFAR: 00000000006cf8f0 SOFTE: 0 
> GPR00: 0010000000000000 c00006b0645b7890 c000000000f99200 0000000000000000 
> GPR04: 8e000001a5a4de58 400249cf1bfd5480 8e000001a5a4de50 400249cf1bfd5480 
> GPR08: 8e000001a5a4de48 400249cf1bfd5480 8e000001a5a4de40 400249cf1bfd5480 
> GPR12: ffffffffffffffff c00000001e690800 
> NIP [c0000000000b995c] plpar_hcall9+0x44/0x7c
> LR [c0000000000b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
> Call Trace:
> [c00006b0645b7890] [8e000001a5a4dd20] 0x8e000001a5a4dd20 (unreliable)
> [c00006b0645b7a00] [c00000000006d5b0] flush_hash_range+0x60/0x110
> [c00006b0645b7a50] [c000000000072a2c] __flush_tlb_pending+0x4c/0xd0
> [c00006b0645b7a80] [c0000000002eaf44] unmap_page_range+0x984/0xbd0
> [c00006b0645b7bc0] [c0000000002eb594] unmap_vmas+0x84/0x100
> [c00006b0645b7c10] [c0000000002f8afc] exit_mmap+0xac/0x1f0
> [c00006b0645b7cd0] [c0000000000f2638] mmput+0x98/0x1b0
> [c00006b0645b7d00] [c0000000000fc9d0] do_exit+0x330/0xc00
> [c00006b0645b7dc0] [c0000000000fd384] do_group_exit+0x64/0x100
> [c00006b0645b7e00] [c0000000000fd44c] sys_exit_group+0x2c/0x30
> [c00006b0645b7e30] [c00000000000b960] system_call+0x58/0x6c
> Instruction dump:
> 60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378 
> e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008
> 
> This happens when removing the PTE by calling the hypervisor using the
> H_BULK_REMOVE call. This call is processing up to 4 PTEs but is doing a
> tlbie for each PTE it is processing. This could lead to long time spent in
> the hypervisor (sometimes up to 4s) and soft lockup being raised because
> the scheduler is not called in zap_pte_range().
> 
> Since the Power7's time, the hypervisor is providing a new hcall
> H_BLOCK_REMOVE allowing processing up to 8 PTEs with one call to
> tlbie. By limiting the amount of tlbie generated, this reduces the time
> spent invalidating the PTEs.

Oh that's a nice feature. I must have an ancient PAPR because I don't
have it. It could be a good project for someone to implement it in KVM
too.

> 
> This hcall requires that the pages are "all within the same naturally
> aligned 8 page virtual address block".
> 
> With this patch series applied, I couldn't see any soft lockup raised on
> the victim LPAR I was running the test one.
> 
> This series is covering both normal pages and huge pages.

Really nice, thanks for working on the problem.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE
  2018-08-01  1:55 ` [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE Nicholas Piggin
@ 2018-08-03  2:29   ` Michael Ellerman
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Ellerman @ 2018-08-03  2:29 UTC (permalink / raw)
  To: Nicholas Piggin, Laurent Dufour
  Cc: linuxppc-dev, linux-kernel, aneesh.kumar, benh, paulus

Nicholas Piggin <npiggin@gmail.com> writes:
> On Fri, 27 Jul 2018 15:51:30 +0200
> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>> [Resending so everyone is getting the cover letter]
>> On very large system we could see soft lockup fired when a process is exiting
>> 
>> watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
>> Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
>> CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
>> NIP:  c0000000000b995c LR: c0000000000b8f64 CTR: 000000000000aa18
>> REGS: c00006b0645b7610 TRAP: 0901   Not tainted  (4.17.0)
>> MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22042082  XER: 00000000
>> CFAR: 00000000006cf8f0 SOFTE: 0 
>> GPR00: 0010000000000000 c00006b0645b7890 c000000000f99200 0000000000000000 
>> GPR04: 8e000001a5a4de58 400249cf1bfd5480 8e000001a5a4de50 400249cf1bfd5480 
>> GPR08: 8e000001a5a4de48 400249cf1bfd5480 8e000001a5a4de40 400249cf1bfd5480 
>> GPR12: ffffffffffffffff c00000001e690800 
>> NIP [c0000000000b995c] plpar_hcall9+0x44/0x7c
>> LR [c0000000000b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
>> Call Trace:
>> [c00006b0645b7890] [8e000001a5a4dd20] 0x8e000001a5a4dd20 (unreliable)
>> [c00006b0645b7a00] [c00000000006d5b0] flush_hash_range+0x60/0x110
>> [c00006b0645b7a50] [c000000000072a2c] __flush_tlb_pending+0x4c/0xd0
>> [c00006b0645b7a80] [c0000000002eaf44] unmap_page_range+0x984/0xbd0
>> [c00006b0645b7bc0] [c0000000002eb594] unmap_vmas+0x84/0x100
>> [c00006b0645b7c10] [c0000000002f8afc] exit_mmap+0xac/0x1f0
>> [c00006b0645b7cd0] [c0000000000f2638] mmput+0x98/0x1b0
>> [c00006b0645b7d00] [c0000000000fc9d0] do_exit+0x330/0xc00
>> [c00006b0645b7dc0] [c0000000000fd384] do_group_exit+0x64/0x100
>> [c00006b0645b7e00] [c0000000000fd44c] sys_exit_group+0x2c/0x30
>> [c00006b0645b7e30] [c00000000000b960] system_call+0x58/0x6c
>> Instruction dump:
>> 60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378 
>> e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008
>> 
>> This happens when removing the PTE by calling the hypervisor using the
>> H_BULK_REMOVE call. This call is processing up to 4 PTEs but is doing a
>> tlbie for each PTE it is processing. This could lead to long time spent in
>> the hypervisor (sometimes up to 4s) and soft lockup being raised because
>> the scheduler is not called in zap_pte_range().
>> 
>> Since the Power7's time, the hypervisor is providing a new hcall
>> H_BLOCK_REMOVE allowing processing up to 8 PTEs with one call to
>> tlbie. By limiting the amount of tlbie generated, this reduces the time
>> spent invalidating the PTEs.
>
> Oh that's a nice feature. I must have an ancient PAPR because I don't
> have it.

The only public document is LoPAPR, which is a stripped down version of
the 2012 PAPR.

cheers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE
  2018-07-30 13:47   ` Michael Ellerman
  2018-07-30 14:22     ` Aneesh Kumar K.V
@ 2018-08-16  9:41     ` Laurent Dufour
  1 sibling, 0 replies; 13+ messages in thread
From: Laurent Dufour @ 2018-08-16  9:41 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev, linux-kernel
  Cc: aneesh.kumar, benh, paulus, npiggin

On 30/07/2018 15:47, Michael Ellerman wrote:
> Hi Laurent,
> 
> Just one comment below.
> 
> Laurent Dufour <ldufour@linux.vnet.ibm.com> writes:
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>> index 96b8cd8a802d..41ed03245eb4 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
>>  	BUG_ON(lpar_rc != H_SUCCESS);
>>  }
>>  
>> +
>> +/*
>> + * As defined in the PAPR's section 14.5.4.1.8
>> + * The control mask doesn't include the returned reference and change bit from
>> + * the processed PTE.
>> + */
>> +#define HBLKR_AVPN		0x0100000000000000UL
>> +#define HBLKR_CTRL_MASK		0xf800000000000000UL
>> +#define HBLKR_CTRL_SUCCESS	0x8000000000000000UL
>> +#define HBLKR_CTRL_ERRNOTFOUND	0x8800000000000000UL
>> +#define HBLKR_CTRL_ERRBUSY	0xa000000000000000UL
>> +
>> +/**
>> + * H_BLOCK_REMOVE caller.
>> + * @idx should point to the latest @param entry set with a PTEX.
>> + * If PTE cannot be processed because another CPUs has already locked that
>> + * group, those entries are put back in @param starting at index 1.
>> + * If entries has to be retried and @retry_busy is set to true, these entries
>> + * are retried until success. If @retry_busy is set to false, the returned
>> + * is the number of entries yet to process.
>> + */
>> +static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
>> +				       bool retry_busy)
>> +{
>> +	unsigned long i, rc, new_idx;
>> +	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
>> +
>> +again:
>> +	new_idx = 0;
>> +	BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
> 
> I count 1 ..
> 
>> +	if (idx < PLPAR_HCALL9_BUFSIZE)
>> +		param[idx] = HBR_END;
>> +
>> +	rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
>> +			  param[0], /* AVA */
>> +			  param[1],  param[2],  param[3],  param[4], /* TS0-7 */
>> +			  param[5],  param[6],  param[7],  param[8]);
>> +	if (rc == H_SUCCESS)
>> +		return 0;
>> +
>> +	BUG_ON(rc != H_PARTIAL);
> 
> 2 ...
> 
>> +	/* Check that the unprocessed entries were 'not found' or 'busy' */
>> +	for (i = 0; i < idx-1; i++) {
>> +		unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
>> +
>> +		if (ctrl == HBLKR_CTRL_ERRBUSY) {
>> +			param[++new_idx] = param[i+1];
>> +			continue;
>> +		}
>> +
>> +		BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
>> +		       && ctrl != HBLKR_CTRL_ERRNOTFOUND);
> 
> 3 ...
> 
> BUG_ON()s.
> 
> I know the code in this file is already pretty liberal with the use of
> BUG_ON() but I'd prefer if we don't make it any worse.

The first one is clearly not required. But I would keep the following twos
because this call is not expected to fail except if there is a discrepancy
between the linux kernel HASH views and the hypervisor's one, which could be
dramatic in the consequences.

> 
> Given this is an optimisation it seems like we should be able to fall
> back to the existing implementation in the case of error (which will
> probably then BUG_ON() 😂)

I don't think falling back to the H_BULK call will be helpfull since it is
doing the same so the same errors are expected. Furthermore, this hcall can do
a partial work which means complex code to fallback on H_BULK as we should
identify to already processed entries.

> If there's some reason we can't then I guess I can live with it.

I'm proposing to send a new series with _only_ 2 calls to BUG_ON().

Furthermore this patch is not correct on the way the huge pages are managed. I
was too hurry to push it last time.

Cheers,
Laurent.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE
  2018-07-30 14:22     ` Aneesh Kumar K.V
@ 2018-08-16 17:27       ` Laurent Dufour
  0 siblings, 0 replies; 13+ messages in thread
From: Laurent Dufour @ 2018-08-16 17:27 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Michael Ellerman, linuxppc-dev, linux-kernel
  Cc: benh, paulus, npiggin

On 30/07/2018 16:22, Aneesh Kumar K.V wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
> 
>> Hi Laurent,
>>
>> Just one comment below.
>>
>> Laurent Dufour <ldufour@linux.vnet.ibm.com> writes:
>>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>>> index 96b8cd8a802d..41ed03245eb4 100644
>>> --- a/arch/powerpc/platforms/pseries/lpar.c
>>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>>> @@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
>>>  	BUG_ON(lpar_rc != H_SUCCESS);
>>>  }
>>>  
>>> +
>>> +/*
>>> + * As defined in the PAPR's section 14.5.4.1.8
>>> + * The control mask doesn't include the returned reference and change bit from
>>> + * the processed PTE.
>>> + */
>>> +#define HBLKR_AVPN		0x0100000000000000UL
>>> +#define HBLKR_CTRL_MASK		0xf800000000000000UL
>>> +#define HBLKR_CTRL_SUCCESS	0x8000000000000000UL
>>> +#define HBLKR_CTRL_ERRNOTFOUND	0x8800000000000000UL
>>> +#define HBLKR_CTRL_ERRBUSY	0xa000000000000000UL
>>> +
>>> +/**
>>> + * H_BLOCK_REMOVE caller.
>>> + * @idx should point to the latest @param entry set with a PTEX.
>>> + * If PTE cannot be processed because another CPUs has already locked that
>>> + * group, those entries are put back in @param starting at index 1.
>>> + * If entries has to be retried and @retry_busy is set to true, these entries
>>> + * are retried until success. If @retry_busy is set to false, the returned
>>> + * is the number of entries yet to process.
>>> + */
>>> +static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
>>> +				       bool retry_busy)
>>> +{
>>> +	unsigned long i, rc, new_idx;
>>> +	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
>>> +
>>> +again:
>>> +	new_idx = 0;
>>> +	BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
>>
>> I count 1 ..
>>
>>> +	if (idx < PLPAR_HCALL9_BUFSIZE)
>>> +		param[idx] = HBR_END;
>>> +
>>> +	rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
>>> +			  param[0], /* AVA */
>>> +			  param[1],  param[2],  param[3],  param[4], /* TS0-7 */
>>> +			  param[5],  param[6],  param[7],  param[8]);
>>> +	if (rc == H_SUCCESS)
>>> +		return 0;
>>> +
>>> +	BUG_ON(rc != H_PARTIAL);
>>
>> 2 ...
>>
>>> +	/* Check that the unprocessed entries were 'not found' or 'busy' */
>>> +	for (i = 0; i < idx-1; i++) {
>>> +		unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
>>> +
>>> +		if (ctrl == HBLKR_CTRL_ERRBUSY) {
>>> +			param[++new_idx] = param[i+1];
>>> +			continue;
>>> +		}
>>> +
>>> +		BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
>>> +		       && ctrl != HBLKR_CTRL_ERRNOTFOUND);
>>
>> 3 ...
>>
>> BUG_ON()s.
>>
>> I know the code in this file is already pretty liberal with the use of
>> BUG_ON() but I'd prefer if we don't make it any worse.
>>
>> Given this is an optimisation it seems like we should be able to fall
>> back to the existing implementation in the case of error (which will
>> probably then BUG_ON() 😂)
>>
>> If there's some reason we can't then I guess I can live with it.
> 
> It would be nice to log the error in case we are not expecting the
> error return. We recently did
> https://marc.info/?i=20180629083904.29250-1-aneesh.kumar@linux.ibm.com

I'm not sure that a failure during an invalidation should just result in an
error message being displayed because the page remains accessible and could
potentially be accessed later.
A comment in the caller hash__tlb_flush(), is quite explicit about that:
	/* If there's a TLB batch pending, then we must flush it because the
	 * pages are going to be freed and we really don't want to have a CPU
	 * access a freed page because it has a stale TLB
	 */

Getting an error when adding an entry may not be fatal but when removing one,
this could lead to data being exposed.

Laurent.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE
  2018-07-27 13:22 Laurent Dufour
@ 2018-07-27 13:22 ` Laurent Dufour
  0 siblings, 0 replies; 13+ messages in thread
From: Laurent Dufour @ 2018-07-27 13:22 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: Aneesh Kumar K.V, Nicholas Piggin, Michael Ellerman,
	Paul Mackerras, Benjamin Herrenschmidt

This hypervisor call allows to remove up to 8 ptes with only call to tlbie.

The virtual pages must be all within the same naturally aligned 8 page
virtual address block and have the same page and segment size encodings.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h     |   1 +
 arch/powerpc/platforms/pseries/lpar.c | 223 +++++++++++++++++++++++++++++++---
 2 files changed, 205 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 662c8347d699..e403d574651d 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -278,6 +278,7 @@
 #define H_COP			0x304
 #define H_GET_MPP_X		0x314
 #define H_SET_MODE		0x31C
+#define H_BLOCK_REMOVE		0x328
 #define H_CLEAR_HPT		0x358
 #define H_REQUEST_VMC		0x360
 #define H_RESIZE_HPT_PREPARE	0x36C
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 96b8cd8a802d..41ed03245eb4 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -418,6 +418,73 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
 	BUG_ON(lpar_rc != H_SUCCESS);
 }
 
+
+/*
+ * As defined in the PAPR's section 14.5.4.1.8
+ * The control mask doesn't include the returned reference and change bit from
+ * the processed PTE.
+ */
+#define HBLKR_AVPN		0x0100000000000000UL
+#define HBLKR_CTRL_MASK		0xf800000000000000UL
+#define HBLKR_CTRL_SUCCESS	0x8000000000000000UL
+#define HBLKR_CTRL_ERRNOTFOUND	0x8800000000000000UL
+#define HBLKR_CTRL_ERRBUSY	0xa000000000000000UL
+
+/**
+ * H_BLOCK_REMOVE caller.
+ * @idx should point to the latest @param entry set with a PTEX.
+ * If PTE cannot be processed because another CPUs has already locked that
+ * group, those entries are put back in @param starting at index 1.
+ * If entries has to be retried and @retry_busy is set to true, these entries
+ * are retried until success. If @retry_busy is set to false, the returned
+ * is the number of entries yet to process.
+ */
+static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
+				       bool retry_busy)
+{
+	unsigned long i, rc, new_idx;
+	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+
+again:
+	new_idx = 0;
+	BUG_ON((idx < 2) || (idx > PLPAR_HCALL9_BUFSIZE));
+	if (idx < PLPAR_HCALL9_BUFSIZE)
+		param[idx] = HBR_END;
+
+	rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
+			  param[0], /* AVA */
+			  param[1],  param[2],  param[3],  param[4], /* TS0-7 */
+			  param[5],  param[6],  param[7],  param[8]);
+	if (rc == H_SUCCESS)
+		return 0;
+
+	BUG_ON(rc != H_PARTIAL);
+
+	/* Check that the unprocessed entries were 'not found' or 'busy' */
+	for (i = 0; i < idx-1; i++) {
+		unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
+
+		if (ctrl == HBLKR_CTRL_ERRBUSY) {
+			param[++new_idx] = param[i+1];
+			continue;
+		}
+
+		BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
+		       && ctrl != HBLKR_CTRL_ERRNOTFOUND);
+	}
+
+	/*
+	 * If there were entries found busy, retry these entries if requested,
+	 * of if all the entries have to be retried.
+	 */
+	if (new_idx && (retry_busy || new_idx == (PLPAR_HCALL9_BUFSIZE-1))) {
+		idx = new_idx + 1;
+		goto again;
+	}
+
+	return new_idx;
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need
@@ -425,17 +492,59 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
  */
 #define PPC64_HUGE_HPTE_BATCH 12
 
-static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
-					     unsigned long *vpn, int count,
-					     int psize, int ssize)
+static void hugepage_block_invalidate(unsigned long *slot, unsigned long *vpn,
+				      int count, int psize, int ssize)
 {
 	unsigned long param[PLPAR_HCALL9_BUFSIZE];
-	int i = 0, pix = 0, rc;
-	unsigned long flags = 0;
-	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+	unsigned long shift, current_vpgb, vpgb;
+	int i, pix = 0;
 
-	if (lock_tlbie)
-		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+	shift = mmu_psize_defs[psize].shift;
+
+	for (i = 0; i < count; i++) {
+		/*
+		 * Shifting 3 bits more on the right to get a
+		 * 8 pages aligned virtual addresse.
+		 */
+		vpgb = (vpn[i] >> (shift - VPN_SHIFT + 3));
+		if (!pix || vpgb != current_vpgb) {
+			/*
+			 * Need to start a new 8 pages block, flush
+			 * the current one if needed.
+			 */
+			if (pix)
+				(void)call_block_remove(pix, param, true);
+			current_vpgb = vpgb;
+			param[0] = hpte_encode_avpn(vpn[i], psize, ssize);
+			pix = 1;
+		}
+
+		param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot[i];
+		if (pix == PLPAR_HCALL9_BUFSIZE) {
+			pix = call_block_remove(pix, param, false);
+			/*
+			 * pix = 0 means that all the entries were
+			 * removed, we can start a new block.
+			 * Otherwise, this means that there are entries
+			 * to retry, and pix points to latest one, so
+			 * we should increment it and try to continue
+			 * the same block.
+			 */
+			if (!pix)
+				current_vpgb = 0;
+			else
+				pix++;
+		}
+	}
+	if (pix)
+		(void)call_block_remove(pix, param, true);
+}
+
+static void hugepage_bulk_invalidate(unsigned long *slot, unsigned long *vpn,
+				     int count, int psize, int ssize)
+{
+	unsigned long param[PLPAR_HCALL9_BUFSIZE];
+	int i = 0, pix = 0, rc;
 
 	for (i = 0; i < count; i++) {
 
@@ -443,17 +552,6 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
 			pSeries_lpar_hpte_invalidate(slot[i], vpn[i], psize, 0,
 						     ssize, 0);
 		} else {
-			param[pix] = HBR_REQUEST | HBR_AVPN | slot[i];
-			param[pix+1] = hpte_encode_avpn(vpn[i], psize, ssize);
-			pix += 2;
-			if (pix == 8) {
-				rc = plpar_hcall9(H_BULK_REMOVE, param,
-						  param[0], param[1], param[2],
-						  param[3], param[4], param[5],
-						  param[6], param[7]);
-				BUG_ON(rc != H_SUCCESS);
-				pix = 0;
-			}
 		}
 	}
 	if (pix) {
@@ -463,6 +561,23 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
 				  param[6], param[7]);
 		BUG_ON(rc != H_SUCCESS);
 	}
+}
+
+static inline void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
+						      unsigned long *vpn,
+						      int count, int psize,
+						      int ssize)
+{
+	unsigned long flags = 0;
+	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+
+	if (lock_tlbie)
+		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+
+	if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
+		hugepage_block_invalidate(slot, vpn, count, psize, ssize);
+	else
+		hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);
 
 	if (lock_tlbie)
 		spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
@@ -565,6 +680,70 @@ static inline unsigned long compute_slot(real_pte_t pte,
 	return slot;
 }
 
+/**
+ * The hcall H_BLOCK_REMOVE implies that the virtual pages to processed are
+ * "all within the same naturally aligned 8 page virtual address block".
+ */
+static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
+			    unsigned long *param)
+{
+	unsigned long vpn;
+	unsigned long i, pix = 0;
+	unsigned long index, shift, slot, current_vpgb, vpgb;
+	real_pte_t pte;
+	int psize, ssize;
+
+	psize = batch->psize;
+	ssize = batch->ssize;
+
+	for (i = 0; i < number; i++) {
+		vpn = batch->vpn[i];
+		pte = batch->pte[i];
+		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+			/*
+			 * Shifting 3 bits more on the right to get a
+			 * 8 pages aligned virtual addresse.
+			 */
+			vpgb = (vpn >> (shift - VPN_SHIFT + 3));
+			if (!pix || vpgb != current_vpgb) {
+				/*
+				 * Need to start a new 8 pages block, flush
+				 * the current one if needed.
+				 */
+				if (pix)
+					(void)call_block_remove(pix, param,
+								true);
+				current_vpgb = vpgb;
+				param[0] = hpte_encode_avpn(vpn, psize,
+							    ssize);
+				pix = 1;
+			}
+
+			slot = compute_slot(pte, vpn, index, shift, ssize);
+			param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot;
+
+			if (pix == PLPAR_HCALL9_BUFSIZE) {
+				pix = call_block_remove(pix, param, false);
+				/*
+				 * pix = 0 means that all the entries were
+				 * removed, we can start a new block.
+				 * Otherwise, this means that there are entries
+				 * to retry, and pix points to latest one, so
+				 * we should increment it and try to continue
+				 * the same block.
+				 */
+				if (!pix)
+					current_vpgb = 0;
+				else
+					pix++;
+			}
+		} pte_iterate_hashed_end();
+	}
+
+	if (pix > 1)
+		(void)call_block_remove(pix, param, true);
+}
+
 /*
  * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
  * lock.
@@ -584,6 +763,11 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	if (lock_tlbie)
 		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
 
+	if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
+		do_block_remove(number, batch, param);
+		goto out;
+	}
+
 	psize = batch->psize;
 	ssize = batch->ssize;
 	pix = 0;
@@ -622,6 +806,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 		BUG_ON(rc != H_SUCCESS);
 	}
 
+out:
 	if (lock_tlbie)
 		spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-08-16 17:28 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-27 13:51 [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour
2018-07-27 13:51 ` [PATCH 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE Laurent Dufour
2018-07-30 14:18   ` Aneesh Kumar K.V
2018-07-27 13:51 ` [PATCH 2/3] powerpc/pseries/mm: factorize PTE slot computation Laurent Dufour
2018-07-30 14:19   ` Aneesh Kumar K.V
2018-07-27 13:51 ` [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE Laurent Dufour
2018-07-30 13:47   ` Michael Ellerman
2018-07-30 14:22     ` Aneesh Kumar K.V
2018-08-16 17:27       ` Laurent Dufour
2018-08-16  9:41     ` Laurent Dufour
2018-08-01  1:55 ` [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE Nicholas Piggin
2018-08-03  2:29   ` Michael Ellerman
  -- strict thread matches above, loose matches on Subject: below --
2018-07-27 13:22 Laurent Dufour
2018-07-27 13:22 ` [PATCH 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE Laurent Dufour

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).