linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] powerpc/pseries: use H_BLOCK_REMOVE
@ 2018-08-20 14:29 Laurent Dufour
  2018-08-20 14:29 ` [PATCH v2 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE Laurent Dufour
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Laurent Dufour @ 2018-08-20 14:29 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, mpe, benh, paulus, npiggin

On very large system we could see soft lockup fired when a process is
exiting

watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
NIP:  c0000000000b995c LR: c0000000000b8f64 CTR: 000000000000aa18
REGS: c00006b0645b7610 TRAP: 0901   Not tainted  (4.17.0)
MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22042082  XER: 00000000
CFAR: 00000000006cf8f0 SOFTE: 0 
GPR00: 0010000000000000 c00006b0645b7890 c000000000f99200 0000000000000000 
GPR04: 8e000001a5a4de58 400249cf1bfd5480 8e000001a5a4de50 400249cf1bfd5480 
GPR08: 8e000001a5a4de48 400249cf1bfd5480 8e000001a5a4de40 400249cf1bfd5480 
GPR12: ffffffffffffffff c00000001e690800 
NIP [c0000000000b995c] plpar_hcall9+0x44/0x7c
LR [c0000000000b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
Call Trace:
[c00006b0645b7890] [8e000001a5a4dd20] 0x8e000001a5a4dd20 (unreliable)
[c00006b0645b7a00] [c00000000006d5b0] flush_hash_range+0x60/0x110
[c00006b0645b7a50] [c000000000072a2c] __flush_tlb_pending+0x4c/0xd0
[c00006b0645b7a80] [c0000000002eaf44] unmap_page_range+0x984/0xbd0
[c00006b0645b7bc0] [c0000000002eb594] unmap_vmas+0x84/0x100
[c00006b0645b7c10] [c0000000002f8afc] exit_mmap+0xac/0x1f0
[c00006b0645b7cd0] [c0000000000f2638] mmput+0x98/0x1b0
[c00006b0645b7d00] [c0000000000fc9d0] do_exit+0x330/0xc00
[c00006b0645b7dc0] [c0000000000fd384] do_group_exit+0x64/0x100
[c00006b0645b7e00] [c0000000000fd44c] sys_exit_group+0x2c/0x30
[c00006b0645b7e30] [c00000000000b960] system_call+0x58/0x6c
Instruction dump:
60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378 
e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008

This happens when removing the PTE by calling the hypervisor using the
H_BULK_REMOVE call. This call is processing up to 4 PTEs but is doing a
tlbie for each PTE it is processing. This could lead to long time spent in
the hypervisor (sometimes up to 4s) and soft lockup being raised because
the scheduler is not called in zap_pte_range().

Since the Power7's time, the hypervisor is providing a new hcall
H_BLOCK_REMOVE allowing processing up to 8 PTEs with one call to
tlbie. By limiting the amount of tlbie generated, this reduces the time
spent invalidating the PTEs.

This hcall requires that the pages are "all within the same naturally
aligned 8 page virtual address block".

With this patch series applied, I couldn't see any soft lockup raised on
the victim LPAR I was running the test one.

Changes since V1:
- Remove a call to BUG_ON() in call_block_remove() since this one can be
  handled gently.
- Remove uneeded of current_vpgb to 0 when retrying entries in
  hugepage_block_invalidate() and do_block_remove().

Laurent Dufour (3):
  powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
  powerpc/pseries/mm: factorize PTE slot computation
  powerpc/pseries/mm: call H_BLOCK_REMOVE

 arch/powerpc/include/asm/firmware.h       |   3 +-
 arch/powerpc/include/asm/hvcall.h         |   1 +
 arch/powerpc/platforms/pseries/firmware.c |   1 +
 arch/powerpc/platforms/pseries/lpar.c     | 241 ++++++++++++++++++++++++++++--
 4 files changed, 230 insertions(+), 16 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
  2018-08-20 14:29 [PATCH v2 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour
@ 2018-08-20 14:29 ` Laurent Dufour
  2018-09-20  4:20   ` [v2,1/3] " Michael Ellerman
  2018-08-20 14:29 ` [PATCH v2 2/3] powerpc/pseries/mm: factorize PTE slot computation Laurent Dufour
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Laurent Dufour @ 2018-08-20 14:29 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, mpe, benh, paulus, npiggin

This feature tells if the hcall H_BLOCK_REMOVE is available.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/firmware.h       | 3 ++-
 arch/powerpc/platforms/pseries/firmware.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 7a051bd21f87..2aca2655fe30 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -52,6 +52,7 @@
 #define FW_FEATURE_PRRN		ASM_CONST(0x0000000200000000)
 #define FW_FEATURE_DRMEM_V2	ASM_CONST(0x0000000400000000)
 #define FW_FEATURE_DRC_INFO	ASM_CONST(0x0000000800000000)
+#define FW_FEATURE_BLOCK_REMOVE ASM_CONST(0x0000001000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -69,7 +70,7 @@ enum {
 		FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
 		FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
 		FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
-		FW_FEATURE_DRC_INFO,
+		FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE,
 	FW_FEATURE_PSERIES_ALWAYS = 0,
 	FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL,
 	FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c
index a3bbeb43689e..1624501386f4 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -65,6 +65,7 @@ hypertas_fw_features_table[] = {
 	{FW_FEATURE_SET_MODE,		"hcall-set-mode"},
 	{FW_FEATURE_BEST_ENERGY,	"hcall-best-energy-1*"},
 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
+	{FW_FEATURE_BLOCK_REMOVE,	"hcall-block-remove"},
 };
 
 /* Build up the firmware features bitmask using the contents of
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/3] powerpc/pseries/mm: factorize PTE slot computation
  2018-08-20 14:29 [PATCH v2 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour
  2018-08-20 14:29 ` [PATCH v2 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE Laurent Dufour
@ 2018-08-20 14:29 ` Laurent Dufour
  2018-08-20 14:29 ` [PATCH v2 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE Laurent Dufour
  2018-09-10 13:38 ` [PATCH v2 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour
  3 siblings, 0 replies; 6+ messages in thread
From: Laurent Dufour @ 2018-08-20 14:29 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, mpe, benh, paulus, npiggin

This part of code will be called also when dealing with H_BLOCK_REMOVE.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/lpar.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index d3992ced0782..ebc852e3607d 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -546,6 +546,24 @@ static int pSeries_lpar_hpte_removebolted(unsigned long ea,
 	return 0;
 }
 
+
+static inline unsigned long compute_slot(real_pte_t pte,
+					 unsigned long vpn,
+					 unsigned long index,
+					 unsigned long shift,
+					 int ssize)
+{
+	unsigned long slot, hash, hidx;
+
+	hash = hpt_hash(vpn, shift, ssize);
+	hidx = __rpte_to_hidx(pte, index);
+	if (hidx & _PTEIDX_SECONDARY)
+		hash = ~hash;
+	slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+	slot += hidx & _PTEIDX_GROUP_IX;
+	return slot;
+}
+
 /*
  * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
  * lock.
@@ -558,7 +576,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	struct ppc64_tlb_batch *batch = this_cpu_ptr(&ppc64_tlb_batch);
 	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
 	unsigned long param[PLPAR_HCALL9_BUFSIZE];
-	unsigned long hash, index, shift, hidx, slot;
+	unsigned long index, shift, slot;
 	real_pte_t pte;
 	int psize, ssize;
 
@@ -572,12 +590,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 		vpn = batch->vpn[i];
 		pte = batch->pte[i];
 		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
-			hash = hpt_hash(vpn, shift, ssize);
-			hidx = __rpte_to_hidx(pte, index);
-			if (hidx & _PTEIDX_SECONDARY)
-				hash = ~hash;
-			slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
-			slot += hidx & _PTEIDX_GROUP_IX;
+			slot = compute_slot(pte, vpn, index, shift, ssize);
 			if (!firmware_has_feature(FW_FEATURE_BULK_REMOVE)) {
 				/*
 				 * lpar doesn't use the passed actual page size
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE
  2018-08-20 14:29 [PATCH v2 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour
  2018-08-20 14:29 ` [PATCH v2 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE Laurent Dufour
  2018-08-20 14:29 ` [PATCH v2 2/3] powerpc/pseries/mm: factorize PTE slot computation Laurent Dufour
@ 2018-08-20 14:29 ` Laurent Dufour
  2018-09-10 13:38 ` [PATCH v2 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour
  3 siblings, 0 replies; 6+ messages in thread
From: Laurent Dufour @ 2018-08-20 14:29 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, mpe, benh, paulus, npiggin

This hypervisor's call allows to remove up to 8 ptes with only call to
tlbie.

The virtual pages must be all within the same naturally aligned 8 pages
virtual address block and have the same page and segment size encodings.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h     |   1 +
 arch/powerpc/platforms/pseries/lpar.c | 214 ++++++++++++++++++++++++++++++++--
 2 files changed, 207 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index a0b17f9f1ea4..c349d3960d63 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -278,6 +278,7 @@
 #define H_COP			0x304
 #define H_GET_MPP_X		0x314
 #define H_SET_MODE		0x31C
+#define H_BLOCK_REMOVE		0x328
 #define H_CLEAR_HPT		0x358
 #define H_REQUEST_VMC		0x360
 #define H_RESIZE_HPT_PREPARE	0x36C
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index ebc852e3607d..0b5081085a44 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -417,6 +417,79 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
 	BUG_ON(lpar_rc != H_SUCCESS);
 }
 
+
+/*
+ * As defined in the PAPR's section 14.5.4.1.8
+ * The control mask doesn't include the returned reference and change bit from
+ * the processed PTE.
+ */
+#define HBLKR_AVPN		0x0100000000000000UL
+#define HBLKR_CTRL_MASK		0xf800000000000000UL
+#define HBLKR_CTRL_SUCCESS	0x8000000000000000UL
+#define HBLKR_CTRL_ERRNOTFOUND	0x8800000000000000UL
+#define HBLKR_CTRL_ERRBUSY	0xa000000000000000UL
+
+/**
+ * H_BLOCK_REMOVE caller.
+ * @idx should point to the latest @param entry set with a PTEX.
+ * If PTE cannot be processed because another CPUs has already locked that
+ * group, those entries are put back in @param starting at index 1.
+ * If entries has to be retried and @retry_busy is set to true, these entries
+ * are retried until success. If @retry_busy is set to false, the returned
+ * is the number of entries yet to process.
+ */
+static unsigned long call_block_remove(unsigned long idx, unsigned long *param,
+				       bool retry_busy)
+{
+	unsigned long i, rc, new_idx;
+	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE];
+
+	if (idx < 2) {
+		pr_warn("Unexpected empty call to H_BLOCK_REMOVE");
+		return 0;
+	}
+again:
+	new_idx = 0;
+	if (idx > PLPAR_HCALL9_BUFSIZE) {
+		pr_err("Too many PTEs (%lu) for H_BLOCK_REMOVE", idx);
+		idx = PLPAR_HCALL9_BUFSIZE;
+	} else if (idx < PLPAR_HCALL9_BUFSIZE)
+		param[idx] = HBR_END;
+
+	rc = plpar_hcall9(H_BLOCK_REMOVE, retbuf,
+			  param[0], /* AVA */
+			  param[1],  param[2],  param[3],  param[4], /* TS0-7 */
+			  param[5],  param[6],  param[7],  param[8]);
+	if (rc == H_SUCCESS)
+		return 0;
+
+	BUG_ON(rc != H_PARTIAL);
+
+	/* Check that the unprocessed entries were 'not found' or 'busy' */
+	for (i = 0; i < idx-1; i++) {
+		unsigned long ctrl = retbuf[i] & HBLKR_CTRL_MASK;
+
+		if (ctrl == HBLKR_CTRL_ERRBUSY) {
+			param[++new_idx] = param[i+1];
+			continue;
+		}
+
+		BUG_ON(ctrl != HBLKR_CTRL_SUCCESS
+		       && ctrl != HBLKR_CTRL_ERRNOTFOUND);
+	}
+
+	/*
+	 * If there were entries found busy, retry these entries if requested,
+	 * of if all the entries have to be retried.
+	 */
+	if (new_idx && (retry_busy || new_idx == (PLPAR_HCALL9_BUFSIZE-1))) {
+		idx = new_idx + 1;
+		goto again;
+	}
+
+	return new_idx;
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
  * Limit iterations holding pSeries_lpar_tlbie_lock to 3. We also need
@@ -424,17 +497,57 @@ static void pSeries_lpar_hpte_invalidate(unsigned long slot, unsigned long vpn,
  */
 #define PPC64_HUGE_HPTE_BATCH 12
 
-static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
-					     unsigned long *vpn, int count,
-					     int psize, int ssize)
+static void hugepage_block_invalidate(unsigned long *slot, unsigned long *vpn,
+				      int count, int psize, int ssize)
 {
 	unsigned long param[PLPAR_HCALL9_BUFSIZE];
-	int i = 0, pix = 0, rc;
-	unsigned long flags = 0;
-	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+	unsigned long shift, current_vpgb, vpgb;
+	int i, pix = 0;
 
-	if (lock_tlbie)
-		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+	shift = mmu_psize_defs[psize].shift;
+
+	for (i = 0; i < count; i++) {
+		/*
+		 * Shifting 3 bits more on the right to get a
+		 * 8 pages aligned virtual addresse.
+		 */
+		vpgb = (vpn[i] >> (shift - VPN_SHIFT + 3));
+		if (!pix || vpgb != current_vpgb) {
+			/*
+			 * Need to start a new 8 pages block, flush
+			 * the current one if needed.
+			 */
+			if (pix)
+				(void)call_block_remove(pix, param, true);
+			current_vpgb = vpgb;
+			param[0] = hpte_encode_avpn(vpn[i], psize, ssize);
+			pix = 1;
+		}
+
+		param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot[i];
+		if (pix == PLPAR_HCALL9_BUFSIZE) {
+			pix = call_block_remove(pix, param, false);
+			/*
+			 * pix = 0 means that all the entries were
+			 * removed, we can start a new block.
+			 * Otherwise, this means that there are entries
+			 * to retry, and pix points to latest one, so
+			 * we should increment it and try to continue
+			 * the same block.
+			 */
+			if (pix)
+				pix++;
+		}
+	}
+	if (pix)
+		(void)call_block_remove(pix, param, true);
+}
+
+static void hugepage_bulk_invalidate(unsigned long *slot, unsigned long *vpn,
+				     int count, int psize, int ssize)
+{
+	unsigned long param[PLPAR_HCALL9_BUFSIZE];
+	int i = 0, pix = 0, rc;
 
 	for (i = 0; i < count; i++) {
 
@@ -462,6 +575,23 @@ static void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
 				  param[6], param[7]);
 		BUG_ON(rc != H_SUCCESS);
 	}
+}
+
+static inline void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
+						      unsigned long *vpn,
+						      int count, int psize,
+						      int ssize)
+{
+	unsigned long flags = 0;
+	int lock_tlbie = !mmu_has_feature(MMU_FTR_LOCKLESS_TLBIE);
+
+	if (lock_tlbie)
+		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
+
+	if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
+		hugepage_block_invalidate(slot, vpn, count, psize, ssize);
+	else
+		hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);
 
 	if (lock_tlbie)
 		spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
@@ -564,6 +694,68 @@ static inline unsigned long compute_slot(real_pte_t pte,
 	return slot;
 }
 
+/**
+ * The hcall H_BLOCK_REMOVE implies that the virtual pages to processed are
+ * "all within the same naturally aligned 8 page virtual address block".
+ */
+static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
+			    unsigned long *param)
+{
+	unsigned long vpn;
+	unsigned long i, pix = 0;
+	unsigned long index, shift, slot, current_vpgb, vpgb;
+	real_pte_t pte;
+	int psize, ssize;
+
+	psize = batch->psize;
+	ssize = batch->ssize;
+
+	for (i = 0; i < number; i++) {
+		vpn = batch->vpn[i];
+		pte = batch->pte[i];
+		pte_iterate_hashed_subpages(pte, psize, vpn, index, shift) {
+			/*
+			 * Shifting 3 bits more on the right to get a
+			 * 8 pages aligned virtual addresse.
+			 */
+			vpgb = (vpn >> (shift - VPN_SHIFT + 3));
+			if (!pix || vpgb != current_vpgb) {
+				/*
+				 * Need to start a new 8 pages block, flush
+				 * the current one if needed.
+				 */
+				if (pix)
+					(void)call_block_remove(pix, param,
+								true);
+				current_vpgb = vpgb;
+				param[0] = hpte_encode_avpn(vpn, psize,
+							    ssize);
+				pix = 1;
+			}
+
+			slot = compute_slot(pte, vpn, index, shift, ssize);
+			param[pix++] = HBR_REQUEST | HBLKR_AVPN | slot;
+
+			if (pix == PLPAR_HCALL9_BUFSIZE) {
+				pix = call_block_remove(pix, param, false);
+				/*
+				 * pix = 0 means that all the entries were
+				 * removed, we can start a new block.
+				 * Otherwise, this means that there are entries
+				 * to retry, and pix points to latest one, so
+				 * we should increment it and try to continue
+				 * the same block.
+				 */
+				if (pix)
+					pix++;
+			}
+		} pte_iterate_hashed_end();
+	}
+
+	if (pix)
+		(void)call_block_remove(pix, param, true);
+}
+
 /*
  * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
  * lock.
@@ -583,6 +775,11 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	if (lock_tlbie)
 		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
 
+	if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
+		do_block_remove(number, batch, param);
+		goto out;
+	}
+
 	psize = batch->psize;
 	ssize = batch->ssize;
 	pix = 0;
@@ -621,6 +818,7 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 		BUG_ON(rc != H_SUCCESS);
 	}
 
+out:
 	if (lock_tlbie)
 		spin_unlock_irqrestore(&pSeries_lpar_tlbie_lock, flags);
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 0/3] powerpc/pseries: use H_BLOCK_REMOVE
  2018-08-20 14:29 [PATCH v2 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour
                   ` (2 preceding siblings ...)
  2018-08-20 14:29 ` [PATCH v2 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE Laurent Dufour
@ 2018-09-10 13:38 ` Laurent Dufour
  3 siblings, 0 replies; 6+ messages in thread
From: Laurent Dufour @ 2018-09-10 13:38 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel, mpe; +Cc: aneesh.kumar, benh, paulus, npiggin

Hi Michael,

Do you plan to pull it for 4.20 ?

Cheers,
Laurent.

On 20/08/2018 16:29, Laurent Dufour wrote:
> On very large system we could see soft lockup fired when a process is
> exiting
> 
> watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
> Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
> CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
> NIP:  c0000000000b995c LR: c0000000000b8f64 CTR: 000000000000aa18
> REGS: c00006b0645b7610 TRAP: 0901   Not tainted  (4.17.0)
> MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22042082  XER: 00000000
> CFAR: 00000000006cf8f0 SOFTE: 0 
> GPR00: 0010000000000000 c00006b0645b7890 c000000000f99200 0000000000000000 
> GPR04: 8e000001a5a4de58 400249cf1bfd5480 8e000001a5a4de50 400249cf1bfd5480 
> GPR08: 8e000001a5a4de48 400249cf1bfd5480 8e000001a5a4de40 400249cf1bfd5480 
> GPR12: ffffffffffffffff c00000001e690800 
> NIP [c0000000000b995c] plpar_hcall9+0x44/0x7c
> LR [c0000000000b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
> Call Trace:
> [c00006b0645b7890] [8e000001a5a4dd20] 0x8e000001a5a4dd20 (unreliable)
> [c00006b0645b7a00] [c00000000006d5b0] flush_hash_range+0x60/0x110
> [c00006b0645b7a50] [c000000000072a2c] __flush_tlb_pending+0x4c/0xd0
> [c00006b0645b7a80] [c0000000002eaf44] unmap_page_range+0x984/0xbd0
> [c00006b0645b7bc0] [c0000000002eb594] unmap_vmas+0x84/0x100
> [c00006b0645b7c10] [c0000000002f8afc] exit_mmap+0xac/0x1f0
> [c00006b0645b7cd0] [c0000000000f2638] mmput+0x98/0x1b0
> [c00006b0645b7d00] [c0000000000fc9d0] do_exit+0x330/0xc00
> [c00006b0645b7dc0] [c0000000000fd384] do_group_exit+0x64/0x100
> [c00006b0645b7e00] [c0000000000fd44c] sys_exit_group+0x2c/0x30
> [c00006b0645b7e30] [c00000000000b960] system_call+0x58/0x6c
> Instruction dump:
> 60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378 
> e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008
> 
> This happens when removing the PTE by calling the hypervisor using the
> H_BULK_REMOVE call. This call is processing up to 4 PTEs but is doing a
> tlbie for each PTE it is processing. This could lead to long time spent in
> the hypervisor (sometimes up to 4s) and soft lockup being raised because
> the scheduler is not called in zap_pte_range().
> 
> Since the Power7's time, the hypervisor is providing a new hcall
> H_BLOCK_REMOVE allowing processing up to 8 PTEs with one call to
> tlbie. By limiting the amount of tlbie generated, this reduces the time
> spent invalidating the PTEs.
> 
> This hcall requires that the pages are "all within the same naturally
> aligned 8 page virtual address block".
> 
> With this patch series applied, I couldn't see any soft lockup raised on
> the victim LPAR I was running the test one.
> 
> Changes since V1:
> - Remove a call to BUG_ON() in call_block_remove() since this one can be
>   handled gently.
> - Remove uneeded of current_vpgb to 0 when retrying entries in
>   hugepage_block_invalidate() and do_block_remove().
> 
> Laurent Dufour (3):
>   powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
>   powerpc/pseries/mm: factorize PTE slot computation
>   powerpc/pseries/mm: call H_BLOCK_REMOVE
> 
>  arch/powerpc/include/asm/firmware.h       |   3 +-
>  arch/powerpc/include/asm/hvcall.h         |   1 +
>  arch/powerpc/platforms/pseries/firmware.c |   1 +
>  arch/powerpc/platforms/pseries/lpar.c     | 241 ++++++++++++++++++++++++++++--
>  4 files changed, 230 insertions(+), 16 deletions(-)
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [v2,1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE
  2018-08-20 14:29 ` [PATCH v2 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE Laurent Dufour
@ 2018-09-20  4:20   ` Michael Ellerman
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2018-09-20  4:20 UTC (permalink / raw)
  To: Laurent Dufour, linuxppc-dev, linux-kernel; +Cc: aneesh.kumar, npiggin, paulus

On Mon, 2018-08-20 at 14:29:34 UTC, Laurent Dufour wrote:
> This feature tells if the hcall H_BLOCK_REMOVE is available.
> 
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5600fbe340331e2a25d1b277f9c190

cheers

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-09-20  4:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-20 14:29 [PATCH v2 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour
2018-08-20 14:29 ` [PATCH v2 1/3] powerpc/pseries/mm: Introducing FW_FEATURE_BLOCK_REMOVE Laurent Dufour
2018-09-20  4:20   ` [v2,1/3] " Michael Ellerman
2018-08-20 14:29 ` [PATCH v2 2/3] powerpc/pseries/mm: factorize PTE slot computation Laurent Dufour
2018-08-20 14:29 ` [PATCH v2 3/3] powerpc/pseries/mm: call H_BLOCK_REMOVE Laurent Dufour
2018-09-10 13:38 ` [PATCH v2 0/3] powerpc/pseries: use H_BLOCK_REMOVE Laurent Dufour

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).