linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] powerpc/mm: Conditionally call H_BLOCK_REMOVE
@ 2019-08-30 12:07 Laurent Dufour
  2019-08-30 12:07 ` [PATCH 1/3] powerpc/mm: Initialize the HPTE encoding values Laurent Dufour
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Laurent Dufour @ 2019-08-30 12:07 UTC (permalink / raw)
  To: mpe, benh, paulus, aneesh.kumar, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

Since the commit ba2dd8a26baa ("powerpc/pseries/mm: call H_BLOCK_REMOVE"),
the call to H_BLOCK_REMOVE is always done if the feature is exhibited.

On some system, the hypervisor may not support all the combination of
segment base page size and page size. When this happens the hcall is
returning H_PARAM, which is triggering a BUG_ON check leading to a panic.

The PAPR document is specifying a TLB Block Invalidate Characteristics item
detailing which couple base page size, page size the hypervisor is
supporting through H_BLOCK_REMOVE. Furthermore, the characteristics are
also providing the size of the block the hcall could process.

Supporting various block size seems not needed as all systems I was able to
play with was support an 8 addresses block size, which is the maximum
through the hcall. Supporting various size may complexify the algorithm in
call_block_remove() so unless this is required, this is not done.

In the case of block size different from 8, a warning message is displayed
at boot time and that block size will be ignored checking for the
H_BLOCK_REMOVE support.

Due to the minimal amount of hardware showing a limited set of
H_BLOCK_REMOVE supported page size, I don't think there is a need to push
this series to the stable mailing list.

The first patch is initializing the penc values for each page size to an
invalid value to be able to detect those which have been initialized as 0
is a valid value.

The second patch is reading the characteristic through the hcall
ibm,get-system-parameter and record the supported block size for each page
size.

The third patch is changing the check used to detect the H_BLOCK_REMOVE
availability to take care of the base page size and page size couple.

Laurent Dufour (3):
  powerpc/mm: Initialize the HPTE encoding values
  powperc/mm: read TLB Block Invalidate Characteristics
  powerpc/mm: call H_BLOCK_REMOVE when supported

 arch/powerpc/include/asm/book3s/64/mmu.h |   3 +
 arch/powerpc/mm/book3s64/hash_utils.c    |   8 +-
 arch/powerpc/platforms/pseries/lpar.c    | 118 ++++++++++++++++++++++-
 3 files changed, 125 insertions(+), 4 deletions(-)

-- 
2.23.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] powerpc/mm: Initialize the HPTE encoding values
  2019-08-30 12:07 [PATCH 0/3] powerpc/mm: Conditionally call H_BLOCK_REMOVE Laurent Dufour
@ 2019-08-30 12:07 ` Laurent Dufour
  2019-09-12 13:32   ` Aneesh Kumar K.V
  2019-09-12 13:37   ` Aneesh Kumar K.V
  2019-08-30 12:07 ` [PATCH 2/3] powperc/mm: read TLB Block Invalidate Characteristics Laurent Dufour
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 17+ messages in thread
From: Laurent Dufour @ 2019-08-30 12:07 UTC (permalink / raw)
  To: mpe, benh, paulus, aneesh.kumar, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

Before reading the HPTE encoding values we initialize all of them to -1 (an
invalid value) to later being able to detect the initialized ones.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index c3bfef08dcf8..2039bc315459 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -408,7 +408,7 @@ static int __init htab_dt_scan_page_sizes(unsigned long node,
 {
 	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
 	const __be32 *prop;
-	int size = 0;
+	int size = 0, idx, base_idx;
 
 	/* We are scanning "cpu" nodes only */
 	if (type == NULL || strcmp(type, "cpu") != 0)
@@ -418,6 +418,11 @@ static int __init htab_dt_scan_page_sizes(unsigned long node,
 	if (!prop)
 		return 0;
 
+	/* Set all the penc values to invalid */
+	for (base_idx = 0; base_idx < MMU_PAGE_COUNT; base_idx++)
+		for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
+			mmu_psize_defs[base_idx].penc[idx] = -1;
+
 	pr_info("Page sizes from device-tree:\n");
 	size /= 4;
 	cur_cpu_spec->mmu_features &= ~(MMU_FTR_16M_PAGE);
@@ -426,7 +431,6 @@ static int __init htab_dt_scan_page_sizes(unsigned long node,
 		unsigned int slbenc = be32_to_cpu(prop[1]);
 		unsigned int lpnum = be32_to_cpu(prop[2]);
 		struct mmu_psize_def *def;
-		int idx, base_idx;
 
 		size -= 3; prop += 3;
 		base_idx = get_idx_from_shift(base_shift);
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] powperc/mm: read TLB Block Invalidate Characteristics
  2019-08-30 12:07 [PATCH 0/3] powerpc/mm: Conditionally call H_BLOCK_REMOVE Laurent Dufour
  2019-08-30 12:07 ` [PATCH 1/3] powerpc/mm: Initialize the HPTE encoding values Laurent Dufour
@ 2019-08-30 12:07 ` Laurent Dufour
  2019-09-12 14:16   ` Aneesh Kumar K.V
  2019-09-12 14:44   ` Aneesh Kumar K.V
  2019-08-30 12:07 ` [PATCH 3/3] powerpc/mm: call H_BLOCK_REMOVE when supported Laurent Dufour
  2019-09-12 13:44 ` [PATCH 0/3] powerpc/mm: Conditionally call H_BLOCK_REMOVE Aneesh Kumar K.V
  3 siblings, 2 replies; 17+ messages in thread
From: Laurent Dufour @ 2019-08-30 12:07 UTC (permalink / raw)
  To: mpe, benh, paulus, aneesh.kumar, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

The PAPR document specifies the TLB Block Invalidate Characteristics which
is telling which couple base page size / page size is supported by the
H_BLOCK_REMOVE hcall.

A new set of feature is added to the mmu_psize_def structure to record per
base page size which page size is supported by H_BLOCK_REMOVE.

A new init service is added to read the characteristics. The size of the
buffer is set to twice the number of known page size, plus 10 bytes to
ensure we have enough place.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu.h |   3 +
 arch/powerpc/platforms/pseries/lpar.c    | 107 +++++++++++++++++++++++
 2 files changed, 110 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 23b83d3593e2..675895dfe39f 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -12,11 +12,14 @@
  *    sllp  : is a bit mask with the value of SLB L || LP to be or'ed
  *            directly to a slbmte "vsid" value
  *    penc  : is the HPTE encoding mask for the "LP" field:
+ *    hblk  : H_BLOCK_REMOVE supported block size for this page size in
+ *            segment who's base page size is that page size.
  *
  */
 struct mmu_psize_def {
 	unsigned int	shift;	/* number of bits */
 	int		penc[MMU_PAGE_COUNT];	/* HPTE encoding */
+	int		hblk[MMU_PAGE_COUNT];	/* H_BLOCK_REMOVE support */
 	unsigned int	tlbiel;	/* tlbiel supported for that page size */
 	unsigned long	avpnm;	/* bits to mask out in AVPN in the HPTE */
 	union {
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 4f76e5f30c97..375e19b3cf53 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -1311,6 +1311,113 @@ static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
 		(void)call_block_remove(pix, param, true);
 }
 
+static inline void __init set_hblk_bloc_size(int bpsize, int psize,
+					     unsigned int block_size)
+{
+	struct mmu_psize_def *def = &mmu_psize_defs[bpsize];
+
+	if (block_size > def->hblk[psize])
+		def->hblk[psize] = block_size;
+}
+
+static inline void __init check_lp_set_hblk(unsigned int lp,
+					    unsigned int block_size)
+{
+	unsigned int bpsize, psize;
+
+
+	/* First, check the L bit, if not set, this means 4K */
+	if ((lp & 0x80) == 0) {
+		set_hblk_bloc_size(MMU_PAGE_4K, MMU_PAGE_4K, block_size);
+		return;
+	}
+
+	/* PAPR says to look at bits 2-7 (0 = MSB) */
+	lp &= 0x3f;
+	for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++) {
+		struct mmu_psize_def *def =  &mmu_psize_defs[bpsize];
+
+		for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
+			if (def->penc[psize] == lp) {
+				set_hblk_bloc_size(bpsize, psize, block_size);
+				return;
+			}
+		}
+	}
+}
+
+#define SPLPAR_TLB_BIC_TOKEN		50
+#define SPLPAR_TLB_BIC_MAXLENGTH	(MMU_PAGE_COUNT*2 + 10)
+static int __init read_tlbbi_characteristics(void)
+{
+	int call_status;
+	unsigned char local_buffer[SPLPAR_TLB_BIC_MAXLENGTH];
+	int len, idx, bpsize;
+
+	if (!firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
+		pr_info("H_BLOCK_REMOVE is not supported");
+		return 0;
+	}
+
+	memset(local_buffer, 0, SPLPAR_TLB_BIC_MAXLENGTH);
+
+	spin_lock(&rtas_data_buf_lock);
+	memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
+	call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
+				NULL,
+				SPLPAR_TLB_BIC_TOKEN,
+				__pa(rtas_data_buf),
+				RTAS_DATA_BUF_SIZE);
+	memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH);
+	local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] = '\0';
+	spin_unlock(&rtas_data_buf_lock);
+
+	if (call_status != 0) {
+		pr_warn("%s %s Error calling get-system-parameter (0x%x)\n",
+			__FILE__, __func__, call_status);
+		return 0;
+	}
+
+	/*
+	 * The first two (2) bytes of the data in the buffer are the length of
+	 * the returned data, not counting these first two (2) bytes.
+	 */
+	len = local_buffer[0] * 256 + local_buffer[1] + 2;
+	if (len >= SPLPAR_TLB_BIC_MAXLENGTH) {
+		pr_warn("%s too large returned buffer %d", __func__, len);
+		return 0;
+	}
+
+	idx = 2;
+	while (idx < len) {
+		unsigned int block_size = local_buffer[idx++];
+		unsigned int npsize;
+
+		if (!block_size)
+			break;
+
+		block_size = 1 << block_size;
+		if (block_size != 8)
+			/* We only support 8 bytes size TLB invalidate buffer */
+			pr_warn("Unsupported H_BLOCK_REMOVE block size : %d\n",
+				block_size);
+
+		for (npsize = local_buffer[idx++];  npsize > 0; npsize--)
+			check_lp_set_hblk((unsigned int) local_buffer[idx++],
+					  block_size);
+	}
+
+	for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++)
+		for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
+			if (mmu_psize_defs[bpsize].hblk[idx])
+				pr_info("H_BLOCK_REMOVE supports base psize:%d psize:%d block size:%d",
+					bpsize, idx,
+					mmu_psize_defs[bpsize].hblk[idx]);
+
+	return 0;
+}
+machine_arch_initcall(pseries, read_tlbbi_characteristics);
+
 /*
  * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
  * lock.
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] powerpc/mm: call H_BLOCK_REMOVE when supported
  2019-08-30 12:07 [PATCH 0/3] powerpc/mm: Conditionally call H_BLOCK_REMOVE Laurent Dufour
  2019-08-30 12:07 ` [PATCH 1/3] powerpc/mm: Initialize the HPTE encoding values Laurent Dufour
  2019-08-30 12:07 ` [PATCH 2/3] powperc/mm: read TLB Block Invalidate Characteristics Laurent Dufour
@ 2019-08-30 12:07 ` Laurent Dufour
  2019-09-12 14:20   ` Aneesh Kumar K.V
  2019-09-12 13:44 ` [PATCH 0/3] powerpc/mm: Conditionally call H_BLOCK_REMOVE Aneesh Kumar K.V
  3 siblings, 1 reply; 17+ messages in thread
From: Laurent Dufour @ 2019-08-30 12:07 UTC (permalink / raw)
  To: mpe, benh, paulus, aneesh.kumar, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

Instead of calling H_BLOCK_REMOVE all the time when the feature is
exhibited, call this hcall only when the couple base page size, page size
is supported as reported by the TLB Invalidate Characteristics.

For regular pages and hugetlb, the assumption is made that the page size is
equal to the base page size. For THP the page size is assumed to be 16M.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/lpar.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 375e19b3cf53..ef3dbf108a65 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -1143,7 +1143,11 @@ static inline void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
 	if (lock_tlbie)
 		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
 
-	if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
+	/*
+	 * Assuming THP size is 16M, and we only support 8 bytes size buffer
+	 * for the momment.
+	 */
+	if (mmu_psize_defs[psize].hblk[MMU_PAGE_16M] == 8)
 		hugepage_block_invalidate(slot, vpn, count, psize, ssize);
 	else
 		hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);
@@ -1437,7 +1441,10 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
 	if (lock_tlbie)
 		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
 
-	if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
+	/*
+	 * Currently, we only support 8 bytes size buffer in do_block_remove().
+	 */
+	if (mmu_psize_defs[batch->psize].hblk[batch->psize] == 8) {
 		do_block_remove(number, batch, param);
 		goto out;
 	}
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] powerpc/mm: Initialize the HPTE encoding values
  2019-08-30 12:07 ` [PATCH 1/3] powerpc/mm: Initialize the HPTE encoding values Laurent Dufour
@ 2019-09-12 13:32   ` Aneesh Kumar K.V
  2019-09-12 13:37   ` Aneesh Kumar K.V
  1 sibling, 0 replies; 17+ messages in thread
From: Aneesh Kumar K.V @ 2019-09-12 13:32 UTC (permalink / raw)
  To: Laurent Dufour, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

Laurent Dufour <ldufour@linux.ibm.com> writes:

> Before reading the HPTE encoding values we initialize all of them to -1 (an
> invalid value) to later being able to detect the initialized ones.
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>

We already do this in mmu_psize_set_default_penc() ?

> ---
>  arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index c3bfef08dcf8..2039bc315459 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -408,7 +408,7 @@ static int __init htab_dt_scan_page_sizes(unsigned long node,
>  {
>  	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
>  	const __be32 *prop;
> -	int size = 0;
> +	int size = 0, idx, base_idx;
>  
>  	/* We are scanning "cpu" nodes only */
>  	if (type == NULL || strcmp(type, "cpu") != 0)
> @@ -418,6 +418,11 @@ static int __init htab_dt_scan_page_sizes(unsigned long node,
>  	if (!prop)
>  		return 0;
>  
> +	/* Set all the penc values to invalid */
> +	for (base_idx = 0; base_idx < MMU_PAGE_COUNT; base_idx++)
> +		for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
> +			mmu_psize_defs[base_idx].penc[idx] = -1;
> +
>  	pr_info("Page sizes from device-tree:\n");
>  	size /= 4;
>  	cur_cpu_spec->mmu_features &= ~(MMU_FTR_16M_PAGE);
> @@ -426,7 +431,6 @@ static int __init htab_dt_scan_page_sizes(unsigned long node,
>  		unsigned int slbenc = be32_to_cpu(prop[1]);
>  		unsigned int lpnum = be32_to_cpu(prop[2]);
>  		struct mmu_psize_def *def;
> -		int idx, base_idx;
>  
>  		size -= 3; prop += 3;
>  		base_idx = get_idx_from_shift(base_shift);
> -- 
> 2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] powerpc/mm: Initialize the HPTE encoding values
  2019-08-30 12:07 ` [PATCH 1/3] powerpc/mm: Initialize the HPTE encoding values Laurent Dufour
  2019-09-12 13:32   ` Aneesh Kumar K.V
@ 2019-09-12 13:37   ` Aneesh Kumar K.V
  2019-09-13 10:56     ` Laurent Dufour
  1 sibling, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2019-09-12 13:37 UTC (permalink / raw)
  To: Laurent Dufour, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

On 8/30/19 5:37 PM, Laurent Dufour wrote:
> Before reading the HPTE encoding values we initialize all of them to -1 (an
> invalid value) to later being able to detect the initialized ones.
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>   arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index c3bfef08dcf8..2039bc315459 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -408,7 +408,7 @@ static int __init htab_dt_scan_page_sizes(unsigned long node,
>   {
>   	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
>   	const __be32 *prop;
> -	int size = 0;
> +	int size = 0, idx, base_idx;
>   
>   	/* We are scanning "cpu" nodes only */
>   	if (type == NULL || strcmp(type, "cpu") != 0)
> @@ -418,6 +418,11 @@ static int __init htab_dt_scan_page_sizes(unsigned long node,
>   	if (!prop)
>   		return 0;
>   
> +	/* Set all the penc values to invalid */
> +	for (base_idx = 0; base_idx < MMU_PAGE_COUNT; base_idx++)
> +		for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
> +			mmu_psize_defs[base_idx].penc[idx] = -1;
> +
>   	pr_info("Page sizes from device-tree:\n");
>   	size /= 4;
>   	cur_cpu_spec->mmu_features &= ~(MMU_FTR_16M_PAGE);
> @@ -426,7 +431,6 @@ static int __init htab_dt_scan_page_sizes(unsigned long node,
>   		unsigned int slbenc = be32_to_cpu(prop[1]);
>   		unsigned int lpnum = be32_to_cpu(prop[2]);
>   		struct mmu_psize_def *def;
> -		int idx, base_idx;
>   
>   		size -= 3; prop += 3;
>   		base_idx = get_idx_from_shift(base_shift);
> 

We already do this in mmu_psize_set_default_penc() ?

-aneesh


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] powerpc/mm: Conditionally call H_BLOCK_REMOVE
  2019-08-30 12:07 [PATCH 0/3] powerpc/mm: Conditionally call H_BLOCK_REMOVE Laurent Dufour
                   ` (2 preceding siblings ...)
  2019-08-30 12:07 ` [PATCH 3/3] powerpc/mm: call H_BLOCK_REMOVE when supported Laurent Dufour
@ 2019-09-12 13:44 ` Aneesh Kumar K.V
  2019-09-13 11:09   ` Laurent Dufour
  3 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2019-09-12 13:44 UTC (permalink / raw)
  To: Laurent Dufour, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

On 8/30/19 5:37 PM, Laurent Dufour wrote:
> Since the commit ba2dd8a26baa ("powerpc/pseries/mm: call H_BLOCK_REMOVE"),
> the call to H_BLOCK_REMOVE is always done if the feature is exhibited.
> 
> On some system, the hypervisor may not support all the combination of
> segment base page size and page size. When this happens the hcall is
> returning H_PARAM, which is triggering a BUG_ON check leading to a panic.
> 
> The PAPR document is specifying a TLB Block Invalidate Characteristics item
> detailing which couple base page size, page size the hypervisor is
> supporting through H_BLOCK_REMOVE. Furthermore, the characteristics are
> also providing the size of the block the hcall could process.
> 
> Supporting various block size seems not needed as all systems I was able to
> play with was support an 8 addresses block size, which is the maximum
> through the hcall. Supporting various size may complexify the algorithm in
> call_block_remove() so unless this is required, this is not done.
> 
> In the case of block size different from 8, a warning message is displayed
> at boot time and that block size will be ignored checking for the
> H_BLOCK_REMOVE support.
> 
> Due to the minimal amount of hardware showing a limited set of
> H_BLOCK_REMOVE supported page size, I don't think there is a need to push
> this series to the stable mailing list.
> 
> The first patch is initializing the penc values for each page size to an
> invalid value to be able to detect those which have been initialized as 0
> is a valid value.
> 
> The second patch is reading the characteristic through the hcall
> ibm,get-system-parameter and record the supported block size for each page
> size.
> 
> The third patch is changing the check used to detect the H_BLOCK_REMOVE
> availability to take care of the base page size and page size couple.
> 

So ibm,segment-page-sizes indicates wether we support a combination of 
base page size and actual page size. You are suggesting that the value 
reported by that is not correct? Can you also share the early part of 
dmesg as below.

[    0.000000] hash-mmu: Page sizes from device-tree:
[    0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, 
avpnm=0x00000000, tlbiel=1, penc=0
[    0.000000] hash-mmu: base_shift=12: shift=16, sllp=0x0000, 
avpnm=0x00000000, tlbiel=1, penc=7
[    0.000000] hash-mmu: base_shift=12: shift=24, sllp=0x0000, 
avpnm=0x00000000, tlbiel=1, penc=56
[    0.000000] hash-mmu: base_shift=16: shift=16, sllp=0x0110, 
avpnm=0x00000000, tlbiel=1, penc=1
[    0.000000] hash-mmu: base_shift=16: shift=24, sllp=0x0110, 
avpnm=0x00000000, tlbiel=1, penc=8
[    0.000000] hash-mmu: base_shift=24: shift=24, sllp=0x0100, 
avpnm=0x00000001, tlbiel=0, penc=0
[    0.000000] hash-mmu: base_shift=34: shift=34, sllp=0x0120, 
avpnm=0x000007ff, tlbiel=0, penc=3

That shows different base page size and actual page size combination.


> Laurent Dufour (3):
>    powerpc/mm: Initialize the HPTE encoding values
>    powperc/mm: read TLB Block Invalidate Characteristics
>    powerpc/mm: call H_BLOCK_REMOVE when supported
> 
>   arch/powerpc/include/asm/book3s/64/mmu.h |   3 +
>   arch/powerpc/mm/book3s64/hash_utils.c    |   8 +-
>   arch/powerpc/platforms/pseries/lpar.c    | 118 ++++++++++++++++++++++-
>   3 files changed, 125 insertions(+), 4 deletions(-)
> 


-aneesh


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] powperc/mm: read TLB Block Invalidate Characteristics
  2019-08-30 12:07 ` [PATCH 2/3] powperc/mm: read TLB Block Invalidate Characteristics Laurent Dufour
@ 2019-09-12 14:16   ` Aneesh Kumar K.V
  2019-09-13 13:55     ` Laurent Dufour
  2019-09-12 14:44   ` Aneesh Kumar K.V
  1 sibling, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2019-09-12 14:16 UTC (permalink / raw)
  To: Laurent Dufour, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

On 8/30/19 5:37 PM, Laurent Dufour wrote:
> The PAPR document specifies the TLB Block Invalidate Characteristics which
> is telling which couple base page size / page size is supported by the
> H_BLOCK_REMOVE hcall.
> 
> A new set of feature is added to the mmu_psize_def structure to record per
> base page size which page size is supported by H_BLOCK_REMOVE.
> 
> A new init service is added to read the characteristics. The size of the
> buffer is set to twice the number of known page size, plus 10 bytes to
> ensure we have enough place.
> 


So this is not really the base page size/actual page size combination. 
This is related to H_BLOCK_REMOVE hcall, block size supported by that 
HCALL and what page size combination is supported with that specific 
block size.

We should add that TLB block invalidate characteristics format in this 
patch.


> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>   arch/powerpc/include/asm/book3s/64/mmu.h |   3 +
>   arch/powerpc/platforms/pseries/lpar.c    | 107 +++++++++++++++++++++++
>   2 files changed, 110 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 23b83d3593e2..675895dfe39f 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -12,11 +12,14 @@
>    *    sllp  : is a bit mask with the value of SLB L || LP to be or'ed
>    *            directly to a slbmte "vsid" value
>    *    penc  : is the HPTE encoding mask for the "LP" field:
> + *    hblk  : H_BLOCK_REMOVE supported block size for this page size in
> + *            segment who's base page size is that page size.
>    *
>    */
>   struct mmu_psize_def {
>   	unsigned int	shift;	/* number of bits */
>   	int		penc[MMU_PAGE_COUNT];	/* HPTE encoding */
> +	int		hblk[MMU_PAGE_COUNT];	/* H_BLOCK_REMOVE support */
>   	unsigned int	tlbiel;	/* tlbiel supported for that page size */
>   	unsigned long	avpnm;	/* bits to mask out in AVPN in the HPTE */
>   	union {
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 4f76e5f30c97..375e19b3cf53 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -1311,6 +1311,113 @@ static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
>   		(void)call_block_remove(pix, param, true);
>   }
>   
> +static inline void __init set_hblk_bloc_size(int bpsize, int psize,
> +					     unsigned int block_size)
> +{
> +	struct mmu_psize_def *def = &mmu_psize_defs[bpsize];
> +
> +	if (block_size > def->hblk[psize])
> +		def->hblk[psize] = block_size;
> +}
> +
> +static inline void __init check_lp_set_hblk(unsigned int lp,
> +					    unsigned int block_size)
> +{
> +	unsigned int bpsize, psize;
> +
> +
> +	/* First, check the L bit, if not set, this means 4K */
> +	if ((lp & 0x80) == 0) {


What is that 0x80? We should have #define for most of those.

> +		set_hblk_bloc_size(MMU_PAGE_4K, MMU_PAGE_4K, block_size);
> +		return;
> +	}
> +
> +	/* PAPR says to look at bits 2-7 (0 = MSB) */
> +	lp &= 0x3f;

Also convert that to #define?

> +	for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++) {
> +		struct mmu_psize_def *def =  &mmu_psize_defs[bpsize];
> +
> +		for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
> +			if (def->penc[psize] == lp) {
> +				set_hblk_bloc_size(bpsize, psize, block_size);
> +				return;
> +			}
> +		}
> +	}
> +}
> +
> +#define SPLPAR_TLB_BIC_TOKEN		50
> +#define SPLPAR_TLB_BIC_MAXLENGTH	(MMU_PAGE_COUNT*2 + 10)
> +static int __init read_tlbbi_characteristics(void)
> +{
> +	int call_status;
> +	unsigned char local_buffer[SPLPAR_TLB_BIC_MAXLENGTH];
> +	int len, idx, bpsize;
> +
> +	if (!firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
> +		pr_info("H_BLOCK_REMOVE is not supported");
> +		return 0;
> +	}
> +
> +	memset(local_buffer, 0, SPLPAR_TLB_BIC_MAXLENGTH);
> +
> +	spin_lock(&rtas_data_buf_lock);
> +	memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
> +	call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
> +				NULL,
> +				SPLPAR_TLB_BIC_TOKEN,
> +				__pa(rtas_data_buf),
> +				RTAS_DATA_BUF_SIZE);
> +	memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH);
> +	local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] = '\0';
> +	spin_unlock(&rtas_data_buf_lock);
> +
> +	if (call_status != 0) {
> +		pr_warn("%s %s Error calling get-system-parameter (0x%x)\n",
> +			__FILE__, __func__, call_status);
> +		return 0;
> +	}
> +
> +	/*
> +	 * The first two (2) bytes of the data in the buffer are the length of
> +	 * the returned data, not counting these first two (2) bytes.
> +	 */
> +	len = local_buffer[0] * 256 + local_buffer[1] + 2;
> +	if (len >= SPLPAR_TLB_BIC_MAXLENGTH) {
> +		pr_warn("%s too large returned buffer %d", __func__, len);
> +		return 0;
> +	}
> +
> +	idx = 2;
> +	while (idx < len) {
> +		unsigned int block_size = local_buffer[idx++];
> +		unsigned int npsize;
> +
> +		if (!block_size)
> +			break;
> +
> +		block_size = 1 << block_size;
> +		if (block_size != 8)
> +			/* We only support 8 bytes size TLB invalidate buffer */
> +			pr_warn("Unsupported H_BLOCK_REMOVE block size : %d\n",
> +				block_size);
> +
> +		for (npsize = local_buffer[idx++];  npsize > 0; npsize--)
> +			check_lp_set_hblk((unsigned int) local_buffer[idx++],
> +					  block_size);
> +	}
> +
> +	for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++)
> +		for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
> +			if (mmu_psize_defs[bpsize].hblk[idx])
> +				pr_info("H_BLOCK_REMOVE supports base psize:%d psize:%d block size:%d",
> +					bpsize, idx,
> +					mmu_psize_defs[bpsize].hblk[idx]);
> +
> +	return 0;
> +}
> +machine_arch_initcall(pseries, read_tlbbi_characteristics);
> +

Why a machine_arch_initcall() ? Can't we do this similar to how we do 
segment-page-size parsing from device tree? Also this should be hash 
translation mode specific.

>   /*
>    * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
>    * lock.
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] powerpc/mm: call H_BLOCK_REMOVE when supported
  2019-08-30 12:07 ` [PATCH 3/3] powerpc/mm: call H_BLOCK_REMOVE when supported Laurent Dufour
@ 2019-09-12 14:20   ` Aneesh Kumar K.V
  2019-09-13 13:16     ` Laurent Dufour
  0 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2019-09-12 14:20 UTC (permalink / raw)
  To: Laurent Dufour, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

On 8/30/19 5:37 PM, Laurent Dufour wrote:
> Instead of calling H_BLOCK_REMOVE all the time when the feature is
> exhibited, call this hcall only when the couple base page size, page size
> is supported as reported by the TLB Invalidate Characteristics.
> 

supported is not actually what we are checking here. We are checking 
whether the base page size actual page size remove can be done in chunks 
of 8 blocks. If we don't support 8 block you fallback to bulk 
invalidate. May be update the commit message accordingly?


> For regular pages and hugetlb, the assumption is made that the page size is
> equal to the base page size. For THP the page size is assumed to be 16M.
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>   arch/powerpc/platforms/pseries/lpar.c | 11 +++++++++--
>   1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 375e19b3cf53..ef3dbf108a65 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -1143,7 +1143,11 @@ static inline void __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
>   	if (lock_tlbie)
>   		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
>   
> -	if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
> +	/*
> +	 * Assuming THP size is 16M, and we only support 8 bytes size buffer
> +	 * for the momment.
> +	 */
> +	if (mmu_psize_defs[psize].hblk[MMU_PAGE_16M] == 8)
>   		hugepage_block_invalidate(slot, vpn, count, psize, ssize);
>   	else
>   		hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);



So we don't use block invalidate if blocksize is != 8.


> @@ -1437,7 +1441,10 @@ static void pSeries_lpar_flush_hash_range(unsigned long number, int local)
>   	if (lock_tlbie)
>   		spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
>   
> -	if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
> +	/*
> +	 * Currently, we only support 8 bytes size buffer in do_block_remove().
> +	 */
> +	if (mmu_psize_defs[batch->psize].hblk[batch->psize] == 8) {
>   		do_block_remove(number, batch, param);
>   		goto out;
>   	}
> 

-aneesh


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] powperc/mm: read TLB Block Invalidate Characteristics
  2019-08-30 12:07 ` [PATCH 2/3] powperc/mm: read TLB Block Invalidate Characteristics Laurent Dufour
  2019-09-12 14:16   ` Aneesh Kumar K.V
@ 2019-09-12 14:44   ` Aneesh Kumar K.V
  2019-09-12 19:26     ` Laurent Dufour
  1 sibling, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2019-09-12 14:44 UTC (permalink / raw)
  To: Laurent Dufour, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

Laurent Dufour <ldufour@linux.ibm.com> writes:

> The PAPR document specifies the TLB Block Invalidate Characteristics which
> is telling which couple base page size / page size is supported by the
> H_BLOCK_REMOVE hcall.
>
> A new set of feature is added to the mmu_psize_def structure to record per
> base page size which page size is supported by H_BLOCK_REMOVE.
>
> A new init service is added to read the characteristics. The size of the
> buffer is set to twice the number of known page size, plus 10 bytes to
> ensure we have enough place.
>
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/mmu.h |   3 +
>  arch/powerpc/platforms/pseries/lpar.c    | 107 +++++++++++++++++++++++
>  2 files changed, 110 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 23b83d3593e2..675895dfe39f 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -12,11 +12,14 @@
>   *    sllp  : is a bit mask with the value of SLB L || LP to be or'ed
>   *            directly to a slbmte "vsid" value
>   *    penc  : is the HPTE encoding mask for the "LP" field:
> + *    hblk  : H_BLOCK_REMOVE supported block size for this page size in
> + *            segment who's base page size is that page size.
>   *
>   */
>  struct mmu_psize_def {
>  	unsigned int	shift;	/* number of bits */
>  	int		penc[MMU_PAGE_COUNT];	/* HPTE encoding */
> +	int		hblk[MMU_PAGE_COUNT];	/* H_BLOCK_REMOVE support */
>  	unsigned int	tlbiel;	/* tlbiel supported for that page size */
>  	unsigned long	avpnm;	/* bits to mask out in AVPN in the HPTE */
>  	union {
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 4f76e5f30c97..375e19b3cf53 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -1311,6 +1311,113 @@ static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
>  		(void)call_block_remove(pix, param, true);
>  }
>  
> +static inline void __init set_hblk_bloc_size(int bpsize, int psize,
> +					     unsigned int block_size)
> +{
> +	struct mmu_psize_def *def = &mmu_psize_defs[bpsize];
> +
> +	if (block_size > def->hblk[psize])
> +		def->hblk[psize] = block_size;
> +}
> +
> +static inline void __init check_lp_set_hblk(unsigned int lp,
> +					    unsigned int block_size)
> +{
> +	unsigned int bpsize, psize;
> +
> +
> +	/* First, check the L bit, if not set, this means 4K */
> +	if ((lp & 0x80) == 0) {
> +		set_hblk_bloc_size(MMU_PAGE_4K, MMU_PAGE_4K, block_size);
> +		return;
> +	}
> +
> +	/* PAPR says to look at bits 2-7 (0 = MSB) */
> +	lp &= 0x3f;
> +	for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++) {
> +		struct mmu_psize_def *def =  &mmu_psize_defs[bpsize];
> +
> +		for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
> +			if (def->penc[psize] == lp) {
> +				set_hblk_bloc_size(bpsize, psize, block_size);
> +				return;
> +			}
> +		}
> +	}
> +}
> +
> +#define SPLPAR_TLB_BIC_TOKEN		50
> +#define SPLPAR_TLB_BIC_MAXLENGTH	(MMU_PAGE_COUNT*2 + 10)
> +static int __init read_tlbbi_characteristics(void)
> +{
> +	int call_status;
> +	unsigned char local_buffer[SPLPAR_TLB_BIC_MAXLENGTH];
> +	int len, idx, bpsize;
> +
> +	if (!firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
> +		pr_info("H_BLOCK_REMOVE is not supported");
> +		return 0;
> +	}
> +
> +	memset(local_buffer, 0, SPLPAR_TLB_BIC_MAXLENGTH);
> +
> +	spin_lock(&rtas_data_buf_lock);
> +	memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
> +	call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
> +				NULL,
> +				SPLPAR_TLB_BIC_TOKEN,
> +				__pa(rtas_data_buf),
> +				RTAS_DATA_BUF_SIZE);
> +	memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH);
> +	local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] = '\0';
> +	spin_unlock(&rtas_data_buf_lock);
> +
> +	if (call_status != 0) {
> +		pr_warn("%s %s Error calling get-system-parameter (0x%x)\n",
> +			__FILE__, __func__, call_status);
> +		return 0;
> +	}
> +
> +	/*
> +	 * The first two (2) bytes of the data in the buffer are the length of
> +	 * the returned data, not counting these first two (2) bytes.
> +	 */
> +	len = local_buffer[0] * 256 + local_buffer[1] + 2;
> +	if (len >= SPLPAR_TLB_BIC_MAXLENGTH) {
> +		pr_warn("%s too large returned buffer %d", __func__, len);
> +		return 0;
> +	}
> +
> +	idx = 2;
> +	while (idx < len) {
> +		unsigned int block_size = local_buffer[idx++];
> +		unsigned int npsize;
> +
> +		if (!block_size)
> +			break;
> +
> +		block_size = 1 << block_size;
> +		if (block_size != 8)
> +			/* We only support 8 bytes size TLB invalidate buffer */
> +			pr_warn("Unsupported H_BLOCK_REMOVE block size : %d\n",
> +				block_size);

Should we skip setting block size if we find block_size != 8? Also can
we avoid doing that pr_warn in loop and only warn if we don't find
block_size 8 in the invalidate characteristics array? 

> +
> +		for (npsize = local_buffer[idx++];  npsize > 0; npsize--)
> +			check_lp_set_hblk((unsigned int) local_buffer[idx++],
> +					  block_size);
> +	}
> +
> +	for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++)
> +		for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
> +			if (mmu_psize_defs[bpsize].hblk[idx])
> +				pr_info("H_BLOCK_REMOVE supports base psize:%d psize:%d block size:%d",
> +					bpsize, idx,
> +					mmu_psize_defs[bpsize].hblk[idx]);
> +
> +	return 0;
> +}
> +machine_arch_initcall(pseries, read_tlbbi_characteristics);
> +
>  /*
>   * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
>   * lock.
> -- 
> 2.23.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] powperc/mm: read TLB Block Invalidate Characteristics
  2019-09-12 14:44   ` Aneesh Kumar K.V
@ 2019-09-12 19:26     ` Laurent Dufour
  2019-09-13  2:00       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 17+ messages in thread
From: Laurent Dufour @ 2019-09-12 19:26 UTC (permalink / raw)
  To: Aneesh Kumar K.V, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

Le 12/09/2019 à 16:44, Aneesh Kumar K.V a écrit :
> Laurent Dufour <ldufour@linux.ibm.com> writes:
> 
>> The PAPR document specifies the TLB Block Invalidate Characteristics which
>> is telling which couple base page size / page size is supported by the
>> H_BLOCK_REMOVE hcall.
>>
>> A new set of feature is added to the mmu_psize_def structure to record per
>> base page size which page size is supported by H_BLOCK_REMOVE.
>>
>> A new init service is added to read the characteristics. The size of the
>> buffer is set to twice the number of known page size, plus 10 bytes to
>> ensure we have enough place.
>>
>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>> ---
>>   arch/powerpc/include/asm/book3s/64/mmu.h |   3 +
>>   arch/powerpc/platforms/pseries/lpar.c    | 107 +++++++++++++++++++++++
>>   2 files changed, 110 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
>> index 23b83d3593e2..675895dfe39f 100644
>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>> @@ -12,11 +12,14 @@
>>    *    sllp  : is a bit mask with the value of SLB L || LP to be or'ed
>>    *            directly to a slbmte "vsid" value
>>    *    penc  : is the HPTE encoding mask for the "LP" field:
>> + *    hblk  : H_BLOCK_REMOVE supported block size for this page size in
>> + *            segment who's base page size is that page size.
>>    *
>>    */
>>   struct mmu_psize_def {
>>   	unsigned int	shift;	/* number of bits */
>>   	int		penc[MMU_PAGE_COUNT];	/* HPTE encoding */
>> +	int		hblk[MMU_PAGE_COUNT];	/* H_BLOCK_REMOVE support */
>>   	unsigned int	tlbiel;	/* tlbiel supported for that page size */
>>   	unsigned long	avpnm;	/* bits to mask out in AVPN in the HPTE */
>>   	union {
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>> index 4f76e5f30c97..375e19b3cf53 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -1311,6 +1311,113 @@ static void do_block_remove(unsigned long number, struct ppc64_tlb_batch *batch,
>>   		(void)call_block_remove(pix, param, true);
>>   }
>>   
>> +static inline void __init set_hblk_bloc_size(int bpsize, int psize,
>> +					     unsigned int block_size)
>> +{
>> +	struct mmu_psize_def *def = &mmu_psize_defs[bpsize];
>> +
>> +	if (block_size > def->hblk[psize])
>> +		def->hblk[psize] = block_size;
>> +}
>> +
>> +static inline void __init check_lp_set_hblk(unsigned int lp,
>> +					    unsigned int block_size)
>> +{
>> +	unsigned int bpsize, psize;
>> +
>> +
>> +	/* First, check the L bit, if not set, this means 4K */
>> +	if ((lp & 0x80) == 0) {
>> +		set_hblk_bloc_size(MMU_PAGE_4K, MMU_PAGE_4K, block_size);
>> +		return;
>> +	}
>> +
>> +	/* PAPR says to look at bits 2-7 (0 = MSB) */
>> +	lp &= 0x3f;
>> +	for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++) {
>> +		struct mmu_psize_def *def =  &mmu_psize_defs[bpsize];
>> +
>> +		for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
>> +			if (def->penc[psize] == lp) {
>> +				set_hblk_bloc_size(bpsize, psize, block_size);
>> +				return;
>> +			}
>> +		}
>> +	}
>> +}
>> +
>> +#define SPLPAR_TLB_BIC_TOKEN		50
>> +#define SPLPAR_TLB_BIC_MAXLENGTH	(MMU_PAGE_COUNT*2 + 10)
>> +static int __init read_tlbbi_characteristics(void)
>> +{
>> +	int call_status;
>> +	unsigned char local_buffer[SPLPAR_TLB_BIC_MAXLENGTH];
>> +	int len, idx, bpsize;
>> +
>> +	if (!firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
>> +		pr_info("H_BLOCK_REMOVE is not supported");
>> +		return 0;
>> +	}
>> +
>> +	memset(local_buffer, 0, SPLPAR_TLB_BIC_MAXLENGTH);
>> +
>> +	spin_lock(&rtas_data_buf_lock);
>> +	memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
>> +	call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
>> +				NULL,
>> +				SPLPAR_TLB_BIC_TOKEN,
>> +				__pa(rtas_data_buf),
>> +				RTAS_DATA_BUF_SIZE);
>> +	memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH);
>> +	local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] = '\0';
>> +	spin_unlock(&rtas_data_buf_lock);
>> +
>> +	if (call_status != 0) {
>> +		pr_warn("%s %s Error calling get-system-parameter (0x%x)\n",
>> +			__FILE__, __func__, call_status);
>> +		return 0;
>> +	}
>> +
>> +	/*
>> +	 * The first two (2) bytes of the data in the buffer are the length of
>> +	 * the returned data, not counting these first two (2) bytes.
>> +	 */
>> +	len = local_buffer[0] * 256 + local_buffer[1] + 2;
>> +	if (len >= SPLPAR_TLB_BIC_MAXLENGTH) {
>> +		pr_warn("%s too large returned buffer %d", __func__, len);
>> +		return 0;
>> +	}
>> +
>> +	idx = 2;
>> +	while (idx < len) {
>> +		unsigned int block_size = local_buffer[idx++];
>> +		unsigned int npsize;
>> +
>> +		if (!block_size)
>> +			break;
>> +
>> +		block_size = 1 << block_size;
>> +		if (block_size != 8)
>> +			/* We only support 8 bytes size TLB invalidate buffer */
>> +			pr_warn("Unsupported H_BLOCK_REMOVE block size : %d\n",
>> +				block_size);
> 
> Should we skip setting block size if we find block_size != 8? Also can
> we avoid doing that pr_warn in loop and only warn if we don't find
> block_size 8 in the invalidate characteristics array?

My idea here is to fully read and process the data returned by the hcall, 
and to put the limitation to 8 when checking before calling H_BLOCK_REMOVE.
The warning is there because I want it to be displayed once at boot.

> 
>> +
>> +		for (npsize = local_buffer[idx++];  npsize > 0; npsize--)
>> +			check_lp_set_hblk((unsigned int) local_buffer[idx++],
>> +					  block_size);
>> +	}
>> +
>> +	for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++)
>> +		for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
>> +			if (mmu_psize_defs[bpsize].hblk[idx])
>> +				pr_info("H_BLOCK_REMOVE supports base psize:%d psize:%d block size:%d",
>> +					bpsize, idx,
>> +					mmu_psize_defs[bpsize].hblk[idx]);
>> +
>> +	return 0;
>> +}
>> +machine_arch_initcall(pseries, read_tlbbi_characteristics);
>> +
>>   /*
>>    * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
>>    * lock.
>> -- 
>> 2.23.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] powperc/mm: read TLB Block Invalidate Characteristics
  2019-09-12 19:26     ` Laurent Dufour
@ 2019-09-13  2:00       ` Aneesh Kumar K.V
  2019-09-13  9:10         ` Laurent Dufour
  0 siblings, 1 reply; 17+ messages in thread
From: Aneesh Kumar K.V @ 2019-09-13  2:00 UTC (permalink / raw)
  To: Laurent Dufour, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

On 9/13/19 12:56 AM, Laurent Dufour wrote:
> Le 12/09/2019 à 16:44, Aneesh Kumar K.V a écrit :
>> Laurent Dufour <ldufour@linux.ibm.com> writes:

>>> +
>>> +    idx = 2;
>>> +    while (idx < len) {
>>> +        unsigned int block_size = local_buffer[idx++];
>>> +        unsigned int npsize;
>>> +
>>> +        if (!block_size)
>>> +            break;
>>> +
>>> +        block_size = 1 << block_size;
>>> +        if (block_size != 8)
>>> +            /* We only support 8 bytes size TLB invalidate buffer */
>>> +            pr_warn("Unsupported H_BLOCK_REMOVE block size : %d\n",
>>> +                block_size);
>>
>> Should we skip setting block size if we find block_size != 8? Also can
>> we avoid doing that pr_warn in loop and only warn if we don't find
>> block_size 8 in the invalidate characteristics array?
> 
> My idea here is to fully read and process the data returned by the 
> hcall, and to put the limitation to 8 when checking before calling 
> H_BLOCK_REMOVE.
> The warning is there because I want it to be displayed once at boot.
> 


Can we have two block size reported for the same base page size/actual 
page size combination? If so we will overwrite the hblk[actual_psize] ?

>>
>>> +
>>> +        for (npsize = local_buffer[idx++];  npsize > 0; npsize--)
>>> +            check_lp_set_hblk((unsigned int) local_buffer[idx++],
>>> +                      block_size);
>>> +    }
>>> +
>>> +    for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++)
>>> +        for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
>>> +            if (mmu_psize_defs[bpsize].hblk[idx])
>>> +                pr_info("H_BLOCK_REMOVE supports base psize:%d 
>>> psize:%d block size:%d",
>>> +                    bpsize, idx,
>>> +                    mmu_psize_defs[bpsize].hblk[idx]);
>>> +
>>> +    return 0;
>>> +}
>>> +machine_arch_initcall(pseries, read_tlbbi_characteristics);
>>> +
>>>   /*
>>>    * Take a spinlock around flushes to avoid bouncing the hypervisor 
>>> tlbie
>>>    * lock.

-aneesh


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] powperc/mm: read TLB Block Invalidate Characteristics
  2019-09-13  2:00       ` Aneesh Kumar K.V
@ 2019-09-13  9:10         ` Laurent Dufour
  0 siblings, 0 replies; 17+ messages in thread
From: Laurent Dufour @ 2019-09-13  9:10 UTC (permalink / raw)
  To: Aneesh Kumar K.V, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

Le 13/09/2019 à 04:00, Aneesh Kumar K.V a écrit :
> On 9/13/19 12:56 AM, Laurent Dufour wrote:
>> Le 12/09/2019 à 16:44, Aneesh Kumar K.V a écrit :
>>> Laurent Dufour <ldufour@linux.ibm.com> writes:
> 
>>>> +
>>>> +    idx = 2;
>>>> +    while (idx < len) {
>>>> +        unsigned int block_size = local_buffer[idx++];
>>>> +        unsigned int npsize;
>>>> +
>>>> +        if (!block_size)
>>>> +            break;
>>>> +
>>>> +        block_size = 1 << block_size;
>>>> +        if (block_size != 8)
>>>> +            /* We only support 8 bytes size TLB invalidate buffer */
>>>> +            pr_warn("Unsupported H_BLOCK_REMOVE block size : %d\n",
>>>> +                block_size);
>>>
>>> Should we skip setting block size if we find block_size != 8? Also can
>>> we avoid doing that pr_warn in loop and only warn if we don't find
>>> block_size 8 in the invalidate characteristics array?
>>
>> My idea here is to fully read and process the data returned by the hcall, 
>> and to put the limitation to 8 when checking before calling H_BLOCK_REMOVE.
>> The warning is there because I want it to be displayed once at boot.
>>
> 
> 
> Can we have two block size reported for the same base page size/actual page 
> size combination? If so we will overwrite the hblk[actual_psize] ?

In check_lp_set_hblk() I'm only keeping the bigger one.

> 
>>>
>>>> +
>>>> +        for (npsize = local_buffer[idx++];  npsize > 0; npsize--)
>>>> +            check_lp_set_hblk((unsigned int) local_buffer[idx++],
>>>> +                      block_size);
>>>> +    }
>>>> +
>>>> +    for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++)
>>>> +        for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
>>>> +            if (mmu_psize_defs[bpsize].hblk[idx])
>>>> +                pr_info("H_BLOCK_REMOVE supports base psize:%d 
>>>> psize:%d block size:%d",
>>>> +                    bpsize, idx,
>>>> +                    mmu_psize_defs[bpsize].hblk[idx]);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +machine_arch_initcall(pseries, read_tlbbi_characteristics);
>>>> +
>>>>   /*
>>>>    * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
>>>>    * lock.
> 
> -aneesh



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/3] powerpc/mm: Initialize the HPTE encoding values
  2019-09-12 13:37   ` Aneesh Kumar K.V
@ 2019-09-13 10:56     ` Laurent Dufour
  0 siblings, 0 replies; 17+ messages in thread
From: Laurent Dufour @ 2019-09-13 10:56 UTC (permalink / raw)
  To: Aneesh Kumar K.V, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

Le 12/09/2019 à 15:37, Aneesh Kumar K.V a écrit :
> On 8/30/19 5:37 PM, Laurent Dufour wrote:
>> Before reading the HPTE encoding values we initialize all of them to -1 (an
>> invalid value) to later being able to detect the initialized ones.
>>
>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>> ---
>>   arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
>> b/arch/powerpc/mm/book3s64/hash_utils.c
>> index c3bfef08dcf8..2039bc315459 100644
>> --- a/arch/powerpc/mm/book3s64/hash_utils.c
>> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
>> @@ -408,7 +408,7 @@ static int __init htab_dt_scan_page_sizes(unsigned 
>> long node,
>>   {
>>       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
>>       const __be32 *prop;
>> -    int size = 0;
>> +    int size = 0, idx, base_idx;
>>       /* We are scanning "cpu" nodes only */
>>       if (type == NULL || strcmp(type, "cpu") != 0)
>> @@ -418,6 +418,11 @@ static int __init htab_dt_scan_page_sizes(unsigned 
>> long node,
>>       if (!prop)
>>           return 0;
>> +    /* Set all the penc values to invalid */
>> +    for (base_idx = 0; base_idx < MMU_PAGE_COUNT; base_idx++)
>> +        for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
>> +            mmu_psize_defs[base_idx].penc[idx] = -1;
>> +
>>       pr_info("Page sizes from device-tree:\n");
>>       size /= 4;
>>       cur_cpu_spec->mmu_features &= ~(MMU_FTR_16M_PAGE);
>> @@ -426,7 +431,6 @@ static int __init htab_dt_scan_page_sizes(unsigned 
>> long node,
>>           unsigned int slbenc = be32_to_cpu(prop[1]);
>>           unsigned int lpnum = be32_to_cpu(prop[2]);
>>           struct mmu_psize_def *def;
>> -        int idx, base_idx;
>>           size -= 3; prop += 3;
>>           base_idx = get_idx_from_shift(base_shift);
>>
> 
> We already do this in mmu_psize_set_default_penc() ?

Correct, I missed that, then this patch is no more needed.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/3] powerpc/mm: Conditionally call H_BLOCK_REMOVE
  2019-09-12 13:44 ` [PATCH 0/3] powerpc/mm: Conditionally call H_BLOCK_REMOVE Aneesh Kumar K.V
@ 2019-09-13 11:09   ` Laurent Dufour
  0 siblings, 0 replies; 17+ messages in thread
From: Laurent Dufour @ 2019-09-13 11:09 UTC (permalink / raw)
  To: Aneesh Kumar K.V, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

Le 12/09/2019 à 15:44, Aneesh Kumar K.V a écrit :
> On 8/30/19 5:37 PM, Laurent Dufour wrote:
>> Since the commit ba2dd8a26baa ("powerpc/pseries/mm: call H_BLOCK_REMOVE"),
>> the call to H_BLOCK_REMOVE is always done if the feature is exhibited.
>>
>> On some system, the hypervisor may not support all the combination of
>> segment base page size and page size. When this happens the hcall is
>> returning H_PARAM, which is triggering a BUG_ON check leading to a panic.
>>
>> The PAPR document is specifying a TLB Block Invalidate Characteristics item
>> detailing which couple base page size, page size the hypervisor is
>> supporting through H_BLOCK_REMOVE. Furthermore, the characteristics are
>> also providing the size of the block the hcall could process.
>>
>> Supporting various block size seems not needed as all systems I was able to
>> play with was support an 8 addresses block size, which is the maximum
>> through the hcall. Supporting various size may complexify the algorithm in
>> call_block_remove() so unless this is required, this is not done.
>>
>> In the case of block size different from 8, a warning message is displayed
>> at boot time and that block size will be ignored checking for the
>> H_BLOCK_REMOVE support.
>>
>> Due to the minimal amount of hardware showing a limited set of
>> H_BLOCK_REMOVE supported page size, I don't think there is a need to push
>> this series to the stable mailing list.
>>
>> The first patch is initializing the penc values for each page size to an
>> invalid value to be able to detect those which have been initialized as 0
>> is a valid value.
>>
>> The second patch is reading the characteristic through the hcall
>> ibm,get-system-parameter and record the supported block size for each page
>> size.
>>
>> The third patch is changing the check used to detect the H_BLOCK_REMOVE
>> availability to take care of the base page size and page size couple.
>>
> 
> So ibm,segment-page-sizes indicates wether we support a combination of base 
> page size and actual page size. You are suggesting that the value reported 
> by that is not correct? Can you also share the early part of dmesg as below.

I'm not saying that the value reported by ibm,segment-page-sizes are 
incorrect, I'm saying that some couple are not supported by the hcall 
H_BLOCK_REMOVE.

May be should I change the second sentence by

On some system, the hypervisor may not support all the combination of 
segment base page size and page size for the hcall H_BLOCK_REMOVE. When 
this happens the hcall is returning H_PARAM, which is triggering a BUG_ON 
check leading to a panic.

Is that clear enough now ?

> 
> [    0.000000] hash-mmu: Page sizes from device-tree:
> [    0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, 
> avpnm=0x00000000, tlbiel=1, penc=0
> [    0.000000] hash-mmu: base_shift=12: shift=16, sllp=0x0000, 
> avpnm=0x00000000, tlbiel=1, penc=7
> [    0.000000] hash-mmu: base_shift=12: shift=24, sllp=0x0000, 
> avpnm=0x00000000, tlbiel=1, penc=56
> [    0.000000] hash-mmu: base_shift=16: shift=16, sllp=0x0110, 
> avpnm=0x00000000, tlbiel=1, penc=1
> [    0.000000] hash-mmu: base_shift=16: shift=24, sllp=0x0110, 
> avpnm=0x00000000, tlbiel=1, penc=8
> [    0.000000] hash-mmu: base_shift=24: shift=24, sllp=0x0100, 
> avpnm=0x00000001, tlbiel=0, penc=0
> [    0.000000] hash-mmu: base_shift=34: shift=34, sllp=0x0120, 
> avpnm=0x000007ff, tlbiel=0, penc=3
> 
> That shows different base page size and actual page size combination.
> 
> 
>> Laurent Dufour (3):
>>    powerpc/mm: Initialize the HPTE encoding values
>>    powperc/mm: read TLB Block Invalidate Characteristics
>>    powerpc/mm: call H_BLOCK_REMOVE when supported
>>
>>   arch/powerpc/include/asm/book3s/64/mmu.h |   3 +
>>   arch/powerpc/mm/book3s64/hash_utils.c    |   8 +-
>>   arch/powerpc/platforms/pseries/lpar.c    | 118 ++++++++++++++++++++++-
>>   3 files changed, 125 insertions(+), 4 deletions(-)
>>
> 
> 
> -aneesh



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] powerpc/mm: call H_BLOCK_REMOVE when supported
  2019-09-12 14:20   ` Aneesh Kumar K.V
@ 2019-09-13 13:16     ` Laurent Dufour
  0 siblings, 0 replies; 17+ messages in thread
From: Laurent Dufour @ 2019-09-13 13:16 UTC (permalink / raw)
  To: Aneesh Kumar K.V, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

Le 12/09/2019 à 16:20, Aneesh Kumar K.V a écrit :
> On 8/30/19 5:37 PM, Laurent Dufour wrote:
>> Instead of calling H_BLOCK_REMOVE all the time when the feature is
>> exhibited, call this hcall only when the couple base page size, page size
>> is supported as reported by the TLB Invalidate Characteristics.
>>
> 
> supported is not actually what we are checking here. We are checking 
> whether the base page size actual page size remove can be done in chunks of 
> 8 blocks. If we don't support 8 block you fallback to bulk invalidate. May 
> be update the commit message accordingly?

Yes that's correct.

I think I should also put the warning message displayed when reading the 
characteristic in that commit and explicitly mentioned that we only support 
8 entries size block for this hcall.
This way the limitation is limited to this commit.

> 
>> For regular pages and hugetlb, the assumption is made that the page size is
>> equal to the base page size. For THP the page size is assumed to be 16M.
>>
>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>> ---
>>   arch/powerpc/platforms/pseries/lpar.c | 11 +++++++++--
>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
>> b/arch/powerpc/platforms/pseries/lpar.c
>> index 375e19b3cf53..ef3dbf108a65 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -1143,7 +1143,11 @@ static inline void 
>> __pSeries_lpar_hugepage_invalidate(unsigned long *slot,
>>       if (lock_tlbie)
>>           spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
>> -    if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE))
>> +    /*
>> +     * Assuming THP size is 16M, and we only support 8 bytes size buffer
>> +     * for the momment.
>> +     */
>> +    if (mmu_psize_defs[psize].hblk[MMU_PAGE_16M] == 8)
>>           hugepage_block_invalidate(slot, vpn, count, psize, ssize);
>>       else
>>           hugepage_bulk_invalidate(slot, vpn, count, psize, ssize);
> 
> 
> 
> So we don't use block invalidate if blocksize is != 8.

yes

> 
> 
>> @@ -1437,7 +1441,10 @@ static void pSeries_lpar_flush_hash_range(unsigned 
>> long number, int local)
>>       if (lock_tlbie)
>>           spin_lock_irqsave(&pSeries_lpar_tlbie_lock, flags);
>> -    if (firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
>> +    /*
>> +     * Currently, we only support 8 bytes size buffer in do_block_remove().
>> +     */
>> +    if (mmu_psize_defs[batch->psize].hblk[batch->psize] == 8) {
>>           do_block_remove(number, batch, param);
>>           goto out;
>>       }
>>
> 
> -aneesh



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/3] powperc/mm: read TLB Block Invalidate Characteristics
  2019-09-12 14:16   ` Aneesh Kumar K.V
@ 2019-09-13 13:55     ` Laurent Dufour
  0 siblings, 0 replies; 17+ messages in thread
From: Laurent Dufour @ 2019-09-13 13:55 UTC (permalink / raw)
  To: Aneesh Kumar K.V, mpe, benh, paulus, npiggin
  Cc: linuxppc-dev, linux-mm, linux-kernel

Le 12/09/2019 à 16:16, Aneesh Kumar K.V a écrit :
> On 8/30/19 5:37 PM, Laurent Dufour wrote:
>> The PAPR document specifies the TLB Block Invalidate Characteristics which
>> is telling which couple base page size / page size is supported by the
>> H_BLOCK_REMOVE hcall.
>>
>> A new set of feature is added to the mmu_psize_def structure to record per
>> base page size which page size is supported by H_BLOCK_REMOVE.
>>
>> A new init service is added to read the characteristics. The size of the
>> buffer is set to twice the number of known page size, plus 10 bytes to
>> ensure we have enough place.
>>
> 
> 
> So this is not really the base page size/actual page size combination. This 
> is related to H_BLOCK_REMOVE hcall, block size supported by that HCALL and 
> what page size combination is supported with that specific block size.

I agree

> 
> We should add that TLB block invalidate characteristics format in this patch.

Sure, will do that in a comment inside the code.

> 
>> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
>> ---
>>   arch/powerpc/include/asm/book3s/64/mmu.h |   3 +
>>   arch/powerpc/platforms/pseries/lpar.c    | 107 +++++++++++++++++++++++
>>   2 files changed, 110 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
>> b/arch/powerpc/include/asm/book3s/64/mmu.h
>> index 23b83d3593e2..675895dfe39f 100644
>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>> @@ -12,11 +12,14 @@
>>    *    sllp  : is a bit mask with the value of SLB L || LP to be or'ed
>>    *            directly to a slbmte "vsid" value
>>    *    penc  : is the HPTE encoding mask for the "LP" field:
>> + *    hblk  : H_BLOCK_REMOVE supported block size for this page size in
>> + *            segment who's base page size is that page size.
>>    *
>>    */
>>   struct mmu_psize_def {
>>       unsigned int    shift;    /* number of bits */
>>       int        penc[MMU_PAGE_COUNT];    /* HPTE encoding */
>> +    int        hblk[MMU_PAGE_COUNT];    /* H_BLOCK_REMOVE support */
>>       unsigned int    tlbiel;    /* tlbiel supported for that page size */
>>       unsigned long    avpnm;    /* bits to mask out in AVPN in the HPTE */
>>       union {
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c 
>> b/arch/powerpc/platforms/pseries/lpar.c
>> index 4f76e5f30c97..375e19b3cf53 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -1311,6 +1311,113 @@ static void do_block_remove(unsigned long number, 
>> struct ppc64_tlb_batch *batch,
>>           (void)call_block_remove(pix, param, true);
>>   }
>> +static inline void __init set_hblk_bloc_size(int bpsize, int psize,
>> +                         unsigned int block_size)
>> +{
>> +    struct mmu_psize_def *def = &mmu_psize_defs[bpsize];
>> +
>> +    if (block_size > def->hblk[psize])
>> +        def->hblk[psize] = block_size;
>> +}
>> +
>> +static inline void __init check_lp_set_hblk(unsigned int lp,
>> +                        unsigned int block_size)
>> +{
>> +    unsigned int bpsize, psize;
>> +
>> +
>> +    /* First, check the L bit, if not set, this means 4K */
>> +    if ((lp & 0x80) == 0) {
> 
> 
> What is that 0x80? We should have #define for most of those.

I will make that more explicit through a define

> 
>> +        set_hblk_bloc_size(MMU_PAGE_4K, MMU_PAGE_4K, block_size);
>> +        return;
>> +    }
>> +
>> +    /* PAPR says to look at bits 2-7 (0 = MSB) */
>> +    lp &= 0x3f;
> 
> Also convert that to #define?

Really ? The comment above is explicitly saying that we are looking at bits 
2-7. A define will obfuscate that.

> 
>> +    for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++) {
>> +        struct mmu_psize_def *def =  &mmu_psize_defs[bpsize];
>> +
>> +        for (psize = 0; psize < MMU_PAGE_COUNT; psize++) {
>> +            if (def->penc[psize] == lp) {
>> +                set_hblk_bloc_size(bpsize, psize, block_size);
>> +                return;
>> +            }
>> +        }
>> +    }
>> +}
>> +
>> +#define SPLPAR_TLB_BIC_TOKEN        50
>> +#define SPLPAR_TLB_BIC_MAXLENGTH    (MMU_PAGE_COUNT*2 + 10)
>> +static int __init read_tlbbi_characteristics(void)
>> +{
>> +    int call_status;
>> +    unsigned char local_buffer[SPLPAR_TLB_BIC_MAXLENGTH];
>> +    int len, idx, bpsize;
>> +
>> +    if (!firmware_has_feature(FW_FEATURE_BLOCK_REMOVE)) {
>> +        pr_info("H_BLOCK_REMOVE is not supported");
>> +        return 0;
>> +    }
>> +
>> +    memset(local_buffer, 0, SPLPAR_TLB_BIC_MAXLENGTH);
>> +
>> +    spin_lock(&rtas_data_buf_lock);
>> +    memset(rtas_data_buf, 0, RTAS_DATA_BUF_SIZE);
>> +    call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
>> +                NULL,
>> +                SPLPAR_TLB_BIC_TOKEN,
>> +                __pa(rtas_data_buf),
>> +                RTAS_DATA_BUF_SIZE);
>> +    memcpy(local_buffer, rtas_data_buf, SPLPAR_TLB_BIC_MAXLENGTH);
>> +    local_buffer[SPLPAR_TLB_BIC_MAXLENGTH - 1] = '\0';
>> +    spin_unlock(&rtas_data_buf_lock);
>> +
>> +    if (call_status != 0) {
>> +        pr_warn("%s %s Error calling get-system-parameter (0x%x)\n",
>> +            __FILE__, __func__, call_status);
>> +        return 0;
>> +    }
>> +
>> +    /*
>> +     * The first two (2) bytes of the data in the buffer are the length of
>> +     * the returned data, not counting these first two (2) bytes.
>> +     */
>> +    len = local_buffer[0] * 256 + local_buffer[1] + 2;
>> +    if (len >= SPLPAR_TLB_BIC_MAXLENGTH) {
>> +        pr_warn("%s too large returned buffer %d", __func__, len);
>> +        return 0;
>> +    }
>> +
>> +    idx = 2;
>> +    while (idx < len) {
>> +        unsigned int block_size = local_buffer[idx++];
>> +        unsigned int npsize;
>> +
>> +        if (!block_size)
>> +            break;
>> +
>> +        block_size = 1 << block_size;
>> +        if (block_size != 8)
>> +            /* We only support 8 bytes size TLB invalidate buffer */
>> +            pr_warn("Unsupported H_BLOCK_REMOVE block size : %d\n",
>> +                block_size);
>> +
>> +        for (npsize = local_buffer[idx++];  npsize > 0; npsize--)
>> +            check_lp_set_hblk((unsigned int) local_buffer[idx++],
>> +                      block_size);
>> +    }
>> +
>> +    for (bpsize = 0; bpsize < MMU_PAGE_COUNT; bpsize++)
>> +        for (idx = 0; idx < MMU_PAGE_COUNT; idx++)
>> +            if (mmu_psize_defs[bpsize].hblk[idx])
>> +                pr_info("H_BLOCK_REMOVE supports base psize:%d psize:%d 
>> block size:%d",
>> +                    bpsize, idx,
>> +                    mmu_psize_defs[bpsize].hblk[idx]);
>> +
>> +    return 0;
>> +}
>> +machine_arch_initcall(pseries, read_tlbbi_characteristics);
>> +
> 
> Why a machine_arch_initcall() ? Can't we do this similar to how we do 
> segment-page-size parsing from device tree? Also this should be hash 
> translation mode specific.

Because that code is specific to the pseries architecture. the hash 
translation is not pseries specific.

Indeed the change in mmu_psize_defs is not too generic. The hblk 
characteristics should remain static to the lpar.c file where it is used.

> 
>>   /*
>>    * Take a spinlock around flushes to avoid bouncing the hypervisor tlbie
>>    * lock.
>>
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-11-11 11:33 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-30 12:07 [PATCH 0/3] powerpc/mm: Conditionally call H_BLOCK_REMOVE Laurent Dufour
2019-08-30 12:07 ` [PATCH 1/3] powerpc/mm: Initialize the HPTE encoding values Laurent Dufour
2019-09-12 13:32   ` Aneesh Kumar K.V
2019-09-12 13:37   ` Aneesh Kumar K.V
2019-09-13 10:56     ` Laurent Dufour
2019-08-30 12:07 ` [PATCH 2/3] powperc/mm: read TLB Block Invalidate Characteristics Laurent Dufour
2019-09-12 14:16   ` Aneesh Kumar K.V
2019-09-13 13:55     ` Laurent Dufour
2019-09-12 14:44   ` Aneesh Kumar K.V
2019-09-12 19:26     ` Laurent Dufour
2019-09-13  2:00       ` Aneesh Kumar K.V
2019-09-13  9:10         ` Laurent Dufour
2019-08-30 12:07 ` [PATCH 3/3] powerpc/mm: call H_BLOCK_REMOVE when supported Laurent Dufour
2019-09-12 14:20   ` Aneesh Kumar K.V
2019-09-13 13:16     ` Laurent Dufour
2019-09-12 13:44 ` [PATCH 0/3] powerpc/mm: Conditionally call H_BLOCK_REMOVE Aneesh Kumar K.V
2019-09-13 11:09   ` Laurent Dufour

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).