* [PATCH 1/2] powerpc: use smp_rmb when looking at deposited pgtable to store hash index
@ 2013-05-17  8:15 Aneesh Kumar K.V
  2013-05-17  8:15 ` [PATCH 2/2] powerpc: split hugepage when using subpage protection Aneesh Kumar K.V
  2013-05-20  1:18 ` [PATCH 1/2] powerpc: use smp_rmb when looking at deposited pgtable to store hash index Michael Neuling
  0 siblings, 2 replies; 7+ messages in thread
From: Aneesh Kumar K.V @ 2013-05-17  8:15 UTC
  To: benh, paulus, dwg; +Cc: linuxppc-dev, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We need to use smp_rmb when looking at the hpte slot array. Otherwise we could
reorder the hpte_slot_array load before we have even marked the pmd trans huge.
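
A minimal sketch of the ordering at stake (illustrative; fetch_slot_array() is
a hypothetical stand-in for the pmd_trans_huge() test in the caller plus the
helper added below):

	/*
	 * Test the pmd first, then fetch the deposited slot array.  The
	 * smp_rmb() keeps the pointer load from being satisfied before
	 * the pmd_trans_huge() test.
	 */
	static char *fetch_slot_array(pmd_t *pmdp)
	{
		if (!pmd_trans_huge(*pmdp))
			return NULL;
		/*
		 * Pairs with the smp_wmb() done when the pgtable holding
		 * the slot array was deposited; without it this load may
		 * return a pointer from before the deposit.
		 */
		smp_rmb();
		return *(char **)(pmdp + PTRS_PER_PMD);
	}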

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 13 +++++++++++++
 arch/powerpc/mm/hugepage-hash64.c        |  6 +-----
 arch/powerpc/mm/pgtable_64.c             |  6 +-----
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d046289..46db094 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -10,6 +10,7 @@
 #else
 #include <asm/pgtable-ppc64-4k.h>
 #endif
+#include <asm/barrier.h>
 
 #define FIRST_USER_ADDRESS	0
 
@@ -393,6 +394,18 @@ static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
 	hpte_slot_array[index] = hidx << 4 | 0x1 << 3;
 }
 
+static inline char *get_hpte_slot_array(pmd_t *pmdp)
+{
+	/*
+	 * The hpte hindex is stored in the pgtable whose address is in the
+	 * second half of the PMD
+	 *
+	 * Order this load with the test for pmd_trans_huge in the caller
+	 */
+	smp_rmb();
+	return *(char **)(pmdp + PTRS_PER_PMD);
+}
+
 extern void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
 				   pmd_t *pmdp);
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
index e430766..c3ba3d5 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -84,11 +84,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
 
 	vpn = hpt_vpn(ea, vsid, ssize);
 	hash = hpt_hash(vpn, shift, ssize);
-	/*
-	 * The hpte hindex are stored in the pgtable whose address is in the
-	 * second half of the PMD
-	 */
-	hpte_slot_array = *(char **)(pmdp + PTRS_PER_PMD);
+	hpte_slot_array = get_hpte_slot_array(pmdp);
 
 	valid = hpte_valid(hpte_slot_array, index);
 	if (valid) {
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 8dd7c83..19d6734 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -701,11 +701,7 @@ void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
 	 * Flush all the hptes mapping this hugepage
 	 */
 	s_addr = addr & HPAGE_PMD_MASK;
-	/*
-	 * The hpte hindex are stored in the pgtable whose address is in the
-	 * second half of the PMD
-	 */
-	hpte_slot_array = *(char **)(pmdp + PTRS_PER_PMD);
+	hpte_slot_array = get_hpte_slot_array(pmdp);
 	/*
 	 * IF we try to do a HUGE PTE update after a withdraw is done.
 	 * we will find the below NULL. This happens when we do
-- 
1.8.1.2


* [PATCH 2/2] powerpc: split hugepage when using subpage protection
  2013-05-17  8:15 [PATCH 1/2] powerpc: use smp_rmb when looking at deposited pgtable to store hash index Aneesh Kumar K.V
@ 2013-05-17  8:15 ` Aneesh Kumar K.V
  2013-05-17  8:36   ` Aneesh Kumar K.V
  2013-05-20  1:18 ` [PATCH 1/2] powerpc: use smp_rmb when looking at deposited pgtable to store hash index Michael Neuling
  1 sibling, 1 reply; 7+ messages in thread
From: Aneesh Kumar K.V @ 2013-05-17  8:15 UTC
  To: benh, paulus, dwg; +Cc: linuxppc-dev, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We find all the overlapping vmas and mark them such that we don't allocate
hugepages in that range. We also split any existing huge pages so that the
normal page hash entries can be invalidated and the pages faulted back in
with the new protection bits.
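
A hedged usage sketch of the syscall this feeds (illustrative only, assuming
powerpc's __NR_subpage_prot is exposed; the 2-bit field values follow the
kernel's subpage-protection documentation: 0 leaves a subpage's access
unchanged, 1 denies writes, 2 or 3 deny all access):

	#include <stdint.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main(void)
	{
		/* Four 64k pages; the map needs one u32 per 64k page
		 * (16 subpages x 2 bits = 32 bits).  Error handling
		 * omitted for brevity. */
		void *p = mmap(NULL, 4 * 0x10000, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		/* 0x55555555 sets all sixteen fields of the first page
		 * to 1: its 4k subpages become read-only. */
		uint32_t map[4] = { 0x55555555, 0, 0, 0 };

		return syscall(__NR_subpage_prot, (unsigned long)p,
			       4 * 0x10000UL, map) ? 1 : 0;
	}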

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/subpage-prot.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/arch/powerpc/mm/subpage-prot.c b/arch/powerpc/mm/subpage-prot.c
index 7c415dd..33fd329 100644
--- a/arch/powerpc/mm/subpage-prot.c
+++ b/arch/powerpc/mm/subpage-prot.c
@@ -130,6 +130,14 @@ static void subpage_prot_clear(unsigned long addr, unsigned long len)
 	up_write(&mm->mmap_sem);
 }
 
+static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
+				  unsigned long end, struct mm_walk *walk)
+{
+	struct vm_area_struct *vma = walk->private;
+	split_huge_page_pmd(vma, addr, pmd);
+	return 0;
+}
+
 /*
  * Copy in a subpage protection map for an address range.
  * The map has 2 bits per 4k subpage, so 32 bits per 64k page.
@@ -149,6 +157,12 @@ long sys_subpage_prot(unsigned long addr, unsigned long len, u32 __user *map)
 	size_t nw;
 	unsigned long next, limit;
 	int err;
+	struct vm_area_struct *vma;
+
+	struct mm_walk subpage_proto_walk = {
+		.mm = mm,
+		.pmd_entry = subpage_walk_pmd_entry,
+	};
 
 	/* Check parameters */
 	if ((addr & ~PAGE_MASK) || (len & ~PAGE_MASK) ||
@@ -168,6 +182,19 @@ long sys_subpage_prot(unsigned long addr, unsigned long len, u32 __user *map)
 		return -EFAULT;
 
 	down_write(&mm->mmap_sem);
+
+	/*
+	 * We don't try too hard, we just mark all the vma in that range
+	 * VM_NOHUGEPAGE and split them.
+	 */
+	for (vma = find_vma(mm, addr);
+	     (vma && vma->vm_end < (addr + len)); vma = vma->vm_next) {
+
+		vma->vm_flags |= VM_NOHUGEPAGE;
+		subpage_proto_walk.private = vma;
+		walk_page_range(vma->vm_start, vma->vm_end,
+				&subpage_proto_walk);
+	}
 	for (limit = addr + len; addr < limit; addr = next) {
 		next = pmd_addr_end(addr, limit);
 		err = -ENOMEM;
-- 
1.8.1.2


* Re: [PATCH 2/2] powerpc: split hugepage when using subpage protection
  2013-05-17  8:15 ` [PATCH 2/2] powerpc: split hugepage when using subpage protection Aneesh Kumar K.V
@ 2013-05-17  8:36   ` Aneesh Kumar K.V
  0 siblings, 0 replies; 7+ messages in thread
From: Aneesh Kumar K.V @ 2013-05-17  8:36 UTC
  To: benh, paulus, dwg; +Cc: linuxppc-dev

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> We find all the overlapping vmas and mark them such that we don't allocate
> hugepages in that range. We also split any existing huge pages so that the
> normal page hash entries can be invalidated and the pages faulted back in
> with the new protection bits.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/subpage-prot.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> diff --git a/arch/powerpc/mm/subpage-prot.c b/arch/powerpc/mm/subpage-prot.c
> index 7c415dd..33fd329 100644
> --- a/arch/powerpc/mm/subpage-prot.c
> +++ b/arch/powerpc/mm/subpage-prot.c
> @@ -130,6 +130,14 @@ static void subpage_prot_clear(unsigned long addr, unsigned long len)
>  	up_write(&mm->mmap_sem);
>  }
>
> +static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr,
> +				  unsigned long end, struct mm_walk *walk)
> +{
> +	struct vm_area_struct *vma = walk->private;
> +	split_huge_page_pmd(vma, addr, pmd);
> +	return 0;
> +}
> +
>  /*
>   * Copy in a subpage protection map for an address range.
>   * The map has 2 bits per 4k subpage, so 32 bits per 64k page.
> @@ -149,6 +157,12 @@ long sys_subpage_prot(unsigned long addr, unsigned long len, u32 __user *map)
>  	size_t nw;
>  	unsigned long next, limit;
>  	int err;
> +	struct vm_area_struct *vma;
> +
> +	struct mm_walk subpage_proto_walk = {
> +		.mm = mm,
> +		.pmd_entry = subpage_walk_pmd_entry,
> +	};
>
>  	/* Check parameters */
>  	if ((addr & ~PAGE_MASK) || (len & ~PAGE_MASK) ||
> @@ -168,6 +182,19 @@ long sys_subpage_prot(unsigned long addr, unsigned long len, u32 __user *map)
>  		return -EFAULT;
>
>  	down_write(&mm->mmap_sem);
> +
> +	/*
> +	 * We don't try too hard, we just mark all the vma in that range
> +	 * VM_NOHUGEPAGE and split them.
> +	 */
> +	for (vma = find_vma(mm, addr);
> +	     (vma && vma->vm_end < (addr + len)); vma = vma->vm_next) {

This should be (missed a commit --amend):

	     (vma && vma->vm_start < (addr + len)); vma = vma->vm_next) {
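
The original test stops as soon as a vma ends beyond the range: with
addr = 0x10000 and len = 0x20000, for example, a vma spanning
[0x28000, 0x40000) overlaps the range, but its vm_end (0x40000) is not
below addr + len (0x30000), so it would never be marked or split.
Checking vm_start against addr + len (0x28000 < 0x30000) includes it.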

> +		vma->vm_flags |= VM_NOHUGEPAGE;
> +		subpage_proto_walk.private = vma;
> +		walk_page_range(vma->vm_start, vma->vm_end,
> +				&subpage_proto_walk);
> +	}
>  	for (limit = addr + len; addr < limit; addr = next) {
>  		next = pmd_addr_end(addr, limit);
>  		err = -ENOMEM;

-aneesh


* Re: [PATCH 1/2] powerpc: use smp_rmb when looking at deposited pgtable to store hash index
  2013-05-17  8:15 [PATCH 1/2] powerpc: use smp_rmb when looking at deposited pgtable to store hash index Aneesh Kumar K.V
  2013-05-17  8:15 ` [PATCH 2/2] powerpc: split hugepage when using subpage protection Aneesh Kumar K.V
@ 2013-05-20  1:18 ` Michael Neuling
  2013-05-20  4:27   ` Aneesh Kumar K.V
  1 sibling, 1 reply; 7+ messages in thread
From: Michael Neuling @ 2013-05-20  1:18 UTC
  To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev, dwg

Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> We need to use smp_rmb when looking at the hpte slot array. Otherwise we could
> reorder the hpte_slot_array load before we have even marked the pmd trans huge.

Does this need to go back into the stable series?

Mikey

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/pgtable-ppc64.h | 13 +++++++++++++
>  arch/powerpc/mm/hugepage-hash64.c        |  6 +-----
>  arch/powerpc/mm/pgtable_64.c             |  6 +-----
>  3 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
> index d046289..46db094 100644
> --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> @@ -10,6 +10,7 @@
>  #else
>  #include <asm/pgtable-ppc64-4k.h>
>  #endif
> +#include <asm/barrier.h>
>  
>  #define FIRST_USER_ADDRESS	0
>  
> @@ -393,6 +394,18 @@ static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
>  	hpte_slot_array[index] = hidx << 4 | 0x1 << 3;
>  }
>  
> +static inline char *get_hpte_slot_array(pmd_t *pmdp)
> +{
> +	/*
> +	 * The hpte hindex is stored in the pgtable whose address is in the
> +	 * second half of the PMD
> +	 *
> +	 * Order this load with the test for pmd_trans_huge in the caller
> +	 */
> +	smp_rmb();
> +	return *(char **)(pmdp + PTRS_PER_PMD);
> +}
> +
>  extern void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
>  				   pmd_t *pmdp);
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
> index e430766..c3ba3d5 100644
> --- a/arch/powerpc/mm/hugepage-hash64.c
> +++ b/arch/powerpc/mm/hugepage-hash64.c
> @@ -84,11 +84,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
>  
>  	vpn = hpt_vpn(ea, vsid, ssize);
>  	hash = hpt_hash(vpn, shift, ssize);
> -	/*
> -	 * The hpte hindex are stored in the pgtable whose address is in the
> -	 * second half of the PMD
> -	 */
> -	hpte_slot_array = *(char **)(pmdp + PTRS_PER_PMD);
> +	hpte_slot_array = get_hpte_slot_array(pmdp);
>  
>  	valid = hpte_valid(hpte_slot_array, index);
>  	if (valid) {
> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> index 8dd7c83..19d6734 100644
> --- a/arch/powerpc/mm/pgtable_64.c
> +++ b/arch/powerpc/mm/pgtable_64.c
> @@ -701,11 +701,7 @@ void hpte_do_hugepage_flush(struct mm_struct *mm, unsigned long addr,
>  	 * Flush all the hptes mapping this hugepage
>  	 */
>  	s_addr = addr & HPAGE_PMD_MASK;
> -	/*
> -	 * The hpte hindex are stored in the pgtable whose address is in the
> -	 * second half of the PMD
> -	 */
> -	hpte_slot_array = *(char **)(pmdp + PTRS_PER_PMD);
> +	hpte_slot_array = get_hpte_slot_array(pmdp);
>  	/*
>  	 * IF we try to do a HUGE PTE update after a withdraw is done.
>  	 * we will find the below NULL. This happens when we do
> -- 
> 1.8.1.2
> 


* Re: [PATCH 1/2] powerpc: use smp_rmb when looking at deposited pgtable to store hash index
  2013-05-20  1:18 ` [PATCH 1/2] powerpc: use smp_rmb when looking at deposited pgtable to store hash index Michael Neuling
@ 2013-05-20  4:27   ` Aneesh Kumar K.V
  2013-05-20  6:28     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 7+ messages in thread
From: Aneesh Kumar K.V @ 2013-05-20  4:27 UTC
  To: Michael Neuling; +Cc: paulus, linuxppc-dev, dwg

Michael Neuling <mikey@neuling.org> writes:

> Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> We need to use smp_rmb when looking at the hpte slot array. Otherwise we could
>> reorder the hpte_slot_array load before we have even marked the pmd trans huge.
>
> Does this need to go back into the stable series?
>

No, the changes are not yet upstream. The hpte_slot_array changes are in
the THP series; I didn't want to post the entire series again with the
above two patches.

-aneesh


* Re: [PATCH 1/2] powerpc: use smp_rmb when looking at deposited pgtable to store hash index
  2013-05-20  4:27   ` Aneesh Kumar K.V
@ 2013-05-20  6:28     ` Benjamin Herrenschmidt
  2013-05-20  9:26       ` Aneesh Kumar K.V
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2013-05-20  6:28 UTC
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, Michael Neuling, paulus, dwg

On Mon, 2013-05-20 at 09:57 +0530, Aneesh Kumar K.V wrote:
> Michael Neuling <mikey@neuling.org> writes:
> 
> > Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
> >
> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> 
> >> We need to use smp_rmb when looking at the hpte slot array. Otherwise we could
> >> reorder the hpte_slot_array load before we have even marked the pmd trans huge.
> >
> > Does this need to go back into the stable series?
> >
> 
> No, the changes are not yet upstream. The hpte_slot_array changes are in
> the THP series; I didn't want to post the entire series again with the
> above two patches.

Note that any patch that adds such an rmb should have a very clear
description of the corresponding wmb. It's almost never right to have
one without the other, i.e. what is it that you are ordering against?
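
For reference, the canonical shape of that pairing (an illustrative sketch
in kernel style, not code from this series):

	static int payload;
	static int ready;

	static void producer(void)	/* e.g. the deposit side */
	{
		payload = 42;		/* publish the data */
		smp_wmb();		/* order payload before ready */
		ready = 1;
	}

	static int consumer(void)	/* e.g. the fault side */
	{
		if (!ready)
			return -1;
		smp_rmb();		/* pairs with producer's smp_wmb() */
		return payload;		/* guaranteed to observe 42 */
	}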

Cheers,
Ben.


* Re: [PATCH 1/2] powerpc: use smp_rmb when looking at deposited pgtable to store hash index
  2013-05-20  6:28     ` Benjamin Herrenschmidt
@ 2013-05-20  9:26       ` Aneesh Kumar K.V
  0 siblings, 0 replies; 7+ messages in thread
From: Aneesh Kumar K.V @ 2013-05-20  9:26 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Michael Neuling, paulus, dwg

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Mon, 2013-05-20 at 09:57 +0530, Aneesh Kumar K.V wrote:
>> Michael Neuling <mikey@neuling.org> writes:
>> 
>> > Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
>> >
>> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> >> 
>> >> We need to use smp_rmb when looking at the hpte slot array. Otherwise we could
>> >> reorder the hpte_slot_array load before we have even marked the pmd trans huge.
>> >
>> > Does this need to go back into the stable series?
>> >
>> 
>> No, the changes are not yet upstream. The hpte_slot_array changes are in
>> the THP series; I didn't want to post the entire series again with the
>> above two patches.
>
> Note that any patch that adds such an rmb should have a very clear
> description of the corresponding wmb. It's almost never right to have
> one without the other, i.e. what is it that you are ordering against?

Will update the commit message. Some of those smp_wmb() calls are in the
core THP code, and the others are in the ppc64 code; I will add more detail.

When we deposit the pgtable we use pgtable_trans_huge_deposit(), which does
the related smp_wmb(). When we read the pgtable back via
pgtable_trans_huge_withdraw(), the core THP code does the related smp_wmb()
after setting the right PTE information in the pgtable.
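
Schematically, the pairing described above (a sketch; the exact call
sites live in the THP series):

	deposit side                         hash fault / flush side
	---------------------------------    ---------------------------------
	fill the deposited pgtable           pmd_trans_huge(*pmdp) observes
	  (hpte slot array)                    the huge pmd
	smp_wmb()        <---- pairs ---->   smp_rmb()
	mark the pmd trans huge              get_hpte_slot_array(pmdp)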

-aneesh


