linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH] powerpc/THP: Wait for all hash_page calls to finish before invalidating HPTE entries
@ 2013-06-19  6:44 Aneesh Kumar K.V
  2013-06-19  6:55 ` Michael Neuling
  0 siblings, 1 reply; 5+ messages in thread
From: Aneesh Kumar K.V @ 2013-06-19  6:44 UTC (permalink / raw)
  To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

When we collapse normal pages to a hugepage, we first clear the pmd, then invalidate all
the PTE entries. The assumption here is that any low-level page fault will see the pmd as
none and take the slow path that waits on mmap_sem. But we could very well be in the middle
of a hash_page with a local ptep pointer value. Such a hash_page can result in adding new
HPTE entries for the normal subpages/small pages. That means we could be modifying the
page content as we copy it into the huge page. Fix this by waiting for hash_page to finish
after marking the pmd none and before invalidating the HPTE entries. We use the heavy
kick_all_cpus_sync(). This should be ok, as we do this in the background khugepaged thread
and not in application context, but we do block page fault handling for that duration.
Also, if we find collapse slow, we can ideally increase the scan rate.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/pgtable_64.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index bbecac4..4bb44c3 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -543,6 +543,14 @@ pmd_t pmdp_clear_flush(struct vm_area_struct *vma, unsigned long address,
 		pmd = *pmdp;
 		pmd_clear(pmdp);
 		/*
+		 * Wait for all pending hash_page to finish
+		 * We can do this by waiting for a context switch to happen on
+		 * the cpus. Any new hash_page after this will see pmd none
+		 * and fallback to code that takes mmap_sem and hence will block
+		 * for collapse to finish.
+		 */
+		kick_all_cpus_sync();
+		/*
 		 * Now invalidate the hpte entries in the range
 		 * covered by pmd. This make sure we take a
 		 * fault and will find the pmd as none, which will
-- 
1.8.1.2
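
For reference, a rough C-style sketch of the ordering the patch establishes in
pmdp_clear_flush() during collapse. This is not the actual kernel code:
invalidate_old_hpte_range() is a made-up placeholder for the HPTE flush that
already exists further down in the function, and only pmd_clear() and
kick_all_cpus_sync() are real kernel interfaces.

/*
 * Sketch only -- invalidate_old_hpte_range() is a placeholder for the
 * existing flush loop in pmdp_clear_flush().
 */
static void collapse_ordering_sketch(pmd_t *pmdp)
{
        pmd_t pmd = *pmdp;

        /*
         * Step 1: from here on, new page faults see pmd_none() and take
         * the slow path that serialises on mmap_sem.
         */
        pmd_clear(pmdp);

        /*
         * Step 2: wait for every CPU to go through a context switch.
         * Any hash_page() that started before the clear, and is still
         * working on a ptep derived from the old pmd, has returned by
         * the time this completes, so it can no longer install
         * small-page HPTEs for this range.
         */
        kick_all_cpus_sync();

        /*
         * Step 3: only now is it safe to invalidate the old small-page
         * HPTEs and let khugepaged copy the data into the huge page.
         */
        invalidate_old_hpte_range(pmd);
}

The key point is that the kick sits between steps 1 and 3: once it returns, no
CPU can still be inside a hash_page() that saw the old pmd.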


* Re: [PATCH] powerpc/THP: Wait for all hash_page calls to finish before invalidating HPTE entries
  2013-06-19  6:44 [PATCH] powerpc/THP: Wait for all hash_page calls to finish before invalidating HPTE entries Aneesh Kumar K.V
@ 2013-06-19  6:55 ` Michael Neuling
  2013-06-19  7:04   ` Aneesh Kumar K.V
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Neuling @ 2013-06-19  6:55 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev

Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> When we collapse normal pages to a hugepage, we first clear the pmd, then invalidate all
> the PTE entries. The assumption here is that any low-level page fault will see the pmd as
> none and take the slow path that waits on mmap_sem. But we could very well be in the middle
> of a hash_page with a local ptep pointer value. Such a hash_page can result in adding new
> HPTE entries for the normal subpages/small pages. That means we could be modifying the
> page content as we copy it into the huge page. Fix this by waiting for hash_page to finish
> after marking the pmd none and before invalidating the HPTE entries. We use the heavy
> kick_all_cpus_sync(). This should be ok, as we do this in the background khugepaged thread
> and not in application context, but we do block page fault handling for that duration.
> Also, if we find collapse slow, we can ideally increase the scan rate.

80 columns here

> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/pgtable_64.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> index bbecac4..4bb44c3 100644
> --- a/arch/powerpc/mm/pgtable_64.c
> +++ b/arch/powerpc/mm/pgtable_64.c
> @@ -543,6 +543,14 @@ pmd_t pmdp_clear_flush(struct vm_area_struct *vma, unsigned long address,
>  		pmd = *pmdp;
>  		pmd_clear(pmdp);
>  		/*
> +		 * Wait for all pending hash_page to finish
> +		 * We can do this by waiting for a context switch to happen on
> +		 * the cpus. Any new hash_page after this will see pmd none
> +		 * and fallback to code that takes mmap_sem and hence will block
> +		 * for collapse to finish.
> +		 */
> +		kick_all_cpus_sync();
> +		/*

This doesn't apply on mainline... I assume it needs your THP patches?

Also, dumb question. Is this a bug we're fixing or just an optimisation?

Mikey



>  		 * Now invalidate the hpte entries in the range
>  		 * covered by pmd. This make sure we take a
>  		 * fault and will find the pmd as none, which will
> -- 
> 1.8.1.2
> 


* Re: [PATCH] powerpc/THP: Wait for all hash_page calls to finish before invalidating HPTE entries
  2013-06-19  6:55 ` Michael Neuling
@ 2013-06-19  7:04   ` Aneesh Kumar K.V
  2013-06-19  7:14     ` Michael Neuling
  0 siblings, 1 reply; 5+ messages in thread
From: Aneesh Kumar K.V @ 2013-06-19  7:04 UTC (permalink / raw)
  To: Michael Neuling; +Cc: paulus, linuxppc-dev

Michael Neuling <mikey@neuling.org> writes:

> Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>> 
>> When we collapse normal pages to a hugepage, we first clear the pmd, then invalidate all
>> the PTE entries. The assumption here is that any low-level page fault will see the pmd as
>> none and take the slow path that waits on mmap_sem. But we could very well be in the middle
>> of a hash_page with a local ptep pointer value. Such a hash_page can result in adding new
>> HPTE entries for the normal subpages/small pages. That means we could be modifying the
>> page content as we copy it into the huge page. Fix this by waiting for hash_page to finish
>> after marking the pmd none and before invalidating the HPTE entries. We use the heavy
>> kick_all_cpus_sync(). This should be ok, as we do this in the background khugepaged thread
>> and not in application context, but we do block page fault handling for that duration.
>> Also, if we find collapse slow, we can ideally increase the scan rate.
>
> 80 columns here
>
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/mm/pgtable_64.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>> 
>> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
>> index bbecac4..4bb44c3 100644
>> --- a/arch/powerpc/mm/pgtable_64.c
>> +++ b/arch/powerpc/mm/pgtable_64.c
>> @@ -543,6 +543,14 @@ pmd_t pmdp_clear_flush(struct vm_area_struct *vma, unsigned long address,
>>  		pmd = *pmdp;
>>  		pmd_clear(pmdp);
>>  		/*
>> +		 * Wait for all pending hash_page to finish
>> +		 * We can do this by waiting for a context switch to happen on
>> +		 * the cpus. Any new hash_page after this will see pmd none
>> +		 * and fallback to code that takes mmap_sem and hence will block
>> +		 * for collapse to finish.
>> +		 */
>> +		kick_all_cpus_sync();
>> +		/*
>
> This doesn't apply on mainline... I assume it needs your THP
> patches?

Yes, they are on top of the V10 THP series.

>
> Also, dumb question. Is this a bug we're fixing or just an optimisation?


This is a bug fix. The details can be found at 

http://article.gmane.org/gmane.linux.ports.ppc.embedded/60266

-aneesh


* Re: [PATCH] powerpc/THP: Wait for all hash_page calls to finish before invalidating HPTE entries
  2013-06-19  7:04   ` Aneesh Kumar K.V
@ 2013-06-19  7:14     ` Michael Neuling
  2013-06-19 10:18       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Neuling @ 2013-06-19  7:14 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: paulus, linuxppc-dev

Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:

> Michael Neuling <mikey@neuling.org> writes:
> 
> > Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
> >
> >> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> >> 
> >> When we collapse normal pages to a hugepage, we first clear the pmd, then invalidate all
> >> the PTE entries. The assumption here is that any low-level page fault will see the pmd as
> >> none and take the slow path that waits on mmap_sem. But we could very well be in the middle
> >> of a hash_page with a local ptep pointer value. Such a hash_page can result in adding new
> >> HPTE entries for the normal subpages/small pages. That means we could be modifying the
> >> page content as we copy it into the huge page. Fix this by waiting for hash_page to finish
> >> after marking the pmd none and before invalidating the HPTE entries. We use the heavy
> >> kick_all_cpus_sync(). This should be ok, as we do this in the background khugepaged thread
> >> and not in application context, but we do block page fault handling for that duration.
> >> Also, if we find collapse slow, we can ideally increase the scan rate.
> >
> > 80 columns here



> >
> >> 
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> >> ---
> >>  arch/powerpc/mm/pgtable_64.c | 8 ++++++++
> >>  1 file changed, 8 insertions(+)
> >> 
> >> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> >> index bbecac4..4bb44c3 100644
> >> --- a/arch/powerpc/mm/pgtable_64.c
> >> +++ b/arch/powerpc/mm/pgtable_64.c
> >> @@ -543,6 +543,14 @@ pmd_t pmdp_clear_flush(struct vm_area_struct *vma, unsigned long address,
> >>  		pmd = *pmdp;
> >>  		pmd_clear(pmdp);
> >>  		/*
> >> +		 * Wait for all pending hash_page to finish
> >> +		 * We can do this by waiting for a context switch to happen on
> >> +		 * the cpus. Any new hash_page after this will see pmd none
> >> +		 * and fallback to code that takes mmap_sem and hence will block
> >> +		 * for collapse to finish.
> >> +		 */
> >> +		kick_all_cpus_sync();
> >> +		/*
> >
> > This doesn't apply on mainline... I assume it needs your THP
> > patches?
> 
> Yes, they are on top of the V10 THP series.
> 
> >
> > Also, dumb question. Is this a bug we're fixing or just an optimisation?
> 
> This is a bug fix. The details can be found at 

Can you make this more obvious in the changelog (as well as making it 80
col).  I don't see 'bug' mentioned anywhere.  'Fix' is mentioned
somewhere in the middle of the changelog.

> 
> http://article.gmane.org/gmane.linux.ports.ppc.embedded/60266

OK, but V10 THP is not in yet, right?  So why not roll it into that
series rather than pushing broken stuff and fixing it?

Mikey


* Re: [PATCH] powerpc/THP: Wait for all hash_page calls to finish before invalidating HPTE entries
  2013-06-19  7:14     ` Michael Neuling
@ 2013-06-19 10:18       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2013-06-19 10:18 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev, paulus, Aneesh Kumar K.V

On Wed, 2013-06-19 at 17:14 +1000, Michael Neuling wrote:

> > This is a bug fix. The details can be found at 
> 
> Can you make this more obvious in the changelog (as well as making it 80
> col).  I don't see 'bug' mentioned anywhere.  'Fix' is mentioned
> somewhere in the middle of the changelog.
> 
> > 
> > http://article.gmane.org/gmane.linux.ports.ppc.embedded/60266
> 
> OK, but V10 THP is not in yet, right?  So why not roll it into that
> series rather than pushing broken stuff and fixing it?

Right, this is expected. This fix is a result of me asking him to do it
that way on ST, and it should eventually be folded into the series (or
re-ordered to be in there before the patch that enables THP).

Cheers,
Ben.



Thread overview: 5+ messages
2013-06-19  6:44 [PATCH] powerpc/THP: Wait for all hash_page calls to finish before invalidating HPTE entries Aneesh Kumar K.V
2013-06-19  6:55 ` Michael Neuling
2013-06-19  7:04   ` Aneesh Kumar K.V
2013-06-19  7:14     ` Michael Neuling
2013-06-19 10:18       ` Benjamin Herrenschmidt
