All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
@ 2019-08-20  7:51 Song Liu
  2019-08-20  9:12   ` Thomas Gleixner
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Song Liu @ 2019-08-20  7:51 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: kernel-team, Song Liu, stable, Joerg Roedel, Thomas Gleixner,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra

pti_clone_pgtable() increases addr by PUD_SIZE for pud_none(*pud) case.
This is not accurate because addr may not be PUD_SIZE aligned.

In our x86_64 kernel, pti_clone_pgtable() fails to clone 7 PMDs because
of this issuse, including PMD for the irq entry table. For a memcache
like workload, this introduces about 4.5x more iTLB-load and about 2.5x
more iTLB-load-misses on a Skylake CPU.

This patch fixes this issue by adding PMD_SIZE to addr for pud_none()
case.

Cc: stable@vger.kernel.org # v4.19+
Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit")
Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/mm/pti.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index b196524759ec..5a67c3015f59 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -330,7 +330,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
 
 		pud = pud_offset(p4d, addr);
 		if (pud_none(*pud)) {
-			addr += PUD_SIZE;
+			addr += PMD_SIZE;
 			continue;
 		}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20  7:51 [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE Song Liu
@ 2019-08-20  9:12   ` Thomas Gleixner
  2019-08-20 10:00 ` Peter Zijlstra
  2019-08-20 13:57 ` Dave Hansen
  2 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-08-20  9:12 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, linux-mm, kernel-team, stable, Joerg Roedel,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra

On Tue, 20 Aug 2019, Song Liu wrote:

> pti_clone_pgtable() increases addr by PUD_SIZE for pud_none(*pud) case.
> This is not accurate because addr may not be PUD_SIZE aligned.

You fail to explain how this happened. The code before the 32bit support
did always increase by PMD_SIZE. The 32bit support broke that.
 
> In our x86_64 kernel, pti_clone_pgtable() fails to clone 7 PMDs because
> of this issuse, including PMD for the irq entry table. For a memcache
> like workload, this introduces about 4.5x more iTLB-load and about 2.5x
> more iTLB-load-misses on a Skylake CPU.

This information is largely irrelevant. What matters is the fact that this
got broken and incorrectly forwards the address by PUD_SIZE which is wrong
if address is not PUD_SIZE aligned.

> This patch fixes this issue by adding PMD_SIZE to addr for pud_none()
> case.

  git grep 'This patch' Documentation/process/submitting-patches.rst

> Cc: stable@vger.kernel.org # v4.19+
> Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit")
> Signed-off-by: Song Liu <songliubraving@fb.com>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/x86/mm/pti.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> index b196524759ec..5a67c3015f59 100644
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -330,7 +330,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>  
>  		pud = pud_offset(p4d, addr);
>  		if (pud_none(*pud)) {
> -			addr += PUD_SIZE;
> +			addr += PMD_SIZE;

The right fix is to skip forward to the next PUD boundary instead of doing
this in a loop with PMD_SIZE increments.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
@ 2019-08-20  9:12   ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-08-20  9:12 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, linux-mm, kernel-team, stable, Joerg Roedel,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra

On Tue, 20 Aug 2019, Song Liu wrote:

> pti_clone_pgtable() increases addr by PUD_SIZE for pud_none(*pud) case.
> This is not accurate because addr may not be PUD_SIZE aligned.

You fail to explain how this happened. The code before the 32bit support
did always increase by PMD_SIZE. The 32bit support broke that.
 
> In our x86_64 kernel, pti_clone_pgtable() fails to clone 7 PMDs because
> of this issuse, including PMD for the irq entry table. For a memcache
> like workload, this introduces about 4.5x more iTLB-load and about 2.5x
> more iTLB-load-misses on a Skylake CPU.

This information is largely irrelevant. What matters is the fact that this
got broken and incorrectly forwards the address by PUD_SIZE which is wrong
if address is not PUD_SIZE aligned.

> This patch fixes this issue by adding PMD_SIZE to addr for pud_none()
> case.

  git grep 'This patch' Documentation/process/submitting-patches.rst

> Cc: stable@vger.kernel.org # v4.19+
> Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit")
> Signed-off-by: Song Liu <songliubraving@fb.com>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/x86/mm/pti.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> index b196524759ec..5a67c3015f59 100644
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -330,7 +330,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>  
>  		pud = pud_offset(p4d, addr);
>  		if (pud_none(*pud)) {
> -			addr += PUD_SIZE;
> +			addr += PMD_SIZE;

The right fix is to skip forward to the next PUD boundary instead of doing
this in a loop with PMD_SIZE increments.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20  7:51 [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE Song Liu
  2019-08-20  9:12   ` Thomas Gleixner
@ 2019-08-20 10:00 ` Peter Zijlstra
  2019-08-20 11:16     ` Thomas Gleixner
  2019-08-20 13:19   ` Song Liu
  2019-08-20 13:57 ` Dave Hansen
  2 siblings, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2019-08-20 10:00 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-kernel, linux-mm, kernel-team, stable, Joerg Roedel,
	Thomas Gleixner, Dave Hansen, Andy Lutomirski

On Tue, Aug 20, 2019 at 12:51:28AM -0700, Song Liu wrote:
> pti_clone_pgtable() increases addr by PUD_SIZE for pud_none(*pud) case.
> This is not accurate because addr may not be PUD_SIZE aligned.
> 
> In our x86_64 kernel, pti_clone_pgtable() fails to clone 7 PMDs because
> of this issuse, including PMD for the irq entry table. For a memcache
> like workload, this introduces about 4.5x more iTLB-load and about 2.5x
> more iTLB-load-misses on a Skylake CPU.
> 
> This patch fixes this issue by adding PMD_SIZE to addr for pud_none()
> case.

> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> index b196524759ec..5a67c3015f59 100644
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -330,7 +330,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>  
>  		pud = pud_offset(p4d, addr);
>  		if (pud_none(*pud)) {
> -			addr += PUD_SIZE;
> +			addr += PMD_SIZE;
>  			continue;
>  		}

I'm thinking you're right in that there's a bug here, but I'm also
thinking your patch is both incomplete and broken.

What that code wants to do is skip to the end of the pud, a pmd_size
increase will not do that. And right below this, there's a second
instance of this exact pattern.

Did I get the below right?

---
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index b196524759ec..32b20b3cb227 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -330,12 +330,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
 
 		pud = pud_offset(p4d, addr);
 		if (pud_none(*pud)) {
+			addr &= PUD_MASK;
 			addr += PUD_SIZE;
 			continue;
 		}
 
 		pmd = pmd_offset(pud, addr);
 		if (pmd_none(*pmd)) {
+			addr &= PMD_MASK;
 			addr += PMD_SIZE;
 			continue;
 		}

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20 10:00 ` Peter Zijlstra
@ 2019-08-20 11:16     ` Thomas Gleixner
  2019-08-20 13:19   ` Song Liu
  1 sibling, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-08-20 11:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Song Liu, linux-kernel, linux-mm, kernel-team, stable,
	Joerg Roedel, Dave Hansen, Andy Lutomirski

On Tue, 20 Aug 2019, Peter Zijlstra wrote:
> What that code wants to do is skip to the end of the pud, a pmd_size
> increase will not do that. And right below this, there's a second
> instance of this exact pattern.
> 
> Did I get the below right?
> 
> ---
> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> index b196524759ec..32b20b3cb227 100644
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -330,12 +330,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>  
>  		pud = pud_offset(p4d, addr);
>  		if (pud_none(*pud)) {
> +			addr &= PUD_MASK;
>  			addr += PUD_SIZE;

			round_up(addr, PUD_SIZE);

perhaps?

>  			continue;
>  		}
>  
>  		pmd = pmd_offset(pud, addr);
>  		if (pmd_none(*pmd)) {
> +			addr &= PMD_MASK;
>  			addr += PMD_SIZE;
>  			continue;
>  		}
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
@ 2019-08-20 11:16     ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-08-20 11:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Song Liu, linux-kernel, linux-mm, kernel-team, stable,
	Joerg Roedel, Dave Hansen, Andy Lutomirski

On Tue, 20 Aug 2019, Peter Zijlstra wrote:
> What that code wants to do is skip to the end of the pud, a pmd_size
> increase will not do that. And right below this, there's a second
> instance of this exact pattern.
> 
> Did I get the below right?
> 
> ---
> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> index b196524759ec..32b20b3cb227 100644
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -330,12 +330,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>  
>  		pud = pud_offset(p4d, addr);
>  		if (pud_none(*pud)) {
> +			addr &= PUD_MASK;
>  			addr += PUD_SIZE;

			round_up(addr, PUD_SIZE);

perhaps?

>  			continue;
>  		}
>  
>  		pmd = pmd_offset(pud, addr);
>  		if (pmd_none(*pmd)) {
> +			addr &= PMD_MASK;
>  			addr += PMD_SIZE;
>  			continue;
>  		}
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20  9:12   ` Thomas Gleixner
  (?)
@ 2019-08-20 13:17   ` Song Liu
  -1 siblings, 0 replies; 21+ messages in thread
From: Song Liu @ 2019-08-20 13:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linux Kernel Mailing List, Linux MM, Kernel Team, stable,
	Joerg Roedel, Dave Hansen, Andy Lutomirski, Peter Zijlstra



> On Aug 20, 2019, at 2:12 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Tue, 20 Aug 2019, Song Liu wrote:
> 
>> pti_clone_pgtable() increases addr by PUD_SIZE for pud_none(*pud) case.
>> This is not accurate because addr may not be PUD_SIZE aligned.
> 
> You fail to explain how this happened. The code before the 32bit support
> did always increase by PMD_SIZE. The 32bit support broke that.

Will fix. 

> 
>> In our x86_64 kernel, pti_clone_pgtable() fails to clone 7 PMDs because
>> of this issuse, including PMD for the irq entry table. For a memcache
>> like workload, this introduces about 4.5x more iTLB-load and about 2.5x
>> more iTLB-load-misses on a Skylake CPU.
> 
> This information is largely irrelevant. What matters is the fact that this
> got broken and incorrectly forwards the address by PUD_SIZE which is wrong
> if address is not PUD_SIZE aligned.

We started looking into this because we cannot explain the regression in 
iTLB miss rate. I guess the patch itself explains it pretty well, so the 
original issue doesn't matter that much?

I will remove this part. 

> 
>> This patch fixes this issue by adding PMD_SIZE to addr for pud_none()
>> case.
> 
>  git grep 'This patch' Documentation/process/submitting-patches.rst

Will fix. 

>> Cc: stable@vger.kernel.org # v4.19+
>> Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit")
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> Cc: Joerg Roedel <jroedel@suse.de>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> ---
>> arch/x86/mm/pti.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
>> index b196524759ec..5a67c3015f59 100644
>> --- a/arch/x86/mm/pti.c
>> +++ b/arch/x86/mm/pti.c
>> @@ -330,7 +330,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>> 
>> 		pud = pud_offset(p4d, addr);
>> 		if (pud_none(*pud)) {
>> -			addr += PUD_SIZE;
>> +			addr += PMD_SIZE;
> 
> The right fix is to skip forward to the next PUD boundary instead of doing
> this in a loop with PMD_SIZE increments.

Agreed. 

Thanks,
Song


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20 10:00 ` Peter Zijlstra
  2019-08-20 11:16     ` Thomas Gleixner
@ 2019-08-20 13:19   ` Song Liu
  1 sibling, 0 replies; 21+ messages in thread
From: Song Liu @ 2019-08-20 13:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux Kernel Mailing List, linux-mm, Kernel Team, stable,
	Joerg Roedel, Thomas Gleixner, Dave Hansen, Andy Lutomirski



> On Aug 20, 2019, at 3:00 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Aug 20, 2019 at 12:51:28AM -0700, Song Liu wrote:
>> pti_clone_pgtable() increases addr by PUD_SIZE for pud_none(*pud) case.
>> This is not accurate because addr may not be PUD_SIZE aligned.
>> 
>> In our x86_64 kernel, pti_clone_pgtable() fails to clone 7 PMDs because
>> of this issuse, including PMD for the irq entry table. For a memcache
>> like workload, this introduces about 4.5x more iTLB-load and about 2.5x
>> more iTLB-load-misses on a Skylake CPU.
>> 
>> This patch fixes this issue by adding PMD_SIZE to addr for pud_none()
>> case.
> 
>> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
>> index b196524759ec..5a67c3015f59 100644
>> --- a/arch/x86/mm/pti.c
>> +++ b/arch/x86/mm/pti.c
>> @@ -330,7 +330,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>> 
>> 		pud = pud_offset(p4d, addr);
>> 		if (pud_none(*pud)) {
>> -			addr += PUD_SIZE;
>> +			addr += PMD_SIZE;
>> 			continue;
>> 		}
> 
> I'm thinking you're right in that there's a bug here, but I'm also
> thinking your patch is both incomplete and broken.
> 
> What that code wants to do is skip to the end of the pud, a pmd_size
> increase will not do that. And right below this, there's a second
> instance of this exact pattern.
> 
> Did I get the below right?

Yes, your are right. 

Thanks,
Song

> 
> ---
> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> index b196524759ec..32b20b3cb227 100644
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -330,12 +330,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
> 
> 		pud = pud_offset(p4d, addr);
> 		if (pud_none(*pud)) {
> +			addr &= PUD_MASK;
> 			addr += PUD_SIZE;
> 			continue;
> 		}
> 
> 		pmd = pmd_offset(pud, addr);
> 		if (pmd_none(*pmd)) {
> +			addr &= PMD_MASK;
> 			addr += PMD_SIZE;
> 			continue;
> 		}


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20 11:16     ` Thomas Gleixner
  (?)
@ 2019-08-20 13:21     ` Song Liu
  2019-08-20 13:39         ` Thomas Gleixner
  2019-08-20 13:55       ` Rik van Riel
  -1 siblings, 2 replies; 21+ messages in thread
From: Song Liu @ 2019-08-20 13:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, linux-kernel, linux-mm, Kernel Team, stable,
	Joerg Roedel, Dave Hansen, Andy Lutomirski



> On Aug 20, 2019, at 4:16 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Tue, 20 Aug 2019, Peter Zijlstra wrote:
>> What that code wants to do is skip to the end of the pud, a pmd_size
>> increase will not do that. And right below this, there's a second
>> instance of this exact pattern.
>> 
>> Did I get the below right?
>> 
>> ---
>> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
>> index b196524759ec..32b20b3cb227 100644
>> --- a/arch/x86/mm/pti.c
>> +++ b/arch/x86/mm/pti.c
>> @@ -330,12 +330,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>> 
>> 		pud = pud_offset(p4d, addr);
>> 		if (pud_none(*pud)) {
>> +			addr &= PUD_MASK;
>> 			addr += PUD_SIZE;
> 
> 			round_up(addr, PUD_SIZE);

I guess we need "round_up(addr + PMD_SIZE, PUD_SIZE)". 

Thanks,
Song

> 
> perhaps?
> 
>> 			continue;
>> 		}
>> 
>> 		pmd = pmd_offset(pud, addr);
>> 		if (pmd_none(*pmd)) {
>> +			addr &= PMD_MASK;
>> 			addr += PMD_SIZE;
>> 			continue;
>> 		}
>> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20 11:16     ` Thomas Gleixner
  (?)
  (?)
@ 2019-08-20 13:21     ` Rik van Riel
  -1 siblings, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2019-08-20 13:21 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: Song Liu, linux-kernel, linux-mm, Kernel Team, stable,
	Joerg Roedel, Dave Hansen, Andy Lutomirski

On Tue, 2019-08-20 at 13:16 +0200, Thomas Gleixner wrote:
> On Tue, 20 Aug 2019, Peter Zijlstra wrote:
> > What that code wants to do is skip to the end of the pud, a
> > pmd_size
> > increase will not do that. And right below this, there's a second
> > instance of this exact pattern.
> > 
> > Did I get the below right?
> > 
> > ---
> > diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> > index b196524759ec..32b20b3cb227 100644
> > --- a/arch/x86/mm/pti.c
> > +++ b/arch/x86/mm/pti.c
> > @@ -330,12 +330,14 @@ pti_clone_pgtable(unsigned long start,
> > unsigned long end,
> >  
> >  		pud = pud_offset(p4d, addr);
> >  		if (pud_none(*pud)) {
> > +			addr &= PUD_MASK;
> >  			addr += PUD_SIZE;
> 
> 			round_up(addr, PUD_SIZE);
> 
> perhaps?

Won't that keep returning the same address forever
if addr & PUD_MASK == 0?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20 13:21     ` Song Liu
@ 2019-08-20 13:39         ` Thomas Gleixner
  2019-08-20 13:55       ` Rik van Riel
  1 sibling, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-08-20 13:39 UTC (permalink / raw)
  To: Song Liu
  Cc: Peter Zijlstra, linux-kernel, linux-mm, Kernel Team, stable,
	Joerg Roedel, Dave Hansen, Andy Lutomirski

On Tue, 20 Aug 2019, Song Liu wrote:
> > On Aug 20, 2019, at 4:16 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > 
> > On Tue, 20 Aug 2019, Peter Zijlstra wrote:
> >> What that code wants to do is skip to the end of the pud, a pmd_size
> >> increase will not do that. And right below this, there's a second
> >> instance of this exact pattern.
> >> 
> >> Did I get the below right?
> >> 
> >> ---
> >> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> >> index b196524759ec..32b20b3cb227 100644
> >> --- a/arch/x86/mm/pti.c
> >> +++ b/arch/x86/mm/pti.c
> >> @@ -330,12 +330,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
> >> 
> >> 		pud = pud_offset(p4d, addr);
> >> 		if (pud_none(*pud)) {
> >> +			addr &= PUD_MASK;
> >> 			addr += PUD_SIZE;
> > 
> > 			round_up(addr, PUD_SIZE);
> 
> I guess we need "round_up(addr + PMD_SIZE, PUD_SIZE)". 

Right you are.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
@ 2019-08-20 13:39         ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2019-08-20 13:39 UTC (permalink / raw)
  To: Song Liu
  Cc: Peter Zijlstra, linux-kernel, linux-mm, Kernel Team, stable,
	Joerg Roedel, Dave Hansen, Andy Lutomirski

On Tue, 20 Aug 2019, Song Liu wrote:
> > On Aug 20, 2019, at 4:16 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > 
> > On Tue, 20 Aug 2019, Peter Zijlstra wrote:
> >> What that code wants to do is skip to the end of the pud, a pmd_size
> >> increase will not do that. And right below this, there's a second
> >> instance of this exact pattern.
> >> 
> >> Did I get the below right?
> >> 
> >> ---
> >> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> >> index b196524759ec..32b20b3cb227 100644
> >> --- a/arch/x86/mm/pti.c
> >> +++ b/arch/x86/mm/pti.c
> >> @@ -330,12 +330,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
> >> 
> >> 		pud = pud_offset(p4d, addr);
> >> 		if (pud_none(*pud)) {
> >> +			addr &= PUD_MASK;
> >> 			addr += PUD_SIZE;
> > 
> > 			round_up(addr, PUD_SIZE);
> 
> I guess we need "round_up(addr + PMD_SIZE, PUD_SIZE)". 

Right you are.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20 13:21     ` Song Liu
  2019-08-20 13:39         ` Thomas Gleixner
@ 2019-08-20 13:55       ` Rik van Riel
  2019-08-20 14:00         ` Song Liu
  1 sibling, 1 reply; 21+ messages in thread
From: Rik van Riel @ 2019-08-20 13:55 UTC (permalink / raw)
  To: Song Liu, Thomas Gleixner
  Cc: Peter Zijlstra, linux-kernel, linux-mm, Kernel Team, stable,
	Joerg Roedel, Dave Hansen, Andy Lutomirski

On Tue, 2019-08-20 at 09:21 -0400, Song Liu wrote:
> > On Aug 20, 2019, at 4:16 AM, Thomas Gleixner <tglx@linutronix.de>
> > wrote:
> > 
> > On Tue, 20 Aug 2019, Peter Zijlstra wrote:
> > > What that code wants to do is skip to the end of the pud, a
> > > pmd_size
> > > increase will not do that. And right below this, there's a second
> > > instance of this exact pattern.
> > > 
> > > Did I get the below right?
> > > 
> > > ---
> > > diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> > > index b196524759ec..32b20b3cb227 100644
> > > --- a/arch/x86/mm/pti.c
> > > +++ b/arch/x86/mm/pti.c
> > > @@ -330,12 +330,14 @@ pti_clone_pgtable(unsigned long start,
> > > unsigned long end,
> > > 
> > > 		pud = pud_offset(p4d, addr);
> > > 		if (pud_none(*pud)) {
> > > +			addr &= PUD_MASK;
> > > 			addr += PUD_SIZE;
> > 
> > 			round_up(addr, PUD_SIZE);
> 
> I guess we need "round_up(addr + PMD_SIZE, PUD_SIZE)". 

What does that do if start is less than PMD_SIZE
away from the next PUD_SIZE boundary?

How about:   round_up(addr + 1, PUD_SIZE)  ?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20  7:51 [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE Song Liu
  2019-08-20  9:12   ` Thomas Gleixner
  2019-08-20 10:00 ` Peter Zijlstra
@ 2019-08-20 13:57 ` Dave Hansen
  2019-08-20 14:14   ` Song Liu
  2 siblings, 1 reply; 21+ messages in thread
From: Dave Hansen @ 2019-08-20 13:57 UTC (permalink / raw)
  To: Song Liu, linux-kernel, linux-mm
  Cc: kernel-team, stable, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra

On 8/20/19 12:51 AM, Song Liu wrote:
> In our x86_64 kernel, pti_clone_pgtable() fails to clone 7 PMDs because
> of this issuse, including PMD for the irq entry table. For a memcache
> like workload, this introduces about 4.5x more iTLB-load and about 2.5x
> more iTLB-load-misses on a Skylake CPU.

I was surprised that this manifests as a performance issue.  Usually
messing up PTI page table manipulation means you get to experience the
jobs of debugging triple faults.  But, it makes sense if its this line:

        /*
         * Note that this will undo _some_ of the work that
         * pti_set_kernel_image_nonglobal() did to clear the
         * global bit.
         */
        pti_clone_pgtable(start, end_clone, PTI_LEVEL_KERNEL_IMAGE);

which is restoring the Global bit.

*But*, that shouldn't get hit on a Skylake CPU since those have PCIDs
and shouldn't have a global kernel image.  Could you confirm whether
PCIDs are supported on this CPU?

>  		pud = pud_offset(p4d, addr);
>  		if (pud_none(*pud)) {
> -			addr += PUD_SIZE;
> +			addr += PMD_SIZE;
>  			continue;
>  		}

Did we also bugger up this code:

                pmd = pmd_offset(pud, addr);
                if (pmd_none(*pmd)) {
                        addr += PMD_SIZE;
                        continue;
                }

if we're on 32-bit and this:

#define PTI_LEVEL_KERNEL_IMAGE  PTI_CLONE_PTE

and we get a hole walking to a non-PMD-aligned address?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20 13:55       ` Rik van Riel
@ 2019-08-20 14:00         ` Song Liu
  2019-08-20 16:56             ` Rik van Riel
  0 siblings, 1 reply; 21+ messages in thread
From: Song Liu @ 2019-08-20 14:00 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Thomas Gleixner, Peter Zijlstra, linux-kernel, linux-mm,
	Kernel Team, stable, Joerg Roedel, Dave Hansen, Andy Lutomirski



> On Aug 20, 2019, at 6:55 AM, Rik van Riel <riel@fb.com> wrote:
> 
> On Tue, 2019-08-20 at 09:21 -0400, Song Liu wrote:
>>> On Aug 20, 2019, at 4:16 AM, Thomas Gleixner <tglx@linutronix.de>
>>> wrote:
>>> 
>>> On Tue, 20 Aug 2019, Peter Zijlstra wrote:
>>>> What that code wants to do is skip to the end of the pud, a
>>>> pmd_size
>>>> increase will not do that. And right below this, there's a second
>>>> instance of this exact pattern.
>>>> 
>>>> Did I get the below right?
>>>> 
>>>> ---
>>>> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
>>>> index b196524759ec..32b20b3cb227 100644
>>>> --- a/arch/x86/mm/pti.c
>>>> +++ b/arch/x86/mm/pti.c
>>>> @@ -330,12 +330,14 @@ pti_clone_pgtable(unsigned long start,
>>>> unsigned long end,
>>>> 
>>>> 		pud = pud_offset(p4d, addr);
>>>> 		if (pud_none(*pud)) {
>>>> +			addr &= PUD_MASK;
>>>> 			addr += PUD_SIZE;
>>> 
>>> 			round_up(addr, PUD_SIZE);
>> 
>> I guess we need "round_up(addr + PMD_SIZE, PUD_SIZE)". 
> 
> What does that do if start is less than PMD_SIZE
> away from the next PUD_SIZE boundary?

Great point!

> 
> How about:   round_up(addr + 1, PUD_SIZE)  ?

Yes. How about this?

=========================== 8< ============================

From 9ae74cff4faf4710a11cb8da4c4a3f3404bd9fdd Mon Sep 17 00:00:00 2001
From: Song Liu <songliubraving@fb.com>
Date: Mon, 19 Aug 2019 23:59:47 -0700
Subject: [PATCH] x86/mm/pti: in pti_clone_pgtable(), increase addr properly

Before 32-bit support, pti_clone_pmds() always adds PMD_SIZE to addr.
This behavior changes after the 32-bit support:  pti_clone_pgtable()
increases addr by PUD_SIZE for pud_none(*pud) case, and increases addr by
PMD_SIZE for pmd_none(*pmd) case. However, this is not accurate because
addr may not be PUD_SIZE/PMD_SIZE aligned.

Fix this issue by properly rounding up addr to next PUD_SIZE/PMD_SIZE
in these two cases.

Cc: stable@vger.kernel.org # v4.19+
Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit")
Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/mm/pti.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index b196524759ec..1337494e22ef 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -330,13 +330,13 @@ pti_clone_pgtable(unsigned long start, unsigned long end,

                pud = pud_offset(p4d, addr);
                if (pud_none(*pud)) {
-                       addr += PUD_SIZE;
+                       addr = round_up(addr + 1, PUD_SIZE);
                        continue;
                }

                pmd = pmd_offset(pud, addr);
                if (pmd_none(*pmd)) {
-                       addr += PMD_SIZE;
+                       addr = round_up(addr + 1, PMD_SIZE);
                        continue;
                }

--
2.17.1



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20 13:57 ` Dave Hansen
@ 2019-08-20 14:14   ` Song Liu
  2019-08-20 14:18     ` Dave Hansen
  0 siblings, 1 reply; 21+ messages in thread
From: Song Liu @ 2019-08-20 14:14 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linux Kernel Mailing List, Linux MM, Kernel Team, stable,
	Joerg Roedel, Thomas Gleixner, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra



> On Aug 20, 2019, at 6:57 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 8/20/19 12:51 AM, Song Liu wrote:
>> In our x86_64 kernel, pti_clone_pgtable() fails to clone 7 PMDs because
>> of this issuse, including PMD for the irq entry table. For a memcache
>> like workload, this introduces about 4.5x more iTLB-load and about 2.5x
>> more iTLB-load-misses on a Skylake CPU.
> 
> I was surprised that this manifests as a performance issue.  Usually
> messing up PTI page table manipulation means you get to experience the
> jobs of debugging triple faults.  But, it makes sense if its this line:
> 
>        /*
>         * Note that this will undo _some_ of the work that
>         * pti_set_kernel_image_nonglobal() did to clear the
>         * global bit.
>         */
>        pti_clone_pgtable(start, end_clone, PTI_LEVEL_KERNEL_IMAGE);
> 
> which is restoring the Global bit.
> 
> *But*, that shouldn't get hit on a Skylake CPU since those have PCIDs
> and shouldn't have a global kernel image.  Could you confirm whether
> PCIDs are supported on this CPU?

Yes, pcid is listed in /proc/cpuinfo. 

Thanks,
Song

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20 14:14   ` Song Liu
@ 2019-08-20 14:18     ` Dave Hansen
  2019-08-20 16:05       ` Song Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Dave Hansen @ 2019-08-20 14:18 UTC (permalink / raw)
  To: Song Liu
  Cc: Linux Kernel Mailing List, Linux MM, Kernel Team, stable,
	Joerg Roedel, Thomas Gleixner, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra

On 8/20/19 7:14 AM, Song Liu wrote:
>> *But*, that shouldn't get hit on a Skylake CPU since those have PCIDs
>> and shouldn't have a global kernel image.  Could you confirm whether
>> PCIDs are supported on this CPU?
> Yes, pcid is listed in /proc/cpuinfo. 

So what's going on?  Could you confirm exactly which pti_clone_pgtable()
is causing you problems?  Do you have a theory as to why this manifests
as a performance problem rather than a functional one?

A diff of these:

	/sys/kernel/debug/page_tables/current_user
	/sys/kernel/debug/page_tables/current_kernel

before and after your patch might be helpful.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20 14:18     ` Dave Hansen
@ 2019-08-20 16:05       ` Song Liu
  2019-08-20 16:38         ` Song Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Song Liu @ 2019-08-20 16:05 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linux Kernel Mailing List, Linux MM, Kernel Team, stable,
	Joerg Roedel, Thomas Gleixner, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra



> On Aug 20, 2019, at 7:18 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 8/20/19 7:14 AM, Song Liu wrote:
>>> *But*, that shouldn't get hit on a Skylake CPU since those have PCIDs
>>> and shouldn't have a global kernel image.  Could you confirm whether
>>> PCIDs are supported on this CPU?
>> Yes, pcid is listed in /proc/cpuinfo. 
> 
> So what's going on?  Could you confirm exactly which pti_clone_pgtable()
> is causing you problems?  Do you have a theory as to why this manifests
> as a performance problem rather than a functional one?
> 
> A diff of these:
> 
> 	/sys/kernel/debug/page_tables/current_user
> 	/sys/kernel/debug/page_tables/current_kernel
> 
> before and after your patch might be helpful.

I believe the difference is from the following entries (7 PMDs)

Before the patch:

current_kernel:	0xffffffff81000000-0xffffffff81e04000       14352K     ro                 GLB x  pte
efi:		0xffffffff81000000-0xffffffff81e04000       14352K     ro                 GLB x  pte
kernel:		0xffffffff81000000-0xffffffff81e04000       14352K     ro                 GLB x  pte


After the patch:

current_kernel:	0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
efi:		0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
kernel:		0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd

current_kernel and kernel show same data though. 

Thanks,
Song


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20 16:05       ` Song Liu
@ 2019-08-20 16:38         ` Song Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Song Liu @ 2019-08-20 16:38 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Linux Kernel Mailing List, Linux MM, Kernel Team, stable,
	Joerg Roedel, Thomas Gleixner, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra



> On Aug 20, 2019, at 9:05 AM, Song Liu <songliubraving@fb.com> wrote:
> 
> 
> 
>> On Aug 20, 2019, at 7:18 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>> 
>> On 8/20/19 7:14 AM, Song Liu wrote:
>>>> *But*, that shouldn't get hit on a Skylake CPU since those have PCIDs
>>>> and shouldn't have a global kernel image.  Could you confirm whether
>>>> PCIDs are supported on this CPU?
>>> Yes, pcid is listed in /proc/cpuinfo. 
>> 
>> So what's going on?  Could you confirm exactly which pti_clone_pgtable()
>> is causing you problems?  Do you have a theory as to why this manifests
>> as a performance problem rather than a functional one?
>> 
>> A diff of these:
>> 
>> 	/sys/kernel/debug/page_tables/current_user
>> 	/sys/kernel/debug/page_tables/current_kernel
>> 
>> before and after your patch might be helpful.
> 
> I believe the difference is from the following entries (7 PMDs)
> 
> Before the patch:
> 
> current_kernel:	0xffffffff81000000-0xffffffff81e04000       14352K     ro                 GLB x  pte
> efi:		0xffffffff81000000-0xffffffff81e04000       14352K     ro                 GLB x  pte
> kernel:		0xffffffff81000000-0xffffffff81e04000       14352K     ro                 GLB x  pte
> 
> 
> After the patch:
> 
> current_kernel:	0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
> efi:		0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
> kernel:		0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
> 
> current_kernel and kernel show same data though. 

A little more details on how I got here.

We use huge page for hot text and thus reduces iTLB misses. As we 
benchmark 5.2 based kernel (vs. 4.16 based), we found ~2.5x more 
iTLB misses. 

To figure out the issue, I use a debug patch that dumps page table for 
a pid. The following are information from the workload pid. 


For the 4.16 based kernel:

host-4.16 # grep "x  pmd" /sys/kernel/debug/page_tables/dump_pid
0x0000000000600000-0x0000000000e00000           8M USR ro         PSE         x  pmd
0xffffffff81a00000-0xffffffff81c00000           2M     ro         PSE         x  pmd


For the 5.2 based kernel before this patch:

host-5.2-before # grep "x  pmd" /sys/kernel/debug/page_tables/dump_pid
0x0000000000600000-0x0000000000e00000           8M USR ro         PSE         x  pmd


The 8MB text in pmd is from user space. 4.16 kernel has 1 pmd for the
irq entry table; while 4.16 kernel doesn't have it. 


For the 5.2 based kernel after this patch:

host-5.2-after # grep "x  pmd" /sys/kernel/debug/page_tables/dump_pid
0x0000000000600000-0x0000000000e00000           8M USR ro         PSE         x  pmd
0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd


So after this patch, the 5.2 based kernel has 7 PMDs instead of 1 PMD 
in 4.16 kernel. This further reduces iTLB miss rate 

Thanks,
Song

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
  2019-08-20 14:00         ` Song Liu
@ 2019-08-20 16:56             ` Rik van Riel
  0 siblings, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2019-08-20 16:56 UTC (permalink / raw)
  To: Song Liu
  Cc: Thomas Gleixner, Peter Zijlstra, linux-kernel, linux-mm,
	Kernel Team, stable, Joerg Roedel, Dave Hansen, Andy Lutomirski

[-- Attachment #1: Type: text/plain, Size: 1267 bytes --]

On Tue, 2019-08-20 at 10:00 -0400, Song Liu wrote:
> 
> From 9ae74cff4faf4710a11cb8da4c4a3f3404bd9fdd Mon Sep 17 00:00:00
> 2001
> From: Song Liu <songliubraving@fb.com>
> Date: Mon, 19 Aug 2019 23:59:47 -0700
> Subject: [PATCH] x86/mm/pti: in pti_clone_pgtable(), increase addr
> properly
> 
> Before 32-bit support, pti_clone_pmds() always adds PMD_SIZE to addr.
> This behavior changes after the 32-bit support:  pti_clone_pgtable()
> increases addr by PUD_SIZE for pud_none(*pud) case, and increases
> addr by
> PMD_SIZE for pmd_none(*pmd) case. However, this is not accurate
> because
> addr may not be PUD_SIZE/PMD_SIZE aligned.
> 
> Fix this issue by properly rounding up addr to next PUD_SIZE/PMD_SIZE
> in these two cases.
> 
> Cc: stable@vger.kernel.org # v4.19+
> Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for
> 32 bit")
> Signed-off-by: Song Liu <songliubraving@fb.com>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>

This looks like it should do the trick!

Reviewed-by: Rik van Riel <riel@surriel.com>

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE
@ 2019-08-20 16:56             ` Rik van Riel
  0 siblings, 0 replies; 21+ messages in thread
From: Rik van Riel @ 2019-08-20 16:56 UTC (permalink / raw)
  To: Song Liu
  Cc: Thomas Gleixner, Peter Zijlstra, linux-kernel, linux-mm,
	Kernel Team, stable, Joerg Roedel, Dave Hansen, Andy Lutomirski

[-- Attachment #1: Type: text/plain, Size: 1267 bytes --]

On Tue, 2019-08-20 at 10:00 -0400, Song Liu wrote:
> 
> From 9ae74cff4faf4710a11cb8da4c4a3f3404bd9fdd Mon Sep 17 00:00:00
> 2001
> From: Song Liu <songliubraving@fb.com>
> Date: Mon, 19 Aug 2019 23:59:47 -0700
> Subject: [PATCH] x86/mm/pti: in pti_clone_pgtable(), increase addr
> properly
> 
> Before 32-bit support, pti_clone_pmds() always adds PMD_SIZE to addr.
> This behavior changes after the 32-bit support:  pti_clone_pgtable()
> increases addr by PUD_SIZE for pud_none(*pud) case, and increases
> addr by
> PMD_SIZE for pmd_none(*pmd) case. However, this is not accurate
> because
> addr may not be PUD_SIZE/PMD_SIZE aligned.
> 
> Fix this issue by properly rounding up addr to next PUD_SIZE/PMD_SIZE
> in these two cases.
> 
> Cc: stable@vger.kernel.org # v4.19+
> Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for
> 32 bit")
> Signed-off-by: Song Liu <songliubraving@fb.com>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>

This looks like it should do the trick!

Reviewed-by: Rik van Riel <riel@surriel.com>

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-08-20 16:56 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-20  7:51 [PATCH] x86/mm/pti: in pti_clone_pgtable() don't increase addr by PUD_SIZE Song Liu
2019-08-20  9:12 ` Thomas Gleixner
2019-08-20  9:12   ` Thomas Gleixner
2019-08-20 13:17   ` Song Liu
2019-08-20 10:00 ` Peter Zijlstra
2019-08-20 11:16   ` Thomas Gleixner
2019-08-20 11:16     ` Thomas Gleixner
2019-08-20 13:21     ` Song Liu
2019-08-20 13:39       ` Thomas Gleixner
2019-08-20 13:39         ` Thomas Gleixner
2019-08-20 13:55       ` Rik van Riel
2019-08-20 14:00         ` Song Liu
2019-08-20 16:56           ` [PATCH v2] " Rik van Riel
2019-08-20 16:56             ` Rik van Riel
2019-08-20 13:21     ` [PATCH] " Rik van Riel
2019-08-20 13:19   ` Song Liu
2019-08-20 13:57 ` Dave Hansen
2019-08-20 14:14   ` Song Liu
2019-08-20 14:18     ` Dave Hansen
2019-08-20 16:05       ` Song Liu
2019-08-20 16:38         ` Song Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.