* Re: [PATCH] x86/mm/64: Do not dereference non-present PGD entries
2020-08-07 8:40 [PATCH] x86/mm/64: Do not dereference non-present PGD entries Joerg Roedel
@ 2020-08-07 9:52 ` Jason A. Donenfeld
2020-08-07 10:07 ` Mike Rapoport
2020-08-10 14:27 ` Dave Hansen
2 siblings, 0 replies; 6+ messages in thread
From: Jason A. Donenfeld @ 2020-08-07 9:52 UTC (permalink / raw)
To: Joerg Roedel
Cc: X86 ML, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
H . Peter Anvin, Mike Rapoport, Joerg Roedel, Linus Torvalds,
Andrew Morton, LKML
On Fri, Aug 7, 2020 at 10:40 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> The code for preallocate_vmalloc_pages() was written under the
> assumption that the p4d_offset() and pud_offset() functions will perform
> present checks before dereferencing the parent entries.
>
> This assumption is wrong an leads to a bug in the code which causes the
> physical address found in the PGD be used as a page-table page, even if
> the PGD is not present.
>
> So the code flow currently is:
>
> pgd = pgd_offset_k(addr);
> p4d = p4d_offset(pgd, addr);
> if (p4d_none(*p4d))
> p4d = p4d_alloc(&init_mm, pgd, addr);
>
> This lacks a check for pgd_none() at least, the correct flow would be:
>
> pgd = pgd_offset_k(addr);
> if (pgd_none(*pgd))
> p4d = p4d_alloc(&init_mm, pgd, addr);
> else
> p4d = p4d_offset(pgd, addr);
>
> But this is the same flow that the p4d_alloc() and the pud_alloc()
> functions use internally, so there is no need to duplicate them.
>
> Remove the p?d_none() checks from the function and just call into
> p4d_alloc() and pud_alloc() to correctly pre-allocate the PGD entries.
>
> Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Fixes: 6eb82f994026 ("x86/mm: Pre-allocate P4D/PUD pages for vmalloc area")
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
> arch/x86/mm/init_64.c | 31 +++++++++++++------------------
> 1 file changed, 13 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 3f4e29a78f2b..449e071240e1 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1253,28 +1253,23 @@ static void __init preallocate_vmalloc_pages(void)
> p4d_t *p4d;
> pud_t *pud;
>
> - p4d = p4d_offset(pgd, addr);
> - if (p4d_none(*p4d)) {
> - /* Can only happen with 5-level paging */
> - p4d = p4d_alloc(&init_mm, pgd, addr);
> - if (!p4d) {
> - lvl = "p4d";
> - goto failed;
> - }
> - }
> + lvl = "p4d";
> + p4d = p4d_alloc(&init_mm, pgd, addr);
> + if (!p4d)
> + goto failed;
>
> + /*
> + * With 5-level paging the P4D level is not folded. So the PGDs
> + * are now populated and there is no need to walk down to the
> + * PUD level.
> + */
> if (pgtable_l5_enabled())
> continue;
>
> - pud = pud_offset(p4d, addr);
> - if (pud_none(*pud)) {
> - /* Ends up here only with 4-level paging */
> - pud = pud_alloc(&init_mm, p4d, addr);
> - if (!pud) {
> - lvl = "pud";
> - goto failed;
> - }
> - }
> + lvl = "pud";
> + pud = pud_alloc(&init_mm, p4d, addr);
> + if (!pud)
> + goto failed;
> }
>
> return;
> --
> 2.26.2
This appears to fix the issue, so:
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm/64: Do not dereference non-present PGD entries
2020-08-07 8:40 [PATCH] x86/mm/64: Do not dereference non-present PGD entries Joerg Roedel
2020-08-07 9:52 ` Jason A. Donenfeld
@ 2020-08-07 10:07 ` Mike Rapoport
2020-08-10 14:27 ` Dave Hansen
2 siblings, 0 replies; 6+ messages in thread
From: Mike Rapoport @ 2020-08-07 10:07 UTC (permalink / raw)
To: Joerg Roedel
Cc: x86, Dave Hansen, Andy Lutomirski, Peter Zijlstra, hpa,
Joerg Roedel, Linus Torvalds, Jason, Andrew Morton, linux-kernel
On Fri, Aug 07, 2020 at 10:40:13AM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
>
> The code for preallocate_vmalloc_pages() was written under the
> assumption that the p4d_offset() and pud_offset() functions will perform
> present checks before dereferencing the parent entries.
>
> This assumption is wrong an leads to a bug in the code which causes the
> physical address found in the PGD be used as a page-table page, even if
> the PGD is not present.
>
> So the code flow currently is:
>
> pgd = pgd_offset_k(addr);
> p4d = p4d_offset(pgd, addr);
> if (p4d_none(*p4d))
> p4d = p4d_alloc(&init_mm, pgd, addr);
>
> This lacks a check for pgd_none() at least, the correct flow would be:
>
> pgd = pgd_offset_k(addr);
> if (pgd_none(*pgd))
> p4d = p4d_alloc(&init_mm, pgd, addr);
> else
> p4d = p4d_offset(pgd, addr);
>
> But this is the same flow that the p4d_alloc() and the pud_alloc()
> functions use internally, so there is no need to duplicate them.
>
> Remove the p?d_none() checks from the function and just call into
> p4d_alloc() and pud_alloc() to correctly pre-allocate the PGD entries.
>
> Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Fixes: 6eb82f994026 ("x86/mm: Pre-allocate P4D/PUD pages for vmalloc area")
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
LGTM,
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
> arch/x86/mm/init_64.c | 31 +++++++++++++------------------
> 1 file changed, 13 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 3f4e29a78f2b..449e071240e1 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1253,28 +1253,23 @@ static void __init preallocate_vmalloc_pages(void)
> p4d_t *p4d;
> pud_t *pud;
>
> - p4d = p4d_offset(pgd, addr);
> - if (p4d_none(*p4d)) {
> - /* Can only happen with 5-level paging */
> - p4d = p4d_alloc(&init_mm, pgd, addr);
> - if (!p4d) {
> - lvl = "p4d";
> - goto failed;
> - }
> - }
> + lvl = "p4d";
> + p4d = p4d_alloc(&init_mm, pgd, addr);
> + if (!p4d)
> + goto failed;
>
> + /*
> + * With 5-level paging the P4D level is not folded. So the PGDs
> + * are now populated and there is no need to walk down to the
> + * PUD level.
> + */
> if (pgtable_l5_enabled())
> continue;
>
> - pud = pud_offset(p4d, addr);
> - if (pud_none(*pud)) {
> - /* Ends up here only with 4-level paging */
> - pud = pud_alloc(&init_mm, p4d, addr);
> - if (!pud) {
> - lvl = "pud";
> - goto failed;
> - }
> - }
> + lvl = "pud";
> + pud = pud_alloc(&init_mm, p4d, addr);
> + if (!pud)
> + goto failed;
> }
>
> return;
> --
> 2.26.2
>
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm/64: Do not dereference non-present PGD entries
2020-08-07 8:40 [PATCH] x86/mm/64: Do not dereference non-present PGD entries Joerg Roedel
2020-08-07 9:52 ` Jason A. Donenfeld
2020-08-07 10:07 ` Mike Rapoport
@ 2020-08-10 14:27 ` Dave Hansen
2020-08-10 15:53 ` Mike Rapoport
2 siblings, 1 reply; 6+ messages in thread
From: Dave Hansen @ 2020-08-10 14:27 UTC (permalink / raw)
To: Joerg Roedel, x86
Cc: Andy Lutomirski, Peter Zijlstra, hpa, Mike Rapoport,
Joerg Roedel, Linus Torvalds, Jason, Andrew Morton, linux-kernel,
Shutemov, Kirill
... adding Kirill
On 8/7/20 1:40 AM, Joerg Roedel wrote:
> + lvl = "p4d";
> + p4d = p4d_alloc(&init_mm, pgd, addr);
> + if (!p4d)
> + goto failed;
>
> + /*
> + * With 5-level paging the P4D level is not folded. So the PGDs
> + * are now populated and there is no need to walk down to the
> + * PUD level.
> + */
> if (pgtable_l5_enabled())
> continue;
It's early and I'm a coffee or two short of awake, but I had to stare at
the comment for a but to make sense of it.
It feels wrong, I think, because the 5-level code usually ends up doing
*more* allocations and in this case, it is _appearing_ to do fewer.
Would something like this make sense?
/*
* The goal here is to allocate all possibly required
* hardware page tables pointed to by the top hardware
* level.
*
* On 4-level systems, the p4d layer is folded away and
* the above code does no preallocation. Below, go down
* to the pud _software_ level to ensure the second
* hardware level is allocated.
*/
> - pud = pud_offset(p4d, addr);
> - if (pud_none(*pud)) {
> - /* Ends up here only with 4-level paging */
> - pud = pud_alloc(&init_mm, p4d, addr);
> - if (!pud) {
> - lvl = "pud";
> - goto failed;
> - }
> - }
> + lvl = "pud";
> + pud = pud_alloc(&init_mm, p4d, addr);
> + if (!pud)
> + goto failed;
> }
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm/64: Do not dereference non-present PGD entries
2020-08-10 14:27 ` Dave Hansen
@ 2020-08-10 15:53 ` Mike Rapoport
2020-08-13 19:21 ` Ingo Molnar
0 siblings, 1 reply; 6+ messages in thread
From: Mike Rapoport @ 2020-08-10 15:53 UTC (permalink / raw)
To: Dave Hansen
Cc: Joerg Roedel, x86, Andy Lutomirski, Peter Zijlstra, hpa,
Joerg Roedel, Linus Torvalds, Jason, Andrew Morton, linux-kernel,
Shutemov, Kirill
On Mon, Aug 10, 2020 at 07:27:33AM -0700, Dave Hansen wrote:
> ... adding Kirill
>
> On 8/7/20 1:40 AM, Joerg Roedel wrote:
> > + lvl = "p4d";
> > + p4d = p4d_alloc(&init_mm, pgd, addr);
> > + if (!p4d)
> > + goto failed;
> >
> > + /*
> > + * With 5-level paging the P4D level is not folded. So the PGDs
> > + * are now populated and there is no need to walk down to the
> > + * PUD level.
> > + */
> > if (pgtable_l5_enabled())
> > continue;
>
> It's early and I'm a coffee or two short of awake, but I had to stare at
> the comment for a but to make sense of it.
>
> It feels wrong, I think, because the 5-level code usually ends up doing
> *more* allocations and in this case, it is _appearing_ to do fewer.
> Would something like this make sense?
Unless I miss something, with 5 levels vmalloc mappings are shared at
p4d level, so allocating a p4d page would be enough. With 4 levels,
p4d_alloc() is a nop and pud is the first actually populated level below
pgd.
> /*
> * The goal here is to allocate all possibly required
> * hardware page tables pointed to by the top hardware
> * level.
> *
> * On 4-level systems, the p4d layer is folded away and
> * the above code does no preallocation. Below, go down
> * to the pud _software_ level to ensure the second
> * hardware level is allocated.
> */
>
>
> > - pud = pud_offset(p4d, addr);
> > - if (pud_none(*pud)) {
> > - /* Ends up here only with 4-level paging */
> > - pud = pud_alloc(&init_mm, p4d, addr);
> > - if (!pud) {
> > - lvl = "pud";
> > - goto failed;
> > - }
> > - }
> > + lvl = "pud";
> > + pud = pud_alloc(&init_mm, p4d, addr);
> > + if (!pud)
> > + goto failed;
> > }
--
Sincerely yours,
Mike.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/mm/64: Do not dereference non-present PGD entries
2020-08-10 15:53 ` Mike Rapoport
@ 2020-08-13 19:21 ` Ingo Molnar
0 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2020-08-13 19:21 UTC (permalink / raw)
To: Mike Rapoport
Cc: Dave Hansen, Joerg Roedel, x86, Andy Lutomirski, Peter Zijlstra,
hpa, Joerg Roedel, Linus Torvalds, Jason, Andrew Morton,
linux-kernel, Shutemov, Kirill
* Mike Rapoport <rppt@linux.ibm.com> wrote:
> On Mon, Aug 10, 2020 at 07:27:33AM -0700, Dave Hansen wrote:
> > ... adding Kirill
> >
> > On 8/7/20 1:40 AM, Joerg Roedel wrote:
> > > + lvl = "p4d";
> > > + p4d = p4d_alloc(&init_mm, pgd, addr);
> > > + if (!p4d)
> > > + goto failed;
> > >
> > > + /*
> > > + * With 5-level paging the P4D level is not folded. So the PGDs
> > > + * are now populated and there is no need to walk down to the
> > > + * PUD level.
> > > + */
> > > if (pgtable_l5_enabled())
> > > continue;
> >
> > It's early and I'm a coffee or two short of awake, but I had to stare at
> > the comment for a but to make sense of it.
> >
> > It feels wrong, I think, because the 5-level code usually ends up doing
> > *more* allocations and in this case, it is _appearing_ to do fewer.
> > Would something like this make sense?
>
> Unless I miss something, with 5 levels vmalloc mappings are shared at
> p4d level, so allocating a p4d page would be enough. With 4 levels,
> p4d_alloc() is a nop and pud is the first actually populated level below
> pgd.
>
> > /*
> > * The goal here is to allocate all possibly required
> > * hardware page tables pointed to by the top hardware
> > * level.
> > *
> > * On 4-level systems, the p4d layer is folded away and
> > * the above code does no preallocation. Below, go down
> > * to the pud _software_ level to ensure the second
> > * hardware level is allocated.
> > */
Would be nice to integrate all these explanations into the comment itself?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread