linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/mm: Fix leak of pmd ptlock
@ 2020-12-03  6:28 Dan Williams
  2020-12-03 10:54 ` Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Dan Williams @ 2020-12-03  6:28 UTC (permalink / raw)
  To: tglx, mingo, bp
  Cc: stable, Dave Hansen, Andy Lutomirski, Peter Zijlstra, x86,
	H. Peter Anvin, Yi Zhang, linux-kernel, linux-nvdimm, willy

Commit 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
introduced a new location where a pmd was released, but neglected to run
the pmd page destructor. In fact, this happened previously for a
different pmd release path and was fixed by commit:

c283610e44ec ("x86, mm: do not leak page->ptl for pmd page tables").

This issue was hidden until recently because the failure mode is silent,
but commit:

b2b29d6d0119 ("mm: account PMD tables like PTE tables")

...turns the failure mode into this signature:

 BUG: Bad page state in process lt-pmem-ns  pfn:15943d
 page:000000007262ed7b refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x15943d
 flags: 0xaffff800000000()
 raw: 00affff800000000 dead000000000100 0000000000000000 0000000000000000
 raw: 0000000000000000 ffff913a029bcc08 00000000fffffbff 0000000000000000
 page dumped because: nonzero mapcount
 [..]
  dump_stack+0x8b/0xb0
  bad_page.cold+0x63/0x94
  free_pcp_prepare+0x224/0x270
  free_unref_page+0x18/0xd0
  pud_free_pmd_page+0x146/0x160
  ioremap_pud_range+0xe3/0x350
  ioremap_page_range+0x108/0x160
  __ioremap_caller.constprop.0+0x174/0x2b0
  ? memremap+0x7a/0x110
  memremap+0x7a/0x110
  devm_memremap+0x53/0xa0
  pmem_attach_disk+0x4ed/0x530 [nd_pmem]
  ? __devm_release_region+0x52/0x80
  nvdimm_bus_probe+0x85/0x210 [libnvdimm]

Given this is a repeat occurrence it seemed prudent to look for other
places where this destructor might be missing and whether a better
helper is needed. try_to_free_pmd_page() looks like a candidate, but
testing with setting up and tearing down pmd mappings via the dax unit
tests is thus far not triggering the failure. As for a better helper
pmd_free() is close, but it is a messy fit due to requiring an @mm arg.
Also, ___pmd_free_tlb() wants to call paravirt_tlb_remove_table()
instead of free_page(), so open-coded pgtable_pmd_page_dtor() seems the
best way forward for now.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Cc: <stable@vger.kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Co-debugged-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/mm/pgtable.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index dfd82f51ba66..f6a9e2e36642 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -829,6 +829,8 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 	}
 
 	free_page((unsigned long)pmd_sv);
+
+	pgtable_pmd_page_dtor(virt_to_page(pmd));
 	free_page((unsigned long)pmd);
 
 	return 1;


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/mm: Fix leak of pmd ptlock
  2020-12-03  6:28 [PATCH] x86/mm: Fix leak of pmd ptlock Dan Williams
@ 2020-12-03 10:54 ` Peter Zijlstra
  2020-12-16  3:25 ` Dan Williams
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2020-12-03 10:54 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, mingo, bp, stable, Dave Hansen, Andy Lutomirski, x86,
	H. Peter Anvin, Yi Zhang, linux-kernel, linux-nvdimm, willy

On Wed, Dec 02, 2020 at 10:28:12PM -0800, Dan Williams wrote:
> pmd_free() is close, but it is a messy fit due to requiring an @mm arg.

Hurpm, only parisc and s390 actually use that argument. And s390
_really_ needs it, because they're doing runtime folding per mm.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/mm: Fix leak of pmd ptlock
  2020-12-03  6:28 [PATCH] x86/mm: Fix leak of pmd ptlock Dan Williams
  2020-12-03 10:54 ` Peter Zijlstra
@ 2020-12-16  3:25 ` Dan Williams
  2021-01-05  4:18 ` Dan Williams
  2021-01-05 15:40 ` Dave Hansen
  3 siblings, 0 replies; 5+ messages in thread
From: Dan Williams @ 2020-12-16  3:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: stable, Dave Hansen, Andy Lutomirski, Peter Zijlstra, X86 ML,
	H. Peter Anvin, Yi Zhang, Linux Kernel Mailing List,
	linux-nvdimm, Matthew Wilcox

Might I tempt an x86/mm maintainer to ack this, or a x86-tip
maintainer to apply it outright?

On Wed, Dec 2, 2020 at 10:28 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Commit 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
> introduced a new location where a pmd was released, but neglected to run
> the pmd page destructor. In fact, this happened previously for a
> different pmd release path and was fixed by commit:
>
> c283610e44ec ("x86, mm: do not leak page->ptl for pmd page tables").
>
> This issue was hidden until recently because the failure mode is silent,
> but commit:
>
> b2b29d6d0119 ("mm: account PMD tables like PTE tables")
>
> ...turns the failure mode into this signature:
>
>  BUG: Bad page state in process lt-pmem-ns  pfn:15943d
>  page:000000007262ed7b refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x15943d
>  flags: 0xaffff800000000()
>  raw: 00affff800000000 dead000000000100 0000000000000000 0000000000000000
>  raw: 0000000000000000 ffff913a029bcc08 00000000fffffbff 0000000000000000
>  page dumped because: nonzero mapcount
>  [..]
>   dump_stack+0x8b/0xb0
>   bad_page.cold+0x63/0x94
>   free_pcp_prepare+0x224/0x270
>   free_unref_page+0x18/0xd0
>   pud_free_pmd_page+0x146/0x160
>   ioremap_pud_range+0xe3/0x350
>   ioremap_page_range+0x108/0x160
>   __ioremap_caller.constprop.0+0x174/0x2b0
>   ? memremap+0x7a/0x110
>   memremap+0x7a/0x110
>   devm_memremap+0x53/0xa0
>   pmem_attach_disk+0x4ed/0x530 [nd_pmem]
>   ? __devm_release_region+0x52/0x80
>   nvdimm_bus_probe+0x85/0x210 [libnvdimm]
>
> Given this is a repeat occurrence it seemed prudent to look for other
> places where this destructor might be missing and whether a better
> helper is needed. try_to_free_pmd_page() looks like a candidate, but
> testing with setting up and tearing down pmd mappings via the dax unit
> tests is thus far not triggering the failure. As for a better helper
> pmd_free() is close, but it is a messy fit due to requiring an @mm arg.
> Also, ___pmd_free_tlb() wants to call paravirt_tlb_remove_table()
> instead of free_page(), so open-coded pgtable_pmd_page_dtor() seems the
> best way forward for now.
>
> Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
> Cc: <stable@vger.kernel.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Co-debugged-by: Matthew Wilcox <willy@infradead.org>
> Tested-by: Yi Zhang <yi.zhang@redhat.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/mm/pgtable.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index dfd82f51ba66..f6a9e2e36642 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -829,6 +829,8 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
>         }
>
>         free_page((unsigned long)pmd_sv);
> +
> +       pgtable_pmd_page_dtor(virt_to_page(pmd));
>         free_page((unsigned long)pmd);
>
>         return 1;
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/mm: Fix leak of pmd ptlock
  2020-12-03  6:28 [PATCH] x86/mm: Fix leak of pmd ptlock Dan Williams
  2020-12-03 10:54 ` Peter Zijlstra
  2020-12-16  3:25 ` Dan Williams
@ 2021-01-05  4:18 ` Dan Williams
  2021-01-05 15:40 ` Dave Hansen
  3 siblings, 0 replies; 5+ messages in thread
From: Dan Williams @ 2021-01-05  4:18 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: stable, Dave Hansen, Andy Lutomirski, Peter Zijlstra, X86 ML,
	H. Peter Anvin, Yi Zhang, Linux Kernel Mailing List,
	linux-nvdimm, Matthew Wilcox

Ping, this bug is still present on v5.11-rc2, need a resend?

On Wed, Dec 2, 2020 at 10:28 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Commit 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
> introduced a new location where a pmd was released, but neglected to run
> the pmd page destructor. In fact, this happened previously for a
> different pmd release path and was fixed by commit:
>
> c283610e44ec ("x86, mm: do not leak page->ptl for pmd page tables").
>
> This issue was hidden until recently because the failure mode is silent,
> but commit:
>
> b2b29d6d0119 ("mm: account PMD tables like PTE tables")
>
> ...turns the failure mode into this signature:
>
>  BUG: Bad page state in process lt-pmem-ns  pfn:15943d
>  page:000000007262ed7b refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x15943d
>  flags: 0xaffff800000000()
>  raw: 00affff800000000 dead000000000100 0000000000000000 0000000000000000
>  raw: 0000000000000000 ffff913a029bcc08 00000000fffffbff 0000000000000000
>  page dumped because: nonzero mapcount
>  [..]
>   dump_stack+0x8b/0xb0
>   bad_page.cold+0x63/0x94
>   free_pcp_prepare+0x224/0x270
>   free_unref_page+0x18/0xd0
>   pud_free_pmd_page+0x146/0x160
>   ioremap_pud_range+0xe3/0x350
>   ioremap_page_range+0x108/0x160
>   __ioremap_caller.constprop.0+0x174/0x2b0
>   ? memremap+0x7a/0x110
>   memremap+0x7a/0x110
>   devm_memremap+0x53/0xa0
>   pmem_attach_disk+0x4ed/0x530 [nd_pmem]
>   ? __devm_release_region+0x52/0x80
>   nvdimm_bus_probe+0x85/0x210 [libnvdimm]
>
> Given this is a repeat occurrence it seemed prudent to look for other
> places where this destructor might be missing and whether a better
> helper is needed. try_to_free_pmd_page() looks like a candidate, but
> testing with setting up and tearing down pmd mappings via the dax unit
> tests is thus far not triggering the failure. As for a better helper
> pmd_free() is close, but it is a messy fit due to requiring an @mm arg.
> Also, ___pmd_free_tlb() wants to call paravirt_tlb_remove_table()
> instead of free_page(), so open-coded pgtable_pmd_page_dtor() seems the
> best way forward for now.
>
> Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
> Cc: <stable@vger.kernel.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Co-debugged-by: Matthew Wilcox <willy@infradead.org>
> Tested-by: Yi Zhang <yi.zhang@redhat.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/mm/pgtable.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index dfd82f51ba66..f6a9e2e36642 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -829,6 +829,8 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
>         }
>
>         free_page((unsigned long)pmd_sv);
> +
> +       pgtable_pmd_page_dtor(virt_to_page(pmd));
>         free_page((unsigned long)pmd);
>
>         return 1;
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/mm: Fix leak of pmd ptlock
  2020-12-03  6:28 [PATCH] x86/mm: Fix leak of pmd ptlock Dan Williams
                   ` (2 preceding siblings ...)
  2021-01-05  4:18 ` Dan Williams
@ 2021-01-05 15:40 ` Dave Hansen
  3 siblings, 0 replies; 5+ messages in thread
From: Dave Hansen @ 2021-01-05 15:40 UTC (permalink / raw)
  To: Dan Williams, tglx, mingo, bp
  Cc: stable, Dave Hansen, Andy Lutomirski, Peter Zijlstra, x86,
	H. Peter Anvin, Yi Zhang, linux-kernel, linux-nvdimm, willy

On 12/2/20 10:28 PM, Dan Williams wrote:
> Commit 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
> introduced a new location where a pmd was released, but neglected to run
> the pmd page destructor. In fact, this happened previously for a
> different pmd release path and was fixed by commit:
> 
> c283610e44ec ("x86, mm: do not leak page->ptl for pmd page tables").
> 
> This issue was hidden until recently because the failure mode is silent,
> but commit:

Looks sane.  Thanks as always for the thorough changelog and the
investigation into why we're suddenly seeing this now.

I agree that ridding ourselves of open-coded free_page()'s is a good
idea, but this patch itself needs to be around for stable anyway.  So,

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-01-05 15:41 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-03  6:28 [PATCH] x86/mm: Fix leak of pmd ptlock Dan Williams
2020-12-03 10:54 ` Peter Zijlstra
2020-12-16  3:25 ` Dan Williams
2021-01-05  4:18 ` Dan Williams
2021-01-05 15:40 ` Dave Hansen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).