linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] fix memory leak / panic in ioremap huge pages
@ 2018-03-07 18:32 Toshi Kani
  2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
  2018-03-07 18:32 ` [PATCH 2/2] x86/mm: implement free pmd/pte page interfaces Toshi Kani
  0 siblings, 2 replies; 15+ messages in thread
From: Toshi Kani @ 2018-03-07 18:32 UTC (permalink / raw)
  To: mhocko, akpm, tglx, mingo, hpa, bp, catalin.marinas
  Cc: guohanjun, will.deacon, wxf.wang, linux-mm, x86,
	linux-arm-kernel, linux-kernel

On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
may create pud/pmd mappings.  Kernel panic was observed on arm64
systems with Cortex-A75 in the following steps as described by
Hanjun Guo. [1]

1. ioremap a 4K size, valid page table will build,
2. iounmap it, pte0 will set to 0;
3. ioremap the same address with 2M size, pgd/pmd is unchanged,
   then set the a new value for pmd;
4. pte0 is leaked;
5. CPU may meet exception because the old pmd is still in TLB,
   which will lead to kernel panic.

This panic is not reproducible on x86.  INVLPG, called from iounmap,
purges all levels of entries associated with purged address on x86.
x86 still has memory leak.

Patch 01 adds new interfaces as stubs, which work as workaround of
this issue.  This patch 01 was leveraged from Hanjun's patch. [1]
Patch 02 fixes the issue on x86 by implementing the interfaces.

[1] https://patchwork.kernel.org/patch/10134581/

---
Toshi Kani (2):
 1/2 mm/vmalloc: Add interfaces to free unused page table
 2/2 x86/mm: implement free pmd/pte page interfaces

---
 arch/arm64/mm/mmu.c           | 10 ++++++++++
 arch/x86/mm/pgtable.c         | 44 +++++++++++++++++++++++++++++++++++++++++++
 include/asm-generic/pgtable.h | 10 ++++++++++
 lib/ioremap.c                 |  6 ++++--
 4 files changed, 68 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
  2018-03-07 18:32 [PATCH 0/2] fix memory leak / panic in ioremap huge pages Toshi Kani
@ 2018-03-07 18:32 ` Toshi Kani
  2018-03-07 22:54   ` Andrew Morton
                     ` (4 more replies)
  2018-03-07 18:32 ` [PATCH 2/2] x86/mm: implement free pmd/pte page interfaces Toshi Kani
  1 sibling, 5 replies; 15+ messages in thread
From: Toshi Kani @ 2018-03-07 18:32 UTC (permalink / raw)
  To: mhocko, akpm, tglx, mingo, hpa, bp, catalin.marinas
  Cc: guohanjun, will.deacon, wxf.wang, linux-mm, x86,
	linux-arm-kernel, linux-kernel, Toshi Kani

On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
may create pud/pmd mappings.  Kernel panic was observed on arm64
systems with Cortex-A75 in the following steps as described by
Hanjun Guo.

1. ioremap a 4K size, valid page table will build,
2. iounmap it, pte0 will set to 0;
3. ioremap the same address with 2M size, pgd/pmd is unchanged,
   then set the a new value for pmd;
4. pte0 is leaked;
5. CPU may meet exception because the old pmd is still in TLB,
   which will lead to kernel panic.

This panic is not reproducible on x86.  INVLPG, called from iounmap,
purges all levels of entries associated with purged address on x86.
x86 still has memory leak.

Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
which clear a given pud/pmd entry and free up a page for the lower
level entries.

This patch implements their stub functions on x86 and arm64, which
work as workaround.

Reported-by: Lei Li <lious.lilei@hisilicon.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
---
 arch/arm64/mm/mmu.c           |   10 ++++++++++
 arch/x86/mm/pgtable.c         |   20 ++++++++++++++++++++
 include/asm-generic/pgtable.h |   10 ++++++++++
 lib/ioremap.c                 |    6 ++++--
 4 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 84a019f55022..84a37b4bc28e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -972,3 +972,13 @@ int pmd_clear_huge(pmd_t *pmdp)
 	pmd_clear(pmdp);
 	return 1;
 }
+
+int pud_free_pmd_page(pud_t *pud)
+{
+	return pud_none(*pud);
+}
+
+int pmd_free_pte_page(pmd_t *pmd)
+{
+	return pmd_none(*pmd);
+}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 004abf9ebf12..942f4fa341f1 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -702,4 +702,24 @@ int pmd_clear_huge(pmd_t *pmd)
 
 	return 0;
 }
+
+/**
+ * pud_free_pmd_page - clear pud entry and free pmd page
+ *
+ * Returns 1 on success and 0 on failure (pud not cleared).
+ */
+int pud_free_pmd_page(pud_t *pud)
+{
+	return pud_none(*pud);
+}
+
+/**
+ * pmd_free_pte_page - clear pmd entry and free pte page
+ *
+ * Returns 1 on success and 0 on failure (pmd not cleared).
+ */
+int pmd_free_pte_page(pmd_t *pmd)
+{
+	return pmd_none(*pmd);
+}
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 2cfa3075d148..2490800f7c5a 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -983,6 +983,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud);
+int pmd_free_pte_page(pmd_t *pmd);
 #else	/* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
 {
@@ -1008,6 +1010,14 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
 	return 0;
 }
+static inline int pud_free_pmd_page(pud_t *pud)
+{
+	return 0;
+}
+static inline int pmd_free_pte_page(pud_t *pmd)
+{
+	return 0;
+}
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
 #ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
diff --git a/lib/ioremap.c b/lib/ioremap.c
index b808a390e4c3..54e5bbaa3200 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -91,7 +91,8 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 
 		if (ioremap_pmd_enabled() &&
 		    ((next - addr) == PMD_SIZE) &&
-		    IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
+		    IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
+		    pmd_free_pte_page(pmd)) {
 			if (pmd_set_huge(pmd, phys_addr + addr, prot))
 				continue;
 		}
@@ -117,7 +118,8 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
 
 		if (ioremap_pud_enabled() &&
 		    ((next - addr) == PUD_SIZE) &&
-		    IS_ALIGNED(phys_addr + addr, PUD_SIZE)) {
+		    IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
+		    pud_free_pmd_page(pud)) {
 			if (pud_set_huge(pud, phys_addr + addr, prot))
 				continue;
 		}

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] x86/mm: implement free pmd/pte page interfaces
  2018-03-07 18:32 [PATCH 0/2] fix memory leak / panic in ioremap huge pages Toshi Kani
  2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
@ 2018-03-07 18:32 ` Toshi Kani
  2018-03-07 23:01   ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Toshi Kani @ 2018-03-07 18:32 UTC (permalink / raw)
  To: mhocko, akpm, tglx, mingo, hpa, bp, catalin.marinas
  Cc: guohanjun, will.deacon, wxf.wang, linux-mm, x86,
	linux-arm-kernel, linux-kernel, Toshi Kani

Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
clear a given pud/pmd entry and free up lower level page table(s).
Address range associated with the pud/pmd entry must have been purged
by INVLPG.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/pgtable.c |   28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 942f4fa341f1..121c0114439e 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -710,7 +710,22 @@ int pmd_clear_huge(pmd_t *pmd)
  */
 int pud_free_pmd_page(pud_t *pud)
 {
-	return pud_none(*pud);
+	pmd_t *pmd;
+	int i;
+
+	if (pud_none(*pud))
+		return 1;
+
+	pmd = (pmd_t *)pud_page_vaddr(*pud);
+
+	for (i = 0; i < PTRS_PER_PMD; i++)
+		if (!pmd_free_pte_page(&pmd[i]))
+			return 0;
+
+	pud_clear(pud);
+	free_page((unsigned long)pmd);
+
+	return 1;
 }
 
 /**
@@ -720,6 +735,15 @@ int pud_free_pmd_page(pud_t *pud)
  */
 int pmd_free_pte_page(pmd_t *pmd)
 {
-	return pmd_none(*pmd);
+	pte_t *pte;
+
+	if (pmd_none(*pmd))
+		return 1;
+
+	pte = (pte_t *)pmd_page_vaddr(*pmd);
+	pmd_clear(pmd);
+	free_page((unsigned long)pte);
+
+	return 1;
 }
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
  2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
@ 2018-03-07 22:54   ` Andrew Morton
  2018-03-07 23:02     ` Kani, Toshi
  2018-03-07 22:55   ` Andrew Morton
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2018-03-07 22:54 UTC (permalink / raw)
  To: Toshi Kani
  Cc: mhocko, tglx, mingo, hpa, bp, catalin.marinas, guohanjun,
	will.deacon, wxf.wang, linux-mm, x86, linux-arm-kernel,
	linux-kernel

On Wed,  7 Mar 2018 11:32:26 -0700 Toshi Kani <toshi.kani@hpe.com> wrote:

> On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> may create pud/pmd mappings.  Kernel panic was observed on arm64
> systems with Cortex-A75 in the following steps as described by
> Hanjun Guo.
> 
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
>    then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
>    which will lead to kernel panic.
> 
> This panic is not reproducible on x86.  INVLPG, called from iounmap,
> purges all levels of entries associated with purged address on x86.
> x86 still has memory leak.
> 
> Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> which clear a given pud/pmd entry and free up a page for the lower
> level entries.
> 
> This patch implements their stub functions on x86 and arm64, which
> work as workaround.
> 
> index 004abf9ebf12..942f4fa341f1 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -702,4 +702,24 @@ int pmd_clear_huge(pmd_t *pmd)
>  
>  	return 0;
>  }
> +
> +/**
> + * pud_free_pmd_page - clear pud entry and free pmd page
> + *
> + * Returns 1 on success and 0 on failure (pud not cleared).
> + */
> +int pud_free_pmd_page(pud_t *pud)
> +{
> +	return pud_none(*pud);
> +}
> +
> +/**
> + * pmd_free_pte_page - clear pmd entry and free pte page
> + *
> + * Returns 1 on success and 0 on failure (pmd not cleared).
> + */
> +int pmd_free_pte_page(pmd_t *pmd)
> +{
> +	return pmd_none(*pmd);
> +}

Are these functions well named?  I mean, the comment says "clear pud
entry and free pmd page" but the implementatin does neither of those
things.  The name implies that the function frees a pmd_page but the
callsites use the function as a way of querying state.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
  2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
  2018-03-07 22:54   ` Andrew Morton
@ 2018-03-07 22:55   ` Andrew Morton
  2018-03-08  4:00   ` Matthew Wilcox
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2018-03-07 22:55 UTC (permalink / raw)
  To: Toshi Kani
  Cc: mhocko, tglx, mingo, hpa, bp, catalin.marinas, guohanjun,
	will.deacon, wxf.wang, linux-mm, x86, linux-arm-kernel,
	linux-kernel

On Wed,  7 Mar 2018 11:32:26 -0700 Toshi Kani <toshi.kani@hpe.com> wrote:

> On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> may create pud/pmd mappings.  Kernel panic was observed on arm64
> systems with Cortex-A75 in the following steps as described by
> Hanjun Guo.
> 
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
>    then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
>    which will lead to kernel panic.
> 
> This panic is not reproducible on x86.  INVLPG, called from iounmap,
> purges all levels of entries associated with purged address on x86.
> x86 still has memory leak.
> 
> Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> which clear a given pud/pmd entry and free up a page for the lower
> level entries.
> 
> This patch implements their stub functions on x86 and arm64, which
> work as workaround.

Also, as this fixes an arm64 kernel panic, should we be using cc:stable?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] x86/mm: implement free pmd/pte page interfaces
  2018-03-07 18:32 ` [PATCH 2/2] x86/mm: implement free pmd/pte page interfaces Toshi Kani
@ 2018-03-07 23:01   ` Andrew Morton
  2018-03-07 23:22     ` Kani, Toshi
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2018-03-07 23:01 UTC (permalink / raw)
  To: Toshi Kani
  Cc: mhocko, tglx, mingo, hpa, bp, catalin.marinas, guohanjun,
	will.deacon, wxf.wang, linux-mm, x86, linux-arm-kernel,
	linux-kernel

On Wed,  7 Mar 2018 11:32:27 -0700 Toshi Kani <toshi.kani@hpe.com> wrote:

> Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
> clear a given pud/pmd entry and free up lower level page table(s).
> Address range associated with the pud/pmd entry must have been purged
> by INVLPG.

OK, now we have implementations which match the naming ;) Again, is a
cc:stable warranted?

Do you have any preferences/suggestions as to which tree these should
be merged through?  You're hitting core, arm and x86.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
  2018-03-07 22:54   ` Andrew Morton
@ 2018-03-07 23:02     ` Kani, Toshi
  0 siblings, 0 replies; 15+ messages in thread
From: Kani, Toshi @ 2018-03-07 23:02 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, bp, tglx, guohanjun, wxf.wang, linux-mm, x86, hpa,
	will.deacon, catalin.marinas, mingo, Hocko, Michal,
	linux-arm-kernel

On Wed, 2018-03-07 at 14:54 -0800, Andrew Morton wrote:
> On Wed,  7 Mar 2018 11:32:26 -0700 Toshi Kani <toshi.kani@hpe.com> wrote:
> 
> > On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> > may create pud/pmd mappings.  Kernel panic was observed on arm64
> > systems with Cortex-A75 in the following steps as described by
> > Hanjun Guo.
> > 
> > 1. ioremap a 4K size, valid page table will build,
> > 2. iounmap it, pte0 will set to 0;
> > 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
> >    then set the a new value for pmd;
> > 4. pte0 is leaked;
> > 5. CPU may meet exception because the old pmd is still in TLB,
> >    which will lead to kernel panic.
> > 
> > This panic is not reproducible on x86.  INVLPG, called from iounmap,
> > purges all levels of entries associated with purged address on x86.
> > x86 still has memory leak.
> > 
> > Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> > which clear a given pud/pmd entry and free up a page for the lower
> > level entries.
> > 
> > This patch implements their stub functions on x86 and arm64, which
> > work as workaround.
> > 
> > index 004abf9ebf12..942f4fa341f1 100644
> > --- a/arch/x86/mm/pgtable.c
> > +++ b/arch/x86/mm/pgtable.c
> > @@ -702,4 +702,24 @@ int pmd_clear_huge(pmd_t *pmd)
> >  
> >  	return 0;
> >  }
> > +
> > +/**
> > + * pud_free_pmd_page - clear pud entry and free pmd page
> > + *
> > + * Returns 1 on success and 0 on failure (pud not cleared).
> > + */
> > +int pud_free_pmd_page(pud_t *pud)
> > +{
> > +	return pud_none(*pud);
> > +}
> > +
> > +/**
> > + * pmd_free_pte_page - clear pmd entry and free pte page
> > + *
> > + * Returns 1 on success and 0 on failure (pmd not cleared).
> > + */
> > +int pmd_free_pte_page(pmd_t *pmd)
> > +{
> > +	return pmd_none(*pmd);
> > +}
> 
> Are these functions well named?  I mean, the comment says "clear pud
> entry and free pmd page" but the implementatin does neither of those
> things.  The name implies that the function frees a pmd_page but the
> callsites use the function as a way of querying state.

This patch 1/2 only implements stubs.  Patch 2/2 implements what is
described here.

> Also, as this fixes an arm64 kernel panic, should we be using
> cc:stable?

Right.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] x86/mm: implement free pmd/pte page interfaces
  2018-03-07 23:01   ` Andrew Morton
@ 2018-03-07 23:22     ` Kani, Toshi
  0 siblings, 0 replies; 15+ messages in thread
From: Kani, Toshi @ 2018-03-07 23:22 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, bp, tglx, guohanjun, wxf.wang, linux-mm, x86, hpa,
	will.deacon, catalin.marinas, mingo, Hocko, Michal,
	linux-arm-kernel

On Wed, 2018-03-07 at 15:01 -0800, Andrew Morton wrote:
> On Wed,  7 Mar 2018 11:32:27 -0700 Toshi Kani <toshi.kani@hpe.com> wrote:
> 
> > Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
> > clear a given pud/pmd entry and free up lower level page table(s).
> > Address range associated with the pud/pmd entry must have been purged
> > by INVLPG.
> 
> OK, now we have implementations which match the naming ;) Again, is a
> cc:stable warranted?

Right. This patch 2/2 fixes the memory leak on x86.

Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings")

Patch 1/2 fixes the panic on arm64.

Fixes: 324420bf91f60 ("arm64: add support for ioremap() block mappings")

> Do you have any preferences/suggestions as to which tree these should
> be merged through?  You're hitting core, arm and x86.

No, I do not have any preference.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
  2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
  2018-03-07 22:54   ` Andrew Morton
  2018-03-07 22:55   ` Andrew Morton
@ 2018-03-08  4:00   ` Matthew Wilcox
  2018-03-08 15:56     ` Kani, Toshi
  2018-03-08  8:08   ` Ingo Molnar
  2018-03-08 18:04   ` Will Deacon
  4 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-08  4:00 UTC (permalink / raw)
  To: Toshi Kani
  Cc: mhocko, akpm, tglx, mingo, hpa, bp, catalin.marinas, guohanjun,
	will.deacon, wxf.wang, linux-mm, x86, linux-arm-kernel,
	linux-kernel

On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> +/**
> + * pud_free_pmd_page - clear pud entry and free pmd page
> + *
> + * Returns 1 on success and 0 on failure (pud not cleared).
> + */
> +int pud_free_pmd_page(pud_t *pud)
> +{
> +	return pud_none(*pud);
> +}

Wouldn't it be clearer if you returned 'bool' instead of 'int' here?

Also you didn't document the pud parameter, nor use the approved form
for documenting the return type, nor the calling context.  So I would
have written it out like this:

/**
 * pud_free_pmd_page - Clear pud entry and free pmd page.
 * @pud: Pointer to a PUD.
 *
 * Context: Caller should hold mmap_sem write-locked.
 * Return: %true if clearing the entry succeeded.
 */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
  2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
                     ` (2 preceding siblings ...)
  2018-03-08  4:00   ` Matthew Wilcox
@ 2018-03-08  8:08   ` Ingo Molnar
  2018-03-08 18:04   ` Will Deacon
  4 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2018-03-08  8:08 UTC (permalink / raw)
  To: Toshi Kani
  Cc: mhocko, akpm, tglx, mingo, hpa, bp, catalin.marinas, guohanjun,
	will.deacon, wxf.wang, linux-mm, x86, linux-arm-kernel,
	linux-kernel, Andy Lutomirski, Linus Torvalds, Peter Zijlstra


* Toshi Kani <toshi.kani@hpe.com> wrote:

> On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> may create pud/pmd mappings.  Kernel panic was observed on arm64
> systems with Cortex-A75 in the following steps as described by
> Hanjun Guo.
> 
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
>    then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
>    which will lead to kernel panic.
> 
> This panic is not reproducible on x86.  INVLPG, called from iounmap,
> purges all levels of entries associated with purged address on x86.

Where does x86 iounmap() do that?

> x86 still has memory leak.
> Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> which clear a given pud/pmd entry and free up a page for the lower
> level entries.
> 
> This patch implements their stub functions on x86 and arm64, which
> work as workaround.

At minimum the ordering of the patches is very confusing: why don't you introduce 
the new methods in patch #1, and then use them in patch #2?

Also please double check the coding style of your patches, there's a number of 
obvious problems of outright bad patterns and also cases where you clearly don't 
try to follow the (correct) style of existing code.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
  2018-03-08  4:00   ` Matthew Wilcox
@ 2018-03-08 15:56     ` Kani, Toshi
  2018-03-08 22:07       ` Matthew Wilcox
  0 siblings, 1 reply; 15+ messages in thread
From: Kani, Toshi @ 2018-03-08 15:56 UTC (permalink / raw)
  To: willy
  Cc: linux-kernel, bp, tglx, guohanjun, wxf.wang, linux-mm, x86, akpm,
	hpa, catalin.marinas, mingo, will.deacon, Hocko, Michal,
	linux-arm-kernel

On Wed, 2018-03-07 at 20:00 -0800, Matthew Wilcox wrote:
> On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> > +/**
> > + * pud_free_pmd_page - clear pud entry and free pmd page
> > + *
> > + * Returns 1 on success and 0 on failure (pud not cleared).
> > + */
> > +int pud_free_pmd_page(pud_t *pud)
> > +{
> > +	return pud_none(*pud);
> > +}
> 
> Wouldn't it be clearer if you returned 'bool' instead of 'int' here?

I thought about it and decided to use 'int' since all other pud/pmd/pte
interfaces, such as pud_none() above, use 'int'.

> Also you didn't document the pud parameter, nor use the approved form
> for documenting the return type, nor the calling context.  So I would
> have written it out like this:
> 
> /**
>  * pud_free_pmd_page - Clear pud entry and free pmd page.
>  * @pud: Pointer to a PUD.
>  *
>  * Context: Caller should hold mmap_sem write-locked.
>  * Return: %true if clearing the entry succeeded.
>  */

Will do.

Thanks!
-Toshi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
  2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
                     ` (3 preceding siblings ...)
  2018-03-08  8:08   ` Ingo Molnar
@ 2018-03-08 18:04   ` Will Deacon
  2018-03-08 19:30     ` Kani, Toshi
  4 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2018-03-08 18:04 UTC (permalink / raw)
  To: Toshi Kani
  Cc: mhocko, akpm, tglx, mingo, hpa, bp, catalin.marinas, guohanjun,
	wxf.wang, linux-mm, x86, linux-arm-kernel, linux-kernel

Hi Toshi,

Thanks for the patches!

On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap()
> may create pud/pmd mappings.  Kernel panic was observed on arm64
> systems with Cortex-A75 in the following steps as described by
> Hanjun Guo.
> 
> 1. ioremap a 4K size, valid page table will build,
> 2. iounmap it, pte0 will set to 0;
> 3. ioremap the same address with 2M size, pgd/pmd is unchanged,
>    then set the a new value for pmd;
> 4. pte0 is leaked;
> 5. CPU may meet exception because the old pmd is still in TLB,
>    which will lead to kernel panic.
> 
> This panic is not reproducible on x86.  INVLPG, called from iounmap,
> purges all levels of entries associated with purged address on x86.
> x86 still has memory leak.
> 
> Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(),
> which clear a given pud/pmd entry and free up a page for the lower
> level entries.
> 
> This patch implements their stub functions on x86 and arm64, which
> work as workaround.
> 
> Reported-by: Lei Li <lious.lilei@hisilicon.com>
> Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Hanjun Guo <guohanjun@huawei.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Borislav Petkov <bp@suse.de>
> ---
>  arch/arm64/mm/mmu.c           |   10 ++++++++++
>  arch/x86/mm/pgtable.c         |   20 ++++++++++++++++++++
>  include/asm-generic/pgtable.h |   10 ++++++++++
>  lib/ioremap.c                 |    6 ++++--
>  4 files changed, 44 insertions(+), 2 deletions(-)

[...]

> diff --git a/lib/ioremap.c b/lib/ioremap.c
> index b808a390e4c3..54e5bbaa3200 100644
> --- a/lib/ioremap.c
> +++ b/lib/ioremap.c
> @@ -91,7 +91,8 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
>  
>  		if (ioremap_pmd_enabled() &&
>  		    ((next - addr) == PMD_SIZE) &&
> -		    IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
> +		    IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
> +		    pmd_free_pte_page(pmd)) {

I find it a bit weird that we're postponing this to the subsequent map. If
we want to address the break-before-make issue that was causing a panic on
arm64, then I think it would be better to do this on the unmap path to avoid
duplicating TLB invalidation.

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
  2018-03-08 18:04   ` Will Deacon
@ 2018-03-08 19:30     ` Kani, Toshi
  0 siblings, 0 replies; 15+ messages in thread
From: Kani, Toshi @ 2018-03-08 19:30 UTC (permalink / raw)
  To: will.deacon
  Cc: linux-kernel, bp, tglx, guohanjun, wxf.wang, linux-mm, x86, akpm,
	hpa, catalin.marinas, mingo, Hocko, Michal, linux-arm-kernel

On Thu, 2018-03-08 at 18:04 +0000, Will Deacon wrote:
 :
> > diff --git a/lib/ioremap.c b/lib/ioremap.c
> > index b808a390e4c3..54e5bbaa3200 100644
> > --- a/lib/ioremap.c
> > +++ b/lib/ioremap.c
> > @@ -91,7 +91,8 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
> >  
> >  		if (ioremap_pmd_enabled() &&
> >  		    ((next - addr) == PMD_SIZE) &&
> > -		    IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
> > +		    IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
> > +		    pmd_free_pte_page(pmd)) {
> 
> I find it a bit weird that we're postponing this to the subsequent map. If
> we want to address the break-before-make issue that was causing a panic on
> arm64, then I think it would be better to do this on the unmap path to avoid
> duplicating TLB invalidation.

Hi Will,

Yes, I started looking into doing it the unmap path, but found the
following issues:

 - The iounmap() path is shared with vunmap().  Since vmap() only
supports pte mappings, making vunmap() to free pte pages is an overhead
for regular vmap users as they do not need pte pages freed up.
 - Checking to see if all entries in a pte page are cleared in the unmap
path is racy, and serializing this check is expensive.
 - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
purge.

Hence, I decided to postpone and do it in the ioremap path when a
pud/pmd mapping is set.  The "break" on arm64 happens when you update a
pmd entry without purging it.  So, the unmap path is not broken.  I
understand that arm64 may need extra TLB purge in pmd_free_pte_page(),
but it limits this overhead only when it sets up a pud/pmd mapping.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
  2018-03-08 15:56     ` Kani, Toshi
@ 2018-03-08 22:07       ` Matthew Wilcox
  2018-03-08 23:27         ` Kani, Toshi
  0 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-08 22:07 UTC (permalink / raw)
  To: Kani, Toshi
  Cc: linux-kernel, bp, tglx, guohanjun, wxf.wang, linux-mm, x86, akpm,
	hpa, catalin.marinas, mingo, will.deacon, Hocko, Michal,
	linux-arm-kernel

On Thu, Mar 08, 2018 at 03:56:30PM +0000, Kani, Toshi wrote:
> On Wed, 2018-03-07 at 20:00 -0800, Matthew Wilcox wrote:
> > On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> > > +/**
> > > + * pud_free_pmd_page - clear pud entry and free pmd page
> > > + *
> > > + * Returns 1 on success and 0 on failure (pud not cleared).
> > > + */
> > > +int pud_free_pmd_page(pud_t *pud)
> > > +{
> > > +	return pud_none(*pud);
> > > +}
> > 
> > Wouldn't it be clearer if you returned 'bool' instead of 'int' here?
> 
> I thought about it and decided to use 'int' since all other pud/pmd/pte
> interfaces, such as pud_none() above, use 'int'.

These interfaces were introduced before we had bool ... I suspect nobody's
taken the time to go through and convert them all.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table
  2018-03-08 22:07       ` Matthew Wilcox
@ 2018-03-08 23:27         ` Kani, Toshi
  0 siblings, 0 replies; 15+ messages in thread
From: Kani, Toshi @ 2018-03-08 23:27 UTC (permalink / raw)
  To: willy
  Cc: linux-kernel, bp, tglx, linux-mm, guohanjun, wxf.wang, x86, akpm,
	hpa, catalin.marinas, mingo, will.deacon, Hocko, Michal,
	linux-arm-kernel

On Thu, 2018-03-08 at 14:07 -0800, Matthew Wilcox wrote:
> On Thu, Mar 08, 2018 at 03:56:30PM +0000, Kani, Toshi wrote:
> > On Wed, 2018-03-07 at 20:00 -0800, Matthew Wilcox wrote:
> > > On Wed, Mar 07, 2018 at 11:32:26AM -0700, Toshi Kani wrote:
> > > > +/**
> > > > + * pud_free_pmd_page - clear pud entry and free pmd page
> > > > + *
> > > > + * Returns 1 on success and 0 on failure (pud not cleared).
> > > > + */
> > > > +int pud_free_pmd_page(pud_t *pud)
> > > > +{
> > > > +	return pud_none(*pud);
> > > > +}
> > > 
> > > Wouldn't it be clearer if you returned 'bool' instead of 'int' here?
> > 
> > I thought about it and decided to use 'int' since all other pud/pmd/pte
> > interfaces, such as pud_none() above, use 'int'.
> 
> These interfaces were introduced before we had bool ... I suspect nobody's
> taken the time to go through and convert them all.

I see.  Since this patchset already changes core, arm and x86, and will
be back ported to stables.  So, if you do not mind, I'd like to leave it
consistent with others with 'int', and make the footstep minimum.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-03-08 23:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-07 18:32 [PATCH 0/2] fix memory leak / panic in ioremap huge pages Toshi Kani
2018-03-07 18:32 ` [PATCH 1/2] mm/vmalloc: Add interfaces to free unused page table Toshi Kani
2018-03-07 22:54   ` Andrew Morton
2018-03-07 23:02     ` Kani, Toshi
2018-03-07 22:55   ` Andrew Morton
2018-03-08  4:00   ` Matthew Wilcox
2018-03-08 15:56     ` Kani, Toshi
2018-03-08 22:07       ` Matthew Wilcox
2018-03-08 23:27         ` Kani, Toshi
2018-03-08  8:08   ` Ingo Molnar
2018-03-08 18:04   ` Will Deacon
2018-03-08 19:30     ` Kani, Toshi
2018-03-07 18:32 ` [PATCH 2/2] x86/mm: implement free pmd/pte page interfaces Toshi Kani
2018-03-07 23:01   ` Andrew Morton
2018-03-07 23:22     ` Kani, Toshi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).