* [PATCH 0/3] fix free pmd/pte page handling on x86
@ 2018-04-30 17:59 ` Toshi Kani
  0 siblings, 0 replies; 14+ messages in thread
From: Toshi Kani @ 2018-04-30 17:59 UTC (permalink / raw)
  To: mhocko, akpm, tglx, mingo, hpa
  Cc: cpandya, linux-mm, x86, linux-arm-kernel, linux-kernel

This series fixes x86 ioremap free page handling when setting up
pud/pmd maps.

Patch 01 is from Chintan's v9 01/04 patch [1], which adds a new 'addr'
argument.  Carrying it here avoids merge conflicts with his series.

Patch 02 adds a TLB purge (INVLPG) to flush page-structure caches that
may have been populated by speculation.  See patch 02 for the details.

Patch 03 disables free page handling on x86-PAE to address the BUG_ON
reported by Joerg.

[1] https://patchwork.kernel.org/patch/10371015/

---
Chintan Pandya (1):
  1/3 ioremap: Update pgtable free interfaces with addr

Toshi Kani (2):
  2/3 x86/mm: add TLB purge to free pmd/pte page interfaces
  3/3 x86/mm: disable ioremap free page handling on x86-PAE

---
 arch/arm64/mm/mmu.c           |  4 +--
 arch/x86/mm/pgtable.c         | 57 +++++++++++++++++++++++++++++++++++++------
 include/asm-generic/pgtable.h |  8 +++---
 lib/ioremap.c                 |  4 +--
 4 files changed, 57 insertions(+), 16 deletions(-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] ioremap: Update pgtable free interfaces with addr
  2018-04-30 17:59 ` Toshi Kani
@ 2018-04-30 17:59   ` Toshi Kani
  0 siblings, 0 replies; 14+ messages in thread
From: Toshi Kani @ 2018-04-30 17:59 UTC (permalink / raw)
  To: mhocko, akpm, tglx, mingo, hpa
  Cc: cpandya, linux-mm, x86, linux-arm-kernel, linux-kernel,
	Toshi Kani, stable

From: Chintan Pandya <cpandya@codeaurora.org>

This patch ("mm/vmalloc: Add interfaces to free unmapped
page table") adds following 2 interfaces to free the page
table in case we implement huge mapping.

pud_free_pmd_page() and pmd_free_pte_page()

Some architectures (like arm64) needs to do proper TLB
maintanance after updating pagetable entry even in map.
Why ? Read this,
https://patchwork.kernel.org/patch/10134581/

Pass 'addr' in these interfaces so that proper TLB ops
can be performed.
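
For illustration: with 'addr' available, an architecture that needs
address-based TLB maintenance could implement the hook roughly as
below.  This is a hypothetical sketch in the spirit of the arm64
series, not code from this patch; __flush_tlb_kernel_pgtable() is the
helper that series introduces.

int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
{
	pte_t *table = pte_offset_kernel(pmdp, addr);

	pmd_clear(pmdp);
	/* invalidate the walk-cache entry for 'addr' before freeing */
	__flush_tlb_kernel_pgtable(addr);
	pte_free_kernel(NULL, table);

	return 1;
}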

Fixes: b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page table")
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
---
 arch/arm64/mm/mmu.c           |    4 ++--
 arch/x86/mm/pgtable.c         |    8 +++++---
 include/asm-generic/pgtable.h |    8 ++++----
 lib/ioremap.c                 |    4 ++--
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2dbb2c9f1ec1..da98828609a1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -973,12 +973,12 @@ int pmd_clear_huge(pmd_t *pmdp)
 	return 1;
 }
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
 	return pud_none(*pud);
 }
 
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
 	return pmd_none(*pmd);
 }
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ffc8c13c50e4..37e3cbac59b9 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -718,11 +718,12 @@ int pmd_clear_huge(pmd_t *pmd)
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
  *
  * Context: The pud range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
 	pmd_t *pmd;
 	int i;
@@ -733,7 +734,7 @@ int pud_free_pmd_page(pud_t *pud)
 	pmd = (pmd_t *)pud_page_vaddr(*pud);
 
 	for (i = 0; i < PTRS_PER_PMD; i++)
-		if (!pmd_free_pte_page(&pmd[i]))
+		if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
 			return 0;
 
 	pud_clear(pud);
@@ -745,11 +746,12 @@ int pud_free_pmd_page(pud_t *pud)
 /**
  * pmd_free_pte_page - Clear pmd entry and free pte page.
  * @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
  *
  * Context: The pmd range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
 	pte_t *pte;
 
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639afaa39..b081794ba135 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
 #else	/* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
 {
@@ -1046,11 +1046,11 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
 	return 0;
 }
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
 	return 0;
 }
-static inline int pmd_free_pte_page(pmd_t *pmd)
+static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
 	return 0;
 }
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 54e5bbaa3200..517f5853ffed 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -92,7 +92,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 		if (ioremap_pmd_enabled() &&
 		    ((next - addr) == PMD_SIZE) &&
 		    IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
-		    pmd_free_pte_page(pmd)) {
+		    pmd_free_pte_page(pmd, addr)) {
 			if (pmd_set_huge(pmd, phys_addr + addr, prot))
 				continue;
 		}
@@ -119,7 +119,7 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
 		if (ioremap_pud_enabled() &&
 		    ((next - addr) == PUD_SIZE) &&
 		    IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
-		    pud_free_pmd_page(pud)) {
+		    pud_free_pmd_page(pud, addr)) {
 			if (pud_set_huge(pud, phys_addr + addr, prot))
 				continue;
 		}

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] x86/mm: add TLB purge to free pmd/pte page interfaces
  2018-04-30 17:59 ` Toshi Kani
@ 2018-04-30 17:59   ` Toshi Kani
  0 siblings, 0 replies; 14+ messages in thread
From: Toshi Kani @ 2018-04-30 17:59 UTC (permalink / raw)
  To: mhocko, akpm, tglx, mingo, hpa
  Cc: cpandya, linux-mm, x86, linux-arm-kernel, linux-kernel,
	Toshi Kani, Joerg Roedel, stable

ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map.  The following preconditions are met at their entry.
 - All pte entries for a target pud/pmd address range have been cleared.
 - System-wide TLB purges have been performed for a target pud/pmd address
   range.

These preconditions ensure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries, since that requires all levels of
page entries, including ptes, to have the P and A bits set for the
associated address.  However, speculation may cache pud/pmd entries
(paging-structure caches) when they have the P bit set.

Add a system-wide TLB purge (INVLPG) of a single page after clearing a
pud/pmd entry's P bit.

SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:
  INVLPG invalidates all paging-structure caches associated with the
  current PCID regardless of the linear addresses to which they correspond.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: <stable@vger.kernel.org>
---
 arch/x86/mm/pgtable.c |   32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 37e3cbac59b9..816fd41ee854 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -720,24 +720,40 @@ int pmd_clear_huge(pmd_t *pmd)
  * @pud: Pointer to a PUD.
  * @addr: Virtual address associated with pud.
  *
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
-	pmd_t *pmd;
+	pmd_t *pmd, *pmd_sv;
+	pte_t *pte;
 	int i;
 
 	if (pud_none(*pud))
 		return 1;
 
 	pmd = (pmd_t *)pud_page_vaddr(*pud);
+	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
 
-	for (i = 0; i < PTRS_PER_PMD; i++)
-		if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
-			return 0;
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd_sv[i] = pmd[i];
+		if (!pmd_none(pmd[i]))
+			pmd_clear(&pmd[i]);
+	}
 
 	pud_clear(pud);
+
+	/* INVLPG to clear all paging-structure caches */
+	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		if (!pmd_none(pmd_sv[i])) {
+			pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+			free_page((unsigned long)pte);
+		}
+	}
+
+	free_page((unsigned long)pmd_sv);
 	free_page((unsigned long)pmd);
 
 	return 1;
@@ -748,7 +764,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
  * @pmd: Pointer to a PMD.
  * @addr: Virtual address associated with pmd.
  *
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -760,6 +776,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 	pte = (pte_t *)pmd_page_vaddr(*pmd);
 	pmd_clear(pmd);
+
+	/* INVLPG to clear all paging-structure caches */
+	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
 	free_page((unsigned long)pte);
 
 	return 1;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] x86/mm: disable ioremap free page handling on x86-PAE
  2018-04-30 17:59 ` Toshi Kani
@ 2018-04-30 17:59   ` Toshi Kani
  0 siblings, 0 replies; 14+ messages in thread
From: Toshi Kani @ 2018-04-30 17:59 UTC (permalink / raw)
  To: mhocko, akpm, tglx, mingo, hpa
  Cc: cpandya, linux-mm, x86, linux-arm-kernel, linux-kernel,
	Toshi Kani, Joerg Roedel, stable

ioremap() supports pmd mappings on x86-PAE.  However, the kernel's pmd
tables are not shared among processes on x86-PAE.  Therefore, any
update to sync'd pmd entries needs re-syncing.  Freeing a pte page
also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().
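
For context, the check that fires is the pmd consistency test in
vmalloc_sync_one() (condensed from arch/x86/mm/fault.c of this era and
quoted from memory, so treat the exact form as approximate):

	pmd = pmd_offset(pud, address);
	pmd_k = pmd_offset(pud_k, address);
	if (!pmd_present(*pmd_k))
		return NULL;

	if (!pmd_present(*pmd))
		set_pmd(pmd, *pmd_k);
	else
		/* trips if a previously-sync'd pmd was changed */
		BUG_ON(pmd_page(*pmd) != pmd_page(*pmd_k));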

Disable free page handling on x86-PAE.  pud_free_pmd_page() and
pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
This ensures that ioremap() does not update sync'd pmd entries, at the
cost of falling back to pte mappings.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Reported-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: <stable@vger.kernel.org>
---
 arch/x86/mm/pgtable.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 816fd41ee854..809115150d8b 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -715,6 +715,7 @@ int pmd_clear_huge(pmd_t *pmd)
 	return 0;
 }
 
+#ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
@@ -784,4 +785,22 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 	return 1;
 }
+
+#else /* !CONFIG_X86_64 */
+
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
+{
+	return pud_none(*pud);
+}
+
+/*
+ * Disable free page handling on x86-PAE. This ensures that ioremap()
+ * does not update sync'd pmd entries. See vmalloc_sync_one().
+ */
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
+{
+	return pmd_none(*pmd);
+}
+
+#endif /* CONFIG_X86_64 */
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] x86/mm: disable ioremap free page handling on x86-PAE
  2018-04-30 17:59   ` Toshi Kani
@ 2018-05-09 15:47     ` Kani, Toshi
  0 siblings, 0 replies; 14+ messages in thread
From: Kani, Toshi @ 2018-05-09 15:47 UTC (permalink / raw)
  To: tglx, hpa, joro, Hocko, Michal, akpm, mingo
  Cc: linux-arm-kernel, cpandya, linux-mm, linux-kernel, x86, stable

On Mon, 2018-04-30 at 11:59 -0600, Toshi Kani wrote:
> ioremap() supports pmd mappings on x86-PAE.  However, the kernel's pmd
> tables are not shared among processes on x86-PAE.  Therefore, any
> update to sync'd pmd entries needs re-syncing.  Freeing a pte page
> also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().
> 
> Disable free page handling on x86-PAE.  pud_free_pmd_page() and
> pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
> This ensures that ioremap() does not update sync'd pmd entries, at the
> cost of falling back to pte mappings.
> 
> Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
> Reported-by: Joerg Roedel <joro@8bytes.org>

Hi Joerg,

Does it solve your problem?  Let me know if you have any issue with the
series. 

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] x86/mm: add TLB purge to free pmd/pte page interfaces
  2018-04-30 17:59   ` Toshi Kani
@ 2018-05-15 14:05     ` Joerg Roedel
  0 siblings, 0 replies; 14+ messages in thread
From: Joerg Roedel @ 2018-05-15 14:05 UTC (permalink / raw)
  To: Toshi Kani
  Cc: mhocko, akpm, tglx, mingo, hpa, cpandya, linux-mm, x86,
	linux-arm-kernel, linux-kernel, stable

On Mon, Apr 30, 2018 at 11:59:24AM -0600, Toshi Kani wrote:
>  int pud_free_pmd_page(pud_t *pud, unsigned long addr)
>  {
> -	pmd_t *pmd;
> +	pmd_t *pmd, *pmd_sv;
> +	pte_t *pte;
>  	int i;
>  
>  	if (pud_none(*pud))
>  		return 1;
>  
>  	pmd = (pmd_t *)pud_page_vaddr(*pud);
> +	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);

So you need to allocate a page to free a page? It is better to put the
pages into a list with a list_head on the stack.
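
A rough sketch of that alternative, for illustration only (untested,
and assuming page->lru is free to use for these page-table pages):

int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
	LIST_HEAD(page_list);		/* list_head on the stack */
	struct page *page, *next;
	pmd_t *pmd;
	int i;

	if (pud_none(*pud))
		return 1;

	pmd = (pmd_t *)pud_page_vaddr(*pud);

	/* queue each pte page instead of saving entries in a spare page */
	for (i = 0; i < PTRS_PER_PMD; i++) {
		if (!pmd_none(pmd[i])) {
			list_add(&pmd_page(pmd[i])->lru, &page_list);
			pmd_clear(&pmd[i]);
		}
	}

	pud_clear(pud);

	/* INVLPG to clear all paging-structure caches */
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);

	/* nothing can reference the old pte pages now; free them */
	list_for_each_entry_safe(page, next, &page_list, lru) {
		list_del(&page->lru);
		__free_page(page);
	}

	free_page((unsigned long)pmd);

	return 1;
}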

I am still in favour of just reverting the broken commit and doing a
correct and working fix for the/a merge window.


	Joerg

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] x86/mm: add TLB purge to free pmd/pte page interfaces
  2018-05-15 14:05     ` Joerg Roedel
@ 2018-05-15 16:34       ` Kani, Toshi
  0 siblings, 0 replies; 14+ messages in thread
From: Kani, Toshi @ 2018-05-15 16:34 UTC (permalink / raw)
  To: joro
  Cc: linux-kernel, tglx, linux-mm, stable, x86, akpm, hpa, mingo,
	Hocko, Michal, cpandya, linux-arm-kernel

On Tue, 2018-05-15 at 16:05 +0200, Joerg Roedel wrote:
> On Mon, Apr 30, 2018 at 11:59:24AM -0600, Toshi Kani wrote:
> >  int pud_free_pmd_page(pud_t *pud, unsigned long addr)
> >  {
> > -	pmd_t *pmd;
> > +	pmd_t *pmd, *pmd_sv;
> > +	pte_t *pte;
> >  	int i;
> >  
> >  	if (pud_none(*pud))
> >  		return 1;
> >  
> >  	pmd = (pmd_t *)pud_page_vaddr(*pud);
> > +	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
> 
> So you need to allocate a page to free a page? It is better to put the
> pages into a list with a list_head on the stack.

The code should have checked whether pmd_sv is NULL.  I will update the
patch.

For performance, I do not think this page allocation is a problem.  Unlike
pmd_free_pte_page(), pud_free_pmd_page() covers an extremely rare case.
Since a pud requires 1GB alignment, pud and pmd/pte mappings do not
share the same ranges within the vmalloc space.  I had to instrument the
kernel to force them to share the same ranges in order to test this patch.

> I am still in favour of just reverting the broken commit and doing a
> correct and working fix for the/a merge window.

I will reorder the patch series, changing patch 3/3 to 1/3, so that it
can be taken first to fix the BUG_ON on PAE.  This revert will disable
2MB ioremap on PAE in some cases, but I do not think that is important
on PAE anyway.

I do not think a revert on x86/64 is necessary, and I am more worried
about disabling 2MB ioremap in some cases, which can be seen as a
degradation.  Patch 2/3 fixes a possible page-directory cache issue that
I could not hit even though I put ioremap/iounmap with various sizes into
a tight loop for a day.
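
(Illustrative only: a stress loop of the kind described above might
look like the minimal module sketch below.  This is a reconstruction,
not the actual test code; the physical address is a placeholder that
must be adjusted to a safe, unused range on the test machine.)

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/io.h>
#include <linux/sched.h>
#include <asm/pgtable.h>

/* placeholder: a physical range that is safe to map on the test box */
static resource_size_t test_phys = 0xfe000000UL;

static int __init ioremap_stress_init(void)
{
	/* alternate sizes so huge and small mappings replace each other */
	static const size_t sizes[] = { PAGE_SIZE, PMD_SIZE,
					PMD_SIZE + PAGE_SIZE };
	void __iomem *p;
	int i, iter;

	for (iter = 0; iter < 100000; iter++) {
		for (i = 0; i < ARRAY_SIZE(sizes); i++) {
			p = ioremap(test_phys, sizes[i]);
			if (p)
				iounmap(p);
		}
		cond_resched();
	}
	return 0;
}

static void __exit ioremap_stress_exit(void)
{
}

module_init(ioremap_stress_init);
module_exit(ioremap_stress_exit);
MODULE_LICENSE("GPL");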

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 14+ messages in thread
