linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] fix free pmd/pte page handlings on x86
@ 2018-05-16 23:32 Toshi Kani
  2018-05-16 23:32 ` [PATCH v3 1/3] x86/mm: disable ioremap free page handling on x86-PAE Toshi Kani
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Toshi Kani @ 2018-05-16 23:32 UTC (permalink / raw)
  To: mhocko, akpm, tglx, mingo, hpa
  Cc: cpandya, linux-mm, x86, linux-arm-kernel, linux-kernel

This series fixes two issues in the x86 ioremap free page handling
for pud/pmd mappings.

Patch 01 fixes the BUG_ON on x86-PAE reported by Joerg.  It disables
the free page handling on x86-PAE.

Patches 02-03 fix a possible issue with speculation, which can leave
stale entries in the page-directory cache.
 - Patch 02 is from Chintan's v9 01/04 patch [1], which adds a new arg
   'addr', with my merge change to patch 01.
 - Patch 03 adds a TLB purge (INVLPG) to purge page-structure caches
   that may be populated by speculation.  See the patch descriptions for
   more detail.

[1] https://patchwork.kernel.org/patch/10371015/

v3:
 - Fixed a build error in v2.

v2:
 - Reordered patch-set, so that patch 01 can be applied independently.
 - Added a NULL pointer check for the page alloc in patch 03. 

---
Toshi Kani (2):
  1/3 x86/mm: disable ioremap free page handling on x86-PAE
  3/3 x86/mm: add TLB purge to free pmd/pte page interfaces

Chintan Pandya (1):
  2/3 ioremap: Update pgtable free interfaces with addr

---
 arch/arm64/mm/mmu.c           |  4 +--
 arch/x86/mm/pgtable.c         | 59 +++++++++++++++++++++++++++++++++++++------
 include/asm-generic/pgtable.h |  8 +++---
 lib/ioremap.c                 |  4 +--
 4 files changed, 59 insertions(+), 16 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3 1/3] x86/mm: disable ioremap free page handling on x86-PAE
  2018-05-16 23:32 [PATCH v3 0/3] fix free pmd/pte page handlings on x86 Toshi Kani
@ 2018-05-16 23:32 ` Toshi Kani
  2018-05-16 23:32 ` [PATCH v3 2/3] ioremap: Update pgtable free interfaces with addr Toshi Kani
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Toshi Kani @ 2018-05-16 23:32 UTC (permalink / raw)
  To: mhocko, akpm, tglx, mingo, hpa
  Cc: cpandya, linux-mm, x86, linux-arm-kernel, linux-kernel,
	Toshi Kani, Joerg Roedel, stable

ioremap() supports pmd mappings on x86-PAE.  However, the kernel's pmd
tables are not shared among processes on x86-PAE.  Therefore, any
update to sync'd pmd entries needs re-syncing.  Freeing a pte page
also leads to a vmalloc fault and hits the BUG_ON in vmalloc_sync_one().

Disable the free page handling on x86-PAE.  pud_free_pmd_page() and
pmd_free_pte_page() simply return 0 if a given pud/pmd entry is present.
This assures that ioremap() does not update sync'd pmd entries, at the
cost of falling back to pte mappings.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Reported-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: <stable@vger.kernel.org>
---
 arch/x86/mm/pgtable.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ffc8c13c50e4..3f7180bc5f52 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -715,6 +715,7 @@ int pmd_clear_huge(pmd_t *pmd)
 	return 0;
 }
 
+#ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
@@ -762,4 +763,22 @@ int pmd_free_pte_page(pmd_t *pmd)
 
 	return 1;
 }
+
+#else /* !CONFIG_X86_64 */
+
+int pud_free_pmd_page(pud_t *pud)
+{
+	return pud_none(*pud);
+}
+
+/*
+ * Disable free page handling on x86-PAE. This assures that ioremap()
+ * does not update sync'd pmd entries. See vmalloc_sync_one().
+ */
+int pmd_free_pte_page(pmd_t *pmd)
+{
+	return pmd_none(*pmd);
+}
+
+#endif /* CONFIG_X86_64 */
 #endif	/* CONFIG_HAVE_ARCH_HUGE_VMAP */

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 2/3] ioremap: Update pgtable free interfaces with addr
  2018-05-16 23:32 [PATCH v3 0/3] fix free pmd/pte page handlings on x86 Toshi Kani
  2018-05-16 23:32 ` [PATCH v3 1/3] x86/mm: disable ioremap free page handling on x86-PAE Toshi Kani
@ 2018-05-16 23:32 ` Toshi Kani
  2018-05-17  6:47   ` Michal Hocko
  2018-05-16 23:32 ` [PATCH v3 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces Toshi Kani
  2018-06-24 13:19 ` [PATCH v3 0/3] fix free pmd/pte page handlings on x86 Thomas Gleixner
  3 siblings, 1 reply; 18+ messages in thread
From: Toshi Kani @ 2018-05-16 23:32 UTC (permalink / raw)
  To: mhocko, akpm, tglx, mingo, hpa
  Cc: cpandya, linux-mm, x86, linux-arm-kernel, linux-kernel,
	Toshi Kani, stable

From: Chintan Pandya <cpandya@codeaurora.org>

This patch ("mm/vmalloc: Add interfaces to free unmapped
page table") adds the following 2 interfaces to free the page
table in case we implement huge mapping:

pud_free_pmd_page() and pmd_free_pte_page()

Some architectures (like arm64) need to do proper TLB
maintenance after updating a pagetable entry even during map.
Why?  Read this:
https://patchwork.kernel.org/patch/10134581/

Pass 'addr' to these interfaces so that proper TLB ops
can be performed.

[toshi@hpe.com: merge changes]
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
---
 arch/arm64/mm/mmu.c           |    4 ++--
 arch/x86/mm/pgtable.c         |   12 +++++++-----
 include/asm-generic/pgtable.h |    8 ++++----
 lib/ioremap.c                 |    4 ++--
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2dbb2c9f1ec1..da98828609a1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -973,12 +973,12 @@ int pmd_clear_huge(pmd_t *pmdp)
 	return 1;
 }
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
 	return pud_none(*pud);
 }
 
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
 	return pmd_none(*pmd);
 }
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 3f7180bc5f52..f60fdf411103 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -719,11 +719,12 @@ int pmd_clear_huge(pmd_t *pmd)
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
  * @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
  *
  * Context: The pud range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
 	pmd_t *pmd;
 	int i;
@@ -734,7 +735,7 @@ int pud_free_pmd_page(pud_t *pud)
 	pmd = (pmd_t *)pud_page_vaddr(*pud);
 
 	for (i = 0; i < PTRS_PER_PMD; i++)
-		if (!pmd_free_pte_page(&pmd[i]))
+		if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
 			return 0;
 
 	pud_clear(pud);
@@ -746,11 +747,12 @@ int pud_free_pmd_page(pud_t *pud)
 /**
  * pmd_free_pte_page - Clear pmd entry and free pte page.
  * @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
  *
  * Context: The pmd range has been unmaped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
 	pte_t *pte;
 
@@ -766,7 +768,7 @@ int pmd_free_pte_page(pmd_t *pmd)
 
 #else /* !CONFIG_X86_64 */
 
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
 	return pud_none(*pud);
 }
@@ -775,7 +777,7 @@ int pud_free_pmd_page(pud_t *pud)
  * Disable free page handling on x86-PAE. This assures that ioremap()
  * does not update sync'd pmd entries. See vmalloc_sync_one().
  */
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
 	return pmd_none(*pmd);
 }
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639afaa39..b081794ba135 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
 #else	/* !CONFIG_HAVE_ARCH_HUGE_VMAP */
 static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
 {
@@ -1046,11 +1046,11 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
 	return 0;
 }
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
 	return 0;
 }
-static inline int pmd_free_pte_page(pmd_t *pmd)
+static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
 	return 0;
 }
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 54e5bbaa3200..517f5853ffed 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -92,7 +92,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 		if (ioremap_pmd_enabled() &&
 		    ((next - addr) == PMD_SIZE) &&
 		    IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
-		    pmd_free_pte_page(pmd)) {
+		    pmd_free_pte_page(pmd, addr)) {
 			if (pmd_set_huge(pmd, phys_addr + addr, prot))
 				continue;
 		}
@@ -119,7 +119,7 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
 		if (ioremap_pud_enabled() &&
 		    ((next - addr) == PUD_SIZE) &&
 		    IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
-		    pud_free_pmd_page(pud)) {
+		    pud_free_pmd_page(pud, addr)) {
 			if (pud_set_huge(pud, phys_addr + addr, prot))
 				continue;
 		}

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces
  2018-05-16 23:32 [PATCH v3 0/3] fix free pmd/pte page handlings on x86 Toshi Kani
  2018-05-16 23:32 ` [PATCH v3 1/3] x86/mm: disable ioremap free page handling on x86-PAE Toshi Kani
  2018-05-16 23:32 ` [PATCH v3 2/3] ioremap: Update pgtable free interfaces with addr Toshi Kani
@ 2018-05-16 23:32 ` Toshi Kani
  2018-05-29 14:44   ` Joerg Roedel
  2018-06-24 13:19 ` [PATCH v3 0/3] fix free pmd/pte page handlings on x86 Thomas Gleixner
  3 siblings, 1 reply; 18+ messages in thread
From: Toshi Kani @ 2018-05-16 23:32 UTC (permalink / raw)
  To: mhocko, akpm, tglx, mingo, hpa
  Cc: cpandya, linux-mm, x86, linux-arm-kernel, linux-kernel,
	Toshi Kani, Joerg Roedel, stable

ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map.  The following preconditions are met at their entry.
 - All pte entries for the target pud/pmd address range have been cleared.
 - System-wide TLB purges have been performed for the target pud/pmd
   address range.

The preconditions assure that there are no stale TLB entries for the
range.  Speculation may not cache TLB entries, since that requires all
levels of page entries, including ptes, to have the P and A bits set for
the associated address.  However, speculation may cache pud/pmd entries
(paging-structure caches) when they have the P bit set.

Add a system-wide TLB purge (INVLPG) of a single page after clearing
the pud/pmd entry's P-bit.

SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:
  INVLPG invalidates all paging-structure caches associated with the
  current PCID regardless of the linear addresses to which they correspond.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: <stable@vger.kernel.org>
---
 arch/x86/mm/pgtable.c |   34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index f60fdf411103..7e96594c7e97 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -721,24 +721,42 @@ int pmd_clear_huge(pmd_t *pmd)
  * @pud: Pointer to a PUD.
  * @addr: Virtual address associated with pud.
  *
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
-	pmd_t *pmd;
+	pmd_t *pmd, *pmd_sv;
+	pte_t *pte;
 	int i;
 
 	if (pud_none(*pud))
 		return 1;
 
 	pmd = (pmd_t *)pud_page_vaddr(*pud);
+	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+	if (!pmd_sv)
+		return 0;
 
-	for (i = 0; i < PTRS_PER_PMD; i++)
-		if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
-			return 0;
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd_sv[i] = pmd[i];
+		if (!pmd_none(pmd[i]))
+			pmd_clear(&pmd[i]);
+	}
 
 	pud_clear(pud);
+
+	/* INVLPG to clear all paging-structure caches */
+	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		if (!pmd_none(pmd_sv[i])) {
+			pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+			free_page((unsigned long)pte);
+		}
+	}
+
+	free_page((unsigned long)pmd_sv);
 	free_page((unsigned long)pmd);
 
 	return 1;
@@ -749,7 +767,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
  * @pmd: Pointer to a PMD.
  * @addr: Virtual address associated with pmd.
  *
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -761,6 +779,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 	pte = (pte_t *)pmd_page_vaddr(*pmd);
 	pmd_clear(pmd);
+
+	/* INVLPG to clear all paging-structure caches */
+	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
 	free_page((unsigned long)pte);
 
 	return 1;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] ioremap: Update pgtable free interfaces with addr
  2018-05-16 23:32 ` [PATCH v3 2/3] ioremap: Update pgtable free interfaces with addr Toshi Kani
@ 2018-05-17  6:47   ` Michal Hocko
  2018-05-17 14:32     ` Kani, Toshi
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2018-05-17  6:47 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, tglx, mingo, hpa, cpandya, linux-mm, x86, linux-arm-kernel,
	linux-kernel, stable

On Wed 16-05-18 17:32:06, Kani Toshimitsu wrote:
> From: Chintan Pandya <cpandya@codeaurora.org>
> 
> This patch ("mm/vmalloc: Add interfaces to free unmapped
> page table") adds the following 2 interfaces to free the page
> table in case we implement huge mapping:
> 
> pud_free_pmd_page() and pmd_free_pte_page()
> 
> Some architectures (like arm64) need to do proper TLB
> maintenance after updating a pagetable entry even during map.
> Why?  Read this:
> https://patchwork.kernel.org/patch/10134581/

Please add that information to the changelog.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] ioremap: Update pgtable free interfaces with addr
  2018-05-17  6:47   ` Michal Hocko
@ 2018-05-17 14:32     ` Kani, Toshi
  0 siblings, 0 replies; 18+ messages in thread
From: Kani, Toshi @ 2018-05-17 14:32 UTC (permalink / raw)
  To: mhocko
  Cc: linux-kernel, tglx, linux-mm, stable, x86, akpm, hpa, mingo,
	linux-arm-kernel, cpandya

On Thu, 2018-05-17 at 08:47 +0200, Michal Hocko wrote:
> On Wed 16-05-18 17:32:06, Kani Toshimitsu wrote:
> > From: Chintan Pandya <cpandya@codeaurora.org>
> > 
> > This patch ("mm/vmalloc: Add interfaces to free unmapped
> > page table") adds the following 2 interfaces to free the page
> > table in case we implement huge mapping:
> > 
> > pud_free_pmd_page() and pmd_free_pte_page()
> > 
> > Some architectures (like arm64) need to do proper TLB
> > maintenance after updating a pagetable entry even during map.
> > Why?  Read this:
> > https://patchwork.kernel.org/patch/10134581/
> 
> Please add that information to the changelog.

I will update the description and resend this patch.

Thanks!
-Toshi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces
  2018-05-16 23:32 ` [PATCH v3 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces Toshi Kani
@ 2018-05-29 14:44   ` Joerg Roedel
  2018-05-29 16:10     ` Kani, Toshi
  0 siblings, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2018-05-29 14:44 UTC (permalink / raw)
  To: mingo, tglx, Toshi Kani
  Cc: mhocko, akpm, tglx, mingo, hpa, cpandya, linux-mm, x86,
	linux-arm-kernel, linux-kernel, stable

On Wed, May 16, 2018 at 05:32:07PM -0600, Toshi Kani wrote:
>  	pmd = (pmd_t *)pud_page_vaddr(*pud);
> +	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
> +	if (!pmd_sv)
> +		return 0;

So your code still needs to allocate a full page where a simple
list_head on the stack would do the same job.

Ingo, Thomas, can you please just revert the original broken patch for
now until there is a proper fix?

Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces
  2018-05-29 14:44   ` Joerg Roedel
@ 2018-05-29 16:10     ` Kani, Toshi
  2018-05-30  4:59       ` joro
  0 siblings, 1 reply; 18+ messages in thread
From: Kani, Toshi @ 2018-05-29 16:10 UTC (permalink / raw)
  To: tglx, joro, mingo
  Cc: linux-kernel, linux-mm, stable, x86, akpm, hpa, linux-arm-kernel,
	cpandya, Hocko, Michal

On Tue, 2018-05-29 at 16:44 +0200, Joerg Roedel wrote:
> On Wed, May 16, 2018 at 05:32:07PM -0600, Toshi Kani wrote:
> >  	pmd = (pmd_t *)pud_page_vaddr(*pud);
> > +	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
> > +	if (!pmd_sv)
> > +		return 0;
> 
> So your code still needs to allocate a full page where a simple
> list_head on the stack would do the same job.

Can you explain why you think allocating a page here is a major problem?
 
As I explained before, pud_free_pmd_page() covers an extremely rare case
which I could not even hit with a huge number of ioremap() calls until
I instrumented alloc_vmap_area() to force this case to happen.  I do not
think pages should be listed for such a rare case.

> Ingo, Thomas, can you please just revert the original broken patch for
> now until there is proper fix?

If we just revert, please apply patch 1/3 first.  This patch addresses
the BUG_ON issue on PAE.  This is a real issue that needs a fix ASAP.

The page-directory cache issue on x64, which is addressed by patch 3/3,
is a theoretical issue that I could not hit by putting ioremap() calls
into a loop for a whole day.  Nobody hit this issue, either.

The simple revert patch Joerg posted a while ago causes
pmd_free_pte_page() to fail on x64.  This causes multiple pmd mappings
to fall into pte mappings on my test systems.  This can be seen as a
degradation, and I am afraid that it is more harmful than good.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces
  2018-05-29 16:10     ` Kani, Toshi
@ 2018-05-30  4:59       ` joro
  2018-05-30 15:39         ` Kani, Toshi
  0 siblings, 1 reply; 18+ messages in thread
From: joro @ 2018-05-30  4:59 UTC (permalink / raw)
  To: Kani, Toshi
  Cc: tglx, mingo, linux-kernel, linux-mm, stable, x86, akpm, hpa,
	linux-arm-kernel, cpandya, Hocko, Michal

On Tue, May 29, 2018 at 04:10:24PM +0000, Kani, Toshi wrote:
> Can you explain why you think allocating a page here is a major problem?

Because a larger allocation is more likely to fail. And if you fail the
allocation, you also fail to free more pages, which _is_ a problem. So
better avoid any allocations in code paths that are about freeing
memory.

> If we just revert, please apply patch 1/3 first.  This patch addresses
> the BUG_ON issue on PAE.  This is a real issue that needs a fix ASAP.

It does not address the problem of dirty page-walk caches on x86-64.

> The page-directory cache issue on x64, which is addressed by patch 3/3,
> is a theoretical issue that I could not hit by putting ioremap() calls
> into a loop for a whole day.  Nobody hit this issue, either.

How do you know you didn't hit that issue? It might cause silent data
corruption, which might not be easily detected.

> The simple revert patch Joerg posted a while ago causes
> pmd_free_pte_page() to fail on x64.  This causes multiple pmd mappings
> to fall into pte mappings on my test systems.  This can be seen as a
> degradation, and I am afraid that it is more harmful than good.

The plain revert just removes all the issues with the dirty TLB that the
original patch introduced, and prevents huge mappings from being
established when there have been smaller mappings before. This is not
ideal, but at least it is consistent, does not leak pages, and leaves
no dirty TLBs. So this is the easiest and most reliable fix at this
stage in the release process.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces
  2018-05-30  4:59       ` joro
@ 2018-05-30 15:39         ` Kani, Toshi
  0 siblings, 0 replies; 18+ messages in thread
From: Kani, Toshi @ 2018-05-30 15:39 UTC (permalink / raw)
  To: joro
  Cc: linux-kernel, tglx, linux-mm, stable, x86, akpm, hpa, mingo,
	linux-arm-kernel, cpandya, Hocko, Michal

On Wed, 2018-05-30 at 06:59 +0200, joro@8bytes.org wrote:
> On Tue, May 29, 2018 at 04:10:24PM +0000, Kani, Toshi wrote:
> > Can you explain why you think allocating a page here is a major problem?
> 
> Because a larger allocation is more likely to fail. And if you fail the
> allocation, you also fail to free more pages, which _is_ a problem. So
> better avoid any allocations in code paths that are about freeing
> memory.

It only allocates a single page.  If it fails to allocate that page,
the pud mapping request simply falls back to pmd mappings, which is only
as bad as what your suggested plain revert always does, for both pud and
pmd mappings.  Also, this func is called from ioremap() when a driver
initializes its mapping.  If the system does not have a single page to
allocate, the driver has a much bigger issue to deal with than its
request falling back to pmd mappings.  Please also note that this func
is not called from a free-memory path.

> > If we just revert, please apply patch 1/3 first.  This patch addresses
> > the BUG_ON issue on PAE.  This is a real issue that needs a fix ASAP.
> 
> It does not address the problem of dirty page-walk caches on x86-64.

This patch 3/3 fixes it.  Hope my explanation above clarified.

> > The page-directory cache issue on x64, which is addressed by patch 3/3,
> > is a theoretical issue that I could not hit by putting ioremap() calls
> > into a loop for a whole day.  Nobody hit this issue, either.
> 
> How do you know you didn't hit that issue? It might cause silent data
> corruption, which might not be easily detected.

If the test had hit that issue, it would have caused a kernel page fault
(freed & cleared pte page still unmapped, or this page reused and its
attribute data invalid) or an MCE (pte page reused and phys addr invalid)
when it accessed the ioremap'd address.

Causing silent data corruption requires this freed & cleared pte
page to be reused and re-initialized with valid pte entry data.  While
this case is possible, the chance of my test hitting only this case
without ever hitting the much more likely cases of a page fault or MCE
is basically zero.

> > The simple revert patch Joerg posted a while ago causes
> > pmd_free_pte_page() to fail on x64.  This causes multiple pmd mappings
> > to fall into pte mappings on my test systems.  This can be seen as a
> > degradation, and I am afraid that it is more harmful than good.
> 
> > The plain revert just removes all the issues with the dirty TLB that the
> > original patch introduced, and prevents huge mappings from being
> > established when there have been smaller mappings before. This is not
> > ideal, but at least it is consistent, does not leak pages, and leaves
> > no dirty TLBs. So this is the easiest and most reliable fix at this
> > stage in the release process.

If you think the page-directory cache issue needs a fix ASAP, I believe
taking patch 3/3 is a much better option than the plain revert, which
will introduce the fall-back issue I explained.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/3] fix free pmd/pte page handlings on x86
  2018-05-16 23:32 [PATCH v3 0/3] fix free pmd/pte page handlings on x86 Toshi Kani
                   ` (2 preceding siblings ...)
  2018-05-16 23:32 ` [PATCH v3 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces Toshi Kani
@ 2018-06-24 13:19 ` Thomas Gleixner
  2018-06-25 14:56   ` Kani, Toshi
  3 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2018-06-24 13:19 UTC (permalink / raw)
  To: Toshi Kani
  Cc: mhocko, akpm, mingo, hpa, cpandya, linux-mm, x86,
	linux-arm-kernel, linux-kernel

On Wed, 16 May 2018, Toshi Kani wrote:

> This series fixes two issues in the x86 ioremap free page handling
> for pud/pmd mappings.
> 
> Patch 01 fixes the BUG_ON on x86-PAE reported by Joerg.  It disables
> the free page handling on x86-PAE.
> 
> Patches 02-03 fix a possible issue with speculation, which can leave
> stale entries in the page-directory cache.
>  - Patch 02 is from Chintan's v9 01/04 patch [1], which adds a new arg
>    'addr', with my merge change to patch 01.
>  - Patch 03 adds a TLB purge (INVLPG) to purge page-structure caches
>    that may be populated by speculation.  See the patch descriptions for
>    more detail.

Toshi, Joerg, Michal!

I'm failing to find a conclusion of this discussion. Can we finally make
some progress with that?

Can someone give me a hint what to pick up urgently please?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/3] fix free pmd/pte page handlings on x86
  2018-06-24 13:19 ` [PATCH v3 0/3] fix free pmd/pte page handlings on x86 Thomas Gleixner
@ 2018-06-25 14:56   ` Kani, Toshi
  2018-06-25 17:53     ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: Kani, Toshi @ 2018-06-25 14:56 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, linux-mm, x86, akpm, hpa, mingo, linux-arm-kernel,
	cpandya, Hocko, Michal

On Sun, 2018-06-24 at 15:19 +0200, Thomas Gleixner wrote:
> On Wed, 16 May 2018, Toshi Kani wrote:
> 
> > This series fixes two issues in the x86 ioremap free page handling
> > for pud/pmd mappings.
> > 
> > Patch 01 fixes the BUG_ON on x86-PAE reported by Joerg.  It disables
> > the free page handling on x86-PAE.
> > 
> > Patches 02-03 fix a possible issue with speculation, which can leave
> > stale entries in the page-directory cache.
> >  - Patch 02 is from Chintan's v9 01/04 patch [1], which adds a new arg
> >    'addr', with my merge change to patch 01.
> >  - Patch 03 adds a TLB purge (INVLPG) to purge page-structure caches
> >    that may be populated by speculation.  See the patch descriptions for
> >    more detail.
> 
> Toshi, Joerg, Michal!

Hi Thomas,

Thanks for checking. I was about to ping as well.

> I'm failing to find a conclusion of this discussion. Can we finally make
> some progress with that?

I have not heard from Joerg since I last replied to his comments on
Patch 3/3 -- I did my best to explain that there was no issue with the
single page allocation in pud_free_pmd_page().  From my perspective,
the v3 series is good to go.

> Can someone give me a hint what to pick up urgently please?

I've confirmed that the v3 series still applies cleanly to 4.18-rc2 in
linus.git.

Patch 1/3, v3 
https://patchwork.kernel.org/patch/10405071/

Patch 2/3, v3-UPDATE
https://patchwork.kernel.org/patch/10407065/

nit: sorry, please fix my email address below.
- [toshi@hpe.com: merge changes, rewrite patch description]
+ [toshi.kani@hpe.com: merge changes, rewrite patch description]

Patch 3/3, v3
https://patchwork.kernel.org/patch/10405073/

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/3] fix free pmd/pte page handlings on x86
  2018-06-25 14:56   ` Kani, Toshi
@ 2018-06-25 17:53     ` Michal Hocko
  2018-06-25 21:15       ` Kani, Toshi
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2018-06-25 17:53 UTC (permalink / raw)
  To: Kani, Toshi
  Cc: tglx, linux-kernel, linux-mm, x86, akpm, hpa, mingo,
	linux-arm-kernel, cpandya

On Mon 25-06-18 14:56:26, Kani Toshimitsu wrote:
> On Sun, 2018-06-24 at 15:19 +0200, Thomas Gleixner wrote:
> > On Wed, 16 May 2018, Toshi Kani wrote:
> > 
> > > This series fixes two issues in the x86 ioremap free page handling
> > > for pud/pmd mappings.
> > > 
> > > Patch 01 fixes the BUG_ON on x86-PAE reported by Joerg.  It disables
> > > the free page handling on x86-PAE.
> > > 
> > > Patches 02-03 fix a possible issue with speculation, which can leave
> > > stale entries in the page-directory cache.
> > >  - Patch 02 is from Chintan's v9 01/04 patch [1], which adds a new arg
> > >    'addr', with my merge change to patch 01.
> > >  - Patch 03 adds a TLB purge (INVLPG) to purge page-structure caches
> > >    that may be populated by speculation.  See the patch descriptions for
> > >    more detail.
> > 
> > Toshi, Joerg, Michal!
> 
> Hi Thomas,
> 
> Thanks for checking. I was about to ping as well.
> 
> > I'm failing to find a conclusion of this discussion. Can we finally make
> > some progress with that?
> 
> I have not heard from Joerg since I last replied to his comments to
> Patch 3/3 -- I did my best to explain that there was no issue in the
> single page allocation in pud_free_pmd_page().  From my perspective, the
>  v3 series is good to go.

Well, I admit that this is not my area, but I agree with Joerg that
allocating memory inside a function that is supposed to free a page
table is far from ideal. More so since the allocation is hardcoded
GFP_KERNEL. We already have this antipattern in the functions that
allocate page tables, and it has turned out to be a maintenance PITA
long term. So if there is a way around that, then I would strongly
suggest finding a different solution.

Whether that is sufficient to ditch the whole series is not my call
though.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 0/3] fix free pmd/pte page handlings on x86
  2018-06-25 17:53     ` Michal Hocko
@ 2018-06-25 21:15       ` Kani, Toshi
  2018-06-26  6:35         ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: Kani, Toshi @ 2018-06-25 21:15 UTC (permalink / raw)
  To: mhocko
  Cc: linux-kernel, tglx, linux-mm, x86, akpm, hpa, mingo,
	linux-arm-kernel, cpandya

On Mon, 2018-06-25 at 19:53 +0200, Michal Hocko wrote:
> On Mon 25-06-18 14:56:26, Kani Toshimitsu wrote:
> > On Sun, 2018-06-24 at 15:19 +0200, Thomas Gleixner wrote:
> > > On Wed, 16 May 2018, Toshi Kani wrote:
> > > 
> > > > This series fixes two issues in the x86 ioremap free page handlings
> > > > for pud/pmd mappings.
> > > > 
> > > > Patch 01 fixes BUG_ON on x86-PAE reported by Joerg.  It disables
> > > > the free page handling on x86-PAE.
> > > > 
> > > > Patch 02-03 fixes a possible issue with speculation which can cause
> > > > stale page-directory cache.
> > > >  - Patch 02 is from Chintan's v9 01/04 patch [1], which adds a new arg
> > > >    'addr', with my merge change to patch 01.
> > > >  - Patch 03 adds a TLB purge (INVLPG) to purge page-structure caches
> > > >    that may be cached by speculation.  See the patch descriptions for
> > > >    more detail.
> > > 
> > > Toshi, Joerg, Michal!
> > 
> > Hi Thomas,
> > 
> > Thanks for checking. I was about to ping as well.
> > 
> > > I'm failing to find a conclusion of this discussion. Can we finally make
> > > some progress with that?
> > 
> > I have not heard from Joerg since I last replied to his comments to
> > Patch 3/3 -- I did my best to explain that there was no issue in the
> > single page allocation in pud_free_pmd_page().  From my perspective, the
> >  v3 series is good to go.
> 
> Well, I admit that this is not my area, but I agree with Joerg that
> allocating memory inside a function that is supposed to free page
> tables is far from ideal, more so since the allocation is hardcoded to
> GFP_KERNEL. We already have this antipattern in the functions that
> allocate page tables, and it has turned out to be a maintenance PITA
> long-term. So if there is a way around that, then I would strongly
> suggest finding a different solution.
> 
> Whether that is sufficient to ditch the whole series is not my call
> though.

I'd agree if this code were in a memory-free path.  However, this code is
in the ioremap() path, which is expected to allocate new page(s).

For example, setting a fresh PUD map allocates a new page to setup page
tables as follows:

  ioremap_pud_range()
    pud_alloc()
      __pud_alloc()
        pud_alloc_one()
          get_zeroed_page() with GFP_KERNEL
            __get_free_pages() with GFP_KERNEL | __GFP_ZERO

In a rare case, a PUD map is set over a previously-used-for-PMD range,
which leads to calling pud_free_pmd_page() to free up the page consisting
of cleared PMD entries.  To manage this procedure, pud_free_pmd_page()
temporarily allocates a page to save the cleared PMD entries as follows:

  ioremap_pud_range()
    pud_free_pmd_page()
      __get_free_page() with GFP_KERNEL

These details are all internal to the ioremap() callers, who should
always expect that ioremap() allocates pages for setting page tables.
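The save-then-free pattern described above can be sketched in plain
userspace C.  This is an illustrative sketch only: the names
(sketch_pud_free_pmd_page, PTRS_PER_PMD as 512) and the use of
malloc/free as stand-ins for page allocation are assumptions, not the
kernel's actual implementation.

```c
#include <stdlib.h>
#include <string.h>

#define PTRS_PER_PMD 512

typedef unsigned long pmd_t;

/* Returns 1 on success; 0 if the single scratch allocation fails, in
 * which case the caller falls back to smaller (pmd) mappings. */
static int sketch_pud_free_pmd_page(pmd_t *pmd_table)
{
	pmd_t *saved = malloc(PTRS_PER_PMD * sizeof(*saved));
	int i;

	if (!saved)
		return 0;	/* fall back to pmd mappings */

	/* Save the live entries, then clear the table.  The kernel
	 * would clear the pud entry and issue INVLPG at this point, so
	 * no page walk can reach the old entries once they are freed. */
	memcpy(saved, pmd_table, PTRS_PER_PMD * sizeof(*saved));
	memset(pmd_table, 0, PTRS_PER_PMD * sizeof(*saved));

	/* Only now free the pages the saved entries point to. */
	for (i = 0; i < PTRS_PER_PMD; i++)
		if (saved[i])
			free((void *)saved[i]);
	free(saved);
	return 1;
}
```

The point of the scratch page is ordering: the entries must be cleared
and the stale translations purged before the pages they reference are
handed back to the allocator.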

As for possible performance implications associated with this page
allocation, pmd_free_pte_page() and pud_free_pmd_page() are very
different in terms of how likely they are to be called.

pmd_free_pte_page(), which does not allocate a page, gets called
multiple times during normal boot on my test systems.  My ioremap tests
cause this function to be called quite frequently.  This is because 4KB
and 2MB vaddr allocations come from similar vmalloc ranges.

pud_free_pmd_page(), which allocates a page, seems to never be called
under normal circumstances; at least I was not able to trigger it with
my ioremap tests.  I found that 1GB vaddr allocation does not share
ranges with 4KB/2MB allocations.  I had to hack the allocation code to
force them to share in order to test this function.  Hence, this memory
allocation does not have any performance implications.

Lastly, for the code maintenance, I believe this memory allocation keeps
the code much simpler than it would otherwise need to manage a special
page list.

Thanks,
-Toshi


* Re: [PATCH v3 0/3] fix free pmd/pte page handlings on x86
  2018-06-25 21:15       ` Kani, Toshi
@ 2018-06-26  6:35         ` Michal Hocko
  2018-06-26  8:45           ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2018-06-26  6:35 UTC (permalink / raw)
  To: Kani, Toshi
  Cc: linux-kernel, tglx, linux-mm, x86, akpm, hpa, mingo,
	linux-arm-kernel, cpandya

On Mon 25-06-18 21:15:03, Kani Toshimitsu wrote:
> On Mon, 2018-06-25 at 19:53 +0200, Michal Hocko wrote:
> > On Mon 25-06-18 14:56:26, Kani Toshimitsu wrote:
> > > On Sun, 2018-06-24 at 15:19 +0200, Thomas Gleixner wrote:
> > > > On Wed, 16 May 2018, Toshi Kani wrote:
> > > > 
> > > > > This series fixes two issues in the x86 ioremap free page handlings
> > > > > for pud/pmd mappings.
> > > > > 
> > > > > Patch 01 fixes BUG_ON on x86-PAE reported by Joerg.  It disables
> > > > > the free page handling on x86-PAE.
> > > > > 
> > > > > Patch 02-03 fixes a possible issue with speculation which can cause
> > > > > stale page-directory cache.
> > > > >  - Patch 02 is from Chintan's v9 01/04 patch [1], which adds a new arg
> > > > >    'addr', with my merge change to patch 01.
> > > > >  - Patch 03 adds a TLB purge (INVLPG) to purge page-structure caches
> > > > >    that may be cached by speculation.  See the patch descriptions for
> > > > >    more detail.
> > > > 
> > > > Toshi, Joerg, Michal!
> > > 
> > > Hi Thomas,
> > > 
> > > Thanks for checking. I was about to ping as well.
> > > 
> > > > I'm failing to find a conclusion of this discussion. Can we finally make
> > > > some progress with that?
> > > 
> > > I have not heard from Joerg since I last replied to his comments to
> > > Patch 3/3 -- I did my best to explain that there was no issue in the
> > > single page allocation in pud_free_pmd_page().  From my perspective, the
> > >  v3 series is good to go.
> > 
> > Well, I admit that this is not my area, but I agree with Joerg that
> > allocating memory inside a function that is supposed to free page
> > tables is far from ideal, more so since the allocation is hardcoded to
> > GFP_KERNEL. We already have this antipattern in the functions that
> > allocate page tables, and it has turned out to be a maintenance PITA
> > long-term. So if there is a way around that, then I would strongly
> > suggest finding a different solution.
> > 
> > Whether that is sufficient to ditch the whole series is not my call
> > though.
> 
> I'd agree if this code were in a memory-free path.  However, this code is
> in the ioremap() path, which is expected to allocate new page(s).

This might be the case right now, but my experience tells me that
something named this generically and placed in a generic pte handling
header file will, sooner or later, end up being called in many other
places you do not even expect right now.

> For example, setting a fresh PUD map allocates a new page to setup page
> tables as follows:
> 
>   ioremap_pud_range()
>     pud_alloc()
>       __pud_alloc()
>         pud_alloc_one()
>           get_zeroed_page() with GFP_KERNEL
>             __get_free_pages() with GFP_KERNEL | __GFP_ZERO
> 
> In a rare case, a PUD map is set over a previously-used-for-PMD range,
> which leads to calling pud_free_pmd_page() to free up the page consisting
> of cleared PMD entries.  To manage this procedure, pud_free_pmd_page()
> temporarily allocates a page to save the cleared PMD entries as follows:
> 
>   ioremap_pud_range()
>     pud_free_pmd_page()
>       __get_free_page() with GFP_KERNEL
> 
> These details are all internal to the ioremap() callers, who should
> always expect that ioremap() allocates pages for setting page tables.
> 
> As for possible performance implications associated with this page
> allocation, pmd_free_pte_page() and pud_free_pmd_page() are very
> different in terms of how likely they are to be called.
> 
> pmd_free_pte_page(), which does not allocate a page, gets called
> multiple times during normal boot on my test systems.  My ioremap tests
> cause this function to be called quite frequently.  This is because 4KB
> and 2MB vaddr allocations come from similar vmalloc ranges.
> 
> pud_free_pmd_page(), which allocates a page, seems to never be called
> under normal circumstances; at least I was not able to trigger it with
> my ioremap tests.  I found that 1GB vaddr allocation does not share
> ranges with 4KB/2MB allocations.  I had to hack the allocation code to
> force them to share in order to test this function.  Hence, this memory
> allocation does not have any performance implications.

Again, this is all too much focused on your particular testing and the
current code base. Neither is a great foundation for design decisions.

> Lastly, for the code maintenance, I believe this memory allocation keeps
> the code much simpler than it would otherwise need to manage a special
> page list.

Yes, I can see simplicity as a reasonable argument for a quick fix,
which this pile is supposed to be, AFAIU. So this might be good to go
from that perspective, but I believe that this should be changed in
the future at least.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3 0/3] fix free pmd/pte page handlings on x86
  2018-06-26  6:35         ` Michal Hocko
@ 2018-06-26  8:45           ` Thomas Gleixner
  2018-06-26  8:54             ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2018-06-26  8:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kani, Toshi, linux-kernel, linux-mm, x86, akpm, hpa, mingo,
	linux-arm-kernel, cpandya

On Tue, 26 Jun 2018, Michal Hocko wrote:
> On Mon 25-06-18 21:15:03, Kani Toshimitsu wrote:
> > Lastly, for the code maintenance, I believe this memory allocation keeps
> > the code much simpler than it would otherwise need to manage a special
> > page list.
> 
> Yes, I can see simplicity as a reasonable argument for a quick fix,
> which this pile is supposed to be, AFAIU. So this might be good to go
> from that perspective, but I believe that this should be changed in
> the future at least.

So the conclusion is, that we ship this set of patches now to cure the
existing wreckage, right?

Fine with that, but who will take care of reworking it proper? I'm
concerned that this will just go stale the moment the fixes hit the tree.

Thanks,

	tglx


* Re: [PATCH v3 0/3] fix free pmd/pte page handlings on x86
  2018-06-26  8:45           ` Thomas Gleixner
@ 2018-06-26  8:54             ` Michal Hocko
  2018-06-26 15:25               ` Kani, Toshi
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2018-06-26  8:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kani, Toshi, linux-kernel, linux-mm, x86, akpm, hpa, mingo,
	linux-arm-kernel, cpandya

On Tue 26-06-18 10:45:11, Thomas Gleixner wrote:
> On Tue, 26 Jun 2018, Michal Hocko wrote:
> > On Mon 25-06-18 21:15:03, Kani Toshimitsu wrote:
> > > Lastly, for the code maintenance, I believe this memory allocation keeps
> > > the code much simpler than it would otherwise need to manage a special
> > > page list.
> > 
> > Yes, I can see simplicity as a reasonable argument for a quick fix,
> > which this pile is supposed to be, AFAIU. So this might be good to go
> > from that perspective, but I believe that this should be changed in
> > the future at least.
> 
> So the conclusion is, that we ship this set of patches now to cure the
> existing wreckage, right?

Joerg was suggesting some alternative but I got lost in the discussion
to be honest so I might mis{interpret,remember}.

> Fine with that, but who will take care of reworking it proper? I'm
> concerned that this will just go stale the moment the fixes hit the tree.

Yeah, this is why I usually try to push back hard, because "will be fixed
later" is similar to saying "documentation will come later", etc...

A big fat TODO would be appropriate so it won't get forgotten at least.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3 0/3] fix free pmd/pte page handlings on x86
  2018-06-26  8:54             ` Michal Hocko
@ 2018-06-26 15:25               ` Kani, Toshi
  0 siblings, 0 replies; 18+ messages in thread
From: Kani, Toshi @ 2018-06-26 15:25 UTC (permalink / raw)
  To: tglx, mhocko
  Cc: linux-kernel, linux-mm, x86, akpm, hpa, mingo, linux-arm-kernel, cpandya

On Tue, 2018-06-26 at 10:54 +0200, Michal Hocko wrote:
> On Tue 26-06-18 10:45:11, Thomas Gleixner wrote:
> > On Tue, 26 Jun 2018, Michal Hocko wrote:
> > > On Mon 25-06-18 21:15:03, Kani Toshimitsu wrote:
> > > > Lastly, for the code maintenance, I believe this memory allocation keeps
> > > > the code much simpler than it would otherwise need to manage a special
> > > > page list.
> > > 
> > > Yes, I can see simplicity as a reasonable argument for a quick fix,
> > > which this pile is supposed to be, AFAIU. So this might be good to go
> > > from that perspective, but I believe that this should be changed in
> > > the future at least.
> > 
> > So the conclusion is, that we ship this set of patches now to cure the
> > existing wreckage, right?
> 
> Joerg was suggesting some alternative but I got lost in the discussion
> to be honest so I might mis{interpret,remember}.

I've addressed his multiple comments in this series, but had to
disagree with the following suggestions:

1. Apply his revert patch first https://lkml.org/lkml/2018/4/26/636

Disagreed because this patch breaks the current ioremap() callers using
2MB mappings by failing pmd_free_pte_page().

2. Add a special page list, instead of memory alloc, in patch 3/3

Quoting his concerns about the memory allocation:
===
On Tue, May 29, 2018 at 04:10:24PM +0000, Kani, Toshi wrote:
> Can you explain why you think allocating a page here is a major
> problem?

Because a larger allocation is more likely to fail. And if you fail the
allocation, you also fail to free more pages, which _is_ a problem. So
better avoid any allocations in code paths that are about freeing
memory.
===

Which I had replied as:
====
It only allocates a single page.  If it fails to allocate a single page,
this pud mapping request simply falls back to pmd mappings, which is
only as bad as what your suggested plain revert always does for both pud
and pmd mappings.  Also, this func is called from ioremap() when a driver
initializes its mapping.  If the system does not have a single page to
allocate, the driver has a much bigger issue to deal with than its
request falling back to pmd mappings.  Please also note that this func
is not called from free-memory path.
====
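The fallback behaviour argued for in the quoted reply can be sketched as
follows.  This is illustrative userspace C only: the function names and
the enum are hypothetical stand-ins, not the kernel's actual code.

```c
enum map_size { MAP_1G, MAP_2M };

/* Hypothetical stand-in for pud_free_pmd_page(): returns 0 when its
 * single scratch-page allocation fails. */
static int try_free_pmd_table(int alloc_ok)
{
	if (!alloc_ok)
		return 0;
	/* ... save the pmd entries, clear the pud, flush, free ... */
	return 1;
}

/* If the scratch allocation fails, the pud (1GB) mapping request
 * simply degrades to pmd (2MB) mappings -- the same outcome the plain
 * revert would force unconditionally, but only in the rare
 * allocation-failure case here. */
static enum map_size ioremap_try_pud(int alloc_ok)
{
	if (try_free_pmd_table(alloc_ok))
		return MAP_1G;
	return MAP_2M;
}
```

In other words, the failure mode of the allocation is a graceful
fallback to smaller mappings, not a leak or an error returned to the
driver.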

I have further summarized my reasons to keep the memory allocation in my
reply to Michal.  Another reason, which I forgot to mention, is that
ioremap() does not have an init interface to initialize a special page
list in an architecture-specific way (and there is a good reason not to
have it).

> > Fine with that, but who will take care of reworking it proper? I'm
> > concerned that this will just go stale the moment the fixes hit the tree.
> 
> Yeah, this is why I usually try to push back hard, because "will be fixed
> later" is similar to saying "documentation will come later", etc...
> 
> A big fat TODO would be appropriate so it won't get forgotten at least.

I'd be happy to rework this if we find a bug in the code.  But I do not
think we need to rework it for the sake of possible future callers that
nobody knows about for sure.  So, I will add the NOTE below to clarify
the prerequisite.
 
NOTE:
 - Callers must allow a single page allocation.

Thanks,
-Toshi


end of thread, other threads:[~2018-06-26 15:25 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-16 23:32 [PATCH v3 0/3] fix free pmd/pte page handlings on x86 Toshi Kani
2018-05-16 23:32 ` [PATCH v3 1/3] x86/mm: disable ioremap free page handling on x86-PAE Toshi Kani
2018-05-16 23:32 ` [PATCH v3 2/3] ioremap: Update pgtable free interfaces with addr Toshi Kani
2018-05-17  6:47   ` Michal Hocko
2018-05-17 14:32     ` Kani, Toshi
2018-05-16 23:32 ` [PATCH v3 3/3] x86/mm: add TLB purge to free pmd/pte page interfaces Toshi Kani
2018-05-29 14:44   ` Joerg Roedel
2018-05-29 16:10     ` Kani, Toshi
2018-05-30  4:59       ` joro
2018-05-30 15:39         ` Kani, Toshi
2018-06-24 13:19 ` [PATCH v3 0/3] fix free pmd/pte page handlings on x86 Thomas Gleixner
2018-06-25 14:56   ` Kani, Toshi
2018-06-25 17:53     ` Michal Hocko
2018-06-25 21:15       ` Kani, Toshi
2018-06-26  6:35         ` Michal Hocko
2018-06-26  8:45           ` Thomas Gleixner
2018-06-26  8:54             ` Michal Hocko
2018-06-26 15:25               ` Kani, Toshi
