* [PATCH 0/5] Clean up huge vmap and ioremap code
@ 2018-09-12 10:26 Will Deacon
  2018-09-12 10:26 ` [PATCH 1/5] ioremap: Rework pXd_free_pYd_page() API Will Deacon
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Will Deacon @ 2018-09-12 10:26 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: cpandya, toshi.kani, tglx, mhocko, akpm, Will Deacon

Hi all,

The recent addition of break-before-make on the huge vmap path in
b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page table")
introduced a pair of arch functions for freeing a child level of page
table before putting down a huge mapping at the parent level.

Whilst this works well, the semantics of the pXd_free_pYd_table() function
are slightly confusing, and this led to an over-eager VM_WARN_ON in the
arm64 code that we fixed in -rc3 [1]. Linus suggested that the interface
could be tidied up so that the pXd_present() checks are moved into the
caller, so I've implemented that and generally cleaned up the ioremap code
so that it's easier to follow. I also extended the break-before-make code
to cover the huge p4d case, although this remains unused by any architecture.

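As an illustration, the reworked convention boils down to the following
shape at each level (a sketch of patch 1 below, showing the pmd case; the
pud and p4d cases are analogous):

	/*
	 * The caller now performs the pXd_present() check, so the arch
	 * hook only ever sees present (table) entries.
	 */
	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
		return 0;	/* fall back to smaller mappings */

	return pmd_set_huge(pmd, phys_addr, prot);
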
Feedback welcome.

Cheers,

Will

[1] https://lkml.org/lkml/2018/9/7/898

--->8

Will Deacon (5):
  ioremap: Rework pXd_free_pYd_page() API
  arm64: mmu: Drop pXd_present() checks from pXd_free_pYd_table()
  x86: pgtable: Drop pXd_none() checks from pXd_free_pYd_table()
  lib/ioremap: Ensure phys_addr actually corresponds to a physical
    address
  lib/ioremap: Ensure break-before-make is used for huge p4d mappings

 arch/arm64/mm/mmu.c           |  13 +++---
 arch/x86/mm/pgtable.c         |  14 +++---
 include/asm-generic/pgtable.h |   5 ++
 lib/ioremap.c                 | 103 +++++++++++++++++++++++++++++-------------
 4 files changed, 91 insertions(+), 44 deletions(-)

-- 
2.1.4



* [PATCH 1/5] ioremap: Rework pXd_free_pYd_page() API
  2018-09-12 10:26 [PATCH 0/5] Clean up huge vmap and ioremap code Will Deacon
@ 2018-09-12 10:26 ` Will Deacon
  2018-09-14 20:36   ` Kani, Toshi
  2018-09-12 10:26 ` [PATCH 2/5] arm64: mmu: Drop pXd_present() checks from pXd_free_pYd_table() Will Deacon
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2018-09-12 10:26 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: cpandya, toshi.kani, tglx, mhocko, akpm, Will Deacon

The recently merged API for ensuring break-before-make on page-table
entries when installing huge mappings in the vmalloc/ioremap region is
fairly counter-intuitive, resulting in the arch freeing functions
(e.g. pmd_free_pte_page()) being called even on entries that aren't
present. This resulted in a minor bug in the arm64 implementation, giving
rise to spurious VM_WARN messages.

This patch moves the pXd_present() checks out into the core code,
refactoring the callsites at the same time so that we avoid the complex
conjunctions when determining whether or not we can put down a huge
mapping.

Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 lib/ioremap.c | 56 ++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 42 insertions(+), 14 deletions(-)

diff --git a/lib/ioremap.c b/lib/ioremap.c
index 517f5853ffed..6c72764af19c 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -76,6 +76,25 @@ static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 
+static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
+				unsigned long end, phys_addr_t phys_addr,
+				pgprot_t prot)
+{
+	if (!ioremap_pmd_enabled())
+		return 0;
+
+	if ((end - addr) != PMD_SIZE)
+		return 0;
+
+	if (!IS_ALIGNED(phys_addr, PMD_SIZE))
+		return 0;
+
+	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
+		return 0;
+
+	return pmd_set_huge(pmd, phys_addr, prot);
+}
+
 static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
@@ -89,13 +108,8 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 	do {
 		next = pmd_addr_end(addr, end);
 
-		if (ioremap_pmd_enabled() &&
-		    ((next - addr) == PMD_SIZE) &&
-		    IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
-		    pmd_free_pte_page(pmd, addr)) {
-			if (pmd_set_huge(pmd, phys_addr + addr, prot))
-				continue;
-		}
+		if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr + addr, prot))
+			continue;
 
 		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
 			return -ENOMEM;
@@ -103,6 +117,25 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 	return 0;
 }
 
+static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
+				unsigned long end, phys_addr_t phys_addr,
+				pgprot_t prot)
+{
+	if (!ioremap_pud_enabled())
+		return 0;
+
+	if ((end - addr) != PUD_SIZE)
+		return 0;
+
+	if (!IS_ALIGNED(phys_addr, PUD_SIZE))
+		return 0;
+
+	if (pud_present(*pud) && !pud_free_pmd_page(pud, addr))
+		return 0;
+
+	return pud_set_huge(pud, phys_addr, prot);
+}
+
 static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
 		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
@@ -116,13 +149,8 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
 	do {
 		next = pud_addr_end(addr, end);
 
-		if (ioremap_pud_enabled() &&
-		    ((next - addr) == PUD_SIZE) &&
-		    IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
-		    pud_free_pmd_page(pud, addr)) {
-			if (pud_set_huge(pud, phys_addr + addr, prot))
-				continue;
-		}
+		if (ioremap_try_huge_pud(pud, addr, next, phys_addr + addr, prot))
+			continue;
 
 		if (ioremap_pmd_range(pud, addr, next, phys_addr + addr, prot))
 			return -ENOMEM;
-- 
2.1.4



* [PATCH 2/5] arm64: mmu: Drop pXd_present() checks from pXd_free_pYd_table()
  2018-09-12 10:26 [PATCH 0/5] Clean up huge vmap and ioremap code Will Deacon
  2018-09-12 10:26 ` [PATCH 1/5] ioremap: Rework pXd_free_pYd_page() API Will Deacon
@ 2018-09-12 10:26 ` Will Deacon
  2018-09-12 10:26 ` [PATCH 3/5] x86: pgtable: Drop pXd_none() " Will Deacon
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Will Deacon @ 2018-09-12 10:26 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: cpandya, toshi.kani, tglx, mhocko, akpm, Will Deacon

Now that the core code checks this for us, we don't need to do it in the
backend.

Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/mm/mmu.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8080c9f489c3..0dcb3354d6dd 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -985,10 +985,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
 
 	pmd = READ_ONCE(*pmdp);
 
-	if (!pmd_present(pmd))
-		return 1;
 	if (!pmd_table(pmd)) {
-		VM_WARN_ON(!pmd_table(pmd));
+		VM_WARN_ON(1);
 		return 1;
 	}
 
@@ -1008,10 +1006,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
 
 	pud = READ_ONCE(*pudp);
 
-	if (!pud_present(pud))
-		return 1;
 	if (!pud_table(pud)) {
-		VM_WARN_ON(!pud_table(pud));
+		VM_WARN_ON(1);
 		return 1;
 	}
 
-- 
2.1.4



* [PATCH 3/5] x86: pgtable: Drop pXd_none() checks from pXd_free_pYd_table()
  2018-09-12 10:26 [PATCH 0/5] Clean up huge vmap and ioremap code Will Deacon
  2018-09-12 10:26 ` [PATCH 1/5] ioremap: Rework pXd_free_pYd_page() API Will Deacon
  2018-09-12 10:26 ` [PATCH 2/5] arm64: mmu: Drop pXd_present() checks from pXd_free_pYd_table() Will Deacon
@ 2018-09-12 10:26 ` Will Deacon
  2018-09-14 20:37   ` Kani, Toshi
  2018-09-12 10:26 ` [PATCH 4/5] lib/ioremap: Ensure phys_addr actually corresponds to a physical address Will Deacon
  2018-09-12 10:26 ` [PATCH 5/5] lib/ioremap: Ensure break-before-make is used for huge p4d mappings Will Deacon
  4 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2018-09-12 10:26 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: cpandya, toshi.kani, tglx, mhocko, akpm, Will Deacon

Now that the core code checks this for us, we don't need to do it in the
backend.

Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/x86/mm/pgtable.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ae394552fb94..b4919c44a194 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -796,9 +796,6 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 	pte_t *pte;
 	int i;
 
-	if (pud_none(*pud))
-		return 1;
-
 	pmd = (pmd_t *)pud_page_vaddr(*pud);
 	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
 	if (!pmd_sv)
@@ -840,9 +837,6 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
 	pte_t *pte;
 
-	if (pmd_none(*pmd))
-		return 1;
-
 	pte = (pte_t *)pmd_page_vaddr(*pmd);
 	pmd_clear(pmd);
 
-- 
2.1.4



* [PATCH 4/5] lib/ioremap: Ensure phys_addr actually corresponds to a physical address
  2018-09-12 10:26 [PATCH 0/5] Clean up huge vmap and ioremap code Will Deacon
                   ` (2 preceding siblings ...)
  2018-09-12 10:26 ` [PATCH 3/5] x86: pgtable: Drop pXd_none() " Will Deacon
@ 2018-09-12 10:26 ` Will Deacon
  2018-09-12 15:09   ` Sean Christopherson
  2018-09-12 10:26 ` [PATCH 5/5] lib/ioremap: Ensure break-before-make is used for huge p4d mappings Will Deacon
  4 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2018-09-12 10:26 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: cpandya, toshi.kani, tglx, mhocko, akpm, Will Deacon

The current ioremap() code uses a phys_addr variable at each level of
page table, which is confusingly offset by subtracting the base virtual
address being mapped so that adding the current virtual address back on
when iterating through the page table entries gives back the corresponding
physical address.

This is fairly confusing and results in all users of phys_addr having to
add the current virtual address back on. Instead, this patch just updates
phys_addr when iterating over the page table entries, ensuring that it's
always up-to-date and doesn't require explicit offsetting.

Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 lib/ioremap.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/lib/ioremap.c b/lib/ioremap.c
index 6c72764af19c..fc834a59c90c 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -101,19 +101,18 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 	pmd_t *pmd;
 	unsigned long next;
 
-	phys_addr -= addr;
 	pmd = pmd_alloc(&init_mm, pud, addr);
 	if (!pmd)
 		return -ENOMEM;
 	do {
 		next = pmd_addr_end(addr, end);
 
-		if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr + addr, prot))
+		if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot))
 			continue;
 
-		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
+		if (ioremap_pte_range(pmd, addr, next, phys_addr, prot))
 			return -ENOMEM;
-	} while (pmd++, addr = next, addr != end);
+	} while (pmd++, addr = next, phys_addr += PMD_SIZE, addr != end);
 	return 0;
 }
 
@@ -142,19 +141,18 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
 	pud_t *pud;
 	unsigned long next;
 
-	phys_addr -= addr;
 	pud = pud_alloc(&init_mm, p4d, addr);
 	if (!pud)
 		return -ENOMEM;
 	do {
 		next = pud_addr_end(addr, end);
 
-		if (ioremap_try_huge_pud(pud, addr, next, phys_addr + addr, prot))
+		if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot))
 			continue;
 
-		if (ioremap_pmd_range(pud, addr, next, phys_addr + addr, prot))
+		if (ioremap_pmd_range(pud, addr, next, phys_addr, prot))
 			return -ENOMEM;
-	} while (pud++, addr = next, addr != end);
+	} while (pud++, addr = next, phys_addr += PUD_SIZE, addr != end);
 	return 0;
 }
 
@@ -164,7 +162,6 @@ static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
 	p4d_t *p4d;
 	unsigned long next;
 
-	phys_addr -= addr;
 	p4d = p4d_alloc(&init_mm, pgd, addr);
 	if (!p4d)
 		return -ENOMEM;
@@ -173,14 +170,14 @@ static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
 
 		if (ioremap_p4d_enabled() &&
 		    ((next - addr) == P4D_SIZE) &&
-		    IS_ALIGNED(phys_addr + addr, P4D_SIZE)) {
-			if (p4d_set_huge(p4d, phys_addr + addr, prot))
+		    IS_ALIGNED(phys_addr, P4D_SIZE)) {
+			if (p4d_set_huge(p4d, phys_addr, prot))
 				continue;
 		}
 
-		if (ioremap_pud_range(p4d, addr, next, phys_addr + addr, prot))
+		if (ioremap_pud_range(p4d, addr, next, phys_addr, prot))
 			return -ENOMEM;
-	} while (p4d++, addr = next, addr != end);
+	} while (p4d++, addr = next, phys_addr += P4D_SIZE, addr != end);
 	return 0;
 }
 
@@ -196,14 +193,13 @@ int ioremap_page_range(unsigned long addr,
 	BUG_ON(addr >= end);
 
 	start = addr;
-	phys_addr -= addr;
 	pgd = pgd_offset_k(addr);
 	do {
 		next = pgd_addr_end(addr, end);
-		err = ioremap_p4d_range(pgd, addr, next, phys_addr+addr, prot);
+		err = ioremap_p4d_range(pgd, addr, next, phys_addr, prot);
 		if (err)
 			break;
-	} while (pgd++, addr = next, addr != end);
+	} while (pgd++, addr = next, phys_addr += PGDIR_SIZE, addr != end);
 
 	flush_cache_vmap(start, end);
 
-- 
2.1.4



* [PATCH 5/5] lib/ioremap: Ensure break-before-make is used for huge p4d mappings
  2018-09-12 10:26 [PATCH 0/5] Clean up huge vmap and ioremap code Will Deacon
                   ` (3 preceding siblings ...)
  2018-09-12 10:26 ` [PATCH 4/5] lib/ioremap: Ensure phys_addr actually corresponds to a physical address Will Deacon
@ 2018-09-12 10:26 ` Will Deacon
  2018-09-17 18:55   ` Kani, Toshi
  4 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2018-09-12 10:26 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: cpandya, toshi.kani, tglx, mhocko, akpm, Will Deacon

Whilst no architectures actually enable support for huge p4d mappings
in the vmap area, the code that is implemented should be using
break-before-make, as we do for pud and pmd huge entries.

Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/mm/mmu.c           |  5 +++++
 arch/x86/mm/pgtable.c         |  8 ++++++++
 include/asm-generic/pgtable.h |  5 +++++
 lib/ioremap.c                 | 27 +++++++++++++++++++++------
 4 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0dcb3354d6dd..58776b90dd2a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1024,3 +1024,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
 	pmd_free(NULL, table);
 	return 1;
 }
+
+int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
+{
+	return 0;	/* Don't attempt a block mapping */
+}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index b4919c44a194..c6094997d060 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -779,6 +779,14 @@ int pmd_clear_huge(pmd_t *pmd)
 	return 0;
 }
 
+/*
+ * Until we support 512GB pages, skip them in the vmap area.
+ */
+int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
+{
+	return 0;
+}
+
 #ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 88ebc6102c7c..4297a2519ebf 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,6 +1019,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
+int p4d_free_pud_page(p4d_t *p4d, unsigned long addr);
 int pud_free_pmd_page(pud_t *pud, unsigned long addr);
 int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
 #else	/* !CONFIG_HAVE_ARCH_HUGE_VMAP */
@@ -1046,6 +1047,10 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
 	return 0;
 }
+static inline int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
+{
+	return 0;
+}
 static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
 	return 0;
diff --git a/lib/ioremap.c b/lib/ioremap.c
index fc834a59c90c..49d2e23dad2e 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -156,6 +156,25 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
 	return 0;
 }
 
+static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr,
+				unsigned long end, phys_addr_t phys_addr,
+				pgprot_t prot)
+{
+	if (!ioremap_p4d_enabled())
+		return 0;
+
+	if ((end - addr) != P4D_SIZE)
+		return 0;
+
+	if (!IS_ALIGNED(phys_addr, P4D_SIZE))
+		return 0;
+
+	if (p4d_present(*p4d) && !p4d_free_pud_page(p4d, addr))
+		return 0;
+
+	return p4d_set_huge(p4d, phys_addr, prot);
+}
+
 static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
 		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
@@ -168,12 +187,8 @@ static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
 	do {
 		next = p4d_addr_end(addr, end);
 
-		if (ioremap_p4d_enabled() &&
-		    ((next - addr) == P4D_SIZE) &&
-		    IS_ALIGNED(phys_addr, P4D_SIZE)) {
-			if (p4d_set_huge(p4d, phys_addr, prot))
-				continue;
-		}
+		if (ioremap_try_huge_p4d(p4d, addr, next, phys_addr, prot))
+			continue;
 
 		if (ioremap_pud_range(p4d, addr, next, phys_addr, prot))
 			return -ENOMEM;
-- 
2.1.4



* Re: [PATCH 4/5] lib/ioremap: Ensure phys_addr actually corresponds to a physical address
  2018-09-12 10:26 ` [PATCH 4/5] lib/ioremap: Ensure phys_addr actually corresponds to a physical address Will Deacon
@ 2018-09-12 15:09   ` Sean Christopherson
  2018-09-12 16:39     ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2018-09-12 15:09 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-mm, cpandya, toshi.kani, tglx, mhocko, akpm

On Wed, Sep 12, 2018 at 11:26:13AM +0100, Will Deacon wrote:
> The current ioremap() code uses a phys_addr variable at each level of
> page table, which is confusingly offset by subtracting the base virtual
> address being mapped so that adding the current virtual address back on
> when iterating through the page table entries gives back the corresponding
> physical address.
> 
> This is fairly confusing and results in all users of phys_addr having to
> add the current virtual address back on. Instead, this patch just updates
> phys_addr when iterating over the page table entries, ensuring that it's
> always up-to-date and doesn't require explicit offsetting.
> 
> Cc: Chintan Pandya <cpandya@codeaurora.org>
> Cc: Toshi Kani <toshi.kani@hpe.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  lib/ioremap.c | 28 ++++++++++++----------------
>  1 file changed, 12 insertions(+), 16 deletions(-)
> 
> diff --git a/lib/ioremap.c b/lib/ioremap.c
> index 6c72764af19c..fc834a59c90c 100644
> --- a/lib/ioremap.c
> +++ b/lib/ioremap.c
> @@ -101,19 +101,18 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
>  	pmd_t *pmd;
>  	unsigned long next;
>  
> -	phys_addr -= addr;
>  	pmd = pmd_alloc(&init_mm, pud, addr);
>  	if (!pmd)
>  		return -ENOMEM;
>  	do {
>  		next = pmd_addr_end(addr, end);
>  
> -		if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr + addr, prot))
> +		if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot))
>  			continue;
>  
> -		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
> +		if (ioremap_pte_range(pmd, addr, next, phys_addr, prot))
>  			return -ENOMEM;
> -	} while (pmd++, addr = next, addr != end);
> +	} while (pmd++, addr = next, phys_addr += PMD_SIZE, addr != end);

I think bumping phys_addr by PXX_SIZE is wrong if phys_addr and addr
start unaligned with respect to PXX_SIZE.  The addresses must be
PAGE_ALIGNED, which lets ioremap_pte_range() do a simple calculation,
but that doesn't hold true for the upper levels, i.e. phys_addr needs
to be adjusted using an algorithm similar to pxx_addr_end().

Using a 2mb page as an example (lower 32 bits only):

pxx_size  = 0x00200000
pxx_mask  = 0xffe00000
addr      = 0x1000
end       = 0x00400000
phys_addr = 0x1000

Loop 1:
   addr = 0x1000
   phys = 0x1000

Loop 2:
   addr = 0x200000
   phys = 0x201000

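The arithmetic above can be checked with a tiny standalone program -- a
sketch using the example's numbers, with a local stand-in for the kernel's
pmd_addr_end() macro:

	#include <stdio.h>

	#define PMD_SIZE	0x00200000UL
	#define PMD_MASK	(~(PMD_SIZE - 1))

	/* Local stand-in mirroring pmd_addr_end(): clamp to the boundary. */
	static unsigned long pmd_addr_end(unsigned long addr, unsigned long end)
	{
		unsigned long boundary = (addr + PMD_SIZE) & PMD_MASK;

		return boundary - 1 < end - 1 ? boundary : end;
	}

	int main(void)
	{
		unsigned long addr = 0x1000, end = 0x00400000;
		unsigned long phys = 0x1000;	/* starts equal to addr */
		unsigned long next;

		do {
			next = pmd_addr_end(addr, end);
			/* phys should track addr, since they start equal */
			printf("addr = %#lx, phys = %#lx (expected %#lx)\n",
			       addr, phys, addr);
		} while (addr = next, phys += PMD_SIZE, addr != end);

		return 0;
	}

On the second iteration this prints addr = 0x200000 but phys = 0x201000,
i.e. the physical address has drifted by 0x1000.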

>  	return 0;
>  }
>  
> @@ -142,19 +141,18 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
>  	pud_t *pud;
>  	unsigned long next;
>  
> -	phys_addr -= addr;
>  	pud = pud_alloc(&init_mm, p4d, addr);
>  	if (!pud)
>  		return -ENOMEM;
>  	do {
>  		next = pud_addr_end(addr, end);
>  
> -		if (ioremap_try_huge_pud(pud, addr, next, phys_addr + addr, prot))
> +		if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot))
>  			continue;
>  
> -		if (ioremap_pmd_range(pud, addr, next, phys_addr + addr, prot))
> +		if (ioremap_pmd_range(pud, addr, next, phys_addr, prot))
>  			return -ENOMEM;
> -	} while (pud++, addr = next, addr != end);
> +	} while (pud++, addr = next, phys_addr += PUD_SIZE, addr != end);
>  	return 0;
>  }
>  
> @@ -164,7 +162,6 @@ static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
>  	p4d_t *p4d;
>  	unsigned long next;
>  
> -	phys_addr -= addr;
>  	p4d = p4d_alloc(&init_mm, pgd, addr);
>  	if (!p4d)
>  		return -ENOMEM;
> @@ -173,14 +170,14 @@ static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
>  
>  		if (ioremap_p4d_enabled() &&
>  		    ((next - addr) == P4D_SIZE) &&
> -		    IS_ALIGNED(phys_addr + addr, P4D_SIZE)) {
> -			if (p4d_set_huge(p4d, phys_addr + addr, prot))
> +		    IS_ALIGNED(phys_addr, P4D_SIZE)) {
> +			if (p4d_set_huge(p4d, phys_addr, prot))
>  				continue;
>  		}
>  
> -		if (ioremap_pud_range(p4d, addr, next, phys_addr + addr, prot))
> +		if (ioremap_pud_range(p4d, addr, next, phys_addr, prot))
>  			return -ENOMEM;
> -	} while (p4d++, addr = next, addr != end);
> +	} while (p4d++, addr = next, phys_addr += P4D_SIZE, addr != end);
>  	return 0;
>  }
>  
> @@ -196,14 +193,13 @@ int ioremap_page_range(unsigned long addr,
>  	BUG_ON(addr >= end);
>  
>  	start = addr;
> -	phys_addr -= addr;
>  	pgd = pgd_offset_k(addr);
>  	do {
>  		next = pgd_addr_end(addr, end);
> -		err = ioremap_p4d_range(pgd, addr, next, phys_addr+addr, prot);
> +		err = ioremap_p4d_range(pgd, addr, next, phys_addr, prot);
>  		if (err)
>  			break;
> -	} while (pgd++, addr = next, addr != end);
> +	} while (pgd++, addr = next, phys_addr += PGDIR_SIZE, addr != end);
>  
>  	flush_cache_vmap(start, end);
>  
> -- 
> 2.1.4
> 


* Re: [PATCH 4/5] lib/ioremap: Ensure phys_addr actually corresponds to a physical address
  2018-09-12 15:09   ` Sean Christopherson
@ 2018-09-12 16:39     ` Will Deacon
  2018-09-12 17:14       ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2018-09-12 16:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, linux-mm, cpandya, toshi.kani, tglx, mhocko, akpm

Hi Sean,

Thanks for looking at the patch.

On Wed, Sep 12, 2018 at 08:09:39AM -0700, Sean Christopherson wrote:
> On Wed, Sep 12, 2018 at 11:26:13AM +0100, Will Deacon wrote:
> > The current ioremap() code uses a phys_addr variable at each level of
> > page table, which is confusingly offset by subtracting the base virtual
> > address being mapped so that adding the current virtual address back on
> > when iterating through the page table entries gives back the corresponding
> > physical address.
> > 
> > This is fairly confusing and results in all users of phys_addr having to
> > add the current virtual address back on. Instead, this patch just updates
> > phys_addr when iterating over the page table entries, ensuring that it's
> > always up-to-date and doesn't require explicit offsetting.
> > 
> > Cc: Chintan Pandya <cpandya@codeaurora.org>
> > Cc: Toshi Kani <toshi.kani@hpe.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
> >  lib/ioremap.c | 28 ++++++++++++----------------
> >  1 file changed, 12 insertions(+), 16 deletions(-)
> > 
> > diff --git a/lib/ioremap.c b/lib/ioremap.c
> > index 6c72764af19c..fc834a59c90c 100644
> > --- a/lib/ioremap.c
> > +++ b/lib/ioremap.c
> > @@ -101,19 +101,18 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
> >  	pmd_t *pmd;
> >  	unsigned long next;
> >  
> > -	phys_addr -= addr;
> >  	pmd = pmd_alloc(&init_mm, pud, addr);
> >  	if (!pmd)
> >  		return -ENOMEM;
> >  	do {
> >  		next = pmd_addr_end(addr, end);
> >  
> > -		if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr + addr, prot))
> > +		if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot))
> >  			continue;
> >  
> > -		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
> > +		if (ioremap_pte_range(pmd, addr, next, phys_addr, prot))
> >  			return -ENOMEM;
> > -	} while (pmd++, addr = next, addr != end);
> > +	} while (pmd++, addr = next, phys_addr += PMD_SIZE, addr != end);
> 
> I think bumping phys_addr by PXX_SIZE is wrong if phys_addr and addr
> start unaligned with respect to PXX_SIZE.  The addresses must be
> PAGE_ALIGNED, which lets ioremap_pte_range() do a simple calculation,
> but that doesn't hold true for the upper levels, i.e. phys_addr needs
> to be adjusted using an algorithm similar to pxx_addr_end().
> 
> Using a 2mb page as an example (lower 32 bits only):
> 
> pxx_size  = 0x00200000
> pxx_mask  = 0xffe00000
> addr      = 0x1000
> end       = 0x00400000
> phys_addr = 0x1000
> 
> Loop 1:
>    addr = 0x1000
>    phys = 0x1000
> 
> Loop 2:
>    addr = 0x200000
>    phys = 0x201000

Yes, I think you're completely right; however, I also don't think this
can happen with the current code (and I've failed to trigger it in my
testing). The virtual addresses allocated for VM_IOREMAP allocations
are aligned to the order of the allocation, which means that the virtual
address at the start of the mapping is aligned such that when we hit the
end of a pXd, we know we've mapped the previous PXD_SIZE bytes.

Having said that, this is clearly a change from the current code and I
haven't audited architectures other than arm64 (where IOREMAP_MAX_ORDER
corresponds to the maximum size of our huge mappings), so it would be
much better not to introduce this funny behaviour in a patch that aims
to reduce confusion in the first place!

Fixing this using the pxx_addr_end() macros is a bit strange, since we
don't have a physical end variable (nor do we need one), so perhaps
something like changing the while condition to be:

	do {
		...
	} while (pmd++, phys_addr += (next - addr), addr = next, addr != end);

would do the trick. What do you reckon?

Will
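
(Folding that into the loop from this patch gives roughly the following --
a sketch of the proposal, not the final committed code:)

	do {
		next = pmd_addr_end(addr, end);

		if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot))
			continue;

		if (ioremap_pte_range(pmd, addr, next, phys_addr, prot))
			return -ENOMEM;
	} while (pmd++, phys_addr += (next - addr), addr = next, addr != end);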


* Re: [PATCH 4/5] lib/ioremap: Ensure phys_addr actually corresponds to a physical address
  2018-09-12 16:39     ` Will Deacon
@ 2018-09-12 17:14       ` Sean Christopherson
  0 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2018-09-12 17:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-mm, cpandya, toshi.kani, tglx, mhocko, akpm

On Wed, Sep 12, 2018 at 05:39:14PM +0100, Will Deacon wrote:
> Hi Sean,
> 
> Thanks for looking at the patch.
> 
> On Wed, Sep 12, 2018 at 08:09:39AM -0700, Sean Christopherson wrote:
> > On Wed, Sep 12, 2018 at 11:26:13AM +0100, Will Deacon wrote:
> > > The current ioremap() code uses a phys_addr variable at each level of
> > > page table, which is confusingly offset by subtracting the base virtual
> > > address being mapped so that adding the current virtual address back on
> > > when iterating through the page table entries gives back the corresponding
> > > physical address.
> > > 
> > > This is fairly confusing and results in all users of phys_addr having to
> > > add the current virtual address back on. Instead, this patch just updates
> > > phys_addr when iterating over the page table entries, ensuring that it's
> > > always up-to-date and doesn't require explicit offsetting.
> > > 
> > > Cc: Chintan Pandya <cpandya@codeaurora.org>
> > > Cc: Toshi Kani <toshi.kani@hpe.com>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > > ---
> > >  lib/ioremap.c | 28 ++++++++++++----------------
> > >  1 file changed, 12 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/lib/ioremap.c b/lib/ioremap.c
> > > index 6c72764af19c..fc834a59c90c 100644
> > > --- a/lib/ioremap.c
> > > +++ b/lib/ioremap.c
> > > @@ -101,19 +101,18 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
> > >  	pmd_t *pmd;
> > >  	unsigned long next;
> > >  
> > > -	phys_addr -= addr;
> > >  	pmd = pmd_alloc(&init_mm, pud, addr);
> > >  	if (!pmd)
> > >  		return -ENOMEM;
> > >  	do {
> > >  		next = pmd_addr_end(addr, end);
> > >  
> > > -		if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr + addr, prot))
> > > +		if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot))
> > >  			continue;
> > >  
> > > -		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
> > > +		if (ioremap_pte_range(pmd, addr, next, phys_addr, prot))
> > >  			return -ENOMEM;
> > > -	} while (pmd++, addr = next, addr != end);
> > > +	} while (pmd++, addr = next, phys_addr += PMD_SIZE, addr != end);
> > 
> > I think bumping phys_addr by PXX_SIZE is wrong if phys_addr and addr
> > start unaligned with respect to PXX_SIZE.  The addresses must be
> > PAGE_ALIGNED, which lets ioremap_pte_range() do a simple calculation,
> > but that doesn't hold true for the upper levels, i.e. phys_addr needs
> > to be adjusted using an algorithm similar to pxx_addr_end().
> > 
> > Using a 2mb page as an example (lower 32 bits only):
> > 
> > pxx_size  = 0x00200000
> > pxx_mask  = 0xffe00000
> > addr      = 0x1000
> > end       = 0x00400000
> > phys_addr = 0x1000
> > 
> > Loop 1:
> >    addr = 0x1000
> >    phys = 0x1000
> > 
> > Loop 2:
> >    addr = 0x200000
> >    phys = 0x201000
> 
> Yes, I think you're completely right; however, I also don't think this
> can happen with the current code (and I've failed to trigger it in my
> testing). The virtual addresses allocated for VM_IOREMAP allocations
> are aligned to the order of the allocation, which means that the virtual
> address at the start of the mapping is aligned such that when we hit the
> end of a pXd, we know we've mapped the previous PXD_SIZE bytes.
> 
> Having said that, this is clearly a change from the current code and I
> haven't audited architectures other than arm64 (where IOREMAP_MAX_ORDER
> corresponds to the maximum size of our huge mappings), so it would be
> much better not to introduce this funny behaviour in a patch that aims
> to reduce confusion in the first place!
> 
> Fixing this using the pxx_addr_end() macros is a bit strange, since we
> don't have a physical end variable (nor do we need one), so perhaps
> something like changing the while condition to be:
> 
> 	do {
> 		...
> 	} while (pmd++, phys_addr += (next - addr), addr = next, addr != end);
> 
> would do the trick. What do you reckon?

LGTM.  I like that there isn't a separate calculation for phys_addr's offset.


* Re: [PATCH 1/5] ioremap: Rework pXd_free_pYd_page() API
  2018-09-12 10:26 ` [PATCH 1/5] ioremap: Rework pXd_free_pYd_page() API Will Deacon
@ 2018-09-14 20:36   ` Kani, Toshi
  2018-09-14 21:10     ` Kani, Toshi
  0 siblings, 1 reply; 17+ messages in thread
From: Kani, Toshi @ 2018-09-14 20:36 UTC (permalink / raw)
  To: linux-kernel, linux-mm, will.deacon; +Cc: tglx, cpandya, Hocko, Michal, akpm

On Wed, 2018-09-12 at 11:26 +0100, Will Deacon wrote:
> The recently merged API for ensuring break-before-make on page-table
> entries when installing huge mappings in the vmalloc/ioremap region is
> fairly counter-intuitive, resulting in the arch freeing functions
> (e.g. pmd_free_pte_page()) being called even on entries that aren't
> present. This resulted in a minor bug in the arm64 implementation, giving
> rise to spurious VM_WARN messages.
> 
> This patch moves the pXd_present() checks out into the core code,
> refactoring the callsites at the same time so that we avoid the complex
> conjunctions when determining whether or not we can put down a huge
> mapping.
> 
> Cc: Chintan Pandya <cpandya@codeaurora.org>
> Cc: Toshi Kani <toshi.kani@hpe.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Yes, this looks nicer.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>

Thanks,
-Toshi


* Re: [PATCH 3/5] x86: pgtable: Drop pXd_none() checks from pXd_free_pYd_table()
  2018-09-12 10:26 ` [PATCH 3/5] x86: pgtable: Drop pXd_none() " Will Deacon
@ 2018-09-14 20:37   ` Kani, Toshi
  2018-09-17 11:33     ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Kani, Toshi @ 2018-09-14 20:37 UTC (permalink / raw)
  To: linux-kernel, linux-mm, will.deacon; +Cc: tglx, cpandya, Hocko, Michal, akpm

On Wed, 2018-09-12 at 11:26 +0100, Will Deacon wrote:
> Now that the core code checks this for us, we don't need to do it in the
> backend.
> 
> Cc: Chintan Pandya <cpandya@codeaurora.org>
> Cc: Toshi Kani <toshi.kani@hpe.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  arch/x86/mm/pgtable.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index ae394552fb94..b4919c44a194 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -796,9 +796,6 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
>  	pte_t *pte;
>  	int i;
>  
> -	if (pud_none(*pud))
> -		return 1;
> -

Do we need to remove this safeguard?  I feel like this is the same as
kfree() accepting NULL.

Thanks,
-Toshi


>  	pmd = (pmd_t *)pud_page_vaddr(*pud);
>  	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
>  	if (!pmd_sv)
> @@ -840,9 +837,6 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
>  {
>  	pte_t *pte;
>  
> -	if (pmd_none(*pmd))
> -		return 1;
> -
>  	pte = (pte_t *)pmd_page_vaddr(*pmd);
>  	pmd_clear(pmd);
>  


* Re: [PATCH 1/5] ioremap: Rework pXd_free_pYd_page() API
  2018-09-14 20:36   ` Kani, Toshi
@ 2018-09-14 21:10     ` Kani, Toshi
  2018-09-17 11:33       ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Kani, Toshi @ 2018-09-14 21:10 UTC (permalink / raw)
  To: linux-kernel, linux-mm, will.deacon; +Cc: tglx, cpandya, Hocko, Michal, akpm

On Fri, 2018-09-14 at 14:36 -0600, Toshi Kani wrote:
> On Wed, 2018-09-12 at 11:26 +0100, Will Deacon wrote:
> > The recently merged API for ensuring break-before-make on page-table
> > entries when installing huge mappings in the vmalloc/ioremap region is
> > fairly counter-intuitive, resulting in the arch freeing functions
> > (e.g. pmd_free_pte_page()) being called even on entries that aren't
> > present. This resulted in a minor bug in the arm64 implementation, giving
> > rise to spurious VM_WARN messages.
> > 
> > This patch moves the pXd_present() checks out into the core code,
> > refactoring the callsites at the same time so that we avoid the complex
> > conjunctions when determining whether or not we can put down a huge
> > mapping.
> > 
> > Cc: Chintan Pandya <cpandya@codeaurora.org>
> > Cc: Toshi Kani <toshi.kani@hpe.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> 
> Yes, this looks nicer.
> 
> Reviewed-by: Toshi Kani <toshi.kani@hpe.com>

Sorry, I take it back since I got a question...

> +static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
> +				unsigned long end, phys_addr_t phys_addr,
> +				pgprot_t prot)
> +{
> +	if (!ioremap_pmd_enabled())
> +		return 0;
> +
> +	if ((end - addr) != PMD_SIZE)
> +		return 0;
> +
> +	if (!IS_ALIGNED(phys_addr, PMD_SIZE))
> +		return 0;
> +
> +	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
> +		return 0;

Is pmd_present() a proper check here?  We probably do not have this case
for ioremap, but I wonder if one can drop the p-bit while it has a pte page
underneath.

Thanks,
-Toshi


> +
> +	return pmd_set_huge(pmd, phys_addr, prot);
> +}
> +




* Re: [PATCH 3/5] x86: pgtable: Drop pXd_none() checks from pXd_free_pYd_table()
  2018-09-14 20:37   ` Kani, Toshi
@ 2018-09-17 11:33     ` Will Deacon
  2018-09-17 18:43       ` Kani, Toshi
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2018-09-17 11:33 UTC (permalink / raw)
  To: Kani, Toshi; +Cc: linux-kernel, linux-mm, tglx, cpandya, Hocko, Michal, akpm

On Fri, Sep 14, 2018 at 08:37:48PM +0000, Kani, Toshi wrote:
> On Wed, 2018-09-12 at 11:26 +0100, Will Deacon wrote:
> > Now that the core code checks this for us, we don't need to do it in the
> > backend.
> > 
> > Cc: Chintan Pandya <cpandya@codeaurora.org>
> > Cc: Toshi Kani <toshi.kani@hpe.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
> >  arch/x86/mm/pgtable.c | 6 ------
> >  1 file changed, 6 deletions(-)
> > 
> > diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> > index ae394552fb94..b4919c44a194 100644
> > --- a/arch/x86/mm/pgtable.c
> > +++ b/arch/x86/mm/pgtable.c
> > @@ -796,9 +796,6 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
> >  	pte_t *pte;
> >  	int i;
> >  
> > -	if (pud_none(*pud))
> > -		return 1;
> > -
> 
> Do we need to remove this safeguard?  I feel like this is the same as
> kfree() accepting NULL.

I think two big differences with kfree() are (1) that this function has
exactly one caller in the tree and (2) it's implemented per-arch. Therefore
we're in a good position to give it some simple semantics and implement
those. Of course, if the x86 people would like to keep the redundant check,
that's up to them, but I think it makes the function more confusing and
tempts people into calling it for non-present entries.

Will
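
(The simplified semantics being argued for here, as a descriptive sketch
rather than kerneldoc:)

	/*
	 * pud_free_pmd_page(pud, addr) -- invoked by the core ioremap code
	 * only when pud_present(*pud): tear down the pmd table sitting
	 * under @pud so that a huge mapping can replace it.
	 *
	 * Return 1 if the caller may go on to call pud_set_huge(), or 0 to
	 * make it fall back to smaller mappings.
	 */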


* Re: [PATCH 1/5] ioremap: Rework pXd_free_pYd_page() API
  2018-09-14 21:10     ` Kani, Toshi
@ 2018-09-17 11:33       ` Will Deacon
  2018-09-17 18:38         ` Kani, Toshi
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2018-09-17 11:33 UTC (permalink / raw)
  To: Kani, Toshi; +Cc: linux-kernel, linux-mm, tglx, cpandya, Hocko, Michal, akpm

On Fri, Sep 14, 2018 at 09:10:49PM +0000, Kani, Toshi wrote:
> On Fri, 2018-09-14 at 14:36 -0600, Toshi Kani wrote:
> > On Wed, 2018-09-12 at 11:26 +0100, Will Deacon wrote:
> > > The recently merged API for ensuring break-before-make on page-table
> > > entries when installing huge mappings in the vmalloc/ioremap region is
> > > fairly counter-intuitive, resulting in the arch freeing functions
> > > (e.g. pmd_free_pte_page()) being called even on entries that aren't
> > > present. This resulted in a minor bug in the arm64 implementation, giving
> > > rise to spurious VM_WARN messages.
> > > 
> > > This patch moves the pXd_present() checks out into the core code,
> > > refactoring the callsites at the same time so that we avoid the complex
> > > conjunctions when determining whether or not we can put down a huge
> > > mapping.
> > > 
> > > Cc: Chintan Pandya <cpandya@codeaurora.org>
> > > Cc: Toshi Kani <toshi.kani@hpe.com>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> > > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > 
> > Yes, this looks nicer.
> > 
> > Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
> 
> Sorry, I take it back since I got a question...
> 
> > +static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
> > +				unsigned long end, phys_addr_t phys_addr,
> > +				pgprot_t prot)
> > +{
> > +	if (!ioremap_pmd_enabled())
> > +		return 0;
> > +
> > +	if ((end - addr) != PMD_SIZE)
> > +		return 0;
> > +
> > +	if (!IS_ALIGNED(phys_addr, PMD_SIZE))
> > +		return 0;
> > +
> > +	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
> > +		return 0;
> 
> Is pmd_present() a proper check here?  We probably do not have this case
> for ioremap, but I wonder if one can drop the p-bit while it has a pte page
> underneath.

For ioremap/vunmap the pXd_present() check is correct, yes. The vunmap()
code only ever clears leaf entries, leaving table entries intact. If it
did clear table entries, you'd be stuck here because you wouldn't have
the address of the table to free.

If somebody called pmd_mknotpresent() on a table entry, we may run into
problems, but it's only used for huge mappings afaict.

Will


* Re: [PATCH 1/5] ioremap: Rework pXd_free_pYd_page() API
  2018-09-17 11:33       ` Will Deacon
@ 2018-09-17 18:38         ` Kani, Toshi
  0 siblings, 0 replies; 17+ messages in thread
From: Kani, Toshi @ 2018-09-17 18:38 UTC (permalink / raw)
  To: will.deacon; +Cc: tglx, linux-kernel, linux-mm, cpandya, Hocko, Michal, akpm

On Mon, 2018-09-17 at 12:33 +0100, Will Deacon wrote:
> On Fri, Sep 14, 2018 at 09:10:49PM +0000, Kani, Toshi wrote:
> > On Fri, 2018-09-14 at 14:36 -0600, Toshi Kani wrote:
> > > On Wed, 2018-09-12 at 11:26 +0100, Will Deacon wrote:
> > > > The recently merged API for ensuring break-before-make on page-table
> > > > entries when installing huge mappings in the vmalloc/ioremap region is
> > > > fairly counter-intuitive, resulting in the arch freeing functions
> > > > (e.g. pmd_free_pte_page()) being called even on entries that aren't
> > > > present. This resulted in a minor bug in the arm64 implementation, giving
> > > > rise to spurious VM_WARN messages.
> > > > 
> > > > This patch moves the pXd_present() checks out into the core code,
> > > > refactoring the callsites at the same time so that we avoid the complex
> > > > conjunctions when determining whether or not we can put down a huge
> > > > mapping.
> > > > 
> > > > Cc: Chintan Pandya <cpandya@codeaurora.org>
> > > > Cc: Toshi Kani <toshi.kani@hpe.com>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Michal Hocko <mhocko@suse.com>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> > > > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > > 
> > > Yes, this looks nicer.
> > > 
> > > Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
> > 
> > Sorry, I take it back since I got a question...
> > 
> > > +static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
> > > +				unsigned long end, phys_addr_t phys_addr,
> > > +				pgprot_t prot)
> > > +{
> > > +	if (!ioremap_pmd_enabled())
> > > +		return 0;
> > > +
> > > +	if ((end - addr) != PMD_SIZE)
> > > +		return 0;
> > > +
> > > +	if (!IS_ALIGNED(phys_addr, PMD_SIZE))
> > > +		return 0;
> > > +
> > > +	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
> > > +		return 0;
> > 
> > Is pmd_present() a proper check here?  We probably do not have this case
> > for ioremap, but I wonder if one can drop the p-bit while it has a pte page
> > underneath.
> 
> For ioremap/vunmap the pXd_present() check is correct, yes. The vunmap()
> code only ever clears leaf entries, leaving table entries intact. 

Right. I was wondering whether such a case might happen in the future.

> If it
> did clear table entries, you'd be stuck here because you wouldn't have
> the address of the table to free.
> 
> If somebody called pmd_mknotpresent() on a table entry, we may run into
> problems, but it's only used for huge mappings afaict.

Treating a table entry as valid when the p-bit is off is risky as well. So, I
agree with the pXd_present() check.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>

Thanks,
-Toshi



* Re: [PATCH 3/5] x86: pgtable: Drop pXd_none() checks from pXd_free_pYd_table()
  2018-09-17 11:33     ` Will Deacon
@ 2018-09-17 18:43       ` Kani, Toshi
  0 siblings, 0 replies; 17+ messages in thread
From: Kani, Toshi @ 2018-09-17 18:43 UTC (permalink / raw)
  To: will.deacon; +Cc: tglx, linux-kernel, linux-mm, cpandya, Hocko, Michal, akpm

On Mon, 2018-09-17 at 12:33 +0100, Will Deacon wrote:
> On Fri, Sep 14, 2018 at 08:37:48PM +0000, Kani, Toshi wrote:
> > On Wed, 2018-09-12 at 11:26 +0100, Will Deacon wrote:
> > > Now that the core code checks this for us, we don't need to do it in the
> > > backend.
> > > 
> > > Cc: Chintan Pandya <cpandya@codeaurora.org>
> > > Cc: Toshi Kani <toshi.kani@hpe.com>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > > ---
> > >  arch/x86/mm/pgtable.c | 6 ------
> > >  1 file changed, 6 deletions(-)
> > > 
> > > diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> > > index ae394552fb94..b4919c44a194 100644
> > > --- a/arch/x86/mm/pgtable.c
> > > +++ b/arch/x86/mm/pgtable.c
> > > @@ -796,9 +796,6 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
> > >  	pte_t *pte;
> > >  	int i;
> > >  
> > > -	if (pud_none(*pud))
> > > -		return 1;
> > > -
> > 
> > Do we need to remove this safeguard?  I feel like this is the same as
> > kfree() accepting NULL.
> 
> I think two big differences with kfree() are (1) that this function has
> exactly one caller in the tree and (2) it's implemented per-arch. Therefore
> we're in a good position to give it some simple semantics and implement
> those. Of course, if the x86 people would like to keep the redundant check,
> that's up to them, but I think it makes the function more confusing and
> tempts people into calling it for non-present entries.

With the patch 1/5 change adding the pXd_present() check, I agree that we
can remove this pXd_none() check to avoid any confusion.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>

Thanks,
-Toshi



* Re: [PATCH 5/5] lib/ioremap: Ensure break-before-make is used for huge p4d mappings
  2018-09-12 10:26 ` [PATCH 5/5] lib/ioremap: Ensure break-before-make is used for huge p4d mappings Will Deacon
@ 2018-09-17 18:55   ` Kani, Toshi
  0 siblings, 0 replies; 17+ messages in thread
From: Kani, Toshi @ 2018-09-17 18:55 UTC (permalink / raw)
  To: linux-kernel, linux-mm, will.deacon; +Cc: tglx, cpandya, Hocko, Michal, akpm

On Wed, 2018-09-12 at 11:26 +0100, Will Deacon wrote:
> Whilst no architectures actually enable support for huge p4d mappings
> in the vmap area, the code that is implemented should be using
> break-before-make, as we do for pud and pmd huge entries.
> 
> Cc: Chintan Pandya <cpandya@codeaurora.org>
> Cc: Toshi Kani <toshi.kani@hpe.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Thanks. This keeps the p4d path consistent.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>

-Toshi


