linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] fix vmalloc_to_page for huge vmap mappings
@ 2019-06-23  9:44 Nicholas Piggin
  2019-06-23  9:44 ` [PATCH 1/3] arm64: mm: Add p?d_large() definitions Nicholas Piggin
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Nicholas Piggin @ 2019-06-23  9:44 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linux-arm-kernel, linuxppc-dev, Andrew Morton,
	Anshuman Khandual, Christophe Leroy, Ard Biesheuvel,
	Mark Rutland

This is a change broken out from the huge vmap vmalloc series as
requested. There is a little bit of dependency juggling across
trees, but patches are pretty trivial. Ideally if Andrew accepts
this patch and queues it up for next, then the arch patches would
be merged through those trees then patch 3 gets sent by Andrew.

I've tested this with other powerpc and vmalloc patches, with code
that explicitly tests vmalloc_to_page on vmalloced memory and
results look fine.

Thanks,
Nick

Nicholas Piggin (3):
  arm64: mm: Add p?d_large() definitions
  powerpc/64s: Add p?d_large definitions
  mm/vmalloc: fix vmalloc_to_page for huge vmap mappings

 arch/arm64/include/asm/pgtable.h             |  2 ++
 arch/powerpc/include/asm/book3s/64/pgtable.h | 24 ++++++++-----
 include/asm-generic/4level-fixup.h           |  1 +
 include/asm-generic/5level-fixup.h           |  1 +
 mm/vmalloc.c                                 | 37 +++++++++++++-------
 5 files changed, 43 insertions(+), 22 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] arm64: mm: Add p?d_large() definitions
  2019-06-23  9:44 [PATCH 0/3] fix vmalloc_to_page for huge vmap mappings Nicholas Piggin
@ 2019-06-23  9:44 ` Nicholas Piggin
  2019-07-01  9:27   ` Will Deacon
  2019-06-23  9:44 ` [PATCH 2/3] powerpc/64s: Add p?d_large definitions Nicholas Piggin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2019-06-23  9:44 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linux-arm-kernel, linuxppc-dev, Andrew Morton,
	Anshuman Khandual, Christophe Leroy, Ard Biesheuvel,
	Mark Rutland, Catalin Marinas, Will Deacon, Steven Price

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information will be provided by the
p?d_large() functions/macros.

For arm64, we already have p?d_sect() macros which we can reuse for
p?d_large().

pud_sect() is defined as a dummy function when CONFIG_PGTABLE_LEVELS < 3
or CONFIG_ARM64_64K_PAGES is defined. However when the kernel is
configured this way then architecturally it isn't allowed to have a
large page that this level, and any code using these page walking macros
is implicitly relying on the page size/number of levels being the same as
the kernel. So it is safe to reuse this for p?d_large() as it is an
architectural restriction.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
This patch is taken from arm64 but is required if this series is not
build together with arm64 tree.

 arch/arm64/include/asm/pgtable.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index fca26759081a..0e973201bc16 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -417,6 +417,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 				 PMD_TYPE_TABLE)
 #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
 				 PMD_TYPE_SECT)
+#define pmd_large(pmd)		pmd_sect(pmd)
 
 #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
 #define pud_sect(pud)		(0)
@@ -499,6 +500,7 @@ static inline void pte_unmap(pte_t *pte) { }
 #define pud_none(pud)		(!pud_val(pud))
 #define pud_bad(pud)		(!(pud_val(pud) & PUD_TABLE_BIT))
 #define pud_present(pud)	pte_present(pud_pte(pud))
+#define pud_large(pud)		pud_sect(pud)
 #define pud_valid(pud)		pte_valid(pud_pte(pud))
 
 static inline void set_pud(pud_t *pudp, pud_t pud)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] powerpc/64s: Add p?d_large definitions
  2019-06-23  9:44 [PATCH 0/3] fix vmalloc_to_page for huge vmap mappings Nicholas Piggin
  2019-06-23  9:44 ` [PATCH 1/3] arm64: mm: Add p?d_large() definitions Nicholas Piggin
@ 2019-06-23  9:44 ` Nicholas Piggin
  2019-06-23  9:44 ` [PATCH 3/3] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings Nicholas Piggin
  2019-06-24  5:52 ` [PATCH 0/3] " Anshuman Khandual
  3 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2019-06-23  9:44 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linux-arm-kernel, linuxppc-dev, Andrew Morton,
	Anshuman Khandual, Christophe Leroy, Ard Biesheuvel,
	Mark Rutland

The subsequent patch to fix vmalloc_to_page with huge vmap requires
HUGE_VMAP archs to provide p?d_large definitions for the non-pgd page
table levels they support.

Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
Upstream powerpc code does not enable HUGE_VMAP, but the powerpc next
tree has patches, so this patch is required to fix dependency between
this series and powerpc tree in linux-next.

 arch/powerpc/include/asm/book3s/64/pgtable.h | 24 ++++++++++++--------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index ccf00a8b98c6..c19c8396a1bd 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -915,6 +915,11 @@ static inline int pud_present(pud_t pud)
 	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PRESENT));
 }
 
+static inline int pud_large(pud_t pud)
+{
+	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
+}
+
 extern struct page *pud_page(pud_t pud);
 extern struct page *pmd_page(pmd_t pmd);
 static inline pte_t pud_pte(pud_t pud)
@@ -958,6 +963,11 @@ static inline int pgd_present(pgd_t pgd)
 	return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
 }
 
+static inline int pgd_large(pgd_t pgd)
+{
+	return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PTE));
+}
+
 static inline pte_t pgd_pte(pgd_t pgd)
 {
 	return __pte_raw(pgd_raw(pgd));
@@ -1083,6 +1093,11 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_mk_savedwrite(pmd)	pte_pmd(pte_mk_savedwrite(pmd_pte(pmd)))
 #define pmd_clear_savedwrite(pmd)	pte_pmd(pte_clear_savedwrite(pmd_pte(pmd)))
 
+static inline int pmd_large(pmd_t pmd)
+{
+	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
+}
+
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 #define pmd_soft_dirty(pmd)    pte_soft_dirty(pmd_pte(pmd))
 #define pmd_mksoft_dirty(pmd)  pte_pmd(pte_mksoft_dirty(pmd_pte(pmd)))
@@ -1151,15 +1166,6 @@ pmd_hugepage_update(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp,
 	return hash__pmd_hugepage_update(mm, addr, pmdp, clr, set);
 }
 
-/*
- * returns true for pmd migration entries, THP, devmap, hugetlb
- * But compile time dependent on THP config
- */
-static inline int pmd_large(pmd_t pmd)
-{
-	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
-}
-
 static inline pmd_t pmd_mknotpresent(pmd_t pmd)
 {
 	return __pmd(pmd_val(pmd) & ~_PAGE_PRESENT);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings
  2019-06-23  9:44 [PATCH 0/3] fix vmalloc_to_page for huge vmap mappings Nicholas Piggin
  2019-06-23  9:44 ` [PATCH 1/3] arm64: mm: Add p?d_large() definitions Nicholas Piggin
  2019-06-23  9:44 ` [PATCH 2/3] powerpc/64s: Add p?d_large definitions Nicholas Piggin
@ 2019-06-23  9:44 ` Nicholas Piggin
  2019-06-24  6:52   ` Anshuman Khandual
  2019-06-24  5:52 ` [PATCH 0/3] " Anshuman Khandual
  3 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2019-06-23  9:44 UTC (permalink / raw)
  To: linux-mm
  Cc: Nicholas Piggin, linux-arm-kernel, linuxppc-dev, Andrew Morton,
	Anshuman Khandual, Christophe Leroy, Ard Biesheuvel,
	Mark Rutland

vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
Whether or not a vmap is huge depends on the architecture details,
alignments, boot options, etc., which the caller can not be expected
to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.

This change teaches vmalloc_to_page about larger pages, and returns
the struct page that corresponds to the offset within the large page.
This makes the API agnostic to mapping implementation details.

[*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
    fail gracefully on unexpected huge vmap mappings")

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/asm-generic/4level-fixup.h |  1 +
 include/asm-generic/5level-fixup.h |  1 +
 mm/vmalloc.c                       | 37 +++++++++++++++++++-----------
 3 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/include/asm-generic/4level-fixup.h b/include/asm-generic/4level-fixup.h
index e3667c9a33a5..3cc65a4dd093 100644
--- a/include/asm-generic/4level-fixup.h
+++ b/include/asm-generic/4level-fixup.h
@@ -20,6 +20,7 @@
 #define pud_none(pud)			0
 #define pud_bad(pud)			0
 #define pud_present(pud)		1
+#define pud_large(pud)			0
 #define pud_ERROR(pud)			do { } while (0)
 #define pud_clear(pud)			pgd_clear(pud)
 #define pud_val(pud)			pgd_val(pud)
diff --git a/include/asm-generic/5level-fixup.h b/include/asm-generic/5level-fixup.h
index bb6cb347018c..c4377db09a4f 100644
--- a/include/asm-generic/5level-fixup.h
+++ b/include/asm-generic/5level-fixup.h
@@ -22,6 +22,7 @@
 #define p4d_none(p4d)			0
 #define p4d_bad(p4d)			0
 #define p4d_present(p4d)		1
+#define p4d_large(p4d)			0
 #define p4d_ERROR(p4d)			do { } while (0)
 #define p4d_clear(p4d)			pgd_clear(p4d)
 #define p4d_val(p4d)			pgd_val(p4d)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4c9e150e5ad3..4be98f700862 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -36,6 +36,7 @@
 #include <linux/rbtree_augmented.h>
 
 #include <linux/uaccess.h>
+#include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 #include <asm/shmparam.h>
 
@@ -284,26 +285,36 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 
 	if (pgd_none(*pgd))
 		return NULL;
+
 	p4d = p4d_offset(pgd, addr);
 	if (p4d_none(*p4d))
 		return NULL;
-	pud = pud_offset(p4d, addr);
+	if (WARN_ON_ONCE(p4d_bad(*p4d)))
+		return NULL;
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+	if (p4d_large(*p4d))
+		return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
+#endif
 
-	/*
-	 * Don't dereference bad PUD or PMD (below) entries. This will also
-	 * identify huge mappings, which we may encounter on architectures
-	 * that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
-	 * identified as vmalloc addresses by is_vmalloc_addr(), but are
-	 * not [unambiguously] associated with a struct page, so there is
-	 * no correct value to return for them.
-	 */
-	WARN_ON_ONCE(pud_bad(*pud));
-	if (pud_none(*pud) || pud_bad(*pud))
+	pud = pud_offset(p4d, addr);
+	if (pud_none(*pud))
+		return NULL;
+	if (WARN_ON_ONCE(pud_bad(*pud)))
 		return NULL;
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+	if (pud_large(*pud))
+		return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+#endif
+
 	pmd = pmd_offset(pud, addr);
-	WARN_ON_ONCE(pmd_bad(*pmd));
-	if (pmd_none(*pmd) || pmd_bad(*pmd))
+	if (pmd_none(*pmd))
+		return NULL;
+	if (WARN_ON_ONCE(pmd_bad(*pmd)))
 		return NULL;
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+	if (pmd_large(*pmd))
+		return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+#endif
 
 	ptep = pte_offset_map(pmd, addr);
 	pte = *ptep;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] fix vmalloc_to_page for huge vmap mappings
  2019-06-23  9:44 [PATCH 0/3] fix vmalloc_to_page for huge vmap mappings Nicholas Piggin
                   ` (2 preceding siblings ...)
  2019-06-23  9:44 ` [PATCH 3/3] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings Nicholas Piggin
@ 2019-06-24  5:52 ` Anshuman Khandual
  3 siblings, 0 replies; 12+ messages in thread
From: Anshuman Khandual @ 2019-06-24  5:52 UTC (permalink / raw)
  To: Nicholas Piggin, linux-mm
  Cc: linux-arm-kernel, linuxppc-dev, Andrew Morton, Christophe Leroy,
	Ard Biesheuvel, Mark Rutland

Hello Nicholas,

On 06/23/2019 03:14 PM, Nicholas Piggin wrote:
> This is a change broken out from the huge vmap vmalloc series as
> requested. There is a little bit of dependency juggling across

Thanks for splitting up the previous series and sending this one out.


> trees, but patches are pretty trivial. Ideally if Andrew accepts
> this patch and queues it up for next, then the arch patches would
> be merged through those trees then patch 3 gets sent by Andrew.

Fair enough.

> 
> I've tested this with other powerpc and vmalloc patches, with code
> that explicitly tests vmalloc_to_page on vmalloced memory and
> results look fine.

- Anshuman



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings
  2019-06-23  9:44 ` [PATCH 3/3] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings Nicholas Piggin
@ 2019-06-24  6:52   ` Anshuman Khandual
  2019-06-25  0:20     ` Nicholas Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Anshuman Khandual @ 2019-06-24  6:52 UTC (permalink / raw)
  To: Nicholas Piggin, linux-mm
  Cc: linux-arm-kernel, linuxppc-dev, Andrew Morton, Christophe Leroy,
	Ard Biesheuvel, Mark Rutland



On 06/23/2019 03:14 PM, Nicholas Piggin wrote:
> vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
> Whether or not a vmap is huge depends on the architecture details,
> alignments, boot options, etc., which the caller can not be expected
> to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.
> 
> This change teaches vmalloc_to_page about larger pages, and returns
> the struct page that corresponds to the offset within the large page.
> This makes the API agnostic to mapping implementation details.
> 
> [*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
>     fail gracefully on unexpected huge vmap mappings")
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  include/asm-generic/4level-fixup.h |  1 +
>  include/asm-generic/5level-fixup.h |  1 +
>  mm/vmalloc.c                       | 37 +++++++++++++++++++-----------
>  3 files changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/include/asm-generic/4level-fixup.h b/include/asm-generic/4level-fixup.h
> index e3667c9a33a5..3cc65a4dd093 100644
> --- a/include/asm-generic/4level-fixup.h
> +++ b/include/asm-generic/4level-fixup.h
> @@ -20,6 +20,7 @@
>  #define pud_none(pud)			0
>  #define pud_bad(pud)			0
>  #define pud_present(pud)		1
> +#define pud_large(pud)			0
>  #define pud_ERROR(pud)			do { } while (0)
>  #define pud_clear(pud)			pgd_clear(pud)
>  #define pud_val(pud)			pgd_val(pud)
> diff --git a/include/asm-generic/5level-fixup.h b/include/asm-generic/5level-fixup.h
> index bb6cb347018c..c4377db09a4f 100644
> --- a/include/asm-generic/5level-fixup.h
> +++ b/include/asm-generic/5level-fixup.h
> @@ -22,6 +22,7 @@
>  #define p4d_none(p4d)			0
>  #define p4d_bad(p4d)			0
>  #define p4d_present(p4d)		1
> +#define p4d_large(p4d)			0
>  #define p4d_ERROR(p4d)			do { } while (0)
>  #define p4d_clear(p4d)			pgd_clear(p4d)
>  #define p4d_val(p4d)			pgd_val(p4d)
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 4c9e150e5ad3..4be98f700862 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -36,6 +36,7 @@
>  #include <linux/rbtree_augmented.h>
>  
>  #include <linux/uaccess.h>
> +#include <asm/pgtable.h>
>  #include <asm/tlbflush.h>
>  #include <asm/shmparam.h>
>  
> @@ -284,26 +285,36 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>  
>  	if (pgd_none(*pgd))
>  		return NULL;
> +
>  	p4d = p4d_offset(pgd, addr);
>  	if (p4d_none(*p4d))
>  		return NULL;
> -	pud = pud_offset(p4d, addr);
> +	if (WARN_ON_ONCE(p4d_bad(*p4d)))
> +		return NULL;

The warning here is a required addition but it needs to be moved after p4d_large()
check. Please see the next comment below.

> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +	if (p4d_large(*p4d))
> +		return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
> +#endif
>  
> -	/*
> -	 * Don't dereference bad PUD or PMD (below) entries. This will also
> -	 * identify huge mappings, which we may encounter on architectures
> -	 * that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
> -	 * identified as vmalloc addresses by is_vmalloc_addr(), but are
> -	 * not [unambiguously] associated with a struct page, so there is
> -	 * no correct value to return for them.
> -	 */
> -	WARN_ON_ONCE(pud_bad(*pud));
> -	if (pud_none(*pud) || pud_bad(*pud))
> +	pud = pud_offset(p4d, addr);
> +	if (pud_none(*pud))
> +		return NULL;
> +	if (WARN_ON_ONCE(pud_bad(*pud)))
>  		return NULL;
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +	if (pud_large(*pud))
> +		return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> +#endif
> +

pud_bad() on arm64 returns true when the PUD does not point to a next page
table page implying the fact that it might be a large/huge entry. I am not
sure if the semantics holds good for other architectures too. But on arm64
if pud_large() is true, then pud_bad() will be true as well. So pud_bad()
check must happen after pud_large() check. So the sequence here should be

1. pud_none()	--> Nothing is in here, return NULL
2. pud_large()	--> Return offset page address from the huge page mapping
3. pud_bad()	--> Return NULL as there is no more page table level left

Checking pud_bad() first can return NULL for a valid huge mapping.

>  	pmd = pmd_offset(pud, addr);
> -	WARN_ON_ONCE(pmd_bad(*pmd));
> -	if (pmd_none(*pmd) || pmd_bad(*pmd))
> +	if (pmd_none(*pmd))
> +		return NULL;
> +	if (WARN_ON_ONCE(pmd_bad(*pmd)))
>  		return NULL;
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +	if (pmd_large(*pmd))
> +		return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> +#endif

Ditto.

I see that your previous proposal had this right which got changed in this
manner after my comments. Sorry about it.

It was recently when I learned (correctly) that expected semantics of pxx_bad()
is that - It does not point to the next page table page.  Hence I wonder why is
this not renamed as pxx_table() instead to make it absolutely clear.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings
  2019-06-24  6:52   ` Anshuman Khandual
@ 2019-06-25  0:20     ` Nicholas Piggin
  0 siblings, 0 replies; 12+ messages in thread
From: Nicholas Piggin @ 2019-06-25  0:20 UTC (permalink / raw)
  To: Anshuman Khandual, linux-mm
  Cc: Andrew Morton, Ard Biesheuvel, Christophe Leroy,
	linux-arm-kernel, linuxppc-dev, Mark Rutland

Anshuman Khandual's on June 24, 2019 4:52 pm:
> 
> 
> On 06/23/2019 03:14 PM, Nicholas Piggin wrote:
>> vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
>> Whether or not a vmap is huge depends on the architecture details,
>> alignments, boot options, etc., which the caller can not be expected
>> to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.
>> 
>> This change teaches vmalloc_to_page about larger pages, and returns
>> the struct page that corresponds to the offset within the large page.
>> This makes the API agnostic to mapping implementation details.
>> 
>> [*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
>>     fail gracefully on unexpected huge vmap mappings")
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  include/asm-generic/4level-fixup.h |  1 +
>>  include/asm-generic/5level-fixup.h |  1 +
>>  mm/vmalloc.c                       | 37 +++++++++++++++++++-----------
>>  3 files changed, 26 insertions(+), 13 deletions(-)
>> 
>> diff --git a/include/asm-generic/4level-fixup.h b/include/asm-generic/4level-fixup.h
>> index e3667c9a33a5..3cc65a4dd093 100644
>> --- a/include/asm-generic/4level-fixup.h
>> +++ b/include/asm-generic/4level-fixup.h
>> @@ -20,6 +20,7 @@
>>  #define pud_none(pud)			0
>>  #define pud_bad(pud)			0
>>  #define pud_present(pud)		1
>> +#define pud_large(pud)			0
>>  #define pud_ERROR(pud)			do { } while (0)
>>  #define pud_clear(pud)			pgd_clear(pud)
>>  #define pud_val(pud)			pgd_val(pud)
>> diff --git a/include/asm-generic/5level-fixup.h b/include/asm-generic/5level-fixup.h
>> index bb6cb347018c..c4377db09a4f 100644
>> --- a/include/asm-generic/5level-fixup.h
>> +++ b/include/asm-generic/5level-fixup.h
>> @@ -22,6 +22,7 @@
>>  #define p4d_none(p4d)			0
>>  #define p4d_bad(p4d)			0
>>  #define p4d_present(p4d)		1
>> +#define p4d_large(p4d)			0
>>  #define p4d_ERROR(p4d)			do { } while (0)
>>  #define p4d_clear(p4d)			pgd_clear(p4d)
>>  #define p4d_val(p4d)			pgd_val(p4d)
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 4c9e150e5ad3..4be98f700862 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -36,6 +36,7 @@
>>  #include <linux/rbtree_augmented.h>
>>  
>>  #include <linux/uaccess.h>
>> +#include <asm/pgtable.h>
>>  #include <asm/tlbflush.h>
>>  #include <asm/shmparam.h>
>>  
>> @@ -284,26 +285,36 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>  
>>  	if (pgd_none(*pgd))
>>  		return NULL;
>> +
>>  	p4d = p4d_offset(pgd, addr);
>>  	if (p4d_none(*p4d))
>>  		return NULL;
>> -	pud = pud_offset(p4d, addr);
>> +	if (WARN_ON_ONCE(p4d_bad(*p4d)))
>> +		return NULL;
> 
> The warning here is a required addition but it needs to be moved after p4d_large()
> check. Please see the next comment below.
> 
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> +	if (p4d_large(*p4d))
>> +		return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
>> +#endif
>>  
>> -	/*
>> -	 * Don't dereference bad PUD or PMD (below) entries. This will also
>> -	 * identify huge mappings, which we may encounter on architectures
>> -	 * that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
>> -	 * identified as vmalloc addresses by is_vmalloc_addr(), but are
>> -	 * not [unambiguously] associated with a struct page, so there is
>> -	 * no correct value to return for them.
>> -	 */
>> -	WARN_ON_ONCE(pud_bad(*pud));
>> -	if (pud_none(*pud) || pud_bad(*pud))
>> +	pud = pud_offset(p4d, addr);
>> +	if (pud_none(*pud))
>> +		return NULL;
>> +	if (WARN_ON_ONCE(pud_bad(*pud)))
>>  		return NULL;
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> +	if (pud_large(*pud))
>> +		return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
>> +#endif
>> +
> 
> pud_bad() on arm64 returns true when the PUD does not point to a next page
> table page implying the fact that it might be a large/huge entry. I am not
> sure if the semantics holds good for other architectures too. But on arm64
> if pud_large() is true, then pud_bad() will be true as well. So pud_bad()
> check must happen after pud_large() check. So the sequence here should be
> 
> 1. pud_none()	--> Nothing is in here, return NULL
> 2. pud_large()	--> Return offset page address from the huge page mapping
> 3. pud_bad()	--> Return NULL as there is no more page table level left
> 
> Checking pud_bad() first can return NULL for a valid huge mapping.
> 
>>  	pmd = pmd_offset(pud, addr);
>> -	WARN_ON_ONCE(pmd_bad(*pmd));
>> -	if (pmd_none(*pmd) || pmd_bad(*pmd))
>> +	if (pmd_none(*pmd))
>> +		return NULL;
>> +	if (WARN_ON_ONCE(pmd_bad(*pmd)))
>>  		return NULL;
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> +	if (pmd_large(*pmd))
>> +		return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
>> +#endif
> 
> Ditto.
> 
> I see that your previous proposal had this right which got changed in this
> manner after my comments. Sorry about it.
> 
> It was recently when I learned (correctly) that expected semantics of pxx_bad()
> is that - It does not point to the next page table page.  Hence I wonder why is
> this not renamed as pxx_table() instead to make it absolutely clear.
> 

Okay, I'll change it and resend. It worked okay on powerpc but it
looks like the usual precedent is testing for large before bad so we
will have to go with that.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] arm64: mm: Add p?d_large() definitions
  2019-06-23  9:44 ` [PATCH 1/3] arm64: mm: Add p?d_large() definitions Nicholas Piggin
@ 2019-07-01  9:27   ` Will Deacon
  2019-07-01 10:03     ` Steven Price
  0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2019-07-01  9:27 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-mm, Christophe Leroy, Mark Rutland, Anshuman Khandual,
	Catalin Marinas, Ard Biesheuvel, Will Deacon, Steven Price,
	Andrew Morton, linuxppc-dev, linux-arm-kernel

Hi Nick,

On Sun, Jun 23, 2019 at 07:44:44PM +1000, Nicholas Piggin wrote:
> walk_page_range() is going to be allowed to walk page tables other than
> those of user space. For this it needs to know when it has reached a
> 'leaf' entry in the page tables. This information will be provided by the
> p?d_large() functions/macros.

I can't remember whether or not I asked this before, but why not call
this macro p?d_leaf() if that's what it's identifying? "Large" and "huge"
are usually synonymous, so I find this naming needlessly confusing based
on this patch in isolation.

Will


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH 1/3] arm64: mm: Add p?d_large() definitions
  2019-07-01  9:27   ` Will Deacon
@ 2019-07-01 10:03     ` Steven Price
  2019-07-01 10:15       ` Will Deacon
  0 siblings, 1 reply; 12+ messages in thread
From: Steven Price @ 2019-07-01 10:03 UTC (permalink / raw)
  To: Will Deacon, Nicholas Piggin
  Cc: linux-mm, Christophe Leroy, Mark Rutland, Anshuman Khandual,
	Catalin Marinas, Ard Biesheuvel, Will Deacon, Andrew Morton,
	linuxppc-dev, linux-arm-kernel

On 01/07/2019 10:27, Will Deacon wrote:
> Hi Nick,
> 
> On Sun, Jun 23, 2019 at 07:44:44PM +1000, Nicholas Piggin wrote:
>> walk_page_range() is going to be allowed to walk page tables other than
>> those of user space. For this it needs to know when it has reached a
>> 'leaf' entry in the page tables. This information will be provided by the
>> p?d_large() functions/macros.
> 
> I can't remember whether or not I asked this before, but why not call
> this macro p?d_leaf() if that's what it's identifying? "Large" and "huge"
> are usually synonymous, so I find this naming needlessly confusing based
> on this patch in isolation.

Hi Will,

You replied to my posting of this patch before[1], to which you said:

> I've have thought p?d_leaf() might match better with your description
> above, but I'm not going to quibble on naming.

Have you changed your mind about quibbling? ;)

Steve

[1]
https://lore.kernel.org/lkml/20190611153650.GB4324@fuggles.cambridge.arm.com/


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH 1/3] arm64: mm: Add p?d_large() definitions
  2019-07-01 10:03     ` Steven Price
@ 2019-07-01 10:15       ` Will Deacon
  2019-07-02  3:07         ` Nicholas Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2019-07-01 10:15 UTC (permalink / raw)
  To: Steven Price
  Cc: Nicholas Piggin, linux-mm, Christophe Leroy, Mark Rutland,
	Anshuman Khandual, Catalin Marinas, Ard Biesheuvel, Will Deacon,
	Andrew Morton, linuxppc-dev, linux-arm-kernel

On Mon, Jul 01, 2019 at 11:03:51AM +0100, Steven Price wrote:
> On 01/07/2019 10:27, Will Deacon wrote:
> > On Sun, Jun 23, 2019 at 07:44:44PM +1000, Nicholas Piggin wrote:
> >> walk_page_range() is going to be allowed to walk page tables other than
> >> those of user space. For this it needs to know when it has reached a
> >> 'leaf' entry in the page tables. This information will be provided by the
> >> p?d_large() functions/macros.
> > 
> > I can't remember whether or not I asked this before, but why not call
> > this macro p?d_leaf() if that's what it's identifying? "Large" and "huge"
> > are usually synonymous, so I find this naming needlessly confusing based
> > on this patch in isolation.
> 
> You replied to my posting of this patch before[1], to which you said:
> 
> > I've have thought p?d_leaf() might match better with your description
> > above, but I'm not going to quibble on naming.

That explains the sense of deja vu.

> Have you changed your mind about quibbling? ;)

Ha, I suppose I have! If it's not loads of effort to use p?d_leaf() instead
of p?d_large, then I'd certainly prefer that.

Will


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH 1/3] arm64: mm: Add p?d_large() definitions
  2019-07-01 10:15       ` Will Deacon
@ 2019-07-02  3:07         ` Nicholas Piggin
  2019-07-02 10:46           ` Will Deacon
  0 siblings, 1 reply; 12+ messages in thread
From: Nicholas Piggin @ 2019-07-02  3:07 UTC (permalink / raw)
  To: Steven Price, Will Deacon
  Cc: Andrew Morton, Anshuman Khandual, Ard Biesheuvel,
	Catalin Marinas, Christophe Leroy, linux-arm-kernel, linux-mm,
	linuxppc-dev, Mark Rutland, Will Deacon

Will Deacon's on July 1, 2019 8:15 pm:
> On Mon, Jul 01, 2019 at 11:03:51AM +0100, Steven Price wrote:
>> On 01/07/2019 10:27, Will Deacon wrote:
>> > On Sun, Jun 23, 2019 at 07:44:44PM +1000, Nicholas Piggin wrote:
>> >> walk_page_range() is going to be allowed to walk page tables other than
>> >> those of user space. For this it needs to know when it has reached a
>> >> 'leaf' entry in the page tables. This information will be provided by the
>> >> p?d_large() functions/macros.
>> > 
>> > I can't remember whether or not I asked this before, but why not call
>> > this macro p?d_leaf() if that's what it's identifying? "Large" and "huge"
>> > are usually synonymous, so I find this naming needlessly confusing based
>> > on this patch in isolation.

Those page table macro names are horrible. Large, huge, leaf, wtf?
They could do with a sensible renaming. But this series just follows
naming that's alreay there on x86.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH 1/3] arm64: mm: Add p?d_large() definitions
  2019-07-02  3:07         ` Nicholas Piggin
@ 2019-07-02 10:46           ` Will Deacon
  0 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2019-07-02 10:46 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Steven Price, Andrew Morton, Anshuman Khandual, Ard Biesheuvel,
	Catalin Marinas, Christophe Leroy, linux-arm-kernel, linux-mm,
	linuxppc-dev, Mark Rutland, Will Deacon

On Tue, Jul 02, 2019 at 01:07:11PM +1000, Nicholas Piggin wrote:
> Will Deacon's on July 1, 2019 8:15 pm:
> > On Mon, Jul 01, 2019 at 11:03:51AM +0100, Steven Price wrote:
> >> On 01/07/2019 10:27, Will Deacon wrote:
> >> > On Sun, Jun 23, 2019 at 07:44:44PM +1000, Nicholas Piggin wrote:
> >> >> walk_page_range() is going to be allowed to walk page tables other than
> >> >> those of user space. For this it needs to know when it has reached a
> >> >> 'leaf' entry in the page tables. This information will be provided by the
> >> >> p?d_large() functions/macros.
> >> > 
> >> > I can't remember whether or not I asked this before, but why not call
> >> > this macro p?d_leaf() if that's what it's identifying? "Large" and "huge"
> >> > are usually synonymous, so I find this naming needlessly confusing based
> >> > on this patch in isolation.
> 
> Those page table macro names are horrible. Large, huge, leaf, wtf?
> They could do with a sensible renaming. But this series just follows
> naming that's alreay there on x86.

I realise that, and I wasn't meaning to have a go at you. Just wanted to
make my opinion clear by having a moan :)

Will


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-07-02 10:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-23  9:44 [PATCH 0/3] fix vmalloc_to_page for huge vmap mappings Nicholas Piggin
2019-06-23  9:44 ` [PATCH 1/3] arm64: mm: Add p?d_large() definitions Nicholas Piggin
2019-07-01  9:27   ` Will Deacon
2019-07-01 10:03     ` Steven Price
2019-07-01 10:15       ` Will Deacon
2019-07-02  3:07         ` Nicholas Piggin
2019-07-02 10:46           ` Will Deacon
2019-06-23  9:44 ` [PATCH 2/3] powerpc/64s: Add p?d_large definitions Nicholas Piggin
2019-06-23  9:44 ` [PATCH 3/3] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings Nicholas Piggin
2019-06-24  6:52   ` Anshuman Khandual
2019-06-25  0:20     ` Nicholas Piggin
2019-06-24  5:52 ` [PATCH 0/3] " Anshuman Khandual

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).