* [PATCH v16 00/25] Generic page walk and ptdump
@ 2019-12-06 13:52 ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:52 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

Since this series is still in linux-next and causing problems, I'm
sending this out before -rc1.

This version adds two new patches over the previous series (v15):
 13: mm: pagewalk: Don't lock PTEs for walk_page_range_novma()
 14: mm: pagewalk: fix termination condition in walk_pte_range()

Patch 13 solves the conflict with ace88f1018b8 ("mm: pagewalk: Take the
pagetable lock in walk_pte_range()") by not taking the lock for the
_novma() version of the function.
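
(Roughly, the resulting walk_pte_range() has the following shape; a
sketch of the change rather than the exact committed code, where
walk_pte_range_inner() stands for the factored-out PTE loop:)

static int walk_pte_range(pmd_t *pmd, unsigned long addr,
			  unsigned long end, struct mm_walk *walk)
{
	pte_t *pte;
	spinlock_t *ptl;
	int err;

	if (walk->no_vma) {
		/* Walking kernel page tables: no VMA, so take no PTE lock. */
		pte = pte_offset_map(pmd, addr);
		err = walk_pte_range_inner(pte, addr, end, walk);
		pte_unmap(pte);
	} else {
		pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
		err = walk_pte_range_inner(pte, addr, end, walk);
		pte_unmap_unlock(pte, ptl);
	}

	return err;
}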

Patch 14 fixes an existing issue with walk_pte_range() whereby, if the
end address isn't aligned to PAGE_SIZE, the loop never terminates. This
starts to trigger on some 32-bit x86 configurations with the generic
ptdump support because there is a page in the last PMD, which means that
the end address is ~0UL.
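
(The underlying problem: the loop advanced addr by PAGE_SIZE and then
compared it against end with a strict equality test, which an unaligned
end such as ~0UL never matches. A sketch of the fixed loop shape,
assuming the ops->pte_entry callback convention used by this series:)

	for (;;) {
		err = ops->pte_entry(pte, addr, addr + PAGE_SIZE, walk);
		if (err)
			break;
		/* Test before stepping so an unaligned end isn't skipped over. */
		if (addr >= end - PAGE_SIZE)
			break;
		addr += PAGE_SIZE;
		pte++;
	}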

I've posted these patches separately as I think they do stand alone (and
shouldn't cause bisection problems) - but 13/14 could potentially be
squashed into 12.

Patch 12 ("mm: pagewalk: Allow walking without vma") has also been
updated from v15 to include the p*d_present() check that was posted[1]
after v15 and Andrew squashed into the commit.

Patch 21 ("mm: Add generic ptdump") also has the fix from Qian Cai
squashed in to fix the order of "static const".

[1] https://lore.kernel.org/lkml/16da6118-ac4d-a165-6202-0731a776ac72@arm.com/

Previous description for the series:

Many architectures currently have a debugfs file for dumping the kernel
page tables. Each architecture has to implement custom functions for
this because the details of walking the page tables used by the kernel
differ between architectures.

This series extends the capabilities of walk_page_range() so that it can
deal with the page tables of the kernel (which have no VMAs and can
contain larger huge pages than exist for user space). A generic PTDUMP
implementation is then implemented making use of the new functionality of
walk_page_range(), and finally arm64 and x86 are switched to using it,
removing the custom table walkers.

To enable a generic page table walker to walk the unusual mappings of
the kernel we need to implement a set of functions which let us know
when the walker has reached a leaf entry. After a suggestion from Will
Deacon I've chosen the name p?d_leaf() as this (hopefully) describes
the purpose (and is a new name so has no historic baggage). Some
architectures have p?d_large() macros, but that name is easily confused
with "large pages".

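(As a rough illustration, a hypothetical walker snippet rather than the
exact code in this series: at each level the walker checks for a leaf
before attempting to descend.)

	if (pmd_leaf(*pmd))
		/* Final mapping: there is no PTE table underneath to walk. */
		err = ops->pmd_entry(pmd, addr, next, walk);
	else
		err = walk_pte_range(pmd, addr, next, walk);
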
This series ends with a generic PTDUMP implementation for arm64 and x86.

Mostly this is a clean up and there should be very little functional
change. The exceptions are:

* arm64 PTDUMP debugfs now displays pages which aren't present (patch 22).

* arm64 gains the ability to efficiently process KASAN pages (which
  previously only x86 implemented). This means that the combination of
  KASAN and DEBUG_WX is now usable (the mechanism is sketched below).
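
(Sketch of the mechanism, close to what the generic ptdump in patch 21
does but with note_kasan_page_table() as an illustrative helper: all
KASAN shadow tables eventually point at the same early shadow page, so
a test callback can recognise the shared table and note the whole range
once instead of walking every entry.)

static int ptdump_test_p4d(unsigned long addr, unsigned long next,
			   p4d_t *p4d, struct mm_walk *walk)
{
#if CONFIG_PGTABLE_LEVELS > 4
	if (p4d == lm_alias(kasan_early_shadow_p4d))
		/* Shared KASAN zero shadow: record it as a single range. */
		return note_kasan_page_table(walk, addr);
#endif
	return 0;
}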

Also available as a git tree:
git://linux-arm.org/linux-sp.git walk_page_range/v16

Changes since v15:
https://lore.kernel.org/lkml/20191101140942.51554-1-steven.price@arm.com/
 * Rebased onto Linus' tree, which includes the conflicting commit:
   ace88f1018b8 ("mm: pagewalk: Take the pagetable lock in walk_pte_range()")
 * New patch fixing conflict with above patch
 * Squashed in fix for ordering of "static const"
 * Squashed in fix checking p*d_present()
 * New patch fixing termination condition for walk_pte_range()

Changes since v14:
https://lore.kernel.org/lkml/20191028135910.33253-1-steven.price@arm.com/
 * Split walk_page_range() into two functions: the existing
   walk_page_range() still requires VMAs (and treats areas without a
   VMA as a 'hole'), while the new walk_page_range_novma() ignores VMAs
   and will report the actual page table layout. This fixes the previous
   breakage of /proc/<pid>/pagemap
 * New patch at the end of the series which reduces the 'level' numbers
   by 1 to simplify the code slightly
 * Added tags

Changes since v13:
https://lore.kernel.org/lkml/20191024093716.49420-1-steven.price@arm.com/
 * Fixed typo in arc definition of pmd_leaf() spotted by the kbuild test
   robot
 * Added tags

Changes since v12:
https://lore.kernel.org/lkml/20191018101248.33727-1-steven.price@arm.com/
 * Correct code format in riscv pud_leaf()/pmd_leaf()
 * v12 may not have reached everyone because of mail server problems
   (which are now hopefully resolved!)

Changes since v11:
https://lore.kernel.org/lkml/20191007153822.16518-1-steven.price@arm.com/
 * Use "-1" as dummy depth parameter in patch 14.

Changes since v10:
https://lore.kernel.org/lkml/20190731154603.41797-1-steven.price@arm.com/
 * Rebased to v5.4-rc1 - mainly various updates to deal with the
   splitting out of ops from struct mm_walk.
 * Deal with PGD_LEVEL_MULT not always being constant on x86.

Changes since v9:
https://lore.kernel.org/lkml/20190722154210.42799-1-steven.price@arm.com/
 * Moved generic macros to the first patch in the series and explained
   the macro naming in the commit message.
 * mips: Moved macros to pgtable.h as they are now valid for both 32 and 64
   bit
 * x86: Dropped patch which changed the debugfs output for x86, instead
   we have...
 * New patch adding 'depth' parameter to pte_hole. This is used to
   provide the necessary information to output lines for 'holes' in the
   debugfs files
 * New patch changing arm64 debugfs output to include holes to match x86
 * Generic ptdump KASAN handling has been simplified and now works with
   CONFIG_DEBUG_VIRTUAL.

Changes since v8:
https://lore.kernel.org/lkml/20190403141627.11664-1-steven.price@arm.com/
 * Rename from p?d_large() to p?d_leaf()
 * Dropped patches migrating arm64/x86 custom walkers to
   walk_page_range() in favour of adding a generic PTDUMP implementation
   and migrating arm64/x86 to that instead.
 * Rebased to v5.3-rc1

Steven Price (25):
  mm: Add generic p?d_leaf() macros
  arc: mm: Add p?d_leaf() definitions
  arm: mm: Add p?d_leaf() definitions
  arm64: mm: Add p?d_leaf() definitions
  mips: mm: Add p?d_leaf() definitions
  powerpc: mm: Add p?d_leaf() definitions
  riscv: mm: Add p?d_leaf() definitions
  s390: mm: Add p?d_leaf() definitions
  sparc: mm: Add p?d_leaf() definitions
  x86: mm: Add p?d_leaf() definitions
  mm: pagewalk: Add p4d_entry() and pgd_entry()
  mm: pagewalk: Allow walking without vma
  mm: pagewalk: Don't lock PTEs for walk_page_range_novma()
  mm: pagewalk: fix termination condition in walk_pte_range()
  mm: pagewalk: Add test_p?d callbacks
  mm: pagewalk: Add 'depth' parameter to pte_hole
  x86: mm: Point to struct seq_file from struct pg_state
  x86: mm+efi: Convert ptdump_walk_pgd_level() to take a mm_struct
  x86: mm: Convert ptdump_walk_pgd_level_debugfs() to take an mm_struct
  x86: mm: Convert ptdump_walk_pgd_level_core() to take an mm_struct
  mm: Add generic ptdump
  x86: mm: Convert dump_pagetables to use walk_page_range
  arm64: mm: Convert mm/dump.c to use walk_page_range()
  arm64: mm: Display non-present entries in ptdump
  mm: ptdump: Reduce level numbers by 1 in note_page()

 arch/arc/include/asm/pgtable.h               |   1 +
 arch/arm/include/asm/pgtable-2level.h        |   1 +
 arch/arm/include/asm/pgtable-3level.h        |   1 +
 arch/arm64/Kconfig                           |   1 +
 arch/arm64/Kconfig.debug                     |  19 +-
 arch/arm64/include/asm/pgtable.h             |   2 +
 arch/arm64/include/asm/ptdump.h              |   8 +-
 arch/arm64/mm/Makefile                       |   4 +-
 arch/arm64/mm/dump.c                         | 148 +++-----
 arch/arm64/mm/mmu.c                          |   4 +-
 arch/arm64/mm/ptdump_debugfs.c               |   2 +-
 arch/mips/include/asm/pgtable.h              |   5 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |  30 +-
 arch/riscv/include/asm/pgtable-64.h          |   7 +
 arch/riscv/include/asm/pgtable.h             |   7 +
 arch/s390/include/asm/pgtable.h              |   2 +
 arch/sparc/include/asm/pgtable_64.h          |   2 +
 arch/x86/Kconfig                             |   1 +
 arch/x86/Kconfig.debug                       |  20 +-
 arch/x86/include/asm/pgtable.h               |  10 +-
 arch/x86/mm/Makefile                         |   4 +-
 arch/x86/mm/debug_pagetables.c               |   8 +-
 arch/x86/mm/dump_pagetables.c                | 343 +++++--------------
 arch/x86/platform/efi/efi_32.c               |   2 +-
 arch/x86/platform/efi/efi_64.c               |   4 +-
 drivers/firmware/efi/arm-runtime.c           |   2 +-
 fs/proc/task_mmu.c                           |   4 +-
 include/asm-generic/pgtable.h                |  20 ++
 include/linux/pagewalk.h                     |  42 ++-
 include/linux/ptdump.h                       |  22 ++
 mm/Kconfig.debug                             |  21 ++
 mm/Makefile                                  |   1 +
 mm/hmm.c                                     |   8 +-
 mm/migrate.c                                 |   5 +-
 mm/mincore.c                                 |   1 +
 mm/pagewalk.c                                | 145 ++++++--
 mm/ptdump.c                                  | 151 ++++++++
 37 files changed, 600 insertions(+), 458 deletions(-)
 create mode 100644 include/linux/ptdump.h
 create mode 100644 mm/ptdump.c

-- 
2.20.1


* [PATCH v16 01/25] mm: Add generic p?d_leaf() macros
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:52   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:52 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

Exposing the pud/pgd levels of the page tables to walk_page_range() means
we may come across the exotic large mappings that come with large areas
of contiguous memory (such as the kernel's linear map).

For architectures that don't provide all p?d_leaf() macros, provide
generic do-nothing defaults that are suitable where there cannot be leaf
pages at that level. Further patches will add implementations for
individual architectures.

The name p?d_leaf() is chosen to minimize the confusion with existing
uses of "large" pages and "huge" pages which do not necessarily mean that
the entry is a leaf (for example it may be a set of contiguous entries
that only take 1 TLB slot). For the purpose of walking the page tables
we don't need to know how it will be represented in the TLB, but we do
need to know for sure if it is a leaf of the tree.

Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
---
 include/asm-generic/pgtable.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 798ea36a0549..e2e2bef07dd2 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1238,4 +1238,24 @@ static inline bool arch_has_pfn_modify_check(void)
 #define mm_pmd_folded(mm)	__is_defined(__PAGETABLE_PMD_FOLDED)
 #endif
 
+/*
+ * p?d_leaf() - true if this entry is a final mapping to a physical address.
+ * This differs from p?d_huge() by the fact that they are always available (if
+ * the architecture supports large pages at the appropriate level) even
+ * if CONFIG_HUGETLB_PAGE is not defined.
+ * Only meaningful when called on a valid entry.
+ */
+#ifndef pgd_leaf
+#define pgd_leaf(x)	0
+#endif
+#ifndef p4d_leaf
+#define p4d_leaf(x)	0
+#endif
+#ifndef pud_leaf
+#define pud_leaf(x)	0
+#endif
+#ifndef pmd_leaf
+#define pmd_leaf(x)	0
+#endif
+
 #endif /* _ASM_GENERIC_PGTABLE_H */
-- 
2.20.1
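
(For reference, the intended interaction: an architecture defines its
own p?d_leaf() before this header is included, and the #ifndef guards
above then leave the do-nothing defaults only for the remaining levels.
A minimal sketch; the architecture and the _PAGE_LEAF bit are
illustrative, not from this series:)

/* arch/foo/include/asm/pgtable.h -- illustrative only */
#define pmd_leaf(pmd)	(pmd_val(pmd) & _PAGE_LEAF)

#include <asm-generic/pgtable.h>
/* pgd_leaf()/p4d_leaf()/pud_leaf() fall back to the generic 0 here. */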


* [PATCH v16 02/25] arc: mm: Add p?d_leaf() definitions
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:52   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:52 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan,
	Vineet Gupta, linux-snps-arc

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information will be provided by the
p?d_leaf() functions/macros.

For arc, we only have two levels, so only pmd_leaf() is needed.

CC: Vineet Gupta <vgupta@synopsys.com>
CC: linux-snps-arc@lists.infradead.org
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arc/include/asm/pgtable.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
index 9019ed9f9c94..12be7e1b7cc0 100644
--- a/arch/arc/include/asm/pgtable.h
+++ b/arch/arc/include/asm/pgtable.h
@@ -273,6 +273,7 @@ static inline void pmd_set(pmd_t *pmdp, pte_t *ptep)
 #define pmd_none(x)			(!pmd_val(x))
 #define	pmd_bad(x)			((pmd_val(x) & ~PAGE_MASK))
 #define pmd_present(x)			(pmd_val(x))
+#define pmd_leaf(x)			(pmd_val(x) & _PAGE_HW_SZ)
 #define pmd_clear(xp)			do { pmd_val(*(xp)) = 0; } while (0)
 
 #define pte_page(pte)		pfn_to_page(pte_pfn(pte))
-- 
2.20.1


* [PATCH v16 03/25] arm: mm: Add p?d_leaf() definitions
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:52   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:52 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan,
	Russell King

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.

For arm, pmd_large() already exists and does what we want, so simply
provide the generic pmd_leaf() name.

CC: Russell King <linux@armlinux.org.uk>
CC: linux-arm-kernel@lists.infradead.org
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm/include/asm/pgtable-2level.h | 1 +
 arch/arm/include/asm/pgtable-3level.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 51beec41d48c..0d3ea35c97fe 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -189,6 +189,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 }
 
 #define pmd_large(pmd)		(pmd_val(pmd) & 2)
+#define pmd_leaf(pmd)		(pmd_val(pmd) & 2)
 #define pmd_bad(pmd)		(pmd_val(pmd) & 2)
 #define pmd_present(pmd)	(pmd_val(pmd))
 
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 5b18295021a0..ad55ab068dbf 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -134,6 +134,7 @@
 #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
 						 PMD_TYPE_SECT)
 #define pmd_large(pmd)		pmd_sect(pmd)
+#define pmd_leaf(pmd)		pmd_sect(pmd)
 
 #define pud_clear(pudp)			\
 	do {				\
-- 
2.20.1


* [PATCH v16 04/25] arm64: mm: Add p?d_leaf() definitions
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:52   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:52 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information will be provided by the
p?d_leaf() functions/macros.

For arm64, we already have p?d_sect() macros which we can reuse for
p?d_leaf().

pud_sect() is defined as a dummy function when CONFIG_PGTABLE_LEVELS < 3
or CONFIG_ARM64_64K_PAGES is defined. However, when the kernel is
configured this way the architecture doesn't allow a large page at this
level, and any code using these page walking macros is implicitly
relying on the page size/number of levels being the same as in the
kernel. So it is safe to reuse this for p?d_leaf() as it is an
architectural restriction.

CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 5d15b4735a0e..40df7e16d397 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -445,6 +445,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 				 PMD_TYPE_TABLE)
 #define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
 				 PMD_TYPE_SECT)
+#define pmd_leaf(pmd)		pmd_sect(pmd)
 
 #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
 static inline bool pud_sect(pud_t pud) { return false; }
@@ -529,6 +530,7 @@ static inline void pte_unmap(pte_t *pte) { }
 #define pud_none(pud)		(!pud_val(pud))
 #define pud_bad(pud)		(!(pud_val(pud) & PUD_TABLE_BIT))
 #define pud_present(pud)	pte_present(pud_pte(pud))
+#define pud_leaf(pud)		pud_sect(pud)
 #define pud_valid(pud)		pte_valid(pud_pte(pud))
 
 static inline void set_pud(pud_t *pudp, pud_t pud)
-- 
2.20.1


* [PATCH v16 05/25] mips: mm: Add p?d_leaf() definitions
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:52   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:52 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan,
	Ralf Baechle, Paul Burton, James Hogan, linux-mips

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.

If _PAGE_HUGE is defined we can simply look for it. When not defined we
can be confident that there are no leaf pages in existence and fall back
on the generic implementation (added in an earlier patch) which returns 0.

CC: Ralf Baechle <ralf@linux-mips.org>
CC: Paul Burton <paul.burton@mips.com>
CC: James Hogan <jhogan@kernel.org>
CC: linux-mips@vger.kernel.org
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Paul Burton <paul.burton@mips.com>
---
 arch/mips/include/asm/pgtable.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 91b89aab1787..aef5378f909c 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -639,6 +639,11 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#ifdef _PAGE_HUGE
+#define pmd_leaf(pmd)	((pmd_val(pmd) & _PAGE_HUGE) != 0)
+#define pud_leaf(pud)	((pud_val(pud) & _PAGE_HUGE) != 0)
+#endif
+
 #define gup_fast_permitted(start, end)	(!cpu_has_dc_aliases)
 
 #include <asm-generic/pgtable.h>
-- 
2.20.1


* [PATCH v16 06/25] powerpc: mm: Add p?d_leaf() definitions
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:52   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:52 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linuxppc-dev, kvm-ppc

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.

For powerpc pmd_large() already exists and does what we want, so hoist
it out of the CONFIG_TRANSPARENT_HUGEPAGE condition and implement the
other levels. Macros are used to provide the generic p?d_leaf() names:
defining the macro name to the inline function lets the #ifndef
fallbacks in the generic header see that this architecture provides its
own implementation.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: linuxppc-dev@lists.ozlabs.org
CC: kvm-ppc@vger.kernel.org
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 30 ++++++++++++++------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index b01624e5c467..3dd7b6f5edd0 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -923,6 +923,12 @@ static inline int pud_present(pud_t pud)
 	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PRESENT));
 }
 
+#define pud_leaf	pud_large
+static inline int pud_large(pud_t pud)
+{
+	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
+}
+
 extern struct page *pud_page(pud_t pud);
 extern struct page *pmd_page(pmd_t pmd);
 static inline pte_t pud_pte(pud_t pud)
@@ -966,6 +972,12 @@ static inline int pgd_present(pgd_t pgd)
 	return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
 }
 
+#define pgd_leaf	pgd_large
+static inline int pgd_large(pgd_t pgd)
+{
+	return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PTE));
+}
+
 static inline pte_t pgd_pte(pgd_t pgd)
 {
 	return __pte_raw(pgd_raw(pgd));
@@ -1133,6 +1145,15 @@ static inline bool pmd_access_permitted(pmd_t pmd, bool write)
 	return pte_access_permitted(pmd_pte(pmd), write);
 }
 
+#define pmd_leaf	pmd_large
+/*
+ * returns true for pmd migration entries, THP, devmap, hugetlb
+ */
+static inline int pmd_large(pmd_t pmd)
+{
+	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
 extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
@@ -1159,15 +1180,6 @@ pmd_hugepage_update(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp,
 	return hash__pmd_hugepage_update(mm, addr, pmdp, clr, set);
 }
 
-/*
- * returns true for pmd migration entries, THP, devmap, hugetlb
- * But compile time dependent on THP config
- */
-static inline int pmd_large(pmd_t pmd)
-{
-	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
-}
-
 static inline pmd_t pmd_mknotpresent(pmd_t pmd)
 {
 	return __pmd(pmd_val(pmd) & ~_PAGE_PRESENT);
-- 
2.20.1


* [PATCH v16 07/25] riscv: mm: Add p?d_leaf() definitions
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:52   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:52 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan,
	Palmer Dabbelt, Albert Ou, linux-riscv, Alexandre Ghiti, Zong Li,
	Paul Walmsley

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.
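
To illustrate how the generic walker is expected to consume these
helpers (a sketch only, not code from this patch; note_leaf() and
walk_pmd_table() are hypothetical):

  /*
   * Sketch: with pud_leaf() a generic walker can tell a block mapping
   * from a pointer to a PMD table and stop descending early.
   */
  static int walk_one_pud(pud_t *pud, unsigned long addr, unsigned long next)
  {
          if (pud_none(*pud))
                  return 0;                        /* hole, nothing to do */
          if (pud_leaf(*pud))
                  return note_leaf(addr, next);    /* hypothetical callback */
          return walk_pmd_table(pud, addr, next);  /* hypothetical descent */
  }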

For riscv a page table entry is a leaf when any of the read, write or
execute bits is set on it.
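
Put differently (an illustrative sketch of that rule using the _PAGE_*
bits from asm/pgtable-bits.h; not code added by this patch), a valid
entry with none of the permission bits set is a pointer to a
next-level table:

  static bool is_table_pointer(unsigned long val)
  {
          /* valid, but neither readable, writable nor executable */
          return (val & _PAGE_PRESENT) &&
                 !(val & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC));
  }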

CC: Palmer Dabbelt <palmer@sifive.com>
CC: Albert Ou <aou@eecs.berkeley.edu>
CC: linux-riscv@lists.infradead.org
Reviewed-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Zong Li <zong.li@sifive.com>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # for arch/riscv
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/riscv/include/asm/pgtable-64.h | 7 +++++++
 arch/riscv/include/asm/pgtable.h    | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
index 74630989006d..4c4d2c65ba6c 100644
--- a/arch/riscv/include/asm/pgtable-64.h
+++ b/arch/riscv/include/asm/pgtable-64.h
@@ -43,6 +43,13 @@ static inline int pud_bad(pud_t pud)
 	return !pud_present(pud);
 }
 
+#define pud_leaf	pud_leaf
+static inline int pud_leaf(pud_t pud)
+{
+	return pud_present(pud) &&
+	       (pud_val(pud) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC));
+}
+
 static inline void set_pud(pud_t *pudp, pud_t pud)
 {
 	*pudp = pud;
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 7ff0ed4f292e..5cf96b2b4d5a 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -105,6 +105,13 @@ static inline int pmd_bad(pmd_t pmd)
 	return !pmd_present(pmd);
 }
 
+#define pmd_leaf	pmd_leaf
+static inline int pmd_leaf(pmd_t pmd)
+{
+	return pmd_present(pmd) &&
+	       (pmd_val(pmd) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC));
+}
+
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
 	*pmdp = pmd;
-- 
2.20.1



* [PATCH v16 08/25] s390: mm: Add p?d_leaf() definitions
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:52   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:52 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan,
	Heiko Carstens, Vasily Gorbik, Christian Borntraeger, linux-s390

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.

For s390, pud_large() and pmd_large() are already implemented as static
inline functions. Add macros to provide the p?d_leaf() names for the
generic code to use.
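
Defining the name as a macro, rather than only as an inline function,
lets generic code detect whether an architecture provides the helper
and otherwise fall back to a stub, roughly (a sketch of the generic
fallback pattern this series relies on):

  /* in a generic header: levels without the helper are never leaves */
  #ifndef pmd_leaf
  #define pmd_leaf(x)     0
  #endif
  #ifndef pud_leaf
  #define pud_leaf(x)     0
  #endif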

CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: Vasily Gorbik <gor@linux.ibm.com>
CC: Christian Borntraeger <borntraeger@de.ibm.com>
CC: linux-s390@vger.kernel.org
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/s390/include/asm/pgtable.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 7b03037a8475..137a3920ca36 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -673,6 +673,7 @@ static inline int pud_none(pud_t pud)
 	return pud_val(pud) == _REGION3_ENTRY_EMPTY;
 }
 
+#define pud_leaf	pud_large
 static inline int pud_large(pud_t pud)
 {
 	if ((pud_val(pud) & _REGION_ENTRY_TYPE_MASK) != _REGION_ENTRY_TYPE_R3)
@@ -690,6 +691,7 @@ static inline unsigned long pud_pfn(pud_t pud)
 	return (pud_val(pud) & origin_mask) >> PAGE_SHIFT;
 }
 
+#define pmd_leaf	pmd_large
 static inline int pmd_large(pmd_t pmd)
 {
 	return (pmd_val(pmd) & _SEGMENT_ENTRY_LARGE) != 0;
-- 
2.20.1



* [PATCH v16 09/25] sparc: mm: Add p?d_leaf() definitions
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan,
	David S. Miller, sparclinux

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.

For sparc 64 bit, pmd_large() and pud_large() are already provided, so
add macros to provide the p?d_leaf names required by the generic code.

CC: "David S. Miller" <davem@davemloft.net>
CC: sparclinux@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/sparc/include/asm/pgtable_64.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 6ae8016ef4ec..43206652eaf5 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -683,6 +683,7 @@ static inline unsigned long pte_special(pte_t pte)
 	return pte_val(pte) & _PAGE_SPECIAL;
 }
 
+#define pmd_leaf	pmd_large
 static inline unsigned long pmd_large(pmd_t pmd)
 {
 	pte_t pte = __pte(pmd_val(pmd));
@@ -867,6 +868,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
 /* only used by the stubbed out hugetlb gup code, should never be called */
 #define pgd_page(pgd)			NULL
 
+#define pud_leaf	pud_large
 static inline unsigned long pud_large(pud_t pud)
 {
 	pte_t pte = __pte(pud_val(pud));
-- 
2.20.1



* [PATCH v16 10/25] x86: mm: Add p?d_leaf() definitions
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

walk_page_range() is going to be allowed to walk page tables other than
those of user space. For this it needs to know when it has reached a
'leaf' entry in the page tables. This information is provided by the
p?d_leaf() functions/macros.

For x86 we already have p?d_large() functions, so simply add macros to
provide the generic p?d_leaf() names for the generic code.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/x86/include/asm/pgtable.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index ad97dc155195..8091a1c62596 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -239,6 +239,7 @@ static inline unsigned long pgd_pfn(pgd_t pgd)
 	return (pgd_val(pgd) & PTE_PFN_MASK) >> PAGE_SHIFT;
 }
 
+#define p4d_leaf	p4d_large
 static inline int p4d_large(p4d_t p4d)
 {
 	/* No 512 GiB pages yet */
@@ -247,6 +248,7 @@ static inline int p4d_large(p4d_t p4d)
 
 #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
 
+#define pmd_leaf	pmd_large
 static inline int pmd_large(pmd_t pte)
 {
 	return pmd_flags(pte) & _PAGE_PSE;
@@ -874,6 +876,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
 	return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
 }
 
+#define pud_leaf	pud_large
 static inline int pud_large(pud_t pud)
 {
 	return (pud_val(pud) & (_PAGE_PSE | _PAGE_PRESENT)) ==
@@ -885,6 +888,7 @@ static inline int pud_bad(pud_t pud)
 	return (pud_flags(pud) & ~(_KERNPG_TABLE | _PAGE_USER)) != 0;
 }
 #else
+#define pud_leaf	pud_large
 static inline int pud_large(pud_t pud)
 {
 	return 0;
@@ -1233,6 +1237,7 @@ static inline bool pgdp_maps_userspace(void *__ptr)
 	return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < PGD_KERNEL_START);
 }
 
+#define pgd_leaf	pgd_large
 static inline int pgd_large(pgd_t pgd) { return 0; }
 
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-- 
2.20.1



* [PATCH v16 11/25] mm: pagewalk: Add p4d_entry() and pgd_entry()
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan,
	Zong Li

pgd_entry() and pud_entry() were removed by commit 0b1fbfe50006c410
("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were
no users. We're about to add users so reintroduce them, along with
p4d_entry() as we now have 5 levels of tables.

Note that commit a00cc7d9dd93d66a ("mm, x86: add support for
PUD-sized transparent hugepages") already re-added pud_entry() but with
different semantics from the other callbacks. Since there have never
been upstream users of this, revert the semantics to match the
other callbacks. This means pud_entry() is called for all entries, not
just transparent huge pages.
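
For illustration, a walker built on the reintroduced callbacks could
look roughly like this (a sketch; record_entry() and the pgd-level
twin are hypothetical):

  static int dump_p4d_entry(p4d_t *p4d, unsigned long addr,
                            unsigned long next, struct mm_walk *walk)
  {
          /* called for every non-empty P4D, even when the level is folded */
          record_entry(walk->private, addr, next, p4d_val(*p4d));
          return 0;       /* non-zero aborts the walk */
  }

  static const struct mm_walk_ops dump_ops = {
          .pgd_entry      = dump_pgd_entry,       /* hypothetical twin */
          .p4d_entry      = dump_p4d_entry,
  };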

Tested-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 include/linux/pagewalk.h | 19 +++++++++++++------
 mm/pagewalk.c            | 27 ++++++++++++++++-----------
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 6ec82e92c87f..06790f23957f 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -8,15 +8,15 @@ struct mm_walk;
 
 /**
  * mm_walk_ops - callbacks for walk_page_range
- * @pud_entry:		if set, called for each non-empty PUD (2nd-level) entry
- *			this handler should only handle pud_trans_huge() puds.
- *			the pmd_entry or pte_entry callbacks will be used for
- *			regular PUDs.
- * @pmd_entry:		if set, called for each non-empty PMD (3rd-level) entry
+ * @pgd_entry:		if set, called for each non-empty PGD (top-level) entry
+ * @p4d_entry:		if set, called for each non-empty P4D entry
+ * @pud_entry:		if set, called for each non-empty PUD entry
+ * @pmd_entry:		if set, called for each non-empty PMD entry
  *			this handler is required to be able to handle
  *			pmd_trans_huge() pmds.  They may simply choose to
  *			split_huge_page() instead of handling it explicitly.
- * @pte_entry:		if set, called for each non-empty PTE (4th-level) entry
+ * @pte_entry:		if set, called for each non-empty PTE (lowest-level)
+ *			entry
  * @pte_hole:		if set, called for each hole at all levels
  * @hugetlb_entry:	if set, called for each hugetlb entry
  * @test_walk:		caller specific callback function to determine whether
@@ -27,8 +27,15 @@ struct mm_walk;
  * @pre_vma:            if set, called before starting walk on a non-null vma.
  * @post_vma:           if set, called after a walk on a non-null vma, provided
  *                      that @pre_vma and the vma walk succeeded.
+ *
+ * p?d_entry callbacks are called even if those levels are folded on a
+ * particular architecture/configuration.
  */
 struct mm_walk_ops {
+	int (*pgd_entry)(pgd_t *pgd, unsigned long addr,
+			 unsigned long next, struct mm_walk *walk);
+	int (*p4d_entry)(p4d_t *p4d, unsigned long addr,
+			 unsigned long next, struct mm_walk *walk);
 	int (*pud_entry)(pud_t *pud, unsigned long addr,
 			 unsigned long next, struct mm_walk *walk);
 	int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index ea0b9e606ad1..c089786e7a7f 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -94,15 +94,9 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 		}
 
 		if (ops->pud_entry) {
-			spinlock_t *ptl = pud_trans_huge_lock(pud, walk->vma);
-
-			if (ptl) {
-				err = ops->pud_entry(pud, addr, next, walk);
-				spin_unlock(ptl);
-				if (err)
-					break;
-				continue;
-			}
+			err = ops->pud_entry(pud, addr, next, walk);
+			if (err)
+				break;
 		}
 
 		split_huge_pud(walk->vma, pud, addr);
@@ -136,7 +130,12 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
 				break;
 			continue;
 		}
-		if (ops->pmd_entry || ops->pte_entry)
+		if (ops->p4d_entry) {
+			err = ops->p4d_entry(p4d, addr, next, walk);
+			if (err)
+				break;
+		}
+		if (ops->pud_entry || ops->pmd_entry || ops->pte_entry)
 			err = walk_pud_range(p4d, addr, next, walk);
 		if (err)
 			break;
@@ -163,7 +162,13 @@ static int walk_pgd_range(unsigned long addr, unsigned long end,
 				break;
 			continue;
 		}
-		if (ops->pmd_entry || ops->pte_entry)
+		if (ops->pgd_entry) {
+			err = ops->pgd_entry(pgd, addr, next, walk);
+			if (err)
+				break;
+		}
+		if (ops->p4d_entry || ops->pud_entry || ops->pmd_entry ||
+		    ops->pte_entry)
 			err = walk_p4d_range(pgd, addr, next, walk);
 		if (err)
 			break;
-- 
2.20.1



* [PATCH v16 12/25] mm: pagewalk: Allow walking without vma
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

Since commit 48684a65b4e3 ("mm: pagewalk: fix misbehavior of
walk_page_range for vma(VM_PFNMAP)"), walk_page_range() will report any
kernel area as a hole, because it lacks a vma.

This means each arch has re-implemented page table walking when needed,
for example in the per-arch ptdump walker.

Remove the requirement to have a vma in the generic code and add a new
function walk_page_range_novma() which ignores the VMAs and simply walks
the page tables.
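
Callers must hold mmap_sem themselves (the function asserts this), so
a minimal sketch of walking the kernel's vmalloc area would be
(dump_ops standing in for the caller's mm_walk_ops):

  down_read(&init_mm.mmap_sem);
  walk_page_range_novma(&init_mm, VMALLOC_START, VMALLOC_END,
                        &dump_ops, NULL);
  up_read(&init_mm.mmap_sem);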

Signed-off-by: Steven Price <steven.price@arm.com>
---
 include/linux/pagewalk.h |  5 +++++
 mm/pagewalk.c            | 44 ++++++++++++++++++++++++++++++++--------
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 06790f23957f..2c9725bdcf1f 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -59,6 +59,7 @@ struct mm_walk_ops {
  * @ops:	operation to call during the walk
  * @mm:		mm_struct representing the target process of page table walk
  * @vma:	vma currently walked (NULL if walking outside vmas)
+ * @no_vma:	walk ignoring vmas (vma will always be NULL)
  * @private:	private data for callbacks' usage
  *
  * (see the comment on walk_page_range() for more details)
@@ -67,12 +68,16 @@ struct mm_walk {
 	const struct mm_walk_ops *ops;
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
+	bool no_vma;
 	void *private;
 };
 
 int walk_page_range(struct mm_struct *mm, unsigned long start,
 		unsigned long end, const struct mm_walk_ops *ops,
 		void *private);
+int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
+			  unsigned long end, const struct mm_walk_ops *ops,
+			  void *private);
 int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
 		void *private);
 int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index c089786e7a7f..efa464cf079b 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -39,7 +39,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 	do {
 again:
 		next = pmd_addr_end(addr, end);
-		if (pmd_none(*pmd) || !walk->vma) {
+		if (pmd_none(*pmd) || (!walk->vma && !walk->no_vma)) {
 			if (ops->pte_hole)
 				err = ops->pte_hole(addr, next, walk);
 			if (err)
@@ -62,9 +62,14 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 		if (!ops->pte_entry)
 			continue;
 
-		split_huge_pmd(walk->vma, pmd, addr);
-		if (pmd_trans_unstable(pmd))
-			goto again;
+		if (walk->vma) {
+			split_huge_pmd(walk->vma, pmd, addr);
+			if (pmd_trans_unstable(pmd))
+				goto again;
+		} else if (pmd_leaf(*pmd) || !pmd_present(*pmd)) {
+			continue;
+		}
+
 		err = walk_pte_range(pmd, addr, next, walk);
 		if (err)
 			break;
@@ -85,7 +90,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 	do {
  again:
 		next = pud_addr_end(addr, end);
-		if (pud_none(*pud) || !walk->vma) {
+		if (pud_none(*pud) || (!walk->vma && !walk->no_vma)) {
 			if (ops->pte_hole)
 				err = ops->pte_hole(addr, next, walk);
 			if (err)
@@ -99,9 +104,13 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 				break;
 		}
 
-		split_huge_pud(walk->vma, pud, addr);
-		if (pud_none(*pud))
-			goto again;
+		if (walk->vma) {
+			split_huge_pud(walk->vma, pud, addr);
+			if (pud_none(*pud))
+				goto again;
+		} else if (pud_leaf(*pud) || !pud_present(*pud)) {
+			continue;
+		}
 
 		if (ops->pmd_entry || ops->pte_entry)
 			err = walk_pmd_range(pud, addr, next, walk);
@@ -374,6 +383,25 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
 	return err;
 }
 
+int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
+			  unsigned long end, const struct mm_walk_ops *ops,
+			  void *private)
+{
+	struct mm_walk walk = {
+		.ops		= ops,
+		.mm		= mm,
+		.private	= private,
+		.no_vma		= true
+	};
+
+	if (start >= end || !walk.mm)
+		return -EINVAL;
+
+	lockdep_assert_held(&walk.mm->mmap_sem);
+
+	return __walk_page_range(start, end, &walk);
+}
+
 int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
 		void *private)
 {
-- 
2.20.1



* [PATCH v16 13/25] mm: pagewalk: Don't lock PTEs for walk_page_range_novma()
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

walk_page_range_novma() can be used to walk the page tables of the
kernel or of firmware. These page tables may contain entries that are
not backed by a struct page and so it isn't (in general) possible to
take the PTE lock for the pte_entry() callback. So update
walk_pte_range() to only take the lock when no_vma==false and add a
comment on walk_page_range_novma() explaining the difference.
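
A pte_entry() callback used through walk_page_range_novma() therefore
has to treat the entry as unstable and only snapshot it, along these
lines (a sketch; note_page() is hypothetical):

  static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
                              unsigned long next, struct mm_walk *walk)
  {
          pte_t val = READ_ONCE(*pte);    /* no PTL held in the novma case */

          note_page(walk->private, addr, pte_val(val));   /* hypothetical */
          return 0;
  }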

Signed-off-by: Steven Price <steven.price@arm.com>
---
 mm/pagewalk.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index efa464cf079b..1b9a3ba24c51 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -10,9 +10,10 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	pte_t *pte;
 	int err = 0;
 	const struct mm_walk_ops *ops = walk->ops;
-	spinlock_t *ptl;
+	spinlock_t *uninitialized_var(ptl);
 
-	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	pte = walk->no_vma ? pte_offset_map(pmd, addr) :
+			     pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
 	for (;;) {
 		err = ops->pte_entry(pte, addr, addr + PAGE_SIZE, walk);
 		if (err)
@@ -23,7 +24,9 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		pte++;
 	}
 
-	pte_unmap_unlock(pte, ptl);
+	if (!walk->no_vma)
+		spin_unlock(ptl);
+	pte_unmap(pte);
 	return err;
 }
 
@@ -383,6 +386,12 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
 	return err;
 }
 
+/*
+ * Similar to walk_page_range() but can walk any page tables even if they are
+ * not backed by VMAs. Because 'unusual' entries may be walked this function
+ * will also not lock the PTEs for the pte_entry() callback. This is useful for
+ * walking the kernel page tables or page tables for firmware.
+ */
 int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
 			  unsigned long end, const struct mm_walk_ops *ops,
 			  void *private)
-- 
2.20.1



* [PATCH v16 14/25] mm: pagewalk: fix termination condition in walk_pte_range()
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

If walk_pte_range() is called with an 'end' argument that is beyond the
last page of memory (e.g. ~0UL) then the comparison between 'addr' and
'end' will always fail and the loop will be infinite. Instead change the
comparison to >= while accounting for overflow.
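
The wrap-around is easy to demonstrate in a standalone sketch (4K
pages assumed; not kernel code):

  #include <stdio.h>

  #define PAGE_SIZE 4096UL

  int main(void)
  {
          unsigned long end = ~0UL;                       /* not page aligned */
          unsigned long addr = end & ~(PAGE_SIZE - 1);    /* last page */

          /* old test: addr + PAGE_SIZE wraps to 0 and never equals end */
          printf("old: %d\n", addr + PAGE_SIZE == end);   /* prints 0 */
          /* new test: fires on the last page, so the loop terminates */
          printf("new: %d\n", addr >= end - PAGE_SIZE);   /* prints 1 */
          return 0;
  }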

Signed-off-by: Steven Price <steven.price@arm.com>
---
 mm/pagewalk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 1b9a3ba24c51..88104ab00a97 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -18,9 +18,9 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		err = ops->pte_entry(pte, addr, addr + PAGE_SIZE, walk);
 		if (err)
 		       break;
-		addr += PAGE_SIZE;
-		if (addr == end)
+		if (addr >= end - PAGE_SIZE)
 			break;
+		addr += PAGE_SIZE;
 		pte++;
 	}
 
-- 
2.20.1



* [PATCH v16 15/25] mm: pagewalk: Add test_p?d callbacks
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan,
	Zong Li

It is useful to be able to skip parts of the page table tree even when
walking without VMAs. Add test_p?d callbacks similar to test_walk but
which are called just before a table at that level is walked. If the
callback returns non-zero then the entire table is skipped.
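
A typical user skips a large region it knows to be uniform without
visiting its tables, something like (a sketch; REGION_START/REGION_END
are placeholders for, e.g., the KASAN shadow bounds):

  static int dump_test_p4d(unsigned long addr, unsigned long next,
                           p4d_t *p4d_start, struct mm_walk *walk)
  {
          if (addr >= REGION_START && next <= REGION_END)
                  return 1;       /* skip this whole table */
          return 0;               /* 0: walk it; negative: abort the walk */
  }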

Tested-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 include/linux/pagewalk.h | 11 +++++++++++
 mm/pagewalk.c            | 24 ++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 2c9725bdcf1f..84ac77620bfc 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -24,6 +24,11 @@ struct mm_walk;
  *			"do page table walk over the current vma", returning
  *			a negative value means "abort current page table walk
  *			right now" and returning 1 means "skip the current vma"
+ * @test_pmd:		similar to test_walk(), but called for every pmd.
+ * @test_pud:		similar to test_walk(), but called for every pud.
+ * @test_p4d:		similar to test_walk(), but called for every p4d.
+ *			Returning 0 means walk this part of the page tables,
+ *			returning 1 means to skip this range.
  * @pre_vma:            if set, called before starting walk on a non-null vma.
  * @post_vma:           if set, called after a walk on a non-null vma, provided
  *                      that @pre_vma and the vma walk succeeded.
@@ -49,6 +54,12 @@ struct mm_walk_ops {
 			     struct mm_walk *walk);
 	int (*test_walk)(unsigned long addr, unsigned long next,
 			struct mm_walk *walk);
+	int (*test_pmd)(unsigned long addr, unsigned long next,
+			pmd_t *pmd_start, struct mm_walk *walk);
+	int (*test_pud)(unsigned long addr, unsigned long next,
+			pud_t *pud_start, struct mm_walk *walk);
+	int (*test_p4d)(unsigned long addr, unsigned long next,
+			p4d_t *p4d_start, struct mm_walk *walk);
 	int (*pre_vma)(unsigned long start, unsigned long end,
 		       struct mm_walk *walk);
 	void (*post_vma)(struct mm_walk *walk);
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 88104ab00a97..83ab6c1ebf9b 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -38,6 +38,14 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
 
+	if (ops->test_pmd) {
+		err = ops->test_pmd(addr, end, pmd_offset(pud, 0UL), walk);
+		if (err < 0)
+			return err;
+		if (err > 0)
+			return 0;
+	}
+
 	pmd = pmd_offset(pud, addr);
 	do {
 again:
@@ -89,6 +97,14 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
 
+	if (ops->test_pud) {
+		err = ops->test_pud(addr, end, pud_offset(p4d, 0UL), walk);
+		if (err < 0)
+			return err;
+		if (err > 0)
+			return 0;
+	}
+
 	pud = pud_offset(p4d, addr);
 	do {
  again:
@@ -132,6 +148,14 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
 
+	if (ops->test_p4d) {
+		err = ops->test_p4d(addr, end, p4d_offset(pgd, 0UL), walk);
+		if (err < 0)
+			return err;
+		if (err > 0)
+			return 0;
+	}
+
 	p4d = p4d_offset(pgd, addr);
 	do {
 		next = p4d_addr_end(addr, end);
-- 
2.20.1



* [PATCH v16 16/25] mm: pagewalk: Add 'depth' parameter to pte_hole
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  -1 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan,
	Zong Li

The pte_hole() callback is called at multiple levels of the page tables.
Code dumping the kernel page tables needs to know at what depth the
missing entry is. Add this as an extra parameter to pte_hole(). When the
depth isn't known (e.g. when processing a VMA) then -1 is passed.

The depth that is reported is the actual level where the entry is
missing (ignoring any folding that is in place), i.e. any levels where
PTRS_PER_P?D is set to 1 are ignored.

Note that depth starts at 0 for a PGD so that PUD/PMD/PTE retain their
natural numbers as levels 2/3/4.
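
For example, a hole callback could translate the reported depth into a
level name for printing; a minimal sketch (the callback name and the
message are illustrative only):

  /* Illustrative sketch: report the level at which a hole was found. */
  static int report_hole(unsigned long addr, unsigned long next,
                         int depth, struct mm_walk *walk)
  {
          static const char * const level[] =
                  { "PGD", "P4D", "PUD", "PMD", "PTE" };

          if (depth >= 0)         /* -1 means the depth is unknown */
                  pr_info("hole at %#lx-%#lx (%s)\n", addr, next,
                          level[depth]);
          return 0;
  }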

Tested-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 fs/proc/task_mmu.c       |  4 ++--
 include/linux/pagewalk.h |  7 +++++--
 mm/hmm.c                 |  8 ++++----
 mm/migrate.c             |  5 +++--
 mm/mincore.c             |  1 +
 mm/pagewalk.c            | 31 +++++++++++++++++++++++++------
 6 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 9442631fd4af..3ba9ae83bff5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -505,7 +505,7 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 
 #ifdef CONFIG_SHMEM
 static int smaps_pte_hole(unsigned long addr, unsigned long end,
-		struct mm_walk *walk)
+			  __always_unused int depth, struct mm_walk *walk)
 {
 	struct mem_size_stats *mss = walk->private;
 
@@ -1282,7 +1282,7 @@ static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
 }
 
 static int pagemap_pte_hole(unsigned long start, unsigned long end,
-				struct mm_walk *walk)
+			    __always_unused int depth, struct mm_walk *walk)
 {
 	struct pagemapread *pm = walk->private;
 	unsigned long addr = start;
diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 84ac77620bfc..c33b47358e9f 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -17,7 +17,10 @@ struct mm_walk;
  *			split_huge_page() instead of handling it explicitly.
  * @pte_entry:		if set, called for each non-empty PTE (lowest-level)
  *			entry
- * @pte_hole:		if set, called for each hole at all levels
+ * @pte_hole:		if set, called for each hole at all levels,
+ *			depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD
+ *			4:PTE. Any folded depths (where PTRS_PER_P?D is equal
+ *			to 1) are skipped.
  * @hugetlb_entry:	if set, called for each hugetlb entry
  * @test_walk:		caller specific callback function to determine whether
  *			we walk over the current vma or not. Returning 0 means
@@ -48,7 +51,7 @@ struct mm_walk_ops {
 	int (*pte_entry)(pte_t *pte, unsigned long addr,
 			 unsigned long next, struct mm_walk *walk);
 	int (*pte_hole)(unsigned long addr, unsigned long next,
-			struct mm_walk *walk);
+			int depth, struct mm_walk *walk);
 	int (*hugetlb_entry)(pte_t *pte, unsigned long hmask,
 			     unsigned long addr, unsigned long next,
 			     struct mm_walk *walk);
diff --git a/mm/hmm.c b/mm/hmm.c
index d379cb6496ae..981f9f8614f2 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -186,7 +186,7 @@ static void hmm_range_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
 }
 
 static int hmm_vma_walk_hole(unsigned long addr, unsigned long end,
-			     struct mm_walk *walk)
+			     __always_unused int depth, struct mm_walk *walk)
 {
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
@@ -380,7 +380,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
 again:
 	pmd = READ_ONCE(*pmdp);
 	if (pmd_none(pmd))
-		return hmm_vma_walk_hole(start, end, walk);
+		return hmm_vma_walk_hole(start, end, -1, walk);
 
 	if (thp_migration_supported() && is_pmd_migration_entry(pmd)) {
 		bool fault, write_fault;
@@ -482,7 +482,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 again:
 	pud = READ_ONCE(*pudp);
 	if (pud_none(pud))
-		return hmm_vma_walk_hole(start, end, walk);
+		return hmm_vma_walk_hole(start, end, -1, walk);
 
 	if (pud_huge(pud) && pud_devmap(pud)) {
 		unsigned long i, npages, pfn;
@@ -490,7 +490,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 		bool fault, write_fault;
 
 		if (!pud_present(pud))
-			return hmm_vma_walk_hole(start, end, walk);
+			return hmm_vma_walk_hole(start, end, -1, walk);
 
 		i = (addr - range->start) >> PAGE_SHIFT;
 		npages = (end - addr) >> PAGE_SHIFT;
diff --git a/mm/migrate.c b/mm/migrate.c
index eae1565285e3..616cd1ed04dd 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2119,6 +2119,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 #ifdef CONFIG_DEVICE_PRIVATE
 static int migrate_vma_collect_hole(unsigned long start,
 				    unsigned long end,
+				    __always_unused int depth,
 				    struct mm_walk *walk)
 {
 	struct migrate_vma *migrate = walk->private;
@@ -2163,7 +2164,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 
 again:
 	if (pmd_none(*pmdp))
-		return migrate_vma_collect_hole(start, end, walk);
+		return migrate_vma_collect_hole(start, end, -1, walk);
 
 	if (pmd_trans_huge(*pmdp)) {
 		struct page *page;
@@ -2196,7 +2197,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 				return migrate_vma_collect_skip(start, end,
 								walk);
 			if (pmd_none(*pmdp))
-				return migrate_vma_collect_hole(start, end,
+				return migrate_vma_collect_hole(start, end, -1,
 								walk);
 		}
 	}
diff --git a/mm/mincore.c b/mm/mincore.c
index 49b6fa2f6aa1..0e6dd9948f1a 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -112,6 +112,7 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
 }
 
 static int mincore_unmapped_range(unsigned long addr, unsigned long end,
+				   __always_unused int depth,
 				   struct mm_walk *walk)
 {
 	walk->private += __mincore_unmapped_range(addr, end,
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 83ab6c1ebf9b..7e13f2f5f4c5 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -4,6 +4,22 @@
 #include <linux/sched.h>
 #include <linux/hugetlb.h>
 
+/*
+ * We want to know the real level where an entry is located ignoring any
+ * folding of levels which may be happening. For example if p4d is folded then
+ * a missing entry found at level 1 (p4d) is actually at level 0 (pgd).
+ */
+static int real_depth(int depth)
+{
+	if (depth == 3 && PTRS_PER_PMD == 1)
+		depth = 2;
+	if (depth == 2 && PTRS_PER_PUD == 1)
+		depth = 1;
+	if (depth == 1 && PTRS_PER_P4D == 1)
+		depth = 0;
+	return depth;
+}
+
 static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			  struct mm_walk *walk)
 {
@@ -37,6 +53,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 	unsigned long next;
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
+	int depth = real_depth(3);
 
 	if (ops->test_pmd) {
 		err = ops->test_pmd(addr, end, pmd_offset(pud, 0UL), walk);
@@ -52,7 +69,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 		next = pmd_addr_end(addr, end);
 		if (pmd_none(*pmd) || (!walk->vma && !walk->no_vma)) {
 			if (ops->pte_hole)
-				err = ops->pte_hole(addr, next, walk);
+				err = ops->pte_hole(addr, next, depth, walk);
 			if (err)
 				break;
 			continue;
@@ -96,6 +113,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 	unsigned long next;
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
+	int depth = real_depth(2);
 
 	if (ops->test_pud) {
 		err = ops->test_pud(addr, end, pud_offset(p4d, 0UL), walk);
@@ -111,7 +129,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 		next = pud_addr_end(addr, end);
 		if (pud_none(*pud) || (!walk->vma && !walk->no_vma)) {
 			if (ops->pte_hole)
-				err = ops->pte_hole(addr, next, walk);
+				err = ops->pte_hole(addr, next, depth, walk);
 			if (err)
 				break;
 			continue;
@@ -147,6 +165,7 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
 	unsigned long next;
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
+	int depth = real_depth(1);
 
 	if (ops->test_p4d) {
 		err = ops->test_p4d(addr, end, p4d_offset(pgd, 0UL), walk);
@@ -161,7 +180,7 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
 		next = p4d_addr_end(addr, end);
 		if (p4d_none_or_clear_bad(p4d)) {
 			if (ops->pte_hole)
-				err = ops->pte_hole(addr, next, walk);
+				err = ops->pte_hole(addr, next, depth, walk);
 			if (err)
 				break;
 			continue;
@@ -193,7 +212,7 @@ static int walk_pgd_range(unsigned long addr, unsigned long end,
 		next = pgd_addr_end(addr, end);
 		if (pgd_none_or_clear_bad(pgd)) {
 			if (ops->pte_hole)
-				err = ops->pte_hole(addr, next, walk);
+				err = ops->pte_hole(addr, next, 0, walk);
 			if (err)
 				break;
 			continue;
@@ -240,7 +259,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 		if (pte)
 			err = ops->hugetlb_entry(pte, hmask, addr, next, walk);
 		else if (ops->pte_hole)
-			err = ops->pte_hole(addr, next, walk);
+			err = ops->pte_hole(addr, next, -1, walk);
 
 		if (err)
 			break;
@@ -284,7 +303,7 @@ static int walk_page_test(unsigned long start, unsigned long end,
 	if (vma->vm_flags & VM_PFNMAP) {
 		int err = 1;
 		if (ops->pte_hole)
-			err = ops->pte_hole(start, end, walk);
+			err = ops->pte_hole(start, end, -1, walk);
 		return err ? err : 1;
 	}
 	return 0;
-- 
2.20.1



* [PATCH v16 17/25] x86: mm: Point to struct seq_file from struct pg_state
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  -1 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

mm/dump_pagetables.c passes both struct seq_file and struct pg_state
down the chain of walk_*_level() functions to be passed to note_page().
Instead, place the struct seq_file pointer in struct pg_state (which is
private to this file) and access it from there in note_page().
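
The pattern is simply to carry the output target inside the walker's
state instead of threading it through every call; condensed from the
hunks below:

  struct pg_state {
          /* ... existing fields ... */
          struct seq_file *seq;   /* set once by the caller */
  };

  static void note_page(struct pg_state *st, pgprot_t new_prot,
                        pgprotval_t new_eff, int level)
  {
          struct seq_file *m = st->seq;   /* recovered from the state */
          /* ... */
  }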

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/x86/mm/dump_pagetables.c | 69 ++++++++++++++++++-----------------
 1 file changed, 35 insertions(+), 34 deletions(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index ab67822fd2f4..4dc6f4df40af 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -36,6 +36,7 @@ struct pg_state {
 	bool to_dmesg;
 	bool check_wx;
 	unsigned long wx_pages;
+	struct seq_file *seq;
 };
 
 struct addr_marker {
@@ -265,11 +266,12 @@ static void note_wx(struct pg_state *st)
  * of PTE entries; the next one is different so we need to
  * print what we collected so far.
  */
-static void note_page(struct seq_file *m, struct pg_state *st,
-		      pgprot_t new_prot, pgprotval_t new_eff, int level)
+static void note_page(struct pg_state *st, pgprot_t new_prot,
+		      pgprotval_t new_eff, int level)
 {
 	pgprotval_t prot, cur, eff;
 	static const char units[] = "BKMGTPE";
+	struct seq_file *m = st->seq;
 
 	/*
 	 * If we have a "break" in the series, we need to flush the state that
@@ -354,8 +356,8 @@ static inline pgprotval_t effective_prot(pgprotval_t prot1, pgprotval_t prot2)
 	       ((prot1 | prot2) & _PAGE_NX);
 }
 
-static void walk_pte_level(struct seq_file *m, struct pg_state *st, pmd_t addr,
-			   pgprotval_t eff_in, unsigned long P)
+static void walk_pte_level(struct pg_state *st, pmd_t addr, pgprotval_t eff_in,
+			   unsigned long P)
 {
 	int i;
 	pte_t *pte;
@@ -366,7 +368,7 @@ static void walk_pte_level(struct seq_file *m, struct pg_state *st, pmd_t addr,
 		pte = pte_offset_map(&addr, st->current_address);
 		prot = pte_flags(*pte);
 		eff = effective_prot(eff_in, prot);
-		note_page(m, st, __pgprot(prot), eff, 5);
+		note_page(st, __pgprot(prot), eff, 5);
 		pte_unmap(pte);
 	}
 }
@@ -379,22 +381,20 @@ static void walk_pte_level(struct seq_file *m, struct pg_state *st, pmd_t addr,
  * us dozens of seconds (minutes for 5-level config) while checking for
  * W+X mapping or reading kernel_page_tables debugfs file.
  */
-static inline bool kasan_page_table(struct seq_file *m, struct pg_state *st,
-				void *pt)
+static inline bool kasan_page_table(struct pg_state *st, void *pt)
 {
 	if (__pa(pt) == __pa(kasan_early_shadow_pmd) ||
 	    (pgtable_l5_enabled() &&
 			__pa(pt) == __pa(kasan_early_shadow_p4d)) ||
 	    __pa(pt) == __pa(kasan_early_shadow_pud)) {
 		pgprotval_t prot = pte_flags(kasan_early_shadow_pte[0]);
-		note_page(m, st, __pgprot(prot), 0, 5);
+		note_page(st, __pgprot(prot), 0, 5);
 		return true;
 	}
 	return false;
 }
 #else
-static inline bool kasan_page_table(struct seq_file *m, struct pg_state *st,
-				void *pt)
+static inline bool kasan_page_table(struct pg_state *st, void *pt)
 {
 	return false;
 }
@@ -402,7 +402,7 @@ static inline bool kasan_page_table(struct seq_file *m, struct pg_state *st,
 
 #if PTRS_PER_PMD > 1
 
-static void walk_pmd_level(struct seq_file *m, struct pg_state *st, pud_t addr,
+static void walk_pmd_level(struct pg_state *st, pud_t addr,
 			   pgprotval_t eff_in, unsigned long P)
 {
 	int i;
@@ -416,27 +416,27 @@ static void walk_pmd_level(struct seq_file *m, struct pg_state *st, pud_t addr,
 			prot = pmd_flags(*start);
 			eff = effective_prot(eff_in, prot);
 			if (pmd_large(*start) || !pmd_present(*start)) {
-				note_page(m, st, __pgprot(prot), eff, 4);
-			} else if (!kasan_page_table(m, st, pmd_start)) {
-				walk_pte_level(m, st, *start, eff,
+				note_page(st, __pgprot(prot), eff, 4);
+			} else if (!kasan_page_table(st, pmd_start)) {
+				walk_pte_level(st, *start, eff,
 					       P + i * PMD_LEVEL_MULT);
 			}
 		} else
-			note_page(m, st, __pgprot(0), 0, 4);
+			note_page(st, __pgprot(0), 0, 4);
 		start++;
 	}
 }
 
 #else
-#define walk_pmd_level(m,s,a,e,p) walk_pte_level(m,s,__pmd(pud_val(a)),e,p)
+#define walk_pmd_level(s,a,e,p) walk_pte_level(s,__pmd(pud_val(a)),e,p)
 #define pud_large(a) pmd_large(__pmd(pud_val(a)))
 #define pud_none(a)  pmd_none(__pmd(pud_val(a)))
 #endif
 
 #if PTRS_PER_PUD > 1
 
-static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr,
-			   pgprotval_t eff_in, unsigned long P)
+static void walk_pud_level(struct pg_state *st, p4d_t addr, pgprotval_t eff_in,
+			   unsigned long P)
 {
 	int i;
 	pud_t *start, *pud_start;
@@ -450,33 +450,33 @@ static void walk_pud_level(struct seq_file *m, struct pg_state *st, p4d_t addr,
 			prot = pud_flags(*start);
 			eff = effective_prot(eff_in, prot);
 			if (pud_large(*start) || !pud_present(*start)) {
-				note_page(m, st, __pgprot(prot), eff, 3);
-			} else if (!kasan_page_table(m, st, pud_start)) {
-				walk_pmd_level(m, st, *start, eff,
+				note_page(st, __pgprot(prot), eff, 3);
+			} else if (!kasan_page_table(st, pud_start)) {
+				walk_pmd_level(st, *start, eff,
 					       P + i * PUD_LEVEL_MULT);
 			}
 		} else
-			note_page(m, st, __pgprot(0), 0, 3);
+			note_page(st, __pgprot(0), 0, 3);
 
 		start++;
 	}
 }
 
 #else
-#define walk_pud_level(m,s,a,e,p) walk_pmd_level(m,s,__pud(p4d_val(a)),e,p)
+#define walk_pud_level(s,a,e,p) walk_pmd_level(s,__pud(p4d_val(a)),e,p)
 #define p4d_large(a) pud_large(__pud(p4d_val(a)))
 #define p4d_none(a)  pud_none(__pud(p4d_val(a)))
 #endif
 
-static void walk_p4d_level(struct seq_file *m, struct pg_state *st, pgd_t addr,
-			   pgprotval_t eff_in, unsigned long P)
+static void walk_p4d_level(struct pg_state *st, pgd_t addr, pgprotval_t eff_in,
+			   unsigned long P)
 {
 	int i;
 	p4d_t *start, *p4d_start;
 	pgprotval_t prot, eff;
 
 	if (PTRS_PER_P4D == 1)
-		return walk_pud_level(m, st, __p4d(pgd_val(addr)), eff_in, P);
+		return walk_pud_level(st, __p4d(pgd_val(addr)), eff_in, P);
 
 	p4d_start = start = (p4d_t *)pgd_page_vaddr(addr);
 
@@ -486,13 +486,13 @@ static void walk_p4d_level(struct seq_file *m, struct pg_state *st, pgd_t addr,
 			prot = p4d_flags(*start);
 			eff = effective_prot(eff_in, prot);
 			if (p4d_large(*start) || !p4d_present(*start)) {
-				note_page(m, st, __pgprot(prot), eff, 2);
-			} else if (!kasan_page_table(m, st, p4d_start)) {
-				walk_pud_level(m, st, *start, eff,
+				note_page(st, __pgprot(prot), eff, 2);
+			} else if (!kasan_page_table(st, p4d_start)) {
+				walk_pud_level(st, *start, eff,
 					       P + i * P4D_LEVEL_MULT);
 			}
 		} else
-			note_page(m, st, __pgprot(0), 0, 2);
+			note_page(st, __pgprot(0), 0, 2);
 
 		start++;
 	}
@@ -529,6 +529,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 	}
 
 	st.check_wx = checkwx;
+	st.seq = m;
 	if (checkwx)
 		st.wx_pages = 0;
 
@@ -542,13 +543,13 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 			eff = prot;
 #endif
 			if (pgd_large(*start) || !pgd_present(*start)) {
-				note_page(m, &st, __pgprot(prot), eff, 1);
+				note_page(&st, __pgprot(prot), eff, 1);
 			} else {
-				walk_p4d_level(m, &st, *start, eff,
+				walk_p4d_level(&st, *start, eff,
 					       i * PGD_LEVEL_MULT);
 			}
 		} else
-			note_page(m, &st, __pgprot(0), 0, 1);
+			note_page(&st, __pgprot(0), 0, 1);
 
 		cond_resched();
 		start++;
@@ -556,7 +557,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 
 	/* Flush out the last page */
 	st.current_address = normalize_addr(PTRS_PER_PGD*PGD_LEVEL_MULT);
-	note_page(m, &st, __pgprot(0), 0, 0);
+	note_page(&st, __pgprot(0), 0, 0);
 	if (!checkwx)
 		return;
 	if (st.wx_pages)
-- 
2.20.1



* [PATCH v16 18/25] x86: mm+efi: Convert ptdump_walk_pgd_level() to take a mm_struct
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  -1 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

To enable x86 to use the generic walk_page_range() function, the
callers of ptdump_walk_pgd_level() need to pass an mm_struct rather
than the raw pgd_t pointer. Luckily since commit 7e904a91bf60
("efi: Use efi_mm in x86 as well as ARM") we now have an mm_struct
for EFI on x86.
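
After this change a caller passes the mm rather than the raw pgd; for
example the EFI dump path (from the hunk below) becomes:

  /* Dump the EFI page tables via their mm_struct. */
  ptdump_walk_pgd_level(NULL, &efi_mm);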

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/x86/include/asm/pgtable.h | 2 +-
 arch/x86/mm/dump_pagetables.c  | 4 ++--
 arch/x86/platform/efi/efi_32.c | 2 +-
 arch/x86/platform/efi/efi_64.c | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 8091a1c62596..5a30bf58156e 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -29,7 +29,7 @@
 extern pgd_t early_top_pgt[PTRS_PER_PGD];
 int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
 
-void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
+void ptdump_walk_pgd_level(struct seq_file *m, struct mm_struct *mm);
 void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user);
 void ptdump_walk_pgd_level_checkwx(void);
 void ptdump_walk_user_pgd_level_checkwx(void);
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 4dc6f4df40af..24fe76325b31 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -567,9 +567,9 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 		pr_info("x86/mm: Checked W+X mappings: passed, no W+X pages found.\n");
 }
 
-void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd)
+void ptdump_walk_pgd_level(struct seq_file *m, struct mm_struct *mm)
 {
-	ptdump_walk_pgd_level_core(m, pgd, false, true);
+	ptdump_walk_pgd_level_core(m, mm->pgd, false, true);
 }
 
 void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user)
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index 9959657127f4..1616074075c3 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -49,7 +49,7 @@ void efi_sync_low_kernel_mappings(void) {}
 void __init efi_dump_pagetable(void)
 {
 #ifdef CONFIG_EFI_PGT_DUMP
-	ptdump_walk_pgd_level(NULL, swapper_pg_dir);
+	ptdump_walk_pgd_level(NULL, &init_mm);
 #endif
 }
 
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 08ce8177c3af..3cb63cd369d6 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -614,9 +614,9 @@ void __init efi_dump_pagetable(void)
 {
 #ifdef CONFIG_EFI_PGT_DUMP
 	if (efi_enabled(EFI_OLD_MEMMAP))
-		ptdump_walk_pgd_level(NULL, swapper_pg_dir);
+		ptdump_walk_pgd_level(NULL, &init_mm);
 	else
-		ptdump_walk_pgd_level(NULL, efi_mm.pgd);
+		ptdump_walk_pgd_level(NULL, &efi_mm);
 #endif
 }
 
-- 
2.20.1



* [PATCH v16 19/25] x86: mm: Convert ptdump_walk_pgd_level_debugfs() to take an mm_struct
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  -1 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

To enable x86 to use the generic walk_page_range() function, the
callers of ptdump_walk_pgd_level_debugfs() need to pass in the mm_struct.

This means that ptdump_walk_pgd_level_core() is now always passed a
valid pgd, so drop the support for pgd==NULL.
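
With the mm in hand, the debugfs entry point derives the user copy of
the pgd itself when page table isolation is in use; condensed from the
hunks below (the final call to ptdump_walk_pgd_level_core() is assumed
from the surrounding code, which this patch does not show in full):

  void ptdump_walk_pgd_level_debugfs(struct seq_file *m,
                                     struct mm_struct *mm, bool user)
  {
          pgd_t *pgd = mm->pgd;
  #ifdef CONFIG_PAGE_TABLE_ISOLATION
          if (user && boot_cpu_has(X86_FEATURE_PTI))
                  pgd = kernel_to_user_pgdp(pgd);
  #endif
          ptdump_walk_pgd_level_core(m, pgd, false, false);
  }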

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/x86/include/asm/pgtable.h |  3 ++-
 arch/x86/mm/debug_pagetables.c |  8 ++++----
 arch/x86/mm/dump_pagetables.c  | 14 ++++++--------
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5a30bf58156e..7e118660bbd9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -30,7 +30,8 @@ extern pgd_t early_top_pgt[PTRS_PER_PGD];
 int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
 
 void ptdump_walk_pgd_level(struct seq_file *m, struct mm_struct *mm);
-void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user);
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, struct mm_struct *mm,
+				   bool user);
 void ptdump_walk_pgd_level_checkwx(void);
 void ptdump_walk_user_pgd_level_checkwx(void);
 
diff --git a/arch/x86/mm/debug_pagetables.c b/arch/x86/mm/debug_pagetables.c
index 39001a401eff..d0efec713c6c 100644
--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -7,7 +7,7 @@
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
-	ptdump_walk_pgd_level_debugfs(m, NULL, false);
+	ptdump_walk_pgd_level_debugfs(m, &init_mm, false);
 	return 0;
 }
 
@@ -17,7 +17,7 @@ static int ptdump_curknl_show(struct seq_file *m, void *v)
 {
 	if (current->mm->pgd) {
 		down_read(&current->mm->mmap_sem);
-		ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, false);
+		ptdump_walk_pgd_level_debugfs(m, current->mm, false);
 		up_read(&current->mm->mmap_sem);
 	}
 	return 0;
@@ -30,7 +30,7 @@ static int ptdump_curusr_show(struct seq_file *m, void *v)
 {
 	if (current->mm->pgd) {
 		down_read(&current->mm->mmap_sem);
-		ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, true);
+		ptdump_walk_pgd_level_debugfs(m, current->mm, true);
 		up_read(&current->mm->mmap_sem);
 	}
 	return 0;
@@ -43,7 +43,7 @@ DEFINE_SHOW_ATTRIBUTE(ptdump_curusr);
 static int ptdump_efi_show(struct seq_file *m, void *v)
 {
 	if (efi_mm.pgd)
-		ptdump_walk_pgd_level_debugfs(m, efi_mm.pgd, false);
+		ptdump_walk_pgd_level_debugfs(m, &efi_mm, false);
 	return 0;
 }
 
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 24fe76325b31..2f5f32f21f81 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -518,16 +518,12 @@ static inline bool is_hypervisor_range(int idx)
 static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 				       bool checkwx, bool dmesg)
 {
-	pgd_t *start = INIT_PGD;
+	pgd_t *start = pgd;
 	pgprotval_t prot, eff;
 	int i;
 	struct pg_state st = {};
 
-	if (pgd) {
-		start = pgd;
-		st.to_dmesg = dmesg;
-	}
-
+	st.to_dmesg = dmesg;
 	st.check_wx = checkwx;
 	st.seq = m;
 	if (checkwx)
@@ -572,8 +568,10 @@ void ptdump_walk_pgd_level(struct seq_file *m, struct mm_struct *mm)
 	ptdump_walk_pgd_level_core(m, mm->pgd, false, true);
 }
 
-void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user)
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, struct mm_struct *mm,
+				   bool user)
 {
+	pgd_t *pgd = mm->pgd;
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 	if (user && boot_cpu_has(X86_FEATURE_PTI))
 		pgd = kernel_to_user_pgdp(pgd);
@@ -599,7 +597,7 @@ void ptdump_walk_user_pgd_level_checkwx(void)
 
 void ptdump_walk_pgd_level_checkwx(void)
 {
-	ptdump_walk_pgd_level_core(NULL, NULL, true, false);
+	ptdump_walk_pgd_level_core(NULL, INIT_PGD, true, false);
 }
 
 static int __init pt_dump_init(void)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v16 19/25] x86: mm: Convert ptdump_walk_pgd_level_debugfs() to take an mm_struct
@ 2019-12-06 13:53   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Mark Rutland, x86, Arnd Bergmann, Ard Biesheuvel, Peter Zijlstra,
	Catalin Marinas, Dave Hansen, linux-kernel, Steven Price,
	Jérôme Glisse, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, H. Peter Anvin, James Morse, Thomas Gleixner,
	Will Deacon, linux-arm-kernel, Liang, Kan

To enable x86 to use the generic walk_page_range() function, the
callers of ptdump_walk_pgd_level_debugfs() need to pass in the mm_struct.

This means that ptdump_walk_pgd_level_core() is now always passed a
valid pgd, so drop the support for pgd==NULL.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/x86/include/asm/pgtable.h |  3 ++-
 arch/x86/mm/debug_pagetables.c |  8 ++++----
 arch/x86/mm/dump_pagetables.c  | 14 ++++++--------
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5a30bf58156e..7e118660bbd9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -30,7 +30,8 @@ extern pgd_t early_top_pgt[PTRS_PER_PGD];
 int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
 
 void ptdump_walk_pgd_level(struct seq_file *m, struct mm_struct *mm);
-void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user);
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, struct mm_struct *mm,
+				   bool user);
 void ptdump_walk_pgd_level_checkwx(void);
 void ptdump_walk_user_pgd_level_checkwx(void);
 
diff --git a/arch/x86/mm/debug_pagetables.c b/arch/x86/mm/debug_pagetables.c
index 39001a401eff..d0efec713c6c 100644
--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -7,7 +7,7 @@
 
 static int ptdump_show(struct seq_file *m, void *v)
 {
-	ptdump_walk_pgd_level_debugfs(m, NULL, false);
+	ptdump_walk_pgd_level_debugfs(m, &init_mm, false);
 	return 0;
 }
 
@@ -17,7 +17,7 @@ static int ptdump_curknl_show(struct seq_file *m, void *v)
 {
 	if (current->mm->pgd) {
 		down_read(&current->mm->mmap_sem);
-		ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, false);
+		ptdump_walk_pgd_level_debugfs(m, current->mm, false);
 		up_read(&current->mm->mmap_sem);
 	}
 	return 0;
@@ -30,7 +30,7 @@ static int ptdump_curusr_show(struct seq_file *m, void *v)
 {
 	if (current->mm->pgd) {
 		down_read(&current->mm->mmap_sem);
-		ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, true);
+		ptdump_walk_pgd_level_debugfs(m, current->mm, true);
 		up_read(&current->mm->mmap_sem);
 	}
 	return 0;
@@ -43,7 +43,7 @@ DEFINE_SHOW_ATTRIBUTE(ptdump_curusr);
 static int ptdump_efi_show(struct seq_file *m, void *v)
 {
 	if (efi_mm.pgd)
-		ptdump_walk_pgd_level_debugfs(m, efi_mm.pgd, false);
+		ptdump_walk_pgd_level_debugfs(m, &efi_mm, false);
 	return 0;
 }
 
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 24fe76325b31..2f5f32f21f81 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -518,16 +518,12 @@ static inline bool is_hypervisor_range(int idx)
 static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 				       bool checkwx, bool dmesg)
 {
-	pgd_t *start = INIT_PGD;
+	pgd_t *start = pgd;
 	pgprotval_t prot, eff;
 	int i;
 	struct pg_state st = {};
 
-	if (pgd) {
-		start = pgd;
-		st.to_dmesg = dmesg;
-	}
-
+	st.to_dmesg = dmesg;
 	st.check_wx = checkwx;
 	st.seq = m;
 	if (checkwx)
@@ -572,8 +568,10 @@ void ptdump_walk_pgd_level(struct seq_file *m, struct mm_struct *mm)
 	ptdump_walk_pgd_level_core(m, mm->pgd, false, true);
 }
 
-void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user)
+void ptdump_walk_pgd_level_debugfs(struct seq_file *m, struct mm_struct *mm,
+				   bool user)
 {
+	pgd_t *pgd = mm->pgd;
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 	if (user && boot_cpu_has(X86_FEATURE_PTI))
 		pgd = kernel_to_user_pgdp(pgd);
@@ -599,7 +597,7 @@ void ptdump_walk_user_pgd_level_checkwx(void)
 
 void ptdump_walk_pgd_level_checkwx(void)
 {
-	ptdump_walk_pgd_level_core(NULL, NULL, true, false);
+	ptdump_walk_pgd_level_core(NULL, INIT_PGD, true, false);
 }
 
 static int __init pt_dump_init(void)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v16 20/25] x86: mm: Convert ptdump_walk_pgd_level_core() to take an mm_struct
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  -1 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

An mm_struct is needed to enable x86 to make use of the generic
walk_page_range() function.

In the case of walking the user page tables (when
CONFIG_PAGE_TABLE_ISOLATION is enabled), it is necessary to create a
fake_mm structure because there isn't an mm_struct with a pointer
to the pgd of the user page tables. This fake_mm structure is
initialised with the minimum necessary for the generic page walk code.
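
Concretely, the fake mm only needs a pgd and an initialised mmap_sem
before the walker can take its lock (as in the hunk below):

	struct mm_struct fake_mm = {
		.pgd = kernel_to_user_pgdp(mm->pgd)
	};
	init_rwsem(&fake_mm.mmap_sem);
	ptdump_walk_pgd_level_core(m, &fake_mm, checkwx, dmesg);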

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/x86/mm/dump_pagetables.c | 36 ++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 2f5f32f21f81..3632be89ec99 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -107,8 +107,6 @@ static struct addr_marker address_markers[] = {
 	[END_OF_SPACE_NR]	= { -1,			NULL }
 };
 
-#define INIT_PGD	((pgd_t *) &init_top_pgt)
-
 #else /* CONFIG_X86_64 */
 
 enum address_markers_idx {
@@ -143,8 +141,6 @@ static struct addr_marker address_markers[] = {
 	[END_OF_SPACE_NR]	= { -1,			NULL }
 };
 
-#define INIT_PGD	(swapper_pg_dir)
-
 #endif /* !CONFIG_X86_64 */
 
 /* Multipliers for offsets within the PTEs */
@@ -515,10 +511,10 @@ static inline bool is_hypervisor_range(int idx)
 #endif
 }
 
-static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
+static void ptdump_walk_pgd_level_core(struct seq_file *m, struct mm_struct *mm,
 				       bool checkwx, bool dmesg)
 {
-	pgd_t *start = pgd;
+	pgd_t *start = mm->pgd;
 	pgprotval_t prot, eff;
 	int i;
 	struct pg_state st = {};
@@ -565,39 +561,49 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 
 void ptdump_walk_pgd_level(struct seq_file *m, struct mm_struct *mm)
 {
-	ptdump_walk_pgd_level_core(m, mm->pgd, false, true);
+	ptdump_walk_pgd_level_core(m, mm, false, true);
 }
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+static void ptdump_walk_pgd_level_user_core(struct seq_file *m,
+					    struct mm_struct *mm,
+					    bool checkwx, bool dmesg)
+{
+	struct mm_struct fake_mm = {
+		.pgd = kernel_to_user_pgdp(mm->pgd)
+	};
+	init_rwsem(&fake_mm.mmap_sem);
+	ptdump_walk_pgd_level_core(m, &fake_mm, checkwx, dmesg);
+}
+#endif
+
 void ptdump_walk_pgd_level_debugfs(struct seq_file *m, struct mm_struct *mm,
 				   bool user)
 {
-	pgd_t *pgd = mm->pgd;
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 	if (user && boot_cpu_has(X86_FEATURE_PTI))
-		pgd = kernel_to_user_pgdp(pgd);
+		ptdump_walk_pgd_level_user_core(m, mm, false, false);
+	else
 #endif
-	ptdump_walk_pgd_level_core(m, pgd, false, false);
+		ptdump_walk_pgd_level_core(m, mm, false, false);
 }
 EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs);
 
 void ptdump_walk_user_pgd_level_checkwx(void)
 {
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-	pgd_t *pgd = INIT_PGD;
-
 	if (!(__supported_pte_mask & _PAGE_NX) ||
 	    !boot_cpu_has(X86_FEATURE_PTI))
 		return;
 
 	pr_info("x86/mm: Checking user space page tables\n");
-	pgd = kernel_to_user_pgdp(pgd);
-	ptdump_walk_pgd_level_core(NULL, pgd, true, false);
+	ptdump_walk_pgd_level_user_core(NULL, &init_mm, true, false);
 #endif
 }
 
 void ptdump_walk_pgd_level_checkwx(void)
 {
-	ptdump_walk_pgd_level_core(NULL, INIT_PGD, true, false);
+	ptdump_walk_pgd_level_core(NULL, &init_mm, true, false);
 }
 
 static int __init pt_dump_init(void)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v16 21/25] mm: Add generic ptdump
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  -1 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

Add a generic version of page table dumping that architectures can
opt in to.
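
A rough sketch of how an architecture opts in follows; the example_*
names are illustrative, only ptdump_state, ptdump_range and
ptdump_walk_pgd() come from this patch:

	#include <linux/ptdump.h>

	static void example_note_page(struct ptdump_state *st, unsigned long addr,
				      int level, unsigned long val)
	{
		/* coalesce contiguous entries and print; levels run pgd=1..pte=5 */
	}

	static const struct ptdump_range example_ranges[] = {
		{PAGE_OFFSET, ~0UL},
		{0, 0}	/* start == end terminates the walk */
	};

	void example_ptdump(struct mm_struct *mm)
	{
		struct ptdump_state st = {
			.note_page	= example_note_page,
			.range		= example_ranges,
		};

		ptdump_walk_pgd(&st, mm);
	}

ptdump_walk_pgd() takes mmap_sem, walks each range with
walk_page_range_novma() and finally calls note_page(st, 0, 0, 0) so
the implementation can flush its last accumulated entry.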

Signed-off-by: Steven Price <steven.price@arm.com>
---
 include/linux/ptdump.h |  21 ++++++
 mm/Kconfig.debug       |  21 ++++++
 mm/Makefile            |   1 +
 mm/ptdump.c            | 151 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 194 insertions(+)
 create mode 100644 include/linux/ptdump.h
 create mode 100644 mm/ptdump.c

diff --git a/include/linux/ptdump.h b/include/linux/ptdump.h
new file mode 100644
index 000000000000..a0fb8dd2be97
--- /dev/null
+++ b/include/linux/ptdump.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_PTDUMP_H
+#define _LINUX_PTDUMP_H
+
+#include <linux/mm_types.h>
+
+struct ptdump_range {
+	unsigned long start;
+	unsigned long end;
+};
+
+struct ptdump_state {
+	void (*note_page)(struct ptdump_state *st, unsigned long addr,
+			  int level, unsigned long val);
+	const struct ptdump_range *range;
+};
+
+void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm);
+
+#endif /* _LINUX_PTDUMP_H */
diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
index 327b3ebf23bf..0271b22e063f 100644
--- a/mm/Kconfig.debug
+++ b/mm/Kconfig.debug
@@ -117,3 +117,24 @@ config DEBUG_RODATA_TEST
     depends on STRICT_KERNEL_RWX
     ---help---
       This option enables a testcase for the setting rodata read-only.
+
+config GENERIC_PTDUMP
+	bool
+
+config PTDUMP_CORE
+	bool
+
+config PTDUMP_DEBUGFS
+	bool "Export kernel pagetable layout to userspace via debugfs"
+	depends on DEBUG_KERNEL
+	depends on DEBUG_FS
+	depends on GENERIC_PTDUMP
+	select PTDUMP_CORE
+	help
+	  Say Y here if you want to show the kernel pagetable layout in a
+	  debugfs file. This information is only useful for kernel developers
+	  who are working in architecture specific areas of the kernel.
+	  It is probably not a good idea to enable this feature in a production
+	  kernel.
+
+	  If in doubt, say N.
diff --git a/mm/Makefile b/mm/Makefile
index 1937cc251883..a8eff811f6c4 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -108,3 +108,4 @@ obj-$(CONFIG_ZONE_DEVICE) += memremap.o
 obj-$(CONFIG_HMM_MIRROR) += hmm.o
 obj-$(CONFIG_MEMFD_CREATE) += memfd.o
 obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o
+obj-$(CONFIG_PTDUMP_CORE) += ptdump.o
diff --git a/mm/ptdump.c b/mm/ptdump.c
new file mode 100644
index 000000000000..9357ab3ecc88
--- /dev/null
+++ b/mm/ptdump.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/pagewalk.h>
+#include <linux/ptdump.h>
+#include <linux/kasan.h>
+
+static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
+			    unsigned long next, struct mm_walk *walk)
+{
+	struct ptdump_state *st = walk->private;
+	pgd_t val = READ_ONCE(*pgd);
+
+	if (pgd_leaf(val))
+		st->note_page(st, addr, 1, pgd_val(val));
+
+	return 0;
+}
+
+static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
+			    unsigned long next, struct mm_walk *walk)
+{
+	struct ptdump_state *st = walk->private;
+	p4d_t val = READ_ONCE(*p4d);
+
+	if (p4d_leaf(val))
+		st->note_page(st, addr, 2, p4d_val(val));
+
+	return 0;
+}
+
+static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
+			    unsigned long next, struct mm_walk *walk)
+{
+	struct ptdump_state *st = walk->private;
+	pud_t val = READ_ONCE(*pud);
+
+	if (pud_leaf(val))
+		st->note_page(st, addr, 3, pud_val(val));
+
+	return 0;
+}
+
+static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
+			    unsigned long next, struct mm_walk *walk)
+{
+	struct ptdump_state *st = walk->private;
+	pmd_t val = READ_ONCE(*pmd);
+
+	if (pmd_leaf(val))
+		st->note_page(st, addr, 4, pmd_val(val));
+
+	return 0;
+}
+
+static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
+			    unsigned long next, struct mm_walk *walk)
+{
+	struct ptdump_state *st = walk->private;
+
+	st->note_page(st, addr, 5, pte_val(READ_ONCE(*pte)));
+
+	return 0;
+}
+
+#ifdef CONFIG_KASAN
+/*
+ * This is an optimization for KASAN=y case. Since all kasan page tables
+ * eventually point to the kasan_early_shadow_page we could call note_page()
+ * right away without walking through lower level page tables. This saves
+ * us dozens of seconds (minutes for 5-level config) while checking for
+ * W+X mapping or reading kernel_page_tables debugfs file.
+ */
+static inline int note_kasan_page_table(struct mm_walk *walk,
+					unsigned long addr)
+{
+	struct ptdump_state *st = walk->private;
+
+	st->note_page(st, addr, 5, pte_val(kasan_early_shadow_pte[0]));
+	return 1;
+}
+
+static int ptdump_test_p4d(unsigned long addr, unsigned long next,
+			   p4d_t *p4d, struct mm_walk *walk)
+{
+#if CONFIG_PGTABLE_LEVELS > 4
+	if (p4d == lm_alias(kasan_early_shadow_p4d))
+		return note_kasan_page_table(walk, addr);
+#endif
+	return 0;
+}
+
+static int ptdump_test_pud(unsigned long addr, unsigned long next,
+			   pud_t *pud, struct mm_walk *walk)
+{
+#if CONFIG_PGTABLE_LEVELS > 3
+	if (pud == lm_alias(kasan_early_shadow_pud))
+		return note_kasan_page_table(walk, addr);
+#endif
+	return 0;
+}
+
+static int ptdump_test_pmd(unsigned long addr, unsigned long next,
+			   pmd_t *pmd, struct mm_walk *walk)
+{
+#if CONFIG_PGTABLE_LEVELS > 2
+	if (pmd == lm_alias(kasan_early_shadow_pmd))
+		return note_kasan_page_table(walk, addr);
+#endif
+	return 0;
+}
+#endif /* CONFIG_KASAN */
+
+static int ptdump_hole(unsigned long addr, unsigned long next,
+		       int depth, struct mm_walk *walk)
+{
+	struct ptdump_state *st = walk->private;
+
+	st->note_page(st, addr, depth + 1, 0);
+
+	return 0;
+}
+
+static const struct mm_walk_ops ptdump_ops = {
+	.pgd_entry	= ptdump_pgd_entry,
+	.p4d_entry	= ptdump_p4d_entry,
+	.pud_entry	= ptdump_pud_entry,
+	.pmd_entry	= ptdump_pmd_entry,
+	.pte_entry	= ptdump_pte_entry,
+#ifdef CONFIG_KASAN
+	.test_p4d	= ptdump_test_p4d,
+	.test_pud	= ptdump_test_pud,
+	.test_pmd	= ptdump_test_pmd,
+#endif
+	.pte_hole	= ptdump_hole,
+};
+
+void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm)
+{
+	const struct ptdump_range *range = st->range;
+
+	down_read(&mm->mmap_sem);
+	while (range->start != range->end) {
+		walk_page_range_novma(mm, range->start, range->end,
+				      &ptdump_ops, st);
+		range++;
+	}
+	up_read(&mm->mmap_sem);
+
+	/* Flush out the last page */
+	st->note_page(st, 0, 0, 0);
+}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v16 22/25] x86: mm: Convert dump_pagetables to use walk_page_range
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  -1 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

Make use of the new functionality in walk_page_range() to remove the
arch-specific page walking code and use the generic code to walk the
page tables.

The effective permissions are passed down the chain using new fields
in struct pg_state.

The KASAN optimisation is implemented by including test_p?d callbacks
which can decide to skip an entire tree of entries.
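
The combining rule itself is a small pure function (effective_prot()
in the diff below): _PAGE_USER and _PAGE_RW must be present at every
level, while _PAGE_NX from any level wins:

	static inline pgprotval_t effective_prot(pgprotval_t prot1, pgprotval_t prot2)
	{
		return (prot1 & prot2 & (_PAGE_USER | _PAGE_RW)) |
		       ((prot1 | prot2) & _PAGE_NX);
	}

note_page() stores each level's result in prot_levels[] so that an
entry can combine with the effective permissions of its parent level.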

Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/x86/Kconfig              |   1 +
 arch/x86/Kconfig.debug        |  20 +--
 arch/x86/mm/Makefile          |   4 +-
 arch/x86/mm/dump_pagetables.c | 289 +++++++---------------------------
 4 files changed, 66 insertions(+), 248 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5e8949953660..8a9594817874 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -119,6 +119,7 @@ config X86
 	select GENERIC_IRQ_RESERVATION_MODE
 	select GENERIC_IRQ_SHOW
 	select GENERIC_PENDING_IRQ		if SMP
+	select GENERIC_PTDUMP
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index c4eab8ed33a3..2e74690b028a 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -62,26 +62,10 @@ config EARLY_PRINTK_USB_XDBC
 config MCSAFE_TEST
 	def_bool n
 
-config X86_PTDUMP_CORE
-	def_bool n
-
-config X86_PTDUMP
-	tristate "Export kernel pagetable layout to userspace via debugfs"
-	depends on DEBUG_KERNEL
-	select DEBUG_FS
-	select X86_PTDUMP_CORE
-	---help---
-	  Say Y here if you want to show the kernel pagetable layout in a
-	  debugfs file. This information is only useful for kernel developers
-	  who are working in architecture specific areas of the kernel.
-	  It is probably not a good idea to enable this feature in a production
-	  kernel.
-	  If in doubt, say "N"
-
 config EFI_PGT_DUMP
 	bool "Dump the EFI pagetable"
 	depends on EFI
-	select X86_PTDUMP_CORE
+	select PTDUMP_CORE
 	---help---
 	  Enable this if you want to dump the EFI page table before
 	  enabling virtual mode. This can be used to debug miscellaneous
@@ -90,7 +74,7 @@ config EFI_PGT_DUMP
 
 config DEBUG_WX
 	bool "Warn on W+X mappings at boot"
-	select X86_PTDUMP_CORE
+	select PTDUMP_CORE
 	---help---
 	  Generate a warning if any W+X mappings are found at boot.
 
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 3b89c201ac26..ad44713b2e1e 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -28,8 +28,8 @@ obj-$(CONFIG_X86_PAT)		+= pat_interval.o
 obj-$(CONFIG_X86_32)		+= pgtable_32.o iomap_32.o
 
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
-obj-$(CONFIG_X86_PTDUMP_CORE)	+= dump_pagetables.o
-obj-$(CONFIG_X86_PTDUMP)	+= debug_pagetables.o
+obj-$(CONFIG_PTDUMP_CORE)	+= dump_pagetables.o
+obj-$(CONFIG_PTDUMP_DEBUGFS)	+= debug_pagetables.o
 
 obj-$(CONFIG_HIGHMEM)		+= highmem_32.o
 
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 3632be89ec99..77a1332c6cd4 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -16,6 +16,7 @@
 #include <linux/seq_file.h>
 #include <linux/highmem.h>
 #include <linux/pci.h>
+#include <linux/ptdump.h>
 
 #include <asm/e820/types.h>
 #include <asm/pgtable.h>
@@ -26,11 +27,12 @@
  * when a "break" in the continuity is found.
  */
 struct pg_state {
+	struct ptdump_state ptdump;
 	int level;
-	pgprot_t current_prot;
+	pgprotval_t current_prot;
 	pgprotval_t effective_prot;
+	pgprotval_t prot_levels[5];
 	unsigned long start_address;
-	unsigned long current_address;
 	const struct addr_marker *marker;
 	unsigned long lines;
 	bool to_dmesg;
@@ -171,9 +173,8 @@ static struct addr_marker address_markers[] = {
 /*
  * Print a readable form of a pgprot_t to the seq_file
  */
-static void printk_prot(struct seq_file *m, pgprot_t prot, int level, bool dmsg)
+static void printk_prot(struct seq_file *m, pgprotval_t pr, int level, bool dmsg)
 {
-	pgprotval_t pr = pgprot_val(prot);
 	static const char * const level_name[] =
 		{ "cr3", "pgd", "p4d", "pud", "pmd", "pte" };
 
@@ -220,24 +221,11 @@ static void printk_prot(struct seq_file *m, pgprot_t prot, int level, bool dmsg)
 	pt_dump_cont_printf(m, dmsg, "%s\n", level_name[level]);
 }
 
-/*
- * On 64 bits, sign-extend the 48 bit address to 64 bit
- */
-static unsigned long normalize_addr(unsigned long u)
-{
-	int shift;
-	if (!IS_ENABLED(CONFIG_X86_64))
-		return u;
-
-	shift = 64 - (__VIRTUAL_MASK_SHIFT + 1);
-	return (signed long)(u << shift) >> shift;
-}
-
-static void note_wx(struct pg_state *st)
+static void note_wx(struct pg_state *st, unsigned long addr)
 {
 	unsigned long npages;
 
-	npages = (st->current_address - st->start_address) / PAGE_SIZE;
+	npages = (addr - st->start_address) / PAGE_SIZE;
 
 #ifdef CONFIG_PCI_BIOS
 	/*
@@ -245,7 +233,7 @@ static void note_wx(struct pg_state *st)
 	 * Inform about it, but avoid the warning.
 	 */
 	if (pcibios_enabled && st->start_address >= PAGE_OFFSET + BIOS_BEGIN &&
-	    st->current_address <= PAGE_OFFSET + BIOS_END) {
+	    addr <= PAGE_OFFSET + BIOS_END) {
 		pr_warn_once("x86/mm: PCI BIOS W+X mapping %lu pages\n", npages);
 		return;
 	}
@@ -257,25 +245,44 @@ static void note_wx(struct pg_state *st)
 		  (void *)st->start_address);
 }
 
+static inline pgprotval_t effective_prot(pgprotval_t prot1, pgprotval_t prot2)
+{
+	return (prot1 & prot2 & (_PAGE_USER | _PAGE_RW)) |
+	       ((prot1 | prot2) & _PAGE_NX);
+}
+
 /*
  * This function gets called on a break in a continuous series
  * of PTE entries; the next one is different so we need to
  * print what we collected so far.
  */
-static void note_page(struct pg_state *st, pgprot_t new_prot,
-		      pgprotval_t new_eff, int level)
+static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
+		      unsigned long val)
 {
-	pgprotval_t prot, cur, eff;
+	struct pg_state *st = container_of(pt_st, struct pg_state, ptdump);
+	pgprotval_t new_prot, new_eff;
+	pgprotval_t cur, eff;
 	static const char units[] = "BKMGTPE";
 	struct seq_file *m = st->seq;
 
+	new_prot = val & PTE_FLAGS_MASK;
+
+	if (level > 1) {
+		new_eff = effective_prot(st->prot_levels[level - 2],
+					 new_prot);
+	} else {
+		new_eff = new_prot;
+	}
+
+	if (level > 0)
+		st->prot_levels[level - 1] = new_eff;
+
 	/*
 	 * If we have a "break" in the series, we need to flush the state that
 	 * we have now. "break" is either changing perms, levels or
 	 * address space marker.
 	 */
-	prot = pgprot_val(new_prot);
-	cur = pgprot_val(st->current_prot);
+	cur = st->current_prot;
 	eff = st->effective_prot;
 
 	if (!st->level) {
@@ -287,14 +294,14 @@ static void note_page(struct pg_state *st, pgprot_t new_prot,
 		st->lines = 0;
 		pt_dump_seq_printf(m, st->to_dmesg, "---[ %s ]---\n",
 				   st->marker->name);
-	} else if (prot != cur || new_eff != eff || level != st->level ||
-		   st->current_address >= st->marker[1].start_address) {
+	} else if (new_prot != cur || new_eff != eff || level != st->level ||
+		   addr >= st->marker[1].start_address) {
 		const char *unit = units;
 		unsigned long delta;
 		int width = sizeof(unsigned long) * 2;
 
 		if (st->check_wx && (eff & _PAGE_RW) && !(eff & _PAGE_NX))
-			note_wx(st);
+			note_wx(st, addr);
 
 		/*
 		 * Now print the actual finished series
@@ -304,9 +311,9 @@ static void note_page(struct pg_state *st, pgprot_t new_prot,
 			pt_dump_seq_printf(m, st->to_dmesg,
 					   "0x%0*lx-0x%0*lx   ",
 					   width, st->start_address,
-					   width, st->current_address);
+					   width, addr);
 
-			delta = st->current_address - st->start_address;
+			delta = addr - st->start_address;
 			while (!(delta & 1023) && unit[1]) {
 				delta >>= 10;
 				unit++;
@@ -323,7 +330,7 @@ static void note_page(struct pg_state *st, pgprot_t new_prot,
 		 * such as the start of vmalloc space etc.
 		 * This helps in the interpretation.
 		 */
-		if (st->current_address >= st->marker[1].start_address) {
+		if (addr >= st->marker[1].start_address) {
 			if (st->marker->max_lines &&
 			    st->lines > st->marker->max_lines) {
 				unsigned long nskip =
@@ -339,217 +346,43 @@ static void note_page(struct pg_state *st, pgprot_t new_prot,
 					   st->marker->name);
 		}
 
-		st->start_address = st->current_address;
+		st->start_address = addr;
 		st->current_prot = new_prot;
 		st->effective_prot = new_eff;
 		st->level = level;
 	}
 }
 
-static inline pgprotval_t effective_prot(pgprotval_t prot1, pgprotval_t prot2)
-{
-	return (prot1 & prot2 & (_PAGE_USER | _PAGE_RW)) |
-	       ((prot1 | prot2) & _PAGE_NX);
-}
-
-static void walk_pte_level(struct pg_state *st, pmd_t addr, pgprotval_t eff_in,
-			   unsigned long P)
-{
-	int i;
-	pte_t *pte;
-	pgprotval_t prot, eff;
-
-	for (i = 0; i < PTRS_PER_PTE; i++) {
-		st->current_address = normalize_addr(P + i * PTE_LEVEL_MULT);
-		pte = pte_offset_map(&addr, st->current_address);
-		prot = pte_flags(*pte);
-		eff = effective_prot(eff_in, prot);
-		note_page(st, __pgprot(prot), eff, 5);
-		pte_unmap(pte);
-	}
-}
-#ifdef CONFIG_KASAN
-
-/*
- * This is an optimization for KASAN=y case. Since all kasan page tables
- * eventually point to the kasan_early_shadow_page we could call note_page()
- * right away without walking through lower level page tables. This saves
- * us dozens of seconds (minutes for 5-level config) while checking for
- * W+X mapping or reading kernel_page_tables debugfs file.
- */
-static inline bool kasan_page_table(struct pg_state *st, void *pt)
-{
-	if (__pa(pt) == __pa(kasan_early_shadow_pmd) ||
-	    (pgtable_l5_enabled() &&
-			__pa(pt) == __pa(kasan_early_shadow_p4d)) ||
-	    __pa(pt) == __pa(kasan_early_shadow_pud)) {
-		pgprotval_t prot = pte_flags(kasan_early_shadow_pte[0]);
-		note_page(st, __pgprot(prot), 0, 5);
-		return true;
-	}
-	return false;
-}
-#else
-static inline bool kasan_page_table(struct pg_state *st, void *pt)
-{
-	return false;
-}
-#endif
-
-#if PTRS_PER_PMD > 1
-
-static void walk_pmd_level(struct pg_state *st, pud_t addr,
-			   pgprotval_t eff_in, unsigned long P)
-{
-	int i;
-	pmd_t *start, *pmd_start;
-	pgprotval_t prot, eff;
-
-	pmd_start = start = (pmd_t *)pud_page_vaddr(addr);
-	for (i = 0; i < PTRS_PER_PMD; i++) {
-		st->current_address = normalize_addr(P + i * PMD_LEVEL_MULT);
-		if (!pmd_none(*start)) {
-			prot = pmd_flags(*start);
-			eff = effective_prot(eff_in, prot);
-			if (pmd_large(*start) || !pmd_present(*start)) {
-				note_page(st, __pgprot(prot), eff, 4);
-			} else if (!kasan_page_table(st, pmd_start)) {
-				walk_pte_level(st, *start, eff,
-					       P + i * PMD_LEVEL_MULT);
-			}
-		} else
-			note_page(st, __pgprot(0), 0, 4);
-		start++;
-	}
-}
-
-#else
-#define walk_pmd_level(s,a,e,p) walk_pte_level(s,__pmd(pud_val(a)),e,p)
-#define pud_large(a) pmd_large(__pmd(pud_val(a)))
-#define pud_none(a)  pmd_none(__pmd(pud_val(a)))
-#endif
-
-#if PTRS_PER_PUD > 1
-
-static void walk_pud_level(struct pg_state *st, p4d_t addr, pgprotval_t eff_in,
-			   unsigned long P)
-{
-	int i;
-	pud_t *start, *pud_start;
-	pgprotval_t prot, eff;
-
-	pud_start = start = (pud_t *)p4d_page_vaddr(addr);
-
-	for (i = 0; i < PTRS_PER_PUD; i++) {
-		st->current_address = normalize_addr(P + i * PUD_LEVEL_MULT);
-		if (!pud_none(*start)) {
-			prot = pud_flags(*start);
-			eff = effective_prot(eff_in, prot);
-			if (pud_large(*start) || !pud_present(*start)) {
-				note_page(st, __pgprot(prot), eff, 3);
-			} else if (!kasan_page_table(st, pud_start)) {
-				walk_pmd_level(st, *start, eff,
-					       P + i * PUD_LEVEL_MULT);
-			}
-		} else
-			note_page(st, __pgprot(0), 0, 3);
-
-		start++;
-	}
-}
-
-#else
-#define walk_pud_level(s,a,e,p) walk_pmd_level(s,__pud(p4d_val(a)),e,p)
-#define p4d_large(a) pud_large(__pud(p4d_val(a)))
-#define p4d_none(a)  pud_none(__pud(p4d_val(a)))
-#endif
-
-static void walk_p4d_level(struct pg_state *st, pgd_t addr, pgprotval_t eff_in,
-			   unsigned long P)
+static void ptdump_walk_pgd_level_core(struct seq_file *m, struct mm_struct *mm,
+				       bool checkwx, bool dmesg)
 {
-	int i;
-	p4d_t *start, *p4d_start;
-	pgprotval_t prot, eff;
-
-	if (PTRS_PER_P4D == 1)
-		return walk_pud_level(st, __p4d(pgd_val(addr)), eff_in, P);
-
-	p4d_start = start = (p4d_t *)pgd_page_vaddr(addr);
-
-	for (i = 0; i < PTRS_PER_P4D; i++) {
-		st->current_address = normalize_addr(P + i * P4D_LEVEL_MULT);
-		if (!p4d_none(*start)) {
-			prot = p4d_flags(*start);
-			eff = effective_prot(eff_in, prot);
-			if (p4d_large(*start) || !p4d_present(*start)) {
-				note_page(st, __pgprot(prot), eff, 2);
-			} else if (!kasan_page_table(st, p4d_start)) {
-				walk_pud_level(st, *start, eff,
-					       P + i * P4D_LEVEL_MULT);
-			}
-		} else
-			note_page(st, __pgprot(0), 0, 2);
-
-		start++;
-	}
-}
+	const struct ptdump_range ptdump_ranges[] = {
+#ifdef CONFIG_X86_64
 
-#define pgd_large(a) (pgtable_l5_enabled() ? pgd_large(a) : p4d_large(__p4d(pgd_val(a))))
-#define pgd_none(a)  (pgtable_l5_enabled() ? pgd_none(a) : p4d_none(__p4d(pgd_val(a))))
+#define normalize_addr_shift (64 - (__VIRTUAL_MASK_SHIFT + 1))
+#define normalize_addr(u) ((signed long)((u) << normalize_addr_shift) >> \
+			   normalize_addr_shift)
 
-static inline bool is_hypervisor_range(int idx)
-{
-#ifdef CONFIG_X86_64
-	/*
-	 * A hole in the beginning of kernel address space reserved
-	 * for a hypervisor.
-	 */
-	return	(idx >= pgd_index(GUARD_HOLE_BASE_ADDR)) &&
-		(idx <  pgd_index(GUARD_HOLE_END_ADDR));
+	{0, PTRS_PER_PGD * PGD_LEVEL_MULT / 2},
+	{normalize_addr(PTRS_PER_PGD * PGD_LEVEL_MULT / 2), ~0UL},
 #else
-	return false;
+	{0, ~0UL},
 #endif
-}
+	{0, 0}
+};
 
-static void ptdump_walk_pgd_level_core(struct seq_file *m, struct mm_struct *mm,
-				       bool checkwx, bool dmesg)
-{
-	pgd_t *start = mm->pgd;
-	pgprotval_t prot, eff;
-	int i;
-	struct pg_state st = {};
-
-	st.to_dmesg = dmesg;
-	st.check_wx = checkwx;
-	st.seq = m;
-	if (checkwx)
-		st.wx_pages = 0;
-
-	for (i = 0; i < PTRS_PER_PGD; i++) {
-		st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
-		if (!pgd_none(*start) && !is_hypervisor_range(i)) {
-			prot = pgd_flags(*start);
-#ifdef CONFIG_X86_PAE
-			eff = _PAGE_USER | _PAGE_RW;
-#else
-			eff = prot;
-#endif
-			if (pgd_large(*start) || !pgd_present(*start)) {
-				note_page(&st, __pgprot(prot), eff, 1);
-			} else {
-				walk_p4d_level(&st, *start, eff,
-					       i * PGD_LEVEL_MULT);
-			}
-		} else
-			note_page(&st, __pgprot(0), 0, 1);
+	struct pg_state st = {
+		.ptdump = {
+			.note_page	= note_page,
+			.range		= ptdump_ranges
+		},
+		.to_dmesg	= dmesg,
+		.check_wx	= checkwx,
+		.seq		= m
+	};
 
-		cond_resched();
-		start++;
-	}
+	ptdump_walk_pgd(&st.ptdump, mm);
 
-	/* Flush out the last page */
-	st.current_address = normalize_addr(PTRS_PER_PGD*PGD_LEVEL_MULT);
-	note_page(&st, __pgprot(0), 0, 0);
 	if (!checkwx)
 		return;
 	if (st.wx_pages)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v16 23/25] arm64: mm: Convert mm/dump.c to use walk_page_range()
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  -1 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

Now that walk_page_range() can walk kernel page tables, we can switch
the arm64 ptdump code over to using it, simplifying the code.
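
The conversion follows the same embedding pattern as the x86 patches:
struct ptdump_state sits inside arm64's pg_state and the callback
recovers the arch state with container_of(). In outline (the full
version is in the diff below):

	struct pg_state {
		struct ptdump_state ptdump;	/* generic callback + ranges */
		struct seq_file *seq;
		/* ... remaining arm64 tracking fields ... */
	};

	static void note_page(struct ptdump_state *pt_st, unsigned long addr,
			      int level, unsigned long val)
	{
		struct pg_state *st = container_of(pt_st, struct pg_state, ptdump);
		/* coalesce and print as before */
	}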

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/Kconfig                 |   1 +
 arch/arm64/Kconfig.debug           |  19 +----
 arch/arm64/include/asm/ptdump.h    |   8 +-
 arch/arm64/mm/Makefile             |   4 +-
 arch/arm64/mm/dump.c               | 117 ++++++++++-------------------
 arch/arm64/mm/mmu.c                |   4 +-
 arch/arm64/mm/ptdump_debugfs.c     |   2 +-
 drivers/firmware/efi/arm-runtime.c |   2 +-
 8 files changed, 50 insertions(+), 107 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index b1b4476ddb83..43aa1de727f4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -103,6 +103,7 @@ config ARM64
 	select GENERIC_IRQ_SHOW
 	select GENERIC_IRQ_SHOW_LEVEL
 	select GENERIC_PCI_IOMAP
+	select GENERIC_PTDUMP
 	select GENERIC_SCHED_CLOCK
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_STRNCPY_FROM_USER
diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
index cf09010d825f..1c906d932d6b 100644
--- a/arch/arm64/Kconfig.debug
+++ b/arch/arm64/Kconfig.debug
@@ -1,22 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-config ARM64_PTDUMP_CORE
-	def_bool n
-
-config ARM64_PTDUMP_DEBUGFS
-	bool "Export kernel pagetable layout to userspace via debugfs"
-	depends on DEBUG_KERNEL
-	select ARM64_PTDUMP_CORE
-	select DEBUG_FS
-        help
-	  Say Y here if you want to show the kernel pagetable layout in a
-	  debugfs file. This information is only useful for kernel developers
-	  who are working in architecture specific areas of the kernel.
-	  It is probably not a good idea to enable this feature in a production
-	  kernel.
-
-	  If in doubt, say N.
-
 config PID_IN_CONTEXTIDR
 	bool "Write the current PID to the CONTEXTIDR register"
 	help
@@ -42,7 +25,7 @@ config ARM64_RANDOMIZE_TEXT_OFFSET
 
 config DEBUG_WX
 	bool "Warn on W+X mappings at boot"
-	select ARM64_PTDUMP_CORE
+	select PTDUMP_CORE
 	---help---
 	  Generate a warning if any W+X mappings are found at boot.
 
diff --git a/arch/arm64/include/asm/ptdump.h b/arch/arm64/include/asm/ptdump.h
index 0b8e7269ec82..38187f74e089 100644
--- a/arch/arm64/include/asm/ptdump.h
+++ b/arch/arm64/include/asm/ptdump.h
@@ -5,7 +5,7 @@
 #ifndef __ASM_PTDUMP_H
 #define __ASM_PTDUMP_H
 
-#ifdef CONFIG_ARM64_PTDUMP_CORE
+#ifdef CONFIG_PTDUMP_CORE
 
 #include <linux/mm_types.h>
 #include <linux/seq_file.h>
@@ -21,15 +21,15 @@ struct ptdump_info {
 	unsigned long			base_addr;
 };
 
-void ptdump_walk_pgd(struct seq_file *s, struct ptdump_info *info);
-#ifdef CONFIG_ARM64_PTDUMP_DEBUGFS
+void ptdump_walk(struct seq_file *s, struct ptdump_info *info);
+#ifdef CONFIG_PTDUMP_DEBUGFS
 void ptdump_debugfs_register(struct ptdump_info *info, const char *name);
 #else
 static inline void ptdump_debugfs_register(struct ptdump_info *info,
 					   const char *name) { }
 #endif
 void ptdump_check_wx(void);
-#endif /* CONFIG_ARM64_PTDUMP_CORE */
+#endif /* CONFIG_PTDUMP_CORE */
 
 #ifdef CONFIG_DEBUG_WX
 #define debug_checkwx()	ptdump_check_wx()
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index 849c1df3d214..d91030f0ffee 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -4,8 +4,8 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   ioremap.o mmap.o pgd.o mmu.o \
 				   context.o proc.o pageattr.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
-obj-$(CONFIG_ARM64_PTDUMP_CORE)	+= dump.o
-obj-$(CONFIG_ARM64_PTDUMP_DEBUGFS)	+= ptdump_debugfs.o
+obj-$(CONFIG_PTDUMP_CORE)	+= dump.o
+obj-$(CONFIG_PTDUMP_DEBUGFS)	+= ptdump_debugfs.o
 obj-$(CONFIG_NUMA)		+= numa.o
 obj-$(CONFIG_DEBUG_VIRTUAL)	+= physaddr.o
 KASAN_SANITIZE_physaddr.o	+= n
diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
index 93f9f77582ae..9d9b740a86d2 100644
--- a/arch/arm64/mm/dump.c
+++ b/arch/arm64/mm/dump.c
@@ -15,6 +15,7 @@
 #include <linux/io.h>
 #include <linux/init.h>
 #include <linux/mm.h>
+#include <linux/ptdump.h>
 #include <linux/sched.h>
 #include <linux/seq_file.h>
 
@@ -75,10 +76,11 @@ static struct addr_marker address_markers[] = {
  * dumps out a description of the range.
  */
 struct pg_state {
+	struct ptdump_state ptdump;
 	struct seq_file *seq;
 	const struct addr_marker *marker;
 	unsigned long start_address;
-	unsigned level;
+	int level;
 	u64 current_prot;
 	bool check_wx;
 	unsigned long wx_pages;
@@ -178,6 +180,10 @@ static struct pg_level pg_level[] = {
 		.name	= "PGD",
 		.bits	= pte_bits,
 		.num	= ARRAY_SIZE(pte_bits),
+	}, { /* p4d */
+		.name	= "P4D",
+		.bits	= pte_bits,
+		.num	= ARRAY_SIZE(pte_bits),
 	}, { /* pud */
 		.name	= (CONFIG_PGTABLE_LEVELS > 3) ? "PUD" : "PGD",
 		.bits	= pte_bits,
@@ -240,11 +246,15 @@ static void note_prot_wx(struct pg_state *st, unsigned long addr)
 	st->wx_pages += (addr - st->start_address) / PAGE_SIZE;
 }
 
-static void note_page(struct pg_state *st, unsigned long addr, unsigned level,
-				u64 val)
+static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
+		      unsigned long val)
 {
+	struct pg_state *st = container_of(pt_st, struct pg_state, ptdump);
 	static const char units[] = "KMGTPE";
-	u64 prot = val & pg_level[level].mask;
+	u64 prot = 0;
+
+	if (level >= 0)
+		prot = val & pg_level[level].mask;
 
 	if (!st->level) {
 		st->level = level;
@@ -292,85 +302,27 @@ static void note_page(struct pg_state *st, unsigned long addr, unsigned level,
 
 }
 
-static void walk_pte(struct pg_state *st, pmd_t *pmdp, unsigned long start,
-		     unsigned long end)
-{
-	unsigned long addr = start;
-	pte_t *ptep = pte_offset_kernel(pmdp, start);
-
-	do {
-		note_page(st, addr, 4, READ_ONCE(pte_val(*ptep)));
-	} while (ptep++, addr += PAGE_SIZE, addr != end);
-}
-
-static void walk_pmd(struct pg_state *st, pud_t *pudp, unsigned long start,
-		     unsigned long end)
-{
-	unsigned long next, addr = start;
-	pmd_t *pmdp = pmd_offset(pudp, start);
-
-	do {
-		pmd_t pmd = READ_ONCE(*pmdp);
-		next = pmd_addr_end(addr, end);
-
-		if (pmd_none(pmd) || pmd_sect(pmd)) {
-			note_page(st, addr, 3, pmd_val(pmd));
-		} else {
-			BUG_ON(pmd_bad(pmd));
-			walk_pte(st, pmdp, addr, next);
-		}
-	} while (pmdp++, addr = next, addr != end);
-}
-
-static void walk_pud(struct pg_state *st, pgd_t *pgdp, unsigned long start,
-		     unsigned long end)
+void ptdump_walk(struct seq_file *s, struct ptdump_info *info)
 {
-	unsigned long next, addr = start;
-	pud_t *pudp = pud_offset(pgdp, start);
-
-	do {
-		pud_t pud = READ_ONCE(*pudp);
-		next = pud_addr_end(addr, end);
-
-		if (pud_none(pud) || pud_sect(pud)) {
-			note_page(st, addr, 2, pud_val(pud));
-		} else {
-			BUG_ON(pud_bad(pud));
-			walk_pmd(st, pudp, addr, next);
-		}
-	} while (pudp++, addr = next, addr != end);
-}
+	unsigned long end = ~0UL;
+	struct pg_state st;
 
-static void walk_pgd(struct pg_state *st, struct mm_struct *mm,
-		     unsigned long start)
-{
-	unsigned long end = (start < TASK_SIZE_64) ? TASK_SIZE_64 : 0;
-	unsigned long next, addr = start;
-	pgd_t *pgdp = pgd_offset(mm, start);
-
-	do {
-		pgd_t pgd = READ_ONCE(*pgdp);
-		next = pgd_addr_end(addr, end);
-
-		if (pgd_none(pgd)) {
-			note_page(st, addr, 1, pgd_val(pgd));
-		} else {
-			BUG_ON(pgd_bad(pgd));
-			walk_pud(st, pgdp, addr, next);
-		}
-	} while (pgdp++, addr = next, addr != end);
-}
+	if (info->base_addr < TASK_SIZE_64)
+		end = TASK_SIZE_64;
 
-void ptdump_walk_pgd(struct seq_file *m, struct ptdump_info *info)
-{
-	struct pg_state st = {
-		.seq = m,
+	st = (struct pg_state){
+		.seq = s,
 		.marker = info->markers,
+		.ptdump = {
+			.note_page = note_page,
+			.range = (struct ptdump_range[]){
+				{info->base_addr, end},
+				{0, 0}
+			}
+		}
 	};
 
-	walk_pgd(&st, info->mm, info->base_addr);
-
-	note_page(&st, 0, 0, 0);
+	ptdump_walk_pgd(&st.ptdump, info->mm);
 }
 
 static void ptdump_initialize(void)
@@ -398,10 +350,17 @@ void ptdump_check_wx(void)
 			{ -1, NULL},
 		},
 		.check_wx = true,
+		.ptdump = {
+			.note_page = note_page,
+			.range = (struct ptdump_range[]) {
+				{PAGE_OFFSET, ~0UL},
+				{0, 0}
+			}
+		}
 	};
 
-	walk_pgd(&st, &init_mm, PAGE_OFFSET);
-	note_page(&st, 0, 0, 0);
+	ptdump_walk_pgd(&st.ptdump, &init_mm);
+
 	if (st.wx_pages || st.uxn_pages)
 		pr_warn("Checked W+X mappings: FAILED, %lu W+X pages found, %lu non-UXN pages found\n",
 			st.wx_pages, st.uxn_pages);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 5a3b15a14a7f..36e8f4f74433 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -943,13 +943,13 @@ int __init arch_ioremap_pud_supported(void)
 	 * SW table walks can't handle removal of intermediate entries.
 	 */
 	return IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
-	       !IS_ENABLED(CONFIG_ARM64_PTDUMP_DEBUGFS);
+	       !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }
 
 int __init arch_ioremap_pmd_supported(void)
 {
 	/* See arch_ioremap_pud_supported() */
-	return !IS_ENABLED(CONFIG_ARM64_PTDUMP_DEBUGFS);
+	return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }
 
 int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot)
diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
index 064163f25592..1f2eae3e988b 100644
--- a/arch/arm64/mm/ptdump_debugfs.c
+++ b/arch/arm64/mm/ptdump_debugfs.c
@@ -7,7 +7,7 @@
 static int ptdump_show(struct seq_file *m, void *v)
 {
 	struct ptdump_info *info = m->private;
-	ptdump_walk_pgd(m, info);
+	ptdump_walk(m, info);
 	return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(ptdump);
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 899b803842bb..9dda2602c862 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -27,7 +27,7 @@
 
 extern u64 efi_system_table;
 
-#ifdef CONFIG_ARM64_PTDUMP_DEBUGFS
+#if defined(CONFIG_PTDUMP_DEBUGFS) && defined(CONFIG_ARM64)
 #include <asm/ptdump.h>
 
 static struct ptdump_info efi_ptdump_info = {
-- 
2.20.1



* [PATCH v16 24/25] arm64: mm: Display non-present entries in ptdump
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

Previously the /sys/kernel/debug/kernel_page_tables file would only show
lines for entries present in the page tables. However, it is useful to
also show non-present entries, as this makes the size and level of the
holes more visible. This aligns the behaviour with x86, which also shows
holes.
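
The restructuring in note_page() amounts to hoisting the printing of
the address range, size and level name out of the current_prot check so
that holes also get a line, while the attribute bits are still only
decoded for present entries; roughly:

	/* always emit the range, size and level name */
	pt_dump_seq_printf(st->seq, "0x%016lx-0x%016lx   ",
			   st->start_address, addr);
	pt_dump_seq_printf(st->seq, "%9lu%c %s", delta, *unit,
			   pg_level[st->level].name);

	/* attribute bits only make sense for present entries */
	if (st->current_prot && pg_level[st->level].bits)
		dump_prot(st, pg_level[st->level].bits,
			  pg_level[st->level].num);
	pt_dump_seq_puts(st->seq, "\n");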

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/mm/dump.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
index 9d9b740a86d2..3203dd8e6d0a 100644
--- a/arch/arm64/mm/dump.c
+++ b/arch/arm64/mm/dump.c
@@ -269,21 +269,22 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
 		if (st->current_prot) {
 			note_prot_uxn(st, addr);
 			note_prot_wx(st, addr);
-			pt_dump_seq_printf(st->seq, "0x%016lx-0x%016lx   ",
+		}
+
+		pt_dump_seq_printf(st->seq, "0x%016lx-0x%016lx   ",
 				   st->start_address, addr);
 
-			delta = (addr - st->start_address) >> 10;
-			while (!(delta & 1023) && unit[1]) {
-				delta >>= 10;
-				unit++;
-			}
-			pt_dump_seq_printf(st->seq, "%9lu%c %s", delta, *unit,
-				   pg_level[st->level].name);
-			if (pg_level[st->level].bits)
-				dump_prot(st, pg_level[st->level].bits,
-					  pg_level[st->level].num);
-			pt_dump_seq_puts(st->seq, "\n");
+		delta = (addr - st->start_address) >> 10;
+		while (!(delta & 1023) && unit[1]) {
+			delta >>= 10;
+			unit++;
 		}
+		pt_dump_seq_printf(st->seq, "%9lu%c %s", delta, *unit,
+				   pg_level[st->level].name);
+		if (st->current_prot && pg_level[st->level].bits)
+			dump_prot(st, pg_level[st->level].bits,
+				  pg_level[st->level].num);
+		pt_dump_seq_puts(st->seq, "\n");
 
 		if (addr >= st->marker[1].start_address) {
 			st->marker++;
-- 
2.20.1



* [PATCH v16 25/25] mm: ptdump: Reduce level numbers by 1 in note_page()
  2019-12-06 13:52 ` Steven Price
@ 2019-12-06 13:53   ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-06 13:53 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan

Rather than having to increment the 'depth' number by 1 in
ptdump_hole(), let's change the meaning of 'level' in note_page(), since
that makes the code simpler.

Note that for x86, the level numbers were previously increased by 1 in
commit 45dcd2091363 ("x86/mm/dump_pagetables: Fix printout of p4d level")
and the comment "Bit 7 has a different meaning" was not updated, so this
change also makes the code match the comment again.
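
With the new numbering the core reports entries to note_page() as
follows (condensed from the mm/ptdump.c hunk below):

	st->note_page(st, addr, 0, pgd_val(val));	/* PGD leaf */
	st->note_page(st, addr, 1, p4d_val(val));	/* P4D leaf */
	st->note_page(st, addr, 2, pud_val(val));	/* PUD leaf */
	st->note_page(st, addr, 3, pmd_val(val));	/* PMD leaf */
	st->note_page(st, addr, 4, pte_val(READ_ONCE(*pte)));	/* PTE */

	/* holes: walk_page_range()'s depth is passed through unchanged */
	st->note_page(st, addr, depth, 0);

	/* final flush: level unknown */
	st->note_page(st, 0, -1, 0);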

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
 arch/arm64/mm/dump.c          |  6 +++---
 arch/x86/mm/dump_pagetables.c | 19 ++++++++++---------
 include/linux/ptdump.h        |  1 +
 mm/ptdump.c                   | 16 ++++++++--------
 4 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
index 3203dd8e6d0a..4997ce244172 100644
--- a/arch/arm64/mm/dump.c
+++ b/arch/arm64/mm/dump.c
@@ -175,8 +175,7 @@ struct pg_level {
 };
 
 static struct pg_level pg_level[] = {
-	{
-	}, { /* pgd */
+	{ /* pgd */
 		.name	= "PGD",
 		.bits	= pte_bits,
 		.num	= ARRAY_SIZE(pte_bits),
@@ -256,7 +255,7 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
 	if (level >= 0)
 		prot = val & pg_level[level].mask;
 
-	if (!st->level) {
+	if (st->level == -1) {
 		st->level = level;
 		st->current_prot = prot;
 		st->start_address = addr;
@@ -350,6 +349,7 @@ void ptdump_check_wx(void)
 			{ 0, NULL},
 			{ -1, NULL},
 		},
+		.level = -1,
 		.check_wx = true,
 		.ptdump = {
 			.note_page = note_page,
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 77a1332c6cd4..d3c28b3765fc 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -176,7 +176,7 @@ static struct addr_marker address_markers[] = {
 static void printk_prot(struct seq_file *m, pgprotval_t pr, int level, bool dmsg)
 {
 	static const char * const level_name[] =
-		{ "cr3", "pgd", "p4d", "pud", "pmd", "pte" };
+		{ "pgd", "p4d", "pud", "pmd", "pte" };
 
 	if (!(pr & _PAGE_PRESENT)) {
 		/* Not present */
@@ -200,12 +200,12 @@ static void printk_prot(struct seq_file *m, pgprotval_t pr, int level, bool dmsg
 			pt_dump_cont_printf(m, dmsg, "    ");
 
 		/* Bit 7 has a different meaning on level 3 vs 4 */
-		if (level <= 4 && pr & _PAGE_PSE)
+		if (level <= 3 && pr & _PAGE_PSE)
 			pt_dump_cont_printf(m, dmsg, "PSE ");
 		else
 			pt_dump_cont_printf(m, dmsg, "    ");
-		if ((level == 5 && pr & _PAGE_PAT) ||
-		    ((level == 4 || level == 3) && pr & _PAGE_PAT_LARGE))
+		if ((level == 4 && pr & _PAGE_PAT) ||
+		    ((level == 3 || level == 2) && pr & _PAGE_PAT_LARGE))
 			pt_dump_cont_printf(m, dmsg, "PAT ");
 		else
 			pt_dump_cont_printf(m, dmsg, "    ");
@@ -267,15 +267,15 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
 
 	new_prot = val & PTE_FLAGS_MASK;
 
-	if (level > 1) {
-		new_eff = effective_prot(st->prot_levels[level - 2],
+	if (level > 0) {
+		new_eff = effective_prot(st->prot_levels[level - 1],
 					 new_prot);
 	} else {
 		new_eff = new_prot;
 	}
 
-	if (level > 0)
-		st->prot_levels[level - 1] = new_eff;
+	if (level >= 0)
+		st->prot_levels[level] = new_eff;
 
 	/*
 	 * If we have a "break" in the series, we need to flush the state that
@@ -285,7 +285,7 @@ static void note_page(struct ptdump_state *pt_st, unsigned long addr, int level,
 	cur = st->current_prot;
 	eff = st->effective_prot;
 
-	if (!st->level) {
+	if (st->level == -1) {
 		/* First entry */
 		st->current_prot = new_prot;
 		st->effective_prot = new_eff;
@@ -376,6 +376,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, struct mm_struct *mm,
 			.note_page	= note_page,
 			.range		= ptdump_ranges
 		},
+		.level = -1,
 		.to_dmesg	= dmesg,
 		.check_wx	= checkwx,
 		.seq		= m
diff --git a/include/linux/ptdump.h b/include/linux/ptdump.h
index a0fb8dd2be97..b28f3f2acf90 100644
--- a/include/linux/ptdump.h
+++ b/include/linux/ptdump.h
@@ -11,6 +11,7 @@ struct ptdump_range {
 };
 
 struct ptdump_state {
+	/* level is 0:PGD to 4:PTE, or -1 if unknown */
 	void (*note_page)(struct ptdump_state *st, unsigned long addr,
 			  int level, unsigned long val);
 	const struct ptdump_range *range;
diff --git a/mm/ptdump.c b/mm/ptdump.c
index 9357ab3ecc88..39b0773b6172 100644
--- a/mm/ptdump.c
+++ b/mm/ptdump.c
@@ -11,7 +11,7 @@ static int ptdump_pgd_entry(pgd_t *pgd, unsigned long addr,
 	pgd_t val = READ_ONCE(*pgd);
 
 	if (pgd_leaf(val))
-		st->note_page(st, addr, 1, pgd_val(val));
+		st->note_page(st, addr, 0, pgd_val(val));
 
 	return 0;
 }
@@ -23,7 +23,7 @@ static int ptdump_p4d_entry(p4d_t *p4d, unsigned long addr,
 	p4d_t val = READ_ONCE(*p4d);
 
 	if (p4d_leaf(val))
-		st->note_page(st, addr, 2, p4d_val(val));
+		st->note_page(st, addr, 1, p4d_val(val));
 
 	return 0;
 }
@@ -35,7 +35,7 @@ static int ptdump_pud_entry(pud_t *pud, unsigned long addr,
 	pud_t val = READ_ONCE(*pud);
 
 	if (pud_leaf(val))
-		st->note_page(st, addr, 3, pud_val(val));
+		st->note_page(st, addr, 2, pud_val(val));
 
 	return 0;
 }
@@ -47,7 +47,7 @@ static int ptdump_pmd_entry(pmd_t *pmd, unsigned long addr,
 	pmd_t val = READ_ONCE(*pmd);
 
 	if (pmd_leaf(val))
-		st->note_page(st, addr, 4, pmd_val(val));
+		st->note_page(st, addr, 3, pmd_val(val));
 
 	return 0;
 }
@@ -57,7 +57,7 @@ static int ptdump_pte_entry(pte_t *pte, unsigned long addr,
 {
 	struct ptdump_state *st = walk->private;
 
-	st->note_page(st, addr, 5, pte_val(READ_ONCE(*pte)));
+	st->note_page(st, addr, 4, pte_val(READ_ONCE(*pte)));
 
 	return 0;
 }
@@ -75,7 +75,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk,
 {
 	struct ptdump_state *st = walk->private;
 
-	st->note_page(st, addr, 5, pte_val(kasan_early_shadow_pte[0]));
+	st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]));
 	return 1;
 }
 
@@ -115,7 +115,7 @@ static int ptdump_hole(unsigned long addr, unsigned long next,
 {
 	struct ptdump_state *st = walk->private;
 
-	st->note_page(st, addr, depth + 1, 0);
+	st->note_page(st, addr, depth, 0);
 
 	return 0;
 }
@@ -147,5 +147,5 @@ void ptdump_walk_pgd(struct ptdump_state *st, struct mm_struct *mm)
 	up_read(&mm->mmap_sem);
 
 	/* Flush out the last page */
-	st->note_page(st, 0, 0, 0);
+	st->note_page(st, 0, -1, 0);
 }
-- 
2.20.1



* Re: [PATCH v16 06/25] powerpc: mm: Add p?d_leaf() definitions
  2019-12-06 13:52   ` Steven Price
@ 2019-12-09 11:08     ` Michael Ellerman
  0 siblings, 0 replies; 85+ messages in thread
From: Michael Ellerman @ 2019-12-09 11:08 UTC (permalink / raw)
  To: Steven Price, Andrew Morton, linux-mm
  Cc: Steven Price, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar,
	James Morse, Jérôme Glisse, Peter Zijlstra,
	Thomas Gleixner, Will Deacon, x86, H. Peter Anvin,
	linux-arm-kernel, linux-kernel, Mark Rutland, Liang, Kan,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, kvm-ppc

Steven Price <steven.price@arm.com> writes:
> walk_page_range() is going to be allowed to walk page tables other than
> those of user space. For this it needs to know when it has reached a
> 'leaf' entry in the page tables. This information is provided by the
> p?d_leaf() functions/macros.
>
> For powerpc pmd_large() already exists and does what we want, so hoist
> it out of the CONFIG_TRANSPARENT_HUGEPAGE condition and implement the
> other levels. Macros are used to provide the generic p?d_leaf() names.
>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Michael Ellerman <mpe@ellerman.id.au>
> CC: linuxppc-dev@lists.ozlabs.org
> CC: kvm-ppc@vger.kernel.org
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 30 ++++++++++++++------
>  1 file changed, 21 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index b01624e5c467..3dd7b6f5edd0 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -923,6 +923,12 @@ static inline int pud_present(pud_t pud)
>  	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PRESENT));
>  }
>  
> +#define pud_leaf	pud_large
> +static inline int pud_large(pud_t pud)
> +{
> +	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> +}

We already have:

#define pud_is_leaf pud_is_leaf
static inline bool pud_is_leaf(pud_t pud)
{
	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
}

And so on.

These went in relatively recently in:

  d6eacedd1f0e ("powerpc/book3s: Use config independent helpers for page table walk")


Assuming those all work for you, maybe your patch in this series should
just do:

#define pud_leaf pud_is_leaf

And so on. And then we can do a patch later to change the arch/powerpc
code to use pud_leaf() etc. directly and drop the "is" versions.
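
i.e., assuming the existing p?d_is_leaf() helpers cover everything you
need, just:

#define pmd_leaf	pmd_is_leaf
#define pud_leaf	pud_is_leaf
#define pgd_leaf	pgd_is_leaf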

cheers


> @@ -966,6 +972,12 @@ static inline int pgd_present(pgd_t pgd)
>  	return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
>  }
>  
> +#define pgd_leaf	pgd_large
> +static inline int pgd_large(pgd_t pgd)
> +{
> +	return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PTE));
> +}
> +
>  static inline pte_t pgd_pte(pgd_t pgd)
>  {
>  	return __pte_raw(pgd_raw(pgd));
> @@ -1133,6 +1145,15 @@ static inline bool pmd_access_permitted(pmd_t pmd, bool write)
>  	return pte_access_permitted(pmd_pte(pmd), write);
>  }
>  
> +#define pmd_leaf	pmd_large
> +/*
> + * returns true for pmd migration entries, THP, devmap, hugetlb
> + */
> +static inline int pmd_large(pmd_t pmd)
> +{
> +	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> +}
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
>  extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
> @@ -1159,15 +1180,6 @@ pmd_hugepage_update(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp,
>  	return hash__pmd_hugepage_update(mm, addr, pmdp, clr, set);
>  }
>  
> -/*
> - * returns true for pmd migration entries, THP, devmap, hugetlb
> - * But compile time dependent on THP config
> - */
> -static inline int pmd_large(pmd_t pmd)
> -{
> -	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
> -}
> -
>  static inline pmd_t pmd_mknotpresent(pmd_t pmd)
>  {
>  	return __pmd(pmd_val(pmd) & ~_PAGE_PRESENT);
> -- 
> 2.20.1


* Re: [PATCH v16 06/25] powerpc: mm: Add p?d_leaf() definitions
  2019-12-09 11:08     ` Michael Ellerman
@ 2019-12-09 13:06       ` Steven Price
  0 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-09 13:06 UTC (permalink / raw)
  To: Michael Ellerman, Andrew Morton, linux-mm
  Cc: Mark Rutland, Peter Zijlstra, Benjamin Herrenschmidt,
	Dave Hansen, Paul Mackerras, H. Peter Anvin, Will Deacon, Liang,
	Kan, x86, Ingo Molnar, Catalin Marinas, Arnd Bergmann, kvm-ppc,
	Jérôme Glisse, Borislav Petkov, Andy Lutomirski,
	Thomas Gleixner, linux-arm-kernel, Ard Biesheuvel, linux-kernel,
	James Morse, linuxppc-dev

On 09/12/2019 11:08, Michael Ellerman wrote:
> Steven Price <steven.price@arm.com> writes:
>> walk_page_range() is going to be allowed to walk page tables other than
>> those of user space. For this it needs to know when it has reached a
>> 'leaf' entry in the page tables. This information is provided by the
>> p?d_leaf() functions/macros.
>>
>> For powerpc pmd_large() already exists and does what we want, so hoist
>> it out of the CONFIG_TRANSPARENT_HUGEPAGE condition and implement the
>> other levels. Macros are used to provide the generic p?d_leaf() names.
>>
>> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> CC: Paul Mackerras <paulus@samba.org>
>> CC: Michael Ellerman <mpe@ellerman.id.au>
>> CC: linuxppc-dev@lists.ozlabs.org
>> CC: kvm-ppc@vger.kernel.org
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>  arch/powerpc/include/asm/book3s/64/pgtable.h | 30 ++++++++++++++------
>>  1 file changed, 21 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index b01624e5c467..3dd7b6f5edd0 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -923,6 +923,12 @@ static inline int pud_present(pud_t pud)
>>  	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PRESENT));
>>  }
>>  
>> +#define pud_leaf	pud_large
>> +static inline int pud_large(pud_t pud)
>> +{
>> +	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
>> +}
> 
> We already have:
> 
> #define pud_is_leaf pud_is_leaf
> static inline bool pud_is_leaf(pud_t pud)
> {
> 	return !!(pud_raw(pud) & cpu_to_be64(_PAGE_PTE));
> }
> 
> And so on.
> 
> These went in relatively recently in:
> 
>   d6eacedd1f0e ("powerpc/book3s: Use config independent helpers for page table walk")
> 
> 
> Assuming those all work for you, maybe your patch in this series should
> just do:
> 
> #define pud_leaf pud_is_leaf
> 
> And so on. And then we can do a patch later to change the arch/powerpc
> code to use pud_leaf() etc. directly and drop the "is" versions.

Thanks for pointing this out - these didn't exist when I started this
patch series, but yes, it would be a good idea to make use of them now,
followed by cleaning up to use the shorter p?d_leaf() versions in a
later patch.

Thanks,

Steve

> cheers
> 
> 
>> @@ -966,6 +972,12 @@ static inline int pgd_present(pgd_t pgd)
>>  	return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
>>  }
>>  
>> +#define pgd_leaf	pgd_large
>> +static inline int pgd_large(pgd_t pgd)
>> +{
>> +	return !!(pgd_raw(pgd) & cpu_to_be64(_PAGE_PTE));
>> +}
>> +
>>  static inline pte_t pgd_pte(pgd_t pgd)
>>  {
>>  	return __pte_raw(pgd_raw(pgd));
>> @@ -1133,6 +1145,15 @@ static inline bool pmd_access_permitted(pmd_t pmd, bool write)
>>  	return pte_access_permitted(pmd_pte(pmd), write);
>>  }
>>  
>> +#define pmd_leaf	pmd_large
>> +/*
>> + * returns true for pmd migration entries, THP, devmap, hugetlb
>> + */
>> +static inline int pmd_large(pmd_t pmd)
>> +{
>> +	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
>> +}
>> +
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>  extern pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot);
>>  extern pmd_t mk_pmd(struct page *page, pgprot_t pgprot);
>> @@ -1159,15 +1180,6 @@ pmd_hugepage_update(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp,
>>  	return hash__pmd_hugepage_update(mm, addr, pmdp, clr, set);
>>  }
>>  
>> -/*
>> - * returns true for pmd migration entries, THP, devmap, hugetlb
>> - * But compile time dependent on THP config
>> - */
>> -static inline int pmd_large(pmd_t pmd)
>> -{
>> -	return !!(pmd_raw(pmd) & cpu_to_be64(_PAGE_PTE));
>> -}
>> -
>>  static inline pmd_t pmd_mknotpresent(pmd_t pmd)
>>  {
>>  	return __pmd(pmd_val(pmd) & ~_PAGE_PRESENT);
>> -- 
>> 2.20.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v16 13/25] mm: pagewalk: Don't lock PTEs for walk_page_range_novma()
  2019-12-06 13:53   ` Steven Price
@ 2019-12-10 11:23     ` kbuild test robot
  -1 siblings, 0 replies; 85+ messages in thread
From: kbuild test robot @ 2019-12-10 11:23 UTC (permalink / raw)
  To: Steven Price
  Cc: kbuild-all, Andrew Morton, linux-mm, Steven Price,
	Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Borislav Petkov,
	Catalin Marinas, Dave Hansen, Ingo Molnar, James Morse,
	Jérôme Glisse, Peter Zijlstra, Thomas Gleixner,
	Will Deacon, x86, H. Peter Anvin, linux-arm-kernel, linux-kernel,
	Mark Rutland, Liang, Kan

Hi Steven,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.5-rc1 next-20191209]
[cannot apply to arm64/for-next/core tip/x86/mm]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Steven-Price/Generic-page-walk-and-ptdump/20191208-035831
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ad910e36da4ca3a1bd436989f632d062dda0c921
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-101-g82dee2e-dirty
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

>> include/linux/spinlock.h:378:9: sparse: sparse: context imbalance in 'walk_pte_range' - unexpected unlock

vim +/walk_pte_range +378 include/linux/spinlock.h

c2f21ce2e31286 Thomas Gleixner 2009-12-02  375  
3490565b633c70 Denys Vlasenko  2015-07-13  376  static __always_inline void spin_unlock(spinlock_t *lock)
c2f21ce2e31286 Thomas Gleixner 2009-12-02  377  {
c2f21ce2e31286 Thomas Gleixner 2009-12-02 @378  	raw_spin_unlock(&lock->rlock);
c2f21ce2e31286 Thomas Gleixner 2009-12-02  379  }
c2f21ce2e31286 Thomas Gleixner 2009-12-02  380  

:::::: The code at line 378 was first introduced by commit
:::::: c2f21ce2e31286a0a32f8da0a7856e9ca1122ef3 locking: Implement new raw_spinlock

:::::: TO: Thomas Gleixner <tglx@linutronix.de>
:::::: CC: Thomas Gleixner <tglx@linutronix.de>

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v16 13/25] mm: pagewalk: Don't lock PTEs for walk_page_range_novma()
  2019-12-10 11:23     ` kbuild test robot
@ 2019-12-11 15:54       ` Steven Price
  -1 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-11 15:54 UTC (permalink / raw)
  To: kbuild test robot
  Cc: Mark Rutland, Peter Zijlstra, Catalin Marinas, Dave Hansen,
	linux-mm, H. Peter Anvin, Will Deacon, Liang, Kan, x86,
	Ingo Molnar, Arnd Bergmann, Jérôme Glisse,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, kbuild-all, Ard Biesheuvel, linux-kernel,
	James Morse, Andrew Morton

On 10/12/2019 11:23, kbuild test robot wrote:
> Hi Steven,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on linus/master]
> [also build test WARNING on v5.5-rc1 next-20191209]
> [cannot apply to arm64/for-next/core tip/x86/mm]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
> 
> url:    https://github.com/0day-ci/linux/commits/Steven-Price/Generic-page-walk-and-ptdump/20191208-035831
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ad910e36da4ca3a1bd436989f632d062dda0c921
> reproduce:
>         # apt-get install sparse
>         # sparse version: v0.6.1-101-g82dee2e-dirty
>         make ARCH=x86_64 allmodconfig
>         make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
> 
> If you fix the issue, kindly add the following tag
> Reported-by: kbuild test robot <lkp@intel.com>
> 
> 
> sparse warnings: (new ones prefixed by >>)
> 
>>> include/linux/spinlock.h:378:9: sparse: sparse: context imbalance in 'walk_pte_range' - unexpected unlock

I believe this is a false positive (although the trace here is useless).
This patch adds a conditional lock/unlock:

pte = walk->no_vma ? pte_offset_map(pmd, addr) :
		     pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
...
if (!walk->no_vma)
	spin_unlock(ptl);
pte_unmap(pte);

I'm not sure how to make sparse happy about that. Is the only option to
have two versions of the walk_pte_range() function? One which takes the
lock and one which doesn't.
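
For concreteness, the split could look roughly like the below (an
untested sketch; the existing loop body is just factored out into a
helper so that each lock/unlock pair stays within a single function,
which is what sparse's context tracking can follow):

static int walk_pte_range_inner(pte_t *pte, unsigned long addr,
				unsigned long end, struct mm_walk *walk)
{
	const struct mm_walk_ops *ops = walk->ops;
	int err = 0;

	for (;;) {
		err = ops->pte_entry(pte, addr, addr + PAGE_SIZE, walk);
		if (err)
			break;
		if (addr >= end - PAGE_SIZE)
			break;
		addr += PAGE_SIZE;
		pte++;
	}
	return err;
}

static int walk_pte_range(pmd_t *pmd, unsigned long addr,
			  unsigned long end, struct mm_walk *walk)
{
	pte_t *pte;
	spinlock_t *ptl;
	int err;

	if (walk->no_vma) {
		/* no lock taken, so the context stays balanced */
		pte = pte_offset_map(pmd, addr);
		err = walk_pte_range_inner(pte, addr, end, walk);
		pte_unmap(pte);
	} else {
		pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
		err = walk_pte_range_inner(pte, addr, end, walk);
		pte_unmap_unlock(pte, ptl);
	}

	return err;
}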

Steve

> vim +/walk_pte_range +378 include/linux/spinlock.h
> 
> c2f21ce2e31286 Thomas Gleixner 2009-12-02  375  
> 3490565b633c70 Denys Vlasenko  2015-07-13  376  static __always_inline void spin_unlock(spinlock_t *lock)
> c2f21ce2e31286 Thomas Gleixner 2009-12-02  377  {
> c2f21ce2e31286 Thomas Gleixner 2009-12-02 @378  	raw_spin_unlock(&lock->rlock);
> c2f21ce2e31286 Thomas Gleixner 2009-12-02  379  }
> c2f21ce2e31286 Thomas Gleixner 2009-12-02  380  
> 
> :::::: The code at line 378 was first introduced by commit
> :::::: c2f21ce2e31286a0a32f8da0a7856e9ca1122ef3 locking: Implement new raw_spinlock
> 
> :::::: TO: Thomas Gleixner <tglx@linutronix.de>
> :::::: CC: Thomas Gleixner <tglx@linutronix.de>
> 
> ---
> 0-DAY kernel test infrastructure                 Open Source Technology Center
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation
> 


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v16 13/25] mm: pagewalk: Don't lock PTEs for walk_page_range_novma()
  2019-12-11 15:54       ` Steven Price
@ 2019-12-11 17:12         ` Luc Van Oostenryck
  -1 siblings, 0 replies; 85+ messages in thread
From: Luc Van Oostenryck @ 2019-12-11 17:12 UTC (permalink / raw)
  To: Steven Price
  Cc: kbuild test robot, Mark Rutland, Peter Zijlstra, Catalin Marinas,
	Dave Hansen, linux-mm, H. Peter Anvin, Will Deacon, Liang, Kan,
	x86, Ingo Molnar, Arnd Bergmann, Jérôme Glisse,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, kbuild-all, Ard Biesheuvel, linux-kernel,
	James Morse, Andrew Morton

On Wed, Dec 11, 2019 at 03:54:06PM +0000, Steven Price wrote:
> On 10/12/2019 11:23, kbuild test robot wrote:
> >>> include/linux/spinlock.h:378:9: sparse: sparse: context imbalance in 'walk_pte_range' - unexpected unlock
> 
> I believe this is a false positive (although the trace here is useless).
> This patch adds a conditional lock/unlock:
> 
> pte = walk->no_vma ? pte_offset_map(pmd, addr) :
> 		     pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> ...
> if (!walk->no_vma)
> 	spin_unlock(ptl);
> pte_unmap(pte);
> 
> I'm not sure how to make sparse happy about that. Is the only option to
> have two versions of the walk_pte_range() function? One which takes the
> lock and one which doesn't.

Yes.

-- Luc

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v16 13/25] mm: pagewalk: Don't lock PTEs for walk_page_range_novma()
  2019-12-11 15:54       ` Steven Price
@ 2019-12-11 17:19         ` Qian Cai
  -1 siblings, 0 replies; 85+ messages in thread
From: Qian Cai @ 2019-12-11 17:19 UTC (permalink / raw)
  To: Steven Price
  Cc: kbuild test robot, Mark Rutland, Peter Zijlstra, Catalin Marinas,
	Dave Hansen, linux-mm, H. Peter Anvin, Will Deacon, Liang, Kan,
	x86, Ingo Molnar, Arnd Bergmann, Jérôme Glisse,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, kbuild-all, Ard Biesheuvel, linux-kernel,
	James Morse, Andrew Morton



> On Dec 11, 2019, at 10:54 AM, Steven Price <Steven.Price@arm.com> wrote:
> 
> I believe this is a false positive (although the trace here is useless).
> This patch adds a conditional lock/unlock:
> 
> pte = walk->no_vma ? pte_offset_map(pmd, addr) :
>             pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
> ...
> if (!walk->no_vma)
>    spin_unlock(ptl);
> pte_unmap(pte);
> 
> I'm not sure how to make sparse happy about that. Is the only option to
> have two versions of the walk_pte_range() function? One which takes the
> lock and one which doesn't.

Or just ignore the sparse false positive without complicating the code further.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v16 11/25] mm: pagewalk: Add p4d_entry() and pgd_entry()
  2019-12-06 13:53   ` Steven Price
@ 2019-12-12 11:23     ` Thomas Hellström (VMware)
  -1 siblings, 0 replies; 85+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-12 11:23 UTC (permalink / raw)
  To: Steven Price
  Cc: Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Borislav Petkov,
	Catalin Marinas, Dave Hansen, Ingo Molnar, James Morse,
	Jérôme Glisse, Peter Zijlstra, Thomas Gleixner,
	Will Deacon, x86, H. Peter Anvin, linux-arm-kernel, linux-kernel,
	Mark Rutland, Liang, Kan, Zong Li, Andrew Morton, linux-mm

On 12/6/19 2:53 PM, Steven Price wrote:
> pgd_entry() and pud_entry() were removed by commit 0b1fbfe50006c410
> ("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were
> no users. We're about to add users so reintroduce them, along with
> p4d_entry() as we now have 5 levels of tables.
>
> Note that commit a00cc7d9dd93d66a ("mm, x86: add support for
> PUD-sized transparent hugepages") already re-added pud_entry() but with
> different semantics to the other callbacks. Since there have never
> been upstream users of this, revert the semantics back to match the
> other callbacks. This means pud_entry() is called for all entries, not
> just transparent huge pages.

Actually, there are two users of pud_entry(), in hmm.c and since 5.5rc1 
also mapping_dirty_helpers.c. The latter one is unproblematic and 
requires no attention but the one in hmm.c is probably largely untested, 
and seems to assume it was called outside of the spinlock.

The problem with the current patch is that the hmm pud_entry will also 
traverse the pmds, so that will now be done twice.

In another thread we were discussing a means of rerunning the level (in 
case of a race), or continuing after a level, based on the return value 
after the callback. The change was fairly invasive.


> Tested-by: Zong Li <zong.li@sifive.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   include/linux/pagewalk.h | 19 +++++++++++++------
>   mm/pagewalk.c            | 27 ++++++++++++++++-----------
>   2 files changed, 29 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index 6ec82e92c87f..06790f23957f 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -8,15 +8,15 @@ struct mm_walk;
>   
>   /**
>    * mm_walk_ops - callbacks for walk_page_range
> - * @pud_entry:		if set, called for each non-empty PUD (2nd-level) entry
> - *			this handler should only handle pud_trans_huge() puds.
> - *			the pmd_entry or pte_entry callbacks will be used for
> - *			regular PUDs.
> - * @pmd_entry:		if set, called for each non-empty PMD (3rd-level) entry
> + * @pgd_entry:		if set, called for each non-empty PGD (top-level) entry
> + * @p4d_entry:		if set, called for each non-empty P4D entry
> + * @pud_entry:		if set, called for each non-empty PUD entry
> + * @pmd_entry:		if set, called for each non-empty PMD entry
>    *			this handler is required to be able to handle
>    *			pmd_trans_huge() pmds.  They may simply choose to
>    *			split_huge_page() instead of handling it explicitly.
> - * @pte_entry:		if set, called for each non-empty PTE (4th-level) entry
> + * @pte_entry:		if set, called for each non-empty PTE (lowest-level)
> + *			entry
>    * @pte_hole:		if set, called for each hole at all levels
>    * @hugetlb_entry:	if set, called for each hugetlb entry
>    * @test_walk:		caller specific callback function to determine whether
> @@ -27,8 +27,15 @@ struct mm_walk;
>    * @pre_vma:            if set, called before starting walk on a non-null vma.
>    * @post_vma:           if set, called after a walk on a non-null vma, provided
>    *                      that @pre_vma and the vma walk succeeded.
> + *
> + * p?d_entry callbacks are called even if those levels are folded on a
> + * particular architecture/configuration.
>    */
>   struct mm_walk_ops {
> +	int (*pgd_entry)(pgd_t *pgd, unsigned long addr,
> +			 unsigned long next, struct mm_walk *walk);
> +	int (*p4d_entry)(p4d_t *p4d, unsigned long addr,
> +			 unsigned long next, struct mm_walk *walk);
>   	int (*pud_entry)(pud_t *pud, unsigned long addr,
>   			 unsigned long next, struct mm_walk *walk);
>   	int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index ea0b9e606ad1..c089786e7a7f 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -94,15 +94,9 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>   		}
>   
>   		if (ops->pud_entry) {
> -			spinlock_t *ptl = pud_trans_huge_lock(pud, walk->vma);
> -
> -			if (ptl) {
> -				err = ops->pud_entry(pud, addr, next, walk);
> -				spin_unlock(ptl);
> -				if (err)
> -					break;
> -				continue;
> -			}
> +			err = ops->pud_entry(pud, addr, next, walk);
> +			if (err)
> +				break;

Actually, there are two current users of pud_entry(), in hmm.c and since 
5.5rc1 also mapping_dirty_helpers.c. The latter one is unproblematic and 
requires no attention but the one in hmm.c is probably largely untested, 
and seems to assume it was called outside of the spinlock.

The problem with the current patch is that the hmm pud_entry will also 
traverse the pmds, so that will now be done twice.
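
(To make the double traversal concrete - the names below are
hypothetical, not the actual hmm code - consider a callback along
these lines:

static int my_pud_entry(pud_t *pud, unsigned long addr,
			unsigned long next, struct mm_walk *walk)
{
	pud_t entry = READ_ONCE(*pud);

	/* my_pud_is_huge()/my_handle_huge_pud() stand in for the
	 * real predicate and handler. */
	if (my_pud_is_huge(entry))
		return my_handle_huge_pud(&entry, addr, next, walk);

	return 0;
}

With the old semantics this only ran for pud_trans_huge() puds, under
the ptl, and the walker then moved on to the next pud. With this patch
it runs for every pud and walk_pud_range() afterwards still descends
into the pmd level, so a callback which already walks those pmds
itself ends up covering them twice.)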

/Thomas


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v16 11/25] mm: pagewalk: Add p4d_entry() and pgd_entry()
  2019-12-12 11:23     ` Thomas Hellström (VMware)
@ 2019-12-12 11:33       ` Thomas Hellström (VMware)
  -1 siblings, 0 replies; 85+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-12 11:33 UTC (permalink / raw)
  To: Steven Price
  Cc: Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann, Borislav Petkov,
	Catalin Marinas, Dave Hansen, Ingo Molnar, James Morse,
	Jérôme Glisse, Peter Zijlstra, Thomas Gleixner,
	Will Deacon, x86, H. Peter Anvin, linux-arm-kernel, linux-kernel,
	Mark Rutland, Liang, Kan, Zong Li, Andrew Morton, linux-mm

On 12/12/19 12:23 PM, Thomas Hellström (VMware) wrote:
> On 12/6/19 2:53 PM, Steven Price wrote:
>> pgd_entry() and pud_entry() were removed by commit 0b1fbfe50006c410
>> ("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were
>> no users. We're about to add users so reintroduce them, along with
>> p4d_entry() as we now have 5 levels of tables.
>>
>> Note that commit a00cc7d9dd93d66a ("mm, x86: add support for
>> PUD-sized transparent hugepages") already re-added pud_entry() but with
>> different semantics to the other callbacks. Since there have never
>> been upstream users of this, revert the semantics back to match the
>> other callbacks. This means pud_entry() is called for all entries, not
>> just transparent huge pages.
>
> Actually, there are two users of pud_entry(), in hmm.c and since 
> 5.5rc1 also mapping_dirty_helpers.c. The latter one is unproblematic 
> and requires no attention but the one in hmm.c is probably largely 
> untested, and seems to assume it was called outside of the spinlock.
>
> The problem with the current patch is that the hmm pud_entry will also 
> traverse the pmds, so that will now be done twice.
>
> In another thread we were discussing a means of rerunning the level 
> (in case of a race), or continuing after a level, based on the return 
> value after the callback. The change was fairly invasive.
>
Hmm - I forgot to remove the above text, so it appears twice :(. The
correct one is inline below.

>
>> Tested-by: Zong Li <zong.li@sifive.com>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>   include/linux/pagewalk.h | 19 +++++++++++++------
>>   mm/pagewalk.c            | 27 ++++++++++++++++-----------
>>   2 files changed, 29 insertions(+), 17 deletions(-)
>>
>> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
>> index 6ec82e92c87f..06790f23957f 100644
>> --- a/include/linux/pagewalk.h
>> +++ b/include/linux/pagewalk.h
>> @@ -8,15 +8,15 @@ struct mm_walk;
>>     /**
>>    * mm_walk_ops - callbacks for walk_page_range
>> - * @pud_entry:        if set, called for each non-empty PUD 
>> (2nd-level) entry
>> - *            this handler should only handle pud_trans_huge() puds.
>> - *            the pmd_entry or pte_entry callbacks will be used for
>> - *            regular PUDs.
>> - * @pmd_entry:        if set, called for each non-empty PMD 
>> (3rd-level) entry
>> + * @pgd_entry:        if set, called for each non-empty PGD 
>> (top-level) entry
>> + * @p4d_entry:        if set, called for each non-empty P4D entry
>> + * @pud_entry:        if set, called for each non-empty PUD entry
>> + * @pmd_entry:        if set, called for each non-empty PMD entry
>>    *            this handler is required to be able to handle
>>    *            pmd_trans_huge() pmds.  They may simply choose to
>>    *            split_huge_page() instead of handling it explicitly.
>> - * @pte_entry:        if set, called for each non-empty PTE 
>> (4th-level) entry
>> + * @pte_entry:        if set, called for each non-empty PTE 
>> (lowest-level)
>> + *            entry
>>    * @pte_hole:        if set, called for each hole at all levels
>>    * @hugetlb_entry:    if set, called for each hugetlb entry
>>    * @test_walk:        caller specific callback function to 
>> determine whether
>> @@ -27,8 +27,15 @@ struct mm_walk;
>>    * @pre_vma:            if set, called before starting walk on a 
>> non-null vma.
>>    * @post_vma:           if set, called after a walk on a non-null 
>> vma, provided
>>    *                      that @pre_vma and the vma walk succeeded.
>> + *
>> + * p?d_entry callbacks are called even if those levels are folded on a
>> + * particular architecture/configuration.
>>    */
>>   struct mm_walk_ops {
>> +    int (*pgd_entry)(pgd_t *pgd, unsigned long addr,
>> +             unsigned long next, struct mm_walk *walk);
>> +    int (*p4d_entry)(p4d_t *p4d, unsigned long addr,
>> +             unsigned long next, struct mm_walk *walk);
>>       int (*pud_entry)(pud_t *pud, unsigned long addr,
>>                unsigned long next, struct mm_walk *walk);
>>       int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
>> index ea0b9e606ad1..c089786e7a7f 100644
>> --- a/mm/pagewalk.c
>> +++ b/mm/pagewalk.c
>> @@ -94,15 +94,9 @@ static int walk_pud_range(p4d_t *p4d, unsigned 
>> long addr, unsigned long end,
>>           }
>>             if (ops->pud_entry) {
>> -            spinlock_t *ptl = pud_trans_huge_lock(pud, walk->vma);
>> -
>> -            if (ptl) {
>> -                err = ops->pud_entry(pud, addr, next, walk);
>> -                spin_unlock(ptl);
>> -                if (err)
>> -                    break;
>> -                continue;
>> -            }
>> +            err = ops->pud_entry(pud, addr, next, walk);
>> +            if (err)
>> +                break;
>
> Actually, there are two current users of pud_entry(), in hmm.c and 
> since 5.5rc1 also mapping_dirty_helpers.c. The latter one is 
> unproblematic and requires no attention but the one in hmm.c is 
> probably largely untested, and seems to assume it was called outside 
> of the spinlock.
>
> The problem with the current patch is that the hmm pud_entry will 
> traverse also pmds, so that will now be done twice.
>
> /Thomas
>


* Re: [PATCH v16 11/25] mm: pagewalk: Add p4d_entry() and pgd_entry()
  2019-12-12 11:33       ` Thomas Hellström (VMware)
@ 2019-12-12 13:15         ` Steven Price
  -1 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-12 13:15 UTC (permalink / raw)
  To: Thomas Hellström (VMware)
  Cc: Mark Rutland, x86, Zong Li, Arnd Bergmann, Ard Biesheuvel,
	Peter Zijlstra, Catalin Marinas, Dave Hansen, linux-kernel,
	linux-mm, Jérôme Glisse, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, H. Peter Anvin, James Morse, Thomas Gleixner,
	Will Deacon, Andrew Morton, linux-arm-kernel, Liang, Kan

On 12/12/2019 11:33, Thomas Hellström (VMware) wrote:
> On 12/12/19 12:23 PM, Thomas Hellström (VMware) wrote:
>> On 12/6/19 2:53 PM, Steven Price wrote:
>>> pgd_entry() and pud_entry() were removed by commit 0b1fbfe50006c410
>>> ("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were
>>> no users. We're about to add users so reintroduce them, along with
>>> p4d_entry() as we now have 5 levels of tables.
>>>
>>> Note that commit a00cc7d9dd93d66a ("mm, x86: add support for
>>> PUD-sized transparent hugepages") already re-added pud_entry() but with
>>> different semantics to the other callbacks. Since there have never
>>> been upstream users of this, revert the semantics back to match the
>>> other callbacks. This means pud_entry() is called for all entries, not
>>> just transparent huge pages.

When I wrote that, there were no upstream users, which sadly shows how
long ago that was :(

>> Actually, there are two users of pud_entry(): one in hmm.c and, since
>> 5.5-rc1, one in mapping_dirty_helpers.c. The latter is unproblematic
>> and needs no attention, but the one in hmm.c is probably largely
>> untested and seems to assume it is called outside of the spinlock.
>>
>> The problem with the current patch is that the hmm pud_entry will also
>> traverse pmds, so that will be done twice now.
>>
>> In another thread we were discussing a means of rerunning the level
>> (in case of a race), or continuing after a level, based on the return
>> value after the callback. The change was fairly invasive,
>>
> Hmm. Forgot to remove the above text, which appears twice. :( The
> correct one is inline below.
> 
>>
>>> Tested-by: Zong Li <zong.li@sifive.com>
>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>> ---
>>>   include/linux/pagewalk.h | 19 +++++++++++++------
>>>   mm/pagewalk.c            | 27 ++++++++++++++++-----------
>>>   2 files changed, 29 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
>>> index 6ec82e92c87f..06790f23957f 100644
>>> --- a/include/linux/pagewalk.h
>>> +++ b/include/linux/pagewalk.h
>>> @@ -8,15 +8,15 @@ struct mm_walk;
>>>     /**
>>>    * mm_walk_ops - callbacks for walk_page_range
>>> - * @pud_entry:        if set, called for each non-empty PUD 
>>> (2nd-level) entry
>>> - *            this handler should only handle pud_trans_huge() puds.
>>> - *            the pmd_entry or pte_entry callbacks will be used for
>>> - *            regular PUDs.
>>> - * @pmd_entry:        if set, called for each non-empty PMD 
>>> (3rd-level) entry
>>> + * @pgd_entry:        if set, called for each non-empty PGD 
>>> (top-level) entry
>>> + * @p4d_entry:        if set, called for each non-empty P4D entry
>>> + * @pud_entry:        if set, called for each non-empty PUD entry
>>> + * @pmd_entry:        if set, called for each non-empty PMD entry
>>>    *            this handler is required to be able to handle
>>>    *            pmd_trans_huge() pmds.  They may simply choose to
>>>    *            split_huge_page() instead of handling it explicitly.
>>> - * @pte_entry:        if set, called for each non-empty PTE 
>>> (4th-level) entry
>>> + * @pte_entry:        if set, called for each non-empty PTE 
>>> (lowest-level)
>>> + *            entry
>>>    * @pte_hole:        if set, called for each hole at all levels
>>>    * @hugetlb_entry:    if set, called for each hugetlb entry
>>>    * @test_walk:        caller specific callback function to 
>>> determine whether
>>> @@ -27,8 +27,15 @@ struct mm_walk;
>>>    * @pre_vma:            if set, called before starting walk on a 
>>> non-null vma.
>>>    * @post_vma:           if set, called after a walk on a non-null 
>>> vma, provided
>>>    *                      that @pre_vma and the vma walk succeeded.
>>> + *
>>> + * p?d_entry callbacks are called even if those levels are folded on a
>>> + * particular architecture/configuration.
>>>    */
>>>   struct mm_walk_ops {
>>> +    int (*pgd_entry)(pgd_t *pgd, unsigned long addr,
>>> +             unsigned long next, struct mm_walk *walk);
>>> +    int (*p4d_entry)(p4d_t *p4d, unsigned long addr,
>>> +             unsigned long next, struct mm_walk *walk);
>>>       int (*pud_entry)(pud_t *pud, unsigned long addr,
>>>                unsigned long next, struct mm_walk *walk);
>>>       int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
>>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
>>> index ea0b9e606ad1..c089786e7a7f 100644
>>> --- a/mm/pagewalk.c
>>> +++ b/mm/pagewalk.c
>>> @@ -94,15 +94,9 @@ static int walk_pud_range(p4d_t *p4d, unsigned 
>>> long addr, unsigned long end,
>>>           }
>>>             if (ops->pud_entry) {
>>> -            spinlock_t *ptl = pud_trans_huge_lock(pud, walk->vma);
>>> -
>>> -            if (ptl) {
>>> -                err = ops->pud_entry(pud, addr, next, walk);
>>> -                spin_unlock(ptl);
>>> -                if (err)
>>> -                    break;
>>> -                continue;
>>> -            }
>>> +            err = ops->pud_entry(pud, addr, next, walk);
>>> +            if (err)
>>> +                break;
>>
>> Actually, there are two current users of pud_entry(): one in hmm.c
>> and, since 5.5-rc1, one in mapping_dirty_helpers.c. The latter is
>> unproblematic and needs no attention, but the one in hmm.c is
>> probably largely untested and seems to assume it is called outside
>> of the spinlock.

Thanks for pointing that out. I guess the simplest fix would be to
squash in something like the below, which should restore the old
behaviour for hmm.c without affecting others.

Steve

---8<----
diff --git a/mm/hmm.c b/mm/hmm.c
index d379cb6496ae..744b6644d0e4 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -478,19 +478,26 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
  	pmd_t *pmdp;
  	pud_t pud;
  	int ret;
+	spinlock_t *ptl = pud_trans_huge_lock(pudp, walk->vma);
+	if (!ptl)
+		return 0;
  
  again:
  	pud = READ_ONCE(*pudp);
-	if (pud_none(pud))
-		return hmm_vma_walk_hole(start, end, walk);
+	if (pud_none(pud)) {
+		ret = hmm_vma_walk_hole(start, end, walk);
+		goto out_unlock;
+	}
  
  	if (pud_huge(pud) && pud_devmap(pud)) {
  		unsigned long i, npages, pfn;
  		uint64_t *pfns, cpu_flags;
  		bool fault, write_fault;
  
-		if (!pud_present(pud))
-			return hmm_vma_walk_hole(start, end, walk);
+		if (!pud_present(pud)) {
+			ret = hmm_vma_walk_hole(start, end, walk);
+			goto out_unlock;
+		}
  
  		i = (addr - range->start) >> PAGE_SHIFT;
  		npages = (end - addr) >> PAGE_SHIFT;
@@ -499,16 +506,20 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
  		cpu_flags = pud_to_hmm_pfn_flags(range, pud);
  		hmm_range_need_fault(hmm_vma_walk, pfns, npages,
  				     cpu_flags, &fault, &write_fault);
-		if (fault || write_fault)
-			return hmm_vma_walk_hole_(addr, end, fault,
-						write_fault, walk);
+		if (fault || write_fault) {
+			ret = hmm_vma_walk_hole_(addr, end, fault,
+						 write_fault, walk);
+			goto out_unlock;
+		}
  
  		pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
  		for (i = 0; i < npages; ++i, ++pfn) {
  			hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
  					      hmm_vma_walk->pgmap);
-			if (unlikely(!hmm_vma_walk->pgmap))
-				return -EBUSY;
+			if (unlikely(!hmm_vma_walk->pgmap)) {
+				ret = -EBUSY;
+				goto out_unlock;
+			}
  			pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
  				  cpu_flags;
  		}
@@ -517,7 +528,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
  			hmm_vma_walk->pgmap = NULL;
  		}
  		hmm_vma_walk->last = end;
-		return 0;
+		ret = 0;
+		goto out_unlock;
  	}
  
  	split_huge_pud(walk->vma, pudp, addr);
@@ -529,10 +541,14 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
  		next = pmd_addr_end(addr, end);
  		ret = hmm_vma_walk_pmd(pmdp, addr, next, walk);
  		if (ret)
-			return ret;
+			goto out_unlock;
  	} while (pmdp++, addr = next, addr != end);
  
-	return 0;
+	ret = 0;
+
+out_unlock:
+	spin_unlock(ptl);
+	return ret;
  }
  #else
  #define hmm_vma_walk_pud	NULL
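
(To spell out the pattern this squash relies on: with the locking gone
from walk_pud_range(), a callback that only cares about huge puds now
takes pud_trans_huge_lock() itself and treats a NULL return as "not
huge, let the core walker descend". A minimal sketch of that shape,
where do_huge_pud() is a made-up stand-in for the real work:

static int huge_only_pud_entry(pud_t *pudp, unsigned long addr,
			       unsigned long next, struct mm_walk *walk)
{
	spinlock_t *ptl = pud_trans_huge_lock(pudp, walk->vma);
	int ret;

	if (!ptl)
		return 0;	/* normal pud: the walker handles the pmds */

	ret = do_huge_pud(pudp, addr, next, walk);	/* hypothetical */
	spin_unlock(ptl);
	return ret;
}

The hmm change above is this shape, just with the existing body kept
and routed through a single out_unlock label.)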

* Re: [PATCH v16 11/25] mm: pagewalk: Add p4d_entry() and pgd_entry()
  2019-12-12 13:15         ` Steven Price
@ 2019-12-12 14:04           ` Thomas Hellström (VMware)
  -1 siblings, 0 replies; 85+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-12 14:04 UTC (permalink / raw)
  To: Steven Price
  Cc: Mark Rutland, x86, Zong Li, Arnd Bergmann, Ard Biesheuvel,
	Peter Zijlstra, Catalin Marinas, Dave Hansen, linux-kernel,
	linux-mm, Jérôme Glisse, Ingo Molnar, Borislav Petkov,
	Andy Lutomirski, H. Peter Anvin, James Morse, Thomas Gleixner,
	Will Deacon, Andrew Morton, linux-arm-kernel, Liang, Kan

On 12/12/19 2:15 PM, Steven Price wrote:
> On 12/12/2019 11:33, Thomas Hellström (VMware) wrote:
>> On 12/12/19 12:23 PM, Thomas Hellström (VMware) wrote:
>>> On 12/6/19 2:53 PM, Steven Price wrote:
>>>> pgd_entry() and pud_entry() were removed by commit 0b1fbfe50006c410
>>>> ("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were
>>>> no users. We're about to add users so reintroduce them, along with
>>>> p4d_entry() as we now have 5 levels of tables.
>>>>
>>>> Note that commit a00cc7d9dd93d66a ("mm, x86: add support for
>>>> PUD-sized transparent hugepages") already re-added pud_entry() but 
>>>> with
>>>> different semantics to the other callbacks. Since there have never
>>>> been upstream users of this, revert the semantics back to match the
>>>> other callbacks. This means pud_entry() is called for all entries, not
>>>> just transparent huge pages.
>
> When I wrote that there were no upstream users, which sadly shows how
> long ago that was :(
>
>>> Actually, there are two users of pud_entry(): one in hmm.c and, since
>>> 5.5-rc1, one in mapping_dirty_helpers.c. The latter is unproblematic
>>> and needs no attention, but the one in hmm.c is probably largely
>>> untested and seems to assume it is called outside of the spinlock.
>>>
>>> The problem with the current patch is that the hmm pud_entry will also
>>> traverse pmds, so that will be done twice now.
>>>
>>> In another thread we were discussing a means of rerunning the level
>>> (in case of a race), or continuing after a level, based on the
>>> return value after the callback. The change was fairly invasive,
>>>
>> Hmm. Forgot to remove the above text, which appears twice. :( The
>> correct one is inline below.
>>
>>>
>>>> Tested-by: Zong Li <zong.li@sifive.com>
>>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>>> ---
>>>>   include/linux/pagewalk.h | 19 +++++++++++++------
>>>>   mm/pagewalk.c            | 27 ++++++++++++++++-----------
>>>>   2 files changed, 29 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
>>>> index 6ec82e92c87f..06790f23957f 100644
>>>> --- a/include/linux/pagewalk.h
>>>> +++ b/include/linux/pagewalk.h
>>>> @@ -8,15 +8,15 @@ struct mm_walk;
>>>>     /**
>>>>    * mm_walk_ops - callbacks for walk_page_range
>>>> - * @pud_entry:        if set, called for each non-empty PUD 
>>>> (2nd-level) entry
>>>> - *            this handler should only handle pud_trans_huge() puds.
>>>> - *            the pmd_entry or pte_entry callbacks will be used for
>>>> - *            regular PUDs.
>>>> - * @pmd_entry:        if set, called for each non-empty PMD 
>>>> (3rd-level) entry
>>>> + * @pgd_entry:        if set, called for each non-empty PGD 
>>>> (top-level) entry
>>>> + * @p4d_entry:        if set, called for each non-empty P4D entry
>>>> + * @pud_entry:        if set, called for each non-empty PUD entry
>>>> + * @pmd_entry:        if set, called for each non-empty PMD entry
>>>>    *            this handler is required to be able to handle
>>>>    *            pmd_trans_huge() pmds.  They may simply choose to
>>>>    *            split_huge_page() instead of handling it explicitly.
>>>> - * @pte_entry:        if set, called for each non-empty PTE 
>>>> (4th-level) entry
>>>> + * @pte_entry:        if set, called for each non-empty PTE 
>>>> (lowest-level)
>>>> + *            entry
>>>>    * @pte_hole:        if set, called for each hole at all levels
>>>>    * @hugetlb_entry:    if set, called for each hugetlb entry
>>>>    * @test_walk:        caller specific callback function to 
>>>> determine whether
>>>> @@ -27,8 +27,15 @@ struct mm_walk;
>>>>    * @pre_vma:            if set, called before starting walk on a 
>>>> non-null vma.
>>>>    * @post_vma:           if set, called after a walk on a non-null 
>>>> vma, provided
>>>>    *                      that @pre_vma and the vma walk succeeded.
>>>> + *
>>>> + * p?d_entry callbacks are called even if those levels are folded 
>>>> on a
>>>> + * particular architecture/configuration.
>>>>    */
>>>>   struct mm_walk_ops {
>>>> +    int (*pgd_entry)(pgd_t *pgd, unsigned long addr,
>>>> +             unsigned long next, struct mm_walk *walk);
>>>> +    int (*p4d_entry)(p4d_t *p4d, unsigned long addr,
>>>> +             unsigned long next, struct mm_walk *walk);
>>>>       int (*pud_entry)(pud_t *pud, unsigned long addr,
>>>>                unsigned long next, struct mm_walk *walk);
>>>>       int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
>>>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
>>>> index ea0b9e606ad1..c089786e7a7f 100644
>>>> --- a/mm/pagewalk.c
>>>> +++ b/mm/pagewalk.c
>>>> @@ -94,15 +94,9 @@ static int walk_pud_range(p4d_t *p4d, unsigned 
>>>> long addr, unsigned long end,
>>>>           }
>>>>             if (ops->pud_entry) {
>>>> -            spinlock_t *ptl = pud_trans_huge_lock(pud, walk->vma);
>>>> -
>>>> -            if (ptl) {
>>>> -                err = ops->pud_entry(pud, addr, next, walk);
>>>> -                spin_unlock(ptl);
>>>> -                if (err)
>>>> -                    break;
>>>> -                continue;
>>>> -            }
>>>> +            err = ops->pud_entry(pud, addr, next, walk);
>>>> +            if (err)
>>>> +                break;
>>>
>>> Actually, there are two current users of pud_entry(): one in hmm.c
>>> and, since 5.5-rc1, one in mapping_dirty_helpers.c. The latter is
>>> unproblematic and needs no attention, but the one in hmm.c is
>>> probably largely untested and seems to assume it is called outside
>>> of the spinlock.
>
> Thanks for pointing that out. I guess the simplest fix would be to
> squash in something like the below, which should restore the old
> behaviour for hmm.c without affecting others.
>
> Steve 

I'm not fully sure that the old behaviour is the correct one, but hmm's
pud_entry definitely needs some fixing. I'm more concerned with the
pagewalk code: with your patch it actually splits all huge puds present
in the page table on each page walk, which is not what we want.
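
Paraphrasing the resulting walk_pud_range() flow (from the surrounding
context, not verbatim):

	if (ops->pud_entry) {
		err = ops->pud_entry(pud, addr, next, walk);
		if (err)
			break;
	}
	if (!ops->pmd_entry && !ops->pte_entry)
		continue;
	split_huge_pud(walk->vma, pud, addr);	/* hits every huge pud */

There is no way for pud_entry() to say "handled", so whenever a pmd or
pte callback is also set we fall through and split even the huge puds
the callback already dealt with.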

One idea would be to add a new member to struct mm_walk:

enum page_walk_ret_action {
	ACTION_SUBTREE = 0,
	ACTION_CONTINUE = 1,
	ACTION_AGAIN = 2	/* Only for levels that have p?d_unstable */
};

struct mm_walk {
	...
	enum page_walk_ret_action action;
};


if (ops->pud_entry) {
	walk->action = ACTION_SUBTREE;
	...
	...
	...
	if (walk->action == ACTION_AGAIN)  /* Callback tried to split huge entry, but failed */
		goto again;
	else if (walk->action == ACTION_CONTINUE) /* Done with this subtree. Probably huge entry handled. */
		continue;
	/* ACTION_SUBTREE falls through */
}

We discussed something similar before on linux-mm, but the idea then was
to redefine the positive return value of the callback as the action,
which meant changing those existing callbacks that relied on a positive
return value. The above would also be helpful for pmd_entry.
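
To make that concrete, a pud_entry callback under this scheme could look
roughly like the sketch below, where handle_huge_pud() is a made-up
stand-in for the callback's real work:

static int example_pud_entry(pud_t *pudp, unsigned long addr,
			     unsigned long next, struct mm_walk *walk)
{
	pud_t pud = READ_ONCE(*pudp);

	/* walk->action starts out as ACTION_SUBTREE: descend into pmds. */
	if (!pud_trans_huge(pud))
		return 0;

	if (handle_huge_pud(pud, addr, next, walk)) {
		/* The entry changed under us: have the walker re-read it. */
		walk->action = ACTION_AGAIN;
		return 0;
	}

	/* Huge entry fully handled; no need to visit the pmd level. */
	walk->action = ACTION_CONTINUE;
	return 0;
}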

/Thomas

* Re: [PATCH v16 11/25] mm: pagewalk: Add p4d_entry() and pgd_entry()
  2019-12-12 14:04           ` Thomas Hellström (VMware)
@ 2019-12-12 15:18             ` Steven Price
  -1 siblings, 0 replies; 85+ messages in thread
From: Steven Price @ 2019-12-12 15:18 UTC (permalink / raw)
  To: Thomas Hellström (VMware)
  Cc: Mark Rutland, Dave Hansen, Andy Lutomirski, Arnd Bergmann,
	Ard Biesheuvel, Peter Zijlstra, Catalin Marinas, x86,
	linux-kernel, linux-mm, Jérôme Glisse, Ingo Molnar,
	Borislav Petkov, Zong Li, H. Peter Anvin, James Morse,
	Thomas Gleixner, Will Deacon, Andrew Morton, linux-arm-kernel,
	Liang, Kan

On 12/12/2019 14:04, Thomas Hellström (VMware) wrote:
> On 12/12/19 2:15 PM, Steven Price wrote:
>> On 12/12/2019 11:33, Thomas Hellström (VMware) wrote:
>>> On 12/12/19 12:23 PM, Thomas Hellström (VMware) wrote:
>>>> On 12/6/19 2:53 PM, Steven Price wrote:
>>>>> pgd_entry() and pud_entry() were removed by commit 0b1fbfe50006c410
>>>>> ("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were
>>>>> no users. We're about to add users so reintroduce them, along with
>>>>> p4d_entry() as we now have 5 levels of tables.
>>>>>
>>>>> Note that commit a00cc7d9dd93d66a ("mm, x86: add support for
>>>>> PUD-sized transparent hugepages") already re-added pud_entry() but with
>>>>> different semantics to the other callbacks. Since there have never
>>>>> been upstream users of this, revert the semantics back to match the
>>>>> other callbacks. This means pud_entry() is called for all entries, not
>>>>> just transparent huge pages.
>>
>> When I wrote that there were no upstream users, which sadly shows how
>> long ago that was :(
>>
>>>> Actually, there are two users of pud_entry(): one in hmm.c and, since 5.5-rc1, one in mapping_dirty_helpers.c. The latter is unproblematic and needs no attention, but the one in hmm.c is probably largely untested and seems to assume it is called outside of the spinlock.
>>>>
>>>> The problem with the current patch is that the hmm pud_entry will also traverse pmds, so that will be done twice now.
>>>>
>>>> In another thread we were discussing a means of rerunning the level (in case of a race), or continuing after a level, based on the return value after the callback. The change was fairly invasive,
>>>>
>>> Hmm. Forgot to remove the above text, which appears twice. :( The correct one is inline below.
>>>
>>>>
>>>>> Tested-by: Zong Li <zong.li@sifive.com>
>>>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>>>> ---
>>>>>   include/linux/pagewalk.h | 19 +++++++++++++------
>>>>>   mm/pagewalk.c            | 27 ++++++++++++++++-----------
>>>>>   2 files changed, 29 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
>>>>> index 6ec82e92c87f..06790f23957f 100644
>>>>> --- a/include/linux/pagewalk.h
>>>>> +++ b/include/linux/pagewalk.h
>>>>> @@ -8,15 +8,15 @@ struct mm_walk;
>>>>>     /**
>>>>>    * mm_walk_ops - callbacks for walk_page_range
>>>>> - * @pud_entry:        if set, called for each non-empty PUD (2nd-level) entry
>>>>> - *            this handler should only handle pud_trans_huge() puds.
>>>>> - *            the pmd_entry or pte_entry callbacks will be used for
>>>>> - *            regular PUDs.
>>>>> - * @pmd_entry:        if set, called for each non-empty PMD (3rd-level) entry
>>>>> + * @pgd_entry:        if set, called for each non-empty PGD (top-level) entry
>>>>> + * @p4d_entry:        if set, called for each non-empty P4D entry
>>>>> + * @pud_entry:        if set, called for each non-empty PUD entry
>>>>> + * @pmd_entry:        if set, called for each non-empty PMD entry
>>>>>    *            this handler is required to be able to handle
>>>>>    *            pmd_trans_huge() pmds.  They may simply choose to
>>>>>    *            split_huge_page() instead of handling it explicitly.
>>>>> - * @pte_entry:        if set, called for each non-empty PTE (4th-level) entry
>>>>> + * @pte_entry:        if set, called for each non-empty PTE (lowest-level)
>>>>> + *            entry
>>>>>    * @pte_hole:        if set, called for each hole at all levels
>>>>>    * @hugetlb_entry:    if set, called for each hugetlb entry
>>>>>    * @test_walk:        caller specific callback function to determine whether
>>>>> @@ -27,8 +27,15 @@ struct mm_walk;
>>>>>    * @pre_vma:            if set, called before starting walk on a non-null vma.
>>>>>    * @post_vma:           if set, called after a walk on a non-null vma, provided
>>>>>    *                      that @pre_vma and the vma walk succeeded.
>>>>> + *
>>>>> + * p?d_entry callbacks are called even if those levels are folded on a
>>>>> + * particular architecture/configuration.
>>>>>    */
>>>>>   struct mm_walk_ops {
>>>>> +    int (*pgd_entry)(pgd_t *pgd, unsigned long addr,
>>>>> +             unsigned long next, struct mm_walk *walk);
>>>>> +    int (*p4d_entry)(p4d_t *p4d, unsigned long addr,
>>>>> +             unsigned long next, struct mm_walk *walk);
>>>>>       int (*pud_entry)(pud_t *pud, unsigned long addr,
>>>>>                unsigned long next, struct mm_walk *walk);
>>>>>       int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
>>>>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
>>>>> index ea0b9e606ad1..c089786e7a7f 100644
>>>>> --- a/mm/pagewalk.c
>>>>> +++ b/mm/pagewalk.c
>>>>> @@ -94,15 +94,9 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>>>>>           }
>>>>>             if (ops->pud_entry) {
>>>>> -            spinlock_t *ptl = pud_trans_huge_lock(pud, walk->vma);
>>>>> -
>>>>> -            if (ptl) {
>>>>> -                err = ops->pud_entry(pud, addr, next, walk);
>>>>> -                spin_unlock(ptl);
>>>>> -                if (err)
>>>>> -                    break;
>>>>> -                continue;
>>>>> -            }
>>>>> +            err = ops->pud_entry(pud, addr, next, walk);
>>>>> +            if (err)
>>>>> +                break;
>>>>
>>>> Actually, there are two current users of pud_entry(): one in hmm.c and, since 5.5-rc1, one in mapping_dirty_helpers.c. The latter is unproblematic and needs no attention, but the one in hmm.c is probably largely untested and seems to assume it is called outside of the spinlock.
>>
>> Thanks for pointing that out. I guess the simplest fix would be to
>> squash in something like the below, which should restore the old
>> behaviour for hmm.c without affecting others.
>>
>> Steve 
> 
> I'm not fully sure that the old behaviour is the correct one, but hmm's
> pud_entry definitely needs some fixing. I'm more concerned with the
> pagewalk code: with your patch it actually splits all huge puds present
> in the page table on each page walk, which is not what we want.

Good catch - yes that's certainly not ideal.

> One idea would be to add a new member to struct mm_walk:
> 
> enum page_walk_ret_action {
>      ACTION_SUBTREE = 0,
>      ACTION_CONTINUE = 1,
>      ACTION_AGAIN = 2	/* Only for levels that have p?d_unstable */
> };
> 
> struct mm_walk {
>      ...
>      enum page_walk_ret_action action;
> };
> 
> 
> if (ops->pud_entry) {
>      walk->action = ACTION_SUBTREE;
>      ...
>      ...
>      ...
>      if (walk->action == ACTION_AGAIN)  /* Callback tried to split huge entry, but failed */
>          goto again;
>      else if (walk->action == ACTION_CONTINUE) /* Done with this subtree. Probably huge entry handled. */
>          continue;
>      /* ACTION_SUBTREE falls through */
> }

I'll have a go at implementing the above - this might also allow
removing the test_p?d() callbacks, since they could simply set
walk->action to ACTION_CONTINUE.
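
As a sketch of that: a walker that today uses test_pud() purely to skip
uninteresting ranges could do the same from pud_entry(), with
skip_this_range() below as a hypothetical predicate:

static int skip_range_pud_entry(pud_t *pudp, unsigned long addr,
				unsigned long next, struct mm_walk *walk)
{
	if (skip_this_range(addr, next))
		walk->action = ACTION_CONTINUE;	/* don't descend to pmds */
	return 0;	/* default ACTION_SUBTREE walks the pmds below */
}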

Steve

> We discussed something similar before on linux-mm, but the idea then was
> to redefine the positive return value of the callback as the action,
> which meant changing those existing callbacks that relied on a positive
> return value. The above would also be helpful for pmd_entry.
> 
> /Thomas


