* [PATCH v4 00/36] New page table range API
@ 2023-03-15  5:14 Matthew Wilcox (Oracle)
  2023-03-15  5:14 ` [PATCH v4 01/36] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set() Matthew Wilcox (Oracle)
                   ` (35 more replies)
  0 siblings, 36 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

This patchset changes the API used by the MM to set up page table entries.
The four APIs are:
    set_ptes(mm, addr, ptep, pte, nr)
    update_mmu_cache_range(vma, addr, ptep, nr)
    flush_dcache_folio(folio) 
    flush_icache_pages(vma, page, nr)

flush_dcache_folio() isn't technically new, but no architecture
implemented it, so I've done that for you.  The old APIs remain around
but are mostly implemented by calling the new interfaces.

The new APIs are based around setting up N page table entries at once.
The N entries belong to the same PMD, the same folio and the same VMA,
so ptep++ is a legitimate operation, and locking is taken care of for
you.  Some architectures can do a better job of it than just a loop,
but I have hesitated to make too deep a change to architectures I don't
understand well.
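
To make the calling convention concrete, here's a sketch of how a
caller maps one folio (illustrative only; map_folio() is a made-up
helper -- the real caller is set_pte_range(), added later in this
series):

    static void map_folio(struct vm_area_struct *vma, unsigned long addr,
                    pte_t *ptep, struct folio *folio)
    {
            unsigned int nr = folio_nr_pages(folio);
            pte_t pte = mk_pte(&folio->page, vma->vm_page_prot);

            /* All nr PTEs lie in one PMD; the caller holds the PTE lock */
            flush_icache_pages(vma, &folio->page, nr);
            set_ptes(vma->vm_mm, addr, ptep, pte, nr);
            update_mmu_cache_range(vma, addr, ptep, nr);
    }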

One thing I have changed in every architecture is that PG_arch_1 is now a
per-folio bit instead of a per-page bit.  This was something that would
have to happen eventually, and it makes sense to do it now rather than
iterate over every page involved in a cache flush and figure out if it
needs to happen.
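
In practice the deferred-flush test then becomes a single bit
operation per folio instead of one per page, along these lines
(sketch; architectures alias PG_arch_1 as PG_dc_clean, PG_dcache_clean
and so on):

    /* One test-and-set covers every page in the folio */
    if (!test_and_set_bit(PG_arch_1, &folio->flags))
            flush_dcache_folio(folio);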

The point of all this is better performance, and Fengwei Yin has
measured improvement on x86.  I suspect you'll see improvement on
your architecture too.  Try the new will-it-scale test mentioned here:
https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
You'll need to run it on an XFS filesystem and have
CONFIG_TRANSPARENT_HUGEPAGE set.

For testing, I've only run the code on x86.  For each architecture
where an x86->foo cross-compiler exists in Debian, I've built the
defconfig.  I'm relying on the buildbots to tell me what I missed,
and on people who have the hardware to tell me whether it actually
works.

I'd like to get this into the MM tree soon, so quick feedback would
be appreciated.

v4:
 - Fix a few compile errors (mostly Mike Rapoport)
 - Incorporate Mike's suggestion to avoid having to define set_ptes()
   or set_pte_at() on the majority of architectures
 - Optimise m68k's __flush_pages_to_ram (Geert Uytterhoeven)
 - Fix sun3 (me)
 - Fix sparc32 (me)
 - Pick up a few more Ack/Reviewed tags

v3:
 - Reinstate flush_dcache_icache_phys() on PowerPC
 - Fix folio_flush_mapping().  The documentation was correct and the
   implementation was completely wrong
 - Change the flush_dcache_page() documentation to describe
   flush_dcache_folio() instead
 - Split ARM from ARC.  I messed up my git commands
 - Remove page_mapping_file()
 - Rationalise how flush_icache_pages() and flush_icache_page() are defined
 - Use flush_icache_pages() in do_set_pmd()
 - Pick up Guo Ren's Ack for csky

Matthew Wilcox (Oracle) (32):
  mm: Convert page_table_check_pte_set() to page_table_check_ptes_set()
  mm: Add generic flush_icache_pages() and documentation
  mm: Add folio_flush_mapping()
  mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
  mm: Add default definition of set_ptes()
  alpha: Implement the new page table range API
  arc: Implement the new page table range API
  arm: Implement the new page table range API
  arm64: Implement the new page table range API
  csky: Implement the new page table range API
  hexagon: Implement the new page table range API
  ia64: Implement the new page table range API
  loongarch: Implement the new page table range API
  m68k: Implement the new page table range API
  microblaze: Implement the new page table range API
  mips: Implement the new page table range API
  nios2: Implement the new page table range API
  openrisc: Implement the new page table range API
  parisc: Implement the new page table range API
  powerpc: Implement the new page table range API
  riscv: Implement the new page table range API
  s390: Implement the new page table range API
  superh: Implement the new page table range API
  sparc32: Implement the new page table range API
  sparc64: Implement the new page table range API
  um: Implement the new page table range API
  x86: Implement the new page table range API
  xtensa: Implement the new page table range API
  mm: Remove page_mapping_file()
  mm: Rationalise flush_icache_pages() and flush_icache_page()
  mm: Tidy up set_ptes definition
  mm: Use flush_icache_pages() in do_set_pmd()

Yin Fengwei (4):
  filemap: Add filemap_map_folio_range()
  rmap: add folio_add_file_rmap_range()
  mm: Convert do_set_pte() to set_pte_range()
  filemap: Batch PTE mappings

 Documentation/core-api/cachetlb.rst       |  51 +++++-----
 Documentation/filesystems/locking.rst     |   2 +-
 arch/alpha/include/asm/cacheflush.h       |  13 ++-
 arch/alpha/include/asm/pgtable.h          |   9 +-
 arch/arc/include/asm/cacheflush.h         |  14 +--
 arch/arc/include/asm/pgtable-bits-arcv2.h |  11 +--
 arch/arc/include/asm/pgtable-levels.h     |   1 +
 arch/arc/mm/cache.c                       |  61 +++++++-----
 arch/arc/mm/tlb.c                         |  18 ++--
 arch/arm/include/asm/cacheflush.h         |  29 +++---
 arch/arm/include/asm/pgtable.h            |   5 +-
 arch/arm/include/asm/tlbflush.h           |  13 ++-
 arch/arm/mm/copypage-v4mc.c               |   5 +-
 arch/arm/mm/copypage-v6.c                 |   5 +-
 arch/arm/mm/copypage-xscale.c             |   5 +-
 arch/arm/mm/dma-mapping.c                 |  24 ++---
 arch/arm/mm/fault-armv.c                  |  14 +--
 arch/arm/mm/flush.c                       |  99 +++++++++++--------
 arch/arm/mm/mm.h                          |   2 +-
 arch/arm/mm/mmu.c                         |  14 ++-
 arch/arm64/include/asm/cacheflush.h       |   4 +-
 arch/arm64/include/asm/pgtable.h          |  25 +++--
 arch/arm64/mm/flush.c                     |  36 +++----
 arch/csky/abiv1/cacheflush.c              |  32 ++++---
 arch/csky/abiv1/inc/abi/cacheflush.h      |   3 +-
 arch/csky/abiv2/cacheflush.c              |  32 +++----
 arch/csky/abiv2/inc/abi/cacheflush.h      |  11 ++-
 arch/csky/include/asm/pgtable.h           |   8 +-
 arch/hexagon/include/asm/cacheflush.h     |   9 +-
 arch/hexagon/include/asm/pgtable.h        |   9 +-
 arch/ia64/hp/common/sba_iommu.c           |  26 ++---
 arch/ia64/include/asm/cacheflush.h        |  14 ++-
 arch/ia64/include/asm/pgtable.h           |   4 +-
 arch/ia64/mm/init.c                       |  28 ++++--
 arch/loongarch/include/asm/cacheflush.h   |   2 +-
 arch/loongarch/include/asm/pgtable-bits.h |   4 +-
 arch/loongarch/include/asm/pgtable.h      |  28 +++---
 arch/loongarch/mm/pgtable.c               |   2 +-
 arch/loongarch/mm/tlb.c                   |   2 +-
 arch/m68k/include/asm/cacheflush_mm.h     |  26 +++--
 arch/m68k/include/asm/mcf_pgtable.h       |   1 +
 arch/m68k/include/asm/motorola_pgtable.h  |   1 +
 arch/m68k/include/asm/pgtable_mm.h        |   9 +-
 arch/m68k/include/asm/sun3_pgtable.h      |   1 +
 arch/m68k/mm/motorola.c                   |   2 +-
 arch/microblaze/include/asm/cacheflush.h  |   8 ++
 arch/microblaze/include/asm/pgtable.h     |  15 +--
 arch/microblaze/include/asm/tlbflush.h    |   4 +-
 arch/mips/bcm47xx/prom.c                  |   2 +-
 arch/mips/include/asm/cacheflush.h        |  32 ++++---
 arch/mips/include/asm/pgtable-32.h        |  10 +-
 arch/mips/include/asm/pgtable-64.h        |   6 +-
 arch/mips/include/asm/pgtable-bits.h      |   6 +-
 arch/mips/include/asm/pgtable.h           |  44 +++++----
 arch/mips/mm/c-r4k.c                      |   5 +-
 arch/mips/mm/cache.c                      |  56 +++++------
 arch/mips/mm/init.c                       |  21 ++--
 arch/mips/mm/pgtable-32.c                 |   2 +-
 arch/mips/mm/pgtable-64.c                 |   2 +-
 arch/mips/mm/tlbex.c                      |   2 +-
 arch/nios2/include/asm/cacheflush.h       |   6 +-
 arch/nios2/include/asm/pgtable.h          |  28 ++++--
 arch/nios2/mm/cacheflush.c                |  62 ++++++------
 arch/openrisc/include/asm/cacheflush.h    |   8 +-
 arch/openrisc/include/asm/pgtable.h       |  14 ++-
 arch/openrisc/mm/cache.c                  |  12 ++-
 arch/parisc/include/asm/cacheflush.h      |  14 ++-
 arch/parisc/include/asm/pgtable.h         |  37 +++++---
 arch/parisc/kernel/cache.c                | 101 ++++++++++++++------
 arch/powerpc/include/asm/book3s/pgtable.h |  10 +-
 arch/powerpc/include/asm/cacheflush.h     |  14 ++-
 arch/powerpc/include/asm/kvm_ppc.h        |  10 +-
 arch/powerpc/include/asm/nohash/pgtable.h |  13 +--
 arch/powerpc/include/asm/pgtable.h        |   6 ++
 arch/powerpc/mm/book3s64/hash_utils.c     |  11 ++-
 arch/powerpc/mm/cacheflush.c              |  40 +++-----
 arch/powerpc/mm/nohash/e500_hugetlbpage.c |   3 +-
 arch/powerpc/mm/pgtable.c                 |  51 +++++-----
 arch/riscv/include/asm/cacheflush.h       |  19 ++--
 arch/riscv/include/asm/pgtable.h          |  26 +++--
 arch/riscv/mm/cacheflush.c                |  11 +--
 arch/s390/include/asm/pgtable.h           |  33 +++++--
 arch/sh/include/asm/cacheflush.h          |  21 ++--
 arch/sh/include/asm/pgtable.h             |   6 +-
 arch/sh/include/asm/pgtable_32.h          |   5 +-
 arch/sh/mm/cache-j2.c                     |   4 +-
 arch/sh/mm/cache-sh4.c                    |  26 +++--
 arch/sh/mm/cache-sh7705.c                 |  26 +++--
 arch/sh/mm/cache.c                        |  52 +++++-----
 arch/sh/mm/kmap.c                         |   3 +-
 arch/sparc/include/asm/cacheflush_32.h    |   9 +-
 arch/sparc/include/asm/cacheflush_64.h    |  19 ++--
 arch/sparc/include/asm/pgtable_32.h       |   8 +-
 arch/sparc/include/asm/pgtable_64.h       |  24 ++++-
 arch/sparc/kernel/smp_64.c                |  56 +++++++----
 arch/sparc/mm/init_32.c                   |  13 ++-
 arch/sparc/mm/init_64.c                   |  78 ++++++++-------
 arch/sparc/mm/tlb.c                       |   5 +-
 arch/um/include/asm/pgtable.h             |   7 +-
 arch/x86/include/asm/pgtable.h            |  13 ++-
 arch/xtensa/include/asm/cacheflush.h      |  11 ++-
 arch/xtensa/include/asm/pgtable.h         |  17 ++--
 arch/xtensa/mm/cache.c                    |  83 +++++++++-------
 include/asm-generic/cacheflush.h          |   7 --
 include/linux/cacheflush.h                |  13 ++-
 include/linux/mm.h                        |   3 +-
 include/linux/page_table_check.h          |  14 +--
 include/linux/pagemap.h                   |  28 ++++--
 include/linux/pgtable.h                   |  31 ++++++
 include/linux/rmap.h                      |   2 +
 mm/filemap.c                              | 111 +++++++++++++---------
 mm/memory.c                               |  31 +++---
 mm/page_table_check.c                     |  14 +--
 mm/rmap.c                                 |  60 +++++++++---
 mm/util.c                                 |   2 +-
 115 files changed, 1344 insertions(+), 916 deletions(-)

-- 
2.39.2


* [PATCH v4 01/36] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set()
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:21   ` Mike Rapoport
                     ` (2 more replies)
  2023-03-15  5:14 ` [PATCH v4 02/36] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
                   ` (34 subsequent siblings)
  35 siblings, 3 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

Tell the page table check how many PTEs & PFNs we want it to check.
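
For callers, the only change for now is passing nr == 1; the point is
that a batched caller can eventually do (illustrative):

    /* old: one call checked exactly one PTE */
    page_table_check_pte_set(mm, addr, ptep, pte);

    /* new: one call checks nr consecutive PTEs and their PFNs */
    page_table_check_ptes_set(mm, addr, ptep, pte, nr);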

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 14 +++++++-------
 mm/page_table_check.c            | 14 ++++++++------
 5 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0bd18de9fd97..9428748f4691 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -358,7 +358,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep, pte_t pte)
 {
-	page_table_check_pte_set(mm, addr, ptep, pte);
+	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
 	return __set_pte_at(mm, addr, ptep, pte);
 }
 
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ab05f892d317..b516f3b59616 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -459,7 +459,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
 static inline void set_pte_at(struct mm_struct *mm,
 	unsigned long addr, pte_t *ptep, pte_t pteval)
 {
-	page_table_check_pte_set(mm, addr, ptep, pteval);
+	page_table_check_ptes_set(mm, addr, ptep, pteval, 1);
 	__set_pte_at(mm, addr, ptep, pteval);
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 15ae4d6ba476..1031025730d0 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1022,7 +1022,7 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
 static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep, pte_t pte)
 {
-	page_table_check_pte_set(mm, addr, ptep, pte);
+	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
 	set_pte(ptep, pte);
 }
 
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 01e16c7696ec..ba269c7009e4 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -20,8 +20,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
 				  pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
 				  pud_t pud);
-void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
-				pte_t *ptep, pte_t pte);
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+				pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
 				pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
@@ -73,14 +73,14 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm,
 	__page_table_check_pud_clear(mm, addr, pud);
 }
 
-static inline void page_table_check_pte_set(struct mm_struct *mm,
+static inline void page_table_check_ptes_set(struct mm_struct *mm,
 					    unsigned long addr, pte_t *ptep,
-					    pte_t pte)
+					    pte_t pte, unsigned int nr)
 {
 	if (static_branch_likely(&page_table_check_disabled))
 		return;
 
-	__page_table_check_pte_set(mm, addr, ptep, pte);
+	__page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 }
 
 static inline void page_table_check_pmd_set(struct mm_struct *mm,
@@ -138,9 +138,9 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm,
 {
 }
 
-static inline void page_table_check_pte_set(struct mm_struct *mm,
+static inline void page_table_check_ptes_set(struct mm_struct *mm,
 					    unsigned long addr, pte_t *ptep,
-					    pte_t pte)
+					    pte_t pte, unsigned int nr)
 {
 }
 
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 25d8610c0042..e6f4d40caaa2 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -184,20 +184,22 @@ void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
 }
 EXPORT_SYMBOL(__page_table_check_pud_clear);
 
-void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
-				pte_t *ptep, pte_t pte)
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+				pte_t *ptep, pte_t pte, unsigned int nr)
 {
+	unsigned int i;
+
 	if (&init_mm == mm)
 		return;
 
-	__page_table_check_pte_clear(mm, addr, *ptep);
+	for (i = 0; i < nr; i++)
+		__page_table_check_pte_clear(mm, addr, ptep[i]);
 	if (pte_user_accessible_page(pte)) {
-		page_table_check_set(mm, addr, pte_pfn(pte),
-				     PAGE_SIZE >> PAGE_SHIFT,
+		page_table_check_set(mm, addr, pte_pfn(pte), nr,
 				     pte_write(pte));
 	}
 }
-EXPORT_SYMBOL(__page_table_check_pte_set);
+EXPORT_SYMBOL(__page_table_check_ptes_set);
 
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
 				pmd_t *pmdp, pmd_t pmd)
-- 
2.39.2


* [PATCH v4 02/36] mm: Add generic flush_icache_pages() and documentation
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
  2023-03-15  5:14 ` [PATCH v4 01/36] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set() Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:27   ` Mike Rapoport
  2023-05-25  2:23   ` Anshuman Khandual
  2023-03-15  5:14 ` [PATCH v4 03/36] mm: Add folio_flush_mapping() Matthew Wilcox (Oracle)
                   ` (33 subsequent siblings)
  35 siblings, 2 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

flush_icache_page() is deprecated but not yet removed, so add
a range version of it.  Change the documentation to refer to
update_mmu_cache_range() instead of update_mmu_cache().
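
An architecture whose flush_icache_page() does real work would
typically implement the range version as a loop; a minimal sketch,
not part of this patch:

    static inline void flush_icache_pages(struct vm_area_struct *vma,
                    struct page *page, unsigned int nr)
    {
            unsigned int i;

            /* Pages within a folio have contiguous struct pages */
            for (i = 0; i < nr; i++)
                    flush_icache_page(vma, page + i);
    }

The generic fallback added below is a no-op, matching the existing
no-op flush_icache_page().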

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/core-api/cachetlb.rst | 35 +++++++++++++++--------------
 include/asm-generic/cacheflush.h    |  5 +++++
 2 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
index 5c0552e78c58..d4c9e2a28d36 100644
--- a/Documentation/core-api/cachetlb.rst
+++ b/Documentation/core-api/cachetlb.rst
@@ -88,13 +88,13 @@ changes occur:
 
 	This is used primarily during fault processing.
 
-5) ``void update_mmu_cache(struct vm_area_struct *vma,
-   unsigned long address, pte_t *ptep)``
+5) ``void update_mmu_cache_range(struct vm_area_struct *vma,
+   unsigned long address, pte_t *ptep, unsigned int nr)``
 
-	At the end of every page fault, this routine is invoked to
-	tell the architecture specific code that a translation
-	now exists at virtual address "address" for address space
-	"vma->vm_mm", in the software page tables.
+	At the end of every page fault, this routine is invoked to tell
+	the architecture specific code that translations now exist
+	in the software page tables for address space "vma->vm_mm"
+	at virtual address "address" for "nr" consecutive pages.
 
 	A port may use this information in any way it so chooses.
 	For example, it could use this event to pre-load TLB
@@ -306,17 +306,18 @@ maps this page at its virtual address.
 	private".  The kernel guarantees that, for pagecache pages, it will
 	clear this bit when such a page first enters the pagecache.
 
-	This allows these interfaces to be implemented much more efficiently.
-	It allows one to "defer" (perhaps indefinitely) the actual flush if
-	there are currently no user processes mapping this page.  See sparc64's
-	flush_dcache_page and update_mmu_cache implementations for an example
-	of how to go about doing this.
+	This allows these interfaces to be implemented much more
+	efficiently.  It allows one to "defer" (perhaps indefinitely) the
+	actual flush if there are currently no user processes mapping this
+	page.  See sparc64's flush_dcache_page and update_mmu_cache_range
+	implementations for an example of how to go about doing this.
 
-	The idea is, first at flush_dcache_page() time, if page_file_mapping()
-	returns a mapping, and mapping_mapped on that mapping returns %false,
-	just mark the architecture private page flag bit.  Later, in
-	update_mmu_cache(), a check is made of this flag bit, and if set the
-	flush is done and the flag bit is cleared.
+	The idea is, first at flush_dcache_page() time, if
+	page_file_mapping() returns a mapping, and mapping_mapped on that
+	mapping returns %false, just mark the architecture private page
+	flag bit.  Later, in update_mmu_cache_range(), a check is made
+	of this flag bit, and if set the flush is done and the flag bit
+	is cleared.
 
 	.. important::
 
@@ -369,7 +370,7 @@ maps this page at its virtual address.
   ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``
 
 	All the functionality of flush_icache_page can be implemented in
-	flush_dcache_page and update_mmu_cache. In the future, the hope
+	flush_dcache_page and update_mmu_cache_range. In the future, the hope
 	is to remove this interface completely.
 
 The final category of APIs is for I/O to deliberately aliased address
diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index f46258d1a080..09d51a680765 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -78,6 +78,11 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
 #endif
 
 #ifndef flush_icache_page
+static inline void flush_icache_pages(struct vm_area_struct *vma,
+				     struct page *page, unsigned int nr)
+{
+}
+
 static inline void flush_icache_page(struct vm_area_struct *vma,
 				     struct page *page)
 {
-- 
2.39.2


* [PATCH v4 03/36] mm: Add folio_flush_mapping()
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
  2023-03-15  5:14 ` [PATCH v4 01/36] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set() Matthew Wilcox (Oracle)
  2023-03-15  5:14 ` [PATCH v4 02/36] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:28   ` Mike Rapoport
  2023-05-25  2:35   ` Anshuman Khandual
  2023-03-15  5:14 ` [PATCH v4 04/36] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO Matthew Wilcox (Oracle)
                   ` (32 subsequent siblings)
  35 siblings, 2 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

This is the folio equivalent of page_mapping_file(), renamed to make
it clear that it's very different from page_file_mapping().
Theoretically, there's nothing flush-only about it, but there are no
other users today, and I doubt there will be; it's almost always more
useful to know the swapfile's mapping or the swapcache's mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a4..e56c2023aa0e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -369,6 +369,26 @@ static inline struct address_space *folio_file_mapping(struct folio *folio)
 	return folio->mapping;
 }
 
+/**
+ * folio_flush_mapping - Find the file mapping this folio belongs to.
+ * @folio: The folio.
+ *
+ * For folios which are in the page cache, return the mapping that this
+ * page belongs to.  Anonymous folios return NULL, even if they're in
+ * the swap cache.  Other kinds of folio also return NULL.
+ *
+ * This is ONLY used by architecture cache flushing code.  If you aren't
+ * writing cache flushing code, you want either folio_mapping() or
+ * folio_file_mapping().
+ */
+static inline struct address_space *folio_flush_mapping(struct folio *folio)
+{
+	if (unlikely(folio_test_swapcache(folio)))
+		return NULL;
+
+	return folio_mapping(folio);
+}
+
 static inline struct address_space *page_file_mapping(struct page *page)
 {
 	return folio_file_mapping(page_folio(page));
@@ -379,11 +399,7 @@ static inline struct address_space *page_file_mapping(struct page *page)
  */
 static inline struct address_space *page_mapping_file(struct page *page)
 {
-	struct folio *folio = page_folio(page);
-
-	if (unlikely(folio_test_swapcache(folio)))
-		return NULL;
-	return folio_mapping(folio);
+	return folio_flush_mapping(page_folio(page));
 }
 
 /**
-- 
2.39.2


* [PATCH v4 04/36] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 03/36] mm: Add folio_flush_mapping() Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:28   ` Mike Rapoport
  2023-05-25  2:43   ` Anshuman Khandual
  2023-03-15  5:14 ` [PATCH v4 05/36] mm: Add default definition of set_ptes() Matthew Wilcox (Oracle)
                   ` (31 subsequent siblings)
  35 siblings, 2 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

Current best practice is to reuse the name of the function as a define
to indicate that the function is implemented by the architecture.
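
The convention looks like this in an architecture's headers (sketch;
arch/foo is hypothetical):

    /* arch/foo/include/asm/cacheflush.h: arch has its own version */
    void flush_dcache_folio(struct folio *folio);
    #define flush_dcache_folio flush_dcache_folio

Generic code then tests "#ifndef flush_dcache_folio" to decide whether
to provide the common implementation, instead of testing a separate
ARCH_IMPLEMENTS_* symbol.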

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/core-api/cachetlb.rst | 24 +++++++++---------------
 include/linux/cacheflush.h          |  4 ++--
 mm/util.c                           |  2 +-
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
index d4c9e2a28d36..770008afd409 100644
--- a/Documentation/core-api/cachetlb.rst
+++ b/Documentation/core-api/cachetlb.rst
@@ -269,7 +269,7 @@ maps this page at its virtual address.
 	If D-cache aliasing is not an issue, these two routines may
 	simply call memcpy/memset directly and do nothing more.
 
-  ``void flush_dcache_page(struct page *page)``
+  ``void flush_dcache_folio(struct folio *folio)``
 
         This routines must be called when:
 
@@ -277,7 +277,7 @@ maps this page at its virtual address.
 	     and / or in high memory
 	  b) the kernel is about to read from a page cache page and user space
 	     shared/writable mappings of this page potentially exist.  Note
-	     that {get,pin}_user_pages{_fast} already call flush_dcache_page
+	     that {get,pin}_user_pages{_fast} already call flush_dcache_folio
 	     on any page found in the user address space and thus driver
 	     code rarely needs to take this into account.
 
@@ -291,7 +291,7 @@ maps this page at its virtual address.
 
 	The phrase "kernel writes to a page cache page" means, specifically,
 	that the kernel executes store instructions that dirty data in that
-	page at the page->virtual mapping of that page.  It is important to
+	page at the kernel virtual mapping of that page.  It is important to
 	flush here to handle D-cache aliasing, to make sure these kernel stores
 	are visible to user space mappings of that page.
 
@@ -302,18 +302,18 @@ maps this page at its virtual address.
 	If D-cache aliasing is not an issue, this routine may simply be defined
 	as a nop on that architecture.
 
-        There is a bit set aside in page->flags (PG_arch_1) as "architecture
+        There is a bit set aside in folio->flags (PG_arch_1) as "architecture
 	private".  The kernel guarantees that, for pagecache pages, it will
 	clear this bit when such a page first enters the pagecache.
 
 	This allows these interfaces to be implemented much more
 	efficiently.  It allows one to "defer" (perhaps indefinitely) the
 	actual flush if there are currently no user processes mapping this
-	page.  See sparc64's flush_dcache_page and update_mmu_cache_range
+	page.  See sparc64's flush_dcache_folio and update_mmu_cache_range
 	implementations for an example of how to go about doing this.
 
-	The idea is, first at flush_dcache_page() time, if
-	page_file_mapping() returns a mapping, and mapping_mapped on that
+	The idea is, first at flush_dcache_folio() time, if
+	folio_flush_mapping() returns a mapping, and mapping_mapped() on that
 	mapping returns %false, just mark the architecture private page
 	flag bit.  Later, in update_mmu_cache_range(), a check is made
 	of this flag bit, and if set the flush is done and the flag bit
@@ -327,12 +327,6 @@ maps this page at its virtual address.
 			dirty.  Again, see sparc64 for examples of how
 			to deal with this.
 
-  ``void flush_dcache_folio(struct folio *folio)``
-	This function is called under the same circumstances as
-	flush_dcache_page().  It allows the architecture to
-	optimise for flushing the entire folio of pages instead
-	of flushing one page at a time.
-
   ``void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
   unsigned long user_vaddr, void *dst, void *src, int len)``
   ``void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
@@ -353,7 +347,7 @@ maps this page at its virtual address.
 
   	When the kernel needs to access the contents of an anonymous
 	page, it calls this function (currently only
-	get_user_pages()).  Note: flush_dcache_page() deliberately
+	get_user_pages()).  Note: flush_dcache_folio() deliberately
 	doesn't work for an anonymous page.  The default
 	implementation is a nop (and should remain so for all coherent
 	architectures).  For incoherent architectures, it should flush
@@ -370,7 +364,7 @@ maps this page at its virtual address.
   ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``
 
 	All the functionality of flush_icache_page can be implemented in
-	flush_dcache_page and update_mmu_cache_range. In the future, the hope
+	flush_dcache_folio and update_mmu_cache_range. In the future, the hope
 	is to remove this interface completely.
 
 The final category of APIs is for I/O to deliberately aliased address
diff --git a/include/linux/cacheflush.h b/include/linux/cacheflush.h
index a6189d21f2ba..82136f3fcf54 100644
--- a/include/linux/cacheflush.h
+++ b/include/linux/cacheflush.h
@@ -7,14 +7,14 @@
 struct folio;
 
 #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
-#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
+#ifndef flush_dcache_folio
 void flush_dcache_folio(struct folio *folio);
 #endif
 #else
 static inline void flush_dcache_folio(struct folio *folio)
 {
 }
-#define ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO 0
+#define flush_dcache_folio flush_dcache_folio
 #endif /* ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE */
 
 #endif /* _LINUX_CACHEFLUSH_H */
diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4..98ce51b01627 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1125,7 +1125,7 @@ void page_offline_end(void)
 }
 EXPORT_SYMBOL(page_offline_end);
 
-#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
+#ifndef flush_dcache_folio
 void flush_dcache_folio(struct folio *folio)
 {
 	long i, nr = folio_nr_pages(folio);
-- 
2.39.2


* [PATCH v4 05/36] mm: Add default definition of set_ptes()
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 04/36] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:34   ` Mike Rapoport
  2023-05-25  3:01   ` Anshuman Khandual
  2023-03-15  5:14 ` [PATCH v4 06/36] alpha: Implement the new page table range API Matthew Wilcox (Oracle)
                   ` (30 subsequent siblings)
  35 siblings, 2 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel, Mike Rapoport

Most architectures can just define set_pte() and PFN_PTE_SHIFT to
use this definition.  It's also a handy spot to document the guarantees
provided by the MM.
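
Concretely, an architecture opting in to the generic set_ptes() only
needs to provide something like this (sketch; see the alpha conversion
later in the series for a real example):

    /* arch/foo/include/asm/pgtable.h -- hypothetical arch */
    #define PFN_PTE_SHIFT   PAGE_SHIFT      /* where the PFN sits in a PTE */

    static inline void set_pte(pte_t *ptep, pte_t pte)
    {
            *ptep = pte;
    }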

Suggested-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pgtable.h | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c5a51481bbb9..a755fe94b4b4 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -172,6 +172,43 @@ static inline int pmd_young(pmd_t pmd)
 }
 #endif
 
+#ifndef set_ptes
+#ifdef PFN_PTE_SHIFT
+/**
+ * set_ptes - Map consecutive pages to a contiguous range of addresses.
+ * @mm: Address space to map the pages into.
+ * @addr: Address to map the first page at.
+ * @ptep: Page table pointer for the first entry.
+ * @pte: Page table entry for the first page.
+ * @nr: Number of pages to map.
+ *
+ * May be overridden by the architecture, or the architecture can define
+ * set_pte() and PFN_PTE_SHIFT.
+ *
+ * Context: The caller holds the page table lock.  The pages all belong
+ * to the same folio.  The PTEs are all in the same PMD.
+ */
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
+
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
+	}
+}
+#ifndef set_pte_at
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+#endif
+#endif
+#else
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma,
 				 unsigned long address, pte_t *ptep,
-- 
2.39.2


* [PATCH v4 06/36] alpha: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 05/36] mm: Add default definition of set_ptes() Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:41   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 07/36] arc: " Matthew Wilcox (Oracle)
                   ` (29 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Richard Henderson, Ivan Kokshaysky,
	Matt Turner, linux-alpha

Add PFN_PTE_SHIFT, update_mmu_cache_range() and flush_icache_pages().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: linux-alpha@vger.kernel.org
---
 arch/alpha/include/asm/cacheflush.h | 10 ++++++++++
 arch/alpha/include/asm/pgtable.h    |  9 +++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/include/asm/cacheflush.h b/arch/alpha/include/asm/cacheflush.h
index 9945ff483eaf..3956460e69e2 100644
--- a/arch/alpha/include/asm/cacheflush.h
+++ b/arch/alpha/include/asm/cacheflush.h
@@ -57,6 +57,16 @@ extern void flush_icache_user_page(struct vm_area_struct *vma,
 #define flush_icache_page(vma, page) \
 	flush_icache_user_page((vma), (page), 0, 0)
 
+/*
+ * Both implementations of flush_icache_user_page flush the entire
+ * address space, so one call, no matter how many pages.
+ */
+static inline void flush_icache_pages(struct vm_area_struct *vma,
+		struct page *page, unsigned int nr)
+{
+	flush_icache_user_page(vma, page, 0, 0);
+}
+
 #include <asm-generic/cacheflush.h>
 
 #endif /* _ALPHA_CACHEFLUSH_H */
diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index ba43cb841d19..6c24c408b8e9 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -26,7 +26,6 @@ struct vm_area_struct;
  * hook is made available.
  */
 #define set_pte(pteptr, pteval) ((*(pteptr)) = (pteval))
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
 
 /* PMD_SHIFT determines the size of the area a second-level page table can map */
 #define PMD_SHIFT	(PAGE_SHIFT + (PAGE_SHIFT-3))
@@ -189,7 +188,8 @@ extern unsigned long __zero_page(void);
  * and a page entry and page directory to the page they refer to.
  */
 #define page_to_pa(page)	(page_to_pfn(page) << PAGE_SHIFT)
-#define pte_pfn(pte)	(pte_val(pte) >> 32)
+#define PFN_PTE_SHIFT		32
+#define pte_pfn(pte)		(pte_val(pte) >> PFN_PTE_SHIFT)
 
 #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
 #define mk_pte(page, pgprot)						\
@@ -303,6 +303,11 @@ extern inline void update_mmu_cache(struct vm_area_struct * vma,
 {
 }
 
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
+{
+}
+
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
  * are !pte_none() && !pte_present().
-- 
2.39.2


* [PATCH v4 07/36] arc: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 06/36] alpha: Implement the new page table range API Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:44   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 08/36] arm: " Matthew Wilcox (Oracle)
                   ` (28 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Vineet Gupta, linux-snps-arc

Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio()
and flush_icache_pages().

Change the PG_dc_clean flag from being per-page to per-folio (which
means it cannot always be set as we don't know that all pages in this
folio were cleaned).  Enhance the internal flush routines to take the
number of pages to flush.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: linux-snps-arc@lists.infradead.org
---
 arch/arc/include/asm/cacheflush.h         |  7 ++-
 arch/arc/include/asm/pgtable-bits-arcv2.h | 11 ++--
 arch/arc/include/asm/pgtable-levels.h     |  1 +
 arch/arc/mm/cache.c                       | 61 ++++++++++++++---------
 arch/arc/mm/tlb.c                         | 18 ++++---
 5 files changed, 58 insertions(+), 40 deletions(-)

diff --git a/arch/arc/include/asm/cacheflush.h b/arch/arc/include/asm/cacheflush.h
index e201b4b1655a..04f65f588510 100644
--- a/arch/arc/include/asm/cacheflush.h
+++ b/arch/arc/include/asm/cacheflush.h
@@ -25,17 +25,20 @@
  * in update_mmu_cache()
  */
 #define flush_icache_page(vma, page)
+#define flush_icache_pages(vma, page, nr)
 
 void flush_cache_all(void);
 
 void flush_icache_range(unsigned long kstart, unsigned long kend);
 void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len);
-void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr);
-void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr);
+void __inv_icache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr);
+void __flush_dcache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr);
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 
 void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
 
 void dma_cache_wback_inv(phys_addr_t start, unsigned long sz);
 void dma_cache_inv(phys_addr_t start, unsigned long sz);
diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
index 6e9f8ca6d6a1..06d8039180c0 100644
--- a/arch/arc/include/asm/pgtable-bits-arcv2.h
+++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
@@ -100,14 +100,11 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval)
-{
-	set_pte(ptep, pteval);
-}
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		      pte_t *ptep, unsigned int nr);
 
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
-		      pte_t *ptep);
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
diff --git a/arch/arc/include/asm/pgtable-levels.h b/arch/arc/include/asm/pgtable-levels.h
index ef68758b69f7..fc417c75c24d 100644
--- a/arch/arc/include/asm/pgtable-levels.h
+++ b/arch/arc/include/asm/pgtable-levels.h
@@ -169,6 +169,7 @@
 #define pte_ERROR(e) \
 	pr_crit("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, pte_val(e))
 
+#define PFN_PTE_SHIFT		PAGE_SHIFT
 #define pte_none(x)		(!pte_val(x))
 #define pte_present(x)		(pte_val(x) & _PAGE_PRESENT)
 #define pte_clear(mm,addr,ptep)	set_pte_at(mm, addr, ptep, __pte(0))
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 55c6de138eae..3c16ee942a5c 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -752,17 +752,17 @@ static inline void arc_slc_enable(void)
  * There's a corollary case, where kernel READs from a userspace mapped page.
  * If the U-mapping is not congruent to K-mapping, former needs flushing.
  */
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
 	struct address_space *mapping;
 
 	if (!cache_is_vipt_aliasing()) {
-		clear_bit(PG_dc_clean, &page->flags);
+		clear_bit(PG_dc_clean, &folio->flags);
 		return;
 	}
 
 	/* don't handle anon pages here */
-	mapping = page_mapping_file(page);
+	mapping = folio_flush_mapping(folio);
 	if (!mapping)
 		return;
 
@@ -771,17 +771,27 @@ void flush_dcache_page(struct page *page)
 	 * Make a note that K-mapping is dirty
 	 */
 	if (!mapping_mapped(mapping)) {
-		clear_bit(PG_dc_clean, &page->flags);
-	} else if (page_mapcount(page)) {
-
+		clear_bit(PG_dc_clean, &folio->flags);
+	} else if (folio_mapped(folio)) {
 		/* kernel reading from page with U-mapping */
-		phys_addr_t paddr = (unsigned long)page_address(page);
-		unsigned long vaddr = page->index << PAGE_SHIFT;
+		phys_addr_t paddr = (unsigned long)folio_address(folio);
+		unsigned long vaddr = folio_pos(folio);
 
+		/*
+		 * vaddr is not actually the virtual address, but is
+		 * congruent to every user mapping.
+		 */
 		if (addr_not_cache_congruent(paddr, vaddr))
-			__flush_dcache_page(paddr, vaddr);
+			__flush_dcache_pages(paddr, vaddr,
+						folio_nr_pages(folio));
 	}
 }
+EXPORT_SYMBOL(flush_dcache_folio);
+
+void flush_dcache_page(struct page *page)
+{
+	return flush_dcache_folio(page_folio(page));
+}
 EXPORT_SYMBOL(flush_dcache_page);
 
 /*
@@ -921,18 +931,18 @@ void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len)
 }
 
 /* wrapper to compile time eliminate alignment checks in flush loop */
-void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr)
+void __inv_icache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr)
 {
-	__ic_line_inv_vaddr(paddr, vaddr, PAGE_SIZE);
+	__ic_line_inv_vaddr(paddr, vaddr, nr * PAGE_SIZE);
 }
 
 /*
  * wrapper to clearout kernel or userspace mappings of a page
  * For kernel mappings @vaddr == @paddr
  */
-void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr)
+void __flush_dcache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr)
 {
-	__dc_line_op(paddr, vaddr & PAGE_MASK, PAGE_SIZE, OP_FLUSH_N_INV);
+	__dc_line_op(paddr, vaddr & PAGE_MASK, nr * PAGE_SIZE, OP_FLUSH_N_INV);
 }
 
 noinline void flush_cache_all(void)
@@ -962,10 +972,10 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long u_vaddr,
 
 	u_vaddr &= PAGE_MASK;
 
-	__flush_dcache_page(paddr, u_vaddr);
+	__flush_dcache_pages(paddr, u_vaddr, 1);
 
 	if (vma->vm_flags & VM_EXEC)
-		__inv_icache_page(paddr, u_vaddr);
+		__inv_icache_pages(paddr, u_vaddr, 1);
 }
 
 void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
@@ -978,9 +988,9 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page,
 		     unsigned long u_vaddr)
 {
 	/* TBD: do we really need to clear the kernel mapping */
-	__flush_dcache_page((phys_addr_t)page_address(page), u_vaddr);
-	__flush_dcache_page((phys_addr_t)page_address(page),
-			    (phys_addr_t)page_address(page));
+	__flush_dcache_pages((phys_addr_t)page_address(page), u_vaddr, 1);
+	__flush_dcache_pages((phys_addr_t)page_address(page),
+			    (phys_addr_t)page_address(page), 1);
 
 }
 
@@ -989,6 +999,8 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page,
 void copy_user_highpage(struct page *to, struct page *from,
 	unsigned long u_vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
+	struct folio *dst = page_folio(to);
 	void *kfrom = kmap_atomic(from);
 	void *kto = kmap_atomic(to);
 	int clean_src_k_mappings = 0;
@@ -1005,7 +1017,7 @@ void copy_user_highpage(struct page *to, struct page *from,
 	 * addr_not_cache_congruent() is 0
 	 */
 	if (page_mapcount(from) && addr_not_cache_congruent(kfrom, u_vaddr)) {
-		__flush_dcache_page((unsigned long)kfrom, u_vaddr);
+		__flush_dcache_pages((unsigned long)kfrom, u_vaddr, 1);
 		clean_src_k_mappings = 1;
 	}
 
@@ -1019,17 +1031,17 @@ void copy_user_highpage(struct page *to, struct page *from,
 	 * non copied user pages (e.g. read faults which wire in pagecache page
 	 * directly).
 	 */
-	clear_bit(PG_dc_clean, &to->flags);
+	clear_bit(PG_dc_clean, &dst->flags);
 
 	/*
 	 * if SRC was already usermapped and non-congruent to kernel mapping
 	 * sync the kernel mapping back to physical page
 	 */
 	if (clean_src_k_mappings) {
-		__flush_dcache_page((unsigned long)kfrom, (unsigned long)kfrom);
-		set_bit(PG_dc_clean, &from->flags);
+		__flush_dcache_pages((unsigned long)kfrom,
+					(unsigned long)kfrom, 1);
 	} else {
-		clear_bit(PG_dc_clean, &from->flags);
+		clear_bit(PG_dc_clean, &src->flags);
 	}
 
 	kunmap_atomic(kto);
@@ -1038,8 +1050,9 @@ void copy_user_highpage(struct page *to, struct page *from,
 
 void clear_user_page(void *to, unsigned long u_vaddr, struct page *page)
 {
+	struct folio *folio = page_folio(page);
 	clear_page(to);
-	clear_bit(PG_dc_clean, &page->flags);
+	clear_bit(PG_dc_clean, &folio->flags);
 }
 EXPORT_SYMBOL(clear_user_page);
 
diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c
index 5f71445f26bd..0a996b65bb4e 100644
--- a/arch/arc/mm/tlb.c
+++ b/arch/arc/mm/tlb.c
@@ -467,8 +467,8 @@ void create_tlb(struct vm_area_struct *vma, unsigned long vaddr, pte_t *ptep)
  * Note that flush (when done) involves both WBACK - so physical page is
  * in sync as well as INV - so any non-congruent aliases don't remain
  */
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
-		      pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long vaddr_unaligned, pte_t *ptep, unsigned int nr)
 {
 	unsigned long vaddr = vaddr_unaligned & PAGE_MASK;
 	phys_addr_t paddr = pte_val(*ptep) & PAGE_MASK_PHYS;
@@ -491,15 +491,19 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
 	 */
 	if ((vma->vm_flags & VM_EXEC) ||
 	     addr_not_cache_congruent(paddr, vaddr)) {
-
-		int dirty = !test_and_set_bit(PG_dc_clean, &page->flags);
+		struct folio *folio = page_folio(page);
+		int dirty = !test_and_set_bit(PG_dc_clean, &folio->flags);
 		if (dirty) {
+			unsigned long offset = offset_in_folio(folio, paddr);
+			nr = folio_nr_pages(folio);
+			paddr -= offset;
+			vaddr -= offset;
 			/* wback + inv dcache lines (K-mapping) */
-			__flush_dcache_page(paddr, paddr);
+			__flush_dcache_pages(paddr, paddr, nr);
 
 			/* invalidate any existing icache lines (U-mapping) */
 			if (vma->vm_flags & VM_EXEC)
-				__inv_icache_page(paddr, vaddr);
+				__inv_icache_pages(paddr, vaddr, nr);
 		}
 	}
 }
@@ -531,7 +535,7 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
 				 pmd_t *pmd)
 {
 	pte_t pte = __pte(pmd_val(*pmd));
-	update_mmu_cache(vma, addr, &pte);
+	update_mmu_cache_range(vma, addr, &pte, HPAGE_PMD_NR);
 }
 
 void local_flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
-- 
2.39.2


* [PATCH v4 08/36] arm: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 07/36] arc: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:48   ` Mike Rapoport
  2023-03-15 10:56   ` Russell King (Oracle)
  2023-03-15  5:14 ` [PATCH v4 09/36] arm64: " Matthew Wilcox (Oracle)
                   ` (27 subsequent siblings)
  35 siblings, 2 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Russell King, linux-arm-kernel

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().  Change the PG_dcache_clean flag from being
per-page to per-folio, which makes __dma_page_dev_to_cpu() a bit more
exciting.  Also add flush_cache_pages(), even though this isn't used
by generic code (yet?).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
 arch/arm/include/asm/cacheflush.h | 24 +++++---
 arch/arm/include/asm/pgtable.h    |  5 +-
 arch/arm/include/asm/tlbflush.h   | 13 ++--
 arch/arm/mm/copypage-v4mc.c       |  5 +-
 arch/arm/mm/copypage-v6.c         |  5 +-
 arch/arm/mm/copypage-xscale.c     |  5 +-
 arch/arm/mm/dma-mapping.c         | 24 ++++----
 arch/arm/mm/fault-armv.c          | 14 ++---
 arch/arm/mm/flush.c               | 99 +++++++++++++++++++------------
 arch/arm/mm/mm.h                  |  2 +-
 arch/arm/mm/mmu.c                 | 14 +++--
 11 files changed, 125 insertions(+), 85 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index a094f964c869..841e268d2374 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -231,14 +231,15 @@ vivt_flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
 					vma->vm_flags);
 }
 
-static inline void
-vivt_flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
+static inline void vivt_flush_cache_pages(struct vm_area_struct *vma,
+		unsigned long user_addr, unsigned long pfn, unsigned int nr)
 {
 	struct mm_struct *mm = vma->vm_mm;
 
 	if (!mm || cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
 		unsigned long addr = user_addr & PAGE_MASK;
-		__cpuc_flush_user_range(addr, addr + PAGE_SIZE, vma->vm_flags);
+		__cpuc_flush_user_range(addr, addr + nr * PAGE_SIZE,
+				vma->vm_flags);
 	}
 }
 
@@ -247,15 +248,17 @@ vivt_flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsig
 		vivt_flush_cache_mm(mm)
 #define flush_cache_range(vma,start,end) \
 		vivt_flush_cache_range(vma,start,end)
-#define flush_cache_page(vma,addr,pfn) \
-		vivt_flush_cache_page(vma,addr,pfn)
+#define flush_cache_pages(vma, addr, pfn, nr) \
+		vivt_flush_cache_pages(vma, addr, pfn, nr)
 #else
-extern void flush_cache_mm(struct mm_struct *mm);
-extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
-extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn);
+void flush_cache_mm(struct mm_struct *mm);
+void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
+void flush_cache_pages(struct vm_area_struct *vma, unsigned long user_addr,
+		unsigned long pfn, unsigned int nr);
 #endif
 
 #define flush_cache_dup_mm(mm) flush_cache_mm(mm)
+#define flush_cache_page(vma, addr, pfn) flush_cache_pages(vma, addr, pfn, 1)
 
 /*
  * flush_icache_user_range is used when we want to ensure that the
@@ -289,7 +292,9 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
  * See update_mmu_cache for the user space part.
  */
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-extern void flush_dcache_page(struct page *);
+void flush_dcache_page(struct page *);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
 
 #define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
 static inline void flush_kernel_vmap_range(void *addr, int size)
@@ -321,6 +326,7 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
  * duplicate cache flushing elsewhere performed by flush_dcache_page().
  */
 #define flush_icache_page(vma,page)	do { } while (0)
+#define flush_icache_pages(vma, page, nr)	do { } while (0)
 
 /*
  * flush_cache_vmap() is used when creating mappings (eg, via vmap,
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index a58ccbb406ad..841001ab495c 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -207,8 +207,9 @@ static inline void __sync_icache_dcache(pte_t pteval)
 extern void __sync_icache_dcache(pte_t pteval);
 #endif
 
-void set_pte_at(struct mm_struct *mm, unsigned long addr,
-		      pte_t *ptep, pte_t pteval);
+void set_ptes(struct mm_struct *mm, unsigned long addr,
+		      pte_t *ptep, pte_t pteval, unsigned int nr);
+#define set_ptes set_ptes
 
 static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
 {
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 0ccc985b90af..7d792e485f4f 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -619,18 +619,21 @@ extern void flush_bp_all(void);
  * If PG_dcache_clean is not set for the page, we need to ensure that any
  * cache entries for the kernels virtual memory range are written
  * back to the page. On ARMv6 and later, the cache coherency is handled via
- * the set_pte_at() function.
+ * the set_ptes() function.
  */
 #if __LINUX_ARM_ARCH__ < 6
-extern void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
-	pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, unsigned int nr);
 #else
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-				    unsigned long addr, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
 {
 }
 #endif
 
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
+
 #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
 
 #endif
diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
index f1da3b439b96..7ddd82b9fe8b 100644
--- a/arch/arm/mm/copypage-v4mc.c
+++ b/arch/arm/mm/copypage-v4mc.c
@@ -64,10 +64,11 @@ static void mc_copy_user_page(void *from, void *to)
 void v4_mc_copy_user_highpage(struct page *to, struct page *from,
 	unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	void *kto = kmap_atomic(to);
 
-	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
-		__flush_dcache_page(page_mapping_file(from), from);
+	if (!test_and_set_bit(PG_dcache_clean, &src->flags))
+		__flush_dcache_folio(folio_flush_mapping(src), src);
 
 	raw_spin_lock(&minicache_lock);
 
diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
index d8a115de5507..a1a71f36d850 100644
--- a/arch/arm/mm/copypage-v6.c
+++ b/arch/arm/mm/copypage-v6.c
@@ -69,11 +69,12 @@ static void discard_old_kernel_data(void *kto)
 static void v6_copy_user_highpage_aliasing(struct page *to,
 	struct page *from, unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	unsigned int offset = CACHE_COLOUR(vaddr);
 	unsigned long kfrom, kto;
 
-	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
-		__flush_dcache_page(page_mapping_file(from), from);
+	if (!test_and_set_bit(PG_dcache_clean, &src->flags))
+		__flush_dcache_folio(folio_flush_mapping(src), src);
 
 	/* FIXME: not highmem safe */
 	discard_old_kernel_data(page_address(to));
diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
index bcb485620a05..f1e29d3e8193 100644
--- a/arch/arm/mm/copypage-xscale.c
+++ b/arch/arm/mm/copypage-xscale.c
@@ -84,10 +84,11 @@ static void mc_copy_user_page(void *from, void *to)
 void xscale_mc_copy_user_highpage(struct page *to, struct page *from,
 	unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	void *kto = kmap_atomic(to);
 
-	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
-		__flush_dcache_page(page_mapping_file(from), from);
+	if (!test_and_set_bit(PG_dcache_clean, &src->flags))
+		__flush_dcache_folio(folio_flush_mapping(src), src);
 
 	raw_spin_lock(&minicache_lock);
 
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8bc01071474a..5ecfde41d70a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -693,6 +693,7 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
 	size_t size, enum dma_data_direction dir)
 {
+	struct folio *folio = page_folio(page);
 	phys_addr_t paddr = page_to_phys(page) + off;
 
 	/* FIXME: non-speculating: not required */
@@ -707,19 +708,18 @@ static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
 	 * Mark the D-cache clean for these pages to avoid extra flushing.
 	 */
 	if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
-		unsigned long pfn;
-		size_t left = size;
-
-		pfn = page_to_pfn(page) + off / PAGE_SIZE;
-		off %= PAGE_SIZE;
-		if (off) {
-			pfn++;
-			left -= PAGE_SIZE - off;
+		ssize_t left = size;
+		size_t offset = offset_in_folio(folio, paddr);
+
+		if (offset) {
+			left -= folio_size(folio) - offset;
+			folio = folio_next(folio);
 		}
-		while (left >= PAGE_SIZE) {
-			page = pfn_to_page(pfn++);
-			set_bit(PG_dcache_clean, &page->flags);
-			left -= PAGE_SIZE;
+
+		while (left >= (ssize_t)folio_size(folio)) {
+			set_bit(PG_dcache_clean, &folio->flags);
+			left -= folio_size(folio);
+			folio = folio_next(folio);
 		}
 	}
 }
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 0e49154454a6..e2c869b8f012 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -178,8 +178,9 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
  *
  * Note that the pte lock will be held.
  */
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
-	pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, unsigned int nr)
 {
 	unsigned long pfn = pte_pfn(*ptep);
 	struct address_space *mapping;
+	struct folio *folio;
@@ -192,13 +192,13 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
 	 * The zero page is never written to, so never has any dirty
 	 * cache lines, and therefore never needs to be flushed.
 	 */
-	page = pfn_to_page(pfn);
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(pfn))
 		return;
 
-	mapping = page_mapping_file(page);
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
-		__flush_dcache_page(mapping, page);
+	folio = page_folio(pfn_to_page(pfn));
+	mapping = folio_flush_mapping(folio);
+	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
+		__flush_dcache_folio(mapping, folio);
 	if (mapping) {
 		if (cache_is_vivt())
 			make_coherent(mapping, vma, addr, ptep, pfn);
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 7ff9feea13a6..07ea0ab51099 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -95,10 +95,10 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
 		__flush_icache_all();
 }
 
-void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
+void flush_cache_pages(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn, unsigned int nr)
 {
 	if (cache_is_vivt()) {
-		vivt_flush_cache_page(vma, user_addr, pfn);
+		vivt_flush_cache_pages(vma, user_addr, pfn, nr);
 		return;
 	}
 
@@ -196,29 +196,31 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 #endif
 }
 
-void __flush_dcache_page(struct address_space *mapping, struct page *page)
+void __flush_dcache_folio(struct address_space *mapping, struct folio *folio)
 {
 	/*
 	 * Writeback any data associated with the kernel mapping of this
 	 * page.  This ensures that data in the physical page is mutually
 	 * coherent with the kernels mapping.
 	 */
-	if (!PageHighMem(page)) {
-		__cpuc_flush_dcache_area(page_address(page), page_size(page));
+	if (!folio_test_highmem(folio)) {
+		__cpuc_flush_dcache_area(folio_address(folio),
+					folio_size(folio));
 	} else {
 		unsigned long i;
 		if (cache_is_vipt_nonaliasing()) {
-			for (i = 0; i < compound_nr(page); i++) {
-				void *addr = kmap_atomic(page + i);
+			for (i = 0; i < folio_nr_pages(folio); i++) {
+				void *addr = kmap_local_folio(folio,
+								i * PAGE_SIZE);
 				__cpuc_flush_dcache_area(addr, PAGE_SIZE);
-				kunmap_atomic(addr);
+				kunmap_local(addr);
 			}
 		} else {
-			for (i = 0; i < compound_nr(page); i++) {
-				void *addr = kmap_high_get(page + i);
+			for (i = 0; i < folio_nr_pages(folio); i++) {
+				void *addr = kmap_high_get(folio_page(folio, i));
 				if (addr) {
 					__cpuc_flush_dcache_area(addr, PAGE_SIZE);
-					kunmap_high(page + i);
+					kunmap_high(folio_page(folio, i));
 				}
 			}
 		}
@@ -230,15 +232,14 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page)
 	 * userspace colour, which is congruent with page->index.
 	 */
 	if (mapping && cache_is_vipt_aliasing())
-		flush_pfn_alias(page_to_pfn(page),
-				page->index << PAGE_SHIFT);
+		flush_pfn_alias(folio_pfn(folio), folio_pos(folio));
 }
 
-static void __flush_dcache_aliases(struct address_space *mapping, struct page *page)
+static void __flush_dcache_aliases(struct address_space *mapping, struct folio *folio)
 {
 	struct mm_struct *mm = current->active_mm;
-	struct vm_area_struct *mpnt;
-	pgoff_t pgoff;
+	struct vm_area_struct *vma;
+	pgoff_t pgoff, pgoff_end;
 
 	/*
 	 * There are possible user space mappings of this page:
@@ -246,21 +247,36 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 	 *   data in the current VM view associated with this page.
 	 * - aliasing VIPT: we only need to find one mapping of this page.
 	 */
-	pgoff = page->index;
+	pgoff = folio->index;
+	pgoff_end = pgoff + folio_nr_pages(folio) - 1;
 
 	flush_dcache_mmap_lock(mapping);
-	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
-		unsigned long offset;
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff_end) {
+		unsigned long start, offset, pfn;
+		unsigned int nr;
 
 		/*
 		 * If this VMA is not in our MM, we can ignore it.
 		 */
-		if (mpnt->vm_mm != mm)
+		if (vma->vm_mm != mm)
 			continue;
-		if (!(mpnt->vm_flags & VM_MAYSHARE))
+		if (!(vma->vm_flags & VM_MAYSHARE))
 			continue;
-		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		flush_cache_page(mpnt, mpnt->vm_start + offset, page_to_pfn(page));
+
+		start = vma->vm_start;
+		pfn = folio_pfn(folio);
+		nr = folio_nr_pages(folio);
+		offset = pgoff - vma->vm_pgoff;
+		if (offset > -nr) {
+			pfn -= offset;
+			nr += offset;
+		} else {
+			start += offset * PAGE_SIZE;
+		}
+		if (start + nr * PAGE_SIZE > vma->vm_end)
+			nr = (vma->vm_end - start) / PAGE_SIZE;
+
+		flush_cache_pages(vma, start, pfn, nr);
 	}
 	flush_dcache_mmap_unlock(mapping);
 }
@@ -269,7 +285,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 void __sync_icache_dcache(pte_t pteval)
 {
 	unsigned long pfn;
-	struct page *page;
+	struct folio *folio;
 	struct address_space *mapping;
 
 	if (cache_is_vipt_nonaliasing() && !pte_exec(pteval))
@@ -279,14 +295,14 @@ void __sync_icache_dcache(pte_t pteval)
 	if (!pfn_valid(pfn))
 		return;
 
-	page = pfn_to_page(pfn);
+	folio = page_folio(pfn_to_page(pfn));
 	if (cache_is_vipt_aliasing())
-		mapping = page_mapping_file(page);
+		mapping = folio_flush_mapping(folio);
 	else
 		mapping = NULL;
 
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
-		__flush_dcache_page(mapping, page);
+	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
+		__flush_dcache_folio(mapping, folio);
 
 	if (pte_exec(pteval))
 		__flush_icache_all();
@@ -312,7 +328,7 @@ void __sync_icache_dcache(pte_t pteval)
  * Note that we disable the lazy flush for SMP configurations where
  * the cache maintenance operations are not automatically broadcasted.
  */
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
 	struct address_space *mapping;
 
@@ -320,31 +336,36 @@ void flush_dcache_page(struct page *page)
 	 * The zero page is never written to, so never has any dirty
 	 * cache lines, and therefore never needs to be flushed.
 	 */
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(folio_pfn(folio)))
 		return;
 
 	if (!cache_ops_need_broadcast() && cache_is_vipt_nonaliasing()) {
-		if (test_bit(PG_dcache_clean, &page->flags))
-			clear_bit(PG_dcache_clean, &page->flags);
+		if (test_bit(PG_dcache_clean, &folio->flags))
+			clear_bit(PG_dcache_clean, &folio->flags);
 		return;
 	}
 
-	mapping = page_mapping_file(page);
+	mapping = folio_flush_mapping(folio);
 
 	if (!cache_ops_need_broadcast() &&
-	    mapping && !page_mapcount(page))
-		clear_bit(PG_dcache_clean, &page->flags);
+	    mapping && !folio_mapped(folio))
+		clear_bit(PG_dcache_clean, &folio->flags);
 	else {
-		__flush_dcache_page(mapping, page);
+		__flush_dcache_folio(mapping, folio);
 		if (mapping && cache_is_vivt())
-			__flush_dcache_aliases(mapping, page);
+			__flush_dcache_aliases(mapping, folio);
 		else if (mapping)
 			__flush_icache_all();
-		set_bit(PG_dcache_clean, &page->flags);
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
 
+void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
+EXPORT_SYMBOL(flush_dcache_page);
 /*
  * Flush an anonymous page so that users of get_user_pages()
  * can safely access the data.  The expected sequence is:
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index d7ffccb7fea7..419316316711 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -45,7 +45,7 @@ struct mem_type {
 
 const struct mem_type *get_mem_type(unsigned int type);
 
-extern void __flush_dcache_page(struct address_space *mapping, struct page *page);
+void __flush_dcache_folio(struct address_space *mapping, struct folio *folio);
 
 /*
  * ARM specific vm_struct->flags bits.
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 463fc2a8448f..9947bbc32b04 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1788,7 +1788,7 @@ void __init paging_init(const struct machine_desc *mdesc)
 	bootmem_init();
 
 	empty_zero_page = virt_to_page(zero_page);
-	__flush_dcache_page(NULL, empty_zero_page);
+	__flush_dcache_folio(NULL, page_folio(empty_zero_page));
 }
 
 void __init early_mm_init(const struct machine_desc *mdesc)
@@ -1797,8 +1797,8 @@ void __init early_mm_init(const struct machine_desc *mdesc)
 	early_paging_init(mdesc);
 }
 
-void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval)
+void set_ptes(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t pteval, unsigned int nr)
 {
 	unsigned long ext = 0;
 
@@ -1808,5 +1808,11 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr,
 		ext |= PTE_EXT_NG;
 	}
 
-	set_pte_ext(ptep, pteval, ext);
+	for (;;) {
+		set_pte_ext(ptep, pteval, ext);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pteval) += PAGE_SIZE;
+	}
 }
-- 
2.39.2
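
A minimal userspace sketch of the stride trick in set_ptes() above: since
ARM keeps the PFN at PAGE_SHIFT inside the PTE, adding PAGE_SIZE to the
raw PTE value advances the mapping by exactly one page.  pte_t, pte_pfn()
and the prot bits below are simplified stand-ins, not the kernel's
definitions.

#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

typedef unsigned long pte_t;		/* stand-in for the kernel's pte_t */

#define pte_pfn(pte)	((pte) >> PAGE_SHIFT)

/* models the loop in arm's set_ptes(): one pte_t per page, same prot bits */
static void set_ptes(pte_t *ptep, pte_t pteval, unsigned int nr)
{
	for (;;) {
		*ptep = pteval;		/* stands in for set_pte_ext() */
		if (--nr == 0)
			break;
		ptep++;
		pteval += PAGE_SIZE;	/* advances the PFN by one */
	}
}

int main(void)
{
	pte_t ptes[4];
	pte_t first = (0x1234UL << PAGE_SHIFT) | 0x3f; /* pfn 0x1234 + prot */
	int i;

	set_ptes(ptes, first, 4);
	for (i = 0; i < 4; i++)
		assert(pte_pfn(ptes[i]) == 0x1234 + i);
	printf("four consecutive pages, prot bits preserved\n");
	return 0;
}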


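The range clamping in __flush_dcache_aliases() is the subtle part of this
patch: a large folio can hang over either end of a VMA, and only the
overlapping pages should be flushed.  Here is a toy model using plain
signed arithmetic where the kernel relies on unsigned wraparound (the
"offset > -nr" test); struct vma and clamp_flush() are invented for
illustration only.

#include <stdio.h>

#define PAGE_SIZE	4096UL

struct vma {
	unsigned long vm_start, vm_end;	/* user virtual range */
	unsigned long vm_pgoff;		/* file page offset of vm_start */
};

static void clamp_flush(const struct vma *vma, unsigned long folio_pgoff,
			unsigned long folio_pfn, long nr)
{
	unsigned long start = vma->vm_start;
	unsigned long pfn = folio_pfn;
	long offset = (long)folio_pgoff - (long)vma->vm_pgoff;

	if (offset < 0) {	/* folio starts before the VMA: skip the head */
		pfn -= offset;
		nr += offset;
	} else {		/* folio starts inside the VMA */
		start += offset * PAGE_SIZE;
	}
	if (start + nr * PAGE_SIZE > vma->vm_end)	/* clip the tail */
		nr = (vma->vm_end - start) / PAGE_SIZE;

	printf("flush %ld pages at %#lx (pfn %#lx)\n", nr, start, pfn);
}

int main(void)
{
	/* VMA maps file pages 8-11; the folio covers file pages 6-9 */
	struct vma vma = { 0x40000000, 0x40004000, 8 };

	clamp_flush(&vma, 6, 0x100, 4);	/* only pages 8 and 9 overlap */
	return 0;
}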

* [PATCH v4 09/36] arm64: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 08/36] arm: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:49   ` Mike Rapoport
  2023-05-25  3:35   ` Anshuman Khandual
  2023-03-15  5:14 ` [PATCH v4 10/36] csky: " Matthew Wilcox (Oracle)
                   ` (26 subsequent siblings)
  35 siblings, 2 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Catalin Marinas, linux-arm-kernel

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_dcache_clean flag from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
---
 arch/arm64/include/asm/cacheflush.h |  4 +++-
 arch/arm64/include/asm/pgtable.h    | 25 ++++++++++++++------
 arch/arm64/mm/flush.c               | 36 +++++++++++------------------
 3 files changed, 35 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 37185e978aeb..d115451ed263 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -114,7 +114,7 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
 #define copy_to_user_page copy_to_user_page
 
 /*
- * flush_dcache_page is used when the kernel has written to the page
+ * flush_dcache_folio is used when the kernel has written to the page
  * cache page at virtual address page->virtual.
  *
  * If this page isn't mapped (ie, page_mapping == NULL), or it might
@@ -127,6 +127,8 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
  */
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *);
+void flush_dcache_folio(struct folio *);
+#define flush_dcache_folio flush_dcache_folio
 
 static __always_inline void icache_inval_all_pou(void)
 {
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 9428748f4691..6fd012663a01 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -355,12 +355,21 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 	set_pte(ptep, pte);
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pte)
-{
-	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
-	return __set_pte_at(mm, addr, ptep, pte);
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
+
+	for (;;) {
+		__set_pte_at(mm, addr, ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		addr += PAGE_SIZE;
+		pte_val(pte) += PAGE_SIZE;
+	}
 }
+#define set_ptes set_ptes
 
 /*
  * Huge pte definitions.
@@ -1059,8 +1068,8 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
 /*
  * On AArch64, the cache coherency is handled via the set_pte_at() function.
  */
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-				    unsigned long addr, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
 {
 	/*
 	 * We don't do anything here, so there's a very small chance of
@@ -1069,6 +1078,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 	 */
 }
 
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
 
 #ifdef CONFIG_ARM64_PA_BITS_52
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 5f9379b3c8c8..deb781af0a3a 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -50,20 +50,13 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 
 void __sync_icache_dcache(pte_t pte)
 {
-	struct page *page = pte_page(pte);
+	struct folio *folio = page_folio(pte_page(pte));
 
-	/*
-	 * HugeTLB pages are always fully mapped, so only setting head page's
-	 * PG_dcache_clean flag is enough.
-	 */
-	if (PageHuge(page))
-		page = compound_head(page);
-
-	if (!test_bit(PG_dcache_clean, &page->flags)) {
-		sync_icache_aliases((unsigned long)page_address(page),
-				    (unsigned long)page_address(page) +
-					    page_size(page));
-		set_bit(PG_dcache_clean, &page->flags);
+	if (!test_bit(PG_dcache_clean, &folio->flags)) {
+		sync_icache_aliases((unsigned long)folio_address(folio),
+				    (unsigned long)folio_address(folio) +
+					    folio_size(folio));
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
 EXPORT_SYMBOL_GPL(__sync_icache_dcache);
@@ -73,17 +66,16 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache);
  * it as dirty for later flushing when mapped in user space (if executable,
  * see __sync_icache_dcache).
  */
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
-	/*
-	 * HugeTLB pages are always fully mapped and only head page will be
-	 * set PG_dcache_clean (see comments in __sync_icache_dcache()).
-	 */
-	if (PageHuge(page))
-		page = compound_head(page);
+	if (test_bit(PG_dcache_clean, &folio->flags))
+		clear_bit(PG_dcache_clean, &folio->flags);
+}
+EXPORT_SYMBOL(flush_dcache_folio);
 
-	if (test_bit(PG_dcache_clean, &page->flags))
-		clear_bit(PG_dcache_clean, &page->flags);
+void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
 }
 EXPORT_SYMBOL(flush_dcache_page);
 
-- 
2.39.2
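
The PG_dcache_clean handshake this patch keeps, now per-folio, can be
modelled in a few lines: flush_dcache_folio() only marks the folio dirty,
and __sync_icache_dcache() flushes lazily the next time the folio is
mapped executable.  This toy is not arm64 code; the flushes counter
stands in for sync_icache_aliases().

#include <stdbool.h>
#include <stdio.h>

struct folio {
	bool dcache_clean;	/* stands in for PG_dcache_clean */
};

static unsigned long flushes;

/* kernel wrote to the folio: mark it dirty, defer the flush */
static void flush_dcache_folio(struct folio *folio)
{
	folio->dcache_clean = false;
}

/* folio is being mapped executable: flush once, remember it is clean */
static void sync_icache_dcache(struct folio *folio)
{
	if (!folio->dcache_clean) {
		flushes++;	/* stands in for sync_icache_aliases() */
		folio->dcache_clean = true;
	}
}

int main(void)
{
	struct folio f = { .dcache_clean = false };

	sync_icache_dcache(&f);	/* first executable mapping: flushes */
	sync_icache_dcache(&f);	/* already clean: no flush */
	flush_dcache_folio(&f);	/* kernel writes again */
	sync_icache_dcache(&f);	/* flushes again */
	printf("flushes = %lu\n", flushes);	/* prints 2 */
	return 0;
}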



* [PATCH v4 10/36] csky: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (8 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 09/36] arm64: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:50   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 11/36] hexagon: " Matthew Wilcox (Oracle)
                   ` (25 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel, Guo Ren, linux-csky

Add PFN_PTE_SHIFT, update_mmu_cache_range() and flush_dcache_folio().
Change the PG_dcache_clean flag from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Guo Ren <guoren@kernel.org>
Cc: linux-csky@vger.kernel.org
---
 arch/csky/abiv1/cacheflush.c         | 32 +++++++++++++++++-----------
 arch/csky/abiv1/inc/abi/cacheflush.h |  2 ++
 arch/csky/abiv2/cacheflush.c         | 32 ++++++++++++++--------------
 arch/csky/abiv2/inc/abi/cacheflush.h | 10 +++++++--
 arch/csky/include/asm/pgtable.h      |  8 ++++---
 5 files changed, 50 insertions(+), 34 deletions(-)

diff --git a/arch/csky/abiv1/cacheflush.c b/arch/csky/abiv1/cacheflush.c
index fb91b069dc69..ba43f6c26b4f 100644
--- a/arch/csky/abiv1/cacheflush.c
+++ b/arch/csky/abiv1/cacheflush.c
@@ -14,43 +14,49 @@
 
 #define PG_dcache_clean		PG_arch_1
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
 	struct address_space *mapping;
 
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(folio_pfn(folio)))
 		return;
 
-	mapping = page_mapping_file(page);
+	mapping = folio_flush_mapping(folio);
 
-	if (mapping && !page_mapcount(page))
-		clear_bit(PG_dcache_clean, &page->flags);
+	if (mapping && !folio_mapped(folio))
+		clear_bit(PG_dcache_clean, &folio->flags);
 	else {
 		dcache_wbinv_all();
 		if (mapping)
 			icache_inv_all();
-		set_bit(PG_dcache_clean, &page->flags);
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
+EXPORT_SYMBOL(flush_dcache_folio);
+
+void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 EXPORT_SYMBOL(flush_dcache_page);
 
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
-	pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, unsigned int nr)
 {
 	unsigned long pfn = pte_pfn(*ptep);
-	struct page *page;
+	struct folio *folio;
 
 	if (!pfn_valid(pfn))
 		return;
 
-	page = pfn_to_page(pfn);
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(pfn))
 		return;
 
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+	folio = page_folio(pfn_to_page(pfn));
+	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
 		dcache_wbinv_all();
 
-	if (page_mapping_file(page)) {
+	if (folio_flush_mapping(folio)) {
 		if (vma->vm_flags & VM_EXEC)
 			icache_inv_all();
 	}
diff --git a/arch/csky/abiv1/inc/abi/cacheflush.h b/arch/csky/abiv1/inc/abi/cacheflush.h
index ed62e2066ba7..0d6cb65624c4 100644
--- a/arch/csky/abiv1/inc/abi/cacheflush.h
+++ b/arch/csky/abiv1/inc/abi/cacheflush.h
@@ -9,6 +9,8 @@
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *);
+void flush_dcache_folio(struct folio *);
+#define flush_dcache_folio flush_dcache_folio
 
 #define flush_cache_mm(mm)			dcache_wbinv_all()
 #define flush_cache_page(vma, page, pfn)	cache_wbinv_all()
diff --git a/arch/csky/abiv2/cacheflush.c b/arch/csky/abiv2/cacheflush.c
index 39c51399dd81..622e5b1b3f8a 100644
--- a/arch/csky/abiv2/cacheflush.c
+++ b/arch/csky/abiv2/cacheflush.c
@@ -6,30 +6,30 @@
 #include <linux/mm.h>
 #include <asm/cache.h>
 
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
-		      pte_t *pte)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *pte, unsigned int nr)
 {
-	unsigned long addr;
-	struct page *page;
+	unsigned long pfn = pte_pfn(*pte);
+	struct folio *folio;
+	unsigned int i;
 
-	if (!pfn_valid(pte_pfn(*pte)))
+	if (!pfn_valid(pfn) || is_zero_pfn(pfn))
 		return;
 
-	page = pfn_to_page(pte_pfn(*pte));
-	if (page == ZERO_PAGE(0))
-		return;
+	folio = page_folio(pfn_to_page(pfn));
 
-	if (test_and_set_bit(PG_dcache_clean, &page->flags))
+	if (test_and_set_bit(PG_dcache_clean, &folio->flags))
 		return;
 
-	addr = (unsigned long) kmap_atomic(page);
-
-	dcache_wb_range(addr, addr + PAGE_SIZE);
+	for (i = 0; i < folio_nr_pages(folio); i++) {
+		unsigned long addr = (unsigned long) kmap_local_folio(folio,
+								i * PAGE_SIZE);
 
-	if (vma->vm_flags & VM_EXEC)
-		icache_inv_range(addr, addr + PAGE_SIZE);
-
-	kunmap_atomic((void *) addr);
+		dcache_wb_range(addr, addr + PAGE_SIZE);
+		if (vma->vm_flags & VM_EXEC)
+			icache_inv_range(addr, addr + PAGE_SIZE);
+		kunmap_local((void *) addr);
+	}
 }
 
 void flush_icache_deferred(struct mm_struct *mm)
diff --git a/arch/csky/abiv2/inc/abi/cacheflush.h b/arch/csky/abiv2/inc/abi/cacheflush.h
index a565e00c3f70..9c728933a776 100644
--- a/arch/csky/abiv2/inc/abi/cacheflush.h
+++ b/arch/csky/abiv2/inc/abi/cacheflush.h
@@ -18,11 +18,17 @@
 
 #define PG_dcache_clean		PG_arch_1
 
+static inline void flush_dcache_folio(struct folio *folio)
+{
+	if (test_bit(PG_dcache_clean, &folio->flags))
+		clear_bit(PG_dcache_clean, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 static inline void flush_dcache_page(struct page *page)
 {
-	if (test_bit(PG_dcache_clean, &page->flags))
-		clear_bit(PG_dcache_clean, &page->flags);
+	flush_dcache_folio(page_folio(page));
 }
 
 #define flush_dcache_mmap_lock(mapping)		do { } while (0)
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index d4042495febc..8cd27104f408 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -28,6 +28,7 @@
 #define pgd_ERROR(e) \
 	pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
 
+#define PFN_PTE_SHIFT	PAGE_SHIFT
 #define pmd_pfn(pmd)	(pmd_phys(pmd) >> PAGE_SHIFT)
 #define pmd_page(pmd)	(pfn_to_page(pmd_phys(pmd) >> PAGE_SHIFT))
 #define pte_clear(mm, addr, ptep)	set_pte((ptep), \
@@ -90,7 +91,6 @@ static inline void set_pte(pte_t *p, pte_t pte)
 	/* prevent out of order excution */
 	smp_mb();
 }
-#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
 
 static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 {
@@ -263,8 +263,10 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
 extern void paging_init(void);
 
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
-		      pte_t *pte);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *pte, unsigned int nr);
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 #define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
 	remap_pfn_range(vma, vaddr, pfn, size, prot)
-- 
2.39.2
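
The abiv2 update_mmu_cache_range() above works one PAGE_SIZE window at a
time because a large highmem folio cannot be assumed to have a single
contiguous kernel mapping.  A userspace model of that walk, with printf
standing in for the real cache operations and struct folio reduced to a
base address plus a page count:

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

struct folio {
	unsigned long base;	/* stand-in for the folio's mapping address */
	unsigned int nr_pages;	/* folio_nr_pages() */
};

static void dcache_wb_range(unsigned long s, unsigned long e)
{
	printf("wb  %#lx-%#lx\n", s, e);
}

static void icache_inv_range(unsigned long s, unsigned long e)
{
	printf("inv %#lx-%#lx\n", s, e);
}

/* one window per page, like kmap_local_folio(folio, i * PAGE_SIZE) */
static void update_folio_caches(const struct folio *folio, int exec)
{
	unsigned int i;

	for (i = 0; i < folio->nr_pages; i++) {
		unsigned long addr = folio->base + i * PAGE_SIZE;

		dcache_wb_range(addr, addr + PAGE_SIZE);
		if (exec)
			icache_inv_range(addr, addr + PAGE_SIZE);
	}
}

int main(void)
{
	struct folio f = { 0x10000000, 4 };

	update_folio_caches(&f, 1);
	return 0;
}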



* [PATCH v4 11/36] hexagon: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (9 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 10/36] csky: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:54   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 12/36] ia64: " Matthew Wilcox (Oracle)
                   ` (24 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel, Brian Cain

Add PFN_PTE_SHIFT and update_mmu_cache_range().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Brian Cain <bcain@quicinc.com>
---
 arch/hexagon/include/asm/cacheflush.h | 7 +++++--
 arch/hexagon/include/asm/pgtable.h    | 9 +--------
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/hexagon/include/asm/cacheflush.h b/arch/hexagon/include/asm/cacheflush.h
index 6eff0730e6ef..63ca314ede89 100644
--- a/arch/hexagon/include/asm/cacheflush.h
+++ b/arch/hexagon/include/asm/cacheflush.h
@@ -58,12 +58,15 @@ extern void flush_cache_all_hexagon(void);
  * clean the cache when the PTE is set.
  *
  */
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-					unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
 	/*  generic_ptrace_pokedata doesn't wind up here, does it?  */
 }
 
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
+
 void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 		       unsigned long vaddr, void *dst, void *src, int len);
 #define copy_to_user_page copy_to_user_page
diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h
index 59393613d086..dd05dd71b8ec 100644
--- a/arch/hexagon/include/asm/pgtable.h
+++ b/arch/hexagon/include/asm/pgtable.h
@@ -338,6 +338,7 @@ static inline int pte_exec(pte_t pte)
 /* __swp_entry_to_pte - extract PTE from swap entry */
 #define __swp_entry_to_pte(x) ((pte_t) { (x).val })
 
+#define PFN_PTE_SHIFT	PAGE_SHIFT
 /* pfn_pte - convert page number and protection value to page table entry */
 #define pfn_pte(pfn, pgprot) __pte((pfn << PAGE_SHIFT) | pgprot_val(pgprot))
 
@@ -345,14 +346,6 @@ static inline int pte_exec(pte_t pte)
 #define pte_pfn(pte) (pte_val(pte) >> PAGE_SHIFT)
 #define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))
 
-/*
- * set_pte_at - update page table and do whatever magic may be
- * necessary to make the underlying hardware/firmware take note.
- *
- * VM may require a virtual instruction to alert the MMU.
- */
-#define set_pte_at(mm, addr, ptep, pte) set_pte(ptep, pte)
-
 static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 {
 	return (unsigned long)__va(pmd_val(pmd) & PAGE_MASK);
-- 
2.39.2
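
With set_pte_at() gone and PFN_PTE_SHIFT defined, hexagon picks up the
generic set_ptes() added earlier in the series.  The sketch below is an
approximation of that fallback, with simplified types and the
page_table_check call omitted, so treat it as illustrative rather than
the exact mm code.

#include <assert.h>

#define PFN_PTE_SHIFT	12	/* hexagon defines this as PAGE_SHIFT */

typedef struct { unsigned long pte; } pte_t;

#define pte_val(x)	((x).pte)
#define __pte(x)	((pte_t) { (x) })

static void set_pte(pte_t *ptep, pte_t pte)
{
	*ptep = pte;		/* hexagon's set_pte() is a plain store */
}

/* nr PTEs for nr consecutive pages of one folio, same protection bits */
static void set_ptes(pte_t *ptep, pte_t pte, unsigned int nr)
{
	for (;;) {
		set_pte(ptep, pte);
		if (--nr == 0)
			break;
		ptep++;
		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
	}
}

int main(void)
{
	pte_t ptes[3];

	set_ptes(ptes, __pte((7UL << PFN_PTE_SHIFT) | 0x5), 3);
	assert((pte_val(ptes[2]) >> PFN_PTE_SHIFT) == 9);
	return 0;
}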



* [PATCH v4 12/36] ia64: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (10 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 11/36] hexagon: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:55   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 13/36] loongarch: " Matthew Wilcox (Oracle)
                   ` (23 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel, linux-ia64

Add PFN_PTE_SHIFT, update_mmu_cache_range() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dcache_clean) flag from being per-page to
per-folio, which makes arch_dma_mark_clean() and mark_clean() a little
more exciting.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: linux-ia64@vger.kernel.org
---
 arch/ia64/hp/common/sba_iommu.c    | 26 +++++++++++++++-----------
 arch/ia64/include/asm/cacheflush.h | 14 ++++++++++----
 arch/ia64/include/asm/pgtable.h    |  4 ++--
 arch/ia64/mm/init.c                | 28 +++++++++++++++++++---------
 4 files changed, 46 insertions(+), 26 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 8ad6946521d8..48d475f10003 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -798,22 +798,26 @@ sba_io_pdir_entry(u64 *pdir_ptr, unsigned long vba)
 #endif
 
 #ifdef ENABLE_MARK_CLEAN
-/**
+/*
  * Since DMA is i-cache coherent, any (complete) pages that were written via
  * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
  * flush them when they get mapped into an executable vm-area.
  */
-static void
-mark_clean (void *addr, size_t size)
+static void mark_clean(void *addr, size_t size)
 {
-	unsigned long pg_addr, end;
-
-	pg_addr = PAGE_ALIGN((unsigned long) addr);
-	end = (unsigned long) addr + size;
-	while (pg_addr + PAGE_SIZE <= end) {
-		struct page *page = virt_to_page((void *)pg_addr);
-		set_bit(PG_arch_1, &page->flags);
-		pg_addr += PAGE_SIZE;
+	struct folio *folio = virt_to_folio(addr);
+	ssize_t left = size;
+	size_t offset = offset_in_folio(folio, addr);
+
+	if (offset) {
+		left -= folio_size(folio) - offset;
+		folio = folio_next(folio);
+	}
+
+	while (left >= (ssize_t)folio_size(folio)) {
+		set_bit(PG_arch_1, &folio->flags);
+		left -= folio_size(folio);
+		folio = folio_next(folio);
 	}
 }
 #endif
diff --git a/arch/ia64/include/asm/cacheflush.h b/arch/ia64/include/asm/cacheflush.h
index 708c0fa5d975..eac493fa9e0d 100644
--- a/arch/ia64/include/asm/cacheflush.h
+++ b/arch/ia64/include/asm/cacheflush.h
@@ -13,10 +13,16 @@
 #include <asm/page.h>
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-#define flush_dcache_page(page)			\
-do {						\
-	clear_bit(PG_arch_1, &(page)->flags);	\
-} while (0)
+static inline void flush_dcache_folio(struct folio *folio)
+{
+	clear_bit(PG_arch_1, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 
 extern void flush_icache_range(unsigned long start, unsigned long end);
 #define flush_icache_range flush_icache_range
diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 21c97e31a28a..5450d59e4fb9 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -206,6 +206,7 @@ ia64_phys_addr_valid (unsigned long addr)
 #define RGN_MAP_SHIFT (PGDIR_SHIFT + PTRS_PER_PGD_SHIFT - 3)
 #define RGN_MAP_LIMIT	((1UL << RGN_MAP_SHIFT) - PAGE_SIZE)	/* per region addr limit */
 
+#define PFN_PTE_SHIFT	PAGE_SHIFT
 /*
  * Conversion functions: convert page frame number (pfn) and a protection value to a page
  * table entry (pte).
@@ -303,8 +304,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	*ptep = pteval;
 }
 
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
-
 /*
  * Make page protection values cacheable, uncacheable, or write-
  * combining.  Note that "protection" is really a misnomer here as the
@@ -396,6 +395,7 @@ pte_same (pte_t a, pte_t b)
 	return pte_val(a) == pte_val(b);
 }
 
+#define update_mmu_cache_range(vma, address, ptep, nr) do { } while (0)
 #define update_mmu_cache(vma, address, ptep) do { } while (0)
 
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 7f5353e28516..b95debabdc2a 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -50,30 +50,40 @@ void
 __ia64_sync_icache_dcache (pte_t pte)
 {
 	unsigned long addr;
-	struct page *page;
+	struct folio *folio;
 
-	page = pte_page(pte);
-	addr = (unsigned long) page_address(page);
+	folio = page_folio(pte_page(pte));
+	addr = (unsigned long)folio_address(folio);
 
-	if (test_bit(PG_arch_1, &page->flags))
+	if (test_bit(PG_arch_1, &folio->flags))
 		return;				/* i-cache is already coherent with d-cache */
 
-	flush_icache_range(addr, addr + page_size(page));
-	set_bit(PG_arch_1, &page->flags);	/* mark page as clean */
+	flush_icache_range(addr, addr + folio_size(folio));
+	set_bit(PG_arch_1, &folio->flags);	/* mark page as clean */
 }
 
 /*
- * Since DMA is i-cache coherent, any (complete) pages that were written via
+ * Since DMA is i-cache coherent, any (complete) folios that were written via
  * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
  * flush them when they get mapped into an executable vm-area.
  */
 void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
 {
 	unsigned long pfn = PHYS_PFN(paddr);
+	struct folio *folio = page_folio(pfn_to_page(pfn));
+	ssize_t left = size;
+	size_t offset = offset_in_folio(folio, paddr);
 
-	do {
+	if (offset) {
+		left -= folio_size(folio) - offset;
+		folio = folio_next(folio);
+	}
+
+	while (left >= (ssize_t)folio_size(folio)) {
-		set_bit(PG_arch_1, &pfn_to_page(pfn)->flags);
+		set_bit(PG_arch_1, &folio->flags);
-	} while (++pfn <= PHYS_PFN(paddr + size - 1));
+		left -= folio_size(folio);
+		folio = folio_next(folio);
+	}
 }
 
 inline void
-- 
2.39.2
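
mark_clean() and arch_dma_mark_clean() now share one shape: skip a
partially covered first folio, mark every fully covered folio, stop
before a partially covered last one.  A userspace model follows; struct
folio here is a hand-rolled stand-in with an explicit next pointer for
folio_next(), and the signed comparison in the loop is exactly what the
(ssize_t) cast in the patch protects.

#include <stdio.h>

struct folio {
	unsigned long addr;	/* start of the folio's memory */
	long size;		/* folio_size() */
	int clean;		/* stands in for PG_arch_1 */
	struct folio *next;	/* stands in for folio_next() */
};

static void mark_clean(struct folio *folio, unsigned long addr, long size)
{
	long left = size;
	unsigned long offset = addr - folio->addr;  /* offset_in_folio() */

	if (offset) {			/* partial first folio: skip it */
		left -= folio->size - offset;
		folio = folio->next;
	}
	while (folio && left >= folio->size) {	/* partial last one: stop */
		folio->clean = 1;
		left -= folio->size;
		folio = folio->next;
	}
}

int main(void)
{
	struct folio c = { 0x8000, 0x4000, 0, NULL };
	struct folio b = { 0x4000, 0x4000, 0, &c };
	struct folio a = { 0x0000, 0x4000, 0, &b };

	/* covers the tail of a, all of b, half of c: only b is marked */
	mark_clean(&a, 0x2000, 0x8000);
	printf("a=%d b=%d c=%d\n", a.clean, b.clean, c.clean);
	return 0;
}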



* [PATCH v4 13/36] loongarch: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (11 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 12/36] ia64: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:07   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 14/36] m68k: " Matthew Wilcox (Oracle)
                   ` (22 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Huacai Chen, WANG Xuerui, loongarch

Add update_mmu_cache_range() and change _PFN_SHIFT to PFN_PTE_SHIFT.
It would probably be more efficient to implement __update_tlb() by
flushing the entire folio instead of calling __update_tlb() N times,
but I'll leave that for someone who understands the architecture better.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: loongarch@lists.linux.dev
---
 arch/loongarch/include/asm/cacheflush.h   |  2 ++
 arch/loongarch/include/asm/pgtable-bits.h |  4 ++--
 arch/loongarch/include/asm/pgtable.h      | 28 ++++++++++++-----------
 arch/loongarch/mm/pgtable.c               |  2 +-
 arch/loongarch/mm/tlb.c                   |  2 +-
 5 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
index 0681788eb474..7907eb42bfbd 100644
--- a/arch/loongarch/include/asm/cacheflush.h
+++ b/arch/loongarch/include/asm/cacheflush.h
@@ -47,8 +47,10 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
 #define flush_cache_vmap(start, end)			do { } while (0)
 #define flush_cache_vunmap(start, end)			do { } while (0)
 #define flush_icache_page(vma, page)			do { } while (0)
+#define flush_icache_pages(vma, page, nr)		do { } while (0)
 #define flush_icache_user_page(vma, page, addr, len)	do { } while (0)
 #define flush_dcache_page(page)				do { } while (0)
+#define flush_dcache_folio(folio)			do { } while (0)
 #define flush_dcache_mmap_lock(mapping)			do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)		do { } while (0)
 
diff --git a/arch/loongarch/include/asm/pgtable-bits.h b/arch/loongarch/include/asm/pgtable-bits.h
index 8b98d22a145b..a1eb2e25446b 100644
--- a/arch/loongarch/include/asm/pgtable-bits.h
+++ b/arch/loongarch/include/asm/pgtable-bits.h
@@ -48,12 +48,12 @@
 #define _PAGE_NO_EXEC		(_ULCAST_(1) << _PAGE_NO_EXEC_SHIFT)
 #define _PAGE_RPLV		(_ULCAST_(1) << _PAGE_RPLV_SHIFT)
 #define _CACHE_MASK		(_ULCAST_(3) << _CACHE_SHIFT)
-#define _PFN_SHIFT		(PAGE_SHIFT - 12 + _PAGE_PFN_SHIFT)
+#define PFN_PTE_SHIFT		(PAGE_SHIFT - 12 + _PAGE_PFN_SHIFT)
 
 #define _PAGE_USER	(PLV_USER << _PAGE_PLV_SHIFT)
 #define _PAGE_KERN	(PLV_KERN << _PAGE_PLV_SHIFT)
 
-#define _PFN_MASK (~((_ULCAST_(1) << (_PFN_SHIFT)) - 1) & \
+#define _PFN_MASK (~((_ULCAST_(1) << (PFN_PTE_SHIFT)) - 1) & \
 		  ((_ULCAST_(1) << (_PAGE_PFN_END_SHIFT)) - 1))
 
 /*
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index d28fb9dbec59..13aad0003e9a 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -237,9 +237,9 @@ extern pmd_t mk_pmd(struct page *page, pgprot_t prot);
 extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pmd_t pmd);
 
 #define pte_page(x)		pfn_to_page(pte_pfn(x))
-#define pte_pfn(x)		((unsigned long)(((x).pte & _PFN_MASK) >> _PFN_SHIFT))
-#define pfn_pte(pfn, prot)	__pte(((pfn) << _PFN_SHIFT) | pgprot_val(prot))
-#define pfn_pmd(pfn, prot)	__pmd(((pfn) << _PFN_SHIFT) | pgprot_val(prot))
+#define pte_pfn(x)		((unsigned long)(((x).pte & _PFN_MASK) >> PFN_PTE_SHIFT))
+#define pfn_pte(pfn, prot)	__pte(((pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
+#define pfn_pmd(pfn, prot)	__pmd(((pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
 
 /*
  * Initialize a new pgd / pud / pmd table with invalid pointers.
@@ -334,12 +334,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	}
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval)
-{
-	set_pte(ptep, pteval);
-}
-
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
 	/* Preserve global status for the pair */
@@ -445,11 +439,19 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 extern void __update_tlb(struct vm_area_struct *vma,
 			unsigned long address, pte_t *ptep);
 
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-			unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
-	__update_tlb(vma, address, ptep);
+	for (;;) {
+		__update_tlb(vma, address, ptep);
+		if (--nr == 0)
+			break;
+		address += PAGE_SIZE;
+		ptep++;
+	}
 }
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 #define __HAVE_ARCH_UPDATE_MMU_TLB
 #define update_mmu_tlb	update_mmu_cache
@@ -462,7 +464,7 @@ static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
 
 static inline unsigned long pmd_pfn(pmd_t pmd)
 {
-	return (pmd_val(pmd) & _PFN_MASK) >> _PFN_SHIFT;
+	return (pmd_val(pmd) & _PFN_MASK) >> PFN_PTE_SHIFT;
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
index 36a6dc0148ae..1260cf30e3ee 100644
--- a/arch/loongarch/mm/pgtable.c
+++ b/arch/loongarch/mm/pgtable.c
@@ -107,7 +107,7 @@ pmd_t mk_pmd(struct page *page, pgprot_t prot)
 {
 	pmd_t pmd;
 
-	pmd_val(pmd) = (page_to_pfn(page) << _PFN_SHIFT) | pgprot_val(prot);
+	pmd_val(pmd) = (page_to_pfn(page) << PFN_PTE_SHIFT) | pgprot_val(prot);
 
 	return pmd;
 }
diff --git a/arch/loongarch/mm/tlb.c b/arch/loongarch/mm/tlb.c
index 8bad6b0cff59..73652930b268 100644
--- a/arch/loongarch/mm/tlb.c
+++ b/arch/loongarch/mm/tlb.c
@@ -246,7 +246,7 @@ static void output_pgtable_bits_defines(void)
 	pr_define("_PAGE_WRITE_SHIFT %d\n", _PAGE_WRITE_SHIFT);
 	pr_define("_PAGE_NO_READ_SHIFT %d\n", _PAGE_NO_READ_SHIFT);
 	pr_define("_PAGE_NO_EXEC_SHIFT %d\n", _PAGE_NO_EXEC_SHIFT);
-	pr_define("_PFN_SHIFT %d\n", _PFN_SHIFT);
+	pr_define("PFN_PTE_SHIFT %d\n", PFN_PTE_SHIFT);
 	pr_debug("\n");
 }
 
-- 
2.39.2
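
The _PFN_SHIFT to PFN_PTE_SHIFT rename matters because generic code that
steps a PTE to the next page must add 1 << PFN_PTE_SHIFT, and nothing
guarantees that equals PAGE_SIZE.  The shift values below are
illustrative only, not loongarch's real layout.

#include <assert.h>

#define PAGE_SHIFT	12
#define PFN_PTE_SHIFT	10	/* illustrative: PFN field below PAGE_SHIFT */

typedef unsigned long pte_t;

#define pfn_pte(pfn, prot)	(((pfn) << PFN_PTE_SHIFT) | (prot))
#define pte_pfn(pte)		((pte) >> PFN_PTE_SHIFT)

int main(void)
{
	pte_t pte = pfn_pte(100UL, 0x3);

	pte += 1UL << PFN_PTE_SHIFT;	/* correct stride: next pfn */
	assert(pte_pfn(pte) == 101);

	pte = pfn_pte(100UL, 0x3);
	pte += 1UL << PAGE_SHIFT;	/* wrong stride: skips to pfn 104 */
	assert(pte_pfn(pte) == 104);
	return 0;
}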



* [PATCH v4 14/36] m68k: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (12 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 13/36] loongarch: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  7:43   ` Geert Uytterhoeven
  2023-03-15 10:07   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 15/36] microblaze: " Matthew Wilcox (Oracle)
                   ` (21 subsequent siblings)
  35 siblings, 2 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Geert Uytterhoeven, linux-m68k

Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_icache_pages() and
flush_dcache_folio().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
---
 arch/m68k/include/asm/cacheflush_mm.h    | 27 ++++++++++++++++--------
 arch/m68k/include/asm/mcf_pgtable.h      |  1 +
 arch/m68k/include/asm/motorola_pgtable.h |  1 +
 arch/m68k/include/asm/pgtable_mm.h       |  9 ++++----
 arch/m68k/include/asm/sun3_pgtable.h     |  1 +
 arch/m68k/mm/motorola.c                  |  2 +-
 6 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/arch/m68k/include/asm/cacheflush_mm.h b/arch/m68k/include/asm/cacheflush_mm.h
index 1ac55e7b47f0..88eb85e81ef6 100644
--- a/arch/m68k/include/asm/cacheflush_mm.h
+++ b/arch/m68k/include/asm/cacheflush_mm.h
@@ -220,24 +220,29 @@ static inline void flush_cache_page(struct vm_area_struct *vma, unsigned long vm
 
 /* Push the page at kernel virtual address and clear the icache */
 /* RZ: use cpush %bc instead of cpush %dc, cinv %ic */
-static inline void __flush_page_to_ram(void *vaddr)
+static inline void __flush_pages_to_ram(void *vaddr, unsigned int nr)
 {
 	if (CPU_IS_COLDFIRE) {
 		unsigned long addr, start, end;
 		addr = ((unsigned long) vaddr) & ~(PAGE_SIZE - 1);
 		start = addr & ICACHE_SET_MASK;
-		end = (addr + PAGE_SIZE - 1) & ICACHE_SET_MASK;
+		end = (addr + nr * PAGE_SIZE - 1) & ICACHE_SET_MASK;
 		if (start > end) {
 			flush_cf_bcache(0, end);
 			end = ICACHE_MAX_ADDR;
 		}
 		flush_cf_bcache(start, end);
 	} else if (CPU_IS_040_OR_060) {
-		__asm__ __volatile__("nop\n\t"
-				     ".chip 68040\n\t"
-				     "cpushp %%bc,(%0)\n\t"
-				     ".chip 68k"
-				     : : "a" (__pa(vaddr)));
+		unsigned long paddr = __pa(vaddr);
+
+		do {
+			__asm__ __volatile__("nop\n\t"
+					     ".chip 68040\n\t"
+					     "cpushp %%bc,(%0)\n\t"
+					     ".chip 68k"
+					     : : "a" (paddr));
+			paddr += PAGE_SIZE;
+		} while (--nr);
 	} else {
 		unsigned long _tmp;
 		__asm__ __volatile__("movec %%cacr,%0\n\t"
@@ -249,10 +254,14 @@ static inline void __flush_page_to_ram(void *vaddr)
 }
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-#define flush_dcache_page(page)		__flush_page_to_ram(page_address(page))
+#define flush_dcache_page(page)	__flush_pages_to_ram(page_address(page), 1)
+#define flush_dcache_folio(folio)		\
+	__flush_pages_to_ram(folio_address(folio), folio_nr_pages(folio))
 #define flush_dcache_mmap_lock(mapping)		do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)	do { } while (0)
-#define flush_icache_page(vma, page)	__flush_page_to_ram(page_address(page))
+#define flush_icache_pages(vma, page, nr)	\
+	__flush_pages_to_ram(page_address(page), nr)
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
 
 extern void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
 				    unsigned long addr, int len);
diff --git a/arch/m68k/include/asm/mcf_pgtable.h b/arch/m68k/include/asm/mcf_pgtable.h
index 13741c1245e1..1414b607eff4 100644
--- a/arch/m68k/include/asm/mcf_pgtable.h
+++ b/arch/m68k/include/asm/mcf_pgtable.h
@@ -292,6 +292,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 	return pte;
 }
 
+#define PFN_PTE_SHIFT		PAGE_SHIFT
 #define pmd_pfn(pmd)		(pmd_val(pmd) >> PAGE_SHIFT)
 #define pmd_page(pmd)		(pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
 
diff --git a/arch/m68k/include/asm/motorola_pgtable.h b/arch/m68k/include/asm/motorola_pgtable.h
index ec0dc19ab834..38d5e5edc3e1 100644
--- a/arch/m68k/include/asm/motorola_pgtable.h
+++ b/arch/m68k/include/asm/motorola_pgtable.h
@@ -112,6 +112,7 @@ static inline void pud_set(pud_t *pudp, pmd_t *pmdp)
 #define pte_present(pte)	(pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROTNONE))
 #define pte_clear(mm,addr,ptep)		({ pte_val(*(ptep)) = 0; })
 
+#define PFN_PTE_SHIFT		PAGE_SHIFT
 #define pte_page(pte)		virt_to_page(__va(pte_val(pte)))
 #define pte_pfn(pte)		(pte_val(pte) >> PAGE_SHIFT)
 #define pfn_pte(pfn, prot)	__pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
diff --git a/arch/m68k/include/asm/pgtable_mm.h b/arch/m68k/include/asm/pgtable_mm.h
index b93c41fe2067..8c2db20abdb6 100644
--- a/arch/m68k/include/asm/pgtable_mm.h
+++ b/arch/m68k/include/asm/pgtable_mm.h
@@ -31,8 +31,6 @@
 	do{							\
 		*(pteptr) = (pteval);				\
 	} while(0)
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
-
 
 /* PMD_SHIFT determines the size of the area a second-level page table can map */
 #if CONFIG_PGTABLE_LEVELS == 3
@@ -138,11 +136,14 @@ extern void kernel_set_cachemode(void *addr, unsigned long size, int cmode);
  * tables contain all the necessary information.  The Sun3 does, but
  * they are updated on demand.
  */
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-				    unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
 }
 
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
+
 #endif /* !__ASSEMBLY__ */
 
 /* MMU-specific headers */
diff --git a/arch/m68k/include/asm/sun3_pgtable.h b/arch/m68k/include/asm/sun3_pgtable.h
index e582b0484a55..feae73b3b342 100644
--- a/arch/m68k/include/asm/sun3_pgtable.h
+++ b/arch/m68k/include/asm/sun3_pgtable.h
@@ -105,6 +105,7 @@ static inline void pte_clear (struct mm_struct *mm, unsigned long addr, pte_t *p
 	pte_val (*ptep) = 0;
 }
 
+#define PFN_PTE_SHIFT		0
 #define pte_pfn(pte)            (pte_val(pte) & SUN3_PAGE_PGNUM_MASK)
 #define pfn_pte(pfn, pgprot) \
 ({ pte_t __pte; pte_val(__pte) = pfn | pgprot_val(pgprot); __pte; })
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 911301224078..790666c6d146 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -81,7 +81,7 @@ static inline void cache_page(void *vaddr)
 
 void mmu_page_ctor(void *page)
 {
-	__flush_page_to_ram(page);
+	__flush_pages_to_ram(page, 1);
 	flush_tlb_kernel_page(page);
 	nocache_page(page);
 }
-- 
2.39.2
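
The ColdFire branch of __flush_pages_to_ram() masks addresses down to
icache set indexes, so a multi-page range can wrap around the index
space and must be flushed as two pieces.  A toy model with an invented
ICACHE_SET_MASK value and printf in place of flush_cf_bcache():

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define ICACHE_SET_MASK	((1UL << 13) - 1)	/* illustrative mask */
#define ICACHE_MAX_ADDR	ICACHE_SET_MASK

static void flush_cf_bcache(unsigned long s, unsigned long e)
{
	printf("flush sets %#lx..%#lx\n", s, e);
}

static void flush_pages(unsigned long vaddr, unsigned int nr)
{
	unsigned long addr = vaddr & ~(PAGE_SIZE - 1);
	unsigned long start = addr & ICACHE_SET_MASK;
	unsigned long end = (addr + nr * PAGE_SIZE - 1) & ICACHE_SET_MASK;

	if (start > end) {	/* range wraps: flush it in two pieces */
		flush_cf_bcache(0, end);
		end = ICACHE_MAX_ADDR;
	}
	flush_cf_bcache(start, end);
}

int main(void)
{
	flush_pages(0x1000, 1);	/* stays below the wrap: one flush */
	flush_pages(0x1000, 4);	/* wraps the set index: two flushes */
	return 0;
}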



* [PATCH v4 15/36] microblaze: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (13 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 14/36] m68k: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:07   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 16/36] mips: " Matthew Wilcox (Oracle)
                   ` (20 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel, Michal Simek

Rename PFN_SHIFT_OFFSET to PFN_PTE_SHIFT.  Change the calling
convention for set_pte() to be the same as other architectures.  Add
update_mmu_cache_range(), flush_icache_pages() and flush_dcache_folio().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Simek <monstr@monstr.eu>
---
 arch/microblaze/include/asm/cacheflush.h |  8 ++++++++
 arch/microblaze/include/asm/pgtable.h    | 15 ++++-----------
 arch/microblaze/include/asm/tlbflush.h   |  4 +++-
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/arch/microblaze/include/asm/cacheflush.h b/arch/microblaze/include/asm/cacheflush.h
index 39f8fb6768d8..e6641ff98cb3 100644
--- a/arch/microblaze/include/asm/cacheflush.h
+++ b/arch/microblaze/include/asm/cacheflush.h
@@ -74,6 +74,14 @@ do { \
 	flush_dcache_range((unsigned) (addr), (unsigned) (addr) + PAGE_SIZE); \
 } while (0);
 
+static inline void flush_dcache_folio(struct folio *folio)
+{
+	unsigned long addr = folio_pfn(folio) << PAGE_SHIFT;
+
+	flush_dcache_range(addr, addr + folio_size(folio));
+}
+#define flush_dcache_folio flush_dcache_folio
+
 #define flush_cache_page(vma, vmaddr, pfn) \
 	flush_dcache_range(pfn << PAGE_SHIFT, (pfn << PAGE_SHIFT) + PAGE_SIZE);
 
diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index d1b8272abcd9..19fcd7f8517e 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -230,12 +230,12 @@ extern unsigned long empty_zero_page[1024];
 
 #define pte_page(x)		(mem_map + (unsigned long) \
 				((pte_val(x) - memory_start) >> PAGE_SHIFT))
-#define PFN_SHIFT_OFFSET	(PAGE_SHIFT)
+#define PFN_PTE_SHIFT		PAGE_SHIFT
 
-#define pte_pfn(x)		(pte_val(x) >> PFN_SHIFT_OFFSET)
+#define pte_pfn(x)		(pte_val(x) >> PFN_PTE_SHIFT)
 
 #define pfn_pte(pfn, prot) \
-	__pte(((pte_basic_t)(pfn) << PFN_SHIFT_OFFSET) | pgprot_val(prot))
+	__pte(((pte_basic_t)(pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
 
 #ifndef __ASSEMBLY__
 /*
@@ -330,14 +330,7 @@ static inline unsigned long pte_update(pte_t *p, unsigned long clr,
 /*
  * set_pte stores a linux PTE into the linux page table.
  */
-static inline void set_pte(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte)
-{
-	*ptep = pte;
-}
-
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte)
+static inline void set_pte(pte_t *ptep, pte_t pte)
 {
 	*ptep = pte;
 }
diff --git a/arch/microblaze/include/asm/tlbflush.h b/arch/microblaze/include/asm/tlbflush.h
index 2038168ed128..1b179e5e9062 100644
--- a/arch/microblaze/include/asm/tlbflush.h
+++ b/arch/microblaze/include/asm/tlbflush.h
@@ -33,7 +33,9 @@ static inline void local_flush_tlb_range(struct vm_area_struct *vma,
 
 #define flush_tlb_kernel_range(start, end)	do { } while (0)
 
-#define update_mmu_cache(vma, addr, ptep)	do { } while (0)
+#define update_mmu_cache_range(vma, addr, ptep, nr)	do { } while (0)
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 #define flush_tlb_all local_flush_tlb_all
 #define flush_tlb_mm local_flush_tlb_mm
-- 
2.39.2
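
Because a folio is physically contiguous, microblaze's
flush_dcache_folio() can cover the whole folio with a single range flush
instead of one flush per page.  A standalone sketch with invented
stand-ins for the folio helpers:

#include <stdio.h>

#define PAGE_SHIFT	12

struct folio {
	unsigned long pfn;	/* folio_pfn() */
	unsigned int order;	/* log2 of the folio's page count */
};

static unsigned long folio_size(const struct folio *folio)
{
	return 1UL << (PAGE_SHIFT + folio->order);
}

static void flush_dcache_range(unsigned long start, unsigned long end)
{
	printf("flush %#lx-%#lx\n", start, end);
}

static void flush_dcache_folio(const struct folio *folio)
{
	unsigned long addr = folio->pfn << PAGE_SHIFT;

	flush_dcache_range(addr, addr + folio_size(folio));
}

int main(void)
{
	struct folio f = { 0x1234, 2 };	/* a four-page folio */

	flush_dcache_folio(&f);		/* one call instead of four */
	return 0;
}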



* [PATCH v4 16/36] mips: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (14 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 15/36] microblaze: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:08   ` Mike Rapoport
  2023-03-15 10:50   ` Thomas Bogendoerfer
  2023-03-15  5:14 ` [PATCH v4 17/36] nios2: " Matthew Wilcox (Oracle)
                   ` (19 subsequent siblings)
  35 siblings, 2 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Thomas Bogendoerfer, linux-mips

Rename _PFN_SHIFT to PFN_PTE_SHIFT.  Convert a few places
to call set_pte() instead of set_pte_at().  Add set_ptes(),
update_mmu_cache_range(), flush_icache_pages() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page
to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@vger.kernel.org
---
 arch/mips/bcm47xx/prom.c             |  2 +-
 arch/mips/include/asm/cacheflush.h   | 32 ++++++++++------
 arch/mips/include/asm/pgtable-32.h   | 10 ++---
 arch/mips/include/asm/pgtable-64.h   |  6 +--
 arch/mips/include/asm/pgtable-bits.h |  6 +--
 arch/mips/include/asm/pgtable.h      | 44 +++++++++++++---------
 arch/mips/mm/c-r4k.c                 |  5 ++-
 arch/mips/mm/cache.c                 | 56 ++++++++++++++--------------
 arch/mips/mm/init.c                  | 21 +++++++----
 arch/mips/mm/pgtable-32.c            |  2 +-
 arch/mips/mm/pgtable-64.c            |  2 +-
 arch/mips/mm/tlbex.c                 |  2 +-
 12 files changed, 107 insertions(+), 81 deletions(-)

diff --git a/arch/mips/bcm47xx/prom.c b/arch/mips/bcm47xx/prom.c
index a9bea411d928..99a1ba5394e0 100644
--- a/arch/mips/bcm47xx/prom.c
+++ b/arch/mips/bcm47xx/prom.c
@@ -116,7 +116,7 @@ void __init prom_init(void)
 #if defined(CONFIG_BCM47XX_BCMA) && defined(CONFIG_HIGHMEM)
 
 #define EXTVBASE	0xc0000000
-#define ENTRYLO(x)	((pte_val(pfn_pte((x) >> _PFN_SHIFT, PAGE_KERNEL_UNCACHED)) >> 6) | 1)
+#define ENTRYLO(x)	((pte_val(pfn_pte((x) >> PFN_PTE_SHIFT, PAGE_KERNEL_UNCACHED)) >> 6) | 1)
 
 #include <asm/tlbflush.h>
 
diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
index b3dc9c589442..2683cade42ef 100644
--- a/arch/mips/include/asm/cacheflush.h
+++ b/arch/mips/include/asm/cacheflush.h
@@ -36,12 +36,12 @@
  */
 #define PG_dcache_dirty			PG_arch_1
 
-#define Page_dcache_dirty(page)		\
-	test_bit(PG_dcache_dirty, &(page)->flags)
-#define SetPageDcacheDirty(page)	\
-	set_bit(PG_dcache_dirty, &(page)->flags)
-#define ClearPageDcacheDirty(page)	\
-	clear_bit(PG_dcache_dirty, &(page)->flags)
+#define folio_test_dcache_dirty(folio)		\
+	test_bit(PG_dcache_dirty, &(folio)->flags)
+#define folio_set_dcache_dirty(folio)	\
+	set_bit(PG_dcache_dirty, &(folio)->flags)
+#define folio_clear_dcache_dirty(folio)	\
+	clear_bit(PG_dcache_dirty, &(folio)->flags)
 
 extern void (*flush_cache_all)(void);
 extern void (*__flush_cache_all)(void);
@@ -50,15 +50,24 @@ extern void (*flush_cache_mm)(struct mm_struct *mm);
 extern void (*flush_cache_range)(struct vm_area_struct *vma,
 	unsigned long start, unsigned long end);
 extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
-extern void __flush_dcache_page(struct page *page);
+extern void __flush_dcache_pages(struct page *page, unsigned int nr);
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
+static inline void flush_dcache_folio(struct folio *folio)
+{
+	if (cpu_has_dc_aliases)
+		__flush_dcache_pages(&folio->page, folio_nr_pages(folio));
+	else if (!cpu_has_ic_fills_f_dc)
+		folio_set_dcache_dirty(folio);
+}
+#define flush_dcache_folio flush_dcache_folio
+
 static inline void flush_dcache_page(struct page *page)
 {
 	if (cpu_has_dc_aliases)
-		__flush_dcache_page(page);
+		__flush_dcache_pages(page, 1);
 	else if (!cpu_has_ic_fills_f_dc)
-		SetPageDcacheDirty(page);
+		folio_set_dcache_dirty(page_folio(page));
 }
 
 #define flush_dcache_mmap_lock(mapping)		do { } while (0)
@@ -73,10 +82,11 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
 		__flush_anon_page(page, vmaddr);
 }
 
-static inline void flush_icache_page(struct vm_area_struct *vma,
-	struct page *page)
+static inline void flush_icache_pages(struct vm_area_struct *vma,
+		struct page *page, unsigned int nr)
 {
 }
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
 
 extern void (*flush_icache_range)(unsigned long start, unsigned long end);
 extern void (*local_flush_icache_range)(unsigned long start, unsigned long end);
diff --git a/arch/mips/include/asm/pgtable-32.h b/arch/mips/include/asm/pgtable-32.h
index ba0016709a1a..0e196650f4f4 100644
--- a/arch/mips/include/asm/pgtable-32.h
+++ b/arch/mips/include/asm/pgtable-32.h
@@ -153,7 +153,7 @@ static inline void pmd_clear(pmd_t *pmdp)
 #if defined(CONFIG_XPA)
 
 #define MAX_POSSIBLE_PHYSMEM_BITS 40
-#define pte_pfn(x)		(((unsigned long)((x).pte_high >> _PFN_SHIFT)) | (unsigned long)((x).pte_low << _PAGE_PRESENT_SHIFT))
+#define pte_pfn(x)		(((unsigned long)((x).pte_high >> PFN_PTE_SHIFT)) | (unsigned long)((x).pte_low << _PAGE_PRESENT_SHIFT))
 static inline pte_t
 pfn_pte(unsigned long pfn, pgprot_t prot)
 {
@@ -161,7 +161,7 @@ pfn_pte(unsigned long pfn, pgprot_t prot)
 
 	pte.pte_low = (pfn >> _PAGE_PRESENT_SHIFT) |
 				(pgprot_val(prot) & ~_PFNX_MASK);
-	pte.pte_high = (pfn << _PFN_SHIFT) |
+	pte.pte_high = (pfn << PFN_PTE_SHIFT) |
 				(pgprot_val(prot) & ~_PFN_MASK);
 	return pte;
 }
@@ -184,9 +184,9 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
 #else
 
 #define MAX_POSSIBLE_PHYSMEM_BITS 32
-#define pte_pfn(x)		((unsigned long)((x).pte >> _PFN_SHIFT))
-#define pfn_pte(pfn, prot)	__pte(((unsigned long long)(pfn) << _PFN_SHIFT) | pgprot_val(prot))
-#define pfn_pmd(pfn, prot)	__pmd(((unsigned long long)(pfn) << _PFN_SHIFT) | pgprot_val(prot))
+#define pte_pfn(x)		((unsigned long)((x).pte >> PFN_PTE_SHIFT))
+#define pfn_pte(pfn, prot)	__pte(((unsigned long long)(pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
+#define pfn_pmd(pfn, prot)	__pmd(((unsigned long long)(pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
 #endif /* defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32) */
 
 #define pte_page(x)		pfn_to_page(pte_pfn(x))
diff --git a/arch/mips/include/asm/pgtable-64.h b/arch/mips/include/asm/pgtable-64.h
index 98e24e3e7f2b..20ca48c1b606 100644
--- a/arch/mips/include/asm/pgtable-64.h
+++ b/arch/mips/include/asm/pgtable-64.h
@@ -298,9 +298,9 @@ static inline void pud_clear(pud_t *pudp)
 
 #define pte_page(x)		pfn_to_page(pte_pfn(x))
 
-#define pte_pfn(x)		((unsigned long)((x).pte >> _PFN_SHIFT))
-#define pfn_pte(pfn, prot)	__pte(((pfn) << _PFN_SHIFT) | pgprot_val(prot))
-#define pfn_pmd(pfn, prot)	__pmd(((pfn) << _PFN_SHIFT) | pgprot_val(prot))
+#define pte_pfn(x)		((unsigned long)((x).pte >> PFN_PTE_SHIFT))
+#define pfn_pte(pfn, prot)	__pte(((pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
+#define pfn_pmd(pfn, prot)	__pmd(((pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
 
 #ifndef __PAGETABLE_PMD_FOLDED
 static inline pmd_t *pud_pgtable(pud_t pud)
diff --git a/arch/mips/include/asm/pgtable-bits.h b/arch/mips/include/asm/pgtable-bits.h
index 2362842ee2b5..744abba9111f 100644
--- a/arch/mips/include/asm/pgtable-bits.h
+++ b/arch/mips/include/asm/pgtable-bits.h
@@ -182,10 +182,10 @@ enum pgtable_bits {
 #if defined(CONFIG_CPU_R3K_TLB)
 # define _CACHE_UNCACHED	(1 << _CACHE_UNCACHED_SHIFT)
 # define _CACHE_MASK		_CACHE_UNCACHED
-# define _PFN_SHIFT		PAGE_SHIFT
+# define PFN_PTE_SHIFT		PAGE_SHIFT
 #else
 # define _CACHE_MASK		(7 << _CACHE_SHIFT)
-# define _PFN_SHIFT		(PAGE_SHIFT - 12 + _CACHE_SHIFT + 3)
+# define PFN_PTE_SHIFT		(PAGE_SHIFT - 12 + _CACHE_SHIFT + 3)
 #endif
 
 #ifndef _PAGE_NO_EXEC
@@ -195,7 +195,7 @@ enum pgtable_bits {
 #define _PAGE_SILENT_READ	_PAGE_VALID
 #define _PAGE_SILENT_WRITE	_PAGE_DIRTY
 
-#define _PFN_MASK		(~((1 << (_PFN_SHIFT)) - 1))
+#define _PFN_MASK		(~((1 << (PFN_PTE_SHIFT)) - 1))
 
 /*
  * The final layouts of the PTE bits are:
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 574fa14ac8b2..cfcd6a8ba8ef 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -66,7 +66,7 @@ extern void paging_init(void);
 
 static inline unsigned long pmd_pfn(pmd_t pmd)
 {
-	return pmd_val(pmd) >> _PFN_SHIFT;
+	return pmd_val(pmd) >> PFN_PTE_SHIFT;
 }
 
 #ifndef CONFIG_MIPS_HUGE_TLB_SUPPORT
@@ -105,9 +105,6 @@ do {									\
 	}								\
 } while(0)
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval);
-
 #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
 
 #ifdef CONFIG_XPA
@@ -157,7 +154,7 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
 			null.pte_low = null.pte_high = _PAGE_GLOBAL;
 	}
 
-	set_pte_at(mm, addr, ptep, null);
+	set_pte(ptep, null);
 	htw_start();
 }
 #else
@@ -196,28 +193,41 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
 #if !defined(CONFIG_CPU_R3K_TLB)
 	/* Preserve global status for the pair */
 	if (pte_val(*ptep_buddy(ptep)) & _PAGE_GLOBAL)
-		set_pte_at(mm, addr, ptep, __pte(_PAGE_GLOBAL));
+		set_pte(ptep, __pte(_PAGE_GLOBAL));
 	else
 #endif
-		set_pte_at(mm, addr, ptep, __pte(0));
+		set_pte(ptep, __pte(0));
 	htw_start();
 }
 #endif
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
 {
+	unsigned int i;
+	bool do_sync = false;
 
-	if (!pte_present(pteval))
-		goto cache_sync_done;
+	for (i = 0; i < nr; i++) {
+		if (!pte_present(pte))
+			continue;
+		if (pte_present(ptep[i]) &&
+		    (pte_pfn(ptep[i]) == pte_pfn(pte)))
+			continue;
+		do_sync = true;
+	}
 
-	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
-		goto cache_sync_done;
+	if (do_sync)
+		__update_cache(addr, pte);
 
-	__update_cache(addr, pteval);
-cache_sync_done:
-	set_pte(ptep, pteval);
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += 1 << PFN_PTE_SHIFT;
+	}
 }
+#define set_ptes set_ptes
 
 /*
  * (pmds are folded into puds so this doesn't get actually called,
@@ -486,7 +496,7 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
 					pte_t entry, int dirty)
 {
 	if (!pte_same(*ptep, entry))
-		set_pte_at(vma->vm_mm, address, ptep, entry);
+		set_pte(ptep, entry);
 	/*
 	 * update_mmu_cache will unconditionally execute, handling both
 	 * the case that the PTE changed and the spurious fault case.
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index a549fa98c2f4..7d2a42f0cffd 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -679,13 +679,14 @@ static inline void local_r4k_flush_cache_page(void *args)
 	if ((mm == current->active_mm) && (pte_val(*ptep) & _PAGE_VALID))
 		vaddr = NULL;
 	else {
+		struct folio *folio = page_folio(page);
 		/*
 		 * Use kmap_coherent or kmap_atomic to do flushes for
 		 * another ASID than the current one.
 		 */
 		map_coherent = (cpu_has_dc_aliases &&
-				page_mapcount(page) &&
-				!Page_dcache_dirty(page));
+				folio_mapped(folio) &&
+				!folio_test_dcache_dirty(folio));
 		if (map_coherent)
 			vaddr = kmap_coherent(page, addr);
 		else
diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 11b3e7ddafd5..0668435521fc 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -82,13 +82,15 @@ SYSCALL_DEFINE3(cacheflush, unsigned long, addr, unsigned long, bytes,
 	return 0;
 }
 
-void __flush_dcache_page(struct page *page)
+void __flush_dcache_pages(struct page *page, unsigned int nr)
 {
-	struct address_space *mapping = page_mapping_file(page);
+	struct folio *folio = page_folio(page);
+	struct address_space *mapping = folio_flush_mapping(folio);
 	unsigned long addr;
+	unsigned int i;
 
 	if (mapping && !mapping_mapped(mapping)) {
-		SetPageDcacheDirty(page);
+		folio_set_dcache_dirty(folio);
 		return;
 	}
 
@@ -97,25 +99,21 @@ void __flush_dcache_page(struct page *page)
 	 * case is for exec env/arg pages and those are %99 certainly going to
 	 * get faulted into the tlb (and thus flushed) anyways.
 	 */
-	if (PageHighMem(page))
-		addr = (unsigned long)kmap_atomic(page);
-	else
-		addr = (unsigned long)page_address(page);
-
-	flush_data_cache_page(addr);
-
-	if (PageHighMem(page))
-		kunmap_atomic((void *)addr);
+	for (i = 0; i < nr; i++) {
+		addr = (unsigned long)kmap_local_page(page + i);
+		flush_data_cache_page(addr);
+		kunmap_local((void *)addr);
+	}
 }
-
-EXPORT_SYMBOL(__flush_dcache_page);
+EXPORT_SYMBOL(__flush_dcache_pages);
 
 void __flush_anon_page(struct page *page, unsigned long vmaddr)
 {
 	unsigned long addr = (unsigned long) page_address(page);
+	struct folio *folio = page_folio(page);
 
 	if (pages_do_alias(addr, vmaddr)) {
-		if (page_mapcount(page) && !Page_dcache_dirty(page)) {
+		if (folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
 			void *kaddr;
 
 			kaddr = kmap_coherent(page, vmaddr);
@@ -130,27 +128,29 @@ EXPORT_SYMBOL(__flush_anon_page);
 
 void __update_cache(unsigned long address, pte_t pte)
 {
-	struct page *page;
+	struct folio *folio;
 	unsigned long pfn, addr;
 	int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
+	unsigned int i;
 
 	pfn = pte_pfn(pte);
 	if (unlikely(!pfn_valid(pfn)))
 		return;
-	page = pfn_to_page(pfn);
-	if (Page_dcache_dirty(page)) {
-		if (PageHighMem(page))
-			addr = (unsigned long)kmap_atomic(page);
-		else
-			addr = (unsigned long)page_address(page);
-
-		if (exec || pages_do_alias(addr, address & PAGE_MASK))
-			flush_data_cache_page(addr);
 
-		if (PageHighMem(page))
-			kunmap_atomic((void *)addr);
+	folio = page_folio(pfn_to_page(pfn));
+	address &= PAGE_MASK;
+	address -= offset_in_folio(folio, pfn << PAGE_SHIFT);
+
+	if (folio_test_dcache_dirty(folio)) {
+		for (i = 0; i < folio_nr_pages(folio); i++) {
+			addr = (unsigned long)kmap_local_folio(folio, i * PAGE_SIZE);
 
-		ClearPageDcacheDirty(page);
+			if (exec || pages_do_alias(addr, address))
+				flush_data_cache_page(addr);
+			kunmap_local((void *)addr);
+			address += PAGE_SIZE;
+		}
+		folio_clear_dcache_dirty(folio);
 	}
 }
 
diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 5a8002839550..5dcb525a8995 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -88,7 +88,7 @@ static void *__kmap_pgprot(struct page *page, unsigned long addr, pgprot_t prot)
 	pte_t pte;
 	int tlbidx;
 
-	BUG_ON(Page_dcache_dirty(page));
+	BUG_ON(folio_test_dcache_dirty(page_folio(page)));
 
 	preempt_disable();
 	pagefault_disable();
@@ -169,11 +169,12 @@ void kunmap_coherent(void)
 void copy_user_highpage(struct page *to, struct page *from,
 	unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	void *vfrom, *vto;
 
 	vto = kmap_atomic(to);
 	if (cpu_has_dc_aliases &&
-	    page_mapcount(from) && !Page_dcache_dirty(from)) {
+	    folio_mapped(src) && !folio_test_dcache_dirty(src)) {
 		vfrom = kmap_coherent(from, vaddr);
 		copy_page(vto, vfrom);
 		kunmap_coherent();
@@ -194,15 +195,17 @@ void copy_to_user_page(struct vm_area_struct *vma,
 	struct page *page, unsigned long vaddr, void *dst, const void *src,
 	unsigned long len)
 {
+	struct folio *folio = page_folio(page);
+
 	if (cpu_has_dc_aliases &&
-	    page_mapcount(page) && !Page_dcache_dirty(page)) {
+	    folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
 		void *vto = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
 		memcpy(vto, src, len);
 		kunmap_coherent();
 	} else {
 		memcpy(dst, src, len);
 		if (cpu_has_dc_aliases)
-			SetPageDcacheDirty(page);
+			folio_set_dcache_dirty(folio);
 	}
 	if (vma->vm_flags & VM_EXEC)
 		flush_cache_page(vma, vaddr, page_to_pfn(page));
@@ -212,15 +215,17 @@ void copy_from_user_page(struct vm_area_struct *vma,
 	struct page *page, unsigned long vaddr, void *dst, const void *src,
 	unsigned long len)
 {
+	struct folio *folio = page_folio(page);
+
 	if (cpu_has_dc_aliases &&
-	    page_mapcount(page) && !Page_dcache_dirty(page)) {
+	    folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
 		void *vfrom = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
 		memcpy(dst, vfrom, len);
 		kunmap_coherent();
 	} else {
 		memcpy(dst, src, len);
 		if (cpu_has_dc_aliases)
-			SetPageDcacheDirty(page);
+			folio_set_dcache_dirty(folio);
 	}
 }
 EXPORT_SYMBOL_GPL(copy_from_user_page);
@@ -448,10 +453,10 @@ static inline void __init mem_init_free_highmem(void)
 void __init mem_init(void)
 {
 	/*
-	 * When _PFN_SHIFT is greater than PAGE_SHIFT we won't have enough PTE
+	 * When PFN_PTE_SHIFT is greater than PAGE_SHIFT we won't have enough PTE
 	 * bits to hold a full 32b physical address on MIPS32 systems.
 	 */
-	BUILD_BUG_ON(IS_ENABLED(CONFIG_32BIT) && (_PFN_SHIFT > PAGE_SHIFT));
+	BUILD_BUG_ON(IS_ENABLED(CONFIG_32BIT) && (PFN_PTE_SHIFT > PAGE_SHIFT));
 
 #ifdef CONFIG_HIGHMEM
 	max_mapnr = highend_pfn ? highend_pfn : max_low_pfn;
diff --git a/arch/mips/mm/pgtable-32.c b/arch/mips/mm/pgtable-32.c
index f57fb69472f8..84dd5136d53a 100644
--- a/arch/mips/mm/pgtable-32.c
+++ b/arch/mips/mm/pgtable-32.c
@@ -35,7 +35,7 @@ pmd_t mk_pmd(struct page *page, pgprot_t prot)
 {
 	pmd_t pmd;
 
-	pmd_val(pmd) = (page_to_pfn(page) << _PFN_SHIFT) | pgprot_val(prot);
+	pmd_val(pmd) = (page_to_pfn(page) << PFN_PTE_SHIFT) | pgprot_val(prot);
 
 	return pmd;
 }
diff --git a/arch/mips/mm/pgtable-64.c b/arch/mips/mm/pgtable-64.c
index b4386a0e2ef8..c76d21f7dffb 100644
--- a/arch/mips/mm/pgtable-64.c
+++ b/arch/mips/mm/pgtable-64.c
@@ -93,7 +93,7 @@ pmd_t mk_pmd(struct page *page, pgprot_t prot)
 {
 	pmd_t pmd;
 
-	pmd_val(pmd) = (page_to_pfn(page) << _PFN_SHIFT) | pgprot_val(prot);
+	pmd_val(pmd) = (page_to_pfn(page) << PFN_PTE_SHIFT) | pgprot_val(prot);
 
 	return pmd;
 }
diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index 80e05ee98d62..1393a11af539 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -253,7 +253,7 @@ static void output_pgtable_bits_defines(void)
 	pr_define("_PAGE_GLOBAL_SHIFT %d\n", _PAGE_GLOBAL_SHIFT);
 	pr_define("_PAGE_VALID_SHIFT %d\n", _PAGE_VALID_SHIFT);
 	pr_define("_PAGE_DIRTY_SHIFT %d\n", _PAGE_DIRTY_SHIFT);
-	pr_define("_PFN_SHIFT %d\n", _PFN_SHIFT);
+	pr_define("PFN_PTE_SHIFT %d\n", PFN_PTE_SHIFT);
 	pr_debug("\n");
 }
 
-- 
2.39.2
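
The closing loop of the MIPS set_ptes() above is the shape almost every
architecture in this series ends up with: write the first PTE, then derive
each successor by bumping the PFN field.  A minimal user-space sketch of
that contract follows; the pte_t and PFN_PTE_SHIFT here are illustrative
stand-ins, not the MIPS definitions:

#include <stdio.h>

typedef unsigned long pte_t;	/* stand-in for the kernel type */
#define PFN_PTE_SHIFT	12	/* stand-in; MIPS derives it from _CACHE_SHIFT */

static void set_ptes(pte_t *ptep, pte_t pte, unsigned int nr)
{
	for (;;) {
		*ptep = pte;
		if (--nr == 0)
			break;
		ptep++;
		pte += 1UL << PFN_PTE_SHIFT;	/* advance one page frame */
	}
}

int main(void)
{
	pte_t table[4];
	unsigned int i;

	set_ptes(table, (100UL << PFN_PTE_SHIFT) | 0x3f, 4);
	for (i = 0; i < 4; i++)
		printf("pte[%u]: pfn=%lu flags=%#lx\n", i,
		       table[i] >> PFN_PTE_SHIFT,
		       table[i] & ((1UL << PFN_PTE_SHIFT) - 1));
	return 0;
}

This prints pfns 100 through 103 with the flag bits (0x3f) unchanged,
which is the invariant set_ptes() relies on: the nr pages are physically
consecutive because they all belong to one folio.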



* [PATCH v4 17/36] nios2: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (15 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 16/36] mips: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:08   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 18/36] openrisc: " Matthew Wilcox (Oracle)
                   ` (18 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel, Dinh Nguyen

Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
flush_dcache_folio().  Change the PG_arch_1 (aka PG_dcache_clean) flag
from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
---
 arch/nios2/include/asm/cacheflush.h |  6 ++-
 arch/nios2/include/asm/pgtable.h    | 28 ++++++++-----
 arch/nios2/mm/cacheflush.c          | 61 ++++++++++++++++-------------
 3 files changed, 58 insertions(+), 37 deletions(-)

diff --git a/arch/nios2/include/asm/cacheflush.h b/arch/nios2/include/asm/cacheflush.h
index d0b71dd71287..8624ca83cffe 100644
--- a/arch/nios2/include/asm/cacheflush.h
+++ b/arch/nios2/include/asm/cacheflush.h
@@ -29,9 +29,13 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
 	unsigned long pfn);
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
 
 extern void flush_icache_range(unsigned long start, unsigned long end);
-extern void flush_icache_page(struct vm_area_struct *vma, struct page *page);
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr);
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
 
 #define flush_cache_vmap(start, end)		flush_dcache_range(start, end)
 #define flush_cache_vunmap(start, end)		flush_dcache_range(start, end)
diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index 0f5c2564e9f5..4bb5f4dfff82 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -178,14 +178,21 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	*ptep = pteval;
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	unsigned long paddr = (unsigned long)page_to_virt(pte_page(pteval));
-
-	flush_dcache_range(paddr, paddr + PAGE_SIZE);
-	set_pte(ptep, pteval);
+	unsigned long paddr = (unsigned long)page_to_virt(pte_page(pte));
+
+	flush_dcache_range(paddr, paddr + nr * PAGE_SIZE);
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += 1;
+	}
 }
+#define set_ptes set_ptes
 
 static inline int pmd_none(pmd_t pmd)
 {
@@ -202,7 +209,7 @@ static inline void pte_clear(struct mm_struct *mm,
 
 	pte_val(null) = (addr >> PAGE_SHIFT) & 0xf;
 
-	set_pte_at(mm, addr, ptep, null);
+	set_pte(ptep, null);
 }
 
 /*
@@ -273,7 +280,10 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 extern void __init paging_init(void);
 extern void __init mmu_init(void);
 
-extern void update_mmu_cache(struct vm_area_struct *vma,
-			     unsigned long address, pte_t *pte);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *ptep, unsigned int nr);
+
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 #endif /* _ASM_NIOS2_PGTABLE_H */
diff --git a/arch/nios2/mm/cacheflush.c b/arch/nios2/mm/cacheflush.c
index 6aa9257c3ede..471485a84b2c 100644
--- a/arch/nios2/mm/cacheflush.c
+++ b/arch/nios2/mm/cacheflush.c
@@ -138,10 +138,11 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
 		__flush_icache(start, end);
 }
 
-void flush_icache_page(struct vm_area_struct *vma, struct page *page)
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr)
 {
 	unsigned long start = (unsigned long) page_address(page);
-	unsigned long end = start + PAGE_SIZE;
+	unsigned long end = start + nr * PAGE_SIZE;
 
 	__flush_dcache(start, end);
 	__flush_icache(start, end);
@@ -158,19 +159,19 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
 		__flush_icache(start, end);
 }
 
-void __flush_dcache_page(struct address_space *mapping, struct page *page)
+void __flush_dcache_folio(struct address_space *mapping, struct folio *folio)
 {
 	/*
 	 * Writeback any data associated with the kernel mapping of this
 	 * page.  This ensures that data in the physical page is mutually
 	 * coherent with the kernels mapping.
 	 */
-	unsigned long start = (unsigned long)page_address(page);
+	unsigned long start = (unsigned long)folio_address(folio);
 
-	__flush_dcache(start, start + PAGE_SIZE);
+	__flush_dcache(start, start + folio_size(folio));
 }
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
 	struct address_space *mapping;
 
@@ -178,32 +179,38 @@ void flush_dcache_page(struct page *page)
 	 * The zero page is never written to, so never has any dirty
 	 * cache lines, and therefore never needs to be flushed.
 	 */
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(folio_pfn(folio)))
 		return;
 
-	mapping = page_mapping_file(page);
+	mapping = folio_flush_mapping(folio);
 
 	/* Flush this page if there are aliases. */
 	if (mapping && !mapping_mapped(mapping)) {
-		clear_bit(PG_dcache_clean, &page->flags);
+		clear_bit(PG_dcache_clean, &folio->flags);
 	} else {
-		__flush_dcache_page(mapping, page);
+		__flush_dcache_folio(mapping, folio);
 		if (mapping) {
-			unsigned long start = (unsigned long)page_address(page);
-			flush_aliases(mapping,  page);
-			flush_icache_range(start, start + PAGE_SIZE);
+			unsigned long start = (unsigned long)folio_address(folio);
+			flush_aliases(mapping, folio);
+			flush_icache_range(start, start + folio_size(folio));
 		}
-		set_bit(PG_dcache_clean, &page->flags);
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
+
+void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
+EXPORT_SYMBOL(flush_dcache_page);
 
-void update_mmu_cache(struct vm_area_struct *vma,
-		      unsigned long address, pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *ptep, unsigned int nr)
 {
 	pte_t pte = *ptep;
 	unsigned long pfn = pte_pfn(pte);
-	struct page *page;
+	struct folio *folio;
 	struct address_space *mapping;
 
 	reload_tlb_page(vma, address, pte);
@@ -215,19 +222,19 @@ void update_mmu_cache(struct vm_area_struct *vma,
 	* The zero page is never written to, so never has any dirty
 	* cache lines, and therefore never needs to be flushed.
 	*/
-	page = pfn_to_page(pfn);
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(pfn))
 		return;
 
-	mapping = page_mapping_file(page);
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
-		__flush_dcache_page(mapping, page);
+	folio = page_folio(pfn_to_page(pfn));
+	mapping = folio_flush_mapping(folio);
+	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
+		__flush_dcache_folio(mapping, folio);
 
-	if(mapping)
-	{
-		flush_aliases(mapping, page);
+	if (mapping) {
+		flush_aliases(mapping, folio);
 		if (vma->vm_flags & VM_EXEC)
-			flush_icache_page(vma, page);
+			flush_icache_pages(vma, &folio->page,
+					folio_nr_pages(folio));
 	}
 }
 
-- 
2.39.2
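
Note the increment in the nios2 loop is pte_val(pte) += 1 rather than
1 << PFN_PTE_SHIFT as on MIPS: nios2 keeps the PFN in the low bits of the
PTE, so adding one advances one page frame, while architectures that store
a byte address (s390 and powerpc later in the series) step by PAGE_SIZE.
A tiny sketch with made-up encodings shows why the step size simply
follows the layout:

#include <assert.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

/* nios2-style: pfn in the low bits, protection above it */
static unsigned long low_pfn_pte(unsigned long pfn, unsigned long prot)
{
	return pfn | (prot << 20);	/* 20-bit pfn field is illustrative */
}

/* mips/riscv-style: pfn shifted up, flags in the low bits */
static unsigned long high_pfn_pte(unsigned long pfn, unsigned long prot)
{
	return (pfn << PAGE_SHIFT) | prot;
}

int main(void)
{
	assert(low_pfn_pte(101, 7) == low_pfn_pte(100, 7) + 1);
	assert(high_pfn_pte(101, 7) == high_pfn_pte(100, 7) + PAGE_SIZE);
	return 0;
}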



* [PATCH v4 18/36] openrisc: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (16 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 17/36] nios2: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:09   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 19/36] parisc: " Matthew Wilcox (Oracle)
                   ` (17 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Jonas Bonn, Stefan Kristiansson,
	Stafford Horne, linux-openrisc

Add PFN_PTE_SHIFT, update_mmu_cache_range() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dc_clean) flag from being per-page
to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: linux-openrisc@vger.kernel.org
---
 arch/openrisc/include/asm/cacheflush.h |  8 +++++++-
 arch/openrisc/include/asm/pgtable.h    | 14 +++++++++-----
 arch/openrisc/mm/cache.c               | 12 ++++++++----
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/openrisc/include/asm/cacheflush.h b/arch/openrisc/include/asm/cacheflush.h
index eeac40d4a854..984c331ff5f4 100644
--- a/arch/openrisc/include/asm/cacheflush.h
+++ b/arch/openrisc/include/asm/cacheflush.h
@@ -56,10 +56,16 @@ static inline void sync_icache_dcache(struct page *page)
  */
 #define PG_dc_clean                  PG_arch_1
 
+static inline void flush_dcache_folio(struct folio *folio)
+{
+	clear_bit(PG_dc_clean, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 static inline void flush_dcache_page(struct page *page)
 {
-	clear_bit(PG_dc_clean, &page->flags);
+	flush_dcache_folio(page_folio(page));
 }
 
 #define flush_icache_user_page(vma, page, addr, len)	\
diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index 3eb9b9555d0d..2f42a12c40ab 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -46,7 +46,7 @@ extern void paging_init(void);
  * hook is made available.
  */
 #define set_pte(pteptr, pteval) ((*(pteptr)) = (pteval))
-#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
+
 /*
  * (pmds are folded into pgds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)
@@ -357,6 +357,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 #define __pmd_offset(address) \
 	(((address) >> PMD_SHIFT) & (PTRS_PER_PMD-1))
 
+#define PFN_PTE_SHIFT		PAGE_SHIFT
 #define pte_pfn(x)		((unsigned long)(((x).pte)) >> PAGE_SHIFT)
 #define pfn_pte(pfn, prot)  __pte((((pfn) << PAGE_SHIFT)) | pgprot_val(prot))
 
@@ -379,13 +380,16 @@ static inline void update_tlb(struct vm_area_struct *vma,
 extern void update_cache(struct vm_area_struct *vma,
 	unsigned long address, pte_t *pte);
 
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-	unsigned long address, pte_t *pte)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
-	update_tlb(vma, address, pte);
-	update_cache(vma, address, pte);
+	update_tlb(vma, address, ptep);
+	update_cache(vma, address, ptep);
 }
 
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
+
 /* __PHX__ FIXME, SWAP, this probably doesn't work */
 
 /*
diff --git a/arch/openrisc/mm/cache.c b/arch/openrisc/mm/cache.c
index 534a52ec5e66..eb43b73f3855 100644
--- a/arch/openrisc/mm/cache.c
+++ b/arch/openrisc/mm/cache.c
@@ -43,15 +43,19 @@ void update_cache(struct vm_area_struct *vma, unsigned long address,
 	pte_t *pte)
 {
 	unsigned long pfn = pte_val(*pte) >> PAGE_SHIFT;
-	struct page *page = pfn_to_page(pfn);
-	int dirty = !test_and_set_bit(PG_dc_clean, &page->flags);
+	struct folio *folio = page_folio(pfn_to_page(pfn));
+	int dirty = !test_and_set_bit(PG_dc_clean, &folio->flags);
 
 	/*
 	 * Since icaches do not snoop for updated data on OpenRISC, we
 	 * must write back and invalidate any dirty pages manually. We
 	 * can skip data pages, since they will not end up in icaches.
 	 */
-	if ((vma->vm_flags & VM_EXEC) && dirty)
-		sync_icache_dcache(page);
+	if ((vma->vm_flags & VM_EXEC) && dirty) {
+		unsigned int nr = folio_nr_pages(folio);
+
+		while (nr--)
+			sync_icache_dcache(folio_page(folio, nr));
+	}
 }
 
-- 
2.39.2



* [PATCH v4 19/36] parisc: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (17 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 18/36] openrisc: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:09   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 20/36] powerpc: " Matthew Wilcox (Oracle)
                   ` (16 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, James E.J. Bottomley, Helge Deller,
	linux-parisc

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
and flush_icache_pages().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
---
 arch/parisc/include/asm/cacheflush.h |  14 ++--
 arch/parisc/include/asm/pgtable.h    |  37 ++++++----
 arch/parisc/kernel/cache.c           | 101 +++++++++++++++++++--------
 3 files changed, 103 insertions(+), 49 deletions(-)

diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 0bdee6724132..2cdc0ea562d6 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -43,16 +43,20 @@ void invalidate_kernel_vmap_range(void *vaddr, int size);
 #define flush_cache_vmap(start, end)		flush_cache_all()
 #define flush_cache_vunmap(start, end)		flush_cache_all()
 
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *page);
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 
 #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
 
-#define flush_icache_page(vma,page)	do { 		\
-	flush_kernel_dcache_page_addr(page_address(page)); \
-	flush_kernel_icache_page(page_address(page)); 	\
-} while (0)
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr);
+#define flush_icache_page(vma, page)	flush_icache_pages(vma, page, 1)
 
 #define flush_icache_range(s,e)		do { 		\
 	flush_kernel_dcache_range_asm(s,e); 		\
diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index e2950f5db7c9..ca6afe1980a5 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -73,15 +73,6 @@ extern void __update_cache(pte_t pte);
 		mb();				\
 	} while(0)
 
-#define set_pte_at(mm, addr, pteptr, pteval)	\
-	do {					\
-		if (pte_present(pteval) &&	\
-		    pte_user(pteval))		\
-			__update_cache(pteval);	\
-		*(pteptr) = (pteval);		\
-		purge_tlb_entries(mm, addr);	\
-	} while (0)
-
 #endif /* !__ASSEMBLY__ */
 
 #define pte_ERROR(e) \
@@ -285,7 +276,7 @@ extern unsigned long *empty_zero_page;
 #define pte_none(x)     (pte_val(x) == 0)
 #define pte_present(x)	(pte_val(x) & _PAGE_PRESENT)
 #define pte_user(x)	(pte_val(x) & _PAGE_USER)
-#define pte_clear(mm, addr, xp)  set_pte_at(mm, addr, xp, __pte(0))
+#define pte_clear(mm, addr, xp)  set_pte(xp, __pte(0))
 
 #define pmd_flag(x)	(pmd_val(x) & PxD_FLAG_MASK)
 #define pmd_address(x)	((unsigned long)(pmd_val(x) &~ PxD_FLAG_MASK) << PxD_VALUE_SHIFT)
@@ -391,11 +382,29 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 
 extern void paging_init (void);
 
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	if (pte_present(pte) && pte_user(pte))
+		__update_cache(pte);
+	for (;;) {
+		*ptep = pte;
+		purge_tlb_entries(mm, addr);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += 1 << PFN_PTE_SHIFT;
+		addr += PAGE_SIZE;
+	}
+}
+#define set_ptes set_ptes
+
 /* Used for deferring calls to flush_dcache_page() */
 
 #define PG_dcache_dirty         PG_arch_1
 
-#define update_mmu_cache(vms,addr,ptep) __update_cache(*ptep)
+#define update_mmu_cache_range(vma, addr, ptep, nr) __update_cache(*ptep)
+#define update_mmu_cache(vma, addr, ptep) __update_cache(*ptep)
 
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
@@ -450,7 +459,7 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned
 	if (!pte_young(pte)) {
 		return 0;
 	}
-	set_pte_at(vma->vm_mm, addr, ptep, pte_mkold(pte));
+	set_pte(ptep, pte_mkold(pte));
 	return 1;
 }
 
@@ -460,14 +469,14 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
 	pte_t old_pte;
 
 	old_pte = *ptep;
-	set_pte_at(mm, addr, ptep, __pte(0));
+	set_pte(ptep, __pte(0));
 
 	return old_pte;
 }
 
 static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
-	set_pte_at(mm, addr, ptep, pte_wrprotect(*ptep));
+	set_pte(ptep, pte_wrprotect(*ptep));
 }
 
 #define pte_same(A,B)	(pte_val(A) == pte_val(B))
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 1d3b8bc8a623..ceaa268fc1a6 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -92,11 +92,11 @@ static inline void flush_data_cache(void)
 /* Kernel virtual address of pfn.  */
 #define pfn_va(pfn)	__va(PFN_PHYS(pfn))
 
-void
-__update_cache(pte_t pte)
+void __update_cache(pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
-	struct page *page;
+	struct folio *folio;
+	unsigned int nr;
 
 	/* We don't have pte special.  As a result, we can be called with
 	   an invalid pfn and we don't need to flush the kernel dcache page.
@@ -104,13 +104,17 @@ __update_cache(pte_t pte)
 	if (!pfn_valid(pfn))
 		return;
 
-	page = pfn_to_page(pfn);
-	if (page_mapping_file(page) &&
-	    test_bit(PG_dcache_dirty, &page->flags)) {
-		flush_kernel_dcache_page_addr(pfn_va(pfn));
-		clear_bit(PG_dcache_dirty, &page->flags);
+	folio = page_folio(pfn_to_page(pfn));
+	pfn = folio_pfn(folio);
+	nr = folio_nr_pages(folio);
+	if (folio_flush_mapping(folio) &&
+	    test_bit(PG_dcache_dirty, &folio->flags)) {
+		while (nr--)
+			flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
+		clear_bit(PG_dcache_dirty, &folio->flags);
 	} else if (parisc_requires_coherency())
-		flush_kernel_dcache_page_addr(pfn_va(pfn));
+		while (nr--)
+			flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
 }
 
 void
@@ -364,6 +368,20 @@ static void flush_user_cache_page(struct vm_area_struct *vma, unsigned long vmad
 	preempt_enable();
 }
 
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr)
+{
+	void *kaddr = page_address(page);
+
+	for (;;) {
+		flush_kernel_dcache_page_addr(kaddr);
+		flush_kernel_icache_page(kaddr);
+		if (--nr == 0)
+			break;
+		kaddr += PAGE_SIZE;
+	}
+}
+
 static inline pte_t *get_ptep(struct mm_struct *mm, unsigned long addr)
 {
 	pte_t *ptep = NULL;
@@ -392,26 +410,30 @@ static inline bool pte_needs_flush(pte_t pte)
 		== (_PAGE_PRESENT | _PAGE_ACCESSED);
 }
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
-	struct address_space *mapping = page_mapping_file(page);
-	struct vm_area_struct *mpnt;
-	unsigned long offset;
+	struct address_space *mapping = folio_flush_mapping(folio);
+	struct vm_area_struct *vma;
 	unsigned long addr, old_addr = 0;
+	void *kaddr;
 	unsigned long count = 0;
+	unsigned long i, nr;
 	pgoff_t pgoff;
 
 	if (mapping && !mapping_mapped(mapping)) {
-		set_bit(PG_dcache_dirty, &page->flags);
+		set_bit(PG_dcache_dirty, &folio->flags);
 		return;
 	}
 
-	flush_kernel_dcache_page_addr(page_address(page));
+	nr = folio_nr_pages(folio);
+	kaddr = folio_address(folio);
+	for (i = 0; i < nr; i++)
+		flush_kernel_dcache_page_addr(kaddr + i * PAGE_SIZE);
 
 	if (!mapping)
 		return;
 
-	pgoff = page->index;
+	pgoff = folio->index;
 
 	/*
 	 * We have carefully arranged in arch_get_unmapped_area() that
@@ -421,15 +443,29 @@ void flush_dcache_page(struct page *page)
 	 * on machines that support equivalent aliasing
 	 */
 	flush_dcache_mmap_lock(mapping);
-	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
-		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		addr = mpnt->vm_start + offset;
-		if (parisc_requires_coherency()) {
-			pte_t *ptep;
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff + nr - 1) {
+		unsigned long offset = pgoff - vma->vm_pgoff;
+		unsigned long pfn = folio_pfn(folio);
+
+		addr = vma->vm_start;
+		nr = folio_nr_pages(folio);
+		if (offset > -nr) {
+			pfn -= offset;
+			nr += offset;
+		} else {
+			addr += offset * PAGE_SIZE;
+		}
+		if (addr + nr * PAGE_SIZE > vma->vm_end)
+			nr = (vma->vm_end - addr) / PAGE_SIZE;
 
-			ptep = get_ptep(mpnt->vm_mm, addr);
-			if (ptep && pte_needs_flush(*ptep))
-				flush_user_cache_page(mpnt, addr);
+		if (parisc_requires_coherency()) {
+			for (i = 0; i < nr; i++) {
+				pte_t *ptep = get_ptep(vma->vm_mm,
+							addr + i * PAGE_SIZE);
+				if (ptep && pte_needs_flush(*ptep))
+					flush_user_cache_page(vma,
+							addr + i * PAGE_SIZE);
+			}
 		} else {
 			/*
 			 * The TLB is the engine of coherence on parisc:
@@ -442,27 +478,32 @@ void flush_dcache_page(struct page *page)
 			 * in (until the user or kernel specifically
 			 * accesses it, of course)
 			 */
-			flush_tlb_page(mpnt, addr);
+			for (i = 0; i < nr; i++)
+				flush_tlb_page(vma, addr + i * PAGE_SIZE);
 			if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1))
 					!= (addr & (SHM_COLOUR - 1))) {
-				__flush_cache_page(mpnt, addr, page_to_phys(page));
+				for (i = 0; i < nr; i++)
+					__flush_cache_page(vma,
+						addr + i * PAGE_SIZE,
+						(pfn + i) * PAGE_SIZE);
 				/*
 				 * Software is allowed to have any number
 				 * of private mappings to a page.
 				 */
-				if (!(mpnt->vm_flags & VM_SHARED))
+				if (!(vma->vm_flags & VM_SHARED))
 					continue;
 				if (old_addr)
 					pr_err("INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n",
-						old_addr, addr, mpnt->vm_file);
-				old_addr = addr;
+						old_addr, addr, vma->vm_file);
+				if (nr == folio_nr_pages(folio))
+					old_addr = addr;
 			}
 		}
 		WARN_ON(++count == 4096);
 	}
 	flush_dcache_mmap_unlock(mapping);
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
 
 /* Defined in arch/parisc/kernel/pacache.S */
 EXPORT_SYMBOL(flush_kernel_dcache_range_asm);
-- 
2.39.2
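
The subtlest part of the parisc conversion is the interval-tree walk in
flush_dcache_folio(): a multi-page folio can overhang either end of a VMA,
so the flush range must be clamped on both sides.  Below is a user-space
sketch of just that arithmetic with illustrative numbers; the unsigned
"offset > -nr" comparison is the same trick the patch uses to detect a
folio that begins below vm_start:

#include <stdio.h>

#define PAGE_SIZE	4096UL

static void clamp_to_vma(unsigned long pgoff, unsigned long folio_pfn,
			 unsigned long folio_nr, unsigned long vm_pgoff,
			 unsigned long vm_start, unsigned long vm_end)
{
	unsigned long offset = pgoff - vm_pgoff;	/* may wrap "negative" */
	unsigned long pfn = folio_pfn;
	unsigned long nr = folio_nr;
	unsigned long addr = vm_start;

	if (offset > -nr) {		/* folio starts before the VMA */
		pfn -= offset;		/* skip the pages below vm_start */
		nr += offset;
	} else {			/* folio starts inside the VMA */
		addr += offset * PAGE_SIZE;
	}
	if (addr + nr * PAGE_SIZE > vm_end)	/* folio runs past the VMA */
		nr = (vm_end - addr) / PAGE_SIZE;

	printf("flush %lu page(s) at %#lx, first pfn %lu\n", nr, addr, pfn);
}

int main(void)
{
	/* 4-page folio at file offset 8, VMA maps offsets 10 and up:
	 * only the folio's last two pages fall inside the VMA */
	clamp_to_vma(8, 1000, 4, 10, 0x10000, 0x20000);
	/* the same size folio placed fully inside the VMA */
	clamp_to_vma(12, 2000, 4, 10, 0x10000, 0x20000);
	return 0;
}

The first call flushes two pages at vm_start beginning with pfn 1002; the
second flushes all four pages at vm_start + 2 * PAGE_SIZE.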



* [PATCH v4 20/36] powerpc: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (18 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 19/36] parisc: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  9:43   ` Christophe Leroy
  2023-03-15 10:09   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 21/36] riscv: " Matthew Wilcox (Oracle)
                   ` (15 subsequent siblings)
  35 siblings, 2 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, linuxppc-dev

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dcache_clean) flag from being per-page to
per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/book3s/pgtable.h | 10 +----
 arch/powerpc/include/asm/cacheflush.h     | 14 +++++--
 arch/powerpc/include/asm/kvm_ppc.h        | 10 ++---
 arch/powerpc/include/asm/nohash/pgtable.h | 13 ++----
 arch/powerpc/include/asm/pgtable.h        |  6 +++
 arch/powerpc/mm/book3s64/hash_utils.c     | 11 ++---
 arch/powerpc/mm/cacheflush.c              | 40 ++++++------------
 arch/powerpc/mm/nohash/e500_hugetlbpage.c |  3 +-
 arch/powerpc/mm/pgtable.c                 | 51 +++++++++++++----------
 9 files changed, 77 insertions(+), 81 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/pgtable.h b/arch/powerpc/include/asm/book3s/pgtable.h
index d18b748ea3ae..c2ef811505b0 100644
--- a/arch/powerpc/include/asm/book3s/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/pgtable.h
@@ -9,13 +9,6 @@
 #endif
 
 #ifndef __ASSEMBLY__
-/* Insert a PTE, top-level function is out of line. It uses an inline
- * low level function in the respective pgtable-* files
- */
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-		       pte_t pte);
-
-
 #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 				 pte_t *ptep, pte_t entry, int dirty);
@@ -36,7 +29,8 @@ void __update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t
  * corresponding HPTE into the hash table ahead of time, instead of
  * waiting for the inevitable extra hash-table miss exception.
  */
-static inline void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
 	if (IS_ENABLED(CONFIG_PPC32) && !mmu_has_feature(MMU_FTR_HPTE_TABLE))
 		return;
diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index 7564dd4fd12b..ef7d2de33b89 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -35,13 +35,19 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end)
  * It just marks the page as not i-cache clean.  We do the i-cache
  * flush later when the page is given to a user process, if necessary.
  */
-static inline void flush_dcache_page(struct page *page)
+static inline void flush_dcache_folio(struct folio *folio)
 {
 	if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
 		return;
 	/* avoid an atomic op if possible */
-	if (test_bit(PG_dcache_clean, &page->flags))
-		clear_bit(PG_dcache_clean, &page->flags);
+	if (test_bit(PG_dcache_clean, &folio->flags))
+		clear_bit(PG_dcache_clean, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
 }
 
 void flush_icache_range(unsigned long start, unsigned long stop);
@@ -51,7 +57,7 @@ void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
 		unsigned long addr, int len);
 #define flush_icache_user_page flush_icache_user_page
 
-void flush_dcache_icache_page(struct page *page);
+void flush_dcache_icache_folio(struct folio *folio);
 
 /**
  * flush_dcache_range(): Write any modified data cache blocks out to memory and
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 6bef23d6d0e3..e91dd8e88bb7 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -868,7 +868,7 @@ void kvmppc_init_lpid(unsigned long nr_lpids);
 
 static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
 {
-	struct page *page;
+	struct folio *folio;
 	/*
 	 * We can only access pages that the kernel maps
 	 * as memory. Bail out for unmapped ones.
@@ -877,10 +877,10 @@ static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
 		return;
 
 	/* Clear i-cache for new pages */
-	page = pfn_to_page(pfn);
-	if (!test_bit(PG_dcache_clean, &page->flags)) {
-		flush_dcache_icache_page(page);
-		set_bit(PG_dcache_clean, &page->flags);
+	folio = page_folio(pfn_to_page(pfn));
+	if (!test_bit(PG_dcache_clean, &folio->flags)) {
+		flush_dcache_icache_folio(folio);
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
 
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index a6caaaab6f92..69a7dd47a9f0 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -166,12 +166,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 	return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
 }
 
-/* Insert a PTE, top-level function is out of line. It uses an inline
- * low level function in the respective pgtable-* files
- */
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-		       pte_t pte);
-
 /* This low level function performs the actual PTE insertion
  * Setting the PTE depends on the MMU type and other factors. It's
  * an horrible mess that I'm not going to try to clean up now but
@@ -282,10 +276,11 @@ static inline int pud_huge(pud_t pud)
  * for the page which has just been mapped in.
  */
 #if defined(CONFIG_PPC_E500) && defined(CONFIG_HUGETLB_PAGE)
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *ptep, unsigned int nr);
 #else
-static inline
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) {}
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr) {}
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 9972626ddaf6..656ecf2b10cd 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -41,6 +41,12 @@ struct mm_struct;
 
 #ifndef __ASSEMBLY__
 
+void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+		pte_t pte, unsigned int nr);
+#define set_ptes set_ptes
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
+
 #ifndef MAX_PTRS_PER_PGD
 #define MAX_PTRS_PER_PGD PTRS_PER_PGD
 #endif
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index fedffe3ae136..ad2afa08e62e 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1307,18 +1307,19 @@ void hash__early_init_mmu_secondary(void)
  */
 unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
 {
-	struct page *page;
+	struct folio *folio;
 
 	if (!pfn_valid(pte_pfn(pte)))
 		return pp;
 
-	page = pte_page(pte);
+	folio = page_folio(pte_page(pte));
 
 	/* page is dirty */
-	if (!test_bit(PG_dcache_clean, &page->flags) && !PageReserved(page)) {
+	if (!test_bit(PG_dcache_clean, &folio->flags) &&
+	    !folio_test_reserved(folio)) {
 		if (trap == INTERRUPT_INST_STORAGE) {
-			flush_dcache_icache_page(page);
-			set_bit(PG_dcache_clean, &page->flags);
+			flush_dcache_icache_folio(folio);
+			set_bit(PG_dcache_clean, &folio->flags);
 		} else
 			pp |= HPTE_R_N;
 	}
diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
index 0e9b4879c0f9..8760d2223abe 100644
--- a/arch/powerpc/mm/cacheflush.c
+++ b/arch/powerpc/mm/cacheflush.c
@@ -148,44 +148,30 @@ static void __flush_dcache_icache(void *p)
 	invalidate_icache_range(addr, addr + PAGE_SIZE);
 }
 
-static void flush_dcache_icache_hugepage(struct page *page)
+void flush_dcache_icache_folio(struct folio *folio)
 {
-	int i;
-	int nr = compound_nr(page);
+	unsigned int i, nr = folio_nr_pages(folio);
 
-	if (!PageHighMem(page)) {
+	if (flush_coherent_icache())
+		return;
+
+	if (!folio_test_highmem(folio)) {
+		void *addr = folio_address(folio);
 		for (i = 0; i < nr; i++)
-			__flush_dcache_icache(lowmem_page_address(page + i));
-	} else {
+			__flush_dcache_icache(addr + i * PAGE_SIZE);
+	} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
 		for (i = 0; i < nr; i++) {
-			void *start = kmap_local_page(page + i);
+			void *start = kmap_local_folio(folio, i * PAGE_SIZE);
 
 			__flush_dcache_icache(start);
 			kunmap_local(start);
 		}
-	}
-}
-
-void flush_dcache_icache_page(struct page *page)
-{
-	if (flush_coherent_icache())
-		return;
-
-	if (PageCompound(page))
-		return flush_dcache_icache_hugepage(page);
-
-	if (!PageHighMem(page)) {
-		__flush_dcache_icache(lowmem_page_address(page));
-	} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
-		void *start = kmap_local_page(page);
-
-		__flush_dcache_icache(start);
-		kunmap_local(start);
 	} else {
-		flush_dcache_icache_phys(page_to_phys(page));
+		unsigned long pfn = folio_pfn(folio);
+		for (i = 0; i < nr; i++)
+			flush_dcache_icache_phys((pfn + i) * PAGE_SIZE);
 	}
 }
-EXPORT_SYMBOL(flush_dcache_icache_page);
 
 void clear_user_page(void *page, unsigned long vaddr, struct page *pg)
 {
diff --git a/arch/powerpc/mm/nohash/e500_hugetlbpage.c b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
index 58c8d9849cb1..f3cb91107a47 100644
--- a/arch/powerpc/mm/nohash/e500_hugetlbpage.c
+++ b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
@@ -178,7 +178,8 @@ book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea, pte_t pte)
  *
  * This must always be called with the pte lock held.
  */
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *ptep, unsigned int nr)
 {
 	if (is_vm_hugetlb_page(vma))
 		book3e_hugetlb_preload(vma, address, *ptep);
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..b3c7b874a7a2 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -58,7 +58,7 @@ static inline int pte_looks_normal(pte_t pte)
 	return 0;
 }
 
-static struct page *maybe_pte_to_page(pte_t pte)
+static struct folio *maybe_pte_to_folio(pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
 	struct page *page;
@@ -68,7 +68,7 @@ static struct page *maybe_pte_to_page(pte_t pte)
 	page = pfn_to_page(pfn);
 	if (PageReserved(page))
 		return NULL;
-	return page;
+	return page_folio(page);
 }
 
 #ifdef CONFIG_PPC_BOOK3S
@@ -84,12 +84,12 @@ static pte_t set_pte_filter_hash(pte_t pte)
 	pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
 	if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) ||
 				       cpu_has_feature(CPU_FTR_NOEXECUTE))) {
-		struct page *pg = maybe_pte_to_page(pte);
-		if (!pg)
+		struct folio *folio = maybe_pte_to_folio(pte);
+		if (!folio)
 			return pte;
-		if (!test_bit(PG_dcache_clean, &pg->flags)) {
-			flush_dcache_icache_page(pg);
-			set_bit(PG_dcache_clean, &pg->flags);
+		if (!test_bit(PG_dcache_clean, &folio->flags)) {
+			flush_dcache_icache_folio(folio);
+			set_bit(PG_dcache_clean, &folio->flags);
 		}
 	}
 	return pte;
@@ -107,7 +107,7 @@ static pte_t set_pte_filter_hash(pte_t pte) { return pte; }
  */
 static inline pte_t set_pte_filter(pte_t pte)
 {
-	struct page *pg;
+	struct folio *folio;
 
 	if (radix_enabled())
 		return pte;
@@ -120,18 +120,18 @@ static inline pte_t set_pte_filter(pte_t pte)
 		return pte;
 
 	/* If you set _PAGE_EXEC on weird pages you're on your own */
-	pg = maybe_pte_to_page(pte);
-	if (unlikely(!pg))
+	folio = maybe_pte_to_folio(pte);
+	if (unlikely(!folio))
 		return pte;
 
 	/* If the page clean, we move on */
-	if (test_bit(PG_dcache_clean, &pg->flags))
+	if (test_bit(PG_dcache_clean, &folio->flags))
 		return pte;
 
 	/* If it's an exec fault, we flush the cache and make it clean */
 	if (is_exec_fault()) {
-		flush_dcache_icache_page(pg);
-		set_bit(PG_dcache_clean, &pg->flags);
+		flush_dcache_icache_folio(folio);
+		set_bit(PG_dcache_clean, &folio->flags);
 		return pte;
 	}
 
@@ -142,7 +142,7 @@ static inline pte_t set_pte_filter(pte_t pte)
 static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
 				     int dirty)
 {
-	struct page *pg;
+	struct folio *folio;
 
 	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
 		return pte;
@@ -168,17 +168,17 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
 #endif /* CONFIG_DEBUG_VM */
 
 	/* If you set _PAGE_EXEC on weird pages you're on your own */
-	pg = maybe_pte_to_page(pte);
-	if (unlikely(!pg))
+	folio = maybe_pte_to_folio(pte);
+	if (unlikely(!folio))
 		goto bail;
 
 	/* If the page is already clean, we move on */
-	if (test_bit(PG_dcache_clean, &pg->flags))
+	if (test_bit(PG_dcache_clean, &folio->flags))
 		goto bail;
 
 	/* Clean the page and set PG_dcache_clean */
-	flush_dcache_icache_page(pg);
-	set_bit(PG_dcache_clean, &pg->flags);
+	flush_dcache_icache_folio(folio);
+	set_bit(PG_dcache_clean, &folio->flags);
 
  bail:
 	return pte_mkexec(pte);
@@ -187,8 +187,8 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
 /*
  * set_pte stores a linux PTE into the linux page table.
  */
-void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-		pte_t pte)
+void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+		pte_t pte, unsigned int nr)
 {
 	/*
 	 * Make sure hardware valid bit is not set. We don't do
@@ -203,7 +203,14 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 	pte = set_pte_filter(pte);
 
 	/* Perform the setting of the PTE */
-	__set_pte_at(mm, addr, ptep, pte, 0);
+	for (;;) {
+		__set_pte_at(mm, addr, ptep, pte, 0);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte = __pte(pte_val(pte) + PAGE_SIZE);
+		addr += PAGE_SIZE;
+	}
 }
 
 void unmap_kernel_page(unsigned long va)
-- 
2.39.2
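
Much of the powerpc churn above is the same lazy-coherency protocol the
other aliasing ports use, now tracked per folio: PG_dcache_clean records
that the folio is known coherent with the I-cache, the expensive flush
happens only on the first executable use, and a kernel write just clears
the bit so the next executable fault flushes again.  A compact user-space
model of that state machine; the names here are illustrative, not kernel
API:

#include <stdbool.h>
#include <stdio.h>

struct folio_model {
	bool dcache_clean;	/* models PG_dcache_clean in folio->flags */
	unsigned int nr_pages;
};

static void flush_dcache_icache_folio(struct folio_model *f)
{
	printf("flushing %u page(s)\n", f->nr_pages);
}

/* cf. set_pte_filter(): called when mapping the folio executable */
static void sync_for_exec(struct folio_model *f)
{
	if (f->dcache_clean)
		return;				/* already coherent */
	flush_dcache_icache_folio(f);		/* one flush covers the folio */
	f->dcache_clean = true;
}

/* cf. flush_dcache_folio(): called when the kernel writes the folio */
static void mark_written(struct folio_model *f)
{
	f->dcache_clean = false;		/* re-arm the lazy flush */
}

int main(void)
{
	struct folio_model f = { .dcache_clean = false, .nr_pages = 4 };

	sync_for_exec(&f);	/* flushes */
	sync_for_exec(&f);	/* no-op: still clean */
	mark_written(&f);
	sync_for_exec(&f);	/* flushes again */
	return 0;
}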



* [PATCH v4 21/36] riscv: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (19 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 20/36] powerpc: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:10   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 22/36] s390: " Matthew Wilcox (Oracle)
                   ` (14 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Alexandre Ghiti, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, linux-riscv

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_dcache_clean flag from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: linux-riscv@lists.infradead.org
---
 arch/riscv/include/asm/cacheflush.h | 19 +++++++++----------
 arch/riscv/include/asm/pgtable.h    | 26 +++++++++++++++++++-------
 arch/riscv/mm/cacheflush.c          | 11 ++---------
 3 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 03e3b95ae6da..10e5e96f09b5 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -15,20 +15,19 @@ static inline void local_flush_icache_all(void)
 
 #define PG_dcache_clean PG_arch_1
 
-static inline void flush_dcache_page(struct page *page)
+static inline void flush_dcache_folio(struct folio *folio)
 {
-	/*
-	 * HugeTLB pages are always fully mapped and only head page will be
-	 * set PG_dcache_clean (see comments in flush_icache_pte()).
-	 */
-	if (PageHuge(page))
-		page = compound_head(page);
-
-	if (test_bit(PG_dcache_clean, &page->flags))
-		clear_bit(PG_dcache_clean, &page->flags);
+	if (test_bit(PG_dcache_clean, &folio->flags))
+		clear_bit(PG_dcache_clean, &folio->flags);
 }
+#define flush_dcache_folio flush_dcache_folio
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
+
 /*
  * RISC-V doesn't have an instruction to flush parts of the instruction cache,
  * so instead we just flush the whole thing.
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index b516f3b59616..b077bc8c498c 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -405,8 +405,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 
 
 /* Commit new configuration to MMU hardware */
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-	unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
 	/*
 	 * The kernel assumes that TLBs don't cache invalid entries, but
@@ -415,8 +415,11 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 	 * Relying on flush_tlb_fix_spurious_fault would suffice, but
 	 * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
 	 */
-	local_flush_tlb_page(address);
+	while (nr--)
+		local_flush_tlb_page(address + nr * PAGE_SIZE);
 }
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 #define __HAVE_ARCH_UPDATE_MMU_TLB
 #define update_mmu_tlb update_mmu_cache
@@ -456,12 +459,21 @@ static inline void __set_pte_at(struct mm_struct *mm,
 	set_pte(ptep, pteval);
 }
 
-static inline void set_pte_at(struct mm_struct *mm,
-	unsigned long addr, pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pteval, unsigned int nr)
 {
-	page_table_check_ptes_set(mm, addr, ptep, pteval, 1);
-	__set_pte_at(mm, addr, ptep, pteval);
+	page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
+
+	for (;;) {
+		__set_pte_at(mm, addr, ptep, pteval);
+		if (--nr == 0)
+			break;
+		ptep++;
+		addr += PAGE_SIZE;
+		pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
+	}
 }
+#define set_ptes set_ptes
 
 static inline void pte_clear(struct mm_struct *mm,
 	unsigned long addr, pte_t *ptep)
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index fcd6145fbead..e36a851e5788 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -81,16 +81,9 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
 #ifdef CONFIG_MMU
 void flush_icache_pte(pte_t pte)
 {
-	struct page *page = pte_page(pte);
+	struct folio *folio = page_folio(pte_page(pte));
 
-	/*
-	 * HugeTLB pages are always fully mapped, so only setting head page's
-	 * PG_dcache_clean flag is enough.
-	 */
-	if (PageHuge(page))
-		page = compound_head(page);
-
-	if (!test_bit(PG_dcache_clean, &page->flags)) {
+	if (!test_bit(PG_dcache_clean, &folio->flags)) {
 		flush_icache_all();
-		set_bit(PG_dcache_clean, &page->flags);
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
-- 
2.39.2



* [PATCH v4 22/36] s390: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (20 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 21/36] riscv: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:10   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 23/36] superh: " Matthew Wilcox (Oracle)
                   ` (13 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Gerald Schaefer, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, linux-s390

Add set_ptes() and update_mmu_cache_range().
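
As a sketch of the caller side (names taken from generic MM code, not
from this patch), mapping a whole folio becomes one call rather than a
per-page loop:

	/* Hypothetical caller: install PTEs for every page of a folio. */
	pte_t pte = mk_pte(&folio->page, vma->vm_page_prot);

	set_ptes(vma->vm_mm, addr, ptep, pte, folio_nr_pages(folio));

On s390 the PTE holds the physical address directly, so stepping to the
next page is a plain pte_val(entry) + PAGE_SIZE; the trailing
"#define set_ptes set_ptes" tells the generic header not to emit its
fallback definition.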

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
---
 arch/s390/include/asm/pgtable.h | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index c1f6b46ec555..fea678c67e51 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -50,6 +50,7 @@ void arch_report_meminfo(struct seq_file *m);
  * tables contain all the necessary information.
  */
 #define update_mmu_cache(vma, address, ptep)     do { } while (0)
+#define update_mmu_cache_range(vma, addr, ptep, nr)	do { } while (0)
 #define update_mmu_cache_pmd(vma, address, ptep) do { } while (0)
 
 /*
@@ -1319,20 +1320,34 @@ pgprot_t pgprot_writecombine(pgprot_t prot);
 pgprot_t pgprot_writethrough(pgprot_t prot);
 
 /*
- * Certain architectures need to do special things when PTEs
- * within a page table are directly modified.  Thus, the following
- * hook is made available.
+ * Set multiple PTEs to consecutive pages with a single call.  All PTEs
+ * are within the same folio, PMD and VMA.
  */
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t entry)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t entry, unsigned int nr)
 {
 	if (pte_present(entry))
 		entry = clear_pte_bit(entry, __pgprot(_PAGE_UNUSED));
-	if (mm_has_pgste(mm))
-		ptep_set_pte_at(mm, addr, ptep, entry);
-	else
-		set_pte(ptep, entry);
+	if (mm_has_pgste(mm)) {
+		for (;;) {
+			ptep_set_pte_at(mm, addr, ptep, entry);
+			if (--nr == 0)
+				break;
+			ptep++;
+			entry = __pte(pte_val(entry) + PAGE_SIZE);
+			addr += PAGE_SIZE;
+		}
+	} else {
+		for (;;) {
+			set_pte(ptep, entry);
+			if (--nr == 0)
+				break;
+			ptep++;
+			entry = __pte(pte_val(entry) + PAGE_SIZE);
+		}
+	}
 }
+#define set_ptes set_ptes
 
 /*
  * Conversion functions: convert a page and protection to a page entry,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 23/36] superh: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (21 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 22/36] s390: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  7:22   ` John Paul Adrian Glaubitz
                     ` (2 more replies)
  2023-03-15  5:14 ` [PATCH v4 24/36] sparc32: " Matthew Wilcox (Oracle)
                   ` (12 subsequent siblings)
  35 siblings, 3 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	John Paul Adrian Glaubitz, linux-sh

Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().  Change the PG_dcache_clean flag from being
per-page to per-folio.  Flush the entire folio containing the pages in
flush_icache_pages() for ease of implementation.
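
Flushing the whole folio rather than only the requested sub-range keeps
the implementation simple; the core of it is the one-liner used by
sh7705_flush_icache_folio() below:

	/* Purge every page of the folio from the cache in one call. */
	__flush_purge_region(folio_address(folio), folio_size(folio));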

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: linux-sh@vger.kernel.org
---
 arch/sh/include/asm/cacheflush.h | 21 ++++++++-----
 arch/sh/include/asm/pgtable.h    |  6 ++--
 arch/sh/include/asm/pgtable_32.h |  5 ++-
 arch/sh/mm/cache-j2.c            |  4 +--
 arch/sh/mm/cache-sh4.c           | 26 +++++++++++-----
 arch/sh/mm/cache-sh7705.c        | 26 ++++++++++------
 arch/sh/mm/cache.c               | 52 ++++++++++++++++++--------------
 arch/sh/mm/kmap.c                |  3 +-
 8 files changed, 88 insertions(+), 55 deletions(-)

diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
index 481a664287e2..9fceef6f3e00 100644
--- a/arch/sh/include/asm/cacheflush.h
+++ b/arch/sh/include/asm/cacheflush.h
@@ -13,9 +13,9 @@
  *  - flush_cache_page(mm, vmaddr, pfn) flushes a single page
  *  - flush_cache_range(vma, start, end) flushes a range of pages
  *
- *  - flush_dcache_page(pg) flushes(wback&invalidates) a page for dcache
+ *  - flush_dcache_folio(folio) flushes(wback&invalidates) a folio for dcache
  *  - flush_icache_range(start, end) flushes(invalidates) a range for icache
- *  - flush_icache_page(vma, pg) flushes(invalidates) a page for icache
+ *  - flush_icache_pages(vma, pg, nr) flushes(invalidates) pages for icache
  *  - flush_cache_sigtramp(vaddr) flushes the signal trampoline
  */
 extern void (*local_flush_cache_all)(void *args);
@@ -23,9 +23,9 @@ extern void (*local_flush_cache_mm)(void *args);
 extern void (*local_flush_cache_dup_mm)(void *args);
 extern void (*local_flush_cache_page)(void *args);
 extern void (*local_flush_cache_range)(void *args);
-extern void (*local_flush_dcache_page)(void *args);
+extern void (*local_flush_dcache_folio)(void *args);
 extern void (*local_flush_icache_range)(void *args);
-extern void (*local_flush_icache_page)(void *args);
+extern void (*local_flush_icache_folio)(void *args);
 extern void (*local_flush_cache_sigtramp)(void *args);
 
 static inline void cache_noop(void *args) { }
@@ -42,11 +42,18 @@ extern void flush_cache_page(struct vm_area_struct *vma,
 extern void flush_cache_range(struct vm_area_struct *vma,
 				 unsigned long start, unsigned long end);
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
+
 extern void flush_icache_range(unsigned long start, unsigned long end);
 #define flush_icache_user_range flush_icache_range
-extern void flush_icache_page(struct vm_area_struct *vma,
-				 struct page *page);
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr);
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
 extern void flush_cache_sigtramp(unsigned long address);
 
 struct flusher_data {
diff --git a/arch/sh/include/asm/pgtable.h b/arch/sh/include/asm/pgtable.h
index 3ce30becf6df..1a8fdc3bc363 100644
--- a/arch/sh/include/asm/pgtable.h
+++ b/arch/sh/include/asm/pgtable.h
@@ -102,13 +102,15 @@ extern void __update_cache(struct vm_area_struct *vma,
 extern void __update_tlb(struct vm_area_struct *vma,
 			 unsigned long address, pte_t pte);
 
-static inline void
-update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
 	pte_t pte = *ptep;
 	__update_cache(vma, address, pte);
 	__update_tlb(vma, address, pte);
 }
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
 extern void paging_init(void);
diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h
index 21952b094650..676f3d4ef6ce 100644
--- a/arch/sh/include/asm/pgtable_32.h
+++ b/arch/sh/include/asm/pgtable_32.h
@@ -307,14 +307,13 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 #define set_pte(pteptr, pteval) (*(pteptr) = pteval)
 #endif
 
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
-
 /*
  * (pmds are folded into pgds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)
  */
 #define set_pmd(pmdptr, pmdval) (*(pmdptr) = pmdval)
 
+#define PFN_PTE_SHIFT	PAGE_SHIFT
 #define pfn_pte(pfn, prot) \
 	__pte(((unsigned long long)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
 #define pfn_pmd(pfn, prot) \
@@ -323,7 +322,7 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 #define pte_none(x)		(!pte_val(x))
 #define pte_present(x)		((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE))
 
-#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
+#define pte_clear(mm, addr, ptep) set_pte(ptep, __pte(0))
 
 #define pmd_none(x)	(!pmd_val(x))
 #define pmd_present(x)	(pmd_val(x))
diff --git a/arch/sh/mm/cache-j2.c b/arch/sh/mm/cache-j2.c
index f277862a11f5..9ac960214380 100644
--- a/arch/sh/mm/cache-j2.c
+++ b/arch/sh/mm/cache-j2.c
@@ -55,9 +55,9 @@ void __init j2_cache_init(void)
 	local_flush_cache_dup_mm = j2_flush_both;
 	local_flush_cache_page = j2_flush_both;
 	local_flush_cache_range = j2_flush_both;
-	local_flush_dcache_page = j2_flush_dcache;
+	local_flush_dcache_folio = j2_flush_dcache;
 	local_flush_icache_range = j2_flush_icache;
-	local_flush_icache_page = j2_flush_icache;
+	local_flush_icache_folio = j2_flush_icache;
 	local_flush_cache_sigtramp = j2_flush_icache;
 
 	pr_info("Initial J2 CCR is %.8x\n", __raw_readl(j2_ccr_base));
diff --git a/arch/sh/mm/cache-sh4.c b/arch/sh/mm/cache-sh4.c
index 72c2e1b46c08..862046f26981 100644
--- a/arch/sh/mm/cache-sh4.c
+++ b/arch/sh/mm/cache-sh4.c
@@ -107,19 +107,29 @@ static inline void flush_cache_one(unsigned long start, unsigned long phys)
  * Write back & invalidate the D-cache of the page.
  * (To avoid "alias" issues)
  */
-static void sh4_flush_dcache_page(void *arg)
+static void sh4_flush_dcache_folio(void *arg)
 {
-	struct page *page = arg;
-	unsigned long addr = (unsigned long)page_address(page);
+	struct folio *folio = arg;
 #ifndef CONFIG_SMP
-	struct address_space *mapping = page_mapping_file(page);
+	struct address_space *mapping = folio_flush_mapping(folio);
 
 	if (mapping && !mapping_mapped(mapping))
-		clear_bit(PG_dcache_clean, &page->flags);
+		clear_bit(PG_dcache_clean, &folio->flags);
 	else
 #endif
-		flush_cache_one(CACHE_OC_ADDRESS_ARRAY |
-				(addr & shm_align_mask), page_to_phys(page));
+	{
+		unsigned long pfn = folio_pfn(folio);
+		unsigned long addr = (unsigned long)folio_address(folio);
+		unsigned int i, nr = folio_nr_pages(folio);
+
+		for (i = 0; i < nr; i++) {
+			flush_cache_one(CACHE_OC_ADDRESS_ARRAY |
+						(addr & shm_align_mask),
+					pfn * PAGE_SIZE);
+			addr += PAGE_SIZE;
+			pfn++;
+		}
+	}
 
 	wmb();
 }
@@ -379,7 +389,7 @@ void __init sh4_cache_init(void)
 		__raw_readl(CCN_PRR));
 
 	local_flush_icache_range	= sh4_flush_icache_range;
-	local_flush_dcache_page		= sh4_flush_dcache_page;
+	local_flush_dcache_folio	= sh4_flush_dcache_folio;
 	local_flush_cache_all		= sh4_flush_cache_all;
 	local_flush_cache_mm		= sh4_flush_cache_mm;
 	local_flush_cache_dup_mm	= sh4_flush_cache_mm;
diff --git a/arch/sh/mm/cache-sh7705.c b/arch/sh/mm/cache-sh7705.c
index 9b63a53a5e46..b509a407588f 100644
--- a/arch/sh/mm/cache-sh7705.c
+++ b/arch/sh/mm/cache-sh7705.c
@@ -132,15 +132,20 @@ static void __flush_dcache_page(unsigned long phys)
  * Write back & invalidate the D-cache of the page.
  * (To avoid "alias" issues)
  */
-static void sh7705_flush_dcache_page(void *arg)
+static void sh7705_flush_dcache_folio(void *arg)
 {
-	struct page *page = arg;
-	struct address_space *mapping = page_mapping_file(page);
+	struct folio *folio = arg;
+	struct address_space *mapping = folio_flush_mapping(folio);
 
 	if (mapping && !mapping_mapped(mapping))
-		clear_bit(PG_dcache_clean, &page->flags);
-	else
-		__flush_dcache_page(__pa(page_address(page)));
+		clear_bit(PG_dcache_clean, &folio->flags);
+	else {
+		unsigned long pfn = folio_pfn(folio);
+		unsigned int i, nr = folio_nr_pages(folio);
+
+		for (i = 0; i < nr; i++)
+			__flush_dcache_page((pfn + i) * PAGE_SIZE);
+	}
 }
 
 static void sh7705_flush_cache_all(void *args)
@@ -176,19 +181,20 @@ static void sh7705_flush_cache_page(void *args)
  * Not entirely sure why this is necessary on SH3 with 32K cache but
  * without it we get occasional "Memory fault" when loading a program.
  */
-static void sh7705_flush_icache_page(void *page)
+static void sh7705_flush_icache_folio(void *arg)
 {
-	__flush_purge_region(page_address(page), PAGE_SIZE);
+	struct folio *folio = arg;
+	__flush_purge_region(folio_address(folio), folio_size(folio));
 }
 
 void __init sh7705_cache_init(void)
 {
 	local_flush_icache_range	= sh7705_flush_icache_range;
-	local_flush_dcache_page		= sh7705_flush_dcache_page;
+	local_flush_dcache_folio	= sh7705_flush_dcache_folio;
 	local_flush_cache_all		= sh7705_flush_cache_all;
 	local_flush_cache_mm		= sh7705_flush_cache_all;
 	local_flush_cache_dup_mm	= sh7705_flush_cache_all;
 	local_flush_cache_range		= sh7705_flush_cache_all;
 	local_flush_cache_page		= sh7705_flush_cache_page;
-	local_flush_icache_page		= sh7705_flush_icache_page;
+	local_flush_icache_folio	= sh7705_flush_icache_folio;
 }
diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c
index 3aef78ceb820..9bcaa5619eab 100644
--- a/arch/sh/mm/cache.c
+++ b/arch/sh/mm/cache.c
@@ -20,9 +20,9 @@ void (*local_flush_cache_mm)(void *args) = cache_noop;
 void (*local_flush_cache_dup_mm)(void *args) = cache_noop;
 void (*local_flush_cache_page)(void *args) = cache_noop;
 void (*local_flush_cache_range)(void *args) = cache_noop;
-void (*local_flush_dcache_page)(void *args) = cache_noop;
+void (*local_flush_dcache_folio)(void *args) = cache_noop;
 void (*local_flush_icache_range)(void *args) = cache_noop;
-void (*local_flush_icache_page)(void *args) = cache_noop;
+void (*local_flush_icache_folio)(void *args) = cache_noop;
 void (*local_flush_cache_sigtramp)(void *args) = cache_noop;
 
 void (*__flush_wback_region)(void *start, int size);
@@ -61,15 +61,17 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 		       unsigned long vaddr, void *dst, const void *src,
 		       unsigned long len)
 {
-	if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
-	    test_bit(PG_dcache_clean, &page->flags)) {
+	struct folio *folio = page_folio(page);
+
+	if (boot_cpu_data.dcache.n_aliases && folio_mapped(folio) &&
+	    test_bit(PG_dcache_clean, &folio->flags)) {
 		void *vto = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
 		memcpy(vto, src, len);
 		kunmap_coherent(vto);
 	} else {
 		memcpy(dst, src, len);
 		if (boot_cpu_data.dcache.n_aliases)
-			clear_bit(PG_dcache_clean, &page->flags);
+			clear_bit(PG_dcache_clean, &folio->flags);
 	}
 
 	if (vma->vm_flags & VM_EXEC)
@@ -80,27 +82,30 @@ void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
 			 unsigned long vaddr, void *dst, const void *src,
 			 unsigned long len)
 {
+	struct folio *folio = page_folio(page);
+
 	if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
-	    test_bit(PG_dcache_clean, &page->flags)) {
+	    test_bit(PG_dcache_clean, &folio->flags)) {
 		void *vfrom = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
 		memcpy(dst, vfrom, len);
 		kunmap_coherent(vfrom);
 	} else {
 		memcpy(dst, src, len);
 		if (boot_cpu_data.dcache.n_aliases)
-			clear_bit(PG_dcache_clean, &page->flags);
+			clear_bit(PG_dcache_clean, &folio->flags);
 	}
 }
 
 void copy_user_highpage(struct page *to, struct page *from,
 			unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	void *vfrom, *vto;
 
 	vto = kmap_atomic(to);
 
-	if (boot_cpu_data.dcache.n_aliases && page_mapcount(from) &&
-	    test_bit(PG_dcache_clean, &from->flags)) {
+	if (boot_cpu_data.dcache.n_aliases && folio_mapped(src) &&
+	    test_bit(PG_dcache_clean, &src->flags)) {
 		vfrom = kmap_coherent(from, vaddr);
 		copy_page(vto, vfrom);
 		kunmap_coherent(vfrom);
@@ -136,27 +141,28 @@ EXPORT_SYMBOL(clear_user_highpage);
 void __update_cache(struct vm_area_struct *vma,
 		    unsigned long address, pte_t pte)
 {
-	struct page *page;
 	unsigned long pfn = pte_pfn(pte);
 
 	if (!boot_cpu_data.dcache.n_aliases)
 		return;
 
-	page = pfn_to_page(pfn);
 	if (pfn_valid(pfn)) {
-		int dirty = !test_and_set_bit(PG_dcache_clean, &page->flags);
+		struct folio *folio = page_folio(pfn_to_page(pfn));
+		int dirty = !test_and_set_bit(PG_dcache_clean, &folio->flags);
 		if (dirty)
-			__flush_purge_region(page_address(page), PAGE_SIZE);
+			__flush_purge_region(folio_address(folio),
+						folio_size(folio));
 	}
 }
 
 void __flush_anon_page(struct page *page, unsigned long vmaddr)
 {
+	struct folio *folio = page_folio(page);
 	unsigned long addr = (unsigned long) page_address(page);
 
 	if (pages_do_alias(addr, vmaddr)) {
-		if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
-		    test_bit(PG_dcache_clean, &page->flags)) {
+		if (boot_cpu_data.dcache.n_aliases && folio_mapped(folio) &&
+		    test_bit(PG_dcache_clean, &folio->flags)) {
 			void *kaddr;
 
 			kaddr = kmap_coherent(page, vmaddr);
@@ -164,7 +170,8 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)
 			/* __flush_purge_region((void *)kaddr, PAGE_SIZE); */
 			kunmap_coherent(kaddr);
 		} else
-			__flush_purge_region((void *)addr, PAGE_SIZE);
+			__flush_purge_region(folio_address(folio),
+						folio_size(folio));
 	}
 }
 
@@ -215,11 +222,11 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
 }
 EXPORT_SYMBOL(flush_cache_range);
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
-	cacheop_on_each_cpu(local_flush_dcache_page, page, 1);
+	cacheop_on_each_cpu(local_flush_dcache_folio, folio, 1);
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
 
 void flush_icache_range(unsigned long start, unsigned long end)
 {
@@ -233,10 +240,11 @@ void flush_icache_range(unsigned long start, unsigned long end)
 }
 EXPORT_SYMBOL(flush_icache_range);
 
-void flush_icache_page(struct vm_area_struct *vma, struct page *page)
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr)
 {
-	/* Nothing uses the VMA, so just pass the struct page along */
-	cacheop_on_each_cpu(local_flush_icache_page, page, 1);
+	/* Nothing uses the VMA, so just pass the folio along */
+	cacheop_on_each_cpu(local_flush_icache_folio, page_folio(page), 1);
 }
 
 void flush_cache_sigtramp(unsigned long address)
diff --git a/arch/sh/mm/kmap.c b/arch/sh/mm/kmap.c
index 73fd7cc99430..fa50e8f6e7a9 100644
--- a/arch/sh/mm/kmap.c
+++ b/arch/sh/mm/kmap.c
@@ -27,10 +27,11 @@ void __init kmap_coherent_init(void)
 
 void *kmap_coherent(struct page *page, unsigned long addr)
 {
+	struct folio *folio = page_folio(page);
 	enum fixed_addresses idx;
 	unsigned long vaddr;
 
-	BUG_ON(!test_bit(PG_dcache_clean, &page->flags));
+	BUG_ON(!test_bit(PG_dcache_clean, &folio->flags));
 
 	preempt_disable();
 	pagefault_disable();
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 24/36] sparc32: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (22 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 23/36] superh: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:11   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 25/36] sparc64: " Matthew Wilcox (Oracle)
                   ` (11 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, David S. Miller, sparclinux

Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().
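
PFN_PTE_SHIFT tells generic code how far the PFN is shifted within a
PTE; srmmu stores the physical address shifted right by 4 bits, hence
PAGE_SHIFT - 4.  A sketch of how the generic set_ptes() fallback uses
it to step from one page's PTE to the next:

	/* Advance a PTE by one page (sketch of the generic helper). */
	pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));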

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
---
 arch/sparc/include/asm/cacheflush_32.h |  9 +++++++--
 arch/sparc/include/asm/pgtable_32.h    |  8 ++++----
 arch/sparc/mm/init_32.c                | 13 +++++++++++--
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/sparc/include/asm/cacheflush_32.h b/arch/sparc/include/asm/cacheflush_32.h
index adb6991d0455..8dba35d63328 100644
--- a/arch/sparc/include/asm/cacheflush_32.h
+++ b/arch/sparc/include/asm/cacheflush_32.h
@@ -16,6 +16,7 @@
 	sparc32_cachetlb_ops->cache_page(vma, addr)
 #define flush_icache_range(start, end)		do { } while (0)
 #define flush_icache_page(vma, pg)		do { } while (0)
+#define flush_icache_pages(vma, pg, nr)		do { } while (0)
 
 #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
 	do {							\
@@ -35,11 +36,15 @@
 #define flush_page_for_dma(addr) \
 	sparc32_cachetlb_ops->page_for_dma(addr)
 
-struct page;
 void sparc_flush_page_to_ram(struct page *page);
+void sparc_flush_folio_to_ram(struct folio *folio);
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-#define flush_dcache_page(page)			sparc_flush_page_to_ram(page)
+#define flush_dcache_folio(folio)		sparc_flush_folio_to_ram(folio)
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 #define flush_dcache_mmap_lock(mapping)		do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)	do { } while (0)
 
diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index d4330e3c57a6..7514611d14d3 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -101,8 +101,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	srmmu_swap((unsigned long *)ptep, pte_val(pteval));
 }
 
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
-
 static inline int srmmu_device_memory(unsigned long x)
 {
 	return ((x & 0xF0000000) != 0);
@@ -256,6 +254,7 @@ static inline pte_t pte_mkyoung(pte_t pte)
 	return __pte(pte_val(pte) | SRMMU_REF);
 }
 
+#define PFN_PTE_SHIFT			(PAGE_SHIFT - 4)
 #define pfn_pte(pfn, prot)		mk_pte(pfn_to_page(pfn), prot)
 
 static inline unsigned long pte_pfn(pte_t pte)
@@ -268,7 +267,7 @@ static inline unsigned long pte_pfn(pte_t pte)
 		 */
 		return ~0UL;
 	}
-	return (pte_val(pte) & SRMMU_PTE_PMASK) >> (PAGE_SHIFT-4);
+	return (pte_val(pte) & SRMMU_PTE_PMASK) >> PFN_PTE_SHIFT;
 }
 
 #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
@@ -318,6 +317,7 @@ void mmu_info(struct seq_file *m);
 #define FAULT_CODE_USER     0x4
 
 #define update_mmu_cache(vma, address, ptep) do { } while (0)
+#define update_mmu_cache_range(vma, address, ptep, nr) do { } while (0)
 
 void srmmu_mapiorange(unsigned int bus, unsigned long xpa,
                       unsigned long xva, unsigned int len);
@@ -422,7 +422,7 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
 ({									  \
 	int __changed = !pte_same(*(__ptep), __entry);			  \
 	if (__changed) {						  \
-		set_pte_at((__vma)->vm_mm, (__address), __ptep, __entry); \
+		set_pte(__ptep, __entry);				  \
 		flush_tlb_page(__vma, __address);			  \
 	}								  \
 	__changed;							  \
diff --git a/arch/sparc/mm/init_32.c b/arch/sparc/mm/init_32.c
index 9c0ea457bdf0..d96a14ffceeb 100644
--- a/arch/sparc/mm/init_32.c
+++ b/arch/sparc/mm/init_32.c
@@ -297,11 +297,20 @@ void sparc_flush_page_to_ram(struct page *page)
 {
 	unsigned long vaddr = (unsigned long)page_address(page);
 
-	if (vaddr)
-		__flush_page_to_ram(vaddr);
+	__flush_page_to_ram(vaddr);
 }
 EXPORT_SYMBOL(sparc_flush_page_to_ram);
 
+void sparc_flush_folio_to_ram(struct folio *folio)
+{
+	unsigned long vaddr = (unsigned long)folio_address(folio);
+	unsigned int i, nr = folio_nr_pages(folio);
+
+	for (i = 0; i < nr; i++)
+		__flush_page_to_ram(vaddr + i * PAGE_SIZE);
+}
+EXPORT_SYMBOL(sparc_flush_folio_to_ram);
+
 static const pgprot_t protection_map[16] = {
 	[VM_NONE]					= PAGE_NONE,
 	[VM_READ]					= PAGE_READONLY,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 25/36] sparc64: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (23 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 24/36] sparc32: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:11   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 26/36] um: " Matthew Wilcox (Oracle)
                   ` (10 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, David S. Miller, sparclinux

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().  Convert the PG_dcache_dirty flag from being
per-page to per-folio.
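
All of the dirty-tracking state lives in the flags word, so the
conversion only swaps page->flags for folio->flags.  A sketch of the
encoding that dcache_dirty_cpu() reads back:

	/* Sketch: PG_dcache_dirty plus the owning CPU, starting at
	 * bit 32 (PG_dcache_cpu_shift), packed into folio->flags. */
	unsigned long cpu = (folio->flags >> PG_dcache_cpu_shift) &
				PG_dcache_cpu_mask;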

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
---
 arch/sparc/include/asm/cacheflush_64.h | 18 ++++--
 arch/sparc/include/asm/pgtable_64.h    | 24 ++++++--
 arch/sparc/kernel/smp_64.c             | 56 +++++++++++-------
 arch/sparc/mm/init_64.c                | 78 +++++++++++++++-----------
 arch/sparc/mm/tlb.c                    |  5 +-
 5 files changed, 116 insertions(+), 65 deletions(-)

diff --git a/arch/sparc/include/asm/cacheflush_64.h b/arch/sparc/include/asm/cacheflush_64.h
index b9341836597e..a9a719f04d06 100644
--- a/arch/sparc/include/asm/cacheflush_64.h
+++ b/arch/sparc/include/asm/cacheflush_64.h
@@ -35,20 +35,26 @@ void flush_icache_range(unsigned long start, unsigned long end);
 void __flush_icache_page(unsigned long);
 
 void __flush_dcache_page(void *addr, int flush_icache);
-void flush_dcache_page_impl(struct page *page);
+void flush_dcache_folio_impl(struct folio *folio);
 #ifdef CONFIG_SMP
-void smp_flush_dcache_page_impl(struct page *page, int cpu);
-void flush_dcache_page_all(struct mm_struct *mm, struct page *page);
+void smp_flush_dcache_folio_impl(struct folio *folio, int cpu);
+void flush_dcache_folio_all(struct mm_struct *mm, struct folio *folio);
 #else
-#define smp_flush_dcache_page_impl(page,cpu) flush_dcache_page_impl(page)
-#define flush_dcache_page_all(mm,page) flush_dcache_page_impl(page)
+#define smp_flush_dcache_folio_impl(folio, cpu) flush_dcache_folio_impl(folio)
+#define flush_dcache_folio_all(mm, folio) flush_dcache_folio_impl(folio)
 #endif
 
 void __flush_dcache_range(unsigned long start, unsigned long end);
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 
 #define flush_icache_page(vma, pg)	do { } while(0)
+#define flush_icache_pages(vma, pg, nr)	do { } while(0)
 
 void flush_ptrace_access(struct vm_area_struct *, struct page *,
 			 unsigned long uaddr, void *kaddr,
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 2dc8d4641734..49c37000e1b1 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -911,8 +911,19 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 	maybe_tlb_batch_add(mm, addr, ptep, orig, fullmm, PAGE_SHIFT);
 }
 
-#define set_pte_at(mm,addr,ptep,pte)	\
-	__set_pte_at((mm), (addr), (ptep), (pte), 0)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		__set_pte_at(mm, addr, ptep, pte, 0);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+		addr += PAGE_SIZE;
+	}
+}
+#define set_ptes set_ptes
 
 #define pte_clear(mm,addr,ptep)		\
 	set_pte_at((mm), (addr), (ptep), __pte(0UL))
@@ -931,8 +942,8 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 									\
 		if (pfn_valid(this_pfn) &&				\
 		    (((old_addr) ^ (new_addr)) & (1 << 13)))		\
-			flush_dcache_page_all(current->mm,		\
-					      pfn_to_page(this_pfn));	\
+			flush_dcache_folio_all(current->mm,		\
+				page_folio(pfn_to_page(this_pfn)));	\
 	}								\
 	newpte;								\
 })
@@ -947,7 +958,10 @@ struct seq_file;
 void mmu_info(struct seq_file *);
 
 struct vm_area_struct;
-void update_mmu_cache(struct vm_area_struct *, unsigned long, pte_t *);
+void update_mmu_cache_range(struct vm_area_struct *, unsigned long addr,
+		pte_t *ptep, unsigned int nr);
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
 			  pmd_t *pmd);
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index a55295d1b924..90ef8677ac89 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -921,20 +921,26 @@ extern unsigned long xcall_flush_dcache_page_cheetah;
 #endif
 extern unsigned long xcall_flush_dcache_page_spitfire;
 
-static inline void __local_flush_dcache_page(struct page *page)
+static inline void __local_flush_dcache_folio(struct folio *folio)
 {
+	unsigned int i, nr = folio_nr_pages(folio);
+
 #ifdef DCACHE_ALIASING_POSSIBLE
-	__flush_dcache_page(page_address(page),
+	for (i = 0; i < nr; i++)
+		__flush_dcache_page(folio_address(folio) + i * PAGE_SIZE,
 			    ((tlb_type == spitfire) &&
-			     page_mapping_file(page) != NULL));
+			     folio_flush_mapping(folio) != NULL));
 #else
-	if (page_mapping_file(page) != NULL &&
-	    tlb_type == spitfire)
-		__flush_icache_page(__pa(page_address(page)));
+	if (folio_flush_mapping(folio) != NULL &&
+	    tlb_type == spitfire) {
+		unsigned long pfn = folio_pfn(folio);
+		for (i = 0; i < nr; i++)
+			__flush_icache_page((pfn + i) * PAGE_SIZE);
+	}
 #endif
 }
 
-void smp_flush_dcache_page_impl(struct page *page, int cpu)
+void smp_flush_dcache_folio_impl(struct folio *folio, int cpu)
 {
 	int this_cpu;
 
@@ -948,14 +954,14 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
 	this_cpu = get_cpu();
 
 	if (cpu == this_cpu) {
-		__local_flush_dcache_page(page);
+		__local_flush_dcache_folio(folio);
 	} else if (cpu_online(cpu)) {
-		void *pg_addr = page_address(page);
+		void *pg_addr = folio_address(folio);
 		u64 data0 = 0;
 
 		if (tlb_type == spitfire) {
 			data0 = ((u64)&xcall_flush_dcache_page_spitfire);
-			if (page_mapping_file(page) != NULL)
+			if (folio_flush_mapping(folio) != NULL)
 				data0 |= ((u64)1 << 32);
 		} else if (tlb_type == cheetah || tlb_type == cheetah_plus) {
 #ifdef DCACHE_ALIASING_POSSIBLE
@@ -963,18 +969,23 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
 #endif
 		}
 		if (data0) {
-			xcall_deliver(data0, __pa(pg_addr),
-				      (u64) pg_addr, cpumask_of(cpu));
+			unsigned int i, nr = folio_nr_pages(folio);
+
+			for (i = 0; i < nr; i++) {
+				xcall_deliver(data0, __pa(pg_addr),
+					      (u64) pg_addr, cpumask_of(cpu));
 #ifdef CONFIG_DEBUG_DCFLUSH
-			atomic_inc(&dcpage_flushes_xcall);
+				atomic_inc(&dcpage_flushes_xcall);
 #endif
+				pg_addr += PAGE_SIZE;
+			}
 		}
 	}
 
 	put_cpu();
 }
 
-void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
+void flush_dcache_folio_all(struct mm_struct *mm, struct folio *folio)
 {
 	void *pg_addr;
 	u64 data0;
@@ -988,10 +999,10 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
 	atomic_inc(&dcpage_flushes);
 #endif
 	data0 = 0;
-	pg_addr = page_address(page);
+	pg_addr = folio_address(folio);
 	if (tlb_type == spitfire) {
 		data0 = ((u64)&xcall_flush_dcache_page_spitfire);
-		if (page_mapping_file(page) != NULL)
+		if (folio_flush_mapping(folio) != NULL)
 			data0 |= ((u64)1 << 32);
 	} else if (tlb_type == cheetah || tlb_type == cheetah_plus) {
 #ifdef DCACHE_ALIASING_POSSIBLE
@@ -999,13 +1010,18 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
 #endif
 	}
 	if (data0) {
-		xcall_deliver(data0, __pa(pg_addr),
-			      (u64) pg_addr, cpu_online_mask);
+		unsigned int i, nr = folio_nr_pages(folio);
+
+		for (i = 0; i < nr; i++) {
+			xcall_deliver(data0, __pa(pg_addr),
+				      (u64) pg_addr, cpu_online_mask);
 #ifdef CONFIG_DEBUG_DCFLUSH
-		atomic_inc(&dcpage_flushes_xcall);
+			atomic_inc(&dcpage_flushes_xcall);
 #endif
+			pg_addr += PAGE_SIZE;
+		}
 	}
-	__local_flush_dcache_page(page);
+	__local_flush_dcache_folio(folio);
 
 	preempt_enable();
 }
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 04f9db0c3111..ab9aacbaf43c 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -195,21 +195,26 @@ atomic_t dcpage_flushes_xcall = ATOMIC_INIT(0);
 #endif
 #endif
 
-inline void flush_dcache_page_impl(struct page *page)
+inline void flush_dcache_folio_impl(struct folio *folio)
 {
+	unsigned int i, nr = folio_nr_pages(folio);
+
 	BUG_ON(tlb_type == hypervisor);
 #ifdef CONFIG_DEBUG_DCFLUSH
 	atomic_inc(&dcpage_flushes);
 #endif
 
 #ifdef DCACHE_ALIASING_POSSIBLE
-	__flush_dcache_page(page_address(page),
-			    ((tlb_type == spitfire) &&
-			     page_mapping_file(page) != NULL));
+	for (i = 0; i < nr; i++)
+		__flush_dcache_page(folio_address(folio) + i * PAGE_SIZE,
+				    ((tlb_type == spitfire) &&
+				     folio_flush_mapping(folio) != NULL));
 #else
-	if (page_mapping_file(page) != NULL &&
-	    tlb_type == spitfire)
-		__flush_icache_page(__pa(page_address(page)));
+	if (folio_flush_mapping(folio) != NULL &&
+	    tlb_type == spitfire) {
+		for (i = 0; i < nr; i++)
+			__flush_icache_page((folio_pfn(folio) + i) * PAGE_SIZE);
+	}
 #endif
 }
 
@@ -218,10 +223,10 @@ inline void flush_dcache_page_impl(struct page *page)
 #define PG_dcache_cpu_mask	\
 	((1UL<<ilog2(roundup_pow_of_two(NR_CPUS)))-1UL)
 
-#define dcache_dirty_cpu(page) \
-	(((page)->flags >> PG_dcache_cpu_shift) & PG_dcache_cpu_mask)
+#define dcache_dirty_cpu(folio) \
+	(((folio)->flags >> PG_dcache_cpu_shift) & PG_dcache_cpu_mask)
 
-static inline void set_dcache_dirty(struct page *page, int this_cpu)
+static inline void set_dcache_dirty(struct folio *folio, int this_cpu)
 {
 	unsigned long mask = this_cpu;
 	unsigned long non_cpu_bits;
@@ -238,11 +243,11 @@ static inline void set_dcache_dirty(struct page *page, int this_cpu)
 			     "bne,pn	%%xcc, 1b\n\t"
 			     " nop"
 			     : /* no outputs */
-			     : "r" (mask), "r" (non_cpu_bits), "r" (&page->flags)
+			     : "r" (mask), "r" (non_cpu_bits), "r" (&folio->flags)
 			     : "g1", "g7");
 }
 
-static inline void clear_dcache_dirty_cpu(struct page *page, unsigned long cpu)
+static inline void clear_dcache_dirty_cpu(struct folio *folio, unsigned long cpu)
 {
 	unsigned long mask = (1UL << PG_dcache_dirty);
 
@@ -260,7 +265,7 @@ static inline void clear_dcache_dirty_cpu(struct page *page, unsigned long cpu)
 			     " nop\n"
 			     "2:"
 			     : /* no outputs */
-			     : "r" (cpu), "r" (mask), "r" (&page->flags),
+			     : "r" (cpu), "r" (mask), "r" (&folio->flags),
 			       "i" (PG_dcache_cpu_mask),
 			       "i" (PG_dcache_cpu_shift)
 			     : "g1", "g7");
@@ -284,9 +289,10 @@ static void flush_dcache(unsigned long pfn)
 
 	page = pfn_to_page(pfn);
 	if (page) {
+		struct folio *folio = page_folio(page);
 		unsigned long pg_flags;
 
-		pg_flags = page->flags;
+		pg_flags = folio->flags;
 		if (pg_flags & (1UL << PG_dcache_dirty)) {
 			int cpu = ((pg_flags >> PG_dcache_cpu_shift) &
 				   PG_dcache_cpu_mask);
@@ -296,11 +302,11 @@ static void flush_dcache(unsigned long pfn)
 			 * in the SMP case.
 			 */
 			if (cpu == this_cpu)
-				flush_dcache_page_impl(page);
+				flush_dcache_folio_impl(folio);
 			else
-				smp_flush_dcache_page_impl(page, cpu);
+				smp_flush_dcache_folio_impl(folio, cpu);
 
-			clear_dcache_dirty_cpu(page, cpu);
+			clear_dcache_dirty_cpu(folio, cpu);
 
 			put_cpu();
 		}
@@ -388,12 +394,14 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 }
 #endif	/* CONFIG_HUGETLB_PAGE */
 
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *ptep, unsigned int nr)
 {
 	struct mm_struct *mm;
 	unsigned long flags;
 	bool is_huge_tsb;
 	pte_t pte = *ptep;
+	unsigned int i;
 
 	if (tlb_type != hypervisor) {
 		unsigned long pfn = pte_pfn(pte);
@@ -440,15 +448,21 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
 		}
 	}
 #endif
-	if (!is_huge_tsb)
-		__update_mmu_tsb_insert(mm, MM_TSB_BASE, PAGE_SHIFT,
-					address, pte_val(pte));
+	if (!is_huge_tsb) {
+		for (i = 0; i < nr; i++) {
+			__update_mmu_tsb_insert(mm, MM_TSB_BASE, PAGE_SHIFT,
+						address, pte_val(pte));
+			address += PAGE_SIZE;
+			pte_val(pte) += PAGE_SIZE;
+		}
+	}
 
 	spin_unlock_irqrestore(&mm->context.lock, flags);
 }
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
+	unsigned long pfn = folio_pfn(folio);
 	struct address_space *mapping;
 	int this_cpu;
 
@@ -459,35 +473,35 @@ void flush_dcache_page(struct page *page)
 	 * is merely the zero page.  The 'bigcore' testcase in GDB
 	 * causes this case to run millions of times.
 	 */
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(pfn))
 		return;
 
 	this_cpu = get_cpu();
 
-	mapping = page_mapping_file(page);
+	mapping = folio_flush_mapping(folio);
 	if (mapping && !mapping_mapped(mapping)) {
-		int dirty = test_bit(PG_dcache_dirty, &page->flags);
+		bool dirty = test_bit(PG_dcache_dirty, &folio->flags);
 		if (dirty) {
-			int dirty_cpu = dcache_dirty_cpu(page);
+			int dirty_cpu = dcache_dirty_cpu(folio);
 
 			if (dirty_cpu == this_cpu)
 				goto out;
-			smp_flush_dcache_page_impl(page, dirty_cpu);
+			smp_flush_dcache_folio_impl(folio, dirty_cpu);
 		}
-		set_dcache_dirty(page, this_cpu);
+		set_dcache_dirty(folio, this_cpu);
 	} else {
 		/* We could delay the flush for the !page_mapping
 		 * case too.  But that case is for exec env/arg
 		 * pages and those are %99 certainly going to get
 		 * faulted into the tlb (and thus flushed) anyways.
 		 */
-		flush_dcache_page_impl(page);
+		flush_dcache_folio_impl(folio);
 	}
 
 out:
 	put_cpu();
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
 
 void __kprobes flush_icache_range(unsigned long start, unsigned long end)
 {
@@ -2280,10 +2294,10 @@ void __init paging_init(void)
 	setup_page_offset();
 
 	/* These build time checkes make sure that the dcache_dirty_cpu()
-	 * page->flags usage will work.
+	 * folio->flags usage will work.
 	 *
 	 * When a page gets marked as dcache-dirty, we store the
-	 * cpu number starting at bit 32 in the page->flags.  Also,
+	 * cpu number starting at bit 32 in the folio->flags.  Also,
 	 * functions like clear_dcache_dirty_cpu use the cpu mask
 	 * in 13-bit signed-immediate instruction fields.
 	 */
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 9a725547578e..3fa6a070912d 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -118,6 +118,7 @@ void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
 		unsigned long paddr, pfn = pte_pfn(orig);
 		struct address_space *mapping;
 		struct page *page;
+		struct folio *folio;
 
 		if (!pfn_valid(pfn))
 			goto no_cache_flush;
@@ -127,13 +128,13 @@ void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
 			goto no_cache_flush;
 
 		/* A real file page? */
-		mapping = page_mapping_file(page);
+		folio = page_folio(page);
+		mapping = folio_flush_mapping(folio);
 		if (!mapping)
 			goto no_cache_flush;
 
 		paddr = (unsigned long) page_address(page);
 		if ((paddr ^ vaddr) & (1 << 13))
-			flush_dcache_page_all(mm, page);
+			flush_dcache_folio_all(mm, folio);
 	}
 
 no_cache_flush:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 26/36] um: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (24 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 25/36] sparc64: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:12   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 27/36] x86: " Matthew Wilcox (Oracle)
                   ` (9 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Richard Weinberger, Anton Ivanov,
	Johannes Berg, linux-um

Add PFN_PTE_SHIFT and update_mmu_cache_range().
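
um's set_pte() already performs the pte_mknewprot() handling that
set_pte_at() used to inherit, so nothing is lost by deleting the
wrapper; with PFN_PTE_SHIFT defined, um picks up the generic set_ptes()
fallback, which is roughly:

	for (;;) {
		set_pte(ptep, pte);
		if (--nr == 0)
			break;
		ptep++;
		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
	}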

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: linux-um@lists.infradead.org
---
 arch/um/include/asm/pgtable.h | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index a70d1618eb35..ea5f8122f128 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -242,11 +242,7 @@ static inline void set_pte(pte_t *pteptr, pte_t pteval)
 	if(pte_present(*pteptr)) *pteptr = pte_mknewprot(*pteptr);
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *pteptr, pte_t pteval)
-{
-	set_pte(pteptr, pteval);
-}
+#define PFN_PTE_SHIFT		PAGE_SHIFT
 
 #define __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
@@ -290,6 +286,7 @@ struct mm_struct;
 extern pte_t *virt_to_pte(struct mm_struct *mm, unsigned long addr);
 
 #define update_mmu_cache(vma,address,ptep) do {} while (0)
+#define update_mmu_cache_range(vma, address, ptep, nr) do {} while (0)
 
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 27/36] x86: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (25 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 26/36] um: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:12   ` Mike Rapoport
  2023-03-15 10:34   ` Peter Zijlstra
  2023-03-15  5:14 ` [PATCH v4 28/36] xtensa: " Matthew Wilcox (Oracle)
                   ` (8 subsequent siblings)
  35 siblings, 2 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin

Add PFN_PTE_SHIFT and a noop update_mmu_cache_range().
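
x86 keeps page table walks coherent in hardware, so the new hook can be
empty; a sketch of what mapping a range now reduces to:

	set_ptes(mm, addr, ptep, pte, nr);		/* generic fallback */
	update_mmu_cache_range(vma, addr, ptep, nr);	/* no-op on x86 */

The page_table_check_ptes_set() call that the removed x86 set_pte_at()
made now happens once, for the whole range, in the generic set_ptes().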

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/pgtable.h | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1031025730d0..b237878061c4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -184,6 +184,8 @@ static inline int pte_special(pte_t pte)
 
 static inline u64 protnone_mask(u64 val);
 
+#define PFN_PTE_SHIFT	PAGE_SHIFT
+
 static inline unsigned long pte_pfn(pte_t pte)
 {
 	phys_addr_t pfn = pte_val(pte);
@@ -1019,13 +1021,6 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
 	return res;
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pte)
-{
-	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
-	set_pte(ptep, pte);
-}
-
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 			      pmd_t *pmdp, pmd_t pmd)
 {
@@ -1291,6 +1286,10 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 		unsigned long addr, pte_t *ptep)
 {
 }
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
+{
+}
 static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
 		unsigned long addr, pmd_t *pmd)
 {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 28/36] xtensa: Implement the new page table range API
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (26 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 27/36] x86: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 10:12   ` Mike Rapoport
  2023-03-15  5:14 ` [PATCH v4 29/36] mm: Remove page_mapping_file() Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-mm, linux-kernel, Max Filippov, linux-xtensa

Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().
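
Where the D-cache can alias, each page of the folio is flushed through
a temporary alias window; the per-page pattern repeated in the loops
below is roughly:

	virt = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
	__flush_invalidate_dcache_page_alias(virt, phys);
	phys += PAGE_SIZE;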

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: linux-xtensa@linux-xtensa.org
---
 arch/xtensa/include/asm/cacheflush.h |  9 ++-
 arch/xtensa/include/asm/pgtable.h    | 17 +++---
 arch/xtensa/mm/cache.c               | 83 ++++++++++++++++------------
 3 files changed, 62 insertions(+), 47 deletions(-)

diff --git a/arch/xtensa/include/asm/cacheflush.h b/arch/xtensa/include/asm/cacheflush.h
index 7b4359312c25..35153f6725e4 100644
--- a/arch/xtensa/include/asm/cacheflush.h
+++ b/arch/xtensa/include/asm/cacheflush.h
@@ -119,8 +119,14 @@ void flush_cache_page(struct vm_area_struct*,
 #define flush_cache_vmap(start,end)	flush_cache_all()
 #define flush_cache_vunmap(start,end)	flush_cache_all()
 
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
+
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *);
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 
 void local_flush_cache_range(struct vm_area_struct *vma,
 		unsigned long start, unsigned long end);
@@ -156,6 +162,7 @@ void local_flush_cache_page(struct vm_area_struct *vma,
 
 /* This is not required, see Documentation/core-api/cachetlb.rst */
 #define	flush_icache_page(vma,page)			do { } while (0)
+#define	flush_icache_pages(vma, page, nr)		do { } while (0)
 
 #define flush_dcache_mmap_lock(mapping)			do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)		do { } while (0)
diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h
index fc7a14884c6c..80bc70251aad 100644
--- a/arch/xtensa/include/asm/pgtable.h
+++ b/arch/xtensa/include/asm/pgtable.h
@@ -274,6 +274,7 @@ static inline pte_t pte_mkwrite(pte_t pte)
  * and a page entry and page directory to the page they refer to.
  */
 
+#define PFN_PTE_SHIFT		PAGE_SHIFT
 #define pte_pfn(pte)		(pte_val(pte) >> PAGE_SHIFT)
 #define pte_same(a,b)		(pte_val(a) == pte_val(b))
 #define pte_page(x)		pfn_to_page(pte_pfn(x))
@@ -301,15 +302,9 @@ static inline void update_pte(pte_t *ptep, pte_t pteval)
 
 struct mm_struct;
 
-static inline void
-set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pteval)
-{
-	update_pte(ptep, pteval);
-}
-
-static inline void set_pte(pte_t *ptep, pte_t pteval)
+static inline void set_pte(pte_t *ptep, pte_t pte)
 {
-	update_pte(ptep, pteval);
+	update_pte(ptep, pte);
 }
 
 static inline void
@@ -407,8 +402,10 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 
 #else
 
-extern  void update_mmu_cache(struct vm_area_struct * vma,
-			      unsigned long address, pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr);
+#define update_mmu_cache(vma, address, ptep) \
+	update_mmu_cache_range(vma, address, ptep, 1)
 
 typedef pte_t *pte_addr_t;
 
diff --git a/arch/xtensa/mm/cache.c b/arch/xtensa/mm/cache.c
index 19e5a478a7e8..27bd798e4d89 100644
--- a/arch/xtensa/mm/cache.c
+++ b/arch/xtensa/mm/cache.c
@@ -121,9 +121,9 @@ EXPORT_SYMBOL(copy_user_highpage);
  *
  */
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
-	struct address_space *mapping = page_mapping_file(page);
+	struct address_space *mapping = folio_flush_mapping(folio);
 
 	/*
 	 * If we have a mapping but the page is not mapped to user-space
@@ -132,14 +132,14 @@ void flush_dcache_page(struct page *page)
 	 */
 
 	if (mapping && !mapping_mapped(mapping)) {
-		if (!test_bit(PG_arch_1, &page->flags))
-			set_bit(PG_arch_1, &page->flags);
+		if (!test_bit(PG_arch_1, &folio->flags))
+			set_bit(PG_arch_1, &folio->flags);
 		return;
 
 	} else {
-
-		unsigned long phys = page_to_phys(page);
-		unsigned long temp = page->index << PAGE_SHIFT;
+		unsigned long phys = folio_pfn(folio) * PAGE_SIZE;
+		unsigned long temp = folio_pos(folio);
+		unsigned int i, nr = folio_nr_pages(folio);
 		unsigned long alias = !(DCACHE_ALIAS_EQ(temp, phys));
 		unsigned long virt;
 
@@ -154,22 +154,26 @@ void flush_dcache_page(struct page *page)
 			return;
 
 		preempt_disable();
-		virt = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
-		__flush_invalidate_dcache_page_alias(virt, phys);
+		for (i = 0; i < nr; i++) {
+			virt = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
+			__flush_invalidate_dcache_page_alias(virt, phys);
 
-		virt = TLBTEMP_BASE_1 + (temp & DCACHE_ALIAS_MASK);
+			virt = TLBTEMP_BASE_1 + (temp & DCACHE_ALIAS_MASK);
 
-		if (alias)
-			__flush_invalidate_dcache_page_alias(virt, phys);
+			if (alias)
+				__flush_invalidate_dcache_page_alias(virt, phys);
 
-		if (mapping)
-			__invalidate_icache_page_alias(virt, phys);
+			if (mapping)
+				__invalidate_icache_page_alias(virt, phys);
+			phys += PAGE_SIZE;
+			temp += PAGE_SIZE;
+		}
 		preempt_enable();
 	}
 
 	/* There shouldn't be an entry in the cache for this page anymore. */
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
 
 /*
  * For now, flush the whole cache. FIXME??
@@ -207,45 +211,52 @@ EXPORT_SYMBOL(local_flush_cache_page);
 
 #endif /* DCACHE_WAY_SIZE > PAGE_SIZE */
 
-void
-update_mmu_cache(struct vm_area_struct * vma, unsigned long addr, pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, unsigned int nr)
 {
 	unsigned long pfn = pte_pfn(*ptep);
-	struct page *page;
+	struct folio *folio;
+	unsigned int i;
 
 	if (!pfn_valid(pfn))
 		return;
 
-	page = pfn_to_page(pfn);
+	folio = page_folio(pfn_to_page(pfn));
 
-	/* Invalidate old entry in TLBs */
-
-	flush_tlb_page(vma, addr);
+	/* Invalidate old entries in TLBs */
+	for (i = 0; i < nr; i++)
+		flush_tlb_page(vma, addr + i * PAGE_SIZE);
+	nr = folio_nr_pages(folio);
 
 #if (DCACHE_WAY_SIZE > PAGE_SIZE)
 
-	if (!PageReserved(page) && test_bit(PG_arch_1, &page->flags)) {
-		unsigned long phys = page_to_phys(page);
+	if (!folio_test_reserved(folio) && test_bit(PG_arch_1, &folio->flags)) {
+		unsigned long phys = folio_pfn(folio) * PAGE_SIZE;
 		unsigned long tmp;
 
 		preempt_disable();
-		tmp = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
-		__flush_invalidate_dcache_page_alias(tmp, phys);
-		tmp = TLBTEMP_BASE_1 + (addr & DCACHE_ALIAS_MASK);
-		__flush_invalidate_dcache_page_alias(tmp, phys);
-		__invalidate_icache_page_alias(tmp, phys);
+		for (i = 0; i < nr; i++) {
+			tmp = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
+			__flush_invalidate_dcache_page_alias(tmp, phys);
+			tmp = TLBTEMP_BASE_1 + (addr & DCACHE_ALIAS_MASK);
+			__flush_invalidate_dcache_page_alias(tmp, phys);
+			__invalidate_icache_page_alias(tmp, phys);
+			phys += PAGE_SIZE;
+		}
 		preempt_enable();
 
-		clear_bit(PG_arch_1, &page->flags);
+		clear_bit(PG_arch_1, &folio->flags);
 	}
 #else
-	if (!PageReserved(page) && !test_bit(PG_arch_1, &page->flags)
+	if (!folio_test_reserved(folio) && !test_bit(PG_arch_1, &folio->flags)
 	    && (vma->vm_flags & VM_EXEC) != 0) {
-		unsigned long paddr = (unsigned long)kmap_atomic(page);
-		__flush_dcache_page(paddr);
-		__invalidate_icache_page(paddr);
-		set_bit(PG_arch_1, &page->flags);
-		kunmap_atomic((void *)paddr);
+		for (i = 0; i < nr; i++) {
+			void *paddr = kmap_local_folio(folio, i * PAGE_SIZE);
+			__flush_dcache_page((unsigned long)paddr);
+			__invalidate_icache_page((unsigned long)paddr);
+			kunmap_local(paddr);
+		}
+		set_bit(PG_arch_1, &folio->flags);
 	}
 #endif
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 29/36] mm: Remove page_mapping_file()
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (27 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 28/36] xtensa: " Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-05-25  3:50   ` Anshuman Khandual
  2023-05-25  5:37   ` Anshuman Khandual
  2023-03-15  5:14 ` [PATCH v4 30/36] mm: Rationalise flush_icache_pages() and flush_icache_page() Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  35 siblings, 2 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

This function has no more users.
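
Earlier patches in this series converted each caller to the folio
equivalent, typically:

	-	mapping = page_mapping_file(page);
	+	mapping = folio_flush_mapping(page_folio(page));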

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e56c2023aa0e..a87113055b9c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -394,14 +394,6 @@ static inline struct address_space *page_file_mapping(struct page *page)
 	return folio_file_mapping(page_folio(page));
 }
 
-/*
- * For file cache pages, return the address_space, otherwise return NULL
- */
-static inline struct address_space *page_mapping_file(struct page *page)
-{
-	return folio_flush_mapping(page_folio(page));
-}
-
 /**
  * folio_inode - Get the host inode for this folio.
  * @folio: The folio.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 30/36] mm: Rationalise flush_icache_pages() and flush_icache_page()
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (28 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 29/36] mm: Remove page_mapping_file() Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  5:14 ` [PATCH v4 31/36] mm: Tidy up set_ptes definition Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  35 siblings, 0 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

Move the default (no-op) implementation of flush_icache_pages()
to <linux/cacheflush.h> from <asm-generic/cacheflush.h>.
Move the flush_icache_page() wrapper out of each architecture and into
<linux/cacheflush.h>.
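
The consolidated form in <linux/cacheflush.h> is, in sketch (the exact
guards are in the diff; an architecture that provides its own
flush_icache_pages() defines the name to suppress the fallback):

	#ifndef flush_icache_pages
	static inline void flush_icache_pages(struct vm_area_struct *vma,
			struct page *page, unsigned int nr)
	{
	}
	#endif

	#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)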

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/alpha/include/asm/cacheflush.h     |  5 +----
 arch/arc/include/asm/cacheflush.h       |  9 ---------
 arch/arm/include/asm/cacheflush.h       |  7 -------
 arch/csky/abiv1/inc/abi/cacheflush.h    |  1 -
 arch/csky/abiv2/inc/abi/cacheflush.h    |  1 -
 arch/hexagon/include/asm/cacheflush.h   |  2 +-
 arch/loongarch/include/asm/cacheflush.h |  2 --
 arch/m68k/include/asm/cacheflush_mm.h   |  1 -
 arch/mips/include/asm/cacheflush.h      |  6 ------
 arch/nios2/include/asm/cacheflush.h     |  2 +-
 arch/nios2/mm/cacheflush.c              |  1 +
 arch/parisc/include/asm/cacheflush.h    |  2 +-
 arch/sh/include/asm/cacheflush.h        |  2 +-
 arch/sparc/include/asm/cacheflush_32.h  |  2 --
 arch/sparc/include/asm/cacheflush_64.h  |  3 ---
 arch/xtensa/include/asm/cacheflush.h    |  4 ----
 include/asm-generic/cacheflush.h        | 12 ------------
 include/linux/cacheflush.h              |  9 +++++++++
 18 files changed, 15 insertions(+), 56 deletions(-)

diff --git a/arch/alpha/include/asm/cacheflush.h b/arch/alpha/include/asm/cacheflush.h
index 3956460e69e2..36a7e924c3b9 100644
--- a/arch/alpha/include/asm/cacheflush.h
+++ b/arch/alpha/include/asm/cacheflush.h
@@ -53,10 +53,6 @@ extern void flush_icache_user_page(struct vm_area_struct *vma,
 #define flush_icache_user_page flush_icache_user_page
 #endif /* CONFIG_SMP */
 
-/* This is used only in __do_fault and do_swap_page.  */
-#define flush_icache_page(vma, page) \
-	flush_icache_user_page((vma), (page), 0, 0)
-
 /*
  * Both implementations of flush_icache_user_page flush the entire
  * address space, so one call, no matter how many pages.
@@ -66,6 +62,7 @@ static inline void flush_icache_pages(struct vm_area_struct *vma,
 {
 	flush_icache_user_page(vma, page, 0, 0);
 }
+#define flush_icache_pages flush_icache_pages
 
 #include <asm-generic/cacheflush.h>
 
diff --git a/arch/arc/include/asm/cacheflush.h b/arch/arc/include/asm/cacheflush.h
index 04f65f588510..bd5b1a9a0544 100644
--- a/arch/arc/include/asm/cacheflush.h
+++ b/arch/arc/include/asm/cacheflush.h
@@ -18,15 +18,6 @@
 #include <linux/mm.h>
 #include <asm/shmparam.h>
 
-/*
- * Semantically we need this because icache doesn't snoop dcache/dma.
- * However ARC Cache flush requires paddr as well as vaddr, latter not available
- * in the flush_icache_page() API. So we no-op it but do the equivalent work
- * in update_mmu_cache()
- */
-#define flush_icache_page(vma, page)
-#define flush_icache_pages(vma, page, nr)
-
 void flush_cache_all(void);
 
 void flush_icache_range(unsigned long kstart, unsigned long kend);
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 841e268d2374..f6181f69577f 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -321,13 +321,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
 #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
 
-/*
- * We don't appear to need to do anything here.  In fact, if we did, we'd
- * duplicate cache flushing elsewhere performed by flush_dcache_page().
- */
-#define flush_icache_page(vma,page)	do { } while (0)
-#define flush_icache_pages(vma, page, nr)	do { } while (0)
-
 /*
  * flush_cache_vmap() is used when creating mappings (eg, via vmap,
  * vmalloc, ioremap etc) in kernel space for pages.  On non-VIPT
diff --git a/arch/csky/abiv1/inc/abi/cacheflush.h b/arch/csky/abiv1/inc/abi/cacheflush.h
index 0d6cb65624c4..908d8b0bc4fd 100644
--- a/arch/csky/abiv1/inc/abi/cacheflush.h
+++ b/arch/csky/abiv1/inc/abi/cacheflush.h
@@ -45,7 +45,6 @@ extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, u
 #define flush_cache_vmap(start, end)		cache_wbinv_all()
 #define flush_cache_vunmap(start, end)		cache_wbinv_all()
 
-#define flush_icache_page(vma, page)		do {} while (0);
 #define flush_icache_range(start, end)		cache_wbinv_range(start, end)
 #define flush_icache_mm_range(mm, start, end)	cache_wbinv_range(start, end)
 #define flush_icache_deferred(mm)		do {} while (0);
diff --git a/arch/csky/abiv2/inc/abi/cacheflush.h b/arch/csky/abiv2/inc/abi/cacheflush.h
index 9c728933a776..40be16907267 100644
--- a/arch/csky/abiv2/inc/abi/cacheflush.h
+++ b/arch/csky/abiv2/inc/abi/cacheflush.h
@@ -33,7 +33,6 @@ static inline void flush_dcache_page(struct page *page)
 
 #define flush_dcache_mmap_lock(mapping)		do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)	do { } while (0)
-#define flush_icache_page(vma, page)		do { } while (0)
 
 #define flush_icache_range(start, end)		cache_wbinv_range(start, end)
 
diff --git a/arch/hexagon/include/asm/cacheflush.h b/arch/hexagon/include/asm/cacheflush.h
index 63ca314ede89..bdacf72d97e1 100644
--- a/arch/hexagon/include/asm/cacheflush.h
+++ b/arch/hexagon/include/asm/cacheflush.h
@@ -18,7 +18,7 @@
  *  - flush_cache_range(vma, start, end) flushes a range of pages
  *  - flush_icache_range(start, end) flush a range of instructions
  *  - flush_dcache_page(pg) flushes(wback&invalidates) a page for dcache
- *  - flush_icache_page(vma, pg) flushes(invalidates) a page for icache
+ *  - flush_icache_pages(vma, pg, nr) flushes(invalidates) nr pages for icache
  *
  *  Need to doublecheck which one is really needed for ptrace stuff to work.
  */
diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
index 7907eb42bfbd..326ac6f1b27c 100644
--- a/arch/loongarch/include/asm/cacheflush.h
+++ b/arch/loongarch/include/asm/cacheflush.h
@@ -46,8 +46,6 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
 #define flush_cache_page(vma, vmaddr, pfn)		do { } while (0)
 #define flush_cache_vmap(start, end)			do { } while (0)
 #define flush_cache_vunmap(start, end)			do { } while (0)
-#define flush_icache_page(vma, page)			do { } while (0)
-#define flush_icache_pages(vma, page)			do { } while (0)
 #define flush_icache_user_page(vma, page, addr, len)	do { } while (0)
 #define flush_dcache_page(page)				do { } while (0)
 #define flush_dcache_folio(folio)			do { } while (0)
diff --git a/arch/m68k/include/asm/cacheflush_mm.h b/arch/m68k/include/asm/cacheflush_mm.h
index 88eb85e81ef6..ed12358c4783 100644
--- a/arch/m68k/include/asm/cacheflush_mm.h
+++ b/arch/m68k/include/asm/cacheflush_mm.h
@@ -261,7 +261,6 @@ static inline void __flush_pages_to_ram(void *vaddr, unsigned int nr)
 #define flush_dcache_mmap_unlock(mapping)	do { } while (0)
 #define flush_icache_pages(vma, page, nr)	\
 	__flush_pages_to_ram(page_address(page), nr)
-#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
 
 extern void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
 				    unsigned long addr, int len);
diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
index 2683cade42ef..043e50effc62 100644
--- a/arch/mips/include/asm/cacheflush.h
+++ b/arch/mips/include/asm/cacheflush.h
@@ -82,12 +82,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
 		__flush_anon_page(page, vmaddr);
 }
 
-static inline void flush_icache_pages(struct vm_area_struct *vma,
-		struct page *page, unsigned int nr)
-{
-}
-#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
-
 extern void (*flush_icache_range)(unsigned long start, unsigned long end);
 extern void (*local_flush_icache_range)(unsigned long start, unsigned long end);
 extern void (*__flush_icache_user_range)(unsigned long start,
diff --git a/arch/nios2/include/asm/cacheflush.h b/arch/nios2/include/asm/cacheflush.h
index 8624ca83cffe..7c48c5213fb7 100644
--- a/arch/nios2/include/asm/cacheflush.h
+++ b/arch/nios2/include/asm/cacheflush.h
@@ -35,7 +35,7 @@ void flush_dcache_folio(struct folio *folio);
 extern void flush_icache_range(unsigned long start, unsigned long end);
 void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
 		unsigned int nr);
-#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1);
+#define flush_icache_pages flush_icache_pages
 
 #define flush_cache_vmap(start, end)		flush_dcache_range(start, end)
 #define flush_cache_vunmap(start, end)		flush_dcache_range(start, end)
diff --git a/arch/nios2/mm/cacheflush.c b/arch/nios2/mm/cacheflush.c
index 471485a84b2c..2565767b98a3 100644
--- a/arch/nios2/mm/cacheflush.c
+++ b/arch/nios2/mm/cacheflush.c
@@ -147,6 +147,7 @@ void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
 	__flush_dcache(start, end);
 	__flush_icache(start, end);
 }
+#define flush_icache_pages flush_icache_pages
 
 void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
 			unsigned long pfn)
diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 2cdc0ea562d6..cd0bfbd244db 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -56,7 +56,7 @@ static inline void flush_dcache_page(struct page *page)
 
 void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
 		unsigned int nr);
-#define flush_icache_page(vma, page)	flush_icache_pages(vma, page, 1)
+#define flush_icache_pages flush_icache_pages
 
 #define flush_icache_range(s,e)		do { 		\
 	flush_kernel_dcache_range_asm(s,e); 		\
diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
index 9fceef6f3e00..878b6b551bd2 100644
--- a/arch/sh/include/asm/cacheflush.h
+++ b/arch/sh/include/asm/cacheflush.h
@@ -53,7 +53,7 @@ extern void flush_icache_range(unsigned long start, unsigned long end);
 #define flush_icache_user_range flush_icache_range
 void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
 		unsigned int nr);
-#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
+#define flush_icache_pages flush_icache_pages
 extern void flush_cache_sigtramp(unsigned long address);
 
 struct flusher_data {
diff --git a/arch/sparc/include/asm/cacheflush_32.h b/arch/sparc/include/asm/cacheflush_32.h
index 8dba35d63328..21f6c918238b 100644
--- a/arch/sparc/include/asm/cacheflush_32.h
+++ b/arch/sparc/include/asm/cacheflush_32.h
@@ -15,8 +15,6 @@
 #define flush_cache_page(vma,addr,pfn) \
 	sparc32_cachetlb_ops->cache_page(vma, addr)
 #define flush_icache_range(start, end)		do { } while (0)
-#define flush_icache_page(vma, pg)		do { } while (0)
-#define flush_icache_pages(vma, pg, nr)		do { } while (0)
 
 #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
 	do {							\
diff --git a/arch/sparc/include/asm/cacheflush_64.h b/arch/sparc/include/asm/cacheflush_64.h
index a9a719f04d06..0e879004efff 100644
--- a/arch/sparc/include/asm/cacheflush_64.h
+++ b/arch/sparc/include/asm/cacheflush_64.h
@@ -53,9 +53,6 @@ static inline void flush_dcache_page(struct page *page)
 	flush_dcache_folio(page_folio(page));
 }
 
-#define flush_icache_page(vma, pg)	do { } while(0)
-#define flush_icache_pages(vma, pg, nr)	do { } while(0)
-
 void flush_ptrace_access(struct vm_area_struct *, struct page *,
 			 unsigned long uaddr, void *kaddr,
 			 unsigned long len, int write);
diff --git a/arch/xtensa/include/asm/cacheflush.h b/arch/xtensa/include/asm/cacheflush.h
index 35153f6725e4..785a00ce83c1 100644
--- a/arch/xtensa/include/asm/cacheflush.h
+++ b/arch/xtensa/include/asm/cacheflush.h
@@ -160,10 +160,6 @@ void local_flush_cache_page(struct vm_area_struct *vma,
 		__invalidate_icache_range(start,(end) - (start));	\
 	} while (0)
 
-/* This is not required, see Documentation/core-api/cachetlb.rst */
-#define	flush_icache_page(vma,page)			do { } while (0)
-#define	flush_icache_pages(vma, page, nr)		do { } while (0)
-
 #define flush_dcache_mmap_lock(mapping)			do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)		do { } while (0)
 
diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 09d51a680765..84ec53ccc450 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -77,18 +77,6 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
 #define flush_icache_user_range flush_icache_range
 #endif
 
-#ifndef flush_icache_page
-static inline void flush_icache_pages(struct vm_area_struct *vma,
-				     struct page *page, unsigned int nr)
-{
-}
-
-static inline void flush_icache_page(struct vm_area_struct *vma,
-				     struct page *page)
-{
-}
-#endif
-
 #ifndef flush_icache_user_page
 static inline void flush_icache_user_page(struct vm_area_struct *vma,
 					   struct page *page,
diff --git a/include/linux/cacheflush.h b/include/linux/cacheflush.h
index 82136f3fcf54..55f297b2c23f 100644
--- a/include/linux/cacheflush.h
+++ b/include/linux/cacheflush.h
@@ -17,4 +17,13 @@ static inline void flush_dcache_folio(struct folio *folio)
 #define flush_dcache_folio flush_dcache_folio
 #endif /* ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE */
 
+#ifndef flush_icache_pages
+static inline void flush_icache_pages(struct vm_area_struct *vma,
+				     struct page *page, unsigned int nr)
+{
+}
+#endif
+
+#define flush_icache_page(vma, page)	flush_icache_pages(vma, page, 1)
+
 #endif /* _LINUX_CACHEFLUSH_H */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 31/36] mm: Tidy up set_ptes definition
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (29 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 30/36] mm: Rationalise flush_icache_pages() and flush_icache_page() Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-05-25  6:20   ` Anshuman Khandual
  2023-03-15  5:14 ` [PATCH v4 32/36] mm: Use flush_icache_pages() in do_set_pmd() Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

Now that all architectures are converted, we can remove the
PFN_PTE_SHIFT ifdef and define set_pte_at() unconditionally.
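
The surviving generic definition then reads roughly as follows (a
sketch; the kernel-doc comment is elided):

	#ifndef set_ptes
	static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
			pte_t *ptep, pte_t pte, unsigned int nr)
	{
		for (;;) {
			set_pte(ptep, pte);
			if (--nr == 0)
				break;
			ptep++;
			pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
		}
	}
	#endif
	#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)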

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pgtable.h | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index a755fe94b4b4..a54b9197f2f2 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -173,7 +173,6 @@ static inline int pmd_young(pmd_t pmd)
 #endif
 
 #ifndef set_ptes
-#ifdef PFN_PTE_SHIFT
 /**
  * set_ptes - Map consecutive pages to a contiguous range of addresses.
  * @mm: Address space to map the pages into.
@@ -201,13 +200,8 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
 	}
 }
-#ifndef set_pte_at
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-#endif
 #endif
-#else
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-#endif
 
 #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 32/36] mm: Use flush_icache_pages() in do_set_pmd()
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (30 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 31/36] mm: Tidy up set_ptes definition Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-05-25  6:31   ` Anshuman Khandual
  2023-03-15  5:14 ` [PATCH v4 33/36] filemap: Add filemap_map_folio_range() Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Matthew Wilcox (Oracle), linux-mm, linux-kernel

Push the iteration over each page down to the architectures (many
can flush the entire THP without iteration).
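
As a sketch of why this helps, m68k (patch 14 in this series) turns
the ranged call into a single flush covering all nr pages, and the
architectures whose implementation is a no-op avoid the loop entirely:

	/* arch/m68k/include/asm/cacheflush_mm.h */
	#define flush_icache_pages(vma, page, nr)	\
		__flush_pages_to_ram(page_address(page), nr)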

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index c5f1bf906d0c..6aa21e8f3753 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4209,7 +4209,6 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	pmd_t entry;
-	int i;
 	vm_fault_t ret = VM_FAULT_FALLBACK;
 
 	if (!transhuge_vma_suitable(vma, haddr))
@@ -4242,8 +4241,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	if (unlikely(!pmd_none(*vmf->pmd)))
 		goto out;
 
-	for (i = 0; i < HPAGE_PMD_NR; i++)
-		flush_icache_page(vma, page + i);
+	flush_icache_pages(vma, page, HPAGE_PMD_NR);
 
 	entry = mk_huge_pmd(page, vma->vm_page_prot);
 	if (write)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 33/36] filemap: Add filemap_map_folio_range()
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (31 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 32/36] mm: Use flush_icache_pages() in do_set_pmd() Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15  5:14 ` [PATCH v4 34/36] rmap: add folio_add_file_rmap_range() Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  35 siblings, 0 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Yin Fengwei, linux-mm, linux-kernel, Matthew Wilcox

From: Yin Fengwei <fengwei.yin@intel.com>

filemap_map_folio_range() maps a partial or full folio.  Compared to
the original filemap_map_pages(), it updates the folio's refcount once
per folio instead of once per page, a minor performance improvement
for large folios.

With a will-it-scale.page_fault3-like app (file write fault testing
changed to read fault testing; we are trying to upstream it to
will-it-scale at [1]), we got a 2% performance gain on a 48C/96T
Cascade Lake test box with 96 processes running against xfs.

[1]: https://github.com/antonblanchard/will-it-scale/pull/37
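
As a usage sketch (names as in the hunks below): `start' is a page
offset within the folio, so the caller maps the pages
[folio_page(folio, start), folio_page(folio, start) + nr_pages) at
userspace address addr, with vmf->pte pointing at the PTE for addr
and the PTE lock held:

	ret |= filemap_map_folio_range(vmf, folio,
			xas.xa_index - folio->index, addr, nr_pages);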

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 98 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 54 insertions(+), 44 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index a34abfe8c654..6e2b0778db45 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2199,16 +2199,6 @@ unsigned filemap_get_folios(struct address_space *mapping, pgoff_t *start,
 }
 EXPORT_SYMBOL(filemap_get_folios);
 
-static inline
-bool folio_more_pages(struct folio *folio, pgoff_t index, pgoff_t max)
-{
-	if (!folio_test_large(folio) || folio_test_hugetlb(folio))
-		return false;
-	if (index >= max)
-		return false;
-	return index < folio->index + folio_nr_pages(folio) - 1;
-}
-
 /**
  * filemap_get_folios_contig - Get a batch of contiguous folios
  * @mapping:	The address_space to search
@@ -3480,6 +3470,53 @@ static inline struct folio *next_map_page(struct address_space *mapping,
 				  mapping, xas, end_pgoff);
 }
 
+/*
+ * Map page range [start_page, start_page + nr_pages) of folio.
+ * start_page is derived from start via folio_page(folio, start).
+ */
+static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
+			struct folio *folio, unsigned long start,
+			unsigned long addr, unsigned int nr_pages)
+{
+	vm_fault_t ret = 0;
+	struct vm_area_struct *vma = vmf->vma;
+	struct file *file = vma->vm_file;
+	struct page *page = folio_page(folio, start);
+	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
+	unsigned int ref_count = 0, count = 0;
+
+	do {
+		if (PageHWPoison(page))
+			continue;
+
+		if (mmap_miss > 0)
+			mmap_miss--;
+
+		/*
+		 * NOTE: If there're PTE markers, we'll leave them to be
+		 * handled in the specific fault path, and it'll prohibit the
+		 * fault-around logic.
+		 */
+		if (!pte_none(*vmf->pte))
+			continue;
+
+		if (vmf->address == addr)
+			ret = VM_FAULT_NOPAGE;
+
+		ref_count++;
+		do_set_pte(vmf, page, addr);
+		update_mmu_cache(vma, addr, vmf->pte);
+	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
+
+	/* Restore the vmf->pte */
+	vmf->pte -= nr_pages;
+
+	folio_ref_add(folio, ref_count);
+	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
+
+	return ret;
+}
+
 vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 			     pgoff_t start_pgoff, pgoff_t end_pgoff)
 {
@@ -3490,9 +3527,9 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	unsigned long addr;
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
 	struct folio *folio;
-	struct page *page;
 	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
 	vm_fault_t ret = 0;
+	int nr_pages = 0;
 
 	rcu_read_lock();
 	folio = first_map_page(mapping, &xas, end_pgoff);
@@ -3507,45 +3544,18 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
 	do {
-again:
-		page = folio_file_page(folio, xas.xa_index);
-		if (PageHWPoison(page))
-			goto unlock;
-
-		if (mmap_miss > 0)
-			mmap_miss--;
+		unsigned long end;
 
 		addr += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
 		vmf->pte += xas.xa_index - last_pgoff;
 		last_pgoff = xas.xa_index;
+		end = folio->index + folio_nr_pages(folio) - 1;
+		nr_pages = min(end, end_pgoff) - xas.xa_index + 1;
 
-		/*
-		 * NOTE: If there're PTE markers, we'll leave them to be
-		 * handled in the specific fault path, and it'll prohibit the
-		 * fault-around logic.
-		 */
-		if (!pte_none(*vmf->pte))
-			goto unlock;
+		ret |= filemap_map_folio_range(vmf, folio,
+				xas.xa_index - folio->index, addr, nr_pages);
+		xas.xa_index += nr_pages;
 
-		/* We're about to handle the fault */
-		if (vmf->address == addr)
-			ret = VM_FAULT_NOPAGE;
-
-		do_set_pte(vmf, page, addr);
-		/* no need to invalidate: a not-present page won't be cached */
-		update_mmu_cache(vma, addr, vmf->pte);
-		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
-			xas.xa_index++;
-			folio_ref_inc(folio);
-			goto again;
-		}
-		folio_unlock(folio);
-		continue;
-unlock:
-		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
-			xas.xa_index++;
-			goto again;
-		}
 		folio_unlock(folio);
 		folio_put(folio);
 	} while ((folio = next_map_page(mapping, &xas, end_pgoff)) != NULL);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 34/36] rmap: add folio_add_file_rmap_range()
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (32 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 33/36] filemap: Add filemap_map_folio_range() Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 13:34   ` Ryan Roberts
  2023-03-15  5:14 ` [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range() Matthew Wilcox (Oracle)
  2023-03-15  5:14 ` [PATCH v4 36/36] filemap: Batch PTE mappings Matthew Wilcox (Oracle)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Yin Fengwei, linux-mm, linux-kernel, Matthew Wilcox

From: Yin Fengwei <fengwei.yin@intel.com>

folio_add_file_rmap_range() allows adding a pte mapping to a specific
range of a file folio.  Compared to page_add_file_rmap(), it batches
the __lruvec_stat updates for large folios.
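
As a usage sketch (mirroring the hunks below), the old per-page call
becomes a single call covering nr consecutive pages of the folio:

	/* before */
	page_add_file_rmap(page, vma, false);

	/* after: page is the first of nr pages, all within folio */
	folio_add_file_rmap_range(folio, page, nr, vma, false);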

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/rmap.h |  2 ++
 mm/rmap.c            | 60 +++++++++++++++++++++++++++++++++-----------
 2 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b87d01660412..a3825ce81102 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
 		unsigned long address);
 void page_add_file_rmap(struct page *, struct vm_area_struct *,
 		bool compound);
+void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
+		struct vm_area_struct *, bool compound);
 void page_remove_rmap(struct page *, struct vm_area_struct *,
 		bool compound);
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 4898e10c569a..a91906b28835 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1301,31 +1301,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 }
 
 /**
- * page_add_file_rmap - add pte mapping to a file page
- * @page:	the page to add the mapping to
+ * folio_add_file_rmap_range - add pte mapping to page range of a folio
+ * @folio:	The folio to add the mapping to
+ * @page:	The first page to add
+ * @nr_pages:	The number of pages which will be mapped
  * @vma:	the vm area in which the mapping is added
  * @compound:	charge the page as compound or small page
  *
+ * The page range of folio is defined by [first_page, first_page + nr_pages)
+ *
  * The caller needs to hold the pte lock.
  */
-void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
-		bool compound)
+void folio_add_file_rmap_range(struct folio *folio, struct page *page,
+			unsigned int nr_pages, struct vm_area_struct *vma,
+			bool compound)
 {
-	struct folio *folio = page_folio(page);
 	atomic_t *mapped = &folio->_nr_pages_mapped;
-	int nr = 0, nr_pmdmapped = 0;
-	bool first;
+	unsigned int nr_pmdmapped = 0, first;
+	int nr = 0;
 
-	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
+	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
 
 	/* Is page being mapped by PTE? Is this its first map to be added? */
 	if (likely(!compound)) {
-		first = atomic_inc_and_test(&page->_mapcount);
-		nr = first;
-		if (first && folio_test_large(folio)) {
-			nr = atomic_inc_return_relaxed(mapped);
-			nr = (nr < COMPOUND_MAPPED);
-		}
+		do {
+			first = atomic_inc_and_test(&page->_mapcount);
+			if (first && folio_test_large(folio)) {
+				first = atomic_inc_return_relaxed(mapped);
+				first = (first < COMPOUND_MAPPED);
+			}
+
+			if (first)
+				nr++;
+		} while (page++, --nr_pages > 0);
 	} else if (folio_test_pmd_mappable(folio)) {
 		/* That test is redundant: it's for safety or to optimize out */
 
@@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
 	mlock_vma_folio(folio, vma, compound);
 }
 
+/**
+ * page_add_file_rmap - add pte mapping to a file page
+ * @page:	the page to add the mapping to
+ * @vma:	the vm area in which the mapping is added
+ * @compound:	charge the page as compound or small page
+ *
+ * The caller needs to hold the pte lock.
+ */
+void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
+		bool compound)
+{
+	struct folio *folio = page_folio(page);
+	unsigned int nr_pages;
+
+	VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
+
+	if (likely(!compound))
+		nr_pages = 1;
+	else
+		nr_pages = folio_nr_pages(folio);
+
+	folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
+}
+
 /**
  * page_remove_rmap - take down pte mapping from a page
  * @page:	page to remove mapping from
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (33 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 34/36] rmap: add folio_add_file_rmap_range() Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  2023-03-15 15:26   ` Ryan Roberts
  2023-03-15  5:14 ` [PATCH v4 36/36] filemap: Batch PTE mappings Matthew Wilcox (Oracle)
  35 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Yin Fengwei, linux-mm, linux-kernel, Matthew Wilcox

From: Yin Fengwei <fengwei.yin@intel.com>

set_pte_range() allows setting up page table entries for a specific
range.  It takes advantage of the batched rmap update for large folios
and now takes care of calling update_mmu_cache_range().
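
A sketch of the conversion at a call site (taken from the filemap hunk
below; batching with nr > 1 arrives in the next patch):

	/* before: one page at a time, MMU cache updated separately */
	do_set_pte(vmf, page, addr);
	update_mmu_cache(vma, addr, vmf->pte);

	/* after: set_pte_range() calls update_mmu_cache_range() itself */
	set_pte_range(vmf, folio, page, 1, addr);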

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/filesystems/locking.rst |  2 +-
 include/linux/mm.h                    |  3 ++-
 mm/filemap.c                          |  3 +--
 mm/memory.c                           | 27 +++++++++++++++------------
 4 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 7de7a7272a5e..922886fefb7f 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -663,7 +663,7 @@ locked. The VM will unlock the page.
 Filesystem should find and map pages associated with offsets from "start_pgoff"
 till "end_pgoff". ->map_pages() is called with page table locked and must
 not block.  If it's not possible to reach a page without blocking,
-filesystem should skip it. Filesystem should use do_set_pte() to setup
+filesystem should skip it. Filesystem should use set_pte_range() to setup
 page table entry. Pointer to entry associated with the page is passed in
 "pte" field in vm_fault structure. Pointers to entries for other offsets
 should be calculated relative to "pte".
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ee755bb4e1c1..81788c985a8c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1299,7 +1299,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 }
 
 vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
-void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
+void set_pte_range(struct vm_fault *vmf, struct folio *folio,
+		struct page *page, unsigned int nr, unsigned long addr);
 
 vm_fault_t finish_fault(struct vm_fault *vmf);
 vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
diff --git a/mm/filemap.c b/mm/filemap.c
index 6e2b0778db45..e2317623dcbf 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3504,8 +3504,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 			ret = VM_FAULT_NOPAGE;
 
 		ref_count++;
-		do_set_pte(vmf, page, addr);
-		update_mmu_cache(vma, addr, vmf->pte);
+		set_pte_range(vmf, folio, page, 1, addr);
 	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
 
 	/* Restore the vmf->pte */
diff --git a/mm/memory.c b/mm/memory.c
index 6aa21e8f3753..9a654802f104 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4274,7 +4274,8 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 }
 #endif
 
-void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
+void set_pte_range(struct vm_fault *vmf, struct folio *folio,
+		struct page *page, unsigned int nr, unsigned long addr)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	bool uffd_wp = vmf_orig_pte_uffd_wp(vmf);
@@ -4282,7 +4283,7 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 	bool prefault = vmf->address != addr;
 	pte_t entry;
 
-	flush_icache_page(vma, page);
+	flush_icache_pages(vma, page, nr);
 	entry = mk_pte(page, vma->vm_page_prot);
 
 	if (prefault && arch_wants_old_prefaulted_pte())
@@ -4296,14 +4297,18 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 		entry = pte_mkuffd_wp(entry);
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
-		inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
-		page_add_new_anon_rmap(page, vma, addr);
-		lru_cache_add_inactive_or_unevictable(page, vma);
+		add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr);
+		VM_BUG_ON_FOLIO(nr != 1, folio);
+		folio_add_new_anon_rmap(folio, vma, addr);
+		folio_add_lru_vma(folio, vma);
 	} else {
-		inc_mm_counter(vma->vm_mm, mm_counter_file(page));
-		page_add_file_rmap(page, vma, false);
+		add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
+		folio_add_file_rmap_range(folio, page, nr, vma, false);
 	}
-	set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
+	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
+
+	/* no need to invalidate: a not-present page won't be cached */
+	update_mmu_cache_range(vma, addr, vmf->pte, nr);
 }
 
 static bool vmf_pte_changed(struct vm_fault *vmf)
@@ -4376,11 +4381,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 
 	/* Re-check under ptl */
 	if (likely(!vmf_pte_changed(vmf))) {
-		do_set_pte(vmf, page, vmf->address);
-
-		/* no need to invalidate: a not-present page won't be cached */
-		update_mmu_cache(vma, vmf->address, vmf->pte);
+		struct folio *folio = page_folio(page);
 
+		set_pte_range(vmf, folio, page, 1, vmf->address);
 		ret = 0;
 	} else {
 		update_mmu_tlb(vma, vmf->address, vmf->pte);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH v4 36/36] filemap: Batch PTE mappings
  2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
                   ` (34 preceding siblings ...)
  2023-03-15  5:14 ` [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range() Matthew Wilcox (Oracle)
@ 2023-03-15  5:14 ` Matthew Wilcox (Oracle)
  35 siblings, 0 replies; 138+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-15  5:14 UTC (permalink / raw)
  To: linux-arch; +Cc: Yin Fengwei, linux-mm, linux-kernel, Matthew Wilcox

From: Yin Fengwei <fengwei.yin@intel.com>

Call set_pte_range() once per contiguous range of the folio instead
of once per page.  This batches the updates to mm counters and the
rmap.

With a will-it-scale.page_fault3-like app (file write fault testing
changed to read fault testing; we are trying to upstream it to
will-it-scale at [1]), we got a 15% performance gain on a 48C/96T
Cascade Lake test box with 96 processes running against xfs.

Perf data collected before/after the change:
  18.73%--page_add_file_rmap
          |
           --11.60%--__mod_lruvec_page_state
                     |
                     |--7.40%--__mod_memcg_lruvec_state
                     |          |
                     |           --5.58%--cgroup_rstat_updated
                     |
                      --2.53%--__mod_lruvec_state
                                |
                                 --1.48%--__mod_node_page_state

  9.93%--page_add_file_rmap_range
         |
          --2.67%--__mod_lruvec_page_state
                    |
                    |--1.95%--__mod_memcg_lruvec_state
                    |          |
                    |           --1.57%--cgroup_rstat_updated
                    |
                     --0.61%--__mod_lruvec_state
                               |
                                --0.54%--__mod_node_page_state

The running time of __mod_lruvec_page_state() is reduced by about 9%.

[1]: https://github.com/antonblanchard/will-it-scale/pull/37
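
An annotated sketch of the control flow introduced below (simplified:
the mmap_miss and VM_FAULT_NOPAGE bookkeeping is elided): accumulate a
run of consecutive mappable pages in count, and emit one
set_pte_range() plus one folio_ref_add() whenever the run is broken or
the range ends:

	do {
		/* HWPoison or an already-populated PTE breaks the run */
		if (PageHWPoison(page + count) ||
		    !pte_none(vmf->pte[count]))
			goto skip;
		count++;
		continue;
	skip:
		if (count) {
			set_pte_range(vmf, folio, page, count, addr);
			folio_ref_add(folio, count);
		}
		/* step over the run plus the skipped page */
		count++;
		page += count;
		vmf->pte += count;
		addr += count * PAGE_SIZE;
		count = 0;
	} while (--nr_pages > 0);

	if (count) {		/* trailing run */
		set_pte_range(vmf, folio, page, count, addr);
		folio_ref_add(folio, count);
	}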

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index e2317623dcbf..7a1534460b55 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3483,11 +3483,12 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 	struct file *file = vma->vm_file;
 	struct page *page = folio_page(folio, start);
 	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
-	unsigned int ref_count = 0, count = 0;
+	unsigned int count = 0;
+	pte_t *old_ptep = vmf->pte;
 
 	do {
-		if (PageHWPoison(page))
-			continue;
+		if (PageHWPoison(page + count))
+			goto skip;
 
 		if (mmap_miss > 0)
 			mmap_miss--;
@@ -3497,20 +3498,33 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 		 * handled in the specific fault path, and it'll prohibit the
 		 * fault-around logic.
 		 */
-		if (!pte_none(*vmf->pte))
-			continue;
+		if (!pte_none(vmf->pte[count]))
+			goto skip;
 
 		if (vmf->address == addr)
 			ret = VM_FAULT_NOPAGE;
 
-		ref_count++;
-		set_pte_range(vmf, folio, page, 1, addr);
-	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
+		count++;
+		continue;
+skip:
+		if (count) {
+			set_pte_range(vmf, folio, page, count, addr);
+			folio_ref_add(folio, count);
+		}
 
-	/* Restore the vmf->pte */
-	vmf->pte -= nr_pages;
+		count++;
+		page += count;
+		vmf->pte += count;
+		addr += count * PAGE_SIZE;
+		count = 0;
+	} while (--nr_pages > 0);
+
+	if (count) {
+		set_pte_range(vmf, folio, page, count, addr);
+		folio_ref_add(folio, count);
+	}
 
-	folio_ref_add(folio, ref_count);
+	vmf->pte = old_ptep;
 	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
 
 	return ret;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 23/36] superh: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 23/36] superh: " Matthew Wilcox (Oracle)
@ 2023-03-15  7:22   ` John Paul Adrian Glaubitz
  2023-03-15  7:36   ` John Paul Adrian Glaubitz
  2023-03-15 10:10   ` Mike Rapoport
  2 siblings, 0 replies; 138+ messages in thread
From: John Paul Adrian Glaubitz @ 2023-03-15  7:22 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch
  Cc: linux-mm, linux-kernel, Yoshinori Sato, Rich Felker, linux-sh

Hi Matthew!

On Wed, 2023-03-15 at 05:14 +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio() and
> flush_icache_pages().  Change the PG_dcache_clean flag from being
> per-page to per-folio.  Flush the entire folio containing the pages in
> flush_icache_pages() for ease of implementation.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Rich Felker <dalias@libc.org>
> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Cc: linux-sh@vger.kernel.org

I'm going to test this patch later today and report back.

In the meantime, could you change "superh" in the subject to "sh"?

Thanks,
Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 23/36] superh: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 23/36] superh: " Matthew Wilcox (Oracle)
  2023-03-15  7:22   ` John Paul Adrian Glaubitz
@ 2023-03-15  7:36   ` John Paul Adrian Glaubitz
  2023-03-15 10:10   ` Mike Rapoport
  2 siblings, 0 replies; 138+ messages in thread
From: John Paul Adrian Glaubitz @ 2023-03-15  7:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch
  Cc: linux-mm, linux-kernel, Yoshinori Sato, Rich Felker, linux-sh

Hi Matthew!

On Wed, 2023-03-15 at 05:14 +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio() and
> flush_icache_pages().  Change the PG_dcache_clean flag from being
> per-page to per-folio.  Flush the entire folio containing the pages in
> flush_icache_pages() for ease of implementation.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Rich Felker <dalias@libc.org>
> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Cc: linux-sh@vger.kernel.org
> ---
>  arch/sh/include/asm/cacheflush.h | 21 ++++++++-----
>  arch/sh/include/asm/pgtable.h    |  6 ++--
>  arch/sh/include/asm/pgtable_32.h |  5 ++-
>  arch/sh/mm/cache-j2.c            |  4 +--
>  arch/sh/mm/cache-sh4.c           | 26 +++++++++++-----
>  arch/sh/mm/cache-sh7705.c        | 26 ++++++++++------
>  arch/sh/mm/cache.c               | 52 ++++++++++++++++++--------------
>  arch/sh/mm/kmap.c                |  3 +-
>  8 files changed, 88 insertions(+), 55 deletions(-)
> 
> diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
> index 481a664287e2..9fceef6f3e00 100644
> --- a/arch/sh/include/asm/cacheflush.h
> +++ b/arch/sh/include/asm/cacheflush.h
> @@ -13,9 +13,9 @@
>   *  - flush_cache_page(mm, vmaddr, pfn) flushes a single page
>   *  - flush_cache_range(vma, start, end) flushes a range of pages
>   *
> - *  - flush_dcache_page(pg) flushes(wback&invalidates) a page for dcache
> + *  - flush_dcache_folio(folio) flushes(wback&invalidates) a folio for dcache
>   *  - flush_icache_range(start, end) flushes(invalidates) a range for icache
> - *  - flush_icache_page(vma, pg) flushes(invalidates) a page for icache
> + *  - flush_icache_pages(vma, pg, nr) flushes(invalidates) pages for icache
>   *  - flush_cache_sigtramp(vaddr) flushes the signal trampoline
>   */
>  extern void (*local_flush_cache_all)(void *args);
> @@ -23,9 +23,9 @@ extern void (*local_flush_cache_mm)(void *args);
>  extern void (*local_flush_cache_dup_mm)(void *args);
>  extern void (*local_flush_cache_page)(void *args);
>  extern void (*local_flush_cache_range)(void *args);
> -extern void (*local_flush_dcache_page)(void *args);
> +extern void (*local_flush_dcache_folio)(void *args);
>  extern void (*local_flush_icache_range)(void *args);
> -extern void (*local_flush_icache_page)(void *args);
> +extern void (*local_flush_icache_folio)(void *args);
>  extern void (*local_flush_cache_sigtramp)(void *args);
>  
>  static inline void cache_noop(void *args) { }
> @@ -42,11 +42,18 @@ extern void flush_cache_page(struct vm_area_struct *vma,
>  extern void flush_cache_range(struct vm_area_struct *vma,
>  				 unsigned long start, unsigned long end);
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -void flush_dcache_page(struct page *page);
> +void flush_dcache_folio(struct folio *folio);
> +#define flush_dcache_folio flush_dcache_folio
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
> +
>  extern void flush_icache_range(unsigned long start, unsigned long end);
>  #define flush_icache_user_range flush_icache_range
> -extern void flush_icache_page(struct vm_area_struct *vma,
> -				 struct page *page);
> +void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
> +		unsigned int nr);
> +#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
>  extern void flush_cache_sigtramp(unsigned long address);
>  
>  struct flusher_data {
> diff --git a/arch/sh/include/asm/pgtable.h b/arch/sh/include/asm/pgtable.h
> index 3ce30becf6df..1a8fdc3bc363 100644
> --- a/arch/sh/include/asm/pgtable.h
> +++ b/arch/sh/include/asm/pgtable.h
> @@ -102,13 +102,15 @@ extern void __update_cache(struct vm_area_struct *vma,
>  extern void __update_tlb(struct vm_area_struct *vma,
>  			 unsigned long address, pte_t pte);
>  
> -static inline void
> -update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr)
>  {
>  	pte_t pte = *ptep;
>  	__update_cache(vma, address, pte);
>  	__update_tlb(vma, address, pte);
>  }
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  
>  extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
>  extern void paging_init(void);
> diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h
> index 21952b094650..676f3d4ef6ce 100644
> --- a/arch/sh/include/asm/pgtable_32.h
> +++ b/arch/sh/include/asm/pgtable_32.h
> @@ -307,14 +307,13 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>  #define set_pte(pteptr, pteval) (*(pteptr) = pteval)
>  #endif
>  
> -#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
> -
>  /*
>   * (pmds are folded into pgds so this doesn't get actually called,
>   * but the define is needed for a generic inline function.)
>   */
>  #define set_pmd(pmdptr, pmdval) (*(pmdptr) = pmdval)
>  
> +#define PFN_PTE_SHIFT	PAGE_SHIFT
>  #define pfn_pte(pfn, prot) \
>  	__pte(((unsigned long long)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
>  #define pfn_pmd(pfn, prot) \
> @@ -323,7 +322,7 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>  #define pte_none(x)		(!pte_val(x))
>  #define pte_present(x)		((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE))
>  
> -#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
> +#define pte_clear(mm, addr, ptep) set_pte(ptep, __pte(0))
>  
>  #define pmd_none(x)	(!pmd_val(x))
>  #define pmd_present(x)	(pmd_val(x))
> diff --git a/arch/sh/mm/cache-j2.c b/arch/sh/mm/cache-j2.c
> index f277862a11f5..9ac960214380 100644
> --- a/arch/sh/mm/cache-j2.c
> +++ b/arch/sh/mm/cache-j2.c
> @@ -55,9 +55,9 @@ void __init j2_cache_init(void)
>  	local_flush_cache_dup_mm = j2_flush_both;
>  	local_flush_cache_page = j2_flush_both;
>  	local_flush_cache_range = j2_flush_both;
> -	local_flush_dcache_page = j2_flush_dcache;
> +	local_flush_dcache_folio = j2_flush_dcache;
>  	local_flush_icache_range = j2_flush_icache;
> -	local_flush_icache_page = j2_flush_icache;
> +	local_flush_icache_folio = j2_flush_icache;
>  	local_flush_cache_sigtramp = j2_flush_icache;
>  
>  	pr_info("Initial J2 CCR is %.8x\n", __raw_readl(j2_ccr_base));
> diff --git a/arch/sh/mm/cache-sh4.c b/arch/sh/mm/cache-sh4.c
> index 72c2e1b46c08..862046f26981 100644
> --- a/arch/sh/mm/cache-sh4.c
> +++ b/arch/sh/mm/cache-sh4.c
> @@ -107,19 +107,29 @@ static inline void flush_cache_one(unsigned long start, unsigned long phys)
>   * Write back & invalidate the D-cache of the page.
>   * (To avoid "alias" issues)
>   */
> -static void sh4_flush_dcache_page(void *arg)
> +static void sh4_flush_dcache_folio(void *arg)
>  {
> -	struct page *page = arg;
> -	unsigned long addr = (unsigned long)page_address(page);
> +	struct folio *folio = arg;
>  #ifndef CONFIG_SMP
> -	struct address_space *mapping = page_mapping_file(page);
> +	struct address_space *mapping = folio_flush_mapping(folio);
>  
>  	if (mapping && !mapping_mapped(mapping))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +		clear_bit(PG_dcache_clean, &folio->flags);
>  	else
>  #endif
> -		flush_cache_one(CACHE_OC_ADDRESS_ARRAY |
> -				(addr & shm_align_mask), page_to_phys(page));
> +	{
> +		unsigned long pfn = folio_pfn(folio);
> +		unsigned long addr = (unsigned long)folio_address(folio);
> +		unsigned int i, nr = folio_nr_pages(folio);
> +
> +		for (i = 0; i < nr; i++) {
> +			flush_cache_one(CACHE_OC_ADDRESS_ARRAY |
> +						(addr & shm_align_mask),
> +					pfn * PAGE_SIZE);
> +			addr += PAGE_SIZE;
> +			pfn++;
> +		}
> +	}
>  
>  	wmb();
>  }
> @@ -379,7 +389,7 @@ void __init sh4_cache_init(void)
>  		__raw_readl(CCN_PRR));
>  
>  	local_flush_icache_range	= sh4_flush_icache_range;
> -	local_flush_dcache_page		= sh4_flush_dcache_page;
> +	local_flush_dcache_folio	= sh4_flush_dcache_folio;
>  	local_flush_cache_all		= sh4_flush_cache_all;
>  	local_flush_cache_mm		= sh4_flush_cache_mm;
>  	local_flush_cache_dup_mm	= sh4_flush_cache_mm;
> diff --git a/arch/sh/mm/cache-sh7705.c b/arch/sh/mm/cache-sh7705.c
> index 9b63a53a5e46..b509a407588f 100644
> --- a/arch/sh/mm/cache-sh7705.c
> +++ b/arch/sh/mm/cache-sh7705.c
> @@ -132,15 +132,20 @@ static void __flush_dcache_page(unsigned long phys)
>   * Write back & invalidate the D-cache of the page.
>   * (To avoid "alias" issues)
>   */
> -static void sh7705_flush_dcache_page(void *arg)
> +static void sh7705_flush_dcache_folio(void *arg)
>  {
> -	struct page *page = arg;
> -	struct address_space *mapping = page_mapping_file(page);
> +	struct folio *folio = arg;
> +	struct address_space *mapping = folio_flush_mapping(folio);
>  
>  	if (mapping && !mapping_mapped(mapping))
> -		clear_bit(PG_dcache_clean, &page->flags);
> -	else
> -		__flush_dcache_page(__pa(page_address(page)));
> +		clear_bit(PG_dcache_clean, &folio->flags);
> +	else {
> +		unsigned long pfn = folio_pfn(folio);
> +		unsigned int i, nr = folio_nr_pages(folio);
> +
> +		for (i = 0; i < nr; i++)
> +			__flush_dcache_page((pfn + i) * PAGE_SIZE);
> +	}
>  }
>  
>  static void sh7705_flush_cache_all(void *args)
> @@ -176,19 +181,20 @@ static void sh7705_flush_cache_page(void *args)
>   * Not entirely sure why this is necessary on SH3 with 32K cache but
>   * without it we get occasional "Memory fault" when loading a program.
>   */
> -static void sh7705_flush_icache_page(void *page)
> +static void sh7705_flush_icache_folio(void *arg)
>  {
> -	__flush_purge_region(page_address(page), PAGE_SIZE);
> +	struct folio *folio = arg;
> +	__flush_purge_region(folio_address(folio), folio_size(folio));
>  }
>  
>  void __init sh7705_cache_init(void)
>  {
>  	local_flush_icache_range	= sh7705_flush_icache_range;
> -	local_flush_dcache_page		= sh7705_flush_dcache_page;
> +	local_flush_dcache_folio	= sh7705_flush_dcache_folio;
>  	local_flush_cache_all		= sh7705_flush_cache_all;
>  	local_flush_cache_mm		= sh7705_flush_cache_all;
>  	local_flush_cache_dup_mm	= sh7705_flush_cache_all;
>  	local_flush_cache_range		= sh7705_flush_cache_all;
>  	local_flush_cache_page		= sh7705_flush_cache_page;
> -	local_flush_icache_page		= sh7705_flush_icache_page;
> +	local_flush_icache_folio	= sh7705_flush_icache_folio;
>  }
> diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c
> index 3aef78ceb820..9bcaa5619eab 100644
> --- a/arch/sh/mm/cache.c
> +++ b/arch/sh/mm/cache.c
> @@ -20,9 +20,9 @@ void (*local_flush_cache_mm)(void *args) = cache_noop;
>  void (*local_flush_cache_dup_mm)(void *args) = cache_noop;
>  void (*local_flush_cache_page)(void *args) = cache_noop;
>  void (*local_flush_cache_range)(void *args) = cache_noop;
> -void (*local_flush_dcache_page)(void *args) = cache_noop;
> +void (*local_flush_dcache_folio)(void *args) = cache_noop;
>  void (*local_flush_icache_range)(void *args) = cache_noop;
> -void (*local_flush_icache_page)(void *args) = cache_noop;
> +void (*local_flush_icache_folio)(void *args) = cache_noop;
>  void (*local_flush_cache_sigtramp)(void *args) = cache_noop;
>  
>  void (*__flush_wback_region)(void *start, int size);
> @@ -61,15 +61,17 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
>  		       unsigned long vaddr, void *dst, const void *src,
>  		       unsigned long len)
>  {
> -	if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
> -	    test_bit(PG_dcache_clean, &page->flags)) {
> +	struct folio *folio = page_folio(page);
> +
> +	if (boot_cpu_data.dcache.n_aliases && folio_mapped(folio) &&
> +	    test_bit(PG_dcache_clean, &folio->flags)) {
>  		void *vto = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
>  		memcpy(vto, src, len);
>  		kunmap_coherent(vto);
>  	} else {
>  		memcpy(dst, src, len);
>  		if (boot_cpu_data.dcache.n_aliases)
> -			clear_bit(PG_dcache_clean, &page->flags);
> +			clear_bit(PG_dcache_clean, &folio->flags);
>  	}
>  
>  	if (vma->vm_flags & VM_EXEC)
> @@ -80,27 +82,30 @@ void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
>  			 unsigned long vaddr, void *dst, const void *src,
>  			 unsigned long len)
>  {
> +	struct folio *folio = page_folio(page);
> +
>  	if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
> -	    test_bit(PG_dcache_clean, &page->flags)) {
> +	    test_bit(PG_dcache_clean, &folio->flags)) {
>  		void *vfrom = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
>  		memcpy(dst, vfrom, len);
>  		kunmap_coherent(vfrom);
>  	} else {
>  		memcpy(dst, src, len);
>  		if (boot_cpu_data.dcache.n_aliases)
> -			clear_bit(PG_dcache_clean, &page->flags);
> +			clear_bit(PG_dcache_clean, &folio->flags);
>  	}
>  }
>  
>  void copy_user_highpage(struct page *to, struct page *from,
>  			unsigned long vaddr, struct vm_area_struct *vma)
>  {
> +	struct folio *src = page_folio(from);
>  	void *vfrom, *vto;
>  
>  	vto = kmap_atomic(to);
>  
> -	if (boot_cpu_data.dcache.n_aliases && page_mapcount(from) &&
> -	    test_bit(PG_dcache_clean, &from->flags)) {
> +	if (boot_cpu_data.dcache.n_aliases && folio_mapped(src) &&
> +	    test_bit(PG_dcache_clean, &src->flags)) {
>  		vfrom = kmap_coherent(from, vaddr);
>  		copy_page(vto, vfrom);
>  		kunmap_coherent(vfrom);
> @@ -136,27 +141,28 @@ EXPORT_SYMBOL(clear_user_highpage);
>  void __update_cache(struct vm_area_struct *vma,
>  		    unsigned long address, pte_t pte)
>  {
> -	struct page *page;
>  	unsigned long pfn = pte_pfn(pte);
>  
>  	if (!boot_cpu_data.dcache.n_aliases)
>  		return;
>  
> -	page = pfn_to_page(pfn);
>  	if (pfn_valid(pfn)) {
> -		int dirty = !test_and_set_bit(PG_dcache_clean, &page->flags);
> +		struct folio *folio = page_folio(pfn_to_page(pfn));
> +		int dirty = !test_and_set_bit(PG_dcache_clean, &folio->flags);
>  		if (dirty)
> -			__flush_purge_region(page_address(page), PAGE_SIZE);
> +			__flush_purge_region(folio_address(folio),
> +						folio_size(folio));
>  	}
>  }
>  
>  void __flush_anon_page(struct page *page, unsigned long vmaddr)
>  {
> +	struct folio *folio = page_folio(page);
>  	unsigned long addr = (unsigned long) page_address(page);
>  
>  	if (pages_do_alias(addr, vmaddr)) {
> -		if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
> -		    test_bit(PG_dcache_clean, &page->flags)) {
> +		if (boot_cpu_data.dcache.n_aliases && folio_mapped(folio) &&
> +		    test_bit(PG_dcache_clean, &folio->flags)) {
>  			void *kaddr;
>  
>  			kaddr = kmap_coherent(page, vmaddr);
> @@ -164,7 +170,8 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)
>  			/* __flush_purge_region((void *)kaddr, PAGE_SIZE); */
>  			kunmap_coherent(kaddr);
>  		} else
> -			__flush_purge_region((void *)addr, PAGE_SIZE);
> +			__flush_purge_region(folio_address(folio),
> +						folio_size(folio));
>  	}
>  }
>  
> @@ -215,11 +222,11 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
>  }
>  EXPORT_SYMBOL(flush_cache_range);
>  
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
> -	cacheop_on_each_cpu(local_flush_dcache_page, page, 1);
> +	cacheop_on_each_cpu(local_flush_dcache_folio, folio, 1);
>  }
> -EXPORT_SYMBOL(flush_dcache_page);
> +EXPORT_SYMBOL(flush_dcache_folio);
>  
>  void flush_icache_range(unsigned long start, unsigned long end)
>  {
> @@ -233,10 +240,11 @@ void flush_icache_range(unsigned long start, unsigned long end)
>  }
>  EXPORT_SYMBOL(flush_icache_range);
>  
> -void flush_icache_page(struct vm_area_struct *vma, struct page *page)
> +void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
> +		unsigned int nr)
>  {
> -	/* Nothing uses the VMA, so just pass the struct page along */
> -	cacheop_on_each_cpu(local_flush_icache_page, page, 1);
> +	/* Nothing uses the VMA, so just pass the folio along */
> +	cacheop_on_each_cpu(local_flush_icache_folio, page_folio(page), 1);
>  }
>  
>  void flush_cache_sigtramp(unsigned long address)
> diff --git a/arch/sh/mm/kmap.c b/arch/sh/mm/kmap.c
> index 73fd7cc99430..fa50e8f6e7a9 100644
> --- a/arch/sh/mm/kmap.c
> +++ b/arch/sh/mm/kmap.c
> @@ -27,10 +27,11 @@ void __init kmap_coherent_init(void)
>  
>  void *kmap_coherent(struct page *page, unsigned long addr)
>  {
> +	struct folio *folio = page_folio(page);
>  	enum fixed_addresses idx;
>  	unsigned long vaddr;
>  
> -	BUG_ON(!test_bit(PG_dcache_clean, &page->flags));
> +	BUG_ON(!test_bit(PG_dcache_clean, &folio->flags));
>  
>  	preempt_disable();
>  	pagefault_disable();

Doesn't build for me with CONFIG_WERROR:

  CC      arch/sh/kernel/asm-offsets.s
In file included from ./include/linux/mm.h:29,
                 from arch/sh/kernel/asm-offsets.c:14:
./include/linux/pgtable.h: In function 'ptep_test_and_clear_young':
./include/linux/pgtable.h:217:17: error: implicit declaration of function 'set_pte_at'; did you mean 'set_pte'? [-Werror=implicit-function-
declaration]
  217 |                 set_pte_at(vma->vm_mm, address, ptep, pte_mkold(pte));
      |                 ^~~~~~~~~~
      |                 set_pte
cc1: some warnings being treated as errors
make[1]: *** [scripts/Makefile.build:114: arch/sh/kernel/asm-offsets.s] Error 1
make: *** [Makefile:1287: prepare0] Error 2

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 14/36] m68k: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 14/36] m68k: " Matthew Wilcox (Oracle)
@ 2023-03-15  7:43   ` Geert Uytterhoeven
  2023-03-16 16:32     ` Geert Uytterhoeven
  2023-03-15 10:07   ` Mike Rapoport
  1 sibling, 1 reply; 138+ messages in thread
From: Geert Uytterhoeven @ 2023-03-15  7:43 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel, linux-m68k

Hi Willy,

On Wed, Mar 15, 2023 at 6:14 AM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
> Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_icache_pages() and
> flush_dcache_folio().
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Thanks for your patch!

> --- a/arch/m68k/include/asm/cacheflush_mm.h
> +++ b/arch/m68k/include/asm/cacheflush_mm.h
> @@ -220,24 +220,29 @@ static inline void flush_cache_page(struct vm_area_struct *vma, unsigned long vm
>
>  /* Push the page at kernel virtual address and clear the icache */
>  /* RZ: use cpush %bc instead of cpush %dc, cinv %ic */
> -static inline void __flush_page_to_ram(void *vaddr)
> +static inline void __flush_pages_to_ram(void *vaddr, unsigned int nr)
>  {
>         if (CPU_IS_COLDFIRE) {
>                 unsigned long addr, start, end;
>                 addr = ((unsigned long) vaddr) & ~(PAGE_SIZE - 1);
>                 start = addr & ICACHE_SET_MASK;
> -               end = (addr + PAGE_SIZE - 1) & ICACHE_SET_MASK;
> +               end = (addr + nr * PAGE_SIZE - 1) & ICACHE_SET_MASK;
>                 if (start > end) {
>                         flush_cf_bcache(0, end);
>                         end = ICACHE_MAX_ADDR;
>                 }
>                 flush_cf_bcache(start, end);
>         } else if (CPU_IS_040_OR_060) {
> -               __asm__ __volatile__("nop\n\t"
> -                                    ".chip 68040\n\t"
> -                                    "cpushp %%bc,(%0)\n\t"
> -                                    ".chip 68k"
> -                                    : : "a" (__pa(vaddr)));
> +               unsigned long paddr = __pa(vaddr);
> +
> +               do {
> +                       __asm__ __volatile__("nop\n\t"
> +                                            ".chip 68040\n\t"
> +                                            "cpushp %%bc,(%0)\n\t"
> +                                            ".chip 68k"
> +                                            : : "a" (paddr));
> +                       paddr += PAGE_SIZE;
> +               } while (--nr);

Please use "while (nr--) { ... }" to protect against anyone ever
calling this with nr == 0.
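With the do/while above, nr == 0 would wrap around to UINT_MAX on the
first decrement and flush four billion pages.  I.e. something like
(untested):

		unsigned long paddr = __pa(vaddr);

		while (nr--) {
			__asm__ __volatile__("nop\n\t"
					     ".chip 68040\n\t"
					     "cpushp %%bc,(%0)\n\t"
					     ".chip 68k"
					     : : "a" (paddr));
			paddr += PAGE_SIZE;
		}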

The rest LGTM, I'll give it a try shortly...

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 01/36] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set()
  2023-03-15  5:14 ` [PATCH v4 01/36] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set() Matthew Wilcox (Oracle)
@ 2023-03-15  9:21   ` Mike Rapoport
  2023-03-23 18:36   ` Pasha Tatashin
  2023-05-25  2:16   ` Anshuman Khandual
  2 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:21 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel

On Wed, Mar 15, 2023 at 05:14:09AM +0000, Matthew Wilcox (Oracle) wrote:
> Tell the page table check how many PTEs & PFNs we want it to check.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/arm64/include/asm/pgtable.h |  2 +-
>  arch/riscv/include/asm/pgtable.h |  2 +-
>  arch/x86/include/asm/pgtable.h   |  2 +-
>  include/linux/page_table_check.h | 14 +++++++-------
>  mm/page_table_check.c            | 14 ++++++++------
>  5 files changed, 18 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 0bd18de9fd97..9428748f4691 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -358,7 +358,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>  			      pte_t *ptep, pte_t pte)
>  {
> -	page_table_check_pte_set(mm, addr, ptep, pte);
> +	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
>  	return __set_pte_at(mm, addr, ptep, pte);
>  }
>  
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index ab05f892d317..b516f3b59616 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -459,7 +459,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
>  static inline void set_pte_at(struct mm_struct *mm,
>  	unsigned long addr, pte_t *ptep, pte_t pteval)
>  {
> -	page_table_check_pte_set(mm, addr, ptep, pteval);
> +	page_table_check_ptes_set(mm, addr, ptep, pteval, 1);
>  	__set_pte_at(mm, addr, ptep, pteval);
>  }
>  
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 15ae4d6ba476..1031025730d0 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -1022,7 +1022,7 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
>  static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>  			      pte_t *ptep, pte_t pte)
>  {
> -	page_table_check_pte_set(mm, addr, ptep, pte);
> +	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
>  	set_pte(ptep, pte);
>  }
>  
> diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
> index 01e16c7696ec..ba269c7009e4 100644
> --- a/include/linux/page_table_check.h
> +++ b/include/linux/page_table_check.h
> @@ -20,8 +20,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
>  				  pmd_t pmd);
>  void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
>  				  pud_t pud);
> -void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
> -				pte_t *ptep, pte_t pte);
> +void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
> +				pte_t *ptep, pte_t pte, unsigned int nr);
>  void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
>  				pmd_t *pmdp, pmd_t pmd);
>  void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
> @@ -73,14 +73,14 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm,
>  	__page_table_check_pud_clear(mm, addr, pud);
>  }
>  
> -static inline void page_table_check_pte_set(struct mm_struct *mm,
> +static inline void page_table_check_ptes_set(struct mm_struct *mm,
>  					    unsigned long addr, pte_t *ptep,
> -					    pte_t pte)
> +					    pte_t pte, unsigned int nr)
>  {
>  	if (static_branch_likely(&page_table_check_disabled))
>  		return;
>  
> -	__page_table_check_pte_set(mm, addr, ptep, pte);
> +	__page_table_check_ptes_set(mm, addr, ptep, pte, nr);
>  }
>  
>  static inline void page_table_check_pmd_set(struct mm_struct *mm,
> @@ -138,9 +138,9 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm,
>  {
>  }
>  
> -static inline void page_table_check_pte_set(struct mm_struct *mm,
> +static inline void page_table_check_ptes_set(struct mm_struct *mm,
>  					    unsigned long addr, pte_t *ptep,
> -					    pte_t pte)
> +					    pte_t pte, unsigned int nr)
>  {
>  }
>  
> diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> index 25d8610c0042..e6f4d40caaa2 100644
> --- a/mm/page_table_check.c
> +++ b/mm/page_table_check.c
> @@ -184,20 +184,22 @@ void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
>  }
>  EXPORT_SYMBOL(__page_table_check_pud_clear);
>  
> -void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
> -				pte_t *ptep, pte_t pte)
> +void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
> +				pte_t *ptep, pte_t pte, unsigned int nr)
>  {
> +	unsigned int i;
> +
>  	if (&init_mm == mm)
>  		return;
>  
> -	__page_table_check_pte_clear(mm, addr, *ptep);
> +	for (i = 0; i < nr; i++)
> +		__page_table_check_pte_clear(mm, addr, ptep[i]);
>  	if (pte_user_accessible_page(pte)) {
> -		page_table_check_set(mm, addr, pte_pfn(pte),
> -				     PAGE_SIZE >> PAGE_SHIFT,
> +		page_table_check_set(mm, addr, pte_pfn(pte), nr,
>  				     pte_write(pte));
>  	}
>  }
> -EXPORT_SYMBOL(__page_table_check_pte_set);
> +EXPORT_SYMBOL(__page_table_check_ptes_set);
>  
>  void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
>  				pmd_t *pmdp, pmd_t pmd)
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 02/36] mm: Add generic flush_icache_pages() and documentation
  2023-03-15  5:14 ` [PATCH v4 02/36] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
@ 2023-03-15  9:27   ` Mike Rapoport
  2023-05-25  2:23   ` Anshuman Khandual
  1 sibling, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:27 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel

On Wed, Mar 15, 2023 at 05:14:10AM +0000, Matthew Wilcox (Oracle) wrote:
> flush_icache_page() is deprecated but not yet removed, so add
> a range version of it.  Change the documentation to refer to
> update_mmu_cache_range() instead of update_mmu_cache().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  Documentation/core-api/cachetlb.rst | 35 +++++++++++++++--------------
>  include/asm-generic/cacheflush.h    |  5 +++++
>  2 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
> index 5c0552e78c58..d4c9e2a28d36 100644
> --- a/Documentation/core-api/cachetlb.rst
> +++ b/Documentation/core-api/cachetlb.rst
> @@ -88,13 +88,13 @@ changes occur:
>  
>  	This is used primarily during fault processing.
>  
> -5) ``void update_mmu_cache(struct vm_area_struct *vma,
> -   unsigned long address, pte_t *ptep)``
> +5) ``void update_mmu_cache_range(struct vm_area_struct *vma,
> +   unsigned long address, pte_t *ptep, unsigned int nr)``
>  
> -	At the end of every page fault, this routine is invoked to
> -	tell the architecture specific code that a translation
> -	now exists at virtual address "address" for address space
> -	"vma->vm_mm", in the software page tables.
> +	At the end of every page fault, this routine is invoked to tell
> +	the architecture specific code that translations now exist
> +	in the software page tables for address space "vma->vm_mm"
> +	at virtual address "address" for "nr" consecutive pages.
>  
>  	A port may use this information in any way it so chooses.
>  	For example, it could use this event to pre-load TLB
> @@ -306,17 +306,18 @@ maps this page at its virtual address.
>  	private".  The kernel guarantees that, for pagecache pages, it will
>  	clear this bit when such a page first enters the pagecache.
>  
> -	This allows these interfaces to be implemented much more efficiently.
> -	It allows one to "defer" (perhaps indefinitely) the actual flush if
> -	there are currently no user processes mapping this page.  See sparc64's
> -	flush_dcache_page and update_mmu_cache implementations for an example
> -	of how to go about doing this.
> +	This allows these interfaces to be implemented much more
> +	efficiently.  It allows one to "defer" (perhaps indefinitely) the
> +	actual flush if there are currently no user processes mapping this
> +	page.  See sparc64's flush_dcache_page and update_mmu_cache_range
> +	implementations for an example of how to go about doing this.
>  
> -	The idea is, first at flush_dcache_page() time, if page_file_mapping()
> -	returns a mapping, and mapping_mapped on that mapping returns %false,
> -	just mark the architecture private page flag bit.  Later, in
> -	update_mmu_cache(), a check is made of this flag bit, and if set the
> -	flush is done and the flag bit is cleared.
> +	The idea is, first at flush_dcache_page() time, if
> +	page_file_mapping() returns a mapping, and mapping_mapped on that
> +	mapping returns %false, just mark the architecture private page
> +	flag bit.  Later, in update_mmu_cache_range(), a check is made
> +	of this flag bit, and if set the flush is done and the flag bit
> +	is cleared.
>  
>  	.. important::
>  
> @@ -369,7 +370,7 @@ maps this page at its virtual address.
>    ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``
>  
>  	All the functionality of flush_icache_page can be implemented in
> -	flush_dcache_page and update_mmu_cache. In the future, the hope
> +	flush_dcache_page and update_mmu_cache_range. In the future, the hope
>  	is to remove this interface completely.
>  
>  The final category of APIs is for I/O to deliberately aliased address
> diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
> index f46258d1a080..09d51a680765 100644
> --- a/include/asm-generic/cacheflush.h
> +++ b/include/asm-generic/cacheflush.h
> @@ -78,6 +78,11 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
>  #endif
>  
>  #ifndef flush_icache_page
> +static inline void flush_icache_pages(struct vm_area_struct *vma,
> +				     struct page *page, unsigned int nr)
> +{
> +}
> +
>  static inline void flush_icache_page(struct vm_area_struct *vma,
>  				     struct page *page)
>  {
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 03/36] mm: Add folio_flush_mapping()
  2023-03-15  5:14 ` [PATCH v4 03/36] mm: Add folio_flush_mapping() Matthew Wilcox (Oracle)
@ 2023-03-15  9:28   ` Mike Rapoport
  2023-05-25  2:35   ` Anshuman Khandual
  1 sibling, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:28 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel

On Wed, Mar 15, 2023 at 05:14:11AM +0000, Matthew Wilcox (Oracle) wrote:
> This is the folio equivalent of page_mapping_file(), but rename it
> to make it clear that it's very different from page_file_mapping().
> Theoretically, there's nothing flush-only about it, but there are no
> other users today, and I doubt there will be; it's almost always more
> useful to know the swapfile's mapping or the swapcache's mapping.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  include/linux/pagemap.h | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index a56308a9d1a4..e56c2023aa0e 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -369,6 +369,26 @@ static inline struct address_space *folio_file_mapping(struct folio *folio)
>  	return folio->mapping;
>  }
>  
> +/**
> + * folio_flush_mapping - Find the file mapping this folio belongs to.
> + * @folio: The folio.
> + *
> + * For folios which are in the page cache, return the mapping that this
> + * page belongs to.  Anonymous folios return NULL, even if they're in
> + * the swap cache.  Other kinds of folio also return NULL.
> + *
> + * This is ONLY used by architecture cache flushing code.  If you aren't
> + * writing cache flushing code, you want either folio_mapping() or
> + * folio_file_mapping().
> + */
> +static inline struct address_space *folio_flush_mapping(struct folio *folio)
> +{
> +	if (unlikely(folio_test_swapcache(folio)))
> +		return NULL;
> +
> +	return folio_mapping(folio);
> +}
> +
>  static inline struct address_space *page_file_mapping(struct page *page)
>  {
>  	return folio_file_mapping(page_folio(page));
> @@ -379,11 +399,7 @@ static inline struct address_space *page_file_mapping(struct page *page)
>   */
>  static inline struct address_space *page_mapping_file(struct page *page)
>  {
> -	struct folio *folio = page_folio(page);
> -
> -	if (unlikely(folio_test_swapcache(folio)))
> -		return NULL;
> -	return folio_mapping(folio);
> +	return folio_flush_mapping(page_folio(page));
>  }
>  
>  /**
> -- 
> 2.39.2
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 04/36] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
  2023-03-15  5:14 ` [PATCH v4 04/36] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO Matthew Wilcox (Oracle)
@ 2023-03-15  9:28   ` Mike Rapoport
  2023-05-25  2:43   ` Anshuman Khandual
  1 sibling, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:28 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel

On Wed, Mar 15, 2023 at 05:14:12AM +0000, Matthew Wilcox (Oracle) wrote:
> Current best practice is to reuse the name of the function as a define
> to indicate that the function is implemented by the architecture.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  Documentation/core-api/cachetlb.rst | 24 +++++++++---------------
>  include/linux/cacheflush.h          |  4 ++--
>  mm/util.c                           |  2 +-
>  3 files changed, 12 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
> index d4c9e2a28d36..770008afd409 100644
> --- a/Documentation/core-api/cachetlb.rst
> +++ b/Documentation/core-api/cachetlb.rst
> @@ -269,7 +269,7 @@ maps this page at its virtual address.
>  	If D-cache aliasing is not an issue, these two routines may
>  	simply call memcpy/memset directly and do nothing more.
>  
> -  ``void flush_dcache_page(struct page *page)``
> +  ``void flush_dcache_folio(struct folio *folio)``
>  
>          This routine must be called when:
>  
> @@ -277,7 +277,7 @@ maps this page at its virtual address.
>  	     and / or in high memory
>  	  b) the kernel is about to read from a page cache page and user space
>  	     shared/writable mappings of this page potentially exist.  Note
> -	     that {get,pin}_user_pages{_fast} already call flush_dcache_page
> +	     that {get,pin}_user_pages{_fast} already call flush_dcache_folio
>  	     on any page found in the user address space and thus driver
>  	     code rarely needs to take this into account.
>  
> @@ -291,7 +291,7 @@ maps this page at its virtual address.
>  
>  	The phrase "kernel writes to a page cache page" means, specifically,
>  	that the kernel executes store instructions that dirty data in that
> -	page at the page->virtual mapping of that page.  It is important to
> +	page at the kernel virtual mapping of that page.  It is important to
>  	flush here to handle D-cache aliasing, to make sure these kernel stores
>  	are visible to user space mappings of that page.
>  
> @@ -302,18 +302,18 @@ maps this page at its virtual address.
>  	If D-cache aliasing is not an issue, this routine may simply be defined
>  	as a nop on that architecture.
>  
> -        There is a bit set aside in page->flags (PG_arch_1) as "architecture
> +        There is a bit set aside in folio->flags (PG_arch_1) as "architecture
>  	private".  The kernel guarantees that, for pagecache pages, it will
>  	clear this bit when such a page first enters the pagecache.
>  
>  	This allows these interfaces to be implemented much more
>  	efficiently.  It allows one to "defer" (perhaps indefinitely) the
>  	actual flush if there are currently no user processes mapping this
> -	page.  See sparc64's flush_dcache_page and update_mmu_cache_range
> +	page.  See sparc64's flush_dcache_folio and update_mmu_cache_range
>  	implementations for an example of how to go about doing this.
>  
> -	The idea is, first at flush_dcache_page() time, if
> -	page_file_mapping() returns a mapping, and mapping_mapped on that
> +	The idea is, first at flush_dcache_folio() time, if
> +	folio_flush_mapping() returns a mapping, and mapping_mapped() on that
>  	mapping returns %false, just mark the architecture private page
>  	flag bit.  Later, in update_mmu_cache_range(), a check is made
>  	of this flag bit, and if set the flush is done and the flag bit
> @@ -327,12 +327,6 @@ maps this page at its virtual address.
>  			dirty.  Again, see sparc64 for examples of how
>  			to deal with this.
>  
> -  ``void flush_dcache_folio(struct folio *folio)``
> -	This function is called under the same circumstances as
> -	flush_dcache_page().  It allows the architecture to
> -	optimise for flushing the entire folio of pages instead
> -	of flushing one page at a time.
> -
>    ``void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
>    unsigned long user_vaddr, void *dst, void *src, int len)``
>    ``void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
> @@ -353,7 +347,7 @@ maps this page at its virtual address.
>  
>    	When the kernel needs to access the contents of an anonymous
>  	page, it calls this function (currently only
> -	get_user_pages()).  Note: flush_dcache_page() deliberately
> +	get_user_pages()).  Note: flush_dcache_folio() deliberately
>  	doesn't work for an anonymous page.  The default
>  	implementation is a nop (and should remain so for all coherent
>  	architectures).  For incoherent architectures, it should flush
> @@ -370,7 +364,7 @@ maps this page at its virtual address.
>    ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``
>  
>  	All the functionality of flush_icache_page can be implemented in
> -	flush_dcache_page and update_mmu_cache_range. In the future, the hope
> +	flush_dcache_folio and update_mmu_cache_range. In the future, the hope
>  	is to remove this interface completely.
>  
>  The final category of APIs is for I/O to deliberately aliased address
> diff --git a/include/linux/cacheflush.h b/include/linux/cacheflush.h
> index a6189d21f2ba..82136f3fcf54 100644
> --- a/include/linux/cacheflush.h
> +++ b/include/linux/cacheflush.h
> @@ -7,14 +7,14 @@
>  struct folio;
>  
>  #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
> -#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
> +#ifndef flush_dcache_folio
>  void flush_dcache_folio(struct folio *folio);
>  #endif
>  #else
>  static inline void flush_dcache_folio(struct folio *folio)
>  {
>  }
> -#define ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO 0
> +#define flush_dcache_folio flush_dcache_folio
>  #endif /* ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE */
>  
>  #endif /* _LINUX_CACHEFLUSH_H */
> diff --git a/mm/util.c b/mm/util.c
> index dd12b9531ac4..98ce51b01627 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1125,7 +1125,7 @@ void page_offline_end(void)
>  }
>  EXPORT_SYMBOL(page_offline_end);
>  
> -#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
> +#ifndef flush_dcache_folio
>  void flush_dcache_folio(struct folio *folio)
>  {
>  	long i, nr = folio_nr_pages(folio);
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 05/36] mm: Add default definition of set_ptes()
  2023-03-15  5:14 ` [PATCH v4 05/36] mm: Add default definition of set_ptes() Matthew Wilcox (Oracle)
@ 2023-03-15  9:34   ` Mike Rapoport
  2023-05-25  3:01   ` Anshuman Khandual
  1 sibling, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:34 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel

On Wed, Mar 15, 2023 at 05:14:13AM +0000, Matthew Wilcox (Oracle) wrote:
> Most architectures can just define set_pte() and PFN_PTE_SHIFT to
> use this definition.  It's also a handy spot to document the guarantees
> provided by the MM.
> 
> Suggested-by: Mike Rapoport (IBM) <rppt@kernel.org>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  include/linux/pgtable.h | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c5a51481bbb9..a755fe94b4b4 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -172,6 +172,43 @@ static inline int pmd_young(pmd_t pmd)
>  }
>  #endif
>  
> +#ifndef set_ptes
> +#ifdef PFN_PTE_SHIFT
> +/**
> + * set_ptes - Map consecutive pages to a contiguous range of addresses.
> + * @mm: Address space to map the pages into.
> + * @addr: Address to map the first page at.
> + * @ptep: Page table pointer for the first entry.
> + * @pte: Page table entry for the first page.
> + * @nr: Number of pages to map.
> + *
> + * May be overridden by the architecture, or the architecture can define
> + * set_pte() and PFN_PTE_SHIFT.
> + *
> + * Context: The caller holds the page table lock.  The pages all belong
> + * to the same folio.  The PTEs are all in the same PMD.
> + */
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
> +
> +	for (;;) {
> +		set_pte(ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> +	}
> +}
> +#ifndef set_pte_at
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
> +#endif
> +#endif
> +#else
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
> +#endif
> +
>  #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
>  extern int ptep_set_access_flags(struct vm_area_struct *vma,
>  				 unsigned long address, pte_t *ptep,
> -- 
> 2.39.2
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 06/36] alpha: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 06/36] alpha: Implement the new page table range API Matthew Wilcox (Oracle)
@ 2023-03-15  9:41   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:41 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, linux-alpha

On Wed, Mar 15, 2023 at 05:14:14AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT, update_mmu_cache_range() and flush_icache_pages().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Richard Henderson <richard.henderson@linaro.org>
> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
> Cc: Matt Turner <mattst88@gmail.com>
> Cc: linux-alpha@vger.kernel.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/alpha/include/asm/cacheflush.h | 10 ++++++++++
>  arch/alpha/include/asm/pgtable.h    |  9 +++++++--
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/cacheflush.h b/arch/alpha/include/asm/cacheflush.h
> index 9945ff483eaf..3956460e69e2 100644
> --- a/arch/alpha/include/asm/cacheflush.h
> +++ b/arch/alpha/include/asm/cacheflush.h
> @@ -57,6 +57,16 @@ extern void flush_icache_user_page(struct vm_area_struct *vma,
>  #define flush_icache_page(vma, page) \
>  	flush_icache_user_page((vma), (page), 0, 0)
>  
> +/*
> + * Both implementations of flush_icache_user_page flush the entire
> + * address space, so one call, no matter how many pages.
> + */
> +static inline void flush_icache_pages(struct vm_area_struct *vma,
> +		struct page *page, unsigned int nr)
> +{
> +	flush_icache_user_page(vma, page, 0, 0);
> +}
> +
>  #include <asm-generic/cacheflush.h>
>  
>  #endif /* _ALPHA_CACHEFLUSH_H */
> diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
> index ba43cb841d19..6c24c408b8e9 100644
> --- a/arch/alpha/include/asm/pgtable.h
> +++ b/arch/alpha/include/asm/pgtable.h
> @@ -26,7 +26,6 @@ struct vm_area_struct;
>   * hook is made available.
>   */
>  #define set_pte(pteptr, pteval) ((*(pteptr)) = (pteval))
> -#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
>  
>  /* PMD_SHIFT determines the size of the area a second-level page table can map */
>  #define PMD_SHIFT	(PAGE_SHIFT + (PAGE_SHIFT-3))
> @@ -189,7 +188,8 @@ extern unsigned long __zero_page(void);
>   * and a page entry and page directory to the page they refer to.
>   */
>  #define page_to_pa(page)	(page_to_pfn(page) << PAGE_SHIFT)
> -#define pte_pfn(pte)	(pte_val(pte) >> 32)
> +#define PFN_PTE_SHIFT		32
> +#define pte_pfn(pte)		(pte_val(pte) >> PFN_PTE_SHIFT)
>  
>  #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
>  #define mk_pte(page, pgprot)						\
> @@ -303,6 +303,11 @@ extern inline void update_mmu_cache(struct vm_area_struct * vma,
>  {
>  }
>  
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr)
> +{
> +}
> +
>  /*
>   * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
>   * are !pte_none() && !pte_present().
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 20/36] powerpc: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 20/36] powerpc: " Matthew Wilcox (Oracle)
@ 2023-03-15  9:43   ` Christophe Leroy
  2023-03-15 10:18     ` Christophe Leroy
  2023-03-15 10:09   ` Mike Rapoport
  1 sibling, 1 reply; 138+ messages in thread
From: Christophe Leroy @ 2023-03-15  9:43 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch
  Cc: linux-mm, linux-kernel, Michael Ellerman, Nicholas Piggin, linuxppc-dev



On 15/03/2023 at 06:14, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to
> per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: linuxppc-dev@lists.ozlabs.org
> ---
>   arch/powerpc/include/asm/book3s/pgtable.h | 10 +----
>   arch/powerpc/include/asm/cacheflush.h     | 14 +++++--
>   arch/powerpc/include/asm/kvm_ppc.h        | 10 ++---
>   arch/powerpc/include/asm/nohash/pgtable.h | 13 ++----
>   arch/powerpc/include/asm/pgtable.h        |  6 +++
>   arch/powerpc/mm/book3s64/hash_utils.c     | 11 ++---
>   arch/powerpc/mm/cacheflush.c              | 40 ++++++------------
>   arch/powerpc/mm/nohash/e500_hugetlbpage.c |  3 +-
>   arch/powerpc/mm/pgtable.c                 | 51 +++++++++++++----------
>   9 files changed, 77 insertions(+), 81 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/pgtable.h b/arch/powerpc/include/asm/book3s/pgtable.h
> index d18b748ea3ae..c2ef811505b0 100644
> --- a/arch/powerpc/include/asm/book3s/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/pgtable.h
> @@ -9,13 +9,6 @@
>   #endif
>   
>   #ifndef __ASSEMBLY__
> -/* Insert a PTE, top-level function is out of line. It uses an inline
> - * low level function in the respective pgtable-* files
> - */
> -extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> -		       pte_t pte);
> -
> -
>   #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
>   extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
>   				 pte_t *ptep, pte_t entry, int dirty);
> @@ -36,7 +29,8 @@ void __update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t
>    * corresponding HPTE into the hash table ahead of time, instead of
>    * waiting for the inevitable extra hash-table miss exception.
>    */
> -static inline void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr)
>   {
>   	if (IS_ENABLED(CONFIG_PPC32) && !mmu_has_feature(MMU_FTR_HPTE_TABLE))
>   		return;
> diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
> index 7564dd4fd12b..ef7d2de33b89 100644
> --- a/arch/powerpc/include/asm/cacheflush.h
> +++ b/arch/powerpc/include/asm/cacheflush.h
> @@ -35,13 +35,19 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end)
>    * It just marks the page as not i-cache clean.  We do the i-cache
>    * flush later when the page is given to a user process, if necessary.
>    */
> -static inline void flush_dcache_page(struct page *page)
> +static inline void flush_dcache_folio(struct folio *folio)
>   {
>   	if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
>   		return;
>   	/* avoid an atomic op if possible */
> -	if (test_bit(PG_dcache_clean, &page->flags))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +	if (test_bit(PG_dcache_clean, &folio->flags))
> +		clear_bit(PG_dcache_clean, &folio->flags);
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
>   }
>   
>   void flush_icache_range(unsigned long start, unsigned long stop);
> @@ -51,7 +57,7 @@ void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
>   		unsigned long addr, int len);
>   #define flush_icache_user_page flush_icache_user_page
>   
> -void flush_dcache_icache_page(struct page *page);
> +void flush_dcache_icache_folio(struct folio *folio);
>   
>   /**
>    * flush_dcache_range(): Write any modified data cache blocks out to memory and
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 6bef23d6d0e3..e91dd8e88bb7 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -868,7 +868,7 @@ void kvmppc_init_lpid(unsigned long nr_lpids);
>   
>   static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
>   {
> -	struct page *page;
> +	struct folio *folio;
>   	/*
>   	 * We can only access pages that the kernel maps
>   	 * as memory. Bail out for unmapped ones.
> @@ -877,10 +877,10 @@ static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
>   		return;
>   
>   	/* Clear i-cache for new pages */
> -	page = pfn_to_page(pfn);
> -	if (!test_bit(PG_dcache_clean, &page->flags)) {
> -		flush_dcache_icache_page(page);
> -		set_bit(PG_dcache_clean, &page->flags);
> +	folio = page_folio(pfn_to_page(pfn));
> +	if (!test_bit(PG_dcache_clean, &folio->flags)) {
> +		flush_dcache_icache_folio(folio);
> +		set_bit(PG_dcache_clean, &folio->flags);
>   	}
>   }
>   
> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
> index a6caaaab6f92..69a7dd47a9f0 100644
> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -166,12 +166,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>   	return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
>   }
>   
> -/* Insert a PTE, top-level function is out of line. It uses an inline
> - * low level function in the respective pgtable-* files
> - */
> -extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> -		       pte_t pte);
> -
>   /* This low level function performs the actual PTE insertion
>    * Setting the PTE depends on the MMU type and other factors. It's
>    * an horrible mess that I'm not going to try to clean up now but
> @@ -282,10 +276,11 @@ static inline int pud_huge(pud_t pud)
>    * for the page which has just been mapped in.
>    */
>   #if defined(CONFIG_PPC_E500) && defined(CONFIG_HUGETLB_PAGE)
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep);
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *ptep, unsigned int nr);
>   #else
> -static inline
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) {}
> +static inline void update_mmu_cache(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr) {}

Do you mean update_mmu_cache_range() ?
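
The stub now takes the extra nr argument but keeps the old name, so
presumably it should be:

	static inline void update_mmu_cache_range(struct vm_area_struct *vma,
			unsigned long address, pte_t *ptep, unsigned int nr) {}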

>   #endif
>   
>   #endif /* __ASSEMBLY__ */
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 9972626ddaf6..656ecf2b10cd 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -41,6 +41,12 @@ struct mm_struct;
>   
>   #ifndef __ASSEMBLY__
>   
> +void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> +		pte_t pte, unsigned int nr);
> +#define set_ptes set_ptes
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
> +
>   #ifndef MAX_PTRS_PER_PGD
>   #define MAX_PTRS_PER_PGD PTRS_PER_PGD
>   #endif
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index fedffe3ae136..ad2afa08e62e 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1307,18 +1307,19 @@ void hash__early_init_mmu_secondary(void)
>    */
>   unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
>   {
> -	struct page *page;
> +	struct folio *folio;
>   
>   	if (!pfn_valid(pte_pfn(pte)))
>   		return pp;
>   
> -	page = pte_page(pte);
> +	folio = page_folio(pte_page(pte));
>   
>   	/* page is dirty */
> -	if (!test_bit(PG_dcache_clean, &page->flags) && !PageReserved(page)) {
> +	if (!test_bit(PG_dcache_clean, &folio->flags) &&
> +	    !folio_test_reserved(folio)) {
>   		if (trap == INTERRUPT_INST_STORAGE) {
> -			flush_dcache_icache_page(page);
> -			set_bit(PG_dcache_clean, &page->flags);
> +			flush_dcache_icache_folio(folio);
> +			set_bit(PG_dcache_clean, &folio->flags);
>   		} else
>   			pp |= HPTE_R_N;
>   	}
> diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
> index 0e9b4879c0f9..8760d2223abe 100644
> --- a/arch/powerpc/mm/cacheflush.c
> +++ b/arch/powerpc/mm/cacheflush.c
> @@ -148,44 +148,30 @@ static void __flush_dcache_icache(void *p)
>   	invalidate_icache_range(addr, addr + PAGE_SIZE);
>   }
>   
> -static void flush_dcache_icache_hugepage(struct page *page)
> +void flush_dcache_icache_folio(struct folio *folio)
>   {
> -	int i;
> -	int nr = compound_nr(page);
> +	unsigned int i, nr = folio_nr_pages(folio);
>   
> -	if (!PageHighMem(page)) {
> +	if (flush_coherent_icache())
> +		return;
> +
> +	if (!folio_test_highmem(folio)) {
> +		void *addr = folio_address(folio);

Should be a blank line here ?

>   		for (i = 0; i < nr; i++)
> -			__flush_dcache_icache(lowmem_page_address(page + i));
> -	} else {
> +			__flush_dcache_icache(addr + i * PAGE_SIZE);
> +	} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
>   		for (i = 0; i < nr; i++) {
> -			void *start = kmap_local_page(page + i);
> +			void *start = kmap_local_folio(folio, i * PAGE_SIZE);
>   
>   			__flush_dcache_icache(start);
>   			kunmap_local(start);
>   		}
> -	}
> -}
> -
> -void flush_dcache_icache_page(struct page *page)
> -{
> -	if (flush_coherent_icache())
> -		return;
> -
> -	if (PageCompound(page))
> -		return flush_dcache_icache_hugepage(page);
> -
> -	if (!PageHighMem(page)) {
> -		__flush_dcache_icache(lowmem_page_address(page));
> -	} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
> -		void *start = kmap_local_page(page);
> -
> -		__flush_dcache_icache(start);
> -		kunmap_local(start);
>   	} else {
> -		flush_dcache_icache_phys(page_to_phys(page));
> +		unsigned long pfn = folio_pfn(folio);

Blank line ?

> +		for (i = 0; i < nr; i++)
> +			flush_dcache_icache_phys((pfn + i) * PAGE_SIZE);

Use PFN_PHYS(pfn + i) ?
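
i.e.:

			flush_dcache_icache_phys(PFN_PHYS(pfn + i));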

>   	}
>   }
> -EXPORT_SYMBOL(flush_dcache_icache_page);
>   
>   void clear_user_page(void *page, unsigned long vaddr, struct page *pg)
>   {
> diff --git a/arch/powerpc/mm/nohash/e500_hugetlbpage.c b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> index 58c8d9849cb1..f3cb91107a47 100644
> --- a/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> +++ b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> @@ -178,7 +178,8 @@ book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea, pte_t pte)
>    *
>    * This must always be called with the pte lock held.
>    */
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
> +void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *ptep, unsigned int nr)

update_mmu_cache_range() ?
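
Same issue as in nohash/pgtable.h: the definition gained the nr
argument but kept the old name, while the header declares
update_mmu_cache_range().  Presumably:

	void update_mmu_cache_range(struct vm_area_struct *vma,
			unsigned long address, pte_t *ptep, unsigned int nr)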


>   {
>   	if (is_vm_hugetlb_page(vma))
>   		book3e_hugetlb_preload(vma, address, *ptep);
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index cb2dcdb18f8e..b3c7b874a7a2 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -58,7 +58,7 @@ static inline int pte_looks_normal(pte_t pte)
>   	return 0;
>   }
>   
> -static struct page *maybe_pte_to_page(pte_t pte)
> +static struct folio *maybe_pte_to_folio(pte_t pte)
>   {
>   	unsigned long pfn = pte_pfn(pte);
>   	struct page *page;
> @@ -68,7 +68,7 @@ static struct page *maybe_pte_to_page(pte_t pte)
>   	page = pfn_to_page(pfn);
>   	if (PageReserved(page))
>   		return NULL;
> -	return page;
> +	return page_folio(page);
>   }
>   
>   #ifdef CONFIG_PPC_BOOK3S
> @@ -84,12 +84,12 @@ static pte_t set_pte_filter_hash(pte_t pte)
>   	pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
>   	if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) ||
>   				       cpu_has_feature(CPU_FTR_NOEXECUTE))) {
> -		struct page *pg = maybe_pte_to_page(pte);
> -		if (!pg)
> +		struct folio *folio = maybe_pte_to_folio(pte);
> +		if (!folio)
>   			return pte;
> -		if (!test_bit(PG_dcache_clean, &pg->flags)) {
> -			flush_dcache_icache_page(pg);
> -			set_bit(PG_dcache_clean, &pg->flags);
> +		if (!test_bit(PG_dcache_clean, &folio->flags)) {
> +			flush_dcache_icache_folio(folio);
> +			set_bit(PG_dcache_clean, &folio->flags);
>   		}
>   	}
>   	return pte;
> @@ -107,7 +107,7 @@ static pte_t set_pte_filter_hash(pte_t pte) { return pte; }
>    */
>   static inline pte_t set_pte_filter(pte_t pte)
>   {
> -	struct page *pg;
> +	struct folio *folio;
>   
>   	if (radix_enabled())
>   		return pte;
> @@ -120,18 +120,18 @@ static inline pte_t set_pte_filter(pte_t pte)
>   		return pte;
>   
>   	/* If you set _PAGE_EXEC on weird pages you're on your own */
> -	pg = maybe_pte_to_page(pte);
> -	if (unlikely(!pg))
> +	folio = maybe_pte_to_folio(pte);
> +	if (unlikely(!folio))
>   		return pte;
>   
>   	/* If the page clean, we move on */
> -	if (test_bit(PG_dcache_clean, &pg->flags))
> +	if (test_bit(PG_dcache_clean, &folio->flags))
>   		return pte;
>   
>   	/* If it's an exec fault, we flush the cache and make it clean */
>   	if (is_exec_fault()) {
> -		flush_dcache_icache_page(pg);
> -		set_bit(PG_dcache_clean, &pg->flags);
> +		flush_dcache_icache_folio(folio);
> +		set_bit(PG_dcache_clean, &folio->flags);
>   		return pte;
>   	}
>   
> @@ -142,7 +142,7 @@ static inline pte_t set_pte_filter(pte_t pte)
>   static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
>   				     int dirty)
>   {
> -	struct page *pg;
> +	struct folio *folio;
>   
>   	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
>   		return pte;
> @@ -168,17 +168,17 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
>   #endif /* CONFIG_DEBUG_VM */
>   
>   	/* If you set _PAGE_EXEC on weird pages you're on your own */
> -	pg = maybe_pte_to_page(pte);
> -	if (unlikely(!pg))
> +	folio = maybe_pte_to_folio(pte);
> +	if (unlikely(!folio))
>   		goto bail;
>   
>   	/* If the page is already clean, we move on */
> -	if (test_bit(PG_dcache_clean, &pg->flags))
> +	if (test_bit(PG_dcache_clean, &folio->flags))
>   		goto bail;
>   
>   	/* Clean the page and set PG_dcache_clean */
> -	flush_dcache_icache_page(pg);
> -	set_bit(PG_dcache_clean, &pg->flags);
> +	flush_dcache_icache_folio(folio);
> +	set_bit(PG_dcache_clean, &folio->flags);
>   
>    bail:
>   	return pte_mkexec(pte);
> @@ -187,8 +187,8 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
>   /*
>    * set_pte stores a linux PTE into the linux page table.
>    */
> -void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> -		pte_t pte)
> +void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> +		pte_t pte, unsigned int nr)
>   {
>   	/*
>   	 * Make sure hardware valid bit is not set. We don't do
> @@ -203,7 +203,14 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>   	pte = set_pte_filter(pte);
>   
>   	/* Perform the setting of the PTE */
> -	__set_pte_at(mm, addr, ptep, pte, 0);
> +	for (;;) {
> +		__set_pte_at(mm, addr, ptep, pte, 0);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte = __pte(pte_val(pte) + PAGE_SIZE);

I don't like that math too much, but I have no better idea at the moment.

Maybe set_ptes() should take a pgprot_t and rebuild the pte with 
mk_pte() or similar ?
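
Untested, and assuming pte_pgprot() gives back everything
set_pte_filter() left in the PTE, something like:

	for (;;) {
		__set_pte_at(mm, addr, ptep, pte, 0);
		if (--nr == 0)
			break;
		ptep++;
		pte = pfn_pte(pte_pfn(pte) + 1, pte_pgprot(pte));
		addr += PAGE_SIZE;
	}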

> +		addr += PAGE_SIZE;
> +	}
>   }
>   
>   void unmap_kernel_page(unsigned long va)

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 07/36] arc: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 07/36] arc: " Matthew Wilcox (Oracle)
@ 2023-03-15  9:44   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:44 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Vineet Gupta, linux-snps-arc

On Wed, Mar 15, 2023 at 05:14:15AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio()
> and flush_icache_pages().
> 
> Change the PG_dc_clean flag from being per-page to per-folio (which
> means it cannot always be set as we don't know that all pages in this
> folio were cleaned).  Enhance the internal flush routines to take the
> number of pages to flush.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Vineet Gupta <vgupta@kernel.org>
> Cc: linux-snps-arc@lists.infradead.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/arc/include/asm/cacheflush.h         |  7 ++-
>  arch/arc/include/asm/pgtable-bits-arcv2.h | 11 ++--
>  arch/arc/include/asm/pgtable-levels.h     |  1 +
>  arch/arc/mm/cache.c                       | 61 ++++++++++++++---------
>  arch/arc/mm/tlb.c                         | 18 ++++---
>  5 files changed, 58 insertions(+), 40 deletions(-)
> 
> diff --git a/arch/arc/include/asm/cacheflush.h b/arch/arc/include/asm/cacheflush.h
> index e201b4b1655a..04f65f588510 100644
> --- a/arch/arc/include/asm/cacheflush.h
> +++ b/arch/arc/include/asm/cacheflush.h
> @@ -25,17 +25,20 @@
>   * in update_mmu_cache()
>   */
>  #define flush_icache_page(vma, page)
> +#define flush_icache_pages(vma, page, nr)
>  
>  void flush_cache_all(void);
>  
>  void flush_icache_range(unsigned long kstart, unsigned long kend);
>  void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len);
> -void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr);
> -void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr);
> +void __inv_icache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr);
> +void __flush_dcache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr);
>  
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  
>  void flush_dcache_page(struct page *page);
> +void flush_dcache_folio(struct folio *folio);
> +#define flush_dcache_folio flush_dcache_folio
>  
>  void dma_cache_wback_inv(phys_addr_t start, unsigned long sz);
>  void dma_cache_inv(phys_addr_t start, unsigned long sz);
> diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
> index 6e9f8ca6d6a1..06d8039180c0 100644
> --- a/arch/arc/include/asm/pgtable-bits-arcv2.h
> +++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
> @@ -100,14 +100,11 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>  	return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pteval)
> -{
> -	set_pte(ptep, pteval);
> -}
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		      pte_t *ptep, unsigned int nr);
>  
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
> -		      pte_t *ptep);
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  
>  /*
>   * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
> diff --git a/arch/arc/include/asm/pgtable-levels.h b/arch/arc/include/asm/pgtable-levels.h
> index ef68758b69f7..fc417c75c24d 100644
> --- a/arch/arc/include/asm/pgtable-levels.h
> +++ b/arch/arc/include/asm/pgtable-levels.h
> @@ -169,6 +169,7 @@
>  #define pte_ERROR(e) \
>  	pr_crit("%s:%d: bad pte %08lx.\n", __FILE__, __LINE__, pte_val(e))
>  
> +#define PFN_PTE_SHIFT		PAGE_SHIFT
>  #define pte_none(x)		(!pte_val(x))
>  #define pte_present(x)		(pte_val(x) & _PAGE_PRESENT)
>  #define pte_clear(mm,addr,ptep)	set_pte_at(mm, addr, ptep, __pte(0))
> diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
> index 55c6de138eae..3c16ee942a5c 100644
> --- a/arch/arc/mm/cache.c
> +++ b/arch/arc/mm/cache.c
> @@ -752,17 +752,17 @@ static inline void arc_slc_enable(void)
>   * There's a corollary case, where kernel READs from a userspace mapped page.
>   * If the U-mapping is not congruent to K-mapping, former needs flushing.
>   */
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
>  	struct address_space *mapping;
>  
>  	if (!cache_is_vipt_aliasing()) {
> -		clear_bit(PG_dc_clean, &page->flags);
> +		clear_bit(PG_dc_clean, &folio->flags);
>  		return;
>  	}
>  
>  	/* don't handle anon pages here */
> -	mapping = page_mapping_file(page);
> +	mapping = folio_flush_mapping(folio);
>  	if (!mapping)
>  		return;
>  
> @@ -771,17 +771,27 @@ void flush_dcache_page(struct page *page)
>  	 * Make a note that K-mapping is dirty
>  	 */
>  	if (!mapping_mapped(mapping)) {
> -		clear_bit(PG_dc_clean, &page->flags);
> -	} else if (page_mapcount(page)) {
> -
> +		clear_bit(PG_dc_clean, &folio->flags);
> +	} else if (folio_mapped(folio)) {
>  		/* kernel reading from page with U-mapping */
> -		phys_addr_t paddr = (unsigned long)page_address(page);
> -		unsigned long vaddr = page->index << PAGE_SHIFT;
> +		phys_addr_t paddr = (unsigned long)folio_address(folio);
> +		unsigned long vaddr = folio_pos(folio);
>  
> +		/*
> +		 * vaddr is not actually the virtual address, but is
> +		 * congruent to every user mapping.
> +		 */
>  		if (addr_not_cache_congruent(paddr, vaddr))
> -			__flush_dcache_page(paddr, vaddr);
> +			__flush_dcache_pages(paddr, vaddr,
> +						folio_nr_pages(folio));
>  	}
>  }
> +EXPORT_SYMBOL(flush_dcache_folio);
> +
> +void flush_dcache_page(struct page *page)
> +{
> +	return flush_dcache_folio(page_folio(page));
> +}
>  EXPORT_SYMBOL(flush_dcache_page);
>  
>  /*
> @@ -921,18 +931,18 @@ void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len)
>  }
>  
>  /* wrapper to compile time eliminate alignment checks in flush loop */
> -void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr)
> +void __inv_icache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr)
>  {
> -	__ic_line_inv_vaddr(paddr, vaddr, PAGE_SIZE);
> +	__ic_line_inv_vaddr(paddr, vaddr, nr * PAGE_SIZE);
>  }
>  
>  /*
>   * wrapper to clearout kernel or userspace mappings of a page
>   * For kernel mappings @vaddr == @paddr
>   */
> -void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr)
> +void __flush_dcache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr)
>  {
> -	__dc_line_op(paddr, vaddr & PAGE_MASK, PAGE_SIZE, OP_FLUSH_N_INV);
> +	__dc_line_op(paddr, vaddr & PAGE_MASK, nr * PAGE_SIZE, OP_FLUSH_N_INV);
>  }
>  
>  noinline void flush_cache_all(void)
> @@ -962,10 +972,10 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long u_vaddr,
>  
>  	u_vaddr &= PAGE_MASK;
>  
> -	__flush_dcache_page(paddr, u_vaddr);
> +	__flush_dcache_pages(paddr, u_vaddr, 1);
>  
>  	if (vma->vm_flags & VM_EXEC)
> -		__inv_icache_page(paddr, u_vaddr);
> +		__inv_icache_pages(paddr, u_vaddr, 1);
>  }
>  
>  void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
> @@ -978,9 +988,9 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page,
>  		     unsigned long u_vaddr)
>  {
>  	/* TBD: do we really need to clear the kernel mapping */
> -	__flush_dcache_page((phys_addr_t)page_address(page), u_vaddr);
> -	__flush_dcache_page((phys_addr_t)page_address(page),
> -			    (phys_addr_t)page_address(page));
> +	__flush_dcache_pages((phys_addr_t)page_address(page), u_vaddr, 1);
> +	__flush_dcache_pages((phys_addr_t)page_address(page),
> +			    (phys_addr_t)page_address(page), 1);
>  
>  }
>  
> @@ -989,6 +999,8 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page,
>  void copy_user_highpage(struct page *to, struct page *from,
>  	unsigned long u_vaddr, struct vm_area_struct *vma)
>  {
> +	struct folio *src = page_folio(from);
> +	struct folio *dst = page_folio(to);
>  	void *kfrom = kmap_atomic(from);
>  	void *kto = kmap_atomic(to);
>  	int clean_src_k_mappings = 0;
> @@ -1005,7 +1017,7 @@ void copy_user_highpage(struct page *to, struct page *from,
>  	 * addr_not_cache_congruent() is 0
>  	 */
>  	if (page_mapcount(from) && addr_not_cache_congruent(kfrom, u_vaddr)) {
> -		__flush_dcache_page((unsigned long)kfrom, u_vaddr);
> +		__flush_dcache_pages((unsigned long)kfrom, u_vaddr, 1);
>  		clean_src_k_mappings = 1;
>  	}
>  
> @@ -1019,17 +1031,17 @@ void copy_user_highpage(struct page *to, struct page *from,
>  	 * non copied user pages (e.g. read faults which wire in pagecache page
>  	 * directly).
>  	 */
> -	clear_bit(PG_dc_clean, &to->flags);
> +	clear_bit(PG_dc_clean, &dst->flags);
>  
>  	/*
>  	 * if SRC was already usermapped and non-congruent to kernel mapping
>  	 * sync the kernel mapping back to physical page
>  	 */
>  	if (clean_src_k_mappings) {
> -		__flush_dcache_page((unsigned long)kfrom, (unsigned long)kfrom);
> -		set_bit(PG_dc_clean, &from->flags);
> +		__flush_dcache_pages((unsigned long)kfrom,
> +					(unsigned long)kfrom, 1);
>  	} else {
> -		clear_bit(PG_dc_clean, &from->flags);
> +		clear_bit(PG_dc_clean, &src->flags);
>  	}
>  
>  	kunmap_atomic(kto);
> @@ -1038,8 +1050,9 @@ void copy_user_highpage(struct page *to, struct page *from,
>  
>  void clear_user_page(void *to, unsigned long u_vaddr, struct page *page)
>  {
> +	struct folio *folio = page_folio(page);
>  	clear_page(to);
> -	clear_bit(PG_dc_clean, &page->flags);
> +	clear_bit(PG_dc_clean, &folio->flags);
>  }
>  EXPORT_SYMBOL(clear_user_page);
>  
> diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c
> index 5f71445f26bd..0a996b65bb4e 100644
> --- a/arch/arc/mm/tlb.c
> +++ b/arch/arc/mm/tlb.c
> @@ -467,8 +467,8 @@ void create_tlb(struct vm_area_struct *vma, unsigned long vaddr, pte_t *ptep)
>   * Note that flush (when done) involves both WBACK - so physical page is
>   * in sync as well as INV - so any non-congruent aliases don't remain
>   */
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
> -		      pte_t *ptep)
> +void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long vaddr_unaligned, pte_t *ptep, unsigned int nr)
>  {
>  	unsigned long vaddr = vaddr_unaligned & PAGE_MASK;
>  	phys_addr_t paddr = pte_val(*ptep) & PAGE_MASK_PHYS;
> @@ -491,15 +491,19 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
>  	 */
>  	if ((vma->vm_flags & VM_EXEC) ||
>  	     addr_not_cache_congruent(paddr, vaddr)) {
> -
> -		int dirty = !test_and_set_bit(PG_dc_clean, &page->flags);
> +		struct folio *folio = page_folio(page);
> +		int dirty = !test_and_set_bit(PG_dc_clean, &folio->flags);
>  		if (dirty) {
> +			unsigned long offset = offset_in_folio(folio, paddr);
> +			nr = folio_nr_pages(folio);
> +			paddr -= offset;
> +			vaddr -= offset;
>  			/* wback + inv dcache lines (K-mapping) */
> -			__flush_dcache_page(paddr, paddr);
> +			__flush_dcache_pages(paddr, paddr, nr);
>  
>  			/* invalidate any existing icache lines (U-mapping) */
>  			if (vma->vm_flags & VM_EXEC)
> -				__inv_icache_page(paddr, vaddr);
> +				__inv_icache_pages(paddr, vaddr, nr);
>  		}
>  	}
>  }
> @@ -531,7 +535,7 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
>  				 pmd_t *pmd)
>  {
>  	pte_t pte = __pte(pmd_val(*pmd));
> -	update_mmu_cache(vma, addr, &pte);
> +	update_mmu_cache_range(vma, addr, &pte, HPAGE_PMD_NR);
>  }
>  
>  void local_flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
> -- 
> 2.39.2
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 08/36] arm: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 08/36] arm: " Matthew Wilcox (Oracle)
@ 2023-03-15  9:48   ` Mike Rapoport
  2023-03-15 10:56   ` Russell King (Oracle)
  1 sibling, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:48 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Russell King, linux-arm-kernel

On Wed, Mar 15, 2023 at 05:14:16AM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
> flush_icache_pages().  Change the PG_dcache_clean flag from being per-page
> to per-folio which makes __dma_page_dev_to_cpu() a bit more exciting.
> Also add flush_cache_pages(), even though this isn't used by generic code
> (yet?)
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: linux-arm-kernel@lists.infradead.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/arm/include/asm/cacheflush.h | 24 +++++---
>  arch/arm/include/asm/pgtable.h    |  5 +-
>  arch/arm/include/asm/tlbflush.h   | 13 ++--
>  arch/arm/mm/copypage-v4mc.c       |  5 +-
>  arch/arm/mm/copypage-v6.c         |  5 +-
>  arch/arm/mm/copypage-xscale.c     |  5 +-
>  arch/arm/mm/dma-mapping.c         | 24 ++++----
>  arch/arm/mm/fault-armv.c          | 14 ++---
>  arch/arm/mm/flush.c               | 99 +++++++++++++++++++------------
>  arch/arm/mm/mm.h                  |  2 +-
>  arch/arm/mm/mmu.c                 | 14 +++--
>  11 files changed, 125 insertions(+), 85 deletions(-)
> 
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index a094f964c869..841e268d2374 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -231,14 +231,15 @@ vivt_flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
>  					vma->vm_flags);
>  }
>  
> -static inline void
> -vivt_flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
> +static inline void vivt_flush_cache_pages(struct vm_area_struct *vma,
> +		unsigned long user_addr, unsigned long pfn, unsigned int nr)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  
>  	if (!mm || cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
>  		unsigned long addr = user_addr & PAGE_MASK;
> -		__cpuc_flush_user_range(addr, addr + PAGE_SIZE, vma->vm_flags);
> +		__cpuc_flush_user_range(addr, addr + nr * PAGE_SIZE,
> +				vma->vm_flags);
>  	}
>  }
>  
> @@ -247,15 +248,17 @@ vivt_flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsig
>  		vivt_flush_cache_mm(mm)
>  #define flush_cache_range(vma,start,end) \
>  		vivt_flush_cache_range(vma,start,end)
> -#define flush_cache_page(vma,addr,pfn) \
> -		vivt_flush_cache_page(vma,addr,pfn)
> +#define flush_cache_pages(vma, addr, pfn, nr) \
> +		vivt_flush_cache_pages(vma, addr, pfn, nr)
>  #else
> -extern void flush_cache_mm(struct mm_struct *mm);
> -extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
> -extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn);
> +void flush_cache_mm(struct mm_struct *mm);
> +void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
> +void flush_cache_pages(struct vm_area_struct *vma, unsigned long user_addr,
> +		unsigned long pfn, unsigned int nr);
>  #endif
>  
>  #define flush_cache_dup_mm(mm) flush_cache_mm(mm)
> +#define flush_cache_page(vma, addr, pfn) flush_cache_pages(vma, addr, pfn, 1)
>  
>  /*
>   * flush_icache_user_range is used when we want to ensure that the
> @@ -289,7 +292,9 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
>   * See update_mmu_cache for the user space part.
>   */
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -extern void flush_dcache_page(struct page *);
> +void flush_dcache_page(struct page *);
> +void flush_dcache_folio(struct folio *folio);
> +#define flush_dcache_folio flush_dcache_folio
>  
>  #define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
>  static inline void flush_kernel_vmap_range(void *addr, int size)
> @@ -321,6 +326,7 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
>   * duplicate cache flushing elsewhere performed by flush_dcache_page().
>   */
>  #define flush_icache_page(vma,page)	do { } while (0)
> +#define flush_icache_pages(vma, page, nr)	do { } while (0)
>  
>  /*
>   * flush_cache_vmap() is used when creating mappings (eg, via vmap,
> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> index a58ccbb406ad..841001ab495c 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -207,8 +207,9 @@ static inline void __sync_icache_dcache(pte_t pteval)
>  extern void __sync_icache_dcache(pte_t pteval);
>  #endif
>  
> -void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -		      pte_t *ptep, pte_t pteval);
> +void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		      pte_t *ptep, pte_t pteval, unsigned int nr);
> +#define set_ptes set_ptes
>  
>  static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
>  {
> diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
> index 0ccc985b90af..7d792e485f4f 100644
> --- a/arch/arm/include/asm/tlbflush.h
> +++ b/arch/arm/include/asm/tlbflush.h
> @@ -619,18 +619,21 @@ extern void flush_bp_all(void);
>   * If PG_dcache_clean is not set for the page, we need to ensure that any
>   * cache entries for the kernels virtual memory range are written
>   * back to the page. On ARMv6 and later, the cache coherency is handled via
> - * the set_pte_at() function.
> + * the set_ptes() function.
>   */
>  #if __LINUX_ARM_ARCH__ < 6
> -extern void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
> -	pte_t *ptep);
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
> +		pte_t *ptep, unsigned int nr);
>  #else
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -				    unsigned long addr, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long addr, pte_t *ptep, unsigned int nr)
>  {
>  }
>  #endif
>  
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
> +
>  #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
>  
>  #endif
> diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
> index f1da3b439b96..7ddd82b9fe8b 100644
> --- a/arch/arm/mm/copypage-v4mc.c
> +++ b/arch/arm/mm/copypage-v4mc.c
> @@ -64,10 +64,11 @@ static void mc_copy_user_page(void *from, void *to)
>  void v4_mc_copy_user_highpage(struct page *to, struct page *from,
>  	unsigned long vaddr, struct vm_area_struct *vma)
>  {
> +	struct folio *src = page_folio(from);
>  	void *kto = kmap_atomic(to);
>  
> -	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
> -		__flush_dcache_page(page_mapping_file(from), from);
> +	if (!test_and_set_bit(PG_dcache_clean, &src->flags))
> +		__flush_dcache_folio(folio_flush_mapping(src), src);
>  
>  	raw_spin_lock(&minicache_lock);
>  
> diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
> index d8a115de5507..a1a71f36d850 100644
> --- a/arch/arm/mm/copypage-v6.c
> +++ b/arch/arm/mm/copypage-v6.c
> @@ -69,11 +69,12 @@ static void discard_old_kernel_data(void *kto)
>  static void v6_copy_user_highpage_aliasing(struct page *to,
>  	struct page *from, unsigned long vaddr, struct vm_area_struct *vma)
>  {
> +	struct folio *src = page_folio(from);
>  	unsigned int offset = CACHE_COLOUR(vaddr);
>  	unsigned long kfrom, kto;
>  
> -	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
> -		__flush_dcache_page(page_mapping_file(from), from);
> +	if (!test_and_set_bit(PG_dcache_clean, &src->flags))
> +		__flush_dcache_folio(folio_flush_mapping(src), src);
>  
>  	/* FIXME: not highmem safe */
>  	discard_old_kernel_data(page_address(to));
> diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
> index bcb485620a05..f1e29d3e8193 100644
> --- a/arch/arm/mm/copypage-xscale.c
> +++ b/arch/arm/mm/copypage-xscale.c
> @@ -84,10 +84,11 @@ static void mc_copy_user_page(void *from, void *to)
>  void xscale_mc_copy_user_highpage(struct page *to, struct page *from,
>  	unsigned long vaddr, struct vm_area_struct *vma)
>  {
> +	struct folio *src = page_folio(from);
>  	void *kto = kmap_atomic(to);
>  
> -	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
> -		__flush_dcache_page(page_mapping_file(from), from);
> +	if (!test_and_set_bit(PG_dcache_clean, &src->flags))
> +		__flush_dcache_folio(folio_flush_mapping(src), src);
>  
>  	raw_spin_lock(&minicache_lock);
>  
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 8bc01071474a..5ecfde41d70a 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -693,6 +693,7 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
>  static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
>  	size_t size, enum dma_data_direction dir)
>  {
> +	struct folio *folio = page_folio(page);
>  	phys_addr_t paddr = page_to_phys(page) + off;
>  
>  	/* FIXME: non-speculating: not required */
> @@ -707,19 +708,18 @@ static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
>  	 * Mark the D-cache clean for these pages to avoid extra flushing.
>  	 */
>  	if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
> -		unsigned long pfn;
> -		size_t left = size;
> -
> -		pfn = page_to_pfn(page) + off / PAGE_SIZE;
> -		off %= PAGE_SIZE;
> -		if (off) {
> -			pfn++;
> -			left -= PAGE_SIZE - off;
> +		ssize_t left = size;
> +		size_t offset = offset_in_folio(folio, paddr);
> +
> +		if (offset) {
> +			left -= folio_size(folio) - offset;
> +			folio = folio_next(folio);
>  		}
> -		while (left >= PAGE_SIZE) {
> -			page = pfn_to_page(pfn++);
> -			set_bit(PG_dcache_clean, &page->flags);
> -			left -= PAGE_SIZE;
> +
> +		while (left >= (ssize_t)folio_size(folio)) {
> +			set_bit(PG_dcache_clean, &folio->flags);
> +			left -= folio_size(folio);
> +			folio = folio_next(folio);
>  		}
>  	}
>  }
> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
> index 0e49154454a6..e2c869b8f012 100644
> --- a/arch/arm/mm/fault-armv.c
> +++ b/arch/arm/mm/fault-armv.c
> @@ -178,8 +178,8 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
>   *
>   * Note that the pte lock will be held.
>   */
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
> -	pte_t *ptep)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
> +		pte_t *ptep, unsigned int nr)
>  {
>  	unsigned long pfn = pte_pfn(*ptep);
>  	struct address_space *mapping;
> @@ -192,13 +192,13 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
>  	 * The zero page is never written to, so never has any dirty
>  	 * cache lines, and therefore never needs to be flushed.
>  	 */
> -	page = pfn_to_page(pfn);
> -	if (page == ZERO_PAGE(0))
> +	if (is_zero_pfn(pfn))
>  		return;
>  
> -	mapping = page_mapping_file(page);
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> -		__flush_dcache_page(mapping, page);
> +	folio = page_folio(pfn_to_page(pfn));
> +	mapping = folio_flush_mapping(folio);
> +	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
> +		__flush_dcache_folio(mapping, folio);
>  	if (mapping) {
>  		if (cache_is_vivt())
>  			make_coherent(mapping, vma, addr, ptep, pfn);
> diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
> index 7ff9feea13a6..07ea0ab51099 100644
> --- a/arch/arm/mm/flush.c
> +++ b/arch/arm/mm/flush.c
> @@ -95,10 +95,10 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
>  		__flush_icache_all();
>  }
>  
> -void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
> +void flush_cache_pages(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn, unsigned int nr)
>  {
>  	if (cache_is_vivt()) {
> -		vivt_flush_cache_page(vma, user_addr, pfn);
> +		vivt_flush_cache_pages(vma, user_addr, pfn, nr);
>  		return;
>  	}
>  
> @@ -196,29 +196,31 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
>  #endif
>  }
>  
> -void __flush_dcache_page(struct address_space *mapping, struct page *page)
> +void __flush_dcache_folio(struct address_space *mapping, struct folio *folio)
>  {
>  	/*
>  	 * Writeback any data associated with the kernel mapping of this
>  	 * page.  This ensures that data in the physical page is mutually
>  	 * coherent with the kernels mapping.
>  	 */
> -	if (!PageHighMem(page)) {
> -		__cpuc_flush_dcache_area(page_address(page), page_size(page));
> +	if (!folio_test_highmem(folio)) {
> +		__cpuc_flush_dcache_area(folio_address(folio),
> +					folio_size(folio));
>  	} else {
>  		unsigned long i;
>  		if (cache_is_vipt_nonaliasing()) {
> -			for (i = 0; i < compound_nr(page); i++) {
> -				void *addr = kmap_atomic(page + i);
> +			for (i = 0; i < folio_nr_pages(folio); i++) {
> +				void *addr = kmap_local_folio(folio,
> +								i * PAGE_SIZE);
>  				__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> -				kunmap_atomic(addr);
> +				kunmap_local(addr);
>  			}
>  		} else {
> -			for (i = 0; i < compound_nr(page); i++) {
> -				void *addr = kmap_high_get(page + i);
> +			for (i = 0; i < folio_nr_pages(folio); i++) {
> +				void *addr = kmap_high_get(folio_page(folio, i));
>  				if (addr) {
>  					__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> -					kunmap_high(page + i);
> +					kunmap_high(folio_page(folio, i));
>  				}
>  			}
>  		}
> @@ -230,15 +232,14 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page)
>  	 * userspace colour, which is congruent with page->index.
>  	 */
>  	if (mapping && cache_is_vipt_aliasing())
> -		flush_pfn_alias(page_to_pfn(page),
> -				page->index << PAGE_SHIFT);
> +		flush_pfn_alias(folio_pfn(folio), folio_pos(folio));
>  }
>  
> -static void __flush_dcache_aliases(struct address_space *mapping, struct page *page)
> +static void __flush_dcache_aliases(struct address_space *mapping, struct folio *folio)
>  {
>  	struct mm_struct *mm = current->active_mm;
> -	struct vm_area_struct *mpnt;
> -	pgoff_t pgoff;
> +	struct vm_area_struct *vma;
> +	pgoff_t pgoff, pgoff_end;
>  
>  	/*
>  	 * There are possible user space mappings of this page:
> @@ -246,21 +247,36 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
>  	 *   data in the current VM view associated with this page.
>  	 * - aliasing VIPT: we only need to find one mapping of this page.
>  	 */
> -	pgoff = page->index;
> +	pgoff = folio->index;
> +	pgoff_end = pgoff + folio_nr_pages(folio) - 1;
>  
>  	flush_dcache_mmap_lock(mapping);
> -	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
> -		unsigned long offset;
> +	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff_end) {
> +		unsigned long start, offset, pfn;
> +		unsigned int nr;
>  
>  		/*
>  		 * If this VMA is not in our MM, we can ignore it.
>  		 */
> -		if (mpnt->vm_mm != mm)
> +		if (vma->vm_mm != mm)
>  			continue;
> -		if (!(mpnt->vm_flags & VM_MAYSHARE))
> +		if (!(vma->vm_flags & VM_MAYSHARE))
>  			continue;
> -		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
> -		flush_cache_page(mpnt, mpnt->vm_start + offset, page_to_pfn(page));
> +
> +		start = vma->vm_start;
> +		pfn = folio_pfn(folio);
> +		nr = folio_nr_pages(folio);
> +		offset = pgoff - vma->vm_pgoff;
> +		if (offset > -nr) {
> +			pfn -= offset;
> +			nr += offset;
> +		} else {
> +			start += offset * PAGE_SIZE;
> +		}
> +		if (start + nr * PAGE_SIZE > vma->vm_end)
> +			nr = (vma->vm_end - start) / PAGE_SIZE;
> +
> +		flush_cache_pages(vma, start, pfn, nr);
>  	}
>  	flush_dcache_mmap_unlock(mapping);
>  }
> @@ -269,7 +285,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
>  void __sync_icache_dcache(pte_t pteval)
>  {
>  	unsigned long pfn;
> -	struct page *page;
> +	struct folio *folio;
>  	struct address_space *mapping;
>  
>  	if (cache_is_vipt_nonaliasing() && !pte_exec(pteval))
> @@ -279,14 +295,14 @@ void __sync_icache_dcache(pte_t pteval)
>  	if (!pfn_valid(pfn))
>  		return;
>  
> -	page = pfn_to_page(pfn);
> +	folio = page_folio(pfn_to_page(pfn));
>  	if (cache_is_vipt_aliasing())
> -		mapping = page_mapping_file(page);
> +		mapping = folio_flush_mapping(folio);
>  	else
>  		mapping = NULL;
>  
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> -		__flush_dcache_page(mapping, page);
> +	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
> +		__flush_dcache_folio(mapping, folio);
>  
>  	if (pte_exec(pteval))
>  		__flush_icache_all();
> @@ -312,7 +328,7 @@ void __sync_icache_dcache(pte_t pteval)
>   * Note that we disable the lazy flush for SMP configurations where
>   * the cache maintenance operations are not automatically broadcasted.
>   */
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
>  	struct address_space *mapping;
>  
> @@ -320,31 +336,36 @@ void flush_dcache_page(struct page *page)
>  	 * The zero page is never written to, so never has any dirty
>  	 * cache lines, and therefore never needs to be flushed.
>  	 */
> -	if (page == ZERO_PAGE(0))
> +	if (is_zero_pfn(folio_pfn(folio)))
>  		return;
>  
>  	if (!cache_ops_need_broadcast() && cache_is_vipt_nonaliasing()) {
> -		if (test_bit(PG_dcache_clean, &page->flags))
> -			clear_bit(PG_dcache_clean, &page->flags);
> +		if (test_bit(PG_dcache_clean, &folio->flags))
> +			clear_bit(PG_dcache_clean, &folio->flags);
>  		return;
>  	}
>  
> -	mapping = page_mapping_file(page);
> +	mapping = folio_flush_mapping(folio);
>  
>  	if (!cache_ops_need_broadcast() &&
> -	    mapping && !page_mapcount(page))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +	    mapping && !folio_mapped(folio))
> +		clear_bit(PG_dcache_clean, &folio->flags);
>  	else {
> -		__flush_dcache_page(mapping, page);
> +		__flush_dcache_folio(mapping, folio);
>  		if (mapping && cache_is_vivt())
> -			__flush_dcache_aliases(mapping, page);
> +			__flush_dcache_aliases(mapping, folio);
>  		else if (mapping)
>  			__flush_icache_all();
> -		set_bit(PG_dcache_clean, &page->flags);
> +		set_bit(PG_dcache_clean, &folio->flags);
>  	}
>  }
> -EXPORT_SYMBOL(flush_dcache_page);
> +EXPORT_SYMBOL(flush_dcache_folio);
>  
> +void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
> +EXPORT_SYMBOL(flush_dcache_page);
>  /*
>   * Flush an anonymous page so that users of get_user_pages()
>   * can safely access the data.  The expected sequence is:
> diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
> index d7ffccb7fea7..419316316711 100644
> --- a/arch/arm/mm/mm.h
> +++ b/arch/arm/mm/mm.h
> @@ -45,7 +45,7 @@ struct mem_type {
>  
>  const struct mem_type *get_mem_type(unsigned int type);
>  
> -extern void __flush_dcache_page(struct address_space *mapping, struct page *page);
> +void __flush_dcache_folio(struct address_space *mapping, struct folio *folio);
>  
>  /*
>   * ARM specific vm_struct->flags bits.
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 463fc2a8448f..9947bbc32b04 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -1788,7 +1788,7 @@ void __init paging_init(const struct machine_desc *mdesc)
>  	bootmem_init();
>  
>  	empty_zero_page = virt_to_page(zero_page);
> -	__flush_dcache_page(NULL, empty_zero_page);
> +	__flush_dcache_folio(NULL, page_folio(empty_zero_page));
>  }
>  
>  void __init early_mm_init(const struct machine_desc *mdesc)
> @@ -1797,8 +1797,8 @@ void __init early_mm_init(const struct machine_desc *mdesc)
>  	early_paging_init(mdesc);
>  }
>  
> -void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pteval)
> +void set_ptes(struct mm_struct *mm, unsigned long addr,
> +			      pte_t *ptep, pte_t pteval, unsigned int nr)
>  {
>  	unsigned long ext = 0;
>  
> @@ -1808,5 +1808,11 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr,
>  		ext |= PTE_EXT_NG;
>  	}
>  
> -	set_pte_ext(ptep, pteval, ext);
> +	for (;;) {
> +		set_pte_ext(ptep, pteval, ext);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte_val(pteval) += PAGE_SIZE;
> +	}
>  }
> -- 
> 2.39.2
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
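
The subtlest hunk in the arm patch is the clamping in
__flush_dcache_aliases(): a large folio can begin before a VMA or run
past its end, so the (start, pfn, nr) triple must be trimmed on both
sides.  Note that offset = pgoff - vma->vm_pgoff is computed in unsigned
arithmetic, so the "offset > -nr" test is really asking whether offset is
negative (the interval tree already guarantees |offset| < nr when it is).
The same logic in signed form, as a sketch:

	/* Sketch of the clamping, using signed arithmetic for clarity. */
	long offset = pgoff - vma->vm_pgoff;	/* folio start vs VMA start */

	if (offset < 0) {		/* folio begins before the VMA */
		pfn -= offset;		/* advance to the VMA's first page */
		nr += offset;		/* and shorten the range */
	} else {			/* folio begins inside the VMA */
		start += offset * PAGE_SIZE;
	}
	if (start + nr * PAGE_SIZE > vma->vm_end)	/* folio overruns it */
		nr = (vma->vm_end - start) / PAGE_SIZE;

For example, a 16-page folio at pgoff 0 against a VMA with vm_pgoff 4
gives offset = -4, so pfn advances by 4 and nr drops to 12; against a
4-page VMA with vm_pgoff 0, the vm_end check trims nr to 4 instead.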

* Re: [PATCH v4 09/36] arm64: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 09/36] arm64: " Matthew Wilcox (Oracle)
@ 2023-03-15  9:49   ` Mike Rapoport
  2023-05-25  3:35   ` Anshuman Khandual
  1 sibling, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:49 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Catalin Marinas, linux-arm-kernel

On Wed, Mar 15, 2023 at 05:14:17AM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_dcache_clean flag from being per-page to per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/arm64/include/asm/cacheflush.h |  4 +++-
>  arch/arm64/include/asm/pgtable.h    | 25 ++++++++++++++------
>  arch/arm64/mm/flush.c               | 36 +++++++++++------------------
>  3 files changed, 35 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index 37185e978aeb..d115451ed263 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -114,7 +114,7 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
>  #define copy_to_user_page copy_to_user_page
>  
>  /*
> - * flush_dcache_page is used when the kernel has written to the page
> + * flush_dcache_folio is used when the kernel has written to the page
>   * cache page at virtual address page->virtual.
>   *
>   * If this page isn't mapped (ie, page_mapping == NULL), or it might
> @@ -127,6 +127,8 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
>   */
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  extern void flush_dcache_page(struct page *);
> +void flush_dcache_folio(struct folio *);
> +#define flush_dcache_folio flush_dcache_folio
>  
>  static __always_inline void icache_inval_all_pou(void)
>  {
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 9428748f4691..6fd012663a01 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -355,12 +355,21 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  	set_pte(ptep, pte);
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pte)
> -{
> -	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
> -	return __set_pte_at(mm, addr, ptep, pte);
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +			      pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
> +
> +	for (;;) {
> +		__set_pte_at(mm, addr, ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		addr += PAGE_SIZE;
> +		pte_val(pte) += PAGE_SIZE;
> +	}
>  }
> +#define set_ptes set_ptes
>  
>  /*
>   * Huge pte definitions.
> @@ -1059,8 +1068,8 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
>  /*
>   * On AArch64, the cache coherency is handled via the set_pte_at() function.
>   */
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -				    unsigned long addr, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long addr, pte_t *ptep, unsigned int nr)
>  {
>  	/*
>  	 * We don't do anything here, so there's a very small chance of
> @@ -1069,6 +1078,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  	 */
>  }
>  
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
>  
>  #ifdef CONFIG_ARM64_PA_BITS_52
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index 5f9379b3c8c8..deb781af0a3a 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -50,20 +50,13 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
>  
>  void __sync_icache_dcache(pte_t pte)
>  {
> -	struct page *page = pte_page(pte);
> +	struct folio *folio = page_folio(pte_page(pte));
>  
> -	/*
> -	 * HugeTLB pages are always fully mapped, so only setting head page's
> -	 * PG_dcache_clean flag is enough.
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> -
> -	if (!test_bit(PG_dcache_clean, &page->flags)) {
> -		sync_icache_aliases((unsigned long)page_address(page),
> -				    (unsigned long)page_address(page) +
> -					    page_size(page));
> -		set_bit(PG_dcache_clean, &page->flags);
> +	if (!test_bit(PG_dcache_clean, &folio->flags)) {
> +		sync_icache_aliases((unsigned long)folio_address(folio),
> +				    (unsigned long)folio_address(folio) +
> +					    folio_size(folio));
> +		set_bit(PG_dcache_clean, &folio->flags);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(__sync_icache_dcache);
> @@ -73,17 +66,16 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache);
>   * it as dirty for later flushing when mapped in user space (if executable,
>   * see __sync_icache_dcache).
>   */
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
> -	/*
> -	 * HugeTLB pages are always fully mapped and only head page will be
> -	 * set PG_dcache_clean (see comments in __sync_icache_dcache()).
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> +	if (test_bit(PG_dcache_clean, &folio->flags))
> +		clear_bit(PG_dcache_clean, &folio->flags);
> +}
> +EXPORT_SYMBOL(flush_dcache_folio);
>  
> -	if (test_bit(PG_dcache_clean, &page->flags))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
>  }
>  EXPORT_SYMBOL(flush_dcache_page);
>  
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
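
The arm64 side is the cleanest illustration of the PG_dcache_clean
handshake the whole series relies on: flush_dcache_folio() only records
"possibly dirty" by clearing the bit, and __sync_icache_dcache() pays for
the actual sync once per folio, when it finds the bit still clear.  In
outline (a sketch of the protocol, condensed from the code above):

	/* Writer side: the kernel dirtied the page cache; defer the work. */
	clear_bit(PG_dcache_clean, &folio->flags);

	/* Fault side: first executable mapping of the folio syncs it. */
	if (!test_bit(PG_dcache_clean, &folio->flags)) {
		unsigned long start = (unsigned long)folio_address(folio);

		sync_icache_aliases(start, start + folio_size(folio));
		set_bit(PG_dcache_clean, &folio->flags);
	}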

* Re: [PATCH v4 10/36] csky: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 10/36] csky: " Matthew Wilcox (Oracle)
@ 2023-03-15  9:50   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:50 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Guo Ren, linux-csky

On Wed, Mar 15, 2023 at 05:14:18AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT, update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_dcache_clean flag from being per-page to per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Acked-by: Guo Ren <guoren@kernel.org>
> Cc: linux-csky@vger.kernel.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/csky/abiv1/cacheflush.c         | 32 +++++++++++++++++-----------
>  arch/csky/abiv1/inc/abi/cacheflush.h |  2 ++
>  arch/csky/abiv2/cacheflush.c         | 32 ++++++++++++++--------------
>  arch/csky/abiv2/inc/abi/cacheflush.h | 10 +++++++--
>  arch/csky/include/asm/pgtable.h      |  8 ++++---
>  5 files changed, 50 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/csky/abiv1/cacheflush.c b/arch/csky/abiv1/cacheflush.c
> index fb91b069dc69..ba43f6c26b4f 100644
> --- a/arch/csky/abiv1/cacheflush.c
> +++ b/arch/csky/abiv1/cacheflush.c
> @@ -14,43 +14,49 @@
>  
>  #define PG_dcache_clean		PG_arch_1
>  
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
>  	struct address_space *mapping;
>  
> -	if (page == ZERO_PAGE(0))
> +	if (is_zero_pfn(folio_pfn(folio)))
>  		return;
>  
> -	mapping = page_mapping_file(page);
> +	mapping = folio_flush_mapping(folio);
>  
> -	if (mapping && !page_mapcount(page))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +	if (mapping && !folio_mapped(folio))
> +		clear_bit(PG_dcache_clean, &folio->flags);
>  	else {
>  		dcache_wbinv_all();
>  		if (mapping)
>  			icache_inv_all();
> -		set_bit(PG_dcache_clean, &page->flags);
> +		set_bit(PG_dcache_clean, &folio->flags);
>  	}
>  }
> +EXPORT_SYMBOL(flush_dcache_folio);
> +
> +void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
>  EXPORT_SYMBOL(flush_dcache_page);
>  
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
> -	pte_t *ptep)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
> +		pte_t *ptep, unsigned int nr)
>  {
>  	unsigned long pfn = pte_pfn(*ptep);
> -	struct page *page;
> +	struct folio *folio;
>  
>  	if (!pfn_valid(pfn))
>  		return;
>  
> -	page = pfn_to_page(pfn);
> -	if (page == ZERO_PAGE(0))
> +	if (is_zero_pfn(pfn))
>  		return;
>  
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> +	folio = page_folio(pfn_to_page(pfn));
> +	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
>  		dcache_wbinv_all();
>  
> -	if (page_mapping_file(page)) {
> +	if (folio_flush_mapping(folio)) {
>  		if (vma->vm_flags & VM_EXEC)
>  			icache_inv_all();
>  	}
> diff --git a/arch/csky/abiv1/inc/abi/cacheflush.h b/arch/csky/abiv1/inc/abi/cacheflush.h
> index ed62e2066ba7..0d6cb65624c4 100644
> --- a/arch/csky/abiv1/inc/abi/cacheflush.h
> +++ b/arch/csky/abiv1/inc/abi/cacheflush.h
> @@ -9,6 +9,8 @@
>  
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  extern void flush_dcache_page(struct page *);
> +void flush_dcache_folio(struct folio *);
> +#define flush_dcache_folio flush_dcache_folio
>  
>  #define flush_cache_mm(mm)			dcache_wbinv_all()
>  #define flush_cache_page(vma, page, pfn)	cache_wbinv_all()
> diff --git a/arch/csky/abiv2/cacheflush.c b/arch/csky/abiv2/cacheflush.c
> index 39c51399dd81..622e5b1b3f8a 100644
> --- a/arch/csky/abiv2/cacheflush.c
> +++ b/arch/csky/abiv2/cacheflush.c
> @@ -6,30 +6,30 @@
>  #include <linux/mm.h>
>  #include <asm/cache.h>
>  
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
> -		      pte_t *pte)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *pte, unsigned int nr)
>  {
> -	unsigned long addr;
> -	struct page *page;
> +	unsigned long pfn = pte_pfn(*pte);
> +	struct folio *folio;
> +	unsigned int i;
>  
> -	if (!pfn_valid(pte_pfn(*pte)))
> +	if (!pfn_valid(pfn) || is_zero_pfn(pfn))
>  		return;
>  
> -	page = pfn_to_page(pte_pfn(*pte));
> -	if (page == ZERO_PAGE(0))
> -		return;
> +	folio = page_folio(pfn_to_page(pfn));
>  
> -	if (test_and_set_bit(PG_dcache_clean, &page->flags))
> +	if (test_and_set_bit(PG_dcache_clean, &folio->flags))
>  		return;
>  
> -	addr = (unsigned long) kmap_atomic(page);
> -
> -	dcache_wb_range(addr, addr + PAGE_SIZE);
> +	for (i = 0; i < folio_nr_pages(folio); i++) {
> +		unsigned long addr = (unsigned long) kmap_local_folio(folio,
> +								i * PAGE_SIZE);
>  
> -	if (vma->vm_flags & VM_EXEC)
> -		icache_inv_range(addr, addr + PAGE_SIZE);
> -
> -	kunmap_atomic((void *) addr);
> +		dcache_wb_range(addr, addr + PAGE_SIZE);
> +		if (vma->vm_flags & VM_EXEC)
> +			icache_inv_range(addr, addr + PAGE_SIZE);
> +		kunmap_local((void *) addr);
> +	}
>  }
>  
>  void flush_icache_deferred(struct mm_struct *mm)
> diff --git a/arch/csky/abiv2/inc/abi/cacheflush.h b/arch/csky/abiv2/inc/abi/cacheflush.h
> index a565e00c3f70..9c728933a776 100644
> --- a/arch/csky/abiv2/inc/abi/cacheflush.h
> +++ b/arch/csky/abiv2/inc/abi/cacheflush.h
> @@ -18,11 +18,17 @@
>  
>  #define PG_dcache_clean		PG_arch_1
>  
> +static inline void flush_dcache_folio(struct folio *folio)
> +{
> +	if (test_bit(PG_dcache_clean, &folio->flags))
> +		clear_bit(PG_dcache_clean, &folio->flags);
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  static inline void flush_dcache_page(struct page *page)
>  {
> -	if (test_bit(PG_dcache_clean, &page->flags))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +	flush_dcache_folio(page_folio(page));
>  }
>  
>  #define flush_dcache_mmap_lock(mapping)		do { } while (0)
> diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
> index d4042495febc..8cd27104f408 100644
> --- a/arch/csky/include/asm/pgtable.h
> +++ b/arch/csky/include/asm/pgtable.h
> @@ -28,6 +28,7 @@
>  #define pgd_ERROR(e) \
>  	pr_err("%s:%d: bad pgd %08lx.\n", __FILE__, __LINE__, pgd_val(e))
>  
> +#define PFN_PTE_SHIFT	PAGE_SHIFT
>  #define pmd_pfn(pmd)	(pmd_phys(pmd) >> PAGE_SHIFT)
>  #define pmd_page(pmd)	(pfn_to_page(pmd_phys(pmd) >> PAGE_SHIFT))
>  #define pte_clear(mm, addr, ptep)	set_pte((ptep), \
> @@ -90,7 +91,6 @@ static inline void set_pte(pte_t *p, pte_t pte)
>  	/* prevent out of order execution */
>  	smp_mb();
>  }
> -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
>  
>  static inline pte_t *pmd_page_vaddr(pmd_t pmd)
>  {
> @@ -263,8 +263,10 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>  extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
>  extern void paging_init(void);
>  
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
> -		      pte_t *pte);
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *pte, unsigned int nr);
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  
>  #define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
>  	remap_pfn_range(vma, vaddr, pfn, size, prot)
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
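
One detail in the csky abiv2 hunk carries over to several later patches:
kmap_local_folio() takes a byte offset into the folio rather than a page
pointer, which is why the loop passes i * PAGE_SIZE.  The general shape
of a highmem-safe per-page walk over a folio is therefore (sketch):

	unsigned int i;

	for (i = 0; i < folio_nr_pages(folio); i++) {
		void *addr = kmap_local_folio(folio, i * PAGE_SIZE);

		/* exactly one page of the folio is mapped at addr */
		dcache_wb_range((unsigned long)addr,
				(unsigned long)addr + PAGE_SIZE);
		kunmap_local(addr);
	}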

* Re: [PATCH v4 11/36] hexagon: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 11/36] hexagon: " Matthew Wilcox (Oracle)
@ 2023-03-15  9:54   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:54 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel, Brian Cain

On Wed, Mar 15, 2023 at 05:14:19AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT and update_mmu_cache_range().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Acked-by: Brian Cain <bcain@quicinc.com>

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/hexagon/include/asm/cacheflush.h | 7 +++++--
>  arch/hexagon/include/asm/pgtable.h    | 9 +--------
>  2 files changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/hexagon/include/asm/cacheflush.h b/arch/hexagon/include/asm/cacheflush.h
> index 6eff0730e6ef..63ca314ede89 100644
> --- a/arch/hexagon/include/asm/cacheflush.h
> +++ b/arch/hexagon/include/asm/cacheflush.h
> @@ -58,12 +58,15 @@ extern void flush_cache_all_hexagon(void);
>   * clean the cache when the PTE is set.
>   *
>   */
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -					unsigned long address, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr)
>  {
>  	/*  generic_ptrace_pokedata doesn't wind up here, does it?  */
>  }
>  
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
> +
>  void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
>  		       unsigned long vaddr, void *dst, void *src, int len);
>  #define copy_to_user_page copy_to_user_page
> diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h
> index 59393613d086..dd05dd71b8ec 100644
> --- a/arch/hexagon/include/asm/pgtable.h
> +++ b/arch/hexagon/include/asm/pgtable.h
> @@ -338,6 +338,7 @@ static inline int pte_exec(pte_t pte)
>  /* __swp_entry_to_pte - extract PTE from swap entry */
>  #define __swp_entry_to_pte(x) ((pte_t) { (x).val })
>  
> +#define PFN_PTE_SHIFT	PAGE_SHIFT
>  /* pfn_pte - convert page number and protection value to page table entry */
>  #define pfn_pte(pfn, pgprot) __pte((pfn << PAGE_SHIFT) | pgprot_val(pgprot))
>  
> @@ -345,14 +346,6 @@ static inline int pte_exec(pte_t pte)
>  #define pte_pfn(pte) (pte_val(pte) >> PAGE_SHIFT)
>  #define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))
>  
> -/*
> - * set_pte_at - update page table and do whatever magic may be
> - * necessary to make the underlying hardware/firmware take note.
> - *
> - * VM may require a virtual instruction to alert the MMU.
> - */
> -#define set_pte_at(mm, addr, ptep, pte) set_pte(ptep, pte)
> -
>  static inline unsigned long pmd_page_vaddr(pmd_t pmd)
>  {
>  	return (unsigned long)__va(pmd_val(pmd) & PAGE_MASK);
> -- 
> 2.39.2
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
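
Hexagon can drop set_pte_at() entirely because defining PFN_PTE_SHIFT is
all the generic set_ptes() fallback from patch 05/36 needs in order to
walk a folio by itself.  Simplified sketch of that fallback (the real
helper also calls page_table_check_ptes_set(); see patch 01/36):

	static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
			pte_t *ptep, pte_t pte, unsigned int nr)
	{
		for (;;) {
			set_pte(ptep, pte);
			if (--nr == 0)
				break;
			ptep++;
			pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
		}
	}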

* Re: [PATCH v4 12/36] ia64: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 12/36] ia64: " Matthew Wilcox (Oracle)
@ 2023-03-15  9:55   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:55 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel, linux-ia64

On Wed, Mar 15, 2023 at 05:14:20AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT, update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_arch_1 (aka PG_dcache_clean) flag from being per-page to
> per-folio, which makes arch_dma_mark_clean() and mark_clean() a little
> more exciting.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: linux-ia64@vger.kernel.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/ia64/hp/common/sba_iommu.c    | 26 +++++++++++++++-----------
>  arch/ia64/include/asm/cacheflush.h | 14 ++++++++++----
>  arch/ia64/include/asm/pgtable.h    |  4 ++--
>  arch/ia64/mm/init.c                | 28 +++++++++++++++++++---------
>  4 files changed, 46 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
> index 8ad6946521d8..48d475f10003 100644
> --- a/arch/ia64/hp/common/sba_iommu.c
> +++ b/arch/ia64/hp/common/sba_iommu.c
> @@ -798,22 +798,26 @@ sba_io_pdir_entry(u64 *pdir_ptr, unsigned long vba)
>  #endif
>  
>  #ifdef ENABLE_MARK_CLEAN
> -/**
> +/*
>   * Since DMA is i-cache coherent, any (complete) pages that were written via
>   * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
>   * flush them when they get mapped into an executable vm-area.
>   */
> -static void
> -mark_clean (void *addr, size_t size)
> +static void mark_clean(void *addr, size_t size)
>  {
> -	unsigned long pg_addr, end;
> -
> -	pg_addr = PAGE_ALIGN((unsigned long) addr);
> -	end = (unsigned long) addr + size;
> -	while (pg_addr + PAGE_SIZE <= end) {
> -		struct page *page = virt_to_page((void *)pg_addr);
> -		set_bit(PG_arch_1, &page->flags);
> -		pg_addr += PAGE_SIZE;
> +	struct folio *folio = virt_to_folio(addr);
> +	ssize_t left = size;
> +	size_t offset = offset_in_folio(folio, addr);
> +
> +	if (offset) {
> +		left -= folio_size(folio) - offset;
> +		folio = folio_next(folio);
> +	}
> +
> +	while (left >= (ssize_t)folio_size(folio)) {
> +		set_bit(PG_arch_1, &folio->flags);
> +		left -= folio_size(folio);
> +		folio = folio_next(folio);
>  	}
>  }
>  #endif
> diff --git a/arch/ia64/include/asm/cacheflush.h b/arch/ia64/include/asm/cacheflush.h
> index 708c0fa5d975..eac493fa9e0d 100644
> --- a/arch/ia64/include/asm/cacheflush.h
> +++ b/arch/ia64/include/asm/cacheflush.h
> @@ -13,10 +13,16 @@
>  #include <asm/page.h>
>  
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -#define flush_dcache_page(page)			\
> -do {						\
> -	clear_bit(PG_arch_1, &(page)->flags);	\
> -} while (0)
> +static inline void flush_dcache_folio(struct folio *folio)
> +{
> +	clear_bit(PG_arch_1, &folio->flags);
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
>  
>  extern void flush_icache_range(unsigned long start, unsigned long end);
>  #define flush_icache_range flush_icache_range
> diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
> index 21c97e31a28a..5450d59e4fb9 100644
> --- a/arch/ia64/include/asm/pgtable.h
> +++ b/arch/ia64/include/asm/pgtable.h
> @@ -206,6 +206,7 @@ ia64_phys_addr_valid (unsigned long addr)
>  #define RGN_MAP_SHIFT (PGDIR_SHIFT + PTRS_PER_PGD_SHIFT - 3)
>  #define RGN_MAP_LIMIT	((1UL << RGN_MAP_SHIFT) - PAGE_SIZE)	/* per region addr limit */
>  
> +#define PFN_PTE_SHIFT	PAGE_SHIFT
>  /*
>   * Conversion functions: convert page frame number (pfn) and a protection value to a page
>   * table entry (pte).
> @@ -303,8 +304,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
>  	*ptep = pteval;
>  }
>  
> -#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
> -
>  /*
>   * Make page protection values cacheable, uncacheable, or write-
>   * combining.  Note that "protection" is really a misnomer here as the
> @@ -396,6 +395,7 @@ pte_same (pte_t a, pte_t b)
>  	return pte_val(a) == pte_val(b);
>  }
>  
> +#define update_mmu_cache_range(vma, address, ptep, nr) do { } while (0)
>  #define update_mmu_cache(vma, address, ptep) do { } while (0)
>  
>  extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 7f5353e28516..b95debabdc2a 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -50,30 +50,40 @@ void
>  __ia64_sync_icache_dcache (pte_t pte)
>  {
>  	unsigned long addr;
> -	struct page *page;
> +	struct folio *folio;
>  
> -	page = pte_page(pte);
> -	addr = (unsigned long) page_address(page);
> +	folio = page_folio(pte_page(pte));
> +	addr = (unsigned long)folio_address(folio);
>  
> -	if (test_bit(PG_arch_1, &page->flags))
> +	if (test_bit(PG_arch_1, &folio->flags))
>  		return;				/* i-cache is already coherent with d-cache */
>  
> -	flush_icache_range(addr, addr + page_size(page));
> -	set_bit(PG_arch_1, &page->flags);	/* mark page as clean */
> +	flush_icache_range(addr, addr + folio_size(folio));
> +	set_bit(PG_arch_1, &folio->flags);	/* mark page as clean */
>  }
>  
>  /*
> - * Since DMA is i-cache coherent, any (complete) pages that were written via
> + * Since DMA is i-cache coherent, any (complete) folios that were written via
>   * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
>   * flush them when they get mapped into an executable vm-area.
>   */
>  void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
>  {
>  	unsigned long pfn = PHYS_PFN(paddr);
> +	struct folio *folio = page_folio(pfn_to_page(pfn));
> +	ssize_t left = size;
> +	size_t offset = offset_in_folio(folio, paddr);
>  
> -	do {
> +	if (offset) {
> +		left -= folio_size(folio) - offset;
> +		folio = folio_next(folio);
> +	}
> +
> +	while (left >= (ssize_t)folio_size(folio)) {
> -		set_bit(PG_arch_1, &pfn_to_page(pfn)->flags);
> +		set_bit(PG_arch_1, &folio->flags);
> -	} while (++pfn <= PHYS_PFN(paddr + size - 1));
> +		left -= folio_size(folio);
> +		folio = folio_next(folio);
> +	}
>  }
>  
>  inline void
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
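
The rewritten mark_clean()/arch_dma_mark_clean() above only mark folios
the DMA covered in full; a folio written only in part has to stay "dirty"
so a later executable mapping still flushes it.  Worked example, assuming
16KiB folios: a write at folio + 4KiB of 40KiB skips the partial head
folio (left drops from 40KiB to 28KiB), marks one complete folio (left =
12KiB), then stops because the tail does not cover the next folio.  The
ssize_t cast in the loop condition matters because left can go negative
on the very first subtraction:

	if (offset) {			/* head folio only partly written */
		left -= folio_size(folio) - offset;
		folio = folio_next(folio);
	}
	while (left >= (ssize_t)folio_size(folio)) {	/* full folios only */
		set_bit(PG_arch_1, &folio->flags);
		left -= folio_size(folio);
		folio = folio_next(folio);
	}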

* Re: [PATCH v4 13/36] loongarch: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 13/36] loongarch: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:07   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:07 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Huacai Chen, WANG Xuerui, loongarch

On Wed, Mar 15, 2023 at 05:14:21AM +0000, Matthew Wilcox (Oracle) wrote:
> Add update_mmu_cache_range() and change _PFN_SHIFT to PFN_PTE_SHIFT.
> It would probably be more efficient to implement __update_tlb() by
> flushing the entire folio instead of calling __update_tlb() N times,
> but I'll leave that for someone who understands the architecture better.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Huacai Chen <chenhuacai@kernel.org>
> Cc: WANG Xuerui <kernel@xen0n.name>
> Cc: loongarch@lists.linux.dev

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/loongarch/include/asm/cacheflush.h   |  2 ++
>  arch/loongarch/include/asm/pgtable-bits.h |  4 ++--
>  arch/loongarch/include/asm/pgtable.h      | 28 ++++++++++++-----------
>  arch/loongarch/mm/pgtable.c               |  2 +-
>  arch/loongarch/mm/tlb.c                   |  2 +-
>  5 files changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
> index 0681788eb474..7907eb42bfbd 100644
> --- a/arch/loongarch/include/asm/cacheflush.h
> +++ b/arch/loongarch/include/asm/cacheflush.h
> @@ -47,8 +47,10 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
>  #define flush_cache_vmap(start, end)			do { } while (0)
>  #define flush_cache_vunmap(start, end)			do { } while (0)
>  #define flush_icache_page(vma, page)			do { } while (0)
> +#define flush_icache_pages(vma, page, nr)		do { } while (0)
>  #define flush_icache_user_page(vma, page, addr, len)	do { } while (0)
>  #define flush_dcache_page(page)				do { } while (0)
> +#define flush_dcache_folio(folio)			do { } while (0)
>  #define flush_dcache_mmap_lock(mapping)			do { } while (0)
>  #define flush_dcache_mmap_unlock(mapping)		do { } while (0)
>  
> diff --git a/arch/loongarch/include/asm/pgtable-bits.h b/arch/loongarch/include/asm/pgtable-bits.h
> index 8b98d22a145b..a1eb2e25446b 100644
> --- a/arch/loongarch/include/asm/pgtable-bits.h
> +++ b/arch/loongarch/include/asm/pgtable-bits.h
> @@ -48,12 +48,12 @@
>  #define _PAGE_NO_EXEC		(_ULCAST_(1) << _PAGE_NO_EXEC_SHIFT)
>  #define _PAGE_RPLV		(_ULCAST_(1) << _PAGE_RPLV_SHIFT)
>  #define _CACHE_MASK		(_ULCAST_(3) << _CACHE_SHIFT)
> -#define _PFN_SHIFT		(PAGE_SHIFT - 12 + _PAGE_PFN_SHIFT)
> +#define PFN_PTE_SHIFT		(PAGE_SHIFT - 12 + _PAGE_PFN_SHIFT)
>  
>  #define _PAGE_USER	(PLV_USER << _PAGE_PLV_SHIFT)
>  #define _PAGE_KERN	(PLV_KERN << _PAGE_PLV_SHIFT)
>  
> -#define _PFN_MASK (~((_ULCAST_(1) << (_PFN_SHIFT)) - 1) & \
> +#define _PFN_MASK (~((_ULCAST_(1) << (PFN_PTE_SHIFT)) - 1) & \
>  		  ((_ULCAST_(1) << (_PAGE_PFN_END_SHIFT)) - 1))
>  
>  /*
> diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
> index d28fb9dbec59..13aad0003e9a 100644
> --- a/arch/loongarch/include/asm/pgtable.h
> +++ b/arch/loongarch/include/asm/pgtable.h
> @@ -237,9 +237,9 @@ extern pmd_t mk_pmd(struct page *page, pgprot_t prot);
>  extern void set_pmd_at(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pmd_t pmd);
>  
>  #define pte_page(x)		pfn_to_page(pte_pfn(x))
> -#define pte_pfn(x)		((unsigned long)(((x).pte & _PFN_MASK) >> _PFN_SHIFT))
> -#define pfn_pte(pfn, prot)	__pte(((pfn) << _PFN_SHIFT) | pgprot_val(prot))
> -#define pfn_pmd(pfn, prot)	__pmd(((pfn) << _PFN_SHIFT) | pgprot_val(prot))
> +#define pte_pfn(x)		((unsigned long)(((x).pte & _PFN_MASK) >> PFN_PTE_SHIFT))
> +#define pfn_pte(pfn, prot)	__pte(((pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
> +#define pfn_pmd(pfn, prot)	__pmd(((pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
>  
>  /*
>   * Initialize a new pgd / pud / pmd table with invalid pointers.
> @@ -334,12 +334,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
>  	}
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pteval)
> -{
> -	set_pte(ptep, pteval);
> -}
> -
>  static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>  {
>  	/* Preserve global status for the pair */
> @@ -445,11 +439,19 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>  extern void __update_tlb(struct vm_area_struct *vma,
>  			unsigned long address, pte_t *ptep);
>  
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -			unsigned long address, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr)
>  {
> -	__update_tlb(vma, address, ptep);
> +	for (;;) {
> +		__update_tlb(vma, address, ptep);
> +		if (--nr == 0)
> +			break;
> +		address += PAGE_SIZE;
> +		ptep++;
> +	}
>  }
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  
>  #define __HAVE_ARCH_UPDATE_MMU_TLB
>  #define update_mmu_tlb	update_mmu_cache
> @@ -462,7 +464,7 @@ static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
>  
>  static inline unsigned long pmd_pfn(pmd_t pmd)
>  {
> -	return (pmd_val(pmd) & _PFN_MASK) >> _PFN_SHIFT;
> +	return (pmd_val(pmd) & _PFN_MASK) >> PFN_PTE_SHIFT;
>  }
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
> index 36a6dc0148ae..1260cf30e3ee 100644
> --- a/arch/loongarch/mm/pgtable.c
> +++ b/arch/loongarch/mm/pgtable.c
> @@ -107,7 +107,7 @@ pmd_t mk_pmd(struct page *page, pgprot_t prot)
>  {
>  	pmd_t pmd;
>  
> -	pmd_val(pmd) = (page_to_pfn(page) << _PFN_SHIFT) | pgprot_val(prot);
> +	pmd_val(pmd) = (page_to_pfn(page) << PFN_PTE_SHIFT) | pgprot_val(prot);
>  
>  	return pmd;
>  }
> diff --git a/arch/loongarch/mm/tlb.c b/arch/loongarch/mm/tlb.c
> index 8bad6b0cff59..73652930b268 100644
> --- a/arch/loongarch/mm/tlb.c
> +++ b/arch/loongarch/mm/tlb.c
> @@ -246,7 +246,7 @@ static void output_pgtable_bits_defines(void)
>  	pr_define("_PAGE_WRITE_SHIFT %d\n", _PAGE_WRITE_SHIFT);
>  	pr_define("_PAGE_NO_READ_SHIFT %d\n", _PAGE_NO_READ_SHIFT);
>  	pr_define("_PAGE_NO_EXEC_SHIFT %d\n", _PAGE_NO_EXEC_SHIFT);
> -	pr_define("_PFN_SHIFT %d\n", _PFN_SHIFT);
> +	pr_define("PFN_PTE_SHIFT %d\n", PFN_PTE_SHIFT);
>  	pr_debug("\n");
>  }
>  
> -- 
> 2.39.2
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
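
The _PFN_SHIFT to PFN_PTE_SHIFT change on loongarch is mechanical, but
it is worth checking the stepping, since loongarch anchors the PFN field
at a 4KiB granule.  Assuming the usual _PAGE_PFN_SHIFT of 12 (see
pgtable-bits.h):

	/* PFN_PTE_SHIFT = PAGE_SHIFT - 12 + _PAGE_PFN_SHIFT		*/
	/*   4KiB pages: 12 - 12 + 12 = 12				*/
	/*  16KiB pages: 14 - 12 + 12 = 14				*/
	/* i.e. always PAGE_SHIFT here, so the generic set_ptes()	*/
	/* advancing the PTE by 1UL << PFN_PTE_SHIFT moves exactly	*/
	/* one page per iteration.					*/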

* Re: [PATCH v4 14/36] m68k: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 14/36] m68k: " Matthew Wilcox (Oracle)
  2023-03-15  7:43   ` Geert Uytterhoeven
@ 2023-03-15 10:07   ` Mike Rapoport
  1 sibling, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:07 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Geert Uytterhoeven, linux-m68k

On Wed, Mar 15, 2023 at 05:14:22AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_icache_pages() and
> flush_dcache_folio().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: linux-m68k@lists.linux-m68k.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/m68k/include/asm/cacheflush_mm.h    | 27 ++++++++++++++++--------
>  arch/m68k/include/asm/mcf_pgtable.h      |  1 +
>  arch/m68k/include/asm/motorola_pgtable.h |  1 +
>  arch/m68k/include/asm/pgtable_mm.h       |  9 ++++----
>  arch/m68k/include/asm/sun3_pgtable.h     |  1 +
>  arch/m68k/mm/motorola.c                  |  2 +-
>  6 files changed, 27 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/m68k/include/asm/cacheflush_mm.h b/arch/m68k/include/asm/cacheflush_mm.h
> index 1ac55e7b47f0..88eb85e81ef6 100644
> --- a/arch/m68k/include/asm/cacheflush_mm.h
> +++ b/arch/m68k/include/asm/cacheflush_mm.h
> @@ -220,24 +220,29 @@ static inline void flush_cache_page(struct vm_area_struct *vma, unsigned long vm
>  
>  /* Push the page at kernel virtual address and clear the icache */
>  /* RZ: use cpush %bc instead of cpush %dc, cinv %ic */
> -static inline void __flush_page_to_ram(void *vaddr)
> +static inline void __flush_pages_to_ram(void *vaddr, unsigned int nr)
>  {
>  	if (CPU_IS_COLDFIRE) {
>  		unsigned long addr, start, end;
>  		addr = ((unsigned long) vaddr) & ~(PAGE_SIZE - 1);
>  		start = addr & ICACHE_SET_MASK;
> -		end = (addr + PAGE_SIZE - 1) & ICACHE_SET_MASK;
> +		end = (addr + nr * PAGE_SIZE - 1) & ICACHE_SET_MASK;
>  		if (start > end) {
>  			flush_cf_bcache(0, end);
>  			end = ICACHE_MAX_ADDR;
>  		}
>  		flush_cf_bcache(start, end);
>  	} else if (CPU_IS_040_OR_060) {
> -		__asm__ __volatile__("nop\n\t"
> -				     ".chip 68040\n\t"
> -				     "cpushp %%bc,(%0)\n\t"
> -				     ".chip 68k"
> -				     : : "a" (__pa(vaddr)));
> +		unsigned long paddr = __pa(vaddr);
> +
> +		do {
> +			__asm__ __volatile__("nop\n\t"
> +					     ".chip 68040\n\t"
> +					     "cpushp %%bc,(%0)\n\t"
> +					     ".chip 68k"
> +					     : : "a" (paddr));
> +			paddr += PAGE_SIZE;
> +		} while (--nr);
>  	} else {
>  		unsigned long _tmp;
>  		__asm__ __volatile__("movec %%cacr,%0\n\t"
> @@ -249,10 +254,14 @@ static inline void __flush_page_to_ram(void *vaddr)
>  }
>  
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -#define flush_dcache_page(page)		__flush_page_to_ram(page_address(page))
> +#define flush_dcache_page(page)	__flush_pages_to_ram(page_address(page), 1)
> +#define flush_dcache_folio(folio)		\
> +	__flush_pages_to_ram(folio_address(folio), folio_nr_pages(folio))
>  #define flush_dcache_mmap_lock(mapping)		do { } while (0)
>  #define flush_dcache_mmap_unlock(mapping)	do { } while (0)
> -#define flush_icache_page(vma, page)	__flush_page_to_ram(page_address(page))
> +#define flush_icache_pages(vma, page, nr)	\
> +	__flush_pages_to_ram(page_address(page), nr)
> +#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
>  
>  extern void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
>  				    unsigned long addr, int len);
> diff --git a/arch/m68k/include/asm/mcf_pgtable.h b/arch/m68k/include/asm/mcf_pgtable.h
> index 13741c1245e1..1414b607eff4 100644
> --- a/arch/m68k/include/asm/mcf_pgtable.h
> +++ b/arch/m68k/include/asm/mcf_pgtable.h
> @@ -292,6 +292,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>  	return pte;
>  }
>  
> +#define PFN_PTE_SHIFT		PAGE_SHIFT
>  #define pmd_pfn(pmd)		(pmd_val(pmd) >> PAGE_SHIFT)
>  #define pmd_page(pmd)		(pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
>  
> diff --git a/arch/m68k/include/asm/motorola_pgtable.h b/arch/m68k/include/asm/motorola_pgtable.h
> index ec0dc19ab834..38d5e5edc3e1 100644
> --- a/arch/m68k/include/asm/motorola_pgtable.h
> +++ b/arch/m68k/include/asm/motorola_pgtable.h
> @@ -112,6 +112,7 @@ static inline void pud_set(pud_t *pudp, pmd_t *pmdp)
>  #define pte_present(pte)	(pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROTNONE))
>  #define pte_clear(mm,addr,ptep)		({ pte_val(*(ptep)) = 0; })
>  
> +#define PFN_PTE_SHIFT		PAGE_SHIFT
>  #define pte_page(pte)		virt_to_page(__va(pte_val(pte)))
>  #define pte_pfn(pte)		(pte_val(pte) >> PAGE_SHIFT)
>  #define pfn_pte(pfn, prot)	__pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
> diff --git a/arch/m68k/include/asm/pgtable_mm.h b/arch/m68k/include/asm/pgtable_mm.h
> index b93c41fe2067..8c2db20abdb6 100644
> --- a/arch/m68k/include/asm/pgtable_mm.h
> +++ b/arch/m68k/include/asm/pgtable_mm.h
> @@ -31,8 +31,6 @@
>  	do{							\
>  		*(pteptr) = (pteval);				\
>  	} while(0)
> -#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
> -
>  
>  /* PMD_SHIFT determines the size of the area a second-level page table can map */
>  #if CONFIG_PGTABLE_LEVELS == 3
> @@ -138,11 +136,14 @@ extern void kernel_set_cachemode(void *addr, unsigned long size, int cmode);
>   * tables contain all the necessary information.  The Sun3 does, but
>   * they are updated on demand.
>   */
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -				    unsigned long address, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr)
>  {
>  }
>  
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
> +
>  #endif /* !__ASSEMBLY__ */
>  
>  /* MMU-specific headers */
> diff --git a/arch/m68k/include/asm/sun3_pgtable.h b/arch/m68k/include/asm/sun3_pgtable.h
> index e582b0484a55..feae73b3b342 100644
> --- a/arch/m68k/include/asm/sun3_pgtable.h
> +++ b/arch/m68k/include/asm/sun3_pgtable.h
> @@ -105,6 +105,7 @@ static inline void pte_clear (struct mm_struct *mm, unsigned long addr, pte_t *p
>  	pte_val (*ptep) = 0;
>  }
>  
> +#define PFN_PTE_SHIFT		0
>  #define pte_pfn(pte)            (pte_val(pte) & SUN3_PAGE_PGNUM_MASK)
>  #define pfn_pte(pfn, pgprot) \
>  ({ pte_t __pte; pte_val(__pte) = pfn | pgprot_val(pgprot); __pte; })
> diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
> index 911301224078..790666c6d146 100644
> --- a/arch/m68k/mm/motorola.c
> +++ b/arch/m68k/mm/motorola.c
> @@ -81,7 +81,7 @@ static inline void cache_page(void *vaddr)
>  
>  void mmu_page_ctor(void *page)
>  {
> -	__flush_page_to_ram(page);
> +	__flush_pages_to_ram(page, 1);
>  	flush_tlb_kernel_page(page);
>  	nocache_page(page);
>  }
> -- 
> 2.39.2
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
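
The ColdFire branch of __flush_pages_to_ram() is the subtle part of the
m68k patch: ICACHE_SET_MASK folds addresses into the cache's set-index
space, so a multi-page range can wrap past the top and come out with
start > end.  The wrapped interval is then flushed in two pieces;
restated as a sketch of the same logic:

	start = addr & ICACHE_SET_MASK;
	end = (addr + nr * PAGE_SIZE - 1) & ICACHE_SET_MASK;
	if (start > end) {
		/* wrapped: flush the low piece [0, end] first ... */
		flush_cf_bcache(0, end);
		end = ICACHE_MAX_ADDR;
	}
	/* ... then [start, end] -- the high piece, or the whole range */
	flush_cf_bcache(start, end);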

* Re: [PATCH v4 15/36] microblaze: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 15/36] microblaze: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:07   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:07 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel, Michal Simek

On Wed, Mar 15, 2023 at 05:14:23AM +0000, Matthew Wilcox (Oracle) wrote:
> Rename PFN_SHIFT_OFFSET to PFN_PTE_SHIFT.  Change the calling
> convention for set_pte() to be the same as other architectures.  Add
> update_mmu_cache_range(), flush_icache_pages() and flush_dcache_folio().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Michal Simek <monstr@monstr.eu>

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/microblaze/include/asm/cacheflush.h |  8 ++++++++
>  arch/microblaze/include/asm/pgtable.h    | 15 ++++-----------
>  arch/microblaze/include/asm/tlbflush.h   |  4 +++-
>  3 files changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/microblaze/include/asm/cacheflush.h b/arch/microblaze/include/asm/cacheflush.h
> index 39f8fb6768d8..e6641ff98cb3 100644
> --- a/arch/microblaze/include/asm/cacheflush.h
> +++ b/arch/microblaze/include/asm/cacheflush.h
> @@ -74,6 +74,14 @@ do { \
>  	flush_dcache_range((unsigned) (addr), (unsigned) (addr) + PAGE_SIZE); \
>  } while (0);
>  
> +static inline void flush_dcache_folio(struct folio *folio)
> +{
> +	unsigned long addr = folio_pfn(folio) << PAGE_SHIFT;
> +
> +	flush_dcache_range(addr, addr + folio_size(folio));
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
>  #define flush_cache_page(vma, vmaddr, pfn) \
>  	flush_dcache_range(pfn << PAGE_SHIFT, (pfn << PAGE_SHIFT) + PAGE_SIZE);
>  
> diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
> index d1b8272abcd9..19fcd7f8517e 100644
> --- a/arch/microblaze/include/asm/pgtable.h
> +++ b/arch/microblaze/include/asm/pgtable.h
> @@ -230,12 +230,12 @@ extern unsigned long empty_zero_page[1024];
>  
>  #define pte_page(x)		(mem_map + (unsigned long) \
>  				((pte_val(x) - memory_start) >> PAGE_SHIFT))
> -#define PFN_SHIFT_OFFSET	(PAGE_SHIFT)
> +#define PFN_PTE_SHIFT		PAGE_SHIFT
>  
> -#define pte_pfn(x)		(pte_val(x) >> PFN_SHIFT_OFFSET)
> +#define pte_pfn(x)		(pte_val(x) >> PFN_PTE_SHIFT)
>  
>  #define pfn_pte(pfn, prot) \
> -	__pte(((pte_basic_t)(pfn) << PFN_SHIFT_OFFSET) | pgprot_val(prot))
> +	__pte(((pte_basic_t)(pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
>  
>  #ifndef __ASSEMBLY__
>  /*
> @@ -330,14 +330,7 @@ static inline unsigned long pte_update(pte_t *p, unsigned long clr,
>  /*
>   * set_pte stores a linux PTE into the linux page table.
>   */
> -static inline void set_pte(struct mm_struct *mm, unsigned long addr,
> -		pte_t *ptep, pte_t pte)
> -{
> -	*ptep = pte;
> -}
> -
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -		pte_t *ptep, pte_t pte)
> +static inline void set_pte(pte_t *ptep, pte_t pte)
>  {
>  	*ptep = pte;
>  }
> diff --git a/arch/microblaze/include/asm/tlbflush.h b/arch/microblaze/include/asm/tlbflush.h
> index 2038168ed128..1b179e5e9062 100644
> --- a/arch/microblaze/include/asm/tlbflush.h
> +++ b/arch/microblaze/include/asm/tlbflush.h
> @@ -33,7 +33,9 @@ static inline void local_flush_tlb_range(struct vm_area_struct *vma,
>  
>  #define flush_tlb_kernel_range(start, end)	do { } while (0)
>  
> -#define update_mmu_cache(vma, addr, ptep)	do { } while (0)
> +#define update_mmu_cache_range(vma, addr, ptep, nr)	do { } while (0)
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  
>  #define flush_tlb_all local_flush_tlb_all
>  #define flush_tlb_mm local_flush_tlb_mm
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 16/36] mips: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 16/36] mips: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:08   ` Mike Rapoport
  2023-03-15 10:50   ` Thomas Bogendoerfer
  1 sibling, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:08 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Thomas Bogendoerfer, linux-mips

On Wed, Mar 15, 2023 at 05:14:24AM +0000, Matthew Wilcox (Oracle) wrote:
> Rename _PFN_SHIFT to PFN_PTE_SHIFT.  Convert a few places
> to call set_pte() instead of set_pte_at().  Add set_ptes(),
> update_mmu_cache_range(), flush_icache_pages() and flush_dcache_folio().
> Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page
> to per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Cc: linux-mips@vger.kernel.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
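
For anyone skimming the large diff below, the contract shared by the
new entry points is worth keeping in mind: all nr entries cover one
folio, within one VMA and one PMD, which is what makes the ptep++ and
pfn + 1 arithmetic legal.  A hypothetical caller (illustration only,
not code from this patch) would look like:

/*
 * Hypothetical illustration of the new API contract: map a whole
 * folio with one call per interface.
 */
static void map_folio_range(struct vm_area_struct *vma, unsigned long addr,
		pte_t *ptep, struct folio *folio, pgprot_t prot)
{
	unsigned int nr = folio_nr_pages(folio);
	pte_t pte = mk_pte(&folio->page, prot);

	flush_icache_pages(vma, &folio->page, nr);
	set_ptes(vma->vm_mm, addr, ptep, pte, nr);
	update_mmu_cache_range(vma, addr, ptep, nr);
}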

> ---
>  arch/mips/bcm47xx/prom.c             |  2 +-
>  arch/mips/include/asm/cacheflush.h   | 32 ++++++++++------
>  arch/mips/include/asm/pgtable-32.h   | 10 ++---
>  arch/mips/include/asm/pgtable-64.h   |  6 +--
>  arch/mips/include/asm/pgtable-bits.h |  6 +--
>  arch/mips/include/asm/pgtable.h      | 44 +++++++++++++---------
>  arch/mips/mm/c-r4k.c                 |  5 ++-
>  arch/mips/mm/cache.c                 | 56 ++++++++++++++--------------
>  arch/mips/mm/init.c                  | 21 +++++++----
>  arch/mips/mm/pgtable-32.c            |  2 +-
>  arch/mips/mm/pgtable-64.c            |  2 +-
>  arch/mips/mm/tlbex.c                 |  2 +-
>  12 files changed, 107 insertions(+), 81 deletions(-)
> 
> diff --git a/arch/mips/bcm47xx/prom.c b/arch/mips/bcm47xx/prom.c
> index a9bea411d928..99a1ba5394e0 100644
> --- a/arch/mips/bcm47xx/prom.c
> +++ b/arch/mips/bcm47xx/prom.c
> @@ -116,7 +116,7 @@ void __init prom_init(void)
>  #if defined(CONFIG_BCM47XX_BCMA) && defined(CONFIG_HIGHMEM)
>  
>  #define EXTVBASE	0xc0000000
> -#define ENTRYLO(x)	((pte_val(pfn_pte((x) >> _PFN_SHIFT, PAGE_KERNEL_UNCACHED)) >> 6) | 1)
> +#define ENTRYLO(x)	((pte_val(pfn_pte((x) >> PFN_PTE_SHIFT, PAGE_KERNEL_UNCACHED)) >> 6) | 1)
>  
>  #include <asm/tlbflush.h>
>  
> diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> index b3dc9c589442..2683cade42ef 100644
> --- a/arch/mips/include/asm/cacheflush.h
> +++ b/arch/mips/include/asm/cacheflush.h
> @@ -36,12 +36,12 @@
>   */
>  #define PG_dcache_dirty			PG_arch_1
>  
> -#define Page_dcache_dirty(page)		\
> -	test_bit(PG_dcache_dirty, &(page)->flags)
> -#define SetPageDcacheDirty(page)	\
> -	set_bit(PG_dcache_dirty, &(page)->flags)
> -#define ClearPageDcacheDirty(page)	\
> -	clear_bit(PG_dcache_dirty, &(page)->flags)
> +#define folio_test_dcache_dirty(folio)		\
> +	test_bit(PG_dcache_dirty, &(folio)->flags)
> +#define folio_set_dcache_dirty(folio)	\
> +	set_bit(PG_dcache_dirty, &(folio)->flags)
> +#define folio_clear_dcache_dirty(folio)	\
> +	clear_bit(PG_dcache_dirty, &(folio)->flags)
>  
>  extern void (*flush_cache_all)(void);
>  extern void (*__flush_cache_all)(void);
> @@ -50,15 +50,24 @@ extern void (*flush_cache_mm)(struct mm_struct *mm);
>  extern void (*flush_cache_range)(struct vm_area_struct *vma,
>  	unsigned long start, unsigned long end);
>  extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
> -extern void __flush_dcache_page(struct page *page);
> +extern void __flush_dcache_pages(struct page *page, unsigned int nr);
>  
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> +static inline void flush_dcache_folio(struct folio *folio)
> +{
> +	if (cpu_has_dc_aliases)
> +		__flush_dcache_pages(&folio->page, folio_nr_pages(folio));
> +	else if (!cpu_has_ic_fills_f_dc)
> +		folio_set_dcache_dirty(folio);
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
>  static inline void flush_dcache_page(struct page *page)
>  {
>  	if (cpu_has_dc_aliases)
> -		__flush_dcache_page(page);
> +		__flush_dcache_pages(page, 1);
>  	else if (!cpu_has_ic_fills_f_dc)
> -		SetPageDcacheDirty(page);
> +		folio_set_dcache_dirty(page_folio(page));
>  }
>  
>  #define flush_dcache_mmap_lock(mapping)		do { } while (0)
> @@ -73,10 +82,11 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
>  		__flush_anon_page(page, vmaddr);
>  }
>  
> -static inline void flush_icache_page(struct vm_area_struct *vma,
> -	struct page *page)
> +static inline void flush_icache_pages(struct vm_area_struct *vma,
> +		struct page *page, unsigned int nr)
>  {
>  }
> +#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
>  
>  extern void (*flush_icache_range)(unsigned long start, unsigned long end);
>  extern void (*local_flush_icache_range)(unsigned long start, unsigned long end);
> diff --git a/arch/mips/include/asm/pgtable-32.h b/arch/mips/include/asm/pgtable-32.h
> index ba0016709a1a..0e196650f4f4 100644
> --- a/arch/mips/include/asm/pgtable-32.h
> +++ b/arch/mips/include/asm/pgtable-32.h
> @@ -153,7 +153,7 @@ static inline void pmd_clear(pmd_t *pmdp)
>  #if defined(CONFIG_XPA)
>  
>  #define MAX_POSSIBLE_PHYSMEM_BITS 40
> -#define pte_pfn(x)		(((unsigned long)((x).pte_high >> _PFN_SHIFT)) | (unsigned long)((x).pte_low << _PAGE_PRESENT_SHIFT))
> +#define pte_pfn(x)		(((unsigned long)((x).pte_high >> PFN_PTE_SHIFT)) | (unsigned long)((x).pte_low << _PAGE_PRESENT_SHIFT))
>  static inline pte_t
>  pfn_pte(unsigned long pfn, pgprot_t prot)
>  {
> @@ -161,7 +161,7 @@ pfn_pte(unsigned long pfn, pgprot_t prot)
>  
>  	pte.pte_low = (pfn >> _PAGE_PRESENT_SHIFT) |
>  				(pgprot_val(prot) & ~_PFNX_MASK);
> -	pte.pte_high = (pfn << _PFN_SHIFT) |
> +	pte.pte_high = (pfn << PFN_PTE_SHIFT) |
>  				(pgprot_val(prot) & ~_PFN_MASK);
>  	return pte;
>  }
> @@ -184,9 +184,9 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
>  #else
>  
>  #define MAX_POSSIBLE_PHYSMEM_BITS 32
> -#define pte_pfn(x)		((unsigned long)((x).pte >> _PFN_SHIFT))
> -#define pfn_pte(pfn, prot)	__pte(((unsigned long long)(pfn) << _PFN_SHIFT) | pgprot_val(prot))
> -#define pfn_pmd(pfn, prot)	__pmd(((unsigned long long)(pfn) << _PFN_SHIFT) | pgprot_val(prot))
> +#define pte_pfn(x)		((unsigned long)((x).pte >> PFN_PTE_SHIFT))
> +#define pfn_pte(pfn, prot)	__pte(((unsigned long long)(pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
> +#define pfn_pmd(pfn, prot)	__pmd(((unsigned long long)(pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
>  #endif /* defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32) */
>  
>  #define pte_page(x)		pfn_to_page(pte_pfn(x))
> diff --git a/arch/mips/include/asm/pgtable-64.h b/arch/mips/include/asm/pgtable-64.h
> index 98e24e3e7f2b..20ca48c1b606 100644
> --- a/arch/mips/include/asm/pgtable-64.h
> +++ b/arch/mips/include/asm/pgtable-64.h
> @@ -298,9 +298,9 @@ static inline void pud_clear(pud_t *pudp)
>  
>  #define pte_page(x)		pfn_to_page(pte_pfn(x))
>  
> -#define pte_pfn(x)		((unsigned long)((x).pte >> _PFN_SHIFT))
> -#define pfn_pte(pfn, prot)	__pte(((pfn) << _PFN_SHIFT) | pgprot_val(prot))
> -#define pfn_pmd(pfn, prot)	__pmd(((pfn) << _PFN_SHIFT) | pgprot_val(prot))
> +#define pte_pfn(x)		((unsigned long)((x).pte >> PFN_PTE_SHIFT))
> +#define pfn_pte(pfn, prot)	__pte(((pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
> +#define pfn_pmd(pfn, prot)	__pmd(((pfn) << PFN_PTE_SHIFT) | pgprot_val(prot))
>  
>  #ifndef __PAGETABLE_PMD_FOLDED
>  static inline pmd_t *pud_pgtable(pud_t pud)
> diff --git a/arch/mips/include/asm/pgtable-bits.h b/arch/mips/include/asm/pgtable-bits.h
> index 2362842ee2b5..744abba9111f 100644
> --- a/arch/mips/include/asm/pgtable-bits.h
> +++ b/arch/mips/include/asm/pgtable-bits.h
> @@ -182,10 +182,10 @@ enum pgtable_bits {
>  #if defined(CONFIG_CPU_R3K_TLB)
>  # define _CACHE_UNCACHED	(1 << _CACHE_UNCACHED_SHIFT)
>  # define _CACHE_MASK		_CACHE_UNCACHED
> -# define _PFN_SHIFT		PAGE_SHIFT
> +# define PFN_PTE_SHIFT		PAGE_SHIFT
>  #else
>  # define _CACHE_MASK		(7 << _CACHE_SHIFT)
> -# define _PFN_SHIFT		(PAGE_SHIFT - 12 + _CACHE_SHIFT + 3)
> +# define PFN_PTE_SHIFT		(PAGE_SHIFT - 12 + _CACHE_SHIFT + 3)
>  #endif
>  
>  #ifndef _PAGE_NO_EXEC
> @@ -195,7 +195,7 @@ enum pgtable_bits {
>  #define _PAGE_SILENT_READ	_PAGE_VALID
>  #define _PAGE_SILENT_WRITE	_PAGE_DIRTY
>  
> -#define _PFN_MASK		(~((1 << (_PFN_SHIFT)) - 1))
> +#define _PFN_MASK		(~((1 << (PFN_PTE_SHIFT)) - 1))
>  
>  /*
>   * The final layouts of the PTE bits are:
> diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> index 574fa14ac8b2..cfcd6a8ba8ef 100644
> --- a/arch/mips/include/asm/pgtable.h
> +++ b/arch/mips/include/asm/pgtable.h
> @@ -66,7 +66,7 @@ extern void paging_init(void);
>  
>  static inline unsigned long pmd_pfn(pmd_t pmd)
>  {
> -	return pmd_val(pmd) >> _PFN_SHIFT;
> +	return pmd_val(pmd) >> PFN_PTE_SHIFT;
>  }
>  
>  #ifndef CONFIG_MIPS_HUGE_TLB_SUPPORT
> @@ -105,9 +105,6 @@ do {									\
>  	}								\
>  } while(0)
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pteval);
> -
>  #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
>  
>  #ifdef CONFIG_XPA
> @@ -157,7 +154,7 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
>  			null.pte_low = null.pte_high = _PAGE_GLOBAL;
>  	}
>  
> -	set_pte_at(mm, addr, ptep, null);
> +	set_pte(ptep, null);
>  	htw_start();
>  }
>  #else
> @@ -196,28 +193,41 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
>  #if !defined(CONFIG_CPU_R3K_TLB)
>  	/* Preserve global status for the pair */
>  	if (pte_val(*ptep_buddy(ptep)) & _PAGE_GLOBAL)
> -		set_pte_at(mm, addr, ptep, __pte(_PAGE_GLOBAL));
> +		set_pte(ptep, __pte(_PAGE_GLOBAL));
>  	else
>  #endif
> -		set_pte_at(mm, addr, ptep, __pte(0));
> +		set_pte(ptep, __pte(0));
>  	htw_start();
>  }
>  #endif
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pteval)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
>  {
> +	unsigned int i;
> +	bool do_sync = false;
>  
> -	if (!pte_present(pteval))
> -		goto cache_sync_done;
> +	for (i = 0; i < nr; i++) {
> +		if (!pte_present(pte))
> +			continue;
> +		if (pte_present(ptep[i]) &&
> +		    (pte_pfn(ptep[i]) == pte_pfn(pte)))
> +			continue;
> +		do_sync = true;
> +	}
>  
> -	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> -		goto cache_sync_done;
> +	if (do_sync)
> +		__update_cache(addr, pte);
>  
> -	__update_cache(addr, pteval);
> -cache_sync_done:
> -	set_pte(ptep, pteval);
> +	for (;;) {
> +		set_pte(ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte_val(pte) += 1 << PFN_PTE_SHIFT;
> +	}
>  }
> +#define set_ptes set_ptes
>  
>  /*
>   * (pmds are folded into puds so this doesn't get actually called,
> @@ -486,7 +496,7 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
>  					pte_t entry, int dirty)
>  {
>  	if (!pte_same(*ptep, entry))
> -		set_pte_at(vma->vm_mm, address, ptep, entry);
> +		set_pte(ptep, entry);
>  	/*
>  	 * update_mmu_cache will unconditionally execute, handling both
>  	 * the case that the PTE changed and the spurious fault case.
> diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
> index a549fa98c2f4..7d2a42f0cffd 100644
> --- a/arch/mips/mm/c-r4k.c
> +++ b/arch/mips/mm/c-r4k.c
> @@ -679,13 +679,14 @@ static inline void local_r4k_flush_cache_page(void *args)
>  	if ((mm == current->active_mm) && (pte_val(*ptep) & _PAGE_VALID))
>  		vaddr = NULL;
>  	else {
> +		struct folio *folio = page_folio(page);
>  		/*
>  		 * Use kmap_coherent or kmap_atomic to do flushes for
>  		 * another ASID than the current one.
>  		 */
>  		map_coherent = (cpu_has_dc_aliases &&
> -				page_mapcount(page) &&
> -				!Page_dcache_dirty(page));
> +				folio_mapped(folio) &&
> +				!folio_test_dcache_dirty(folio));
>  		if (map_coherent)
>  			vaddr = kmap_coherent(page, addr);
>  		else
> diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> index 11b3e7ddafd5..0668435521fc 100644
> --- a/arch/mips/mm/cache.c
> +++ b/arch/mips/mm/cache.c
> @@ -82,13 +82,15 @@ SYSCALL_DEFINE3(cacheflush, unsigned long, addr, unsigned long, bytes,
>  	return 0;
>  }
>  
> -void __flush_dcache_page(struct page *page)
> +void __flush_dcache_pages(struct page *page, unsigned int nr)
>  {
> -	struct address_space *mapping = page_mapping_file(page);
> +	struct folio *folio = page_folio(page);
> +	struct address_space *mapping = folio_flush_mapping(folio);
>  	unsigned long addr;
> +	unsigned int i;
>  
>  	if (mapping && !mapping_mapped(mapping)) {
> -		SetPageDcacheDirty(page);
> +		folio_set_dcache_dirty(folio);
>  		return;
>  	}
>  
> @@ -97,25 +99,21 @@ void __flush_dcache_page(struct page *page)
>  	 * case is for exec env/arg pages and those are %99 certainly going to
>  	 * get faulted into the tlb (and thus flushed) anyways.
>  	 */
> -	if (PageHighMem(page))
> -		addr = (unsigned long)kmap_atomic(page);
> -	else
> -		addr = (unsigned long)page_address(page);
> -
> -	flush_data_cache_page(addr);
> -
> -	if (PageHighMem(page))
> -		kunmap_atomic((void *)addr);
> +	for (i = 0; i < nr; i++) {
> +		addr = (unsigned long)kmap_local_page(page + i);
> +		flush_data_cache_page(addr);
> +		kunmap_local((void *)addr);
> +	}
>  }
> -
> -EXPORT_SYMBOL(__flush_dcache_page);
> +EXPORT_SYMBOL(__flush_dcache_pages);
>  
>  void __flush_anon_page(struct page *page, unsigned long vmaddr)
>  {
>  	unsigned long addr = (unsigned long) page_address(page);
> +	struct folio *folio = page_folio(page);
>  
>  	if (pages_do_alias(addr, vmaddr)) {
> -		if (page_mapcount(page) && !Page_dcache_dirty(page)) {
> +		if (folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
>  			void *kaddr;
>  
>  			kaddr = kmap_coherent(page, vmaddr);
> @@ -130,27 +128,29 @@ EXPORT_SYMBOL(__flush_anon_page);
>  
>  void __update_cache(unsigned long address, pte_t pte)
>  {
> -	struct page *page;
> +	struct folio *folio;
>  	unsigned long pfn, addr;
>  	int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
> +	unsigned int i;
>  
>  	pfn = pte_pfn(pte);
>  	if (unlikely(!pfn_valid(pfn)))
>  		return;
> -	page = pfn_to_page(pfn);
> -	if (Page_dcache_dirty(page)) {
> -		if (PageHighMem(page))
> -			addr = (unsigned long)kmap_atomic(page);
> -		else
> -			addr = (unsigned long)page_address(page);
> -
> -		if (exec || pages_do_alias(addr, address & PAGE_MASK))
> -			flush_data_cache_page(addr);
>  
> -		if (PageHighMem(page))
> -			kunmap_atomic((void *)addr);
> +	folio = page_folio(pfn_to_page(pfn));
> +	address &= PAGE_MASK;
> +	address -= offset_in_folio(folio, pfn << PAGE_SHIFT);
> +
> +	if (folio_test_dcache_dirty(folio)) {
> +		for (i = 0; i < folio_nr_pages(folio); i++) {
> +			addr = (unsigned long)kmap_local_folio(folio, i);
>  
> -		ClearPageDcacheDirty(page);
> +			if (exec || pages_do_alias(addr, address))
> +				flush_data_cache_page(addr);
> +			kunmap_local((void *)addr);
> +			address += PAGE_SIZE;
> +		}
> +		folio_clear_dcache_dirty(folio);
>  	}
>  }
>  
> diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
> index 5a8002839550..5dcb525a8995 100644
> --- a/arch/mips/mm/init.c
> +++ b/arch/mips/mm/init.c
> @@ -88,7 +88,7 @@ static void *__kmap_pgprot(struct page *page, unsigned long addr, pgprot_t prot)
>  	pte_t pte;
>  	int tlbidx;
>  
> -	BUG_ON(Page_dcache_dirty(page));
> +	BUG_ON(folio_test_dcache_dirty(page_folio(page)));
>  
>  	preempt_disable();
>  	pagefault_disable();
> @@ -169,11 +169,12 @@ void kunmap_coherent(void)
>  void copy_user_highpage(struct page *to, struct page *from,
>  	unsigned long vaddr, struct vm_area_struct *vma)
>  {
> +	struct folio *src = page_folio(from);
>  	void *vfrom, *vto;
>  
>  	vto = kmap_atomic(to);
>  	if (cpu_has_dc_aliases &&
> -	    page_mapcount(from) && !Page_dcache_dirty(from)) {
> +	    folio_mapped(src) && !folio_test_dcache_dirty(src)) {
>  		vfrom = kmap_coherent(from, vaddr);
>  		copy_page(vto, vfrom);
>  		kunmap_coherent();
> @@ -194,15 +195,17 @@ void copy_to_user_page(struct vm_area_struct *vma,
>  	struct page *page, unsigned long vaddr, void *dst, const void *src,
>  	unsigned long len)
>  {
> +	struct folio *folio = page_folio(page);
> +
>  	if (cpu_has_dc_aliases &&
> -	    page_mapcount(page) && !Page_dcache_dirty(page)) {
> +	    folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
>  		void *vto = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
>  		memcpy(vto, src, len);
>  		kunmap_coherent();
>  	} else {
>  		memcpy(dst, src, len);
>  		if (cpu_has_dc_aliases)
> -			SetPageDcacheDirty(page);
> +			folio_set_dcache_dirty(folio);
>  	}
>  	if (vma->vm_flags & VM_EXEC)
>  		flush_cache_page(vma, vaddr, page_to_pfn(page));
> @@ -212,15 +215,17 @@ void copy_from_user_page(struct vm_area_struct *vma,
>  	struct page *page, unsigned long vaddr, void *dst, const void *src,
>  	unsigned long len)
>  {
> +	struct folio *folio = page_folio(page);
> +
>  	if (cpu_has_dc_aliases &&
> -	    page_mapcount(page) && !Page_dcache_dirty(page)) {
> +	    folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
>  		void *vfrom = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
>  		memcpy(dst, vfrom, len);
>  		kunmap_coherent();
>  	} else {
>  		memcpy(dst, src, len);
>  		if (cpu_has_dc_aliases)
> -			SetPageDcacheDirty(page);
> +			folio_set_dcache_dirty(folio);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(copy_from_user_page);
> @@ -448,10 +453,10 @@ static inline void __init mem_init_free_highmem(void)
>  void __init mem_init(void)
>  {
>  	/*
> -	 * When _PFN_SHIFT is greater than PAGE_SHIFT we won't have enough PTE
> +	 * When PFN_PTE_SHIFT is greater than PAGE_SHIFT we won't have enough PTE
>  	 * bits to hold a full 32b physical address on MIPS32 systems.
>  	 */
> -	BUILD_BUG_ON(IS_ENABLED(CONFIG_32BIT) && (_PFN_SHIFT > PAGE_SHIFT));
> +	BUILD_BUG_ON(IS_ENABLED(CONFIG_32BIT) && (PFN_PTE_SHIFT > PAGE_SHIFT));
>  
>  #ifdef CONFIG_HIGHMEM
>  	max_mapnr = highend_pfn ? highend_pfn : max_low_pfn;
> diff --git a/arch/mips/mm/pgtable-32.c b/arch/mips/mm/pgtable-32.c
> index f57fb69472f8..84dd5136d53a 100644
> --- a/arch/mips/mm/pgtable-32.c
> +++ b/arch/mips/mm/pgtable-32.c
> @@ -35,7 +35,7 @@ pmd_t mk_pmd(struct page *page, pgprot_t prot)
>  {
>  	pmd_t pmd;
>  
> -	pmd_val(pmd) = (page_to_pfn(page) << _PFN_SHIFT) | pgprot_val(prot);
> +	pmd_val(pmd) = (page_to_pfn(page) << PFN_PTE_SHIFT) | pgprot_val(prot);
>  
>  	return pmd;
>  }
> diff --git a/arch/mips/mm/pgtable-64.c b/arch/mips/mm/pgtable-64.c
> index b4386a0e2ef8..c76d21f7dffb 100644
> --- a/arch/mips/mm/pgtable-64.c
> +++ b/arch/mips/mm/pgtable-64.c
> @@ -93,7 +93,7 @@ pmd_t mk_pmd(struct page *page, pgprot_t prot)
>  {
>  	pmd_t pmd;
>  
> -	pmd_val(pmd) = (page_to_pfn(page) << _PFN_SHIFT) | pgprot_val(prot);
> +	pmd_val(pmd) = (page_to_pfn(page) << PFN_PTE_SHIFT) | pgprot_val(prot);
>  
>  	return pmd;
>  }
> diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
> index 80e05ee98d62..1393a11af539 100644
> --- a/arch/mips/mm/tlbex.c
> +++ b/arch/mips/mm/tlbex.c
> @@ -253,7 +253,7 @@ static void output_pgtable_bits_defines(void)
>  	pr_define("_PAGE_GLOBAL_SHIFT %d\n", _PAGE_GLOBAL_SHIFT);
>  	pr_define("_PAGE_VALID_SHIFT %d\n", _PAGE_VALID_SHIFT);
>  	pr_define("_PAGE_DIRTY_SHIFT %d\n", _PAGE_DIRTY_SHIFT);
> -	pr_define("_PFN_SHIFT %d\n", _PFN_SHIFT);
> +	pr_define("PFN_PTE_SHIFT %d\n", PFN_PTE_SHIFT);
>  	pr_debug("\n");
>  }
>  
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 17/36] nios2: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 17/36] nios2: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:08   ` Mike Rapoport
  2023-06-13 22:45     ` Dinh Nguyen
  0 siblings, 1 reply; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:08 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel, Dinh Nguyen

On Wed, Mar 15, 2023 at 05:14:25AM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
> flush_dcache_folio().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
> from being per-page to per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Dinh Nguyen <dinguyen@kernel.org>

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
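
One idiom in the diff below deserves a comment: the line
'#define flush_dcache_folio flush_dcache_folio' is how an
architecture signals that it supplies its own implementation, so that
the common code can fall back to per-page flushes everywhere else.
The fallback is roughly (simplified sketch, not the exact common
code):

#ifndef flush_dcache_folio
static inline void flush_dcache_folio(struct folio *folio)
{
	long i = folio_nr_pages(folio);

	while (i--)
		flush_dcache_page(folio_page(folio, i));
}
#endif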

> ---
>  arch/nios2/include/asm/cacheflush.h |  6 ++-
>  arch/nios2/include/asm/pgtable.h    | 28 ++++++++-----
>  arch/nios2/mm/cacheflush.c          | 61 ++++++++++++++++-------------
>  3 files changed, 58 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/nios2/include/asm/cacheflush.h b/arch/nios2/include/asm/cacheflush.h
> index d0b71dd71287..8624ca83cffe 100644
> --- a/arch/nios2/include/asm/cacheflush.h
> +++ b/arch/nios2/include/asm/cacheflush.h
> @@ -29,9 +29,13 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
>  	unsigned long pfn);
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  void flush_dcache_page(struct page *page);
> +void flush_dcache_folio(struct folio *folio);
> +#define flush_dcache_folio flush_dcache_folio
>  
>  extern void flush_icache_range(unsigned long start, unsigned long end);
> -extern void flush_icache_page(struct vm_area_struct *vma, struct page *page);
> +void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
> +		unsigned int nr);
> +#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
>  
>  #define flush_cache_vmap(start, end)		flush_dcache_range(start, end)
>  #define flush_cache_vunmap(start, end)		flush_dcache_range(start, end)
> diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
> index 0f5c2564e9f5..4bb5f4dfff82 100644
> --- a/arch/nios2/include/asm/pgtable.h
> +++ b/arch/nios2/include/asm/pgtable.h
> @@ -178,14 +178,21 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
>  	*ptep = pteval;
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pteval)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
>  {
> -	unsigned long paddr = (unsigned long)page_to_virt(pte_page(pteval));
> -
> -	flush_dcache_range(paddr, paddr + PAGE_SIZE);
> -	set_pte(ptep, pteval);
> +	unsigned long paddr = (unsigned long)page_to_virt(pte_page(pte));
> +
> +	flush_dcache_range(paddr, paddr + nr * PAGE_SIZE);
> +	for (;;) {
> +		set_pte(ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte_val(pte) += 1;
> +	}
>  }
> +#define set_ptes set_ptes
>  
>  static inline int pmd_none(pmd_t pmd)
>  {
> @@ -202,7 +209,7 @@ static inline void pte_clear(struct mm_struct *mm,
>  
>  	pte_val(null) = (addr >> PAGE_SHIFT) & 0xf;
>  
> -	set_pte_at(mm, addr, ptep, null);
> +	set_pte(ptep, null);
>  }
>  
>  /*
> @@ -273,7 +280,10 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>  extern void __init paging_init(void);
>  extern void __init mmu_init(void);
>  
> -extern void update_mmu_cache(struct vm_area_struct *vma,
> -			     unsigned long address, pte_t *pte);
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *ptep, unsigned int nr);
> +
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  
>  #endif /* _ASM_NIOS2_PGTABLE_H */
> diff --git a/arch/nios2/mm/cacheflush.c b/arch/nios2/mm/cacheflush.c
> index 6aa9257c3ede..471485a84b2c 100644
> --- a/arch/nios2/mm/cacheflush.c
> +++ b/arch/nios2/mm/cacheflush.c
> @@ -138,10 +138,11 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
>  		__flush_icache(start, end);
>  }
>  
> -void flush_icache_page(struct vm_area_struct *vma, struct page *page)
> +void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
> +		unsigned int nr)
>  {
>  	unsigned long start = (unsigned long) page_address(page);
> -	unsigned long end = start + PAGE_SIZE;
> +	unsigned long end = start + nr * PAGE_SIZE;
>  
>  	__flush_dcache(start, end);
>  	__flush_icache(start, end);
> @@ -158,19 +159,19 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
>  		__flush_icache(start, end);
>  }
>  
> -void __flush_dcache_page(struct address_space *mapping, struct page *page)
> +void __flush_dcache_folio(struct address_space *mapping, struct folio *folio)
>  {
>  	/*
>  	 * Writeback any data associated with the kernel mapping of this
>  	 * page.  This ensures that data in the physical page is mutually
>  	 * coherent with the kernels mapping.
>  	 */
> -	unsigned long start = (unsigned long)page_address(page);
> +	unsigned long start = (unsigned long)folio_address(folio);
>  
> -	__flush_dcache(start, start + PAGE_SIZE);
> +	__flush_dcache(start, start + folio_size(folio));
>  }
>  
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
>  	struct address_space *mapping;
>  
> @@ -178,32 +179,38 @@ void flush_dcache_page(struct page *page)
>  	 * The zero page is never written to, so never has any dirty
>  	 * cache lines, and therefore never needs to be flushed.
>  	 */
> -	if (page == ZERO_PAGE(0))
> +	if (is_zero_pfn(folio_pfn(folio)))
>  		return;
>  
> -	mapping = page_mapping_file(page);
> +	mapping = folio_flush_mapping(folio);
>  
>  	/* Flush this page if there are aliases. */
>  	if (mapping && !mapping_mapped(mapping)) {
> -		clear_bit(PG_dcache_clean, &page->flags);
> +		clear_bit(PG_dcache_clean, &folio->flags);
>  	} else {
> -		__flush_dcache_page(mapping, page);
> +		__flush_dcache_folio(mapping, folio);
>  		if (mapping) {
> -			unsigned long start = (unsigned long)page_address(page);
> -			flush_aliases(mapping,  page);
> -			flush_icache_range(start, start + PAGE_SIZE);
> +			unsigned long start = (unsigned long)folio_address(folio);
> +			flush_aliases(mapping, folio);
> +			flush_icache_range(start, start + folio_size(folio));
>  		}
> -		set_bit(PG_dcache_clean, &page->flags);
> +		set_bit(PG_dcache_clean, &folio->flags);
>  	}
>  }
> -EXPORT_SYMBOL(flush_dcache_page);
> +EXPORT_SYMBOL(flush_dcache_folio);
> +
> +void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
> +EXPORT_SYMBOL(flush_dcache_page);
>  
> -void update_mmu_cache(struct vm_area_struct *vma,
> -		      unsigned long address, pte_t *ptep)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *ptep, unsigned int nr)
>  {
>  	pte_t pte = *ptep;
>  	unsigned long pfn = pte_pfn(pte);
> -	struct page *page;
> +	struct folio *folio;
>  	struct address_space *mapping;
>  
>  	reload_tlb_page(vma, address, pte);
> @@ -215,19 +222,19 @@ void update_mmu_cache(struct vm_area_struct *vma,
>  	* The zero page is never written to, so never has any dirty
>  	* cache lines, and therefore never needs to be flushed.
>  	*/
> -	page = pfn_to_page(pfn);
> -	if (page == ZERO_PAGE(0))
> +	if (is_zero_pfn(pfn))
>  		return;
>  
> -	mapping = page_mapping_file(page);
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> -		__flush_dcache_page(mapping, page);
> +	folio = page_folio(pfn_to_page(pfn));
> +	mapping = folio_flush_mapping(folio);
> +	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
> +		__flush_dcache_folio(mapping, folio);
>  
> -	if(mapping)
> -	{
> -		flush_aliases(mapping, page);
> +	if (mapping) {
> +		flush_aliases(mapping, folio);
>  		if (vma->vm_flags & VM_EXEC)
> -			flush_icache_page(vma, page);
> +			flush_icache_pages(vma, &folio->page,
> +					folio_nr_pages(folio));
>  	}
>  }
>  
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 18/36] openrisc: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 18/36] openrisc: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:09   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:09 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Jonas Bonn,
	Stefan Kristiansson, Stafford Horne, linux-openrisc

On Wed, Mar 15, 2023 at 05:14:26AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT, update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page
> to per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Jonas Bonn <jonas@southpole.se>
> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
> Cc: Stafford Horne <shorne@gmail.com>
> Cc: linux-openrisc@vger.kernel.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
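
The PG_arch_1 part of this patch boils down to one pattern, pulled
out of the update_cache() hunk below for clarity (same logic, just
condensed): the clean bit now covers the whole folio, and when it
flips we walk every constituent page.

int dirty = !test_and_set_bit(PG_dc_clean, &folio->flags);

if ((vma->vm_flags & VM_EXEC) && dirty) {
	unsigned int nr = folio_nr_pages(folio);

	while (nr--)
		sync_icache_dcache(folio_page(folio, nr));
}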

> ---
>  arch/openrisc/include/asm/cacheflush.h |  8 +++++++-
>  arch/openrisc/include/asm/pgtable.h    | 14 +++++++++-----
>  arch/openrisc/mm/cache.c               | 12 ++++++++----
>  3 files changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/openrisc/include/asm/cacheflush.h b/arch/openrisc/include/asm/cacheflush.h
> index eeac40d4a854..984c331ff5f4 100644
> --- a/arch/openrisc/include/asm/cacheflush.h
> +++ b/arch/openrisc/include/asm/cacheflush.h
> @@ -56,10 +56,16 @@ static inline void sync_icache_dcache(struct page *page)
>   */
>  #define PG_dc_clean                  PG_arch_1
>  
> +static inline void flush_dcache_folio(struct folio *folio)
> +{
> +	clear_bit(PG_dc_clean, &folio->flags);
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  static inline void flush_dcache_page(struct page *page)
>  {
> -	clear_bit(PG_dc_clean, &page->flags);
> +	flush_dcache_folio(page_folio(page));
>  }
>  
>  #define flush_icache_user_page(vma, page, addr, len)	\
> diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
> index 3eb9b9555d0d..2f42a12c40ab 100644
> --- a/arch/openrisc/include/asm/pgtable.h
> +++ b/arch/openrisc/include/asm/pgtable.h
> @@ -46,7 +46,7 @@ extern void paging_init(void);
>   * hook is made available.
>   */
>  #define set_pte(pteptr, pteval) ((*(pteptr)) = (pteval))
> -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> +
>  /*
>   * (pmds are folded into pgds so this doesn't get actually called,
>   * but the define is needed for a generic inline function.)
> @@ -357,6 +357,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
>  #define __pmd_offset(address) \
>  	(((address) >> PMD_SHIFT) & (PTRS_PER_PMD-1))
>  
> +#define PFN_PTE_SHIFT		PAGE_SHIFT
>  #define pte_pfn(x)		((unsigned long)(((x).pte)) >> PAGE_SHIFT)
>  #define pfn_pte(pfn, prot)  __pte((((pfn) << PAGE_SHIFT)) | pgprot_val(prot))
>  
> @@ -379,13 +380,16 @@ static inline void update_tlb(struct vm_area_struct *vma,
>  extern void update_cache(struct vm_area_struct *vma,
>  	unsigned long address, pte_t *pte);
>  
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -	unsigned long address, pte_t *pte)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr)
>  {
> -	update_tlb(vma, address, pte);
> -	update_cache(vma, address, pte);
> +	update_tlb(vma, address, ptep);
> +	update_cache(vma, address, ptep);
>  }
>  
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
> +
>  /* __PHX__ FIXME, SWAP, this probably doesn't work */
>  
>  /*
> diff --git a/arch/openrisc/mm/cache.c b/arch/openrisc/mm/cache.c
> index 534a52ec5e66..eb43b73f3855 100644
> --- a/arch/openrisc/mm/cache.c
> +++ b/arch/openrisc/mm/cache.c
> @@ -43,15 +43,19 @@ void update_cache(struct vm_area_struct *vma, unsigned long address,
>  	pte_t *pte)
>  {
>  	unsigned long pfn = pte_val(*pte) >> PAGE_SHIFT;
> -	struct page *page = pfn_to_page(pfn);
> -	int dirty = !test_and_set_bit(PG_dc_clean, &page->flags);
> +	struct folio *folio = page_folio(pfn_to_page(pfn));
> +	int dirty = !test_and_set_bit(PG_dc_clean, &folio->flags);
>  
>  	/*
>  	 * Since icaches do not snoop for updated data on OpenRISC, we
>  	 * must write back and invalidate any dirty pages manually. We
>  	 * can skip data pages, since they will not end up in icaches.
>  	 */
> -	if ((vma->vm_flags & VM_EXEC) && dirty)
> -		sync_icache_dcache(page);
> +	if ((vma->vm_flags & VM_EXEC) && dirty) {
> +		unsigned int nr = folio_nr_pages(folio);
> +
> +		while (nr--)
> +			sync_icache_dcache(folio_page(folio, nr));
> +	}
>  }
>  
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 19/36] parisc: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 19/36] parisc: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:09   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:09 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, James E.J. Bottomley,
	Helge Deller, linux-parisc

On Wed, Mar 15, 2023 at 05:14:27AM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
> and flush_icache_pages().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
> from being per-page to per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
> Cc: Helge Deller <deller@gmx.de>
> Cc: linux-parisc@vger.kernel.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
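
The trickiest part of the diff below is the unsigned clamping in
flush_dcache_folio(), where a large folio can hang over either end of
a VMA.  A worked example with hypothetical numbers: a 16-page folio
at file index 0, mapped by a VMA whose vm_pgoff is 4, so the folio's
first four pages fall before the start of the mapping.

unsigned long pfn = 0x1000;	/* hypothetical first pfn of the folio */
unsigned long nr = 16;
unsigned long offset = 0 - 4UL;	/* pgoff - vm_pgoff, wraps to "-4" */

if (offset > -nr) {		/* unsigned spelling of: -nr < offset < 0 */
	pfn -= offset;		/* pfn += 4: skip the pages before the VMA */
	nr += offset;		/* nr = 12: flush only the mapped tail */
}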

> ---
>  arch/parisc/include/asm/cacheflush.h |  14 ++--
>  arch/parisc/include/asm/pgtable.h    |  37 ++++++----
>  arch/parisc/kernel/cache.c           | 101 +++++++++++++++++++--------
>  3 files changed, 103 insertions(+), 49 deletions(-)
> 
> diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
> index 0bdee6724132..2cdc0ea562d6 100644
> --- a/arch/parisc/include/asm/cacheflush.h
> +++ b/arch/parisc/include/asm/cacheflush.h
> @@ -43,16 +43,20 @@ void invalidate_kernel_vmap_range(void *vaddr, int size);
>  #define flush_cache_vmap(start, end)		flush_cache_all()
>  #define flush_cache_vunmap(start, end)		flush_cache_all()
>  
> +void flush_dcache_folio(struct folio *folio);
> +#define flush_dcache_folio flush_dcache_folio
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -void flush_dcache_page(struct page *page);
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
>  
>  #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
>  #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
>  
> -#define flush_icache_page(vma,page)	do { 		\
> -	flush_kernel_dcache_page_addr(page_address(page)); \
> -	flush_kernel_icache_page(page_address(page)); 	\
> -} while (0)
> +void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
> +		unsigned int nr);
> +#define flush_icache_page(vma, page)	flush_icache_pages(vma, page, 1)
>  
>  #define flush_icache_range(s,e)		do { 		\
>  	flush_kernel_dcache_range_asm(s,e); 		\
> diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
> index e2950f5db7c9..ca6afe1980a5 100644
> --- a/arch/parisc/include/asm/pgtable.h
> +++ b/arch/parisc/include/asm/pgtable.h
> @@ -73,15 +73,6 @@ extern void __update_cache(pte_t pte);
>  		mb();				\
>  	} while(0)
>  
> -#define set_pte_at(mm, addr, pteptr, pteval)	\
> -	do {					\
> -		if (pte_present(pteval) &&	\
> -		    pte_user(pteval))		\
> -			__update_cache(pteval);	\
> -		*(pteptr) = (pteval);		\
> -		purge_tlb_entries(mm, addr);	\
> -	} while (0)
> -
>  #endif /* !__ASSEMBLY__ */
>  
>  #define pte_ERROR(e) \
> @@ -285,7 +276,7 @@ extern unsigned long *empty_zero_page;
>  #define pte_none(x)     (pte_val(x) == 0)
>  #define pte_present(x)	(pte_val(x) & _PAGE_PRESENT)
>  #define pte_user(x)	(pte_val(x) & _PAGE_USER)
> -#define pte_clear(mm, addr, xp)  set_pte_at(mm, addr, xp, __pte(0))
> +#define pte_clear(mm, addr, xp)  set_pte(xp, __pte(0))
>  
>  #define pmd_flag(x)	(pmd_val(x) & PxD_FLAG_MASK)
>  #define pmd_address(x)	((unsigned long)(pmd_val(x) &~ PxD_FLAG_MASK) << PxD_VALUE_SHIFT)
> @@ -391,11 +382,31 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
>  
>  extern void paging_init (void);
>  
> +#define PFN_PTE_SHIFT		PAGE_SHIFT
> +
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	if (pte_present(pte) && pte_user(pte))
> +		__update_cache(pte);
> +	for (;;) {
> +		*ptep = pte;
> +		purge_tlb_entries(mm, addr);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte_val(pte) += 1 << PFN_PTE_SHIFT;
> +		addr += PAGE_SIZE;
> +	}
> +}
> +#define set_ptes set_ptes
> +
>  /* Used for deferring calls to flush_dcache_page() */
>  
>  #define PG_dcache_dirty         PG_arch_1
>  
> -#define update_mmu_cache(vms,addr,ptep) __update_cache(*ptep)
> +#define update_mmu_cache_range(vma, addr, ptep, nr) __update_cache(*ptep)
> +#define update_mmu_cache(vma, addr, ptep) __update_cache(*ptep)
>  
>  /*
>   * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
> @@ -450,7 +459,7 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned
>  	if (!pte_young(pte)) {
>  		return 0;
>  	}
> -	set_pte_at(vma->vm_mm, addr, ptep, pte_mkold(pte));
> +	set_pte(ptep, pte_mkold(pte));
>  	return 1;
>  }
>  
> @@ -460,14 +469,14 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
>  	pte_t old_pte;
>  
>  	old_pte = *ptep;
> -	set_pte_at(mm, addr, ptep, __pte(0));
> +	set_pte(ptep, __pte(0));
>  
>  	return old_pte;
>  }
>  
>  static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
>  {
> -	set_pte_at(mm, addr, ptep, pte_wrprotect(*ptep));
> +	set_pte(ptep, pte_wrprotect(*ptep));
>  }
>  
>  #define pte_same(A,B)	(pte_val(A) == pte_val(B))
> diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
> index 1d3b8bc8a623..ceaa268fc1a6 100644
> --- a/arch/parisc/kernel/cache.c
> +++ b/arch/parisc/kernel/cache.c
> @@ -92,11 +92,11 @@ static inline void flush_data_cache(void)
>  /* Kernel virtual address of pfn.  */
>  #define pfn_va(pfn)	__va(PFN_PHYS(pfn))
>  
> -void
> -__update_cache(pte_t pte)
> +void __update_cache(pte_t pte)
>  {
>  	unsigned long pfn = pte_pfn(pte);
> -	struct page *page;
> +	struct folio *folio;
> +	unsigned int nr;
>  
>  	/* We don't have pte special.  As a result, we can be called with
>  	   an invalid pfn and we don't need to flush the kernel dcache page.
> @@ -104,13 +104,17 @@ __update_cache(pte_t pte)
>  	if (!pfn_valid(pfn))
>  		return;
>  
> -	page = pfn_to_page(pfn);
> -	if (page_mapping_file(page) &&
> -	    test_bit(PG_dcache_dirty, &page->flags)) {
> -		flush_kernel_dcache_page_addr(pfn_va(pfn));
> -		clear_bit(PG_dcache_dirty, &page->flags);
> +	folio = page_folio(pfn_to_page(pfn));
> +	pfn = folio_pfn(folio);
> +	nr = folio_nr_pages(folio);
> +	if (folio_flush_mapping(folio) &&
> +	    test_bit(PG_dcache_dirty, &folio->flags)) {
> +		while (nr--)
> +			flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
> +		clear_bit(PG_dcache_dirty, &folio->flags);
>  	} else if (parisc_requires_coherency())
> -		flush_kernel_dcache_page_addr(pfn_va(pfn));
> +		while (nr--)
> +			flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
>  }
>  
>  void
> @@ -364,6 +368,20 @@ static void flush_user_cache_page(struct vm_area_struct *vma, unsigned long vmad
>  	preempt_enable();
>  }
>  
> +void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
> +		unsigned int nr)
> +{
> +	void *kaddr = page_address(page);
> +
> +	for (;;) {
> +		flush_kernel_dcache_page_addr(kaddr);
> +		flush_kernel_icache_page(kaddr);
> +		if (--nr == 0)
> +			break;
> +		kaddr += PAGE_SIZE;
> +	}
> +}
> +
>  static inline pte_t *get_ptep(struct mm_struct *mm, unsigned long addr)
>  {
>  	pte_t *ptep = NULL;
> @@ -392,26 +410,30 @@ static inline bool pte_needs_flush(pte_t pte)
>  		== (_PAGE_PRESENT | _PAGE_ACCESSED);
>  }
>  
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
> -	struct address_space *mapping = page_mapping_file(page);
> -	struct vm_area_struct *mpnt;
> -	unsigned long offset;
> +	struct address_space *mapping = folio_flush_mapping(folio);
> +	struct vm_area_struct *vma;
>  	unsigned long addr, old_addr = 0;
> +	void *kaddr;
>  	unsigned long count = 0;
> +	unsigned long i, nr;
>  	pgoff_t pgoff;
>  
>  	if (mapping && !mapping_mapped(mapping)) {
> -		set_bit(PG_dcache_dirty, &page->flags);
> +		set_bit(PG_dcache_dirty, &folio->flags);
>  		return;
>  	}
>  
> -	flush_kernel_dcache_page_addr(page_address(page));
> +	nr = folio_nr_pages(folio);
> +	kaddr = folio_address(folio);
> +	for (i = 0; i < nr; i++)
> +		flush_kernel_dcache_page_addr(kaddr + i * PAGE_SIZE);
>  
>  	if (!mapping)
>  		return;
>  
> -	pgoff = page->index;
> +	pgoff = folio->index;
>  
>  	/*
>  	 * We have carefully arranged in arch_get_unmapped_area() that
> @@ -421,15 +443,29 @@ void flush_dcache_page(struct page *page)
>  	 * on machines that support equivalent aliasing
>  	 */
>  	flush_dcache_mmap_lock(mapping);
> -	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
> -		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
> -		addr = mpnt->vm_start + offset;
> -		if (parisc_requires_coherency()) {
> -			pte_t *ptep;
> +	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff + nr - 1) {
> +		unsigned long offset = pgoff - vma->vm_pgoff;
> +		unsigned long pfn = folio_pfn(folio);
> +
> +		addr = vma->vm_start;
> +		nr = folio_nr_pages(folio);
> +		if (offset > -nr) {
> +			pfn -= offset;
> +			nr += offset;
> +		} else {
> +			addr += offset * PAGE_SIZE;
> +		}
> +		if (addr + nr * PAGE_SIZE > vma->vm_end)
> +			nr = (vma->vm_end - addr) / PAGE_SIZE;
>  
> -			ptep = get_ptep(mpnt->vm_mm, addr);
> -			if (ptep && pte_needs_flush(*ptep))
> -				flush_user_cache_page(mpnt, addr);
> +		if (parisc_requires_coherency()) {
> +			for (i = 0; i < nr; i++) {
> +				pte_t *ptep = get_ptep(vma->vm_mm,
> +							addr + i * PAGE_SIZE);
> +				if (ptep && pte_needs_flush(*ptep))
> +					flush_user_cache_page(vma,
> +							addr + i * PAGE_SIZE);
> +			}
>  		} else {
>  			/*
>  			 * The TLB is the engine of coherence on parisc:
> @@ -442,27 +478,32 @@ void flush_dcache_page(struct page *page)
>  			 * in (until the user or kernel specifically
>  			 * accesses it, of course)
>  			 */
> -			flush_tlb_page(mpnt, addr);
> +			for (i = 0; i < nr; i++)
> +				flush_tlb_page(vma, addr + i * PAGE_SIZE);
>  			if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1))
>  					!= (addr & (SHM_COLOUR - 1))) {
> -				__flush_cache_page(mpnt, addr, page_to_phys(page));
> +				for (i = 0; i < nr; i++)
> +					__flush_cache_page(vma,
> +						addr + i * PAGE_SIZE,
> +						(pfn + i) * PAGE_SIZE);
>  				/*
>  				 * Software is allowed to have any number
>  				 * of private mappings to a page.
>  				 */
> -				if (!(mpnt->vm_flags & VM_SHARED))
> +				if (!(vma->vm_flags & VM_SHARED))
>  					continue;
>  				if (old_addr)
>  					pr_err("INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n",
> -						old_addr, addr, mpnt->vm_file);
> -				old_addr = addr;
> +						old_addr, addr, vma->vm_file);
> +				if (nr == folio_nr_pages(folio))
> +					old_addr = addr;
>  			}
>  		}
>  		WARN_ON(++count == 4096);
>  	}
>  	flush_dcache_mmap_unlock(mapping);
>  }
> -EXPORT_SYMBOL(flush_dcache_page);
> +EXPORT_SYMBOL(flush_dcache_folio);
>  
>  /* Defined in arch/parisc/kernel/pacache.S */
>  EXPORT_SYMBOL(flush_kernel_dcache_range_asm);
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 20/36] powerpc: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 20/36] powerpc: " Matthew Wilcox (Oracle)
  2023-03-15  9:43   ` Christophe Leroy
@ 2023-03-15 10:09   ` Mike Rapoport
  1 sibling, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:09 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Michael Ellerman,
	Nicholas Piggin, Christophe Leroy, linuxppc-dev

On Wed, Mar 15, 2023 at 05:14:28AM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to
> per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: linuxppc-dev@lists.ozlabs.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
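
One detail that is easy to misread in the highmem branch below:
kmap_local_folio() takes a byte offset into the folio, not a page
index, hence the i * PAGE_SIZE.  The per-page flush pattern there,
pulled out for clarity (sketch using the same locals):

for (i = 0; i < nr; i++) {
	void *start = kmap_local_folio(folio, i * PAGE_SIZE);

	__flush_dcache_icache(start);	/* operates on a single page */
	kunmap_local(start);
}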

> ---
>  arch/powerpc/include/asm/book3s/pgtable.h | 10 +----
>  arch/powerpc/include/asm/cacheflush.h     | 14 +++++--
>  arch/powerpc/include/asm/kvm_ppc.h        | 10 ++---
>  arch/powerpc/include/asm/nohash/pgtable.h | 13 ++----
>  arch/powerpc/include/asm/pgtable.h        |  6 +++
>  arch/powerpc/mm/book3s64/hash_utils.c     | 11 ++---
>  arch/powerpc/mm/cacheflush.c              | 40 ++++++------------
>  arch/powerpc/mm/nohash/e500_hugetlbpage.c |  3 +-
>  arch/powerpc/mm/pgtable.c                 | 51 +++++++++++++----------
>  9 files changed, 77 insertions(+), 81 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/pgtable.h b/arch/powerpc/include/asm/book3s/pgtable.h
> index d18b748ea3ae..c2ef811505b0 100644
> --- a/arch/powerpc/include/asm/book3s/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/pgtable.h
> @@ -9,13 +9,6 @@
>  #endif
>  
>  #ifndef __ASSEMBLY__
> -/* Insert a PTE, top-level function is out of line. It uses an inline
> - * low level function in the respective pgtable-* files
> - */
> -extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> -		       pte_t pte);
> -
> -
>  #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
>  extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
>  				 pte_t *ptep, pte_t entry, int dirty);
> @@ -36,7 +29,8 @@ void __update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t
>   * corresponding HPTE into the hash table ahead of time, instead of
>   * waiting for the inevitable extra hash-table miss exception.
>   */
> -static inline void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr)
>  {
>  	if (IS_ENABLED(CONFIG_PPC32) && !mmu_has_feature(MMU_FTR_HPTE_TABLE))
>  		return;
> diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
> index 7564dd4fd12b..ef7d2de33b89 100644
> --- a/arch/powerpc/include/asm/cacheflush.h
> +++ b/arch/powerpc/include/asm/cacheflush.h
> @@ -35,13 +35,19 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end)
>   * It just marks the page as not i-cache clean.  We do the i-cache
>   * flush later when the page is given to a user process, if necessary.
>   */
> -static inline void flush_dcache_page(struct page *page)
> +static inline void flush_dcache_folio(struct folio *folio)
>  {
>  	if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
>  		return;
>  	/* avoid an atomic op if possible */
> -	if (test_bit(PG_dcache_clean, &page->flags))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +	if (test_bit(PG_dcache_clean, &folio->flags))
> +		clear_bit(PG_dcache_clean, &folio->flags);
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
>  }
>  
>  void flush_icache_range(unsigned long start, unsigned long stop);
> @@ -51,7 +57,7 @@ void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
>  		unsigned long addr, int len);
>  #define flush_icache_user_page flush_icache_user_page
>  
> -void flush_dcache_icache_page(struct page *page);
> +void flush_dcache_icache_folio(struct folio *folio);
>  
>  /**
>   * flush_dcache_range(): Write any modified data cache blocks out to memory and
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index 6bef23d6d0e3..e91dd8e88bb7 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -868,7 +868,7 @@ void kvmppc_init_lpid(unsigned long nr_lpids);
>  
>  static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
>  {
> -	struct page *page;
> +	struct folio *folio;
>  	/*
>  	 * We can only access pages that the kernel maps
>  	 * as memory. Bail out for unmapped ones.
> @@ -877,10 +877,10 @@ static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
>  		return;
>  
>  	/* Clear i-cache for new pages */
> -	page = pfn_to_page(pfn);
> -	if (!test_bit(PG_dcache_clean, &page->flags)) {
> -		flush_dcache_icache_page(page);
> -		set_bit(PG_dcache_clean, &page->flags);
> +	folio = page_folio(pfn_to_page(pfn));
> +	if (!test_bit(PG_dcache_clean, &folio->flags)) {
> +		flush_dcache_icache_folio(folio);
> +		set_bit(PG_dcache_clean, &folio->flags);
>  	}
>  }
>  
> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
> index a6caaaab6f92..69a7dd47a9f0 100644
> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -166,12 +166,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>  	return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
>  }
>  
> -/* Insert a PTE, top-level function is out of line. It uses an inline
> - * low level function in the respective pgtable-* files
> - */
> -extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> -		       pte_t pte);
> -
>  /* This low level function performs the actual PTE insertion
>   * Setting the PTE depends on the MMU type and other factors. It's
>   * an horrible mess that I'm not going to try to clean up now but
> @@ -282,10 +276,11 @@ static inline int pud_huge(pud_t pud)
>   * for the page which has just been mapped in.
>   */
>  #if defined(CONFIG_PPC_E500) && defined(CONFIG_HUGETLB_PAGE)
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep);
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *ptep, unsigned int nr);
>  #else
> -static inline
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) {}
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr) {}
>  #endif
>  
>  #endif /* __ASSEMBLY__ */
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 9972626ddaf6..656ecf2b10cd 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -41,6 +41,12 @@ struct mm_struct;
>  
>  #ifndef __ASSEMBLY__
>  
> +void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> +		pte_t pte, unsigned int nr);
> +#define set_ptes set_ptes
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
> +
>  #ifndef MAX_PTRS_PER_PGD
>  #define MAX_PTRS_PER_PGD PTRS_PER_PGD
>  #endif
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index fedffe3ae136..ad2afa08e62e 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1307,18 +1307,19 @@ void hash__early_init_mmu_secondary(void)
>   */
>  unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
>  {
> -	struct page *page;
> +	struct folio *folio;
>  
>  	if (!pfn_valid(pte_pfn(pte)))
>  		return pp;
>  
> -	page = pte_page(pte);
> +	folio = page_folio(pte_page(pte));
>  
>  	/* page is dirty */
> -	if (!test_bit(PG_dcache_clean, &page->flags) && !PageReserved(page)) {
> +	if (!test_bit(PG_dcache_clean, &folio->flags) &&
> +	    !folio_test_reserved(folio)) {
>  		if (trap == INTERRUPT_INST_STORAGE) {
> -			flush_dcache_icache_page(page);
> -			set_bit(PG_dcache_clean, &page->flags);
> +			flush_dcache_icache_folio(folio);
> +			set_bit(PG_dcache_clean, &folio->flags);
>  		} else
>  			pp |= HPTE_R_N;
>  	}
> diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
> index 0e9b4879c0f9..8760d2223abe 100644
> --- a/arch/powerpc/mm/cacheflush.c
> +++ b/arch/powerpc/mm/cacheflush.c
> @@ -148,44 +148,30 @@ static void __flush_dcache_icache(void *p)
>  	invalidate_icache_range(addr, addr + PAGE_SIZE);
>  }
>  
> -static void flush_dcache_icache_hugepage(struct page *page)
> +void flush_dcache_icache_folio(struct folio *folio)
>  {
> -	int i;
> -	int nr = compound_nr(page);
> +	unsigned int i, nr = folio_nr_pages(folio);
>  
> -	if (!PageHighMem(page)) {
> +	if (flush_coherent_icache())
> +		return;
> +
> +	if (!folio_test_highmem(folio)) {
> +		void *addr = folio_address(folio);
>  		for (i = 0; i < nr; i++)
> -			__flush_dcache_icache(lowmem_page_address(page + i));
> -	} else {
> +			__flush_dcache_icache(addr + i * PAGE_SIZE);
> +	} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
>  		for (i = 0; i < nr; i++) {
> -			void *start = kmap_local_page(page + i);
> +			void *start = kmap_local_folio(folio, i * PAGE_SIZE);
>  
>  			__flush_dcache_icache(start);
>  			kunmap_local(start);
>  		}
> -	}
> -}
> -
> -void flush_dcache_icache_page(struct page *page)
> -{
> -	if (flush_coherent_icache())
> -		return;
> -
> -	if (PageCompound(page))
> -		return flush_dcache_icache_hugepage(page);
> -
> -	if (!PageHighMem(page)) {
> -		__flush_dcache_icache(lowmem_page_address(page));
> -	} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
> -		void *start = kmap_local_page(page);
> -
> -		__flush_dcache_icache(start);
> -		kunmap_local(start);
>  	} else {
> -		flush_dcache_icache_phys(page_to_phys(page));
> +		unsigned long pfn = folio_pfn(folio);
> +		for (i = 0; i < nr; i++)
> +			flush_dcache_icache_phys((pfn + i) * PAGE_SIZE);
>  	}
>  }
> -EXPORT_SYMBOL(flush_dcache_icache_page);
>  
>  void clear_user_page(void *page, unsigned long vaddr, struct page *pg)
>  {
> diff --git a/arch/powerpc/mm/nohash/e500_hugetlbpage.c b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> index 58c8d9849cb1..f3cb91107a47 100644
> --- a/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> +++ b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> @@ -178,7 +178,8 @@ book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea, pte_t pte)
>   *
>   * This must always be called with the pte lock held.
>   */
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *ptep, unsigned int nr)
>  {
>  	if (is_vm_hugetlb_page(vma))
>  		book3e_hugetlb_preload(vma, address, *ptep);
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index cb2dcdb18f8e..b3c7b874a7a2 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -58,7 +58,7 @@ static inline int pte_looks_normal(pte_t pte)
>  	return 0;
>  }
>  
> -static struct page *maybe_pte_to_page(pte_t pte)
> +static struct folio *maybe_pte_to_folio(pte_t pte)
>  {
>  	unsigned long pfn = pte_pfn(pte);
>  	struct page *page;
> @@ -68,7 +68,7 @@ static struct page *maybe_pte_to_page(pte_t pte)
>  	page = pfn_to_page(pfn);
>  	if (PageReserved(page))
>  		return NULL;
> -	return page;
> +	return page_folio(page);
>  }
>  
>  #ifdef CONFIG_PPC_BOOK3S
> @@ -84,12 +84,12 @@ static pte_t set_pte_filter_hash(pte_t pte)
>  	pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
>  	if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) ||
>  				       cpu_has_feature(CPU_FTR_NOEXECUTE))) {
> -		struct page *pg = maybe_pte_to_page(pte);
> -		if (!pg)
> +		struct folio *folio = maybe_pte_to_folio(pte);
> +		if (!folio)
>  			return pte;
> -		if (!test_bit(PG_dcache_clean, &pg->flags)) {
> -			flush_dcache_icache_page(pg);
> -			set_bit(PG_dcache_clean, &pg->flags);
> +		if (!test_bit(PG_dcache_clean, &folio->flags)) {
> +			flush_dcache_icache_folio(folio);
> +			set_bit(PG_dcache_clean, &folio->flags);
>  		}
>  	}
>  	return pte;
> @@ -107,7 +107,7 @@ static pte_t set_pte_filter_hash(pte_t pte) { return pte; }
>   */
>  static inline pte_t set_pte_filter(pte_t pte)
>  {
> -	struct page *pg;
> +	struct folio *folio;
>  
>  	if (radix_enabled())
>  		return pte;
> @@ -120,18 +120,18 @@ static inline pte_t set_pte_filter(pte_t pte)
>  		return pte;
>  
>  	/* If you set _PAGE_EXEC on weird pages you're on your own */
> -	pg = maybe_pte_to_page(pte);
> -	if (unlikely(!pg))
> +	folio = maybe_pte_to_folio(pte);
> +	if (unlikely(!folio))
>  		return pte;
>  
>  	/* If the page is clean, we move on */
> -	if (test_bit(PG_dcache_clean, &pg->flags))
> +	if (test_bit(PG_dcache_clean, &folio->flags))
>  		return pte;
>  
>  	/* If it's an exec fault, we flush the cache and make it clean */
>  	if (is_exec_fault()) {
> -		flush_dcache_icache_page(pg);
> -		set_bit(PG_dcache_clean, &pg->flags);
> +		flush_dcache_icache_folio(folio);
> +		set_bit(PG_dcache_clean, &folio->flags);
>  		return pte;
>  	}
>  
> @@ -142,7 +142,7 @@ static inline pte_t set_pte_filter(pte_t pte)
>  static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
>  				     int dirty)
>  {
> -	struct page *pg;
> +	struct folio *folio;
>  
>  	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
>  		return pte;
> @@ -168,17 +168,17 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
>  #endif /* CONFIG_DEBUG_VM */
>  
>  	/* If you set _PAGE_EXEC on weird pages you're on your own */
> -	pg = maybe_pte_to_page(pte);
> -	if (unlikely(!pg))
> +	folio = maybe_pte_to_folio(pte);
> +	if (unlikely(!folio))
>  		goto bail;
>  
>  	/* If the page is already clean, we move on */
> -	if (test_bit(PG_dcache_clean, &pg->flags))
> +	if (test_bit(PG_dcache_clean, &folio->flags))
>  		goto bail;
>  
>  	/* Clean the page and set PG_dcache_clean */
> -	flush_dcache_icache_page(pg);
> -	set_bit(PG_dcache_clean, &pg->flags);
> +	flush_dcache_icache_folio(folio);
> +	set_bit(PG_dcache_clean, &folio->flags);
>  
>   bail:
>  	return pte_mkexec(pte);
> @@ -187,8 +187,8 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
>  /*
>   * set_pte stores a linux PTE into the linux page table.
>   */
> -void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> -		pte_t pte)
> +void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> +		pte_t pte, unsigned int nr)
>  {
>  	/*
>  	 * Make sure hardware valid bit is not set. We don't do
> @@ -203,7 +203,14 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>  	pte = set_pte_filter(pte);
>  
>  	/* Perform the setting of the PTE */
> -	__set_pte_at(mm, addr, ptep, pte, 0);
> +	for (;;) {
> +		__set_pte_at(mm, addr, ptep, pte, 0);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte = __pte(pte_val(pte) + PAGE_SIZE);
> +		addr += PAGE_SIZE;
> +	}
>  }
>  
>  void unmap_kernel_page(unsigned long va)
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
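
A note on the loop in the powerpc patch above: an architecture whose
PTEs keep the PFN at a fixed shift only needs to define PFN_PTE_SHIFT
and picks up a generic set_ptes() instead.  A minimal sketch of that
fallback (the real one lives in include/linux/pgtable.h and may differ
in detail):

	#ifndef set_ptes
	static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
			pte_t *ptep, pte_t pte, unsigned int nr)
	{
		page_table_check_ptes_set(mm, addr, ptep, pte, nr);

		for (;;) {
			set_pte(ptep, pte);
			if (--nr == 0)
				break;
			ptep++;
			/* advance the PFN field by one page */
			pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
			addr += PAGE_SIZE;
		}
	}
	#endif

powerpc provides its own loop instead because __set_pte_at() needs mm
and addr, and because set_pte_filter() should run once per batch rather
than once per page.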

* Re: [PATCH v4 21/36] riscv: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 21/36] riscv: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:10   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:10 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Alexandre Ghiti,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv

On Wed, Mar 15, 2023 at 05:14:29AM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_dcache_clean flag from being per-page to per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Albert Ou <aou@eecs.berkeley.edu>
> Cc: linux-riscv@lists.infradead.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/riscv/include/asm/cacheflush.h | 19 +++++++++----------
>  arch/riscv/include/asm/pgtable.h    | 26 +++++++++++++++++++-------
>  arch/riscv/mm/cacheflush.c          | 11 ++---------
>  3 files changed, 30 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> index 03e3b95ae6da..10e5e96f09b5 100644
> --- a/arch/riscv/include/asm/cacheflush.h
> +++ b/arch/riscv/include/asm/cacheflush.h
> @@ -15,20 +15,19 @@ static inline void local_flush_icache_all(void)
>  
>  #define PG_dcache_clean PG_arch_1
>  
> -static inline void flush_dcache_page(struct page *page)
> +static inline void flush_dcache_folio(struct folio *folio)
>  {
> -	/*
> -	 * HugeTLB pages are always fully mapped and only head page will be
> -	 * set PG_dcache_clean (see comments in flush_icache_pte()).
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> -
> -	if (test_bit(PG_dcache_clean, &page->flags))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +	if (test_bit(PG_dcache_clean, &folio->flags))
> +		clear_bit(PG_dcache_clean, &folio->flags);
>  }
> +#define flush_dcache_folio flush_dcache_folio
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
> +
>  /*
>   * RISC-V doesn't have an instruction to flush parts of the instruction cache,
>   * so instead we just flush the whole thing.
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index b516f3b59616..b077bc8c498c 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -405,8 +405,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>  
>  
>  /* Commit new configuration to MMU hardware */
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -	unsigned long address, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr)
>  {
>  	/*
>  	 * The kernel assumes that TLBs don't cache invalid entries, but
> @@ -415,8 +415,11 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  	 * Relying on flush_tlb_fix_spurious_fault would suffice, but
>  	 * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
>  	 */
> -	local_flush_tlb_page(address);
> +	while (nr--)
> +		local_flush_tlb_page(address + nr * PAGE_SIZE);
>  }
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  
>  #define __HAVE_ARCH_UPDATE_MMU_TLB
>  #define update_mmu_tlb update_mmu_cache
> @@ -456,12 +459,21 @@ static inline void __set_pte_at(struct mm_struct *mm,
>  	set_pte(ptep, pteval);
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm,
> -	unsigned long addr, pte_t *ptep, pte_t pteval)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pteval, unsigned int nr)
>  {
> -	page_table_check_ptes_set(mm, addr, ptep, pteval, 1);
> -	__set_pte_at(mm, addr, ptep, pteval);
> +	page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
> +
> +	for (;;) {
> +		__set_pte_at(mm, addr, ptep, pteval);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		addr += PAGE_SIZE;
> +		pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
> +	}
>  }
> +#define set_ptes set_ptes
>  
>  static inline void pte_clear(struct mm_struct *mm,
>  	unsigned long addr, pte_t *ptep)
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index fcd6145fbead..e36a851e5788 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -81,16 +81,9 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
>  #ifdef CONFIG_MMU
>  void flush_icache_pte(pte_t pte)
>  {
> -	struct page *page = pte_page(pte);
> +	struct folio *folio = page_folio(pte_page(pte));
>  
> -	/*
> -	 * HugeTLB pages are always fully mapped, so only setting head page's
> -	 * PG_dcache_clean flag is enough.
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> -
> -	if (!test_bit(PG_dcache_clean, &page->flags)) {
> +	if (!test_bit(PG_dcache_clean, &folio->flags)) {
>  		flush_icache_all();
> -		set_bit(PG_dcache_clean, &page->flags);
> +		set_bit(PG_dcache_clean, &folio->flags);
>  	}
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
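
One detail worth noting in the riscv version: set_ptes() advances the
PTE by 1 << _PAGE_PFN_SHIFT rather than by PAGE_SIZE, because riscv
keeps the PPN field at bit 10 of the PTE instead of at PAGE_SHIFT.  As
a rough illustration (not kernel code; this assumes _PAGE_PFN_SHIFT is
10 and PAGE_SHIFT is 12):

	unsigned long pteval = (pfn << 10) | prot_bits;

	pteval += 1UL << 10;	/* now encodes pfn + 1, prot unchanged */
	/* adding PAGE_SIZE (1 << 12) would advance the PPN by 4 pages */

Ports whose PTEs hold a physical address (powerpc and s390 above,
sparc64 below) add PAGE_SIZE instead; both idioms step the mapping
forward by exactly one page in the respective encoding.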

* Re: [PATCH v4 22/36] s390: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 22/36] s390: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:10   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:10 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Gerald Schaefer,
	Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, linux-s390

On Wed, Mar 15, 2023 at 05:14:30AM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes() and update_mmu_cache_range().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/s390/include/asm/pgtable.h | 33 ++++++++++++++++++++++++---------
>  1 file changed, 24 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index c1f6b46ec555..fea678c67e51 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -50,6 +50,7 @@ void arch_report_meminfo(struct seq_file *m);
>   * tables contain all the necessary information.
>   */
>  #define update_mmu_cache(vma, address, ptep)     do { } while (0)
> +#define update_mmu_cache_range(vma, addr, ptep, nr)	do { } while (0)
>  #define update_mmu_cache_pmd(vma, address, ptep) do { } while (0)
>  
>  /*
> @@ -1319,20 +1320,34 @@ pgprot_t pgprot_writecombine(pgprot_t prot);
>  pgprot_t pgprot_writethrough(pgprot_t prot);
>  
>  /*
> - * Certain architectures need to do special things when PTEs
> - * within a page table are directly modified.  Thus, the following
> - * hook is made available.
> + * Set multiple PTEs to consecutive pages with a single call.  All PTEs
> + * are within the same folio, PMD and VMA.
>   */
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t entry)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +			      pte_t *ptep, pte_t entry, unsigned int nr)
>  {
>  	if (pte_present(entry))
>  		entry = clear_pte_bit(entry, __pgprot(_PAGE_UNUSED));
> -	if (mm_has_pgste(mm))
> -		ptep_set_pte_at(mm, addr, ptep, entry);
> -	else
> -		set_pte(ptep, entry);
> +	if (mm_has_pgste(mm)) {
> +		for (;;) {
> +			ptep_set_pte_at(mm, addr, ptep, entry);
> +			if (--nr == 0)
> +				break;
> +			ptep++;
> +			entry = __pte(pte_val(entry) + PAGE_SIZE);
> +			addr += PAGE_SIZE;
> +		}
> +	} else {
> +		for (;;) {
> +			set_pte(ptep, entry);
> +			if (--nr == 0)
> +				break;
> +			ptep++;
> +			entry = __pte(pte_val(entry) + PAGE_SIZE);
> +		}
> +	}
>  }
> +#define set_ptes set_ptes
>  
>  /*
>   * Conversion functions: convert a page and protection to a page entry,
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
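
For context on the caller side: the point of the batched API is that
the core MM can now instantiate every PTE of a folio with one call
while the page table lock is held.  A hedged sketch of what a fault
handler does with these hooks (illustrative only; the exact code in
mm/ differs):

	/* map nr consecutive pages of a folio at addr; PTL already held */
	pte_t entry = mk_pte(page, vma->vm_page_prot);

	flush_icache_pages(vma, page, nr);
	set_ptes(vma->vm_mm, addr, ptep, entry, nr);
	update_mmu_cache_range(vma, addr, ptep, nr);

On s390 the last call is an empty macro, as the comment in its
pgtable.h says: the hardware walks the kernel page tables directly, so
there is no extra MMU state to update.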

* Re: [PATCH v4 23/36] superh: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 23/36] superh: " Matthew Wilcox (Oracle)
  2023-03-15  7:22   ` John Paul Adrian Glaubitz
  2023-03-15  7:36   ` John Paul Adrian Glaubitz
@ 2023-03-15 10:10   ` Mike Rapoport
  2 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:10 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Yoshinori Sato, Rich Felker,
	John Paul Adrian Glaubitz, linux-sh

On Wed, Mar 15, 2023 at 05:14:31AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio() and
> flush_icache_pages().  Change the PG_dcache_clean flag from being
> per-page to per-folio.  Flush the entire folio containing the pages in
> flush_icache_pages() for ease of implementation.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
> Cc: Rich Felker <dalias@libc.org>
> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Cc: linux-sh@vger.kernel.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/sh/include/asm/cacheflush.h | 21 ++++++++-----
>  arch/sh/include/asm/pgtable.h    |  6 ++--
>  arch/sh/include/asm/pgtable_32.h |  5 ++-
>  arch/sh/mm/cache-j2.c            |  4 +--
>  arch/sh/mm/cache-sh4.c           | 26 +++++++++++-----
>  arch/sh/mm/cache-sh7705.c        | 26 ++++++++++------
>  arch/sh/mm/cache.c               | 52 ++++++++++++++++++--------------
>  arch/sh/mm/kmap.c                |  3 +-
>  8 files changed, 88 insertions(+), 55 deletions(-)
> 
> diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
> index 481a664287e2..9fceef6f3e00 100644
> --- a/arch/sh/include/asm/cacheflush.h
> +++ b/arch/sh/include/asm/cacheflush.h
> @@ -13,9 +13,9 @@
>   *  - flush_cache_page(mm, vmaddr, pfn) flushes a single page
>   *  - flush_cache_range(vma, start, end) flushes a range of pages
>   *
> - *  - flush_dcache_page(pg) flushes(wback&invalidates) a page for dcache
> + *  - flush_dcache_folio(folio) flushes(wback&invalidates) a folio for dcache
>   *  - flush_icache_range(start, end) flushes(invalidates) a range for icache
> - *  - flush_icache_page(vma, pg) flushes(invalidates) a page for icache
> + *  - flush_icache_pages(vma, pg, nr) flushes(invalidates) pages for icache
>   *  - flush_cache_sigtramp(vaddr) flushes the signal trampoline
>   */
>  extern void (*local_flush_cache_all)(void *args);
> @@ -23,9 +23,9 @@ extern void (*local_flush_cache_mm)(void *args);
>  extern void (*local_flush_cache_dup_mm)(void *args);
>  extern void (*local_flush_cache_page)(void *args);
>  extern void (*local_flush_cache_range)(void *args);
> -extern void (*local_flush_dcache_page)(void *args);
> +extern void (*local_flush_dcache_folio)(void *args);
>  extern void (*local_flush_icache_range)(void *args);
> -extern void (*local_flush_icache_page)(void *args);
> +extern void (*local_flush_icache_folio)(void *args);
>  extern void (*local_flush_cache_sigtramp)(void *args);
>  
>  static inline void cache_noop(void *args) { }
> @@ -42,11 +42,18 @@ extern void flush_cache_page(struct vm_area_struct *vma,
>  extern void flush_cache_range(struct vm_area_struct *vma,
>  				 unsigned long start, unsigned long end);
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -void flush_dcache_page(struct page *page);
> +void flush_dcache_folio(struct folio *folio);
> +#define flush_dcache_folio flush_dcache_folio
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
> +
>  extern void flush_icache_range(unsigned long start, unsigned long end);
>  #define flush_icache_user_range flush_icache_range
> -extern void flush_icache_page(struct vm_area_struct *vma,
> -				 struct page *page);
> +void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
> +		unsigned int nr);
> +#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
>  extern void flush_cache_sigtramp(unsigned long address);
>  
>  struct flusher_data {
> diff --git a/arch/sh/include/asm/pgtable.h b/arch/sh/include/asm/pgtable.h
> index 3ce30becf6df..1a8fdc3bc363 100644
> --- a/arch/sh/include/asm/pgtable.h
> +++ b/arch/sh/include/asm/pgtable.h
> @@ -102,13 +102,15 @@ extern void __update_cache(struct vm_area_struct *vma,
>  extern void __update_tlb(struct vm_area_struct *vma,
>  			 unsigned long address, pte_t pte);
>  
> -static inline void
> -update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr)
>  {
>  	pte_t pte = *ptep;
>  	__update_cache(vma, address, pte);
>  	__update_tlb(vma, address, pte);
>  }
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  
>  extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
>  extern void paging_init(void);
> diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h
> index 21952b094650..676f3d4ef6ce 100644
> --- a/arch/sh/include/asm/pgtable_32.h
> +++ b/arch/sh/include/asm/pgtable_32.h
> @@ -307,14 +307,13 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>  #define set_pte(pteptr, pteval) (*(pteptr) = pteval)
>  #endif
>  
> -#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
> -
>  /*
>   * (pmds are folded into pgds so this doesn't get actually called,
>   * but the define is needed for a generic inline function.)
>   */
>  #define set_pmd(pmdptr, pmdval) (*(pmdptr) = pmdval)
>  
> +#define PFN_PTE_SHIFT	PAGE_SHIFT
>  #define pfn_pte(pfn, prot) \
>  	__pte(((unsigned long long)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
>  #define pfn_pmd(pfn, prot) \
> @@ -323,7 +322,7 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
>  #define pte_none(x)		(!pte_val(x))
>  #define pte_present(x)		((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE))
>  
> -#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
> +#define pte_clear(mm, addr, ptep) set_pte(ptep, __pte(0))
>  
>  #define pmd_none(x)	(!pmd_val(x))
>  #define pmd_present(x)	(pmd_val(x))
> diff --git a/arch/sh/mm/cache-j2.c b/arch/sh/mm/cache-j2.c
> index f277862a11f5..9ac960214380 100644
> --- a/arch/sh/mm/cache-j2.c
> +++ b/arch/sh/mm/cache-j2.c
> @@ -55,9 +55,9 @@ void __init j2_cache_init(void)
>  	local_flush_cache_dup_mm = j2_flush_both;
>  	local_flush_cache_page = j2_flush_both;
>  	local_flush_cache_range = j2_flush_both;
> -	local_flush_dcache_page = j2_flush_dcache;
> +	local_flush_dcache_folio = j2_flush_dcache;
>  	local_flush_icache_range = j2_flush_icache;
> -	local_flush_icache_page = j2_flush_icache;
> +	local_flush_icache_folio = j2_flush_icache;
>  	local_flush_cache_sigtramp = j2_flush_icache;
>  
>  	pr_info("Initial J2 CCR is %.8x\n", __raw_readl(j2_ccr_base));
> diff --git a/arch/sh/mm/cache-sh4.c b/arch/sh/mm/cache-sh4.c
> index 72c2e1b46c08..862046f26981 100644
> --- a/arch/sh/mm/cache-sh4.c
> +++ b/arch/sh/mm/cache-sh4.c
> @@ -107,19 +107,29 @@ static inline void flush_cache_one(unsigned long start, unsigned long phys)
>   * Write back & invalidate the D-cache of the page.
>   * (To avoid "alias" issues)
>   */
> -static void sh4_flush_dcache_page(void *arg)
> +static void sh4_flush_dcache_folio(void *arg)
>  {
> -	struct page *page = arg;
> -	unsigned long addr = (unsigned long)page_address(page);
> +	struct folio *folio = arg;
>  #ifndef CONFIG_SMP
> -	struct address_space *mapping = page_mapping_file(page);
> +	struct address_space *mapping = folio_flush_mapping(folio);
>  
>  	if (mapping && !mapping_mapped(mapping))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +		clear_bit(PG_dcache_clean, &folio->flags);
>  	else
>  #endif
> -		flush_cache_one(CACHE_OC_ADDRESS_ARRAY |
> -				(addr & shm_align_mask), page_to_phys(page));
> +	{
> +		unsigned long pfn = folio_pfn(folio);
> +		unsigned long addr = (unsigned long)folio_address(folio);
> +		unsigned int i, nr = folio_nr_pages(folio);
> +
> +		for (i = 0; i < nr; i++) {
> +			flush_cache_one(CACHE_OC_ADDRESS_ARRAY |
> +						(addr & shm_align_mask),
> +					pfn * PAGE_SIZE);
> +			addr += PAGE_SIZE;
> +			pfn++;
> +		}
> +	}
>  
>  	wmb();
>  }
> @@ -379,7 +389,7 @@ void __init sh4_cache_init(void)
>  		__raw_readl(CCN_PRR));
>  
>  	local_flush_icache_range	= sh4_flush_icache_range;
> -	local_flush_dcache_page		= sh4_flush_dcache_page;
> +	local_flush_dcache_folio	= sh4_flush_dcache_folio;
>  	local_flush_cache_all		= sh4_flush_cache_all;
>  	local_flush_cache_mm		= sh4_flush_cache_mm;
>  	local_flush_cache_dup_mm	= sh4_flush_cache_mm;
> diff --git a/arch/sh/mm/cache-sh7705.c b/arch/sh/mm/cache-sh7705.c
> index 9b63a53a5e46..b509a407588f 100644
> --- a/arch/sh/mm/cache-sh7705.c
> +++ b/arch/sh/mm/cache-sh7705.c
> @@ -132,15 +132,20 @@ static void __flush_dcache_page(unsigned long phys)
>   * Write back & invalidate the D-cache of the page.
>   * (To avoid "alias" issues)
>   */
> -static void sh7705_flush_dcache_page(void *arg)
> +static void sh7705_flush_dcache_folio(void *arg)
>  {
> -	struct page *page = arg;
> -	struct address_space *mapping = page_mapping_file(page);
> +	struct folio *folio = arg;
> +	struct address_space *mapping = folio_flush_mapping(folio);
>  
>  	if (mapping && !mapping_mapped(mapping))
> -		clear_bit(PG_dcache_clean, &page->flags);
> -	else
> -		__flush_dcache_page(__pa(page_address(page)));
> +		clear_bit(PG_dcache_clean, &folio->flags);
> +	else {
> +		unsigned long pfn = folio_pfn(folio);
> +		unsigned int i, nr = folio_nr_pages(folio);
> +
> +		for (i = 0; i < nr; i++)
> +			__flush_dcache_page((pfn + i) * PAGE_SIZE);
> +	}
>  }
>  
>  static void sh7705_flush_cache_all(void *args)
> @@ -176,19 +181,20 @@ static void sh7705_flush_cache_page(void *args)
>   * Not entirely sure why this is necessary on SH3 with 32K cache but
>   * without it we get occasional "Memory fault" when loading a program.
>   */
> -static void sh7705_flush_icache_page(void *page)
> +static void sh7705_flush_icache_folio(void *arg)
>  {
> -	__flush_purge_region(page_address(page), PAGE_SIZE);
> +	struct folio *folio = arg;
> +	__flush_purge_region(folio_address(folio), folio_size(folio));
>  }
>  
>  void __init sh7705_cache_init(void)
>  {
>  	local_flush_icache_range	= sh7705_flush_icache_range;
> -	local_flush_dcache_page		= sh7705_flush_dcache_page;
> +	local_flush_dcache_folio	= sh7705_flush_dcache_folio;
>  	local_flush_cache_all		= sh7705_flush_cache_all;
>  	local_flush_cache_mm		= sh7705_flush_cache_all;
>  	local_flush_cache_dup_mm	= sh7705_flush_cache_all;
>  	local_flush_cache_range		= sh7705_flush_cache_all;
>  	local_flush_cache_page		= sh7705_flush_cache_page;
> -	local_flush_icache_page		= sh7705_flush_icache_page;
> +	local_flush_icache_folio	= sh7705_flush_icache_folio;
>  }
> diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c
> index 3aef78ceb820..9bcaa5619eab 100644
> --- a/arch/sh/mm/cache.c
> +++ b/arch/sh/mm/cache.c
> @@ -20,9 +20,9 @@ void (*local_flush_cache_mm)(void *args) = cache_noop;
>  void (*local_flush_cache_dup_mm)(void *args) = cache_noop;
>  void (*local_flush_cache_page)(void *args) = cache_noop;
>  void (*local_flush_cache_range)(void *args) = cache_noop;
> -void (*local_flush_dcache_page)(void *args) = cache_noop;
> +void (*local_flush_dcache_folio)(void *args) = cache_noop;
>  void (*local_flush_icache_range)(void *args) = cache_noop;
> -void (*local_flush_icache_page)(void *args) = cache_noop;
> +void (*local_flush_icache_folio)(void *args) = cache_noop;
>  void (*local_flush_cache_sigtramp)(void *args) = cache_noop;
>  
>  void (*__flush_wback_region)(void *start, int size);
> @@ -61,15 +61,17 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
>  		       unsigned long vaddr, void *dst, const void *src,
>  		       unsigned long len)
>  {
> -	if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
> -	    test_bit(PG_dcache_clean, &page->flags)) {
> +	struct folio *folio = page_folio(page);
> +
> +	if (boot_cpu_data.dcache.n_aliases && folio_mapped(folio) &&
> +	    test_bit(PG_dcache_clean, &folio->flags)) {
>  		void *vto = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
>  		memcpy(vto, src, len);
>  		kunmap_coherent(vto);
>  	} else {
>  		memcpy(dst, src, len);
>  		if (boot_cpu_data.dcache.n_aliases)
> -			clear_bit(PG_dcache_clean, &page->flags);
> +			clear_bit(PG_dcache_clean, &folio->flags);
>  	}
>  
>  	if (vma->vm_flags & VM_EXEC)
> @@ -80,27 +82,30 @@ void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
>  			 unsigned long vaddr, void *dst, const void *src,
>  			 unsigned long len)
>  {
> +	struct folio *folio = page_folio(page);
> +
>  	if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
> -	    test_bit(PG_dcache_clean, &page->flags)) {
> +	    test_bit(PG_dcache_clean, &folio->flags)) {
>  		void *vfrom = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
>  		memcpy(dst, vfrom, len);
>  		kunmap_coherent(vfrom);
>  	} else {
>  		memcpy(dst, src, len);
>  		if (boot_cpu_data.dcache.n_aliases)
> -			clear_bit(PG_dcache_clean, &page->flags);
> +			clear_bit(PG_dcache_clean, &folio->flags);
>  	}
>  }
>  
>  void copy_user_highpage(struct page *to, struct page *from,
>  			unsigned long vaddr, struct vm_area_struct *vma)
>  {
> +	struct folio *src = page_folio(from);
>  	void *vfrom, *vto;
>  
>  	vto = kmap_atomic(to);
>  
> -	if (boot_cpu_data.dcache.n_aliases && page_mapcount(from) &&
> -	    test_bit(PG_dcache_clean, &from->flags)) {
> +	if (boot_cpu_data.dcache.n_aliases && folio_mapped(src) &&
> +	    test_bit(PG_dcache_clean, &src->flags)) {
>  		vfrom = kmap_coherent(from, vaddr);
>  		copy_page(vto, vfrom);
>  		kunmap_coherent(vfrom);
> @@ -136,27 +141,28 @@ EXPORT_SYMBOL(clear_user_highpage);
>  void __update_cache(struct vm_area_struct *vma,
>  		    unsigned long address, pte_t pte)
>  {
> -	struct page *page;
>  	unsigned long pfn = pte_pfn(pte);
>  
>  	if (!boot_cpu_data.dcache.n_aliases)
>  		return;
>  
> -	page = pfn_to_page(pfn);
>  	if (pfn_valid(pfn)) {
> -		int dirty = !test_and_set_bit(PG_dcache_clean, &page->flags);
> +		struct folio *folio = page_folio(pfn_to_page(pfn));
> +		int dirty = !test_and_set_bit(PG_dcache_clean, &folio->flags);
>  		if (dirty)
> -			__flush_purge_region(page_address(page), PAGE_SIZE);
> +			__flush_purge_region(folio_address(folio),
> +						folio_size(folio));
>  	}
>  }
>  
>  void __flush_anon_page(struct page *page, unsigned long vmaddr)
>  {
> +	struct folio *folio = page_folio(page);
>  	unsigned long addr = (unsigned long) page_address(page);
>  
>  	if (pages_do_alias(addr, vmaddr)) {
> -		if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
> -		    test_bit(PG_dcache_clean, &page->flags)) {
> +		if (boot_cpu_data.dcache.n_aliases && folio_mapped(folio) &&
> +		    test_bit(PG_dcache_clean, &folio->flags)) {
>  			void *kaddr;
>  
>  			kaddr = kmap_coherent(page, vmaddr);
> @@ -164,7 +170,8 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)
>  			/* __flush_purge_region((void *)kaddr, PAGE_SIZE); */
>  			kunmap_coherent(kaddr);
>  		} else
> -			__flush_purge_region((void *)addr, PAGE_SIZE);
> +			__flush_purge_region(folio_address(folio),
> +						folio_size(folio));
>  	}
>  }
>  
> @@ -215,11 +222,11 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
>  }
>  EXPORT_SYMBOL(flush_cache_range);
>  
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
> -	cacheop_on_each_cpu(local_flush_dcache_page, page, 1);
> +	cacheop_on_each_cpu(local_flush_dcache_folio, folio, 1);
>  }
> -EXPORT_SYMBOL(flush_dcache_page);
> +EXPORT_SYMBOL(flush_dcache_folio);
>  
>  void flush_icache_range(unsigned long start, unsigned long end)
>  {
> @@ -233,10 +240,11 @@ void flush_icache_range(unsigned long start, unsigned long end)
>  }
>  EXPORT_SYMBOL(flush_icache_range);
>  
> -void flush_icache_page(struct vm_area_struct *vma, struct page *page)
> +void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
> +		unsigned int nr)
>  {
> -	/* Nothing uses the VMA, so just pass the struct page along */
> -	cacheop_on_each_cpu(local_flush_icache_page, page, 1);
> +	/* Nothing uses the VMA, so just pass the folio along */
> +	cacheop_on_each_cpu(local_flush_icache_folio, page_folio(page), 1);
>  }
>  
>  void flush_cache_sigtramp(unsigned long address)
> diff --git a/arch/sh/mm/kmap.c b/arch/sh/mm/kmap.c
> index 73fd7cc99430..fa50e8f6e7a9 100644
> --- a/arch/sh/mm/kmap.c
> +++ b/arch/sh/mm/kmap.c
> @@ -27,10 +27,11 @@ void __init kmap_coherent_init(void)
>  
>  void *kmap_coherent(struct page *page, unsigned long addr)
>  {
> +	struct folio *folio = page_folio(page);
>  	enum fixed_addresses idx;
>  	unsigned long vaddr;
>  
> -	BUG_ON(!test_bit(PG_dcache_clean, &page->flags));
> +	BUG_ON(!test_bit(PG_dcache_clean, &folio->flags));
>  
>  	preempt_disable();
>  	pagefault_disable();
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
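
The PG_dcache_clean handling in arch/sh/mm/cache.c above is the usual
deferred-flush protocol, now tracked per folio.  Condensed into a
sketch of the two halves:

	/* flush_dcache_folio(): the kernel modified page cache data */
	if (mapping && !mapping_mapped(mapping))
		clear_bit(PG_dcache_clean, &folio->flags);	/* defer */
	else
		__flush_purge_region(folio_address(folio), folio_size(folio));

	/* __update_cache(): a PTE for the folio is being installed */
	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
		__flush_purge_region(folio_address(folio), folio_size(folio));

Making the bit per-folio means a single test covers every page of a
large folio, rather than a flag check and flush decision per page.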

* Re: [PATCH v4 24/36] sparc32: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 24/36] sparc32: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:11   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:11 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, David S. Miller, sparclinux

On Wed, Mar 15, 2023 at 05:14:32AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio() and
> flush_icache_pages().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: sparclinux@vger.kernel.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/sparc/include/asm/cacheflush_32.h |  9 +++++++--
>  arch/sparc/include/asm/pgtable_32.h    |  8 ++++----
>  arch/sparc/mm/init_32.c                | 13 +++++++++++--
>  3 files changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/sparc/include/asm/cacheflush_32.h b/arch/sparc/include/asm/cacheflush_32.h
> index adb6991d0455..8dba35d63328 100644
> --- a/arch/sparc/include/asm/cacheflush_32.h
> +++ b/arch/sparc/include/asm/cacheflush_32.h
> @@ -16,6 +16,7 @@
>  	sparc32_cachetlb_ops->cache_page(vma, addr)
>  #define flush_icache_range(start, end)		do { } while (0)
>  #define flush_icache_page(vma, pg)		do { } while (0)
> +#define flush_icache_pages(vma, pg, nr)		do { } while (0)
>  
>  #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
>  	do {							\
> @@ -35,11 +36,15 @@
>  #define flush_page_for_dma(addr) \
>  	sparc32_cachetlb_ops->page_for_dma(addr)
>  
> -struct page;
>  void sparc_flush_page_to_ram(struct page *page);
> +void sparc_flush_folio_to_ram(struct folio *folio);
>  
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -#define flush_dcache_page(page)			sparc_flush_page_to_ram(page)
> +#define flush_dcache_folio(folio)		sparc_flush_folio_to_ram(folio)
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
>  #define flush_dcache_mmap_lock(mapping)		do { } while (0)
>  #define flush_dcache_mmap_unlock(mapping)	do { } while (0)
>  
> diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
> index d4330e3c57a6..7514611d14d3 100644
> --- a/arch/sparc/include/asm/pgtable_32.h
> +++ b/arch/sparc/include/asm/pgtable_32.h
> @@ -101,8 +101,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
>  	srmmu_swap((unsigned long *)ptep, pte_val(pteval));
>  }
>  
> -#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
> -
>  static inline int srmmu_device_memory(unsigned long x)
>  {
>  	return ((x & 0xF0000000) != 0);
> @@ -256,6 +254,7 @@ static inline pte_t pte_mkyoung(pte_t pte)
>  	return __pte(pte_val(pte) | SRMMU_REF);
>  }
>  
> +#define PFN_PTE_SHIFT			(PAGE_SHIFT - 4)
>  #define pfn_pte(pfn, prot)		mk_pte(pfn_to_page(pfn), prot)
>  
>  static inline unsigned long pte_pfn(pte_t pte)
> @@ -268,7 +267,7 @@ static inline unsigned long pte_pfn(pte_t pte)
>  		 */
>  		return ~0UL;
>  	}
> -	return (pte_val(pte) & SRMMU_PTE_PMASK) >> (PAGE_SHIFT-4);
> +	return (pte_val(pte) & SRMMU_PTE_PMASK) >> PFN_PTE_SHIFT;
>  }
>  
>  #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
> @@ -318,6 +317,7 @@ void mmu_info(struct seq_file *m);
>  #define FAULT_CODE_USER     0x4
>  
>  #define update_mmu_cache(vma, address, ptep) do { } while (0)
> +#define update_mmu_cache_range(vma, address, ptep, nr) do { } while (0)
>  
>  void srmmu_mapiorange(unsigned int bus, unsigned long xpa,
>                        unsigned long xva, unsigned int len);
> @@ -422,7 +422,7 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
>  ({									  \
>  	int __changed = !pte_same(*(__ptep), __entry);			  \
>  	if (__changed) {						  \
> -		set_pte_at((__vma)->vm_mm, (__address), __ptep, __entry); \
> +		set_pte(__ptep, __entry);				  \
>  		flush_tlb_page(__vma, __address);			  \
>  	}								  \
>  	__changed;							  \
> diff --git a/arch/sparc/mm/init_32.c b/arch/sparc/mm/init_32.c
> index 9c0ea457bdf0..d96a14ffceeb 100644
> --- a/arch/sparc/mm/init_32.c
> +++ b/arch/sparc/mm/init_32.c
> @@ -297,11 +297,20 @@ void sparc_flush_page_to_ram(struct page *page)
>  {
>  	unsigned long vaddr = (unsigned long)page_address(page);
>  
> -	if (vaddr)
> -		__flush_page_to_ram(vaddr);
> +	__flush_page_to_ram(vaddr);
>  }
>  EXPORT_SYMBOL(sparc_flush_page_to_ram);
>  
> +void sparc_flush_folio_to_ram(struct folio *folio)
> +{
> +	unsigned long vaddr = (unsigned long)folio_address(folio);
> +	unsigned int i, nr = folio_nr_pages(folio);
> +
> +	for (i = 0; i < nr; i++)
> +		__flush_page_to_ram(vaddr + i * PAGE_SIZE);
> +}
> +EXPORT_SYMBOL(sparc_flush_folio_to_ram);
> +
>  static const pgprot_t protection_map[16] = {
>  	[VM_NONE]					= PAGE_NONE,
>  	[VM_READ]					= PAGE_READONLY,
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
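
Why sparc32 defines PFN_PTE_SHIFT as (PAGE_SHIFT - 4): srmmu PTEs hold
the physical address shifted right by four bits, so the PFN lands four
bits lower than on most architectures.  A sketch of the round trip
implied by the definitions above (illustrative, not the real macros):

	/* encode: pfn << (PAGE_SHIFT - 4), i.e. paddr >> 4 */
	pteval = (pfn << (PAGE_SHIFT - 4)) | pgprot_val(prot);

	/* decode, exactly as pte_pfn() above does */
	pfn = (pteval & SRMMU_PTE_PMASK) >> (PAGE_SHIFT - 4);

With the shift expressed as PFN_PTE_SHIFT, the generic set_ptes()
fallback works unmodified: adding 1 << (PAGE_SHIFT - 4) to the PTE
value advances the mapping by one page.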

* Re: [PATCH v4 25/36] sparc64: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 25/36] sparc64: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:11   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:11 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, David S. Miller, sparclinux

On Wed, Mar 15, 2023 at 05:14:33AM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
> flush_icache_pages().  Convert the PG_dcache_dirty flag from being
> per-page to per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: sparclinux@vger.kernel.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/sparc/include/asm/cacheflush_64.h | 18 ++++--
>  arch/sparc/include/asm/pgtable_64.h    | 24 ++++++--
>  arch/sparc/kernel/smp_64.c             | 56 +++++++++++-------
>  arch/sparc/mm/init_64.c                | 78 +++++++++++++++-----------
>  arch/sparc/mm/tlb.c                    |  5 +-
>  5 files changed, 116 insertions(+), 65 deletions(-)
> 
> diff --git a/arch/sparc/include/asm/cacheflush_64.h b/arch/sparc/include/asm/cacheflush_64.h
> index b9341836597e..a9a719f04d06 100644
> --- a/arch/sparc/include/asm/cacheflush_64.h
> +++ b/arch/sparc/include/asm/cacheflush_64.h
> @@ -35,20 +35,26 @@ void flush_icache_range(unsigned long start, unsigned long end);
>  void __flush_icache_page(unsigned long);
>  
>  void __flush_dcache_page(void *addr, int flush_icache);
> -void flush_dcache_page_impl(struct page *page);
> +void flush_dcache_folio_impl(struct folio *folio);
>  #ifdef CONFIG_SMP
> -void smp_flush_dcache_page_impl(struct page *page, int cpu);
> -void flush_dcache_page_all(struct mm_struct *mm, struct page *page);
> +void smp_flush_dcache_folio_impl(struct folio *folio, int cpu);
> +void flush_dcache_folio_all(struct mm_struct *mm, struct folio *folio);
>  #else
> -#define smp_flush_dcache_page_impl(page,cpu) flush_dcache_page_impl(page)
> -#define flush_dcache_page_all(mm,page) flush_dcache_page_impl(page)
> +#define smp_flush_dcache_folio_impl(folio, cpu) flush_dcache_folio_impl(folio)
> +#define flush_dcache_folio_all(mm, folio) flush_dcache_folio_impl(folio)
>  #endif
>  
>  void __flush_dcache_range(unsigned long start, unsigned long end);
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -void flush_dcache_page(struct page *page);
> +void flush_dcache_folio(struct folio *folio);
> +#define flush_dcache_folio flush_dcache_folio
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
>  
>  #define flush_icache_page(vma, pg)	do { } while(0)
> +#define flush_icache_pages(vma, pg, nr)	do { } while(0)
>  
>  void flush_ptrace_access(struct vm_area_struct *, struct page *,
>  			 unsigned long uaddr, void *kaddr,
> diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
> index 2dc8d4641734..49c37000e1b1 100644
> --- a/arch/sparc/include/asm/pgtable_64.h
> +++ b/arch/sparc/include/asm/pgtable_64.h
> @@ -911,8 +911,19 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  	maybe_tlb_batch_add(mm, addr, ptep, orig, fullmm, PAGE_SHIFT);
>  }
>  
> -#define set_pte_at(mm,addr,ptep,pte)	\
> -	__set_pte_at((mm), (addr), (ptep), (pte), 0)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	for (;;) {
> +		__set_pte_at(mm, addr, ptep, pte, 0);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte_val(pte) += PAGE_SIZE;
> +		addr += PAGE_SIZE;
> +	}
> +}
> +#define set_ptes set_ptes
>  
>  #define pte_clear(mm,addr,ptep)		\
>  	set_pte_at((mm), (addr), (ptep), __pte(0UL))
> @@ -931,8 +942,8 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  									\
>  		if (pfn_valid(this_pfn) &&				\
>  		    (((old_addr) ^ (new_addr)) & (1 << 13)))		\
> -			flush_dcache_page_all(current->mm,		\
> -					      pfn_to_page(this_pfn));	\
> +			flush_dcache_folio_all(current->mm,		\
> +				page_folio(pfn_to_page(this_pfn)));	\
>  	}								\
>  	newpte;								\
>  })
> @@ -947,7 +958,10 @@ struct seq_file;
>  void mmu_info(struct seq_file *);
>  
>  struct vm_area_struct;
> -void update_mmu_cache(struct vm_area_struct *, unsigned long, pte_t *);
> +void update_mmu_cache_range(struct vm_area_struct *, unsigned long addr,
> +		pte_t *ptep, unsigned int nr);
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
>  			  pmd_t *pmd);
> diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
> index a55295d1b924..90ef8677ac89 100644
> --- a/arch/sparc/kernel/smp_64.c
> +++ b/arch/sparc/kernel/smp_64.c
> @@ -921,20 +921,26 @@ extern unsigned long xcall_flush_dcache_page_cheetah;
>  #endif
>  extern unsigned long xcall_flush_dcache_page_spitfire;
>  
> -static inline void __local_flush_dcache_page(struct page *page)
> +static inline void __local_flush_dcache_folio(struct folio *folio)
>  {
> +	unsigned int i, nr = folio_nr_pages(folio);
> +
>  #ifdef DCACHE_ALIASING_POSSIBLE
> -	__flush_dcache_page(page_address(page),
> +	for (i = 0; i < nr; i++)
> +		__flush_dcache_page(folio_address(folio) + i * PAGE_SIZE,
>  			    ((tlb_type == spitfire) &&
> -			     page_mapping_file(page) != NULL));
> +			     folio_flush_mapping(folio) != NULL));
>  #else
> -	if (page_mapping_file(page) != NULL &&
> -	    tlb_type == spitfire)
> -		__flush_icache_page(__pa(page_address(page)));
> +	if (folio_flush_mapping(folio) != NULL &&
> +	    tlb_type == spitfire) {
> +		unsigned long pfn = folio_pfn(folio);
> +		for (i = 0; i < nr; i++)
> +			__flush_icache_page((pfn + i) * PAGE_SIZE);
> +	}
>  #endif
>  }
>  
> -void smp_flush_dcache_page_impl(struct page *page, int cpu)
> +void smp_flush_dcache_folio_impl(struct folio *folio, int cpu)
>  {
>  	int this_cpu;
>  
> @@ -948,14 +954,14 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
>  	this_cpu = get_cpu();
>  
>  	if (cpu == this_cpu) {
> -		__local_flush_dcache_page(page);
> +		__local_flush_dcache_folio(folio);
>  	} else if (cpu_online(cpu)) {
> -		void *pg_addr = page_address(page);
> +		void *pg_addr = folio_address(folio);
>  		u64 data0 = 0;
>  
>  		if (tlb_type == spitfire) {
>  			data0 = ((u64)&xcall_flush_dcache_page_spitfire);
> -			if (page_mapping_file(page) != NULL)
> +			if (folio_flush_mapping(folio) != NULL)
>  				data0 |= ((u64)1 << 32);
>  		} else if (tlb_type == cheetah || tlb_type == cheetah_plus) {
>  #ifdef DCACHE_ALIASING_POSSIBLE
> @@ -963,18 +969,23 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
>  #endif
>  		}
>  		if (data0) {
> -			xcall_deliver(data0, __pa(pg_addr),
> -				      (u64) pg_addr, cpumask_of(cpu));
> +			unsigned int i, nr = folio_nr_pages(folio);
> +
> +			for (i = 0; i < nr; i++) {
> +				xcall_deliver(data0, __pa(pg_addr),
> +					      (u64) pg_addr, cpumask_of(cpu));
>  #ifdef CONFIG_DEBUG_DCFLUSH
> -			atomic_inc(&dcpage_flushes_xcall);
> +				atomic_inc(&dcpage_flushes_xcall);
>  #endif
> +				pg_addr += PAGE_SIZE;
> +			}
>  		}
>  	}
>  
>  	put_cpu();
>  }
>  
> -void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
> +void flush_dcache_folio_all(struct mm_struct *mm, struct folio *folio)
>  {
>  	void *pg_addr;
>  	u64 data0;
> @@ -988,10 +999,10 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
>  	atomic_inc(&dcpage_flushes);
>  #endif
>  	data0 = 0;
> -	pg_addr = page_address(page);
> +	pg_addr = folio_address(folio);
>  	if (tlb_type == spitfire) {
>  		data0 = ((u64)&xcall_flush_dcache_page_spitfire);
> -		if (page_mapping_file(page) != NULL)
> +		if (folio_flush_mapping(folio) != NULL)
>  			data0 |= ((u64)1 << 32);
>  	} else if (tlb_type == cheetah || tlb_type == cheetah_plus) {
>  #ifdef DCACHE_ALIASING_POSSIBLE
> @@ -999,13 +1010,18 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
>  #endif
>  	}
>  	if (data0) {
> -		xcall_deliver(data0, __pa(pg_addr),
> -			      (u64) pg_addr, cpu_online_mask);
> +		unsigned int i, nr = folio_nr_pages(folio);
> +
> +		for (i = 0; i < nr; i++) {
> +			xcall_deliver(data0, __pa(pg_addr),
> +				      (u64) pg_addr, cpu_online_mask);
>  #ifdef CONFIG_DEBUG_DCFLUSH
> -		atomic_inc(&dcpage_flushes_xcall);
> +			atomic_inc(&dcpage_flushes_xcall);
>  #endif
> +			pg_addr += PAGE_SIZE;
> +		}
>  	}
> -	__local_flush_dcache_page(page);
> +	__local_flush_dcache_folio(folio);
>  
>  	preempt_enable();
>  }
> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
> index 04f9db0c3111..ab9aacbaf43c 100644
> --- a/arch/sparc/mm/init_64.c
> +++ b/arch/sparc/mm/init_64.c
> @@ -195,21 +195,26 @@ atomic_t dcpage_flushes_xcall = ATOMIC_INIT(0);
>  #endif
>  #endif
>  
> -inline void flush_dcache_page_impl(struct page *page)
> +inline void flush_dcache_folio_impl(struct folio *folio)
>  {
> +	unsigned int i, nr = folio_nr_pages(folio);
> +
>  	BUG_ON(tlb_type == hypervisor);
>  #ifdef CONFIG_DEBUG_DCFLUSH
>  	atomic_inc(&dcpage_flushes);
>  #endif
>  
>  #ifdef DCACHE_ALIASING_POSSIBLE
> -	__flush_dcache_page(page_address(page),
> -			    ((tlb_type == spitfire) &&
> -			     page_mapping_file(page) != NULL));
> +	for (i = 0; i < nr; i++)
> +		__flush_dcache_page(folio_address(folio) + i * PAGE_SIZE,
> +				    ((tlb_type == spitfire) &&
> +				     folio_flush_mapping(folio) != NULL));
>  #else
> -	if (page_mapping_file(page) != NULL &&
> -	    tlb_type == spitfire)
> -		__flush_icache_page(__pa(page_address(page)));
> +	if (folio_flush_mapping(folio) != NULL &&
> +	    tlb_type == spitfire) {
> +		unsigned long pfn = folio_pfn(folio);
> +
> +		for (i = 0; i < nr; i++)
> +			__flush_icache_page((pfn + i) * PAGE_SIZE);
> +	}
>  #endif
>  }
>  
> @@ -218,10 +223,10 @@ inline void flush_dcache_page_impl(struct page *page)
>  #define PG_dcache_cpu_mask	\
>  	((1UL<<ilog2(roundup_pow_of_two(NR_CPUS)))-1UL)
>  
> -#define dcache_dirty_cpu(page) \
> -	(((page)->flags >> PG_dcache_cpu_shift) & PG_dcache_cpu_mask)
> +#define dcache_dirty_cpu(folio) \
> +	(((folio)->flags >> PG_dcache_cpu_shift) & PG_dcache_cpu_mask)
>  
> -static inline void set_dcache_dirty(struct page *page, int this_cpu)
> +static inline void set_dcache_dirty(struct folio *folio, int this_cpu)
>  {
>  	unsigned long mask = this_cpu;
>  	unsigned long non_cpu_bits;
> @@ -238,11 +243,11 @@ static inline void set_dcache_dirty(struct page *page, int this_cpu)
>  			     "bne,pn	%%xcc, 1b\n\t"
>  			     " nop"
>  			     : /* no outputs */
> -			     : "r" (mask), "r" (non_cpu_bits), "r" (&page->flags)
> +			     : "r" (mask), "r" (non_cpu_bits), "r" (&folio->flags)
>  			     : "g1", "g7");
>  }
>  
> -static inline void clear_dcache_dirty_cpu(struct page *page, unsigned long cpu)
> +static inline void clear_dcache_dirty_cpu(struct folio *folio, unsigned long cpu)
>  {
>  	unsigned long mask = (1UL << PG_dcache_dirty);
>  
> @@ -260,7 +265,7 @@ static inline void clear_dcache_dirty_cpu(struct page *page, unsigned long cpu)
>  			     " nop\n"
>  			     "2:"
>  			     : /* no outputs */
> -			     : "r" (cpu), "r" (mask), "r" (&page->flags),
> +			     : "r" (cpu), "r" (mask), "r" (&folio->flags),
>  			       "i" (PG_dcache_cpu_mask),
>  			       "i" (PG_dcache_cpu_shift)
>  			     : "g1", "g7");
> @@ -284,9 +289,10 @@ static void flush_dcache(unsigned long pfn)
>  
>  	page = pfn_to_page(pfn);
>  	if (page) {
> +		struct folio *folio = page_folio(page);
>  		unsigned long pg_flags;
>  
> -		pg_flags = page->flags;
> +		pg_flags = folio->flags;
>  		if (pg_flags & (1UL << PG_dcache_dirty)) {
>  			int cpu = ((pg_flags >> PG_dcache_cpu_shift) &
>  				   PG_dcache_cpu_mask);
> @@ -296,11 +302,11 @@ static void flush_dcache(unsigned long pfn)
>  			 * in the SMP case.
>  			 */
>  			if (cpu == this_cpu)
> -				flush_dcache_page_impl(page);
> +				flush_dcache_folio_impl(folio);
>  			else
> -				smp_flush_dcache_page_impl(page, cpu);
> +				smp_flush_dcache_folio_impl(folio, cpu);
>  
> -			clear_dcache_dirty_cpu(page, cpu);
> +			clear_dcache_dirty_cpu(folio, cpu);
>  
>  			put_cpu();
>  		}
> @@ -388,12 +394,14 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
>  }
>  #endif	/* CONFIG_HUGETLB_PAGE */
>  
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *ptep, unsigned int nr)
>  {
>  	struct mm_struct *mm;
>  	unsigned long flags;
>  	bool is_huge_tsb;
>  	pte_t pte = *ptep;
> +	unsigned int i;
>  
>  	if (tlb_type != hypervisor) {
>  		unsigned long pfn = pte_pfn(pte);
> @@ -440,15 +448,21 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
>  		}
>  	}
>  #endif
> -	if (!is_huge_tsb)
> -		__update_mmu_tsb_insert(mm, MM_TSB_BASE, PAGE_SHIFT,
> -					address, pte_val(pte));
> +	if (!is_huge_tsb) {
> +		for (i = 0; i < nr; i++) {
> +			__update_mmu_tsb_insert(mm, MM_TSB_BASE, PAGE_SHIFT,
> +						address, pte_val(pte));
> +			address += PAGE_SIZE;
> +			pte_val(pte) += PAGE_SIZE;
> +		}
> +	}
>  
>  	spin_unlock_irqrestore(&mm->context.lock, flags);
>  }
>  
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
> +	unsigned long pfn = folio_pfn(folio);
>  	struct address_space *mapping;
>  	int this_cpu;
>  
> @@ -459,35 +473,35 @@ void flush_dcache_page(struct page *page)
>  	 * is merely the zero page.  The 'bigcore' testcase in GDB
>  	 * causes this case to run millions of times.
>  	 */
> -	if (page == ZERO_PAGE(0))
> +	if (is_zero_pfn(pfn))
>  		return;
>  
>  	this_cpu = get_cpu();
>  
> -	mapping = page_mapping_file(page);
> +	mapping = folio_flush_mapping(folio);
>  	if (mapping && !mapping_mapped(mapping)) {
> -		int dirty = test_bit(PG_dcache_dirty, &page->flags);
> +		bool dirty = test_bit(PG_dcache_dirty, &folio->flags);
>  		if (dirty) {
> -			int dirty_cpu = dcache_dirty_cpu(page);
> +			int dirty_cpu = dcache_dirty_cpu(folio);
>  
>  			if (dirty_cpu == this_cpu)
>  				goto out;
> -			smp_flush_dcache_page_impl(page, dirty_cpu);
> +			smp_flush_dcache_folio_impl(folio, dirty_cpu);
>  		}
> -		set_dcache_dirty(page, this_cpu);
> +		set_dcache_dirty(folio, this_cpu);
>  	} else {
>  		/* We could delay the flush for the !page_mapping
>  		 * case too.  But that case is for exec env/arg
>  		 * pages and those are %99 certainly going to get
>  		 * faulted into the tlb (and thus flushed) anyways.
>  		 */
> -		flush_dcache_page_impl(page);
> +		flush_dcache_folio_impl(folio);
>  	}
>  
>  out:
>  	put_cpu();
>  }
> -EXPORT_SYMBOL(flush_dcache_page);
> +EXPORT_SYMBOL(flush_dcache_folio);
>  
>  void __kprobes flush_icache_range(unsigned long start, unsigned long end)
>  {
> @@ -2280,10 +2294,10 @@ void __init paging_init(void)
>  	setup_page_offset();
>  
>  	/* These build time checks make sure that the dcache_dirty_cpu()
> -	 * page->flags usage will work.
> +	 * folio->flags usage will work.
>  	 *
>  	 * When a page gets marked as dcache-dirty, we store the
> -	 * cpu number starting at bit 32 in the page->flags.  Also,
> +	 * cpu number starting at bit 32 in the folio->flags.  Also,
>  	 * functions like clear_dcache_dirty_cpu use the cpu mask
>  	 * in 13-bit signed-immediate instruction fields.
>  	 */
> diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
> index 9a725547578e..3fa6a070912d 100644
> --- a/arch/sparc/mm/tlb.c
> +++ b/arch/sparc/mm/tlb.c
> @@ -118,6 +118,7 @@ void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
>  		unsigned long paddr, pfn = pte_pfn(orig);
>  		struct address_space *mapping;
>  		struct page *page;
> +		struct folio *folio;
>  
>  		if (!pfn_valid(pfn))
>  			goto no_cache_flush;
> @@ -127,13 +128,13 @@ void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
>  			goto no_cache_flush;
>  
>  		/* A real file page? */
> -		mapping = page_mapping_file(page);
> +		folio = page_folio(page);
> +		mapping = folio_flush_mapping(folio);
>  		if (!mapping)
>  			goto no_cache_flush;
>  
>  		paddr = (unsigned long) page_address(page);
>  		if ((paddr ^ vaddr) & (1 << 13))
> -			flush_dcache_page_all(mm, page);
> +			flush_dcache_folio_all(mm, folio);
>  	}
>  
>  no_cache_flush:
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
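
sparc64 is the one port in this series that keeps more than a single
bit of cache state in folio->flags: the cpu that dirtied the dcache is
packed in above bit 32, as the build-time comment in paging_init()
notes.  The casx loop in set_dcache_dirty() is roughly this, written as
a C sketch (the real code is hand-rolled inline assembly):

	unsigned long old, new;

	do {
		old = folio->flags;
		new = old & ~(PG_dcache_cpu_mask << PG_dcache_cpu_shift);
		new |= (1UL << PG_dcache_dirty) |
		       ((unsigned long)this_cpu << PG_dcache_cpu_shift);
	} while (cmpxchg(&folio->flags, old, new) != old);

Recording the owning cpu lets flush_dcache() cross-call just the cpu
that has stale lines instead of broadcasting to every cpu.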

* Re: [PATCH v4 26/36] um: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 26/36] um: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:12   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:12 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Richard Weinberger,
	Anton Ivanov, Johannes Berg, linux-um

On Wed, Mar 15, 2023 at 05:14:34AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT and update_mmu_cache_range().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Richard Weinberger <richard@nod.at>
> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Cc: linux-um@lists.infradead.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/um/include/asm/pgtable.h | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
> index a70d1618eb35..ea5f8122f128 100644
> --- a/arch/um/include/asm/pgtable.h
> +++ b/arch/um/include/asm/pgtable.h
> @@ -242,11 +242,7 @@ static inline void set_pte(pte_t *pteptr, pte_t pteval)
>  	if(pte_present(*pteptr)) *pteptr = pte_mknewprot(*pteptr);
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *pteptr, pte_t pteval)
> -{
> -	set_pte(pteptr, pteval);
> -}
> +#define PFN_PTE_SHIFT		PAGE_SHIFT
>  
>  #define __HAVE_ARCH_PTE_SAME
>  static inline int pte_same(pte_t pte_a, pte_t pte_b)
> @@ -290,6 +286,7 @@ struct mm_struct;
>  extern pte_t *virt_to_pte(struct mm_struct *mm, unsigned long addr);
>  
>  #define update_mmu_cache(vma,address,ptep) do {} while (0)
> +#define update_mmu_cache_range(vma, address, ptep, nr) do {} while (0)
>  
>  /*
>   * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 27/36] x86: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 27/36] x86: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:12   ` Mike Rapoport
  2023-03-15 10:34   ` Peter Zijlstra
  1 sibling, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:12 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin

On Wed, Mar 15, 2023 at 05:14:35AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT and a noop update_mmu_cache_range().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: x86@kernel.org
> Cc: "H. Peter Anvin" <hpa@zytor.com>

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/x86/include/asm/pgtable.h | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 1031025730d0..b237878061c4 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -184,6 +184,8 @@ static inline int pte_special(pte_t pte)
>  
>  static inline u64 protnone_mask(u64 val);
>  
> +#define PFN_PTE_SHIFT	PAGE_SHIFT
> +
>  static inline unsigned long pte_pfn(pte_t pte)
>  {
>  	phys_addr_t pfn = pte_val(pte);
> @@ -1019,13 +1021,6 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
>  	return res;
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pte)
> -{
> -	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
> -	set_pte(ptep, pte);
> -}
> -
>  static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>  			      pmd_t *pmdp, pmd_t pmd)
>  {
> @@ -1291,6 +1286,10 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  		unsigned long addr, pte_t *ptep)
>  {
>  }
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long addr, pte_t *ptep, unsigned int nr)
> +{
> +}
>  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
>  		unsigned long addr, pmd_t *pmd)
>  {
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 138+ messages in thread
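
um and x86 are the degenerate cases: with coherent caches and no MMU
bookkeeping needed when a PTE is installed, the whole conversion is
PFN_PTE_SHIFT plus a stub.  Sketched, this is everything such a port
has to supply:

	#define PFN_PTE_SHIFT	PAGE_SHIFT

	static inline void update_mmu_cache_range(struct vm_area_struct *vma,
			unsigned long addr, pte_t *ptep, unsigned int nr)
	{
	}

set_ptes(), flush_dcache_folio() and flush_icache_pages() all come from
the generic fallbacks elsewhere in the series.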

* Re: [PATCH v4 28/36] xtensa: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 28/36] xtensa: " Matthew Wilcox (Oracle)
@ 2023-03-15 10:12   ` Mike Rapoport
  0 siblings, 0 replies; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 10:12 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Max Filippov, linux-xtensa

On Wed, Mar 15, 2023 at 05:14:36AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_dcache_folio() and
> flush_icache_pages().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: linux-xtensa@linux-xtensa.org

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  arch/xtensa/include/asm/cacheflush.h |  9 ++-
>  arch/xtensa/include/asm/pgtable.h    | 17 +++---
>  arch/xtensa/mm/cache.c               | 83 ++++++++++++++++------------
>  3 files changed, 62 insertions(+), 47 deletions(-)
> 
> diff --git a/arch/xtensa/include/asm/cacheflush.h b/arch/xtensa/include/asm/cacheflush.h
> index 7b4359312c25..35153f6725e4 100644
> --- a/arch/xtensa/include/asm/cacheflush.h
> +++ b/arch/xtensa/include/asm/cacheflush.h
> @@ -119,8 +119,14 @@ void flush_cache_page(struct vm_area_struct*,
>  #define flush_cache_vmap(start,end)	flush_cache_all()
>  #define flush_cache_vunmap(start,end)	flush_cache_all()
>  
> +void flush_dcache_folio(struct folio *folio);
> +#define flush_dcache_folio flush_dcache_folio
> +
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -void flush_dcache_page(struct page *);
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
>  
>  void local_flush_cache_range(struct vm_area_struct *vma,
>  		unsigned long start, unsigned long end);
> @@ -156,6 +162,7 @@ void local_flush_cache_page(struct vm_area_struct *vma,
>  
>  /* This is not required, see Documentation/core-api/cachetlb.rst */
>  #define	flush_icache_page(vma,page)			do { } while (0)
> +#define	flush_icache_pages(vma, page, nr)		do { } while (0)
>  
>  #define flush_dcache_mmap_lock(mapping)			do { } while (0)
>  #define flush_dcache_mmap_unlock(mapping)		do { } while (0)
> diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h
> index fc7a14884c6c..80bc70251aad 100644
> --- a/arch/xtensa/include/asm/pgtable.h
> +++ b/arch/xtensa/include/asm/pgtable.h
> @@ -274,6 +274,7 @@ static inline pte_t pte_mkwrite(pte_t pte)
>   * and a page entry and page directory to the page they refer to.
>   */
>  
> +#define PFN_PTE_SHIFT		PAGE_SHIFT
>  #define pte_pfn(pte)		(pte_val(pte) >> PAGE_SHIFT)
>  #define pte_same(a,b)		(pte_val(a) == pte_val(b))
>  #define pte_page(x)		pfn_to_page(pte_pfn(x))
> @@ -301,15 +302,9 @@ static inline void update_pte(pte_t *ptep, pte_t pteval)
>  
>  struct mm_struct;
>  
> -static inline void
> -set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pteval)
> -{
> -	update_pte(ptep, pteval);
> -}
> -
> -static inline void set_pte(pte_t *ptep, pte_t pteval)
> +static inline void set_pte(pte_t *ptep, pte_t pte)
>  {
> -	update_pte(ptep, pteval);
> +	update_pte(ptep, pte);
>  }
>  
>  static inline void
> @@ -407,8 +402,10 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>  
>  #else
>  
> -extern  void update_mmu_cache(struct vm_area_struct * vma,
> -			      unsigned long address, pte_t *ptep);
> +void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr);
> +#define update_mmu_cache(vma, address, ptep) \
> +	update_mmu_cache_range(vma, address, ptep, 1)
>  
>  typedef pte_t *pte_addr_t;
>  
> diff --git a/arch/xtensa/mm/cache.c b/arch/xtensa/mm/cache.c
> index 19e5a478a7e8..27bd798e4d89 100644
> --- a/arch/xtensa/mm/cache.c
> +++ b/arch/xtensa/mm/cache.c
> @@ -121,9 +121,9 @@ EXPORT_SYMBOL(copy_user_highpage);
>   *
>   */
>  
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
> -	struct address_space *mapping = page_mapping_file(page);
> +	struct address_space *mapping = folio_flush_mapping(folio);
>  
>  	/*
>  	 * If we have a mapping but the page is not mapped to user-space
> @@ -132,14 +132,14 @@ void flush_dcache_page(struct page *page)
>  	 */
>  
>  	if (mapping && !mapping_mapped(mapping)) {
> -		if (!test_bit(PG_arch_1, &page->flags))
> -			set_bit(PG_arch_1, &page->flags);
> +		if (!test_bit(PG_arch_1, &folio->flags))
> +			set_bit(PG_arch_1, &folio->flags);
>  		return;
>  
>  	} else {
> -
> -		unsigned long phys = page_to_phys(page);
> -		unsigned long temp = page->index << PAGE_SHIFT;
> +		unsigned long phys = folio_pfn(folio) * PAGE_SIZE;
> +		unsigned long temp = folio_pos(folio);
> +		unsigned int i, nr = folio_nr_pages(folio);
>  		unsigned long alias = !(DCACHE_ALIAS_EQ(temp, phys));
>  		unsigned long virt;
>  
> @@ -154,22 +154,26 @@ void flush_dcache_page(struct page *page)
>  			return;
>  
>  		preempt_disable();
> -		virt = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
> -		__flush_invalidate_dcache_page_alias(virt, phys);
> +		for (i = 0; i < nr; i++) {
> +			virt = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
> +			__flush_invalidate_dcache_page_alias(virt, phys);
>  
> -		virt = TLBTEMP_BASE_1 + (temp & DCACHE_ALIAS_MASK);
> +			virt = TLBTEMP_BASE_1 + (temp & DCACHE_ALIAS_MASK);
>  
> -		if (alias)
> -			__flush_invalidate_dcache_page_alias(virt, phys);
> +			if (alias)
> +				__flush_invalidate_dcache_page_alias(virt, phys);
>  
> -		if (mapping)
> -			__invalidate_icache_page_alias(virt, phys);
> +			if (mapping)
> +				__invalidate_icache_page_alias(virt, phys);
> +			phys += PAGE_SIZE;
> +			temp += PAGE_SIZE;
> +		}
>  		preempt_enable();
>  	}
>  
>  	/* There shouldn't be an entry in the cache for this page anymore. */
>  }
> -EXPORT_SYMBOL(flush_dcache_page);
> +EXPORT_SYMBOL(flush_dcache_folio);
>  
>  /*
>   * For now, flush the whole cache. FIXME??
> @@ -207,45 +211,52 @@ EXPORT_SYMBOL(local_flush_cache_page);
>  
>  #endif /* DCACHE_WAY_SIZE > PAGE_SIZE */
>  
> -void
> -update_mmu_cache(struct vm_area_struct * vma, unsigned long addr, pte_t *ptep)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
> +		pte_t *ptep, unsigned int nr)
>  {
>  	unsigned long pfn = pte_pfn(*ptep);
> -	struct page *page;
> +	struct folio *folio;
> +	unsigned int i;
>  
>  	if (!pfn_valid(pfn))
>  		return;
>  
> -	page = pfn_to_page(pfn);
> +	folio = page_folio(pfn_to_page(pfn));
>  
> -	/* Invalidate old entry in TLBs */
> -
> -	flush_tlb_page(vma, addr);
> +	/* Invalidate old entries in TLBs */
> +	for (i = 0; i < nr; i++)
> +		flush_tlb_page(vma, addr + i * PAGE_SIZE);
> +	nr = folio_nr_pages(folio);
>  
>  #if (DCACHE_WAY_SIZE > PAGE_SIZE)
>  
> -	if (!PageReserved(page) && test_bit(PG_arch_1, &page->flags)) {
> -		unsigned long phys = page_to_phys(page);
> +	if (!folio_test_reserved(folio) && test_bit(PG_arch_1, &folio->flags)) {
> +		unsigned long phys = folio_pfn(folio) * PAGE_SIZE;
>  		unsigned long tmp;
>  
>  		preempt_disable();
> -		tmp = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
> -		__flush_invalidate_dcache_page_alias(tmp, phys);
> -		tmp = TLBTEMP_BASE_1 + (addr & DCACHE_ALIAS_MASK);
> -		__flush_invalidate_dcache_page_alias(tmp, phys);
> -		__invalidate_icache_page_alias(tmp, phys);
> +		for (i = 0; i < nr; i++) {
> +			tmp = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
> +			__flush_invalidate_dcache_page_alias(tmp, phys);
> +			tmp = TLBTEMP_BASE_1 + (addr & DCACHE_ALIAS_MASK);
> +			__flush_invalidate_dcache_page_alias(tmp, phys);
> +			__invalidate_icache_page_alias(tmp, phys);
> +			phys += PAGE_SIZE;
> +		}
>  		preempt_enable();
>  
> -		clear_bit(PG_arch_1, &page->flags);
> +		clear_bit(PG_arch_1, &folio->flags);
>  	}
>  #else
> -	if (!PageReserved(page) && !test_bit(PG_arch_1, &page->flags)
> +	if (!folio_test_reserved(folio) && !test_bit(PG_arch_1, &folio->flags)
>  	    && (vma->vm_flags & VM_EXEC) != 0) {
> -		unsigned long paddr = (unsigned long)kmap_atomic(page);
> -		__flush_dcache_page(paddr);
> -		__invalidate_icache_page(paddr);
> -		set_bit(PG_arch_1, &page->flags);
> -		kunmap_atomic((void *)paddr);
> +		for (i = 0; i < nr; i++) {
> +			void *paddr = kmap_local_folio(folio, i * PAGE_SIZE);
> +			__flush_dcache_page((unsigned long)paddr);
> +			__invalidate_icache_page((unsigned long)paddr);
> +			kunmap_local(paddr);
> +		}
> +		set_bit(PG_arch_1, &folio->flags);
>  	}
>  #endif
>  }
> -- 
> 2.39.2
> 
> 

-- 
Sincerely yours,
Mike.

* Re: [PATCH v4 20/36] powerpc: Implement the new page table range API
  2023-03-15  9:43   ` Christophe Leroy
@ 2023-03-15 10:18     ` Christophe Leroy
  2023-03-17  3:47       ` Matthew Wilcox
  0 siblings, 1 reply; 138+ messages in thread
From: Christophe Leroy @ 2023-03-15 10:18 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch
  Cc: linux-mm, linux-kernel, Michael Ellerman, Nicholas Piggin, linuxppc-dev



On 15/03/2023 at 10:43, Christophe Leroy wrote:
> 
> 
> On 15/03/2023 at 06:14, Matthew Wilcox (Oracle) wrote:
>> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
>> Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to
>> per-folio.
>>
>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Nicholas Piggin <npiggin@gmail.com>
>> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> ---

>> @@ -203,7 +203,14 @@ void set_pte_at(struct mm_struct *mm, unsigned 
>> long addr, pte_t *ptep,
>>       pte = set_pte_filter(pte);
>>       /* Perform the setting of the PTE */
>> -    __set_pte_at(mm, addr, ptep, pte, 0);
>> +    for (;;) {
>> +        __set_pte_at(mm, addr, ptep, pte, 0);
>> +        if (--nr == 0)
>> +            break;
>> +        ptep++;
>> +        pte = __pte(pte_val(pte) + PAGE_SIZE);
> 
> I don't like that math too much, but I have no better idea at the moment.
> 
> Maybe set_ptes() should take a pgprot_t and rebuild the pte with
> mk_pte() or similar?
> 
>> +        addr += PAGE_SIZE;
>> +    }
>>   }
>>   void unmap_kernel_page(unsigned long va)

I investigated a bit further and can now confirm that the above won't
always work; see the comment at
https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/32/pgtable.h#L147

And then you see 
https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/pte-e500.h#L63
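
As an illustration of the pgprot-based alternative suggested above, here is
a minimal sketch. It assumes a pte_pgprot() helper (an assumption; not every
architecture provides one):

	/*
	 * Sketch: rebuild each entry from the pfn instead of adding
	 * PAGE_SIZE to pte_val(), so PTE layouts that don't keep the
	 * PFN at a fixed shift still produce correct entries.
	 */
	unsigned long pfn = pte_pfn(pte);
	pgprot_t prot = pte_pgprot(pte);	/* assumed helper */

	for (;;) {
		__set_pte_at(mm, addr, ptep, pte, 0);
		if (--nr == 0)
			break;
		ptep++;
		addr += PAGE_SIZE;
		pte = pfn_pte(++pfn, prot);
	}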

Christophe

* Re: [PATCH v4 27/36] x86: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 27/36] x86: " Matthew Wilcox (Oracle)
  2023-03-15 10:12   ` Mike Rapoport
@ 2023-03-15 10:34   ` Peter Zijlstra
  2023-03-15 11:16     ` Mike Rapoport
  1 sibling, 1 reply; 138+ messages in thread
From: Peter Zijlstra @ 2023-03-15 10:34 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin

On Wed, Mar 15, 2023 at 05:14:35AM +0000, Matthew Wilcox (Oracle) wrote:
> Add PFN_PTE_SHIFT and a noop update_mmu_cache_range().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: x86@kernel.org
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
>  arch/x86/include/asm/pgtable.h | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 1031025730d0..b237878061c4 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -184,6 +184,8 @@ static inline int pte_special(pte_t pte)
>  
>  static inline u64 protnone_mask(u64 val);
>  
> +#define PFN_PTE_SHIFT	PAGE_SHIFT
> +
>  static inline unsigned long pte_pfn(pte_t pte)
>  {
>  	phys_addr_t pfn = pte_val(pte);
> @@ -1019,13 +1021,6 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
>  	return res;
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pte)
> -{
> -	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
> -	set_pte(ptep, pte);
> -}
> -

And remove set_pte_at() apparently.. whut?!?

>  static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>  			      pmd_t *pmdp, pmd_t pmd)
>  {
> @@ -1291,6 +1286,10 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  		unsigned long addr, pte_t *ptep)
>  {
>  }
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long addr, pte_t *ptep, unsigned int nr)
> +{
> +}
>  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
>  		unsigned long addr, pmd_t *pmd)
>  {
> -- 
> 2.39.2
> 

* Re: [PATCH v4 16/36] mips: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 16/36] mips: " Matthew Wilcox (Oracle)
  2023-03-15 10:08   ` Mike Rapoport
@ 2023-03-15 10:50   ` Thomas Bogendoerfer
  2023-03-15 20:33     ` Matthew Wilcox
  1 sibling, 1 reply; 138+ messages in thread
From: Thomas Bogendoerfer @ 2023-03-15 10:50 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel, linux-mips

On Wed, Mar 15, 2023 at 05:14:24AM +0000, Matthew Wilcox (Oracle) wrote:
> Rename _PFN_SHIFT to PFN_PTE_SHIFT.  Convert a few places
> to call set_pte() instead of set_pte_at().  Add set_ptes(),
> update_mmu_cache_range(), flush_icache_pages() and flush_dcache_folio().

/local/tbogendoerfer/korg/linux/mm/memory.c: In function ‘set_pte_range’:
/local/tbogendoerfer/korg/linux/mm/memory.c:4290:2: error: implicit declaration of function ‘update_mmu_cache_range’ [-Werror=implicit-function-declaration]
  update_mmu_cache_range(vma, addr, vmf->pte, nr);

update_mmu_cache_range() is missing in this patch.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

* Re: [PATCH v4 08/36] arm: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 08/36] arm: " Matthew Wilcox (Oracle)
  2023-03-15  9:48   ` Mike Rapoport
@ 2023-03-15 10:56   ` Russell King (Oracle)
  1 sibling, 0 replies; 138+ messages in thread
From: Russell King (Oracle) @ 2023-03-15 10:56 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-arch, linux-mm, linux-kernel, linux-arm-kernel

On Wed, Mar 15, 2023 at 05:14:16AM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
> flush_icache_pages().  Change the PG_dcache_clear flag from being per-page
> to per-folio which makes __dma_page_dev_to_cpu() a bit more exciting.
> Also add flush_cache_pages(), even though this isn't used by generic code
> (yet?)
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH v4 27/36] x86: Implement the new page table range API
  2023-03-15 10:34   ` Peter Zijlstra
@ 2023-03-15 11:16     ` Mike Rapoport
  2023-03-15 11:19       ` Peter Zijlstra
  0 siblings, 1 reply; 138+ messages in thread
From: Mike Rapoport @ 2023-03-15 11:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Matthew Wilcox (Oracle),
	linux-arch, linux-mm, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin

On Wed, Mar 15, 2023 at 11:34:36AM +0100, Peter Zijlstra wrote:
> On Wed, Mar 15, 2023 at 05:14:35AM +0000, Matthew Wilcox (Oracle) wrote:
> > Add PFN_PTE_SHIFT and a noop update_mmu_cache_range().
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: x86@kernel.org
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > ---
> >  arch/x86/include/asm/pgtable.h | 13 ++++++-------
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> > index 1031025730d0..b237878061c4 100644
> > --- a/arch/x86/include/asm/pgtable.h
> > +++ b/arch/x86/include/asm/pgtable.h
> > @@ -184,6 +184,8 @@ static inline int pte_special(pte_t pte)
> >  
> >  static inline u64 protnone_mask(u64 val);
> >  
> > +#define PFN_PTE_SHIFT	PAGE_SHIFT
> > +
> >  static inline unsigned long pte_pfn(pte_t pte)
> >  {
> >  	phys_addr_t pfn = pte_val(pte);
> > @@ -1019,13 +1021,6 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
> >  	return res;
> >  }
> >  
> > -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > -			      pte_t *ptep, pte_t pte)
> > -{
> > -	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
> > -	set_pte(ptep, pte);
> > -}
> > -
> 
> And remove set_pte_at() apparently.. whut?!?

It's now in include/linux/pgtable.h
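
Roughly, the generic fallback added earlier in this series looks like the
sketch below (a paraphrase of the "Add default definition of set_ptes()"
patch, simplified and not a verbatim copy): architectures supply set_pte()
and PFN_PTE_SHIFT, and the core derives set_ptes() and set_pte_at() from
them.

	#ifndef set_ptes
	static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
			pte_t *ptep, pte_t pte, unsigned int nr)
	{
		page_table_check_ptes_set(mm, addr, ptep, pte, nr);

		for (;;) {
			set_pte(ptep, pte);
			if (--nr == 0)
				break;
			ptep++;
			/* all entries share a PMD, so stepping the PFN is safe */
			pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
		}
	}
	#endif

	#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)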
 
> >  static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
> >  			      pmd_t *pmdp, pmd_t pmd)
> >  {
> > @@ -1291,6 +1286,10 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> >  		unsigned long addr, pte_t *ptep)
> >  {
> >  }
> > +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> > +		unsigned long addr, pte_t *ptep, unsigned int nr)
> > +{
> > +}
> >  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
> >  		unsigned long addr, pmd_t *pmd)
> >  {
> > -- 
> > 2.39.2
> > 
> 

-- 
Sincerely yours,
Mike.

* Re: [PATCH v4 27/36] x86: Implement the new page table range API
  2023-03-15 11:16     ` Mike Rapoport
@ 2023-03-15 11:19       ` Peter Zijlstra
  2023-03-15 16:12         ` Matthew Wilcox
  0 siblings, 1 reply; 138+ messages in thread
From: Peter Zijlstra @ 2023-03-15 11:19 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Matthew Wilcox (Oracle),
	linux-arch, linux-mm, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin

On Wed, Mar 15, 2023 at 01:16:24PM +0200, Mike Rapoport wrote:
> On Wed, Mar 15, 2023 at 11:34:36AM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 15, 2023 at 05:14:35AM +0000, Matthew Wilcox (Oracle) wrote:
> > > Add PFN_PTE_SHIFT and a noop update_mmu_cache_range().
> > > 
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Borislav Petkov <bp@alien8.de>
> > > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > > Cc: x86@kernel.org
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > ---
> > >  arch/x86/include/asm/pgtable.h | 13 ++++++-------
> > >  1 file changed, 6 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> > > index 1031025730d0..b237878061c4 100644
> > > --- a/arch/x86/include/asm/pgtable.h
> > > +++ b/arch/x86/include/asm/pgtable.h
> > > @@ -184,6 +184,8 @@ static inline int pte_special(pte_t pte)
> > >  
> > >  static inline u64 protnone_mask(u64 val);
> > >  
> > > +#define PFN_PTE_SHIFT	PAGE_SHIFT
> > > +
> > >  static inline unsigned long pte_pfn(pte_t pte)
> > >  {
> > >  	phys_addr_t pfn = pte_val(pte);
> > > @@ -1019,13 +1021,6 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
> > >  	return res;
> > >  }
> > >  
> > > -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > > -			      pte_t *ptep, pte_t pte)
> > > -{
> > > -	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
> > > -	set_pte(ptep, pte);
> > > -}
> > > -
> > 
> > And remove set_pte_at() apparently.. whut?!?
> 
> It's now in include/linux/pgtable.h

All I have is this one patch -- and the changelog doesn't mention this.
HTF am I supposed to know that?

* Re: [PATCH v4 34/36] rmap: add folio_add_file_rmap_range()
  2023-03-15  5:14 ` [PATCH v4 34/36] rmap: add folio_add_file_rmap_range() Matthew Wilcox (Oracle)
@ 2023-03-15 13:34   ` Ryan Roberts
  2023-03-15 16:08     ` Ryan Roberts
  0 siblings, 1 reply; 138+ messages in thread
From: Ryan Roberts @ 2023-03-15 13:34 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch; +Cc: Yin Fengwei, linux-mm, linux-kernel

On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote:
> From: Yin Fengwei <fengwei.yin@intel.com>
> 
> folio_add_file_rmap_range() allows adding pte mappings to a specific
> range of a file folio. Compared to page_add_file_rmap(), it batches
> updates of __lruvec_stat for large folios.
> 
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/rmap.h |  2 ++
>  mm/rmap.c            | 60 +++++++++++++++++++++++++++++++++-----------
>  2 files changed, 48 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index b87d01660412..a3825ce81102 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
>  		unsigned long address);
>  void page_add_file_rmap(struct page *, struct vm_area_struct *,
>  		bool compound);
> +void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
> +		struct vm_area_struct *, bool compound);
>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>  		bool compound);
>  
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 4898e10c569a..a91906b28835 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1301,31 +1301,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>  }
>  
>  /**
> - * page_add_file_rmap - add pte mapping to a file page
> - * @page:	the page to add the mapping to
> + * folio_add_file_rmap_range - add pte mapping to page range of a folio
> + * @folio:	The folio to add the mapping to
> + * @page:	The first page to add
> + * @nr_pages:	The number of pages which will be mapped
>   * @vma:	the vm area in which the mapping is added
>   * @compound:	charge the page as compound or small page
>   *
> + * The page range of folio is defined by [first_page, first_page + nr_pages)
> + *
>   * The caller needs to hold the pte lock.
>   */
> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> -		bool compound)
> +void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> +			unsigned int nr_pages, struct vm_area_struct *vma,
> +			bool compound)
>  {
> -	struct folio *folio = page_folio(page);
>  	atomic_t *mapped = &folio->_nr_pages_mapped;
> -	int nr = 0, nr_pmdmapped = 0;
> -	bool first;
> +	unsigned int nr_pmdmapped = 0, first;
> +	int nr = 0;
>  
> -	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
> +	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
>  
>  	/* Is page being mapped by PTE? Is this its first map to be added? */
>  	if (likely(!compound)) {
> -		first = atomic_inc_and_test(&page->_mapcount);
> -		nr = first;
> -		if (first && folio_test_large(folio)) {
> -			nr = atomic_inc_return_relaxed(mapped);
> -			nr = (nr < COMPOUND_MAPPED);
> -		}
> +		do {
> +			first = atomic_inc_and_test(&page->_mapcount);
> +			if (first && folio_test_large(folio)) {
> +				first = atomic_inc_return_relaxed(mapped);
> +				first = (nr < COMPOUND_MAPPED);

This still contains the typo that Yin Fengwei spotted in the previous version:
https://lore.kernel.org/linux-mm/20230228213738.272178-1-willy@infradead.org/T/#m84673899e25bc31356093a1177941f2cc35e5da8
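
For reference, the corrected comparison (a sketch of the one-line fix spotted
in that thread; the test should use the value just read back, not nr):

	first = atomic_inc_and_test(&page->_mapcount);
	if (first && folio_test_large(folio)) {
		first = atomic_inc_return_relaxed(mapped);
		first = (first < COMPOUND_MAPPED);	/* was: nr < COMPOUND_MAPPED */
	}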

FYI, I'm seeing a perf regression of about 1% when compiling the kernel on
Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (in a VM using
an ext4 filesystem). Looks like instruction aborts are taking much longer and a
selection of syscalls are a bit slower. Still hunting down the root cause. Will
report once I have a conclusive diagnosis.

Thanks,
Ryan


> +			}
> +
> +			if (first)
> +				nr++;
> +		} while (page++, --nr_pages > 0);
>  	} else if (folio_test_pmd_mappable(folio)) {
>  		/* That test is redundant: it's for safety or to optimize out */
>  
> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>  	mlock_vma_folio(folio, vma, compound);
>  }
>  
> +/**
> + * page_add_file_rmap - add pte mapping to a file page
> + * @page:	the page to add the mapping to
> + * @vma:	the vm area in which the mapping is added
> + * @compound:	charge the page as compound or small page
> + *
> + * The caller needs to hold the pte lock.
> + */
> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> +		bool compound)
> +{
> +	struct folio *folio = page_folio(page);
> +	unsigned int nr_pages;
> +
> +	VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
> +
> +	if (likely(!compound))
> +		nr_pages = 1;
> +	else
> +		nr_pages = folio_nr_pages(folio);
> +
> +	folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
> +}
> +
>  /**
>   * page_remove_rmap - take down pte mapping from a page
>   * @page:	page to remove mapping from


* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-15  5:14 ` [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range() Matthew Wilcox (Oracle)
@ 2023-03-15 15:26   ` Ryan Roberts
  2023-03-16 16:23     ` Yin, Fengwei
  0 siblings, 1 reply; 138+ messages in thread
From: Ryan Roberts @ 2023-03-15 15:26 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch; +Cc: Yin Fengwei, linux-mm, linux-kernel

On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote:
> From: Yin Fengwei <fengwei.yin@intel.com>
> 
> set_pte_range() allows setting up page table entries for a specific
> range.  It takes advantage of the batched rmap update for large folios.
> It now takes care of calling update_mmu_cache_range().
> 
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  Documentation/filesystems/locking.rst |  2 +-
>  include/linux/mm.h                    |  3 ++-
>  mm/filemap.c                          |  3 +--
>  mm/memory.c                           | 27 +++++++++++++++------------
>  4 files changed, 19 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index 7de7a7272a5e..922886fefb7f 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -663,7 +663,7 @@ locked. The VM will unlock the page.
>  Filesystem should find and map pages associated with offsets from "start_pgoff"
>  till "end_pgoff". ->map_pages() is called with page table locked and must
>  not block.  If it's not possible to reach a page without blocking,
> -filesystem should skip it. Filesystem should use do_set_pte() to setup
> +filesystem should skip it. Filesystem should use set_pte_range() to setup
>  page table entry. Pointer to entry associated with the page is passed in
>  "pte" field in vm_fault structure. Pointers to entries for other offsets
>  should be calculated relative to "pte".
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ee755bb4e1c1..81788c985a8c 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1299,7 +1299,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>  }
>  
>  vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
> +void set_pte_range(struct vm_fault *vmf, struct folio *folio,
> +		struct page *page, unsigned int nr, unsigned long addr);
>  
>  vm_fault_t finish_fault(struct vm_fault *vmf);
>  vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 6e2b0778db45..e2317623dcbf 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3504,8 +3504,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  			ret = VM_FAULT_NOPAGE;
>  
>  		ref_count++;
> -		do_set_pte(vmf, page, addr);
> -		update_mmu_cache(vma, addr, vmf->pte);
> +		set_pte_range(vmf, folio, page, 1, addr);
>  	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
>  
>  	/* Restore the vmf->pte */
> diff --git a/mm/memory.c b/mm/memory.c
> index 6aa21e8f3753..9a654802f104 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4274,7 +4274,8 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>  }
>  #endif
>  
> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
> +void set_pte_range(struct vm_fault *vmf, struct folio *folio,
> +		struct page *page, unsigned int nr, unsigned long addr)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  	bool uffd_wp = vmf_orig_pte_uffd_wp(vmf);
> @@ -4282,7 +4283,7 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>  	bool prefault = vmf->address != addr;

I think you are changing behavior here - is this intentional? Previously this
would be evaluated per page; now it's evaluated once for the whole range. The
intention below is that directly faulted pages are mapped young and prefaulted
pages are mapped old. But now a whole range will be mapped the same.
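
As a sketch of one possible resolution (an assumption on my part, not what
this patch does; it also assumes an in_range(addr, start, len) helper),
prefault could be evaluated once against the whole batch:

	/*
	 * Sketch: a batch containing the faulting address is mapped
	 * young; batches that are pure prefault are mapped old.
	 */
	bool prefault = !in_range(vmf->address, addr, nr * PAGE_SIZE);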

Thanks,
Ryan

>  	pte_t entry;
>  
> -	flush_icache_page(vma, page);
> +	flush_icache_pages(vma, page, nr);
>  	entry = mk_pte(page, vma->vm_page_prot);
>  
>  	if (prefault && arch_wants_old_prefaulted_pte())
> @@ -4296,14 +4297,18 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>  		entry = pte_mkuffd_wp(entry);
>  	/* copy-on-write page */
>  	if (write && !(vma->vm_flags & VM_SHARED)) {
> -		inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
> -		page_add_new_anon_rmap(page, vma, addr);
> -		lru_cache_add_inactive_or_unevictable(page, vma);
> +		add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr);
> +		VM_BUG_ON_FOLIO(nr != 1, folio);
> +		folio_add_new_anon_rmap(folio, vma, addr);
> +		folio_add_lru_vma(folio, vma);
>  	} else {
> -		inc_mm_counter(vma->vm_mm, mm_counter_file(page));
> -		page_add_file_rmap(page, vma, false);
> +		add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
> +		folio_add_file_rmap_range(folio, page, nr, vma, false);
>  	}
> -	set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
> +	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
> +
> +	/* no need to invalidate: a not-present page won't be cached */
> +	update_mmu_cache_range(vma, addr, vmf->pte, nr);
>  }
>  
>  static bool vmf_pte_changed(struct vm_fault *vmf)
> @@ -4376,11 +4381,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>  
>  	/* Re-check under ptl */
>  	if (likely(!vmf_pte_changed(vmf))) {
> -		do_set_pte(vmf, page, vmf->address);
> -
> -		/* no need to invalidate: a not-present page won't be cached */
> -		update_mmu_cache(vma, vmf->address, vmf->pte);
> +		struct folio *folio = page_folio(page);
>  
> +		set_pte_range(vmf, folio, page, 1, vmf->address);
>  		ret = 0;
>  	} else {
>  		update_mmu_tlb(vma, vmf->address, vmf->pte);


* Re: [PATCH v4 34/36] rmap: add folio_add_file_rmap_range()
  2023-03-15 13:34   ` Ryan Roberts
@ 2023-03-15 16:08     ` Ryan Roberts
  2023-03-16 16:27       ` Yin, Fengwei
  0 siblings, 1 reply; 138+ messages in thread
From: Ryan Roberts @ 2023-03-15 16:08 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch; +Cc: Yin Fengwei, linux-mm, linux-kernel

On 15/03/2023 13:34, Ryan Roberts wrote:
> On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote:
>> From: Yin Fengwei <fengwei.yin@intel.com>
>>
>> folio_add_file_rmap_range() allows adding pte mappings to a specific
>> range of a file folio. Compared to page_add_file_rmap(), it batches
>> updates of __lruvec_stat for large folios.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>> ---
>>  include/linux/rmap.h |  2 ++
>>  mm/rmap.c            | 60 +++++++++++++++++++++++++++++++++-----------
>>  2 files changed, 48 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> index b87d01660412..a3825ce81102 100644
>> --- a/include/linux/rmap.h
>> +++ b/include/linux/rmap.h
>> @@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
>>  		unsigned long address);
>>  void page_add_file_rmap(struct page *, struct vm_area_struct *,
>>  		bool compound);
>> +void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
>> +		struct vm_area_struct *, bool compound);
>>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>>  		bool compound);
>>  
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 4898e10c569a..a91906b28835 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1301,31 +1301,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>  }
>>  
>>  /**
>> - * page_add_file_rmap - add pte mapping to a file page
>> - * @page:	the page to add the mapping to
>> + * folio_add_file_rmap_range - add pte mapping to page range of a folio
>> + * @folio:	The folio to add the mapping to
>> + * @page:	The first page to add
>> + * @nr_pages:	The number of pages which will be mapped
>>   * @vma:	the vm area in which the mapping is added
>>   * @compound:	charge the page as compound or small page
>>   *
>> + * The page range of folio is defined by [first_page, first_page + nr_pages)
>> + *
>>   * The caller needs to hold the pte lock.
>>   */
>> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>> -		bool compound)
>> +void folio_add_file_rmap_range(struct folio *folio, struct page *page,
>> +			unsigned int nr_pages, struct vm_area_struct *vma,
>> +			bool compound)
>>  {
>> -	struct folio *folio = page_folio(page);
>>  	atomic_t *mapped = &folio->_nr_pages_mapped;
>> -	int nr = 0, nr_pmdmapped = 0;
>> -	bool first;
>> +	unsigned int nr_pmdmapped = 0, first;
>> +	int nr = 0;
>>  
>> -	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
>> +	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
>>  
>>  	/* Is page being mapped by PTE? Is this its first map to be added? */
>>  	if (likely(!compound)) {
>> -		first = atomic_inc_and_test(&page->_mapcount);
>> -		nr = first;
>> -		if (first && folio_test_large(folio)) {
>> -			nr = atomic_inc_return_relaxed(mapped);
>> -			nr = (nr < COMPOUND_MAPPED);
>> -		}
>> +		do {
>> +			first = atomic_inc_and_test(&page->_mapcount);
>> +			if (first && folio_test_large(folio)) {
>> +				first = atomic_inc_return_relaxed(mapped);
>> +				first = (nr < COMPOUND_MAPPED);
> 
> This still contains the typo that Yin Fengwei spotted in the previous version:
> https://lore.kernel.org/linux-mm/20230228213738.272178-1-willy@infradead.org/T/#m84673899e25bc31356093a1177941f2cc35e5da8
> 
> FYI, I'm seeing a perf regression of about 1% when compiling the kernel on
> Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (in a VM using
> an ext4 filesystem). Looks like instruction aborts are taking much longer and a
> selection of syscalls are a bit slower. Still hunting down the root cause. Will
> report once I have a conclusive diagnosis.

I'm sorry - I'm struggling to find the exact cause. But it's spending over 2x the
amount of time in the instruction abort handling code once patches 32-36 are
included. Everything in the flame graph is just taking longer. Perhaps we are
getting more instruction aborts somehow? I have the flamegraphs if anyone wants
them - just shout and I'll email them separately.

> 
> Thanks,
> Ryan
> 
> 
>> +			}
>> +
>> +			if (first)
>> +				nr++;
>> +		} while (page++, --nr_pages > 0);
>>  	} else if (folio_test_pmd_mappable(folio)) {
>>  		/* That test is redundant: it's for safety or to optimize out */
>>  
>> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>  	mlock_vma_folio(folio, vma, compound);
>>  }
>>  
>> +/**
>> + * page_add_file_rmap - add pte mapping to a file page
>> + * @page:	the page to add the mapping to
>> + * @vma:	the vm area in which the mapping is added
>> + * @compound:	charge the page as compound or small page
>> + *
>> + * The caller needs to hold the pte lock.
>> + */
>> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>> +		bool compound)
>> +{
>> +	struct folio *folio = page_folio(page);
>> +	unsigned int nr_pages;
>> +
>> +	VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
>> +
>> +	if (likely(!compound))
>> +		nr_pages = 1;
>> +	else
>> +		nr_pages = folio_nr_pages(folio);
>> +
>> +	folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
>> +}
>> +
>>  /**
>>   * page_remove_rmap - take down pte mapping from a page
>>   * @page:	page to remove mapping from
> 


* Re: [PATCH v4 27/36] x86: Implement the new page table range API
  2023-03-15 11:19       ` Peter Zijlstra
@ 2023-03-15 16:12         ` Matthew Wilcox
  0 siblings, 0 replies; 138+ messages in thread
From: Matthew Wilcox @ 2023-03-15 16:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Rapoport, linux-arch, linux-mm, linux-kernel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin

On Wed, Mar 15, 2023 at 12:19:41PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 15, 2023 at 01:16:24PM +0200, Mike Rapoport wrote:
> > On Wed, Mar 15, 2023 at 11:34:36AM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 15, 2023 at 05:14:35AM +0000, Matthew Wilcox (Oracle) wrote:
> > > > Add PFN_PTE_SHIFT and a noop update_mmu_cache_range().
> > > > 
> > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > Cc: Borislav Petkov <bp@alien8.de>
> > > > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > > > Cc: x86@kernel.org
> > > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > > ---
> > > >  arch/x86/include/asm/pgtable.h | 13 ++++++-------
> > > >  1 file changed, 6 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> > > > index 1031025730d0..b237878061c4 100644
> > > > --- a/arch/x86/include/asm/pgtable.h
> > > > +++ b/arch/x86/include/asm/pgtable.h
> > > > @@ -184,6 +184,8 @@ static inline int pte_special(pte_t pte)
> > > >  
> > > >  static inline u64 protnone_mask(u64 val);
> > > >  
> > > > +#define PFN_PTE_SHIFT	PAGE_SHIFT
> > > > +
> > > >  static inline unsigned long pte_pfn(pte_t pte)
> > > >  {
> > > >  	phys_addr_t pfn = pte_val(pte);
> > > > @@ -1019,13 +1021,6 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
> > > >  	return res;
> > > >  }
> > > >  
> > > > -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > > > -			      pte_t *ptep, pte_t pte)
> > > > -{
> > > > -	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
> > > > -	set_pte(ptep, pte);
> > > > -}
> > > > -
> > > 
> > > And remove set_pte_at() apparently.. whut?!?
> > 
> > It's now in include/linux/pgtable.h
> 
> All I have is this one patch -- and the changelog doesn't mention this.
> HTF am I supposed to know that?

You should be subscribed to linux-arch.  I literally can't cc all arch
maintainers on every patch; many of the mailing lists will reject the
emails based on "too many recipients".  That's what linux-arch is _for_.

* Re: [PATCH v4 16/36] mips: Implement the new page table range API
  2023-03-15 10:50   ` Thomas Bogendoerfer
@ 2023-03-15 20:33     ` Matthew Wilcox
  2023-03-17 15:29       ` Thomas Bogendoerfer
  0 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox @ 2023-03-15 20:33 UTC (permalink / raw)
  To: Thomas Bogendoerfer; +Cc: linux-arch, linux-mm, linux-kernel, linux-mips

On Wed, Mar 15, 2023 at 11:50:22AM +0100, Thomas Bogendoerfer wrote:
> On Wed, Mar 15, 2023 at 05:14:24AM +0000, Matthew Wilcox (Oracle) wrote:
> > Rename _PFN_SHIFT to PFN_PTE_SHIFT.  Convert a few places
> > to call set_pte() instead of set_pte_at().  Add set_ptes(),
> > update_mmu_cache_range(), flush_icache_pages() and flush_dcache_folio().
> 
> /local/tbogendoerfer/korg/linux/mm/memory.c: In function ‘set_pte_range’:
> /local/tbogendoerfer/korg/linux/mm/memory.c:4290:2: error: implicit declaration of function ‘update_mmu_cache_range’ [-Werror=implicit-function-declaration]
>   update_mmu_cache_range(vma, addr, vmf->pte, nr);
> 
> update_mmu_cache_range() is missing in this patch.

Oops.  And mips was one of the arches I did a test build for!

Looks like we could try to gain some efficiency by passing 'nr' to
__update_tlb(), but as far as I can tell, that's only called for r3k and
r4k, so maybe it's not worth optimising at this point?  Anyway, this
add-on makes the mips build compile for me and I'll fold it into v5.

diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index cfcd6a8ba8ef..9f51b0813dc6 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -578,12 +578,20 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
 	pte_t pte);
 
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-	unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
-	pte_t pte = *ptep;
-	__update_tlb(vma, address, pte);
+	for (;;) {
+		pte_t pte = *ptep;
+		__update_tlb(vma, address, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		address += PAGE_SIZE;
+	}
 }
+#define update_mmu_cache(vma, address, ptep) \
+	update_mmu_cache_range(vma, address, ptep, 1)
 
 #define	__HAVE_ARCH_UPDATE_MMU_TLB
 #define update_mmu_tlb	update_mmu_cache
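
To illustrate the efficiency idea floated above, a purely hypothetical
sketch; __update_tlb_range() is an invented name that does not exist in the
tree, and the r3k/r4k refill code would have to grow a 'nr' parameter first:

	static inline void update_mmu_cache_range(struct vm_area_struct *vma,
			unsigned long address, pte_t *ptep, unsigned int nr)
	{
		/* hypothetical: hand the whole range to the TLB code at once */
		__update_tlb_range(vma, address, ptep, nr);
	}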

* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-15 15:26   ` Ryan Roberts
@ 2023-03-16 16:23     ` Yin, Fengwei
  2023-03-16 16:38       ` Ryan Roberts
  0 siblings, 1 reply; 138+ messages in thread
From: Yin, Fengwei @ 2023-03-16 16:23 UTC (permalink / raw)
  To: Ryan Roberts, Matthew Wilcox (Oracle), linux-arch, will
  Cc: linux-mm, linux-kernel



On 3/15/2023 11:26 PM, Ryan Roberts wrote:
> On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote:
>> From: Yin Fengwei <fengwei.yin@intel.com>
>>
>> set_pte_range() allows setting up page table entries for a specific
>> range.  It takes advantage of the batched rmap update for large folios.
>> It now takes care of calling update_mmu_cache_range().
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>> ---
>>  Documentation/filesystems/locking.rst |  2 +-
>>  include/linux/mm.h                    |  3 ++-
>>  mm/filemap.c                          |  3 +--
>>  mm/memory.c                           | 27 +++++++++++++++------------
>>  4 files changed, 19 insertions(+), 16 deletions(-)
>>
>> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
>> index 7de7a7272a5e..922886fefb7f 100644
>> --- a/Documentation/filesystems/locking.rst
>> +++ b/Documentation/filesystems/locking.rst
>> @@ -663,7 +663,7 @@ locked. The VM will unlock the page.
>>  Filesystem should find and map pages associated with offsets from "start_pgoff"
>>  till "end_pgoff". ->map_pages() is called with page table locked and must
>>  not block.  If it's not possible to reach a page without blocking,
>> -filesystem should skip it. Filesystem should use do_set_pte() to setup
>> +filesystem should skip it. Filesystem should use set_pte_range() to setup
>>  page table entry. Pointer to entry associated with the page is passed in
>>  "pte" field in vm_fault structure. Pointers to entries for other offsets
>>  should be calculated relative to "pte".
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index ee755bb4e1c1..81788c985a8c 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1299,7 +1299,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>>  }
>>  
>>  vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
>> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
>> +void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>> +		struct page *page, unsigned int nr, unsigned long addr);
>>  
>>  vm_fault_t finish_fault(struct vm_fault *vmf);
>>  vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 6e2b0778db45..e2317623dcbf 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -3504,8 +3504,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>  			ret = VM_FAULT_NOPAGE;
>>  
>>  		ref_count++;
>> -		do_set_pte(vmf, page, addr);
>> -		update_mmu_cache(vma, addr, vmf->pte);
>> +		set_pte_range(vmf, folio, page, 1, addr);
>>  	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
>>  
>>  	/* Restore the vmf->pte */
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 6aa21e8f3753..9a654802f104 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4274,7 +4274,8 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>>  }
>>  #endif
>>  
>> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>> +void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>> +		struct page *page, unsigned int nr, unsigned long addr)
>>  {
>>  	struct vm_area_struct *vma = vmf->vma;
>>  	bool uffd_wp = vmf_orig_pte_uffd_wp(vmf);
>> @@ -4282,7 +4283,7 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>  	bool prefault = vmf->address != addr;
> 
> I think you are changing behavior here - is this intentional? Previously this
> would be evaluated per page; now it's evaluated once for the whole range. The
> intention below is that directly faulted pages are mapped young and prefaulted
> pages are mapped old. But now a whole range will be mapped the same.

Yes. You are right here.

Look at the prefault and cpu_has_hw_af for ARM64; it looks like we
can avoid handling vmf->address == addr specially. It's OK to
drop prefault and change the logic here a little bit to:
  if (arch_wants_old_prefaulted_pte())
      entry = pte_mkold(entry);
  else
      entry = pte_sw_mkyoung(entry);

It's not necessary to use pte_sw_mkyoung for vmf->address == addr
because the HW will set the ACCESS bit in the page table entry.

Add Will Deacon in case I missed something here. Thanks.


Regards
Yin, Fengwei

> 
> Thanks,
> Ryan
> 
>>  	pte_t entry;
>>  
>> -	flush_icache_page(vma, page);
>> +	flush_icache_pages(vma, page, nr);
>>  	entry = mk_pte(page, vma->vm_page_prot);
>>  
>>  	if (prefault && arch_wants_old_prefaulted_pte())
>> @@ -4296,14 +4297,18 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>  		entry = pte_mkuffd_wp(entry);
>>  	/* copy-on-write page */
>>  	if (write && !(vma->vm_flags & VM_SHARED)) {
>> -		inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
>> -		page_add_new_anon_rmap(page, vma, addr);
>> -		lru_cache_add_inactive_or_unevictable(page, vma);
>> +		add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr);
>> +		VM_BUG_ON_FOLIO(nr != 1, folio);
>> +		folio_add_new_anon_rmap(folio, vma, addr);
>> +		folio_add_lru_vma(folio, vma);
>>  	} else {
>> -		inc_mm_counter(vma->vm_mm, mm_counter_file(page));
>> -		page_add_file_rmap(page, vma, false);
>> +		add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
>> +		folio_add_file_rmap_range(folio, page, nr, vma, false);
>>  	}
>> -	set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
>> +	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
>> +
>> +	/* no need to invalidate: a not-present page won't be cached */
>> +	update_mmu_cache_range(vma, addr, vmf->pte, nr);
>>  }
>>  
>>  static bool vmf_pte_changed(struct vm_fault *vmf)
>> @@ -4376,11 +4381,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>  
>>  	/* Re-check under ptl */
>>  	if (likely(!vmf_pte_changed(vmf))) {
>> -		do_set_pte(vmf, page, vmf->address);
>> -
>> -		/* no need to invalidate: a not-present page won't be cached */
>> -		update_mmu_cache(vma, vmf->address, vmf->pte);
>> +		struct folio *folio = page_folio(page);
>>  
>> +		set_pte_range(vmf, folio, page, 1, vmf->address);
>>  		ret = 0;
>>  	} else {
>>  		update_mmu_tlb(vma, vmf->address, vmf->pte);
> 

* Re: [PATCH v4 34/36] rmap: add folio_add_file_rmap_range()
  2023-03-15 16:08     ` Ryan Roberts
@ 2023-03-16 16:27       ` Yin, Fengwei
  2023-03-16 16:34         ` Ryan Roberts
  0 siblings, 1 reply; 138+ messages in thread
From: Yin, Fengwei @ 2023-03-16 16:27 UTC (permalink / raw)
  To: Ryan Roberts, Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel

Hi Matthew,

On 3/16/2023 12:08 AM, Ryan Roberts wrote:
> On 15/03/2023 13:34, Ryan Roberts wrote:
>> On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote:
>>> From: Yin Fengwei <fengwei.yin@intel.com>
>>>
>>> folio_add_file_rmap_range() allows adding pte mappings to a specific
>>> range of a file folio. Compared to page_add_file_rmap(), it batches
>>> updates of __lruvec_stat for large folios.
>>>
>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>> ---
>>>  include/linux/rmap.h |  2 ++
>>>  mm/rmap.c            | 60 +++++++++++++++++++++++++++++++++-----------
>>>  2 files changed, 48 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>> index b87d01660412..a3825ce81102 100644
>>> --- a/include/linux/rmap.h
>>> +++ b/include/linux/rmap.h
>>> @@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
>>>  		unsigned long address);
>>>  void page_add_file_rmap(struct page *, struct vm_area_struct *,
>>>  		bool compound);
>>> +void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
>>> +		struct vm_area_struct *, bool compound);
>>>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>>>  		bool compound);
>>>  
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 4898e10c569a..a91906b28835 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1301,31 +1301,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>  }
>>>  
>>>  /**
>>> - * page_add_file_rmap - add pte mapping to a file page
>>> - * @page:	the page to add the mapping to
>>> + * folio_add_file_rmap_range - add pte mapping to page range of a folio
>>> + * @folio:	The folio to add the mapping to
>>> + * @page:	The first page to add
>>> + * @nr_pages:	The number of pages which will be mapped
>>>   * @vma:	the vm area in which the mapping is added
>>>   * @compound:	charge the page as compound or small page
>>>   *
>>> + * The page range of folio is defined by [first_page, first_page + nr_pages)
>>> + *
>>>   * The caller needs to hold the pte lock.
>>>   */
>>> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>> -		bool compound)
>>> +void folio_add_file_rmap_range(struct folio *folio, struct page *page,
>>> +			unsigned int nr_pages, struct vm_area_struct *vma,
>>> +			bool compound)
>>>  {
>>> -	struct folio *folio = page_folio(page);
>>>  	atomic_t *mapped = &folio->_nr_pages_mapped;
>>> -	int nr = 0, nr_pmdmapped = 0;
>>> -	bool first;
>>> +	unsigned int nr_pmdmapped = 0, first;
>>> +	int nr = 0;
>>>  
>>> -	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
>>> +	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
>>>  
>>>  	/* Is page being mapped by PTE? Is this its first map to be added? */
>>>  	if (likely(!compound)) {
>>> -		first = atomic_inc_and_test(&page->_mapcount);
>>> -		nr = first;
>>> -		if (first && folio_test_large(folio)) {
>>> -			nr = atomic_inc_return_relaxed(mapped);
>>> -			nr = (nr < COMPOUND_MAPPED);
>>> -		}
>>> +		do {
>>> +			first = atomic_inc_and_test(&page->_mapcount);
>>> +			if (first && folio_test_large(folio)) {
>>> +				first = atomic_inc_return_relaxed(mapped);
>>> +				first = (nr < COMPOUND_MAPPED);
>>
>> This still contains the typo that Yin Fengwei spotted in the previous version:
>> https://lore.kernel.org/linux-mm/20230228213738.272178-1-willy@infradead.org/T/#m84673899e25bc31356093a1177941f2cc35e5da8
>>
>> FYI, I'm seeing a perf regression of about 1% when compiling the kernel on
>> Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (in a VM using
>> an ext4 filesystem). Looks like instruction aborts are taking much longer and a
>> selection of syscalls are a bit slower. Still hunting down the root cause. Will
>> report once I have a conclusive diagnosis.
> 
> I'm sorry - I'm struggling to find the exact cause. But it's spending over 2x the
> amount of time in the instruction abort handling code once patches 32-36 are
> included. Everything in the flame graph is just taking longer. Perhaps we are
> getting more instruction aborts somehow? I have the flamegraphs if anyone wants
> them - just shout and I'll email them separately.
Thanks a lot to Ryan for sharing the flamegraphs with me. I found that
__do_fault() is called with patches 32-36 applied, but not with just the
first 31 patches. I suspect folio_add_file_rmap_range() misses populating
some PTEs. Please give me a few days to find the root cause and fix it.
Sorry for this.


Regards
Yin, Fengwei

> 
>>
>> Thanks,
>> Ryan
>>
>>
>>> +			}
>>> +
>>> +			if (first)
>>> +				nr++;
>>> +		} while (page++, --nr_pages > 0);
>>>  	} else if (folio_test_pmd_mappable(folio)) {
>>>  		/* That test is redundant: it's for safety or to optimize out */
>>>  
>>> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>>  	mlock_vma_folio(folio, vma, compound);
>>>  }
>>>  
>>> +/**
>>> + * page_add_file_rmap - add pte mapping to a file page
>>> + * @page:	the page to add the mapping to
>>> + * @vma:	the vm area in which the mapping is added
>>> + * @compound:	charge the page as compound or small page
>>> + *
>>> + * The caller needs to hold the pte lock.
>>> + */
>>> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>> +		bool compound)
>>> +{
>>> +	struct folio *folio = page_folio(page);
>>> +	unsigned int nr_pages;
>>> +
>>> +	VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
>>> +
>>> +	if (likely(!compound))
>>> +		nr_pages = 1;
>>> +	else
>>> +		nr_pages = folio_nr_pages(folio);
>>> +
>>> +	folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
>>> +}
>>> +
>>>  /**
>>>   * page_remove_rmap - take down pte mapping from a page
>>>   * @page:	page to remove mapping from
>>
> 

* Re: [PATCH v4 14/36] m68k: Implement the new page table range API
  2023-03-15  7:43   ` Geert Uytterhoeven
@ 2023-03-16 16:32     ` Geert Uytterhoeven
  0 siblings, 0 replies; 138+ messages in thread
From: Geert Uytterhoeven @ 2023-03-16 16:32 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel, linux-m68k

On Wed, Mar 15, 2023 at 8:43 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Wed, Mar 15, 2023 at 6:14 AM Matthew Wilcox (Oracle)
> <willy@infradead.org> wrote:
> > Add PFN_PTE_SHIFT, update_mmu_cache_range(), flush_icache_pages() and
> > flush_dcache_folio().
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>
> Thanks for your patch!
>
> > --- a/arch/m68k/include/asm/cacheflush_mm.h
> > +++ b/arch/m68k/include/asm/cacheflush_mm.h
> > @@ -220,24 +220,29 @@ static inline void flush_cache_page(struct vm_area_struct *vma, unsigned long vm
> >
> >  /* Push the page at kernel virtual address and clear the icache */
> >  /* RZ: use cpush %bc instead of cpush %dc, cinv %ic */
> > -static inline void __flush_page_to_ram(void *vaddr)
> > +static inline void __flush_pages_to_ram(void *vaddr, unsigned int nr)
> >  {
> >         if (CPU_IS_COLDFIRE) {
> >                 unsigned long addr, start, end;
> >                 addr = ((unsigned long) vaddr) & ~(PAGE_SIZE - 1);
> >                 start = addr & ICACHE_SET_MASK;
> > -               end = (addr + PAGE_SIZE - 1) & ICACHE_SET_MASK;
> > +               end = (addr + nr * PAGE_SIZE - 1) & ICACHE_SET_MASK;
> >                 if (start > end) {
> >                         flush_cf_bcache(0, end);
> >                         end = ICACHE_MAX_ADDR;
> >                 }
> >                 flush_cf_bcache(start, end);
> >         } else if (CPU_IS_040_OR_060) {
> > -               __asm__ __volatile__("nop\n\t"
> > -                                    ".chip 68040\n\t"
> > -                                    "cpushp %%bc,(%0)\n\t"
> > -                                    ".chip 68k"
> > -                                    : : "a" (__pa(vaddr)));
> > +               unsigned long paddr = __pa(vaddr);
> > +
> > +               do {
> > +                       __asm__ __volatile__("nop\n\t"
> > +                                            ".chip 68040\n\t"
> > +                                            "cpushp %%bc,(%0)\n\t"
> > +                                            ".chip 68k"
> > +                                            : : "a" (paddr));
> > +                       paddr += PAGE_SIZE;
> > +               } while (--nr);
>
> Please use "while (nr--) { ... }" to protect against anyone ever
> calling this with nr == 0.
>
> The rest LGTM, I'll give it a try shortly...

Still working fine on ARAnyM, so
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
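
A minimal sketch of the difference Geert points out above, assuming an
unsigned nr; flush_one_page() is a hypothetical stand-in for the cpushp
sequence in the patch:

        /* do-while: runs once even when nr == 0, after which --nr wraps
         * to UINT_MAX and the loop keeps flushing far past the range */
        do {
                flush_one_page(paddr);
                paddr += PAGE_SIZE;
        } while (--nr);

        /* while: degrades to a safe no-op when nr == 0 */
        while (nr--) {
                flush_one_page(paddr);
                paddr += PAGE_SIZE;
        }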


* Re: [PATCH v4 34/36] rmap: add folio_add_file_rmap_range()
  2023-03-16 16:27       ` Yin, Fengwei
@ 2023-03-16 16:34         ` Ryan Roberts
  2023-03-17  8:23           ` Yin, Fengwei
  0 siblings, 1 reply; 138+ messages in thread
From: Ryan Roberts @ 2023-03-16 16:34 UTC (permalink / raw)
  To: Yin, Fengwei, Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel

On 16/03/2023 16:27, Yin, Fengwei wrote:
> Hi Matthew,
> 
> On 3/16/2023 12:08 AM, Ryan Roberts wrote:
>> On 15/03/2023 13:34, Ryan Roberts wrote:
>>> On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote:
>>>> From: Yin Fengwei <fengwei.yin@intel.com>
>>>>
>>>> folio_add_file_rmap_range() allows adding a pte mapping to a specific
>>>> range of a file folio. Compared to page_add_file_rmap(), it batches
>>>> updates to __lruvec_stat for large folios.
>>>>
>>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>>> ---
>>>>  include/linux/rmap.h |  2 ++
>>>>  mm/rmap.c            | 60 +++++++++++++++++++++++++++++++++-----------
>>>>  2 files changed, 48 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>> index b87d01660412..a3825ce81102 100644
>>>> --- a/include/linux/rmap.h
>>>> +++ b/include/linux/rmap.h
>>>> @@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
>>>>  		unsigned long address);
>>>>  void page_add_file_rmap(struct page *, struct vm_area_struct *,
>>>>  		bool compound);
>>>> +void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
>>>> +		struct vm_area_struct *, bool compound);
>>>>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>>>>  		bool compound);
>>>>  
>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>> index 4898e10c569a..a91906b28835 100644
>>>> --- a/mm/rmap.c
>>>> +++ b/mm/rmap.c
>>>> @@ -1301,31 +1301,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>>  }
>>>>  
>>>>  /**
>>>> - * page_add_file_rmap - add pte mapping to a file page
>>>> - * @page:	the page to add the mapping to
>>>> + * folio_add_file_rmap_range - add pte mapping to page range of a folio
>>>> + * @folio:	The folio to add the mapping to
>>>> + * @page:	The first page to add
>>>> + * @nr_pages:	The number of pages which will be mapped
>>>>   * @vma:	the vm area in which the mapping is added
>>>>   * @compound:	charge the page as compound or small page
>>>>   *
>>>> + * The page range of folio is defined by [first_page, first_page + nr_pages)
>>>> + *
>>>>   * The caller needs to hold the pte lock.
>>>>   */
>>>> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>>> -		bool compound)
>>>> +void folio_add_file_rmap_range(struct folio *folio, struct page *page,
>>>> +			unsigned int nr_pages, struct vm_area_struct *vma,
>>>> +			bool compound)
>>>>  {
>>>> -	struct folio *folio = page_folio(page);
>>>>  	atomic_t *mapped = &folio->_nr_pages_mapped;
>>>> -	int nr = 0, nr_pmdmapped = 0;
>>>> -	bool first;
>>>> +	unsigned int nr_pmdmapped = 0, first;
>>>> +	int nr = 0;
>>>>  
>>>> -	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
>>>> +	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
>>>>  
>>>>  	/* Is page being mapped by PTE? Is this its first map to be added? */
>>>>  	if (likely(!compound)) {
>>>> -		first = atomic_inc_and_test(&page->_mapcount);
>>>> -		nr = first;
>>>> -		if (first && folio_test_large(folio)) {
>>>> -			nr = atomic_inc_return_relaxed(mapped);
>>>> -			nr = (nr < COMPOUND_MAPPED);
>>>> -		}
>>>> +		do {
>>>> +			first = atomic_inc_and_test(&page->_mapcount);
>>>> +			if (first && folio_test_large(folio)) {
>>>> +				first = atomic_inc_return_relaxed(mapped);
>>>> +				first = (nr < COMPOUND_MAPPED);
>>>
>>> This still contains the typo that Yin Fengwei spotted in the previous version:
>>> https://lore.kernel.org/linux-mm/20230228213738.272178-1-willy@infradead.org/T/#m84673899e25bc31356093a1177941f2cc35e5da8
>>>
>>> FYI, I'm seeing a perf regression of about 1% when compiling the kernel on
>>> Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (in a VM using
>>> an ext4 filesystem). Looks like instruction aborts are taking much longer and a
>>> selection of syscalls is a bit slower. Still hunting down the root cause. Will
>>> report once I have a conclusive diagnosis.
>>
>> I'm sorry - I'm struggling to find the exact cause. But it's spending over 2x the
>> amount of time in the instruction abort handling code once patches 32-36 are
>> included. Everything in the flame graph is just taking longer. Perhaps we are
>> getting more instruction aborts somehow? I have the flamegraphs if anyone wants
>> them - just shout and I'll email them separately.
> Thanks a lot to Ryan for sharing the flamegraphs with me. I found that __do_fault()
> is called with patches 32-36 applied, but not with just the first 31 patches. I
> suspect folio_add_file_rmap_range() misses populating some PTEs. Please give
> me a few days to find the root cause and fix it. Sorry for this.

You're welcome. Give me a shout once you have a re-spin and I'll rerun the tests.

> 
> 
> Regards
> Yin, Fengwei
> 
>>
>>>
>>> Thanks,
>>> Ryan
>>>
>>>
>>>> +			}
>>>> +
>>>> +			if (first)
>>>> +				nr++;
>>>> +		} while (page++, --nr_pages > 0);
>>>>  	} else if (folio_test_pmd_mappable(folio)) {
>>>>  		/* That test is redundant: it's for safety or to optimize out */
>>>>  
>>>> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>>>  	mlock_vma_folio(folio, vma, compound);
>>>>  }
>>>>  
>>>> +/**
>>>> + * page_add_file_rmap - add pte mapping to a file page
>>>> + * @page:	the page to add the mapping to
>>>> + * @vma:	the vm area in which the mapping is added
>>>> + * @compound:	charge the page as compound or small page
>>>> + *
>>>> + * The caller needs to hold the pte lock.
>>>> + */
>>>> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>>> +		bool compound)
>>>> +{
>>>> +	struct folio *folio = page_folio(page);
>>>> +	unsigned int nr_pages;
>>>> +
>>>> +	VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
>>>> +
>>>> +	if (likely(!compound))
>>>> +		nr_pages = 1;
>>>> +	else
>>>> +		nr_pages = folio_nr_pages(folio);
>>>> +
>>>> +	folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
>>>> +}
>>>> +
>>>>  /**
>>>>   * page_remove_rmap - take down pte mapping from a page
>>>>   * @page:	page to remove mapping from
>>>
>>
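
For reference, the typo Ryan links to above is in the second assignment of
the quoted loop: it tests nr (the running count of newly mapped pages) where
it presumably should test the value just returned for this page. A corrected
sketch of the batching loop, using the names from the quoted diff:

        do {
                first = atomic_inc_and_test(&page->_mapcount);
                if (first && folio_test_large(folio)) {
                        first = atomic_inc_return_relaxed(mapped);
                        /* was "first = (nr < COMPOUND_MAPPED);" as posted */
                        first = (first < COMPOUND_MAPPED);
                }

                if (first)
                        nr++;
        } while (page++, --nr_pages > 0);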



* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-16 16:23     ` Yin, Fengwei
@ 2023-03-16 16:38       ` Ryan Roberts
  2023-03-16 16:41         ` Yin, Fengwei
  2023-03-16 17:52         ` Matthew Wilcox
  0 siblings, 2 replies; 138+ messages in thread
From: Ryan Roberts @ 2023-03-16 16:38 UTC (permalink / raw)
  To: Yin, Fengwei, Matthew Wilcox (Oracle), linux-arch, will
  Cc: linux-mm, linux-kernel

On 16/03/2023 16:23, Yin, Fengwei wrote:
> 
> 
> On 3/15/2023 11:26 PM, Ryan Roberts wrote:
>> On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote:
>>> From: Yin Fengwei <fengwei.yin@intel.com>
>>>
>>> set_pte_range() allows setting up page table entries for a specific
>>> range.  It takes advantage of batched rmap updates for large folios.
>>> It now takes care of calling update_mmu_cache_range().
>>>
>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>> ---
>>>  Documentation/filesystems/locking.rst |  2 +-
>>>  include/linux/mm.h                    |  3 ++-
>>>  mm/filemap.c                          |  3 +--
>>>  mm/memory.c                           | 27 +++++++++++++++------------
>>>  4 files changed, 19 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
>>> index 7de7a7272a5e..922886fefb7f 100644
>>> --- a/Documentation/filesystems/locking.rst
>>> +++ b/Documentation/filesystems/locking.rst
>>> @@ -663,7 +663,7 @@ locked. The VM will unlock the page.
>>>  Filesystem should find and map pages associated with offsets from "start_pgoff"
>>>  till "end_pgoff". ->map_pages() is called with page table locked and must
>>>  not block.  If it's not possible to reach a page without blocking,
>>> -filesystem should skip it. Filesystem should use do_set_pte() to setup
>>> +filesystem should skip it. Filesystem should use set_pte_range() to setup
>>>  page table entry. Pointer to entry associated with the page is passed in
>>>  "pte" field in vm_fault structure. Pointers to entries for other offsets
>>>  should be calculated relative to "pte".
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index ee755bb4e1c1..81788c985a8c 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -1299,7 +1299,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>>>  }
>>>  
>>>  vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
>>> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
>>> +void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>>> +		struct page *page, unsigned int nr, unsigned long addr);
>>>  
>>>  vm_fault_t finish_fault(struct vm_fault *vmf);
>>>  vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index 6e2b0778db45..e2317623dcbf 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -3504,8 +3504,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>>  			ret = VM_FAULT_NOPAGE;
>>>  
>>>  		ref_count++;
>>> -		do_set_pte(vmf, page, addr);
>>> -		update_mmu_cache(vma, addr, vmf->pte);
>>> +		set_pte_range(vmf, folio, page, 1, addr);
>>>  	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
>>>  
>>>  	/* Restore the vmf->pte */
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 6aa21e8f3753..9a654802f104 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4274,7 +4274,8 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>>>  }
>>>  #endif
>>>  
>>> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>> +void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>>> +		struct page *page, unsigned int nr, unsigned long addr)
>>>  {
>>>  	struct vm_area_struct *vma = vmf->vma;
>>>  	bool uffd_wp = vmf_orig_pte_uffd_wp(vmf);
>>> @@ -4282,7 +4283,7 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>>  	bool prefault = vmf->address != addr;
>>
>> I think you are changing behavior here - is this intentional? Previously this
>> would be evaluated per page, now it's evaluated once for the whole range. The
>> intention below is that directly faulted pages are mapped young and prefaulted
>> pages are mapped old. But now a whole range will be mapped the same.
> 
> Yes. You are right here.
> 
> Looking at the prefault and cpu_has_hw_af logic for ARM64, it seems we
> can avoid treating vmf->address == addr specially. It's OK to
> drop prefault and change the logic here a little bit to:
>   if (arch_wants_old_prefaulted_pte())
>       entry = pte_mkold(entry);
>   else
>       entry = pte_sw_mkyoung(entry);
> 
> It's not necessary to use pte_sw_mkyoung() for vmf->address == addr
> because HW will set the ACCESS bit in the page table entry.
> 
> Add Will Deacon in case I missed something here. Thanks.

I'll defer to Will's response, but not all arm HW supports HW access flag
management. In that case it's done by SW, so I would imagine that by setting
this to old initially, we will get a second fault to set the access bit, which
will slow things down. I wonder if you will need to split this into (up to) 3
calls to set_ptes()?
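
A rough sketch of the split described above, assuming vmf->address lies inside
[addr, addr + nr pages); pte_advance() is a hypothetical helper that steps the
PFN in the template entry by the given number of pages (set_ptes() does the
same internally as it walks the range):

        unsigned int idx = (vmf->address - addr) >> PAGE_SHIFT;

        if (idx)        /* prefaulted pages before the faulting page: old */
                set_ptes(mm, addr, ptep, pte_mkold(entry), idx);
        /* the page actually faulted on: young */
        set_ptes(mm, vmf->address, ptep + idx,
                 pte_mkyoung(pte_advance(entry, idx)), 1);
        if (idx + 1 < nr)       /* prefaulted pages after it: old */
                set_ptes(mm, vmf->address + PAGE_SIZE, ptep + idx + 1,
                         pte_mkold(pte_advance(entry, idx + 1)), nr - idx - 1);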

> 
> 
> Regards
> Yin, Fengwei
> 
>>
>> Thanks,
>> Ryan
>>
>>>  	pte_t entry;
>>>  
>>> -	flush_icache_page(vma, page);
>>> +	flush_icache_pages(vma, page, nr);
>>>  	entry = mk_pte(page, vma->vm_page_prot);
>>>  
>>>  	if (prefault && arch_wants_old_prefaulted_pte())
>>> @@ -4296,14 +4297,18 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>>  		entry = pte_mkuffd_wp(entry);
>>>  	/* copy-on-write page */
>>>  	if (write && !(vma->vm_flags & VM_SHARED)) {
>>> -		inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
>>> -		page_add_new_anon_rmap(page, vma, addr);
>>> -		lru_cache_add_inactive_or_unevictable(page, vma);
>>> +		add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr);
>>> +		VM_BUG_ON_FOLIO(nr != 1, folio);
>>> +		folio_add_new_anon_rmap(folio, vma, addr);
>>> +		folio_add_lru_vma(folio, vma);
>>>  	} else {
>>> -		inc_mm_counter(vma->vm_mm, mm_counter_file(page));
>>> -		page_add_file_rmap(page, vma, false);
>>> +		add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
>>> +		folio_add_file_rmap_range(folio, page, nr, vma, false);
>>>  	}
>>> -	set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
>>> +	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
>>> +
>>> +	/* no need to invalidate: a not-present page won't be cached */
>>> +	update_mmu_cache_range(vma, addr, vmf->pte, nr);
>>>  }
>>>  
>>>  static bool vmf_pte_changed(struct vm_fault *vmf)
>>> @@ -4376,11 +4381,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>  
>>>  	/* Re-check under ptl */
>>>  	if (likely(!vmf_pte_changed(vmf))) {
>>> -		do_set_pte(vmf, page, vmf->address);
>>> -
>>> -		/* no need to invalidate: a not-present page won't be cached */
>>> -		update_mmu_cache(vma, vmf->address, vmf->pte);
>>> +		struct folio *folio = page_folio(page);
>>>  
>>> +		set_pte_range(vmf, folio, page, 1, vmf->address);
>>>  		ret = 0;
>>>  	} else {
>>>  		update_mmu_tlb(vma, vmf->address, vmf->pte);
>>



* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-16 16:38       ` Ryan Roberts
@ 2023-03-16 16:41         ` Yin, Fengwei
  2023-03-16 16:50           ` Ryan Roberts
  2023-03-16 17:52         ` Matthew Wilcox
  1 sibling, 1 reply; 138+ messages in thread
From: Yin, Fengwei @ 2023-03-16 16:41 UTC (permalink / raw)
  To: Ryan Roberts, Matthew Wilcox (Oracle), linux-arch, will
  Cc: linux-mm, linux-kernel



On 3/17/2023 12:38 AM, Ryan Roberts wrote:
> On 16/03/2023 16:23, Yin, Fengwei wrote:
>>
>>
>> On 3/15/2023 11:26 PM, Ryan Roberts wrote:
>>> On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote:
>>>> From: Yin Fengwei <fengwei.yin@intel.com>
>>>>
>>>> set_pte_range() allows setting up page table entries for a specific
>>>> range.  It takes advantage of batched rmap updates for large folios.
>>>> It now takes care of calling update_mmu_cache_range().
>>>>
>>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>>> ---
>>>>  Documentation/filesystems/locking.rst |  2 +-
>>>>  include/linux/mm.h                    |  3 ++-
>>>>  mm/filemap.c                          |  3 +--
>>>>  mm/memory.c                           | 27 +++++++++++++++------------
>>>>  4 files changed, 19 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
>>>> index 7de7a7272a5e..922886fefb7f 100644
>>>> --- a/Documentation/filesystems/locking.rst
>>>> +++ b/Documentation/filesystems/locking.rst
>>>> @@ -663,7 +663,7 @@ locked. The VM will unlock the page.
>>>>  Filesystem should find and map pages associated with offsets from "start_pgoff"
>>>>  till "end_pgoff". ->map_pages() is called with page table locked and must
>>>>  not block.  If it's not possible to reach a page without blocking,
>>>> -filesystem should skip it. Filesystem should use do_set_pte() to setup
>>>> +filesystem should skip it. Filesystem should use set_pte_range() to setup
>>>>  page table entry. Pointer to entry associated with the page is passed in
>>>>  "pte" field in vm_fault structure. Pointers to entries for other offsets
>>>>  should be calculated relative to "pte".
>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>> index ee755bb4e1c1..81788c985a8c 100644
>>>> --- a/include/linux/mm.h
>>>> +++ b/include/linux/mm.h
>>>> @@ -1299,7 +1299,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>>>>  }
>>>>  
>>>>  vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
>>>> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
>>>> +void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>>>> +		struct page *page, unsigned int nr, unsigned long addr);
>>>>  
>>>>  vm_fault_t finish_fault(struct vm_fault *vmf);
>>>>  vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
>>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>>> index 6e2b0778db45..e2317623dcbf 100644
>>>> --- a/mm/filemap.c
>>>> +++ b/mm/filemap.c
>>>> @@ -3504,8 +3504,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>>>  			ret = VM_FAULT_NOPAGE;
>>>>  
>>>>  		ref_count++;
>>>> -		do_set_pte(vmf, page, addr);
>>>> -		update_mmu_cache(vma, addr, vmf->pte);
>>>> +		set_pte_range(vmf, folio, page, 1, addr);
>>>>  	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
>>>>  
>>>>  	/* Restore the vmf->pte */
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index 6aa21e8f3753..9a654802f104 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -4274,7 +4274,8 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>>>>  }
>>>>  #endif
>>>>  
>>>> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>>> +void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>>>> +		struct page *page, unsigned int nr, unsigned long addr)
>>>>  {
>>>>  	struct vm_area_struct *vma = vmf->vma;
>>>>  	bool uffd_wp = vmf_orig_pte_uffd_wp(vmf);
>>>> @@ -4282,7 +4283,7 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>>>  	bool prefault = vmf->address != addr;
>>>
>>> I think you are changing behavior here - is this intentional? Previously this
>>> would be evaluated per page, now it's evaluated once for the whole range. The
>>> intention below is that directly faulted pages are mapped young and prefaulted
>>> pages are mapped old. But now a whole range will be mapped the same.
>>
>> Yes. You are right here.
>>
>> Looking at the prefault and cpu_has_hw_af logic for ARM64, it seems we
>> can avoid treating vmf->address == addr specially. It's OK to
>> drop prefault and change the logic here a little bit to:
>>   if (arch_wants_old_prefaulted_pte())
>>       entry = pte_mkold(entry);
>>   else
>>       entry = pte_sw_mkyoung(entry);
>>
>> It's not necessary to use pte_sw_mkyoung() for vmf->address == addr
>> because HW will set the ACCESS bit in the page table entry.
>>
>> Add Will Deacon in case I missed something here. Thanks.
> 
> I'll defer to Will's response, but not all arm HW supports HW access flag
> management. In that case it's done by SW, so I would imagine that by setting
> this to old initially, we will get a second fault to set the access bit, which
> will slow things down. I wonder if you will need to split this into (up to) 3
> calls to set_ptes()?
If there is no HW access flag, arch_wants_old_prefaulted_pte() will return false, so
the path will go to pte_sw_mkyoung(entry). Right?
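
For context, the generic fallback and the arm64 override look roughly like
this (sketched from memory, so check include/linux/mm.h and the arm64
headers for the authoritative versions):

        /* generic: old->young may cost a fault, so default prefaults to young */
        #ifndef arch_wants_old_prefaulted_pte
        static inline bool arch_wants_old_prefaulted_pte(void)
        {
                return false;
        }
        #endif

        /* arm64: only ask for old prefaulted PTEs when the CPU manages the
         * access flag in hardware, so no extra fault is needed later */
        #define arch_wants_old_prefaulted_pte   cpu_has_hw_af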


Regards
Yin, Fengwei

> 
>>
>>
>> Regards
>> Yin, Fengwei
>>
>>>
>>> Thanks,
>>> Ryan
>>>
>>>>  	pte_t entry;
>>>>  
>>>> -	flush_icache_page(vma, page);
>>>> +	flush_icache_pages(vma, page, nr);
>>>>  	entry = mk_pte(page, vma->vm_page_prot);
>>>>  
>>>>  	if (prefault && arch_wants_old_prefaulted_pte())
>>>> @@ -4296,14 +4297,18 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>>>  		entry = pte_mkuffd_wp(entry);
>>>>  	/* copy-on-write page */
>>>>  	if (write && !(vma->vm_flags & VM_SHARED)) {
>>>> -		inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
>>>> -		page_add_new_anon_rmap(page, vma, addr);
>>>> -		lru_cache_add_inactive_or_unevictable(page, vma);
>>>> +		add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr);
>>>> +		VM_BUG_ON_FOLIO(nr != 1, folio);
>>>> +		folio_add_new_anon_rmap(folio, vma, addr);
>>>> +		folio_add_lru_vma(folio, vma);
>>>>  	} else {
>>>> -		inc_mm_counter(vma->vm_mm, mm_counter_file(page));
>>>> -		page_add_file_rmap(page, vma, false);
>>>> +		add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
>>>> +		folio_add_file_rmap_range(folio, page, nr, vma, false);
>>>>  	}
>>>> -	set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
>>>> +	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
>>>> +
>>>> +	/* no need to invalidate: a not-present page won't be cached */
>>>> +	update_mmu_cache_range(vma, addr, vmf->pte, nr);
>>>>  }
>>>>  
>>>>  static bool vmf_pte_changed(struct vm_fault *vmf)
>>>> @@ -4376,11 +4381,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>  
>>>>  	/* Re-check under ptl */
>>>>  	if (likely(!vmf_pte_changed(vmf))) {
>>>> -		do_set_pte(vmf, page, vmf->address);
>>>> -
>>>> -		/* no need to invalidate: a not-present page won't be cached */
>>>> -		update_mmu_cache(vma, vmf->address, vmf->pte);
>>>> +		struct folio *folio = page_folio(page);
>>>>  
>>>> +		set_pte_range(vmf, folio, page, 1, vmf->address);
>>>>  		ret = 0;
>>>>  	} else {
>>>>  		update_mmu_tlb(vma, vmf->address, vmf->pte);
>>>
> 


* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-16 16:41         ` Yin, Fengwei
@ 2023-03-16 16:50           ` Ryan Roberts
  0 siblings, 0 replies; 138+ messages in thread
From: Ryan Roberts @ 2023-03-16 16:50 UTC (permalink / raw)
  To: Yin, Fengwei, Matthew Wilcox (Oracle), linux-arch, will
  Cc: linux-mm, linux-kernel

On 16/03/2023 16:41, Yin, Fengwei wrote:
> 
> 
> On 3/17/2023 12:38 AM, Ryan Roberts wrote:
>> On 16/03/2023 16:23, Yin, Fengwei wrote:
>>>
>>>
>>> On 3/15/2023 11:26 PM, Ryan Roberts wrote:
>>>> On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote:
>>>>> From: Yin Fengwei <fengwei.yin@intel.com>
>>>>>
>>>>> set_pte_range() allows setting up page table entries for a specific
>>>>> range.  It takes advantage of batched rmap updates for large folios.
>>>>> It now takes care of calling update_mmu_cache_range().
>>>>>
>>>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>>>> ---
>>>>>  Documentation/filesystems/locking.rst |  2 +-
>>>>>  include/linux/mm.h                    |  3 ++-
>>>>>  mm/filemap.c                          |  3 +--
>>>>>  mm/memory.c                           | 27 +++++++++++++++------------
>>>>>  4 files changed, 19 insertions(+), 16 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
>>>>> index 7de7a7272a5e..922886fefb7f 100644
>>>>> --- a/Documentation/filesystems/locking.rst
>>>>> +++ b/Documentation/filesystems/locking.rst
>>>>> @@ -663,7 +663,7 @@ locked. The VM will unlock the page.
>>>>>  Filesystem should find and map pages associated with offsets from "start_pgoff"
>>>>>  till "end_pgoff". ->map_pages() is called with page table locked and must
>>>>>  not block.  If it's not possible to reach a page without blocking,
>>>>> -filesystem should skip it. Filesystem should use do_set_pte() to setup
>>>>> +filesystem should skip it. Filesystem should use set_pte_range() to setup
>>>>>  page table entry. Pointer to entry associated with the page is passed in
>>>>>  "pte" field in vm_fault structure. Pointers to entries for other offsets
>>>>>  should be calculated relative to "pte".
>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>> index ee755bb4e1c1..81788c985a8c 100644
>>>>> --- a/include/linux/mm.h
>>>>> +++ b/include/linux/mm.h
>>>>> @@ -1299,7 +1299,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>>>>>  }
>>>>>  
>>>>>  vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
>>>>> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
>>>>> +void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>>>>> +		struct page *page, unsigned int nr, unsigned long addr);
>>>>>  
>>>>>  vm_fault_t finish_fault(struct vm_fault *vmf);
>>>>>  vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
>>>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>>>> index 6e2b0778db45..e2317623dcbf 100644
>>>>> --- a/mm/filemap.c
>>>>> +++ b/mm/filemap.c
>>>>> @@ -3504,8 +3504,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>>>>  			ret = VM_FAULT_NOPAGE;
>>>>>  
>>>>>  		ref_count++;
>>>>> -		do_set_pte(vmf, page, addr);
>>>>> -		update_mmu_cache(vma, addr, vmf->pte);
>>>>> +		set_pte_range(vmf, folio, page, 1, addr);
>>>>>  	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
>>>>>  
>>>>>  	/* Restore the vmf->pte */
>>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>>> index 6aa21e8f3753..9a654802f104 100644
>>>>> --- a/mm/memory.c
>>>>> +++ b/mm/memory.c
>>>>> @@ -4274,7 +4274,8 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>>>>>  }
>>>>>  #endif
>>>>>  
>>>>> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>>>> +void set_pte_range(struct vm_fault *vmf, struct folio *folio,
>>>>> +		struct page *page, unsigned int nr, unsigned long addr)
>>>>>  {
>>>>>  	struct vm_area_struct *vma = vmf->vma;
>>>>>  	bool uffd_wp = vmf_orig_pte_uffd_wp(vmf);
>>>>> @@ -4282,7 +4283,7 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>>>>  	bool prefault = vmf->address != addr;
>>>>
>>>> I think you are changing behavior here - is this intentional? Previously this
>>>> would be evaluated per page, now it's evaluated once for the whole range. The
>>>> intention below is that directly faulted pages are mapped young and prefaulted
>>>> pages are mapped old. But now a whole range will be mapped the same.
>>>
>>> Yes. You are right here.
>>>
>>> Looking at the prefault and cpu_has_hw_af logic for ARM64, it seems we
>>> can avoid treating vmf->address == addr specially. It's OK to
>>> drop prefault and change the logic here a little bit to:
>>>   if (arch_wants_old_prefaulted_pte())
>>>       entry = pte_mkold(entry);
>>>   else
>>>       entry = pte_sw_mkyoung(entry);
>>>
>>> It's not necessary to use pte_sw_mkyoung() for vmf->address == addr
>>> because HW will set the ACCESS bit in the page table entry.
>>>
>>> Add Will Deacon in case I missed something here. Thanks.
>>
>> I'll defer to Will's response, but not all arm HW supports HW access flag
>> management. In that case it's done by SW, so I would imagine that by setting
>> this to old initially, we will get a second fault to set the access bit, which
>> will slow things down. I wonder if you will need to split this into (up to) 3
>> calls to set_ptes()?
> If there is no HW access flag, arch_wants_old_prefaulted_pte() will return false, so
> the path will go to pte_sw_mkyoung(entry). Right?

Oops... yes, I agree with you - disregard my previous comment.

> 
> 
> Regards
> Yin, Fengwei
> 
>>
>>>
>>>
>>> Regards
>>> Yin, Fengwei
>>>
>>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>>  	pte_t entry;
>>>>>  
>>>>> -	flush_icache_page(vma, page);
>>>>> +	flush_icache_pages(vma, page, nr);
>>>>>  	entry = mk_pte(page, vma->vm_page_prot);
>>>>>  
>>>>>  	if (prefault && arch_wants_old_prefaulted_pte())
>>>>> @@ -4296,14 +4297,18 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
>>>>>  		entry = pte_mkuffd_wp(entry);
>>>>>  	/* copy-on-write page */
>>>>>  	if (write && !(vma->vm_flags & VM_SHARED)) {
>>>>> -		inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
>>>>> -		page_add_new_anon_rmap(page, vma, addr);
>>>>> -		lru_cache_add_inactive_or_unevictable(page, vma);
>>>>> +		add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr);
>>>>> +		VM_BUG_ON_FOLIO(nr != 1, folio);
>>>>> +		folio_add_new_anon_rmap(folio, vma, addr);
>>>>> +		folio_add_lru_vma(folio, vma);
>>>>>  	} else {
>>>>> -		inc_mm_counter(vma->vm_mm, mm_counter_file(page));
>>>>> -		page_add_file_rmap(page, vma, false);
>>>>> +		add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
>>>>> +		folio_add_file_rmap_range(folio, page, nr, vma, false);
>>>>>  	}
>>>>> -	set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
>>>>> +	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
>>>>> +
>>>>> +	/* no need to invalidate: a not-present page won't be cached */
>>>>> +	update_mmu_cache_range(vma, addr, vmf->pte, nr);
>>>>>  }
>>>>>  
>>>>>  static bool vmf_pte_changed(struct vm_fault *vmf)
>>>>> @@ -4376,11 +4381,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>>  
>>>>>  	/* Re-check under ptl */
>>>>>  	if (likely(!vmf_pte_changed(vmf))) {
>>>>> -		do_set_pte(vmf, page, vmf->address);
>>>>> -
>>>>> -		/* no need to invalidate: a not-present page won't be cached */
>>>>> -		update_mmu_cache(vma, vmf->address, vmf->pte);
>>>>> +		struct folio *folio = page_folio(page);
>>>>>  
>>>>> +		set_pte_range(vmf, folio, page, 1, vmf->address);
>>>>>  		ret = 0;
>>>>>  	} else {
>>>>>  		update_mmu_tlb(vma, vmf->address, vmf->pte);
>>>>
>>



* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-16 16:38       ` Ryan Roberts
  2023-03-16 16:41         ` Yin, Fengwei
@ 2023-03-16 17:52         ` Matthew Wilcox
  2023-03-17  1:58           ` Yin, Fengwei
  1 sibling, 1 reply; 138+ messages in thread
From: Matthew Wilcox @ 2023-03-16 17:52 UTC (permalink / raw)
  To: Ryan Roberts; +Cc: Yin, Fengwei, linux-arch, will, linux-mm, linux-kernel

On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote:
> On 16/03/2023 16:23, Yin, Fengwei wrote:
> >> I think you are changing behavior here - is this intentional? Previously this
> >> would be evaluated per page, now it's evaluated once for the whole range. The
> >> intention below is that directly faulted pages are mapped young and prefaulted
> >> pages are mapped old. But now a whole range will be mapped the same.
> > 
> > Yes. You are right here.
> > 
> > Looking at the prefault and cpu_has_hw_af logic for ARM64, it seems we
> > can avoid treating vmf->address == addr specially. It's OK to
> > drop prefault and change the logic here a little bit to:
> >   if (arch_wants_old_prefaulted_pte())
> >       entry = pte_mkold(entry);
> >   else
> >       entry = pte_sw_mkyoung(entry);
> > 
> > It's not necessary to use pte_sw_mkyoung() for vmf->address == addr
> > because HW will set the ACCESS bit in the page table entry.
> > 
> > Add Will Deacon in case I missed something here. Thanks.
> 
> I'll defer to Will's response, but not all arm HW supports HW access flag
> management. In that case it's done by SW, so I would imagine that by setting
> this to old initially, we will get a second fault to set the access bit, which
> will slow things down. I wonder if you will need to split this into (up to) 3
> calls to set_ptes()?

I don't think we should do that.  The limited information I have from
various microarchitectures is that the PTEs must differ only in their
PFN bits in order to use larger TLB entries.  That includes the Accessed
bit (or equivalent).  So we should mkyoung all the PTEs in the same
folio, at least initially.

That said, we should still do this conditionally.  We'll prefault some
other folios too.  So I think this should be:

        bool prefault = (addr > vmf->address) || ((addr + nr) < vmf->address);
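
Reading that as "prefault iff vmf->address falls outside the folio's mapped
range", and with the byte scaling written out (addr and vmf->address are
addresses while nr counts pages), the check would be something like:

        bool prefault = vmf->address < addr ||
                        vmf->address >= addr + nr * PAGE_SIZE;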



* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-16 17:52         ` Matthew Wilcox
@ 2023-03-17  1:58           ` Yin, Fengwei
  2023-03-17  3:44             ` Matthew Wilcox
  0 siblings, 1 reply; 138+ messages in thread
From: Yin, Fengwei @ 2023-03-17  1:58 UTC (permalink / raw)
  To: Matthew Wilcox, Ryan Roberts; +Cc: linux-arch, will, linux-mm, linux-kernel



On 3/17/2023 1:52 AM, Matthew Wilcox wrote:
> On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote:
>> On 16/03/2023 16:23, Yin, Fengwei wrote:
>>>> I think you are changing behavior here - is this intentional? Previously this
>>>> would be evaluated per page, now it's evaluated once for the whole range. The
>>>> intention below is that directly faulted pages are mapped young and prefaulted
>>>> pages are mapped old. But now a whole range will be mapped the same.
>>>
>>> Yes. You are right here.
>>>
>>> Looking at the prefault and cpu_has_hw_af logic for ARM64, it seems we
>>> can avoid treating vmf->address == addr specially. It's OK to
>>> drop prefault and change the logic here a little bit to:
>>>   if (arch_wants_old_prefaulted_pte())
>>>       entry = pte_mkold(entry);
>>>   else
>>>       entry = pte_sw_mkyoung(entry);
>>>
>>> It's not necessary to use pte_sw_mkyoung() for vmf->address == addr
>>> because HW will set the ACCESS bit in the page table entry.
>>>
>>> Add Will Deacon in case I missed something here. Thanks.
>>
>> I'll defer to Will's response, but not all arm HW supports HW access flag
>> management. In that case it's done by SW, so I would imagine that by setting
>> this to old initially, we will get a second fault to set the access bit, which
>> will slow things down. I wonder if you will need to split this into (up to) 3
>> calls to set_ptes()?
> 
> I don't think we should do that.  The limited information I have from
> various microarchitectures is that the PTEs must differ only in their
> PFN bits in order to use larger TLB entries.  That includes the Accessed
> bit (or equivalent).  So we should mkyoung all the PTEs in the same
> folio, at least initially.
> 
> That said, we should still do this conditionally.  We'll prefault some
> other folios too.  So I think this should be:
> 
>         bool prefault = (addr > vmf->address) || ((addr + nr) < vmf->address);
> 
According to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80, if the hardware access
flag is supported on ARM64, there is a benefit to setting prefaulted PTEs "old".
If we change prefault as above, the PTEs are set "young", which loses that benefit
on ARM64 with the hardware access flag.

OTOH, if going from "old" to "young" is cheap, why not leave all PTEs of the folio "old"
and let hardware update them to "young"?

Regards
Yin, Fengwei


* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-17  1:58           ` Yin, Fengwei
@ 2023-03-17  3:44             ` Matthew Wilcox
  2023-03-17  6:33               ` Yin, Fengwei
  2023-03-20 13:38               ` Yin, Fengwei
  0 siblings, 2 replies; 138+ messages in thread
From: Matthew Wilcox @ 2023-03-17  3:44 UTC (permalink / raw)
  To: Yin, Fengwei; +Cc: Ryan Roberts, linux-arch, will, linux-mm, linux-kernel

On Fri, Mar 17, 2023 at 09:58:17AM +0800, Yin, Fengwei wrote:
> 
> 
> On 3/17/2023 1:52 AM, Matthew Wilcox wrote:
> > On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote:
> >> On 16/03/2023 16:23, Yin, Fengwei wrote:
> >>>> I think you are changing behavior here - is this intentional? Previously this
> >>>> would be evaluated per page, now it's evaluated once for the whole range. The
> >>>> intention below is that directly faulted pages are mapped young and prefaulted
> >>>> pages are mapped old. But now a whole range will be mapped the same.
> >>>
> >>> Yes. You are right here.
> >>>
> >>> Looking at the prefault and cpu_has_hw_af logic for ARM64, it seems we
> >>> can avoid treating vmf->address == addr specially. It's OK to
> >>> drop prefault and change the logic here a little bit to:
> >>>   if (arch_wants_old_prefaulted_pte())
> >>>       entry = pte_mkold(entry);
> >>>   else
> >>>       entry = pte_sw_mkyoung(entry);
> >>>
> >>> It's not necessary to use pte_sw_mkyoung() for vmf->address == addr
> >>> because HW will set the ACCESS bit in the page table entry.
> >>>
> >>> Add Will Deacon in case I missed something here. Thanks.
> >>
> >> I'll defer to Will's response, but not all arm HW supports HW access flag
> >> management. In that case it's done by SW, so I would imagine that by setting
> >> this to old initially, we will get a second fault to set the access bit, which
> >> will slow things down. I wonder if you will need to split this into (up to) 3
> >> calls to set_ptes()?
> > 
> > I don't think we should do that.  The limited information I have from
> > various microarchitectures is that the PTEs must differ only in their
> > PFN bits in order to use larger TLB entries.  That includes the Accessed
> > bit (or equivalent).  So we should mkyoung all the PTEs in the same
> > folio, at least initially.
> > 
> > That said, we should still do this conditionally.  We'll prefault some
> > other folios too.  So I think this should be:
> > 
> >         bool prefault = (addr > vmf->address) || ((addr + nr) < vmf->address);
> > 
> According to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80, if the hardware access
> flag is supported on ARM64, there is a benefit to setting prefaulted PTEs "old".
> If we change prefault as above, the PTEs are set "young", which loses that benefit
> on ARM64 with the hardware access flag.
> 
> OTOH, if going from "old" to "young" is cheap, why not leave all PTEs of the folio "old"
> and let hardware update them to "young"?

Because we're tracking the entire folio as a single entity.  So we're
better off avoiding the extra pagefaults to update the accessed bit,
which won't actually give us any information (vmscan needs to know "were
any of the accessed bits set", not "how many of them were set").
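
One way to picture that: per folio, reclaim only needs a single boolean no
matter how many PTEs map the folio. A sketch (not the actual vmscan code):

        bool folio_accessed = false;
        unsigned int i;

        /* "were any of the accessed bits set?" */
        for (i = 0; i < nr; i++)
                folio_accessed |= pte_young(ptep_get(ptep + i));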

Anyway, hopefully Ryan can test this and let us know if it fixes the
regression he sees.


* Re: [PATCH v4 20/36] powerpc: Implement the new page table range API
  2023-03-15 10:18     ` Christophe Leroy
@ 2023-03-17  3:47       ` Matthew Wilcox
  2023-03-18  9:19         ` Christophe Leroy
  0 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox @ 2023-03-17  3:47 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-arch, linux-mm, linux-kernel, Michael Ellerman,
	Nicholas Piggin, linuxppc-dev

On Wed, Mar 15, 2023 at 10:18:22AM +0000, Christophe Leroy wrote:
> I investigated a bit further and can confirm now that the above won't 
> always work, see comment 
> https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/32/pgtable.h#L147
> 
> And then you see 
> https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/pte-e500.h#L63

Got it.  Here's what I intend to fold in for the next version:

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 7bf1fe7297c6..5f12b9382909 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -462,11 +462,6 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
 		     pgprot_val(pgprot));
 }
 
-static inline unsigned long pte_pfn(pte_t pte)
-{
-	return pte_val(pte) >> PTE_RPN_SHIFT;
-}
-
 /* Generic modifiers for PTE bits */
 static inline pte_t pte_wrprotect(pte_t pte)
 {
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 4acc9690f599..c5baa3082a5a 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -104,6 +104,7 @@
  * and every thing below PAGE_SHIFT;
  */
 #define PTE_RPN_MASK	(((1UL << _PAGE_PA_MAX) - 1) & (PAGE_MASK))
+#define PTE_RPN_SHIFT	PAGE_SHIFT
 /*
  * set of bits not changed in pmd_modify. Even though we have hash specific bits
  * in here, on radix we expect them to be zero.
@@ -569,11 +570,6 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
 	return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot) | _PAGE_PTE);
 }
 
-static inline unsigned long pte_pfn(pte_t pte)
-{
-	return (pte_val(pte) & PTE_RPN_MASK) >> PAGE_SHIFT;
-}
-
 /* Generic modifiers for PTE bits */
 static inline pte_t pte_wrprotect(pte_t pte)
 {
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 69a7dd47a9f0..03be8b22aaea 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -101,8 +101,6 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot) {
 	return __pte(((pte_basic_t)(pfn) << PTE_RPN_SHIFT) |
 		     pgprot_val(pgprot)); }
-static inline unsigned long pte_pfn(pte_t pte)	{
-	return pte_val(pte) >> PTE_RPN_SHIFT; }
 
 /* Generic modifiers for PTE bits */
 static inline pte_t pte_exprotect(pte_t pte)
@@ -279,7 +277,7 @@ static inline int pud_huge(pud_t pud)
 void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
 		pte_t *ptep, unsigned int nr);
 #else
-static inline void update_mmu_cache(struct vm_area_struct *vma,
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
 		unsigned long address, pte_t *ptep, unsigned int nr) {}
 #endif
 
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 656ecf2b10cd..491a2720f835 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -54,6 +54,12 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 /* Keep these as a macros to avoid include dependency mess */
 #define pte_page(x)		pfn_to_page(pte_pfn(x))
 #define mk_pte(page, pgprot)	pfn_pte(page_to_pfn(page), (pgprot))
+
+static inline unsigned long pte_pfn(pte_t pte)
+{
+	return (pte_val(pte) & PTE_RPN_MASK) >> PTE_RPN_SHIFT;
+}
+
 /*
  * Select all bits except the pfn
  */
diff --git a/arch/powerpc/mm/nohash/e500_hugetlbpage.c b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
index f3cb91107a47..583b3098763f 100644
--- a/arch/powerpc/mm/nohash/e500_hugetlbpage.c
+++ b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
@@ -178,7 +178,7 @@ book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea, pte_t pte)
  *
  * This must always be called with the pte lock held.
  */
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
 		pte_t *ptep, unsigned int nr)
 {
 	if (is_vm_hugetlb_page(vma))
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index b3c7b874a7a2..db236b494845 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -208,7 +208,7 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 		if (--nr == 0)
 			break;
 		ptep++;
-		pte = __pte(pte_val(pte) + PAGE_SIZE);
+		pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
 		addr += PAGE_SIZE;
 	}
 }
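
The last hunk is the subtle one: per the links above, PTE_RPN_SHIFT is not
PAGE_SHIFT on every platform (e500 being the example), so stepping a PTE to
the next page means adding one in the RPN field rather than adding PAGE_SIZE:

        /* pte_pfn(pte) == (pte_val(pte) & PTE_RPN_MASK) >> PTE_RPN_SHIFT,
         * so this advances the PFN by exactly one page no matter where
         * the RPN field sits within the PTE */
        pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));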


* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-17  3:44             ` Matthew Wilcox
@ 2023-03-17  6:33               ` Yin, Fengwei
  2023-03-17  8:00                 ` Ryan Roberts
  2023-03-20 13:38               ` Yin, Fengwei
  1 sibling, 1 reply; 138+ messages in thread
From: Yin, Fengwei @ 2023-03-17  6:33 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Ryan Roberts, linux-arch, will, linux-mm, linux-kernel



On 3/17/2023 11:44 AM, Matthew Wilcox wrote:
> On Fri, Mar 17, 2023 at 09:58:17AM +0800, Yin, Fengwei wrote:
>>
>>
>> On 3/17/2023 1:52 AM, Matthew Wilcox wrote:
>>> On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote:
>>>> On 16/03/2023 16:23, Yin, Fengwei wrote:
>>>>>> I think you are changing behavior here - is this intentional? Previously this
>>>>>>>> would be evaluated per page, now it's evaluated once for the whole range. The
>>>>>> intention below is that directly faulted pages are mapped young and prefaulted
>>>>>> pages are mapped old. But now a whole range will be mapped the same.
>>>>>
>>>>> Yes. You are right here.
>>>>>
>>>>> Looking at the prefault and cpu_has_hw_af logic for ARM64, it seems we
>>>>> can avoid treating vmf->address == addr specially. It's OK to
>>>>> drop prefault and change the logic here a little bit to:
>>>>>   if (arch_wants_old_prefaulted_pte())
>>>>>       entry = pte_mkold(entry);
>>>>>   else
>>>>>       entry = pte_sw_mkyoung(entry);
>>>>>
>>>>> It's not necessary to use pte_sw_mkyoung() for vmf->address == addr
>>>>> because HW will set the ACCESS bit in the page table entry.
>>>>>
>>>>> Add Will Deacon in case I missed something here. Thanks.
>>>>
>>>> I'll defer to Will's response, but not all arm HW supports HW access flag
>>>> management. In that case it's done by SW, so I would imagine that by setting
>>>> this to old initially, we will get a second fault to set the access bit, which
>>>> will slow things down. I wonder if you will need to split this into (up to) 3
>>>> calls to set_ptes()?
>>>
>>> I don't think we should do that.  The limited information I have from
>>> various microarchitectures is that the PTEs must differ only in their
>>> PFN bits in order to use larger TLB entries.  That includes the Accessed
>>> bit (or equivalent).  So we should mkyoung all the PTEs in the same
>>> folio, at least initially.
>>>
>>> That said, we should still do this conditionally.  We'll prefault some
>>> other folios too.  So I think this should be:
>>>
>>>         bool prefault = (addr > vmf->address) || ((addr + nr) < vmf->address);
>>>
>> According to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80, if the hardware access
>> flag is supported on ARM64, there is a benefit to setting prefaulted PTEs "old".
>> If we change prefault as above, the PTEs are set "young", which loses that benefit
>> on ARM64 with the hardware access flag.
>>
>> OTOH, if going from "old" to "young" is cheap, why not leave all PTEs of the folio "old"
>> and let hardware update them to "young"?
> 
> Because we're tracking the entire folio as a single entity.  So we're
> better off avoiding the extra pagefaults to update the accessed bit,
> which won't actually give us any information (vmscan needs to know "were
> any of the accessed bits set", not "how many of them were set").
There are no extra page faults to update the accessed bit. There are three cases here:
1. hardware supports the access flag, and "old" to "young" is cheap, with no extra fault
2. hardware supports the access flag, and "old" to "young" is expensive, with no extra fault
3. no hardware support for the access flag (extra page faults from "old" to "young". Expensive)

For #2 and #3, going from "old" to "young" is expensive, so we always set PTEs "young" in
the page fault.
For #1, going from "old" to "young" is cheap, so it's OK to set PTEs "old" in the page fault;
hardware will set them "young" on memory access. Actually, ARM64 with the hardware
access bit requires setting PTEs "old".

> 
> Anyway, hopefully Ryan can test this and let us know if it fixes the
> regression he sees.
I highly suspect the regression Ryan saw is not related to this but to some other
stupid work of mine. I will send out the testing patch soon. Thanks.


Regards
Yin, Fengwei


* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-17  6:33               ` Yin, Fengwei
@ 2023-03-17  8:00                 ` Ryan Roberts
  2023-03-17  8:19                   ` Yin, Fengwei
  0 siblings, 1 reply; 138+ messages in thread
From: Ryan Roberts @ 2023-03-17  8:00 UTC (permalink / raw)
  To: Yin, Fengwei, Matthew Wilcox; +Cc: linux-arch, will, linux-mm, linux-kernel

On 17/03/2023 06:33, Yin, Fengwei wrote:
> 
> 
> On 3/17/2023 11:44 AM, Matthew Wilcox wrote:
>> On Fri, Mar 17, 2023 at 09:58:17AM +0800, Yin, Fengwei wrote:
>>>
>>>
>>> On 3/17/2023 1:52 AM, Matthew Wilcox wrote:
>>>> On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote:
>>>>> On 16/03/2023 16:23, Yin, Fengwei wrote:
>>>>>>> I think you are changing behavior here - is this intentional? Previously this
>>>>>>> would be evaluated per page, now it's evaluated once for the whole range. The
>>>>>>> intention below is that directly faulted pages are mapped young and prefaulted
>>>>>>> pages are mapped old. But now a whole range will be mapped the same.
>>>>>>
>>>>>> Yes. You are right here.
>>>>>>
>>>>>> Looking at the prefault and cpu_has_hw_af logic for ARM64, it seems we
>>>>>> can avoid treating vmf->address == addr specially. It's OK to
>>>>>> drop prefault and change the logic here a little bit to:
>>>>>>   if (arch_wants_old_prefaulted_pte())
>>>>>>       entry = pte_mkold(entry);
>>>>>>   else
>>>>>>       entry = pte_sw_mkyoung(entry);
>>>>>>
>>>>>> It's not necessary to use pte_sw_mkyoung() for vmf->address == addr
>>>>>> because HW will set the ACCESS bit in the page table entry.
>>>>>>
>>>>>> Add Will Deacon in case I missed something here. Thanks.
>>>>>
>>>>> I'll defer to Will's response, but not all arm HW supports HW access flag
>>>>> management. In that case it's done by SW, so I would imagine that by setting
>>>>> this to old initially, we will get a second fault to set the access bit, which
>>>>> will slow things down. I wonder if you will need to split this into (up to) 3
>>>>> calls to set_ptes()?
>>>>
>>>> I don't think we should do that.  The limited information I have from
>>>> various microarchitectures is that the PTEs must differ only in their
>>>> PFN bits in order to use larger TLB entries.  That includes the Accessed
>>>> bit (or equivalent).  So we should mkyoung all the PTEs in the same
>>>> folio, at least initially.
>>>>
>>>> That said, we should still do this conditionally.  We'll prefault some
>>>> other folios too.  So I think this should be:
>>>>
>>>>         bool prefault = (addr > vmf->address) || ((addr + nr) < vmf->address);
>>>>
>>> According to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80, if the hardware access
>>> flag is supported on ARM64, there is a benefit to setting prefaulted PTEs "old".
>>> If we change prefault as above, the PTEs are set "young", which loses that benefit
>>> on ARM64 with the hardware access flag.
>>>
>>> OTOH, if going from "old" to "young" is cheap, why not leave all PTEs of the folio "old"
>>> and let hardware update them to "young"?
>>
>> Because we're tracking the entire folio as a single entity.  So we're
>> better off avoiding the extra pagefaults to update the accessed bit,
>> which won't actually give us any information (vmscan needs to know "were
>> any of the accessed bits set", not "how many of them were set").
> There are no extra page faults to update the accessed bit. There are three cases here:
> 1. hardware supports the access flag, and "old" to "young" is cheap, with no extra fault
> 2. hardware supports the access flag, and "old" to "young" is expensive, with no extra fault
> 3. no hardware support for the access flag (extra page faults from "old" to "young". Expensive)
> 
> For #2 and #3, going from "old" to "young" is expensive, so we always set PTEs "young" in
> the page fault.
> For #1, going from "old" to "young" is cheap, so it's OK to set PTEs "old" in the page fault;
> hardware will set them "young" on memory access. Actually, ARM64 with the hardware
> access bit requires setting PTEs "old".

Your logic makes sense, but it doesn't take into account the HPA
micro-architectural feature present in some ARM CPUs. HPA can transparently
coalesce multiple pages into a single TLB entry when certain conditions are met
(roughly: up to 4 pages physically and virtually contiguous and all within a
4-page natural alignment). But as Matthew says, this works out better when all
pte attributes (including access and dirty) match. Given the reason for setting
the prefault pages to old is so that vmscan can do a better job of finding cold
pages, and given vmscan will now be looking for folios and not individual pages
(I assume?), I agree with Matthew that we should make whole folios young or old.
It will marginally increase our chances of the access and dirty bits being
consistent across the whole 4-page block that the HW tries to coalesce. If we
unconditionally make everything old, the hw will set accessed for the single
page that faulted, and we therefore don't have consistency for that 4-page block.
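
Very roughly, the conditions described above (names hypothetical; the real
check happens inside the TLB hardware, not in software):

        /* pte_attrs() is a hypothetical accessor for the non-PFN bits,
         * including the access and dirty bits */
        bool coalescible = IS_ALIGNED(addr, 4 * PAGE_SIZE) &&
                           IS_ALIGNED(pte_pfn(pte[0]), 4);
        for (i = 1; coalescible && i < 4; i++)
                coalescible = pte_pfn(pte[i]) == pte_pfn(pte[0]) + i &&
                              pte_attrs(pte[i]) == pte_attrs(pte[0]);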

> 
>>
>> Anyway, hopefully Ryan can test this and let us know if it fixes the
>> regression he sees.
> I highly suspect the regression Ryan saw is not related to this but to some other
> stupid work of mine. I will send out the testing patch soon. Thanks.

I tested a version of this where I made everything unconditionally young,
thinking it might be the source of the perf regression, before I reported it. It
doesn't make any difference. So I agree the regression is somewhere else.

Thanks,
Ryan

> 
> 
> Regards
> Yin, Fengwei



* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-17  8:00                 ` Ryan Roberts
@ 2023-03-17  8:19                   ` Yin, Fengwei
  2023-03-17 13:00                     ` Ryan Roberts
  2023-03-24 14:58                     ` Will Deacon
  0 siblings, 2 replies; 138+ messages in thread
From: Yin, Fengwei @ 2023-03-17  8:19 UTC (permalink / raw)
  To: Ryan Roberts, Matthew Wilcox; +Cc: linux-arch, will, linux-mm, linux-kernel



On 3/17/2023 4:00 PM, Ryan Roberts wrote:
> On 17/03/2023 06:33, Yin, Fengwei wrote:
>>
>>
>> On 3/17/2023 11:44 AM, Matthew Wilcox wrote:
>>> On Fri, Mar 17, 2023 at 09:58:17AM +0800, Yin, Fengwei wrote:
>>>>
>>>>
>>>> On 3/17/2023 1:52 AM, Matthew Wilcox wrote:
>>>>> On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote:
>>>>>> On 16/03/2023 16:23, Yin, Fengwei wrote:
>>>>>>>> I think you are changing behavior here - is this intentional? Previously this
>>>>>>>> would be evaluated per page, now it's evaluated once for the whole range. The
>>>>>>>> intention below is that directly faulted pages are mapped young and prefaulted
>>>>>>>> pages are mapped old. But now a whole range will be mapped the same.
>>>>>>>
>>>>>>> Yes. You are right here.
>>>>>>>
>>>>>>> Looking at the prefault and cpu_has_hw_af logic for ARM64, it seems we
>>>>>>> can avoid treating vmf->address == addr specially. It's OK to
>>>>>>> drop prefault and change the logic here a little bit to:
>>>>>>>   if (arch_wants_old_prefaulted_pte())
>>>>>>>       entry = pte_mkold(entry);
>>>>>>>   else
>>>>>>>       entry = pte_sw_mkyoung(entry);
>>>>>>>
>>>>>>> It's not necessary to use pte_sw_mkyoung() for vmf->address == addr
>>>>>>> because HW will set the ACCESS bit in the page table entry.
>>>>>>>
>>>>>>> Add Will Deacon in case I missed something here. Thanks.
>>>>>>
>>>>>> I'll defer to Will's response, but not all arm HW supports HW access flag
>>>>>> management. In that case it's done by SW, so I would imagine that by setting
>>>>>> this to old initially, we will get a second fault to set the access bit, which
>>>>>> will slow things down. I wonder if you will need to split this into (up to) 3
>>>>>> calls to set_ptes()?
>>>>>
>>>>> I don't think we should do that.  The limited information I have from
>>>>> various microarchitectures is that the PTEs must differ only in their
>>>>> PFN bits in order to use larger TLB entries.  That includes the Accessed
>>>>> bit (or equivalent).  So we should mkyoung all the PTEs in the same
>>>>> folio, at least initially.
>>>>>
>>>>> That said, we should still do this conditionally.  We'll prefault some
>>>>> other folios too.  So I think this should be:
>>>>>
>>>>>         bool prefault = (addr > vmf->address) || ((addr + nr) < vmf->address);
>>>>>
>>>> According to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80, if the hardware
>>>> access flag is supported on ARM64, there is a benefit if prefaulted PTEs are
>>>> set as "old". If we change prefault like above, the PTEs are set as "young",
>>>> which loses that benefit on ARM64 with the hardware access flag.
>>>>
>>>> OTOH, if going from "old" to "young" is cheap, why not leave all PTEs of the
>>>> folio as "old" and let hardware update them to "young"?
>>>
>>> Because we're tracking the entire folio as a single entity.  So we're
>>> better off avoiding the extra pagefaults to update the accessed bit,
>>> which won't actually give us any information (vmscan needs to know "were
>>> any of the accessed bits set", not "how many of them were set").
>> There are no extra pagefaults to update the accessed bit. There are three cases here:
>> 1. hardware supports the access flag and "old" to "young" is cheap, no extra fault
>> 2. hardware supports the access flag and "old" to "young" is expensive, no extra fault
>> 3. no hardware support for the access flag (extra pagefaults from "old" to "young"; expensive)
>>
>> For #2 and #3, it's expensive to go from "old" to "young", so we always set PTEs
>> "young" at page fault time.
>> For #1, it's cheap to go from "old" to "young", so it's OK to set PTEs "old" at
>> page fault time, and hardware will set them to "young" on access. Actually, ARM64
>> with the hardware access bit wants the PTEs set "old".
> 
> Your logic makes sense, but it doesn't take into account the HPA
> micro-architectural feature present in some ARM CPUs. HPA can transparently
> coalesce multiple pages into a single TLB entry when certain conditions are met
> (roughly; up to 4 pages physically and virtually contiguous and all within a
> 4-page natural alignment). But as Matthew says, this works out better when all
> pte attributes (including access and dirty) match. Given the reason for setting
> the prefault pages to old is so that vmscan can do a better job of finding cold
> pages, and given vmscan will now be looking for folios and not individual pages
> (I assume?), I agree with Matthew that we should make whole folios young or old.
> It will marginally increase our chances of the access and dirty bits being
> consistent across the whole 4-page block that the HW tries to coalesce. If we
> unconditionally make everything old, the hw will set accessed for the single
> page that faulted, and we therefore don't have consistency for that 4-page block.
My concern was that the benefit of "old" PTEs for ARM64 with the hardware access
bit will be lost. The workloads (application launch latency and direct reclaim,
according to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80) could show a
regression with this series. Thanks.

BTW, with the TLB coalescing feature, shouldn't hardware update the access bits
of the coalesced pages together? Otherwise, it's unavoidable that only one page's
access bit ends up set by hardware.
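
Just to make sure we are talking about the same thing, my reading of the
proposed range-level logic is the sketch below (not the actual patch; I have
scaled nr by PAGE_SIZE in the prefault test, which I assume was the intent):

/* The range is "prefault" iff it does not contain the faulting address. */
bool prefault = (addr > vmf->address) ||
		(addr + nr * PAGE_SIZE <= vmf->address);

if (prefault && arch_wants_old_prefaulted_pte())
	entry = pte_mkold(entry);
else
	entry = pte_sw_mkyoung(entry);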

Regards
Yin, Fengwei

> 
>>
>>>
>>> Anyway, hopefully Ryan can test this and let us know if it fixes the
>>> regression he sees.
>> I highly suspect the regression Ryan saw is not related to this but to some
>> other stupid work of mine. I will send out the testing patch soon. Thanks.
> 
> I tested a version of this where I made everything unconditionally young,
> thinking it might be the source of the perf regression, before I reported it. It
> doesn't make any difference. So I agree the regression is somewhere else.
> 
> Thanks,
> Ryan
> 
>>
>>
>> Regards
>> Yin, Fengwei
> 

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 34/36] rmap: add folio_add_file_rmap_range()
  2023-03-16 16:34         ` Ryan Roberts
@ 2023-03-17  8:23           ` Yin, Fengwei
  2023-03-17 12:46             ` Ryan Roberts
  0 siblings, 1 reply; 138+ messages in thread
From: Yin, Fengwei @ 2023-03-17  8:23 UTC (permalink / raw)
  To: Ryan Roberts, Yin, Fengwei, Matthew Wilcox (Oracle), linux-arch
  Cc: linux-mm, linux-kernel

Hi Ryan,

On 3/17/2023 12:34 AM, Ryan Roberts wrote:
> On 16/03/2023 16:27, Yin, Fengwei wrote:
>> Hi Matthew,
>>
>> On 3/16/2023 12:08 AM, Ryan Roberts wrote:
>>> On 15/03/2023 13:34, Ryan Roberts wrote:
>>>> On 15/03/2023 05:14, Matthew Wilcox (Oracle) wrote:
>>>>> From: Yin Fengwei <fengwei.yin@intel.com>
>>>>>
>>>>> folio_add_file_rmap_range() allows adding pte mappings to a specific
>>>>> range of a file folio. Compared to page_add_file_rmap(), it batches
>>>>> the __lruvec_stat updates for large folios.
>>>>>
>>>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>>>> ---
>>>>>  include/linux/rmap.h |  2 ++
>>>>>  mm/rmap.c            | 60 +++++++++++++++++++++++++++++++++-----------
>>>>>  2 files changed, 48 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>>> index b87d01660412..a3825ce81102 100644
>>>>> --- a/include/linux/rmap.h
>>>>> +++ b/include/linux/rmap.h
>>>>> @@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
>>>>>  		unsigned long address);
>>>>>  void page_add_file_rmap(struct page *, struct vm_area_struct *,
>>>>>  		bool compound);
>>>>> +void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
>>>>> +		struct vm_area_struct *, bool compound);
>>>>>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>>>>>  		bool compound);
>>>>>  
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index 4898e10c569a..a91906b28835 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -1301,31 +1301,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>>>  }
>>>>>  
>>>>>  /**
>>>>> - * page_add_file_rmap - add pte mapping to a file page
>>>>> - * @page:	the page to add the mapping to
>>>>> + * folio_add_file_rmap_range - add pte mapping to page range of a folio
>>>>> + * @folio:	The folio to add the mapping to
>>>>> + * @page:	The first page to add
>>>>> + * @nr_pages:	The number of pages which will be mapped
>>>>>   * @vma:	the vm area in which the mapping is added
>>>>>   * @compound:	charge the page as compound or small page
>>>>>   *
>>>>> + * The page range of folio is defined by [first_page, first_page + nr_pages)
>>>>> + *
>>>>>   * The caller needs to hold the pte lock.
>>>>>   */
>>>>> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>>>> -		bool compound)
>>>>> +void folio_add_file_rmap_range(struct folio *folio, struct page *page,
>>>>> +			unsigned int nr_pages, struct vm_area_struct *vma,
>>>>> +			bool compound)
>>>>>  {
>>>>> -	struct folio *folio = page_folio(page);
>>>>>  	atomic_t *mapped = &folio->_nr_pages_mapped;
>>>>> -	int nr = 0, nr_pmdmapped = 0;
>>>>> -	bool first;
>>>>> +	unsigned int nr_pmdmapped = 0, first;
>>>>> +	int nr = 0;
>>>>>  
>>>>> -	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
>>>>> +	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
>>>>>  
>>>>>  	/* Is page being mapped by PTE? Is this its first map to be added? */
>>>>>  	if (likely(!compound)) {
>>>>> -		first = atomic_inc_and_test(&page->_mapcount);
>>>>> -		nr = first;
>>>>> -		if (first && folio_test_large(folio)) {
>>>>> -			nr = atomic_inc_return_relaxed(mapped);
>>>>> -			nr = (nr < COMPOUND_MAPPED);
>>>>> -		}
>>>>> +		do {
>>>>> +			first = atomic_inc_and_test(&page->_mapcount);
>>>>> +			if (first && folio_test_large(folio)) {
>>>>> +				first = atomic_inc_return_relaxed(mapped);
>>>>> +				first = (nr < COMPOUND_MAPPED);
>>>>
>>>> This still contains the typo that Yin Fengwei spotted in the previous version:
>>>> https://lore.kernel.org/linux-mm/20230228213738.272178-1-willy@infradead.org/T/#m84673899e25bc31356093a1177941f2cc35e5da8
>>>>
>>>> FYI, I'm seeing a perf regression of about 1% when compiling the kernel on
>>>> Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (In a VM using
>>>> ext4 filesystem). Looks like instruction aborts are taking much longer and a
>>>> selection of syscalls are a bit slower. Still hunting down the root cause. Will
>>>> report once I have a conclusive diagnosis.
>>>
>>> I'm sorry - I'm struggling to find the exact cause. But it's spending over 2x the
>>> amount of time in the instruction abort handling code once patches 32-36 are
>>> included. Everything in the flame graph is just taking longer. Perhaps we are
>>> getting more instruction aborts somehow? I have the flamegraphs if anyone wants
>>> them - just shout and I'll email them separately.
>> Thanks a lot to Ryan for sharing the flamegraphs with me. I found that __do_fault()
>> is called with patches 32-36 applied, while there is no __do_fault() with just the
>> first 31 patches. I suspect folio_add_file_rmap_range() missed populating some
>> PTEs. Please give me a few days to find the root cause and fix it. Sorry for this.
> 
> You're welcome. Give me a shout once you have a re-spin and I'll rerun the tests.
Could you please help to try the following changes? Thanks in advance.

diff --git a/mm/filemap.c b/mm/filemap.c
index 40be33b5ee46..137011320c68 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3504,15 +3504,16 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 		if (!pte_none(vmf->pte[count]))
 			goto skip;
 
-		if (vmf->address == addr)
-			ret = VM_FAULT_NOPAGE;
-
 		count++;
 		continue;
 skip:
 		if (count) {
 			set_pte_range(vmf, folio, page, count, addr);
 			folio_ref_add(folio, count);
+			if ((vmf->address < (addr + count * PAGE_SIZE)) &&
+					(vmf->address >= addr))
+				ret = VM_FAULT_NOPAGE;
+
 		}
 
 		count++;
@@ -3525,6 +3526,9 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 	if (count) {
 		set_pte_range(vmf, folio, page, count, addr);
 		folio_ref_add(folio, count);
+		if ((vmf->address < (addr + count * PAGE_SIZE)) &&
+				(vmf->address >= addr))
+			ret = VM_FAULT_NOPAGE;
 	}
 
 	vmf->pte = old_ptep;
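
In other words, the intent of the change above is to return VM_FAULT_NOPAGE
whenever the batch that actually got mapped covers vmf->address, not only
when the batch happens to start at it. As a standalone predicate (sketch
only, name invented):

/* Does the just-mapped batch [addr, addr + count pages) cover the
 * faulting address? */
static inline bool batch_covers_fault(unsigned long addr,
		unsigned long count, unsigned long fault_addr)
{
	return fault_addr >= addr &&
	       fault_addr < addr + count * PAGE_SIZE;
}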


Regards
Yin, Fengwei

> 
>>
>>
>> Regards
>> Yin, Fengwei
>>
>>>
>>>>
>>>> Thanks,
>>>> Ryan
>>>>
>>>>
>>>>> +			}
>>>>> +
>>>>> +			if (first)
>>>>> +				nr++;
>>>>> +		} while (page++, --nr_pages > 0);
>>>>>  	} else if (folio_test_pmd_mappable(folio)) {
>>>>>  		/* That test is redundant: it's for safety or to optimize out */
>>>>>  
>>>>> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>  	mlock_vma_folio(folio, vma, compound);
>>>>>  }
>>>>>  
>>>>> +/**
>>>>> + * page_add_file_rmap - add pte mapping to a file page
>>>>> + * @page:	the page to add the mapping to
>>>>> + * @vma:	the vm area in which the mapping is added
>>>>> + * @compound:	charge the page as compound or small page
>>>>> + *
>>>>> + * The caller needs to hold the pte lock.
>>>>> + */
>>>>> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>>>> +		bool compound)
>>>>> +{
>>>>> +	struct folio *folio = page_folio(page);
>>>>> +	unsigned int nr_pages;
>>>>> +
>>>>> +	VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
>>>>> +
>>>>> +	if (likely(!compound))
>>>>> +		nr_pages = 1;
>>>>> +	else
>>>>> +		nr_pages = folio_nr_pages(folio);
>>>>> +
>>>>> +	folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
>>>>> +}
>>>>> +
>>>>>  /**
>>>>>   * page_remove_rmap - take down pte mapping from a page
>>>>>   * @page:	page to remove mapping from
>>>>
>>>
> 
> 

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 34/36] rmap: add folio_add_file_rmap_range()
  2023-03-17  8:23           ` Yin, Fengwei
@ 2023-03-17 12:46             ` Ryan Roberts
  2023-03-17 13:28               ` Yin, Fengwei
  0 siblings, 1 reply; 138+ messages in thread
From: Ryan Roberts @ 2023-03-17 12:46 UTC (permalink / raw)
  To: Yin, Fengwei, Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel

On 17/03/2023 08:23, Yin, Fengwei wrote:
[...]

>>>>> FYI, I'm seeing a perf regression of about 1% when compiling the kernel on
>>>>> Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (In a VM using
>>>>> ext4 filesystem). Looks like instruction aborts are taking much longer and a
>>>>> selection of syscalls are a bit slower. Still hunting down the root cause. Will
>>>>> report once I have a conclusive diagnosis.
>>>>
>>>> I'm sorry - I'm struggling to find the exact cause. But it's spending over 2x the
>>>> amount of time in the instruction abort handling code once patches 32-36 are
>>>> included. Everything in the flame graph is just taking longer. Perhaps we are
>>>> getting more instruction aborts somehow? I have the flamegraphs if anyone wants
>>>> them - just shout and I'll email them separately.
>>> Thanks a lot to Ryan for sharing the flamegraphs with me. I found that __do_fault()
>>> is called with patches 32-36 applied, while there is no __do_fault() with just the
>>> first 31 patches. I suspect folio_add_file_rmap_range() missed populating some
>>> PTEs. Please give me a few days to find the root cause and fix it. Sorry for this.
>>
>> You're welcome. Give me a shout once you have a re-spin and I'll rerun the tests.
> Could you please help to try the following changes? Thanks in advance.
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 40be33b5ee46..137011320c68 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3504,15 +3504,16 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  		if (!pte_none(vmf->pte[count]))
>  			goto skip;
>  
> -		if (vmf->address == addr)
> -			ret = VM_FAULT_NOPAGE;
> -
>  		count++;
>  		continue;
>  skip:
>  		if (count) {
>  			set_pte_range(vmf, folio, page, count, addr);
>  			folio_ref_add(folio, count);
> +			if ((vmf->address < (addr + count * PAGE_SIZE)) &&
> +					(vmf->address >= addr))
> +				ret = VM_FAULT_NOPAGE;
> +
>  		}
>  
>  		count++;
> @@ -3525,6 +3526,9 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>  	if (count) {
>  		set_pte_range(vmf, folio, page, count, addr);
>  		folio_ref_add(folio, count);
> +		if ((vmf->address < (addr + count * PAGE_SIZE)) &&
> +				(vmf->address >= addr))
> +			ret = VM_FAULT_NOPAGE;
>  	}
>  
>  	vmf->pte = old_ptep;
> 

I'm afraid this hasn't fixed it, and I still see __do_fault(). I'll send the
flame graph over separately.

Given I'm running on ext4, I wasn't expecting to see any large page cache
folios? So I don't think we would have expected this patch to help anyway? (or
perhaps there are still THP folios? But I think they will get PMD mapped?).


> 
> Regards
> Yin, Fengwei
> 
>>
>>>
>>>
>>> Regards
>>> Yin, Fengwei
>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Ryan
>>>>>
>>>>>
>>>>>> +			}
>>>>>> +
>>>>>> +			if (first)
>>>>>> +				nr++;
>>>>>> +		} while (page++, --nr_pages > 0);
>>>>>>  	} else if (folio_test_pmd_mappable(folio)) {
>>>>>>  		/* That test is redundant: it's for safety or to optimize out */
>>>>>>  
>>>>>> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>>  	mlock_vma_folio(folio, vma, compound);
>>>>>>  }
>>>>>>  
>>>>>> +/**
>>>>>> + * page_add_file_rmap - add pte mapping to a file page
>>>>>> + * @page:	the page to add the mapping to
>>>>>> + * @vma:	the vm area in which the mapping is added
>>>>>> + * @compound:	charge the page as compound or small page
>>>>>> + *
>>>>>> + * The caller needs to hold the pte lock.
>>>>>> + */
>>>>>> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>> +		bool compound)
>>>>>> +{
>>>>>> +	struct folio *folio = page_folio(page);
>>>>>> +	unsigned int nr_pages;
>>>>>> +
>>>>>> +	VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
>>>>>> +
>>>>>> +	if (likely(!compound))
>>>>>> +		nr_pages = 1;
>>>>>> +	else
>>>>>> +		nr_pages = folio_nr_pages(folio);
>>>>>> +
>>>>>> +	folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
>>>>>> +}
>>>>>> +
>>>>>>  /**
>>>>>>   * page_remove_rmap - take down pte mapping from a page
>>>>>>   * @page:	page to remove mapping from
>>>>>
>>>>
>>
>>


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-17  8:19                   ` Yin, Fengwei
@ 2023-03-17 13:00                     ` Ryan Roberts
  2023-03-17 13:44                       ` Yin, Fengwei
  2023-03-24 14:58                     ` Will Deacon
  1 sibling, 1 reply; 138+ messages in thread
From: Ryan Roberts @ 2023-03-17 13:00 UTC (permalink / raw)
  To: Yin, Fengwei, Matthew Wilcox; +Cc: linux-arch, will, linux-mm, linux-kernel

On 17/03/2023 08:19, Yin, Fengwei wrote:
> 
> 
> On 3/17/2023 4:00 PM, Ryan Roberts wrote:
>> On 17/03/2023 06:33, Yin, Fengwei wrote:
>>>
>>>
>>> On 3/17/2023 11:44 AM, Matthew Wilcox wrote:
>>>> On Fri, Mar 17, 2023 at 09:58:17AM +0800, Yin, Fengwei wrote:
>>>>>
>>>>>
>>>>> On 3/17/2023 1:52 AM, Matthew Wilcox wrote:
>>>>>> On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote:
>>>>>>> On 16/03/2023 16:23, Yin, Fengwei wrote:
>>>>>>>>> I think you are changing behavior here - is this intentional? Previously this
>>>>>>>>> would be evaluated per page, now it's evaluated once for the whole range. The
>>>>>>>>> intention below is that directly faulted pages are mapped young and prefaulted
>>>>>>>>> pages are mapped old. But now a whole range will be mapped the same.
>>>>>>>>
>>>>>>>> Yes. You are right here.
>>>>>>>>
>>>>>>>> Look at the prefault and cpu_has_hw_af for ARM64; it looks like we
>>>>>>>> can avoid handling vmf->address == addr specially. It's OK to
>>>>>>>> drop prefault and change the logic here a little bit to:
>>>>>>>>   if (arch_wants_old_prefaulted_pte())
>>>>>>>>       entry = pte_mkold(entry);
>>>>>>>>   else
>>>>>>>>       entry = pte_sw_mkyoung(entry);
>>>>>>>>
>>>>>>>> It's not necessary to use pte_sw_mkyoung for vmf->address == addr
>>>>>>>> because HW will set the ACCESS bit in the page table entry.
>>>>>>>>
>>>>>>>> Add Will Deacon in case I missed something here. Thanks.
>>>>>>>
>>>>>>> I'll defer to Will's response, but not all arm HW supports HW access flag
>>>>>>> management. In that case it's done by SW, so I would imagine that by setting
>>>>>>> this to old initially, we will get a second fault to set the access bit, which
>>>>>>> will slow things down. I wonder if you will need to split this into (up to) 3
>>>>>>> calls to set_ptes()?
>>>>>>
>>>>>> I don't think we should do that.  The limited information I have from
>>>>>> various microarchitectures is that the PTEs must differ only in their
>>>>>> PFN bits in order to use larger TLB entries.  That includes the Accessed
>>>>>> bit (or equivalent).  So we should mkyoung all the PTEs in the same
>>>>>> folio, at least initially.
>>>>>>
>>>>>> That said, we should still do this conditionally.  We'll prefault some
>>>>>> other folios too.  So I think this should be:
>>>>>>
>>>>>>         bool prefault = (addr > vmf->address) || ((addr + nr) < vmf->address);
>>>>>>
>>>>> According to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80, if the hardware
>>>>> access flag is supported on ARM64, there is a benefit if prefaulted PTEs are
>>>>> set as "old". If we change prefault like above, the PTEs are set as "young",
>>>>> which loses that benefit on ARM64 with the hardware access flag.
>>>>>
>>>>> OTOH, if going from "old" to "young" is cheap, why not leave all PTEs of the
>>>>> folio as "old" and let hardware update them to "young"?
>>>>
>>>> Because we're tracking the entire folio as a single entity.  So we're
>>>> better off avoiding the extra pagefaults to update the accessed bit,
>>>> which won't actually give us any information (vmscan needs to know "were
>>>> any of the accessed bits set", not "how many of them were set").
>>> There are no extra pagefaults to update the accessed bit. There are three cases here:
>>> 1. hardware supports the access flag and "old" to "young" is cheap, no extra fault
>>> 2. hardware supports the access flag and "old" to "young" is expensive, no extra fault
>>> 3. no hardware support for the access flag (extra pagefaults from "old" to "young"; expensive)
>>>
>>> For #2 and #3, it's expensive to go from "old" to "young", so we always set PTEs
>>> "young" at page fault time.
>>> For #1, it's cheap to go from "old" to "young", so it's OK to set PTEs "old" at
>>> page fault time, and hardware will set them to "young" on access. Actually, ARM64
>>> with the hardware access bit wants the PTEs set "old".
>>
>> Your logic makes sense, but it doesn't take into account the HPA
>> micro-architectural feature present in some ARM CPUs. HPA can transparently
>> coalesce multiple pages into a single TLB entry when certain conditions are met
>> (roughly; up to 4 pages physically and virtually contiguous and all within a
>> 4-page natural alignment). But as Matthew says, this works out better when all
>> pte attributes (including access and dirty) match. Given the reason for setting
>> the prefault pages to old is so that vmscan can do a better job of finding cold
>> pages, and given vmscan will now be looking for folios and not individual pages
>> (I assume?), I agree with Matthew that we should make whole folios young or old.
>> It will marginally increase our chances of the access and dirty bits being
>> consistent across the whole 4-page block that the HW tries to coalesce. If we
>> unconditionally make everything old, the hw will set accessed for the single
>> page that faulted, and we therefore don't have consistency for that 4-page block.
> My concern was that the benefit of "old" PTEs for ARM64 with the hardware access
> bit will be lost. The workloads (application launch latency and direct reclaim,
> according to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80) could show a
> regression with this series. Thanks.

My (potentially incorrect) understanding of the reason for marking the
prefaulted ptes as old was that it made it easier/quicker for vmscan to
identify those prefaulted pages and reclaim them under memory pressure. I
_assume_ now that we have large folios, that vmscan will be trying to pick
folios for reclaim, not individual subpages within the folio? In which case,
vmscan will only consider the folio as old if _all_ pages within are old. So
marking all the pages of a folio young vs marking 1 page in the folio young
won't make a difference from this perspective. But it will make a difference
from the perspective of HPA. (Please Matthew or somebody else, correct me if
my understanding is incorrect!)
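
In sketch form, the folio-level test I am assuming vmscan would effectively
make (helper name invented, this is not vmscan's real code):

/* A folio counts as referenced if any PTE mapping it is young. */
static bool folio_any_pte_young(pte_t *ptep, unsigned int nr)
{
	unsigned int i;

	for (i = 0; i < nr; i++)
		if (pte_young(ptep[i]))
			return true;
	return false;
}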

> 
> BTW, with the TLB coalescing feature, shouldn't hardware update the access bits
> of the coalesced pages together? Otherwise, it's unavoidable that only one page's
> access bit ends up set by hardware.

No, the HW will only update the access flag for the single page that is
accessed. So yes, in the long run the value of the flags across the 4-page block
will diverge - that's why I said "marginal" above.

> 
> Regards
> Yin, Fengwei
> 
>>
>>>
>>>>
>>>> Anyway, hopefully Ryan can test this and let us know if it fixes the
>>>> regression he sees.
>>> I highly suspect the regression Ryan saw is not related to this but to some
>>> other stupid work of mine. I will send out the testing patch soon. Thanks.
>>
>> I tested a version of this where I made everything unconditionally young,
>> thinking it might be the source of the perf regression, before I reported it. It
>> doesn't make any difference. So I agree the regression is somewhere else.
>>
>> Thanks,
>> Ryan
>>
>>>
>>>
>>> Regards
>>> Yin, Fengwei
>>


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 34/36] rmap: add folio_add_file_rmap_range()
  2023-03-17 12:46             ` Ryan Roberts
@ 2023-03-17 13:28               ` Yin, Fengwei
  0 siblings, 0 replies; 138+ messages in thread
From: Yin, Fengwei @ 2023-03-17 13:28 UTC (permalink / raw)
  To: Ryan Roberts, Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel



On 3/17/2023 8:46 PM, Ryan Roberts wrote:
> On 17/03/2023 08:23, Yin, Fengwei wrote:
> [...]
> 
>>>>>> FYI, I'm seeing a perf regression of about 1% when compiling the kernel on
>>>>>> Ampere Altra (arm64) with this whole series on top of v6.3-rc1 (In a VM using
>>>>>> ext4 filesystem). Looks like instruction aborts are taking much longer and a
>>>>>> selection of syscalls are a bit slower. Still hunting down the root cause. Will
>>>>>> report once I have a conclusive diagnosis.
>>>>>
>>>>> I'm sorry - I'm struggling to find the exact cause. But it's spending over 2x the
>>>>> amount of time in the instruction abort handling code once patches 32-36 are
>>>>> included. Everything in the flame graph is just taking longer. Perhaps we are
>>>>> getting more instruction aborts somehow? I have the flamegraphs if anyone wants
>>>>> them - just shout and I'll email them separately.
>>>> Thanks a lot to Ryan for sharing the flamegraphs with me. I found that __do_fault()
>>>> is called with patches 32-36 applied, while there is no __do_fault() with just the
>>>> first 31 patches. I suspect folio_add_file_rmap_range() missed populating some
>>>> PTEs. Please give me a few days to find the root cause and fix it. Sorry for this.
>>>
>>> You're welcome. Give me a shout once you have a re-spin and I'll rerun the tests.
>> Could you please help to try the following changes? Thanks in advance.
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 40be33b5ee46..137011320c68 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -3504,15 +3504,16 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>  		if (!pte_none(vmf->pte[count]))
>>  			goto skip;
>>  
>> -		if (vmf->address == addr)
>> -			ret = VM_FAULT_NOPAGE;
>> -
>>  		count++;
>>  		continue;
>>  skip:
>>  		if (count) {
>>  			set_pte_range(vmf, folio, page, count, addr);
>>  			folio_ref_add(folio, count);
>> +			if ((vmf->address < (addr + count * PAGE_SIZE)) &&
>> +					(vmf->address >= addr))
>> +				ret = VM_FAULT_NOPAGE;
>> +
>>  		}
>>  
>>  		count++;
>> @@ -3525,6 +3526,9 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>>  	if (count) {
>>  		set_pte_range(vmf, folio, page, count, addr);
>>  		folio_ref_add(folio, count);
>> +		if ((vmf->address < (addr + count * PAGE_SIZE)) &&
>> +				(vmf->address >= addr))
>> +			ret = VM_FAULT_NOPAGE;
>>  	}
>>  
>>  	vmf->pte = old_ptep;
>>
> 
> I'm afraid this hasn't fixed it, and I still see __do_fault(). I'll send the
> flame graph over separately.
> 
> Given I'm running on ext4, I wasn't expecting to see any large page cache
> folios? So I don't think we would have expected this patch to help anyway? (or
> perhaps there are still THP folios? But I think they will get PMD mapped?).
OK. I will try to reproduce the issue in my local environment, to see whether
it also shows up on x86_64.


Regards
Yin, Fengwei

> 
> 
>>
>> Regards
>> Yin, Fengwei
>>
>>>
>>>>
>>>>
>>>> Regards
>>>> Yin, Fengwei
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Ryan
>>>>>>
>>>>>>
>>>>>>> +			}
>>>>>>> +
>>>>>>> +			if (first)
>>>>>>> +				nr++;
>>>>>>> +		} while (page++, --nr_pages > 0);
>>>>>>>  	} else if (folio_test_pmd_mappable(folio)) {
>>>>>>>  		/* That test is redundant: it's for safety or to optimize out */
>>>>>>>  
>>>>>>> @@ -1354,6 +1362,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>>>  	mlock_vma_folio(folio, vma, compound);
>>>>>>>  }
>>>>>>>  
>>>>>>> +/**
>>>>>>> + * page_add_file_rmap - add pte mapping to a file page
>>>>>>> + * @page:	the page to add the mapping to
>>>>>>> + * @vma:	the vm area in which the mapping is added
>>>>>>> + * @compound:	charge the page as compound or small page
>>>>>>> + *
>>>>>>> + * The caller needs to hold the pte lock.
>>>>>>> + */
>>>>>>> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>>> +		bool compound)
>>>>>>> +{
>>>>>>> +	struct folio *folio = page_folio(page);
>>>>>>> +	unsigned int nr_pages;
>>>>>>> +
>>>>>>> +	VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
>>>>>>> +
>>>>>>> +	if (likely(!compound))
>>>>>>> +		nr_pages = 1;
>>>>>>> +	else
>>>>>>> +		nr_pages = folio_nr_pages(folio);
>>>>>>> +
>>>>>>> +	folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
>>>>>>> +}
>>>>>>> +
>>>>>>>  /**
>>>>>>>   * page_remove_rmap - take down pte mapping from a page
>>>>>>>   * @page:	page to remove mapping from
>>>>>>
>>>>>
>>>
>>>
> 

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-17 13:00                     ` Ryan Roberts
@ 2023-03-17 13:44                       ` Yin, Fengwei
  0 siblings, 0 replies; 138+ messages in thread
From: Yin, Fengwei @ 2023-03-17 13:44 UTC (permalink / raw)
  To: Ryan Roberts, Matthew Wilcox; +Cc: linux-arch, will, linux-mm, linux-kernel



On 3/17/2023 9:00 PM, Ryan Roberts wrote:
> On 17/03/2023 08:19, Yin, Fengwei wrote:
>>
>>
>> On 3/17/2023 4:00 PM, Ryan Roberts wrote:
>>> On 17/03/2023 06:33, Yin, Fengwei wrote:
>>>>
>>>>
>>>> On 3/17/2023 11:44 AM, Matthew Wilcox wrote:
>>>>> On Fri, Mar 17, 2023 at 09:58:17AM +0800, Yin, Fengwei wrote:
>>>>>>
>>>>>>
>>>>>> On 3/17/2023 1:52 AM, Matthew Wilcox wrote:
>>>>>>> On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote:
>>>>>>>> On 16/03/2023 16:23, Yin, Fengwei wrote:
>>>>>>>>>> I think you are changing behavior here - is this intentional? Previously this
>>>>>>>>>> would be evaluated per page, now it's evaluated once for the whole range. The
>>>>>>>>>> intention below is that directly faulted pages are mapped young and prefaulted
>>>>>>>>>> pages are mapped old. But now a whole range will be mapped the same.
>>>>>>>>>
>>>>>>>>> Yes. You are right here.
>>>>>>>>>
>>>>>>>>> Look at the prefault and cpu_has_hw_af for ARM64; it looks like we
>>>>>>>>> can avoid handling vmf->address == addr specially. It's OK to
>>>>>>>>> drop prefault and change the logic here a little bit to:
>>>>>>>>>   if (arch_wants_old_prefaulted_pte())
>>>>>>>>>       entry = pte_mkold(entry);
>>>>>>>>>   else
>>>>>>>>>       entry = pte_sw_mkyoung(entry);
>>>>>>>>>
>>>>>>>>> It's not necessary to use pte_sw_mkyoung for vmf->address == addr
>>>>>>>>> because HW will set the ACCESS bit in the page table entry.
>>>>>>>>>
>>>>>>>>> Add Will Deacon in case I missed something here. Thanks.
>>>>>>>>
>>>>>>>> I'll defer to Will's response, but not all arm HW supports HW access flag
>>>>>>>> management. In that case it's done by SW, so I would imagine that by setting
>>>>>>>> this to old initially, we will get a second fault to set the access bit, which
>>>>>>>> will slow things down. I wonder if you will need to split this into (up to) 3
>>>>>>>> calls to set_ptes()?
>>>>>>>
>>>>>>> I don't think we should do that.  The limited information I have from
>>>>>>> various microarchitectures is that the PTEs must differ only in their
>>>>>>> PFN bits in order to use larger TLB entries.  That includes the Accessed
>>>>>>> bit (or equivalent).  So we should mkyoung all the PTEs in the same
>>>>>>> folio, at least initially.
>>>>>>>
>>>>>>> That said, we should still do this conditionally.  We'll prefault some
>>>>>>> other folios too.  So I think this should be:
>>>>>>>
>>>>>>>         bool prefault = (addr > vmf->address) || ((addr + nr) < vmf->address);
>>>>>>>
>>>>>> According to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80, if the hardware
>>>>>> access flag is supported on ARM64, there is a benefit if prefaulted PTEs are
>>>>>> set as "old". If we change prefault like above, the PTEs are set as "young",
>>>>>> which loses that benefit on ARM64 with the hardware access flag.
>>>>>>
>>>>>> OTOH, if going from "old" to "young" is cheap, why not leave all PTEs of the
>>>>>> folio as "old" and let hardware update them to "young"?
>>>>>
>>>>> Because we're tracking the entire folio as a single entity.  So we're
>>>>> better off avoiding the extra pagefaults to update the accessed bit,
>>>>> which won't actually give us any information (vmscan needs to know "were
>>>>> any of the accessed bits set", not "how many of them were set").
>>>> There are no extra pagefaults to update the accessed bit. There are three cases here:
>>>> 1. hardware supports the access flag and "old" to "young" is cheap, no extra fault
>>>> 2. hardware supports the access flag and "old" to "young" is expensive, no extra fault
>>>> 3. no hardware support for the access flag (extra pagefaults from "old" to "young"; expensive)
>>>>
>>>> For #2 and #3, it's expensive to go from "old" to "young", so we always set PTEs
>>>> "young" at page fault time.
>>>> For #1, it's cheap to go from "old" to "young", so it's OK to set PTEs "old" at
>>>> page fault time, and hardware will set them to "young" on access. Actually, ARM64
>>>> with the hardware access bit wants the PTEs set "old".
>>>
>>> Your logic makes sense, but it doesn't take into account the HPA
>>> micro-architectural feature present in some ARM CPUs. HPA can transparently
>>> coalesce multiple pages into a single TLB entry when certain conditions are met
>>> (roughly; up to 4 pages physically and virtually contiguous and all within a
>>> 4-page natural alignment). But as Matthew says, this works out better when all
>>> pte attributes (including access and dirty) match. Given the reason for setting
>>> the prefault pages to old is so that vmscan can do a better job of finding cold
>>> pages, and given vmscan will now be looking for folios and not individual pages
>>> (I assume?), I agree with Matthew that we should make whole folios young or old.
>>> It will marginally increase our chances of the access and dirty bits being
>>> consistent across the whole 4-page block that the HW tries to coalesce. If we
>>> unconditionally make everything old, the hw will set accessed for the single
>>> page that faulted, and we therefore don't have consistency for that 4-page block.
>> My concern was that the benefit of "old" PTEs for ARM64 with the hardware access
>> bit will be lost. The workloads (application launch latency and direct reclaim,
>> according to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80) could show a
>> regression with this series. Thanks.
> 
> My (potentially incorrect) understanding of the reason for marking the
> prefaulted ptes as old was that it made it easier/quicker for vmscan to
> identify those prefaulted pages and reclaim them under memory pressure. I
> _assume_ now that we have large folios, that vmscan will be trying to pick
> folios for reclaim, not individual subpages within the folio? In which case,
> vmscan will only consider the folio as old if _all_ pages within are old. So
> marking all the pages of a folio young vs marking 1 page in the folio young
> won't make a difference from this perspective. But it will make a difference
> from the perspective of HPA. (Please Matthew or somebody else, correct me if
> my understanding is incorrect!)
Thanks a lot for your patient explanation. I get the point now: on the first
access, we mark all the PTEs of the folio "young", so later accesses can hit a
large TLB entry.


Regards
Yin, Fengwei

> 
>>
>> BTW, with the TLB coalescing feature, shouldn't hardware update the access bits
>> of the coalesced pages together? Otherwise, it's unavoidable that only one page's
>> access bit ends up set by hardware.
> 
> No, the HW will only update the access flag for the single page that is
> accessed. So yes, in the long run the value of the flags across the 4-page block
> will diverge - that's why I said "marginal" above.
> 
>>
>> Regards
>> Yin, Fengwei
>>
>>>
>>>>
>>>>>
>>>>> Anyway, hopefully Ryan can test this and let us know if it fixes the
>>>>> regression he sees.
>>>> I highly suspect the regression Ryan saw is not related to this but to some
>>>> other stupid work of mine. I will send out the testing patch soon. Thanks.
>>>
>>> I tested a version of this where I made everything unconditionally young,
>>> thinking it might be the source of the perf regression, before I reported it. It
>>> doesn't make any difference. So I agree the regression is somewhere else.
>>>
>>> Thanks,
>>> Ryan
>>>
>>>>
>>>>
>>>> Regards
>>>> Yin, Fengwei
>>>
> 

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 16/36] mips: Implement the new page table range API
  2023-03-15 20:33     ` Matthew Wilcox
@ 2023-03-17 15:29       ` Thomas Bogendoerfer
  2023-03-19 18:45         ` Thomas Bogendoerfer
  0 siblings, 1 reply; 138+ messages in thread
From: Thomas Bogendoerfer @ 2023-03-17 15:29 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-arch, linux-mm, linux-kernel, linux-mips

On Wed, Mar 15, 2023 at 08:33:21PM +0000, Matthew Wilcox wrote:
> On Wed, Mar 15, 2023 at 11:50:22AM +0100, Thomas Bogendoerfer wrote:
> > On Wed, Mar 15, 2023 at 05:14:24AM +0000, Matthew Wilcox (Oracle) wrote:
> > > Rename _PFN_SHIFT to PFN_PTE_SHIFT.  Convert a few places
> > > to call set_pte() instead of set_pte_at().  Add set_ptes(),
> > > update_mmu_cache_range(), flush_icache_pages() and flush_dcache_folio().
> > 
> > /local/tbogendoerfer/korg/linux/mm/memory.c: In function ‘set_pte_range’:
> > /local/tbogendoerfer/korg/linux/mm/memory.c:4290:2: error: implicit declaration of function ‘update_mmu_cache_range’ [-Werror=implicit-function-declaration]
> >   update_mmu_cache_range(vma, addr, vmf->pte, nr);
> > 
> > update_mmu_cache_range() is missing in this patch.
> 
> Oops.  And mips was one of the arches I did a test build for!
> 
> Looks like we could try to gain some efficiency by passing 'nr' to
> __update_tlb(), but as far as I can tell, that's only called for r3k and
> r4k, so maybe it's not worth optimising at this point?

Hmm, not sure if that would help. An R4k-style TLB has two PTEs mapped
per TLB entry, so by advancing per page, __update_tlb() is called more
often than needed.
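
Something like the sketch below is what I have in mind (update_tlb_pair()
is a hypothetical helper; real code would still end up in __update_tlb()):

/*
 * Sketch only: an R4k TLB entry maps an even/odd PTE pair, so a range
 * update needs at most one TLB probe per pair, not one per page.
 */
static void update_tlb_range_sketch(struct vm_area_struct *vma,
		unsigned long addr, unsigned int nr)
{
	unsigned long end = addr + nr * PAGE_SIZE;

	/* Align down to the even page of the first pair, step by pairs. */
	for (addr &= ~(2 * PAGE_SIZE - 1); addr < end; addr += 2 * PAGE_SIZE)
		update_tlb_pair(vma, addr);	/* hypothetical */
}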

> Anyway, this add-on makes the mips build compile for me and I'll fold
> it into v5.

tested your v4 with the add-on on QEMU Malta and real hardware without
problems so far. I'll give v5 another spin.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 20/36] powerpc: Implement the new page table range API
  2023-03-17  3:47       ` Matthew Wilcox
@ 2023-03-18  9:19         ` Christophe Leroy
  2023-07-10 20:24           ` Matthew Wilcox
  0 siblings, 1 reply; 138+ messages in thread
From: Christophe Leroy @ 2023-03-18  9:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-arch, linux-mm, linux-kernel, Michael Ellerman,
	Nicholas Piggin, linuxppc-dev



On 17/03/2023 at 04:47, Matthew Wilcox wrote:
> On Wed, Mar 15, 2023 at 10:18:22AM +0000, Christophe Leroy wrote:
>> I investigated a bit further and can confirm now that the above won't
>> always work, see comment
>> https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/32/pgtable.h#L147
>>
>> And then you see
>> https://elixir.bootlin.com/linux/v6.3-rc2/source/arch/powerpc/include/asm/nohash/pte-e500.h#L63
> 
> Got it.  Here's what I intend to fold in for the next version:
> 
> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
> index 7bf1fe7297c6..5f12b9382909 100644
> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
> @@ -462,11 +462,6 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
>   		     pgprot_val(pgprot));
>   }
>   
> -static inline unsigned long pte_pfn(pte_t pte)
> -{
> -	return pte_val(pte) >> PTE_RPN_SHIFT;
> -}
> -
>   /* Generic modifiers for PTE bits */
>   static inline pte_t pte_wrprotect(pte_t pte)
>   {
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 4acc9690f599..c5baa3082a5a 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -104,6 +104,7 @@
>    * and every thing below PAGE_SHIFT;
>    */
>   #define PTE_RPN_MASK	(((1UL << _PAGE_PA_MAX) - 1) & (PAGE_MASK))
> +#define PTE_RPN_SHIFT	PAGE_SHIFT
>   /*
>    * set of bits not changed in pmd_modify. Even though we have hash specific bits
>    * in here, on radix we expect them to be zero.
> @@ -569,11 +570,6 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
>   	return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot) | _PAGE_PTE);
>   }
>   
> -static inline unsigned long pte_pfn(pte_t pte)
> -{
> -	return (pte_val(pte) & PTE_RPN_MASK) >> PAGE_SHIFT;
> -}
> -
>   /* Generic modifiers for PTE bits */
>   static inline pte_t pte_wrprotect(pte_t pte)
>   {
> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
> index 69a7dd47a9f0..03be8b22aaea 100644
> --- a/arch/powerpc/include/asm/nohash/pgtable.h
> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
> @@ -101,8 +101,6 @@ static inline bool pte_access_permitted(pte_t pte, bool write)
>   static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot) {
>   	return __pte(((pte_basic_t)(pfn) << PTE_RPN_SHIFT) |
>   		     pgprot_val(pgprot)); }
> -static inline unsigned long pte_pfn(pte_t pte)	{
> -	return pte_val(pte) >> PTE_RPN_SHIFT; }
>   
>   /* Generic modifiers for PTE bits */
>   static inline pte_t pte_exprotect(pte_t pte)
> @@ -279,7 +277,7 @@ static inline int pud_huge(pud_t pud)
>   void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
>   		pte_t *ptep, unsigned int nr);
>   #else
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
>   		unsigned long address, pte_t *ptep, unsigned int nr) {}
>   #endif
>   
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 656ecf2b10cd..491a2720f835 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -54,6 +54,12 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>   /* Keep these as a macros to avoid include dependency mess */
>   #define pte_page(x)		pfn_to_page(pte_pfn(x))
>   #define mk_pte(page, pgprot)	pfn_pte(page_to_pfn(page), (pgprot))
> +
> +static inline unsigned long pte_pfn(pte_t pte)
> +{
> +	return (pte_val(pte) & PTE_RPN_MASK) >> PTE_RPN_SHIFT;
> +}
> +
>   /*
>    * Select all bits except the pfn
>    */
> diff --git a/arch/powerpc/mm/nohash/e500_hugetlbpage.c b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> index f3cb91107a47..583b3098763f 100644
> --- a/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> +++ b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
> @@ -178,7 +178,7 @@ book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea, pte_t pte)
>    *
>    * This must always be called with the pte lock held.
>    */
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
>   		pte_t *ptep, unsigned int nr)
>   {
>   	if (is_vm_hugetlb_page(vma))
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index b3c7b874a7a2..db236b494845 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -208,7 +208,7 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>   		if (--nr == 0)
>   			break;
>   		ptep++;
> -		pte = __pte(pte_val(pte) + PAGE_SIZE);
> +		pte = __pte(pte_val(pte) + (1UL << PTE_RPN_SHIFT));
>   		addr += PAGE_SIZE;
>   	}
>   }


What about:

void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
		pte_t pte, unsigned int nr)
{
	pgprot_t prot;
	unsigned long pfn;
	/*
	 * Make sure hardware valid bit is not set. We don't do
	 * tlb flush for this update.
	 */
	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));

	/* Note: mm->context.id might not yet have been assigned as
	 * this context might not have been activated yet when this
	 * is called.
	 */
	pte = set_pte_filter(pte);

	prot = pte_pgprot(pte);
	pfn = pte_pfn(pte);
	/* Perform the setting of the PTE */
	for (;;) {
		__set_pte_at(mm, addr, ptep, pfn_pte(pfn, prot), 0);
		if (--nr == 0)
			break;
		ptep++;
		pfn++;
		addr += PAGE_SIZE;
	}
}


Christophe

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 16/36] mips: Implement the new page table range API
  2023-03-17 15:29       ` Thomas Bogendoerfer
@ 2023-03-19 18:45         ` Thomas Bogendoerfer
  2023-03-19 20:16           ` Matthew Wilcox
  0 siblings, 1 reply; 138+ messages in thread
From: Thomas Bogendoerfer @ 2023-03-19 18:45 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-arch, linux-mm, linux-kernel, linux-mips

On Fri, Mar 17, 2023 at 04:29:20PM +0100, Thomas Bogendoerfer wrote:
> On Wed, Mar 15, 2023 at 08:33:21PM +0000, Matthew Wilcox wrote:
> > On Wed, Mar 15, 2023 at 11:50:22AM +0100, Thomas Bogendoerfer wrote:
> > > On Wed, Mar 15, 2023 at 05:14:24AM +0000, Matthew Wilcox (Oracle) wrote:
> > > > Rename _PFN_SHIFT to PFN_PTE_SHIFT.  Convert a few places
> > > > to call set_pte() instead of set_pte_at().  Add set_ptes(),
> > > > update_mmu_cache_range(), flush_icache_pages() and flush_dcache_folio().
> > > 
> > > /local/tbogendoerfer/korg/linux/mm/memory.c: In function ‘set_pte_range’:
> > > /local/tbogendoerfer/korg/linux/mm/memory.c:4290:2: error: implicit declaration of function ‘update_mmu_cache_range’ [-Werror=implicit-function-declaration]
> > >   update_mmu_cache_range(vma, addr, vmf->pte, nr);
> > > 
> > > update_mmu_cache_range() is missing in this patch.
> > 
> > Oops.  And mips was one of the arches I did a test build for!
> > 
> > Looks like we could try to gain some efficiency by passing 'nr' to
> > __update_tlb(), but as far as I can tell, that's only called for r3k and
> > r4k, so maybe it's not worth optimising at this point?
> 
> > Hmm, not sure if that would help. An R4k-style TLB has two PTEs mapped
> > per TLB entry, so by advancing per page, __update_tlb() is called more
> > often than needed.

BTW, how big is nr going to be? There are MIPS SoCs out there which
just have 16 TLB entries...

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 16/36] mips: Implement the new page table range API
  2023-03-19 18:45         ` Thomas Bogendoerfer
@ 2023-03-19 20:16           ` Matthew Wilcox
  2023-03-21 11:30             ` Thomas Bogendoerfer
  0 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox @ 2023-03-19 20:16 UTC (permalink / raw)
  To: Thomas Bogendoerfer; +Cc: linux-arch, linux-mm, linux-kernel, linux-mips

On Sun, Mar 19, 2023 at 07:45:36PM +0100, Thomas Bogendoerfer wrote:
> On Fri, Mar 17, 2023 at 04:29:20PM +0100, Thomas Bogendoerfer wrote:
> > Hmm, not sure if that would help. An R4k-style TLB has two PTEs mapped
> > per TLB entry, so by advancing per page, __update_tlb() is called more
> > often than needed.
> 
> BTW, how big is nr going to be? There are MIPS SoCs out there which
> just have 16 TLB entries...

Oof.  The biggest we're going to see for now is one less than PTRS_PER_PMD
(that'd be a PMD-sized allocation that's mapped askew with 1 page in
one PMD and n-1 pages in the adjacent PMD).  That'd be 511 on x86 and
I presume something similar on MIPS.  More than 16, for sure.
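
For concreteness, the x86 arithmetic behind that 511 (standard 4KiB pages,
nothing specific to this series):

#define PAGE_SHIFT	12	/* 4KiB pages */
#define PMD_SHIFT	21	/* one PMD maps 2MiB */
#define PTRS_PER_PMD	(1UL << (PMD_SHIFT - PAGE_SHIFT))	/* 512 */

/* worst-case askew mapping: nr = PTRS_PER_PMD - 1 = 511 pages */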

Now, this isn't a new problem with this patchset.  With fault-around,
we already call set_pte_at() N times.  And we don't say which ones are
speculative entries vs the one actually faulted in.

But let's see if we can fix it.  What if we passed in the vmf?  That would
give you the actual faulting address, so you'd know to only put the PTE
into the Linux page tables and not go as far as putting it into the TLB.
Open to other ideas.
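
A sketch of what I mean (hypothetical signature and helper, just to show
the shape):

static void update_mmu_cache_range_sketch(struct vm_fault *vmf,
		struct vm_area_struct *vma, unsigned long addr,
		pte_t *ptep, unsigned int nr)
{
	unsigned int i;

	/* set_ptes() already filled the page tables; only preload the
	 * TLB for the address that actually faulted. */
	for (i = 0; i < nr; i++, addr += PAGE_SIZE)
		if (vmf && addr == (vmf->address & PAGE_MASK))
			preload_tlb_entry(vma, addr, ptep[i]);	/* hypothetical */
}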

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-17  3:44             ` Matthew Wilcox
  2023-03-17  6:33               ` Yin, Fengwei
@ 2023-03-20 13:38               ` Yin, Fengwei
  2023-03-20 14:08                 ` Matthew Wilcox
  1 sibling, 1 reply; 138+ messages in thread
From: Yin, Fengwei @ 2023-03-20 13:38 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Ryan Roberts, linux-arch, will, linux-mm, linux-kernel

Hi Matthew,

On 3/17/2023 11:44 AM, Matthew Wilcox wrote:
> On Fri, Mar 17, 2023 at 09:58:17AM +0800, Yin, Fengwei wrote:
>>
>>
>> On 3/17/2023 1:52 AM, Matthew Wilcox wrote:
>>> On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote:
>>>> On 16/03/2023 16:23, Yin, Fengwei wrote:
>>>>>> I think you are changing behavior here - is this intentional? Previously this
>>>>>> would be evaluated per page, now it's evaluated once for the whole range. The
>>>>>> intention below is that directly faulted pages are mapped young and prefaulted
>>>>>> pages are mapped old. But now a whole range will be mapped the same.
>>>>>
>>>>> Yes. You are right here.
>>>>>
>>>>> Look at the prefault and cpu_has_hw_af for ARM64; it looks like we
>>>>> can avoid handling vmf->address == addr specially. It's OK to
>>>>> drop prefault and change the logic here a little bit to:
>>>>>   if (arch_wants_old_prefaulted_pte())
>>>>>       entry = pte_mkold(entry);
>>>>>   else
>>>>>       entry = pte_sw_mkyoung(entry);
>>>>>
>>>>> It's not necessary to use pte_sw_mkyoung for vmf->address == addr
>>>>> because HW will set the ACCESS bit in the page table entry.
>>>>>
>>>>> Add Will Deacon in case I missed something here. Thanks.
>>>>
>>>> I'll defer to Will's response, but not all arm HW supports HW access flag
>>>> management. In that case it's done by SW, so I would imagine that by setting
>>>> this to old initially, we will get a second fault to set the access bit, which
>>>> will slow things down. I wonder if you will need to split this into (up to) 3
>>>> calls to set_ptes()?
>>>
>>> I don't think we should do that.  The limited information I have from
>>> various microarchitectures is that the PTEs must differ only in their
>>> PFN bits in order to use larger TLB entries.  That includes the Accessed
>>> bit (or equivalent).  So we should mkyoung all the PTEs in the same
>>> folio, at least initially.
>>>
>>> That said, we should still do this conditionally.  We'll prefault some
>>> other folios too.  So I think this should be:
>>>
>>>         bool prefault = (addr > vmf->address) || ((addr + nr) < vmf->address);
>>>
>> According to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80, if the hardware
>> access flag is supported on ARM64, there is a benefit if prefaulted PTEs are
>> set as "old". If we change prefault like above, the PTEs are set as "young",
>> which loses that benefit on ARM64 with the hardware access flag.
>>
>> OTOH, if going from "old" to "young" is cheap, why not leave all PTEs of the
>> folio as "old" and let hardware update them to "young"?
> 
> Because we're tracking the entire folio as a single entity.  So we're
> better off avoiding the extra pagefaults to update the accessed bit,
> which won't actually give us any information (vmscan needs to know "were
> any of the accessed bits set", not "how many of them were set").
> 
> Anyway, hopefully Ryan can test this and let us know if it fixes the
> regression he sees.

Thanks a lot to Ryan for helping to test the debug patch I made.

Ryan confirmed that the following change could fix the kernel build regression:
diff --git a/mm/filemap.c b/mm/filemap.c
index db86e459dde6..343d6ff36b2c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3557,7 +3557,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,

                ret |= filemap_map_folio_range(vmf, folio,
                                xas.xa_index - folio->index, addr, nr_pages);
-               xas.xa_index += nr_pages;
+               xas.xa_index += folio_test_large(folio) ? nr_pages : 0;

                folio_unlock(folio);
                folio_put(folio);

I will make the upstream-able change "xas.xa_index += nr_pages - 1;". (The
subsequent xas_next_entry() call advances the index by one itself.)

Ryan and I also identified some other changes needed. I am not sure how to
integrate those changes into this series. Maybe an add-on patch after this
series? Thanks.

Regards
Yin, Fengwei

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-20 13:38               ` Yin, Fengwei
@ 2023-03-20 14:08                 ` Matthew Wilcox
  2023-03-21  1:58                   ` Yin, Fengwei
                                     ` (2 more replies)
  0 siblings, 3 replies; 138+ messages in thread
From: Matthew Wilcox @ 2023-03-20 14:08 UTC (permalink / raw)
  To: Yin, Fengwei; +Cc: Ryan Roberts, linux-arch, will, linux-mm, linux-kernel

On Mon, Mar 20, 2023 at 09:38:55PM +0800, Yin, Fengwei wrote:
> Thanks a lot to Ryan for helping to test the debug patch I made.
> 
> Ryan confirmed that the following change could fix the kernel build regression:
> diff --git a/mm/filemap.c b/mm/filemap.c
> index db86e459dde6..343d6ff36b2c 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3557,7 +3557,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
> 
>                 ret |= filemap_map_folio_range(vmf, folio,
>                                 xas.xa_index - folio->index, addr, nr_pages);
> -               xas.xa_index += nr_pages;
> +               xas.xa_index += folio_test_large(folio) ? nr_pages : 0;
> 
>                 folio_unlock(folio);
>                 folio_put(folio);
> 
> I will make the upstream-able change "xas.xa_index += nr_pages - 1;".

Thanks to both of you!

Really, we shouldn't need to interfere with xas.xa_index at all.
Does this work?

diff --git a/mm/filemap.c b/mm/filemap.c
index 8e4f95c5b65a..e40c967dde5f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3420,10 +3420,10 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct folio *folio,
 	return false;
 }
 
-static struct folio *next_uptodate_page(struct folio *folio,
-				       struct address_space *mapping,
-				       struct xa_state *xas, pgoff_t end_pgoff)
+static struct folio *next_uptodate_folio(struct xa_state *xas,
+		struct address_space *mapping, pgoff_t end_pgoff)
 {
+	struct folio *folio = xas_next_entry(xas, end_pgoff);
 	unsigned long max_idx;
 
 	do {
@@ -3461,22 +3461,6 @@ static struct folio *next_uptodate_page(struct folio *folio,
 	return NULL;
 }
 
-static inline struct folio *first_map_page(struct address_space *mapping,
-					  struct xa_state *xas,
-					  pgoff_t end_pgoff)
-{
-	return next_uptodate_page(xas_find(xas, end_pgoff),
-				  mapping, xas, end_pgoff);
-}
-
-static inline struct folio *next_map_page(struct address_space *mapping,
-					 struct xa_state *xas,
-					 pgoff_t end_pgoff)
-{
-	return next_uptodate_page(xas_next_entry(xas, end_pgoff),
-				  mapping, xas, end_pgoff);
-}
-
 /*
  * Map page range [start_page, start_page + nr_pages) of folio.
  * start_page is gotten from start by folio_page(folio, start)
@@ -3552,7 +3536,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	int nr_pages = 0;
 
 	rcu_read_lock();
-	folio = first_map_page(mapping, &xas, end_pgoff);
+	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
 	if (!folio)
 		goto out;
 
@@ -3574,11 +3558,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 
 		ret |= filemap_map_folio_range(vmf, folio,
 				xas.xa_index - folio->index, addr, nr_pages);
-		xas.xa_index += nr_pages;
 
 		folio_unlock(folio);
 		folio_put(folio);
-	} while ((folio = next_map_page(mapping, &xas, end_pgoff)) != NULL);
+		folio = next_uptodate_folio(&xas, mapping, end_pgoff);
+	} while (folio);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
 	rcu_read_unlock();

> Ryan and I also identified some other changes needed. I am not sure how to
> integrate those changes into this series. Maybe an add-on patch after this
> series? Thanks.

Up to you; I'm happy to integrate fixup patches into the current series
or add on new ones.

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-20 14:08                 ` Matthew Wilcox
@ 2023-03-21  1:58                   ` Yin, Fengwei
  2023-03-21  5:13                   ` Yin Fengwei
  2023-05-30  8:07                   ` [PATCH 0/4] New page table range API fixup patches Yin Fengwei
  2 siblings, 0 replies; 138+ messages in thread
From: Yin, Fengwei @ 2023-03-21  1:58 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Ryan Roberts, linux-arch, will, linux-mm, linux-kernel



On 3/20/2023 10:08 PM, Matthew Wilcox wrote:
> On Mon, Mar 20, 2023 at 09:38:55PM +0800, Yin, Fengwei wrote:
>> Thanks a lot to Ryan for helping to test the debug patch I made.
>>
>> Ryan confirmed that the following change could fix the kernel build regression:
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index db86e459dde6..343d6ff36b2c 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -3557,7 +3557,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>>
>>                 ret |= filemap_map_folio_range(vmf, folio,
>>                                 xas.xa_index - folio->index, addr, nr_pages);
>> -               xas.xa_index += nr_pages;
>> +               xas.xa_index += folio_test_large(folio) ? nr_pages : 0;
>>
>>                 folio_unlock(folio);
>>                 folio_put(folio);
>>
>> I will make the upstream-able change "xas.xa_index += nr_pages - 1;"
> 
> Thanks to both of you!
> 
> Really, we shouldn't need to interfere with xas.xa_index at all.
> Does this work?
I will give this a try and let you know the result.

> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 8e4f95c5b65a..e40c967dde5f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3420,10 +3420,10 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct folio *folio,
>  	return false;
>  }
>  
> -static struct folio *next_uptodate_page(struct folio *folio,
> -				       struct address_space *mapping,
> -				       struct xa_state *xas, pgoff_t end_pgoff)
> +static struct folio *next_uptodate_folio(struct xa_state *xas,
> +		struct address_space *mapping, pgoff_t end_pgoff)
>  {
> +	struct folio *folio = xas_next_entry(xas, end_pgoff);
>  	unsigned long max_idx;
>  
>  	do {
> @@ -3461,22 +3461,6 @@ static struct folio *next_uptodate_page(struct folio *folio,
>  	return NULL;
>  }
>  
> -static inline struct folio *first_map_page(struct address_space *mapping,
> -					  struct xa_state *xas,
> -					  pgoff_t end_pgoff)
> -{
> -	return next_uptodate_page(xas_find(xas, end_pgoff),
> -				  mapping, xas, end_pgoff);
> -}
> -
> -static inline struct folio *next_map_page(struct address_space *mapping,
> -					 struct xa_state *xas,
> -					 pgoff_t end_pgoff)
> -{
> -	return next_uptodate_page(xas_next_entry(xas, end_pgoff),
> -				  mapping, xas, end_pgoff);
> -}
> -
>  /*
>   * Map page range [start_page, start_page + nr_pages) of folio.
>   * start_page is gotten from start by folio_page(folio, start)
> @@ -3552,7 +3536,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>  	int nr_pages = 0;
>  
>  	rcu_read_lock();
> -	folio = first_map_page(mapping, &xas, end_pgoff);
> +	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
>  	if (!folio)
>  		goto out;
>  
> @@ -3574,11 +3558,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>  
>  		ret |= filemap_map_folio_range(vmf, folio,
>  				xas.xa_index - folio->index, addr, nr_pages);
> -		xas.xa_index += nr_pages;
>  
>  		folio_unlock(folio);
>  		folio_put(folio);
> -	} while ((folio = next_map_page(mapping, &xas, end_pgoff)) != NULL);
> +		folio = next_uptodate_folio(&xas, mapping, end_pgoff);
> +	} while (folio);
>  	pte_unmap_unlock(vmf->pte, vmf->ptl);
>  out:
>  	rcu_read_unlock();
> 
>> Ryan and I also identified some other changes that are needed. I am not
>> sure how to integrate those changes into this series. Maybe an add-on
>> patch after this series? Thanks.
> 
> Up to you; I'm happy to integrate fixup patches into the current series
> or add on new ones.
Integrating them into the current series would be better, as that doesn't
break bisection. I will share the changes Ryan and I have after verifying
the change you proposed above. Thanks.


Regards
Yin, Fengwei


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-20 14:08                 ` Matthew Wilcox
  2023-03-21  1:58                   ` Yin, Fengwei
@ 2023-03-21  5:13                   ` Yin Fengwei
  2023-05-30  8:07                   ` [PATCH 0/4] New page table range API fixup patches Yin Fengwei
  2 siblings, 0 replies; 138+ messages in thread
From: Yin Fengwei @ 2023-03-21  5:13 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Ryan Roberts, linux-arch, will, linux-mm, linux-kernel

On 3/20/23 22:08, Matthew Wilcox wrote:
> On Mon, Mar 20, 2023 at 09:38:55PM +0800, Yin, Fengwei wrote:
>> Thanks a lot to Ryan for helping to test the debug patch I made.
>>
>> Ryan confirmed that the following change could fix the kernel build regression:
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index db86e459dde6..343d6ff36b2c 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -3557,7 +3557,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>>
>>                  ret |= filemap_map_folio_range(vmf, folio,
>>                                  xas.xa_index - folio->index, addr, nr_pages);
>> -               xas.xa_index += nr_pages;
>> +               xas.xa_index += folio_test_large(folio) ? nr_pages : 0;
>>
>>                  folio_unlock(folio);
>>                  folio_put(folio);
>>
>> I will make the upstream-able change "xas.xa_index += nr_pages - 1;"
> 
> Thanks to both of you!
> 
> Really, we shouldn't need to interfere with xas.xa_index at all.
> Does this work?
Yes. This works perfectly on my side. Thanks.

Regards
Yin, Fengwei

> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 8e4f95c5b65a..e40c967dde5f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3420,10 +3420,10 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct folio *folio,
>   	return false;
>   }
>   
> -static struct folio *next_uptodate_page(struct folio *folio,
> -				       struct address_space *mapping,
> -				       struct xa_state *xas, pgoff_t end_pgoff)
> +static struct folio *next_uptodate_folio(struct xa_state *xas,
> +		struct address_space *mapping, pgoff_t end_pgoff)
>   {
> +	struct folio *folio = xas_next_entry(xas, end_pgoff);
>   	unsigned long max_idx;
>   
>   	do {
> @@ -3461,22 +3461,6 @@ static struct folio *next_uptodate_page(struct folio *folio,
>   	return NULL;
>   }
>   
> -static inline struct folio *first_map_page(struct address_space *mapping,
> -					  struct xa_state *xas,
> -					  pgoff_t end_pgoff)
> -{
> -	return next_uptodate_page(xas_find(xas, end_pgoff),
> -				  mapping, xas, end_pgoff);
> -}
> -
> -static inline struct folio *next_map_page(struct address_space *mapping,
> -					 struct xa_state *xas,
> -					 pgoff_t end_pgoff)
> -{
> -	return next_uptodate_page(xas_next_entry(xas, end_pgoff),
> -				  mapping, xas, end_pgoff);
> -}
> -
>   /*
>    * Map page range [start_page, start_page + nr_pages) of folio.
>    * start_page is gotten from start by folio_page(folio, start)
> @@ -3552,7 +3536,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>   	int nr_pages = 0;
>   
>   	rcu_read_lock();
> -	folio = first_map_page(mapping, &xas, end_pgoff);
> +	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
>   	if (!folio)
>   		goto out;
>   
> @@ -3574,11 +3558,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>   
>   		ret |= filemap_map_folio_range(vmf, folio,
>   				xas.xa_index - folio->index, addr, nr_pages);
> -		xas.xa_index += nr_pages;
>   
>   		folio_unlock(folio);
>   		folio_put(folio);
> -	} while ((folio = next_map_page(mapping, &xas, end_pgoff)) != NULL);
> +		folio = next_uptodate_folio(&xas, mapping, end_pgoff);
> +	} while (folio);
>   	pte_unmap_unlock(vmf->pte, vmf->ptl);
>   out:
>   	rcu_read_unlock();
> 
>> Ryan and I also identified some other changes that are needed. I am not
>> sure how to integrate those changes into this series. Maybe an add-on
>> patch after this series? Thanks.
> 
> Up to you; I'm happy to integrate fixup patches into the current series
> or add on new ones.


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 16/36] mips: Implement the new page table range API
  2023-03-19 20:16           ` Matthew Wilcox
@ 2023-03-21 11:30             ` Thomas Bogendoerfer
  0 siblings, 0 replies; 138+ messages in thread
From: Thomas Bogendoerfer @ 2023-03-21 11:30 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-arch, linux-mm, linux-kernel, linux-mips

On Sun, Mar 19, 2023 at 08:16:36PM +0000, Matthew Wilcox wrote:
> On Sun, Mar 19, 2023 at 07:45:36PM +0100, Thomas Bogendoerfer wrote:
> > On Fri, Mar 17, 2023 at 04:29:20PM +0100, Thomas Bogendoerfer wrote:
> > > hmm, not sure if that would help. R4k style TLB has two PTEs mapped
> > > per TLB entry. So by advancing per page __update_tlb() is called more
> > > often than needed.
> > 
> > BTW, how big is nr going to be? There are MIPS SoCs out there which
> > have just 16 TLB entries...
> 
> Oof.  The biggest we're going to see for now is one less than PTRS_PER_PMD
> (that'd be a PMD-sized allocation that's mapped askew with 1 page in
> one PMD and n-1 pages in the adjacent PMD).  That'd be 511 on x86 and
> I presume something similar on MIPS.  More than 16, for sure.

The biggest TLB I could find has 256 entries, which can map 512 pages.

> Now, this isn't a new problem with this patchset.  With fault-around,
> we already call set_pte_at() N times.  And we don't say which ones are
> speculative entries vs the one actually faulted in.

I see.

> But let's see if we can fix it.  What if we passed in the vmf?  That would
> give you the actual faulting address, so you'd know to only put the PTE
> into the Linux page tables and not go as far as putting it into the TLB.
> Open to other ideas.

That would help to optimize that case. But update_mmu_cache_range() needs to
do __update_tlb() for every page to avoid stale data in the TLB. If I
understood correctly, only the way TLB updates are done has changed, so there
shouldn't be a performance regression. Optimizations like moving the loop
over the pages into __update_tlb() could be done in a second step.
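
To make that concrete, the per-page pattern being discussed is roughly the
following (a sketch only; the final MIPS conversion and the exact
__update_tlb() signature may differ):

	static inline void update_mmu_cache_range(struct vm_area_struct *vma,
			unsigned long address, pte_t *ptep, unsigned int nr)
	{
		for (;;) {
			__update_tlb(vma, address, *ptep);
			if (--nr == 0)
				break;
			ptep++;
			address += PAGE_SIZE;
		}
	}

Folding the loop into __update_tlb() later would let the R4k code handle
the two PTEs that share one TLB entry in a single pass instead of touching
each page separately.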

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 01/36] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set()
  2023-03-15  5:14 ` [PATCH v4 01/36] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set() Matthew Wilcox (Oracle)
  2023-03-15  9:21   ` Mike Rapoport
@ 2023-03-23 18:36   ` Pasha Tatashin
  2023-05-25  2:16   ` Anshuman Khandual
  2 siblings, 0 replies; 138+ messages in thread
From: Pasha Tatashin @ 2023-03-23 18:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel

On Wed, Mar 15, 2023 at 1:15 AM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> Tell the page table check how many PTEs & PFNs we want it to check.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Thanks,
Pasha

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-17  8:19                   ` Yin, Fengwei
  2023-03-17 13:00                     ` Ryan Roberts
@ 2023-03-24 14:58                     ` Will Deacon
  2023-03-24 15:11                       ` Matthew Wilcox
  1 sibling, 1 reply; 138+ messages in thread
From: Will Deacon @ 2023-03-24 14:58 UTC (permalink / raw)
  To: Yin, Fengwei
  Cc: Ryan Roberts, Matthew Wilcox, linux-arch, linux-mm, linux-kernel

On Fri, Mar 17, 2023 at 04:19:44PM +0800, Yin, Fengwei wrote:
> 
> 
> On 3/17/2023 4:00 PM, Ryan Roberts wrote:
> > On 17/03/2023 06:33, Yin, Fengwei wrote:
> >>
> >>
> >> On 3/17/2023 11:44 AM, Matthew Wilcox wrote:
> >>> On Fri, Mar 17, 2023 at 09:58:17AM +0800, Yin, Fengwei wrote:
> >>>>
> >>>>
> >>>> On 3/17/2023 1:52 AM, Matthew Wilcox wrote:
> >>>>> On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote:
> >>>>>> On 16/03/2023 16:23, Yin, Fengwei wrote:
> >>>>>>>> I think you are changing behavior here - is this intentional? Previously this
> >>>>>>>> would be evaluated per page, now its evaluated once for the whole range. The
> >>>>>>>> intention below is that directly faulted pages are mapped young and prefaulted
> >>>>>>>> pages are mapped old. But now a whole range will be mapped the same.
> >>>>>>>
> >>>>>>> Yes. You are right here.
> >>>>>>>
> >>>>>>> Looking at the prefault logic and cpu_has_hw_af for ARM64, it looks
> >>>>>>> like we can avoid handling vmf->address == addr specially. It's OK to
> >>>>>>> drop prefault and change the logic here a little bit to:
> >>>>>>>   if (arch_wants_old_prefaulted_pte())
> >>>>>>>       entry = pte_mkold(entry);
> >>>>>>>   else
> >>>>>>>       entry = pte_sw_mkyoung(entry);
> >>>>>>>
> >>>>>>> It's not necessary to use pte_sw_mkyoung for vmf->address == addr
> >>>>>>> because HW will set the ACCESS bit in the page table entry.
> >>>>>>>
> >>>>>>> Add Will Deacon in case I missed something here. Thanks.
> >>>>>>
> >>>>>> I'll defer to Will's response, but not all arm HW supports HW access flag
> >>>>>> management. In that case it's done by SW, so I would imagine that by setting
> >>>>>> this to old initially, we will get a second fault to set the access bit, which
> >>>>>> will slow things down. I wonder if you will need to split this into (up to) 3
> >>>>>> calls to set_ptes()?
> >>>>>
> >>>>> I don't think we should do that.  The limited information I have from
> >>>>> various microarchitectures is that the PTEs must differ only in their
> >>>>> PFN bits in order to use larger TLB entries.  That includes the Accessed
> >>>>> bit (or equivalent).  So we should mkyoung all the PTEs in the same
> >>>>> folio, at least initially.
> >>>>>
> >>>>> That said, we should still do this conditionally.  We'll prefault some
> >>>>> other folios too.  So I think this should be:
> >>>>>
> >>>>>         bool prefault = (addr > vmf->address) || ((addr + nr) < vmf->address);
> >>>>>
> >>>> According to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80, if the hardware
> >>>> access flag is supported on ARM64, there is a benefit if prefaulted PTEs are set
> >>>> as "old". If we change prefault like above, the PTEs are set as "young", which
> >>>> loses that benefit on ARM64 with the hardware access flag.
> >>>>
> >>>> OTOH, if going from "old" to "young" is cheap, why not leave all PTEs of the
> >>>> folio as "old" and let the hardware update them to "young"?
> >>>
> >>> Because we're tracking the entire folio as a single entity.  So we're
> >>> better off avoiding the extra pagefaults to update the accessed bit,
> >>> which won't actually give us any information (vmscan needs to know "were
> >>> any of the accessed bits set", not "how many of them were set").
> >> There are no extra page faults to update the accessed bit. There are three cases here:
> >> 1. hardware supports the access flag and "old" to "young" is cheap, with no extra fault
> >> 2. hardware supports the access flag but "old" to "young" is expensive, with no extra fault
> >> 3. no hardware support for the access flag ("old" to "young" takes extra page faults; expensive)
> >>
> >> For #2 and #3, going from "old" to "young" is expensive, so we always set PTEs "young"
> >> at page fault time.
> >> For #1, going from "old" to "young" is cheap, so it's OK to set PTEs "old" at page fault
> >> time and let the hardware set them to "young" on memory access. Actually, ARM64 with the
> >> hardware access bit requires setting PTEs "old".
> > 
> > Your logic makes sense, but it doesn't take into account the HPA
> > micro-architectural feature present in some ARM CPUs. HPA can transparently
> > coalesce multiple pages into a single TLB entry when certain conditions are met
> > (roughly; up to 4 pages physically and virtually contiguous and all within a
> > 4-page natural alignment). But as Matthew says, this works out better when all
> > pte attributes (including access and dirty) match. Given the reason for setting
> > the prefault pages to old is so that vmscan can do a better job of finding cold
> > pages, and given vmscan will now be looking for folios and not individual pages
> > (I assume?), I agree with Matthew that we should make whole folios young or old.
> > It will marginally increase our chances of the access and dirty bits being
> > consistent across the whole 4-page block that the HW tries to coalesce. If we
> > unconditionally make everything old, the hw will set accessed for the single
> > page that faulted, and we therefore don't have consistency for that 4-page block.
> My concern was that the benefit of "old" PTEs on ARM64 with the hardware access bit
> will be lost. The workloads (application launch latency and direct reclaim, according
> to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80) could show a regression with this
> series. Thanks.

Yes, please don't fault everything in as young as it has caused horrible
vmscan behaviour leading to app-startup slowdown in the past:

https://lore.kernel.org/all/20210111140149.GB7642@willie-the-truck/

If we have to use the same value for all the ptes, then just base them
all on arch_wants_old_prefaulted_pte() as iirc hardware AF was pretty
cheap in practice for us.

Cheers,

Will

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-24 14:58                     ` Will Deacon
@ 2023-03-24 15:11                       ` Matthew Wilcox
  2023-03-24 17:23                         ` Will Deacon
  0 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox @ 2023-03-24 15:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: Yin, Fengwei, Ryan Roberts, linux-arch, linux-mm, linux-kernel

On Fri, Mar 24, 2023 at 02:58:29PM +0000, Will Deacon wrote:
> Yes, please don't fault everything in as young as it has caused horrible
> vmscan behaviour leading to app-startup slowdown in the past:
> 
> https://lore.kernel.org/all/20210111140149.GB7642@willie-the-truck/
> 
> If we have to use the same value for all the ptes, then just base them
> all on arch_wants_old_prefaulted_pte() as iirc hardware AF was pretty
> cheap in practice for us.

I think that's wrong, because this is a different scenario.

Before:

We faulted in N single-page folios.  Each page/folio is tracked
independently.  That's N entries on whatever LRU list it ends up on.
The prefaulted ones _should_ be marked old -- they haven't been
accessed; we've just decided to put them in the page tables to
speed up faultaround.  The unaccessed pages need to fall off the LRU
list as quickly as possible; keeping them around only hurts if the
workload has no locality of reference.

After:

We fault in N folios, some possibly consisting of multiple pages.
Each folio is tracked separately, but individual pages in the folio
are not tracked; they belong to their folio.  In this scenario, if
the other PTEs for pages in the same folio are marked as young or old
doesn't matter; the entire folio will be tracked as young, because we
referenced one of the pages in this folio.  Marking the other PTEs as
young actually helps because we don't take pagefaults on them (whether
we have a HW or SW accessed bit).

(Can I just say that I dislike how we mix up the old/young and
accessed/not-accessed terminology here?)

We should still mark the PTEs referencing unaccessed folios as old.
No argument there, and this patch does that.  But it's fine for all the
PTEs referencing the accessed folio to have the young bit, at least as
far as I can tell.
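
Expressed as code, the behaviour argued for here would look something like
this inside set_pte_range() (a sketch, not the actual patch; note that the
prefault test quoted earlier in the thread needs nr scaled by PAGE_SIZE,
since addr is a byte address):

	/* Prefault iff the faulting address lies outside this folio's range. */
	bool prefault = (vmf->address < addr) ||
			(vmf->address >= addr + nr * PAGE_SIZE);

	if (prefault && arch_wants_old_prefaulted_pte())
		entry = pte_mkold(entry);
	else
		entry = pte_sw_mkyoung(entry);

All nr PTEs of the folio then carry the same young/old state, which keeps
the accessed bits consistent for TLB coalescing while still letting vmscan
age folios that were never touched.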

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-24 15:11                       ` Matthew Wilcox
@ 2023-03-24 17:23                         ` Will Deacon
  2023-03-27  1:23                           ` Yin Fengwei
  0 siblings, 1 reply; 138+ messages in thread
From: Will Deacon @ 2023-03-24 17:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yin, Fengwei, Ryan Roberts, linux-arch, linux-mm, linux-kernel

On Fri, Mar 24, 2023 at 03:11:00PM +0000, Matthew Wilcox wrote:
> On Fri, Mar 24, 2023 at 02:58:29PM +0000, Will Deacon wrote:
> > Yes, please don't fault everything in as young as it has caused horrible
> > vmscan behaviour leading to app-startup slowdown in the past:
> > 
> > https://lore.kernel.org/all/20210111140149.GB7642@willie-the-truck/
> > 
> > If we have to use the same value for all the ptes, then just base them
> > all on arch_wants_old_prefaulted_pte() as iirc hardware AF was pretty
> > cheap in practice for us.
> 
> I think that's wrong, because this is a different scenario.
> 
> Before:
> 
> We faulted in N single-page folios.  Each page/folio is tracked
> independently.  That's N entries on whatever LRU list it ends up on.
> The prefaulted ones _should_ be marked old -- they haven't been
> accessed; we've just decided to put them in the page tables to
> speed up faultaround.  The unaccessed pages need to fall off the LRU
> list as quickly as possible; keeping them around only hurts if the
> workload has no locality of reference.
> 
> After:
> 
> We fault in N folios, some possibly consisting of multiple pages.
> Each folio is tracked separately, but individual pages in the folio
> are not tracked; they belong to their folio.  In this scenario, if
> the other PTEs for pages in the same folio are marked as young or old
> doesn't matter; the entire folio will be tracked as young, because we
> referenced one of the pages in this folio.  Marking the other PTEs as
> young actually helps because we don't take pagefaults on them (whether
> we have a HW or SW accessed bit).
> 
> (Can I just say that I dislike how we mix up the old/young and
> accessed/not-accessed terminology here?)
> 
> We should still mark the PTEs referencing unaccessed folios as old.
> No argument there, and this patch does that.  But it's fine for all the
> PTEs referencing the accessed folio to have the young bit, at least as
> far as I can tell.

OK, thanks for the explanation. So as long as
arch_wants_old_prefaulted_pte() is taken into account for the unaccessed
folios, I think we should be good? Unconditionally marking those
PTEs as old probably hurts x86.

Will

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
  2023-03-24 17:23                         ` Will Deacon
@ 2023-03-27  1:23                           ` Yin Fengwei
  0 siblings, 0 replies; 138+ messages in thread
From: Yin Fengwei @ 2023-03-27  1:23 UTC (permalink / raw)
  To: Will Deacon, Matthew Wilcox
  Cc: Ryan Roberts, linux-arch, linux-mm, linux-kernel

On 3/25/23 01:23, Will Deacon wrote:
> On Fri, Mar 24, 2023 at 03:11:00PM +0000, Matthew Wilcox wrote:
>> On Fri, Mar 24, 2023 at 02:58:29PM +0000, Will Deacon wrote:
>>> Yes, please don't fault everything in as young as it has caused horrible
>>> vmscan behaviour leading to app-startup slowdown in the past:
>>>
>>> https://lore.kernel.org/all/20210111140149.GB7642@willie-the-truck/
>>>
>>> If we have to use the same value for all the ptes, then just base them
>>> all on arch_wants_old_prefaulted_pte() as iirc hardware AF was pretty
>>> cheap in practice for us.
>>
>> I think that's wrong, because this is a different scenario.
>>
>> Before:
>>
>> We faulted in N single-page folios.  Each page/folio is tracked
>> independently.  That's N entries on whatever LRU list it ends up on.
>> The prefaulted ones _should_ be marked old -- they haven't been
>> accessed; we've just decided to put them in the page tables to
>> speed up faultaround.  The unaccessed pages need to fall off the LRU
>> list as quickly as possible; keeping them around only hurts if the
>> workload has no locality of reference.
>>
>> After:
>>
>> We fault in N folios, some possibly consisting of multiple pages.
>> Each folio is tracked separately, but individual pages in the folio
>> are not tracked; they belong to their folio.  In this scenario, if
>> the other PTEs for pages in the same folio are marked as young or old
>> doesn't matter; the entire folio will be tracked as young, because we
>> referenced one of the pages in this folio.  Marking the other PTEs as
>> young actually helps because we don't take pagefaults on them (whether
>> we have a HW or SW accessed bit).
>>
>> (Can I just say that I dislike how we mix up the old/young and
>> accessed/not-accessed terminology here?)
>>
>> We should still mark the PTEs referencing unaccessed folios as old.
>> No argument there, and this patch does that.  But it's fine for all the
>> PTEs referencing the accessed folio to have the young bit, at least as
>> far as I can tell.
> 
> Ok, thanks for the explanation. So as long as
> arch_wants_old_prefaulted_pte() is taken into account for the unaccessed
> folios, then I think we should be good? Unconditionally marking those
> PTEs as old probably hurts x86.
Yes. We only mark PTEs old on arch_wants_old_prefaulted_pte()
systems. Thanks.


Regards
Yin, Fengwei

> 
> Will


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 01/36] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set()
  2023-03-15  5:14 ` [PATCH v4 01/36] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set() Matthew Wilcox (Oracle)
  2023-03-15  9:21   ` Mike Rapoport
  2023-03-23 18:36   ` Pasha Tatashin
@ 2023-05-25  2:16   ` Anshuman Khandual
  2 siblings, 0 replies; 138+ messages in thread
From: Anshuman Khandual @ 2023-05-25  2:16 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel



On 3/15/23 10:44, Matthew Wilcox (Oracle) wrote:
> Tell the page table check how many PTEs & PFNs we want it to check.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  arch/arm64/include/asm/pgtable.h |  2 +-
>  arch/riscv/include/asm/pgtable.h |  2 +-
>  arch/x86/include/asm/pgtable.h   |  2 +-
>  include/linux/page_table_check.h | 14 +++++++-------
>  mm/page_table_check.c            | 14 ++++++++------
>  5 files changed, 18 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 0bd18de9fd97..9428748f4691 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -358,7 +358,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>  			      pte_t *ptep, pte_t pte)
>  {
> -	page_table_check_pte_set(mm, addr, ptep, pte);
> +	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
>  	return __set_pte_at(mm, addr, ptep, pte);
>  }
>  
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index ab05f892d317..b516f3b59616 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -459,7 +459,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
>  static inline void set_pte_at(struct mm_struct *mm,
>  	unsigned long addr, pte_t *ptep, pte_t pteval)
>  {
> -	page_table_check_pte_set(mm, addr, ptep, pteval);
> +	page_table_check_ptes_set(mm, addr, ptep, pteval, 1);
>  	__set_pte_at(mm, addr, ptep, pteval);
>  }
>  
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 15ae4d6ba476..1031025730d0 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -1022,7 +1022,7 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
>  static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>  			      pte_t *ptep, pte_t pte)
>  {
> -	page_table_check_pte_set(mm, addr, ptep, pte);
> +	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
>  	set_pte(ptep, pte);
>  }
>  
> diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
> index 01e16c7696ec..ba269c7009e4 100644
> --- a/include/linux/page_table_check.h
> +++ b/include/linux/page_table_check.h
> @@ -20,8 +20,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
>  				  pmd_t pmd);
>  void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
>  				  pud_t pud);
> -void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
> -				pte_t *ptep, pte_t pte);
> +void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
> +				pte_t *ptep, pte_t pte, unsigned int nr);
>  void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
>  				pmd_t *pmdp, pmd_t pmd);
>  void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
> @@ -73,14 +73,14 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm,
>  	__page_table_check_pud_clear(mm, addr, pud);
>  }
>  
> -static inline void page_table_check_pte_set(struct mm_struct *mm,
> +static inline void page_table_check_ptes_set(struct mm_struct *mm,
>  					    unsigned long addr, pte_t *ptep,
> -					    pte_t pte)
> +					    pte_t pte, unsigned int nr)
>  {
>  	if (static_branch_likely(&page_table_check_disabled))
>  		return;
>  
> -	__page_table_check_pte_set(mm, addr, ptep, pte);
> +	__page_table_check_ptes_set(mm, addr, ptep, pte, nr);
>  }
>  
>  static inline void page_table_check_pmd_set(struct mm_struct *mm,
> @@ -138,9 +138,9 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm,
>  {
>  }
>  
> -static inline void page_table_check_pte_set(struct mm_struct *mm,
> +static inline void page_table_check_ptes_set(struct mm_struct *mm,
>  					    unsigned long addr, pte_t *ptep,
> -					    pte_t pte)
> +					    pte_t pte, unsigned int nr)
>  {
>  }
>  
> diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> index 25d8610c0042..e6f4d40caaa2 100644
> --- a/mm/page_table_check.c
> +++ b/mm/page_table_check.c
> @@ -184,20 +184,22 @@ void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
>  }
>  EXPORT_SYMBOL(__page_table_check_pud_clear);
>  
> -void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
> -				pte_t *ptep, pte_t pte)
> +void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
> +				pte_t *ptep, pte_t pte, unsigned int nr)
>  {
> +	unsigned int i;
> +
>  	if (&init_mm == mm)
>  		return;
>  
> -	__page_table_check_pte_clear(mm, addr, *ptep);
> +	for (i = 0; i < nr; i++)
> +		__page_table_check_pte_clear(mm, addr, ptep[i]);
>  	if (pte_user_accessible_page(pte)) {
> -		page_table_check_set(mm, addr, pte_pfn(pte),
> -				     PAGE_SIZE >> PAGE_SHIFT,
> +		page_table_check_set(mm, addr, pte_pfn(pte), nr,
>  				     pte_write(pte));
>  	}
>  }
> -EXPORT_SYMBOL(__page_table_check_pte_set);
> +EXPORT_SYMBOL(__page_table_check_ptes_set);
>  
>  void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
>  				pmd_t *pmdp, pmd_t pmd)

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 02/36] mm: Add generic flush_icache_pages() and documentation
  2023-03-15  5:14 ` [PATCH v4 02/36] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
  2023-03-15  9:27   ` Mike Rapoport
@ 2023-05-25  2:23   ` Anshuman Khandual
  1 sibling, 0 replies; 138+ messages in thread
From: Anshuman Khandual @ 2023-05-25  2:23 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel



On 3/15/23 10:44, Matthew Wilcox (Oracle) wrote:
> flush_icache_page() is deprecated but not yet removed, so add
> a range version of it.  Change the documentation to refer to
> update_mmu_cache_range() instead of update_mmu_cache().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  Documentation/core-api/cachetlb.rst | 35 +++++++++++++++--------------
>  include/asm-generic/cacheflush.h    |  5 +++++
>  2 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
> index 5c0552e78c58..d4c9e2a28d36 100644
> --- a/Documentation/core-api/cachetlb.rst
> +++ b/Documentation/core-api/cachetlb.rst
> @@ -88,13 +88,13 @@ changes occur:
>  
>  	This is used primarily during fault processing.
>  
> -5) ``void update_mmu_cache(struct vm_area_struct *vma,
> -   unsigned long address, pte_t *ptep)``
> +5) ``void update_mmu_cache_range(struct vm_area_struct *vma,
> +   unsigned long address, pte_t *ptep, unsigned int nr)``
>  
> -	At the end of every page fault, this routine is invoked to
> -	tell the architecture specific code that a translation
> -	now exists at virtual address "address" for address space
> -	"vma->vm_mm", in the software page tables.
> +	At the end of every page fault, this routine is invoked to tell
> +	the architecture specific code that translations now exist
> +	in the software page tables for address space "vma->vm_mm"
> +	at virtual address "address" for "nr" consecutive pages.
>  
>  	A port may use this information in any way it so chooses.
>  	For example, it could use this event to pre-load TLB
> @@ -306,17 +306,18 @@ maps this page at its virtual address.
>  	private".  The kernel guarantees that, for pagecache pages, it will
>  	clear this bit when such a page first enters the pagecache.
>  
> -	This allows these interfaces to be implemented much more efficiently.
> -	It allows one to "defer" (perhaps indefinitely) the actual flush if
> -	there are currently no user processes mapping this page.  See sparc64's
> -	flush_dcache_page and update_mmu_cache implementations for an example
> -	of how to go about doing this.
> +	This allows these interfaces to be implemented much more
> +	efficiently.  It allows one to "defer" (perhaps indefinitely) the
> +	actual flush if there are currently no user processes mapping this
> +	page.  See sparc64's flush_dcache_page and update_mmu_cache_range
> +	implementations for an example of how to go about doing this.
>  
> -	The idea is, first at flush_dcache_page() time, if page_file_mapping()
> -	returns a mapping, and mapping_mapped on that mapping returns %false,
> -	just mark the architecture private page flag bit.  Later, in
> -	update_mmu_cache(), a check is made of this flag bit, and if set the
> -	flush is done and the flag bit is cleared.
> +	The idea is, first at flush_dcache_page() time, if
> +	page_file_mapping() returns a mapping, and mapping_mapped on that
> +	mapping returns %false, just mark the architecture private page
> +	flag bit.  Later, in update_mmu_cache_range(), a check is made
> +	of this flag bit, and if set the flush is done and the flag bit
> +	is cleared.
>  
>  	.. important::
>  
> @@ -369,7 +370,7 @@ maps this page at its virtual address.
>    ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``
>  
>  	All the functionality of flush_icache_page can be implemented in
> -	flush_dcache_page and update_mmu_cache. In the future, the hope
> +	flush_dcache_page and update_mmu_cache_range. In the future, the hope
>  	is to remove this interface completely.
>  
>  The final category of APIs is for I/O to deliberately aliased address
> diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
> index f46258d1a080..09d51a680765 100644
> --- a/include/asm-generic/cacheflush.h
> +++ b/include/asm-generic/cacheflush.h
> @@ -78,6 +78,11 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
>  #endif
>  
>  #ifndef flush_icache_page
> +static inline void flush_icache_pages(struct vm_area_struct *vma,
> +				     struct page *page, unsigned int nr)
> +{
> +}
> +
>  static inline void flush_icache_page(struct vm_area_struct *vma,
>  				     struct page *page)
>  {

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 03/36] mm: Add folio_flush_mapping()
  2023-03-15  5:14 ` [PATCH v4 03/36] mm: Add folio_flush_mapping() Matthew Wilcox (Oracle)
  2023-03-15  9:28   ` Mike Rapoport
@ 2023-05-25  2:35   ` Anshuman Khandual
  1 sibling, 0 replies; 138+ messages in thread
From: Anshuman Khandual @ 2023-05-25  2:35 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel



On 3/15/23 10:44, Matthew Wilcox (Oracle) wrote:
> This is the folio equivalent of page_mapping_file(), but rename it
> to make it clear that it's very different from page_file_mapping().
> Theoretically, there's nothing flush-only about it, but there are no
> other users today, and I doubt there will be; it's almost always more
> useful to know the swapfile's mapping or the swapcache's mapping.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  include/linux/pagemap.h | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index a56308a9d1a4..e56c2023aa0e 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -369,6 +369,26 @@ static inline struct address_space *folio_file_mapping(struct folio *folio)
>  	return folio->mapping;
>  }
>  
> +/**
> + * folio_flush_mapping - Find the file mapping this folio belongs to.
> + * @folio: The folio.
> + *
> + * For folios which are in the page cache, return the mapping that this
> + * page belongs to.  Anonymous folios return NULL, even if they're in
> + * the swap cache.  Other kinds of folio also return NULL.
> + *
> + * This is ONLY used by architecture cache flushing code.  If you aren't
> + * writing cache flushing code, you want either folio_mapping() or
> + * folio_file_mapping().
> + */
> +static inline struct address_space *folio_flush_mapping(struct folio *folio)
> +{
> +	if (unlikely(folio_test_swapcache(folio)))
> +		return NULL;
> +
> +	return folio_mapping(folio);
> +}
> +
>  static inline struct address_space *page_file_mapping(struct page *page)
>  {
>  	return folio_file_mapping(page_folio(page));
> @@ -379,11 +399,7 @@ static inline struct address_space *page_file_mapping(struct page *page)
>   */
>  static inline struct address_space *page_mapping_file(struct page *page)
>  {
> -	struct folio *folio = page_folio(page);
> -
> -	if (unlikely(folio_test_swapcache(folio)))
> -		return NULL;
> -	return folio_mapping(folio);
> +	return folio_flush_mapping(page_folio(page));
>  }
>  
>  /**
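
As an illustration of the intended use, an architecture's deferred-flush
implementation might look roughly like this (a hypothetical sketch
following the pattern described in cachetlb.rst, not code from this
series; __flush_dcache_folio_aliases() is an invented helper name):

	void flush_dcache_folio(struct folio *folio)
	{
		struct address_space *mapping = folio_flush_mapping(folio);

		/* No user mappings: defer the flush via the arch-private bit. */
		if (mapping && !mapping_mapped(mapping)) {
			clear_bit(PG_dcache_clean, &folio->flags);
			return;
		}
		__flush_dcache_folio_aliases(folio);
	}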

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 04/36] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
  2023-03-15  5:14 ` [PATCH v4 04/36] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO Matthew Wilcox (Oracle)
  2023-03-15  9:28   ` Mike Rapoport
@ 2023-05-25  2:43   ` Anshuman Khandual
  1 sibling, 0 replies; 138+ messages in thread
From: Anshuman Khandual @ 2023-05-25  2:43 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel



On 3/15/23 10:44, Matthew Wilcox (Oracle) wrote:
> Current best practice is to reuse the name of the function as a define
> to indicate that the function is implemented by the architecture.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  Documentation/core-api/cachetlb.rst | 24 +++++++++---------------
>  include/linux/cacheflush.h          |  4 ++--
>  mm/util.c                           |  2 +-
>  3 files changed, 12 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
> index d4c9e2a28d36..770008afd409 100644
> --- a/Documentation/core-api/cachetlb.rst
> +++ b/Documentation/core-api/cachetlb.rst
> @@ -269,7 +269,7 @@ maps this page at its virtual address.
>  	If D-cache aliasing is not an issue, these two routines may
>  	simply call memcpy/memset directly and do nothing more.
>  
> -  ``void flush_dcache_page(struct page *page)``
> +  ``void flush_dcache_folio(struct folio *folio)``
>  
>          This routines must be called when:
>  
> @@ -277,7 +277,7 @@ maps this page at its virtual address.
>  	     and / or in high memory
>  	  b) the kernel is about to read from a page cache page and user space
>  	     shared/writable mappings of this page potentially exist.  Note
> -	     that {get,pin}_user_pages{_fast} already call flush_dcache_page
> +	     that {get,pin}_user_pages{_fast} already call flush_dcache_folio
>  	     on any page found in the user address space and thus driver
>  	     code rarely needs to take this into account.
>  
> @@ -291,7 +291,7 @@ maps this page at its virtual address.
>  
>  	The phrase "kernel writes to a page cache page" means, specifically,
>  	that the kernel executes store instructions that dirty data in that
> -	page at the page->virtual mapping of that page.  It is important to
> +	page at the kernel virtual mapping of that page.  It is important to
>  	flush here to handle D-cache aliasing, to make sure these kernel stores
>  	are visible to user space mappings of that page.
>  
> @@ -302,18 +302,18 @@ maps this page at its virtual address.
>  	If D-cache aliasing is not an issue, this routine may simply be defined
>  	as a nop on that architecture.
>  
> -        There is a bit set aside in page->flags (PG_arch_1) as "architecture
> +        There is a bit set aside in folio->flags (PG_arch_1) as "architecture
>  	private".  The kernel guarantees that, for pagecache pages, it will
>  	clear this bit when such a page first enters the pagecache.
>  
>  	This allows these interfaces to be implemented much more
>  	efficiently.  It allows one to "defer" (perhaps indefinitely) the
>  	actual flush if there are currently no user processes mapping this
> -	page.  See sparc64's flush_dcache_page and update_mmu_cache_range
> +	page.  See sparc64's flush_dcache_folio and update_mmu_cache_range
>  	implementations for an example of how to go about doing this.
>  
> -	The idea is, first at flush_dcache_page() time, if
> -	page_file_mapping() returns a mapping, and mapping_mapped on that
> +	The idea is, first at flush_dcache_folio() time, if
> +	folio_flush_mapping() returns a mapping, and mapping_mapped() on that
>  	mapping returns %false, just mark the architecture private page
>  	flag bit.  Later, in update_mmu_cache_range(), a check is made
>  	of this flag bit, and if set the flush is done and the flag bit
> @@ -327,12 +327,6 @@ maps this page at its virtual address.
>  			dirty.  Again, see sparc64 for examples of how
>  			to deal with this.
>  
> -  ``void flush_dcache_folio(struct folio *folio)``
> -	This function is called under the same circumstances as
> -	flush_dcache_page().  It allows the architecture to
> -	optimise for flushing the entire folio of pages instead
> -	of flushing one page at a time.
> -
>    ``void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
>    unsigned long user_vaddr, void *dst, void *src, int len)``
>    ``void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
> @@ -353,7 +347,7 @@ maps this page at its virtual address.
>  
>    	When the kernel needs to access the contents of an anonymous
>  	page, it calls this function (currently only
> -	get_user_pages()).  Note: flush_dcache_page() deliberately
> +	get_user_pages()).  Note: flush_dcache_folio() deliberately
>  	doesn't work for an anonymous page.  The default
>  	implementation is a nop (and should remain so for all coherent
>  	architectures).  For incoherent architectures, it should flush
> @@ -370,7 +364,7 @@ maps this page at its virtual address.
>    ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``
>  
>  	All the functionality of flush_icache_page can be implemented in
> -	flush_dcache_page and update_mmu_cache_range. In the future, the hope
> +	flush_dcache_folio and update_mmu_cache_range. In the future, the hope
>  	is to remove this interface completely.
>  
>  The final category of APIs is for I/O to deliberately aliased address
> diff --git a/include/linux/cacheflush.h b/include/linux/cacheflush.h
> index a6189d21f2ba..82136f3fcf54 100644
> --- a/include/linux/cacheflush.h
> +++ b/include/linux/cacheflush.h
> @@ -7,14 +7,14 @@
>  struct folio;
>  
>  #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
> -#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
> +#ifndef flush_dcache_folio
>  void flush_dcache_folio(struct folio *folio);
>  #endif
>  #else
>  static inline void flush_dcache_folio(struct folio *folio)
>  {
>  }
> -#define ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO 0
> +#define flush_dcache_folio flush_dcache_folio
>  #endif /* ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE */
>  
>  #endif /* _LINUX_CACHEFLUSH_H */
> diff --git a/mm/util.c b/mm/util.c
> index dd12b9531ac4..98ce51b01627 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1125,7 +1125,7 @@ void page_offline_end(void)
>  }
>  EXPORT_SYMBOL(page_offline_end);
>  
> -#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
> +#ifndef flush_dcache_folio
>  void flush_dcache_folio(struct folio *folio)
>  {
>  	long i, nr = folio_nr_pages(folio);

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 05/36] mm: Add default definition of set_ptes()
  2023-03-15  5:14 ` [PATCH v4 05/36] mm: Add default definition of set_ptes() Matthew Wilcox (Oracle)
  2023-03-15  9:34   ` Mike Rapoport
@ 2023-05-25  3:01   ` Anshuman Khandual
  2023-05-25  4:06     ` Matthew Wilcox
  1 sibling, 1 reply; 138+ messages in thread
From: Anshuman Khandual @ 2023-05-25  3:01 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel, Mike Rapoport



On 3/15/23 10:44, Matthew Wilcox (Oracle) wrote:
> Most architectures can just define set_pte() and PFN_PTE_SHIFT to
> use this definition.  It's also a handy spot to document the guarantees
> provided by the MM.
> 
> Suggested-by: Mike Rapoport (IBM) <rppt@kernel.org>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/pgtable.h | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c5a51481bbb9..a755fe94b4b4 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -172,6 +172,43 @@ static inline int pmd_young(pmd_t pmd)
>  }
>  #endif
>  
> +#ifndef set_ptes
> +#ifdef PFN_PTE_SHIFT
> +/**
> + * set_ptes - Map consecutive pages to a contiguous range of addresses.
> + * @mm: Address space to map the pages into.
> + * @addr: Address to map the first page at.
> + * @ptep: Page table pointer for the first entry.
> + * @pte: Page table entry for the first page.
> + * @nr: Number of pages to map.
> + *
> + * May be overridden by the architecture, or the architecture can define
> + * set_pte() and PFN_PTE_SHIFT.
> + *
> + * Context: The caller holds the page table lock.  The pages all belong
> + * to the same folio.  The PTEs are all in the same PMD.
> + */
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
> +
> +	for (;;) {
> +		set_pte(ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> +	}
> +}
> +#ifndef set_pte_at
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
> +#endif

Shouldn't there be a build-time error when neither set_ptes() nor PFN_PTE_SHIFT
is defined on a given platform?

--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -204,6 +204,8 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 #ifndef set_pte_at
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 #endif
+#else
+#error "You should define PFN_PTE_SHIFT"
 #endif
 #else
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)

> +#endif
> +#else
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
> +#endif
> +
>  #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
>  extern int ptep_set_access_flags(struct vm_area_struct *vma,
>  				 unsigned long address, pte_t *ptep,

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 09/36] arm64: Implement the new page table range API
  2023-03-15  5:14 ` [PATCH v4 09/36] arm64: " Matthew Wilcox (Oracle)
  2023-03-15  9:49   ` Mike Rapoport
@ 2023-05-25  3:35   ` Anshuman Khandual
  2023-05-25  4:05     ` Matthew Wilcox
  1 sibling, 1 reply; 138+ messages in thread
From: Anshuman Khandual @ 2023-05-25  3:35 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch
  Cc: linux-mm, linux-kernel, Catalin Marinas, linux-arm-kernel



On 3/15/23 10:44, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_dcache_clean flag from being per-page to per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> ---
>  arch/arm64/include/asm/cacheflush.h |  4 +++-
>  arch/arm64/include/asm/pgtable.h    | 25 ++++++++++++++------
>  arch/arm64/mm/flush.c               | 36 +++++++++++------------------
>  3 files changed, 35 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index 37185e978aeb..d115451ed263 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -114,7 +114,7 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
>  #define copy_to_user_page copy_to_user_page
>  
>  /*
> - * flush_dcache_page is used when the kernel has written to the page
> + * flush_dcache_folio is used when the kernel has written to the page
>   * cache page at virtual address page->virtual.
>   *
>   * If this page isn't mapped (ie, page_mapping == NULL), or it might
> @@ -127,6 +127,8 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
>   */
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  extern void flush_dcache_page(struct page *);
> +void flush_dcache_folio(struct folio *);

This is giving a checkpatch.pl warning

WARNING: function definition argument 'struct folio *' should also have an identifier name
#36: FILE: arch/arm64/include/asm/cacheflush.h:130:
+void flush_dcache_folio(struct folio *);

total: 0 errors, 1 warnings, 111 lines checked

> +#define flush_dcache_folio flush_dcache_folio
>  
>  static __always_inline void icache_inval_all_pou(void)
>  {
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 9428748f4691..6fd012663a01 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -355,12 +355,21 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  	set_pte(ptep, pte);
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pte)
> -{
> -	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
> -	return __set_pte_at(mm, addr, ptep, pte);
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +			      pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
> +
> +	for (;;) {
> +		__set_pte_at(mm, addr, ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		addr += PAGE_SIZE;
> +		pte_val(pte) += PAGE_SIZE;
> +	}
>  }
> +#define set_ptes set_ptes
>  
>  /*
>   * Huge pte definitions.
> @@ -1059,8 +1068,8 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
>  /*
>   * On AArch64, the cache coherency is handled via the set_pte_at() function.
>   */
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -				    unsigned long addr, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long addr, pte_t *ptep, unsigned int nr)
>  {
>  	/*
>  	 * We don't do anything here, so there's a very small chance of
> @@ -1069,6 +1078,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  	 */
>  }
>  
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
>  
>  #ifdef CONFIG_ARM64_PA_BITS_52
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index 5f9379b3c8c8..deb781af0a3a 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -50,20 +50,13 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
>  
>  void __sync_icache_dcache(pte_t pte)
>  {
> -	struct page *page = pte_page(pte);
> +	struct folio *folio = page_folio(pte_page(pte));
>  
> -	/*
> -	 * HugeTLB pages are always fully mapped, so only setting head page's
> -	 * PG_dcache_clean flag is enough.
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> -
> -	if (!test_bit(PG_dcache_clean, &page->flags)) {
> -		sync_icache_aliases((unsigned long)page_address(page),
> -				    (unsigned long)page_address(page) +
> -					    page_size(page));
> -		set_bit(PG_dcache_clean, &page->flags);
> +	if (!test_bit(PG_dcache_clean, &folio->flags)) {
> +		sync_icache_aliases((unsigned long)folio_address(folio),
> +				    (unsigned long)folio_address(folio) +
> +					    folio_size(folio));
> +		set_bit(PG_dcache_clean, &folio->flags);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(__sync_icache_dcache);
> @@ -73,17 +66,16 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache);
>   * it as dirty for later flushing when mapped in user space (if executable,
>   * see __sync_icache_dcache).
>   */
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
> -	/*
> -	 * HugeTLB pages are always fully mapped and only head page will be
> -	 * set PG_dcache_clean (see comments in __sync_icache_dcache()).
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> +	if (test_bit(PG_dcache_clean, &folio->flags))
> +		clear_bit(PG_dcache_clean, &folio->flags);
> +}
> +EXPORT_SYMBOL(flush_dcache_folio);
>  
> -	if (test_bit(PG_dcache_clean, &page->flags))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
>  }
>  EXPORT_SYMBOL(flush_dcache_page);
>

Otherwise LGTM.
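
As an aside, a minimal (hypothetical) caller of the new API would look
something like this; names and the surrounding locking are simplified:

	unsigned int nr = folio_nr_pages(folio);
	pte_t entry = mk_pte(folio_page(folio, 0), vma->vm_page_prot);

	/* One call covers every page of the folio. */
	set_ptes(vma->vm_mm, addr, ptep, entry, nr);
	update_mmu_cache_range(vma, addr, ptep, nr);

The "pte_val(pte) += PAGE_SIZE" in the arm64 loop works because the output
address field starts at bit PAGE_SHIFT, so adding PAGE_SIZE advances the
physical address by exactly one page.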

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 29/36] mm: Remove page_mapping_file()
  2023-03-15  5:14 ` [PATCH v4 29/36] mm: Remove page_mapping_file() Matthew Wilcox (Oracle)
@ 2023-05-25  3:50   ` Anshuman Khandual
  2023-05-25  4:03     ` Matthew Wilcox
  2023-05-25  5:37   ` Anshuman Khandual
  1 sibling, 1 reply; 138+ messages in thread
From: Anshuman Khandual @ 2023-05-25  3:50 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel



On 3/15/23 10:44, Matthew Wilcox (Oracle) wrote:
> This function has no more users.

On v6.4-rc3, there are still some users. Am I looking at the wrong
tree/branch/tag?

~/workplace/linux$ git grep page_mapping_file
arch/arc/mm/cache.c:    mapping = page_mapping_file(page);
arch/arm/mm/copypage-v4mc.c:            __flush_dcache_page(page_mapping_file(from), from);
arch/arm/mm/copypage-v6.c:              __flush_dcache_page(page_mapping_file(from), from);
arch/arm/mm/copypage-xscale.c:          __flush_dcache_page(page_mapping_file(from), from);
arch/arm/mm/fault-armv.c:       mapping = page_mapping_file(page);
arch/arm/mm/flush.c:            mapping = page_mapping_file(page);
arch/arm/mm/flush.c:    mapping = page_mapping_file(page);
arch/csky/abiv1/cacheflush.c:   mapping = page_mapping_file(page);
arch/csky/abiv1/cacheflush.c:   if (page_mapping_file(page)) {
arch/mips/mm/cache.c:   struct address_space *mapping = page_mapping_file(page);
arch/nios2/mm/cacheflush.c:     mapping = page_mapping_file(page);
arch/nios2/mm/cacheflush.c:     mapping = page_mapping_file(page);
arch/parisc/kernel/cache.c:     if (page_mapping_file(page) &&
arch/parisc/kernel/cache.c:     struct address_space *mapping = page_mapping_file(page);
arch/sh/mm/cache-sh4.c: struct address_space *mapping = page_mapping_file(page);
arch/sh/mm/cache-sh7705.c:      struct address_space *mapping = page_mapping_file(page);
arch/sparc/kernel/smp_64.c:                          page_mapping_file(page) != NULL));
arch/sparc/kernel/smp_64.c:     if (page_mapping_file(page) != NULL &&
arch/sparc/kernel/smp_64.c:                     if (page_mapping_file(page) != NULL)
arch/sparc/kernel/smp_64.c:             if (page_mapping_file(page) != NULL)
arch/sparc/mm/init_64.c:                             page_mapping_file(page) != NULL));
arch/sparc/mm/init_64.c:        if (page_mapping_file(page) != NULL &&
arch/sparc/mm/init_64.c:        mapping = page_mapping_file(page);
arch/sparc/mm/tlb.c:            mapping = page_mapping_file(page);
arch/xtensa/mm/cache.c: struct address_space *mapping = page_mapping_file(page);

> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/pagemap.h | 8 --------
>  1 file changed, 8 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index e56c2023aa0e..a87113055b9c 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -394,14 +394,6 @@ static inline struct address_space *page_file_mapping(struct page *page)
>  	return folio_file_mapping(page_folio(page));
>  }
>  
> -/*
> - * For file cache pages, return the address_space, otherwise return NULL
> - */
> -static inline struct address_space *page_mapping_file(struct page *page)
> -{
> -	return folio_flush_mapping(page_folio(page));
> -}
> -
>  /**
>   * folio_inode - Get the host inode for this folio.
>   * @folio: The folio.
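(For reference, a converted caller gets the same result directly from
the remaining folio API — the removed helper was just this one-liner:)

	struct address_space *mapping = folio_flush_mapping(page_folio(page));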

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 29/36] mm: Remove page_mapping_file()
  2023-05-25  3:50   ` Anshuman Khandual
@ 2023-05-25  4:03     ` Matthew Wilcox
  2023-05-25  4:46       ` Anshuman Khandual
  0 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox @ 2023-05-25  4:03 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linux-arch, linux-mm, linux-kernel

On Thu, May 25, 2023 at 09:20:47AM +0530, Anshuman Khandual wrote:
> 
> 
> On 3/15/23 10:44, Matthew Wilcox (Oracle) wrote:
> > This function has no more users.
> 
> On v6.4-rc3, there are still some users. Am I looking at the wrong
> tree/branch/tag?

Did you apply patches 1-28 before grepping?

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 09/36] arm64: Implement the new page table range API
  2023-05-25  3:35   ` Anshuman Khandual
@ 2023-05-25  4:05     ` Matthew Wilcox
  2023-05-25  4:43       ` Anshuman Khandual
  0 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox @ 2023-05-25  4:05 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-arch, linux-mm, linux-kernel, Catalin Marinas, linux-arm-kernel

On Thu, May 25, 2023 at 09:05:35AM +0530, Anshuman Khandual wrote:
> > @@ -127,6 +127,8 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
> >   */
> >  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> >  extern void flush_dcache_page(struct page *);
> > +void flush_dcache_folio(struct folio *);
> 
> This is giving a checkpatch.pl warning
> 
> WARNING: function definition argument 'struct folio *' should also have an identifier name
> #36: FILE: arch/arm64/include/asm/cacheflush.h:130:
> +void flush_dcache_folio(struct folio *);

Yes, but checkpatch is *stupid*.  Don't just follow tools blindly.
How is naming the parameter here helping anyone?
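For illustration, the identifier in a prototype is purely cosmetic to
the compiler — these two declarations are interchangeable:

	void flush_dcache_folio(struct folio *);	/* as in the patch */
	void flush_dcache_folio(struct folio *folio);	/* as checkpatch wants */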

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 05/36] mm: Add default definition of set_ptes()
  2023-05-25  3:01   ` Anshuman Khandual
@ 2023-05-25  4:06     ` Matthew Wilcox
  0 siblings, 0 replies; 138+ messages in thread
From: Matthew Wilcox @ 2023-05-25  4:06 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linux-arch, linux-mm, linux-kernel, Mike Rapoport

On Thu, May 25, 2023 at 08:31:14AM +0530, Anshuman Khandual wrote:
> > +#ifndef set_ptes
> > +#ifdef PFN_PTE_SHIFT
> > +/**
> > + * set_ptes - Map consecutive pages to a contiguous range of addresses.
> > + * @mm: Address space to map the pages into.
> > + * @addr: Address to map the first page at.
> > + * @ptep: Page table pointer for the first entry.
> > + * @pte: Page table entry for the first page.
> > + * @nr: Number of pages to map.
> > + *
> > + * May be overridden by the architecture, or the architecture can define
> > + * set_pte() and PFN_PTE_SHIFT.
> > + *
> > + * Context: The caller holds the page table lock.  The pages all belong
> > + * to the same folio.  The PTEs are all in the same PMD.
> > + */
> > +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> > +		pte_t *ptep, pte_t pte, unsigned int nr)
> > +{
> > +	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
> > +
> > +	for (;;) {
> > +		set_pte(ptep, pte);
> > +		if (--nr == 0)
> > +			break;
> > +		ptep++;
> > +		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
> > +	}
> > +}
> > +#ifndef set_pte_at
> > +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
> > +#endif
> 
> Shouldn't there be a build-phase callout when neither set_ptes() nor
> PFN_PTE_SHIFT is defined on a given platform?

How does that help?  Either way you get a clear build error.
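For context, the opt-in is small — a sketch for a hypothetical
architecture whose PTEs hold the PFN starting at PAGE_SHIFT:

	/* in the arch's <asm/pgtable.h> */
	#define PFN_PTE_SHIFT	PAGE_SHIFT

	static inline void set_pte(pte_t *ptep, pte_t pte)
	{
		*ptep = pte;	/* simplified; real code may need barriers */
	}

With these two definitions the generic set_ptes() above applies; with
neither, the build fails at the first set_pte_at()/set_ptes() caller.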


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 09/36] arm64: Implement the new page table range API
  2023-05-25  4:05     ` Matthew Wilcox
@ 2023-05-25  4:43       ` Anshuman Khandual
  0 siblings, 0 replies; 138+ messages in thread
From: Anshuman Khandual @ 2023-05-25  4:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-arch, linux-mm, linux-kernel, Catalin Marinas, linux-arm-kernel



On 5/25/23 09:35, Matthew Wilcox wrote:
> On Thu, May 25, 2023 at 09:05:35AM +0530, Anshuman Khandual wrote:
>>> @@ -127,6 +127,8 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
>>>   */
>>>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>>>  extern void flush_dcache_page(struct page *);
>>> +void flush_dcache_folio(struct folio *);
>>
>> This is giving a checkpatch.pl warning
>>
>> WARNING: function definition argument 'struct folio *' should also have an identifier name
>> #36: FILE: arch/arm64/include/asm/cacheflush.h:130:
>> +void flush_dcache_folio(struct folio *);
> 
> Yes, but checkpatch is *stupid*.  Don't just follow tools blindly.
> How is naming the parameter here helping anyone?

Agreed, it seemed a bit weird. Never mind.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 29/36] mm: Remove page_mapping_file()
  2023-05-25  4:03     ` Matthew Wilcox
@ 2023-05-25  4:46       ` Anshuman Khandual
  0 siblings, 0 replies; 138+ messages in thread
From: Anshuman Khandual @ 2023-05-25  4:46 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-arch, linux-mm, linux-kernel



On 5/25/23 09:33, Matthew Wilcox wrote:
> On Thu, May 25, 2023 at 09:20:47AM +0530, Anshuman Khandual wrote:
>>
>>
>> On 3/15/23 10:44, Matthew Wilcox (Oracle) wrote:
>>> This function has no more users.
>>
>> On v6.4-rc3, there are still some users. Am I looking at the wrong
>> tree/branch/tag?
> 
> Did you apply patches 1-28 before grepping?

Ahh, my bad. I had only applied the generic MM patches and the arm64 one to test.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 29/36] mm: Remove page_mapping_file()
  2023-03-15  5:14 ` [PATCH v4 29/36] mm: Remove page_mapping_file() Matthew Wilcox (Oracle)
  2023-05-25  3:50   ` Anshuman Khandual
@ 2023-05-25  5:37   ` Anshuman Khandual
  1 sibling, 0 replies; 138+ messages in thread
From: Anshuman Khandual @ 2023-05-25  5:37 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel

On 3/15/23 10:44, Matthew Wilcox (Oracle) wrote:
> This function has no more users.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  include/linux/pagemap.h | 8 --------
>  1 file changed, 8 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index e56c2023aa0e..a87113055b9c 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -394,14 +394,6 @@ static inline struct address_space *page_file_mapping(struct page *page)
>  	return folio_file_mapping(page_folio(page));
>  }
>  
> -/*
> - * For file cache pages, return the address_space, otherwise return NULL
> - */
> -static inline struct address_space *page_mapping_file(struct page *page)
> -{
> -	return folio_flush_mapping(page_folio(page));
> -}
> -
>  /**
>   * folio_inode - Get the host inode for this folio.
>   * @folio: The folio.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 31/36] mm: Tidy up set_ptes definition
  2023-03-15  5:14 ` [PATCH v4 31/36] mm: Tidy up set_ptes definition Matthew Wilcox (Oracle)
@ 2023-05-25  6:20   ` Anshuman Khandual
  0 siblings, 0 replies; 138+ messages in thread
From: Anshuman Khandual @ 2023-05-25  6:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel



On 3/15/23 10:44, Matthew Wilcox (Oracle) wrote:
> Now that all architectures are converted, we can remove the
> PFN_PTE_SHIFT ifdef and we can define set_pte_at() unconditionally.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  include/linux/pgtable.h | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index a755fe94b4b4..a54b9197f2f2 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -173,7 +173,6 @@ static inline int pmd_young(pmd_t pmd)
>  #endif
>  
>  #ifndef set_ptes
> -#ifdef PFN_PTE_SHIFT
>  /**
>   * set_ptes - Map consecutive pages to a contiguous range of addresses.
>   * @mm: Address space to map the pages into.
> @@ -201,13 +200,8 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
>  		pte = __pte(pte_val(pte) + (1UL << PFN_PTE_SHIFT));
>  	}
>  }
> -#ifndef set_pte_at
> -#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
> -#endif
>  #endif
> -#else
>  #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
> -#endif
>  
>  #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
>  extern int ptep_set_access_flags(struct vm_area_struct *vma,
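Reconstructed from the hunks above, the resulting shape is simply:

	#ifndef set_ptes
	static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
			pte_t *ptep, pte_t pte, unsigned int nr)
	{
		/* generic loop using PFN_PTE_SHIFT, unchanged */
	}
	#endif
	#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)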

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 32/36] mm: Use flush_icache_pages() in do_set_pmd()
  2023-03-15  5:14 ` [PATCH v4 32/36] mm: Use flush_icache_pages() in do_set_pmd() Matthew Wilcox (Oracle)
@ 2023-05-25  6:31   ` Anshuman Khandual
  0 siblings, 0 replies; 138+ messages in thread
From: Anshuman Khandual @ 2023-05-25  6:31 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-arch; +Cc: linux-mm, linux-kernel



On 3/15/23 10:44, Matthew Wilcox (Oracle) wrote:
> Push the iteration over each page down to the architectures (many
> can flush the entire THP without iteration).
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

> ---
>  mm/memory.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index c5f1bf906d0c..6aa21e8f3753 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4209,7 +4209,6 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>  	bool write = vmf->flags & FAULT_FLAG_WRITE;
>  	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>  	pmd_t entry;
> -	int i;
>  	vm_fault_t ret = VM_FAULT_FALLBACK;
>  
>  	if (!transhuge_vma_suitable(vma, haddr))
> @@ -4242,8 +4241,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>  	if (unlikely(!pmd_none(*vmf->pmd)))
>  		goto out;
>  
> -	for (i = 0; i < HPAGE_PMD_NR; i++)
> -		flush_icache_page(vma, page + i);
> +	flush_icache_pages(vma, page, HPAGE_PMD_NR);
>  
>  	entry = mk_huge_pmd(page, vma->vm_page_prot);
>  	if (write)
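The removed loop is, in effect, what an architecture without ranged
cache maintenance still does internally — a sketch of such a fallback
(assumed shape, not taken verbatim from the series):

	static inline void flush_icache_pages(struct vm_area_struct *vma,
			struct page *page, unsigned int nr)
	{
		unsigned int i;

		for (i = 0; i < nr; i++)
			flush_icache_page(vma, page + i);
	}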

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH 0/4] New page table range API fixup patches
  2023-03-20 14:08                 ` Matthew Wilcox
  2023-03-21  1:58                   ` Yin, Fengwei
  2023-03-21  5:13                   ` Yin Fengwei
@ 2023-05-30  8:07                   ` Yin Fengwei
  2023-05-30  8:07                     ` [PATCH 1/4] filemap: avoid interfere with xas.xa_index Yin Fengwei
                                       ` (3 more replies)
  2 siblings, 4 replies; 138+ messages in thread
From: Yin Fengwei @ 2023-05-30  8:07 UTC (permalink / raw)
  To: willy, ryan.roberts, linux-arch, linux-mm, linux-kernel; +Cc: fengwei.yin

These are fixup patches for Matthew's New page table range API
patchset.

Thanks a lot to Matthew and Ryan for helping with these fixup patches.
I originally sent the patches to Matthew and Ryan in private mail;
later, I realized that private mail should be avoided.


Yin Fengwei (4):
  filemap: avoid interfere with xas.xa_index
  rmap: fix typo in folio_add_file_rmap_range()
  mm: mark PTEs referencing the accessed folio young
  filemap: Check address range in filemap_map_folio_range()

 mm/filemap.c | 39 ++++++++++++---------------------------
 mm/memory.c  |  2 +-
 mm/rmap.c    |  2 +-
 3 files changed, 14 insertions(+), 29 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH 1/4] filemap: avoid interfere with xas.xa_index
  2023-05-30  8:07                   ` [PATCH 0/4] New page table range API fixup patches Yin Fengwei
@ 2023-05-30  8:07                     ` Yin Fengwei
  2023-05-30  8:07                     ` [PATCH 2/4] rmap: fix typo in folio_add_file_rmap_range() Yin Fengwei
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 138+ messages in thread
From: Yin Fengwei @ 2023-05-30  8:07 UTC (permalink / raw)
  To: willy, ryan.roberts, linux-arch, linux-mm, linux-kernel; +Cc: fengwei.yin

Ryan noticed a 1% performance regression for kernel builds with
the ranged file map on an ext4 file system. It was later identified
as a wrong xas.xa_index update in filemap_map_pages() when the folio
is not a large folio.

Matthew suggested using the XArray API instead of touching
xas.xa_index directly, at [1].

[1] https://lore.kernel.org/linux-mm/ZBho6Q6Xq%2FYqRmBT@casper.infradead.org/

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
---
 mm/filemap.c | 30 ++++++------------------------
 1 file changed, 6 insertions(+), 24 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 40be33b5ee46..fdb3e0a339b3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3416,10 +3416,10 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct folio *folio,
 	return false;
 }
 
-static struct folio *next_uptodate_page(struct folio *folio,
-				       struct address_space *mapping,
-				       struct xa_state *xas, pgoff_t end_pgoff)
+static struct folio *next_uptodate_folio(struct xa_state *xas,
+		struct address_space *mapping, pgoff_t end_pgoff)
 {
+	struct folio *folio = xas_next_entry(xas, end_pgoff);
 	unsigned long max_idx;
 
 	do {
@@ -3457,22 +3457,6 @@ static struct folio *next_uptodate_page(struct folio *folio,
 	return NULL;
 }
 
-static inline struct folio *first_map_page(struct address_space *mapping,
-					  struct xa_state *xas,
-					  pgoff_t end_pgoff)
-{
-	return next_uptodate_page(xas_find(xas, end_pgoff),
-				  mapping, xas, end_pgoff);
-}
-
-static inline struct folio *next_map_page(struct address_space *mapping,
-					 struct xa_state *xas,
-					 pgoff_t end_pgoff)
-{
-	return next_uptodate_page(xas_next_entry(xas, end_pgoff),
-				  mapping, xas, end_pgoff);
-}
-
 /*
  * Map page range [start_page, start_page + nr_pages) of folio.
  * start_page is gotten from start by folio_page(folio, start)
@@ -3543,12 +3527,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	unsigned long addr;
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
 	struct folio *folio;
-	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
 	vm_fault_t ret = 0;
 	int nr_pages = 0;
 
 	rcu_read_lock();
-	folio = first_map_page(mapping, &xas, end_pgoff);
+	folio = next_uptodate_folio(&xas, mapping, end_pgoff);
 	if (!folio)
 		goto out;
 
@@ -3570,15 +3553,14 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 
 		ret |= filemap_map_folio_range(vmf, folio,
 				xas.xa_index - folio->index, addr, nr_pages);
-		xas.xa_index += nr_pages;
 
 		folio_unlock(folio);
 		folio_put(folio);
-	} while ((folio = next_map_page(mapping, &xas, end_pgoff)) != NULL);
+		folio = next_uptodate_folio(&xas, mapping, end_pgoff);
+	} while (folio);
 	pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
 	rcu_read_unlock();
-	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
 	return ret;
 }
 EXPORT_SYMBOL(filemap_map_pages);
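The key point is that the XArray cursor tracks its own index — a
minimal sketch of the iteration pattern (names as in
filemap_map_pages(), illustrative only):

	XA_STATE(xas, &mapping->i_pages, start_pgoff);
	struct folio *folio;

	rcu_read_lock();
	while ((folio = xas_next_entry(&xas, end_pgoff)) != NULL) {
		/* use folio; no manual xas.xa_index arithmetic needed */
	}
	rcu_read_unlock();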
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 2/4] rmap: fix typo in folio_add_file_rmap_range()
  2023-05-30  8:07                   ` [PATCH 0/4] New page table range API fixup patches Yin Fengwei
  2023-05-30  8:07                     ` [PATCH 1/4] filemap: avoid interfere with xas.xa_index Yin Fengwei
@ 2023-05-30  8:07                     ` Yin Fengwei
  2023-05-30  8:07                     ` [PATCH 3/4] mm: mark PTEs referencing the accessed folio young Yin Fengwei
  2023-05-30  8:07                     ` [PATCH 4/4] filemap: Check address range in filemap_map_folio_range() Yin Fengwei
  3 siblings, 0 replies; 138+ messages in thread
From: Yin Fengwei @ 2023-05-30  8:07 UTC (permalink / raw)
  To: willy, ryan.roberts, linux-arch, linux-mm, linux-kernel; +Cc: fengwei.yin

The "first" should be used to compare with COMPOUND_MAPPED
instead of "nr".

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/rmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index ec52d7f264aa..b352c14da16c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1330,7 +1330,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
 			first = atomic_inc_and_test(&page->_mapcount);
 			if (first && folio_test_large(folio)) {
 				first = atomic_inc_return_relaxed(mapped);
-				first = (nr < COMPOUND_MAPPED);
+				first = (first < COMPOUND_MAPPED);
 			}
 
 			if (first)
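An annotated reading of the corrected branch (sketch; comments are
editorial, not from the patch):

	/* Did this page's _mapcount go from 0 to 1? */
	first = atomic_inc_and_test(&page->_mapcount);
	if (first && folio_test_large(folio)) {
		/* Bump the folio-wide count of individually mapped pages. */
		first = atomic_inc_return_relaxed(mapped);
		/*
		 * Only counts below COMPOUND_MAPPED are per-page mappings;
		 * testing the loop-wide "nr" here compared the wrong value.
		 */
		first = (first < COMPOUND_MAPPED);
	}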
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 3/4] mm: mark PTEs referencing the accessed folio young
  2023-05-30  8:07                   ` [PATCH 0/4] New page table range API fixup patches Yin Fengwei
  2023-05-30  8:07                     ` [PATCH 1/4] filemap: avoid interfere with xas.xa_index Yin Fengwei
  2023-05-30  8:07                     ` [PATCH 2/4] rmap: fix typo in folio_add_file_rmap_range() Yin Fengwei
@ 2023-05-30  8:07                     ` Yin Fengwei
  2023-05-30  8:07                     ` [PATCH 4/4] filemap: Check address range in filemap_map_folio_range() Yin Fengwei
  3 siblings, 0 replies; 138+ messages in thread
From: Yin Fengwei @ 2023-05-30  8:07 UTC (permalink / raw)
  To: willy, ryan.roberts, linux-arch, linux-mm, linux-kernel; +Cc: fengwei.yin

To allow using larger TLB entries, it's better to mark the
PTEs of the same folio accessed when setting up the PTEs.

Reported-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index c359fb8643e5..2615ea552613 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4259,7 +4259,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
 	struct vm_area_struct *vma = vmf->vma;
 	bool uffd_wp = pte_marker_uffd_wp(vmf->orig_pte);
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
-	bool prefault = vmf->address != addr;
+	bool prefault = (addr > vmf->address) || ((addr + nr * PAGE_SIZE) <= vmf->address);
 	pte_t entry;
 
 	flush_icache_pages(vma, page, nr);
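Spelled out, the intended test (addr and vmf->address are byte
addresses, nr counts pages — a sketch):

	/* Is the faulting address one of the nr pages being mapped? */
	bool covered = vmf->address >= addr &&
		       vmf->address < addr + nr * PAGE_SIZE;
	bool prefault = !covered;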
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 4/4] filemap: Check address range in filemap_map_folio_range()
  2023-05-30  8:07                   ` [PATCH 0/4] New page table range API fixup patches Yin Fengwei
                                       ` (2 preceding siblings ...)
  2023-05-30  8:07                     ` [PATCH 3/4] mm: mark PTEs referencing the accessed folio young Yin Fengwei
@ 2023-05-30  8:07                     ` Yin Fengwei
  3 siblings, 0 replies; 138+ messages in thread
From: Yin Fengwei @ 2023-05-30  8:07 UTC (permalink / raw)
  To: willy, ryan.roberts, linux-arch, linux-mm, linux-kernel; +Cc: fengwei.yin

With filemap_map_folio_range(), addr is now advanced over the
mapped range as well. An address range check is needed to make sure
the correct return value (VM_FAULT_NOPAGE) is set when vmf->address
is handled.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/filemap.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index fdb3e0a339b3..0f4baba1cd31 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3488,15 +3488,15 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 		if (!pte_none(vmf->pte[count]))
 			goto skip;
 
-		if (vmf->address == addr)
-			ret = VM_FAULT_NOPAGE;
-
 		count++;
 		continue;
 skip:
 		if (count) {
 			set_pte_range(vmf, folio, page, count, addr);
 			folio_ref_add(folio, count);
+			if ((vmf->address < (addr + count * PAGE_SIZE)) &&
+					(vmf->address >= addr))
+				ret = VM_FAULT_NOPAGE;
 		}
 
 		count++;
@@ -3509,6 +3509,9 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 	if (count) {
 		set_pte_range(vmf, folio, page, count, addr);
 		folio_ref_add(folio, count);
+		if ((vmf->address < (addr + count * PAGE_SIZE)) &&
+				(vmf->address >= addr))
+			ret = VM_FAULT_NOPAGE;
 	}
 
 	vmf->pte = old_ptep;
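Since the same test now appears twice, it could be factored into a
small helper — a sketch with a hypothetical name, not part of the
patch:

	/* Does [addr, addr + count * PAGE_SIZE) contain the fault address? */
	static bool range_covers_fault(struct vm_fault *vmf,
			unsigned long addr, unsigned int count)
	{
		return vmf->address >= addr &&
		       vmf->address < addr + count * PAGE_SIZE;
	}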
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 17/36] nios2: Implement the new page table range API
  2023-03-15 10:08   ` Mike Rapoport
@ 2023-06-13 22:45     ` Dinh Nguyen
  2023-07-10 20:18       ` Matthew Wilcox
  0 siblings, 1 reply; 138+ messages in thread
From: Dinh Nguyen @ 2023-06-13 22:45 UTC (permalink / raw)
  To: Mike Rapoport, Matthew Wilcox (Oracle); +Cc: linux-arch, linux-mm, linux-kernel



On 3/15/23 05:08, Mike Rapoport wrote:
> On Wed, Mar 15, 2023 at 05:14:25AM +0000, Matthew Wilcox (Oracle) wrote:
>> Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
>> flush_dcache_folio().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
>> from being per-page to per-folio.
>>
>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Cc: Dinh Nguyen <dinguyen@kernel.org>
> 
> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
> 

Applied!

Thanks,
Dinh

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 17/36] nios2: Implement the new page table range API
  2023-06-13 22:45     ` Dinh Nguyen
@ 2023-07-10 20:18       ` Matthew Wilcox
  2023-07-10 23:10         ` Dinh Nguyen
  0 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox @ 2023-07-10 20:18 UTC (permalink / raw)
  To: Dinh Nguyen; +Cc: Mike Rapoport, linux-arch, linux-mm, linux-kernel

On Tue, Jun 13, 2023 at 05:45:54PM -0500, Dinh Nguyen wrote:
> 
> 
> On 3/15/23 05:08, Mike Rapoport wrote:
> > On Wed, Mar 15, 2023 at 05:14:25AM +0000, Matthew Wilcox (Oracle) wrote:
> > > Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
> > > flush_dcache_folio().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
> > > from being per-page to per-folio.
> > > 
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > Cc: Dinh Nguyen <dinguyen@kernel.org>
> > 
> > Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
> > 
> 
> Applied!

Sorry, what?  You can't pick this patch out of the middle of a series
and apply it!  This needs various earlier patches to work.  And then
later patches depend on this one having been applied, so if we were to
go the route of "please arch maintainers apply each of these patches",
it'd take over a year to get them all in.

As I said in the cover letter, this will all go in through the mm tree.
So what I want from arch maintainers is an Acked-by/Reviewed-by/Tested-by,
and then Andrew will apply the whole set.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 20/36] powerpc: Implement the new page table range API
  2023-03-18  9:19         ` Christophe Leroy
@ 2023-07-10 20:24           ` Matthew Wilcox
  2023-07-11  4:40             ` Christophe Leroy
  0 siblings, 1 reply; 138+ messages in thread
From: Matthew Wilcox @ 2023-07-10 20:24 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-arch, linux-mm, linux-kernel, Michael Ellerman,
	Nicholas Piggin, linuxppc-dev

On Sat, Mar 18, 2023 at 09:19:04AM +0000, Christophe Leroy wrote:
> void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
> 		pte_t pte, unsigned int nr)
> {
> 	pgprot_t prot;
> 	unsigned long pfn;
> 	/*
> 	 * Make sure hardware valid bit is not set. We don't do
> 	 * tlb flush for this update.
> 	 */
> 	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
> 
> 	/* Note: mm->context.id might not yet have been assigned as
> 	 * this context might not have been activated yet when this
> 	 * is called.
> 	 */
> 	pte = set_pte_filter(pte);
> 
> 	prot = pte_pgprot(pte);
> 	pfn = pte_pfn(pte);
> 	/* Perform the setting of the PTE */
> 	for (;;) {
> 		__set_pte_at(mm, addr, ptep, pfn_pte(pfn, prot), 0);
> 		if (--nr == 0)
> 			break;
> 		ptep++;
> 		pfn++;
> 		addr += PAGE_SIZE;
> 	}
> }

I'd rather the per-arch code were as similar to each other and the
generic implementation as possible.  Fewer bugs that way and easier
for other people to make changes that have to touch every architecture
in the future.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 17/36] nios2: Implement the new page table range API
  2023-07-10 20:18       ` Matthew Wilcox
@ 2023-07-10 23:10         ` Dinh Nguyen
  0 siblings, 0 replies; 138+ messages in thread
From: Dinh Nguyen @ 2023-07-10 23:10 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Mike Rapoport, linux-arch, linux-mm, linux-kernel



On 7/10/23 15:18, Matthew Wilcox wrote:
> On Tue, Jun 13, 2023 at 05:45:54PM -0500, Dinh Nguyen wrote:
>>
>>
>> On 3/15/23 05:08, Mike Rapoport wrote:
>>> On Wed, Mar 15, 2023 at 05:14:25AM +0000, Matthew Wilcox (Oracle) wrote:
>>>> Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
>>>> flush_dcache_folio().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
>>>> from being per-page to per-folio.
>>>>
>>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>>> Cc: Dinh Nguyen <dinguyen@kernel.org>
>>>
>>> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
>>>
>>
>> Applied!
> 
> Sorry, what?  You can't pick this patch out of the middle of a series
> and apply it!  This needs various earlier patches to work.  And then
> later patches depend on this one having been applied, so if we were to
> go the route of "please arch maintainers apply each of these patches",
> it'd take over a year to get them all in.
> 
> As I said in the cover letter, this will all go in through the mm tree.
> So what I want from arch maintainers is an Acked-by/Reviewed-by/Tested-by,
> and then Andrew will apply the whole set.

Apologies, I realized that after replying.

Dinh

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH v4 20/36] powerpc: Implement the new page table range API
  2023-07-10 20:24           ` Matthew Wilcox
@ 2023-07-11  4:40             ` Christophe Leroy
  0 siblings, 0 replies; 138+ messages in thread
From: Christophe Leroy @ 2023-07-11  4:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-arch, linux-mm, linux-kernel, Michael Ellerman,
	Nicholas Piggin, linuxppc-dev



On 10/07/2023 22:24, Matthew Wilcox wrote:
> On Sat, Mar 18, 2023 at 09:19:04AM +0000, Christophe Leroy wrote:
>> void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
>> 		pte_t pte, unsigned int nr)
>> {
>> 	pgprot_t prot;
>> 	unsigned long pfn;
>> 	/*
>> 	 * Make sure hardware valid bit is not set. We don't do
>> 	 * tlb flush for this update.
>> 	 */
>> 	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
>>
>> 	/* Note: mm->context.id might not yet have been assigned as
>> 	 * this context might not have been activated yet when this
>> 	 * is called.
>> 	 */
>> 	pte = set_pte_filter(pte);
>>
>> 	prot = pte_pgprot(pte);
>> 	pfn = pte_pfn(pte);
>> 	/* Perform the setting of the PTE */
>> 	for (;;) {
>> 		__set_pte_at(mm, addr, ptep, pfn_pte(pfn, prot), 0);
>> 		if (--nr == 0)
>> 			break;
>> 		ptep++;
>> 		pfn++;
>> 		addr += PAGE_SIZE;
>> 	}
>> }
> 
> I'd rather the per-arch code were as similar to each other and the
> generic implementation as possible.  Fewer bugs that way and easier
> for other people to make changes that have to touch every architecture
> in the future.

I understand your point, but I dislike the idea of open-coding pte
manipulations when you have helpers for that. If you had used helpers
from the beginning you wouldn't have had the mishap you had in v4.

^ permalink raw reply	[flat|nested] 138+ messages in thread

end of thread, other threads:[~2023-07-11  4:41 UTC | newest]

Thread overview: 138+ messages
2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
2023-03-15  5:14 ` [PATCH v4 01/36] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set() Matthew Wilcox (Oracle)
2023-03-15  9:21   ` Mike Rapoport
2023-03-23 18:36   ` Pasha Tatashin
2023-05-25  2:16   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 02/36] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
2023-03-15  9:27   ` Mike Rapoport
2023-05-25  2:23   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 03/36] mm: Add folio_flush_mapping() Matthew Wilcox (Oracle)
2023-03-15  9:28   ` Mike Rapoport
2023-05-25  2:35   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 04/36] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO Matthew Wilcox (Oracle)
2023-03-15  9:28   ` Mike Rapoport
2023-05-25  2:43   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 05/36] mm: Add default definition of set_ptes() Matthew Wilcox (Oracle)
2023-03-15  9:34   ` Mike Rapoport
2023-05-25  3:01   ` Anshuman Khandual
2023-05-25  4:06     ` Matthew Wilcox
2023-03-15  5:14 ` [PATCH v4 06/36] alpha: Implement the new page table range API Matthew Wilcox (Oracle)
2023-03-15  9:41   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 07/36] arc: " Matthew Wilcox (Oracle)
2023-03-15  9:44   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 08/36] arm: " Matthew Wilcox (Oracle)
2023-03-15  9:48   ` Mike Rapoport
2023-03-15 10:56   ` Russell King (Oracle)
2023-03-15  5:14 ` [PATCH v4 09/36] arm64: " Matthew Wilcox (Oracle)
2023-03-15  9:49   ` Mike Rapoport
2023-05-25  3:35   ` Anshuman Khandual
2023-05-25  4:05     ` Matthew Wilcox
2023-05-25  4:43       ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 10/36] csky: " Matthew Wilcox (Oracle)
2023-03-15  9:50   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 11/36] hexagon: " Matthew Wilcox (Oracle)
2023-03-15  9:54   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 12/36] ia64: " Matthew Wilcox (Oracle)
2023-03-15  9:55   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 13/36] loongarch: " Matthew Wilcox (Oracle)
2023-03-15 10:07   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 14/36] m68k: " Matthew Wilcox (Oracle)
2023-03-15  7:43   ` Geert Uytterhoeven
2023-03-16 16:32     ` Geert Uytterhoeven
2023-03-15 10:07   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 15/36] microblaze: " Matthew Wilcox (Oracle)
2023-03-15 10:07   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 16/36] mips: " Matthew Wilcox (Oracle)
2023-03-15 10:08   ` Mike Rapoport
2023-03-15 10:50   ` Thomas Bogendoerfer
2023-03-15 20:33     ` Matthew Wilcox
2023-03-17 15:29       ` Thomas Bogendoerfer
2023-03-19 18:45         ` Thomas Bogendoerfer
2023-03-19 20:16           ` Matthew Wilcox
2023-03-21 11:30             ` Thomas Bogendoerfer
2023-03-15  5:14 ` [PATCH v4 17/36] nios2: " Matthew Wilcox (Oracle)
2023-03-15 10:08   ` Mike Rapoport
2023-06-13 22:45     ` Dinh Nguyen
2023-07-10 20:18       ` Matthew Wilcox
2023-07-10 23:10         ` Dinh Nguyen
2023-03-15  5:14 ` [PATCH v4 18/36] openrisc: " Matthew Wilcox (Oracle)
2023-03-15 10:09   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 19/36] parisc: " Matthew Wilcox (Oracle)
2023-03-15 10:09   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 20/36] powerpc: " Matthew Wilcox (Oracle)
2023-03-15  9:43   ` Christophe Leroy
2023-03-15 10:18     ` Christophe Leroy
2023-03-17  3:47       ` Matthew Wilcox
2023-03-18  9:19         ` Christophe Leroy
2023-07-10 20:24           ` Matthew Wilcox
2023-07-11  4:40             ` Christophe Leroy
2023-03-15 10:09   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 21/36] riscv: " Matthew Wilcox (Oracle)
2023-03-15 10:10   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 22/36] s390: " Matthew Wilcox (Oracle)
2023-03-15 10:10   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 23/36] superh: " Matthew Wilcox (Oracle)
2023-03-15  7:22   ` John Paul Adrian Glaubitz
2023-03-15  7:36   ` John Paul Adrian Glaubitz
2023-03-15 10:10   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 24/36] sparc32: " Matthew Wilcox (Oracle)
2023-03-15 10:11   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 25/36] sparc64: " Matthew Wilcox (Oracle)
2023-03-15 10:11   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 26/36] um: " Matthew Wilcox (Oracle)
2023-03-15 10:12   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 27/36] x86: " Matthew Wilcox (Oracle)
2023-03-15 10:12   ` Mike Rapoport
2023-03-15 10:34   ` Peter Zijlstra
2023-03-15 11:16     ` Mike Rapoport
2023-03-15 11:19       ` Peter Zijlstra
2023-03-15 16:12         ` Matthew Wilcox
2023-03-15  5:14 ` [PATCH v4 28/36] xtensa: " Matthew Wilcox (Oracle)
2023-03-15 10:12   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 29/36] mm: Remove page_mapping_file() Matthew Wilcox (Oracle)
2023-05-25  3:50   ` Anshuman Khandual
2023-05-25  4:03     ` Matthew Wilcox
2023-05-25  4:46       ` Anshuman Khandual
2023-05-25  5:37   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 30/36] mm: Rationalise flush_icache_pages() and flush_icache_page() Matthew Wilcox (Oracle)
2023-03-15  5:14 ` [PATCH v4 31/36] mm: Tidy up set_ptes definition Matthew Wilcox (Oracle)
2023-05-25  6:20   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 32/36] mm: Use flush_icache_pages() in do_set_pmd() Matthew Wilcox (Oracle)
2023-05-25  6:31   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 33/36] filemap: Add filemap_map_folio_range() Matthew Wilcox (Oracle)
2023-03-15  5:14 ` [PATCH v4 34/36] rmap: add folio_add_file_rmap_range() Matthew Wilcox (Oracle)
2023-03-15 13:34   ` Ryan Roberts
2023-03-15 16:08     ` Ryan Roberts
2023-03-16 16:27       ` Yin, Fengwei
2023-03-16 16:34         ` Ryan Roberts
2023-03-17  8:23           ` Yin, Fengwei
2023-03-17 12:46             ` Ryan Roberts
2023-03-17 13:28               ` Yin, Fengwei
2023-03-15  5:14 ` [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range() Matthew Wilcox (Oracle)
2023-03-15 15:26   ` Ryan Roberts
2023-03-16 16:23     ` Yin, Fengwei
2023-03-16 16:38       ` Ryan Roberts
2023-03-16 16:41         ` Yin, Fengwei
2023-03-16 16:50           ` Ryan Roberts
2023-03-16 17:52         ` Matthew Wilcox
2023-03-17  1:58           ` Yin, Fengwei
2023-03-17  3:44             ` Matthew Wilcox
2023-03-17  6:33               ` Yin, Fengwei
2023-03-17  8:00                 ` Ryan Roberts
2023-03-17  8:19                   ` Yin, Fengwei
2023-03-17 13:00                     ` Ryan Roberts
2023-03-17 13:44                       ` Yin, Fengwei
2023-03-24 14:58                     ` Will Deacon
2023-03-24 15:11                       ` Matthew Wilcox
2023-03-24 17:23                         ` Will Deacon
2023-03-27  1:23                           ` Yin Fengwei
2023-03-20 13:38               ` Yin, Fengwei
2023-03-20 14:08                 ` Matthew Wilcox
2023-03-21  1:58                   ` Yin, Fengwei
2023-03-21  5:13                   ` Yin Fengwei
2023-05-30  8:07                   ` [PATCH 0/4] New page table range API fixup patches Yin Fengwei
2023-05-30  8:07                     ` [PATCH 1/4] filemap: avoid interfere with xas.xa_index Yin Fengwei
2023-05-30  8:07                     ` [PATCH 2/4] rmap: fix typo in folio_add_file_rmap_range() Yin Fengwei
2023-05-30  8:07                     ` [PATCH 3/4] mm: mark PTEs referencing the accessed folio young Yin Fengwei
2023-05-30  8:07                     ` [PATCH 4/4] filemap: Check address range in filemap_map_folio_range() Yin Fengwei
2023-03-15  5:14 ` [PATCH v4 36/36] filemap: Batch PTE mappings Matthew Wilcox (Oracle)
