All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/34] New page table range API
@ 2023-02-28 21:37 Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 01/34] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set() Matthew Wilcox (Oracle)
                   ` (36 more replies)
  0 siblings, 37 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel

This patchset changes the API used by the MM to set up page table entries.
The four APIs are:
    set_ptes(mm, addr, ptep, pte, nr)
    update_mmu_cache_range(vma, addr, ptep, nr)
    flush_dcache_folio(folio)
    flush_icache_pages(vma, page, nr)

flush_dcache_folio() isn't technically new, but no architecture
implemented it, so I've done that for you.  The old APIs remain around
but are mostly implemented by calling the new interfaces.

The new APIs are based around setting up N page table entries at once.
The N entries belong to the same PMD, the same folio and the same VMA,
so ptep++ is a legitimate operation, and locking is taken care of for
you.  Some architectures can do a better job of it than just a loop,
but I have hesitated to make too deep a change to architectures I don't
understand well.

One thing I have changed in every architecture is that PG_arch_1 is now a
per-folio bit instead of a per-page bit.  This was something that would
have to happen eventually, and it makes sense to do it now rather than
iterate over every page involved in a cache flush and figure out if it
needs to happen.

The point of all this is better performance, and Fengwei Yin has
measured improvement on x86.  I suspect you'll see improvement on
your architecture too.  Try the new will-it-scale test mentioned here:
https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
You'll need to run it on an XFS filesystem and have
CONFIG_TRANSPARENT_HUGEPAGE set.

For testing, I've only run the code on x86.  If an x86->foo compiler
exists in Debian, I've built defconfig.  I'm relying on the buildbots
to tell me what I missed, and people who actually have the hardware to
tell me if it actually works.

I'd like to get this into the MM tree soon after the current merge window
closes, so quick feedback would be appreciated.

v3:
 - Reinstate flush_dcache_icache_phys() on PowerPC
 - Fix folio_flush_mapping().  The documentation was correct and the
   implementation was completely wrong
 - Change the flush_dcache_page() documentation to describe
   flush_dcache_folio() instead
 - Split ARM from ARC.  I messed up my git commands
 - Remove page_mapping_file()
 - Rationalise how flush_icache_pages() and flush_icache_page() are defined
 - Use flush_icache_pages() in do_set_pmd()
 - Pick up Guo Ren's Ack for csky

Matthew Wilcox (Oracle) (30):
  mm: Convert page_table_check_pte_set() to page_table_check_ptes_set()
  mm: Add generic flush_icache_pages() and documentation
  mm: Add folio_flush_mapping()
  mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
  alpha: Implement the new page table range API
  arc: Implement the new page table range API
  arm: Implement the new page table range API
  arm64: Implement the new page table range API
  csky: Implement the new page table range API
  hexagon: Implement the new page table range API
  ia64: Implement the new page table range API
  loongarch: Implement the new page table range API
  m68k: Implement the new page table range API
  microblaze: Implement the new page table range API
  mips: Implement the new page table range API
  nios2: Implement the new page table range API
  openrisc: Implement the new page table range API
  parisc: Implement the new page table range API
  powerpc: Implement the new page table range API
  riscv: Implement the new page table range API
  s390: Implement the new page table range API
  superh: Implement the new page table range API
  sparc32: Implement the new page table range API
  sparc64: Implement the new page table range API
  um: Implement the new page table range API
  x86: Implement the new page table range API
  xtensa: Implement the new page table range API
  mm: Remove page_mapping_file()
  mm: Rationalise flush_icache_pages() and flush_icache_page()
  mm: Use flush_icache_pages() in do_set_pmd()

Yin Fengwei (4):
  filemap: Add filemap_map_folio_range()
  rmap: add folio_add_file_rmap_range()
  mm: Convert do_set_pte() to set_pte_range()
  filemap: Batch PTE mappings

 Documentation/core-api/cachetlb.rst       |  51 +++++-----
 Documentation/filesystems/locking.rst     |   2 +-
 arch/alpha/include/asm/cacheflush.h       |  13 ++-
 arch/alpha/include/asm/pgtable.h          |  18 +++-
 arch/arc/include/asm/cacheflush.h         |  14 +--
 arch/arc/include/asm/pgtable-bits-arcv2.h |  20 +++-
 arch/arc/mm/cache.c                       |  61 +++++++-----
 arch/arc/mm/tlb.c                         |  18 ++--
 arch/arm/include/asm/cacheflush.h         |  29 +++---
 arch/arm/include/asm/pgtable.h            |   5 +-
 arch/arm/include/asm/tlbflush.h           |  13 ++-
 arch/arm/mm/copypage-v4mc.c               |   5 +-
 arch/arm/mm/copypage-v6.c                 |   5 +-
 arch/arm/mm/copypage-xscale.c             |   5 +-
 arch/arm/mm/dma-mapping.c                 |  24 ++---
 arch/arm/mm/fault-armv.c                  |  14 +--
 arch/arm/mm/flush.c                       |  99 +++++++++++--------
 arch/arm/mm/mm.h                          |   2 +-
 arch/arm/mm/mmu.c                         |  14 ++-
 arch/arm64/include/asm/cacheflush.h       |   4 +-
 arch/arm64/include/asm/pgtable.h          |  25 +++--
 arch/arm64/mm/flush.c                     |  36 +++----
 arch/csky/abiv1/cacheflush.c              |  32 ++++---
 arch/csky/abiv1/inc/abi/cacheflush.h      |   3 +-
 arch/csky/abiv2/cacheflush.c              |  30 +++---
 arch/csky/abiv2/inc/abi/cacheflush.h      |  11 ++-
 arch/csky/include/asm/pgtable.h           |  21 +++-
 arch/hexagon/include/asm/cacheflush.h     |   9 +-
 arch/hexagon/include/asm/pgtable.h        |  16 +++-
 arch/ia64/hp/common/sba_iommu.c           |  26 ++---
 arch/ia64/include/asm/cacheflush.h        |  14 ++-
 arch/ia64/include/asm/pgtable.h           |  14 ++-
 arch/ia64/mm/init.c                       |  29 ++++--
 arch/loongarch/include/asm/cacheflush.h   |   2 +-
 arch/loongarch/include/asm/pgtable.h      |  30 ++++--
 arch/m68k/include/asm/cacheflush_mm.h     |  25 +++--
 arch/m68k/include/asm/pgtable_mm.h        |  21 +++-
 arch/m68k/mm/motorola.c                   |   2 +-
 arch/microblaze/include/asm/cacheflush.h  |   8 ++
 arch/microblaze/include/asm/pgtable.h     |  17 +++-
 arch/microblaze/include/asm/tlbflush.h    |   4 +-
 arch/mips/include/asm/cacheflush.h        |  32 ++++---
 arch/mips/include/asm/pgtable.h           |  36 ++++---
 arch/mips/mm/c-r4k.c                      |   5 +-
 arch/mips/mm/cache.c                      |  56 +++++------
 arch/mips/mm/init.c                       |  17 ++--
 arch/nios2/include/asm/cacheflush.h       |   6 +-
 arch/nios2/include/asm/pgtable.h          |  27 ++++--
 arch/nios2/mm/cacheflush.c                |  62 ++++++------
 arch/openrisc/include/asm/cacheflush.h    |   8 +-
 arch/openrisc/include/asm/pgtable.h       |  27 +++++-
 arch/openrisc/mm/cache.c                  |  12 ++-
 arch/parisc/include/asm/cacheflush.h      |  14 ++-
 arch/parisc/include/asm/pgtable.h         |  28 ++++--
 arch/parisc/kernel/cache.c                | 101 ++++++++++++++------
 arch/powerpc/include/asm/book3s/pgtable.h |  10 +-
 arch/powerpc/include/asm/cacheflush.h     |  14 ++-
 arch/powerpc/include/asm/kvm_ppc.h        |  10 +-
 arch/powerpc/include/asm/nohash/pgtable.h |  13 +--
 arch/powerpc/include/asm/pgtable.h        |   6 ++
 arch/powerpc/mm/book3s64/hash_utils.c     |  11 ++-
 arch/powerpc/mm/cacheflush.c              |  40 +++-----
 arch/powerpc/mm/nohash/e500_hugetlbpage.c |   3 +-
 arch/powerpc/mm/pgtable.c                 |  51 +++++-----
 arch/riscv/include/asm/cacheflush.h       |  19 ++--
 arch/riscv/include/asm/pgtable.h          |  26 +++--
 arch/riscv/mm/cacheflush.c                |  11 +--
 arch/s390/include/asm/pgtable.h           |  34 +++++--
 arch/sh/include/asm/cacheflush.h          |  21 ++--
 arch/sh/include/asm/pgtable.h             |   6 +-
 arch/sh/include/asm/pgtable_32.h          |  16 +++-
 arch/sh/mm/cache-j2.c                     |   4 +-
 arch/sh/mm/cache-sh4.c                    |  26 +++--
 arch/sh/mm/cache-sh7705.c                 |  26 +++--
 arch/sh/mm/cache.c                        |  54 ++++++-----
 arch/sh/mm/kmap.c                         |   3 +-
 arch/sparc/include/asm/cacheflush_32.h    |   9 +-
 arch/sparc/include/asm/cacheflush_64.h    |  19 ++--
 arch/sparc/include/asm/pgtable_32.h       |  15 ++-
 arch/sparc/include/asm/pgtable_64.h       |  25 ++++-
 arch/sparc/kernel/smp_64.c                |  56 +++++++----
 arch/sparc/mm/init_32.c                   |  13 ++-
 arch/sparc/mm/init_64.c                   |  78 ++++++++-------
 arch/sparc/mm/tlb.c                       |   5 +-
 arch/um/include/asm/pgtable.h             |  15 ++-
 arch/x86/include/asm/pgtable.h            |  21 +++-
 arch/xtensa/include/asm/cacheflush.h      |  11 ++-
 arch/xtensa/include/asm/pgtable.h         |  24 +++--
 arch/xtensa/mm/cache.c                    |  83 +++++++++-------
 include/asm-generic/cacheflush.h          |   7 --
 include/linux/cacheflush.h                |  13 ++-
 include/linux/mm.h                        |   3 +-
 include/linux/page_table_check.h          |  14 +--
 include/linux/pagemap.h                   |  28 ++++--
 include/linux/rmap.h                      |   2 +
 mm/filemap.c                              | 111 +++++++++++++---------
 mm/memory.c                               |  30 +++---
 mm/page_table_check.c                     |  14 +--
 mm/rmap.c                                 |  60 +++++++++---
 mm/util.c                                 |   2 +-
 100 files changed, 1436 insertions(+), 848 deletions(-)

-- 
2.39.1


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v3 01/34] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set()
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
                   ` (35 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel

Tell the page table check how many PTEs & PFNs we want it to check.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/riscv/include/asm/pgtable.h |  2 +-
 arch/x86/include/asm/pgtable.h   |  2 +-
 include/linux/page_table_check.h | 14 +++++++-------
 mm/page_table_check.c            | 14 ++++++++------
 5 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b6ba466e2e8a..69765dc697af 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -358,7 +358,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep, pte_t pte)
 {
-	page_table_check_pte_set(mm, addr, ptep, pte);
+	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
 	return __set_pte_at(mm, addr, ptep, pte);
 }
 
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ab05f892d317..b516f3b59616 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -459,7 +459,7 @@ static inline void __set_pte_at(struct mm_struct *mm,
 static inline void set_pte_at(struct mm_struct *mm,
 	unsigned long addr, pte_t *ptep, pte_t pteval)
 {
-	page_table_check_pte_set(mm, addr, ptep, pteval);
+	page_table_check_ptes_set(mm, addr, ptep, pteval, 1);
 	__set_pte_at(mm, addr, ptep, pteval);
 }
 
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 7425f32e5293..84be3e07b112 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1022,7 +1022,7 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
 static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep, pte_t pte)
 {
-	page_table_check_pte_set(mm, addr, ptep, pte);
+	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
 	set_pte(ptep, pte);
 }
 
diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 01e16c7696ec..ba269c7009e4 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -20,8 +20,8 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, unsigned long addr,
 				  pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
 				  pud_t pud);
-void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
-				pte_t *ptep, pte_t pte);
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+				pte_t *ptep, pte_t pte, unsigned int nr);
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
 				pmd_t *pmdp, pmd_t pmd);
 void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr,
@@ -73,14 +73,14 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm,
 	__page_table_check_pud_clear(mm, addr, pud);
 }
 
-static inline void page_table_check_pte_set(struct mm_struct *mm,
+static inline void page_table_check_ptes_set(struct mm_struct *mm,
 					    unsigned long addr, pte_t *ptep,
-					    pte_t pte)
+					    pte_t pte, unsigned int nr)
 {
 	if (static_branch_likely(&page_table_check_disabled))
 		return;
 
-	__page_table_check_pte_set(mm, addr, ptep, pte);
+	__page_table_check_ptes_set(mm, addr, ptep, pte, nr);
 }
 
 static inline void page_table_check_pmd_set(struct mm_struct *mm,
@@ -138,9 +138,9 @@ static inline void page_table_check_pud_clear(struct mm_struct *mm,
 {
 }
 
-static inline void page_table_check_pte_set(struct mm_struct *mm,
+static inline void page_table_check_ptes_set(struct mm_struct *mm,
 					    unsigned long addr, pte_t *ptep,
-					    pte_t pte)
+					    pte_t pte, unsigned int nr)
 {
 }
 
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 25d8610c0042..e6f4d40caaa2 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -184,20 +184,22 @@ void __page_table_check_pud_clear(struct mm_struct *mm, unsigned long addr,
 }
 EXPORT_SYMBOL(__page_table_check_pud_clear);
 
-void __page_table_check_pte_set(struct mm_struct *mm, unsigned long addr,
-				pte_t *ptep, pte_t pte)
+void __page_table_check_ptes_set(struct mm_struct *mm, unsigned long addr,
+				pte_t *ptep, pte_t pte, unsigned int nr)
 {
+	unsigned int i;
+
 	if (&init_mm == mm)
 		return;
 
-	__page_table_check_pte_clear(mm, addr, *ptep);
+	for (i = 0; i < nr; i++)
+		__page_table_check_pte_clear(mm, addr, ptep[i]);
 	if (pte_user_accessible_page(pte)) {
-		page_table_check_set(mm, addr, pte_pfn(pte),
-				     PAGE_SIZE >> PAGE_SHIFT,
+		page_table_check_set(mm, addr, pte_pfn(pte), nr,
 				     pte_write(pte));
 	}
 }
-EXPORT_SYMBOL(__page_table_check_pte_set);
+EXPORT_SYMBOL(__page_table_check_ptes_set);
 
 void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr,
 				pmd_t *pmdp, pmd_t pmd)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 01/34] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set() Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-15  9:27   ` Mike Rapoport
  2023-02-28 21:37 ` [PATCH v3 03/34] mm: Add folio_flush_mapping() Matthew Wilcox (Oracle)
                   ` (34 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel

flush_icache_page() is deprecated but not yet removed, so add
a range version of it.  Change the documentation to refer to
update_mmu_cache_range() instead of update_mmu_cache().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/core-api/cachetlb.rst | 35 +++++++++++++++--------------
 include/asm-generic/cacheflush.h    |  5 +++++
 2 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
index 5c0552e78c58..d4c9e2a28d36 100644
--- a/Documentation/core-api/cachetlb.rst
+++ b/Documentation/core-api/cachetlb.rst
@@ -88,13 +88,13 @@ changes occur:
 
 	This is used primarily during fault processing.
 
-5) ``void update_mmu_cache(struct vm_area_struct *vma,
-   unsigned long address, pte_t *ptep)``
+5) ``void update_mmu_cache_range(struct vm_area_struct *vma,
+   unsigned long address, pte_t *ptep, unsigned int nr)``
 
-	At the end of every page fault, this routine is invoked to
-	tell the architecture specific code that a translation
-	now exists at virtual address "address" for address space
-	"vma->vm_mm", in the software page tables.
+	At the end of every page fault, this routine is invoked to tell
+	the architecture specific code that translations now exists
+	in the software page tables for address space "vma->vm_mm"
+	at virtual address "address" for "nr" consecutive pages.
 
 	A port may use this information in any way it so chooses.
 	For example, it could use this event to pre-load TLB
@@ -306,17 +306,18 @@ maps this page at its virtual address.
 	private".  The kernel guarantees that, for pagecache pages, it will
 	clear this bit when such a page first enters the pagecache.
 
-	This allows these interfaces to be implemented much more efficiently.
-	It allows one to "defer" (perhaps indefinitely) the actual flush if
-	there are currently no user processes mapping this page.  See sparc64's
-	flush_dcache_page and update_mmu_cache implementations for an example
-	of how to go about doing this.
+	This allows these interfaces to be implemented much more
+	efficiently.  It allows one to "defer" (perhaps indefinitely) the
+	actual flush if there are currently no user processes mapping this
+	page.  See sparc64's flush_dcache_page and update_mmu_cache_range
+	implementations for an example of how to go about doing this.
 
-	The idea is, first at flush_dcache_page() time, if page_file_mapping()
-	returns a mapping, and mapping_mapped on that mapping returns %false,
-	just mark the architecture private page flag bit.  Later, in
-	update_mmu_cache(), a check is made of this flag bit, and if set the
-	flush is done and the flag bit is cleared.
+	The idea is, first at flush_dcache_page() time, if
+	page_file_mapping() returns a mapping, and mapping_mapped on that
+	mapping returns %false, just mark the architecture private page
+	flag bit.  Later, in update_mmu_cache_range(), a check is made
+	of this flag bit, and if set the flush is done and the flag bit
+	is cleared.
 
 	.. important::
 
@@ -369,7 +370,7 @@ maps this page at its virtual address.
   ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``
 
 	All the functionality of flush_icache_page can be implemented in
-	flush_dcache_page and update_mmu_cache. In the future, the hope
+	flush_dcache_page and update_mmu_cache_range. In the future, the hope
 	is to remove this interface completely.
 
 The final category of APIs is for I/O to deliberately aliased address
diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index f46258d1a080..09d51a680765 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -78,6 +78,11 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
 #endif
 
 #ifndef flush_icache_page
+static inline void flush_icache_pages(struct vm_area_struct *vma,
+				     struct page *page, unsigned int nr)
+{
+}
+
 static inline void flush_icache_page(struct vm_area_struct *vma,
 				     struct page *page)
 {
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 03/34] mm: Add folio_flush_mapping()
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 01/34] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set() Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-03 10:33   ` Mike Rapoport
  2023-02-28 21:37 ` [PATCH v3 04/34] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO Matthew Wilcox (Oracle)
                   ` (33 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel

This is the folio equivalent of page_mapping_file(), but rename it
to make it clear that it's very different from page_file_mapping().
Theoretically, there's nothing flush-only about it, but there are no
other users today, and I doubt there will be; it's almost always more
useful to know the swapfile's mapping or the swapcache's mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 51b75b89730e..1b1ba3d5100d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -369,6 +369,26 @@ static inline struct address_space *folio_file_mapping(struct folio *folio)
 	return folio->mapping;
 }
 
+/**
+ * folio_flush_mapping - Find the file mapping this folio belongs to.
+ * @folio: The folio.
+ *
+ * For folios which are in the page cache, return the mapping that this
+ * page belongs to.  Anonymous folios return NULL, even if they're in
+ * the swap cache.  Other kinds of folio also return NULL.
+ *
+ * This is ONLY used by architecture cache flushing code.  If you aren't
+ * writing cache flushing code, you want either folio_mapping() or
+ * folio_file_mapping().
+ */
+static inline struct address_space *folio_flush_mapping(struct folio *folio)
+{
+	if (unlikely(folio_test_swapcache(folio)))
+		return NULL;
+
+	return folio_mapping(folio);
+}
+
 static inline struct address_space *page_file_mapping(struct page *page)
 {
 	return folio_file_mapping(page_folio(page));
@@ -379,11 +399,7 @@ static inline struct address_space *page_file_mapping(struct page *page)
  */
 static inline struct address_space *page_mapping_file(struct page *page)
 {
-	struct folio *folio = page_folio(page);
-
-	if (unlikely(folio_test_swapcache(folio)))
-		return NULL;
-	return folio_mapping(folio);
+	return folio_flush_mapping(page_folio(page));
 }
 
 /**
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 04/34] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 03/34] mm: Add folio_flush_mapping() Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 05/34] alpha: Implement the new page table range API Matthew Wilcox (Oracle)
                   ` (32 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel

Current best practice is to reuse the name of the function as a define
to indicate that the function is implemented by the architecture.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/core-api/cachetlb.rst | 24 +++++++++---------------
 include/linux/cacheflush.h          |  4 ++--
 mm/util.c                           |  2 +-
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
index d4c9e2a28d36..770008afd409 100644
--- a/Documentation/core-api/cachetlb.rst
+++ b/Documentation/core-api/cachetlb.rst
@@ -269,7 +269,7 @@ maps this page at its virtual address.
 	If D-cache aliasing is not an issue, these two routines may
 	simply call memcpy/memset directly and do nothing more.
 
-  ``void flush_dcache_page(struct page *page)``
+  ``void flush_dcache_folio(struct folio *folio)``
 
         This routines must be called when:
 
@@ -277,7 +277,7 @@ maps this page at its virtual address.
 	     and / or in high memory
 	  b) the kernel is about to read from a page cache page and user space
 	     shared/writable mappings of this page potentially exist.  Note
-	     that {get,pin}_user_pages{_fast} already call flush_dcache_page
+	     that {get,pin}_user_pages{_fast} already call flush_dcache_folio
 	     on any page found in the user address space and thus driver
 	     code rarely needs to take this into account.
 
@@ -291,7 +291,7 @@ maps this page at its virtual address.
 
 	The phrase "kernel writes to a page cache page" means, specifically,
 	that the kernel executes store instructions that dirty data in that
-	page at the page->virtual mapping of that page.  It is important to
+	page at the kernel virtual mapping of that page.  It is important to
 	flush here to handle D-cache aliasing, to make sure these kernel stores
 	are visible to user space mappings of that page.
 
@@ -302,18 +302,18 @@ maps this page at its virtual address.
 	If D-cache aliasing is not an issue, this routine may simply be defined
 	as a nop on that architecture.
 
-        There is a bit set aside in page->flags (PG_arch_1) as "architecture
+        There is a bit set aside in folio->flags (PG_arch_1) as "architecture
 	private".  The kernel guarantees that, for pagecache pages, it will
 	clear this bit when such a page first enters the pagecache.
 
 	This allows these interfaces to be implemented much more
 	efficiently.  It allows one to "defer" (perhaps indefinitely) the
 	actual flush if there are currently no user processes mapping this
-	page.  See sparc64's flush_dcache_page and update_mmu_cache_range
+	page.  See sparc64's flush_dcache_folio and update_mmu_cache_range
 	implementations for an example of how to go about doing this.
 
-	The idea is, first at flush_dcache_page() time, if
-	page_file_mapping() returns a mapping, and mapping_mapped on that
+	The idea is, first at flush_dcache_folio() time, if
+	folio_flush_mapping() returns a mapping, and mapping_mapped() on that
 	mapping returns %false, just mark the architecture private page
 	flag bit.  Later, in update_mmu_cache_range(), a check is made
 	of this flag bit, and if set the flush is done and the flag bit
@@ -327,12 +327,6 @@ maps this page at its virtual address.
 			dirty.  Again, see sparc64 for examples of how
 			to deal with this.
 
-  ``void flush_dcache_folio(struct folio *folio)``
-	This function is called under the same circumstances as
-	flush_dcache_page().  It allows the architecture to
-	optimise for flushing the entire folio of pages instead
-	of flushing one page at a time.
-
   ``void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
   unsigned long user_vaddr, void *dst, void *src, int len)``
   ``void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
@@ -353,7 +347,7 @@ maps this page at its virtual address.
 
   	When the kernel needs to access the contents of an anonymous
 	page, it calls this function (currently only
-	get_user_pages()).  Note: flush_dcache_page() deliberately
+	get_user_pages()).  Note: flush_dcache_folio() deliberately
 	doesn't work for an anonymous page.  The default
 	implementation is a nop (and should remain so for all coherent
 	architectures).  For incoherent architectures, it should flush
@@ -370,7 +364,7 @@ maps this page at its virtual address.
   ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``
 
 	All the functionality of flush_icache_page can be implemented in
-	flush_dcache_page and update_mmu_cache_range. In the future, the hope
+	flush_dcache_folio and update_mmu_cache_range. In the future, the hope
 	is to remove this interface completely.
 
 The final category of APIs is for I/O to deliberately aliased address
diff --git a/include/linux/cacheflush.h b/include/linux/cacheflush.h
index a6189d21f2ba..82136f3fcf54 100644
--- a/include/linux/cacheflush.h
+++ b/include/linux/cacheflush.h
@@ -7,14 +7,14 @@
 struct folio;
 
 #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
-#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
+#ifndef flush_dcache_folio
 void flush_dcache_folio(struct folio *folio);
 #endif
 #else
 static inline void flush_dcache_folio(struct folio *folio)
 {
 }
-#define ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO 0
+#define flush_dcache_folio flush_dcache_folio
 #endif /* ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE */
 
 #endif /* _LINUX_CACHEFLUSH_H */
diff --git a/mm/util.c b/mm/util.c
index b8ed9dbc7fd5..f66e0ca82d2d 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1124,7 +1124,7 @@ void page_offline_end(void)
 }
 EXPORT_SYMBOL(page_offline_end);
 
-#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
+#ifndef flush_dcache_folio
 void flush_dcache_folio(struct folio *folio)
 {
 	long i, nr = folio_nr_pages(folio);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 05/34] alpha: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 04/34] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-02-28 21:37   ` Matthew Wilcox (Oracle)
                   ` (31 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	linux-alpha

Add set_ptes(), update_mmu_cache_range() and flush_icache_pages().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: linux-alpha@vger.kernel.org
---
 arch/alpha/include/asm/cacheflush.h | 10 ++++++++++
 arch/alpha/include/asm/pgtable.h    | 18 +++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/include/asm/cacheflush.h b/arch/alpha/include/asm/cacheflush.h
index 9945ff483eaf..3956460e69e2 100644
--- a/arch/alpha/include/asm/cacheflush.h
+++ b/arch/alpha/include/asm/cacheflush.h
@@ -57,6 +57,16 @@ extern void flush_icache_user_page(struct vm_area_struct *vma,
 #define flush_icache_page(vma, page) \
 	flush_icache_user_page((vma), (page), 0, 0)
 
+/*
+ * Both implementations of flush_icache_user_page flush the entire
+ * address space, so one call, no matter how many pages.
+ */
+static inline void flush_icache_pages(struct vm_area_struct *vma,
+		struct page *page, unsigned int nr)
+{
+	flush_icache_user_page(vma, page, 0, 0);
+}
+
 #include <asm-generic/cacheflush.h>
 
 #endif /* _ALPHA_CACHEFLUSH_H */
diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index ba43cb841d19..1e3354e9731b 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -26,7 +26,18 @@ struct vm_area_struct;
  * hook is made available.
  */
 #define set_pte(pteptr, pteval) ((*(pteptr)) = (pteval))
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += 1UL << 32;
+	}
+}
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /* PMD_SHIFT determines the size of the area a second-level page table can map */
 #define PMD_SHIFT	(PAGE_SHIFT + (PAGE_SHIFT-3))
@@ -303,6 +314,11 @@ extern inline void update_mmu_cache(struct vm_area_struct * vma,
 {
 }
 
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
+{
+}
+
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
  * are !pte_none() && !pte_present().
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 06/34] arc: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
                     ` (35 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, Vineet Gupta, linux-snps-arc

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
and flush_icache_pages().

Change the PG_dc_clean flag from being per-page to per-folio (which
means it cannot always be set as we don't know that all pages in this
folio were cleaned).  Enhance the internal flush routines to take the
number of pages to flush.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: linux-snps-arc@lists.infradead.org
---
 arch/arc/include/asm/cacheflush.h         |  7 ++-
 arch/arc/include/asm/pgtable-bits-arcv2.h | 20 ++++++--
 arch/arc/mm/cache.c                       | 61 ++++++++++++++---------
 arch/arc/mm/tlb.c                         | 18 ++++---
 4 files changed, 68 insertions(+), 38 deletions(-)

diff --git a/arch/arc/include/asm/cacheflush.h b/arch/arc/include/asm/cacheflush.h
index e201b4b1655a..04f65f588510 100644
--- a/arch/arc/include/asm/cacheflush.h
+++ b/arch/arc/include/asm/cacheflush.h
@@ -25,17 +25,20 @@
  * in update_mmu_cache()
  */
 #define flush_icache_page(vma, page)
+#define flush_icache_pages(vma, page, nr)
 
 void flush_cache_all(void);
 
 void flush_icache_range(unsigned long kstart, unsigned long kend);
 void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len);
-void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr);
-void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr);
+void __inv_icache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr);
+void __flush_dcache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr);
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 
 void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
 
 void dma_cache_wback_inv(phys_addr_t start, unsigned long sz);
 void dma_cache_inv(phys_addr_t start, unsigned long sz);
diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
index 6e9f8ca6d6a1..4a1b2ce204c6 100644
--- a/arch/arc/include/asm/pgtable-bits-arcv2.h
+++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
@@ -100,14 +100,24 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	set_pte(ptep, pteval);
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+	}
 }
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
-		      pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		      pte_t *ptep, unsigned int nr);
+
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 55c6de138eae..3c16ee942a5c 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -752,17 +752,17 @@ static inline void arc_slc_enable(void)
  * There's a corollary case, where kernel READs from a userspace mapped page.
  * If the U-mapping is not congruent to K-mapping, former needs flushing.
  */
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
 	struct address_space *mapping;
 
 	if (!cache_is_vipt_aliasing()) {
-		clear_bit(PG_dc_clean, &page->flags);
+		clear_bit(PG_dc_clean, &folio->flags);
 		return;
 	}
 
 	/* don't handle anon pages here */
-	mapping = page_mapping_file(page);
+	mapping = folio_flush_mapping(folio);
 	if (!mapping)
 		return;
 
@@ -771,17 +771,27 @@ void flush_dcache_page(struct page *page)
 	 * Make a note that K-mapping is dirty
 	 */
 	if (!mapping_mapped(mapping)) {
-		clear_bit(PG_dc_clean, &page->flags);
-	} else if (page_mapcount(page)) {
-
+		clear_bit(PG_dc_clean, &folio->flags);
+	} else if (folio_mapped(folio)) {
 		/* kernel reading from page with U-mapping */
-		phys_addr_t paddr = (unsigned long)page_address(page);
-		unsigned long vaddr = page->index << PAGE_SHIFT;
+		phys_addr_t paddr = (unsigned long)folio_address(folio);
+		unsigned long vaddr = folio_pos(folio);
 
+		/*
+		 * vaddr is not actually the virtual address, but is
+		 * congruent to every user mapping.
+		 */
 		if (addr_not_cache_congruent(paddr, vaddr))
-			__flush_dcache_page(paddr, vaddr);
+			__flush_dcache_pages(paddr, vaddr,
+						folio_nr_pages(folio));
 	}
 }
+EXPORT_SYMBOL(flush_dcache_folio);
+
+void flush_dcache_page(struct page *page)
+{
+	return flush_dcache_folio(page_folio(page));
+}
 EXPORT_SYMBOL(flush_dcache_page);
 
 /*
@@ -921,18 +931,18 @@ void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len)
 }
 
 /* wrapper to compile time eliminate alignment checks in flush loop */
-void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr)
+void __inv_icache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr)
 {
-	__ic_line_inv_vaddr(paddr, vaddr, PAGE_SIZE);
+	__ic_line_inv_vaddr(paddr, vaddr, nr * PAGE_SIZE);
 }
 
 /*
  * wrapper to clearout kernel or userspace mappings of a page
  * For kernel mappings @vaddr == @paddr
  */
-void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr)
+void __flush_dcache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr)
 {
-	__dc_line_op(paddr, vaddr & PAGE_MASK, PAGE_SIZE, OP_FLUSH_N_INV);
+	__dc_line_op(paddr, vaddr & PAGE_MASK, nr * PAGE_SIZE, OP_FLUSH_N_INV);
 }
 
 noinline void flush_cache_all(void)
@@ -962,10 +972,10 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long u_vaddr,
 
 	u_vaddr &= PAGE_MASK;
 
-	__flush_dcache_page(paddr, u_vaddr);
+	__flush_dcache_pages(paddr, u_vaddr, 1);
 
 	if (vma->vm_flags & VM_EXEC)
-		__inv_icache_page(paddr, u_vaddr);
+		__inv_icache_pages(paddr, u_vaddr, 1);
 }
 
 void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
@@ -978,9 +988,9 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page,
 		     unsigned long u_vaddr)
 {
 	/* TBD: do we really need to clear the kernel mapping */
-	__flush_dcache_page((phys_addr_t)page_address(page), u_vaddr);
-	__flush_dcache_page((phys_addr_t)page_address(page),
-			    (phys_addr_t)page_address(page));
+	__flush_dcache_pages((phys_addr_t)page_address(page), u_vaddr, 1);
+	__flush_dcache_pages((phys_addr_t)page_address(page),
+			    (phys_addr_t)page_address(page), 1);
 
 }
 
@@ -989,6 +999,8 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page,
 void copy_user_highpage(struct page *to, struct page *from,
 	unsigned long u_vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
+	struct folio *dst = page_folio(to);
 	void *kfrom = kmap_atomic(from);
 	void *kto = kmap_atomic(to);
 	int clean_src_k_mappings = 0;
@@ -1005,7 +1017,7 @@ void copy_user_highpage(struct page *to, struct page *from,
 	 * addr_not_cache_congruent() is 0
 	 */
 	if (page_mapcount(from) && addr_not_cache_congruent(kfrom, u_vaddr)) {
-		__flush_dcache_page((unsigned long)kfrom, u_vaddr);
+		__flush_dcache_pages((unsigned long)kfrom, u_vaddr, 1);
 		clean_src_k_mappings = 1;
 	}
 
@@ -1019,17 +1031,17 @@ void copy_user_highpage(struct page *to, struct page *from,
 	 * non copied user pages (e.g. read faults which wire in pagecache page
 	 * directly).
 	 */
-	clear_bit(PG_dc_clean, &to->flags);
+	clear_bit(PG_dc_clean, &dst->flags);
 
 	/*
 	 * if SRC was already usermapped and non-congruent to kernel mapping
 	 * sync the kernel mapping back to physical page
 	 */
 	if (clean_src_k_mappings) {
-		__flush_dcache_page((unsigned long)kfrom, (unsigned long)kfrom);
-		set_bit(PG_dc_clean, &from->flags);
+		__flush_dcache_pages((unsigned long)kfrom,
+					(unsigned long)kfrom, 1);
 	} else {
-		clear_bit(PG_dc_clean, &from->flags);
+		clear_bit(PG_dc_clean, &src->flags);
 	}
 
 	kunmap_atomic(kto);
@@ -1038,8 +1050,9 @@ void copy_user_highpage(struct page *to, struct page *from,
 
 void clear_user_page(void *to, unsigned long u_vaddr, struct page *page)
 {
+	struct folio *folio = page_folio(page);
 	clear_page(to);
-	clear_bit(PG_dc_clean, &page->flags);
+	clear_bit(PG_dc_clean, &folio->flags);
 }
 EXPORT_SYMBOL(clear_user_page);
 
diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c
index 5f71445f26bd..0a996b65bb4e 100644
--- a/arch/arc/mm/tlb.c
+++ b/arch/arc/mm/tlb.c
@@ -467,8 +467,8 @@ void create_tlb(struct vm_area_struct *vma, unsigned long vaddr, pte_t *ptep)
  * Note that flush (when done) involves both WBACK - so physical page is
  * in sync as well as INV - so any non-congruent aliases don't remain
  */
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
-		      pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long vaddr_unaligned, pte_t *ptep, unsigned int nr)
 {
 	unsigned long vaddr = vaddr_unaligned & PAGE_MASK;
 	phys_addr_t paddr = pte_val(*ptep) & PAGE_MASK_PHYS;
@@ -491,15 +491,19 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
 	 */
 	if ((vma->vm_flags & VM_EXEC) ||
 	     addr_not_cache_congruent(paddr, vaddr)) {
-
-		int dirty = !test_and_set_bit(PG_dc_clean, &page->flags);
+		struct folio *folio = page_folio(page);
+		int dirty = !test_and_set_bit(PG_dc_clean, &folio->flags);
 		if (dirty) {
+			unsigned long offset = offset_in_folio(folio, paddr);
+			nr = folio_nr_pages(folio);
+			paddr -= offset;
+			vaddr -= offset;
 			/* wback + inv dcache lines (K-mapping) */
-			__flush_dcache_page(paddr, paddr);
+			__flush_dcache_pages(paddr, paddr, nr);
 
 			/* invalidate any existing icache lines (U-mapping) */
 			if (vma->vm_flags & VM_EXEC)
-				__inv_icache_page(paddr, vaddr);
+				__inv_icache_pages(paddr, vaddr, nr);
 		}
 	}
 }
@@ -531,7 +535,7 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
 				 pmd_t *pmd)
 {
 	pte_t pte = __pte(pmd_val(*pmd));
-	update_mmu_cache(vma, addr, &pte);
+	update_mmu_cache_range(vma, addr, &pte, HPAGE_PMD_NR);
 }
 
 void local_flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 06/34] arc: Implement the new page table range API
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  0 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, Vineet Gupta, linux-snps-arc

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
and flush_icache_pages().

Change the PG_dc_clean flag from being per-page to per-folio (which
means it cannot always be set as we don't know that all pages in this
folio were cleaned).  Enhance the internal flush routines to take the
number of pages to flush.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: linux-snps-arc@lists.infradead.org
---
 arch/arc/include/asm/cacheflush.h         |  7 ++-
 arch/arc/include/asm/pgtable-bits-arcv2.h | 20 ++++++--
 arch/arc/mm/cache.c                       | 61 ++++++++++++++---------
 arch/arc/mm/tlb.c                         | 18 ++++---
 4 files changed, 68 insertions(+), 38 deletions(-)

diff --git a/arch/arc/include/asm/cacheflush.h b/arch/arc/include/asm/cacheflush.h
index e201b4b1655a..04f65f588510 100644
--- a/arch/arc/include/asm/cacheflush.h
+++ b/arch/arc/include/asm/cacheflush.h
@@ -25,17 +25,20 @@
  * in update_mmu_cache()
  */
 #define flush_icache_page(vma, page)
+#define flush_icache_pages(vma, page, nr)
 
 void flush_cache_all(void);
 
 void flush_icache_range(unsigned long kstart, unsigned long kend);
 void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len);
-void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr);
-void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr);
+void __inv_icache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr);
+void __flush_dcache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr);
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 
 void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
 
 void dma_cache_wback_inv(phys_addr_t start, unsigned long sz);
 void dma_cache_inv(phys_addr_t start, unsigned long sz);
diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
index 6e9f8ca6d6a1..4a1b2ce204c6 100644
--- a/arch/arc/include/asm/pgtable-bits-arcv2.h
+++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
@@ -100,14 +100,24 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	set_pte(ptep, pteval);
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+	}
 }
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
-		      pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		      pte_t *ptep, unsigned int nr);
+
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
index 55c6de138eae..3c16ee942a5c 100644
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -752,17 +752,17 @@ static inline void arc_slc_enable(void)
  * There's a corollary case, where kernel READs from a userspace mapped page.
  * If the U-mapping is not congruent to K-mapping, former needs flushing.
  */
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
 	struct address_space *mapping;
 
 	if (!cache_is_vipt_aliasing()) {
-		clear_bit(PG_dc_clean, &page->flags);
+		clear_bit(PG_dc_clean, &folio->flags);
 		return;
 	}
 
 	/* don't handle anon pages here */
-	mapping = page_mapping_file(page);
+	mapping = folio_flush_mapping(folio);
 	if (!mapping)
 		return;
 
@@ -771,17 +771,27 @@ void flush_dcache_page(struct page *page)
 	 * Make a note that K-mapping is dirty
 	 */
 	if (!mapping_mapped(mapping)) {
-		clear_bit(PG_dc_clean, &page->flags);
-	} else if (page_mapcount(page)) {
-
+		clear_bit(PG_dc_clean, &folio->flags);
+	} else if (folio_mapped(folio)) {
 		/* kernel reading from page with U-mapping */
-		phys_addr_t paddr = (unsigned long)page_address(page);
-		unsigned long vaddr = page->index << PAGE_SHIFT;
+		phys_addr_t paddr = (unsigned long)folio_address(folio);
+		unsigned long vaddr = folio_pos(folio);
 
+		/*
+		 * vaddr is not actually the virtual address, but is
+		 * congruent to every user mapping.
+		 */
 		if (addr_not_cache_congruent(paddr, vaddr))
-			__flush_dcache_page(paddr, vaddr);
+			__flush_dcache_pages(paddr, vaddr,
+						folio_nr_pages(folio));
 	}
 }
+EXPORT_SYMBOL(flush_dcache_folio);
+
+void flush_dcache_page(struct page *page)
+{
+	return flush_dcache_folio(page_folio(page));
+}
 EXPORT_SYMBOL(flush_dcache_page);
 
 /*
@@ -921,18 +931,18 @@ void __sync_icache_dcache(phys_addr_t paddr, unsigned long vaddr, int len)
 }
 
 /* wrapper to compile time eliminate alignment checks in flush loop */
-void __inv_icache_page(phys_addr_t paddr, unsigned long vaddr)
+void __inv_icache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr)
 {
-	__ic_line_inv_vaddr(paddr, vaddr, PAGE_SIZE);
+	__ic_line_inv_vaddr(paddr, vaddr, nr * PAGE_SIZE);
 }
 
 /*
  * wrapper to clearout kernel or userspace mappings of a page
  * For kernel mappings @vaddr == @paddr
  */
-void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr)
+void __flush_dcache_pages(phys_addr_t paddr, unsigned long vaddr, unsigned nr)
 {
-	__dc_line_op(paddr, vaddr & PAGE_MASK, PAGE_SIZE, OP_FLUSH_N_INV);
+	__dc_line_op(paddr, vaddr & PAGE_MASK, nr * PAGE_SIZE, OP_FLUSH_N_INV);
 }
 
 noinline void flush_cache_all(void)
@@ -962,10 +972,10 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long u_vaddr,
 
 	u_vaddr &= PAGE_MASK;
 
-	__flush_dcache_page(paddr, u_vaddr);
+	__flush_dcache_pages(paddr, u_vaddr, 1);
 
 	if (vma->vm_flags & VM_EXEC)
-		__inv_icache_page(paddr, u_vaddr);
+		__inv_icache_pages(paddr, u_vaddr, 1);
 }
 
 void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
@@ -978,9 +988,9 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page,
 		     unsigned long u_vaddr)
 {
 	/* TBD: do we really need to clear the kernel mapping */
-	__flush_dcache_page((phys_addr_t)page_address(page), u_vaddr);
-	__flush_dcache_page((phys_addr_t)page_address(page),
-			    (phys_addr_t)page_address(page));
+	__flush_dcache_pages((phys_addr_t)page_address(page), u_vaddr, 1);
+	__flush_dcache_pages((phys_addr_t)page_address(page),
+			    (phys_addr_t)page_address(page), 1);
 
 }
 
@@ -989,6 +999,8 @@ void flush_anon_page(struct vm_area_struct *vma, struct page *page,
 void copy_user_highpage(struct page *to, struct page *from,
 	unsigned long u_vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
+	struct folio *dst = page_folio(to);
 	void *kfrom = kmap_atomic(from);
 	void *kto = kmap_atomic(to);
 	int clean_src_k_mappings = 0;
@@ -1005,7 +1017,7 @@ void copy_user_highpage(struct page *to, struct page *from,
 	 * addr_not_cache_congruent() is 0
 	 */
 	if (page_mapcount(from) && addr_not_cache_congruent(kfrom, u_vaddr)) {
-		__flush_dcache_page((unsigned long)kfrom, u_vaddr);
+		__flush_dcache_pages((unsigned long)kfrom, u_vaddr, 1);
 		clean_src_k_mappings = 1;
 	}
 
@@ -1019,17 +1031,17 @@ void copy_user_highpage(struct page *to, struct page *from,
 	 * non copied user pages (e.g. read faults which wire in pagecache page
 	 * directly).
 	 */
-	clear_bit(PG_dc_clean, &to->flags);
+	clear_bit(PG_dc_clean, &dst->flags);
 
 	/*
 	 * if SRC was already usermapped and non-congruent to kernel mapping
 	 * sync the kernel mapping back to physical page
 	 */
 	if (clean_src_k_mappings) {
-		__flush_dcache_page((unsigned long)kfrom, (unsigned long)kfrom);
-		set_bit(PG_dc_clean, &from->flags);
+		__flush_dcache_pages((unsigned long)kfrom,
+					(unsigned long)kfrom, 1);
 	} else {
-		clear_bit(PG_dc_clean, &from->flags);
+		clear_bit(PG_dc_clean, &src->flags);
 	}
 
 	kunmap_atomic(kto);
@@ -1038,8 +1050,9 @@ void copy_user_highpage(struct page *to, struct page *from,
 
 void clear_user_page(void *to, unsigned long u_vaddr, struct page *page)
 {
+	struct folio *folio = page_folio(page);
 	clear_page(to);
-	clear_bit(PG_dc_clean, &page->flags);
+	clear_bit(PG_dc_clean, &folio->flags);
 }
 EXPORT_SYMBOL(clear_user_page);
 
diff --git a/arch/arc/mm/tlb.c b/arch/arc/mm/tlb.c
index 5f71445f26bd..0a996b65bb4e 100644
--- a/arch/arc/mm/tlb.c
+++ b/arch/arc/mm/tlb.c
@@ -467,8 +467,8 @@ void create_tlb(struct vm_area_struct *vma, unsigned long vaddr, pte_t *ptep)
  * Note that flush (when done) involves both WBACK - so physical page is
  * in sync as well as INV - so any non-congruent aliases don't remain
  */
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
-		      pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long vaddr_unaligned, pte_t *ptep, unsigned int nr)
 {
 	unsigned long vaddr = vaddr_unaligned & PAGE_MASK;
 	phys_addr_t paddr = pte_val(*ptep) & PAGE_MASK_PHYS;
@@ -491,15 +491,19 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr_unaligned,
 	 */
 	if ((vma->vm_flags & VM_EXEC) ||
 	     addr_not_cache_congruent(paddr, vaddr)) {
-
-		int dirty = !test_and_set_bit(PG_dc_clean, &page->flags);
+		struct folio *folio = page_folio(page);
+		int dirty = !test_and_set_bit(PG_dc_clean, &folio->flags);
 		if (dirty) {
+			unsigned long offset = offset_in_folio(folio, paddr);
+			nr = folio_nr_pages(folio);
+			paddr -= offset;
+			vaddr -= offset;
 			/* wback + inv dcache lines (K-mapping) */
-			__flush_dcache_page(paddr, paddr);
+			__flush_dcache_pages(paddr, paddr, nr);
 
 			/* invalidate any existing icache lines (U-mapping) */
 			if (vma->vm_flags & VM_EXEC)
-				__inv_icache_page(paddr, vaddr);
+				__inv_icache_pages(paddr, vaddr, nr);
 		}
 	}
 }
@@ -531,7 +535,7 @@ void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
 				 pmd_t *pmd)
 {
 	pte_t pte = __pte(pmd_val(*pmd));
-	update_mmu_cache(vma, addr, &pte);
+	update_mmu_cache_range(vma, addr, &pte, HPAGE_PMD_NR);
 }
 
 void local_flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
-- 
2.39.1


_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 07/34] arm: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
                     ` (35 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, Russell King, linux-arm-kernel

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().  Change the PG_dcache_clear flag from being per-page
to per-folio which makes __dma_page_dev_to_cpu() a bit more exciting.
Also add flush_cache_pages(), even though this isn't used by generic code
(yet?)

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
 arch/arm/include/asm/cacheflush.h | 24 +++++---
 arch/arm/include/asm/pgtable.h    |  5 +-
 arch/arm/include/asm/tlbflush.h   | 13 ++--
 arch/arm/mm/copypage-v4mc.c       |  5 +-
 arch/arm/mm/copypage-v6.c         |  5 +-
 arch/arm/mm/copypage-xscale.c     |  5 +-
 arch/arm/mm/dma-mapping.c         | 24 ++++----
 arch/arm/mm/fault-armv.c          | 14 ++---
 arch/arm/mm/flush.c               | 99 +++++++++++++++++++------------
 arch/arm/mm/mm.h                  |  2 +-
 arch/arm/mm/mmu.c                 | 14 +++--
 11 files changed, 125 insertions(+), 85 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index a094f964c869..841e268d2374 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -231,14 +231,15 @@ vivt_flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
 					vma->vm_flags);
 }
 
-static inline void
-vivt_flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
+static inline void vivt_flush_cache_pages(struct vm_area_struct *vma,
+		unsigned long user_addr, unsigned long pfn, unsigned int nr)
 {
 	struct mm_struct *mm = vma->vm_mm;
 
 	if (!mm || cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
 		unsigned long addr = user_addr & PAGE_MASK;
-		__cpuc_flush_user_range(addr, addr + PAGE_SIZE, vma->vm_flags);
+		__cpuc_flush_user_range(addr, addr + nr * PAGE_SIZE,
+				vma->vm_flags);
 	}
 }
 
@@ -247,15 +248,17 @@ vivt_flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsig
 		vivt_flush_cache_mm(mm)
 #define flush_cache_range(vma,start,end) \
 		vivt_flush_cache_range(vma,start,end)
-#define flush_cache_page(vma,addr,pfn) \
-		vivt_flush_cache_page(vma,addr,pfn)
+#define flush_cache_pages(vma, addr, pfn, nr) \
+		vivt_flush_cache_pages(vma, addr, pfn, nr)
 #else
-extern void flush_cache_mm(struct mm_struct *mm);
-extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
-extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn);
+void flush_cache_mm(struct mm_struct *mm);
+void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
+void flush_cache_pages(struct vm_area_struct *vma, unsigned long user_addr,
+		unsigned long pfn, unsigned int nr);
 #endif
 
 #define flush_cache_dup_mm(mm) flush_cache_mm(mm)
+#define flush_cache_page(vma, addr, pfn) flush_cache_pages(vma, addr, pfn, 1)
 
 /*
  * flush_icache_user_range is used when we want to ensure that the
@@ -289,7 +292,9 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
  * See update_mmu_cache for the user space part.
  */
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-extern void flush_dcache_page(struct page *);
+void flush_dcache_page(struct page *);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
 
 #define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
 static inline void flush_kernel_vmap_range(void *addr, int size)
@@ -321,6 +326,7 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
  * duplicate cache flushing elsewhere performed by flush_dcache_page().
  */
 #define flush_icache_page(vma,page)	do { } while (0)
+#define flush_icache_pages(vma, page, nr)	do { } while (0)
 
 /*
  * flush_cache_vmap() is used when creating mappings (eg, via vmap,
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index a58ccbb406ad..6525ac82bd50 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -207,8 +207,9 @@ static inline void __sync_icache_dcache(pte_t pteval)
 extern void __sync_icache_dcache(pte_t pteval);
 #endif
 
-void set_pte_at(struct mm_struct *mm, unsigned long addr,
-		      pte_t *ptep, pte_t pteval);
+void set_ptes(struct mm_struct *mm, unsigned long addr,
+		      pte_t *ptep, pte_t pteval, unsigned int nr);
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
 {
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 0ccc985b90af..7d792e485f4f 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -619,18 +619,21 @@ extern void flush_bp_all(void);
  * If PG_dcache_clean is not set for the page, we need to ensure that any
  * cache entries for the kernels virtual memory range are written
  * back to the page. On ARMv6 and later, the cache coherency is handled via
- * the set_pte_at() function.
+ * the set_ptes() function.
  */
 #if __LINUX_ARM_ARCH__ < 6
-extern void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
-	pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, unsigned int nr);
 #else
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-				    unsigned long addr, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
 {
 }
 #endif
 
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
+
 #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
 
 #endif
diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
index f1da3b439b96..7ddd82b9fe8b 100644
--- a/arch/arm/mm/copypage-v4mc.c
+++ b/arch/arm/mm/copypage-v4mc.c
@@ -64,10 +64,11 @@ static void mc_copy_user_page(void *from, void *to)
 void v4_mc_copy_user_highpage(struct page *to, struct page *from,
 	unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	void *kto = kmap_atomic(to);
 
-	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
-		__flush_dcache_page(page_mapping_file(from), from);
+	if (!test_and_set_bit(PG_dcache_clean, &src->flags))
+		__flush_dcache_folio(folio_flush_mapping(src), src);
 
 	raw_spin_lock(&minicache_lock);
 
diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
index d8a115de5507..a1a71f36d850 100644
--- a/arch/arm/mm/copypage-v6.c
+++ b/arch/arm/mm/copypage-v6.c
@@ -69,11 +69,12 @@ static void discard_old_kernel_data(void *kto)
 static void v6_copy_user_highpage_aliasing(struct page *to,
 	struct page *from, unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	unsigned int offset = CACHE_COLOUR(vaddr);
 	unsigned long kfrom, kto;
 
-	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
-		__flush_dcache_page(page_mapping_file(from), from);
+	if (!test_and_set_bit(PG_dcache_clean, &src->flags))
+		__flush_dcache_folio(folio_flush_mapping(src), src);
 
 	/* FIXME: not highmem safe */
 	discard_old_kernel_data(page_address(to));
diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
index bcb485620a05..f1e29d3e8193 100644
--- a/arch/arm/mm/copypage-xscale.c
+++ b/arch/arm/mm/copypage-xscale.c
@@ -84,10 +84,11 @@ static void mc_copy_user_page(void *from, void *to)
 void xscale_mc_copy_user_highpage(struct page *to, struct page *from,
 	unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	void *kto = kmap_atomic(to);
 
-	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
-		__flush_dcache_page(page_mapping_file(from), from);
+	if (!test_and_set_bit(PG_dcache_clean, &src->flags))
+		__flush_dcache_folio(folio_flush_mapping(src), src);
 
 	raw_spin_lock(&minicache_lock);
 
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8bc01071474a..5ecfde41d70a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -693,6 +693,7 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
 	size_t size, enum dma_data_direction dir)
 {
+	struct folio *folio = page_folio(page);
 	phys_addr_t paddr = page_to_phys(page) + off;
 
 	/* FIXME: non-speculating: not required */
@@ -707,19 +708,18 @@ static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
 	 * Mark the D-cache clean for these pages to avoid extra flushing.
 	 */
 	if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
-		unsigned long pfn;
-		size_t left = size;
-
-		pfn = page_to_pfn(page) + off / PAGE_SIZE;
-		off %= PAGE_SIZE;
-		if (off) {
-			pfn++;
-			left -= PAGE_SIZE - off;
+		ssize_t left = size;
+		size_t offset = offset_in_folio(folio, paddr);
+
+		if (offset) {
+			left -= folio_size(folio) - offset;
+			folio = folio_next(folio);
 		}
-		while (left >= PAGE_SIZE) {
-			page = pfn_to_page(pfn++);
-			set_bit(PG_dcache_clean, &page->flags);
-			left -= PAGE_SIZE;
+
+		while (left >= (ssize_t)folio_size(folio)) {
+			set_bit(PG_dcache_clean, &folio->flags);
+			left -= folio_size(folio);
+			folio = folio_next(folio);
 		}
 	}
 }
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 0e49154454a6..e2c869b8f012 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -178,8 +178,8 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
  *
  * Note that the pte lock will be held.
  */
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
-	pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, unsigned int nr)
 {
 	unsigned long pfn = pte_pfn(*ptep);
 	struct address_space *mapping;
@@ -192,13 +192,13 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
 	 * The zero page is never written to, so never has any dirty
 	 * cache lines, and therefore never needs to be flushed.
 	 */
-	page = pfn_to_page(pfn);
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(pfn))
 		return;
 
-	mapping = page_mapping_file(page);
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
-		__flush_dcache_page(mapping, page);
+	folio = page_folio(pfn_to_page(pfn));
+	mapping = folio_flush_mapping(page);
+	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
+		__flush_dcache_folio(mapping, folio);
 	if (mapping) {
 		if (cache_is_vivt())
 			make_coherent(mapping, vma, addr, ptep, pfn);
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 7ff9feea13a6..07ea0ab51099 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -95,10 +95,10 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
 		__flush_icache_all();
 }
 
-void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
+void flush_cache_pages(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn, unsigned int nr)
 {
 	if (cache_is_vivt()) {
-		vivt_flush_cache_page(vma, user_addr, pfn);
+		vivt_flush_cache_pages(vma, user_addr, pfn, nr);
 		return;
 	}
 
@@ -196,29 +196,31 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 #endif
 }
 
-void __flush_dcache_page(struct address_space *mapping, struct page *page)
+void __flush_dcache_folio(struct address_space *mapping, struct folio *folio)
 {
 	/*
 	 * Writeback any data associated with the kernel mapping of this
 	 * page.  This ensures that data in the physical page is mutually
 	 * coherent with the kernels mapping.
 	 */
-	if (!PageHighMem(page)) {
-		__cpuc_flush_dcache_area(page_address(page), page_size(page));
+	if (!folio_test_highmem(folio)) {
+		__cpuc_flush_dcache_area(folio_address(folio),
+					folio_size(folio));
 	} else {
 		unsigned long i;
 		if (cache_is_vipt_nonaliasing()) {
-			for (i = 0; i < compound_nr(page); i++) {
-				void *addr = kmap_atomic(page + i);
+			for (i = 0; i < folio_nr_pages(folio); i++) {
+				void *addr = kmap_local_folio(folio,
+								i * PAGE_SIZE);
 				__cpuc_flush_dcache_area(addr, PAGE_SIZE);
-				kunmap_atomic(addr);
+				kunmap_local(addr);
 			}
 		} else {
-			for (i = 0; i < compound_nr(page); i++) {
-				void *addr = kmap_high_get(page + i);
+			for (i = 0; i < folio_nr_pages(folio); i++) {
+				void *addr = kmap_high_get(folio_page(folio, i));
 				if (addr) {
 					__cpuc_flush_dcache_area(addr, PAGE_SIZE);
-					kunmap_high(page + i);
+					kunmap_high(folio_page(folio, i));
 				}
 			}
 		}
@@ -230,15 +232,14 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page)
 	 * userspace colour, which is congruent with page->index.
 	 */
 	if (mapping && cache_is_vipt_aliasing())
-		flush_pfn_alias(page_to_pfn(page),
-				page->index << PAGE_SHIFT);
+		flush_pfn_alias(folio_pfn(folio), folio_pos(folio));
 }
 
-static void __flush_dcache_aliases(struct address_space *mapping, struct page *page)
+static void __flush_dcache_aliases(struct address_space *mapping, struct folio *folio)
 {
 	struct mm_struct *mm = current->active_mm;
-	struct vm_area_struct *mpnt;
-	pgoff_t pgoff;
+	struct vm_area_struct *vma;
+	pgoff_t pgoff, pgoff_end;
 
 	/*
 	 * There are possible user space mappings of this page:
@@ -246,21 +247,36 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 	 *   data in the current VM view associated with this page.
 	 * - aliasing VIPT: we only need to find one mapping of this page.
 	 */
-	pgoff = page->index;
+	pgoff = folio->index;
+	pgoff_end = pgoff + folio_nr_pages(folio) - 1;
 
 	flush_dcache_mmap_lock(mapping);
-	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
-		unsigned long offset;
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff_end) {
+		unsigned long start, offset, pfn;
+		unsigned int nr;
 
 		/*
 		 * If this VMA is not in our MM, we can ignore it.
 		 */
-		if (mpnt->vm_mm != mm)
+		if (vma->vm_mm != mm)
 			continue;
-		if (!(mpnt->vm_flags & VM_MAYSHARE))
+		if (!(vma->vm_flags & VM_MAYSHARE))
 			continue;
-		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		flush_cache_page(mpnt, mpnt->vm_start + offset, page_to_pfn(page));
+
+		start = vma->vm_start;
+		pfn = folio_pfn(folio);
+		nr = folio_nr_pages(folio);
+		offset = pgoff - vma->vm_pgoff;
+		if (offset > -nr) {
+			pfn -= offset;
+			nr += offset;
+		} else {
+			start += offset * PAGE_SIZE;
+		}
+		if (start + nr * PAGE_SIZE > vma->vm_end)
+			nr = (vma->vm_end - start) / PAGE_SIZE;
+
+		flush_cache_pages(vma, start, pfn, nr);
 	}
 	flush_dcache_mmap_unlock(mapping);
 }
@@ -269,7 +285,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 void __sync_icache_dcache(pte_t pteval)
 {
 	unsigned long pfn;
-	struct page *page;
+	struct folio *folio;
 	struct address_space *mapping;
 
 	if (cache_is_vipt_nonaliasing() && !pte_exec(pteval))
@@ -279,14 +295,14 @@ void __sync_icache_dcache(pte_t pteval)
 	if (!pfn_valid(pfn))
 		return;
 
-	page = pfn_to_page(pfn);
+	folio = page_folio(pfn_to_page(pfn));
 	if (cache_is_vipt_aliasing())
-		mapping = page_mapping_file(page);
+		mapping = folio_flush_mapping(folio);
 	else
 		mapping = NULL;
 
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
-		__flush_dcache_page(mapping, page);
+	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
+		__flush_dcache_folio(mapping, folio);
 
 	if (pte_exec(pteval))
 		__flush_icache_all();
@@ -312,7 +328,7 @@ void __sync_icache_dcache(pte_t pteval)
  * Note that we disable the lazy flush for SMP configurations where
  * the cache maintenance operations are not automatically broadcasted.
  */
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
 	struct address_space *mapping;
 
@@ -320,31 +336,36 @@ void flush_dcache_page(struct page *page)
 	 * The zero page is never written to, so never has any dirty
 	 * cache lines, and therefore never needs to be flushed.
 	 */
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(folio_pfn(folio)))
 		return;
 
 	if (!cache_ops_need_broadcast() && cache_is_vipt_nonaliasing()) {
-		if (test_bit(PG_dcache_clean, &page->flags))
-			clear_bit(PG_dcache_clean, &page->flags);
+		if (test_bit(PG_dcache_clean, &folio->flags))
+			clear_bit(PG_dcache_clean, &folio->flags);
 		return;
 	}
 
-	mapping = page_mapping_file(page);
+	mapping = folio_flush_mapping(folio);
 
 	if (!cache_ops_need_broadcast() &&
-	    mapping && !page_mapcount(page))
-		clear_bit(PG_dcache_clean, &page->flags);
+	    mapping && !folio_mapped(folio))
+		clear_bit(PG_dcache_clean, &folio->flags);
 	else {
-		__flush_dcache_page(mapping, page);
+		__flush_dcache_folio(mapping, folio);
 		if (mapping && cache_is_vivt())
-			__flush_dcache_aliases(mapping, page);
+			__flush_dcache_aliases(mapping, folio);
 		else if (mapping)
 			__flush_icache_all();
-		set_bit(PG_dcache_clean, &page->flags);
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
 
+void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
+EXPORT_SYMBOL(flush_dcache_page);
 /*
  * Flush an anonymous page so that users of get_user_pages()
  * can safely access the data.  The expected sequence is:
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index d7ffccb7fea7..419316316711 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -45,7 +45,7 @@ struct mem_type {
 
 const struct mem_type *get_mem_type(unsigned int type);
 
-extern void __flush_dcache_page(struct address_space *mapping, struct page *page);
+void __flush_dcache_folio(struct address_space *mapping, struct folio *folio);
 
 /*
  * ARM specific vm_struct->flags bits.
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 463fc2a8448f..9947bbc32b04 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1788,7 +1788,7 @@ void __init paging_init(const struct machine_desc *mdesc)
 	bootmem_init();
 
 	empty_zero_page = virt_to_page(zero_page);
-	__flush_dcache_page(NULL, empty_zero_page);
+	__flush_dcache_folio(NULL, page_folio(empty_zero_page));
 }
 
 void __init early_mm_init(const struct machine_desc *mdesc)
@@ -1797,8 +1797,8 @@ void __init early_mm_init(const struct machine_desc *mdesc)
 	early_paging_init(mdesc);
 }
 
-void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval)
+void set_ptes(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t pteval, unsigned int nr)
 {
 	unsigned long ext = 0;
 
@@ -1808,5 +1808,11 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr,
 		ext |= PTE_EXT_NG;
 	}
 
-	set_pte_ext(ptep, pteval, ext);
+	for (;;) {
+		set_pte_ext(ptep, pteval, ext);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pteval) += PAGE_SIZE;
+	}
 }
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 07/34] arm: Implement the new page table range API
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  0 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, Russell King, linux-arm-kernel

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().  Change the PG_dcache_clear flag from being per-page
to per-folio which makes __dma_page_dev_to_cpu() a bit more exciting.
Also add flush_cache_pages(), even though this isn't used by generic code
(yet?)

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
 arch/arm/include/asm/cacheflush.h | 24 +++++---
 arch/arm/include/asm/pgtable.h    |  5 +-
 arch/arm/include/asm/tlbflush.h   | 13 ++--
 arch/arm/mm/copypage-v4mc.c       |  5 +-
 arch/arm/mm/copypage-v6.c         |  5 +-
 arch/arm/mm/copypage-xscale.c     |  5 +-
 arch/arm/mm/dma-mapping.c         | 24 ++++----
 arch/arm/mm/fault-armv.c          | 14 ++---
 arch/arm/mm/flush.c               | 99 +++++++++++++++++++------------
 arch/arm/mm/mm.h                  |  2 +-
 arch/arm/mm/mmu.c                 | 14 +++--
 11 files changed, 125 insertions(+), 85 deletions(-)

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index a094f964c869..841e268d2374 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -231,14 +231,15 @@ vivt_flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
 					vma->vm_flags);
 }
 
-static inline void
-vivt_flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
+static inline void vivt_flush_cache_pages(struct vm_area_struct *vma,
+		unsigned long user_addr, unsigned long pfn, unsigned int nr)
 {
 	struct mm_struct *mm = vma->vm_mm;
 
 	if (!mm || cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
 		unsigned long addr = user_addr & PAGE_MASK;
-		__cpuc_flush_user_range(addr, addr + PAGE_SIZE, vma->vm_flags);
+		__cpuc_flush_user_range(addr, addr + nr * PAGE_SIZE,
+				vma->vm_flags);
 	}
 }
 
@@ -247,15 +248,17 @@ vivt_flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsig
 		vivt_flush_cache_mm(mm)
 #define flush_cache_range(vma,start,end) \
 		vivt_flush_cache_range(vma,start,end)
-#define flush_cache_page(vma,addr,pfn) \
-		vivt_flush_cache_page(vma,addr,pfn)
+#define flush_cache_pages(vma, addr, pfn, nr) \
+		vivt_flush_cache_pages(vma, addr, pfn, nr)
 #else
-extern void flush_cache_mm(struct mm_struct *mm);
-extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
-extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn);
+void flush_cache_mm(struct mm_struct *mm);
+void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
+void flush_cache_pages(struct vm_area_struct *vma, unsigned long user_addr,
+		unsigned long pfn, unsigned int nr);
 #endif
 
 #define flush_cache_dup_mm(mm) flush_cache_mm(mm)
+#define flush_cache_page(vma, addr, pfn) flush_cache_pages(vma, addr, pfn, 1)
 
 /*
  * flush_icache_user_range is used when we want to ensure that the
@@ -289,7 +292,9 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
  * See update_mmu_cache for the user space part.
  */
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-extern void flush_dcache_page(struct page *);
+void flush_dcache_page(struct page *);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
 
 #define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
 static inline void flush_kernel_vmap_range(void *addr, int size)
@@ -321,6 +326,7 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
  * duplicate cache flushing elsewhere performed by flush_dcache_page().
  */
 #define flush_icache_page(vma,page)	do { } while (0)
+#define flush_icache_pages(vma, page, nr)	do { } while (0)
 
 /*
  * flush_cache_vmap() is used when creating mappings (eg, via vmap,
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index a58ccbb406ad..6525ac82bd50 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -207,8 +207,9 @@ static inline void __sync_icache_dcache(pte_t pteval)
 extern void __sync_icache_dcache(pte_t pteval);
 #endif
 
-void set_pte_at(struct mm_struct *mm, unsigned long addr,
-		      pte_t *ptep, pte_t pteval);
+void set_ptes(struct mm_struct *mm, unsigned long addr,
+		      pte_t *ptep, pte_t pteval, unsigned int nr);
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
 {
diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index 0ccc985b90af..7d792e485f4f 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -619,18 +619,21 @@ extern void flush_bp_all(void);
  * If PG_dcache_clean is not set for the page, we need to ensure that any
  * cache entries for the kernels virtual memory range are written
  * back to the page. On ARMv6 and later, the cache coherency is handled via
- * the set_pte_at() function.
+ * the set_ptes() function.
  */
 #if __LINUX_ARM_ARCH__ < 6
-extern void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
-	pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, unsigned int nr);
 #else
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-				    unsigned long addr, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
 {
 }
 #endif
 
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
+
 #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
 
 #endif
diff --git a/arch/arm/mm/copypage-v4mc.c b/arch/arm/mm/copypage-v4mc.c
index f1da3b439b96..7ddd82b9fe8b 100644
--- a/arch/arm/mm/copypage-v4mc.c
+++ b/arch/arm/mm/copypage-v4mc.c
@@ -64,10 +64,11 @@ static void mc_copy_user_page(void *from, void *to)
 void v4_mc_copy_user_highpage(struct page *to, struct page *from,
 	unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	void *kto = kmap_atomic(to);
 
-	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
-		__flush_dcache_page(page_mapping_file(from), from);
+	if (!test_and_set_bit(PG_dcache_clean, &src->flags))
+		__flush_dcache_folio(folio_flush_mapping(src), src);
 
 	raw_spin_lock(&minicache_lock);
 
diff --git a/arch/arm/mm/copypage-v6.c b/arch/arm/mm/copypage-v6.c
index d8a115de5507..a1a71f36d850 100644
--- a/arch/arm/mm/copypage-v6.c
+++ b/arch/arm/mm/copypage-v6.c
@@ -69,11 +69,12 @@ static void discard_old_kernel_data(void *kto)
 static void v6_copy_user_highpage_aliasing(struct page *to,
 	struct page *from, unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	unsigned int offset = CACHE_COLOUR(vaddr);
 	unsigned long kfrom, kto;
 
-	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
-		__flush_dcache_page(page_mapping_file(from), from);
+	if (!test_and_set_bit(PG_dcache_clean, &src->flags))
+		__flush_dcache_folio(folio_flush_mapping(src), src);
 
 	/* FIXME: not highmem safe */
 	discard_old_kernel_data(page_address(to));
diff --git a/arch/arm/mm/copypage-xscale.c b/arch/arm/mm/copypage-xscale.c
index bcb485620a05..f1e29d3e8193 100644
--- a/arch/arm/mm/copypage-xscale.c
+++ b/arch/arm/mm/copypage-xscale.c
@@ -84,10 +84,11 @@ static void mc_copy_user_page(void *from, void *to)
 void xscale_mc_copy_user_highpage(struct page *to, struct page *from,
 	unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	void *kto = kmap_atomic(to);
 
-	if (!test_and_set_bit(PG_dcache_clean, &from->flags))
-		__flush_dcache_page(page_mapping_file(from), from);
+	if (!test_and_set_bit(PG_dcache_clean, &src->flags))
+		__flush_dcache_folio(folio_flush_mapping(src), src);
 
 	raw_spin_lock(&minicache_lock);
 
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8bc01071474a..5ecfde41d70a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -693,6 +693,7 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
 	size_t size, enum dma_data_direction dir)
 {
+	struct folio *folio = page_folio(page);
 	phys_addr_t paddr = page_to_phys(page) + off;
 
 	/* FIXME: non-speculating: not required */
@@ -707,19 +708,18 @@ static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
 	 * Mark the D-cache clean for these pages to avoid extra flushing.
 	 */
 	if (dir != DMA_TO_DEVICE && size >= PAGE_SIZE) {
-		unsigned long pfn;
-		size_t left = size;
-
-		pfn = page_to_pfn(page) + off / PAGE_SIZE;
-		off %= PAGE_SIZE;
-		if (off) {
-			pfn++;
-			left -= PAGE_SIZE - off;
+		ssize_t left = size;
+		size_t offset = offset_in_folio(folio, paddr);
+
+		if (offset) {
+			left -= folio_size(folio) - offset;
+			folio = folio_next(folio);
 		}
-		while (left >= PAGE_SIZE) {
-			page = pfn_to_page(pfn++);
-			set_bit(PG_dcache_clean, &page->flags);
-			left -= PAGE_SIZE;
+
+		while (left >= (ssize_t)folio_size(folio)) {
+			set_bit(PG_dcache_clean, &folio->flags);
+			left -= folio_size(folio);
+			folio = folio_next(folio);
 		}
 	}
 }
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 0e49154454a6..e2c869b8f012 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -178,8 +178,8 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
  *
  * Note that the pte lock will be held.
  */
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
-	pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, unsigned int nr)
 {
 	unsigned long pfn = pte_pfn(*ptep);
 	struct address_space *mapping;
@@ -192,13 +192,13 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
 	 * The zero page is never written to, so never has any dirty
 	 * cache lines, and therefore never needs to be flushed.
 	 */
-	page = pfn_to_page(pfn);
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(pfn))
 		return;
 
-	mapping = page_mapping_file(page);
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
-		__flush_dcache_page(mapping, page);
+	folio = page_folio(pfn_to_page(pfn));
+	mapping = folio_flush_mapping(page);
+	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
+		__flush_dcache_folio(mapping, folio);
 	if (mapping) {
 		if (cache_is_vivt())
 			make_coherent(mapping, vma, addr, ptep, pfn);
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 7ff9feea13a6..07ea0ab51099 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -95,10 +95,10 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned
 		__flush_icache_all();
 }
 
-void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn)
+void flush_cache_pages(struct vm_area_struct *vma, unsigned long user_addr, unsigned long pfn, unsigned int nr)
 {
 	if (cache_is_vivt()) {
-		vivt_flush_cache_page(vma, user_addr, pfn);
+		vivt_flush_cache_pages(vma, user_addr, pfn, nr);
 		return;
 	}
 
@@ -196,29 +196,31 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 #endif
 }
 
-void __flush_dcache_page(struct address_space *mapping, struct page *page)
+void __flush_dcache_folio(struct address_space *mapping, struct folio *folio)
 {
 	/*
 	 * Writeback any data associated with the kernel mapping of this
 	 * page.  This ensures that data in the physical page is mutually
 	 * coherent with the kernels mapping.
 	 */
-	if (!PageHighMem(page)) {
-		__cpuc_flush_dcache_area(page_address(page), page_size(page));
+	if (!folio_test_highmem(folio)) {
+		__cpuc_flush_dcache_area(folio_address(folio),
+					folio_size(folio));
 	} else {
 		unsigned long i;
 		if (cache_is_vipt_nonaliasing()) {
-			for (i = 0; i < compound_nr(page); i++) {
-				void *addr = kmap_atomic(page + i);
+			for (i = 0; i < folio_nr_pages(folio); i++) {
+				void *addr = kmap_local_folio(folio,
+								i * PAGE_SIZE);
 				__cpuc_flush_dcache_area(addr, PAGE_SIZE);
-				kunmap_atomic(addr);
+				kunmap_local(addr);
 			}
 		} else {
-			for (i = 0; i < compound_nr(page); i++) {
-				void *addr = kmap_high_get(page + i);
+			for (i = 0; i < folio_nr_pages(folio); i++) {
+				void *addr = kmap_high_get(folio_page(folio, i));
 				if (addr) {
 					__cpuc_flush_dcache_area(addr, PAGE_SIZE);
-					kunmap_high(page + i);
+					kunmap_high(folio_page(folio, i));
 				}
 			}
 		}
@@ -230,15 +232,14 @@ void __flush_dcache_page(struct address_space *mapping, struct page *page)
 	 * userspace colour, which is congruent with page->index.
 	 */
 	if (mapping && cache_is_vipt_aliasing())
-		flush_pfn_alias(page_to_pfn(page),
-				page->index << PAGE_SHIFT);
+		flush_pfn_alias(folio_pfn(folio), folio_pos(folio));
 }
 
-static void __flush_dcache_aliases(struct address_space *mapping, struct page *page)
+static void __flush_dcache_aliases(struct address_space *mapping, struct folio *folio)
 {
 	struct mm_struct *mm = current->active_mm;
-	struct vm_area_struct *mpnt;
-	pgoff_t pgoff;
+	struct vm_area_struct *vma;
+	pgoff_t pgoff, pgoff_end;
 
 	/*
 	 * There are possible user space mappings of this page:
@@ -246,21 +247,36 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 	 *   data in the current VM view associated with this page.
 	 * - aliasing VIPT: we only need to find one mapping of this page.
 	 */
-	pgoff = page->index;
+	pgoff = folio->index;
+	pgoff_end = pgoff + folio_nr_pages(folio) - 1;
 
 	flush_dcache_mmap_lock(mapping);
-	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
-		unsigned long offset;
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff_end) {
+		unsigned long start, offset, pfn;
+		unsigned int nr;
 
 		/*
 		 * If this VMA is not in our MM, we can ignore it.
 		 */
-		if (mpnt->vm_mm != mm)
+		if (vma->vm_mm != mm)
 			continue;
-		if (!(mpnt->vm_flags & VM_MAYSHARE))
+		if (!(vma->vm_flags & VM_MAYSHARE))
 			continue;
-		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		flush_cache_page(mpnt, mpnt->vm_start + offset, page_to_pfn(page));
+
+		start = vma->vm_start;
+		pfn = folio_pfn(folio);
+		nr = folio_nr_pages(folio);
+		offset = pgoff - vma->vm_pgoff;
+		if (offset > -nr) {
+			pfn -= offset;
+			nr += offset;
+		} else {
+			start += offset * PAGE_SIZE;
+		}
+		if (start + nr * PAGE_SIZE > vma->vm_end)
+			nr = (vma->vm_end - start) / PAGE_SIZE;
+
+		flush_cache_pages(vma, start, pfn, nr);
 	}
 	flush_dcache_mmap_unlock(mapping);
 }
@@ -269,7 +285,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 void __sync_icache_dcache(pte_t pteval)
 {
 	unsigned long pfn;
-	struct page *page;
+	struct folio *folio;
 	struct address_space *mapping;
 
 	if (cache_is_vipt_nonaliasing() && !pte_exec(pteval))
@@ -279,14 +295,14 @@ void __sync_icache_dcache(pte_t pteval)
 	if (!pfn_valid(pfn))
 		return;
 
-	page = pfn_to_page(pfn);
+	folio = page_folio(pfn_to_page(pfn));
 	if (cache_is_vipt_aliasing())
-		mapping = page_mapping_file(page);
+		mapping = folio_flush_mapping(folio);
 	else
 		mapping = NULL;
 
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
-		__flush_dcache_page(mapping, page);
+	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
+		__flush_dcache_folio(mapping, folio);
 
 	if (pte_exec(pteval))
 		__flush_icache_all();
@@ -312,7 +328,7 @@ void __sync_icache_dcache(pte_t pteval)
  * Note that we disable the lazy flush for SMP configurations where
  * the cache maintenance operations are not automatically broadcasted.
  */
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
 	struct address_space *mapping;
 
@@ -320,31 +336,36 @@ void flush_dcache_page(struct page *page)
 	 * The zero page is never written to, so never has any dirty
 	 * cache lines, and therefore never needs to be flushed.
 	 */
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(folio_pfn(folio)))
 		return;
 
 	if (!cache_ops_need_broadcast() && cache_is_vipt_nonaliasing()) {
-		if (test_bit(PG_dcache_clean, &page->flags))
-			clear_bit(PG_dcache_clean, &page->flags);
+		if (test_bit(PG_dcache_clean, &folio->flags))
+			clear_bit(PG_dcache_clean, &folio->flags);
 		return;
 	}
 
-	mapping = page_mapping_file(page);
+	mapping = folio_flush_mapping(folio);
 
 	if (!cache_ops_need_broadcast() &&
-	    mapping && !page_mapcount(page))
-		clear_bit(PG_dcache_clean, &page->flags);
+	    mapping && !folio_mapped(folio))
+		clear_bit(PG_dcache_clean, &folio->flags);
 	else {
-		__flush_dcache_page(mapping, page);
+		__flush_dcache_folio(mapping, folio);
 		if (mapping && cache_is_vivt())
-			__flush_dcache_aliases(mapping, page);
+			__flush_dcache_aliases(mapping, folio);
 		else if (mapping)
 			__flush_icache_all();
-		set_bit(PG_dcache_clean, &page->flags);
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
 
+void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
+EXPORT_SYMBOL(flush_dcache_page);
 /*
  * Flush an anonymous page so that users of get_user_pages()
  * can safely access the data.  The expected sequence is:
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index d7ffccb7fea7..419316316711 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -45,7 +45,7 @@ struct mem_type {
 
 const struct mem_type *get_mem_type(unsigned int type);
 
-extern void __flush_dcache_page(struct address_space *mapping, struct page *page);
+void __flush_dcache_folio(struct address_space *mapping, struct folio *folio);
 
 /*
  * ARM specific vm_struct->flags bits.
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 463fc2a8448f..9947bbc32b04 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1788,7 +1788,7 @@ void __init paging_init(const struct machine_desc *mdesc)
 	bootmem_init();
 
 	empty_zero_page = virt_to_page(zero_page);
-	__flush_dcache_page(NULL, empty_zero_page);
+	__flush_dcache_folio(NULL, page_folio(empty_zero_page));
 }
 
 void __init early_mm_init(const struct machine_desc *mdesc)
@@ -1797,8 +1797,8 @@ void __init early_mm_init(const struct machine_desc *mdesc)
 	early_paging_init(mdesc);
 }
 
-void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval)
+void set_ptes(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t pteval, unsigned int nr)
 {
 	unsigned long ext = 0;
 
@@ -1808,5 +1808,11 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr,
 		ext |= PTE_EXT_NG;
 	}
 
-	set_pte_ext(ptep, pteval, ext);
+	for (;;) {
+		set_pte_ext(ptep, pteval, ext);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pteval) += PAGE_SIZE;
+	}
 }
-- 
2.39.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 08/34] arm64: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
                     ` (35 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, Catalin Marinas, linux-arm-kernel

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_dcache_clean flag from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
---
 arch/arm64/include/asm/cacheflush.h |  4 +++-
 arch/arm64/include/asm/pgtable.h    | 25 ++++++++++++++------
 arch/arm64/mm/flush.c               | 36 +++++++++++------------------
 3 files changed, 35 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 37185e978aeb..d115451ed263 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -114,7 +114,7 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
 #define copy_to_user_page copy_to_user_page
 
 /*
- * flush_dcache_page is used when the kernel has written to the page
+ * flush_dcache_folio is used when the kernel has written to the page
  * cache page at virtual address page->virtual.
  *
  * If this page isn't mapped (ie, page_mapping == NULL), or it might
@@ -127,6 +127,8 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
  */
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *);
+void flush_dcache_folio(struct folio *);
+#define flush_dcache_folio flush_dcache_folio
 
 static __always_inline void icache_inval_all_pou(void)
 {
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 69765dc697af..4d1b79dbff16 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -355,12 +355,21 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 	set_pte(ptep, pte);
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pte)
-{
-	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
-	return __set_pte_at(mm, addr, ptep, pte);
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
+
+	for (;;) {
+		__set_pte_at(mm, addr, ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		addr += PAGE_SIZE;
+		pte_val(pte) += PAGE_SIZE;
+	}
 }
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
  * Huge pte definitions.
@@ -1059,8 +1068,8 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
 /*
  * On AArch64, the cache coherency is handled via the set_pte_at() function.
  */
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-				    unsigned long addr, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
 {
 	/*
 	 * We don't do anything here, so there's a very small chance of
@@ -1069,6 +1078,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 	 */
 }
 
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
 
 #ifdef CONFIG_ARM64_PA_BITS_52
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 5f9379b3c8c8..deb781af0a3a 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -50,20 +50,13 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 
 void __sync_icache_dcache(pte_t pte)
 {
-	struct page *page = pte_page(pte);
+	struct folio *folio = page_folio(pte_page(pte));
 
-	/*
-	 * HugeTLB pages are always fully mapped, so only setting head page's
-	 * PG_dcache_clean flag is enough.
-	 */
-	if (PageHuge(page))
-		page = compound_head(page);
-
-	if (!test_bit(PG_dcache_clean, &page->flags)) {
-		sync_icache_aliases((unsigned long)page_address(page),
-				    (unsigned long)page_address(page) +
-					    page_size(page));
-		set_bit(PG_dcache_clean, &page->flags);
+	if (!test_bit(PG_dcache_clean, &folio->flags)) {
+		sync_icache_aliases((unsigned long)folio_address(folio),
+				    (unsigned long)folio_address(folio) +
+					    folio_size(folio));
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
 EXPORT_SYMBOL_GPL(__sync_icache_dcache);
@@ -73,17 +66,16 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache);
  * it as dirty for later flushing when mapped in user space (if executable,
  * see __sync_icache_dcache).
  */
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
-	/*
-	 * HugeTLB pages are always fully mapped and only head page will be
-	 * set PG_dcache_clean (see comments in __sync_icache_dcache()).
-	 */
-	if (PageHuge(page))
-		page = compound_head(page);
+	if (test_bit(PG_dcache_clean, &folio->flags))
+		clear_bit(PG_dcache_clean, &folio->flags);
+}
+EXPORT_SYMBOL(flush_dcache_folio);
 
-	if (test_bit(PG_dcache_clean, &page->flags))
-		clear_bit(PG_dcache_clean, &page->flags);
+void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
 }
 EXPORT_SYMBOL(flush_dcache_page);
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 08/34] arm64: Implement the new page table range API
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  0 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, Catalin Marinas, linux-arm-kernel

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_dcache_clean flag from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
---
 arch/arm64/include/asm/cacheflush.h |  4 +++-
 arch/arm64/include/asm/pgtable.h    | 25 ++++++++++++++------
 arch/arm64/mm/flush.c               | 36 +++++++++++------------------
 3 files changed, 35 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 37185e978aeb..d115451ed263 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -114,7 +114,7 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
 #define copy_to_user_page copy_to_user_page
 
 /*
- * flush_dcache_page is used when the kernel has written to the page
+ * flush_dcache_folio is used when the kernel has written to the page
  * cache page at virtual address page->virtual.
  *
  * If this page isn't mapped (ie, page_mapping == NULL), or it might
@@ -127,6 +127,8 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
  */
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *);
+void flush_dcache_folio(struct folio *);
+#define flush_dcache_folio flush_dcache_folio
 
 static __always_inline void icache_inval_all_pou(void)
 {
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 69765dc697af..4d1b79dbff16 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -355,12 +355,21 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 	set_pte(ptep, pte);
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pte)
-{
-	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
-	return __set_pte_at(mm, addr, ptep, pte);
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
+
+	for (;;) {
+		__set_pte_at(mm, addr, ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		addr += PAGE_SIZE;
+		pte_val(pte) += PAGE_SIZE;
+	}
 }
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
  * Huge pte definitions.
@@ -1059,8 +1068,8 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
 /*
  * On AArch64, the cache coherency is handled via the set_pte_at() function.
  */
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-				    unsigned long addr, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
 {
 	/*
 	 * We don't do anything here, so there's a very small chance of
@@ -1069,6 +1078,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 	 */
 }
 
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
 
 #ifdef CONFIG_ARM64_PA_BITS_52
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 5f9379b3c8c8..deb781af0a3a 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -50,20 +50,13 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 
 void __sync_icache_dcache(pte_t pte)
 {
-	struct page *page = pte_page(pte);
+	struct folio *folio = page_folio(pte_page(pte));
 
-	/*
-	 * HugeTLB pages are always fully mapped, so only setting head page's
-	 * PG_dcache_clean flag is enough.
-	 */
-	if (PageHuge(page))
-		page = compound_head(page);
-
-	if (!test_bit(PG_dcache_clean, &page->flags)) {
-		sync_icache_aliases((unsigned long)page_address(page),
-				    (unsigned long)page_address(page) +
-					    page_size(page));
-		set_bit(PG_dcache_clean, &page->flags);
+	if (!test_bit(PG_dcache_clean, &folio->flags)) {
+		sync_icache_aliases((unsigned long)folio_address(folio),
+				    (unsigned long)folio_address(folio) +
+					    folio_size(folio));
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
 EXPORT_SYMBOL_GPL(__sync_icache_dcache);
@@ -73,17 +66,16 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache);
  * it as dirty for later flushing when mapped in user space (if executable,
  * see __sync_icache_dcache).
  */
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
-	/*
-	 * HugeTLB pages are always fully mapped and only head page will be
-	 * set PG_dcache_clean (see comments in __sync_icache_dcache()).
-	 */
-	if (PageHuge(page))
-		page = compound_head(page);
+	if (test_bit(PG_dcache_clean, &folio->flags))
+		clear_bit(PG_dcache_clean, &folio->flags);
+}
+EXPORT_SYMBOL(flush_dcache_folio);
 
-	if (test_bit(PG_dcache_clean, &page->flags))
-		clear_bit(PG_dcache_clean, &page->flags);
+void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
 }
 EXPORT_SYMBOL(flush_dcache_page);
 
-- 
2.39.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 09/34] csky: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2023-02-28 21:37   ` Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-03 11:40   ` Mike Rapoport
  2023-02-28 21:37 ` [PATCH v3 10/34] hexagon: " Matthew Wilcox (Oracle)
                   ` (27 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, Guo Ren, linux-csky

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_dcache_clean flag from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Guo Ren <guoren@kernel.org>
Cc: linux-csky@vger.kernel.org
---
 arch/csky/abiv1/cacheflush.c         | 32 +++++++++++++++++-----------
 arch/csky/abiv1/inc/abi/cacheflush.h |  2 ++
 arch/csky/abiv2/cacheflush.c         | 30 +++++++++++++-------------
 arch/csky/abiv2/inc/abi/cacheflush.h | 10 +++++++--
 arch/csky/include/asm/pgtable.h      | 21 +++++++++++++++---
 5 files changed, 62 insertions(+), 33 deletions(-)

diff --git a/arch/csky/abiv1/cacheflush.c b/arch/csky/abiv1/cacheflush.c
index fb91b069dc69..ba43f6c26b4f 100644
--- a/arch/csky/abiv1/cacheflush.c
+++ b/arch/csky/abiv1/cacheflush.c
@@ -14,43 +14,49 @@
 
 #define PG_dcache_clean		PG_arch_1
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
 	struct address_space *mapping;
 
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(folio_pfn(folio)))
 		return;
 
-	mapping = page_mapping_file(page);
+	mapping = folio_flush_mapping(folio);
 
-	if (mapping && !page_mapcount(page))
-		clear_bit(PG_dcache_clean, &page->flags);
+	if (mapping && !folio_mapped(folio))
+		clear_bit(PG_dcache_clean, &folio->flags);
 	else {
 		dcache_wbinv_all();
 		if (mapping)
 			icache_inv_all();
-		set_bit(PG_dcache_clean, &page->flags);
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
+EXPORT_SYMBOL(flush_dcache_folio);
+
+void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 EXPORT_SYMBOL(flush_dcache_page);
 
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
-	pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, unsigned int nr)
 {
 	unsigned long pfn = pte_pfn(*ptep);
-	struct page *page;
+	struct folio *folio;
 
 	if (!pfn_valid(pfn))
 		return;
 
-	page = pfn_to_page(pfn);
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(pfn))
 		return;
 
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+	folio = page_folio(pfn_to_page(pfn));
+	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
 		dcache_wbinv_all();
 
-	if (page_mapping_file(page)) {
+	if (folio_flush_mapping(folio)) {
 		if (vma->vm_flags & VM_EXEC)
 			icache_inv_all();
 	}
diff --git a/arch/csky/abiv1/inc/abi/cacheflush.h b/arch/csky/abiv1/inc/abi/cacheflush.h
index ed62e2066ba7..0d6cb65624c4 100644
--- a/arch/csky/abiv1/inc/abi/cacheflush.h
+++ b/arch/csky/abiv1/inc/abi/cacheflush.h
@@ -9,6 +9,8 @@
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *);
+void flush_dcache_folio(struct folio *);
+#define flush_dcache_folio flush_dcache_folio
 
 #define flush_cache_mm(mm)			dcache_wbinv_all()
 #define flush_cache_page(vma, page, pfn)	cache_wbinv_all()
diff --git a/arch/csky/abiv2/cacheflush.c b/arch/csky/abiv2/cacheflush.c
index 39c51399dd81..c1cf0d55a2a1 100644
--- a/arch/csky/abiv2/cacheflush.c
+++ b/arch/csky/abiv2/cacheflush.c
@@ -6,30 +6,30 @@
 #include <linux/mm.h>
 #include <asm/cache.h>
 
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
-		      pte_t *pte)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *pte, unsigned int nr)
 {
-	unsigned long addr;
+	unsigned long pfn = pte_pfn(*pte);
 	struct page *page;
+	unsigned int i;
 
-	if (!pfn_valid(pte_pfn(*pte)))
+	if (!pfn_valid(pfn) || is_zero_pfn(pfn))
 		return;
 
-	page = pfn_to_page(pte_pfn(*pte));
-	if (page == ZERO_PAGE(0))
-		return;
+	folio = page_folio(pfn_to_page(pfn));
 
-	if (test_and_set_bit(PG_dcache_clean, &page->flags))
+	if (test_and_set_bit(PG_dcache_clean, &folio->flags))
 		return;
 
-	addr = (unsigned long) kmap_atomic(page);
-
-	dcache_wb_range(addr, addr + PAGE_SIZE);
+	for (i = 0; i < folio_nr_pages(folio); i++) {
+		unsigned long addr = (unsigned long) kmap_local_folio(folio,
+								i * PAGE_SIZE);
 
-	if (vma->vm_flags & VM_EXEC)
-		icache_inv_range(addr, addr + PAGE_SIZE);
-
-	kunmap_atomic((void *) addr);
+		dcache_wb_range(addr, addr + PAGE_SIZE);
+		if (vma->vm_flags & VM_EXEC)
+			icache_inv_range(addr, addr + PAGE_SIZE);
+		kunmap_local((void *) addr);
+	}
 }
 
 void flush_icache_deferred(struct mm_struct *mm)
diff --git a/arch/csky/abiv2/inc/abi/cacheflush.h b/arch/csky/abiv2/inc/abi/cacheflush.h
index a565e00c3f70..9c728933a776 100644
--- a/arch/csky/abiv2/inc/abi/cacheflush.h
+++ b/arch/csky/abiv2/inc/abi/cacheflush.h
@@ -18,11 +18,17 @@
 
 #define PG_dcache_clean		PG_arch_1
 
+static inline void flush_dcache_folio(struct folio *folio)
+{
+	if (test_bit(PG_dcache_clean, &folio->flags))
+		clear_bit(PG_dcache_clean, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 static inline void flush_dcache_page(struct page *page)
 {
-	if (test_bit(PG_dcache_clean, &page->flags))
-		clear_bit(PG_dcache_clean, &page->flags);
+	flush_dcache_folio(page_folio(page));
 }
 
 #define flush_dcache_mmap_lock(mapping)		do { } while (0)
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index d4042495febc..a30ae048233e 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -90,7 +90,20 @@ static inline void set_pte(pte_t *p, pte_t pte)
 	/* prevent out of order excution */
 	smp_mb();
 }
-#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
+
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+	}
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 {
@@ -263,8 +276,10 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
 extern void paging_init(void);
 
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
-		      pte_t *pte);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *pte, unsigned int nr);
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 #define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
 	remap_pfn_range(vma, vaddr, pfn, size, prot)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 10/34] hexagon: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (8 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 09/34] csky: " Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-02-28 21:37   ` Matthew Wilcox (Oracle)
                   ` (26 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel, Brian Cain

Add set_ptes() and update_mmu_cache_range().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Brian Cain <bcain@quicinc.com>
---
 arch/hexagon/include/asm/cacheflush.h |  7 +++++--
 arch/hexagon/include/asm/pgtable.h    | 16 ++++++++++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/hexagon/include/asm/cacheflush.h b/arch/hexagon/include/asm/cacheflush.h
index 6eff0730e6ef..63ca314ede89 100644
--- a/arch/hexagon/include/asm/cacheflush.h
+++ b/arch/hexagon/include/asm/cacheflush.h
@@ -58,12 +58,15 @@ extern void flush_cache_all_hexagon(void);
  * clean the cache when the PTE is set.
  *
  */
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-					unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
 	/*  generic_ptrace_pokedata doesn't wind up here, does it?  */
 }
 
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
+
 void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 		       unsigned long vaddr, void *dst, void *src, int len);
 #define copy_to_user_page copy_to_user_page
diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h
index 59393613d086..f58f1d920769 100644
--- a/arch/hexagon/include/asm/pgtable.h
+++ b/arch/hexagon/include/asm/pgtable.h
@@ -346,12 +346,24 @@ static inline int pte_exec(pte_t pte)
 #define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))
 
 /*
- * set_pte_at - update page table and do whatever magic may be
+ * set_ptes - update page table and do whatever magic may be
  * necessary to make the underlying hardware/firmware take note.
  *
  * VM may require a virtual instruction to alert the MMU.
  */
-#define set_pte_at(mm, addr, ptep, pte) set_pte(ptep, pte)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+	}
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 {
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 11/34] ia64: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
                     ` (35 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel, linux-ia64

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dcache_clean) flag from being per-page to
per-folio, which makes arch_dma_mark_clean() and mark_clean() a little
more exciting.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: linux-ia64@vger.kernel.org
---
 arch/ia64/hp/common/sba_iommu.c    | 26 +++++++++++++++-----------
 arch/ia64/include/asm/cacheflush.h | 14 ++++++++++----
 arch/ia64/include/asm/pgtable.h    | 14 +++++++++++++-
 arch/ia64/mm/init.c                | 29 +++++++++++++++++++----------
 4 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 8ad6946521d8..48d475f10003 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -798,22 +798,26 @@ sba_io_pdir_entry(u64 *pdir_ptr, unsigned long vba)
 #endif
 
 #ifdef ENABLE_MARK_CLEAN
-/**
+/*
  * Since DMA is i-cache coherent, any (complete) pages that were written via
  * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
  * flush them when they get mapped into an executable vm-area.
  */
-static void
-mark_clean (void *addr, size_t size)
+static void mark_clean(void *addr, size_t size)
 {
-	unsigned long pg_addr, end;
-
-	pg_addr = PAGE_ALIGN((unsigned long) addr);
-	end = (unsigned long) addr + size;
-	while (pg_addr + PAGE_SIZE <= end) {
-		struct page *page = virt_to_page((void *)pg_addr);
-		set_bit(PG_arch_1, &page->flags);
-		pg_addr += PAGE_SIZE;
+	struct folio *folio = virt_to_folio(addr);
+	ssize_t left = size;
+	size_t offset = offset_in_folio(folio, addr);
+
+	if (offset) {
+		left -= folio_size(folio) - offset;
+		folio = folio_next(folio);
+	}
+
+	while (left >= folio_size(folio)) {
+		set_bit(PG_arch_1, &folio->flags);
+		left -= folio_size(folio);
+		folio = folio_next(folio);
 	}
 }
 #endif
diff --git a/arch/ia64/include/asm/cacheflush.h b/arch/ia64/include/asm/cacheflush.h
index 708c0fa5d975..eac493fa9e0d 100644
--- a/arch/ia64/include/asm/cacheflush.h
+++ b/arch/ia64/include/asm/cacheflush.h
@@ -13,10 +13,16 @@
 #include <asm/page.h>
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-#define flush_dcache_page(page)			\
-do {						\
-	clear_bit(PG_arch_1, &(page)->flags);	\
-} while (0)
+static inline void flush_dcache_folio(struct folio *folio)
+{
+	clear_bit(PG_arch_1, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 
 extern void flush_icache_range(unsigned long start, unsigned long end);
 #define flush_icache_range flush_icache_range
diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 21c97e31a28a..0c2be4ea664b 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -303,7 +303,18 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	*ptep = pteval;
 }
 
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+	}
+}
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, add, ptep, pte, 1)
 
 /*
  * Make page protection values cacheable, uncacheable, or write-
@@ -396,6 +407,7 @@ pte_same (pte_t a, pte_t b)
 	return pte_val(a) == pte_val(b);
 }
 
+#define update_mmu_cache_range(vma, address, ptep, nr) do { } while (0)
 #define update_mmu_cache(vma, address, ptep) do { } while (0)
 
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 7f5353e28516..12aef25944aa 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -50,30 +50,39 @@ void
 __ia64_sync_icache_dcache (pte_t pte)
 {
 	unsigned long addr;
-	struct page *page;
+	struct folio *folio;
 
-	page = pte_page(pte);
-	addr = (unsigned long) page_address(page);
+	folio = page_folio(pte_page(pte));
+	addr = (unsigned long)folio_address(folio);
 
-	if (test_bit(PG_arch_1, &page->flags))
+	if (test_bit(PG_arch_1, &folio->flags))
 		return;				/* i-cache is already coherent with d-cache */
 
-	flush_icache_range(addr, addr + page_size(page));
-	set_bit(PG_arch_1, &page->flags);	/* mark page as clean */
+	flush_icache_range(addr, addr + folio_size(folio));
+	set_bit(PG_arch_1, &folio->flags);	/* mark page as clean */
 }
 
 /*
- * Since DMA is i-cache coherent, any (complete) pages that were written via
+ * Since DMA is i-cache coherent, any (complete) folios that were written via
  * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
  * flush them when they get mapped into an executable vm-area.
  */
 void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
 {
-	unsigned long pfn = PHYS_PFN(paddr);
+	struct folio *folio = page_folio(phys_to_page(paddr));
+	ssize_t left = size;
+	size_t offset = offset_in_folio(folio, paddr);
 
-	do {
+	if (offset) {
+		left -= folio_size(folio) - offset;
+		folio = folio_next(folio);
+	}
+
+	while (left >= (ssize_t)folio_size(folio)) {
 		set_bit(PG_arch_1, &pfn_to_page(pfn)->flags);
-	} while (++pfn <= PHYS_PFN(paddr + size - 1));
+		left -= folio_size(folio);
+		folio = folio_next(folio);
+	}
 }
 
 inline void
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 11/34] ia64: Implement the new page table range API
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  0 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel, linux-ia64

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dcache_clean) flag from being per-page to
per-folio, which makes arch_dma_mark_clean() and mark_clean() a little
more exciting.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: linux-ia64@vger.kernel.org
---
 arch/ia64/hp/common/sba_iommu.c    | 26 +++++++++++++++-----------
 arch/ia64/include/asm/cacheflush.h | 14 ++++++++++----
 arch/ia64/include/asm/pgtable.h    | 14 +++++++++++++-
 arch/ia64/mm/init.c                | 29 +++++++++++++++++++----------
 4 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 8ad6946521d8..48d475f10003 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -798,22 +798,26 @@ sba_io_pdir_entry(u64 *pdir_ptr, unsigned long vba)
 #endif
 
 #ifdef ENABLE_MARK_CLEAN
-/**
+/*
  * Since DMA is i-cache coherent, any (complete) pages that were written via
  * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
  * flush them when they get mapped into an executable vm-area.
  */
-static void
-mark_clean (void *addr, size_t size)
+static void mark_clean(void *addr, size_t size)
 {
-	unsigned long pg_addr, end;
-
-	pg_addr = PAGE_ALIGN((unsigned long) addr);
-	end = (unsigned long) addr + size;
-	while (pg_addr + PAGE_SIZE <= end) {
-		struct page *page = virt_to_page((void *)pg_addr);
-		set_bit(PG_arch_1, &page->flags);
-		pg_addr += PAGE_SIZE;
+	struct folio *folio = virt_to_folio(addr);
+	ssize_t left = size;
+	size_t offset = offset_in_folio(folio, addr);
+
+	if (offset) {
+		left -= folio_size(folio) - offset;
+		folio = folio_next(folio);
+	}
+
+	while (left >= folio_size(folio)) {
+		set_bit(PG_arch_1, &folio->flags);
+		left -= folio_size(folio);
+		folio = folio_next(folio);
 	}
 }
 #endif
diff --git a/arch/ia64/include/asm/cacheflush.h b/arch/ia64/include/asm/cacheflush.h
index 708c0fa5d975..eac493fa9e0d 100644
--- a/arch/ia64/include/asm/cacheflush.h
+++ b/arch/ia64/include/asm/cacheflush.h
@@ -13,10 +13,16 @@
 #include <asm/page.h>
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-#define flush_dcache_page(page)			\
-do {						\
-	clear_bit(PG_arch_1, &(page)->flags);	\
-} while (0)
+static inline void flush_dcache_folio(struct folio *folio)
+{
+	clear_bit(PG_arch_1, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 
 extern void flush_icache_range(unsigned long start, unsigned long end);
 #define flush_icache_range flush_icache_range
diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 21c97e31a28a..0c2be4ea664b 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -303,7 +303,18 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	*ptep = pteval;
 }
 
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr = 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+	}
+}
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, add, ptep, pte, 1)
 
 /*
  * Make page protection values cacheable, uncacheable, or write-
@@ -396,6 +407,7 @@ pte_same (pte_t a, pte_t b)
 	return pte_val(a) = pte_val(b);
 }
 
+#define update_mmu_cache_range(vma, address, ptep, nr) do { } while (0)
 #define update_mmu_cache(vma, address, ptep) do { } while (0)
 
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 7f5353e28516..12aef25944aa 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -50,30 +50,39 @@ void
 __ia64_sync_icache_dcache (pte_t pte)
 {
 	unsigned long addr;
-	struct page *page;
+	struct folio *folio;
 
-	page = pte_page(pte);
-	addr = (unsigned long) page_address(page);
+	folio = page_folio(pte_page(pte));
+	addr = (unsigned long)folio_address(folio);
 
-	if (test_bit(PG_arch_1, &page->flags))
+	if (test_bit(PG_arch_1, &folio->flags))
 		return;				/* i-cache is already coherent with d-cache */
 
-	flush_icache_range(addr, addr + page_size(page));
-	set_bit(PG_arch_1, &page->flags);	/* mark page as clean */
+	flush_icache_range(addr, addr + folio_size(folio));
+	set_bit(PG_arch_1, &folio->flags);	/* mark page as clean */
 }
 
 /*
- * Since DMA is i-cache coherent, any (complete) pages that were written via
+ * Since DMA is i-cache coherent, any (complete) folios that were written via
  * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
  * flush them when they get mapped into an executable vm-area.
  */
 void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
 {
-	unsigned long pfn = PHYS_PFN(paddr);
+	struct folio *folio = page_folio(phys_to_page(paddr));
+	ssize_t left = size;
+	size_t offset = offset_in_folio(folio, paddr);
 
-	do {
+	if (offset) {
+		left -= folio_size(folio) - offset;
+		folio = folio_next(folio);
+	}
+
+	while (left >= (ssize_t)folio_size(folio)) {
 		set_bit(PG_arch_1, &pfn_to_page(pfn)->flags);
-	} while (++pfn <= PHYS_PFN(paddr + size - 1));
+		left -= folio_size(folio);
+		folio = folio_next(folio);
+	}
 }
 
 inline void
-- 
2.39.1

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 12/34] loongarch: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (10 preceding siblings ...)
  2023-02-28 21:37   ` Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-01  2:04   ` WANG Xuerui
  2023-02-28 21:37 ` [PATCH v3 13/34] m68k: " Matthew Wilcox (Oracle)
                   ` (24 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, Huacai Chen, WANG Xuerui, loongarch

Add set_ptes() and update_mmu_cache_range().  It would probably be
more efficient to implement __update_tlb() by flushing the entire
folio instead of calling it __update_tlb() N times, but I'll leave
that for someone who understands the architecture better.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: loongarch@lists.linux.dev
---
 arch/loongarch/include/asm/cacheflush.h |  2 ++
 arch/loongarch/include/asm/pgtable.h    | 30 +++++++++++++++++++------
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
index 0681788eb474..7907eb42bfbd 100644
--- a/arch/loongarch/include/asm/cacheflush.h
+++ b/arch/loongarch/include/asm/cacheflush.h
@@ -47,8 +47,10 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
 #define flush_cache_vmap(start, end)			do { } while (0)
 #define flush_cache_vunmap(start, end)			do { } while (0)
 #define flush_icache_page(vma, page)			do { } while (0)
+#define flush_icache_pages(vma, page)			do { } while (0)
 #define flush_icache_user_page(vma, page, addr, len)	do { } while (0)
 #define flush_dcache_page(page)				do { } while (0)
+#define flush_dcache_folio(folio)			do { } while (0)
 #define flush_dcache_mmap_lock(mapping)			do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)		do { } while (0)
 
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index d28fb9dbec59..9154d317ffb4 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -334,12 +334,20 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	}
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval)
-{
-	set_pte(ptep, pteval);
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += 1 << _PFN_SHIFT;
+	}
 }
 
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
 	/* Preserve global status for the pair */
@@ -445,11 +453,19 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 extern void __update_tlb(struct vm_area_struct *vma,
 			unsigned long address, pte_t *ptep);
 
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-			unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
-	__update_tlb(vma, address, ptep);
+	for (;;) {
+		__update_tlb(vma, address, ptep);
+		if (--nr == 0)
+			break;
+		address += PAGE_SIZE;
+		ptep++;
+	}
 }
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 #define __HAVE_ARCH_UPDATE_MMU_TLB
 #define update_mmu_tlb	update_mmu_cache
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 13/34] m68k: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (11 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 12/34] loongarch: " Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-05 10:16   ` Geert Uytterhoeven
  2023-02-28 21:37 ` [PATCH v3 14/34] microblaze: " Matthew Wilcox (Oracle)
                   ` (23 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, Geert Uytterhoeven, linux-m68k

Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
flush_dcache_folio().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
---
 arch/m68k/include/asm/cacheflush_mm.h | 26 +++++++++++++++++---------
 arch/m68k/include/asm/pgtable_mm.h    | 21 ++++++++++++++++++---
 arch/m68k/mm/motorola.c               |  2 +-
 3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/arch/m68k/include/asm/cacheflush_mm.h b/arch/m68k/include/asm/cacheflush_mm.h
index 1ac55e7b47f0..d43c8bce149b 100644
--- a/arch/m68k/include/asm/cacheflush_mm.h
+++ b/arch/m68k/include/asm/cacheflush_mm.h
@@ -220,24 +220,28 @@ static inline void flush_cache_page(struct vm_area_struct *vma, unsigned long vm
 
 /* Push the page at kernel virtual address and clear the icache */
 /* RZ: use cpush %bc instead of cpush %dc, cinv %ic */
-static inline void __flush_page_to_ram(void *vaddr)
+static inline void __flush_pages_to_ram(void *vaddr, unsigned int nr)
 {
 	if (CPU_IS_COLDFIRE) {
 		unsigned long addr, start, end;
 		addr = ((unsigned long) vaddr) & ~(PAGE_SIZE - 1);
 		start = addr & ICACHE_SET_MASK;
-		end = (addr + PAGE_SIZE - 1) & ICACHE_SET_MASK;
+		end = (addr + nr * PAGE_SIZE - 1) & ICACHE_SET_MASK;
 		if (start > end) {
 			flush_cf_bcache(0, end);
 			end = ICACHE_MAX_ADDR;
 		}
 		flush_cf_bcache(start, end);
 	} else if (CPU_IS_040_OR_060) {
-		__asm__ __volatile__("nop\n\t"
-				     ".chip 68040\n\t"
-				     "cpushp %%bc,(%0)\n\t"
-				     ".chip 68k"
-				     : : "a" (__pa(vaddr)));
+		unsigned long paddr = __pa(vaddr);
+
+		while (nr--) {
+			__asm__ __volatile__("nop\n\t"
+					     ".chip 68040\n\t"
+					     "cpushp %%bc,(%0)\n\t"
+					     ".chip 68k"
+					     : : "a" (paddr + nr * PAGE_SIZE));
+		}
 	} else {
 		unsigned long _tmp;
 		__asm__ __volatile__("movec %%cacr,%0\n\t"
@@ -249,10 +253,14 @@ static inline void __flush_page_to_ram(void *vaddr)
 }
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-#define flush_dcache_page(page)		__flush_page_to_ram(page_address(page))
+#define flush_dcache_page(page)	__flush_pages_to_ram(page_address(page), 1)
+#define flush_dcache_folio(folio)		\
+	__flush_pages_to_ram(folio_address(folio), folio_nr_pages(folio))
 #define flush_dcache_mmap_lock(mapping)		do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)	do { } while (0)
-#define flush_icache_page(vma, page)	__flush_page_to_ram(page_address(page))
+#define flush_icache_pages(vma, page, nr)	\
+	__flush_pages_to_ram(page_address(page), nr)
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
 
 extern void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
 				    unsigned long addr, int len);
diff --git a/arch/m68k/include/asm/pgtable_mm.h b/arch/m68k/include/asm/pgtable_mm.h
index b93c41fe2067..400206c17c97 100644
--- a/arch/m68k/include/asm/pgtable_mm.h
+++ b/arch/m68k/include/asm/pgtable_mm.h
@@ -31,8 +31,20 @@
 	do{							\
 		*(pteptr) = (pteval);				\
 	} while(0)
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
 
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+	}
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /* PMD_SHIFT determines the size of the area a second-level page table can map */
 #if CONFIG_PGTABLE_LEVELS == 3
@@ -138,11 +150,14 @@ extern void kernel_set_cachemode(void *addr, unsigned long size, int cmode);
  * tables contain all the necessary information.  The Sun3 does, but
  * they are updated on demand.
  */
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-				    unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
 }
 
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
+
 #endif /* !__ASSEMBLY__ */
 
 /* MMU-specific headers */
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 2a375637e007..7784d0fcdf6e 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -81,7 +81,7 @@ static inline void cache_page(void *vaddr)
 
 void mmu_page_ctor(void *page)
 {
-	__flush_page_to_ram(page);
+	__flush_pages_to_ram(page, 1);
 	flush_tlb_kernel_page(page);
 	nocache_page(page);
 }
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 14/34] microblaze: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (12 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 13/34] m68k: " Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-03 10:53   ` Mike Rapoport
  2023-02-28 21:37 ` [PATCH v3 15/34] mips: " Matthew Wilcox (Oracle)
                   ` (22 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel, Michal Simek

Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
flush_dcache_folio().  Also change the calling convention for set_pte()
to be the same as other architectures.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Simek <monstr@monstr.eu>
---
 arch/microblaze/include/asm/cacheflush.h |  8 ++++++++
 arch/microblaze/include/asm/pgtable.h    | 17 ++++++++++++-----
 arch/microblaze/include/asm/tlbflush.h   |  4 +++-
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/microblaze/include/asm/cacheflush.h b/arch/microblaze/include/asm/cacheflush.h
index 39f8fb6768d8..e6641ff98cb3 100644
--- a/arch/microblaze/include/asm/cacheflush.h
+++ b/arch/microblaze/include/asm/cacheflush.h
@@ -74,6 +74,14 @@ do { \
 	flush_dcache_range((unsigned) (addr), (unsigned) (addr) + PAGE_SIZE); \
 } while (0);
 
+static void flush_dcache_folio(struct folio *folio)
+{
+	unsigned long addr = folio_pfn(folio) << PAGE_SHIFT;
+
+	flush_dcache_range(addr, addr + folio_size(folio));
+}
+#define flush_dcache_folio flush_dcache_folio
+
 #define flush_cache_page(vma, vmaddr, pfn) \
 	flush_dcache_range(pfn << PAGE_SHIFT, (pfn << PAGE_SHIFT) + PAGE_SIZE);
 
diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index d1b8272abcd9..a01e1369b486 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -330,18 +330,25 @@ static inline unsigned long pte_update(pte_t *p, unsigned long clr,
 /*
  * set_pte stores a linux PTE into the linux page table.
  */
-static inline void set_pte(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte)
+static inline void set_pte(pte_t *ptep, pte_t pte)
 {
 	*ptep = pte;
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	*ptep = pte;
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += 1 << PFN_SHIFT_OFFSET;
+	}
 }
 
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 		unsigned long address, pte_t *ptep)
diff --git a/arch/microblaze/include/asm/tlbflush.h b/arch/microblaze/include/asm/tlbflush.h
index 2038168ed128..1b179e5e9062 100644
--- a/arch/microblaze/include/asm/tlbflush.h
+++ b/arch/microblaze/include/asm/tlbflush.h
@@ -33,7 +33,9 @@ static inline void local_flush_tlb_range(struct vm_area_struct *vma,
 
 #define flush_tlb_kernel_range(start, end)	do { } while (0)
 
-#define update_mmu_cache(vma, addr, ptep)	do { } while (0)
+#define update_mmu_cache_range(vma, addr, ptep, nr)	do { } while (0)
+#define update_mmu_cache(vma, addr, pte) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 #define flush_tlb_all local_flush_tlb_all
 #define flush_tlb_mm local_flush_tlb_mm
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 15/34] mips: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (13 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 14/34] microblaze: " Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-03 12:24   ` Mike Rapoport
  2023-02-28 21:37 ` [PATCH v3 16/34] nios2: " Matthew Wilcox (Oracle)
                   ` (21 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, Thomas Bogendoerfer, linux-mips

Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
flush_dcache_folio().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@vger.kernel.org
---
 arch/mips/include/asm/cacheflush.h | 32 +++++++++++------
 arch/mips/include/asm/pgtable.h    | 36 +++++++++++++------
 arch/mips/mm/c-r4k.c               |  5 +--
 arch/mips/mm/cache.c               | 56 +++++++++++++++---------------
 arch/mips/mm/init.c                | 17 +++++----
 5 files changed, 88 insertions(+), 58 deletions(-)

diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
index b3dc9c589442..2683cade42ef 100644
--- a/arch/mips/include/asm/cacheflush.h
+++ b/arch/mips/include/asm/cacheflush.h
@@ -36,12 +36,12 @@
  */
 #define PG_dcache_dirty			PG_arch_1
 
-#define Page_dcache_dirty(page)		\
-	test_bit(PG_dcache_dirty, &(page)->flags)
-#define SetPageDcacheDirty(page)	\
-	set_bit(PG_dcache_dirty, &(page)->flags)
-#define ClearPageDcacheDirty(page)	\
-	clear_bit(PG_dcache_dirty, &(page)->flags)
+#define folio_test_dcache_dirty(folio)		\
+	test_bit(PG_dcache_dirty, &(folio)->flags)
+#define folio_set_dcache_dirty(folio)	\
+	set_bit(PG_dcache_dirty, &(folio)->flags)
+#define folio_clear_dcache_dirty(folio)	\
+	clear_bit(PG_dcache_dirty, &(folio)->flags)
 
 extern void (*flush_cache_all)(void);
 extern void (*__flush_cache_all)(void);
@@ -50,15 +50,24 @@ extern void (*flush_cache_mm)(struct mm_struct *mm);
 extern void (*flush_cache_range)(struct vm_area_struct *vma,
 	unsigned long start, unsigned long end);
 extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
-extern void __flush_dcache_page(struct page *page);
+extern void __flush_dcache_pages(struct page *page, unsigned int nr);
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
+static inline void flush_dcache_folio(struct folio *folio)
+{
+	if (cpu_has_dc_aliases)
+		__flush_dcache_pages(&folio->page, folio_nr_pages(folio));
+	else if (!cpu_has_ic_fills_f_dc)
+		folio_set_dcache_dirty(folio);
+}
+#define flush_dcache_folio flush_dcache_folio
+
 static inline void flush_dcache_page(struct page *page)
 {
 	if (cpu_has_dc_aliases)
-		__flush_dcache_page(page);
+		__flush_dcache_pages(page, 1);
 	else if (!cpu_has_ic_fills_f_dc)
-		SetPageDcacheDirty(page);
+		folio_set_dcache_dirty(page_folio(page));
 }
 
 #define flush_dcache_mmap_lock(mapping)		do { } while (0)
@@ -73,10 +82,11 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
 		__flush_anon_page(page, vmaddr);
 }
 
-static inline void flush_icache_page(struct vm_area_struct *vma,
-	struct page *page)
+static inline void flush_icache_pages(struct vm_area_struct *vma,
+		struct page *page, unsigned int nr)
 {
 }
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
 
 extern void (*flush_icache_range)(unsigned long start, unsigned long end);
 extern void (*local_flush_icache_range)(unsigned long start, unsigned long end);
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 791389bf3c12..0cf0455e6ae8 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -105,8 +105,10 @@ do {									\
 	}								\
 } while(0)
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval);
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr);
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
 
@@ -204,19 +206,31 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
 }
 #endif
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
 {
+	unsigned int i;
+	bool do_sync = false;
 
-	if (!pte_present(pteval))
-		goto cache_sync_done;
+	for (i = 0; i < nr; i++) {
+		if (!pte_present(pte))
+			continue;
+		if (pte_present(ptep[i]) &&
+		    (pte_pfn(ptep[i]) == pte_pfn(pte)))
+			continue;
+		do_sync = true;
+	}
 
-	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
-		goto cache_sync_done;
+	if (do_sync)
+		__update_cache(addr, pte);
 
-	__update_cache(addr, pteval);
-cache_sync_done:
-	set_pte(ptep, pteval);
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += 1 << _PFN_SHIFT;
+	}
 }
 
 /*
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index a549fa98c2f4..7d2a42f0cffd 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -679,13 +679,14 @@ static inline void local_r4k_flush_cache_page(void *args)
 	if ((mm == current->active_mm) && (pte_val(*ptep) & _PAGE_VALID))
 		vaddr = NULL;
 	else {
+		struct folio *folio = page_folio(page);
 		/*
 		 * Use kmap_coherent or kmap_atomic to do flushes for
 		 * another ASID than the current one.
 		 */
 		map_coherent = (cpu_has_dc_aliases &&
-				page_mapcount(page) &&
-				!Page_dcache_dirty(page));
+				folio_mapped(folio) &&
+				!folio_test_dcache_dirty(folio));
 		if (map_coherent)
 			vaddr = kmap_coherent(page, addr);
 		else
diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
index 11b3e7ddafd5..0668435521fc 100644
--- a/arch/mips/mm/cache.c
+++ b/arch/mips/mm/cache.c
@@ -82,13 +82,15 @@ SYSCALL_DEFINE3(cacheflush, unsigned long, addr, unsigned long, bytes,
 	return 0;
 }
 
-void __flush_dcache_page(struct page *page)
+void __flush_dcache_pages(struct page *page, unsigned int nr)
 {
-	struct address_space *mapping = page_mapping_file(page);
+	struct folio *folio = page_folio(page);
+	struct address_space *mapping = folio_flush_mapping(folio);
 	unsigned long addr;
+	unsigned int i;
 
 	if (mapping && !mapping_mapped(mapping)) {
-		SetPageDcacheDirty(page);
+		folio_set_dcache_dirty(folio);
 		return;
 	}
 
@@ -97,25 +99,21 @@ void __flush_dcache_page(struct page *page)
 	 * case is for exec env/arg pages and those are %99 certainly going to
 	 * get faulted into the tlb (and thus flushed) anyways.
 	 */
-	if (PageHighMem(page))
-		addr = (unsigned long)kmap_atomic(page);
-	else
-		addr = (unsigned long)page_address(page);
-
-	flush_data_cache_page(addr);
-
-	if (PageHighMem(page))
-		kunmap_atomic((void *)addr);
+	for (i = 0; i < nr; i++) {
+		addr = (unsigned long)kmap_local_page(page + i);
+		flush_data_cache_page(addr);
+		kunmap_local((void *)addr);
+	}
 }
-
-EXPORT_SYMBOL(__flush_dcache_page);
+EXPORT_SYMBOL(__flush_dcache_pages);
 
 void __flush_anon_page(struct page *page, unsigned long vmaddr)
 {
 	unsigned long addr = (unsigned long) page_address(page);
+	struct folio *folio = page_folio(page);
 
 	if (pages_do_alias(addr, vmaddr)) {
-		if (page_mapcount(page) && !Page_dcache_dirty(page)) {
+		if (folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
 			void *kaddr;
 
 			kaddr = kmap_coherent(page, vmaddr);
@@ -130,27 +128,29 @@ EXPORT_SYMBOL(__flush_anon_page);
 
 void __update_cache(unsigned long address, pte_t pte)
 {
-	struct page *page;
+	struct folio *folio;
 	unsigned long pfn, addr;
 	int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
+	unsigned int i;
 
 	pfn = pte_pfn(pte);
 	if (unlikely(!pfn_valid(pfn)))
 		return;
-	page = pfn_to_page(pfn);
-	if (Page_dcache_dirty(page)) {
-		if (PageHighMem(page))
-			addr = (unsigned long)kmap_atomic(page);
-		else
-			addr = (unsigned long)page_address(page);
-
-		if (exec || pages_do_alias(addr, address & PAGE_MASK))
-			flush_data_cache_page(addr);
 
-		if (PageHighMem(page))
-			kunmap_atomic((void *)addr);
+	folio = page_folio(pfn_to_page(pfn));
+	address &= PAGE_MASK;
+	address -= offset_in_folio(folio, pfn << PAGE_SHIFT);
+
+	if (folio_test_dcache_dirty(folio)) {
+		for (i = 0; i < folio_nr_pages(folio); i++) {
+			addr = (unsigned long)kmap_local_folio(folio, i);
 
-		ClearPageDcacheDirty(page);
+			if (exec || pages_do_alias(addr, address))
+				flush_data_cache_page(addr);
+			kunmap_local((void *)addr);
+			address += PAGE_SIZE;
+		}
+		folio_clear_dcache_dirty(folio);
 	}
 }
 
diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 5a8002839550..19d4ca3b3fbd 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -88,7 +88,7 @@ static void *__kmap_pgprot(struct page *page, unsigned long addr, pgprot_t prot)
 	pte_t pte;
 	int tlbidx;
 
-	BUG_ON(Page_dcache_dirty(page));
+	BUG_ON(folio_test_dcache_dirty(page_folio(page)));
 
 	preempt_disable();
 	pagefault_disable();
@@ -169,11 +169,12 @@ void kunmap_coherent(void)
 void copy_user_highpage(struct page *to, struct page *from,
 	unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	void *vfrom, *vto;
 
 	vto = kmap_atomic(to);
 	if (cpu_has_dc_aliases &&
-	    page_mapcount(from) && !Page_dcache_dirty(from)) {
+	    folio_mapped(src) && !folio_test_dcache_dirty(src)) {
 		vfrom = kmap_coherent(from, vaddr);
 		copy_page(vto, vfrom);
 		kunmap_coherent();
@@ -194,15 +195,17 @@ void copy_to_user_page(struct vm_area_struct *vma,
 	struct page *page, unsigned long vaddr, void *dst, const void *src,
 	unsigned long len)
 {
+	struct folio *folio = page_folio(page);
+
 	if (cpu_has_dc_aliases &&
-	    page_mapcount(page) && !Page_dcache_dirty(page)) {
+	    folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
 		void *vto = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
 		memcpy(vto, src, len);
 		kunmap_coherent();
 	} else {
 		memcpy(dst, src, len);
 		if (cpu_has_dc_aliases)
-			SetPageDcacheDirty(page);
+			folio_set_dcache_dirty(folio);
 	}
 	if (vma->vm_flags & VM_EXEC)
 		flush_cache_page(vma, vaddr, page_to_pfn(page));
@@ -212,15 +215,17 @@ void copy_from_user_page(struct vm_area_struct *vma,
 	struct page *page, unsigned long vaddr, void *dst, const void *src,
 	unsigned long len)
 {
+	struct folio *folio = page_folio(page);
+
 	if (cpu_has_dc_aliases &&
-	    page_mapcount(page) && !Page_dcache_dirty(page)) {
+	    folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
 		void *vfrom = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
 		memcpy(dst, vfrom, len);
 		kunmap_coherent();
 	} else {
 		memcpy(dst, src, len);
 		if (cpu_has_dc_aliases)
-			SetPageDcacheDirty(page);
+			folio_set_dcache_dirty(folio);
 	}
 }
 EXPORT_SYMBOL_GPL(copy_from_user_page);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 16/34] nios2: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (14 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 15/34] mips: " Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-03 12:49   ` Mike Rapoport
  2023-02-28 21:37 ` [PATCH v3 17/34] openrisc: " Matthew Wilcox (Oracle)
                   ` (20 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel, Dinh Nguyen

Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
flush_dcache_folio().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
---
 arch/nios2/include/asm/cacheflush.h |  6 ++-
 arch/nios2/include/asm/pgtable.h    | 27 +++++++++----
 arch/nios2/mm/cacheflush.c          | 61 ++++++++++++++++-------------
 3 files changed, 58 insertions(+), 36 deletions(-)

diff --git a/arch/nios2/include/asm/cacheflush.h b/arch/nios2/include/asm/cacheflush.h
index d0b71dd71287..8624ca83cffe 100644
--- a/arch/nios2/include/asm/cacheflush.h
+++ b/arch/nios2/include/asm/cacheflush.h
@@ -29,9 +29,13 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
 	unsigned long pfn);
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
 
 extern void flush_icache_range(unsigned long start, unsigned long end);
-extern void flush_icache_page(struct vm_area_struct *vma, struct page *page);
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr);
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1);
 
 #define flush_cache_vmap(start, end)		flush_dcache_range(start, end)
 #define flush_cache_vunmap(start, end)		flush_dcache_range(start, end)
diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index 0f5c2564e9f5..8a77821a17a5 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -178,15 +178,23 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	*ptep = pteval;
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	unsigned long paddr = (unsigned long)page_to_virt(pte_page(pteval));
-
-	flush_dcache_range(paddr, paddr + PAGE_SIZE);
-	set_pte(ptep, pteval);
+	unsigned long paddr = (unsigned long)page_to_virt(pte_page(pte));
+
+	flush_dcache_range(paddr, paddr + nr * PAGE_SIZE);
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += 1;
+	}
 }
 
+#define set_pte_at(mm, addr, ptep, pte)	set_ptes(mm, addr, ptep, pte, 1)
+
 static inline int pmd_none(pmd_t pmd)
 {
 	return (pmd_val(pmd) ==
@@ -273,7 +281,10 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 extern void __init paging_init(void);
 extern void __init mmu_init(void);
 
-extern void update_mmu_cache(struct vm_area_struct *vma,
-			     unsigned long address, pte_t *pte);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *ptep, unsigned int nr);
+
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 #endif /* _ASM_NIOS2_PGTABLE_H */
diff --git a/arch/nios2/mm/cacheflush.c b/arch/nios2/mm/cacheflush.c
index 6aa9257c3ede..471485a84b2c 100644
--- a/arch/nios2/mm/cacheflush.c
+++ b/arch/nios2/mm/cacheflush.c
@@ -138,10 +138,11 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
 		__flush_icache(start, end);
 }
 
-void flush_icache_page(struct vm_area_struct *vma, struct page *page)
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr)
 {
 	unsigned long start = (unsigned long) page_address(page);
-	unsigned long end = start + PAGE_SIZE;
+	unsigned long end = start + nr * PAGE_SIZE;
 
 	__flush_dcache(start, end);
 	__flush_icache(start, end);
@@ -158,19 +159,19 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
 		__flush_icache(start, end);
 }
 
-void __flush_dcache_page(struct address_space *mapping, struct page *page)
+void __flush_dcache_folio(struct address_space *mapping, struct folio *folio)
 {
 	/*
 	 * Writeback any data associated with the kernel mapping of this
 	 * page.  This ensures that data in the physical page is mutually
 	 * coherent with the kernels mapping.
 	 */
-	unsigned long start = (unsigned long)page_address(page);
+	unsigned long start = (unsigned long)folio_address(folio);
 
-	__flush_dcache(start, start + PAGE_SIZE);
+	__flush_dcache(start, start + folio_size(folio));
 }
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
 	struct address_space *mapping;
 
@@ -178,32 +179,38 @@ void flush_dcache_page(struct page *page)
 	 * The zero page is never written to, so never has any dirty
 	 * cache lines, and therefore never needs to be flushed.
 	 */
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(folio_pfn(folio)))
 		return;
 
-	mapping = page_mapping_file(page);
+	mapping = folio_flush_mapping(folio);
 
 	/* Flush this page if there are aliases. */
 	if (mapping && !mapping_mapped(mapping)) {
-		clear_bit(PG_dcache_clean, &page->flags);
+		clear_bit(PG_dcache_clean, &folio->flags);
 	} else {
-		__flush_dcache_page(mapping, page);
+		__flush_dcache_folio(mapping, folio);
 		if (mapping) {
-			unsigned long start = (unsigned long)page_address(page);
-			flush_aliases(mapping,  page);
-			flush_icache_range(start, start + PAGE_SIZE);
+			unsigned long start = (unsigned long)folio_address(folio);
+			flush_aliases(mapping, folio);
+			flush_icache_range(start, start + folio_size(folio));
 		}
-		set_bit(PG_dcache_clean, &page->flags);
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
+
+void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
+EXPORT_SYMBOL(flush_dcache_folio);
 
-void update_mmu_cache(struct vm_area_struct *vma,
-		      unsigned long address, pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *ptep, unsigned int nr)
 {
 	pte_t pte = *ptep;
 	unsigned long pfn = pte_pfn(pte);
-	struct page *page;
+	struct folio *folio;
 	struct address_space *mapping;
 
 	reload_tlb_page(vma, address, pte);
@@ -215,19 +222,19 @@ void update_mmu_cache(struct vm_area_struct *vma,
 	* The zero page is never written to, so never has any dirty
 	* cache lines, and therefore never needs to be flushed.
 	*/
-	page = pfn_to_page(pfn);
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(pfn))
 		return;
 
-	mapping = page_mapping_file(page);
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
-		__flush_dcache_page(mapping, page);
+	folio = page_folio(pfn_to_page(pfn));
+	mapping = folio_flush_mapping(folio);
+	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
+		__flush_dcache_folio(mapping, folio);
 
-	if(mapping)
-	{
-		flush_aliases(mapping, page);
+	if (mapping) {
+		flush_aliases(mapping, folio);
 		if (vma->vm_flags & VM_EXEC)
-			flush_icache_page(vma, page);
+			flush_icache_pages(vma, &folio->page,
+					folio_nr_pages(folio));
 	}
 }
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 17/34] openrisc: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (15 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 16/34] nios2: " Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 18/34] parisc: " Matthew Wilcox (Oracle)
                   ` (19 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, Jonas Bonn, Stefan Kristiansson, Stafford Horne,
	linux-openrisc

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page
to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: linux-openrisc@vger.kernel.org
---
 arch/openrisc/include/asm/cacheflush.h |  8 +++++++-
 arch/openrisc/include/asm/pgtable.h    | 27 +++++++++++++++++++++-----
 arch/openrisc/mm/cache.c               | 12 ++++++++----
 3 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/arch/openrisc/include/asm/cacheflush.h b/arch/openrisc/include/asm/cacheflush.h
index eeac40d4a854..984c331ff5f4 100644
--- a/arch/openrisc/include/asm/cacheflush.h
+++ b/arch/openrisc/include/asm/cacheflush.h
@@ -56,10 +56,16 @@ static inline void sync_icache_dcache(struct page *page)
  */
 #define PG_dc_clean                  PG_arch_1
 
+static inline void flush_dcache_folio(struct folio *folio)
+{
+	clear_bit(PG_dc_clean, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 static inline void flush_dcache_page(struct page *page)
 {
-	clear_bit(PG_dc_clean, &page->flags);
+	flush_dcache_folio(page_folio(page));
 }
 
 #define flush_icache_user_page(vma, page, addr, len)	\
diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index 3eb9b9555d0d..1a7077150d7b 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -46,7 +46,21 @@ extern void paging_init(void);
  * hook is made available.
  */
 #define set_pte(pteptr, pteval) ((*(pteptr)) = (pteval))
-#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
+
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+	}
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
 /*
  * (pmds are folded into pgds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)
@@ -379,13 +393,16 @@ static inline void update_tlb(struct vm_area_struct *vma,
 extern void update_cache(struct vm_area_struct *vma,
 	unsigned long address, pte_t *pte);
 
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-	unsigned long address, pte_t *pte)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
-	update_tlb(vma, address, pte);
-	update_cache(vma, address, pte);
+	update_tlb(vma, address, ptep);
+	update_cache(vma, address, ptep);
 }
 
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
+
 /* __PHX__ FIXME, SWAP, this probably doesn't work */
 
 /*
diff --git a/arch/openrisc/mm/cache.c b/arch/openrisc/mm/cache.c
index 534a52ec5e66..eb43b73f3855 100644
--- a/arch/openrisc/mm/cache.c
+++ b/arch/openrisc/mm/cache.c
@@ -43,15 +43,19 @@ void update_cache(struct vm_area_struct *vma, unsigned long address,
 	pte_t *pte)
 {
 	unsigned long pfn = pte_val(*pte) >> PAGE_SHIFT;
-	struct page *page = pfn_to_page(pfn);
-	int dirty = !test_and_set_bit(PG_dc_clean, &page->flags);
+	struct folio *folio = page_folio(pfn_to_page(pfn));
+	int dirty = !test_and_set_bit(PG_dc_clean, &folio->flags);
 
 	/*
 	 * Since icaches do not snoop for updated data on OpenRISC, we
 	 * must write back and invalidate any dirty pages manually. We
 	 * can skip data pages, since they will not end up in icaches.
 	 */
-	if ((vma->vm_flags & VM_EXEC) && dirty)
-		sync_icache_dcache(page);
+	if ((vma->vm_flags & VM_EXEC) && dirty) {
+		unsigned int nr = folio_nr_pages(folio);
+
+		while (nr--)
+			sync_icache_dcache(folio_page(folio, nr));
+	}
 }
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 18/34] parisc: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (16 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 17/34] openrisc: " Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-02 16:43   ` John David Anglin
  2023-02-28 21:37   ` Matthew Wilcox (Oracle)
                   ` (18 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, James E.J. Bottomley, Helge Deller, linux-parisc

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
and flush_icache_pages().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
---
 arch/parisc/include/asm/cacheflush.h |  14 ++--
 arch/parisc/include/asm/pgtable.h    |  28 +++++---
 arch/parisc/kernel/cache.c           | 101 +++++++++++++++++++--------
 3 files changed, 99 insertions(+), 44 deletions(-)

diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index ff07c509e04b..0bf8b69d086b 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -46,16 +46,20 @@ void invalidate_kernel_vmap_range(void *vaddr, int size);
 #define flush_cache_vmap(start, end)		flush_cache_all()
 #define flush_cache_vunmap(start, end)		flush_cache_all()
 
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *page);
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 
 #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
 
-#define flush_icache_page(vma,page)	do { 		\
-	flush_kernel_dcache_page_addr(page_address(page)); \
-	flush_kernel_icache_page(page_address(page)); 	\
-} while (0)
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr);
+#define flush_icache_page(vma, page)	flush_icache_pages(vma, page, 1)
 
 #define flush_icache_range(s,e)		do { 		\
 	flush_kernel_dcache_range_asm(s,e); 		\
diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index e2950f5db7c9..78ee9816f423 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -73,14 +73,7 @@ extern void __update_cache(pte_t pte);
 		mb();				\
 	} while(0)
 
-#define set_pte_at(mm, addr, pteptr, pteval)	\
-	do {					\
-		if (pte_present(pteval) &&	\
-		    pte_user(pteval))		\
-			__update_cache(pteval);	\
-		*(pteptr) = (pteval);		\
-		purge_tlb_entries(mm, addr);	\
-	} while (0)
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #endif /* !__ASSEMBLY__ */
 
@@ -391,11 +384,28 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 
 extern void paging_init (void);
 
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	if (pte_present(pte) && pte_user(pte))
+		__update_cache(pte);
+	for (;;) {
+		*ptep = pte;
+		purge_tlb_entries(mm, addr);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += 1 << PFN_PTE_SHIFT;
+		addr += PAGE_SIZE;
+	}
+}
+
 /* Used for deferring calls to flush_dcache_page() */
 
 #define PG_dcache_dirty         PG_arch_1
 
-#define update_mmu_cache(vms,addr,ptep) __update_cache(*ptep)
+#define update_mmu_cache_range(vma, addr, ptep, nr) __update_cache(*ptep)
+#define update_mmu_cache(vma, addr, ptep) __update_cache(*ptep)
 
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 984d3a1b3828..16057812103b 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -92,11 +92,11 @@ static inline void flush_data_cache(void)
 /* Kernel virtual address of pfn.  */
 #define pfn_va(pfn)	__va(PFN_PHYS(pfn))
 
-void
-__update_cache(pte_t pte)
+void __update_cache(pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
-	struct page *page;
+	struct folio *folio;
+	unsigned int nr;
 
 	/* We don't have pte special.  As a result, we can be called with
 	   an invalid pfn and we don't need to flush the kernel dcache page.
@@ -104,13 +104,17 @@ __update_cache(pte_t pte)
 	if (!pfn_valid(pfn))
 		return;
 
-	page = pfn_to_page(pfn);
-	if (page_mapping_file(page) &&
-	    test_bit(PG_dcache_dirty, &page->flags)) {
-		flush_kernel_dcache_page_addr(pfn_va(pfn));
-		clear_bit(PG_dcache_dirty, &page->flags);
+	folio = page_folio(pfn_to_page(pfn));
+	pfn = folio_pfn(folio);
+	nr = folio_nr_pages(folio);
+	if (folio_flush_mapping(folio) &&
+	    test_bit(PG_dcache_dirty, &folio->flags)) {
+		while (nr--)
+			flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
+		clear_bit(PG_dcache_dirty, &folio->flags);
 	} else if (parisc_requires_coherency())
-		flush_kernel_dcache_page_addr(pfn_va(pfn));
+		while (nr--)
+			flush_kernel_dcache_page_addr(pfn_va(pfn + nr));
 }
 
 void
@@ -365,6 +369,20 @@ static void flush_user_cache_page(struct vm_area_struct *vma, unsigned long vmad
 	preempt_enable();
 }
 
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr)
+{
+	void *kaddr = page_address(page);
+
+	for (;;) {
+		flush_kernel_dcache_page_addr(kaddr);
+		flush_kernel_icache_page(kaddr);
+		if (--nr == 0)
+			break;
+		page += PAGE_SIZE;
+	}
+}
+
 static inline pte_t *get_ptep(struct mm_struct *mm, unsigned long addr)
 {
 	pte_t *ptep = NULL;
@@ -393,26 +411,30 @@ static inline bool pte_needs_flush(pte_t pte)
 		== (_PAGE_PRESENT | _PAGE_ACCESSED);
 }
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
-	struct address_space *mapping = page_mapping_file(page);
-	struct vm_area_struct *mpnt;
-	unsigned long offset;
+	struct address_space *mapping = folio_flush_mapping(folio);
+	struct vm_area_struct *vma;
 	unsigned long addr, old_addr = 0;
+	void *kaddr;
 	unsigned long count = 0;
+	unsigned long i, nr;
 	pgoff_t pgoff;
 
 	if (mapping && !mapping_mapped(mapping)) {
-		set_bit(PG_dcache_dirty, &page->flags);
+		set_bit(PG_dcache_dirty, &folio->flags);
 		return;
 	}
 
-	flush_kernel_dcache_page_addr(page_address(page));
+	nr = folio_nr_pages(folio);
+	kaddr = folio_address(folio);
+	for (i = 0; i < nr; i++)
+		flush_kernel_dcache_page_addr(kaddr + i * PAGE_SIZE);
 
 	if (!mapping)
 		return;
 
-	pgoff = page->index;
+	pgoff = folio->index;
 
 	/*
 	 * We have carefully arranged in arch_get_unmapped_area() that
@@ -422,15 +444,29 @@ void flush_dcache_page(struct page *page)
 	 * on machines that support equivalent aliasing
 	 */
 	flush_dcache_mmap_lock(mapping);
-	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
-		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		addr = mpnt->vm_start + offset;
-		if (parisc_requires_coherency()) {
-			pte_t *ptep;
+	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff + nr - 1) {
+		unsigned long offset = pgoff - vma->vm_pgoff;
+		unsigned long pfn = folio_pfn(folio);
+
+		addr = vma->vm_start;
+		nr = folio_nr_pages(folio);
+		if (offset > -nr) {
+			pfn -= offset;
+			nr += offset;
+		} else {
+			addr += offset * PAGE_SIZE;
+		}
+		if (addr + nr * PAGE_SIZE > vma->vm_end)
+			nr = (vma->vm_end - addr) / PAGE_SIZE;
 
-			ptep = get_ptep(mpnt->vm_mm, addr);
-			if (ptep && pte_needs_flush(*ptep))
-				flush_user_cache_page(mpnt, addr);
+		if (parisc_requires_coherency()) {
+			for (i = 0; i < nr; i++) {
+				pte_t *ptep = get_ptep(vma->vm_mm,
+							addr + i * PAGE_SIZE);
+				if (ptep && pte_needs_flush(*ptep))
+					flush_user_cache_page(vma,
+							addr + i * PAGE_SIZE);
+			}
 		} else {
 			/*
 			 * The TLB is the engine of coherence on parisc:
@@ -443,27 +479,32 @@ void flush_dcache_page(struct page *page)
 			 * in (until the user or kernel specifically
 			 * accesses it, of course)
 			 */
-			flush_tlb_page(mpnt, addr);
+			for (i = 0; i < nr; i++)
+				flush_tlb_page(vma, addr + i * PAGE_SIZE);
 			if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1))
 					!= (addr & (SHM_COLOUR - 1))) {
-				__flush_cache_page(mpnt, addr, page_to_phys(page));
+				for (i = 0; i < nr; i++)
+					__flush_cache_page(vma,
+						addr + i * PAGE_SIZE,
+						(pfn + i) * PAGE_SIZE);
 				/*
 				 * Software is allowed to have any number
 				 * of private mappings to a page.
 				 */
-				if (!(mpnt->vm_flags & VM_SHARED))
+				if (!(vma->vm_flags & VM_SHARED))
 					continue;
 				if (old_addr)
 					pr_err("INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n",
-						old_addr, addr, mpnt->vm_file);
-				old_addr = addr;
+						old_addr, addr, vma->vm_file);
+				if (nr == folio_nr_pages(folio))
+					old_addr = addr;
 			}
 		}
 		WARN_ON(++count == 4096);
 	}
 	flush_dcache_mmap_unlock(mapping);
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
 
 /* Defined in arch/parisc/kernel/pacache.S */
 EXPORT_SYMBOL(flush_kernel_dcache_range_asm);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 19/34] powerpc: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
                     ` (35 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, linuxppc-dev

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to
per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/book3s/pgtable.h | 10 +----
 arch/powerpc/include/asm/cacheflush.h     | 14 +++++--
 arch/powerpc/include/asm/kvm_ppc.h        | 10 ++---
 arch/powerpc/include/asm/nohash/pgtable.h | 13 ++----
 arch/powerpc/include/asm/pgtable.h        |  6 +++
 arch/powerpc/mm/book3s64/hash_utils.c     | 11 ++---
 arch/powerpc/mm/cacheflush.c              | 40 ++++++------------
 arch/powerpc/mm/nohash/e500_hugetlbpage.c |  3 +-
 arch/powerpc/mm/pgtable.c                 | 51 +++++++++++++----------
 9 files changed, 77 insertions(+), 81 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/pgtable.h b/arch/powerpc/include/asm/book3s/pgtable.h
index d18b748ea3ae..c2ef811505b0 100644
--- a/arch/powerpc/include/asm/book3s/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/pgtable.h
@@ -9,13 +9,6 @@
 #endif
 
 #ifndef __ASSEMBLY__
-/* Insert a PTE, top-level function is out of line. It uses an inline
- * low level function in the respective pgtable-* files
- */
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-		       pte_t pte);
-
-
 #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 				 pte_t *ptep, pte_t entry, int dirty);
@@ -36,7 +29,8 @@ void __update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t
  * corresponding HPTE into the hash table ahead of time, instead of
  * waiting for the inevitable extra hash-table miss exception.
  */
-static inline void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
 	if (IS_ENABLED(CONFIG_PPC32) && !mmu_has_feature(MMU_FTR_HPTE_TABLE))
 		return;
diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index 7564dd4fd12b..ef7d2de33b89 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -35,13 +35,19 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end)
  * It just marks the page as not i-cache clean.  We do the i-cache
  * flush later when the page is given to a user process, if necessary.
  */
-static inline void flush_dcache_page(struct page *page)
+static inline void flush_dcache_folio(struct folio *folio)
 {
 	if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
 		return;
 	/* avoid an atomic op if possible */
-	if (test_bit(PG_dcache_clean, &page->flags))
-		clear_bit(PG_dcache_clean, &page->flags);
+	if (test_bit(PG_dcache_clean, &folio->flags))
+		clear_bit(PG_dcache_clean, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
 }
 
 void flush_icache_range(unsigned long start, unsigned long stop);
@@ -51,7 +57,7 @@ void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
 		unsigned long addr, int len);
 #define flush_icache_user_page flush_icache_user_page
 
-void flush_dcache_icache_page(struct page *page);
+void flush_dcache_icache_folio(struct folio *folio);
 
 /**
  * flush_dcache_range(): Write any modified data cache blocks out to memory and
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 6bef23d6d0e3..e91dd8e88bb7 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -868,7 +868,7 @@ void kvmppc_init_lpid(unsigned long nr_lpids);
 
 static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
 {
-	struct page *page;
+	struct folio *folio;
 	/*
 	 * We can only access pages that the kernel maps
 	 * as memory. Bail out for unmapped ones.
@@ -877,10 +877,10 @@ static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
 		return;
 
 	/* Clear i-cache for new pages */
-	page = pfn_to_page(pfn);
-	if (!test_bit(PG_dcache_clean, &page->flags)) {
-		flush_dcache_icache_page(page);
-		set_bit(PG_dcache_clean, &page->flags);
+	folio = page_folio(pfn_to_page(pfn));
+	if (!test_bit(PG_dcache_clean, &folio->flags)) {
+		flush_dcache_icache_folio(folio);
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
 
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index a6caaaab6f92..69a7dd47a9f0 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -166,12 +166,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 	return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
 }
 
-/* Insert a PTE, top-level function is out of line. It uses an inline
- * low level function in the respective pgtable-* files
- */
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-		       pte_t pte);
-
 /* This low level function performs the actual PTE insertion
  * Setting the PTE depends on the MMU type and other factors. It's
  * an horrible mess that I'm not going to try to clean up now but
@@ -282,10 +276,11 @@ static inline int pud_huge(pud_t pud)
  * for the page which has just been mapped in.
  */
 #if defined(CONFIG_PPC_E500) && defined(CONFIG_HUGETLB_PAGE)
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *ptep, unsigned int nr);
 #else
-static inline
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) {}
+static inline void update_mmu_cache(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr) {}
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 9972626ddaf6..bf1263ff7e67 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -41,6 +41,12 @@ struct mm_struct;
 
 #ifndef __ASSEMBLY__
 
+void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+		pte_t pte, unsigned int nr);
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1);
+
 #ifndef MAX_PTRS_PER_PGD
 #define MAX_PTRS_PER_PGD PTRS_PER_PGD
 #endif
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index fedffe3ae136..ad2afa08e62e 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1307,18 +1307,19 @@ void hash__early_init_mmu_secondary(void)
  */
 unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
 {
-	struct page *page;
+	struct folio *folio;
 
 	if (!pfn_valid(pte_pfn(pte)))
 		return pp;
 
-	page = pte_page(pte);
+	folio = page_folio(pte_page(pte));
 
 	/* page is dirty */
-	if (!test_bit(PG_dcache_clean, &page->flags) && !PageReserved(page)) {
+	if (!test_bit(PG_dcache_clean, &folio->flags) &&
+	    !folio_test_reserved(folio)) {
 		if (trap == INTERRUPT_INST_STORAGE) {
-			flush_dcache_icache_page(page);
-			set_bit(PG_dcache_clean, &page->flags);
+			flush_dcache_icache_folio(folio);
+			set_bit(PG_dcache_clean, &folio->flags);
 		} else
 			pp |= HPTE_R_N;
 	}
diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
index 0e9b4879c0f9..8760d2223abe 100644
--- a/arch/powerpc/mm/cacheflush.c
+++ b/arch/powerpc/mm/cacheflush.c
@@ -148,44 +148,30 @@ static void __flush_dcache_icache(void *p)
 	invalidate_icache_range(addr, addr + PAGE_SIZE);
 }
 
-static void flush_dcache_icache_hugepage(struct page *page)
+void flush_dcache_icache_folio(struct folio *folio)
 {
-	int i;
-	int nr = compound_nr(page);
+	unsigned int i, nr = folio_nr_pages(folio);
 
-	if (!PageHighMem(page)) {
+	if (flush_coherent_icache())
+		return;
+
+	if (!folio_test_highmem(folio)) {
+		void *addr = folio_address(folio);
 		for (i = 0; i < nr; i++)
-			__flush_dcache_icache(lowmem_page_address(page + i));
-	} else {
+			__flush_dcache_icache(addr + i * PAGE_SIZE);
+	} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
 		for (i = 0; i < nr; i++) {
-			void *start = kmap_local_page(page + i);
+			void *start = kmap_local_folio(folio, i * PAGE_SIZE);
 
 			__flush_dcache_icache(start);
 			kunmap_local(start);
 		}
-	}
-}
-
-void flush_dcache_icache_page(struct page *page)
-{
-	if (flush_coherent_icache())
-		return;
-
-	if (PageCompound(page))
-		return flush_dcache_icache_hugepage(page);
-
-	if (!PageHighMem(page)) {
-		__flush_dcache_icache(lowmem_page_address(page));
-	} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
-		void *start = kmap_local_page(page);
-
-		__flush_dcache_icache(start);
-		kunmap_local(start);
 	} else {
-		flush_dcache_icache_phys(page_to_phys(page));
+		unsigned long pfn = folio_pfn(folio);
+		for (i = 0; i < nr; i++)
+			flush_dcache_icache_phys((pfn + i) * PAGE_SIZE);
 	}
 }
-EXPORT_SYMBOL(flush_dcache_icache_page);
 
 void clear_user_page(void *page, unsigned long vaddr, struct page *pg)
 {
diff --git a/arch/powerpc/mm/nohash/e500_hugetlbpage.c b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
index 58c8d9849cb1..f3cb91107a47 100644
--- a/arch/powerpc/mm/nohash/e500_hugetlbpage.c
+++ b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
@@ -178,7 +178,8 @@ book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea, pte_t pte)
  *
  * This must always be called with the pte lock held.
  */
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
+		pte_t *ptep, unsigned int nr)
 {
 	if (is_vm_hugetlb_page(vma))
 		book3e_hugetlb_preload(vma, address, *ptep);
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..b3c7b874a7a2 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -58,7 +58,7 @@ static inline int pte_looks_normal(pte_t pte)
 	return 0;
 }
 
-static struct page *maybe_pte_to_page(pte_t pte)
+static struct folio *maybe_pte_to_folio(pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
 	struct page *page;
@@ -68,7 +68,7 @@ static struct page *maybe_pte_to_page(pte_t pte)
 	page = pfn_to_page(pfn);
 	if (PageReserved(page))
 		return NULL;
-	return page;
+	return page_folio(page);
 }
 
 #ifdef CONFIG_PPC_BOOK3S
@@ -84,12 +84,12 @@ static pte_t set_pte_filter_hash(pte_t pte)
 	pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
 	if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) ||
 				       cpu_has_feature(CPU_FTR_NOEXECUTE))) {
-		struct page *pg = maybe_pte_to_page(pte);
-		if (!pg)
+		struct folio *folio = maybe_pte_to_folio(pte);
+		if (!folio)
 			return pte;
-		if (!test_bit(PG_dcache_clean, &pg->flags)) {
-			flush_dcache_icache_page(pg);
-			set_bit(PG_dcache_clean, &pg->flags);
+		if (!test_bit(PG_dcache_clean, &folio->flags)) {
+			flush_dcache_icache_folio(folio);
+			set_bit(PG_dcache_clean, &folio->flags);
 		}
 	}
 	return pte;
@@ -107,7 +107,7 @@ static pte_t set_pte_filter_hash(pte_t pte) { return pte; }
  */
 static inline pte_t set_pte_filter(pte_t pte)
 {
-	struct page *pg;
+	struct folio *folio;
 
 	if (radix_enabled())
 		return pte;
@@ -120,18 +120,18 @@ static inline pte_t set_pte_filter(pte_t pte)
 		return pte;
 
 	/* If you set _PAGE_EXEC on weird pages you're on your own */
-	pg = maybe_pte_to_page(pte);
-	if (unlikely(!pg))
+	folio = maybe_pte_to_folio(pte);
+	if (unlikely(!folio))
 		return pte;
 
 	/* If the page clean, we move on */
-	if (test_bit(PG_dcache_clean, &pg->flags))
+	if (test_bit(PG_dcache_clean, &folio->flags))
 		return pte;
 
 	/* If it's an exec fault, we flush the cache and make it clean */
 	if (is_exec_fault()) {
-		flush_dcache_icache_page(pg);
-		set_bit(PG_dcache_clean, &pg->flags);
+		flush_dcache_icache_folio(folio);
+		set_bit(PG_dcache_clean, &folio->flags);
 		return pte;
 	}
 
@@ -142,7 +142,7 @@ static inline pte_t set_pte_filter(pte_t pte)
 static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
 				     int dirty)
 {
-	struct page *pg;
+	struct folio *folio;
 
 	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
 		return pte;
@@ -168,17 +168,17 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
 #endif /* CONFIG_DEBUG_VM */
 
 	/* If you set _PAGE_EXEC on weird pages you're on your own */
-	pg = maybe_pte_to_page(pte);
-	if (unlikely(!pg))
+	folio = maybe_pte_to_folio(pte);
+	if (unlikely(!folio))
 		goto bail;
 
 	/* If the page is already clean, we move on */
-	if (test_bit(PG_dcache_clean, &pg->flags))
+	if (test_bit(PG_dcache_clean, &folio->flags))
 		goto bail;
 
 	/* Clean the page and set PG_dcache_clean */
-	flush_dcache_icache_page(pg);
-	set_bit(PG_dcache_clean, &pg->flags);
+	flush_dcache_icache_folio(folio);
+	set_bit(PG_dcache_clean, &folio->flags);
 
  bail:
 	return pte_mkexec(pte);
@@ -187,8 +187,8 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
 /*
  * set_pte stores a linux PTE into the linux page table.
  */
-void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-		pte_t pte)
+void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+		pte_t pte, unsigned int nr)
 {
 	/*
 	 * Make sure hardware valid bit is not set. We don't do
@@ -203,7 +203,14 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 	pte = set_pte_filter(pte);
 
 	/* Perform the setting of the PTE */
-	__set_pte_at(mm, addr, ptep, pte, 0);
+	for (;;) {
+		__set_pte_at(mm, addr, ptep, pte, 0);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte = __pte(pte_val(pte) + PAGE_SIZE);
+		addr += PAGE_SIZE;
+	}
 }
 
 void unmap_kernel_page(unsigned long va)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 19/34] powerpc: Implement the new page table range API
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  0 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, Nicholas Piggin, linuxppc-dev

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_arch_1 (aka PG_dcache_dirty) flag from being per-page to
per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/book3s/pgtable.h | 10 +----
 arch/powerpc/include/asm/cacheflush.h     | 14 +++++--
 arch/powerpc/include/asm/kvm_ppc.h        | 10 ++---
 arch/powerpc/include/asm/nohash/pgtable.h | 13 ++----
 arch/powerpc/include/asm/pgtable.h        |  6 +++
 arch/powerpc/mm/book3s64/hash_utils.c     | 11 ++---
 arch/powerpc/mm/cacheflush.c              | 40 ++++++------------
 arch/powerpc/mm/nohash/e500_hugetlbpage.c |  3 +-
 arch/powerpc/mm/pgtable.c                 | 51 +++++++++++++----------
 9 files changed, 77 insertions(+), 81 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/pgtable.h b/arch/powerpc/include/asm/book3s/pgtable.h
index d18b748ea3ae..c2ef811505b0 100644
--- a/arch/powerpc/include/asm/book3s/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/pgtable.h
@@ -9,13 +9,6 @@
 #endif
 
 #ifndef __ASSEMBLY__
-/* Insert a PTE, top-level function is out of line. It uses an inline
- * low level function in the respective pgtable-* files
- */
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-		       pte_t pte);
-
-
 #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 				 pte_t *ptep, pte_t entry, int dirty);
@@ -36,7 +29,8 @@ void __update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t
  * corresponding HPTE into the hash table ahead of time, instead of
  * waiting for the inevitable extra hash-table miss exception.
  */
-static inline void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
 	if (IS_ENABLED(CONFIG_PPC32) && !mmu_has_feature(MMU_FTR_HPTE_TABLE))
 		return;
diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index 7564dd4fd12b..ef7d2de33b89 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -35,13 +35,19 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end)
  * It just marks the page as not i-cache clean.  We do the i-cache
  * flush later when the page is given to a user process, if necessary.
  */
-static inline void flush_dcache_page(struct page *page)
+static inline void flush_dcache_folio(struct folio *folio)
 {
 	if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
 		return;
 	/* avoid an atomic op if possible */
-	if (test_bit(PG_dcache_clean, &page->flags))
-		clear_bit(PG_dcache_clean, &page->flags);
+	if (test_bit(PG_dcache_clean, &folio->flags))
+		clear_bit(PG_dcache_clean, &folio->flags);
+}
+#define flush_dcache_folio flush_dcache_folio
+
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
 }
 
 void flush_icache_range(unsigned long start, unsigned long stop);
@@ -51,7 +57,7 @@ void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
 		unsigned long addr, int len);
 #define flush_icache_user_page flush_icache_user_page
 
-void flush_dcache_icache_page(struct page *page);
+void flush_dcache_icache_folio(struct folio *folio);
 
 /**
  * flush_dcache_range(): Write any modified data cache blocks out to memory and
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 6bef23d6d0e3..e91dd8e88bb7 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -868,7 +868,7 @@ void kvmppc_init_lpid(unsigned long nr_lpids);
 
 static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
 {
-	struct page *page;
+	struct folio *folio;
 	/*
 	 * We can only access pages that the kernel maps
 	 * as memory. Bail out for unmapped ones.
@@ -877,10 +877,10 @@ static inline void kvmppc_mmu_flush_icache(kvm_pfn_t pfn)
 		return;
 
 	/* Clear i-cache for new pages */
-	page = pfn_to_page(pfn);
-	if (!test_bit(PG_dcache_clean, &page->flags)) {
-		flush_dcache_icache_page(page);
-		set_bit(PG_dcache_clean, &page->flags);
+	folio = page_folio(pfn_to_page(pfn));
+	if (!test_bit(PG_dcache_clean, &folio->flags)) {
+		flush_dcache_icache_folio(folio);
+		set_bit(PG_dcache_clean, &folio->flags);
 	}
 }
 
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index a6caaaab6f92..69a7dd47a9f0 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -166,12 +166,6 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 	return __pte(pte_val(pte) & ~_PAGE_SWP_EXCLUSIVE);
 }
 
-/* Insert a PTE, top-level function is out of line. It uses an inline
- * low level function in the respective pgtable-* files
- */
-extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-		       pte_t pte);
-
 /* This low level function performs the actual PTE insertion
  * Setting the PTE depends on the MMU type and other factors. It's
  * an horrible mess that I'm not going to try to clean up now but
@@ -282,10 +276,11 @@ static inline int pud_huge(pud_t pud)
  * for the page which has just been mapped in.
  */
 #if defined(CONFIG_PPC_E500) && defined(CONFIG_HUGETLB_PAGE)
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *ptep, unsigned int nr);
 #else
-static inline
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) {}
+static inline void update_mmu_cache(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr) {}
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 9972626ddaf6..bf1263ff7e67 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -41,6 +41,12 @@ struct mm_struct;
 
 #ifndef __ASSEMBLY__
 
+void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+		pte_t pte, unsigned int nr);
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1);
+
 #ifndef MAX_PTRS_PER_PGD
 #define MAX_PTRS_PER_PGD PTRS_PER_PGD
 #endif
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index fedffe3ae136..ad2afa08e62e 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1307,18 +1307,19 @@ void hash__early_init_mmu_secondary(void)
  */
 unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap)
 {
-	struct page *page;
+	struct folio *folio;
 
 	if (!pfn_valid(pte_pfn(pte)))
 		return pp;
 
-	page = pte_page(pte);
+	folio = page_folio(pte_page(pte));
 
 	/* page is dirty */
-	if (!test_bit(PG_dcache_clean, &page->flags) && !PageReserved(page)) {
+	if (!test_bit(PG_dcache_clean, &folio->flags) &&
+	    !folio_test_reserved(folio)) {
 		if (trap == INTERRUPT_INST_STORAGE) {
-			flush_dcache_icache_page(page);
-			set_bit(PG_dcache_clean, &page->flags);
+			flush_dcache_icache_folio(folio);
+			set_bit(PG_dcache_clean, &folio->flags);
 		} else
 			pp |= HPTE_R_N;
 	}
diff --git a/arch/powerpc/mm/cacheflush.c b/arch/powerpc/mm/cacheflush.c
index 0e9b4879c0f9..8760d2223abe 100644
--- a/arch/powerpc/mm/cacheflush.c
+++ b/arch/powerpc/mm/cacheflush.c
@@ -148,44 +148,30 @@ static void __flush_dcache_icache(void *p)
 	invalidate_icache_range(addr, addr + PAGE_SIZE);
 }
 
-static void flush_dcache_icache_hugepage(struct page *page)
+void flush_dcache_icache_folio(struct folio *folio)
 {
-	int i;
-	int nr = compound_nr(page);
+	unsigned int i, nr = folio_nr_pages(folio);
 
-	if (!PageHighMem(page)) {
+	if (flush_coherent_icache())
+		return;
+
+	if (!folio_test_highmem(folio)) {
+		void *addr = folio_address(folio);
 		for (i = 0; i < nr; i++)
-			__flush_dcache_icache(lowmem_page_address(page + i));
-	} else {
+			__flush_dcache_icache(addr + i * PAGE_SIZE);
+	} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
 		for (i = 0; i < nr; i++) {
-			void *start = kmap_local_page(page + i);
+			void *start = kmap_local_folio(folio, i * PAGE_SIZE);
 
 			__flush_dcache_icache(start);
 			kunmap_local(start);
 		}
-	}
-}
-
-void flush_dcache_icache_page(struct page *page)
-{
-	if (flush_coherent_icache())
-		return;
-
-	if (PageCompound(page))
-		return flush_dcache_icache_hugepage(page);
-
-	if (!PageHighMem(page)) {
-		__flush_dcache_icache(lowmem_page_address(page));
-	} else if (IS_ENABLED(CONFIG_BOOKE) || sizeof(phys_addr_t) > sizeof(void *)) {
-		void *start = kmap_local_page(page);
-
-		__flush_dcache_icache(start);
-		kunmap_local(start);
 	} else {
-		flush_dcache_icache_phys(page_to_phys(page));
+		unsigned long pfn = folio_pfn(folio);
+		for (i = 0; i < nr; i++)
+			flush_dcache_icache_phys((pfn + i) * PAGE_SIZE);
 	}
 }
-EXPORT_SYMBOL(flush_dcache_icache_page);
 
 void clear_user_page(void *page, unsigned long vaddr, struct page *pg)
 {
diff --git a/arch/powerpc/mm/nohash/e500_hugetlbpage.c b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
index 58c8d9849cb1..f3cb91107a47 100644
--- a/arch/powerpc/mm/nohash/e500_hugetlbpage.c
+++ b/arch/powerpc/mm/nohash/e500_hugetlbpage.c
@@ -178,7 +178,8 @@ book3e_hugetlb_preload(struct vm_area_struct *vma, unsigned long ea, pte_t pte)
  *
  * This must always be called with the pte lock held.
  */
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
+		pte_t *ptep, unsigned int nr)
 {
 	if (is_vm_hugetlb_page(vma))
 		book3e_hugetlb_preload(vma, address, *ptep);
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..b3c7b874a7a2 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -58,7 +58,7 @@ static inline int pte_looks_normal(pte_t pte)
 	return 0;
 }
 
-static struct page *maybe_pte_to_page(pte_t pte)
+static struct folio *maybe_pte_to_folio(pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
 	struct page *page;
@@ -68,7 +68,7 @@ static struct page *maybe_pte_to_page(pte_t pte)
 	page = pfn_to_page(pfn);
 	if (PageReserved(page))
 		return NULL;
-	return page;
+	return page_folio(page);
 }
 
 #ifdef CONFIG_PPC_BOOK3S
@@ -84,12 +84,12 @@ static pte_t set_pte_filter_hash(pte_t pte)
 	pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
 	if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) ||
 				       cpu_has_feature(CPU_FTR_NOEXECUTE))) {
-		struct page *pg = maybe_pte_to_page(pte);
-		if (!pg)
+		struct folio *folio = maybe_pte_to_folio(pte);
+		if (!folio)
 			return pte;
-		if (!test_bit(PG_dcache_clean, &pg->flags)) {
-			flush_dcache_icache_page(pg);
-			set_bit(PG_dcache_clean, &pg->flags);
+		if (!test_bit(PG_dcache_clean, &folio->flags)) {
+			flush_dcache_icache_folio(folio);
+			set_bit(PG_dcache_clean, &folio->flags);
 		}
 	}
 	return pte;
@@ -107,7 +107,7 @@ static pte_t set_pte_filter_hash(pte_t pte) { return pte; }
  */
 static inline pte_t set_pte_filter(pte_t pte)
 {
-	struct page *pg;
+	struct folio *folio;
 
 	if (radix_enabled())
 		return pte;
@@ -120,18 +120,18 @@ static inline pte_t set_pte_filter(pte_t pte)
 		return pte;
 
 	/* If you set _PAGE_EXEC on weird pages you're on your own */
-	pg = maybe_pte_to_page(pte);
-	if (unlikely(!pg))
+	folio = maybe_pte_to_folio(pte);
+	if (unlikely(!folio))
 		return pte;
 
 	/* If the page clean, we move on */
-	if (test_bit(PG_dcache_clean, &pg->flags))
+	if (test_bit(PG_dcache_clean, &folio->flags))
 		return pte;
 
 	/* If it's an exec fault, we flush the cache and make it clean */
 	if (is_exec_fault()) {
-		flush_dcache_icache_page(pg);
-		set_bit(PG_dcache_clean, &pg->flags);
+		flush_dcache_icache_folio(folio);
+		set_bit(PG_dcache_clean, &folio->flags);
 		return pte;
 	}
 
@@ -142,7 +142,7 @@ static inline pte_t set_pte_filter(pte_t pte)
 static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
 				     int dirty)
 {
-	struct page *pg;
+	struct folio *folio;
 
 	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
 		return pte;
@@ -168,17 +168,17 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
 #endif /* CONFIG_DEBUG_VM */
 
 	/* If you set _PAGE_EXEC on weird pages you're on your own */
-	pg = maybe_pte_to_page(pte);
-	if (unlikely(!pg))
+	folio = maybe_pte_to_folio(pte);
+	if (unlikely(!folio))
 		goto bail;
 
 	/* If the page is already clean, we move on */
-	if (test_bit(PG_dcache_clean, &pg->flags))
+	if (test_bit(PG_dcache_clean, &folio->flags))
 		goto bail;
 
 	/* Clean the page and set PG_dcache_clean */
-	flush_dcache_icache_page(pg);
-	set_bit(PG_dcache_clean, &pg->flags);
+	flush_dcache_icache_folio(folio);
+	set_bit(PG_dcache_clean, &folio->flags);
 
  bail:
 	return pte_mkexec(pte);
@@ -187,8 +187,8 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma,
 /*
  * set_pte stores a linux PTE into the linux page table.
  */
-void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
-		pte_t pte)
+void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
+		pte_t pte, unsigned int nr)
 {
 	/*
 	 * Make sure hardware valid bit is not set. We don't do
@@ -203,7 +203,14 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 	pte = set_pte_filter(pte);
 
 	/* Perform the setting of the PTE */
-	__set_pte_at(mm, addr, ptep, pte, 0);
+	for (;;) {
+		__set_pte_at(mm, addr, ptep, pte, 0);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte = __pte(pte_val(pte) + PAGE_SIZE);
+		addr += PAGE_SIZE;
+	}
 }
 
 void unmap_kernel_page(unsigned long va)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 20/34] riscv: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
                     ` (35 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, linux-riscv

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_dcache_clean flag from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: linux-riscv@lists.infradead.org
---
 arch/riscv/include/asm/cacheflush.h | 19 +++++++++----------
 arch/riscv/include/asm/pgtable.h    | 26 +++++++++++++++++++-------
 arch/riscv/mm/cacheflush.c          | 11 ++---------
 3 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 03e3b95ae6da..10e5e96f09b5 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -15,20 +15,19 @@ static inline void local_flush_icache_all(void)
 
 #define PG_dcache_clean PG_arch_1
 
-static inline void flush_dcache_page(struct page *page)
+static inline void flush_dcache_folio(struct folio *folio)
 {
-	/*
-	 * HugeTLB pages are always fully mapped and only head page will be
-	 * set PG_dcache_clean (see comments in flush_icache_pte()).
-	 */
-	if (PageHuge(page))
-		page = compound_head(page);
-
-	if (test_bit(PG_dcache_clean, &page->flags))
-		clear_bit(PG_dcache_clean, &page->flags);
+	if (test_bit(PG_dcache_clean, &folio->flags))
+		clear_bit(PG_dcache_clean, &folio->flags);
 }
+#define flush_dcache_folio flush_dcache_folio
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
+
 /*
  * RISC-V doesn't have an instruction to flush parts of the instruction cache,
  * so instead we just flush the whole thing.
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index b516f3b59616..3a3a776fc047 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -405,8 +405,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 
 
 /* Commit new configuration to MMU hardware */
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-	unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
 	/*
 	 * The kernel assumes that TLBs don't cache invalid entries, but
@@ -415,8 +415,11 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 	 * Relying on flush_tlb_fix_spurious_fault would suffice, but
 	 * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
 	 */
-	local_flush_tlb_page(address);
+	while (nr--)
+		local_flush_tlb_page(address + nr * PAGE_SIZE);
 }
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 #define __HAVE_ARCH_UPDATE_MMU_TLB
 #define update_mmu_tlb update_mmu_cache
@@ -456,12 +459,21 @@ static inline void __set_pte_at(struct mm_struct *mm,
 	set_pte(ptep, pteval);
 }
 
-static inline void set_pte_at(struct mm_struct *mm,
-	unsigned long addr, pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pteval, unsigned int nr)
 {
-	page_table_check_ptes_set(mm, addr, ptep, pteval, 1);
-	__set_pte_at(mm, addr, ptep, pteval);
+	page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
+
+	for (;;) {
+		__set_pte_at(mm, addr, ptep, pteval);
+		if (--nr == 0)
+			break;
+		ptep++;
+		addr += PAGE_SIZE;
+		pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
+	}
 }
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline void pte_clear(struct mm_struct *mm,
 	unsigned long addr, pte_t *ptep)
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index fcd6145fbead..e36a851e5788 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -81,16 +81,9 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
 #ifdef CONFIG_MMU
 void flush_icache_pte(pte_t pte)
 {
-	struct page *page = pte_page(pte);
+	struct folio *folio = page_folio(pte_page(pte));
 
-	/*
-	 * HugeTLB pages are always fully mapped, so only setting head page's
-	 * PG_dcache_clean flag is enough.
-	 */
-	if (PageHuge(page))
-		page = compound_head(page);
-
-	if (!test_bit(PG_dcache_clean, &page->flags)) {
+	if (!test_bit(PG_dcache_clean, &folio->flags)) {
 		flush_icache_all();
 		set_bit(PG_dcache_clean, &page->flags);
 	}
-- 
2.39.1


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 20/34] riscv: Implement the new page table range API
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  0 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, linux-riscv

Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
Change the PG_dcache_clean flag from being per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: linux-riscv@lists.infradead.org
---
 arch/riscv/include/asm/cacheflush.h | 19 +++++++++----------
 arch/riscv/include/asm/pgtable.h    | 26 +++++++++++++++++++-------
 arch/riscv/mm/cacheflush.c          | 11 ++---------
 3 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 03e3b95ae6da..10e5e96f09b5 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -15,20 +15,19 @@ static inline void local_flush_icache_all(void)
 
 #define PG_dcache_clean PG_arch_1
 
-static inline void flush_dcache_page(struct page *page)
+static inline void flush_dcache_folio(struct folio *folio)
 {
-	/*
-	 * HugeTLB pages are always fully mapped and only head page will be
-	 * set PG_dcache_clean (see comments in flush_icache_pte()).
-	 */
-	if (PageHuge(page))
-		page = compound_head(page);
-
-	if (test_bit(PG_dcache_clean, &page->flags))
-		clear_bit(PG_dcache_clean, &page->flags);
+	if (test_bit(PG_dcache_clean, &folio->flags))
+		clear_bit(PG_dcache_clean, &folio->flags);
 }
+#define flush_dcache_folio flush_dcache_folio
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
+
 /*
  * RISC-V doesn't have an instruction to flush parts of the instruction cache,
  * so instead we just flush the whole thing.
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index b516f3b59616..3a3a776fc047 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -405,8 +405,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 
 
 /* Commit new configuration to MMU hardware */
-static inline void update_mmu_cache(struct vm_area_struct *vma,
-	unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
 	/*
 	 * The kernel assumes that TLBs don't cache invalid entries, but
@@ -415,8 +415,11 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 	 * Relying on flush_tlb_fix_spurious_fault would suffice, but
 	 * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
 	 */
-	local_flush_tlb_page(address);
+	while (nr--)
+		local_flush_tlb_page(address + nr * PAGE_SIZE);
 }
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 #define __HAVE_ARCH_UPDATE_MMU_TLB
 #define update_mmu_tlb update_mmu_cache
@@ -456,12 +459,21 @@ static inline void __set_pte_at(struct mm_struct *mm,
 	set_pte(ptep, pteval);
 }
 
-static inline void set_pte_at(struct mm_struct *mm,
-	unsigned long addr, pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pteval, unsigned int nr)
 {
-	page_table_check_ptes_set(mm, addr, ptep, pteval, 1);
-	__set_pte_at(mm, addr, ptep, pteval);
+	page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
+
+	for (;;) {
+		__set_pte_at(mm, addr, ptep, pteval);
+		if (--nr == 0)
+			break;
+		ptep++;
+		addr += PAGE_SIZE;
+		pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
+	}
 }
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline void pte_clear(struct mm_struct *mm,
 	unsigned long addr, pte_t *ptep)
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index fcd6145fbead..e36a851e5788 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -81,16 +81,9 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
 #ifdef CONFIG_MMU
 void flush_icache_pte(pte_t pte)
 {
-	struct page *page = pte_page(pte);
+	struct folio *folio = page_folio(pte_page(pte));
 
-	/*
-	 * HugeTLB pages are always fully mapped, so only setting head page's
-	 * PG_dcache_clean flag is enough.
-	 */
-	if (PageHuge(page))
-		page = compound_head(page);
-
-	if (!test_bit(PG_dcache_clean, &page->flags)) {
+	if (!test_bit(PG_dcache_clean, &folio->flags)) {
 		flush_icache_all();
 		set_bit(PG_dcache_clean, &page->flags);
 	}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 21/34] s390: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (19 preceding siblings ...)
  2023-02-28 21:37   ` Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-02 13:31   ` Gerald Schaefer
  2023-02-28 21:37 ` [PATCH v3 22/34] superh: " Matthew Wilcox (Oracle)
                   ` (15 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, linux-s390

Add set_ptes() and update_mmu_cache_range().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
---
 arch/s390/include/asm/pgtable.h | 34 ++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 2c70b4d1263d..46bf475116f1 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -50,6 +50,7 @@ void arch_report_meminfo(struct seq_file *m);
  * tables contain all the necessary information.
  */
 #define update_mmu_cache(vma, address, ptep)     do { } while (0)
+#define update_mmu_cache_range(vma, addr, ptep, nr)	do { } while (0)
 #define update_mmu_cache_pmd(vma, address, ptep) do { } while (0)
 
 /*
@@ -1317,21 +1318,36 @@ pgprot_t pgprot_writecombine(pgprot_t prot);
 pgprot_t pgprot_writethrough(pgprot_t prot);
 
 /*
- * Certain architectures need to do special things when PTEs
- * within a page table are directly modified.  Thus, the following
- * hook is made available.
+ * Set multiple PTEs to consecutive pages with a single call.  All PTEs
+ * are within the same folio, PMD and VMA.
  */
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t entry)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t entry, unsigned int nr)
 {
 	if (pte_present(entry))
 		entry = clear_pte_bit(entry, __pgprot(_PAGE_UNUSED));
-	if (mm_has_pgste(mm))
-		ptep_set_pte_at(mm, addr, ptep, entry);
-	else
-		set_pte(ptep, entry);
+	if (mm_has_pgste(mm)) {
+		for (;;) {
+			ptep_set_pte_at(mm, addr, ptep, entry);
+			if (--nr == 0)
+				break;
+			ptep++;
+			entry = __pte(pte_val(entry) + PAGE_SIZE);
+			addr += PAGE_SIZE;
+		}
+	} else {
+		for (;;) {
+			set_pte(ptep, entry);
+			if (--nr == 0)
+				break;
+			ptep++;
+			entry = __pte(pte_val(entry) + PAGE_SIZE);
+		}
+	}
 }
 
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
 /*
  * Conversion functions: convert a page and protection to a page entry,
  * and a page entry and page directory to the page they refer to.
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 22/34] superh: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (20 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 21/34] s390: " Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-01  8:06   ` Geert Uytterhoeven
  2023-02-28 21:37 ` [PATCH v3 23/34] sparc32: " Matthew Wilcox (Oracle)
                   ` (14 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, Yoshinori Sato, Rich Felker,
	John Paul Adrian Glaubitz, linux-sh

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().  Change the PG_dcache_clean flag from being
per-page to per-folio.  Flush the entire folio containing the pages in
flush_icache_pages() for ease of implementation.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: linux-sh@vger.kernel.org
---
 arch/sh/include/asm/cacheflush.h | 21 ++++++++-----
 arch/sh/include/asm/pgtable.h    |  6 ++--
 arch/sh/include/asm/pgtable_32.h | 16 ++++++++--
 arch/sh/mm/cache-j2.c            |  4 +--
 arch/sh/mm/cache-sh4.c           | 26 ++++++++++-----
 arch/sh/mm/cache-sh7705.c        | 26 +++++++++------
 arch/sh/mm/cache.c               | 54 ++++++++++++++++++--------------
 arch/sh/mm/kmap.c                |  3 +-
 8 files changed, 101 insertions(+), 55 deletions(-)

diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
index 481a664287e2..9fceef6f3e00 100644
--- a/arch/sh/include/asm/cacheflush.h
+++ b/arch/sh/include/asm/cacheflush.h
@@ -13,9 +13,9 @@
  *  - flush_cache_page(mm, vmaddr, pfn) flushes a single page
  *  - flush_cache_range(vma, start, end) flushes a range of pages
  *
- *  - flush_dcache_page(pg) flushes(wback&invalidates) a page for dcache
+ *  - flush_dcache_folio(folio) flushes(wback&invalidates) a folio for dcache
  *  - flush_icache_range(start, end) flushes(invalidates) a range for icache
- *  - flush_icache_page(vma, pg) flushes(invalidates) a page for icache
+ *  - flush_icache_pages(vma, pg, nr) flushes(invalidates) pages for icache
  *  - flush_cache_sigtramp(vaddr) flushes the signal trampoline
  */
 extern void (*local_flush_cache_all)(void *args);
@@ -23,9 +23,9 @@ extern void (*local_flush_cache_mm)(void *args);
 extern void (*local_flush_cache_dup_mm)(void *args);
 extern void (*local_flush_cache_page)(void *args);
 extern void (*local_flush_cache_range)(void *args);
-extern void (*local_flush_dcache_page)(void *args);
+extern void (*local_flush_dcache_folio)(void *args);
 extern void (*local_flush_icache_range)(void *args);
-extern void (*local_flush_icache_page)(void *args);
+extern void (*local_flush_icache_folio)(void *args);
 extern void (*local_flush_cache_sigtramp)(void *args);
 
 static inline void cache_noop(void *args) { }
@@ -42,11 +42,18 @@ extern void flush_cache_page(struct vm_area_struct *vma,
 extern void flush_cache_range(struct vm_area_struct *vma,
 				 unsigned long start, unsigned long end);
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
+
 extern void flush_icache_range(unsigned long start, unsigned long end);
 #define flush_icache_user_range flush_icache_range
-extern void flush_icache_page(struct vm_area_struct *vma,
-				 struct page *page);
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr);
+#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
 extern void flush_cache_sigtramp(unsigned long address);
 
 struct flusher_data {
diff --git a/arch/sh/include/asm/pgtable.h b/arch/sh/include/asm/pgtable.h
index 3ce30becf6df..1a8fdc3bc363 100644
--- a/arch/sh/include/asm/pgtable.h
+++ b/arch/sh/include/asm/pgtable.h
@@ -102,13 +102,15 @@ extern void __update_cache(struct vm_area_struct *vma,
 extern void __update_tlb(struct vm_area_struct *vma,
 			 unsigned long address, pte_t pte);
 
-static inline void
-update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr)
 {
 	pte_t pte = *ptep;
 	__update_cache(vma, address, pte);
 	__update_tlb(vma, address, pte);
 }
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
 extern void paging_init(void);
diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h
index 21952b094650..03ba1834e126 100644
--- a/arch/sh/include/asm/pgtable_32.h
+++ b/arch/sh/include/asm/pgtable_32.h
@@ -307,7 +307,19 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 #define set_pte(pteptr, pteval) (*(pteptr) = pteval)
 #endif
 
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte = __pte(pte_val(pte) + PAGE_SIZE);
+	}
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
  * (pmds are folded into pgds so this doesn't get actually called,
@@ -323,7 +335,7 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 #define pte_none(x)		(!pte_val(x))
 #define pte_present(x)		((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE))
 
-#define pte_clear(mm,addr,xp) do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
+#define pte_clear(mm, addr, ptep) set_pte(ptep, __pte(0))
 
 #define pmd_none(x)	(!pmd_val(x))
 #define pmd_present(x)	(pmd_val(x))
diff --git a/arch/sh/mm/cache-j2.c b/arch/sh/mm/cache-j2.c
index f277862a11f5..9ac960214380 100644
--- a/arch/sh/mm/cache-j2.c
+++ b/arch/sh/mm/cache-j2.c
@@ -55,9 +55,9 @@ void __init j2_cache_init(void)
 	local_flush_cache_dup_mm = j2_flush_both;
 	local_flush_cache_page = j2_flush_both;
 	local_flush_cache_range = j2_flush_both;
-	local_flush_dcache_page = j2_flush_dcache;
+	local_flush_dcache_folio = j2_flush_dcache;
 	local_flush_icache_range = j2_flush_icache;
-	local_flush_icache_page = j2_flush_icache;
+	local_flush_icache_folio = j2_flush_icache;
 	local_flush_cache_sigtramp = j2_flush_icache;
 
 	pr_info("Initial J2 CCR is %.8x\n", __raw_readl(j2_ccr_base));
diff --git a/arch/sh/mm/cache-sh4.c b/arch/sh/mm/cache-sh4.c
index 72c2e1b46c08..862046f26981 100644
--- a/arch/sh/mm/cache-sh4.c
+++ b/arch/sh/mm/cache-sh4.c
@@ -107,19 +107,29 @@ static inline void flush_cache_one(unsigned long start, unsigned long phys)
  * Write back & invalidate the D-cache of the page.
  * (To avoid "alias" issues)
  */
-static void sh4_flush_dcache_page(void *arg)
+static void sh4_flush_dcache_folio(void *arg)
 {
-	struct page *page = arg;
-	unsigned long addr = (unsigned long)page_address(page);
+	struct folio *folio = arg;
 #ifndef CONFIG_SMP
-	struct address_space *mapping = page_mapping_file(page);
+	struct address_space *mapping = folio_flush_mapping(folio);
 
 	if (mapping && !mapping_mapped(mapping))
-		clear_bit(PG_dcache_clean, &page->flags);
+		clear_bit(PG_dcache_clean, &folio->flags);
 	else
 #endif
-		flush_cache_one(CACHE_OC_ADDRESS_ARRAY |
-				(addr & shm_align_mask), page_to_phys(page));
+	{
+		unsigned long pfn = folio_pfn(folio);
+		unsigned long addr = (unsigned long)folio_address(folio);
+		unsigned int i, nr = folio_nr_pages(folio);
+
+		for (i = 0; i < nr; i++) {
+			flush_cache_one(CACHE_OC_ADDRESS_ARRAY |
+						(addr & shm_align_mask),
+					pfn * PAGE_SIZE);
+			addr += PAGE_SIZE;
+			pfn++;
+		}
+	}
 
 	wmb();
 }
@@ -379,7 +389,7 @@ void __init sh4_cache_init(void)
 		__raw_readl(CCN_PRR));
 
 	local_flush_icache_range	= sh4_flush_icache_range;
-	local_flush_dcache_page		= sh4_flush_dcache_page;
+	local_flush_dcache_folio	= sh4_flush_dcache_folio;
 	local_flush_cache_all		= sh4_flush_cache_all;
 	local_flush_cache_mm		= sh4_flush_cache_mm;
 	local_flush_cache_dup_mm	= sh4_flush_cache_mm;
diff --git a/arch/sh/mm/cache-sh7705.c b/arch/sh/mm/cache-sh7705.c
index 9b63a53a5e46..b509a407588f 100644
--- a/arch/sh/mm/cache-sh7705.c
+++ b/arch/sh/mm/cache-sh7705.c
@@ -132,15 +132,20 @@ static void __flush_dcache_page(unsigned long phys)
  * Write back & invalidate the D-cache of the page.
  * (To avoid "alias" issues)
  */
-static void sh7705_flush_dcache_page(void *arg)
+static void sh7705_flush_dcache_folio(void *arg)
 {
-	struct page *page = arg;
-	struct address_space *mapping = page_mapping_file(page);
+	struct folio *folio = arg;
+	struct address_space *mapping = folio_flush_mapping(folio);
 
 	if (mapping && !mapping_mapped(mapping))
-		clear_bit(PG_dcache_clean, &page->flags);
-	else
-		__flush_dcache_page(__pa(page_address(page)));
+		clear_bit(PG_dcache_clean, &folio->flags);
+	else {
+		unsigned long pfn = folio_pfn(folio);
+		unsigned int i, nr = folio_nr_pages(folio);
+
+		for (i = 0; i < nr; i++)
+			__flush_dcache_page((pfn + i) * PAGE_SIZE);
+	}
 }
 
 static void sh7705_flush_cache_all(void *args)
@@ -176,19 +181,20 @@ static void sh7705_flush_cache_page(void *args)
  * Not entirely sure why this is necessary on SH3 with 32K cache but
  * without it we get occasional "Memory fault" when loading a program.
  */
-static void sh7705_flush_icache_page(void *page)
+static void sh7705_flush_icache_folio(void *arg)
 {
-	__flush_purge_region(page_address(page), PAGE_SIZE);
+	struct folio *folio = arg;
+	__flush_purge_region(folio_address(folio), folio_size(folio));
 }
 
 void __init sh7705_cache_init(void)
 {
 	local_flush_icache_range	= sh7705_flush_icache_range;
-	local_flush_dcache_page		= sh7705_flush_dcache_page;
+	local_flush_dcache_folio	= sh7705_flush_dcache_folio;
 	local_flush_cache_all		= sh7705_flush_cache_all;
 	local_flush_cache_mm		= sh7705_flush_cache_all;
 	local_flush_cache_dup_mm	= sh7705_flush_cache_all;
 	local_flush_cache_range		= sh7705_flush_cache_all;
 	local_flush_cache_page		= sh7705_flush_cache_page;
-	local_flush_icache_page		= sh7705_flush_icache_page;
+	local_flush_icache_folio	= sh7705_flush_icache_folio;
 }
diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c
index 3aef78ceb820..93fc5fb8ec1c 100644
--- a/arch/sh/mm/cache.c
+++ b/arch/sh/mm/cache.c
@@ -20,9 +20,9 @@ void (*local_flush_cache_mm)(void *args) = cache_noop;
 void (*local_flush_cache_dup_mm)(void *args) = cache_noop;
 void (*local_flush_cache_page)(void *args) = cache_noop;
 void (*local_flush_cache_range)(void *args) = cache_noop;
-void (*local_flush_dcache_page)(void *args) = cache_noop;
+void (*local_flush_dcache_folio)(void *args) = cache_noop;
 void (*local_flush_icache_range)(void *args) = cache_noop;
-void (*local_flush_icache_page)(void *args) = cache_noop;
+void (*local_flush_icache_folio)(void *args) = cache_noop;
 void (*local_flush_cache_sigtramp)(void *args) = cache_noop;
 
 void (*__flush_wback_region)(void *start, int size);
@@ -61,15 +61,17 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 		       unsigned long vaddr, void *dst, const void *src,
 		       unsigned long len)
 {
-	if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
-	    test_bit(PG_dcache_clean, &page->flags)) {
+	struct folio *folio = page_folio(page);
+
+	if (boot_cpu_data.dcache.n_aliases && folio_mapped(folio) &&
+	    test_bit(PG_dcache_clean, &folio->flags)) {
 		void *vto = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
 		memcpy(vto, src, len);
 		kunmap_coherent(vto);
 	} else {
 		memcpy(dst, src, len);
 		if (boot_cpu_data.dcache.n_aliases)
-			clear_bit(PG_dcache_clean, &page->flags);
+			clear_bit(PG_dcache_clean, &folio->flags);
 	}
 
 	if (vma->vm_flags & VM_EXEC)
@@ -80,27 +82,30 @@ void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
 			 unsigned long vaddr, void *dst, const void *src,
 			 unsigned long len)
 {
+	struct folio *folio = page_folio(page);
+
 	if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
-	    test_bit(PG_dcache_clean, &page->flags)) {
+	    test_bit(PG_dcache_clean, &folio->flags)) {
 		void *vfrom = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
 		memcpy(dst, vfrom, len);
 		kunmap_coherent(vfrom);
 	} else {
 		memcpy(dst, src, len);
 		if (boot_cpu_data.dcache.n_aliases)
-			clear_bit(PG_dcache_clean, &page->flags);
+			clear_bit(PG_dcache_clean, &folio->flags);
 	}
 }
 
 void copy_user_highpage(struct page *to, struct page *from,
 			unsigned long vaddr, struct vm_area_struct *vma)
 {
+	struct folio *src = page_folio(from);
 	void *vfrom, *vto;
 
 	vto = kmap_atomic(to);
 
-	if (boot_cpu_data.dcache.n_aliases && page_mapcount(from) &&
-	    test_bit(PG_dcache_clean, &from->flags)) {
+	if (boot_cpu_data.dcache.n_aliases && folio_mapped(src) &&
+	    test_bit(PG_dcache_clean, &src->flags)) {
 		vfrom = kmap_coherent(from, vaddr);
 		copy_page(vto, vfrom);
 		kunmap_coherent(vfrom);
@@ -136,35 +141,37 @@ EXPORT_SYMBOL(clear_user_highpage);
 void __update_cache(struct vm_area_struct *vma,
 		    unsigned long address, pte_t pte)
 {
-	struct page *page;
 	unsigned long pfn = pte_pfn(pte);
 
 	if (!boot_cpu_data.dcache.n_aliases)
 		return;
 
-	page = pfn_to_page(pfn);
 	if (pfn_valid(pfn)) {
-		int dirty = !test_and_set_bit(PG_dcache_clean, &page->flags);
+		struct folio *folio = page_folio(pfn_to_page(pfn));
+		int dirty = !test_and_set_bit(PG_dcache_clean, &folio->flags);
 		if (dirty)
-			__flush_purge_region(page_address(page), PAGE_SIZE);
+			__flush_purge_region(folio_address(folio),
+						folio_size(folio));
 	}
 }
 
 void __flush_anon_page(struct page *page, unsigned long vmaddr)
 {
+	struct folio *folio = page_folio(page);
 	unsigned long addr = (unsigned long) page_address(page);
 
 	if (pages_do_alias(addr, vmaddr)) {
-		if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
-		    test_bit(PG_dcache_clean, &page->flags)) {
+		if (boot_cpu_data.dcache.n_aliases && folio_mapped(folio) &&
+		    test_bit(PG_dcache_clean, &folio->flags)) {
 			void *kaddr;
 
 			kaddr = kmap_coherent(page, vmaddr);
 			/* XXX.. For now kunmap_coherent() does a purge */
 			/* __flush_purge_region((void *)kaddr, PAGE_SIZE); */
 			kunmap_coherent(kaddr);
-		} else
-			__flush_purge_region((void *)addr, PAGE_SIZE);
+		} else 
+			__flush_purge_region(folio_address(folio),
+						folio_size(folio));
 	}
 }
 
@@ -215,11 +222,11 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
 }
 EXPORT_SYMBOL(flush_cache_range);
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
-	cacheop_on_each_cpu(local_flush_dcache_page, page, 1);
+	cacheop_on_each_cpu(local_flush_dcache_folio, folio, 1);
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
 
 void flush_icache_range(unsigned long start, unsigned long end)
 {
@@ -233,10 +240,11 @@ void flush_icache_range(unsigned long start, unsigned long end)
 }
 EXPORT_SYMBOL(flush_icache_range);
 
-void flush_icache_page(struct vm_area_struct *vma, struct page *page)
+void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
+		unsigned int nr)
 {
-	/* Nothing uses the VMA, so just pass the struct page along */
-	cacheop_on_each_cpu(local_flush_icache_page, page, 1);
+	/* Nothing uses the VMA, so just pass the folio along */
+	cacheop_on_each_cpu(local_flush_icache_folio, page_folio(page), 1);
 }
 
 void flush_cache_sigtramp(unsigned long address)
diff --git a/arch/sh/mm/kmap.c b/arch/sh/mm/kmap.c
index 73fd7cc99430..fa50e8f6e7a9 100644
--- a/arch/sh/mm/kmap.c
+++ b/arch/sh/mm/kmap.c
@@ -27,10 +27,11 @@ void __init kmap_coherent_init(void)
 
 void *kmap_coherent(struct page *page, unsigned long addr)
 {
+	struct folio *folio = page_folio(page);
 	enum fixed_addresses idx;
 	unsigned long vaddr;
 
-	BUG_ON(!test_bit(PG_dcache_clean, &page->flags));
+	BUG_ON(!test_bit(PG_dcache_clean, &folio->flags));
 
 	preempt_disable();
 	pagefault_disable();
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 23/34] sparc32: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (21 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 22/34] superh: " Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 24/34] sparc64: " Matthew Wilcox (Oracle)
                   ` (13 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, David S. Miller, sparclinux

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
---
 arch/sparc/include/asm/cacheflush_32.h |  9 +++++++--
 arch/sparc/include/asm/pgtable_32.h    | 15 ++++++++++++++-
 arch/sparc/mm/init_32.c                | 13 +++++++++++--
 3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/arch/sparc/include/asm/cacheflush_32.h b/arch/sparc/include/asm/cacheflush_32.h
index adb6991d0455..8dba35d63328 100644
--- a/arch/sparc/include/asm/cacheflush_32.h
+++ b/arch/sparc/include/asm/cacheflush_32.h
@@ -16,6 +16,7 @@
 	sparc32_cachetlb_ops->cache_page(vma, addr)
 #define flush_icache_range(start, end)		do { } while (0)
 #define flush_icache_page(vma, pg)		do { } while (0)
+#define flush_icache_pages(vma, pg, nr)		do { } while (0)
 
 #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
 	do {							\
@@ -35,11 +36,15 @@
 #define flush_page_for_dma(addr) \
 	sparc32_cachetlb_ops->page_for_dma(addr)
 
-struct page;
 void sparc_flush_page_to_ram(struct page *page);
+void sparc_flush_folio_to_ram(struct folio *folio);
 
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-#define flush_dcache_page(page)			sparc_flush_page_to_ram(page)
+#define flush_dcache_folio(folio)		sparc_flush_folio_to_ram(folio)
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 #define flush_dcache_mmap_lock(mapping)		do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)	do { } while (0)
 
diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index d4330e3c57a6..47ae55ea1837 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -101,7 +101,19 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	srmmu_swap((unsigned long *)ptep, pte_val(pteval));
 }
 
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+	}
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline int srmmu_device_memory(unsigned long x)
 {
@@ -318,6 +330,7 @@ void mmu_info(struct seq_file *m);
 #define FAULT_CODE_USER     0x4
 
 #define update_mmu_cache(vma, address, ptep) do { } while (0)
+#define update_mmu_cache_range(vma, address, ptep, nr) do { } while (0)
 
 void srmmu_mapiorange(unsigned int bus, unsigned long xpa,
                       unsigned long xva, unsigned int len);
diff --git a/arch/sparc/mm/init_32.c b/arch/sparc/mm/init_32.c
index 9c0ea457bdf0..d96a14ffceeb 100644
--- a/arch/sparc/mm/init_32.c
+++ b/arch/sparc/mm/init_32.c
@@ -297,11 +297,20 @@ void sparc_flush_page_to_ram(struct page *page)
 {
 	unsigned long vaddr = (unsigned long)page_address(page);
 
-	if (vaddr)
-		__flush_page_to_ram(vaddr);
+	__flush_page_to_ram(vaddr);
 }
 EXPORT_SYMBOL(sparc_flush_page_to_ram);
 
+void sparc_flush_folio_to_ram(struct folio *folio)
+{
+	unsigned long vaddr = (unsigned long)folio_address(folio);
+	unsigned int i, nr = folio_nr_pages(folio);
+
+	for (i = 0; i < nr; i++)
+		__flush_page_to_ram(vaddr + i * PAGE_SIZE);
+}
+EXPORT_SYMBOL(sparc_flush_folio_to_ram);
+
 static const pgprot_t protection_map[16] = {
 	[VM_NONE]					= PAGE_NONE,
 	[VM_READ]					= PAGE_READONLY,
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 24/34] sparc64: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (22 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 23/34] sparc32: " Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-02-28 21:37   ` Matthew Wilcox (Oracle)
                   ` (12 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, David S. Miller, sparclinux

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().  Convert the PG_dcache_dirty flag from being
per-page to per-folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
---
 arch/sparc/include/asm/cacheflush_64.h | 18 ++++--
 arch/sparc/include/asm/pgtable_64.h    | 25 +++++++--
 arch/sparc/kernel/smp_64.c             | 56 +++++++++++-------
 arch/sparc/mm/init_64.c                | 78 +++++++++++++++-----------
 arch/sparc/mm/tlb.c                    |  5 +-
 5 files changed, 117 insertions(+), 65 deletions(-)

diff --git a/arch/sparc/include/asm/cacheflush_64.h b/arch/sparc/include/asm/cacheflush_64.h
index b9341836597e..a9a719f04d06 100644
--- a/arch/sparc/include/asm/cacheflush_64.h
+++ b/arch/sparc/include/asm/cacheflush_64.h
@@ -35,20 +35,26 @@ void flush_icache_range(unsigned long start, unsigned long end);
 void __flush_icache_page(unsigned long);
 
 void __flush_dcache_page(void *addr, int flush_icache);
-void flush_dcache_page_impl(struct page *page);
+void flush_dcache_folio_impl(struct folio *folio);
 #ifdef CONFIG_SMP
-void smp_flush_dcache_page_impl(struct page *page, int cpu);
-void flush_dcache_page_all(struct mm_struct *mm, struct page *page);
+void smp_flush_dcache_folio_impl(struct folio *folio, int cpu);
+void flush_dcache_folio_all(struct mm_struct *mm, struct folio *folio);
 #else
-#define smp_flush_dcache_page_impl(page,cpu) flush_dcache_page_impl(page)
-#define flush_dcache_page_all(mm,page) flush_dcache_page_impl(page)
+#define smp_flush_dcache_folio_impl(folio, cpu) flush_dcache_folio_impl(folio)
+#define flush_dcache_folio_all(mm, folio) flush_dcache_folio_impl(folio)
 #endif
 
 void __flush_dcache_range(unsigned long start, unsigned long end);
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *page);
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 
 #define flush_icache_page(vma, pg)	do { } while(0)
+#define flush_icache_pages(vma, pg, nr)	do { } while(0)
 
 void flush_ptrace_access(struct vm_area_struct *, struct page *,
 			 unsigned long uaddr, void *kaddr,
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 2dc8d4641734..d5c0088e0c6a 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -911,8 +911,20 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 	maybe_tlb_batch_add(mm, addr, ptep, orig, fullmm, PAGE_SHIFT);
 }
 
-#define set_pte_at(mm,addr,ptep,pte)	\
-	__set_pte_at((mm), (addr), (ptep), (pte), 0)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		__set_pte_at(mm, addr, ptep, pte, 0);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+		addr += PAGE_SIZE;
+	}
+}
+
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1);
 
 #define pte_clear(mm,addr,ptep)		\
 	set_pte_at((mm), (addr), (ptep), __pte(0UL))
@@ -931,8 +943,8 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 									\
 		if (pfn_valid(this_pfn) &&				\
 		    (((old_addr) ^ (new_addr)) & (1 << 13)))		\
-			flush_dcache_page_all(current->mm,		\
-					      pfn_to_page(this_pfn));	\
+			flush_dcache_folio_all(current->mm,		\
+				page_folio(pfn_to_page(this_pfn)));	\
 	}								\
 	newpte;								\
 })
@@ -947,7 +959,10 @@ struct seq_file;
 void mmu_info(struct seq_file *);
 
 struct vm_area_struct;
-void update_mmu_cache(struct vm_area_struct *, unsigned long, pte_t *);
+void update_mmu_cache_range(struct vm_area_struct *, unsigned long addr,
+		pte_t *ptep, unsigned int nr);
+#define update_mmu_cache(vma, addr, ptep) \
+	update_mmu_cache_range(vma, addr, ptep, 1)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
 			  pmd_t *pmd);
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index a55295d1b924..90ef8677ac89 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -921,20 +921,26 @@ extern unsigned long xcall_flush_dcache_page_cheetah;
 #endif
 extern unsigned long xcall_flush_dcache_page_spitfire;
 
-static inline void __local_flush_dcache_page(struct page *page)
+static inline void __local_flush_dcache_folio(struct folio *folio)
 {
+	unsigned int i, nr = folio_nr_pages(folio);
+
 #ifdef DCACHE_ALIASING_POSSIBLE
-	__flush_dcache_page(page_address(page),
+	for (i = 0; i < nr; i++)
+		__flush_dcache_page(folio_address(folio) + i * PAGE_SIZE,
 			    ((tlb_type == spitfire) &&
-			     page_mapping_file(page) != NULL));
+			     folio_flush_mapping(folio) != NULL));
 #else
-	if (page_mapping_file(page) != NULL &&
-	    tlb_type == spitfire)
-		__flush_icache_page(__pa(page_address(page)));
+	if (folio_flush_mapping(folio) != NULL &&
+	    tlb_type == spitfire) {
+		unsigned long pfn = folio_pfn(folio)
+		for (i = 0; i < nr; i++)
+			__flush_icache_page((pfn + i) * PAGE_SIZE);
+	}
 #endif
 }
 
-void smp_flush_dcache_page_impl(struct page *page, int cpu)
+void smp_flush_dcache_folio_impl(struct folio *folio, int cpu)
 {
 	int this_cpu;
 
@@ -948,14 +954,14 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
 	this_cpu = get_cpu();
 
 	if (cpu == this_cpu) {
-		__local_flush_dcache_page(page);
+		__local_flush_dcache_folio(folio);
 	} else if (cpu_online(cpu)) {
-		void *pg_addr = page_address(page);
+		void *pg_addr = folio_address(folio);
 		u64 data0 = 0;
 
 		if (tlb_type == spitfire) {
 			data0 = ((u64)&xcall_flush_dcache_page_spitfire);
-			if (page_mapping_file(page) != NULL)
+			if (folio_flush_mapping(folio) != NULL)
 				data0 |= ((u64)1 << 32);
 		} else if (tlb_type == cheetah || tlb_type == cheetah_plus) {
 #ifdef DCACHE_ALIASING_POSSIBLE
@@ -963,18 +969,23 @@ void smp_flush_dcache_page_impl(struct page *page, int cpu)
 #endif
 		}
 		if (data0) {
-			xcall_deliver(data0, __pa(pg_addr),
-				      (u64) pg_addr, cpumask_of(cpu));
+			unsigned int i, nr = folio_nr_pages(folio);
+
+			for (i = 0; i < nr; i++) {
+				xcall_deliver(data0, __pa(pg_addr),
+					      (u64) pg_addr, cpumask_of(cpu));
 #ifdef CONFIG_DEBUG_DCFLUSH
-			atomic_inc(&dcpage_flushes_xcall);
+				atomic_inc(&dcpage_flushes_xcall);
 #endif
+				pg_addr += PAGE_SIZE;
+			}
 		}
 	}
 
 	put_cpu();
 }
 
-void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
+void flush_dcache_folio_all(struct mm_struct *mm, struct folio *folio)
 {
 	void *pg_addr;
 	u64 data0;
@@ -988,10 +999,10 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
 	atomic_inc(&dcpage_flushes);
 #endif
 	data0 = 0;
-	pg_addr = page_address(page);
+	pg_addr = folio_address(folio);
 	if (tlb_type == spitfire) {
 		data0 = ((u64)&xcall_flush_dcache_page_spitfire);
-		if (page_mapping_file(page) != NULL)
+		if (folio_flush_mapping(folio) != NULL)
 			data0 |= ((u64)1 << 32);
 	} else if (tlb_type == cheetah || tlb_type == cheetah_plus) {
 #ifdef DCACHE_ALIASING_POSSIBLE
@@ -999,13 +1010,18 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
 #endif
 	}
 	if (data0) {
-		xcall_deliver(data0, __pa(pg_addr),
-			      (u64) pg_addr, cpu_online_mask);
+		unsigned int i, nr = folio_nr_pages(folio);
+
+		for (i = 0; i < nr; i++) {
+			xcall_deliver(data0, __pa(pg_addr),
+				      (u64) pg_addr, cpu_online_mask);
 #ifdef CONFIG_DEBUG_DCFLUSH
-		atomic_inc(&dcpage_flushes_xcall);
+			atomic_inc(&dcpage_flushes_xcall);
 #endif
+			pg_addr += PAGE_SIZE;
+		}
 	}
-	__local_flush_dcache_page(page);
+	__local_flush_dcache_folio(folio);
 
 	preempt_enable();
 }
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 04f9db0c3111..ab9aacbaf43c 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -195,21 +195,26 @@ atomic_t dcpage_flushes_xcall = ATOMIC_INIT(0);
 #endif
 #endif
 
-inline void flush_dcache_page_impl(struct page *page)
+inline void flush_dcache_folio_impl(struct folio *folio)
 {
+	unsigned int i, nr = folio_nr_pages(folio);
+
 	BUG_ON(tlb_type == hypervisor);
 #ifdef CONFIG_DEBUG_DCFLUSH
 	atomic_inc(&dcpage_flushes);
 #endif
 
 #ifdef DCACHE_ALIASING_POSSIBLE
-	__flush_dcache_page(page_address(page),
-			    ((tlb_type == spitfire) &&
-			     page_mapping_file(page) != NULL));
+	for (i = 0; i < nr; i++)
+		__flush_dcache_page(folio_address(folio) + i * PAGE_SIZE,
+				    ((tlb_type == spitfire) &&
+				     folio_flush_mapping(folio) != NULL));
 #else
-	if (page_mapping_file(page) != NULL &&
-	    tlb_type == spitfire)
-		__flush_icache_page(__pa(page_address(page)));
+	if (folio_flush_mapping(folio) != NULL &&
+	    tlb_type == spitfire) {
+		for (i = 0; i < nr; i++)
+			__flush_icache_page((pfn + i) * PAGE_SIZE);
+	}
 #endif
 }
 
@@ -218,10 +223,10 @@ inline void flush_dcache_page_impl(struct page *page)
 #define PG_dcache_cpu_mask	\
 	((1UL<<ilog2(roundup_pow_of_two(NR_CPUS)))-1UL)
 
-#define dcache_dirty_cpu(page) \
-	(((page)->flags >> PG_dcache_cpu_shift) & PG_dcache_cpu_mask)
+#define dcache_dirty_cpu(folio) \
+	(((folio)->flags >> PG_dcache_cpu_shift) & PG_dcache_cpu_mask)
 
-static inline void set_dcache_dirty(struct page *page, int this_cpu)
+static inline void set_dcache_dirty(struct folio *folio, int this_cpu)
 {
 	unsigned long mask = this_cpu;
 	unsigned long non_cpu_bits;
@@ -238,11 +243,11 @@ static inline void set_dcache_dirty(struct page *page, int this_cpu)
 			     "bne,pn	%%xcc, 1b\n\t"
 			     " nop"
 			     : /* no outputs */
-			     : "r" (mask), "r" (non_cpu_bits), "r" (&page->flags)
+			     : "r" (mask), "r" (non_cpu_bits), "r" (&folio->flags)
 			     : "g1", "g7");
 }
 
-static inline void clear_dcache_dirty_cpu(struct page *page, unsigned long cpu)
+static inline void clear_dcache_dirty_cpu(struct folio *folio, unsigned long cpu)
 {
 	unsigned long mask = (1UL << PG_dcache_dirty);
 
@@ -260,7 +265,7 @@ static inline void clear_dcache_dirty_cpu(struct page *page, unsigned long cpu)
 			     " nop\n"
 			     "2:"
 			     : /* no outputs */
-			     : "r" (cpu), "r" (mask), "r" (&page->flags),
+			     : "r" (cpu), "r" (mask), "r" (&folio->flags),
 			       "i" (PG_dcache_cpu_mask),
 			       "i" (PG_dcache_cpu_shift)
 			     : "g1", "g7");
@@ -284,9 +289,10 @@ static void flush_dcache(unsigned long pfn)
 
 	page = pfn_to_page(pfn);
 	if (page) {
+		struct folio *folio = page_folio(page);
 		unsigned long pg_flags;
 
-		pg_flags = page->flags;
+		pg_flags = folio->flags;
 		if (pg_flags & (1UL << PG_dcache_dirty)) {
 			int cpu = ((pg_flags >> PG_dcache_cpu_shift) &
 				   PG_dcache_cpu_mask);
@@ -296,11 +302,11 @@ static void flush_dcache(unsigned long pfn)
 			 * in the SMP case.
 			 */
 			if (cpu == this_cpu)
-				flush_dcache_page_impl(page);
+				flush_dcache_folio_impl(folio);
 			else
-				smp_flush_dcache_page_impl(page, cpu);
+				smp_flush_dcache_folio_impl(folio, cpu);
 
-			clear_dcache_dirty_cpu(page, cpu);
+			clear_dcache_dirty_cpu(folio, cpu);
 
 			put_cpu();
 		}
@@ -388,12 +394,14 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 }
 #endif	/* CONFIG_HUGETLB_PAGE */
 
-void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
+		pte_t *ptep, unsigned int nr)
 {
 	struct mm_struct *mm;
 	unsigned long flags;
 	bool is_huge_tsb;
 	pte_t pte = *ptep;
+	unsigned int i;
 
 	if (tlb_type != hypervisor) {
 		unsigned long pfn = pte_pfn(pte);
@@ -440,15 +448,21 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *
 		}
 	}
 #endif
-	if (!is_huge_tsb)
-		__update_mmu_tsb_insert(mm, MM_TSB_BASE, PAGE_SHIFT,
-					address, pte_val(pte));
+	if (!is_huge_tsb) {
+		for (i = 0; i < nr; i++) {
+			__update_mmu_tsb_insert(mm, MM_TSB_BASE, PAGE_SHIFT,
+						address, pte_val(pte));
+			address += PAGE_SIZE;
+			pte_val(pte) += PAGE_SIZE;
+		}
+	}
 
 	spin_unlock_irqrestore(&mm->context.lock, flags);
 }
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
+	unsigned long pfn = folio_pfn(folio);
 	struct address_space *mapping;
 	int this_cpu;
 
@@ -459,35 +473,35 @@ void flush_dcache_page(struct page *page)
 	 * is merely the zero page.  The 'bigcore' testcase in GDB
 	 * causes this case to run millions of times.
 	 */
-	if (page == ZERO_PAGE(0))
+	if (is_zero_pfn(pfn))
 		return;
 
 	this_cpu = get_cpu();
 
-	mapping = page_mapping_file(page);
+	mapping = folio_flush_mapping(folio);
 	if (mapping && !mapping_mapped(mapping)) {
-		int dirty = test_bit(PG_dcache_dirty, &page->flags);
+		bool dirty = test_bit(PG_dcache_dirty, &folio->flags);
 		if (dirty) {
-			int dirty_cpu = dcache_dirty_cpu(page);
+			int dirty_cpu = dcache_dirty_cpu(folio);
 
 			if (dirty_cpu == this_cpu)
 				goto out;
-			smp_flush_dcache_page_impl(page, dirty_cpu);
+			smp_flush_dcache_folio_impl(folio, dirty_cpu);
 		}
-		set_dcache_dirty(page, this_cpu);
+		set_dcache_dirty(folio, this_cpu);
 	} else {
 		/* We could delay the flush for the !page_mapping
 		 * case too.  But that case is for exec env/arg
 		 * pages and those are %99 certainly going to get
 		 * faulted into the tlb (and thus flushed) anyways.
 		 */
-		flush_dcache_page_impl(page);
+		flush_dcache_folio_impl(folio);
 	}
 
 out:
 	put_cpu();
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
 
 void __kprobes flush_icache_range(unsigned long start, unsigned long end)
 {
@@ -2280,10 +2294,10 @@ void __init paging_init(void)
 	setup_page_offset();
 
 	/* These build time checkes make sure that the dcache_dirty_cpu()
-	 * page->flags usage will work.
+	 * folio->flags usage will work.
 	 *
 	 * When a page gets marked as dcache-dirty, we store the
-	 * cpu number starting at bit 32 in the page->flags.  Also,
+	 * cpu number starting at bit 32 in the folio->flags.  Also,
 	 * functions like clear_dcache_dirty_cpu use the cpu mask
 	 * in 13-bit signed-immediate instruction fields.
 	 */
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 9a725547578e..3fa6a070912d 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -118,6 +118,7 @@ void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
 		unsigned long paddr, pfn = pte_pfn(orig);
 		struct address_space *mapping;
 		struct page *page;
+		struct folio *folio;
 
 		if (!pfn_valid(pfn))
 			goto no_cache_flush;
@@ -127,13 +128,13 @@ void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
 			goto no_cache_flush;
 
 		/* A real file page? */
-		mapping = page_mapping_file(page);
+		mapping = folio_flush_mapping(folio);
 		if (!mapping)
 			goto no_cache_flush;
 
 		paddr = (unsigned long) page_address(page);
 		if ((paddr ^ vaddr) & (1 << 13))
-			flush_dcache_page_all(mm, page);
+			flush_dcache_folio_all(mm, folio);
 	}
 
 no_cache_flush:
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 25/34] um: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
                     ` (35 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um

Add set_ptes() and update_mmu_cache_range().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: linux-um@lists.infradead.org
---
 arch/um/include/asm/pgtable.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index a70d1618eb35..ca78c90ae74f 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -242,12 +242,20 @@ static inline void set_pte(pte_t *pteptr, pte_t pteval)
 	if(pte_present(*pteptr)) *pteptr = pte_mknewprot(*pteptr);
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *pteptr, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	set_pte(pteptr, pteval);
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+	}
 }
 
+#define set_pte_at(mm, addr, ptep, pte)	set_ptes(mm, addr, ptep, pte, 1)
+
 #define __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
@@ -290,6 +298,7 @@ struct mm_struct;
 extern pte_t *virt_to_pte(struct mm_struct *mm, unsigned long addr);
 
 #define update_mmu_cache(vma,address,ptep) do {} while (0)
+#define update_mmu_cache_range(vma, address, ptep, nr) do {} while (0)
 
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 25/34] um: Implement the new page table range API
@ 2023-02-28 21:37   ` Matthew Wilcox (Oracle)
  0 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, Richard Weinberger, Anton Ivanov, Johannes Berg,
	linux-um

Add set_ptes() and update_mmu_cache_range().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: linux-um@lists.infradead.org
---
 arch/um/include/asm/pgtable.h | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index a70d1618eb35..ca78c90ae74f 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -242,12 +242,20 @@ static inline void set_pte(pte_t *pteptr, pte_t pteval)
 	if(pte_present(*pteptr)) *pteptr = pte_mknewprot(*pteptr);
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *pteptr, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	set_pte(pteptr, pteval);
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+	}
 }
 
+#define set_pte_at(mm, addr, ptep, pte)	set_ptes(mm, addr, ptep, pte, 1)
+
 #define __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
@@ -290,6 +298,7 @@ struct mm_struct;
 extern pte_t *virt_to_pte(struct mm_struct *mm, unsigned long addr);
 
 #define update_mmu_cache(vma,address,ptep) do {} while (0)
+#define update_mmu_cache_range(vma, address, ptep, nr) do {} while (0)
 
 /*
  * Encode/decode swap entries and swap PTEs. Swap PTEs are all PTEs that
-- 
2.39.1


_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 26/34] x86: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (24 preceding siblings ...)
  2023-02-28 21:37   ` Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 27/34] xtensa: " Matthew Wilcox (Oracle)
                   ` (10 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle),
	linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin

Convert set_pte_at() into set_ptes() and add a noop
update_mmu_cache_range().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/include/asm/pgtable.h | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 84be3e07b112..f424371ea143 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1019,13 +1019,22 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
 	return res;
 }
 
-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pte)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+			      pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
-	set_pte(ptep, pte);
+	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
+
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte = __pte(pte_val(pte) + PAGE_SIZE);
+	}
 }
 
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 			      pmd_t *pmdp, pmd_t pmd)
 {
@@ -1291,6 +1300,10 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
 		unsigned long addr, pte_t *ptep)
 {
 }
+static inline void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
+{
+}
 static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
 		unsigned long addr, pmd_t *pmd)
 {
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 27/34] xtensa: Implement the new page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (25 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 26/34] x86: " Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 28/34] mm: Remove page_mapping_file() Matthew Wilcox (Oracle)
                   ` (9 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch
  Cc: Matthew Wilcox (Oracle), linux-kernel, Max Filippov, linux-xtensa

Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
flush_icache_pages().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: linux-xtensa@linux-xtensa.org
---
 arch/xtensa/include/asm/cacheflush.h |  9 ++-
 arch/xtensa/include/asm/pgtable.h    | 24 +++++---
 arch/xtensa/mm/cache.c               | 83 ++++++++++++++++------------
 3 files changed, 72 insertions(+), 44 deletions(-)

diff --git a/arch/xtensa/include/asm/cacheflush.h b/arch/xtensa/include/asm/cacheflush.h
index 7b4359312c25..35153f6725e4 100644
--- a/arch/xtensa/include/asm/cacheflush.h
+++ b/arch/xtensa/include/asm/cacheflush.h
@@ -119,8 +119,14 @@ void flush_cache_page(struct vm_area_struct*,
 #define flush_cache_vmap(start,end)	flush_cache_all()
 #define flush_cache_vunmap(start,end)	flush_cache_all()
 
+void flush_dcache_folio(struct folio *folio);
+#define flush_dcache_folio flush_dcache_folio
+
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
-void flush_dcache_page(struct page *);
+static inline void flush_dcache_page(struct page *page)
+{
+	flush_dcache_folio(page_folio(page));
+}
 
 void local_flush_cache_range(struct vm_area_struct *vma,
 		unsigned long start, unsigned long end);
@@ -156,6 +162,7 @@ void local_flush_cache_page(struct vm_area_struct *vma,
 
 /* This is not required, see Documentation/core-api/cachetlb.rst */
 #define	flush_icache_page(vma,page)			do { } while (0)
+#define	flush_icache_pages(vma, page, nr)		do { } while (0)
 
 #define flush_dcache_mmap_lock(mapping)			do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)		do { } while (0)
diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h
index fc7a14884c6c..293101530541 100644
--- a/arch/xtensa/include/asm/pgtable.h
+++ b/arch/xtensa/include/asm/pgtable.h
@@ -301,17 +301,25 @@ static inline void update_pte(pte_t *ptep, pte_t pteval)
 
 struct mm_struct;
 
-static inline void
-set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pteval)
+static inline void set_pte(pte_t *ptep, pte_t pte)
 {
-	update_pte(ptep, pteval);
+	update_pte(ptep, pte);
 }
 
-static inline void set_pte(pte_t *ptep, pte_t pteval)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
 {
-	update_pte(ptep, pteval);
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte_val(pte) += PAGE_SIZE;
+	}
 }
 
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
 static inline void
 set_pmd(pmd_t *pmdp, pmd_t pmdval)
 {
@@ -407,8 +415,10 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 
 #else
 
-extern  void update_mmu_cache(struct vm_area_struct * vma,
-			      unsigned long address, pte_t *ptep);
+void update_mmu_cache_range(struct vm_area_struct *vma,
+		unsigned long address, pte_t *ptep, unsigned int nr);
+#define update_mmu_cache(vma, address, ptep) \
+	update_mmu_cache_range(vma, address, ptep, 1)
 
 typedef pte_t *pte_addr_t;
 
diff --git a/arch/xtensa/mm/cache.c b/arch/xtensa/mm/cache.c
index 19e5a478a7e8..65c0d5298041 100644
--- a/arch/xtensa/mm/cache.c
+++ b/arch/xtensa/mm/cache.c
@@ -121,9 +121,9 @@ EXPORT_SYMBOL(copy_user_highpage);
  *
  */
 
-void flush_dcache_page(struct page *page)
+void flush_dcache_folio(struct folio *folio)
 {
-	struct address_space *mapping = page_mapping_file(page);
+	struct address_space *mapping = folio_flush_mapping(folio);
 
 	/*
 	 * If we have a mapping but the page is not mapped to user-space
@@ -132,14 +132,14 @@ void flush_dcache_page(struct page *page)
 	 */
 
 	if (mapping && !mapping_mapped(mapping)) {
-		if (!test_bit(PG_arch_1, &page->flags))
-			set_bit(PG_arch_1, &page->flags);
+		if (!test_bit(PG_arch_1, &folio->flags))
+			set_bit(PG_arch_1, &folio->flags);
 		return;
 
 	} else {
-
-		unsigned long phys = page_to_phys(page);
-		unsigned long temp = page->index << PAGE_SHIFT;
+		unsigned long phys = folio_pfn(folio) * PAGE_SIZE;
+		unsigned long temp = folio_pos(folio);
+		unsigned int i, nr = folio_nr_pages(folio);
 		unsigned long alias = !(DCACHE_ALIAS_EQ(temp, phys));
 		unsigned long virt;
 
@@ -154,22 +154,26 @@ void flush_dcache_page(struct page *page)
 			return;
 
 		preempt_disable();
-		virt = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
-		__flush_invalidate_dcache_page_alias(virt, phys);
+		for (i = 0; i < nr; i++) {
+			virt = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
+			__flush_invalidate_dcache_page_alias(virt, phys);
 
-		virt = TLBTEMP_BASE_1 + (temp & DCACHE_ALIAS_MASK);
+			virt = TLBTEMP_BASE_1 + (temp & DCACHE_ALIAS_MASK);
 
-		if (alias)
-			__flush_invalidate_dcache_page_alias(virt, phys);
+			if (alias)
+				__flush_invalidate_dcache_page_alias(virt, phys);
 
-		if (mapping)
-			__invalidate_icache_page_alias(virt, phys);
+			if (mapping)
+				__invalidate_icache_page_alias(virt, phys);
+			phys += PAGE_SIZE;
+			temp += PAGE_SIZE;
+		}
 		preempt_enable();
 	}
 
 	/* There shouldn't be an entry in the cache for this page anymore. */
 }
-EXPORT_SYMBOL(flush_dcache_page);
+EXPORT_SYMBOL(flush_dcache_folio);
 
 /*
  * For now, flush the whole cache. FIXME??
@@ -207,45 +211,52 @@ EXPORT_SYMBOL(local_flush_cache_page);
 
 #endif /* DCACHE_WAY_SIZE > PAGE_SIZE */
 
-void
-update_mmu_cache(struct vm_area_struct * vma, unsigned long addr, pte_t *ptep)
+void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, unsigned int nr)
 {
 	unsigned long pfn = pte_pfn(*ptep);
-	struct page *page;
+	struct folio *folio;
+	unsigned int i;
 
 	if (!pfn_valid(pfn))
 		return;
 
-	page = pfn_to_page(pfn);
+	folio = page_folio(pfn_to_page(pfn));
 
-	/* Invalidate old entry in TLBs */
-
-	flush_tlb_page(vma, addr);
+	/* Invalidate old entries in TLBs */
+	for (i = 0; i < nr; i++)
+		flush_tlb_page(vma, addr + i * PAGE_SIZE);
+	nr = folio_nr_pages(folio);
 
 #if (DCACHE_WAY_SIZE > PAGE_SIZE)
 
-	if (!PageReserved(page) && test_bit(PG_arch_1, &page->flags)) {
-		unsigned long phys = page_to_phys(page);
+	if (!folio_test_reserved(folio) && test_bit(PG_arch_1, &folio->flags)) {
+		unsigned long phys = folio_pfn(folio) * PAGE_SIZE;
 		unsigned long tmp;
 
 		preempt_disable();
-		tmp = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
-		__flush_invalidate_dcache_page_alias(tmp, phys);
-		tmp = TLBTEMP_BASE_1 + (addr & DCACHE_ALIAS_MASK);
-		__flush_invalidate_dcache_page_alias(tmp, phys);
-		__invalidate_icache_page_alias(tmp, phys);
+		for (i = 0; i < nr; i++) {
+			tmp = TLBTEMP_BASE_1 + (phys & DCACHE_ALIAS_MASK);
+			__flush_invalidate_dcache_page_alias(tmp, phys);
+			tmp = TLBTEMP_BASE_1 + (addr & DCACHE_ALIAS_MASK);
+			__flush_invalidate_dcache_page_alias(tmp, phys);
+			__invalidate_icache_page_alias(tmp, phys);
+			phys += PAGE_SIZE;
+		}
 		preempt_enable();
 
-		clear_bit(PG_arch_1, &page->flags);
+		clear_bit(PG_arch_1, &folio->flags);
 	}
 #else
-	if (!PageReserved(page) && !test_bit(PG_arch_1, &page->flags)
+	if (!folio_test_reserved(folio) && !test_bit(PG_arch_1, &folio->flags)
 	    && (vma->vm_flags & VM_EXEC) != 0) {
-		unsigned long paddr = (unsigned long)kmap_atomic(page);
-		__flush_dcache_page(paddr);
-		__invalidate_icache_page(paddr);
-		set_bit(PG_arch_1, &page->flags);
-		kunmap_atomic((void *)paddr);
+		for (i = 0; i < nr; i++) {
+			void *paddr = kmap_local_folio(folio, i * PAGE_SIZE);
+			__flush_dcache_page((unsigned long)paddr);
+			__invalidate_icache_page((unsigned long)paddr);
+			kunmap_atomic(paddr);
+		}
+		set_bit(PG_arch_1, &folio->flags);
 	}
 #endif
 }
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 28/34] mm: Remove page_mapping_file()
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (26 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 27/34] xtensa: " Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 29/34] mm: Rationalise flush_icache_pages() and flush_icache_page() Matthew Wilcox (Oracle)
                   ` (8 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel

This function has no more users.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pagemap.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 1b1ba3d5100d..c21b3ad1068c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -394,14 +394,6 @@ static inline struct address_space *page_file_mapping(struct page *page)
 	return folio_file_mapping(page_folio(page));
 }
 
-/*
- * For file cache pages, return the address_space, otherwise return NULL
- */
-static inline struct address_space *page_mapping_file(struct page *page)
-{
-	return folio_flush_mapping(page_folio(page));
-}
-
 /**
  * folio_inode - Get the host inode for this folio.
  * @folio: The folio.
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 29/34] mm: Rationalise flush_icache_pages() and flush_icache_page()
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (27 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 28/34] mm: Remove page_mapping_file() Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-05  9:53   ` Geert Uytterhoeven
  2023-02-28 21:37 ` [PATCH v3 30/34] mm: Use flush_icache_pages() in do_set_pmd() Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel

Move the default (no-op) implementation of flush_icache_pages()
to <linux/cacheflush.h> from <asm-generic/cacheflush.h>.
Remove the flush_icache_page() wrapper from each architecture
into <linux/cacheflush.h>.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/alpha/include/asm/cacheflush.h     |  5 +----
 arch/arc/include/asm/cacheflush.h       |  9 ---------
 arch/arm/include/asm/cacheflush.h       |  7 -------
 arch/csky/abiv1/inc/abi/cacheflush.h    |  1 -
 arch/csky/abiv2/inc/abi/cacheflush.h    |  1 -
 arch/hexagon/include/asm/cacheflush.h   |  2 +-
 arch/loongarch/include/asm/cacheflush.h |  2 --
 arch/m68k/include/asm/cacheflush_mm.h   |  1 -
 arch/mips/include/asm/cacheflush.h      |  6 ------
 arch/nios2/include/asm/cacheflush.h     |  2 +-
 arch/nios2/mm/cacheflush.c              |  1 +
 arch/parisc/include/asm/cacheflush.h    |  2 +-
 arch/sh/include/asm/cacheflush.h        |  2 +-
 arch/sparc/include/asm/cacheflush_32.h  |  2 --
 arch/sparc/include/asm/cacheflush_64.h  |  3 ---
 arch/xtensa/include/asm/cacheflush.h    |  4 ----
 include/asm-generic/cacheflush.h        | 12 ------------
 include/linux/cacheflush.h              |  9 +++++++++
 18 files changed, 15 insertions(+), 56 deletions(-)

diff --git a/arch/alpha/include/asm/cacheflush.h b/arch/alpha/include/asm/cacheflush.h
index 3956460e69e2..36a7e924c3b9 100644
--- a/arch/alpha/include/asm/cacheflush.h
+++ b/arch/alpha/include/asm/cacheflush.h
@@ -53,10 +53,6 @@ extern void flush_icache_user_page(struct vm_area_struct *vma,
 #define flush_icache_user_page flush_icache_user_page
 #endif /* CONFIG_SMP */
 
-/* This is used only in __do_fault and do_swap_page.  */
-#define flush_icache_page(vma, page) \
-	flush_icache_user_page((vma), (page), 0, 0)
-
 /*
  * Both implementations of flush_icache_user_page flush the entire
  * address space, so one call, no matter how many pages.
@@ -66,6 +62,7 @@ static inline void flush_icache_pages(struct vm_area_struct *vma,
 {
 	flush_icache_user_page(vma, page, 0, 0);
 }
+#define flush_icache_pages flush_icache_pages
 
 #include <asm-generic/cacheflush.h>
 
diff --git a/arch/arc/include/asm/cacheflush.h b/arch/arc/include/asm/cacheflush.h
index 04f65f588510..bd5b1a9a0544 100644
--- a/arch/arc/include/asm/cacheflush.h
+++ b/arch/arc/include/asm/cacheflush.h
@@ -18,15 +18,6 @@
 #include <linux/mm.h>
 #include <asm/shmparam.h>
 
-/*
- * Semantically we need this because icache doesn't snoop dcache/dma.
- * However ARC Cache flush requires paddr as well as vaddr, latter not available
- * in the flush_icache_page() API. So we no-op it but do the equivalent work
- * in update_mmu_cache()
- */
-#define flush_icache_page(vma, page)
-#define flush_icache_pages(vma, page, nr)
-
 void flush_cache_all(void);
 
 void flush_icache_range(unsigned long kstart, unsigned long kend);
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 841e268d2374..f6181f69577f 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -321,13 +321,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
 #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
 
-/*
- * We don't appear to need to do anything here.  In fact, if we did, we'd
- * duplicate cache flushing elsewhere performed by flush_dcache_page().
- */
-#define flush_icache_page(vma,page)	do { } while (0)
-#define flush_icache_pages(vma, page, nr)	do { } while (0)
-
 /*
  * flush_cache_vmap() is used when creating mappings (eg, via vmap,
  * vmalloc, ioremap etc) in kernel space for pages.  On non-VIPT
diff --git a/arch/csky/abiv1/inc/abi/cacheflush.h b/arch/csky/abiv1/inc/abi/cacheflush.h
index 0d6cb65624c4..908d8b0bc4fd 100644
--- a/arch/csky/abiv1/inc/abi/cacheflush.h
+++ b/arch/csky/abiv1/inc/abi/cacheflush.h
@@ -45,7 +45,6 @@ extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start, u
 #define flush_cache_vmap(start, end)		cache_wbinv_all()
 #define flush_cache_vunmap(start, end)		cache_wbinv_all()
 
-#define flush_icache_page(vma, page)		do {} while (0);
 #define flush_icache_range(start, end)		cache_wbinv_range(start, end)
 #define flush_icache_mm_range(mm, start, end)	cache_wbinv_range(start, end)
 #define flush_icache_deferred(mm)		do {} while (0);
diff --git a/arch/csky/abiv2/inc/abi/cacheflush.h b/arch/csky/abiv2/inc/abi/cacheflush.h
index 9c728933a776..40be16907267 100644
--- a/arch/csky/abiv2/inc/abi/cacheflush.h
+++ b/arch/csky/abiv2/inc/abi/cacheflush.h
@@ -33,7 +33,6 @@ static inline void flush_dcache_page(struct page *page)
 
 #define flush_dcache_mmap_lock(mapping)		do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)	do { } while (0)
-#define flush_icache_page(vma, page)		do { } while (0)
 
 #define flush_icache_range(start, end)		cache_wbinv_range(start, end)
 
diff --git a/arch/hexagon/include/asm/cacheflush.h b/arch/hexagon/include/asm/cacheflush.h
index 63ca314ede89..bdacf72d97e1 100644
--- a/arch/hexagon/include/asm/cacheflush.h
+++ b/arch/hexagon/include/asm/cacheflush.h
@@ -18,7 +18,7 @@
  *  - flush_cache_range(vma, start, end) flushes a range of pages
  *  - flush_icache_range(start, end) flush a range of instructions
  *  - flush_dcache_page(pg) flushes(wback&invalidates) a page for dcache
- *  - flush_icache_page(vma, pg) flushes(invalidates) a page for icache
+ *  - flush_icache_pages(vma, pg, nr) flushes(invalidates) nr pages for icache
  *
  *  Need to doublecheck which one is really needed for ptrace stuff to work.
  */
diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
index 7907eb42bfbd..326ac6f1b27c 100644
--- a/arch/loongarch/include/asm/cacheflush.h
+++ b/arch/loongarch/include/asm/cacheflush.h
@@ -46,8 +46,6 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
 #define flush_cache_page(vma, vmaddr, pfn)		do { } while (0)
 #define flush_cache_vmap(start, end)			do { } while (0)
 #define flush_cache_vunmap(start, end)			do { } while (0)
-#define flush_icache_page(vma, page)			do { } while (0)
-#define flush_icache_pages(vma, page)			do { } while (0)
 #define flush_icache_user_page(vma, page, addr, len)	do { } while (0)
 #define flush_dcache_page(page)				do { } while (0)
 #define flush_dcache_folio(folio)			do { } while (0)
diff --git a/arch/m68k/include/asm/cacheflush_mm.h b/arch/m68k/include/asm/cacheflush_mm.h
index d43c8bce149b..c67a8d2e6d6a 100644
--- a/arch/m68k/include/asm/cacheflush_mm.h
+++ b/arch/m68k/include/asm/cacheflush_mm.h
@@ -260,7 +260,6 @@ static inline void __flush_pages_to_ram(void *vaddr, unsigned int nr)
 #define flush_dcache_mmap_unlock(mapping)	do { } while (0)
 #define flush_icache_pages(vma, page, nr)	\
 	__flush_pages_to_ram(page_address(page), nr)
-#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
 
 extern void flush_icache_user_page(struct vm_area_struct *vma, struct page *page,
 				    unsigned long addr, int len);
diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
index 2683cade42ef..043e50effc62 100644
--- a/arch/mips/include/asm/cacheflush.h
+++ b/arch/mips/include/asm/cacheflush.h
@@ -82,12 +82,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
 		__flush_anon_page(page, vmaddr);
 }
 
-static inline void flush_icache_pages(struct vm_area_struct *vma,
-		struct page *page, unsigned int nr)
-{
-}
-#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
-
 extern void (*flush_icache_range)(unsigned long start, unsigned long end);
 extern void (*local_flush_icache_range)(unsigned long start, unsigned long end);
 extern void (*__flush_icache_user_range)(unsigned long start,
diff --git a/arch/nios2/include/asm/cacheflush.h b/arch/nios2/include/asm/cacheflush.h
index 8624ca83cffe..7c48c5213fb7 100644
--- a/arch/nios2/include/asm/cacheflush.h
+++ b/arch/nios2/include/asm/cacheflush.h
@@ -35,7 +35,7 @@ void flush_dcache_folio(struct folio *folio);
 extern void flush_icache_range(unsigned long start, unsigned long end);
 void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
 		unsigned int nr);
-#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1);
+#define flush_icache_pages flush_icache_pages
 
 #define flush_cache_vmap(start, end)		flush_dcache_range(start, end)
 #define flush_cache_vunmap(start, end)		flush_dcache_range(start, end)
diff --git a/arch/nios2/mm/cacheflush.c b/arch/nios2/mm/cacheflush.c
index 471485a84b2c..2565767b98a3 100644
--- a/arch/nios2/mm/cacheflush.c
+++ b/arch/nios2/mm/cacheflush.c
@@ -147,6 +147,7 @@ void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
 	__flush_dcache(start, end);
 	__flush_icache(start, end);
 }
+#define flush_icache_pages flush_icache_pages
 
 void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
 			unsigned long pfn)
diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 0bf8b69d086b..e4fdce328dbd 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -59,7 +59,7 @@ static inline void flush_dcache_page(struct page *page)
 
 void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
 		unsigned int nr);
-#define flush_icache_page(vma, page)	flush_icache_pages(vma, page, 1)
+#define flush_icache_pages flush_icache_pages
 
 #define flush_icache_range(s,e)		do { 		\
 	flush_kernel_dcache_range_asm(s,e); 		\
diff --git a/arch/sh/include/asm/cacheflush.h b/arch/sh/include/asm/cacheflush.h
index 9fceef6f3e00..878b6b551bd2 100644
--- a/arch/sh/include/asm/cacheflush.h
+++ b/arch/sh/include/asm/cacheflush.h
@@ -53,7 +53,7 @@ extern void flush_icache_range(unsigned long start, unsigned long end);
 #define flush_icache_user_range flush_icache_range
 void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
 		unsigned int nr);
-#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
+#define flush_icache_pages flush_icache_pages
 extern void flush_cache_sigtramp(unsigned long address);
 
 struct flusher_data {
diff --git a/arch/sparc/include/asm/cacheflush_32.h b/arch/sparc/include/asm/cacheflush_32.h
index 8dba35d63328..21f6c918238b 100644
--- a/arch/sparc/include/asm/cacheflush_32.h
+++ b/arch/sparc/include/asm/cacheflush_32.h
@@ -15,8 +15,6 @@
 #define flush_cache_page(vma,addr,pfn) \
 	sparc32_cachetlb_ops->cache_page(vma, addr)
 #define flush_icache_range(start, end)		do { } while (0)
-#define flush_icache_page(vma, pg)		do { } while (0)
-#define flush_icache_pages(vma, pg, nr)		do { } while (0)
 
 #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
 	do {							\
diff --git a/arch/sparc/include/asm/cacheflush_64.h b/arch/sparc/include/asm/cacheflush_64.h
index a9a719f04d06..0e879004efff 100644
--- a/arch/sparc/include/asm/cacheflush_64.h
+++ b/arch/sparc/include/asm/cacheflush_64.h
@@ -53,9 +53,6 @@ static inline void flush_dcache_page(struct page *page)
 	flush_dcache_folio(page_folio(page));
 }
 
-#define flush_icache_page(vma, pg)	do { } while(0)
-#define flush_icache_pages(vma, pg, nr)	do { } while(0)
-
 void flush_ptrace_access(struct vm_area_struct *, struct page *,
 			 unsigned long uaddr, void *kaddr,
 			 unsigned long len, int write);
diff --git a/arch/xtensa/include/asm/cacheflush.h b/arch/xtensa/include/asm/cacheflush.h
index 35153f6725e4..785a00ce83c1 100644
--- a/arch/xtensa/include/asm/cacheflush.h
+++ b/arch/xtensa/include/asm/cacheflush.h
@@ -160,10 +160,6 @@ void local_flush_cache_page(struct vm_area_struct *vma,
 		__invalidate_icache_range(start,(end) - (start));	\
 	} while (0)
 
-/* This is not required, see Documentation/core-api/cachetlb.rst */
-#define	flush_icache_page(vma,page)			do { } while (0)
-#define	flush_icache_pages(vma, page, nr)		do { } while (0)
-
 #define flush_dcache_mmap_lock(mapping)			do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)		do { } while (0)
 
diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 09d51a680765..84ec53ccc450 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -77,18 +77,6 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
 #define flush_icache_user_range flush_icache_range
 #endif
 
-#ifndef flush_icache_page
-static inline void flush_icache_pages(struct vm_area_struct *vma,
-				     struct page *page, unsigned int nr)
-{
-}
-
-static inline void flush_icache_page(struct vm_area_struct *vma,
-				     struct page *page)
-{
-}
-#endif
-
 #ifndef flush_icache_user_page
 static inline void flush_icache_user_page(struct vm_area_struct *vma,
 					   struct page *page,
diff --git a/include/linux/cacheflush.h b/include/linux/cacheflush.h
index 82136f3fcf54..55f297b2c23f 100644
--- a/include/linux/cacheflush.h
+++ b/include/linux/cacheflush.h
@@ -17,4 +17,13 @@ static inline void flush_dcache_folio(struct folio *folio)
 #define flush_dcache_folio flush_dcache_folio
 #endif /* ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE */
 
+#ifndef flush_icache_pages
+static inline void flush_icache_pages(struct vm_area_struct *vma,
+				     struct page *page, unsigned int nr)
+{
+}
+#endif
+
+#define flush_icache_page(vma, page)	flush_icache_pages(vma, page, 1)
+
 #endif /* _LINUX_CACHEFLUSH_H */
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 30/34] mm: Use flush_icache_pages() in do_set_pmd()
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (28 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 29/34] mm: Rationalise flush_icache_pages() and flush_icache_page() Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-03 14:02   ` Mike Rapoport
  2023-02-28 21:37 ` [PATCH v3 31/34] filemap: Add filemap_map_folio_range() Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Matthew Wilcox (Oracle), linux-kernel

Push the iteration over each page down to the architectures (many
can flush the entire THP without iteration).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index bfa3100ec5a3..69e844d5f75c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4222,8 +4222,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	if (unlikely(!pmd_none(*vmf->pmd)))
 		goto out;
 
-	for (i = 0; i < HPAGE_PMD_NR; i++)
-		flush_icache_page(vma, page + i);
+	flush_icache_pages(vma, page, HPAGE_PMD_NR);
 
 	entry = mk_huge_pmd(page, vma->vm_page_prot);
 	if (write)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 31/34] filemap: Add filemap_map_folio_range()
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (29 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 30/34] mm: Use flush_icache_pages() in do_set_pmd() Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 32/34] rmap: add folio_add_file_rmap_range() Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Yin Fengwei, linux-kernel, Matthew Wilcox

From: Yin Fengwei <fengwei.yin@intel.com>

filemap_map_folio_range() maps partial/full folio. Comparing to original
filemap_map_pages(), it updates refcount once per folio instead of per
page and gets minor performance improvement for large folio.

With a will-it-scale.page_fault3 like app (change file write
fault testing to read fault testing. Trying to upstream it to
will-it-scale at [1]), got 2% performance gain on a 48C/96T
Cascade Lake test box with 96 processes running against xfs.

[1]: https://github.com/antonblanchard/will-it-scale/pull/37

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 98 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 54 insertions(+), 44 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 2723104cc06a..db86e459dde6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2202,16 +2202,6 @@ unsigned filemap_get_folios(struct address_space *mapping, pgoff_t *start,
 }
 EXPORT_SYMBOL(filemap_get_folios);
 
-static inline
-bool folio_more_pages(struct folio *folio, pgoff_t index, pgoff_t max)
-{
-	if (!folio_test_large(folio) || folio_test_hugetlb(folio))
-		return false;
-	if (index >= max)
-		return false;
-	return index < folio->index + folio_nr_pages(folio) - 1;
-}
-
 /**
  * filemap_get_folios_contig - Get a batch of contiguous folios
  * @mapping:	The address_space to search
@@ -3483,6 +3473,53 @@ static inline struct folio *next_map_page(struct address_space *mapping,
 				  mapping, xas, end_pgoff);
 }
 
+/*
+ * Map page range [start_page, start_page + nr_pages) of folio.
+ * start_page is gotten from start by folio_page(folio, start)
+ */
+static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
+			struct folio *folio, unsigned long start,
+			unsigned long addr, unsigned int nr_pages)
+{
+	vm_fault_t ret = 0;
+	struct vm_area_struct *vma = vmf->vma;
+	struct file *file = vma->vm_file;
+	struct page *page = folio_page(folio, start);
+	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
+	unsigned int ref_count = 0, count = 0;
+
+	do {
+		if (PageHWPoison(page))
+			continue;
+
+		if (mmap_miss > 0)
+			mmap_miss--;
+
+		/*
+		 * NOTE: If there're PTE markers, we'll leave them to be
+		 * handled in the specific fault path, and it'll prohibit the
+		 * fault-around logic.
+		 */
+		if (!pte_none(*vmf->pte))
+			continue;
+
+		if (vmf->address == addr)
+			ret = VM_FAULT_NOPAGE;
+
+		ref_count++;
+		do_set_pte(vmf, page, addr);
+		update_mmu_cache(vma, addr, vmf->pte);
+	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
+
+	/* Restore the vmf->pte */
+	vmf->pte -= nr_pages;
+
+	folio_ref_add(folio, ref_count);
+	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
+
+	return ret;
+}
+
 vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 			     pgoff_t start_pgoff, pgoff_t end_pgoff)
 {
@@ -3493,9 +3530,9 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	unsigned long addr;
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
 	struct folio *folio;
-	struct page *page;
 	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
 	vm_fault_t ret = 0;
+	int nr_pages = 0;
 
 	rcu_read_lock();
 	folio = first_map_page(mapping, &xas, end_pgoff);
@@ -3510,45 +3547,18 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
 	do {
-again:
-		page = folio_file_page(folio, xas.xa_index);
-		if (PageHWPoison(page))
-			goto unlock;
-
-		if (mmap_miss > 0)
-			mmap_miss--;
+		unsigned long end;
 
 		addr += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
 		vmf->pte += xas.xa_index - last_pgoff;
 		last_pgoff = xas.xa_index;
+		end = folio->index + folio_nr_pages(folio) - 1;
+		nr_pages = min(end, end_pgoff) - xas.xa_index + 1;
 
-		/*
-		 * NOTE: If there're PTE markers, we'll leave them to be
-		 * handled in the specific fault path, and it'll prohibit the
-		 * fault-around logic.
-		 */
-		if (!pte_none(*vmf->pte))
-			goto unlock;
+		ret |= filemap_map_folio_range(vmf, folio,
+				xas.xa_index - folio->index, addr, nr_pages);
+		xas.xa_index += nr_pages;
 
-		/* We're about to handle the fault */
-		if (vmf->address == addr)
-			ret = VM_FAULT_NOPAGE;
-
-		do_set_pte(vmf, page, addr);
-		/* no need to invalidate: a not-present page won't be cached */
-		update_mmu_cache(vma, addr, vmf->pte);
-		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
-			xas.xa_index++;
-			folio_ref_inc(folio);
-			goto again;
-		}
-		folio_unlock(folio);
-		continue;
-unlock:
-		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
-			xas.xa_index++;
-			goto again;
-		}
 		folio_unlock(folio);
 		folio_put(folio);
 	} while ((folio = next_map_page(mapping, &xas, end_pgoff)) != NULL);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 32/34] rmap: add folio_add_file_rmap_range()
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (30 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 31/34] filemap: Add filemap_map_folio_range() Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-01  3:04   ` Yin, Fengwei
  2023-02-28 21:37 ` [PATCH v3 33/34] mm: Convert do_set_pte() to set_pte_range() Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  36 siblings, 1 reply; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Yin Fengwei, linux-kernel, Matthew Wilcox

From: Yin Fengwei <fengwei.yin@intel.com>

folio_add_file_rmap_range() allows to add pte mapping to a specific
range of file folio. Comparing to page_add_file_rmap(), it batched
updates __lruvec_stat for large folio.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/rmap.h |  2 ++
 mm/rmap.c            | 60 +++++++++++++++++++++++++++++++++-----------
 2 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b87d01660412..a3825ce81102 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
 		unsigned long address);
 void page_add_file_rmap(struct page *, struct vm_area_struct *,
 		bool compound);
+void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
+		struct vm_area_struct *, bool compound);
 void page_remove_rmap(struct page *, struct vm_area_struct *,
 		bool compound);
 
diff --git a/mm/rmap.c b/mm/rmap.c
index bacdb795d5ee..fffdb85a3b3d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1303,31 +1303,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 }
 
 /**
- * page_add_file_rmap - add pte mapping to a file page
- * @page:	the page to add the mapping to
+ * folio_add_file_rmap_range - add pte mapping to page range of a folio
+ * @folio:	The folio to add the mapping to
+ * @page:	The first page to add
+ * @nr_pages:	The number of pages which will be mapped
  * @vma:	the vm area in which the mapping is added
  * @compound:	charge the page as compound or small page
  *
+ * The page range of folio is defined by [first_page, first_page + nr_pages)
+ *
  * The caller needs to hold the pte lock.
  */
-void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
-		bool compound)
+void folio_add_file_rmap_range(struct folio *folio, struct page *page,
+			unsigned int nr_pages, struct vm_area_struct *vma,
+			bool compound)
 {
-	struct folio *folio = page_folio(page);
 	atomic_t *mapped = &folio->_nr_pages_mapped;
-	int nr = 0, nr_pmdmapped = 0;
-	bool first;
+	unsigned int nr_pmdmapped = 0, first;
+	int nr = 0;
 
-	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
+	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
 
 	/* Is page being mapped by PTE? Is this its first map to be added? */
 	if (likely(!compound)) {
-		first = atomic_inc_and_test(&page->_mapcount);
-		nr = first;
-		if (first && folio_test_large(folio)) {
-			nr = atomic_inc_return_relaxed(mapped);
-			nr = (nr < COMPOUND_MAPPED);
-		}
+		do {
+			first = atomic_inc_and_test(&page->_mapcount);
+			if (first && folio_test_large(folio)) {
+				first = atomic_inc_return_relaxed(mapped);
+				first = (nr < COMPOUND_MAPPED);
+			}
+
+			if (first)
+				nr++;
+		} while (page++, --nr_pages > 0);
 	} else if (folio_test_pmd_mappable(folio)) {
 		/* That test is redundant: it's for safety or to optimize out */
 
@@ -1356,6 +1364,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
 	mlock_vma_folio(folio, vma, compound);
 }
 
+/**
+ * page_add_file_rmap - add pte mapping to a file page
+ * @page:	the page to add the mapping to
+ * @vma:	the vm area in which the mapping is added
+ * @compound:	charge the page as compound or small page
+ *
+ * The caller needs to hold the pte lock.
+ */
+void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
+		bool compound)
+{
+	struct folio *folio = page_folio(page);
+	unsigned int nr_pages;
+
+	VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
+
+	if (likely(!compound))
+		nr_pages = 1;
+	else
+		nr_pages = folio_nr_pages(folio);
+
+	folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
+}
+
 /**
  * page_remove_rmap - take down pte mapping from a page
  * @page:	page to remove mapping from
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 33/34] mm: Convert do_set_pte() to set_pte_range()
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (31 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 32/34] rmap: add folio_add_file_rmap_range() Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-02-28 21:37 ` [PATCH v3 34/34] filemap: Batch PTE mappings Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Yin Fengwei, linux-kernel, Matthew Wilcox

From: Yin Fengwei <fengwei.yin@intel.com>

set_pte_range() allows to setup page table entries for a specific
range.  It takes advantage of batched rmap update for large folio.
It now takes care of calling update_mmu_cache_range().

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/filesystems/locking.rst |  2 +-
 include/linux/mm.h                    |  3 ++-
 mm/filemap.c                          |  3 +--
 mm/memory.c                           | 27 +++++++++++++++------------
 4 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 7de7a7272a5e..922886fefb7f 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -663,7 +663,7 @@ locked. The VM will unlock the page.
 Filesystem should find and map pages associated with offsets from "start_pgoff"
 till "end_pgoff". ->map_pages() is called with page table locked and must
 not block.  If it's not possible to reach a page without blocking,
-filesystem should skip it. Filesystem should use do_set_pte() to setup
+filesystem should skip it. Filesystem should use set_pte_range() to setup
 page table entry. Pointer to entry associated with the page is passed in
 "pte" field in vm_fault structure. Pointers to entries for other offsets
 should be calculated relative to "pte".
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1f79667824eb..568ebe7058d4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1168,7 +1168,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 }
 
 vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
-void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
+void set_pte_range(struct vm_fault *vmf, struct folio *folio,
+		struct page *page, unsigned int nr, unsigned long addr);
 
 vm_fault_t finish_fault(struct vm_fault *vmf);
 vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
diff --git a/mm/filemap.c b/mm/filemap.c
index db86e459dde6..07ebd90967a3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3507,8 +3507,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 			ret = VM_FAULT_NOPAGE;
 
 		ref_count++;
-		do_set_pte(vmf, page, addr);
-		update_mmu_cache(vma, addr, vmf->pte);
+		set_pte_range(vmf, folio, page, 1, addr);
 	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
 
 	/* Restore the vmf->pte */
diff --git a/mm/memory.c b/mm/memory.c
index 69e844d5f75c..efd17ff09315 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4255,7 +4255,8 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 }
 #endif
 
-void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
+void set_pte_range(struct vm_fault *vmf, struct folio *folio,
+		struct page *page, unsigned int nr, unsigned long addr)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	bool uffd_wp = pte_marker_uffd_wp(vmf->orig_pte);
@@ -4263,7 +4264,7 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 	bool prefault = vmf->address != addr;
 	pte_t entry;
 
-	flush_icache_page(vma, page);
+	flush_icache_pages(vma, page, nr);
 	entry = mk_pte(page, vma->vm_page_prot);
 
 	if (prefault && arch_wants_old_prefaulted_pte())
@@ -4277,14 +4278,18 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 		entry = pte_mkuffd_wp(entry);
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
-		inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
-		page_add_new_anon_rmap(page, vma, addr);
-		lru_cache_add_inactive_or_unevictable(page, vma);
+		add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr);
+		VM_BUG_ON_FOLIO(nr != 1, folio);
+		folio_add_new_anon_rmap(folio, vma, addr);
+		folio_add_lru_vma(folio, vma);
 	} else {
-		inc_mm_counter(vma->vm_mm, mm_counter_file(page));
-		page_add_file_rmap(page, vma, false);
+		add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
+		folio_add_file_rmap_range(folio, page, nr, vma, false);
 	}
-	set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
+	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
+
+	/* no need to invalidate: a not-present page won't be cached */
+	update_mmu_cache_range(vma, addr, vmf->pte, nr);
 }
 
 static bool vmf_pte_changed(struct vm_fault *vmf)
@@ -4357,11 +4362,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 
 	/* Re-check under ptl */
 	if (likely(!vmf_pte_changed(vmf))) {
-		do_set_pte(vmf, page, vmf->address);
-
-		/* no need to invalidate: a not-present page won't be cached */
-		update_mmu_cache(vma, vmf->address, vmf->pte);
+		struct folio *folio = page_folio(page);
 
+		set_pte_range(vmf, folio, page, 1, vmf->address);
 		ret = 0;
 	} else {
 		update_mmu_tlb(vma, vmf->address, vmf->pte);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v3 34/34] filemap: Batch PTE mappings
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (32 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 33/34] mm: Convert do_set_pte() to set_pte_range() Matthew Wilcox (Oracle)
@ 2023-02-28 21:37 ` Matthew Wilcox (Oracle)
  2023-03-03 14:19 ` [PATCH v3 00/34] New page table range API Mike Rapoport
                   ` (2 subsequent siblings)
  36 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-28 21:37 UTC (permalink / raw)
  To: linux-mm, linux-arch; +Cc: Yin Fengwei, linux-kernel, Matthew Wilcox

From: Yin Fengwei <fengwei.yin@intel.com>

Call set_pte_range() once per contiguous range of the folio instead
of once per page.  This batches the updates to mm counters and the
rmap.

With a will-it-scale.page_fault3 like app (change file write
fault testing to read fault testing. Trying to upstream it to
will-it-scale at [1]) got 15% performance gain on a 48C/96T
Cascade Lake test box with 96 processes running against xfs.

Perf data collected before/after the change:
  18.73%--page_add_file_rmap
          |
           --11.60%--__mod_lruvec_page_state
                     |
                     |--7.40%--__mod_memcg_lruvec_state
                     |          |
                     |           --5.58%--cgroup_rstat_updated
                     |
                      --2.53%--__mod_lruvec_state
                                |
                                 --1.48%--__mod_node_page_state

  9.93%--page_add_file_rmap_range
         |
          --2.67%--__mod_lruvec_page_state
                    |
                    |--1.95%--__mod_memcg_lruvec_state
                    |          |
                    |           --1.57%--cgroup_rstat_updated
                    |
                     --0.61%--__mod_lruvec_state
                               |
                                --0.54%--__mod_node_page_state

The running time of __mode_lruvec_page_state() is reduced about 9%.

[1]: https://github.com/antonblanchard/will-it-scale/pull/37

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 07ebd90967a3..40be33b5ee46 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3486,11 +3486,12 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 	struct file *file = vma->vm_file;
 	struct page *page = folio_page(folio, start);
 	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
-	unsigned int ref_count = 0, count = 0;
+	unsigned int count = 0;
+	pte_t *old_ptep = vmf->pte;
 
 	do {
-		if (PageHWPoison(page))
-			continue;
+		if (PageHWPoison(page + count))
+			goto skip;
 
 		if (mmap_miss > 0)
 			mmap_miss--;
@@ -3500,20 +3501,33 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 		 * handled in the specific fault path, and it'll prohibit the
 		 * fault-around logic.
 		 */
-		if (!pte_none(*vmf->pte))
-			continue;
+		if (!pte_none(vmf->pte[count]))
+			goto skip;
 
 		if (vmf->address == addr)
 			ret = VM_FAULT_NOPAGE;
 
-		ref_count++;
-		set_pte_range(vmf, folio, page, 1, addr);
-	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
+		count++;
+		continue;
+skip:
+		if (count) {
+			set_pte_range(vmf, folio, page, count, addr);
+			folio_ref_add(folio, count);
+		}
 
-	/* Restore the vmf->pte */
-	vmf->pte -= nr_pages;
+		count++;
+		page += count;
+		vmf->pte += count;
+		addr += count * PAGE_SIZE;
+		count = 0;
+	} while (--nr_pages > 0);
+
+	if (count) {
+		set_pte_range(vmf, folio, page, count, addr);
+		folio_ref_add(folio, count);
+	}
 
-	folio_ref_add(folio, ref_count);
+	vmf->pte = old_ptep;
 	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
 
 	return ret;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 12/34] loongarch: Implement the new page table range API
  2023-02-28 21:37 ` [PATCH v3 12/34] loongarch: " Matthew Wilcox (Oracle)
@ 2023-03-01  2:04   ` WANG Xuerui
  0 siblings, 0 replies; 77+ messages in thread
From: WANG Xuerui @ 2023-03-01  2:04 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm, linux-arch
  Cc: linux-kernel, Huacai Chen, loongarch

Hi,

On 3/1/23 05:37, Matthew Wilcox (Oracle) wrote:
> Add set_ptes() and update_mmu_cache_range().  It would probably be
> more efficient to implement __update_tlb() by flushing the entire
> folio instead of calling it __update_tlb() N times, but I'll leave
> that for someone who understands the architecture better.
Thanks for the patch! Unfortunately it doesn't seem possible 
architecture-wise to batch-flush pages right now on loongarch, but AFAIK 
the vendor *could* be listening so a future model could have the feature 
supported... who knows!
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Huacai Chen <chenhuacai@kernel.org>
> Cc: WANG Xuerui <kernel@xen0n.name>
> Cc: loongarch@lists.linux.dev
> ---
>   arch/loongarch/include/asm/cacheflush.h |  2 ++
>   arch/loongarch/include/asm/pgtable.h    | 30 +++++++++++++++++++------
>   2 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
> index 0681788eb474..7907eb42bfbd 100644
> --- a/arch/loongarch/include/asm/cacheflush.h
> +++ b/arch/loongarch/include/asm/cacheflush.h
> @@ -47,8 +47,10 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
>   #define flush_cache_vmap(start, end)			do { } while (0)
>   #define flush_cache_vunmap(start, end)			do { } while (0)
>   #define flush_icache_page(vma, page)			do { } while (0)
> +#define flush_icache_pages(vma, page)			do { } while (0)
>   #define flush_icache_user_page(vma, page, addr, len)	do { } while (0)
>   #define flush_dcache_page(page)				do { } while (0)
> +#define flush_dcache_folio(folio)			do { } while (0)
This will break the build because the surrounding code is unnecessarily 
redefining the stubs that the asm-generic include at the end of the file 
will properly take care of. With the build fixed (patch attached below), 
I can successfully boot-test and stress mm for a while on a Loongson 
3A5000. I haven't done the benchmarks though.
>   #define flush_dcache_mmap_lock(mapping)			do { } while (0)
>   #define flush_dcache_mmap_unlock(mapping)		do { } while (0)
>   
> <snip>

Cleanup patch:

-- >8 --

 From 0de3f49e6ba23c1b45e7b5da355f87398c4a7feb Mon Sep 17 00:00:00 2001
From: WANG Xuerui <git@xen0n.name>
Date: Wed, 1 Mar 2023 09:32:18 +0800
Subject: [PATCH] LoongArch: Remove stub definitions in cacheflush.h

Per the current best practice in mm, it's unnecessary to define no-op
stubs explicitly in arch code, because the asm-generic inclusion at the
end of the file will properly take care of these. And the definition of
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE to 0 is going to confuse the
asm-generic code, so remove the stubs for clarity and avoiding misuse.

Signed-off-by: WANG Xuerui <git@xen0n.name>
---
  arch/loongarch/include/asm/cacheflush.h | 15 ---------------
  1 file changed, 15 deletions(-)

diff --git a/arch/loongarch/include/asm/cacheflush.h b/arch/loongarch/include/asm/cacheflush.h
index 326ac6f1b27c..cb8c046b4f17 100644
--- a/arch/loongarch/include/asm/cacheflush.h
+++ b/arch/loongarch/include/asm/cacheflush.h
@@ -37,21 +37,6 @@ void local_flush_icache_range(unsigned long start, unsigned long end);
  #define flush_icache_range     local_flush_icache_range
  #define flush_icache_user_range        local_flush_icache_range
  
-#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0
-
-#define flush_cache_all()                              do { } while (0)
-#define flush_cache_mm(mm)                             do { } while (0)
-#define flush_cache_dup_mm(mm)                         do { } while (0)
-#define flush_cache_range(vma, start, end)             do { } while (0)
-#define flush_cache_page(vma, vmaddr, pfn)             do { } while (0)
-#define flush_cache_vmap(start, end)                   do { } while (0)
-#define flush_cache_vunmap(start, end)                 do { } while (0)
-#define flush_icache_user_page(vma, page, addr, len)   do { } while (0)
-#define flush_dcache_page(page)                                do { } while (0)
-#define flush_dcache_folio(folio)                      do { } while (0)
-#define flush_dcache_mmap_lock(mapping)                        do { } while (0)
-#define flush_dcache_mmap_unlock(mapping)              do { } while (0)
-
  #define cache_op(op, addr)                                             \
         __asm__ __volatile__(                                           \
         "       cacop   %0, %1                                  \n"     \

-- 
WANG "xen0n" Xuerui

Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 32/34] rmap: add folio_add_file_rmap_range()
  2023-02-28 21:37 ` [PATCH v3 32/34] rmap: add folio_add_file_rmap_range() Matthew Wilcox (Oracle)
@ 2023-03-01  3:04   ` Yin, Fengwei
  0 siblings, 0 replies; 77+ messages in thread
From: Yin, Fengwei @ 2023-03-01  3:04 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm, linux-arch; +Cc: linux-kernel



On 3/1/2023 5:37 AM, Matthew Wilcox (Oracle) wrote:
> From: Yin Fengwei <fengwei.yin@intel.com>
> 
> folio_add_file_rmap_range() allows to add pte mapping to a specific
> range of file folio. Comparing to page_add_file_rmap(), it batched
> updates __lruvec_stat for large folio.
> 
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/rmap.h |  2 ++
>  mm/rmap.c            | 60 +++++++++++++++++++++++++++++++++-----------
>  2 files changed, 48 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index b87d01660412..a3825ce81102 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
>  		unsigned long address);
>  void page_add_file_rmap(struct page *, struct vm_area_struct *,
>  		bool compound);
> +void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
> +		struct vm_area_struct *, bool compound);
>  void page_remove_rmap(struct page *, struct vm_area_struct *,
>  		bool compound);
>  
> diff --git a/mm/rmap.c b/mm/rmap.c
> index bacdb795d5ee..fffdb85a3b3d 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1303,31 +1303,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>  }
>  
>  /**
> - * page_add_file_rmap - add pte mapping to a file page
> - * @page:	the page to add the mapping to
> + * folio_add_file_rmap_range - add pte mapping to page range of a folio
> + * @folio:	The folio to add the mapping to
> + * @page:	The first page to add
> + * @nr_pages:	The number of pages which will be mapped
>   * @vma:	the vm area in which the mapping is added
>   * @compound:	charge the page as compound or small page
>   *
> + * The page range of folio is defined by [first_page, first_page + nr_pages)
> + *
>   * The caller needs to hold the pte lock.
>   */
> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> -		bool compound)
> +void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> +			unsigned int nr_pages, struct vm_area_struct *vma,
> +			bool compound)
>  {
> -	struct folio *folio = page_folio(page);
>  	atomic_t *mapped = &folio->_nr_pages_mapped;
> -	int nr = 0, nr_pmdmapped = 0;
> -	bool first;
> +	unsigned int nr_pmdmapped = 0, first;
> +	int nr = 0;
>  
> -	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
> +	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
>  
>  	/* Is page being mapped by PTE? Is this its first map to be added? */
>  	if (likely(!compound)) {
> -		first = atomic_inc_and_test(&page->_mapcount);
> -		nr = first;
> -		if (first && folio_test_large(folio)) {
> -			nr = atomic_inc_return_relaxed(mapped);
> -			nr = (nr < COMPOUND_MAPPED);
> -		}
> +		do {
> +			first = atomic_inc_and_test(&page->_mapcount);
> +			if (first && folio_test_large(folio)) {
> +				first = atomic_inc_return_relaxed(mapped);
> +				first = (nr < COMPOUND_MAPPED);
Sorry. Just noticed this typo which was my bad. This should be:
				first = (first < COMPOUND_MAPPED);


Regards
Yin, Fengwei

> +			}
> +
> +			if (first)
> +				nr++;
> +		} while (page++, --nr_pages > 0);
>  	} else if (folio_test_pmd_mappable(folio)) {
>  		/* That test is redundant: it's for safety or to optimize out */
>  
> @@ -1356,6 +1364,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>  	mlock_vma_folio(folio, vma, compound);
>  }
>  
> +/**
> + * page_add_file_rmap - add pte mapping to a file page
> + * @page:	the page to add the mapping to
> + * @vma:	the vm area in which the mapping is added
> + * @compound:	charge the page as compound or small page
> + *
> + * The caller needs to hold the pte lock.
> + */
> +void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> +		bool compound)
> +{
> +	struct folio *folio = page_folio(page);
> +	unsigned int nr_pages;
> +
> +	VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
> +
> +	if (likely(!compound))
> +		nr_pages = 1;
> +	else
> +		nr_pages = folio_nr_pages(folio);
> +
> +	folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
> +}
> +
>  /**
>   * page_remove_rmap - take down pte mapping from a page
>   * @page:	page to remove mapping from

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 22/34] superh: Implement the new page table range API
  2023-02-28 21:37 ` [PATCH v3 22/34] superh: " Matthew Wilcox (Oracle)
@ 2023-03-01  8:06   ` Geert Uytterhoeven
  2023-03-01 16:17     ` Matthew Wilcox
  0 siblings, 1 reply; 77+ messages in thread
From: Geert Uytterhoeven @ 2023-03-01  8:06 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-arch, linux-kernel, Yoshinori Sato, Rich Felker,
	John Paul Adrian Glaubitz, linux-sh

Hi Willy,

On Tue, Feb 28, 2023 at 10:39 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio() and
> flush_icache_pages().  Change the PG_dcache_clean flag from being
> per-page to per-folio.  Flush the entire folio containing the pages in
> flush_icache_pages() for ease of implementation.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Thanks for your patch!

> --- a/arch/sh/mm/cache.c
> +++ b/arch/sh/mm/cache.c

>  void __flush_anon_page(struct page *page, unsigned long vmaddr)
>  {
> +       struct folio *folio = page_folio(page);
>         unsigned long addr = (unsigned long) page_address(page);
>
>         if (pages_do_alias(addr, vmaddr)) {
> -               if (boot_cpu_data.dcache.n_aliases && page_mapcount(page) &&
> -                   test_bit(PG_dcache_clean, &page->flags)) {
> +               if (boot_cpu_data.dcache.n_aliases && folio_mapped(folio) &&
> +                   test_bit(PG_dcache_clean, &folio->flags)) {
>                         void *kaddr;
>
>                         kaddr = kmap_coherent(page, vmaddr);
>                         /* XXX.. For now kunmap_coherent() does a purge */
>                         /* __flush_purge_region((void *)kaddr, PAGE_SIZE); */
>                         kunmap_coherent(kaddr);
> -               } else
> -                       __flush_purge_region((void *)addr, PAGE_SIZE);
> +               } else

Trailing whitespace. Please run scripts/checkpath.pl (on the full series).

> +                       __flush_purge_region(folio_address(folio),
> +                                               folio_size(folio));
>         }
>  }

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 22/34] superh: Implement the new page table range API
  2023-03-01  8:06   ` Geert Uytterhoeven
@ 2023-03-01 16:17     ` Matthew Wilcox
  0 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox @ 2023-03-01 16:17 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-mm, linux-arch, linux-kernel, Yoshinori Sato, Rich Felker,
	John Paul Adrian Glaubitz, linux-sh

On Wed, Mar 01, 2023 at 09:06:16AM +0100, Geert Uytterhoeven wrote:
> > -               } else
> > -                       __flush_purge_region((void *)addr, PAGE_SIZE);
> > +               } else
> 
> Trailing whitespace. Please run scripts/checkpath.pl (on the full series).

Thanks.  It's pretty noisy and I don't intend to clean up many of
those warnings, but it did find some legit problems.

I'll squash these fixes into v4.

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index bf1263ff7e67..45e5282f406c 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -45,7 +45,7 @@ void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 		pte_t pte, unsigned int nr);
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 #define update_mmu_cache(vma, addr, ptep) \
-	update_mmu_cache_range(vma, addr, ptep, 1);
+	update_mmu_cache_range(vma, addr, ptep, 1)
 
 #ifndef MAX_PTRS_PER_PGD
 #define MAX_PTRS_PER_PGD PTRS_PER_PGD
diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c
index 93fc5fb8ec1c..9bcaa5619eab 100644
--- a/arch/sh/mm/cache.c
+++ b/arch/sh/mm/cache.c
@@ -169,7 +169,7 @@ void __flush_anon_page(struct page *page, unsigned long vmaddr)
 			/* XXX.. For now kunmap_coherent() does a purge */
 			/* __flush_purge_region((void *)kaddr, PAGE_SIZE); */
 			kunmap_coherent(kaddr);
-		} else 
+		} else
 			__flush_purge_region(folio_address(folio),
 						folio_size(folio));
 	}
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index d5c0088e0c6a..0c30b0eb6c57 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -924,7 +924,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1);
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #define pte_clear(mm,addr,ptep)		\
 	set_pte_at((mm), (addr), (ptep), __pte(0UL))
diff --git a/arch/xtensa/mm/cache.c b/arch/xtensa/mm/cache.c
index 65c0d5298041..27bd798e4d89 100644
--- a/arch/xtensa/mm/cache.c
+++ b/arch/xtensa/mm/cache.c
@@ -254,7 +254,7 @@ void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
 			void *paddr = kmap_local_folio(folio, i * PAGE_SIZE);
 			__flush_dcache_page((unsigned long)paddr);
 			__invalidate_icache_page((unsigned long)paddr);
-			kunmap_atomic(paddr);
+			kunmap_local(paddr);
 		}
 		set_bit(PG_arch_1, &folio->flags);
 	}

> > +                       __flush_purge_region(folio_address(folio),
> > +                                               folio_size(folio));
> >         }
> >  }
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 21/34] s390: Implement the new page table range API
  2023-02-28 21:37 ` [PATCH v3 21/34] s390: " Matthew Wilcox (Oracle)
@ 2023-03-02 13:31   ` Gerald Schaefer
  0 siblings, 0 replies; 77+ messages in thread
From: Gerald Schaefer @ 2023-03-02 13:31 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-arch, linux-kernel, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, linux-s390

On Tue, 28 Feb 2023 21:37:24 +0000
"Matthew Wilcox (Oracle)" <willy@infradead.org> wrote:

> Add set_ptes() and update_mmu_cache_range().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Alexander Gordeev <agordeev@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
> Cc: Sven Schnelle <svens@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> ---
>  arch/s390/include/asm/pgtable.h | 34 ++++++++++++++++++++++++---------
>  1 file changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 2c70b4d1263d..46bf475116f1 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -50,6 +50,7 @@ void arch_report_meminfo(struct seq_file *m);
>   * tables contain all the necessary information.
>   */
>  #define update_mmu_cache(vma, address, ptep)     do { } while (0)
> +#define update_mmu_cache_range(vma, addr, ptep, nr)	do { } while (0)
>  #define update_mmu_cache_pmd(vma, address, ptep) do { } while (0)
>  
>  /*
> @@ -1317,21 +1318,36 @@ pgprot_t pgprot_writecombine(pgprot_t prot);
>  pgprot_t pgprot_writethrough(pgprot_t prot);
>  
>  /*
> - * Certain architectures need to do special things when PTEs
> - * within a page table are directly modified.  Thus, the following
> - * hook is made available.
> + * Set multiple PTEs to consecutive pages with a single call.  All PTEs
> + * are within the same folio, PMD and VMA.
>   */
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t entry)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +			      pte_t *ptep, pte_t entry, unsigned int nr)
>  {
>  	if (pte_present(entry))
>  		entry = clear_pte_bit(entry, __pgprot(_PAGE_UNUSED));
> -	if (mm_has_pgste(mm))
> -		ptep_set_pte_at(mm, addr, ptep, entry);
> -	else
> -		set_pte(ptep, entry);
> +	if (mm_has_pgste(mm)) {
> +		for (;;) {
> +			ptep_set_pte_at(mm, addr, ptep, entry);

There might be room for additional optimization here, regarding the
preempt_disable/enable() in ptep_set_pte_at(), i.e. move it out of
ptep_set_pte_at() and do it only once in this loop.

We could add that later with an add-on patch, but for this series it
all looks good.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 18/34] parisc: Implement the new page table range API
  2023-02-28 21:37 ` [PATCH v3 18/34] parisc: " Matthew Wilcox (Oracle)
@ 2023-03-02 16:43   ` John David Anglin
  2023-03-02 20:40     ` John David Anglin
  0 siblings, 1 reply; 77+ messages in thread
From: John David Anglin @ 2023-03-02 16:43 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm, linux-arch
  Cc: linux-kernel, James E.J. Bottomley, Helge Deller, linux-parisc

On 2023-02-28 4:37 p.m., Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
> and flush_icache_pages().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
> from being per-page to per-folio.
I have tested this change on rp3440 at mainline commit e492250d5252635b6c97d52eddf2792ec26f1ec1
and c8000 at mainline commit ee3f96b164688dae21e2466a57f2e806b64e8a37.

So far, I haven't seen an issues on c8000.  On rp3440, I saw the following:

_swap_info_get: Unused swap offset entry 00000320
BUG: Bad page map in process buildd  pte:00032100 pmd:003606c3
addr:0000000000482000 vm_flags:00100077 anon_vma:0000000066f61340 mapping:0000000000000000 index:482
file:(null) fault:0x0 mmap:0x0 read_folio:0x0
CPU: 0 PID: 6813 Comm: buildd Not tainted 6.2.0+ #1
Hardware name: 9000/800/rp3440
Backtrace:
  [<000000004020af50>] show_stack+0x70/0x90
  [<0000000040b7d408>] dump_stack_lvl+0xd8/0x128
  [<0000000040b7d48c>] dump_stack+0x34/0x48
  [<00000000404513a4>] print_bad_pte+0x24c/0x318
  [<00000000404560dc>] zap_pte_range+0x8d4/0x958
  [<0000000040456398>] unmap_page_range+0x1d8/0x490
  [<000000004045681c>] unmap_vmas+0x10c/0x1a8
  [<0000000040466330>] exit_mmap+0x198/0x4a0
  [<0000000040235cbc>] mmput+0x114/0x2a8
  [<0000000040244e90>] do_exit+0x4e0/0xc68
  [<0000000040245938>] do_group_exit+0x68/0x128
  [<000000004025967c>] get_signal+0xae4/0xb60
  [<000000004021a570>] do_signal+0x50/0x228
  [<000000004021ab38>] do_notify_resume+0x68/0x150
  [<00000000402030b4>] intr_check_sig+0x38/0x3c

Disabling lock debugging due to kernel taint
_swap_info_get: Unused swap offset entry 000003a9
BUG: Bad page map in process buildd  pte:0003a940 pmd:003606c3
addr:0000000000523000 vm_flags:00100077 anon_vma:0000000066f61340 mapping:0000000000000000 index:523
file:(null) fault:0x0 mmap:0x0 read_folio:0x0
CPU: 2 PID: 6813 Comm: buildd Tainted: G    B              6.2.0+ #1
Hardware name: 9000/800/rp3440
Backtrace:
  [<000000004020af50>] show_stack+0x70/0x90
  [<0000000040b7d408>] dump_stack_lvl+0xd8/0x128
  [<0000000040b7d48c>] dump_stack+0x34/0x48
  [<00000000404513a4>] print_bad_pte+0x24c/0x318
  [<00000000404560dc>] zap_pte_range+0x8d4/0x958
  [<0000000040456398>] unmap_page_range+0x1d8/0x490
  [<000000004045681c>] unmap_vmas+0x10c/0x1a8
  [<0000000040466330>] exit_mmap+0x198/0x4a0
  [<0000000040235cbc>] mmput+0x114/0x2a8
  [<0000000040244e90>] do_exit+0x4e0/0xc68
  [<0000000040245938>] do_group_exit+0x68/0x128
  [<000000004025967c>] get_signal+0xae4/0xb60
  [<000000004021a570>] do_signal+0x50/0x228
  [<000000004021ab38>] do_notify_resume+0x68/0x150
  [<00000000402030b4>] intr_check_sig+0x38/0x3c
[...]
pagefault_out_of_memory: 1158973 callbacks suppressed
Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF
Huh VM_FAULT_OOM leaked out to the #PF handler. Retrying PF

Rebooted rp3440.  Since then, I haven't seen any more problems.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 18/34] parisc: Implement the new page table range API
  2023-03-02 16:43   ` John David Anglin
@ 2023-03-02 20:40     ` John David Anglin
  2023-03-04 16:27       ` John David Anglin
  0 siblings, 1 reply; 77+ messages in thread
From: John David Anglin @ 2023-03-02 20:40 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm, linux-arch
  Cc: linux-kernel, James E.J. Bottomley, Helge Deller, linux-parisc

On 2023-03-02 11:43 a.m., John David Anglin wrote:
> On 2023-02-28 4:37 p.m., Matthew Wilcox (Oracle) wrote:
>> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
>> and flush_icache_pages().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
>> from being per-page to per-folio.
> I have tested this change on rp3440 at mainline commit e492250d5252635b6c97d52eddf2792ec26f1ec1
> and c8000 at mainline commit ee3f96b164688dae21e2466a57f2e806b64e8a37.
Here's another one:

------------[ cut here ]------------
kernel BUG at mm/memory.c:3865!
CPU: 1 PID: 6972 Comm: sbuild Not tainted 6.2.0+ #1
Hardware name: 9000/800/rp3440

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001101111111100001111 Not tainted
r00-03  000000000806ff0f 000000004fab8d40 00000000404584b0 000000004fab8d40
r04-07  0000000040c2f4c0 0000000047fe60c0 000000004fab8b98 0000000000000953
r08-11  000000004de3de00 0000000000000000 0000000047fe60c0 0000004093ff4660
r12-15  0000000000000001 0000000047fe60c0 0000000040000540 000000022f8e9540
r16-19  0000000000000000 000000004c694c40 000000004fab8860 00000000000003d0
r20-23  0000000007be3a40 0000000000000fff 0000000000000000 000000004109f1a0
r24-27  0000000000000000 0000000000000cc0 0000000046de3a68 0000000040c2f4c0
r28-31  80e00000000a0435 000000004fab8df0 000000004fab8e20 0000000000000001
sr00-03  0000000000207c00 0000000000000000 0000000000000000 0000000002f11c00
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004045908c 0000000040459090
  IIR: 03ffe01f    ISR: 0000000000000000  IOR: 0000000000000000
  CPU:        1   CR30: 0000004095d64c20 CR31: ffffffffffffffff
  ORIG_R28: 000000001c569ad0
  IAOQ[0]: do_swap_page+0x108c/0x1168
  IAOQ[1]: do_swap_page+0x1090/0x1168
  RP(r2): do_swap_page+0x4b0/0x1168
Backtrace:
  [<000000004045a554>] handle_pte_fault+0x244/0x358
  [<000000004045c58c>] __handle_mm_fault+0x104/0x1b8
  [<000000004045c81c>] handle_mm_fault+0x1dc/0x318
  [<000000004044cb38>] faultin_page+0xa8/0x178
  [<000000004044e848>] __get_user_pages+0x328/0x560
  [<0000000040450ac4>] get_dump_page+0x9c/0x128
  [<0000000040596cb8>] dump_user_range+0xc0/0x2d8
  [<000000004058e790>] elf_core_dump+0x5f8/0x708
  [<0000000040596384>] do_coredump+0xc2c/0x14a0
  [<0000000040259040>] get_signal+0x4a8/0xb60
  [<000000004021a570>] do_signal+0x50/0x228
  [<000000004021ab38>] do_notify_resume+0x68/0x150
  [<0000000040203ee0>] syscall_do_signal+0x54/0xa0

CPU: 1 PID: 6972 Comm: sbuild Not tainted 6.2.0+ #1
Hardware name: 9000/800/rp3440
Backtrace:
  [<000000004020af50>] show_stack+0x70/0x90
  [<0000000040b7d408>] dump_stack_lvl+0xd8/0x128
  [<0000000040b7d48c>] dump_stack+0x34/0x48
  [<000000004020b160>] die_if_kernel+0x1d0/0x388
  [<000000004020c1c4>] handle_interruption+0xc34/0xc88
  [<000000004020307c>] intr_check_sig+0x0/0x3c

---[ end trace 0000000000000000 ]---
note: sbuild[6972] exited with preempt_count 1

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 03/34] mm: Add folio_flush_mapping()
  2023-02-28 21:37 ` [PATCH v3 03/34] mm: Add folio_flush_mapping() Matthew Wilcox (Oracle)
@ 2023-03-03 10:33   ` Mike Rapoport
  0 siblings, 0 replies; 77+ messages in thread
From: Mike Rapoport @ 2023-03-03 10:33 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-arch, linux-kernel

On Tue, Feb 28, 2023 at 09:37:06PM +0000, Matthew Wilcox (Oracle) wrote:
> This is the folio equivalent of page_mapping_file(), but rename it
> to make it clear that it's very different from page_file_mapping().
> Theoretically, there's nothing flush-only about it, but there are no
> other users today, and I doubt there will be; it's almost always more
> useful to know the swapfile's mapping or the swapcache's mapping.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/pagemap.h | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 51b75b89730e..1b1ba3d5100d 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -369,6 +369,26 @@ static inline struct address_space *folio_file_mapping(struct folio *folio)
>  	return folio->mapping;
>  }
>  
> +/**
> + * folio_flush_mapping - Find the file mapping this folio belongs to.
> + * @folio: The folio.
> + *
> + * For folios which are in the page cache, return the mapping that this
> + * page belongs to.  Anonymous folios return NULL, even if they're in

      ^ folio ?

> + * the swap cache.  Other kinds of folio also return NULL.
> + *
> + * This is ONLY used by architecture cache flushing code.  If you aren't
> + * writing cache flushing code, you want either folio_mapping() or
> + * folio_file_mapping().
> + */
> +static inline struct address_space *folio_flush_mapping(struct folio *folio)
> +{
> +	if (unlikely(folio_test_swapcache(folio)))
> +		return NULL;
> +
> +	return folio_mapping(folio);
> +}
> +
>  static inline struct address_space *page_file_mapping(struct page *page)
>  {
>  	return folio_file_mapping(page_folio(page));
> @@ -379,11 +399,7 @@ static inline struct address_space *page_file_mapping(struct page *page)
>   */
>  static inline struct address_space *page_mapping_file(struct page *page)
>  {
> -	struct folio *folio = page_folio(page);
> -
> -	if (unlikely(folio_test_swapcache(folio)))
> -		return NULL;
> -	return folio_mapping(folio);
> +	return folio_flush_mapping(page_folio(page));
>  }
>  
>  /**
> -- 
> 2.39.1
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 14/34] microblaze: Implement the new page table range API
  2023-02-28 21:37 ` [PATCH v3 14/34] microblaze: " Matthew Wilcox (Oracle)
@ 2023-03-03 10:53   ` Mike Rapoport
  2023-03-03 14:38     ` Matthew Wilcox
  0 siblings, 1 reply; 77+ messages in thread
From: Mike Rapoport @ 2023-03-03 10:53 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-arch, linux-kernel, Michal Simek

On Tue, Feb 28, 2023 at 09:37:17PM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
> flush_dcache_folio().  Also change the calling convention for set_pte()
> to be the same as other architectures.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Michal Simek <monstr@monstr.eu>
> ---
>  arch/microblaze/include/asm/cacheflush.h |  8 ++++++++
>  arch/microblaze/include/asm/pgtable.h    | 17 ++++++++++++-----
>  arch/microblaze/include/asm/tlbflush.h   |  4 +++-
>  3 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/microblaze/include/asm/cacheflush.h b/arch/microblaze/include/asm/cacheflush.h
> index 39f8fb6768d8..e6641ff98cb3 100644
> --- a/arch/microblaze/include/asm/cacheflush.h
> +++ b/arch/microblaze/include/asm/cacheflush.h
> @@ -74,6 +74,14 @@ do { \
>  	flush_dcache_range((unsigned) (addr), (unsigned) (addr) + PAGE_SIZE); \
>  } while (0);
>  
> +static void flush_dcache_folio(struct folio *folio)
> +{
> +	unsigned long addr = folio_pfn(folio) << PAGE_SHIFT;
> +
> +	flush_dcache_range(addr, addr + folio_size(folio));
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
>  #define flush_cache_page(vma, vmaddr, pfn) \
>  	flush_dcache_range(pfn << PAGE_SHIFT, (pfn << PAGE_SHIFT) + PAGE_SIZE);
>  
> diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
> index d1b8272abcd9..a01e1369b486 100644
> --- a/arch/microblaze/include/asm/pgtable.h
> +++ b/arch/microblaze/include/asm/pgtable.h
> @@ -330,18 +330,25 @@ static inline unsigned long pte_update(pte_t *p, unsigned long clr,
>  /*
>   * set_pte stores a linux PTE into the linux page table.
>   */
> -static inline void set_pte(struct mm_struct *mm, unsigned long addr,
> -		pte_t *ptep, pte_t pte)
> +static inline void set_pte(pte_t *ptep, pte_t pte)
>  {
>  	*ptep = pte;
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -		pte_t *ptep, pte_t pte)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
>  {
> -	*ptep = pte;
> +	for (;;) {
> +		set_pte(ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte_val(pte) += 1 << PFN_SHIFT_OFFSET;

This is the same as

		pte_val(pte) += PAGE_SIZE;

isn't it?

> +	}
>  }
>  
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
> +
>  #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
>  static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
>  		unsigned long address, pte_t *ptep)
> diff --git a/arch/microblaze/include/asm/tlbflush.h b/arch/microblaze/include/asm/tlbflush.h
> index 2038168ed128..1b179e5e9062 100644
> --- a/arch/microblaze/include/asm/tlbflush.h
> +++ b/arch/microblaze/include/asm/tlbflush.h
> @@ -33,7 +33,9 @@ static inline void local_flush_tlb_range(struct vm_area_struct *vma,
>  
>  #define flush_tlb_kernel_range(start, end)	do { } while (0)
>  
> -#define update_mmu_cache(vma, addr, ptep)	do { } while (0)
> +#define update_mmu_cache_range(vma, addr, ptep, nr)	do { } while (0)
> +#define update_mmu_cache(vma, addr, pte) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  
>  #define flush_tlb_all local_flush_tlb_all
>  #define flush_tlb_mm local_flush_tlb_mm
> -- 
> 2.39.1
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 09/34] csky: Implement the new page table range API
  2023-02-28 21:37 ` [PATCH v3 09/34] csky: " Matthew Wilcox (Oracle)
@ 2023-03-03 11:40   ` Mike Rapoport
  0 siblings, 0 replies; 77+ messages in thread
From: Mike Rapoport @ 2023-03-03 11:40 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-arch, linux-kernel, Guo Ren, linux-csky

On Tue, Feb 28, 2023 at 09:37:12PM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_dcache_clean flag from being per-page to per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Acked-by: Guo Ren <guoren@kernel.org>
> Cc: linux-csky@vger.kernel.org
> ---
>  arch/csky/abiv1/cacheflush.c         | 32 +++++++++++++++++-----------
>  arch/csky/abiv1/inc/abi/cacheflush.h |  2 ++
>  arch/csky/abiv2/cacheflush.c         | 30 +++++++++++++-------------
>  arch/csky/abiv2/inc/abi/cacheflush.h | 10 +++++++--
>  arch/csky/include/asm/pgtable.h      | 21 +++++++++++++++---
>  5 files changed, 62 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/csky/abiv1/cacheflush.c b/arch/csky/abiv1/cacheflush.c
> index fb91b069dc69..ba43f6c26b4f 100644
> --- a/arch/csky/abiv1/cacheflush.c
> +++ b/arch/csky/abiv1/cacheflush.c
> @@ -14,43 +14,49 @@
>  
>  #define PG_dcache_clean		PG_arch_1
>  
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
>  	struct address_space *mapping;
>  
> -	if (page == ZERO_PAGE(0))
> +	if (is_zero_pfn(folio_pfn(folio)))
>  		return;
>  
> -	mapping = page_mapping_file(page);
> +	mapping = folio_flush_mapping(folio);
>  
> -	if (mapping && !page_mapcount(page))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +	if (mapping && !folio_mapped(folio))
> +		clear_bit(PG_dcache_clean, &folio->flags);
>  	else {
>  		dcache_wbinv_all();
>  		if (mapping)
>  			icache_inv_all();
> -		set_bit(PG_dcache_clean, &page->flags);
> +		set_bit(PG_dcache_clean, &folio->flags);
>  	}
>  }
> +EXPORT_SYMBOL(flush_dcache_folio);
> +
> +void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
>  EXPORT_SYMBOL(flush_dcache_page);
>  
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long addr,
> -	pte_t *ptep)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long addr,
> +		pte_t *ptep, unsigned int nr)
>  {
>  	unsigned long pfn = pte_pfn(*ptep);
> -	struct page *page;
> +	struct folio *folio;
>  
>  	if (!pfn_valid(pfn))
>  		return;
>  
> -	page = pfn_to_page(pfn);
> -	if (page == ZERO_PAGE(0))
> +	if (is_zero_pfn(pfn))
>  		return;
>  
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> +	folio = page_folio(pfn_to_page(pfn));
> +	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
>  		dcache_wbinv_all();
>  
> -	if (page_mapping_file(page)) {
> +	if (folio_flush_mapping(folio)) {
>  		if (vma->vm_flags & VM_EXEC)
>  			icache_inv_all();
>  	}
> diff --git a/arch/csky/abiv1/inc/abi/cacheflush.h b/arch/csky/abiv1/inc/abi/cacheflush.h
> index ed62e2066ba7..0d6cb65624c4 100644
> --- a/arch/csky/abiv1/inc/abi/cacheflush.h
> +++ b/arch/csky/abiv1/inc/abi/cacheflush.h
> @@ -9,6 +9,8 @@
>  
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  extern void flush_dcache_page(struct page *);
> +void flush_dcache_folio(struct folio *);
> +#define flush_dcache_folio flush_dcache_folio
>  
>  #define flush_cache_mm(mm)			dcache_wbinv_all()
>  #define flush_cache_page(vma, page, pfn)	cache_wbinv_all()
> diff --git a/arch/csky/abiv2/cacheflush.c b/arch/csky/abiv2/cacheflush.c
> index 39c51399dd81..c1cf0d55a2a1 100644
> --- a/arch/csky/abiv2/cacheflush.c
> +++ b/arch/csky/abiv2/cacheflush.c
> @@ -6,30 +6,30 @@
>  #include <linux/mm.h>
>  #include <asm/cache.h>
>  
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
> -		      pte_t *pte)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *pte, unsigned int nr)
>  {
> -	unsigned long addr;
> +	unsigned long pfn = pte_pfn(*pte);
>  	struct page *page;

Should be struct folio *folio instead:

  CC      arch/csky/abiv2/cacheflush.o
arch/csky/abiv2/cacheflush.c: In function 'update_mmu_cache_range':
arch/csky/abiv2/cacheflush.c:19:9: error: 'folio' undeclared (first use in this function)
   19 |         folio = page_folio(pfn_to_page(pfn));
      |         ^~~~~
arch/csky/abiv2/cacheflush.c:19:9: note: each undeclared identifier is reported only once for each function it appears in
arch/csky/abiv2/cacheflush.c:13:22: warning: unused variable 'page' [-Wunused-variable]
   13 |         struct page *page;
      |                      ^~~~

> +	unsigned int i;
>  
> -	if (!pfn_valid(pte_pfn(*pte)))
> +	if (!pfn_valid(pfn) || is_zero_pfn(pfn))
>  		return;
>  
> -	page = pfn_to_page(pte_pfn(*pte));
> -	if (page == ZERO_PAGE(0))
> -		return;
> +	folio = page_folio(pfn_to_page(pfn));
>  
> -	if (test_and_set_bit(PG_dcache_clean, &page->flags))
> +	if (test_and_set_bit(PG_dcache_clean, &folio->flags))
>  		return;
>  
> -	addr = (unsigned long) kmap_atomic(page);
> -
> -	dcache_wb_range(addr, addr + PAGE_SIZE);
> +	for (i = 0; i < folio_nr_pages(folio); i++) {
> +		unsigned long addr = (unsigned long) kmap_local_folio(folio,
> +								i * PAGE_SIZE);
>  
> -	if (vma->vm_flags & VM_EXEC)
> -		icache_inv_range(addr, addr + PAGE_SIZE);
> -
> -	kunmap_atomic((void *) addr);
> +		dcache_wb_range(addr, addr + PAGE_SIZE);
> +		if (vma->vm_flags & VM_EXEC)
> +			icache_inv_range(addr, addr + PAGE_SIZE);
> +		kunmap_local((void *) addr);
> +	}
>  }
>  
>  void flush_icache_deferred(struct mm_struct *mm)
> diff --git a/arch/csky/abiv2/inc/abi/cacheflush.h b/arch/csky/abiv2/inc/abi/cacheflush.h
> index a565e00c3f70..9c728933a776 100644
> --- a/arch/csky/abiv2/inc/abi/cacheflush.h
> +++ b/arch/csky/abiv2/inc/abi/cacheflush.h
> @@ -18,11 +18,17 @@
>  
>  #define PG_dcache_clean		PG_arch_1
>  
> +static inline void flush_dcache_folio(struct folio *folio)
> +{
> +	if (test_bit(PG_dcache_clean, &folio->flags))
> +		clear_bit(PG_dcache_clean, &folio->flags);
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  static inline void flush_dcache_page(struct page *page)
>  {
> -	if (test_bit(PG_dcache_clean, &page->flags))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +	flush_dcache_folio(page_folio(page));
>  }
>  
>  #define flush_dcache_mmap_lock(mapping)		do { } while (0)
> diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
> index d4042495febc..a30ae048233e 100644
> --- a/arch/csky/include/asm/pgtable.h
> +++ b/arch/csky/include/asm/pgtable.h
> @@ -90,7 +90,20 @@ static inline void set_pte(pte_t *p, pte_t pte)
>  	/* prevent out of order excution */
>  	smp_mb();
>  }
> -#define set_pte_at(mm, addr, ptep, pteval) set_pte(ptep, pteval)
> +
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	for (;;) {
> +		set_pte(ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte_val(pte) += PAGE_SIZE;
> +	}
> +}
> +
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
>  
>  static inline pte_t *pmd_page_vaddr(pmd_t pmd)
>  {
> @@ -263,8 +276,10 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>  extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
>  extern void paging_init(void);
>  
> -void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
> -		      pte_t *pte);
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *pte, unsigned int nr);
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  
>  #define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
>  	remap_pfn_range(vma, vaddr, pfn, size, prot)
> -- 
> 2.39.1
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 11/34] ia64: Implement the new page table range API
  2023-02-28 21:37   ` Matthew Wilcox (Oracle)
@ 2023-03-03 11:56     ` Mike Rapoport
  -1 siblings, 0 replies; 77+ messages in thread
From: Mike Rapoport @ 2023-03-03 11:56 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-arch, linux-kernel, linux-ia64

On Tue, Feb 28, 2023 at 09:37:14PM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_arch_1 (aka PG_dcache_clean) flag from being per-page to
> per-folio, which makes arch_dma_mark_clean() and mark_clean() a little
> more exciting.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: linux-ia64@vger.kernel.org
> ---
>  arch/ia64/hp/common/sba_iommu.c    | 26 +++++++++++++++-----------
>  arch/ia64/include/asm/cacheflush.h | 14 ++++++++++----
>  arch/ia64/include/asm/pgtable.h    | 14 +++++++++++++-
>  arch/ia64/mm/init.c                | 29 +++++++++++++++++++----------
>  4 files changed, 57 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
> index 8ad6946521d8..48d475f10003 100644
> --- a/arch/ia64/hp/common/sba_iommu.c
> +++ b/arch/ia64/hp/common/sba_iommu.c
> @@ -798,22 +798,26 @@ sba_io_pdir_entry(u64 *pdir_ptr, unsigned long vba)
>  #endif
>  
>  #ifdef ENABLE_MARK_CLEAN
> -/**
> +/*
>   * Since DMA is i-cache coherent, any (complete) pages that were written via
>   * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
>   * flush them when they get mapped into an executable vm-area.
>   */
> -static void
> -mark_clean (void *addr, size_t size)
> +static void mark_clean(void *addr, size_t size)
>  {
> -	unsigned long pg_addr, end;
> -
> -	pg_addr = PAGE_ALIGN((unsigned long) addr);
> -	end = (unsigned long) addr + size;
> -	while (pg_addr + PAGE_SIZE <= end) {
> -		struct page *page = virt_to_page((void *)pg_addr);
> -		set_bit(PG_arch_1, &page->flags);
> -		pg_addr += PAGE_SIZE;
> +	struct folio *folio = virt_to_folio(addr);
> +	ssize_t left = size;
> +	size_t offset = offset_in_folio(folio, addr);
> +
> +	if (offset) {
> +		left -= folio_size(folio) - offset;
> +		folio = folio_next(folio);
> +	}
> +
> +	while (left >= folio_size(folio)) {
> +		set_bit(PG_arch_1, &folio->flags);
> +		left -= folio_size(folio);
> +		folio = folio_next(folio);
>  	}
>  }
>  #endif
> diff --git a/arch/ia64/include/asm/cacheflush.h b/arch/ia64/include/asm/cacheflush.h
> index 708c0fa5d975..eac493fa9e0d 100644
> --- a/arch/ia64/include/asm/cacheflush.h
> +++ b/arch/ia64/include/asm/cacheflush.h
> @@ -13,10 +13,16 @@
>  #include <asm/page.h>
>  
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -#define flush_dcache_page(page)			\
> -do {						\
> -	clear_bit(PG_arch_1, &(page)->flags);	\
> -} while (0)
> +static inline void flush_dcache_folio(struct folio *folio)
> +{
> +	clear_bit(PG_arch_1, &folio->flags);
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
>  
>  extern void flush_icache_range(unsigned long start, unsigned long end);
>  #define flush_icache_range flush_icache_range
> diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
> index 21c97e31a28a..0c2be4ea664b 100644
> --- a/arch/ia64/include/asm/pgtable.h
> +++ b/arch/ia64/include/asm/pgtable.h
> @@ -303,7 +303,18 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
>  	*ptep = pteval;
>  }
>  
> -#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	for (;;) {
> +		set_pte(ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte_val(pte) += PAGE_SIZE;
> +	}
> +}
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, add, ptep, pte, 1)
>  
>  /*
>   * Make page protection values cacheable, uncacheable, or write-
> @@ -396,6 +407,7 @@ pte_same (pte_t a, pte_t b)
>  	return pte_val(a) == pte_val(b);
>  }
>  
> +#define update_mmu_cache_range(vma, address, ptep, nr) do { } while (0)
>  #define update_mmu_cache(vma, address, ptep) do { } while (0)
>  
>  extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 7f5353e28516..12aef25944aa 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -50,30 +50,39 @@ void
>  __ia64_sync_icache_dcache (pte_t pte)
>  {
>  	unsigned long addr;
> -	struct page *page;
> +	struct folio *folio;
>  
> -	page = pte_page(pte);
> -	addr = (unsigned long) page_address(page);
> +	folio = page_folio(pte_page(pte));
> +	addr = (unsigned long)folio_address(folio);
>  
> -	if (test_bit(PG_arch_1, &page->flags))
> +	if (test_bit(PG_arch_1, &folio->flags))
>  		return;				/* i-cache is already coherent with d-cache */
>  
> -	flush_icache_range(addr, addr + page_size(page));
> -	set_bit(PG_arch_1, &page->flags);	/* mark page as clean */
> +	flush_icache_range(addr, addr + folio_size(folio));
> +	set_bit(PG_arch_1, &folio->flags);	/* mark page as clean */
>  }
>  
>  /*
> - * Since DMA is i-cache coherent, any (complete) pages that were written via
> + * Since DMA is i-cache coherent, any (complete) folios that were written via
>   * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
>   * flush them when they get mapped into an executable vm-area.
>   */
>  void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long pfn = PHYS_PFN(paddr);
> +	struct folio *folio = page_folio(phys_to_page(paddr));
> +	ssize_t left = size;
> +	size_t offset = offset_in_folio(folio, paddr);

Build of defconfig failed miserably for me without this:

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 12aef25944aa..0775e7870257 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -69,7 +69,8 @@ __ia64_sync_icache_dcache (pte_t pte)
  */
 void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
 {
-	struct folio *folio = page_folio(phys_to_page(paddr));
+	unsigned long pfn = __phys_to_pfn(paddr);
+	struct folio *folio = page_folio(pfn_to_page(pfn));
 	ssize_t left = size;
 	size_t offset = offset_in_folio(folio, paddr);
 

>  
> -	do {
> +	if (offset) {
> +		left -= folio_size(folio) - offset;
> +		folio = folio_next(folio);
> +	}
> +
> +	while (left >= (ssize_t)folio_size(folio)) {
>  		set_bit(PG_arch_1, &pfn_to_page(pfn)->flags);
> -	} while (++pfn <= PHYS_PFN(paddr + size - 1));
> +		left -= folio_size(folio);
> +		folio = folio_next(folio);
> +	}
>  }
>  
>  inline void
> -- 
> 2.39.1
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 11/34] ia64: Implement the new page table range API
@ 2023-03-03 11:56     ` Mike Rapoport
  0 siblings, 0 replies; 77+ messages in thread
From: Mike Rapoport @ 2023-03-03 11:56 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-arch, linux-kernel, linux-ia64

On Tue, Feb 28, 2023 at 09:37:14PM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_arch_1 (aka PG_dcache_clean) flag from being per-page to
> per-folio, which makes arch_dma_mark_clean() and mark_clean() a little
> more exciting.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: linux-ia64@vger.kernel.org
> ---
>  arch/ia64/hp/common/sba_iommu.c    | 26 +++++++++++++++-----------
>  arch/ia64/include/asm/cacheflush.h | 14 ++++++++++----
>  arch/ia64/include/asm/pgtable.h    | 14 +++++++++++++-
>  arch/ia64/mm/init.c                | 29 +++++++++++++++++++----------
>  4 files changed, 57 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
> index 8ad6946521d8..48d475f10003 100644
> --- a/arch/ia64/hp/common/sba_iommu.c
> +++ b/arch/ia64/hp/common/sba_iommu.c
> @@ -798,22 +798,26 @@ sba_io_pdir_entry(u64 *pdir_ptr, unsigned long vba)
>  #endif
>  
>  #ifdef ENABLE_MARK_CLEAN
> -/**
> +/*
>   * Since DMA is i-cache coherent, any (complete) pages that were written via
>   * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
>   * flush them when they get mapped into an executable vm-area.
>   */
> -static void
> -mark_clean (void *addr, size_t size)
> +static void mark_clean(void *addr, size_t size)
>  {
> -	unsigned long pg_addr, end;
> -
> -	pg_addr = PAGE_ALIGN((unsigned long) addr);
> -	end = (unsigned long) addr + size;
> -	while (pg_addr + PAGE_SIZE <= end) {
> -		struct page *page = virt_to_page((void *)pg_addr);
> -		set_bit(PG_arch_1, &page->flags);
> -		pg_addr += PAGE_SIZE;
> +	struct folio *folio = virt_to_folio(addr);
> +	ssize_t left = size;
> +	size_t offset = offset_in_folio(folio, addr);
> +
> +	if (offset) {
> +		left -= folio_size(folio) - offset;
> +		folio = folio_next(folio);
> +	}
> +
> +	while (left >= folio_size(folio)) {
> +		set_bit(PG_arch_1, &folio->flags);
> +		left -= folio_size(folio);
> +		folio = folio_next(folio);
>  	}
>  }
>  #endif
> diff --git a/arch/ia64/include/asm/cacheflush.h b/arch/ia64/include/asm/cacheflush.h
> index 708c0fa5d975..eac493fa9e0d 100644
> --- a/arch/ia64/include/asm/cacheflush.h
> +++ b/arch/ia64/include/asm/cacheflush.h
> @@ -13,10 +13,16 @@
>  #include <asm/page.h>
>  
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> -#define flush_dcache_page(page)			\
> -do {						\
> -	clear_bit(PG_arch_1, &(page)->flags);	\
> -} while (0)
> +static inline void flush_dcache_folio(struct folio *folio)
> +{
> +	clear_bit(PG_arch_1, &folio->flags);
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
>  
>  extern void flush_icache_range(unsigned long start, unsigned long end);
>  #define flush_icache_range flush_icache_range
> diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
> index 21c97e31a28a..0c2be4ea664b 100644
> --- a/arch/ia64/include/asm/pgtable.h
> +++ b/arch/ia64/include/asm/pgtable.h
> @@ -303,7 +303,18 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
>  	*ptep = pteval;
>  }
>  
> -#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	for (;;) {
> +		set_pte(ptep, pte);
> +		if (--nr = 0)
> +			break;
> +		ptep++;
> +		pte_val(pte) += PAGE_SIZE;
> +	}
> +}
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, add, ptep, pte, 1)
>  
>  /*
>   * Make page protection values cacheable, uncacheable, or write-
> @@ -396,6 +407,7 @@ pte_same (pte_t a, pte_t b)
>  	return pte_val(a) = pte_val(b);
>  }
>  
> +#define update_mmu_cache_range(vma, address, ptep, nr) do { } while (0)
>  #define update_mmu_cache(vma, address, ptep) do { } while (0)
>  
>  extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 7f5353e28516..12aef25944aa 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -50,30 +50,39 @@ void
>  __ia64_sync_icache_dcache (pte_t pte)
>  {
>  	unsigned long addr;
> -	struct page *page;
> +	struct folio *folio;
>  
> -	page = pte_page(pte);
> -	addr = (unsigned long) page_address(page);
> +	folio = page_folio(pte_page(pte));
> +	addr = (unsigned long)folio_address(folio);
>  
> -	if (test_bit(PG_arch_1, &page->flags))
> +	if (test_bit(PG_arch_1, &folio->flags))
>  		return;				/* i-cache is already coherent with d-cache */
>  
> -	flush_icache_range(addr, addr + page_size(page));
> -	set_bit(PG_arch_1, &page->flags);	/* mark page as clean */
> +	flush_icache_range(addr, addr + folio_size(folio));
> +	set_bit(PG_arch_1, &folio->flags);	/* mark page as clean */
>  }
>  
>  /*
> - * Since DMA is i-cache coherent, any (complete) pages that were written via
> + * Since DMA is i-cache coherent, any (complete) folios that were written via
>   * DMA can be marked as "clean" so that lazy_mmu_prot_update() doesn't have to
>   * flush them when they get mapped into an executable vm-area.
>   */
>  void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
>  {
> -	unsigned long pfn = PHYS_PFN(paddr);
> +	struct folio *folio = page_folio(phys_to_page(paddr));
> +	ssize_t left = size;
> +	size_t offset = offset_in_folio(folio, paddr);

Build of defconfig failed miserably for me without this:

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 12aef25944aa..0775e7870257 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -69,7 +69,8 @@ __ia64_sync_icache_dcache (pte_t pte)
  */
 void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
 {
-	struct folio *folio = page_folio(phys_to_page(paddr));
+	unsigned long pfn = __phys_to_pfn(paddr);
+	struct folio *folio = page_folio(pfn_to_page(pfn));
 	ssize_t left = size;
 	size_t offset = offset_in_folio(folio, paddr);
 

>  
> -	do {
> +	if (offset) {
> +		left -= folio_size(folio) - offset;
> +		folio = folio_next(folio);
> +	}
> +
> +	while (left >= (ssize_t)folio_size(folio)) {
>  		set_bit(PG_arch_1, &pfn_to_page(pfn)->flags);
> -	} while (++pfn <= PHYS_PFN(paddr + size - 1));
> +		left -= folio_size(folio);
> +		folio = folio_next(folio);
> +	}
>  }
>  
>  inline void
> -- 
> 2.39.1
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 15/34] mips: Implement the new page table range API
  2023-02-28 21:37 ` [PATCH v3 15/34] mips: " Matthew Wilcox (Oracle)
@ 2023-03-03 12:24   ` Mike Rapoport
  0 siblings, 0 replies; 77+ messages in thread
From: Mike Rapoport @ 2023-03-03 12:24 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-arch, linux-kernel, Thomas Bogendoerfer, linux-mips

On Tue, Feb 28, 2023 at 09:37:18PM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
> flush_dcache_folio().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
> from being per-page to per-folio.

Looks like update_mmu_cache_range() is missing:

  CC      mm/memory.o
mm/memory.c: In function 'set_pte_range':
mm/memory.c:4291:9: error: implicit declaration of function 'update_mmu_cache_range'; did you mean 'update_mmu_cache_pmd'? [-Werror=implicit-function-declaration]
 4291 |         update_mmu_cache_range(vma, addr, vmf->pte, nr);
      |         ^~~~~~~~~~~~~~~~~~~~~~
      |         update_mmu_cache_pmd
cc1: some warnings being treated as errors

 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Cc: linux-mips@vger.kernel.org
> ---
>  arch/mips/include/asm/cacheflush.h | 32 +++++++++++------
>  arch/mips/include/asm/pgtable.h    | 36 +++++++++++++------
>  arch/mips/mm/c-r4k.c               |  5 +--
>  arch/mips/mm/cache.c               | 56 +++++++++++++++---------------
>  arch/mips/mm/init.c                | 17 +++++----
>  5 files changed, 88 insertions(+), 58 deletions(-)
> 
> diff --git a/arch/mips/include/asm/cacheflush.h b/arch/mips/include/asm/cacheflush.h
> index b3dc9c589442..2683cade42ef 100644
> --- a/arch/mips/include/asm/cacheflush.h
> +++ b/arch/mips/include/asm/cacheflush.h
> @@ -36,12 +36,12 @@
>   */
>  #define PG_dcache_dirty			PG_arch_1
>  
> -#define Page_dcache_dirty(page)		\
> -	test_bit(PG_dcache_dirty, &(page)->flags)
> -#define SetPageDcacheDirty(page)	\
> -	set_bit(PG_dcache_dirty, &(page)->flags)
> -#define ClearPageDcacheDirty(page)	\
> -	clear_bit(PG_dcache_dirty, &(page)->flags)
> +#define folio_test_dcache_dirty(folio)		\
> +	test_bit(PG_dcache_dirty, &(folio)->flags)
> +#define folio_set_dcache_dirty(folio)	\
> +	set_bit(PG_dcache_dirty, &(folio)->flags)
> +#define folio_clear_dcache_dirty(folio)	\
> +	clear_bit(PG_dcache_dirty, &(folio)->flags)
>  
>  extern void (*flush_cache_all)(void);
>  extern void (*__flush_cache_all)(void);
> @@ -50,15 +50,24 @@ extern void (*flush_cache_mm)(struct mm_struct *mm);
>  extern void (*flush_cache_range)(struct vm_area_struct *vma,
>  	unsigned long start, unsigned long end);
>  extern void (*flush_cache_page)(struct vm_area_struct *vma, unsigned long page, unsigned long pfn);
> -extern void __flush_dcache_page(struct page *page);
> +extern void __flush_dcache_pages(struct page *page, unsigned int nr);
>  
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
> +static inline void flush_dcache_folio(struct folio *folio)
> +{
> +	if (cpu_has_dc_aliases)
> +		__flush_dcache_pages(&folio->page, folio_nr_pages(folio));
> +	else if (!cpu_has_ic_fills_f_dc)
> +		folio_set_dcache_dirty(folio);
> +}
> +#define flush_dcache_folio flush_dcache_folio
> +
>  static inline void flush_dcache_page(struct page *page)
>  {
>  	if (cpu_has_dc_aliases)
> -		__flush_dcache_page(page);
> +		__flush_dcache_pages(page, 1);
>  	else if (!cpu_has_ic_fills_f_dc)
> -		SetPageDcacheDirty(page);
> +		folio_set_dcache_dirty(page_folio(page));
>  }
>  
>  #define flush_dcache_mmap_lock(mapping)		do { } while (0)
> @@ -73,10 +82,11 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
>  		__flush_anon_page(page, vmaddr);
>  }
>  
> -static inline void flush_icache_page(struct vm_area_struct *vma,
> -	struct page *page)
> +static inline void flush_icache_pages(struct vm_area_struct *vma,
> +		struct page *page, unsigned int nr)
>  {
>  }
> +#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1)
>  
>  extern void (*flush_icache_range)(unsigned long start, unsigned long end);
>  extern void (*local_flush_icache_range)(unsigned long start, unsigned long end);
> diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> index 791389bf3c12..0cf0455e6ae8 100644
> --- a/arch/mips/include/asm/pgtable.h
> +++ b/arch/mips/include/asm/pgtable.h
> @@ -105,8 +105,10 @@ do {									\
>  	}								\
>  } while(0)
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pteval);
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr);
> +
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
>  
>  #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
>  
> @@ -204,19 +206,31 @@ static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *pt
>  }
>  #endif
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pteval)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
>  {
> +	unsigned int i;
> +	bool do_sync = false;
>  
> -	if (!pte_present(pteval))
> -		goto cache_sync_done;
> +	for (i = 0; i < nr; i++) {
> +		if (!pte_present(pte))
> +			continue;
> +		if (pte_present(ptep[i]) &&
> +		    (pte_pfn(ptep[i]) == pte_pfn(pte)))
> +			continue;
> +		do_sync = true;
> +	}
>  
> -	if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> -		goto cache_sync_done;
> +	if (do_sync)
> +		__update_cache(addr, pte);
>  
> -	__update_cache(addr, pteval);
> -cache_sync_done:
> -	set_pte(ptep, pteval);
> +	for (;;) {
> +		set_pte(ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte_val(pte) += 1 << _PFN_SHIFT;
> +	}
>  }
>  
>  /*
> diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
> index a549fa98c2f4..7d2a42f0cffd 100644
> --- a/arch/mips/mm/c-r4k.c
> +++ b/arch/mips/mm/c-r4k.c
> @@ -679,13 +679,14 @@ static inline void local_r4k_flush_cache_page(void *args)
>  	if ((mm == current->active_mm) && (pte_val(*ptep) & _PAGE_VALID))
>  		vaddr = NULL;
>  	else {
> +		struct folio *folio = page_folio(page);
>  		/*
>  		 * Use kmap_coherent or kmap_atomic to do flushes for
>  		 * another ASID than the current one.
>  		 */
>  		map_coherent = (cpu_has_dc_aliases &&
> -				page_mapcount(page) &&
> -				!Page_dcache_dirty(page));
> +				folio_mapped(folio) &&
> +				!folio_test_dcache_dirty(folio));
>  		if (map_coherent)
>  			vaddr = kmap_coherent(page, addr);
>  		else
> diff --git a/arch/mips/mm/cache.c b/arch/mips/mm/cache.c
> index 11b3e7ddafd5..0668435521fc 100644
> --- a/arch/mips/mm/cache.c
> +++ b/arch/mips/mm/cache.c
> @@ -82,13 +82,15 @@ SYSCALL_DEFINE3(cacheflush, unsigned long, addr, unsigned long, bytes,
>  	return 0;
>  }
>  
> -void __flush_dcache_page(struct page *page)
> +void __flush_dcache_pages(struct page *page, unsigned int nr)
>  {
> -	struct address_space *mapping = page_mapping_file(page);
> +	struct folio *folio = page_folio(page);
> +	struct address_space *mapping = folio_flush_mapping(folio);
>  	unsigned long addr;
> +	unsigned int i;
>  
>  	if (mapping && !mapping_mapped(mapping)) {
> -		SetPageDcacheDirty(page);
> +		folio_set_dcache_dirty(folio);
>  		return;
>  	}
>  
> @@ -97,25 +99,21 @@ void __flush_dcache_page(struct page *page)
>  	 * case is for exec env/arg pages and those are %99 certainly going to
>  	 * get faulted into the tlb (and thus flushed) anyways.
>  	 */
> -	if (PageHighMem(page))
> -		addr = (unsigned long)kmap_atomic(page);
> -	else
> -		addr = (unsigned long)page_address(page);
> -
> -	flush_data_cache_page(addr);
> -
> -	if (PageHighMem(page))
> -		kunmap_atomic((void *)addr);
> +	for (i = 0; i < nr; i++) {
> +		addr = (unsigned long)kmap_local_page(page + i);
> +		flush_data_cache_page(addr);
> +		kunmap_local((void *)addr);
> +	}
>  }
> -
> -EXPORT_SYMBOL(__flush_dcache_page);
> +EXPORT_SYMBOL(__flush_dcache_pages);
>  
>  void __flush_anon_page(struct page *page, unsigned long vmaddr)
>  {
>  	unsigned long addr = (unsigned long) page_address(page);
> +	struct folio *folio = page_folio(page);
>  
>  	if (pages_do_alias(addr, vmaddr)) {
> -		if (page_mapcount(page) && !Page_dcache_dirty(page)) {
> +		if (folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
>  			void *kaddr;
>  
>  			kaddr = kmap_coherent(page, vmaddr);
> @@ -130,27 +128,29 @@ EXPORT_SYMBOL(__flush_anon_page);
>  
>  void __update_cache(unsigned long address, pte_t pte)
>  {
> -	struct page *page;
> +	struct folio *folio;
>  	unsigned long pfn, addr;
>  	int exec = !pte_no_exec(pte) && !cpu_has_ic_fills_f_dc;
> +	unsigned int i;
>  
>  	pfn = pte_pfn(pte);
>  	if (unlikely(!pfn_valid(pfn)))
>  		return;
> -	page = pfn_to_page(pfn);
> -	if (Page_dcache_dirty(page)) {
> -		if (PageHighMem(page))
> -			addr = (unsigned long)kmap_atomic(page);
> -		else
> -			addr = (unsigned long)page_address(page);
> -
> -		if (exec || pages_do_alias(addr, address & PAGE_MASK))
> -			flush_data_cache_page(addr);
>  
> -		if (PageHighMem(page))
> -			kunmap_atomic((void *)addr);
> +	folio = page_folio(pfn_to_page(pfn));
> +	address &= PAGE_MASK;
> +	address -= offset_in_folio(folio, pfn << PAGE_SHIFT);
> +
> +	if (folio_test_dcache_dirty(folio)) {
> +		for (i = 0; i < folio_nr_pages(folio); i++) {
> +			addr = (unsigned long)kmap_local_folio(folio, i);
>  
> -		ClearPageDcacheDirty(page);
> +			if (exec || pages_do_alias(addr, address))
> +				flush_data_cache_page(addr);
> +			kunmap_local((void *)addr);
> +			address += PAGE_SIZE;
> +		}
> +		folio_clear_dcache_dirty(folio);
>  	}
>  }
>  
> diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
> index 5a8002839550..19d4ca3b3fbd 100644
> --- a/arch/mips/mm/init.c
> +++ b/arch/mips/mm/init.c
> @@ -88,7 +88,7 @@ static void *__kmap_pgprot(struct page *page, unsigned long addr, pgprot_t prot)
>  	pte_t pte;
>  	int tlbidx;
>  
> -	BUG_ON(Page_dcache_dirty(page));
> +	BUG_ON(folio_test_dcache_dirty(page_folio(page)));
>  
>  	preempt_disable();
>  	pagefault_disable();
> @@ -169,11 +169,12 @@ void kunmap_coherent(void)
>  void copy_user_highpage(struct page *to, struct page *from,
>  	unsigned long vaddr, struct vm_area_struct *vma)
>  {
> +	struct folio *src = page_folio(from);
>  	void *vfrom, *vto;
>  
>  	vto = kmap_atomic(to);
>  	if (cpu_has_dc_aliases &&
> -	    page_mapcount(from) && !Page_dcache_dirty(from)) {
> +	    folio_mapped(src) && !folio_test_dcache_dirty(src)) {
>  		vfrom = kmap_coherent(from, vaddr);
>  		copy_page(vto, vfrom);
>  		kunmap_coherent();
> @@ -194,15 +195,17 @@ void copy_to_user_page(struct vm_area_struct *vma,
>  	struct page *page, unsigned long vaddr, void *dst, const void *src,
>  	unsigned long len)
>  {
> +	struct folio *folio = page_folio(page);
> +
>  	if (cpu_has_dc_aliases &&
> -	    page_mapcount(page) && !Page_dcache_dirty(page)) {
> +	    folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
>  		void *vto = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
>  		memcpy(vto, src, len);
>  		kunmap_coherent();
>  	} else {
>  		memcpy(dst, src, len);
>  		if (cpu_has_dc_aliases)
> -			SetPageDcacheDirty(page);
> +			folio_set_dcache_dirty(folio);
>  	}
>  	if (vma->vm_flags & VM_EXEC)
>  		flush_cache_page(vma, vaddr, page_to_pfn(page));
> @@ -212,15 +215,17 @@ void copy_from_user_page(struct vm_area_struct *vma,
>  	struct page *page, unsigned long vaddr, void *dst, const void *src,
>  	unsigned long len)
>  {
> +	struct folio *folio = page_folio(page);
> +
>  	if (cpu_has_dc_aliases &&
> -	    page_mapcount(page) && !Page_dcache_dirty(page)) {
> +	    folio_mapped(folio) && !folio_test_dcache_dirty(folio)) {
>  		void *vfrom = kmap_coherent(page, vaddr) + (vaddr & ~PAGE_MASK);
>  		memcpy(dst, vfrom, len);
>  		kunmap_coherent();
>  	} else {
>  		memcpy(dst, src, len);
>  		if (cpu_has_dc_aliases)
> -			SetPageDcacheDirty(page);
> +			folio_set_dcache_dirty(folio);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(copy_from_user_page);
> -- 
> 2.39.1
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 16/34] nios2: Implement the new page table range API
  2023-02-28 21:37 ` [PATCH v3 16/34] nios2: " Matthew Wilcox (Oracle)
@ 2023-03-03 12:49   ` Mike Rapoport
  0 siblings, 0 replies; 77+ messages in thread
From: Mike Rapoport @ 2023-03-03 12:49 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-arch, linux-kernel, Dinh Nguyen

On Tue, Feb 28, 2023 at 09:37:19PM +0000, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
> flush_dcache_folio().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
> from being per-page to per-folio.

This failed for me with 

  CC      arch/nios2/mm/cacheflush.o
arch/nios2/mm/cacheflush.c: In function 'flush_dcache_folio':
arch/nios2/mm/cacheflush.c:195:48: error: passing argument 2 of 'flush_aliases' from incompatible pointer type [-Werror=incompatible-pointer-types]
  195 |                         flush_aliases(mapping, folio);
      |                                                ^~~~~
      |                                                |
      |                                                struct folio *
arch/nios2/mm/cacheflush.c:74:71: note: expected 'struct page *' but argument is of type 'struct folio *'
   74 | static void flush_aliases(struct address_space *mapping, struct page *page)
      |                                                          ~~~~~~~~~~~~~^~~~
In file included from arch/nios2/mm/cacheflush.c:10:
arch/nios2/mm/cacheflush.c: At top level:
include/linux/export.h:57:43: error: redefinition of '__ksymtab_flush_dcache_folio'
   57 |         static const struct kernel_symbol __ksymtab_##sym               \
      |                                           ^~~~~~~~~~
include/linux/export.h:96:9: note: in expansion of macro '__KSYMTAB_ENTRY'
   96 |         __KSYMTAB_ENTRY(sym, sec)
      |         ^~~~~~~~~~~~~~~
include/linux/export.h:140:41: note: in expansion of macro '___EXPORT_SYMBOL'
  140 | #define __EXPORT_SYMBOL(sym, sec, ns)   ___EXPORT_SYMBOL(sym, sec, ns)
      |                                         ^~~~~~~~~~~~~~~~
include/linux/export.h:147:41: note: in expansion of macro '__EXPORT_SYMBOL'
  147 | #define _EXPORT_SYMBOL(sym, sec)        __EXPORT_SYMBOL(sym, sec, "")
      |                                         ^~~~~~~~~~~~~~~
include/linux/export.h:150:41: note: in expansion of macro '_EXPORT_SYMBOL'
  150 | #define EXPORT_SYMBOL(sym)              _EXPORT_SYMBOL(sym, "")
      |                                         ^~~~~~~~~~~~~~
arch/nios2/mm/cacheflush.c:207:1: note: in expansion of macro 'EXPORT_SYMBOL'
  207 | EXPORT_SYMBOL(flush_dcache_folio);
      | ^~~~~~~~~~~~~
include/linux/export.h:57:43: note: previous definition of '__ksymtab_flush_dcache_folio' with type 'const struct kernel_symbol'
   57 |         static const struct kernel_symbol __ksymtab_##sym               \
      |                                           ^~~~~~~~~~
include/linux/export.h:96:9: note: in expansion of macro '__KSYMTAB_ENTRY'
   96 |         __KSYMTAB_ENTRY(sym, sec)
      |         ^~~~~~~~~~~~~~~
include/linux/export.h:140:41: note: in expansion of macro '___EXPORT_SYMBOL'
  140 | #define __EXPORT_SYMBOL(sym, sec, ns)   ___EXPORT_SYMBOL(sym, sec, ns)
      |                                         ^~~~~~~~~~~~~~~~
include/linux/export.h:147:41: note: in expansion of macro '__EXPORT_SYMBOL'
  147 | #define _EXPORT_SYMBOL(sym, sec)        __EXPORT_SYMBOL(sym, sec, "")
      |                                         ^~~~~~~~~~~~~~~
include/linux/export.h:150:41: note: in expansion of macro '_EXPORT_SYMBOL'
  150 | #define EXPORT_SYMBOL(sym)              _EXPORT_SYMBOL(sym, "")
      |                                         ^~~~~~~~~~~~~~
arch/nios2/mm/cacheflush.c:201:1: note: in expansion of macro 'EXPORT_SYMBOL'
  201 | EXPORT_SYMBOL(flush_dcache_folio);
      | ^~~~~~~~~~~~~
arch/nios2/mm/cacheflush.c: In function 'update_mmu_cache_range':
arch/nios2/mm/cacheflush.c:235:40: error: passing argument 2 of 'flush_aliases' from incompatible pointer type [-Werror=incompatible-pointer-types]
  235 |                 flush_aliases(mapping, folio);
      |                                        ^~~~~
      |                                        |
      |                                        struct folio *
arch/nios2/mm/cacheflush.c:74:71: note: expected 'struct page *' but argument is of type 'struct folio *'
   74 | static void flush_aliases(struct address_space *mapping, struct page *page)
      |                                                          ~~~~~~~~~~~~~^~~~
cc1: some warnings being treated as errors
make[4]: *** [scripts/Makefile.build:252: arch/nios2/mm/cacheflush.o] Error 1
make[3]: *** [scripts/Makefile.build:494: arch/nios2/mm] Error 2
make[2]: *** [scripts/Makefile.build:494: arch/nios2] Error 2
make[1]: *** [Makefile:2028: .] Error 2
make[1]: Leaving directory '/home/mike/build/cross/nios2/defconfig'
make: *** [Makefile:226: __sub-make] Error 2

 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Dinh Nguyen <dinguyen@kernel.org>
> ---
>  arch/nios2/include/asm/cacheflush.h |  6 ++-
>  arch/nios2/include/asm/pgtable.h    | 27 +++++++++----
>  arch/nios2/mm/cacheflush.c          | 61 ++++++++++++++++-------------
>  3 files changed, 58 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/nios2/include/asm/cacheflush.h b/arch/nios2/include/asm/cacheflush.h
> index d0b71dd71287..8624ca83cffe 100644
> --- a/arch/nios2/include/asm/cacheflush.h
> +++ b/arch/nios2/include/asm/cacheflush.h
> @@ -29,9 +29,13 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
>  	unsigned long pfn);
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  void flush_dcache_page(struct page *page);
> +void flush_dcache_folio(struct folio *folio);
> +#define flush_dcache_folio flush_dcache_folio
>  
>  extern void flush_icache_range(unsigned long start, unsigned long end);
> -extern void flush_icache_page(struct vm_area_struct *vma, struct page *page);
> +void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
> +		unsigned int nr);
> +#define flush_icache_page(vma, page) flush_icache_pages(vma, page, 1);
>  
>  #define flush_cache_vmap(start, end)		flush_dcache_range(start, end)
>  #define flush_cache_vunmap(start, end)		flush_dcache_range(start, end)
> diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
> index 0f5c2564e9f5..8a77821a17a5 100644
> --- a/arch/nios2/include/asm/pgtable.h
> +++ b/arch/nios2/include/asm/pgtable.h
> @@ -178,15 +178,23 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
>  	*ptep = pteval;
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pteval)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
>  {
> -	unsigned long paddr = (unsigned long)page_to_virt(pte_page(pteval));
> -
> -	flush_dcache_range(paddr, paddr + PAGE_SIZE);
> -	set_pte(ptep, pteval);
> +	unsigned long paddr = (unsigned long)page_to_virt(pte_page(pte));
> +
> +	flush_dcache_range(paddr, paddr + nr * PAGE_SIZE);
> +	for (;;) {
> +		set_pte(ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		pte_val(pte) += 1;
> +	}
>  }
>  
> +#define set_pte_at(mm, addr, ptep, pte)	set_ptes(mm, addr, ptep, pte, 1)
> +
>  static inline int pmd_none(pmd_t pmd)
>  {
>  	return (pmd_val(pmd) ==
> @@ -273,7 +281,10 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>  extern void __init paging_init(void);
>  extern void __init mmu_init(void);
>  
> -extern void update_mmu_cache(struct vm_area_struct *vma,
> -			     unsigned long address, pte_t *pte);
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *ptep, unsigned int nr);
> +
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  
>  #endif /* _ASM_NIOS2_PGTABLE_H */
> diff --git a/arch/nios2/mm/cacheflush.c b/arch/nios2/mm/cacheflush.c
> index 6aa9257c3ede..471485a84b2c 100644
> --- a/arch/nios2/mm/cacheflush.c
> +++ b/arch/nios2/mm/cacheflush.c
> @@ -138,10 +138,11 @@ void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
>  		__flush_icache(start, end);
>  }
>  
> -void flush_icache_page(struct vm_area_struct *vma, struct page *page)
> +void flush_icache_pages(struct vm_area_struct *vma, struct page *page,
> +		unsigned int nr)
>  {
>  	unsigned long start = (unsigned long) page_address(page);
> -	unsigned long end = start + PAGE_SIZE;
> +	unsigned long end = start + nr * PAGE_SIZE;
>  
>  	__flush_dcache(start, end);
>  	__flush_icache(start, end);
> @@ -158,19 +159,19 @@ void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
>  		__flush_icache(start, end);
>  }
>  
> -void __flush_dcache_page(struct address_space *mapping, struct page *page)
> +void __flush_dcache_folio(struct address_space *mapping, struct folio *folio)
>  {
>  	/*
>  	 * Writeback any data associated with the kernel mapping of this
>  	 * page.  This ensures that data in the physical page is mutually
>  	 * coherent with the kernels mapping.
>  	 */
> -	unsigned long start = (unsigned long)page_address(page);
> +	unsigned long start = (unsigned long)folio_address(folio);
>  
> -	__flush_dcache(start, start + PAGE_SIZE);
> +	__flush_dcache(start, start + folio_size(folio));
>  }
>  
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
>  	struct address_space *mapping;
>  
> @@ -178,32 +179,38 @@ void flush_dcache_page(struct page *page)
>  	 * The zero page is never written to, so never has any dirty
>  	 * cache lines, and therefore never needs to be flushed.
>  	 */
> -	if (page == ZERO_PAGE(0))
> +	if (is_zero_pfn(folio_pfn(folio)))
>  		return;
>  
> -	mapping = page_mapping_file(page);
> +	mapping = folio_flush_mapping(folio);
>  
>  	/* Flush this page if there are aliases. */
>  	if (mapping && !mapping_mapped(mapping)) {
> -		clear_bit(PG_dcache_clean, &page->flags);
> +		clear_bit(PG_dcache_clean, &folio->flags);
>  	} else {
> -		__flush_dcache_page(mapping, page);
> +		__flush_dcache_folio(mapping, folio);
>  		if (mapping) {
> -			unsigned long start = (unsigned long)page_address(page);
> -			flush_aliases(mapping,  page);
> -			flush_icache_range(start, start + PAGE_SIZE);
> +			unsigned long start = (unsigned long)folio_address(folio);
> +			flush_aliases(mapping, folio);
> +			flush_icache_range(start, start + folio_size(folio));
>  		}
> -		set_bit(PG_dcache_clean, &page->flags);
> +		set_bit(PG_dcache_clean, &folio->flags);
>  	}
>  }
> -EXPORT_SYMBOL(flush_dcache_page);
> +EXPORT_SYMBOL(flush_dcache_folio);
> +
> +void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
> +EXPORT_SYMBOL(flush_dcache_folio);
>  
> -void update_mmu_cache(struct vm_area_struct *vma,
> -		      unsigned long address, pte_t *ptep)
> +void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
> +		pte_t *ptep, unsigned int nr)
>  {
>  	pte_t pte = *ptep;
>  	unsigned long pfn = pte_pfn(pte);
> -	struct page *page;
> +	struct folio *folio;
>  	struct address_space *mapping;
>  
>  	reload_tlb_page(vma, address, pte);
> @@ -215,19 +222,19 @@ void update_mmu_cache(struct vm_area_struct *vma,
>  	* The zero page is never written to, so never has any dirty
>  	* cache lines, and therefore never needs to be flushed.
>  	*/
> -	page = pfn_to_page(pfn);
> -	if (page == ZERO_PAGE(0))
> +	if (is_zero_pfn(pfn))
>  		return;
>  
> -	mapping = page_mapping_file(page);
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> -		__flush_dcache_page(mapping, page);
> +	folio = page_folio(pfn_to_page(pfn));
> +	mapping = folio_flush_mapping(folio);
> +	if (!test_and_set_bit(PG_dcache_clean, &folio->flags))
> +		__flush_dcache_folio(mapping, folio);
>  
> -	if(mapping)
> -	{
> -		flush_aliases(mapping, page);
> +	if (mapping) {
> +		flush_aliases(mapping, folio);
>  		if (vma->vm_flags & VM_EXEC)
> -			flush_icache_page(vma, page);
> +			flush_icache_pages(vma, &folio->page,
> +					folio_nr_pages(folio));
>  	}
>  }
>  
> -- 
> 2.39.1
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 30/34] mm: Use flush_icache_pages() in do_set_pmd()
  2023-02-28 21:37 ` [PATCH v3 30/34] mm: Use flush_icache_pages() in do_set_pmd() Matthew Wilcox (Oracle)
@ 2023-03-03 14:02   ` Mike Rapoport
  2023-03-03 16:02     ` Matthew Wilcox
  0 siblings, 1 reply; 77+ messages in thread
From: Mike Rapoport @ 2023-03-03 14:02 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-arch, linux-kernel

On Tue, Feb 28, 2023 at 09:37:33PM +0000, Matthew Wilcox (Oracle) wrote:
> Push the iteration over each page down to the architectures (many
> can flush the entire THP without iteration).
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/memory.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index bfa3100ec5a3..69e844d5f75c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4222,8 +4222,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>  	if (unlikely(!pmd_none(*vmf->pmd)))
>  		goto out;
>  
> -	for (i = 0; i < HPAGE_PMD_NR; i++)
> -		flush_icache_page(vma, page + i);
> +	flush_icache_pages(vma, page, HPAGE_PMD_NR);
>  
>  	entry = mk_huge_pmd(page, vma->vm_page_prot);
>  	if (write)
> -- 
> 2.39.1

I get this:

  CC      mm/memory.o
/home/mike/git/linux/mm/memory.c: In function 'do_set_pmd':
/home/mike/git/linux/mm/memory.c:4191:13: warning: unused variable 'i' [-Wunused-variable]
 4191 |         int i;
      |             ^

And the patch here makes it go away:

diff --git a/mm/memory.c b/mm/memory.c
index cc7845ff09ba..c359fb8643e5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4188,7 +4188,6 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
 	pmd_t entry;
-	int i;
 	vm_fault_t ret = VM_FAULT_FALLBACK;
 
 	if (!transhuge_vma_suitable(vma, haddr))

-- 
Sincerely yours,
Mike.

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 00/34] New page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (33 preceding siblings ...)
  2023-02-28 21:37 ` [PATCH v3 34/34] filemap: Batch PTE mappings Matthew Wilcox (Oracle)
@ 2023-03-03 14:19 ` Mike Rapoport
  2023-03-05 10:15 ` Geert Uytterhoeven
  2023-03-09 11:09 ` Ryan Roberts
  36 siblings, 0 replies; 77+ messages in thread
From: Mike Rapoport @ 2023-03-03 14:19 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-arch, linux-kernel

On Tue, Feb 28, 2023 at 09:37:03PM +0000, Matthew Wilcox (Oracle) wrote:
> This patchset changes the API used by the MM to set up page table entries.
> The four APIs are:
>     set_ptes(mm, addr, ptep, pte, nr)
>     update_mmu_cache_range(vma, addr, ptep, nr)
>     flush_dcache_folio(folio)
>     flush_icache_pages(vma, page, nr)
> 
> flush_dcache_folio() isn't technically new, but no architecture
> implemented it, so I've done that for you.  The old APIs remain around
> but are mostly implemented by calling the new interfaces.
> 
> The new APIs are based around setting up N page table entries at once.
> The N entries belong to the same PMD, the same folio and the same VMA,
> so ptep++ is a legitimate operation, and locking is taken care of for
> you.  Some architectures can do a better job of it than just a loop,
> but I have hesitated to make too deep a change to architectures I don't
> understand well.

The new set_ptes() looks unnecessarily duplicated all over arch/
What do you say about adding the patch below on top of the series?
Ideally it should be split into per-arch bits, but I can send it separately
as a cleanup on top.

diff --git a/arch/alpha/include/asm/pgtable.h b/arch/alpha/include/asm/pgtable.h
index 1e3354e9731b..65fb9e66675d 100644
--- a/arch/alpha/include/asm/pgtable.h
+++ b/arch/alpha/include/asm/pgtable.h
@@ -37,6 +37,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pte) += 1UL << 32;
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /* PMD_SHIFT determines the size of the area a second-level page table can map */
diff --git a/arch/arc/include/asm/pgtable-bits-arcv2.h b/arch/arc/include/asm/pgtable-bits-arcv2.h
index 4a1b2ce204c6..06d8039180c0 100644
--- a/arch/arc/include/asm/pgtable-bits-arcv2.h
+++ b/arch/arc/include/asm/pgtable-bits-arcv2.h
@@ -100,19 +100,6 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 	return __pte((pte_val(pte) & _PAGE_CHG_MASK) | pgprot_val(newprot));
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 void update_mmu_cache_range(struct vm_area_struct *vma, unsigned long address,
 		      pte_t *ptep, unsigned int nr);
 
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 6525ac82bd50..0d326b201797 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -209,6 +209,7 @@ extern void __sync_icache_dcache(pte_t pteval);
 
 void set_ptes(struct mm_struct *mm, unsigned long addr,
 		      pte_t *ptep, pte_t pteval, unsigned int nr);
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline pte_t clear_pte_bit(pte_t pte, pgprot_t prot)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4d1b79dbff16..a8d6460c5c9f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -369,6 +369,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pte) += PAGE_SIZE;
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index a30ae048233e..e426f1820deb 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -91,20 +91,6 @@ static inline void set_pte(pte_t *p, pte_t pte)
 	smp_mb();
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 {
 	unsigned long ptr;
diff --git a/arch/hexagon/include/asm/pgtable.h b/arch/hexagon/include/asm/pgtable.h
index f58f1d920769..67ab91662e83 100644
--- a/arch/hexagon/include/asm/pgtable.h
+++ b/arch/hexagon/include/asm/pgtable.h
@@ -345,26 +345,6 @@ static inline int pte_exec(pte_t pte)
 #define pte_pfn(pte) (pte_val(pte) >> PAGE_SHIFT)
 #define set_pmd(pmdptr, pmdval) (*(pmdptr) = (pmdval))
 
-/*
- * set_ptes - update page table and do whatever magic may be
- * necessary to make the underlying hardware/firmware take note.
- *
- * VM may require a virtual instruction to alert the MMU.
- */
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline unsigned long pmd_page_vaddr(pmd_t pmd)
 {
 	return (unsigned long)__va(pmd_val(pmd) & PAGE_MASK);
diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 0c2be4ea664b..65a6e3b30721 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -303,19 +303,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	*ptep = pteval;
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, add, ptep, pte, 1)
-
 /*
  * Make page protection values cacheable, uncacheable, or write-
  * combining.  Note that "protection" is really a misnomer here as the
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 9154d317ffb4..d4b0ca7b4bf7 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -346,6 +346,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
diff --git a/arch/m68k/include/asm/pgtable_mm.h b/arch/m68k/include/asm/pgtable_mm.h
index 400206c17c97..8c2db20abdb6 100644
--- a/arch/m68k/include/asm/pgtable_mm.h
+++ b/arch/m68k/include/asm/pgtable_mm.h
@@ -32,20 +32,6 @@
 		*(pteptr) = (pteval);				\
 	} while(0)
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 /* PMD_SHIFT determines the size of the area a second-level page table can map */
 #if CONFIG_PGTABLE_LEVELS == 3
 #define PMD_SHIFT	18
diff --git a/arch/microblaze/include/asm/pgtable.h b/arch/microblaze/include/asm/pgtable.h
index a01e1369b486..3e7643a986ad 100644
--- a/arch/microblaze/include/asm/pgtable.h
+++ b/arch/microblaze/include/asm/pgtable.h
@@ -335,20 +335,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 	*ptep = pte;
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += 1 << PFN_SHIFT_OFFSET;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 		unsigned long address, pte_t *ptep)
diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index 0cf0455e6ae8..18b77567ef72 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -108,6 +108,7 @@ do {									\
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_t *ptep, pte_t pte, unsigned int nr);
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
diff --git a/arch/nios2/include/asm/pgtable.h b/arch/nios2/include/asm/pgtable.h
index 8a77821a17a5..2a994b225a41 100644
--- a/arch/nios2/include/asm/pgtable.h
+++ b/arch/nios2/include/asm/pgtable.h
@@ -193,6 +193,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte)	set_ptes(mm, addr, ptep, pte, 1)
 
 static inline int pmd_none(pmd_t pmd)
diff --git a/arch/openrisc/include/asm/pgtable.h b/arch/openrisc/include/asm/pgtable.h
index 1a7077150d7b..8f27730a9ab7 100644
--- a/arch/openrisc/include/asm/pgtable.h
+++ b/arch/openrisc/include/asm/pgtable.h
@@ -47,20 +47,6 @@ extern void paging_init(void);
  */
 #define set_pte(pteptr, pteval) ((*(pteptr)) = (pteval))
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 /*
  * (pmds are folded into pgds so this doesn't get actually called,
  * but the define is needed for a generic inline function.)
diff --git a/arch/parisc/include/asm/pgtable.h b/arch/parisc/include/asm/pgtable.h
index 78ee9816f423..cd04e85cb012 100644
--- a/arch/parisc/include/asm/pgtable.h
+++ b/arch/parisc/include/asm/pgtable.h
@@ -73,6 +73,7 @@ extern void __update_cache(pte_t pte);
 		mb();				\
 	} while(0)
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index bf1263ff7e67..f10b6c2f8ade 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -43,6 +43,7 @@ struct mm_struct;
 
 void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 		pte_t pte, unsigned int nr);
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 #define update_mmu_cache(vma, addr, ptep) \
 	update_mmu_cache_range(vma, addr, ptep, 1);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 3a3a776fc047..8bc49496f8a6 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -473,6 +473,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
 	}
 }
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 static inline void pte_clear(struct mm_struct *mm,
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 46bf475116f1..2fc20558af6b 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1346,6 +1346,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
diff --git a/arch/sh/include/asm/pgtable_32.h b/arch/sh/include/asm/pgtable_32.h
index 03ba1834e126..d2f17e944bea 100644
--- a/arch/sh/include/asm/pgtable_32.h
+++ b/arch/sh/include/asm/pgtable_32.h
@@ -319,6 +319,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
 /*
diff --git a/arch/sparc/include/asm/pgtable_32.h b/arch/sparc/include/asm/pgtable_32.h
index 47ae55ea1837..7fbc7772a9b7 100644
--- a/arch/sparc/include/asm/pgtable_32.h
+++ b/arch/sparc/include/asm/pgtable_32.h
@@ -101,20 +101,6 @@ static inline void set_pte(pte_t *ptep, pte_t pteval)
 	srmmu_swap((unsigned long *)ptep, pte_val(pteval));
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline int srmmu_device_memory(unsigned long x)
 {
 	return ((x & 0xF0000000) != 0);
diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index d5c0088e0c6a..fddca662ba1b 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -924,6 +924,7 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+#define set_ptes set_ptes
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1);
 
 #define pte_clear(mm,addr,ptep)		\
diff --git a/arch/um/include/asm/pgtable.h b/arch/um/include/asm/pgtable.h
index ca78c90ae74f..60d2b20ff218 100644
--- a/arch/um/include/asm/pgtable.h
+++ b/arch/um/include/asm/pgtable.h
@@ -242,20 +242,6 @@ static inline void set_pte(pte_t *pteptr, pte_t pteval)
 	if(pte_present(*pteptr)) *pteptr = pte_mknewprot(*pteptr);
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte)	set_ptes(mm, addr, ptep, pte, 1)
-
 #define __HAVE_ARCH_PTE_SAME
 static inline int pte_same(pte_t pte_a, pte_t pte_b)
 {
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index f424371ea143..1e5fd352880d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1019,22 +1019,6 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
 	return res;
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-			      pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
-
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte = __pte(pte_val(pte) + PAGE_SIZE);
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 			      pmd_t *pmdp, pmd_t pmd)
 {
diff --git a/arch/xtensa/include/asm/pgtable.h b/arch/xtensa/include/asm/pgtable.h
index 293101530541..adeee96518b9 100644
--- a/arch/xtensa/include/asm/pgtable.h
+++ b/arch/xtensa/include/asm/pgtable.h
@@ -306,20 +306,6 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 	update_pte(ptep, pte);
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
-		pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	for (;;) {
-		set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte_val(pte) += PAGE_SIZE;
-	}
-}
-
-#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
-
 static inline void
 set_pmd(pmd_t *pmdp, pmd_t pmdval)
 {
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c63cd44777ec..ef204712eda3 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -172,6 +172,24 @@ static inline int pmd_young(pmd_t pmd)
 }
 #endif
 
+#ifndef set_ptes
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+			    pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
+
+	for (;;) {
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte = __pte(pte_val(pte) + PAGE_SIZE);
+	}
+}
+#define set_ptes set_ptes
+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma,
 				 unsigned long address, pte_t *ptep,

-- 
Sincerely yours,
Mike.

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 11/34] ia64: Implement the new page table range API
  2023-03-03 11:56     ` Mike Rapoport
@ 2023-03-03 14:36       ` Matthew Wilcox
  -1 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox @ 2023-03-03 14:36 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: linux-mm, linux-arch, linux-kernel, linux-ia64

On Fri, Mar 03, 2023 at 01:56:36PM +0200, Mike Rapoport wrote:
> On Tue, Feb 28, 2023 at 09:37:14PM +0000, Matthew Wilcox (Oracle) wrote:
> >  void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
> >  {
> > -	unsigned long pfn = PHYS_PFN(paddr);
> > +	struct folio *folio = page_folio(phys_to_page(paddr));
> > +	ssize_t left = size;
> > +	size_t offset = offset_in_folio(folio, paddr);
> 
> Build of defconfig failed miserably for me without this:
> 
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 12aef25944aa..0775e7870257 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -69,7 +69,8 @@ __ia64_sync_icache_dcache (pte_t pte)
>   */
>  void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
>  {
> -	struct folio *folio = page_folio(phys_to_page(paddr));
> +	unsigned long pfn = __phys_to_pfn(paddr);
> +	struct folio *folio = page_folio(pfn_to_page(pfn));

Huh, TIL that only some architectures have phys_to_page().  Thanks.
I'm going to use PHYS_PFN instead of __phys_to_pfn just to reduce the
diff.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 11/34] ia64: Implement the new page table range API
@ 2023-03-03 14:36       ` Matthew Wilcox
  0 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox @ 2023-03-03 14:36 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: linux-mm, linux-arch, linux-kernel, linux-ia64

On Fri, Mar 03, 2023 at 01:56:36PM +0200, Mike Rapoport wrote:
> On Tue, Feb 28, 2023 at 09:37:14PM +0000, Matthew Wilcox (Oracle) wrote:
> >  void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
> >  {
> > -	unsigned long pfn = PHYS_PFN(paddr);
> > +	struct folio *folio = page_folio(phys_to_page(paddr));
> > +	ssize_t left = size;
> > +	size_t offset = offset_in_folio(folio, paddr);
> 
> Build of defconfig failed miserably for me without this:
> 
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 12aef25944aa..0775e7870257 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -69,7 +69,8 @@ __ia64_sync_icache_dcache (pte_t pte)
>   */
>  void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
>  {
> -	struct folio *folio = page_folio(phys_to_page(paddr));
> +	unsigned long pfn = __phys_to_pfn(paddr);
> +	struct folio *folio = page_folio(pfn_to_page(pfn));

Huh, TIL that only some architectures have phys_to_page().  Thanks.
I'm going to use PHYS_PFN instead of __phys_to_pfn just to reduce the
diff.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 14/34] microblaze: Implement the new page table range API
  2023-03-03 10:53   ` Mike Rapoport
@ 2023-03-03 14:38     ` Matthew Wilcox
  0 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox @ 2023-03-03 14:38 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: linux-mm, linux-arch, linux-kernel, Michal Simek

On Fri, Mar 03, 2023 at 12:53:36PM +0200, Mike Rapoport wrote:
> On Tue, Feb 28, 2023 at 09:37:17PM +0000, Matthew Wilcox (Oracle) wrote:
> > +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> > +		pte_t *ptep, pte_t pte, unsigned int nr)
> >  {
> > -	*ptep = pte;
> > +	for (;;) {
> > +		set_pte(ptep, pte);
> > +		if (--nr == 0)
> > +			break;
> > +		ptep++;
> > +		pte_val(pte) += 1 << PFN_SHIFT_OFFSET;
> 
> This is the same as
> 
> 		pte_val(pte) += PAGE_SIZE;
> 
> isn't it?

Looks like it.  Based on this:

$ git grep PFN_SHIFT_OFFSET arch/microblaze/
arch/microblaze/include/asm/pgtable.h:#define PFN_SHIFT_OFFSET  (PAGE_SHIFT)
arch/microblaze/include/asm/pgtable.h:#define pte_pfn(x)                (pte_val(x) >> PFN_SHIFT_OFFSET)

I only looked at the definition of pte_pfn(); I didn't look to see how
PFN_SHIFT_OFFSET was defined.  I don't see the need to change it from
what I have though?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 30/34] mm: Use flush_icache_pages() in do_set_pmd()
  2023-03-03 14:02   ` Mike Rapoport
@ 2023-03-03 16:02     ` Matthew Wilcox
  0 siblings, 0 replies; 77+ messages in thread
From: Matthew Wilcox @ 2023-03-03 16:02 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: linux-mm, linux-arch, linux-kernel

On Fri, Mar 03, 2023 at 04:02:01PM +0200, Mike Rapoport wrote:
> On Tue, Feb 28, 2023 at 09:37:33PM +0000, Matthew Wilcox (Oracle) wrote:
> > Push the iteration over each page down to the architectures (many
> > can flush the entire THP without iteration).
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  mm/memory.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index bfa3100ec5a3..69e844d5f75c 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4222,8 +4222,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> >  	if (unlikely(!pmd_none(*vmf->pmd)))
> >  		goto out;
> >  
> > -	for (i = 0; i < HPAGE_PMD_NR; i++)
> > -		flush_icache_page(vma, page + i);
> > +	flush_icache_pages(vma, page, HPAGE_PMD_NR);
> >  
> >  	entry = mk_huge_pmd(page, vma->vm_page_prot);
> >  	if (write)
> > -- 
> > 2.39.1
> 
> I get this:
> 
>   CC      mm/memory.o
> /home/mike/git/linux/mm/memory.c: In function 'do_set_pmd':
> /home/mike/git/linux/mm/memory.c:4191:13: warning: unused variable 'i' [-Wunused-variable]
>  4191 |         int i;

Yep, caught that one last night.  My build test must have been with a
config that didn't include THP.  Thanks.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 18/34] parisc: Implement the new page table range API
  2023-03-02 20:40     ` John David Anglin
@ 2023-03-04 16:27       ` John David Anglin
  0 siblings, 0 replies; 77+ messages in thread
From: John David Anglin @ 2023-03-04 16:27 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm, linux-arch
  Cc: linux-kernel, James E.J. Bottomley, Helge Deller, linux-parisc

On 2023-03-02 3:40 p.m., John David Anglin wrote:
> On 2023-03-02 11:43 a.m., John David Anglin wrote:
>> On 2023-02-28 4:37 p.m., Matthew Wilcox (Oracle) wrote:
>>> Add set_ptes(), update_mmu_cache_range(), flush_dcache_folio()
>>> and flush_icache_pages().  Change the PG_arch_1 (aka PG_dcache_dirty) flag
>>> from being per-page to per-folio.
>> I have tested this change on rp3440 at mainline commit e492250d5252635b6c97d52eddf2792ec26f1ec1
>> and c8000 at mainline commit ee3f96b164688dae21e2466a57f2e806b64e8a37.
> Here's another one:
>
> ------------[ cut here ]------------
> kernel BUG at mm/memory.c:3865!
> CPU: 1 PID: 6972 Comm: sbuild Not tainted 6.2.0+ #1
> Hardware name: 9000/800/rp3440
>
>      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> PSW: 00001000000001101111111100001111 Not tainted
> r00-03  000000000806ff0f 000000004fab8d40 00000000404584b0 000000004fab8d40
> r04-07  0000000040c2f4c0 0000000047fe60c0 000000004fab8b98 0000000000000953
> r08-11  000000004de3de00 0000000000000000 0000000047fe60c0 0000004093ff4660
> r12-15  0000000000000001 0000000047fe60c0 0000000040000540 000000022f8e9540
> r16-19  0000000000000000 000000004c694c40 000000004fab8860 00000000000003d0
> r20-23  0000000007be3a40 0000000000000fff 0000000000000000 000000004109f1a0
> r24-27  0000000000000000 0000000000000cc0 0000000046de3a68 0000000040c2f4c0
> r28-31  80e00000000a0435 000000004fab8df0 000000004fab8e20 0000000000000001
> sr00-03  0000000000207c00 0000000000000000 0000000000000000 0000000002f11c00
> sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
>
> IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004045908c 0000000040459090
>  IIR: 03ffe01f    ISR: 0000000000000000  IOR: 0000000000000000
>  CPU:        1   CR30: 0000004095d64c20 CR31: ffffffffffffffff
>  ORIG_R28: 000000001c569ad0
>  IAOQ[0]: do_swap_page+0x108c/0x1168
>  IAOQ[1]: do_swap_page+0x1090/0x1168
>  RP(r2): do_swap_page+0x4b0/0x1168
> Backtrace:
>  [<000000004045a554>] handle_pte_fault+0x244/0x358
>  [<000000004045c58c>] __handle_mm_fault+0x104/0x1b8
>  [<000000004045c81c>] handle_mm_fault+0x1dc/0x318
>  [<000000004044cb38>] faultin_page+0xa8/0x178
>  [<000000004044e848>] __get_user_pages+0x328/0x560
>  [<0000000040450ac4>] get_dump_page+0x9c/0x128
>  [<0000000040596cb8>] dump_user_range+0xc0/0x2d8
>  [<000000004058e790>] elf_core_dump+0x5f8/0x708
>  [<0000000040596384>] do_coredump+0xc2c/0x14a0
>  [<0000000040259040>] get_signal+0x4a8/0xb60
>  [<000000004021a570>] do_signal+0x50/0x228
>  [<000000004021ab38>] do_notify_resume+0x68/0x150
>  [<0000000040203ee0>] syscall_do_signal+0x54/0xa0
Removed new page table API change and still see a swap issue on rp3440.  So, these bugs are probably
unrelated to the API change.

get_swap_device: Bad swap file entry 600000000014ee20
get_swap_device: Bad swap file entry 600000000014ee20
[...]
get_swap_device: Bad swap file entry 600000000014ee20
_swap_info_get: Bad swap file entry 600000000014ee20
BUG: Bad page map in process sh  pte:14ee2418 pmd:01372913
addr:00000000f8406000 vm_flags:00000075 anon_vma:0000000000000000 mapping:000000007f67e1a8 index:25
file:libc.so.6 fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] read_folio:xfs_vm_read_folio [xfs]
CPU: 3 PID: 12702 Comm: sh Not tainted 6.2.0+ #1
Hardware name: 9000/800/rp3440
Backtrace:
  [<000000004020ac50>] show_stack+0x70/0x90
  [<0000000040b7c148>] dump_stack_lvl+0xd8/0x128
  [<0000000040b7c1cc>] dump_stack+0x34/0x48
  [<000000004045020c>] print_bad_pte+0x24c/0x318
  [<0000000040454f78>] zap_pte_range+0x908/0x990
  [<0000000040455238>] unmap_page_range+0x1d8/0x490
  [<00000000404556bc>] unmap_vmas+0x10c/0x1a8
  [<0000000040465278>] exit_mmap+0x198/0x4a0
  [<0000000040234a3c>] mmput+0x114/0x2a8
  [<0000000040243c10>] do_exit+0x4e0/0xc68
  [<00000000402446b8>] do_group_exit+0x68/0x128
  [<00000000402583fc>] get_signal+0xae4/0xb60
  [<0000000040219310>] do_signal+0x50/0x228
  [<00000000402198d8>] do_notify_resume+0x68/0x150
  [<00000000402030b4>] intr_check_sig+0x38/0x3c

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 29/34] mm: Rationalise flush_icache_pages() and flush_icache_page()
  2023-02-28 21:37 ` [PATCH v3 29/34] mm: Rationalise flush_icache_pages() and flush_icache_page() Matthew Wilcox (Oracle)
@ 2023-03-05  9:53   ` Geert Uytterhoeven
  0 siblings, 0 replies; 77+ messages in thread
From: Geert Uytterhoeven @ 2023-03-05  9:53 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-arch, linux-kernel

On Tue, Feb 28, 2023 at 10:39 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
> Move the default (no-op) implementation of flush_icache_pages()
> to <linux/cacheflush.h> from <asm-generic/cacheflush.h>.
> Remove the flush_icache_page() wrapper from each architecture
> into <linux/cacheflush.h>.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

>  arch/m68k/include/asm/cacheflush_mm.h   |  1 -

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 00/34] New page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (34 preceding siblings ...)
  2023-03-03 14:19 ` [PATCH v3 00/34] New page table range API Mike Rapoport
@ 2023-03-05 10:15 ` Geert Uytterhoeven
  2023-03-09 11:09 ` Ryan Roberts
  36 siblings, 0 replies; 77+ messages in thread
From: Geert Uytterhoeven @ 2023-03-05 10:15 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-arch, linux-kernel

Hi Willy,

On Tue, Feb 28, 2023 at 10:40 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
> This patchset changes the API used by the MM to set up page table entries.
> The four APIs are:
>     set_ptes(mm, addr, ptep, pte, nr)
>     update_mmu_cache_range(vma, addr, ptep, nr)
>     flush_dcache_folio(folio)
>     flush_icache_pages(vma, page, nr)
>
> flush_dcache_folio() isn't technically new, but no architecture
> implemented it, so I've done that for you.  The old APIs remain around
> but are mostly implemented by calling the new interfaces.
>
> The new APIs are based around setting up N page table entries at once.
> The N entries belong to the same PMD, the same folio and the same VMA,
> so ptep++ is a legitimate operation, and locking is taken care of for
> you.  Some architectures can do a better job of it than just a loop,
> but I have hesitated to make too deep a change to architectures I don't
> understand well.
>
> One thing I have changed in every architecture is that PG_arch_1 is now a
> per-folio bit instead of a per-page bit.  This was something that would
> have to happen eventually, and it makes sense to do it now rather than
> iterate over every page involved in a cache flush and figure out if it
> needs to happen.
>
> The point of all this is better performance, and Fengwei Yin has
> measured improvement on x86.  I suspect you'll see improvement on
> your architecture too.  Try the new will-it-scale test mentioned here:
> https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
> You'll need to run it on an XFS filesystem and have
> CONFIG_TRANSPARENT_HUGEPAGE set.

Thanks for your series!

> For testing, I've only run the code on x86.  If an x86->foo compiler
> exists in Debian, I've built defconfig.  I'm relying on the buildbots
> to tell me what I missed, and people who actually have the hardware to
> tell me if it actually works.

Seems to work fine on ARAnyM and qemu-system-m68k/virt, so
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 13/34] m68k: Implement the new page table range API
  2023-02-28 21:37 ` [PATCH v3 13/34] m68k: " Matthew Wilcox (Oracle)
@ 2023-03-05 10:16   ` Geert Uytterhoeven
  2023-03-05 15:28     ` Matthew Wilcox
  0 siblings, 1 reply; 77+ messages in thread
From: Geert Uytterhoeven @ 2023-03-05 10:16 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-arch, linux-kernel, linux-m68k

Hi Willy,

On Tue, Feb 28, 2023 at 10:37 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
> Add set_ptes(), update_mmu_cache_range(), flush_icache_pages() and
> flush_dcache_folio().
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Thanks for your patch!

> --- a/arch/m68k/include/asm/cacheflush_mm.h
> +++ b/arch/m68k/include/asm/cacheflush_mm.h
> @@ -220,24 +220,28 @@ static inline void flush_cache_page(struct vm_area_struct *vma, unsigned long vm
>
>  /* Push the page at kernel virtual address and clear the icache */
>  /* RZ: use cpush %bc instead of cpush %dc, cinv %ic */
> -static inline void __flush_page_to_ram(void *vaddr)
> +static inline void __flush_pages_to_ram(void *vaddr, unsigned int nr)
>  {
>         if (CPU_IS_COLDFIRE) {
>                 unsigned long addr, start, end;
>                 addr = ((unsigned long) vaddr) & ~(PAGE_SIZE - 1);
>                 start = addr & ICACHE_SET_MASK;
> -               end = (addr + PAGE_SIZE - 1) & ICACHE_SET_MASK;
> +               end = (addr + nr * PAGE_SIZE - 1) & ICACHE_SET_MASK;
>                 if (start > end) {
>                         flush_cf_bcache(0, end);
>                         end = ICACHE_MAX_ADDR;
>                 }
>                 flush_cf_bcache(start, end);
>         } else if (CPU_IS_040_OR_060) {
> -               __asm__ __volatile__("nop\n\t"
> -                                    ".chip 68040\n\t"
> -                                    "cpushp %%bc,(%0)\n\t"
> -                                    ".chip 68k"
> -                                    : : "a" (__pa(vaddr)));
> +               unsigned long paddr = __pa(vaddr);
> +
> +               while (nr--) {
> +                       __asm__ __volatile__("nop\n\t"
> +                                            ".chip 68040\n\t"
> +                                            "cpushp %%bc,(%0)\n\t"
> +                                            ".chip 68k"
> +                                            : : "a" (paddr + nr * PAGE_SIZE));

As gcc (9.5.0) keeps on calculating "paddr + nr * PAGE_SIZE"
inside the loop (albeit using a shift instead of a multiplication),
please use "paddr" here, followed by "paddr += PAGE_SIZE;".

Gr{oetje,eeting}s,

                        Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 13/34] m68k: Implement the new page table range API
  2023-03-05 10:16   ` Geert Uytterhoeven
@ 2023-03-05 15:28     ` Matthew Wilcox
  2023-03-05 16:48       ` Geert Uytterhoeven
  2023-03-05 20:44       ` Michael Schmitz
  0 siblings, 2 replies; 77+ messages in thread
From: Matthew Wilcox @ 2023-03-05 15:28 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: linux-mm, linux-arch, linux-kernel, linux-m68k

On Sun, Mar 05, 2023 at 11:16:13AM +0100, Geert Uytterhoeven wrote:
> > +               while (nr--) {
> > +                       __asm__ __volatile__("nop\n\t"
> > +                                            ".chip 68040\n\t"
> > +                                            "cpushp %%bc,(%0)\n\t"
> > +                                            ".chip 68k"
> > +                                            : : "a" (paddr + nr * PAGE_SIZE));
> 
> As gcc (9.5.0) keeps on calculating "paddr + nr * PAGE_SIZE"
> inside the loop (albeit using a shift instead of a multiplication),
> please use "paddr" here, followed by "paddr += PAGE_SIZE;".

Thanks.  So this?

+++ b/arch/m68k/include/asm/cacheflush_mm.h
@@ -235,13 +235,14 @@ static inline void __flush_pages_to_ram(void *vaddr, unsigned int nr)
        } else if (CPU_IS_040_OR_060) {
                unsigned long paddr = __pa(vaddr);
 
-               while (nr--) {
+               do {
                        __asm__ __volatile__("nop\n\t"
                                             ".chip 68040\n\t"
                                             "cpushp %%bc,(%0)\n\t"
                                             ".chip 68k"
-                                            : : "a" (paddr + nr * PAGE_SIZE));
-               }
+                                            : : "a" (paddr));
+                       paddr += PAGE_SIZE;
+               } while (--nr);
        } else {
                unsigned long _tmp;
                __asm__ __volatile__("movec %%cacr,%0\n\t"

Also, I noticed that I broke sun3.  It puts the PFN in bits 0-n instead
of 12-n.  New patch coming soon.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 13/34] m68k: Implement the new page table range API
  2023-03-05 15:28     ` Matthew Wilcox
@ 2023-03-05 16:48       ` Geert Uytterhoeven
  2023-03-05 20:44       ` Michael Schmitz
  1 sibling, 0 replies; 77+ messages in thread
From: Geert Uytterhoeven @ 2023-03-05 16:48 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-arch, linux-kernel, linux-m68k

Hi Willy,

On Sun, Mar 5, 2023 at 4:28 PM Matthew Wilcox <willy@infradead.org> wrote:
> On Sun, Mar 05, 2023 at 11:16:13AM +0100, Geert Uytterhoeven wrote:
> > > +               while (nr--) {
> > > +                       __asm__ __volatile__("nop\n\t"
> > > +                                            ".chip 68040\n\t"
> > > +                                            "cpushp %%bc,(%0)\n\t"
> > > +                                            ".chip 68k"
> > > +                                            : : "a" (paddr + nr * PAGE_SIZE));
> >
> > As gcc (9.5.0) keeps on calculating "paddr + nr * PAGE_SIZE"
> > inside the loop (albeit using a shift instead of a multiplication),
> > please use "paddr" here, followed by "paddr += PAGE_SIZE;".
>
> Thanks.  So this?
>
> +++ b/arch/m68k/include/asm/cacheflush_mm.h
> @@ -235,13 +235,14 @@ static inline void __flush_pages_to_ram(void *vaddr, unsigned int nr)
>         } else if (CPU_IS_040_OR_060) {
>                 unsigned long paddr = __pa(vaddr);
>
> -               while (nr--) {
> +               do {
>                         __asm__ __volatile__("nop\n\t"
>                                              ".chip 68040\n\t"
>                                              "cpushp %%bc,(%0)\n\t"
>                                              ".chip 68k"
> -                                            : : "a" (paddr + nr * PAGE_SIZE));
> -               }
> +                                            : : "a" (paddr));
> +                       paddr += PAGE_SIZE;
> +               } while (--nr);
>         } else {
>                 unsigned long _tmp;
>                 __asm__ __volatile__("movec %%cacr,%0\n\t"

LGTM. Might be safer to keep the "while (nr--) {", just in case someone
ever passes zero.

> Also, I noticed that I broke sun3.  It puts the PFN in bits 0-n instead
> of 12-n.  New patch coming soon.

Thanks, hadn't noticed (there are no sun3-specific code changes in
this series?)

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 13/34] m68k: Implement the new page table range API
  2023-03-05 15:28     ` Matthew Wilcox
  2023-03-05 16:48       ` Geert Uytterhoeven
@ 2023-03-05 20:44       ` Michael Schmitz
  2023-03-06  7:21         ` Geert Uytterhoeven
  1 sibling, 1 reply; 77+ messages in thread
From: Michael Schmitz @ 2023-03-05 20:44 UTC (permalink / raw)
  To: Matthew Wilcox, Geert Uytterhoeven
  Cc: linux-mm, linux-arch, linux-kernel, linux-m68k

Hi Matthew, Geert,

sorry, I missed that one when it got posted ...

On 6/03/23 04:28, Matthew Wilcox wrote:
> On Sun, Mar 05, 2023 at 11:16:13AM +0100, Geert Uytterhoeven wrote:
>>> +               while (nr--) {
>>> +                       __asm__ __volatile__("nop\n\t"
>>> +                                            ".chip 68040\n\t"
>>> +                                            "cpushp %%bc,(%0)\n\t"
>>> +                                            ".chip 68k"
>>> +                                            : : "a" (paddr + nr * PAGE_SIZE));
>> As gcc (9.5.0) keeps on calculating "paddr + nr * PAGE_SIZE"
>> inside the loop (albeit using a shift instead of a multiplication),
>> please use "paddr" here, followed by "paddr += PAGE_SIZE;".

Are we certain that contiguous vaddr always maps to contiguous paddr?

If not, I'd suggest we increment vaddr inside the loop, and use __pa() 
each time:

> +++ b/arch/m68k/include/asm/cacheflush_mm.h
> @@ -235,13 +235,14 @@ static inline void __flush_pages_to_ram(void *vaddr, unsigned int nr)
>          } else if (CPU_IS_040_OR_060) {
>   
> -               while (nr--) {
> +               do {
>                          __asm__ __volatile__("nop\n\t"
>                                               ".chip 68040\n\t"
>                                               "cpushp %%bc,(%0)\n\t"
>                                               ".chip 68k"
> -                                            : : "a" (paddr + nr * PAGE_SIZE));
> -               }
> +                                            : : "a" __pa(vaddr));
> +                       vaddr += PAGE_SIZE;
> +               } while (--nr);
>          } else {
>                  unsigned long _tmp;
>                  __asm__ __volatile__("movec %%cacr,%0\n\t"
>
(just edited Matthew's patch in the mail editor, untested, may not apply 
cleanly ...)

Cheers,

     Michael



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 13/34] m68k: Implement the new page table range API
  2023-03-05 20:44       ` Michael Schmitz
@ 2023-03-06  7:21         ` Geert Uytterhoeven
  2023-03-06 23:01           ` Michael Schmitz
  0 siblings, 1 reply; 77+ messages in thread
From: Geert Uytterhoeven @ 2023-03-06  7:21 UTC (permalink / raw)
  To: Michael Schmitz
  Cc: Matthew Wilcox, linux-mm, linux-arch, linux-kernel, linux-m68k

Hi Michael,

On Sun, Mar 5, 2023 at 9:44 PM Michael Schmitz <schmitzmic@gmail.com> wrote:
> On 6/03/23 04:28, Matthew Wilcox wrote:
> > On Sun, Mar 05, 2023 at 11:16:13AM +0100, Geert Uytterhoeven wrote:
> >>> +               while (nr--) {
> >>> +                       __asm__ __volatile__("nop\n\t"
> >>> +                                            ".chip 68040\n\t"
> >>> +                                            "cpushp %%bc,(%0)\n\t"
> >>> +                                            ".chip 68k"
> >>> +                                            : : "a" (paddr + nr * PAGE_SIZE));
> >> As gcc (9.5.0) keeps on calculating "paddr + nr * PAGE_SIZE"
> >> inside the loop (albeit using a shift instead of a multiplication),
> >> please use "paddr" here, followed by "paddr += PAGE_SIZE;".
>
> Are we certain that contiguous vaddr always maps to contiguous paddr?

For a general __flush_pages_to_ram() function, that would not be
guaranteed. But as this is meant for folios, it must be true:
https://elixir.bootlin.com/linux/latest/source/include/linux/mm_types.h#L320

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 13/34] m68k: Implement the new page table range API
  2023-03-06  7:21         ` Geert Uytterhoeven
@ 2023-03-06 23:01           ` Michael Schmitz
  0 siblings, 0 replies; 77+ messages in thread
From: Michael Schmitz @ 2023-03-06 23:01 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Matthew Wilcox, linux-mm, linux-arch, linux-kernel, linux-m68k

Hi Geert,

On 6/03/23 20:21, Geert Uytterhoeven wrote:
>> Are we certain that contiguous vaddr always maps to contiguous paddr?
> For a general __flush_pages_to_ram() function, that would not be
> guaranteed. But as this is meant for folios, it must be true:
> https://elixir.bootlin.com/linux/latest/source/include/linux/mm_types.h#L320

Thanks for explaining - that just leaves the problem of cowboys like 
myself abusing __flush_pages_to_ram(addr, nr) with nr > 1 for something 
that isn't a folio. Maybe a comment 'nr > 1 only valid on folios' would 
help ...

Cheers,

     Michael

>
> Gr{oetje,eeting}s,
>
>                          Geert
>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 08/34] arm64: Implement the new page table range API
  2023-02-28 21:37   ` Matthew Wilcox (Oracle)
@ 2023-03-09 11:03     ` Ryan Roberts
  -1 siblings, 0 replies; 77+ messages in thread
From: Ryan Roberts @ 2023-03-09 11:03 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm, linux-arch
  Cc: linux-kernel, Catalin Marinas, linux-arm-kernel

On 28/02/2023 21:37, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_dcache_clean flag from being per-page to per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> ---
>  arch/arm64/include/asm/cacheflush.h |  4 +++-
>  arch/arm64/include/asm/pgtable.h    | 25 ++++++++++++++------
>  arch/arm64/mm/flush.c               | 36 +++++++++++------------------
>  3 files changed, 35 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index 37185e978aeb..d115451ed263 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -114,7 +114,7 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
>  #define copy_to_user_page copy_to_user_page
>  
>  /*
> - * flush_dcache_page is used when the kernel has written to the page
> + * flush_dcache_folio is used when the kernel has written to the page
>   * cache page at virtual address page->virtual.
>   *
>   * If this page isn't mapped (ie, page_mapping == NULL), or it might
> @@ -127,6 +127,8 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
>   */
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  extern void flush_dcache_page(struct page *);
> +void flush_dcache_folio(struct folio *);
> +#define flush_dcache_folio flush_dcache_folio
>  
>  static __always_inline void icache_inval_all_pou(void)
>  {
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 69765dc697af..4d1b79dbff16 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -355,12 +355,21 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  	set_pte(ptep, pte);
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pte)
> -{
> -	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
> -	return __set_pte_at(mm, addr, ptep, pte);
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +			      pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
> +
> +	for (;;) {
> +		__set_pte_at(mm, addr, ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		addr += PAGE_SIZE;
> +		pte_val(pte) += PAGE_SIZE;

For systems that support > 48-bit PA, arm64 places high bits [51:48] of the PA
at a low position in the PTE. I think I've convinced myself that this is ok
though, because set_ptes() promises that the range is always within a single PMD
and therefore its guaranteed that we will not have ptes straddling both sides of
the 48 bit boundary for a single call?

Also, its not clear to me if set_ptes() could be called for a range of
not-present ptes? (i.e. clearing the pte-range or swap entries, etc). If so,
then I guess you would only want to increment the address if pte_present(pte)?
I'm guessing that batch-clearing ptes might appear in the near future so might
be sensible to support that now?

Regardless, a comment to make these assumptions clear would be useful.

Thanks,
Ryan


> +	}
>  }
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
>  
>  /*
>   * Huge pte definitions.
> @@ -1059,8 +1068,8 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
>  /*
>   * On AArch64, the cache coherency is handled via the set_pte_at() function.
>   */
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -				    unsigned long addr, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long addr, pte_t *ptep, unsigned int nr)
>  {
>  	/*
>  	 * We don't do anything here, so there's a very small chance of
> @@ -1069,6 +1078,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  	 */
>  }
>  
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
>  
>  #ifdef CONFIG_ARM64_PA_BITS_52
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index 5f9379b3c8c8..deb781af0a3a 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -50,20 +50,13 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
>  
>  void __sync_icache_dcache(pte_t pte)
>  {
> -	struct page *page = pte_page(pte);
> +	struct folio *folio = page_folio(pte_page(pte));
>  
> -	/*
> -	 * HugeTLB pages are always fully mapped, so only setting head page's
> -	 * PG_dcache_clean flag is enough.
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> -
> -	if (!test_bit(PG_dcache_clean, &page->flags)) {
> -		sync_icache_aliases((unsigned long)page_address(page),
> -				    (unsigned long)page_address(page) +
> -					    page_size(page));
> -		set_bit(PG_dcache_clean, &page->flags);
> +	if (!test_bit(PG_dcache_clean, &folio->flags)) {
> +		sync_icache_aliases((unsigned long)folio_address(folio),
> +				    (unsigned long)folio_address(folio) +
> +					    folio_size(folio));
> +		set_bit(PG_dcache_clean, &folio->flags);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(__sync_icache_dcache);
> @@ -73,17 +66,16 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache);
>   * it as dirty for later flushing when mapped in user space (if executable,
>   * see __sync_icache_dcache).
>   */
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
> -	/*
> -	 * HugeTLB pages are always fully mapped and only head page will be
> -	 * set PG_dcache_clean (see comments in __sync_icache_dcache()).
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> +	if (test_bit(PG_dcache_clean, &folio->flags))
> +		clear_bit(PG_dcache_clean, &folio->flags);
> +}
> +EXPORT_SYMBOL(flush_dcache_folio);
>  
> -	if (test_bit(PG_dcache_clean, &page->flags))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
>  }
>  EXPORT_SYMBOL(flush_dcache_page);
>  


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 08/34] arm64: Implement the new page table range API
@ 2023-03-09 11:03     ` Ryan Roberts
  0 siblings, 0 replies; 77+ messages in thread
From: Ryan Roberts @ 2023-03-09 11:03 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm, linux-arch
  Cc: linux-kernel, Catalin Marinas, linux-arm-kernel

On 28/02/2023 21:37, Matthew Wilcox (Oracle) wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_dcache_clean flag from being per-page to per-folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> ---
>  arch/arm64/include/asm/cacheflush.h |  4 +++-
>  arch/arm64/include/asm/pgtable.h    | 25 ++++++++++++++------
>  arch/arm64/mm/flush.c               | 36 +++++++++++------------------
>  3 files changed, 35 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index 37185e978aeb..d115451ed263 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -114,7 +114,7 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
>  #define copy_to_user_page copy_to_user_page
>  
>  /*
> - * flush_dcache_page is used when the kernel has written to the page
> + * flush_dcache_folio is used when the kernel has written to the page
>   * cache page at virtual address page->virtual.
>   *
>   * If this page isn't mapped (ie, page_mapping == NULL), or it might
> @@ -127,6 +127,8 @@ extern void copy_to_user_page(struct vm_area_struct *, struct page *,
>   */
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>  extern void flush_dcache_page(struct page *);
> +void flush_dcache_folio(struct folio *);
> +#define flush_dcache_folio flush_dcache_folio
>  
>  static __always_inline void icache_inval_all_pou(void)
>  {
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 69765dc697af..4d1b79dbff16 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -355,12 +355,21 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
>  	set_pte(ptep, pte);
>  }
>  
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -			      pte_t *ptep, pte_t pte)
> -{
> -	page_table_check_ptes_set(mm, addr, ptep, pte, 1);
> -	return __set_pte_at(mm, addr, ptep, pte);
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +			      pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
> +
> +	for (;;) {
> +		__set_pte_at(mm, addr, ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		addr += PAGE_SIZE;
> +		pte_val(pte) += PAGE_SIZE;

For systems that support > 48-bit PA, arm64 places high bits [51:48] of the PA
at a low position in the PTE. I think I've convinced myself that this is ok
though, because set_ptes() promises that the range is always within a single PMD
and therefore its guaranteed that we will not have ptes straddling both sides of
the 48 bit boundary for a single call?

Also, its not clear to me if set_ptes() could be called for a range of
not-present ptes? (i.e. clearing the pte-range or swap entries, etc). If so,
then I guess you would only want to increment the address if pte_present(pte)?
I'm guessing that batch-clearing ptes might appear in the near future so might
be sensible to support that now?

Regardless, a comment to make these assumptions clear would be useful.

Thanks,
Ryan


> +	}
>  }
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
>  
>  /*
>   * Huge pte definitions.
> @@ -1059,8 +1068,8 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
>  /*
>   * On AArch64, the cache coherency is handled via the set_pte_at() function.
>   */
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -				    unsigned long addr, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long addr, pte_t *ptep, unsigned int nr)
>  {
>  	/*
>  	 * We don't do anything here, so there's a very small chance of
> @@ -1069,6 +1078,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  	 */
>  }
>  
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>  #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
>  
>  #ifdef CONFIG_ARM64_PA_BITS_52
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index 5f9379b3c8c8..deb781af0a3a 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -50,20 +50,13 @@ void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
>  
>  void __sync_icache_dcache(pte_t pte)
>  {
> -	struct page *page = pte_page(pte);
> +	struct folio *folio = page_folio(pte_page(pte));
>  
> -	/*
> -	 * HugeTLB pages are always fully mapped, so only setting head page's
> -	 * PG_dcache_clean flag is enough.
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> -
> -	if (!test_bit(PG_dcache_clean, &page->flags)) {
> -		sync_icache_aliases((unsigned long)page_address(page),
> -				    (unsigned long)page_address(page) +
> -					    page_size(page));
> -		set_bit(PG_dcache_clean, &page->flags);
> +	if (!test_bit(PG_dcache_clean, &folio->flags)) {
> +		sync_icache_aliases((unsigned long)folio_address(folio),
> +				    (unsigned long)folio_address(folio) +
> +					    folio_size(folio));
> +		set_bit(PG_dcache_clean, &folio->flags);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(__sync_icache_dcache);
> @@ -73,17 +66,16 @@ EXPORT_SYMBOL_GPL(__sync_icache_dcache);
>   * it as dirty for later flushing when mapped in user space (if executable,
>   * see __sync_icache_dcache).
>   */
> -void flush_dcache_page(struct page *page)
> +void flush_dcache_folio(struct folio *folio)
>  {
> -	/*
> -	 * HugeTLB pages are always fully mapped and only head page will be
> -	 * set PG_dcache_clean (see comments in __sync_icache_dcache()).
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> +	if (test_bit(PG_dcache_clean, &folio->flags))
> +		clear_bit(PG_dcache_clean, &folio->flags);
> +}
> +EXPORT_SYMBOL(flush_dcache_folio);
>  
> -	if (test_bit(PG_dcache_clean, &page->flags))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
>  }
>  EXPORT_SYMBOL(flush_dcache_page);
>  


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 00/34] New page table range API
  2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
                   ` (35 preceding siblings ...)
  2023-03-05 10:15 ` Geert Uytterhoeven
@ 2023-03-09 11:09 ` Ryan Roberts
  36 siblings, 0 replies; 77+ messages in thread
From: Ryan Roberts @ 2023-03-09 11:09 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm, linux-arch; +Cc: linux-kernel

On 28/02/2023 21:37, Matthew Wilcox (Oracle) wrote:
> This patchset changes the API used by the MM to set up page table entries.
> The four APIs are:
>     set_ptes(mm, addr, ptep, pte, nr)
>     update_mmu_cache_range(vma, addr, ptep, nr)
>     flush_dcache_folio(folio)
>     flush_icache_pages(vma, page, nr)
> 
> flush_dcache_folio() isn't technically new, but no architecture
> implemented it, so I've done that for you.  The old APIs remain around
> but are mostly implemented by calling the new interfaces.
> 
> The new APIs are based around setting up N page table entries at once.
> The N entries belong to the same PMD, the same folio and the same VMA,
> so ptep++ is a legitimate operation, and locking is taken care of for
> you.  Some architectures can do a better job of it than just a loop,
> but I have hesitated to make too deep a change to architectures I don't
> understand well.
> 
> One thing I have changed in every architecture is that PG_arch_1 is now a
> per-folio bit instead of a per-page bit.  This was something that would
> have to happen eventually, and it makes sense to do it now rather than
> iterate over every page involved in a cache flush and figure out if it
> needs to happen.
> 
> The point of all this is better performance, and Fengwei Yin has
> measured improvement on x86.  I suspect you'll see improvement on
> your architecture too.  Try the new will-it-scale test mentioned here:
> https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/
> You'll need to run it on an XFS filesystem and have
> CONFIG_TRANSPARENT_HUGEPAGE set.
> 
> For testing, I've only run the code on x86.  If an x86->foo compiler
> exists in Debian, I've built defconfig.  I'm relying on the buildbots
> to tell me what I missed, and people who actually have the hardware to
> tell me if it actually works.
> 
> I'd like to get this into the MM tree soon after the current merge window
> closes, so quick feedback would be appreciated.

I've boot-tested the series (with the Yin's typo fix for patch 32) on arm64 FVP
and Ampere Altra. On the Altra, I also ran page_fault4 from will-it-scale, and
see ~35% improvement from this series. So:

Tested-by: Ryan Roberts <ryan.roberts@arm.com>

Thanks,
Ryan


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 20/34] riscv: Implement the new page table range API
  2023-02-28 21:37   ` Matthew Wilcox (Oracle)
@ 2023-03-15  5:23     ` Palmer Dabbelt
  -1 siblings, 0 replies; 77+ messages in thread
From: Palmer Dabbelt @ 2023-03-15  5:23 UTC (permalink / raw)
  To: willy
  Cc: linux-mm, linux-arch, willy, linux-kernel, alexghiti,
	Paul Walmsley, aou, linux-riscv

On Tue, 28 Feb 2023 13:37:23 PST (-0800), willy@infradead.org wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_dcache_clean flag from being per-page to per-folio.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Albert Ou <aou@eecs.berkeley.edu>
> Cc: linux-riscv@lists.infradead.org
> ---
>  arch/riscv/include/asm/cacheflush.h | 19 +++++++++----------
>  arch/riscv/include/asm/pgtable.h    | 26 +++++++++++++++++++-------
>  arch/riscv/mm/cacheflush.c          | 11 ++---------
>  3 files changed, 30 insertions(+), 26 deletions(-)
>
> diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> index 03e3b95ae6da..10e5e96f09b5 100644
> --- a/arch/riscv/include/asm/cacheflush.h
> +++ b/arch/riscv/include/asm/cacheflush.h
> @@ -15,20 +15,19 @@ static inline void local_flush_icache_all(void)
>
>  #define PG_dcache_clean PG_arch_1
>
> -static inline void flush_dcache_page(struct page *page)
> +static inline void flush_dcache_folio(struct folio *folio)
>  {
> -	/*
> -	 * HugeTLB pages are always fully mapped and only head page will be
> -	 * set PG_dcache_clean (see comments in flush_icache_pte()).
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> -
> -	if (test_bit(PG_dcache_clean, &page->flags))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +	if (test_bit(PG_dcache_clean, &folio->flags))
> +		clear_bit(PG_dcache_clean, &folio->flags);
>  }
> +#define flush_dcache_folio flush_dcache_folio
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
> +
>  /*
>   * RISC-V doesn't have an instruction to flush parts of the instruction cache,
>   * so instead we just flush the whole thing.
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index b516f3b59616..3a3a776fc047 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -405,8 +405,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>
>
>  /* Commit new configuration to MMU hardware */
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -	unsigned long address, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr)
>  {
>  	/*
>  	 * The kernel assumes that TLBs don't cache invalid entries, but
> @@ -415,8 +415,11 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  	 * Relying on flush_tlb_fix_spurious_fault would suffice, but
>  	 * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
>  	 */
> -	local_flush_tlb_page(address);
> +	while (nr--)
> +		local_flush_tlb_page(address + nr * PAGE_SIZE);
>  }
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>
>  #define __HAVE_ARCH_UPDATE_MMU_TLB
>  #define update_mmu_tlb update_mmu_cache
> @@ -456,12 +459,21 @@ static inline void __set_pte_at(struct mm_struct *mm,
>  	set_pte(ptep, pteval);
>  }
>
> -static inline void set_pte_at(struct mm_struct *mm,
> -	unsigned long addr, pte_t *ptep, pte_t pteval)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pteval, unsigned int nr)
>  {
> -	page_table_check_ptes_set(mm, addr, ptep, pteval, 1);
> -	__set_pte_at(mm, addr, ptep, pteval);
> +	page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
> +
> +	for (;;) {
> +		__set_pte_at(mm, addr, ptep, pteval);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		addr += PAGE_SIZE;
> +		pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
> +	}
>  }
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
>
>  static inline void pte_clear(struct mm_struct *mm,
>  	unsigned long addr, pte_t *ptep)
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index fcd6145fbead..e36a851e5788 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -81,16 +81,9 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
>  #ifdef CONFIG_MMU
>  void flush_icache_pte(pte_t pte)
>  {
> -	struct page *page = pte_page(pte);
> +	struct folio *folio = page_folio(pte_page(pte));
>
> -	/*
> -	 * HugeTLB pages are always fully mapped, so only setting head page's
> -	 * PG_dcache_clean flag is enough.
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> -
> -	if (!test_bit(PG_dcache_clean, &page->flags)) {
> +	if (!test_bit(PG_dcache_clean, &folio->flags)) {
>  		flush_icache_all();
>  		set_bit(PG_dcache_clean, &page->flags);
>  	}

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 20/34] riscv: Implement the new page table range API
@ 2023-03-15  5:23     ` Palmer Dabbelt
  0 siblings, 0 replies; 77+ messages in thread
From: Palmer Dabbelt @ 2023-03-15  5:23 UTC (permalink / raw)
  To: willy
  Cc: linux-mm, linux-arch, willy, linux-kernel, alexghiti,
	Paul Walmsley, aou, linux-riscv

On Tue, 28 Feb 2023 13:37:23 PST (-0800), willy@infradead.org wrote:
> Add set_ptes(), update_mmu_cache_range() and flush_dcache_folio().
> Change the PG_dcache_clean flag from being per-page to per-folio.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Albert Ou <aou@eecs.berkeley.edu>
> Cc: linux-riscv@lists.infradead.org
> ---
>  arch/riscv/include/asm/cacheflush.h | 19 +++++++++----------
>  arch/riscv/include/asm/pgtable.h    | 26 +++++++++++++++++++-------
>  arch/riscv/mm/cacheflush.c          | 11 ++---------
>  3 files changed, 30 insertions(+), 26 deletions(-)
>
> diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> index 03e3b95ae6da..10e5e96f09b5 100644
> --- a/arch/riscv/include/asm/cacheflush.h
> +++ b/arch/riscv/include/asm/cacheflush.h
> @@ -15,20 +15,19 @@ static inline void local_flush_icache_all(void)
>
>  #define PG_dcache_clean PG_arch_1
>
> -static inline void flush_dcache_page(struct page *page)
> +static inline void flush_dcache_folio(struct folio *folio)
>  {
> -	/*
> -	 * HugeTLB pages are always fully mapped and only head page will be
> -	 * set PG_dcache_clean (see comments in flush_icache_pte()).
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> -
> -	if (test_bit(PG_dcache_clean, &page->flags))
> -		clear_bit(PG_dcache_clean, &page->flags);
> +	if (test_bit(PG_dcache_clean, &folio->flags))
> +		clear_bit(PG_dcache_clean, &folio->flags);
>  }
> +#define flush_dcache_folio flush_dcache_folio
>  #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
>
> +static inline void flush_dcache_page(struct page *page)
> +{
> +	flush_dcache_folio(page_folio(page));
> +}
> +
>  /*
>   * RISC-V doesn't have an instruction to flush parts of the instruction cache,
>   * so instead we just flush the whole thing.
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index b516f3b59616..3a3a776fc047 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -405,8 +405,8 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>
>
>  /* Commit new configuration to MMU hardware */
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> -	unsigned long address, pte_t *ptep)
> +static inline void update_mmu_cache_range(struct vm_area_struct *vma,
> +		unsigned long address, pte_t *ptep, unsigned int nr)
>  {
>  	/*
>  	 * The kernel assumes that TLBs don't cache invalid entries, but
> @@ -415,8 +415,11 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
>  	 * Relying on flush_tlb_fix_spurious_fault would suffice, but
>  	 * the extra traps reduce performance.  So, eagerly SFENCE.VMA.
>  	 */
> -	local_flush_tlb_page(address);
> +	while (nr--)
> +		local_flush_tlb_page(address + nr * PAGE_SIZE);
>  }
> +#define update_mmu_cache(vma, addr, ptep) \
> +	update_mmu_cache_range(vma, addr, ptep, 1)
>
>  #define __HAVE_ARCH_UPDATE_MMU_TLB
>  #define update_mmu_tlb update_mmu_cache
> @@ -456,12 +459,21 @@ static inline void __set_pte_at(struct mm_struct *mm,
>  	set_pte(ptep, pteval);
>  }
>
> -static inline void set_pte_at(struct mm_struct *mm,
> -	unsigned long addr, pte_t *ptep, pte_t pteval)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pteval, unsigned int nr)
>  {
> -	page_table_check_ptes_set(mm, addr, ptep, pteval, 1);
> -	__set_pte_at(mm, addr, ptep, pteval);
> +	page_table_check_ptes_set(mm, addr, ptep, pteval, nr);
> +
> +	for (;;) {
> +		__set_pte_at(mm, addr, ptep, pteval);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		addr += PAGE_SIZE;
> +		pte_val(pteval) += 1 << _PAGE_PFN_SHIFT;
> +	}
>  }
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
>
>  static inline void pte_clear(struct mm_struct *mm,
>  	unsigned long addr, pte_t *ptep)
> diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
> index fcd6145fbead..e36a851e5788 100644
> --- a/arch/riscv/mm/cacheflush.c
> +++ b/arch/riscv/mm/cacheflush.c
> @@ -81,16 +81,9 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
>  #ifdef CONFIG_MMU
>  void flush_icache_pte(pte_t pte)
>  {
> -	struct page *page = pte_page(pte);
> +	struct folio *folio = page_folio(pte_page(pte));
>
> -	/*
> -	 * HugeTLB pages are always fully mapped, so only setting head page's
> -	 * PG_dcache_clean flag is enough.
> -	 */
> -	if (PageHuge(page))
> -		page = compound_head(page);
> -
> -	if (!test_bit(PG_dcache_clean, &page->flags)) {
> +	if (!test_bit(PG_dcache_clean, &folio->flags)) {
>  		flush_icache_all();
>  		set_bit(PG_dcache_clean, &page->flags);
>  	}

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation
  2023-02-28 21:37 ` [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
@ 2023-03-15  9:27   ` Mike Rapoport
  0 siblings, 0 replies; 77+ messages in thread
From: Mike Rapoport @ 2023-03-15  9:27 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-arch, linux-kernel

On Tue, Feb 28, 2023 at 09:37:05PM +0000, Matthew Wilcox (Oracle) wrote:
> flush_icache_page() is deprecated but not yet removed, so add
> a range version of it.  Change the documentation to refer to
> update_mmu_cache_range() instead of update_mmu_cache().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>

> ---
>  Documentation/core-api/cachetlb.rst | 35 +++++++++++++++--------------
>  include/asm-generic/cacheflush.h    |  5 +++++
>  2 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/core-api/cachetlb.rst b/Documentation/core-api/cachetlb.rst
> index 5c0552e78c58..d4c9e2a28d36 100644
> --- a/Documentation/core-api/cachetlb.rst
> +++ b/Documentation/core-api/cachetlb.rst
> @@ -88,13 +88,13 @@ changes occur:
>  
>  	This is used primarily during fault processing.
>  
> -5) ``void update_mmu_cache(struct vm_area_struct *vma,
> -   unsigned long address, pte_t *ptep)``
> +5) ``void update_mmu_cache_range(struct vm_area_struct *vma,
> +   unsigned long address, pte_t *ptep, unsigned int nr)``
>  
> -	At the end of every page fault, this routine is invoked to
> -	tell the architecture specific code that a translation
> -	now exists at virtual address "address" for address space
> -	"vma->vm_mm", in the software page tables.
> +	At the end of every page fault, this routine is invoked to tell
> +	the architecture specific code that translations now exists
> +	in the software page tables for address space "vma->vm_mm"
> +	at virtual address "address" for "nr" consecutive pages.
>  
>  	A port may use this information in any way it so chooses.
>  	For example, it could use this event to pre-load TLB
> @@ -306,17 +306,18 @@ maps this page at its virtual address.
>  	private".  The kernel guarantees that, for pagecache pages, it will
>  	clear this bit when such a page first enters the pagecache.
>  
> -	This allows these interfaces to be implemented much more efficiently.
> -	It allows one to "defer" (perhaps indefinitely) the actual flush if
> -	there are currently no user processes mapping this page.  See sparc64's
> -	flush_dcache_page and update_mmu_cache implementations for an example
> -	of how to go about doing this.
> +	This allows these interfaces to be implemented much more
> +	efficiently.  It allows one to "defer" (perhaps indefinitely) the
> +	actual flush if there are currently no user processes mapping this
> +	page.  See sparc64's flush_dcache_page and update_mmu_cache_range
> +	implementations for an example of how to go about doing this.
>  
> -	The idea is, first at flush_dcache_page() time, if page_file_mapping()
> -	returns a mapping, and mapping_mapped on that mapping returns %false,
> -	just mark the architecture private page flag bit.  Later, in
> -	update_mmu_cache(), a check is made of this flag bit, and if set the
> -	flush is done and the flag bit is cleared.
> +	The idea is, first at flush_dcache_page() time, if
> +	page_file_mapping() returns a mapping, and mapping_mapped on that
> +	mapping returns %false, just mark the architecture private page
> +	flag bit.  Later, in update_mmu_cache_range(), a check is made
> +	of this flag bit, and if set the flush is done and the flag bit
> +	is cleared.
>  
>  	.. important::
>  
> @@ -369,7 +370,7 @@ maps this page at its virtual address.
>    ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``
>  
>  	All the functionality of flush_icache_page can be implemented in
> -	flush_dcache_page and update_mmu_cache. In the future, the hope
> +	flush_dcache_page and update_mmu_cache_range. In the future, the hope
>  	is to remove this interface completely.
>  
>  The final category of APIs is for I/O to deliberately aliased address
> diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
> index f46258d1a080..09d51a680765 100644
> --- a/include/asm-generic/cacheflush.h
> +++ b/include/asm-generic/cacheflush.h
> @@ -78,6 +78,11 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
>  #endif
>  
>  #ifndef flush_icache_page
> +static inline void flush_icache_pages(struct vm_area_struct *vma,
> +				     struct page *page, unsigned int nr)
> +{
> +}
> +
>  static inline void flush_icache_page(struct vm_area_struct *vma,
>  				     struct page *page)
>  {
> -- 
> 2.39.1
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2023-03-15  9:27 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-28 21:37 [PATCH v3 00/34] New page table range API Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 01/34] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set() Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 02/34] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
2023-03-15  9:27   ` Mike Rapoport
2023-02-28 21:37 ` [PATCH v3 03/34] mm: Add folio_flush_mapping() Matthew Wilcox (Oracle)
2023-03-03 10:33   ` Mike Rapoport
2023-02-28 21:37 ` [PATCH v3 04/34] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 05/34] alpha: Implement the new page table range API Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 06/34] arc: " Matthew Wilcox (Oracle)
2023-02-28 21:37   ` Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 07/34] arm: " Matthew Wilcox (Oracle)
2023-02-28 21:37   ` Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 08/34] arm64: " Matthew Wilcox (Oracle)
2023-02-28 21:37   ` Matthew Wilcox (Oracle)
2023-03-09 11:03   ` Ryan Roberts
2023-03-09 11:03     ` Ryan Roberts
2023-02-28 21:37 ` [PATCH v3 09/34] csky: " Matthew Wilcox (Oracle)
2023-03-03 11:40   ` Mike Rapoport
2023-02-28 21:37 ` [PATCH v3 10/34] hexagon: " Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 11/34] ia64: " Matthew Wilcox (Oracle)
2023-02-28 21:37   ` Matthew Wilcox (Oracle)
2023-03-03 11:56   ` Mike Rapoport
2023-03-03 11:56     ` Mike Rapoport
2023-03-03 14:36     ` Matthew Wilcox
2023-03-03 14:36       ` Matthew Wilcox
2023-02-28 21:37 ` [PATCH v3 12/34] loongarch: " Matthew Wilcox (Oracle)
2023-03-01  2:04   ` WANG Xuerui
2023-02-28 21:37 ` [PATCH v3 13/34] m68k: " Matthew Wilcox (Oracle)
2023-03-05 10:16   ` Geert Uytterhoeven
2023-03-05 15:28     ` Matthew Wilcox
2023-03-05 16:48       ` Geert Uytterhoeven
2023-03-05 20:44       ` Michael Schmitz
2023-03-06  7:21         ` Geert Uytterhoeven
2023-03-06 23:01           ` Michael Schmitz
2023-02-28 21:37 ` [PATCH v3 14/34] microblaze: " Matthew Wilcox (Oracle)
2023-03-03 10:53   ` Mike Rapoport
2023-03-03 14:38     ` Matthew Wilcox
2023-02-28 21:37 ` [PATCH v3 15/34] mips: " Matthew Wilcox (Oracle)
2023-03-03 12:24   ` Mike Rapoport
2023-02-28 21:37 ` [PATCH v3 16/34] nios2: " Matthew Wilcox (Oracle)
2023-03-03 12:49   ` Mike Rapoport
2023-02-28 21:37 ` [PATCH v3 17/34] openrisc: " Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 18/34] parisc: " Matthew Wilcox (Oracle)
2023-03-02 16:43   ` John David Anglin
2023-03-02 20:40     ` John David Anglin
2023-03-04 16:27       ` John David Anglin
2023-02-28 21:37 ` [PATCH v3 19/34] powerpc: " Matthew Wilcox (Oracle)
2023-02-28 21:37   ` Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 20/34] riscv: " Matthew Wilcox (Oracle)
2023-02-28 21:37   ` Matthew Wilcox (Oracle)
2023-03-15  5:23   ` Palmer Dabbelt
2023-03-15  5:23     ` Palmer Dabbelt
2023-02-28 21:37 ` [PATCH v3 21/34] s390: " Matthew Wilcox (Oracle)
2023-03-02 13:31   ` Gerald Schaefer
2023-02-28 21:37 ` [PATCH v3 22/34] superh: " Matthew Wilcox (Oracle)
2023-03-01  8:06   ` Geert Uytterhoeven
2023-03-01 16:17     ` Matthew Wilcox
2023-02-28 21:37 ` [PATCH v3 23/34] sparc32: " Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 24/34] sparc64: " Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 25/34] um: " Matthew Wilcox (Oracle)
2023-02-28 21:37   ` Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 26/34] x86: " Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 27/34] xtensa: " Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 28/34] mm: Remove page_mapping_file() Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 29/34] mm: Rationalise flush_icache_pages() and flush_icache_page() Matthew Wilcox (Oracle)
2023-03-05  9:53   ` Geert Uytterhoeven
2023-02-28 21:37 ` [PATCH v3 30/34] mm: Use flush_icache_pages() in do_set_pmd() Matthew Wilcox (Oracle)
2023-03-03 14:02   ` Mike Rapoport
2023-03-03 16:02     ` Matthew Wilcox
2023-02-28 21:37 ` [PATCH v3 31/34] filemap: Add filemap_map_folio_range() Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 32/34] rmap: add folio_add_file_rmap_range() Matthew Wilcox (Oracle)
2023-03-01  3:04   ` Yin, Fengwei
2023-02-28 21:37 ` [PATCH v3 33/34] mm: Convert do_set_pte() to set_pte_range() Matthew Wilcox (Oracle)
2023-02-28 21:37 ` [PATCH v3 34/34] filemap: Batch PTE mappings Matthew Wilcox (Oracle)
2023-03-03 14:19 ` [PATCH v3 00/34] New page table range API Mike Rapoport
2023-03-05 10:15 ` Geert Uytterhoeven
2023-03-09 11:09 ` Ryan Roberts

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.