* [PATCH v5 0/5] Batched page table updates for file-backed large folios
@ 2023-02-07 19:49 Matthew Wilcox (Oracle)
  2023-02-07 19:49 ` [PATCH v5 1/5] filemap: Add filemap_map_folio_range() Matthew Wilcox (Oracle)
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-07 19:49 UTC (permalink / raw)
  To: Yin Fengwei, linux-mm
  Cc: Matthew Wilcox (Oracle), david, dave.hansen, tim.c.chen, ying.huang

On page fault, filemap_map_pages() already retrieves a folio from the page
cache and iterates over its pages, which realised some savings.  This patch
series drives that further by allowing filemap to tell the MM to map a
contiguous range of pages within the folio.  This improves performance by
batching the updates to the folio's refcount and the rmap counters.

Testing with a microbenchmark like will-it-scale.pagefault on a
48C/96T Ice Lake box showed:
   - batched rmap brings around a 15% performance gain
   - batched refcount brings around a 2% performance gain

Changes since v4:
 - Add the set_ptes() architecture interface
 - Change various interfaces to take (folio, struct page *, nr) instead of
   (folio, unsigned long start, nr)
 - Remove do_set_pte() instead of keeping a compat interface
 - Add a check in set_pte_range() to ensure that large anon folios are not
   being passed in yet (David Hildenbrand)
 - Save / restore the old vmf->pte pointer instead of passing a different
   pte pointer to set_pte_range()

Matthew Wilcox (Oracle) (1):
  mm: Add generic set_ptes()

Yin Fengwei (4):
  filemap: Add filemap_map_folio_range()
  rmap: add folio_add_file_rmap_range()
  mm: Convert do_set_pte() to set_pte_range()
  filemap: Batch PTE mappings

 Documentation/filesystems/locking.rst |   2 +-
 include/linux/mm.h                    |   3 +-
 include/linux/pgtable.h               |  27 +++++++
 include/linux/rmap.h                  |   2 +
 mm/filemap.c                          | 111 ++++++++++++++++----------
 mm/memory.c                           |  31 ++++---
 mm/rmap.c                             |  60 ++++++++++----
 7 files changed, 164 insertions(+), 72 deletions(-)

-- 
2.35.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 1/5] filemap: Add filemap_map_folio_range()
  2023-02-07 19:49 [PATCH v5 0/5] Batched page table updates for file-backed large folios Matthew Wilcox (Oracle)
@ 2023-02-07 19:49 ` Matthew Wilcox (Oracle)
  2023-02-07 19:49 ` [PATCH v5 2/5] mm: Add generic set_ptes() Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-07 19:49 UTC (permalink / raw)
  To: Yin Fengwei, linux-mm
  Cc: david, dave.hansen, tim.c.chen, ying.huang, Matthew Wilcox

From: Yin Fengwei <fengwei.yin@intel.com>

filemap_map_folio_range() maps a partial or full folio.  Compared to the
original filemap_map_pages(), it updates the folio's refcount once per
folio instead of once per page, which gives a minor performance
improvement for large folios.

With a will-it-scale.page_fault3-like app (file write fault testing
changed to read fault testing; we are trying to upstream it to
will-it-scale at [1]), we got a 2% performance gain on a 48C/96T
Cascade Lake test box with 96 processes running against xfs.

[1]: https://github.com/antonblanchard/will-it-scale/pull/37

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 98 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 54 insertions(+), 44 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 2ebcf500871d..9aa7a4fdc374 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2200,16 +2200,6 @@ unsigned filemap_get_folios(struct address_space *mapping, pgoff_t *start,
 }
 EXPORT_SYMBOL(filemap_get_folios);
 
-static inline
-bool folio_more_pages(struct folio *folio, pgoff_t index, pgoff_t max)
-{
-	if (!folio_test_large(folio) || folio_test_hugetlb(folio))
-		return false;
-	if (index >= max)
-		return false;
-	return index < folio->index + folio_nr_pages(folio) - 1;
-}
-
 /**
  * filemap_get_folios_contig - Get a batch of contiguous folios
  * @mapping:	The address_space to search
@@ -3351,6 +3341,53 @@ static inline struct folio *next_map_page(struct address_space *mapping,
 				  mapping, xas, end_pgoff);
 }
 
+/*
+ * Map the page range [start_page, start_page + nr_pages) of the folio.
+ * start_page is obtained from start via folio_page(folio, start).
+ */
+static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
+			struct folio *folio, unsigned long start,
+			unsigned long addr, unsigned int nr_pages)
+{
+	vm_fault_t ret = 0;
+	struct vm_area_struct *vma = vmf->vma;
+	struct file *file = vma->vm_file;
+	struct page *page = folio_page(folio, start);
+	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
+	unsigned int ref_count = 0, count = 0;
+
+	do {
+		if (PageHWPoison(page))
+			continue;
+
+		if (mmap_miss > 0)
+			mmap_miss--;
+
+		/*
+		 * NOTE: If there're PTE markers, we'll leave them to be
+		 * handled in the specific fault path, and it'll prohibit the
+		 * fault-around logic.
+		 */
+		if (!pte_none(*vmf->pte))
+			continue;
+
+		if (vmf->address == addr)
+			ret = VM_FAULT_NOPAGE;
+
+		ref_count++;
+		do_set_pte(vmf, page, addr);
+		update_mmu_cache(vma, addr, vmf->pte);
+	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
+
+	/* Restore the vmf->pte */
+	vmf->pte -= nr_pages;
+
+	folio_ref_add(folio, ref_count);
+	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
+
+	return ret;
+}
+
 vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 			     pgoff_t start_pgoff, pgoff_t end_pgoff)
 {
@@ -3361,9 +3398,9 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	unsigned long addr;
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
 	struct folio *folio;
-	struct page *page;
 	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
 	vm_fault_t ret = 0;
+	int nr_pages = 0;
 
 	rcu_read_lock();
 	folio = first_map_page(mapping, &xas, end_pgoff);
@@ -3378,45 +3415,18 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
 	do {
-again:
-		page = folio_file_page(folio, xas.xa_index);
-		if (PageHWPoison(page))
-			goto unlock;
-
-		if (mmap_miss > 0)
-			mmap_miss--;
+		unsigned long end;
 
 		addr += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
 		vmf->pte += xas.xa_index - last_pgoff;
 		last_pgoff = xas.xa_index;
+		end = folio->index + folio_nr_pages(folio) - 1;
+		nr_pages = min(end, end_pgoff) - xas.xa_index + 1;
 
-		/*
-		 * NOTE: If there're PTE markers, we'll leave them to be
-		 * handled in the specific fault path, and it'll prohibit the
-		 * fault-around logic.
-		 */
-		if (!pte_none(*vmf->pte))
-			goto unlock;
-
-		/* We're about to handle the fault */
-		if (vmf->address == addr)
-			ret = VM_FAULT_NOPAGE;
+		ret |= filemap_map_folio_range(vmf, folio,
+				xas.xa_index - folio->index, addr, nr_pages);
+		xas.xa_index += nr_pages;
 
-		do_set_pte(vmf, page, addr);
-		/* no need to invalidate: a not-present page won't be cached */
-		update_mmu_cache(vma, addr, vmf->pte);
-		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
-			xas.xa_index++;
-			folio_ref_inc(folio);
-			goto again;
-		}
-		folio_unlock(folio);
-		continue;
-unlock:
-		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
-			xas.xa_index++;
-			goto again;
-		}
 		folio_unlock(folio);
 		folio_put(folio);
 	} while ((folio = next_map_page(mapping, &xas, end_pgoff)) != NULL);
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 2/5] mm: Add generic set_ptes()
  2023-02-07 19:49 [PATCH v5 0/5] Batched page table updates for file-backed large folios Matthew Wilcox (Oracle)
  2023-02-07 19:49 ` [PATCH v5 1/5] filemap: Add filemap_map_folio_range() Matthew Wilcox (Oracle)
@ 2023-02-07 19:49 ` Matthew Wilcox (Oracle)
  2023-02-08  0:48   ` kernel test robot
                     ` (3 more replies)
  2023-02-07 19:49 ` [PATCH v5 3/5] rmap: add folio_add_file_rmap_range() Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  4 siblings, 4 replies; 12+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-07 19:49 UTC (permalink / raw)
  To: Yin Fengwei, linux-mm
  Cc: Matthew Wilcox (Oracle), david, dave.hansen, tim.c.chen, ying.huang

This generic set_ptes() should work for most architectures.  Those that
need to set a special bit in their PTEs, or that have their PFNs located
at a different offset from PAGE_SHIFT, will need to override it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/pgtable.h | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c63cd44777ec..e1804d23e7c4 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1439,6 +1439,33 @@ static inline int pmd_protnone(pmd_t pmd)
 
 #endif /* CONFIG_MMU */
 
+#ifndef set_ptes
+/**
+ * set_ptes - Map consecutive pages to a contiguous range of addresses.
+ * @mm: Address space to map the pages into.
+ * @addr: Address to map the first page at.
+ * @ptep: Page table pointer for the first entry.
+ * @pte: Page table entry for the first page.
+ * @nr: Number of pages to map.
+ *
+ * Context: The caller holds the page table lock.  The PTEs all lie
+ * within a single PMD (and VMA, and folio).
+ */
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	for (;;) {
+		set_pte_at(mm, addr, ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		addr += PAGE_SIZE;
+		/* This works for x86.  Check how PTEs are encoded */
+		pte = __pte(pte_val(pte) + PAGE_SIZE);
+	}
+}
+#endif
+
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 
 #ifndef __PAGETABLE_P4D_FOLDED
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 3/5] rmap: add folio_add_file_rmap_range()
  2023-02-07 19:49 [PATCH v5 0/5] Batched page table updates for file-backed large folios Matthew Wilcox (Oracle)
  2023-02-07 19:49 ` [PATCH v5 1/5] filemap: Add filemap_map_folio_range() Matthew Wilcox (Oracle)
  2023-02-07 19:49 ` [PATCH v5 2/5] mm: Add generic set_ptes() Matthew Wilcox (Oracle)
@ 2023-02-07 19:49 ` Matthew Wilcox (Oracle)
  2023-02-07 19:49 ` [PATCH v5 4/5] mm: Convert do_set_pte() to set_pte_range() Matthew Wilcox (Oracle)
  2023-02-07 19:49 ` [PATCH v5 5/5] filemap: Batch PTE mappings Matthew Wilcox (Oracle)
  4 siblings, 0 replies; 12+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-07 19:49 UTC (permalink / raw)
  To: Yin Fengwei, linux-mm
  Cc: david, dave.hansen, tim.c.chen, ying.huang, Matthew Wilcox

From: Yin Fengwei <fengwei.yin@intel.com>

folio_add_file_rmap_range() allows adding a pte mapping to a specific
range of a file folio.  Compared to page_add_file_rmap(), it batches the
__lruvec_stat updates for large folios.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/rmap.h |  2 ++
 mm/rmap.c            | 60 +++++++++++++++++++++++++++++++++-----------
 2 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index a4570da03e58..125ede2fb049 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -198,6 +198,8 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
 		unsigned long address);
 void page_add_file_rmap(struct page *, struct vm_area_struct *,
 		bool compound);
+void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
+		struct vm_area_struct *, bool compound);
 void page_remove_rmap(struct page *, struct vm_area_struct *,
 		bool compound);
 
diff --git a/mm/rmap.c b/mm/rmap.c
index d52d90af0a7e..b1929adc4924 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1303,31 +1303,39 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 }
 
 /**
- * page_add_file_rmap - add pte mapping to a file page
- * @page:	the page to add the mapping to
+ * folio_add_file_rmap_range - add pte mapping to page range of a folio
+ * @folio:	The folio to add the mapping to
+ * @page:	The first page to add
+ * @nr_pages:	The number of pages which will be mapped
  * @vma:	the vm area in which the mapping is added
  * @compound:	charge the page as compound or small page
  *
+ * The page range of the folio is defined by [page, page + nr_pages).
+ *
  * The caller needs to hold the pte lock.
  */
-void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
-		bool compound)
+void folio_add_file_rmap_range(struct folio *folio, struct page *page,
+			unsigned int nr_pages, struct vm_area_struct *vma,
+			bool compound)
 {
-	struct folio *folio = page_folio(page);
 	atomic_t *mapped = &folio->_nr_pages_mapped;
-	int nr = 0, nr_pmdmapped = 0;
-	bool first;
+	unsigned int nr_pmdmapped = 0, first;
+	int nr = 0;
 
-	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
+	VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
 
 	/* Is page being mapped by PTE? Is this its first map to be added? */
 	if (likely(!compound)) {
-		first = atomic_inc_and_test(&page->_mapcount);
-		nr = first;
-		if (first && folio_test_large(folio)) {
-			nr = atomic_inc_return_relaxed(mapped);
-			nr = (nr < COMPOUND_MAPPED);
-		}
+		do {
+			first = atomic_inc_and_test(&page->_mapcount);
+			if (first && folio_test_large(folio)) {
+				first = atomic_inc_return_relaxed(mapped);
+				first = (first < COMPOUND_MAPPED);
+			}
+
+			if (first)
+				nr++;
+		} while (page++, --nr_pages > 0);
 	} else if (folio_test_pmd_mappable(folio)) {
 		/* That test is redundant: it's for safety or to optimize out */
 
@@ -1356,6 +1364,30 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
 	mlock_vma_folio(folio, vma, compound);
 }
 
+/**
+ * page_add_file_rmap - add pte mapping to a file page
+ * @page:	the page to add the mapping to
+ * @vma:	the vm area in which the mapping is added
+ * @compound:	charge the page as compound or small page
+ *
+ * The caller needs to hold the pte lock.
+ */
+void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
+		bool compound)
+{
+	struct folio *folio = page_folio(page);
+	unsigned int nr_pages;
+
+	VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
+
+	if (likely(!compound))
+		nr_pages = 1;
+	else
+		nr_pages = folio_nr_pages(folio);
+
+	folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
+}
+
 /**
  * page_remove_rmap - take down pte mapping from a page
  * @page:	page to remove mapping from
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 4/5] mm: Convert do_set_pte() to set_pte_range()
  2023-02-07 19:49 [PATCH v5 0/5] Batched page table updates for file-backed large folios Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2023-02-07 19:49 ` [PATCH v5 3/5] rmap: add folio_add_file_rmap_range() Matthew Wilcox (Oracle)
@ 2023-02-07 19:49 ` Matthew Wilcox (Oracle)
  2023-02-07 19:49 ` [PATCH v5 5/5] filemap: Batch PTE mappings Matthew Wilcox (Oracle)
  4 siblings, 0 replies; 12+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-07 19:49 UTC (permalink / raw)
  To: Yin Fengwei, linux-mm
  Cc: david, dave.hansen, tim.c.chen, ying.huang, Matthew Wilcox

From: Yin Fengwei <fengwei.yin@intel.com>

set_pte_range() sets up the page table entries for a specific range of
pages.  It takes advantage of the batched rmap update for large folios,
and it now takes care of calling update_mmu_cache() itself.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/filesystems/locking.rst |  2 +-
 include/linux/mm.h                    |  3 ++-
 mm/filemap.c                          |  3 +--
 mm/memory.c                           | 31 ++++++++++++++++-----------
 4 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 7de7a7272a5e..922886fefb7f 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -663,7 +663,7 @@ locked. The VM will unlock the page.
 Filesystem should find and map pages associated with offsets from "start_pgoff"
 till "end_pgoff". ->map_pages() is called with page table locked and must
 not block.  If it's not possible to reach a page without blocking,
-filesystem should skip it. Filesystem should use do_set_pte() to setup
+filesystem should skip it. Filesystem should use set_pte_range() to setup
 page table entry. Pointer to entry associated with the page is passed in
 "pte" field in vm_fault structure. Pointers to entries for other offsets
 should be calculated relative to "pte".
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d6f8f41514cc..b39493b5c49e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1161,7 +1161,8 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
 }
 
 vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
-void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
+void set_pte_range(struct vm_fault *vmf, struct folio *folio,
+		struct page *page, unsigned int nr, unsigned long addr);
 
 vm_fault_t finish_fault(struct vm_fault *vmf);
 vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
diff --git a/mm/filemap.c b/mm/filemap.c
index 9aa7a4fdc374..745f1eb2a87f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3375,8 +3375,7 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 			ret = VM_FAULT_NOPAGE;
 
 		ref_count++;
-		do_set_pte(vmf, page, addr);
-		update_mmu_cache(vma, addr, vmf->pte);
+		set_pte_range(vmf, folio, page, 1, addr);
 	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
 
 	/* Restore the vmf->pte */
diff --git a/mm/memory.c b/mm/memory.c
index 7a04a1130ec1..f1cf47b7e3bb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4257,15 +4257,18 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 }
 #endif
 
-void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
+void set_pte_range(struct vm_fault *vmf, struct folio *folio,
+		struct page *page, unsigned int nr, unsigned long addr)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	bool uffd_wp = pte_marker_uffd_wp(vmf->orig_pte);
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	bool prefault = vmf->address != addr;
 	pte_t entry;
+	unsigned int i;
 
-	flush_icache_page(vma, page);
+	for (i = 0; i < nr; i++)
+		flush_icache_page(vma, page + i);
 	entry = mk_pte(page, vma->vm_page_prot);
 
 	if (prefault && arch_wants_old_prefaulted_pte())
@@ -4279,14 +4282,20 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 		entry = pte_mkuffd_wp(entry);
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
-		inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
-		page_add_new_anon_rmap(page, vma, addr);
-		lru_cache_add_inactive_or_unevictable(page, vma);
+		add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr);
+		VM_BUG_ON_FOLIO(nr != 1, folio);
+		folio_add_new_anon_rmap(folio, vma, addr);
+		folio_add_lru_vma(folio, vma);
 	} else {
-		inc_mm_counter(vma->vm_mm, mm_counter_file(page));
-		page_add_file_rmap(page, vma, false);
+		add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
+		folio_add_file_rmap_range(folio, page, nr, vma, false);
 	}
-	set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
+	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
+
+	/* no need to invalidate: a not-present page won't be cached */
+	for (i = 0; i < nr; i++)
+		update_mmu_cache(vma, addr + i * PAGE_SIZE, vmf->pte + i);
+
 }
 
 static bool vmf_pte_changed(struct vm_fault *vmf)
@@ -4359,11 +4368,9 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 
 	/* Re-check under ptl */
 	if (likely(!vmf_pte_changed(vmf))) {
-		do_set_pte(vmf, page, vmf->address);
-
-		/* no need to invalidate: a not-present page won't be cached */
-		update_mmu_cache(vma, vmf->address, vmf->pte);
+		struct folio *folio = page_folio(page);
 
+		set_pte_range(vmf, folio, page, 1, vmf->address);
 		ret = 0;
 	} else {
 		update_mmu_tlb(vma, vmf->address, vmf->pte);
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 5/5] filemap: Batch PTE mappings
  2023-02-07 19:49 [PATCH v5 0/5] Batched page table updates for file-backed large folios Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2023-02-07 19:49 ` [PATCH v5 4/5] mm: Convert do_set_pte() to set_pte_range() Matthew Wilcox (Oracle)
@ 2023-02-07 19:49 ` Matthew Wilcox (Oracle)
  4 siblings, 0 replies; 12+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-02-07 19:49 UTC (permalink / raw)
  To: Yin Fengwei, linux-mm
  Cc: david, dave.hansen, tim.c.chen, ying.huang, Matthew Wilcox

From: Yin Fengwei <fengwei.yin@intel.com>

Call set_pte_range() once per contiguous range of the folio instead
of once per page.  This batches the updates to mm counters and the
rmap.

With a will-it-scale.page_fault3-like app (file write fault testing
changed to read fault testing; we are trying to upstream it to
will-it-scale at [1]), we got a 15% performance gain on a 48C/96T
Cascade Lake test box with 96 processes running against xfs.

Perf data collected before/after the change:
  18.73%--page_add_file_rmap
          |
           --11.60%--__mod_lruvec_page_state
                     |
                     |--7.40%--__mod_memcg_lruvec_state
                     |          |
                     |           --5.58%--cgroup_rstat_updated
                     |
                      --2.53%--__mod_lruvec_state
                                |
                                 --1.48%--__mod_node_page_state

  9.93%--page_add_file_rmap_range
         |
          --2.67%--__mod_lruvec_page_state
                    |
                    |--1.95%--__mod_memcg_lruvec_state
                    |          |
                    |           --1.57%--cgroup_rstat_updated
                    |
                     --0.61%--__mod_lruvec_state
                               |
                                --0.54%--__mod_node_page_state

The running time of __mod_lruvec_page_state() is reduced by about 9%.

[1]: https://github.com/antonblanchard/will-it-scale/pull/37

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 745f1eb2a87f..4abab43d9564 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3354,11 +3354,12 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 	struct file *file = vma->vm_file;
 	struct page *page = folio_page(folio, start);
 	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
-	unsigned int ref_count = 0, count = 0;
+	unsigned int count = 0;
+	pte_t *old_ptep = vmf->pte;
 
 	do {
-		if (PageHWPoison(page))
-			continue;
+		if (PageHWPoison(page + count))
+			goto skip;
 
 		if (mmap_miss > 0)
 			mmap_miss--;
@@ -3368,20 +3369,33 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
 		 * handled in the specific fault path, and it'll prohibit the
 		 * fault-around logic.
 		 */
-		if (!pte_none(*vmf->pte))
-			continue;
+		if (!pte_none(vmf->pte[count]))
+			goto skip;
 
 		if (vmf->address == addr)
 			ret = VM_FAULT_NOPAGE;
 
-		ref_count++;
-		set_pte_range(vmf, folio, page, 1, addr);
-	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
+		count++;
+		continue;
+skip:
+		if (count) {
+			set_pte_range(vmf, folio, page, count, addr);
+			folio_ref_add(folio, count);
+		}
 
-	/* Restore the vmf->pte */
-	vmf->pte -= nr_pages;
+		count++;
+		page += count;
+		vmf->pte += count;
+		addr += count * PAGE_SIZE;
+		count = 0;
+	} while (--nr_pages > 0);
+
+	if (count) {
+		set_pte_range(vmf, folio, page, count, addr);
+		folio_ref_add(folio, count);
+	}
 
-	folio_ref_add(folio, ref_count);
+	vmf->pte = old_ptep;
 	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
 
 	return ret;
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/5] mm: Add generic set_ptes()
  2023-02-07 19:49 ` [PATCH v5 2/5] mm: Add generic set_ptes() Matthew Wilcox (Oracle)
@ 2023-02-08  0:48   ` kernel test robot
  2023-02-08  2:20   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-02-08  0:48 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Yin Fengwei, linux-mm
  Cc: oe-kbuild-all, Matthew Wilcox (Oracle),
	david, dave.hansen, tim.c.chen, ying.huang

Hi Matthew,

I love your patch! Yet something to improve:

[auto build test ERROR on next-20230207]
[also build test ERROR on v6.2-rc7]
[cannot apply to linus/master v6.2-rc7 v6.2-rc6 v6.2-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/filemap-Add-filemap_map_folio_range/20230208-041404
patch link:    https://lore.kernel.org/r/20230207194937.122543-3-willy%40infradead.org
patch subject: [PATCH v5 2/5] mm: Add generic set_ptes()
config: arm-randconfig-r016-20230205 (https://download.01.org/0day-ci/archive/20230208/202302080851.OXQ3usg2-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/1549aed85e99407fbf2600da432b84c0660d2029
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Matthew-Wilcox-Oracle/filemap-Add-filemap_map_folio_range/20230208-041404
        git checkout 1549aed85e99407fbf2600da432b84c0660d2029
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm prepare

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/mm.h:29,
                    from arch/arm/kernel/asm-offsets.c:12:
   include/linux/pgtable.h: In function 'set_ptes':
>> include/linux/pgtable.h:1458:17: error: implicit declaration of function 'set_pte_at'; did you mean 'set_ptes'? [-Werror=implicit-function-declaration]
    1458 |                 set_pte_at(mm, addr, ptep, pte);
         |                 ^~~~~~~~~~
         |                 set_ptes
   cc1: some warnings being treated as errors
   make[2]: *** [scripts/Makefile.build:114: arch/arm/kernel/asm-offsets.s] Error 1
   make[2]: Target 'prepare' not remade because of errors.
   make[1]: *** [Makefile:1286: prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:226: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.


vim +1458 include/linux/pgtable.h

  1441	
  1442	#ifndef set_ptes
  1443	/**
  1444	 * set_ptes - Map consecutive pages to a contiguous range of addresses.
  1445	 * @mm: Address space to map the pages into.
  1446	 * @addr: Address to map the first page at.
  1447	 * @ptep: Page table pointer for the first entry.
  1448	 * @pte: Page table entry for the first page.
  1449	 * @nr: Number of pages to map.
  1450	 *
  1451	 * Context: The caller holds the page table lock.  The PTEs all lie
  1452	 * within a single PMD (and VMA, and folio).
  1453	 */
  1454	static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
  1455			pte_t *ptep, pte_t pte, unsigned int nr)
  1456	{
  1457		for (;;) {
> 1458			set_pte_at(mm, addr, ptep, pte);
  1459			if (--nr == 0)
  1460				break;
  1461			ptep++;
  1462			addr += PAGE_SIZE;
  1463			/* This works for x86.  Check how PTEs are encoded */
  1464			pte = __pte(pte_val(pte) + PAGE_SIZE);
  1465		}
  1466	}
  1467	#endif
  1468	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/5] mm: Add generic set_ptes()
  2023-02-07 19:49 ` [PATCH v5 2/5] mm: Add generic set_ptes() Matthew Wilcox (Oracle)
  2023-02-08  0:48   ` kernel test robot
@ 2023-02-08  2:20   ` kernel test robot
  2023-02-08  4:57   ` Yin Fengwei
  2023-02-08  8:38   ` Kirill A. Shutemov
  3 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-02-08  2:20 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Yin Fengwei, linux-mm
  Cc: llvm, oe-kbuild-all, Matthew Wilcox (Oracle),
	david, dave.hansen, tim.c.chen, ying.huang

Hi Matthew,

I love your patch! Perhaps something to improve:

[auto build test WARNING on next-20230207]
[cannot apply to linus/master v6.2-rc7 v6.2-rc6 v6.2-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/filemap-Add-filemap_map_folio_range/20230208-041404
patch link:    https://lore.kernel.org/r/20230207194937.122543-3-willy%40infradead.org
patch subject: [PATCH v5 2/5] mm: Add generic set_ptes()
config: hexagon-randconfig-r041-20230205 (https://download.01.org/0day-ci/archive/20230208/202302080951.lhzrREyk-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project db0e6591612b53910a1b366863348bdb9d7d2fb1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/1549aed85e99407fbf2600da432b84c0660d2029
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Matthew-Wilcox-Oracle/filemap-Add-filemap_map_folio_range/20230208-041404
        git checkout 1549aed85e99407fbf2600da432b84c0660d2029
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash drivers/media/dvb-frontends/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from drivers/media/dvb-frontends/a8293.c:8:
   In file included from drivers/media/dvb-frontends/a8293.h:11:
   In file included from include/media/dvb_frontend.h:38:
   In file included from include/linux/i2c.h:19:
   In file included from include/linux/regulator/consumer.h:35:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:26:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from drivers/media/dvb-frontends/a8293.c:8:
   In file included from drivers/media/dvb-frontends/a8293.h:11:
   In file included from include/media/dvb_frontend.h:38:
   In file included from include/linux/i2c.h:19:
   In file included from include/linux/regulator/consumer.h:35:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:26:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from drivers/media/dvb-frontends/a8293.c:8:
   In file included from drivers/media/dvb-frontends/a8293.h:11:
   In file included from include/media/dvb_frontend.h:38:
   In file included from include/linux/i2c.h:19:
   In file included from include/linux/regulator/consumer.h:35:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:26:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   In file included from drivers/media/dvb-frontends/a8293.c:8:
   In file included from drivers/media/dvb-frontends/a8293.h:11:
   In file included from include/media/dvb_frontend.h:38:
   In file included from include/linux/i2c.h:19:
   In file included from include/linux/regulator/consumer.h:35:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:20:
   In file included from include/linux/mm.h:29:
>> include/linux/pgtable.h:1454:65: warning: parameter 'addr' set but not used [-Wunused-but-set-parameter]
   static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
                                                                   ^
   7 warnings generated.
--
   In file included from drivers/media/dvb-frontends/mb86a20s.c:12:
   In file included from include/media/dvb_frontend.h:38:
   In file included from include/linux/i2c.h:19:
   In file included from include/linux/regulator/consumer.h:35:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:26:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from drivers/media/dvb-frontends/mb86a20s.c:12:
   In file included from include/media/dvb_frontend.h:38:
   In file included from include/linux/i2c.h:19:
   In file included from include/linux/regulator/consumer.h:35:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:26:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from drivers/media/dvb-frontends/mb86a20s.c:12:
   In file included from include/media/dvb_frontend.h:38:
   In file included from include/linux/i2c.h:19:
   In file included from include/linux/regulator/consumer.h:35:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:26:
   In file included from include/linux/kernel_stat.h:9:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   In file included from drivers/media/dvb-frontends/mb86a20s.c:12:
   In file included from include/media/dvb_frontend.h:38:
   In file included from include/linux/i2c.h:19:
   In file included from include/linux/regulator/consumer.h:35:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:20:
   In file included from include/linux/mm.h:29:
>> include/linux/pgtable.h:1454:65: warning: parameter 'addr' set but not used [-Wunused-but-set-parameter]
   static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
                                                                   ^
   drivers/media/dvb-frontends/mb86a20s.c:1572:6: warning: variable 'active_layers' set but not used [-Wunused-but-set-variable]
           int active_layers = 0, pre_ber_layers = 0, post_ber_layers = 0;
               ^
   8 warnings generated.
--
   In file included from drivers/media/dvb-frontends/mn88443x.c:12:
   In file included from include/linux/regmap.h:20:
   In file included from include/linux/iopoll.h:14:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from drivers/media/dvb-frontends/mn88443x.c:12:
   In file included from include/linux/regmap.h:20:
   In file included from include/linux/iopoll.h:14:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from drivers/media/dvb-frontends/mn88443x.c:12:
   In file included from include/linux/regmap.h:20:
   In file included from include/linux/iopoll.h:14:
   In file included from include/linux/io.h:13:
   In file included from arch/hexagon/include/asm/io.h:334:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   In file included from drivers/media/dvb-frontends/mn88443x.c:15:
   In file included from drivers/media/dvb-frontends/mn88443x.h:11:
   In file included from include/media/dvb_frontend.h:38:
   In file included from include/linux/i2c.h:19:
   In file included from include/linux/regulator/consumer.h:35:
   In file included from include/linux/suspend.h:5:
   In file included from include/linux/swap.h:9:
   In file included from include/linux/memcontrol.h:20:
   In file included from include/linux/mm.h:29:
>> include/linux/pgtable.h:1454:65: warning: parameter 'addr' set but not used [-Wunused-but-set-parameter]
   static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
                                                                   ^
   drivers/media/dvb-frontends/mn88443x.c:782:34: warning: unused variable 'mn88443x_of_match' [-Wunused-const-variable]
   static const struct of_device_id mn88443x_of_match[] = {
                                    ^
   8 warnings generated.


vim +/addr +1454 include/linux/pgtable.h

  1441	
  1442	#ifndef set_ptes
  1443	/**
  1444	 * set_ptes - Map consecutive pages to a contiguous range of addresses.
  1445	 * @mm: Address space to map the pages into.
  1446	 * @addr: Address to map the first page at.
  1447	 * @ptep: Page table pointer for the first entry.
  1448	 * @pte: Page table entry for the first page.
  1449	 * @nr: Number of pages to map.
  1450	 *
  1451	 * Context: The caller holds the page table lock.  The PTEs all lie
  1452	 * within a single PMD (and VMA, and folio).
  1453	 */
> 1454	static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
  1455			pte_t *ptep, pte_t pte, unsigned int nr)
  1456	{
  1457		for (;;) {
  1458			set_pte_at(mm, addr, ptep, pte);
  1459			if (--nr == 0)
  1460				break;
  1461			ptep++;
  1462			addr += PAGE_SIZE;
  1463			/* This works for x86.  Check how PTEs are encoded */
  1464			pte = __pte(pte_val(pte) + PAGE_SIZE);
  1465		}
  1466	}
  1467	#endif
  1468	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/5] mm: Add generic set_ptes()
  2023-02-07 19:49 ` [PATCH v5 2/5] mm: Add generic set_ptes() Matthew Wilcox (Oracle)
  2023-02-08  0:48   ` kernel test robot
  2023-02-08  2:20   ` kernel test robot
@ 2023-02-08  4:57   ` Yin Fengwei
  2023-02-08  8:38   ` Kirill A. Shutemov
  3 siblings, 0 replies; 12+ messages in thread
From: Yin Fengwei @ 2023-02-08  4:57 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: david, dave.hansen, tim.c.chen, ying.huang

On 2/8/23 03:49, Matthew Wilcox (Oracle) wrote:
> This set_ptes should work for most architectures.  Those that need to
> set a special bit in their PTEs or have their PFNs located at a different
> offset from PAGE_SHIFT will need to override it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
For x86_64

Tested-by: Yin Fengwei <fengwei.yin@intel.com>

Regards
Yin, Fengwei

> ---
>   include/linux/pgtable.h | 27 +++++++++++++++++++++++++++
>   1 file changed, 27 insertions(+)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c63cd44777ec..e1804d23e7c4 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1439,6 +1439,33 @@ static inline int pmd_protnone(pmd_t pmd)
>   
>   #endif /* CONFIG_MMU */
>   
> +#ifndef set_ptes
> +/**
> + * set_ptes - Map consecutive pages to a contiguous range of addresses.
> + * @mm: Address space to map the pages into.
> + * @addr: Address to map the first page at.
> + * @ptep: Page table pointer for the first entry.
> + * @pte: Page table entry for the first page.
> + * @nr: Number of pages to map.
> + *
> + * Context: The caller holds the page table lock.  The PTEs all lie
> + * within a single PMD (and VMA, and folio).
> + */
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	for (;;) {
> +		set_pte_at(mm, addr, ptep, pte);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		addr += PAGE_SIZE;
> +		/* This works for x86.  Check how PTEs are encoded */
> +		pte = __pte(pte_val(pte) + PAGE_SIZE);
> +	}
> +}
> +#endif
> +
>   #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>   
>   #ifndef __PAGETABLE_P4D_FOLDED



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/5] mm: Add generic set_ptes()
  2023-02-07 19:49 ` [PATCH v5 2/5] mm: Add generic set_ptes() Matthew Wilcox (Oracle)
                     ` (2 preceding siblings ...)
  2023-02-08  4:57   ` Yin Fengwei
@ 2023-02-08  8:38   ` Kirill A. Shutemov
  2023-02-08 18:14     ` Matthew Wilcox
  3 siblings, 1 reply; 12+ messages in thread
From: Kirill A. Shutemov @ 2023-02-08  8:38 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Yin Fengwei, linux-mm, david, dave.hansen, tim.c.chen, ying.huang

On Tue, Feb 07, 2023 at 07:49:32PM +0000, Matthew Wilcox (Oracle) wrote:
> This set_ptes should work for most architectures.  Those that need to
> set a special bit in their PTEs or have their PFNs located at a different
> offset from PAGE_SHIFT will need to override it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/pgtable.h | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c63cd44777ec..e1804d23e7c4 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1439,6 +1439,33 @@ static inline int pmd_protnone(pmd_t pmd)
>  
>  #endif /* CONFIG_MMU */
>  
> +#ifndef set_ptes
> +/**
> + * set_ptes - Map consecutive pages to a contiguous range of addresses.
> + * @mm: Address space to map the pages into.
> + * @addr: Address to map the first page at.
> + * @ptep: Page table pointer for the first entry.
> + * @pte: Page table entry for the first page.
> + * @nr: Number of pages to map.
> + *
> + * Context: The caller holds the page table lock.  The PTEs all lie
> + * within a single PMD (and VMA, and folio).
> + */
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +		pte_t *ptep, pte_t pte, unsigned int nr)
> +{
> +	for (;;) {
> +		set_pte_at(mm, addr, ptep, pte);
> +		if (--nr == 0)
> +			break;

Maybe do { ... } while (--nr); instead of the for()?

> +		ptep++;
> +		addr += PAGE_SIZE;
> +		/* This works for x86.  Check how PTEs are encoded */
> +		pte = __pte(pte_val(pte) + PAGE_SIZE);

Looks like it deserves own helper. Something like

		pte = pte_next(pte);

> +	}
> +}
> +#endif
> +
>  #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>  
>  #ifndef __PAGETABLE_P4D_FOLDED
> -- 
> 2.35.1
> 
> 

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/5] mm: Add generic set_ptes()
  2023-02-08  8:38   ` Kirill A. Shutemov
@ 2023-02-08 18:14     ` Matthew Wilcox
  2023-02-09  8:11       ` Kirill A. Shutemov
  0 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2023-02-08 18:14 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Yin Fengwei, linux-mm, david, dave.hansen, tim.c.chen, ying.huang

On Wed, Feb 08, 2023 at 11:38:48AM +0300, Kirill A. Shutemov wrote:
> > +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> > +		pte_t *ptep, pte_t pte, unsigned int nr)
> > +{
> > +	for (;;) {
> > +		set_pte_at(mm, addr, ptep, pte);
> > +		if (--nr == 0)
> > +			break;
> 
> Maybe do { ... } while (--nr); instead of the for()?

Seems unnecessary to do the extra calculation to advance pte if
we're not going to use it?

> > +		ptep++;
> > +		addr += PAGE_SIZE;
> > +		/* This works for x86.  Check how PTEs are encoded */
> > +		pte = __pte(pte_val(pte) + PAGE_SIZE);
> 
> Looks like it deserves own helper. Something like
> 
> 		pte = pte_next(pte);

Maybe, but then it needs to be added to each arch, which I was trying
to avoid.  That said, I had another idea which I like better; more
patches coming soon.  Here's a sneak preview:

+++ b/arch/x86/include/asm/pgtable.h
@@ -1019,13 +1019,22 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
        return res;
 }

-static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-                             pte_t *ptep, pte_t pte)
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+                             pte_t *ptep, pte_t pte, unsigned int nr)
 {
-       page_table_check_ptes_set(mm, addr, ptep, pte, 1);
-       set_pte(ptep, pte);
+       page_table_check_ptes_set(mm, addr, ptep, pte, nr);
+
+       for (;;) {
+               set_pte(ptep, pte);
+               if (--nr == 0)
+                       break;
+               ptep++;
+               pte = __pte(pte_val(pte) + PAGE_SIZE);
+       }
 }

+#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
+
 static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
                              pmd_t *pmdp, pmd_t pmd)
 {




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/5] mm: Add generic set_ptes()
  2023-02-08 18:14     ` Matthew Wilcox
@ 2023-02-09  8:11       ` Kirill A. Shutemov
  0 siblings, 0 replies; 12+ messages in thread
From: Kirill A. Shutemov @ 2023-02-09  8:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yin Fengwei, linux-mm, david, dave.hansen, tim.c.chen, ying.huang

On Wed, Feb 08, 2023 at 06:14:48PM +0000, Matthew Wilcox wrote:
> On Wed, Feb 08, 2023 at 11:38:48AM +0300, Kirill A. Shutemov wrote:
> > > +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> > > +		pte_t *ptep, pte_t pte, unsigned int nr)
> > > +{
> > > +	for (;;) {
> > > +		set_pte_at(mm, addr, ptep, pte);
> > > +		if (--nr == 0)
> > > +			break;
> > 
> > Maybe do { ... } while (--nr); instead of the for()?
> 
> Seems unnecessary to do the extra calculation to advance pte if
> we're not going to use it?

It should be transparent to the compiler, but okay.

> 
> > > +		ptep++;
> > > +		addr += PAGE_SIZE;
> > > +		/* This works for x86.  Check how PTEs are encoded */
> > > +		pte = __pte(pte_val(pte) + PAGE_SIZE);
> > 
> > Looks like it deserves own helper. Something like
> > 
> > 		pte = pte_next(pte);
> 
> Maybe, but then it needs to be added to each arch, which I was trying
> to avoid.  That said, I had another idea which I like better; more
> patches coming soon.

A generic pte_next() that does the "+ PAGE_SIZE" trick should be fine. Arch
code can override it if the bit placement is different, instead of rewriting
the whole set_ptes().

> Here's a sneak preview:
> 
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -1019,13 +1019,22 @@ static inline pud_t native_local_pudp_get_and_clear(pud_t *pudp)
>         return res;
>  }
> 
> -static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> -                             pte_t *ptep, pte_t pte)
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +                             pte_t *ptep, pte_t pte, unsigned int nr)
>  {
> -       page_table_check_ptes_set(mm, addr, ptep, pte, 1);
> -       set_pte(ptep, pte);
> +       page_table_check_ptes_set(mm, addr, ptep, pte, nr);
> +
> +       for (;;) {
> +               set_pte(ptep, pte);
> +               if (--nr == 0)
> +                       break;
> +               ptep++;
> +               pte = __pte(pte_val(pte) + PAGE_SIZE);
> +       }

	for(; nr; nr--, ptep++, pte = pte_next(pte))
		set_pte(ptep, pte);

Hm?

>  }
> 
> +#define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
> +
>  static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
>                               pmd_t *pmdp, pmd_t pmd)
>  {
> 
> 
> 

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-02-09  8:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-07 19:49 [PATCH v5 0/5] Batched page table updates for file-backed large folios Matthew Wilcox (Oracle)
2023-02-07 19:49 ` [PATCH v5 1/5] filemap: Add filemap_map_folio_range() Matthew Wilcox (Oracle)
2023-02-07 19:49 ` [PATCH v5 2/5] mm: Add generic set_ptes() Matthew Wilcox (Oracle)
2023-02-08  0:48   ` kernel test robot
2023-02-08  2:20   ` kernel test robot
2023-02-08  4:57   ` Yin Fengwei
2023-02-08  8:38   ` Kirill A. Shutemov
2023-02-08 18:14     ` Matthew Wilcox
2023-02-09  8:11       ` Kirill A. Shutemov
2023-02-07 19:49 ` [PATCH v5 3/5] rmap: add folio_add_file_rmap_range() Matthew Wilcox (Oracle)
2023-02-07 19:49 ` [PATCH v5 4/5] mm: Convert do_set_pte() to set_pte_range() Matthew Wilcox (Oracle)
2023-02-07 19:49 ` [PATCH v5 5/5] filemap: Batch PTE mappings Matthew Wilcox (Oracle)
