Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3 00/12] Overhaul multi-page lookups for THP
@ 2020-10-26  4:13 Matthew Wilcox (Oracle)
  2020-10-26  4:13 ` [PATCH v3 01/12] mm: Make pagecache tagged lookups return only head pages Matthew Wilcox (Oracle)
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel

This THP prep patchset changes several page cache iteration APIs to only
return head pages.

 - It's only possible to tag head pages in the page cache, so only
   return head pages, not all their subpages.
 - Factor a lot of common code out of the various batch lookup routines
 - Add mapping_seek_hole_data()
 - Unify find_get_entries() and pagevec_lookup_entries()
 - Make find_get_entries only return head pages, like find_get_entry().

These are only loosely connected, but they seem to make sense together
as a series.  Another ~30 MM patches to come after this batch to
enable THPs for filesystems.

Matthew Wilcox (Oracle) (12):
  mm: Make pagecache tagged lookups return only head pages
  mm/shmem: Use pagevec_lookup in shmem_unlock_mapping
  mm/filemap: Add helper for finding pages
  mm/filemap: Add mapping_seek_hole_data
  mm: Add and use find_lock_entries
  mm: Add an 'end' parameter to find_get_entries
  mm: Add an 'end' parameter to pagevec_lookup_entries
  mm: Remove nr_entries parameter from pagevec_lookup_entries
  mm: Pass pvec directly to find_get_entries
  mm: Remove pagevec_lookup_entries
  mm/truncate,shmem: Handle truncates that split THPs
  mm/filemap: Return only head pages from find_get_entries

 include/linux/pagemap.h |   5 +-
 include/linux/pagevec.h |   4 -
 mm/filemap.c            | 275 +++++++++++++++++++++++++++-------------
 mm/internal.h           |   5 +
 mm/shmem.c              | 213 +++++++------------------------
 mm/swap.c               |  38 +-----
 mm/truncate.c           | 249 ++++++++++++++----------------------
 7 files changed, 335 insertions(+), 454 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 01/12] mm: Make pagecache tagged lookups return only head pages
  2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
@ 2020-10-26  4:13 ` Matthew Wilcox (Oracle)
  2020-10-28  7:50   ` Mike Rapoport
  2020-10-26  4:13 ` [PATCH v3 02/12] mm/shmem: Use pagevec_lookup in shmem_unlock_mapping Matthew Wilcox (Oracle)
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

Pagecache tags are used for dirty page writeback.  Since dirtiness is
tracked on a per-THP basis, we only want to return the head page rather
than each subpage of a tagged page.  All the filesystems which use huge
pages today are in-memory, so there are no tagged huge pages today.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 mm/filemap.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index d5e7c2029d16..edde5dc0d28f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2066,7 +2066,7 @@ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index,
 EXPORT_SYMBOL(find_get_pages_contig);
 
 /**
- * find_get_pages_range_tag - find and return pages in given range matching @tag
+ * find_get_pages_range_tag - Find and return head pages matching @tag.
  * @mapping:	the address_space to search
  * @index:	the starting page index
  * @end:	The final page index (inclusive)
@@ -2074,8 +2074,8 @@ EXPORT_SYMBOL(find_get_pages_contig);
  * @nr_pages:	the maximum number of pages
  * @pages:	where the resulting pages are placed
  *
- * Like find_get_pages, except we only return pages which are tagged with
- * @tag.   We update @index to index the next page for the traversal.
+ * Like find_get_pages(), except we only return head pages which are tagged
+ * with @tag.   We update @index to index the next page for the traversal.
  *
  * Return: the number of pages which were found.
  */
@@ -2109,9 +2109,9 @@ unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
 		if (unlikely(page != xas_reload(&xas)))
 			goto put_page;
 
-		pages[ret] = find_subpage(page, xas.xa_index);
+		pages[ret] = page;
 		if (++ret == nr_pages) {
-			*index = xas.xa_index + 1;
+			*index = page->index + thp_nr_pages(page);
 			goto out;
 		}
 		continue;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 02/12] mm/shmem: Use pagevec_lookup in shmem_unlock_mapping
  2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
  2020-10-26  4:13 ` [PATCH v3 01/12] mm: Make pagecache tagged lookups return only head pages Matthew Wilcox (Oracle)
@ 2020-10-26  4:13 ` Matthew Wilcox (Oracle)
  2020-10-26  4:13 ` [PATCH v3 03/12] mm/filemap: Add helper for finding pages Matthew Wilcox (Oracle)
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

The comment shows that the reason for using find_get_entries() is now
stale; find_get_pages() will not return 0 if it hits a consecutive run
of swap entries, and I don't believe it has since 2011.  pagevec_lookup()
is a simpler function to use than find_get_pages(), so use it instead.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 mm/shmem.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 537c137698f8..a33972126b60 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -842,7 +842,6 @@ unsigned long shmem_swap_usage(struct vm_area_struct *vma)
 void shmem_unlock_mapping(struct address_space *mapping)
 {
 	struct pagevec pvec;
-	pgoff_t indices[PAGEVEC_SIZE];
 	pgoff_t index = 0;
 
 	pagevec_init(&pvec);
@@ -850,16 +849,8 @@ void shmem_unlock_mapping(struct address_space *mapping)
 	 * Minor point, but we might as well stop if someone else SHM_LOCKs it.
 	 */
 	while (!mapping_unevictable(mapping)) {
-		/*
-		 * Avoid pagevec_lookup(): find_get_pages() returns 0 as if it
-		 * has finished, if it hits a row of PAGEVEC_SIZE swap entries.
-		 */
-		pvec.nr = find_get_entries(mapping, index,
-					   PAGEVEC_SIZE, pvec.pages, indices);
-		if (!pvec.nr)
+		if (!pagevec_lookup(&pvec, mapping, &index))
 			break;
-		index = indices[pvec.nr - 1] + 1;
-		pagevec_remove_exceptionals(&pvec);
 		check_move_unevictable_pages(&pvec);
 		pagevec_release(&pvec);
 		cond_resched();
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 03/12] mm/filemap: Add helper for finding pages
  2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
  2020-10-26  4:13 ` [PATCH v3 01/12] mm: Make pagecache tagged lookups return only head pages Matthew Wilcox (Oracle)
  2020-10-26  4:13 ` [PATCH v3 02/12] mm/shmem: Use pagevec_lookup in shmem_unlock_mapping Matthew Wilcox (Oracle)
@ 2020-10-26  4:13 ` Matthew Wilcox (Oracle)
  2020-10-27 18:56   ` Christoph Hellwig
  2020-10-26  4:14 ` [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data Matthew Wilcox (Oracle)
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

There is a lot of common code in find_get_entries(),
find_get_pages_range() and find_get_pages_range_tag().  Factor out
xas_find_get_entry() which simplifies all three functions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 mm/filemap.c | 98 +++++++++++++++++++++++-----------------------------
 1 file changed, 43 insertions(+), 55 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index edde5dc0d28f..00eaed59e797 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1856,6 +1856,43 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 }
 EXPORT_SYMBOL(pagecache_get_page);
 
+static inline struct page *xas_find_get_entry(struct xa_state *xas,
+		pgoff_t max, xa_mark_t mark)
+{
+	struct page *page;
+
+retry:
+	if (mark == XA_PRESENT)
+		page = xas_find(xas, max);
+	else
+		page = xas_find_marked(xas, max, mark);
+
+	if (xas_retry(xas, page))
+		goto retry;
+	/*
+	 * A shadow entry of a recently evicted page, a swap
+	 * entry from shmem/tmpfs or a DAX entry.  Return it
+	 * without attempting to raise page count.
+	 */
+	if (!page || xa_is_value(page))
+		return page;
+
+	if (!page_cache_get_speculative(page))
+		goto reset;
+
+	/* Has the page moved or been split? */
+	if (unlikely(page != xas_reload(xas))) {
+		put_page(page);
+		goto reset;
+	}
+	VM_BUG_ON_PAGE(!thp_contains(page, xas->xa_index), page);
+
+	return page;
+reset:
+	xas_reset(xas);
+	goto retry;
+}
+
 /**
  * find_get_entries - gang pagecache lookup
  * @mapping:	The address_space to search
@@ -1895,42 +1932,21 @@ unsigned find_get_entries(struct address_space *mapping,
 		return 0;
 
 	rcu_read_lock();
-	xas_for_each(&xas, page, ULONG_MAX) {
-		if (xas_retry(&xas, page))
-			continue;
-		/*
-		 * A shadow entry of a recently evicted page, a swap
-		 * entry from shmem/tmpfs or a DAX entry.  Return it
-		 * without attempting to raise page count.
-		 */
-		if (xa_is_value(page))
-			goto export;
-
-		if (!page_cache_get_speculative(page))
-			goto retry;
-
-		/* Has the page moved or been split? */
-		if (unlikely(page != xas_reload(&xas)))
-			goto put_page;
-
+	while ((page = xas_find_get_entry(&xas, ULONG_MAX, XA_PRESENT))) {
 		/*
 		 * Terminate early on finding a THP, to allow the caller to
 		 * handle it all at once; but continue if this is hugetlbfs.
 		 */
-		if (PageTransHuge(page) && !PageHuge(page)) {
+		if (!xa_is_value(page) && PageTransHuge(page) &&
+				!PageHuge(page)) {
 			page = find_subpage(page, xas.xa_index);
 			nr_entries = ret + 1;
 		}
-export:
+
 		indices[ret] = xas.xa_index;
 		entries[ret] = page;
 		if (++ret == nr_entries)
 			break;
-		continue;
-put_page:
-		put_page(page);
-retry:
-		xas_reset(&xas);
 	}
 	rcu_read_unlock();
 	return ret;
@@ -1969,30 +1985,16 @@ unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start,
 		return 0;
 
 	rcu_read_lock();
-	xas_for_each(&xas, page, end) {
-		if (xas_retry(&xas, page))
-			continue;
+	while ((page = xas_find_get_entry(&xas, end, XA_PRESENT))) {
 		/* Skip over shadow, swap and DAX entries */
 		if (xa_is_value(page))
 			continue;
 
-		if (!page_cache_get_speculative(page))
-			goto retry;
-
-		/* Has the page moved or been split? */
-		if (unlikely(page != xas_reload(&xas)))
-			goto put_page;
-
 		pages[ret] = find_subpage(page, xas.xa_index);
 		if (++ret == nr_pages) {
 			*start = xas.xa_index + 1;
 			goto out;
 		}
-		continue;
-put_page:
-		put_page(page);
-retry:
-		xas_reset(&xas);
 	}
 
 	/*
@@ -2091,9 +2093,7 @@ unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
 		return 0;
 
 	rcu_read_lock();
-	xas_for_each_marked(&xas, page, end, tag) {
-		if (xas_retry(&xas, page))
-			continue;
+	while ((page = xas_find_get_entry(&xas, end, tag))) {
 		/*
 		 * Shadow entries should never be tagged, but this iteration
 		 * is lockless so there is a window for page reclaim to evict
@@ -2102,23 +2102,11 @@ unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
 		if (xa_is_value(page))
 			continue;
 
-		if (!page_cache_get_speculative(page))
-			goto retry;
-
-		/* Has the page moved or been split? */
-		if (unlikely(page != xas_reload(&xas)))
-			goto put_page;
-
 		pages[ret] = page;
 		if (++ret == nr_pages) {
 			*index = page->index + thp_nr_pages(page);
 			goto out;
 		}
-		continue;
-put_page:
-		put_page(page);
-retry:
-		xas_reset(&xas);
 	}
 
 	/*
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data
  2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2020-10-26  4:13 ` [PATCH v3 03/12] mm/filemap: Add helper for finding pages Matthew Wilcox (Oracle)
@ 2020-10-26  4:14 ` Matthew Wilcox (Oracle)
  2020-10-26 10:48   ` Jan Kara
  2020-10-27 18:58   ` Christoph Hellwig
  2020-10-26  4:14 ` [PATCH v3 05/12] mm: Add and use find_lock_entries Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel, William Kucharski

Rewrite shmem_seek_hole_data() and move it to filemap.c.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 include/linux/pagemap.h |  2 ++
 mm/filemap.c            | 76 +++++++++++++++++++++++++++++++++++++++++
 mm/shmem.c              | 72 +++-----------------------------------
 3 files changed, 82 insertions(+), 68 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c77b7c31b2e4..5f3e829c91fd 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -760,6 +760,8 @@ extern void __delete_from_page_cache(struct page *page, void *shadow);
 int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
 void delete_from_page_cache_batch(struct address_space *mapping,
 				  struct pagevec *pvec);
+loff_t mapping_seek_hole_data(struct address_space *, loff_t start, loff_t end,
+		int whence);
 
 /*
  * Like add_to_page_cache_locked, but used to add newly allocated pages:
diff --git a/mm/filemap.c b/mm/filemap.c
index 00eaed59e797..3a55d258d9f2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2526,6 +2526,82 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 }
 EXPORT_SYMBOL(generic_file_read_iter);
 
+static inline loff_t page_seek_hole_data(struct page *page,
+		loff_t start, loff_t end, bool seek_data)
+{
+	if (xa_is_value(page) || PageUptodate(page))
+		return seek_data ? start : end;
+	return seek_data ? end : start;
+}
+
+static inline
+unsigned int seek_page_size(struct xa_state *xas, struct page *page)
+{
+	if (xa_is_value(page))
+		return PAGE_SIZE << xa_get_order(xas->xa, xas->xa_index);
+	return thp_size(page);
+}
+
+/**
+ * mapping_seek_hole_data - Seek for SEEK_DATA / SEEK_HOLE in the page cache.
+ * @mapping: Address space to search.
+ * @start: First byte to consider.
+ * @end: Limit of search (exclusive).
+ * @whence: Either SEEK_HOLE or SEEK_DATA.
+ *
+ * If the page cache knows which blocks contain holes and which blocks
+ * contain data, your filesystem can use this function to implement
+ * SEEK_HOLE and SEEK_DATA.  This is useful for filesystems which are
+ * entirely memory-based such as tmpfs, and filesystems which support
+ * unwritten extents.
+ *
+ * Return: The requested offset on successs, or -ENXIO if @whence specifies
+ * SEEK_DATA and there is no data after @start.  There is an implicit hole
+ * after @end - 1, so SEEK_HOLE returns @end if all the bytes between @start
+ * and @end contain data.
+ */
+loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
+		loff_t end, int whence)
+{
+	XA_STATE(xas, &mapping->i_pages, start >> PAGE_SHIFT);
+	pgoff_t max = (end - 1) / PAGE_SIZE;
+	bool seek_data = (whence == SEEK_DATA);
+	struct page *page;
+
+	if (end <= start)
+		return -ENXIO;
+
+	rcu_read_lock();
+	while ((page = xas_find_get_entry(&xas, max, XA_PRESENT))) {
+		loff_t pos = xas.xa_index * PAGE_SIZE;
+
+		if (start < pos) {
+			if (!seek_data)
+				goto unlock;
+			start = pos;
+		}
+
+		pos += seek_page_size(&xas, page);
+		start = page_seek_hole_data(page, start, pos, seek_data);
+		if (start < pos)
+			goto unlock;
+	}
+	rcu_read_unlock();
+
+	if (seek_data)
+		return -ENXIO;
+	goto out;
+
+unlock:
+	rcu_read_unlock();
+	if (!xa_is_value(page))
+		put_page(page);
+out:
+	if (start > end)
+		return end;
+	return start;
+}
+
 #ifdef CONFIG_MMU
 #define MMAP_LOTSAMISS  (100)
 /*
diff --git a/mm/shmem.c b/mm/shmem.c
index a33972126b60..726cede653f5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2670,85 +2670,21 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	return retval ? retval : error;
 }
 
-/*
- * llseek SEEK_DATA or SEEK_HOLE through the page cache.
- */
-static pgoff_t shmem_seek_hole_data(struct address_space *mapping,
-				    pgoff_t index, pgoff_t end, int whence)
-{
-	struct page *page;
-	struct pagevec pvec;
-	pgoff_t indices[PAGEVEC_SIZE];
-	bool done = false;
-	int i;
-
-	pagevec_init(&pvec);
-	pvec.nr = 1;		/* start small: we may be there already */
-	while (!done) {
-		pvec.nr = find_get_entries(mapping, index,
-					pvec.nr, pvec.pages, indices);
-		if (!pvec.nr) {
-			if (whence == SEEK_DATA)
-				index = end;
-			break;
-		}
-		for (i = 0; i < pvec.nr; i++, index++) {
-			if (index < indices[i]) {
-				if (whence == SEEK_HOLE) {
-					done = true;
-					break;
-				}
-				index = indices[i];
-			}
-			page = pvec.pages[i];
-			if (page && !xa_is_value(page)) {
-				if (!PageUptodate(page))
-					page = NULL;
-			}
-			if (index >= end ||
-			    (page && whence == SEEK_DATA) ||
-			    (!page && whence == SEEK_HOLE)) {
-				done = true;
-				break;
-			}
-		}
-		pagevec_remove_exceptionals(&pvec);
-		pagevec_release(&pvec);
-		pvec.nr = PAGEVEC_SIZE;
-		cond_resched();
-	}
-	return index;
-}
-
 static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence)
 {
 	struct address_space *mapping = file->f_mapping;
 	struct inode *inode = mapping->host;
-	pgoff_t start, end;
-	loff_t new_offset;
 
 	if (whence != SEEK_DATA && whence != SEEK_HOLE)
 		return generic_file_llseek_size(file, offset, whence,
 					MAX_LFS_FILESIZE, i_size_read(inode));
+	if (offset < 0)
+		return -ENXIO;
+
 	inode_lock(inode);
 	/* We're holding i_mutex so we can access i_size directly */
 
-	if (offset < 0 || offset >= inode->i_size)
-		offset = -ENXIO;
-	else {
-		start = offset >> PAGE_SHIFT;
-		end = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
-		new_offset = shmem_seek_hole_data(mapping, start, end, whence);
-		new_offset <<= PAGE_SHIFT;
-		if (new_offset > offset) {
-			if (new_offset < inode->i_size)
-				offset = new_offset;
-			else if (whence == SEEK_DATA)
-				offset = -ENXIO;
-			else
-				offset = inode->i_size;
-		}
-	}
+	offset = mapping_seek_hole_data(mapping, offset, inode->i_size, whence);
 
 	if (offset >= 0)
 		offset = vfs_setpos(file, offset, MAX_LFS_FILESIZE);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 05/12] mm: Add and use find_lock_entries
  2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2020-10-26  4:14 ` [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data Matthew Wilcox (Oracle)
@ 2020-10-26  4:14 ` Matthew Wilcox (Oracle)
  2020-10-26  4:14 ` [PATCH v3 06/12] mm: Add an 'end' parameter to find_get_entries Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

We have three functions (shmem_undo_range(), truncate_inode_pages_range()
and invalidate_mapping_pages()) which want exactly this function,
so add it to filemap.c.  Before this patch, shmem_undo_range() would
split any compound page which overlaps either end of the range being
punched in both the first and second loops through the address space.
After this patch, that functionality is left for the second loop, which
is arguably more appropriate since the first loop is supposed to run
through all the pages quickly, and splitting a page can sleep.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 mm/filemap.c  | 57 ++++++++++++++++++++++++++++++++
 mm/internal.h |  2 ++
 mm/shmem.c    | 22 +++----------
 mm/truncate.c | 91 +++++++--------------------------------------------
 4 files changed, 75 insertions(+), 97 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 3a55d258d9f2..9a33d1b8cef6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1952,6 +1952,63 @@ unsigned find_get_entries(struct address_space *mapping,
 	return ret;
 }
 
+/**
+ * find_lock_entries - Find a batch of pagecache entries.
+ * @mapping:	The address_space to search.
+ * @start:	The starting page cache index.
+ * @end:	The final page index (inclusive).
+ * @pvec:	Where the resulting entries are placed.
+ * @indices:	The cache indices of the entries in @pvec.
+ *
+ * find_lock_entries() will return a batch of entries from @mapping.
+ * Swap, shadow and DAX entries are included.  Pages are returned
+ * locked and with an incremented refcount.  Pages which are locked by
+ * somebody else or under writeback are skipped.  Only the head page of
+ * a THP is returned.  Pages which are partially outside the range are
+ * not returned.
+ *
+ * The entries have ascending indexes.  The indices may not be consecutive
+ * due to not-present entries, THP pages, pages which could not be locked
+ * or pages under writeback.
+ *
+ * Return: The number of entries which were found.
+ */
+unsigned find_lock_entries(struct address_space *mapping, pgoff_t start,
+		pgoff_t end, struct pagevec *pvec, pgoff_t *indices)
+{
+	XA_STATE(xas, &mapping->i_pages, start);
+	struct page *page;
+
+	rcu_read_lock();
+	while ((page = xas_find_get_entry(&xas, end, XA_PRESENT))) {
+		if (!xa_is_value(page)) {
+			if (page->index < start)
+				goto put;
+			VM_BUG_ON_PAGE(page->index != xas.xa_index, page);
+			if (page->index + thp_nr_pages(page) - 1 > end)
+				goto put;
+			if (!trylock_page(page))
+				goto put;
+			if (page->mapping != mapping || PageWriteback(page))
+				goto unlock;
+		}
+		indices[pvec->nr] = xas.xa_index;
+		if (!pagevec_add(pvec, page))
+			break;
+		goto next;
+unlock:
+		unlock_page(page);
+put:
+		put_page(page);
+next:
+		if (!xa_is_value(page) && PageTransHuge(page))
+			xas_set(&xas, page->index + thp_nr_pages(page));
+	}
+	rcu_read_unlock();
+
+	return pagevec_count(pvec);
+}
+
 /**
  * find_get_pages_range - gang pagecache lookup
  * @mapping:	The address_space to search
diff --git a/mm/internal.h b/mm/internal.h
index c43ccdddb0f6..8d79f4d21eaf 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -62,6 +62,8 @@ static inline void force_page_cache_readahead(struct address_space *mapping,
 
 struct page *find_get_entry(struct address_space *mapping, pgoff_t index);
 struct page *find_lock_entry(struct address_space *mapping, pgoff_t index);
+unsigned find_lock_entries(struct address_space *mapping, pgoff_t start,
+		pgoff_t end, struct pagevec *pvec, pgoff_t *indices);
 
 /**
  * page_evictable - test whether a page is evictable
diff --git a/mm/shmem.c b/mm/shmem.c
index 726cede653f5..ef34271cad2d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -907,12 +907,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 
 	pagevec_init(&pvec);
 	index = start;
-	while (index < end) {
-		pvec.nr = find_get_entries(mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE),
-			pvec.pages, indices);
-		if (!pvec.nr)
-			break;
+	while (index < end && find_lock_entries(mapping, index, end - 1,
+			&pvec, indices)) {
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
@@ -927,18 +923,10 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 								index, page);
 				continue;
 			}
+			index += thp_nr_pages(page) - 1;
 
-			VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page);
-
-			if (!trylock_page(page))
-				continue;
-
-			if ((!unfalloc || !PageUptodate(page)) &&
-			    page_mapping(page) == mapping) {
-				VM_BUG_ON_PAGE(PageWriteback(page), page);
-				if (shmem_punch_compound(page, start, end))
-					truncate_inode_page(mapping, page);
-			}
+			if (!unfalloc || !PageUptodate(page))
+				truncate_inode_page(mapping, page);
 			unlock_page(page);
 		}
 		pagevec_remove_exceptionals(&pvec);
diff --git a/mm/truncate.c b/mm/truncate.c
index 18cec39a9f53..3c6b6d5a0046 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -326,51 +326,19 @@ void truncate_inode_pages_range(struct address_space *mapping,
 
 	pagevec_init(&pvec);
 	index = start;
-	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE),
-			indices)) {
-		/*
-		 * Pagevec array has exceptional entries and we may also fail
-		 * to lock some pages. So we store pages that can be deleted
-		 * in a new pagevec.
-		 */
-		struct pagevec locked_pvec;
-
-		pagevec_init(&locked_pvec);
-		for (i = 0; i < pagevec_count(&pvec); i++) {
-			struct page *page = pvec.pages[i];
-
-			/* We rely upon deletion not changing page->index */
-			index = indices[i];
-			if (index >= end)
-				break;
-
-			if (xa_is_value(page))
-				continue;
-
-			if (!trylock_page(page))
-				continue;
-			WARN_ON(page_to_index(page) != index);
-			if (PageWriteback(page)) {
-				unlock_page(page);
-				continue;
-			}
-			if (page->mapping != mapping) {
-				unlock_page(page);
-				continue;
-			}
-			pagevec_add(&locked_pvec, page);
-		}
-		for (i = 0; i < pagevec_count(&locked_pvec); i++)
-			truncate_cleanup_page(mapping, locked_pvec.pages[i]);
-		delete_from_page_cache_batch(mapping, &locked_pvec);
-		for (i = 0; i < pagevec_count(&locked_pvec); i++)
-			unlock_page(locked_pvec.pages[i]);
+	while (index < end && find_lock_entries(mapping, index, end - 1,
+			&pvec, indices)) {
+		index = indices[pagevec_count(&pvec) - 1] + 1;
 		truncate_exceptional_pvec_entries(mapping, &pvec, indices, end);
+		for (i = 0; i < pagevec_count(&pvec); i++)
+			truncate_cleanup_page(mapping, pvec.pages[i]);
+		delete_from_page_cache_batch(mapping, &pvec);
+		for (i = 0; i < pagevec_count(&pvec); i++)
+			unlock_page(pvec.pages[i]);
 		pagevec_release(&pvec);
 		cond_resched();
-		index++;
 	}
+
 	if (partial_start) {
 		struct page *page = find_lock_page(mapping, start - 1);
 		if (page) {
@@ -539,9 +507,7 @@ unsigned long __invalidate_mapping_pages(struct address_space *mapping,
 	int i;
 
 	pagevec_init(&pvec);
-	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
-			indices)) {
+	while (find_lock_entries(mapping, index, end, &pvec, indices)) {
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
@@ -555,39 +521,7 @@ unsigned long __invalidate_mapping_pages(struct address_space *mapping,
 							     page);
 				continue;
 			}
-
-			if (!trylock_page(page))
-				continue;
-
-			WARN_ON(page_to_index(page) != index);
-
-			/* Middle of THP: skip */
-			if (PageTransTail(page)) {
-				unlock_page(page);
-				continue;
-			} else if (PageTransHuge(page)) {
-				index += HPAGE_PMD_NR - 1;
-				i += HPAGE_PMD_NR - 1;
-				/*
-				 * 'end' is in the middle of THP. Don't
-				 * invalidate the page as the part outside of
-				 * 'end' could be still useful.
-				 */
-				if (index > end) {
-					unlock_page(page);
-					continue;
-				}
-
-				/* Take a pin outside pagevec */
-				get_page(page);
-
-				/*
-				 * Drop extra pins before trying to invalidate
-				 * the huge page.
-				 */
-				pagevec_remove_exceptionals(&pvec);
-				pagevec_release(&pvec);
-			}
+			index += thp_nr_pages(page) - 1;
 
 			ret = invalidate_inode_page(page);
 			unlock_page(page);
@@ -601,9 +535,6 @@ unsigned long __invalidate_mapping_pages(struct address_space *mapping,
 				if (nr_pagevec)
 					(*nr_pagevec)++;
 			}
-
-			if (PageTransHuge(page))
-				put_page(page);
 			count += ret;
 		}
 		pagevec_remove_exceptionals(&pvec);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 06/12] mm: Add an 'end' parameter to find_get_entries
  2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2020-10-26  4:14 ` [PATCH v3 05/12] mm: Add and use find_lock_entries Matthew Wilcox (Oracle)
@ 2020-10-26  4:14 ` Matthew Wilcox (Oracle)
  2020-10-26  4:14 ` [PATCH v3 07/12] mm: Add an 'end' parameter to pagevec_lookup_entries Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

This simplifies the callers and leads to a more efficient implementation
since the XArray has this functionality already.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 include/linux/pagemap.h |  4 ++--
 mm/filemap.c            |  9 +++++----
 mm/shmem.c              | 10 ++--------
 mm/swap.c               |  2 +-
 4 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 5f3e829c91fd..5b425f666bc5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -450,8 +450,8 @@ static inline struct page *find_subpage(struct page *head, pgoff_t index)
 }
 
 unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
-			  unsigned int nr_entries, struct page **entries,
-			  pgoff_t *indices);
+		pgoff_t end, unsigned int nr_entries, struct page **entries,
+		pgoff_t *indices);
 unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start,
 			pgoff_t end, unsigned int nr_pages,
 			struct page **pages);
diff --git a/mm/filemap.c b/mm/filemap.c
index 9a33d1b8cef6..6ed2422426d2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1897,6 +1897,7 @@ static inline struct page *xas_find_get_entry(struct xa_state *xas,
  * find_get_entries - gang pagecache lookup
  * @mapping:	The address_space to search
  * @start:	The starting page cache index
+ * @end:	The final page index (inclusive).
  * @nr_entries:	The maximum number of entries
  * @entries:	Where the resulting entries are placed
  * @indices:	The cache indices corresponding to the entries in @entries
@@ -1920,9 +1921,9 @@ static inline struct page *xas_find_get_entry(struct xa_state *xas,
  *
  * Return: the number of pages and shadow entries which were found.
  */
-unsigned find_get_entries(struct address_space *mapping,
-			  pgoff_t start, unsigned int nr_entries,
-			  struct page **entries, pgoff_t *indices)
+unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
+		pgoff_t end, unsigned int nr_entries, struct page **entries,
+		pgoff_t *indices)
 {
 	XA_STATE(xas, &mapping->i_pages, start);
 	struct page *page;
@@ -1932,7 +1933,7 @@ unsigned find_get_entries(struct address_space *mapping,
 		return 0;
 
 	rcu_read_lock();
-	while ((page = xas_find_get_entry(&xas, ULONG_MAX, XA_PRESENT))) {
+	while ((page = xas_find_get_entry(&xas, end, XA_PRESENT))) {
 		/*
 		 * Terminate early on finding a THP, to allow the caller to
 		 * handle it all at once; but continue if this is hugetlbfs.
diff --git a/mm/shmem.c b/mm/shmem.c
index ef34271cad2d..27b93b738ea0 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -913,8 +913,6 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			struct page *page = pvec.pages[i];
 
 			index = indices[i];
-			if (index >= end)
-				break;
 
 			if (xa_is_value(page)) {
 				if (unfalloc)
@@ -967,9 +965,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 	while (index < end) {
 		cond_resched();
 
-		pvec.nr = find_get_entries(mapping, index,
-				min(end - index, (pgoff_t)PAGEVEC_SIZE),
-				pvec.pages, indices);
+		pvec.nr = find_get_entries(mapping, index, end - 1,
+				PAGEVEC_SIZE, pvec.pages, indices);
 		if (!pvec.nr) {
 			/* If all gone or hole-punch or unfalloc, we're done */
 			if (index == start || end != -1)
@@ -982,9 +979,6 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			struct page *page = pvec.pages[i];
 
 			index = indices[i];
-			if (index >= end)
-				break;
-
 			if (xa_is_value(page)) {
 				if (unfalloc)
 					continue;
diff --git a/mm/swap.c b/mm/swap.c
index 47a47681c86b..9b0836cda971 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1099,7 +1099,7 @@ unsigned pagevec_lookup_entries(struct pagevec *pvec,
 				pgoff_t start, unsigned nr_entries,
 				pgoff_t *indices)
 {
-	pvec->nr = find_get_entries(mapping, start, nr_entries,
+	pvec->nr = find_get_entries(mapping, start, ULONG_MAX, nr_entries,
 				    pvec->pages, indices);
 	return pagevec_count(pvec);
 }
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 07/12] mm: Add an 'end' parameter to pagevec_lookup_entries
  2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2020-10-26  4:14 ` [PATCH v3 06/12] mm: Add an 'end' parameter to find_get_entries Matthew Wilcox (Oracle)
@ 2020-10-26  4:14 ` Matthew Wilcox (Oracle)
  2020-10-26  4:14 ` [PATCH v3 08/12] mm: Remove nr_entries parameter from pagevec_lookup_entries Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

Simplifies the callers and uses the existing functionality
in find_get_entries().  We can also drop the final argument of
truncate_exceptional_pvec_entries() and simplify the logic in that
function.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 include/linux/pagevec.h |  5 ++---
 mm/swap.c               |  8 ++++----
 mm/truncate.c           | 41 ++++++++++-------------------------------
 3 files changed, 16 insertions(+), 38 deletions(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 081d934eda64..4b245592262c 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -26,9 +26,8 @@ struct pagevec {
 void __pagevec_release(struct pagevec *pvec);
 void __pagevec_lru_add(struct pagevec *pvec);
 unsigned pagevec_lookup_entries(struct pagevec *pvec,
-				struct address_space *mapping,
-				pgoff_t start, unsigned nr_entries,
-				pgoff_t *indices);
+		struct address_space *mapping, pgoff_t start, pgoff_t end,
+		unsigned nr_entries, pgoff_t *indices);
 void pagevec_remove_exceptionals(struct pagevec *pvec);
 unsigned pagevec_lookup_range(struct pagevec *pvec,
 			      struct address_space *mapping,
diff --git a/mm/swap.c b/mm/swap.c
index 9b0836cda971..d4d0c54d6ec9 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1075,6 +1075,7 @@ void __pagevec_lru_add(struct pagevec *pvec)
  * @pvec:	Where the resulting entries are placed
  * @mapping:	The address_space to search
  * @start:	The starting entry index
+ * @end:	The highest index to return (inclusive).
  * @nr_entries:	The maximum number of pages
  * @indices:	The cache indices corresponding to the entries in @pvec
  *
@@ -1095,11 +1096,10 @@ void __pagevec_lru_add(struct pagevec *pvec)
  * found.
  */
 unsigned pagevec_lookup_entries(struct pagevec *pvec,
-				struct address_space *mapping,
-				pgoff_t start, unsigned nr_entries,
-				pgoff_t *indices)
+		struct address_space *mapping, pgoff_t start, pgoff_t end,
+		unsigned nr_entries, pgoff_t *indices)
 {
-	pvec->nr = find_get_entries(mapping, start, ULONG_MAX, nr_entries,
+	pvec->nr = find_get_entries(mapping, start, end, nr_entries,
 				    pvec->pages, indices);
 	return pagevec_count(pvec);
 }
diff --git a/mm/truncate.c b/mm/truncate.c
index 3c6b6d5a0046..ec43312f4756 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -57,11 +57,10 @@ static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
  * exceptional entries similar to what pagevec_remove_exceptionals does.
  */
 static void truncate_exceptional_pvec_entries(struct address_space *mapping,
-				struct pagevec *pvec, pgoff_t *indices,
-				pgoff_t end)
+				struct pagevec *pvec, pgoff_t *indices)
 {
 	int i, j;
-	bool dax, lock;
+	bool dax;
 
 	/* Handled by shmem itself */
 	if (shmem_mapping(mapping))
@@ -75,8 +74,7 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
 		return;
 
 	dax = dax_mapping(mapping);
-	lock = !dax && indices[j] < end;
-	if (lock)
+	if (!dax)
 		xa_lock_irq(&mapping->i_pages);
 
 	for (i = j; i < pagevec_count(pvec); i++) {
@@ -88,9 +86,6 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
 			continue;
 		}
 
-		if (index >= end)
-			continue;
-
 		if (unlikely(dax)) {
 			dax_delete_mapping_entry(mapping, index);
 			continue;
@@ -99,7 +94,7 @@ static void truncate_exceptional_pvec_entries(struct address_space *mapping,
 		__clear_shadow_entry(mapping, index, page);
 	}
 
-	if (lock)
+	if (!dax)
 		xa_unlock_irq(&mapping->i_pages);
 	pvec->nr = j;
 }
@@ -329,7 +324,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	while (index < end && find_lock_entries(mapping, index, end - 1,
 			&pvec, indices)) {
 		index = indices[pagevec_count(&pvec) - 1] + 1;
-		truncate_exceptional_pvec_entries(mapping, &pvec, indices, end);
+		truncate_exceptional_pvec_entries(mapping, &pvec, indices);
 		for (i = 0; i < pagevec_count(&pvec); i++)
 			truncate_cleanup_page(mapping, pvec.pages[i]);
 		delete_from_page_cache_batch(mapping, &pvec);
@@ -381,8 +376,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	index = start;
 	for ( ; ; ) {
 		cond_resched();
-		if (!pagevec_lookup_entries(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) {
+		if (!pagevec_lookup_entries(&pvec, mapping, index, end - 1,
+				PAGEVEC_SIZE, indices)) {
 			/* If all gone from start onwards, we're done */
 			if (index == start)
 				break;
@@ -390,23 +385,12 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			index = start;
 			continue;
 		}
-		if (index == start && indices[0] >= end) {
-			/* All gone out of hole to be punched, we're done */
-			pagevec_remove_exceptionals(&pvec);
-			pagevec_release(&pvec);
-			break;
-		}
 
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
 			/* We rely upon deletion not changing page->index */
 			index = indices[i];
-			if (index >= end) {
-				/* Restart punch to make sure all gone */
-				index = start - 1;
-				break;
-			}
 
 			if (xa_is_value(page))
 				continue;
@@ -417,7 +401,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			truncate_inode_page(mapping, page);
 			unlock_page(page);
 		}
-		truncate_exceptional_pvec_entries(mapping, &pvec, indices, end);
+		truncate_exceptional_pvec_entries(mapping, &pvec, indices);
 		pagevec_release(&pvec);
 		index++;
 	}
@@ -513,8 +497,6 @@ unsigned long __invalidate_mapping_pages(struct address_space *mapping,
 
 			/* We rely upon deletion not changing page->index */
 			index = indices[i];
-			if (index > end)
-				break;
 
 			if (xa_is_value(page)) {
 				invalidate_exceptional_entry(mapping, index,
@@ -650,16 +632,13 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 
 	pagevec_init(&pvec);
 	index = start;
-	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
-			indices)) {
+	while (pagevec_lookup_entries(&pvec, mapping, index, end,
+			PAGEVEC_SIZE, indices)) {
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
 			/* We rely upon deletion not changing page->index */
 			index = indices[i];
-			if (index > end)
-				break;
 
 			if (xa_is_value(page)) {
 				if (!invalidate_exceptional_entry2(mapping,
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 08/12] mm: Remove nr_entries parameter from pagevec_lookup_entries
  2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2020-10-26  4:14 ` [PATCH v3 07/12] mm: Add an 'end' parameter to pagevec_lookup_entries Matthew Wilcox (Oracle)
@ 2020-10-26  4:14 ` Matthew Wilcox (Oracle)
  2020-10-26  4:14 ` [PATCH v3 09/12] mm: Pass pvec directly to find_get_entries Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

All callers want to fetch the full size of the pvec.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 include/linux/pagevec.h | 2 +-
 mm/swap.c               | 4 ++--
 mm/truncate.c           | 5 ++---
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 4b245592262c..ce77724a2ab7 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -27,7 +27,7 @@ void __pagevec_release(struct pagevec *pvec);
 void __pagevec_lru_add(struct pagevec *pvec);
 unsigned pagevec_lookup_entries(struct pagevec *pvec,
 		struct address_space *mapping, pgoff_t start, pgoff_t end,
-		unsigned nr_entries, pgoff_t *indices);
+		pgoff_t *indices);
 void pagevec_remove_exceptionals(struct pagevec *pvec);
 unsigned pagevec_lookup_range(struct pagevec *pvec,
 			      struct address_space *mapping,
diff --git a/mm/swap.c b/mm/swap.c
index d4d0c54d6ec9..31653152f55d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1097,9 +1097,9 @@ void __pagevec_lru_add(struct pagevec *pvec)
  */
 unsigned pagevec_lookup_entries(struct pagevec *pvec,
 		struct address_space *mapping, pgoff_t start, pgoff_t end,
-		unsigned nr_entries, pgoff_t *indices)
+		pgoff_t *indices)
 {
-	pvec->nr = find_get_entries(mapping, start, end, nr_entries,
+	pvec->nr = find_get_entries(mapping, start, end, PAGEVEC_SIZE,
 				    pvec->pages, indices);
 	return pagevec_count(pvec);
 }
diff --git a/mm/truncate.c b/mm/truncate.c
index ec43312f4756..c4a3c9d6a0c1 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -377,7 +377,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	for ( ; ; ) {
 		cond_resched();
 		if (!pagevec_lookup_entries(&pvec, mapping, index, end - 1,
-				PAGEVEC_SIZE, indices)) {
+				indices)) {
 			/* If all gone from start onwards, we're done */
 			if (index == start)
 				break;
@@ -632,8 +632,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 
 	pagevec_init(&pvec);
 	index = start;
-	while (pagevec_lookup_entries(&pvec, mapping, index, end,
-			PAGEVEC_SIZE, indices)) {
+	while (pagevec_lookup_entries(&pvec, mapping, index, end, indices)) {
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 09/12] mm: Pass pvec directly to find_get_entries
  2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2020-10-26  4:14 ` [PATCH v3 08/12] mm: Remove nr_entries parameter from pagevec_lookup_entries Matthew Wilcox (Oracle)
@ 2020-10-26  4:14 ` Matthew Wilcox (Oracle)
  2020-10-26  4:14 ` [PATCH v3 10/12] mm: Remove pagevec_lookup_entries Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

All callers of find_get_entries() use a pvec, so pass it directly
instead of manipulating it in the caller.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 include/linux/pagemap.h |  3 +--
 mm/filemap.c            | 21 +++++++++------------
 mm/shmem.c              |  5 ++---
 mm/swap.c               |  4 +---
 4 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 5b425f666bc5..f0bbe29de732 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -450,8 +450,7 @@ static inline struct page *find_subpage(struct page *head, pgoff_t index)
 }
 
 unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
-		pgoff_t end, unsigned int nr_entries, struct page **entries,
-		pgoff_t *indices);
+		pgoff_t end, struct pagevec *pvec, pgoff_t *indices);
 unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start,
 			pgoff_t end, unsigned int nr_pages,
 			struct page **pages);
diff --git a/mm/filemap.c b/mm/filemap.c
index 6ed2422426d2..e0fa943011d8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1898,14 +1898,12 @@ static inline struct page *xas_find_get_entry(struct xa_state *xas,
  * @mapping:	The address_space to search
  * @start:	The starting page cache index
  * @end:	The final page index (inclusive).
- * @nr_entries:	The maximum number of entries
- * @entries:	Where the resulting entries are placed
+ * @pvec:	Where the resulting entries are placed.
  * @indices:	The cache indices corresponding to the entries in @entries
  *
- * find_get_entries() will search for and return a group of up to
- * @nr_entries entries in the mapping.  The entries are placed at
- * @entries.  find_get_entries() takes a reference against any actual
- * pages it returns.
+ * find_get_entries() will search for and return a batch of entries in
+ * the mapping.  The entries are placed in @pvec.  find_get_entries()
+ * takes a reference on any actual pages it returns.
  *
  * The search returns a group of mapping-contiguous page cache entries
  * with ascending indexes.  There may be holes in the indices due to
@@ -1922,15 +1920,12 @@ static inline struct page *xas_find_get_entry(struct xa_state *xas,
  * Return: the number of pages and shadow entries which were found.
  */
 unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
-		pgoff_t end, unsigned int nr_entries, struct page **entries,
-		pgoff_t *indices)
+		pgoff_t end, struct pagevec *pvec, pgoff_t *indices)
 {
 	XA_STATE(xas, &mapping->i_pages, start);
 	struct page *page;
 	unsigned int ret = 0;
-
-	if (!nr_entries)
-		return 0;
+	unsigned nr_entries = PAGEVEC_SIZE;
 
 	rcu_read_lock();
 	while ((page = xas_find_get_entry(&xas, end, XA_PRESENT))) {
@@ -1945,11 +1940,13 @@ unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
 		}
 
 		indices[ret] = xas.xa_index;
-		entries[ret] = page;
+		pvec->pages[ret] = page;
 		if (++ret == nr_entries)
 			break;
 	}
 	rcu_read_unlock();
+
+	pvec->nr = ret;
 	return ret;
 }
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 27b93b738ea0..7880a245ac32 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -965,9 +965,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 	while (index < end) {
 		cond_resched();
 
-		pvec.nr = find_get_entries(mapping, index, end - 1,
-				PAGEVEC_SIZE, pvec.pages, indices);
-		if (!pvec.nr) {
+		if (!find_get_entries(mapping, index, end - 1, &pvec,
+				indices)) {
 			/* If all gone or hole-punch or unfalloc, we're done */
 			if (index == start || end != -1)
 				break;
diff --git a/mm/swap.c b/mm/swap.c
index 31653152f55d..0c5659c05e6e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1099,9 +1099,7 @@ unsigned pagevec_lookup_entries(struct pagevec *pvec,
 		struct address_space *mapping, pgoff_t start, pgoff_t end,
 		pgoff_t *indices)
 {
-	pvec->nr = find_get_entries(mapping, start, end, PAGEVEC_SIZE,
-				    pvec->pages, indices);
-	return pagevec_count(pvec);
+	return find_get_entries(mapping, start, end, pvec, indices);
 }
 
 /**
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 10/12] mm: Remove pagevec_lookup_entries
  2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
                   ` (8 preceding siblings ...)
  2020-10-26  4:14 ` [PATCH v3 09/12] mm: Pass pvec directly to find_get_entries Matthew Wilcox (Oracle)
@ 2020-10-26  4:14 ` Matthew Wilcox (Oracle)
  2020-10-26  4:14 ` [PATCH v3 11/12] mm/truncate,shmem: Handle truncates that split THPs Matthew Wilcox (Oracle)
  2020-10-26  4:14 ` [PATCH v3 12/12] mm/filemap: Return only head pages from find_get_entries Matthew Wilcox (Oracle)
  11 siblings, 0 replies; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

pagevec_lookup_entries() is now just a wrapper around find_get_entries()
so remove it and convert all its callers.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 include/linux/pagevec.h |  3 ---
 mm/swap.c               | 36 ++----------------------------------
 mm/truncate.c           |  4 ++--
 3 files changed, 4 insertions(+), 39 deletions(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index ce77724a2ab7..a45bea4b4d08 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -25,9 +25,6 @@ struct pagevec {
 
 void __pagevec_release(struct pagevec *pvec);
 void __pagevec_lru_add(struct pagevec *pvec);
-unsigned pagevec_lookup_entries(struct pagevec *pvec,
-		struct address_space *mapping, pgoff_t start, pgoff_t end,
-		pgoff_t *indices);
 void pagevec_remove_exceptionals(struct pagevec *pvec);
 unsigned pagevec_lookup_range(struct pagevec *pvec,
 			      struct address_space *mapping,
diff --git a/mm/swap.c b/mm/swap.c
index 0c5659c05e6e..8f5623336eb3 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1070,44 +1070,12 @@ void __pagevec_lru_add(struct pagevec *pvec)
 	pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, NULL);
 }
 
-/**
- * pagevec_lookup_entries - gang pagecache lookup
- * @pvec:	Where the resulting entries are placed
- * @mapping:	The address_space to search
- * @start:	The starting entry index
- * @end:	The highest index to return (inclusive).
- * @nr_entries:	The maximum number of pages
- * @indices:	The cache indices corresponding to the entries in @pvec
- *
- * pagevec_lookup_entries() will search for and return a group of up
- * to @nr_pages pages and shadow entries in the mapping.  All
- * entries are placed in @pvec.  pagevec_lookup_entries() takes a
- * reference against actual pages in @pvec.
- *
- * The search returns a group of mapping-contiguous entries with
- * ascending indexes.  There may be holes in the indices due to
- * not-present entries.
- *
- * Only one subpage of a Transparent Huge Page is returned in one call:
- * allowing truncate_inode_pages_range() to evict the whole THP without
- * cycling through a pagevec of extra references.
- *
- * pagevec_lookup_entries() returns the number of entries which were
- * found.
- */
-unsigned pagevec_lookup_entries(struct pagevec *pvec,
-		struct address_space *mapping, pgoff_t start, pgoff_t end,
-		pgoff_t *indices)
-{
-	return find_get_entries(mapping, start, end, pvec, indices);
-}
-
 /**
  * pagevec_remove_exceptionals - pagevec exceptionals pruning
  * @pvec:	The pagevec to prune
  *
- * pagevec_lookup_entries() fills both pages and exceptional radix
- * tree entries into the pagevec.  This function prunes all
+ * find_get_entries() fills both pages and XArray value entries (aka
+ * exceptional entries) into the pagevec.  This function prunes all
  * exceptionals from @pvec without leaving holes, so that it can be
  * passed on to page-only pagevec operations.
  */
diff --git a/mm/truncate.c b/mm/truncate.c
index c4a3c9d6a0c1..a96e44a5ce59 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -376,7 +376,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	index = start;
 	for ( ; ; ) {
 		cond_resched();
-		if (!pagevec_lookup_entries(&pvec, mapping, index, end - 1,
+		if (!find_get_entries(mapping, index, end - 1, &pvec,
 				indices)) {
 			/* If all gone from start onwards, we're done */
 			if (index == start)
@@ -632,7 +632,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 
 	pagevec_init(&pvec);
 	index = start;
-	while (pagevec_lookup_entries(&pvec, mapping, index, end, indices)) {
+	while (find_get_entries(mapping, index, end, &pvec, indices)) {
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 11/12] mm/truncate,shmem: Handle truncates that split THPs
  2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
                   ` (9 preceding siblings ...)
  2020-10-26  4:14 ` [PATCH v3 10/12] mm: Remove pagevec_lookup_entries Matthew Wilcox (Oracle)
@ 2020-10-26  4:14 ` Matthew Wilcox (Oracle)
  2020-10-26 11:05   ` Jan Kara
  2020-10-26  4:14 ` [PATCH v3 12/12] mm/filemap: Return only head pages from find_get_entries Matthew Wilcox (Oracle)
  11 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel, William Kucharski

Handle THP splitting in the parts of the truncation functions which
already handle partial pages.  Factor all that code out into a new
function called truncate_inode_partial_page().

We lose the easy 'bail out' path if a truncate or hole punch is entirely
within a single page.  We can add some more complex logic to restore
the optimisation if it proves to be worthwhile.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 mm/internal.h |   1 +
 mm/shmem.c    |  97 ++++++++++++++---------------------------
 mm/truncate.c | 118 +++++++++++++++++++++++++++++++-------------------
 3 files changed, 108 insertions(+), 108 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 8d79f4d21eaf..194572e1ab49 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -620,4 +620,5 @@ struct migration_target_control {
 	gfp_t gfp_mask;
 };
 
+bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end);
 #endif	/* __MM_INTERNAL_H */
diff --git a/mm/shmem.c b/mm/shmem.c
index 7880a245ac32..bcb4ecaa5949 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -857,32 +857,6 @@ void shmem_unlock_mapping(struct address_space *mapping)
 	}
 }
 
-/*
- * Check whether a hole-punch or truncation needs to split a huge page,
- * returning true if no split was required, or the split has been successful.
- *
- * Eviction (or truncation to 0 size) should never need to split a huge page;
- * but in rare cases might do so, if shmem_undo_range() failed to trylock on
- * head, and then succeeded to trylock on tail.
- *
- * A split can only succeed when there are no additional references on the
- * huge page: so the split below relies upon find_get_entries() having stopped
- * when it found a subpage of the huge page, without getting further references.
- */
-static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
-{
-	if (!PageTransCompound(page))
-		return true;
-
-	/* Just proceed to delete a huge page wholly within the range punched */
-	if (PageHead(page) &&
-	    page->index >= start && page->index + HPAGE_PMD_NR <= end)
-		return true;
-
-	/* Try to split huge page, so we can truly punch the hole or truncate */
-	return split_huge_page(page) >= 0;
-}
-
 /*
  * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -894,13 +868,13 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	pgoff_t start = (lstart + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	pgoff_t end = (lend + 1) >> PAGE_SHIFT;
-	unsigned int partial_start = lstart & (PAGE_SIZE - 1);
-	unsigned int partial_end = (lend + 1) & (PAGE_SIZE - 1);
 	struct pagevec pvec;
 	pgoff_t indices[PAGEVEC_SIZE];
+	struct page *page;
 	long nr_swaps_freed = 0;
 	pgoff_t index;
 	int i;
+	bool partial_end;
 
 	if (lend == -1)
 		end = -1;	/* unsigned, so actually very big */
@@ -910,7 +884,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 	while (index < end && find_lock_entries(mapping, index, end - 1,
 			&pvec, indices)) {
 		for (i = 0; i < pagevec_count(&pvec); i++) {
-			struct page *page = pvec.pages[i];
+			page = pvec.pages[i];
 
 			index = indices[i];
 
@@ -933,33 +907,37 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 		index++;
 	}
 
-	if (partial_start) {
-		struct page *page = NULL;
-		shmem_getpage(inode, start - 1, &page, SGP_READ);
-		if (page) {
-			unsigned int top = PAGE_SIZE;
-			if (start > end) {
-				top = partial_end;
-				partial_end = 0;
-			}
-			zero_user_segment(page, partial_start, top);
-			set_page_dirty(page);
-			unlock_page(page);
-			put_page(page);
+	partial_end = ((lend + 1) % PAGE_SIZE) > 0;
+	page = NULL;
+	shmem_getpage(inode, lstart >> PAGE_SHIFT, &page, SGP_READ);
+	if (page) {
+		bool same_page;
+
+		page = thp_head(page);
+		same_page = lend < page_offset(page) + thp_size(page);
+		if (same_page)
+			partial_end = false;
+		set_page_dirty(page);
+		if (!truncate_inode_partial_page(page, lstart, lend)) {
+			start = page->index + thp_nr_pages(page);
+			if (same_page)
+				end = page->index;
 		}
+		unlock_page(page);
+		put_page(page);
+		page = NULL;
 	}
-	if (partial_end) {
-		struct page *page = NULL;
+
+	if (partial_end)
 		shmem_getpage(inode, end, &page, SGP_READ);
-		if (page) {
-			zero_user_segment(page, 0, partial_end);
-			set_page_dirty(page);
-			unlock_page(page);
-			put_page(page);
-		}
+	if (page) {
+		page = thp_head(page);
+		set_page_dirty(page);
+		if (!truncate_inode_partial_page(page, lstart, lend))
+			end = page->index;
+		unlock_page(page);
+		put_page(page);
 	}
-	if (start >= end)
-		return;
 
 	index = start;
 	while (index < end) {
@@ -975,7 +953,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			continue;
 		}
 		for (i = 0; i < pagevec_count(&pvec); i++) {
-			struct page *page = pvec.pages[i];
+			page = pvec.pages[i];
 
 			index = indices[i];
 			if (xa_is_value(page)) {
@@ -1000,18 +978,9 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 					break;
 				}
 				VM_BUG_ON_PAGE(PageWriteback(page), page);
-				if (shmem_punch_compound(page, start, end))
-					truncate_inode_page(mapping, page);
-				else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-					/* Wipe the page and don't get stuck */
-					clear_highpage(page);
-					flush_dcache_page(page);
-					set_page_dirty(page);
-					if (index <
-					    round_up(start, HPAGE_PMD_NR))
-						start = index + 1;
-				}
+				truncate_inode_page(mapping, page);
 			}
+			index = page->index + thp_nr_pages(page) - 1;
 			unlock_page(page);
 		}
 		pagevec_remove_exceptionals(&pvec);
diff --git a/mm/truncate.c b/mm/truncate.c
index a96e44a5ce59..11ef90d7e3af 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -224,6 +224,53 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
 	return 0;
 }
 
+/*
+ * Handle partial (transparent) pages.  The page may be entirely within the
+ * range if a split has raced with us.  If not, we zero the part of the
+ * page that's within the [start, end] range, and then split the page if
+ * it's a THP.  split_page_range() will discard pages which now lie beyond
+ * i_size, and we rely on the caller to discard pages which lie within a
+ * newly created hole.
+ *
+ * Returns false if THP splitting failed so the caller can avoid
+ * discarding the entire page which is stubbornly unsplit.
+ */
+bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
+{
+	loff_t pos = page_offset(page);
+	unsigned int offset, length;
+
+	if (pos < start)
+		offset = start - pos;
+	else
+		offset = 0;
+	length = thp_size(page);
+	if (pos + length <= (u64)end)
+		length = length - offset;
+	else
+		length = end + 1 - pos - offset;
+
+	wait_on_page_writeback(page);
+	if (length == thp_size(page)) {
+		truncate_inode_page(page->mapping, page);
+		return true;
+	}
+
+	/*
+	 * We may be zeroing pages we're about to discard, but it avoids
+	 * doing a complex calculation here, and then doing the zeroing
+	 * anyway if the page split fails.
+	 */
+	zero_user(page, offset, length);
+
+	cleancache_invalidate_page(page->mapping, page);
+	if (page_has_private(page))
+		do_invalidatepage(page, offset, length);
+	if (!PageTransHuge(page))
+		return true;
+	return split_huge_page(page) == 0;
+}
+
 /*
  * Used to get rid of pages on hardware memory corruption.
  */
@@ -288,20 +335,16 @@ void truncate_inode_pages_range(struct address_space *mapping,
 {
 	pgoff_t		start;		/* inclusive */
 	pgoff_t		end;		/* exclusive */
-	unsigned int	partial_start;	/* inclusive */
-	unsigned int	partial_end;	/* exclusive */
 	struct pagevec	pvec;
 	pgoff_t		indices[PAGEVEC_SIZE];
 	pgoff_t		index;
 	int		i;
+	struct page *	page;
+	bool partial_end;
 
 	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
 		goto out;
 
-	/* Offsets within partial pages */
-	partial_start = lstart & (PAGE_SIZE - 1);
-	partial_end = (lend + 1) & (PAGE_SIZE - 1);
-
 	/*
 	 * 'start' and 'end' always covers the range of pages to be fully
 	 * truncated. Partial pages are covered with 'partial_start' at the
@@ -334,48 +377,35 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		cond_resched();
 	}
 
-	if (partial_start) {
-		struct page *page = find_lock_page(mapping, start - 1);
-		if (page) {
-			unsigned int top = PAGE_SIZE;
-			if (start > end) {
-				/* Truncation within a single page */
-				top = partial_end;
-				partial_end = 0;
-			}
-			wait_on_page_writeback(page);
-			zero_user_segment(page, partial_start, top);
-			cleancache_invalidate_page(mapping, page);
-			if (page_has_private(page))
-				do_invalidatepage(page, partial_start,
-						  top - partial_start);
-			unlock_page(page);
-			put_page(page);
+	partial_end = ((lend + 1) % PAGE_SIZE) > 0;
+	page = find_lock_head(mapping, lstart >> PAGE_SHIFT);
+	if (page) {
+		bool same_page = lend < page_offset(page) + thp_size(page);
+		if (same_page)
+			partial_end = false;
+		if (!truncate_inode_partial_page(page, lstart, lend)) {
+			start = page->index + thp_nr_pages(page);
+			if (same_page)
+				end = page->index;
 		}
+		unlock_page(page);
+		put_page(page);
+		page = NULL;
 	}
-	if (partial_end) {
-		struct page *page = find_lock_page(mapping, end);
-		if (page) {
-			wait_on_page_writeback(page);
-			zero_user_segment(page, 0, partial_end);
-			cleancache_invalidate_page(mapping, page);
-			if (page_has_private(page))
-				do_invalidatepage(page, 0,
-						  partial_end);
-			unlock_page(page);
-			put_page(page);
-		}
+
+	if (partial_end)
+		page = find_lock_head(mapping, end);
+	if (page) {
+		if (!truncate_inode_partial_page(page, lstart, lend))
+			end = page->index;
+		unlock_page(page);
+		put_page(page);
 	}
-	/*
-	 * If the truncation happened within a single page no pages
-	 * will be released, just zeroed, so we can bail out now.
-	 */
-	if (start >= end)
-		goto out;
 
 	index = start;
-	for ( ; ; ) {
+	while (index < end) {
 		cond_resched();
+
 		if (!find_get_entries(mapping, index, end - 1, &pvec,
 				indices)) {
 			/* If all gone from start onwards, we're done */
@@ -387,7 +417,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		}
 
 		for (i = 0; i < pagevec_count(&pvec); i++) {
-			struct page *page = pvec.pages[i];
+			page = pvec.pages[i];
 
 			/* We rely upon deletion not changing page->index */
 			index = indices[i];
@@ -396,7 +426,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 				continue;
 
 			lock_page(page);
-			WARN_ON(page_to_index(page) != index);
+			index = page->index + thp_nr_pages(page) - 1;
 			wait_on_page_writeback(page);
 			truncate_inode_page(mapping, page);
 			unlock_page(page);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 12/12] mm/filemap: Return only head pages from find_get_entries
  2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
                   ` (10 preceding siblings ...)
  2020-10-26  4:14 ` [PATCH v3 11/12] mm/truncate,shmem: Handle truncates that split THPs Matthew Wilcox (Oracle)
@ 2020-10-26  4:14 ` Matthew Wilcox (Oracle)
  11 siblings, 0 replies; 25+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-10-26  4:14 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle),
	linux-fsdevel, Andrew Morton, Hugh Dickins, Johannes Weiner,
	Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

All callers now expect head (and base) pages, and can handle multiple
head pages in a single batch, so make find_get_entries() behave that way.
Also take the opportunity to make it use the pagevec infrastructure
instead of open-coding how pvecs behave.  This has the side-effect of
being able to append to a pagevec with existing contents, although we
don't make use of that functionality anywhere yet.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
---
 include/linux/pagemap.h |  2 --
 mm/filemap.c            | 36 ++++++++----------------------------
 mm/internal.h           |  2 ++
 3 files changed, 10 insertions(+), 30 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f0bbe29de732..8938c64f418b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -449,8 +449,6 @@ static inline struct page *find_subpage(struct page *head, pgoff_t index)
 	return head + (index & (thp_nr_pages(head) - 1));
 }
 
-unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
-		pgoff_t end, struct pagevec *pvec, pgoff_t *indices);
 unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start,
 			pgoff_t end, unsigned int nr_pages,
 			struct page **pages);
diff --git a/mm/filemap.c b/mm/filemap.c
index e0fa943011d8..a117718ec1a8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1905,49 +1905,29 @@ static inline struct page *xas_find_get_entry(struct xa_state *xas,
  * the mapping.  The entries are placed in @pvec.  find_get_entries()
  * takes a reference on any actual pages it returns.
  *
- * The search returns a group of mapping-contiguous page cache entries
- * with ascending indexes.  There may be holes in the indices due to
- * not-present pages.
+ * The entries have ascending indexes.  The indices may not be consecutive
+ * due to not-present entries or THPs.
  *
  * Any shadow entries of evicted pages, or swap entries from
  * shmem/tmpfs, are included in the returned array.
  *
- * If it finds a Transparent Huge Page, head or tail, find_get_entries()
- * stops at that page: the caller is likely to have a better way to handle
- * the compound page as a whole, and then skip its extent, than repeatedly
- * calling find_get_entries() to return all its tails.
- *
- * Return: the number of pages and shadow entries which were found.
+ * Return: The number of entries which were found.
  */
 unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
 		pgoff_t end, struct pagevec *pvec, pgoff_t *indices)
 {
 	XA_STATE(xas, &mapping->i_pages, start);
 	struct page *page;
-	unsigned int ret = 0;
-	unsigned nr_entries = PAGEVEC_SIZE;
 
 	rcu_read_lock();
 	while ((page = xas_find_get_entry(&xas, end, XA_PRESENT))) {
-		/*
-		 * Terminate early on finding a THP, to allow the caller to
-		 * handle it all at once; but continue if this is hugetlbfs.
-		 */
-		if (!xa_is_value(page) && PageTransHuge(page) &&
-				!PageHuge(page)) {
-			page = find_subpage(page, xas.xa_index);
-			nr_entries = ret + 1;
-		}
-
-		indices[ret] = xas.xa_index;
-		pvec->pages[ret] = page;
-		if (++ret == nr_entries)
+		indices[pvec->nr] = xas.xa_index;
+		if (!pagevec_add(pvec, page))
 			break;
 	}
 	rcu_read_unlock();
 
-	pvec->nr = ret;
-	return ret;
+	return pagevec_count(pvec);
 }
 
 /**
@@ -1966,8 +1946,8 @@ unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
  * not returned.
  *
  * The entries have ascending indexes.  The indices may not be consecutive
- * due to not-present entries, THP pages, pages which could not be locked
- * or pages under writeback.
+ * due to not-present entries, THPs, pages which could not be locked or
+ * pages under writeback.
  *
  * Return: The number of entries which were found.
  */
diff --git a/mm/internal.h b/mm/internal.h
index 194572e1ab49..5aca7d7bc57c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -62,6 +62,8 @@ static inline void force_page_cache_readahead(struct address_space *mapping,
 
 struct page *find_get_entry(struct address_space *mapping, pgoff_t index);
 struct page *find_lock_entry(struct address_space *mapping, pgoff_t index);
+unsigned find_get_entries(struct address_space *mapping, pgoff_t start,
+		pgoff_t end, struct pagevec *pvec, pgoff_t *indices);
 unsigned find_lock_entries(struct address_space *mapping, pgoff_t start,
 		pgoff_t end, struct pagevec *pvec, pgoff_t *indices);
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data
  2020-10-26  4:14 ` [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data Matthew Wilcox (Oracle)
@ 2020-10-26 10:48   ` Jan Kara
  2020-10-26 12:17     ` Matthew Wilcox
  2020-10-27 18:58   ` Christoph Hellwig
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Kara @ 2020-10-26 10:48 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Hugh Dickins,
	Johannes Weiner, Yang Shi, Dave Chinner, linux-kernel,
	William Kucharski

On Mon 26-10-20 04:14:00, Matthew Wilcox (Oracle) wrote:
> Rewrite shmem_seek_hole_data() and move it to filemap.c.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
> ---
>  include/linux/pagemap.h |  2 ++
>  mm/filemap.c            | 76 +++++++++++++++++++++++++++++++++++++++++
>  mm/shmem.c              | 72 +++-----------------------------------
>  3 files changed, 82 insertions(+), 68 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index c77b7c31b2e4..5f3e829c91fd 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -760,6 +760,8 @@ extern void __delete_from_page_cache(struct page *page, void *shadow);
>  int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
>  void delete_from_page_cache_batch(struct address_space *mapping,
>  				  struct pagevec *pvec);
> +loff_t mapping_seek_hole_data(struct address_space *, loff_t start, loff_t end,
> +		int whence);
>  
>  /*
>   * Like add_to_page_cache_locked, but used to add newly allocated pages:
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 00eaed59e797..3a55d258d9f2 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2526,6 +2526,82 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  }
>  EXPORT_SYMBOL(generic_file_read_iter);
>  
> +static inline loff_t page_seek_hole_data(struct page *page,
> +		loff_t start, loff_t end, bool seek_data)
> +{
> +	if (xa_is_value(page) || PageUptodate(page))

Please add a comment here that this is currently tmpfs specific treating
exceptional entries as swapped out pages and thus data. It took me quite a
while to figure this out. You can remove the comment later when it is no
longer true...

> +		return seek_data ? start : end;
> +	return seek_data ? end : start;
> +}
> +
> +static inline
> +unsigned int seek_page_size(struct xa_state *xas, struct page *page)
> +{
> +	if (xa_is_value(page))
> +		return PAGE_SIZE << xa_get_order(xas->xa, xas->xa_index);
> +	return thp_size(page);
> +}
> +
> +/**
> + * mapping_seek_hole_data - Seek for SEEK_DATA / SEEK_HOLE in the page cache.
> + * @mapping: Address space to search.
> + * @start: First byte to consider.
> + * @end: Limit of search (exclusive).
> + * @whence: Either SEEK_HOLE or SEEK_DATA.
> + *
> + * If the page cache knows which blocks contain holes and which blocks
> + * contain data, your filesystem can use this function to implement
> + * SEEK_HOLE and SEEK_DATA.  This is useful for filesystems which are
> + * entirely memory-based such as tmpfs, and filesystems which support
> + * unwritten extents.
> + *
> + * Return: The requested offset on successs, or -ENXIO if @whence specifies
> + * SEEK_DATA and there is no data after @start.  There is an implicit hole
> + * after @end - 1, so SEEK_HOLE returns @end if all the bytes between @start
> + * and @end contain data.
> + */
> +loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
> +		loff_t end, int whence)
> +{
> +	XA_STATE(xas, &mapping->i_pages, start >> PAGE_SHIFT);
> +	pgoff_t max = (end - 1) / PAGE_SIZE;
> +	bool seek_data = (whence == SEEK_DATA);
> +	struct page *page;
> +
> +	if (end <= start)
> +		return -ENXIO;
> +
> +	rcu_read_lock();
> +	while ((page = xas_find_get_entry(&xas, max, XA_PRESENT))) {
> +		loff_t pos = xas.xa_index * PAGE_SIZE;
> +
> +		if (start < pos) {
> +			if (!seek_data)
> +				goto unlock;
> +			start = pos;
> +		}
> +
> +		pos += seek_page_size(&xas, page);
> +		start = page_seek_hole_data(page, start, pos, seek_data);
> +		if (start < pos)
> +			goto unlock;

Uh, I was staring at this function for half an hour but I still couldn't
convince myself that it is correct in all the corner cases. Maybe I'm dumb
but I'd wish this was more intuitive (and I have to say that the original
tmpfs function is much more obviously correct to me). It would more 
understandable for me if we had a code like:

		if (page_seek_match(page, seek_data))
			goto unlock;

which would be just the condition in page_seek_hole_data(). Honestly at the
moment I fail to see why you bother with 'pos' in the above four lines at
all.

BTW I suspect that this loop forgets to release the page reference it has got
when doing SEEK_HOLE.

> +	}
> +	rcu_read_unlock();
> +
> +	if (seek_data)
> +		return -ENXIO;
> +	goto out;
> +
> +unlock:
> +	rcu_read_unlock();
> +	if (!xa_is_value(page))
> +		put_page(page);
> +out:
> +	if (start > end)
> +		return end;
> +	return start;
> +}


								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 11/12] mm/truncate,shmem: Handle truncates that split THPs
  2020-10-26  4:14 ` [PATCH v3 11/12] mm/truncate,shmem: Handle truncates that split THPs Matthew Wilcox (Oracle)
@ 2020-10-26 11:05   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2020-10-26 11:05 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Hugh Dickins,
	Johannes Weiner, Yang Shi, Dave Chinner, linux-kernel,
	William Kucharski

On Mon 26-10-20 04:14:07, Matthew Wilcox (Oracle) wrote:
> Handle THP splitting in the parts of the truncation functions which
> already handle partial pages.  Factor all that code out into a new
> function called truncate_inode_partial_page().
> 
> We lose the easy 'bail out' path if a truncate or hole punch is entirely
> within a single page.  We can add some more complex logic to restore
> the optimisation if it proves to be worthwhile.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: William Kucharski <william.kucharski@oracle.com>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  mm/internal.h |   1 +
>  mm/shmem.c    |  97 ++++++++++++++---------------------------
>  mm/truncate.c | 118 +++++++++++++++++++++++++++++++-------------------
>  3 files changed, 108 insertions(+), 108 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 8d79f4d21eaf..194572e1ab49 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -620,4 +620,5 @@ struct migration_target_control {
>  	gfp_t gfp_mask;
>  };
>  
> +bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end);
>  #endif	/* __MM_INTERNAL_H */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 7880a245ac32..bcb4ecaa5949 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -857,32 +857,6 @@ void shmem_unlock_mapping(struct address_space *mapping)
>  	}
>  }
>  
> -/*
> - * Check whether a hole-punch or truncation needs to split a huge page,
> - * returning true if no split was required, or the split has been successful.
> - *
> - * Eviction (or truncation to 0 size) should never need to split a huge page;
> - * but in rare cases might do so, if shmem_undo_range() failed to trylock on
> - * head, and then succeeded to trylock on tail.
> - *
> - * A split can only succeed when there are no additional references on the
> - * huge page: so the split below relies upon find_get_entries() having stopped
> - * when it found a subpage of the huge page, without getting further references.
> - */
> -static bool shmem_punch_compound(struct page *page, pgoff_t start, pgoff_t end)
> -{
> -	if (!PageTransCompound(page))
> -		return true;
> -
> -	/* Just proceed to delete a huge page wholly within the range punched */
> -	if (PageHead(page) &&
> -	    page->index >= start && page->index + HPAGE_PMD_NR <= end)
> -		return true;
> -
> -	/* Try to split huge page, so we can truly punch the hole or truncate */
> -	return split_huge_page(page) >= 0;
> -}
> -
>  /*
>   * Remove range of pages and swap entries from page cache, and free them.
>   * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
> @@ -894,13 +868,13 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  	pgoff_t start = (lstart + PAGE_SIZE - 1) >> PAGE_SHIFT;
>  	pgoff_t end = (lend + 1) >> PAGE_SHIFT;
> -	unsigned int partial_start = lstart & (PAGE_SIZE - 1);
> -	unsigned int partial_end = (lend + 1) & (PAGE_SIZE - 1);
>  	struct pagevec pvec;
>  	pgoff_t indices[PAGEVEC_SIZE];
> +	struct page *page;
>  	long nr_swaps_freed = 0;
>  	pgoff_t index;
>  	int i;
> +	bool partial_end;
>  
>  	if (lend == -1)
>  		end = -1;	/* unsigned, so actually very big */
> @@ -910,7 +884,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>  	while (index < end && find_lock_entries(mapping, index, end - 1,
>  			&pvec, indices)) {
>  		for (i = 0; i < pagevec_count(&pvec); i++) {
> -			struct page *page = pvec.pages[i];
> +			page = pvec.pages[i];
>  
>  			index = indices[i];
>  
> @@ -933,33 +907,37 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>  		index++;
>  	}
>  
> -	if (partial_start) {
> -		struct page *page = NULL;
> -		shmem_getpage(inode, start - 1, &page, SGP_READ);
> -		if (page) {
> -			unsigned int top = PAGE_SIZE;
> -			if (start > end) {
> -				top = partial_end;
> -				partial_end = 0;
> -			}
> -			zero_user_segment(page, partial_start, top);
> -			set_page_dirty(page);
> -			unlock_page(page);
> -			put_page(page);
> +	partial_end = ((lend + 1) % PAGE_SIZE) > 0;
> +	page = NULL;
> +	shmem_getpage(inode, lstart >> PAGE_SHIFT, &page, SGP_READ);
> +	if (page) {
> +		bool same_page;
> +
> +		page = thp_head(page);
> +		same_page = lend < page_offset(page) + thp_size(page);
> +		if (same_page)
> +			partial_end = false;
> +		set_page_dirty(page);
> +		if (!truncate_inode_partial_page(page, lstart, lend)) {
> +			start = page->index + thp_nr_pages(page);
> +			if (same_page)
> +				end = page->index;
>  		}
> +		unlock_page(page);
> +		put_page(page);
> +		page = NULL;
>  	}
> -	if (partial_end) {
> -		struct page *page = NULL;
> +
> +	if (partial_end)
>  		shmem_getpage(inode, end, &page, SGP_READ);
> -		if (page) {
> -			zero_user_segment(page, 0, partial_end);
> -			set_page_dirty(page);
> -			unlock_page(page);
> -			put_page(page);
> -		}
> +	if (page) {
> +		page = thp_head(page);
> +		set_page_dirty(page);
> +		if (!truncate_inode_partial_page(page, lstart, lend))
> +			end = page->index;
> +		unlock_page(page);
> +		put_page(page);
>  	}
> -	if (start >= end)
> -		return;
>  
>  	index = start;
>  	while (index < end) {
> @@ -975,7 +953,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>  			continue;
>  		}
>  		for (i = 0; i < pagevec_count(&pvec); i++) {
> -			struct page *page = pvec.pages[i];
> +			page = pvec.pages[i];
>  
>  			index = indices[i];
>  			if (xa_is_value(page)) {
> @@ -1000,18 +978,9 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>  					break;
>  				}
>  				VM_BUG_ON_PAGE(PageWriteback(page), page);
> -				if (shmem_punch_compound(page, start, end))
> -					truncate_inode_page(mapping, page);
> -				else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
> -					/* Wipe the page and don't get stuck */
> -					clear_highpage(page);
> -					flush_dcache_page(page);
> -					set_page_dirty(page);
> -					if (index <
> -					    round_up(start, HPAGE_PMD_NR))
> -						start = index + 1;
> -				}
> +				truncate_inode_page(mapping, page);
>  			}
> +			index = page->index + thp_nr_pages(page) - 1;
>  			unlock_page(page);
>  		}
>  		pagevec_remove_exceptionals(&pvec);
> diff --git a/mm/truncate.c b/mm/truncate.c
> index a96e44a5ce59..11ef90d7e3af 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -224,6 +224,53 @@ int truncate_inode_page(struct address_space *mapping, struct page *page)
>  	return 0;
>  }
>  
> +/*
> + * Handle partial (transparent) pages.  The page may be entirely within the
> + * range if a split has raced with us.  If not, we zero the part of the
> + * page that's within the [start, end] range, and then split the page if
> + * it's a THP.  split_page_range() will discard pages which now lie beyond
> + * i_size, and we rely on the caller to discard pages which lie within a
> + * newly created hole.
> + *
> + * Returns false if THP splitting failed so the caller can avoid
> + * discarding the entire page which is stubbornly unsplit.
> + */
> +bool truncate_inode_partial_page(struct page *page, loff_t start, loff_t end)
> +{
> +	loff_t pos = page_offset(page);
> +	unsigned int offset, length;
> +
> +	if (pos < start)
> +		offset = start - pos;
> +	else
> +		offset = 0;
> +	length = thp_size(page);
> +	if (pos + length <= (u64)end)
> +		length = length - offset;
> +	else
> +		length = end + 1 - pos - offset;
> +
> +	wait_on_page_writeback(page);
> +	if (length == thp_size(page)) {
> +		truncate_inode_page(page->mapping, page);
> +		return true;
> +	}
> +
> +	/*
> +	 * We may be zeroing pages we're about to discard, but it avoids
> +	 * doing a complex calculation here, and then doing the zeroing
> +	 * anyway if the page split fails.
> +	 */
> +	zero_user(page, offset, length);
> +
> +	cleancache_invalidate_page(page->mapping, page);
> +	if (page_has_private(page))
> +		do_invalidatepage(page, offset, length);
> +	if (!PageTransHuge(page))
> +		return true;
> +	return split_huge_page(page) == 0;
> +}
> +
>  /*
>   * Used to get rid of pages on hardware memory corruption.
>   */
> @@ -288,20 +335,16 @@ void truncate_inode_pages_range(struct address_space *mapping,
>  {
>  	pgoff_t		start;		/* inclusive */
>  	pgoff_t		end;		/* exclusive */
> -	unsigned int	partial_start;	/* inclusive */
> -	unsigned int	partial_end;	/* exclusive */
>  	struct pagevec	pvec;
>  	pgoff_t		indices[PAGEVEC_SIZE];
>  	pgoff_t		index;
>  	int		i;
> +	struct page *	page;
> +	bool partial_end;
>  
>  	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
>  		goto out;
>  
> -	/* Offsets within partial pages */
> -	partial_start = lstart & (PAGE_SIZE - 1);
> -	partial_end = (lend + 1) & (PAGE_SIZE - 1);
> -
>  	/*
>  	 * 'start' and 'end' always covers the range of pages to be fully
>  	 * truncated. Partial pages are covered with 'partial_start' at the
> @@ -334,48 +377,35 @@ void truncate_inode_pages_range(struct address_space *mapping,
>  		cond_resched();
>  	}
>  
> -	if (partial_start) {
> -		struct page *page = find_lock_page(mapping, start - 1);
> -		if (page) {
> -			unsigned int top = PAGE_SIZE;
> -			if (start > end) {
> -				/* Truncation within a single page */
> -				top = partial_end;
> -				partial_end = 0;
> -			}
> -			wait_on_page_writeback(page);
> -			zero_user_segment(page, partial_start, top);
> -			cleancache_invalidate_page(mapping, page);
> -			if (page_has_private(page))
> -				do_invalidatepage(page, partial_start,
> -						  top - partial_start);
> -			unlock_page(page);
> -			put_page(page);
> +	partial_end = ((lend + 1) % PAGE_SIZE) > 0;
> +	page = find_lock_head(mapping, lstart >> PAGE_SHIFT);
> +	if (page) {
> +		bool same_page = lend < page_offset(page) + thp_size(page);
> +		if (same_page)
> +			partial_end = false;
> +		if (!truncate_inode_partial_page(page, lstart, lend)) {
> +			start = page->index + thp_nr_pages(page);
> +			if (same_page)
> +				end = page->index;
>  		}
> +		unlock_page(page);
> +		put_page(page);
> +		page = NULL;
>  	}
> -	if (partial_end) {
> -		struct page *page = find_lock_page(mapping, end);
> -		if (page) {
> -			wait_on_page_writeback(page);
> -			zero_user_segment(page, 0, partial_end);
> -			cleancache_invalidate_page(mapping, page);
> -			if (page_has_private(page))
> -				do_invalidatepage(page, 0,
> -						  partial_end);
> -			unlock_page(page);
> -			put_page(page);
> -		}
> +
> +	if (partial_end)
> +		page = find_lock_head(mapping, end);
> +	if (page) {
> +		if (!truncate_inode_partial_page(page, lstart, lend))
> +			end = page->index;
> +		unlock_page(page);
> +		put_page(page);
>  	}
> -	/*
> -	 * If the truncation happened within a single page no pages
> -	 * will be released, just zeroed, so we can bail out now.
> -	 */
> -	if (start >= end)
> -		goto out;
>  
>  	index = start;
> -	for ( ; ; ) {
> +	while (index < end) {
>  		cond_resched();
> +
>  		if (!find_get_entries(mapping, index, end - 1, &pvec,
>  				indices)) {
>  			/* If all gone from start onwards, we're done */
> @@ -387,7 +417,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
>  		}
>  
>  		for (i = 0; i < pagevec_count(&pvec); i++) {
> -			struct page *page = pvec.pages[i];
> +			page = pvec.pages[i];
>  
>  			/* We rely upon deletion not changing page->index */
>  			index = indices[i];
> @@ -396,7 +426,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
>  				continue;
>  
>  			lock_page(page);
> -			WARN_ON(page_to_index(page) != index);
> +			index = page->index + thp_nr_pages(page) - 1;
>  			wait_on_page_writeback(page);
>  			truncate_inode_page(mapping, page);
>  			unlock_page(page);
> -- 
> 2.28.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data
  2020-10-26 10:48   ` Jan Kara
@ 2020-10-26 12:17     ` Matthew Wilcox
  2020-10-26 14:49       ` Jan Kara
  0 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox @ 2020-10-26 12:17 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Hugh Dickins,
	Johannes Weiner, Yang Shi, Dave Chinner, linux-kernel,
	William Kucharski

On Mon, Oct 26, 2020 at 11:48:06AM +0100, Jan Kara wrote:
> > +static inline loff_t page_seek_hole_data(struct page *page,
> > +		loff_t start, loff_t end, bool seek_data)
> > +{
> > +	if (xa_is_value(page) || PageUptodate(page))
> 
> Please add a comment here that this is currently tmpfs specific treating
> exceptional entries as swapped out pages and thus data. It took me quite a
> while to figure this out. You can remove the comment later when it is no
> longer true...

But it's not tmpfs specific.  If the value entry is a DAX entry, there's
data here, and if the value entry is a shadow entry, there's data here
too.  Not that it should be called for either of those cases because the
filesystem should know, but a value entry always means there's data here.

I'm open to adding a comment, but saying "A value entry always means data"
seems a little redundant with the code.  What would have helped?

> > +		return seek_data ? start : end;
> > +	return seek_data ? end : start;
> > +}
> > +
> > +static inline
> > +unsigned int seek_page_size(struct xa_state *xas, struct page *page)
> > +{
> > +	if (xa_is_value(page))
> > +		return PAGE_SIZE << xa_get_order(xas->xa, xas->xa_index);
> > +	return thp_size(page);
> > +}
> > +
> > +/**
> > + * mapping_seek_hole_data - Seek for SEEK_DATA / SEEK_HOLE in the page cache.
> > + * @mapping: Address space to search.
> > + * @start: First byte to consider.
> > + * @end: Limit of search (exclusive).
> > + * @whence: Either SEEK_HOLE or SEEK_DATA.
> > + *
> > + * If the page cache knows which blocks contain holes and which blocks
> > + * contain data, your filesystem can use this function to implement
> > + * SEEK_HOLE and SEEK_DATA.  This is useful for filesystems which are
> > + * entirely memory-based such as tmpfs, and filesystems which support
> > + * unwritten extents.
> > + *
> > + * Return: The requested offset on successs, or -ENXIO if @whence specifies
> > + * SEEK_DATA and there is no data after @start.  There is an implicit hole
> > + * after @end - 1, so SEEK_HOLE returns @end if all the bytes between @start
> > + * and @end contain data.
> > + */
> > +loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
> > +		loff_t end, int whence)
> > +{
> > +	XA_STATE(xas, &mapping->i_pages, start >> PAGE_SHIFT);
> > +	pgoff_t max = (end - 1) / PAGE_SIZE;
> > +	bool seek_data = (whence == SEEK_DATA);
> > +	struct page *page;
> > +
> > +	if (end <= start)
> > +		return -ENXIO;
> > +
> > +	rcu_read_lock();
> > +	while ((page = xas_find_get_entry(&xas, max, XA_PRESENT))) {
> > +		loff_t pos = xas.xa_index * PAGE_SIZE;
> > +
> > +		if (start < pos) {
> > +			if (!seek_data)
> > +				goto unlock;
> > +			start = pos;
> > +		}
> > +
> > +		pos += seek_page_size(&xas, page);
> > +		start = page_seek_hole_data(page, start, pos, seek_data);
> > +		if (start < pos)
> > +			goto unlock;
> 
> Uh, I was staring at this function for half an hour but I still couldn't
> convince myself that it is correct in all the corner cases. Maybe I'm dumb
> but I'd wish this was more intuitive (and I have to say that the original
> tmpfs function is much more obviously correct to me). It would more 
> understandable for me if we had a code like:
> 
> 		if (page_seek_match(page, seek_data))
> 			goto unlock;
> 
> which would be just the condition in page_seek_hole_data(). Honestly at the
> moment I fail to see why you bother with 'pos' in the above four lines at
> all.

So this?

static bool page_seek_match(struct page *page, bool seek_data)
{
	/* Swap, shadow & DAX entries all represent data */
	if (xa_is_value(page) || PageUptodate(page))
		return seek_data;
	return !seek_data;
}

...

		if (page_seek_match(page, seek_data))
			goto unlock;
		start = pos + seek_page_size(&xas, page);

The function makes more sense when page_seek_hole_data() gains the
ability to look at sub-page uptodate status and it needs to return
where in the page the data (or hole) starts.  But that can be delayed
for the later patch.

With those changes,

Ran: generic/285 generic/286 generic/436 generic/445 generic/448 generic/490 generic/539
Passed all 7 tests

> BTW I suspect that this loop forgets to release the page reference it has got
> when doing SEEK_HOLE.

You're right.  I need a put_page() at the end of the loop.  Also true
for the case where we find a !Uptodate page when doing SEEK_DATA.

> > +	}
> > +	rcu_read_unlock();
> > +
> > +	if (seek_data)
> > +		return -ENXIO;
> > +	goto out;
> > +
> > +unlock:
> > +	rcu_read_unlock();
> > +	if (!xa_is_value(page))
> > +		put_page(page);
> > +out:
> > +	if (start > end)
> > +		return end;
> > +	return start;
> > +}
> 
> 
> 								Honza
> 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data
  2020-10-26 12:17     ` Matthew Wilcox
@ 2020-10-26 14:49       ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2020-10-26 14:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jan Kara, linux-mm, linux-fsdevel, Andrew Morton, Hugh Dickins,
	Johannes Weiner, Yang Shi, Dave Chinner, linux-kernel,
	William Kucharski

On Mon 26-10-20 12:17:27, Matthew Wilcox wrote:
> On Mon, Oct 26, 2020 at 11:48:06AM +0100, Jan Kara wrote:
> > > +static inline loff_t page_seek_hole_data(struct page *page,
> > > +		loff_t start, loff_t end, bool seek_data)
> > > +{
> > > +	if (xa_is_value(page) || PageUptodate(page))
> > 
> > Please add a comment here that this is currently tmpfs specific treating
> > exceptional entries as swapped out pages and thus data. It took me quite a
> > while to figure this out. You can remove the comment later when it is no
> > longer true...
> 
> But it's not tmpfs specific.  If the value entry is a DAX entry, there's
> data here, and if the value entry is a shadow entry, there's data here
> too.  Not that it should be called for either of those cases because the
> filesystem should know, but a value entry always means there's data here.

Good point but for shadow entries I'm not convinced - we do have page cache
pages instantiated by reads from holes. When they get evicted, we have a
shadow entry there but it is still a hole. Actually, similarly we can have
zeroed page over an unwritten extent and that should still count as a hole
IMO.  Only once the page becomes dirty, it should be treated as data. This
looks like a bug even in the current page_seek_hole_data() helper:

# fallocate -l 4096 testfile
# xfs_io -x -c "seek -h 0" testfile
Whence	Result
HOLE	0
# dd if=testfile bs=4096 count=1 of=/dev/null
# xfs_io -x -c "seek -h 0" testfile
Whence	Result
HOLE	4096

Which is indeed a bit weird result... But we seem to be pretty consistent
in this behavior for quite some time. I'll send an email to fs folks about
this.

> > > +		return seek_data ? start : end;
> > > +	return seek_data ? end : start;
> > > +}
> > > +
> > > +static inline
> > > +unsigned int seek_page_size(struct xa_state *xas, struct page *page)
> > > +{
> > > +	if (xa_is_value(page))
> > > +		return PAGE_SIZE << xa_get_order(xas->xa, xas->xa_index);
> > > +	return thp_size(page);
> > > +}
> > > +
> > > +/**
> > > + * mapping_seek_hole_data - Seek for SEEK_DATA / SEEK_HOLE in the page cache.
> > > + * @mapping: Address space to search.
> > > + * @start: First byte to consider.
> > > + * @end: Limit of search (exclusive).
> > > + * @whence: Either SEEK_HOLE or SEEK_DATA.
> > > + *
> > > + * If the page cache knows which blocks contain holes and which blocks
> > > + * contain data, your filesystem can use this function to implement
> > > + * SEEK_HOLE and SEEK_DATA.  This is useful for filesystems which are
> > > + * entirely memory-based such as tmpfs, and filesystems which support
> > > + * unwritten extents.
> > > + *
> > > + * Return: The requested offset on successs, or -ENXIO if @whence specifies
> > > + * SEEK_DATA and there is no data after @start.  There is an implicit hole
> > > + * after @end - 1, so SEEK_HOLE returns @end if all the bytes between @start
> > > + * and @end contain data.
> > > + */
> > > +loff_t mapping_seek_hole_data(struct address_space *mapping, loff_t start,
> > > +		loff_t end, int whence)
> > > +{
> > > +	XA_STATE(xas, &mapping->i_pages, start >> PAGE_SHIFT);
> > > +	pgoff_t max = (end - 1) / PAGE_SIZE;
> > > +	bool seek_data = (whence == SEEK_DATA);
> > > +	struct page *page;
> > > +
> > > +	if (end <= start)
> > > +		return -ENXIO;
> > > +
> > > +	rcu_read_lock();
> > > +	while ((page = xas_find_get_entry(&xas, max, XA_PRESENT))) {
> > > +		loff_t pos = xas.xa_index * PAGE_SIZE;
> > > +
> > > +		if (start < pos) {
> > > +			if (!seek_data)
> > > +				goto unlock;
> > > +			start = pos;
> > > +		}
> > > +
> > > +		pos += seek_page_size(&xas, page);
> > > +		start = page_seek_hole_data(page, start, pos, seek_data);
> > > +		if (start < pos)
> > > +			goto unlock;
> > 
> > Uh, I was staring at this function for half an hour but I still couldn't
> > convince myself that it is correct in all the corner cases. Maybe I'm dumb
> > but I'd wish this was more intuitive (and I have to say that the original
> > tmpfs function is much more obviously correct to me). It would more 
> > understandable for me if we had a code like:
> > 
> > 		if (page_seek_match(page, seek_data))
> > 			goto unlock;
> > 
> > which would be just the condition in page_seek_hole_data(). Honestly at the
> > moment I fail to see why you bother with 'pos' in the above four lines at
> > all.
> 
> So this?
> 
> static bool page_seek_match(struct page *page, bool seek_data)
> {
> 	/* Swap, shadow & DAX entries all represent data */
> 	if (xa_is_value(page) || PageUptodate(page))
> 		return seek_data;
> 	return !seek_data;
> }
> 
> ...
> 
> 		if (page_seek_match(page, seek_data))
> 			goto unlock;
> 		start = pos + seek_page_size(&xas, page);
> 
> The function makes more sense when page_seek_hole_data() gains the
> ability to look at sub-page uptodate status and it needs to return
> where in the page the data (or hole) starts.  But that can be delayed
> for the later patch.

Yeah, this looks much more comprehensible for me. Thanks!

> With those changes,
> 
> Ran: generic/285 generic/286 generic/436 generic/445 generic/448 generic/490 generic/539
> Passed all 7 tests
> 
> > BTW I suspect that this loop forgets to release the page reference it has got
> > when doing SEEK_HOLE.
> 
> You're right.  I need a put_page() at the end of the loop.  Also true
> for the case where we find a !Uptodate page when doing SEEK_DATA.

Right.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 03/12] mm/filemap: Add helper for finding pages
  2020-10-26  4:13 ` [PATCH v3 03/12] mm/filemap: Add helper for finding pages Matthew Wilcox (Oracle)
@ 2020-10-27 18:56   ` Christoph Hellwig
  2020-11-12 14:25     ` Matthew Wilcox
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2020-10-27 18:56 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Hugh Dickins,
	Johannes Weiner, Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

> +static inline struct page *xas_find_get_entry(struct xa_state *xas,
> +		pgoff_t max, xa_mark_t mark)

I'd expect the xas_ prefix for function from xarray.h.  Maybe this
wants a better name?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data
  2020-10-26  4:14 ` [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data Matthew Wilcox (Oracle)
  2020-10-26 10:48   ` Jan Kara
@ 2020-10-27 18:58   ` Christoph Hellwig
  2020-10-27 20:04     ` Matthew Wilcox
  1 sibling, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2020-10-27 18:58 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Hugh Dickins,
	Johannes Weiner, Yang Shi, Dave Chinner, linux-kernel,
	William Kucharski

> +/**
> + * mapping_seek_hole_data - Seek for SEEK_DATA / SEEK_HOLE in the page cache.
> + * @mapping: Address space to search.
> + * @start: First byte to consider.
> + * @end: Limit of search (exclusive).
> + * @whence: Either SEEK_HOLE or SEEK_DATA.
> + *
> + * If the page cache knows which blocks contain holes and which blocks
> + * contain data, your filesystem can use this function to implement
> + * SEEK_HOLE and SEEK_DATA.  This is useful for filesystems which are
> + * entirely memory-based such as tmpfs, and filesystems which support
> + * unwritten extents.
> + *
> + * Return: The requested offset on successs, or -ENXIO if @whence specifies
> + * SEEK_DATA and there is no data after @start.  There is an implicit hole
> + * after @end - 1, so SEEK_HOLE returns @end if all the bytes between @start
> + * and @end contain data.
> + */

This seems to just lift the tmpfs one to common code.  If it really
is supposed to be generic it should be able to replace
page_cache_seek_hole_data as well.  So I don't think moving this without
removing the other common one is an all that good idea.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data
  2020-10-27 18:58   ` Christoph Hellwig
@ 2020-10-27 20:04     ` Matthew Wilcox
  2020-10-28  6:40       ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox @ 2020-10-27 20:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Hugh Dickins,
	Johannes Weiner, Yang Shi, Dave Chinner, linux-kernel,
	William Kucharski

On Tue, Oct 27, 2020 at 06:58:09PM +0000, Christoph Hellwig wrote:
> > +/**
> > + * mapping_seek_hole_data - Seek for SEEK_DATA / SEEK_HOLE in the page cache.
> > + * @mapping: Address space to search.
> > + * @start: First byte to consider.
> > + * @end: Limit of search (exclusive).
> > + * @whence: Either SEEK_HOLE or SEEK_DATA.
> > + *
> > + * If the page cache knows which blocks contain holes and which blocks
> > + * contain data, your filesystem can use this function to implement
> > + * SEEK_HOLE and SEEK_DATA.  This is useful for filesystems which are
> > + * entirely memory-based such as tmpfs, and filesystems which support
> > + * unwritten extents.
> > + *
> > + * Return: The requested offset on successs, or -ENXIO if @whence specifies
> > + * SEEK_DATA and there is no data after @start.  There is an implicit hole
> > + * after @end - 1, so SEEK_HOLE returns @end if all the bytes between @start
> > + * and @end contain data.
> > + */
> 
> This seems to just lift the tmpfs one to common code.  If it really
> is supposed to be generic it should be able to replace
> page_cache_seek_hole_data as well.  So I don't think moving this without
> removing the other common one is an all that good idea.

I have that patch here:

http://git.infradead.org/users/willy/pagecache.git/commitdiff/a4e435b5ed14a0b898da6e5a66fe232f467b8ba1

I was going to let this patch go upstream through Andrew's tree, then
submit that one through Darrick's tree.  But I can add that patch to
the next submission of this series if you'd rather.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data
  2020-10-27 20:04     ` Matthew Wilcox
@ 2020-10-28  6:40       ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2020-10-28  6:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, linux-mm, linux-fsdevel, Andrew Morton,
	Hugh Dickins, Johannes Weiner, Yang Shi, Dave Chinner,
	linux-kernel, William Kucharski

On Tue, Oct 27, 2020 at 08:04:58PM +0000, Matthew Wilcox wrote:
> I have that patch here:
> 
> http://git.infradead.org/users/willy/pagecache.git/commitdiff/a4e435b5ed14a0b898da6e5a66fe232f467b8ba1
> 
> I was going to let this patch go upstream through Andrew's tree, then
> submit that one through Darrick's tree.  But I can add that patch to
> the next submission of this series if you'd rather.

I think keeping the two patches together would be useful.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 01/12] mm: Make pagecache tagged lookups return only head pages
  2020-10-26  4:13 ` [PATCH v3 01/12] mm: Make pagecache tagged lookups return only head pages Matthew Wilcox (Oracle)
@ 2020-10-28  7:50   ` Mike Rapoport
  2020-11-12 17:41     ` Matthew Wilcox
  0 siblings, 1 reply; 25+ messages in thread
From: Mike Rapoport @ 2020-10-28  7:50 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Hugh Dickins,
	Johannes Weiner, Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

On Mon, Oct 26, 2020 at 04:13:57AM +0000, Matthew Wilcox (Oracle) wrote:
> Pagecache tags are used for dirty page writeback.  Since dirtiness is
> tracked on a per-THP basis, we only want to return the head page rather
> than each subpage of a tagged page.  All the filesystems which use huge
> pages today are in-memory, so there are no tagged huge pages today.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
> ---
>  mm/filemap.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index d5e7c2029d16..edde5dc0d28f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2066,7 +2066,7 @@ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index,
>  EXPORT_SYMBOL(find_get_pages_contig);
>  
>  /**
> - * find_get_pages_range_tag - find and return pages in given range matching @tag
> + * find_get_pages_range_tag - Find and return head pages matching @tag.
>   * @mapping:	the address_space to search
>   * @index:	the starting page index
>   * @end:	The final page index (inclusive)
> @@ -2074,8 +2074,8 @@ EXPORT_SYMBOL(find_get_pages_contig);
>   * @nr_pages:	the maximum number of pages
>   * @pages:	where the resulting pages are placed
>   *
> - * Like find_get_pages, except we only return pages which are tagged with
> - * @tag.   We update @index to index the next page for the traversal.
> + * Like find_get_pages(), except we only return head pages which are tagged
> + * with @tag.   We update @index to index the next page for the traversal.

Nit:                                           ^ next head page

>   *
>   * Return: the number of pages which were found.
>   */
> @@ -2109,9 +2109,9 @@ unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
>  		if (unlikely(page != xas_reload(&xas)))
>  			goto put_page;
>  
> -		pages[ret] = find_subpage(page, xas.xa_index);
> +		pages[ret] = page;
>  		if (++ret == nr_pages) {
> -			*index = xas.xa_index + 1;
> +			*index = page->index + thp_nr_pages(page);
>  			goto out;
>  		}
>  		continue;
> -- 
> 2.28.0
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 03/12] mm/filemap: Add helper for finding pages
  2020-10-27 18:56   ` Christoph Hellwig
@ 2020-11-12 14:25     ` Matthew Wilcox
  0 siblings, 0 replies; 25+ messages in thread
From: Matthew Wilcox @ 2020-11-12 14:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Hugh Dickins,
	Johannes Weiner, Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

On Tue, Oct 27, 2020 at 06:56:42PM +0000, Christoph Hellwig wrote:
> > +static inline struct page *xas_find_get_entry(struct xa_state *xas,
> > +		pgoff_t max, xa_mark_t mark)
> 
> I'd expect the xas_ prefix for function from xarray.h.  Maybe this
> wants a better name?

The obvious name for this is find_get_entry().  But that already exists.
Although it turns out it has only two users and one of them doesn't
actually want to use it, so I'm going to rename find_get_entry() to
mapping_get_entry() and then this can become find_get_entry().

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 01/12] mm: Make pagecache tagged lookups return only head pages
  2020-10-28  7:50   ` Mike Rapoport
@ 2020-11-12 17:41     ` Matthew Wilcox
  2020-11-12 19:11       ` Mike Rapoport
  0 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox @ 2020-11-12 17:41 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Hugh Dickins,
	Johannes Weiner, Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

On Wed, Oct 28, 2020 at 09:50:56AM +0200, Mike Rapoport wrote:
> > @@ -2074,8 +2074,8 @@ EXPORT_SYMBOL(find_get_pages_contig);
> >   * @nr_pages:	the maximum number of pages
> >   * @pages:	where the resulting pages are placed
> >   *
> > - * Like find_get_pages, except we only return pages which are tagged with
> > - * @tag.   We update @index to index the next page for the traversal.
> > + * Like find_get_pages(), except we only return head pages which are tagged
> > + * with @tag.   We update @index to index the next page for the traversal.
> 
> Nit:                                           ^ next head page

I don't love the sentence anyway.  How about:

 * Like find_get_pages(), except we only return head pages which are tagged
 * with @tag.  @index is updated to the index immediately after the last
 * page we return, ready for the next iteration.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 01/12] mm: Make pagecache tagged lookups return only head pages
  2020-11-12 17:41     ` Matthew Wilcox
@ 2020-11-12 19:11       ` Mike Rapoport
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Rapoport @ 2020-11-12 19:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-fsdevel, Andrew Morton, Hugh Dickins,
	Johannes Weiner, Yang Shi, Dave Chinner, linux-kernel, Jan Kara,
	William Kucharski

On Thu, Nov 12, 2020 at 05:41:50PM +0000, Matthew Wilcox wrote:
> On Wed, Oct 28, 2020 at 09:50:56AM +0200, Mike Rapoport wrote:
> > > @@ -2074,8 +2074,8 @@ EXPORT_SYMBOL(find_get_pages_contig);
> > >   * @nr_pages:	the maximum number of pages
> > >   * @pages:	where the resulting pages are placed
> > >   *
> > > - * Like find_get_pages, except we only return pages which are tagged with
> > > - * @tag.   We update @index to index the next page for the traversal.
> > > + * Like find_get_pages(), except we only return head pages which are tagged
> > > + * with @tag.   We update @index to index the next page for the traversal.
> > 
> > Nit:                                           ^ next head page
> 
> I don't love the sentence anyway.  How about:
> 
>  * Like find_get_pages(), except we only return head pages which are tagged
>  * with @tag.  @index is updated to the index immediately after the last
>  * page we return, ready for the next iteration.

I like it.

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, back to index

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-26  4:13 [PATCH v3 00/12] Overhaul multi-page lookups for THP Matthew Wilcox (Oracle)
2020-10-26  4:13 ` [PATCH v3 01/12] mm: Make pagecache tagged lookups return only head pages Matthew Wilcox (Oracle)
2020-10-28  7:50   ` Mike Rapoport
2020-11-12 17:41     ` Matthew Wilcox
2020-11-12 19:11       ` Mike Rapoport
2020-10-26  4:13 ` [PATCH v3 02/12] mm/shmem: Use pagevec_lookup in shmem_unlock_mapping Matthew Wilcox (Oracle)
2020-10-26  4:13 ` [PATCH v3 03/12] mm/filemap: Add helper for finding pages Matthew Wilcox (Oracle)
2020-10-27 18:56   ` Christoph Hellwig
2020-11-12 14:25     ` Matthew Wilcox
2020-10-26  4:14 ` [PATCH v3 04/12] mm/filemap: Add mapping_seek_hole_data Matthew Wilcox (Oracle)
2020-10-26 10:48   ` Jan Kara
2020-10-26 12:17     ` Matthew Wilcox
2020-10-26 14:49       ` Jan Kara
2020-10-27 18:58   ` Christoph Hellwig
2020-10-27 20:04     ` Matthew Wilcox
2020-10-28  6:40       ` Christoph Hellwig
2020-10-26  4:14 ` [PATCH v3 05/12] mm: Add and use find_lock_entries Matthew Wilcox (Oracle)
2020-10-26  4:14 ` [PATCH v3 06/12] mm: Add an 'end' parameter to find_get_entries Matthew Wilcox (Oracle)
2020-10-26  4:14 ` [PATCH v3 07/12] mm: Add an 'end' parameter to pagevec_lookup_entries Matthew Wilcox (Oracle)
2020-10-26  4:14 ` [PATCH v3 08/12] mm: Remove nr_entries parameter from pagevec_lookup_entries Matthew Wilcox (Oracle)
2020-10-26  4:14 ` [PATCH v3 09/12] mm: Pass pvec directly to find_get_entries Matthew Wilcox (Oracle)
2020-10-26  4:14 ` [PATCH v3 10/12] mm: Remove pagevec_lookup_entries Matthew Wilcox (Oracle)
2020-10-26  4:14 ` [PATCH v3 11/12] mm/truncate,shmem: Handle truncates that split THPs Matthew Wilcox (Oracle)
2020-10-26 11:05   ` Jan Kara
2020-10-26  4:14 ` [PATCH v3 12/12] mm/filemap: Return only head pages from find_get_entries Matthew Wilcox (Oracle)

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git