linux-xfs.vger.kernel.org archive mirror
* [PATCH 00/12] [LSF/MM/BPF RFC] shmem/tmpfs: add large folios support
       [not found] <CGME20240515055723eucas1p11bf14732f7fac943e688369ff7765f79@eucas1p1.samsung.com>
@ 2024-05-15  5:57 ` Daniel Gomez
       [not found]   ` <CGME20240515055724eucas1p1c502dbded4dc6ff929c7aff570de80c2@eucas1p1.samsung.com>
                     ` (11 more replies)
  0 siblings, 12 replies; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

In preparation for the LSF/MM/BPF 2024 discussion [1], the patches below add
support for large folios in shmem for the write and fallocate paths.

[1] https://lore.kernel.org/all/4ktpayu66noklllpdpspa3vm5gbmb5boxskcj2q6qn7md3pwwt@kvlu64pqwjzl/

This version includes the per-block uptodate tracking required for lseek when
enabling support for large folios. Initially, this tracking was introduced to
address lseek fstests failures (specifically generic/285 and generic/436) with
huge pages. However, it was suggested that, for THP, the tests should instead
be adapted to PAGE_SIZE and PMD_SIZE. Nevertheless, with arbitrary folio orders
we need the lowest granularity possible. This topic will be part of the
discussion in tomorrow's session.
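
As a rough illustration of the lseek point above, a minimal user-space
sketch (hypothetical: it assumes a tmpfs mount where the written range
ends up backed by a folio larger than a page, e.g. huge=always, and the
SEEK_DATA semantics exercised by generic/285):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096] = { 1 };
	int fd = open("/tmp/seek-test", O_RDWR | O_CREAT | O_TRUNC, 0600);

	/* 2MiB sparse file with 4KiB of data written at the 1MiB mark */
	ftruncate(fd, 2 * 1024 * 1024);
	pwrite(fd, buf, sizeof(buf), 1024 * 1024);

	/*
	 * With a single uptodate flag on a 2MiB folio, SEEK_DATA from 0
	 * reports 0; with per-block tracking it can report 1MiB, the
	 * actual start of data.
	 */
	printf("SEEK_DATA: %lld\n", (long long)lseek(fd, 0, SEEK_DATA));
	return close(fd);
}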

Fstests expunge results can be found in the kdevops tree:
https://github.com/linux-kdevops/kdevops/tree/main/workflows/fstests/expunges/6.9.0-shmem-large-folios-with-block-tracking/tmpfs
https://github.com/linux-kdevops/kdevops/tree/main/workflows/fstests/expunges/6.8.0-shmem-large-folios-with-block-tracking/tmpfs

Daniel

Daniel Gomez (11):
  shmem: add per-block uptodate tracking for large folios
  shmem: move folio zero operation to write_begin()
  shmem: exit shmem_get_folio_gfp() if block is uptodate
  shmem: clear_highpage() if block is not uptodate
  shmem: set folio uptodate when reclaim
  shmem: check if a block is uptodate before splice into pipe
  shmem: clear uptodate blocks after PUNCH_HOLE
  shmem: enable per-block uptodate
  shmem: add order arg to shmem_alloc_folio()
  shmem: add file length arg in shmem_get_folio() path
  shmem: add large folio support to the write and fallocate paths

Pankaj Raghav (1):
  splice: don't check for uptodate if partially uptodate is impl

 fs/splice.c              |  17 +-
 fs/xfs/scrub/xfile.c     |   6 +-
 fs/xfs/xfs_buf_mem.c     |   3 +-
 include/linux/shmem_fs.h |   2 +-
 mm/khugepaged.c          |   3 +-
 mm/shmem.c               | 441 ++++++++++++++++++++++++++++++++++-----
 mm/userfaultfd.c         |   2 +-
 7 files changed, 417 insertions(+), 57 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 01/12] splice: don't check for uptodate if partially uptodate is impl
       [not found]   ` <CGME20240515055724eucas1p1c502dbded4dc6ff929c7aff570de80c2@eucas1p1.samsung.com>
@ 2024-05-15  5:57     ` Daniel Gomez
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

From: Pankaj Raghav <p.raghav@samsung.com>

When a large folio is allocated, splice will zero out the whole folio
even if only a small part of it is written, and it sets the uptodate
flag on the folio.

Once per-block uptodate tracking is implemented for tmpfs,
pipe_buf_confirm() only needs to check that the range it is about to
splice is uptodate, not the whole folio, as the uptodate flag is not
set for partial writes.
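
For reference, the new 'off' below is the byte offset of the spliced
page within its (possibly large) folio, e.g. (illustrative numbers
only):

	/*
	 * buf->page is the fourth page of a large folio and
	 * buf->offset is 512:
	 *   off = folio_page_idx(folio, buf->page) * PAGE_SIZE + buf->offset
	 *       = 3 * 4096 + 512 = 12800
	 * so the confirm hook checks only [12800, 12800 + buf->len).
	 */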

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 fs/splice.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 218e24b1ac40..e6ac57795590 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -120,7 +120,9 @@ static int page_cache_pipe_buf_confirm(struct pipe_inode_info *pipe,
 				       struct pipe_buffer *buf)
 {
 	struct folio *folio = page_folio(buf->page);
+	const struct address_space_operations *ops;
 	int err;
+	off_t off = folio_page_idx(folio, buf->page) * PAGE_SIZE + buf->offset;
 
 	if (!folio_test_uptodate(folio)) {
 		folio_lock(folio);
@@ -134,12 +136,21 @@ static int page_cache_pipe_buf_confirm(struct pipe_inode_info *pipe,
 			goto error;
 		}
 
+		ops = folio->mapping->a_ops;
 		/*
 		 * Uh oh, read-error from disk.
 		 */
-		if (!folio_test_uptodate(folio)) {
-			err = -EIO;
-			goto error;
+		if (!ops->is_partially_uptodate) {
+			if (!folio_test_uptodate(folio)) {
+				err = -EIO;
+				goto error;
+			}
+		} else {
+			if (!ops->is_partially_uptodate(folio, off,
+							buf->len)) {
+				err = -EIO;
+				goto error;
+			}
 		}
 
 		/* Folio is ok after all, we are done */
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 02/12] shmem: add per-block uptodate tracking for large folios
       [not found]   ` <CGME20240515055726eucas1p2a795fc743373571bfc3349f9e1ef3f9e@eucas1p2.samsung.com>
@ 2024-05-15  5:57     ` Daniel Gomez
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

Based on the iomap per-block dirty and uptodate state tracking, add a
shmem_folio_state struct to track the per-block uptodate state of
large folios.
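
For scale (illustrative numbers): with 4KiB blocks, a 2MiB folio needs
512 state bits, so the allocation done in sfs_alloc() below comes to
eight unsigned longs on a 64-bit machine:

	nr_blocks = i_blocks_per_folio(inode, folio);	/* 512 */
	sfs = kzalloc(struct_size(sfs, state, BITS_TO_LONGS(nr_blocks)),
		      GFP_KERNEL);			/* 8 longs of state */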

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 189 insertions(+), 6 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 94ab99b6b574..4818f9fbd328 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -131,6 +131,124 @@ struct shmem_options {
 #define SHMEM_SEEN_QUOTA 32
 };
 
+/*
+ * Structure allocated for each folio to track per-block uptodate state.
+ *
+ * Like buffered-io iomap_folio_state struct but only for uptodate.
+ */
+struct shmem_folio_state {
+	spinlock_t state_lock;
+	unsigned long state[];
+};
+
+static inline bool sfs_is_fully_uptodate(struct folio *folio)
+{
+	struct inode *inode = folio->mapping->host;
+	struct shmem_folio_state *sfs = folio->private;
+
+	return bitmap_full(sfs->state, i_blocks_per_folio(inode, folio));
+}
+
+static inline bool sfs_is_block_uptodate(struct shmem_folio_state *sfs,
+					 unsigned int block)
+{
+	return test_bit(block, sfs->state);
+}
+
+/**
+ * sfs_get_last_block_uptodate - find the index of the last uptodate block
+ * within a specified range
+ * @folio: The folio
+ * @first: The starting block of the range to search
+ * @last: The ending block of the range to search
+ *
+ * Returns the index of the last uptodate block within the specified range. If
+ * a non-uptodate block is found at the start, it returns UINT_MAX.
+ */
+static unsigned int sfs_get_last_block_uptodate(struct folio *folio,
+						unsigned int first,
+						unsigned int last)
+{
+	struct inode *inode = folio->mapping->host;
+	struct shmem_folio_state *sfs = folio->private;
+	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
+	unsigned int aux = find_next_zero_bit(sfs->state, nr_blocks, first);
+
+	/*
+	 * Return UINT_MAX, which exceeds the range of possible last blocks,
+	 * if a non-uptodate block is found at the beginning of the scan.
+	 */
+	if (aux == first)
+		return UINT_MAX;
+
+	return min_t(unsigned int, aux - 1, last);
+}
+
+static void sfs_set_range_uptodate(struct folio *folio,
+				   struct shmem_folio_state *sfs, size_t off,
+				   size_t len)
+{
+	struct inode *inode = folio->mapping->host;
+	unsigned int first_blk = off >> inode->i_blkbits;
+	unsigned int last_blk = (off + len - 1) >> inode->i_blkbits;
+	unsigned int nr_blks = last_blk - first_blk + 1;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sfs->state_lock, flags);
+	bitmap_set(sfs->state, first_blk, nr_blks);
+	if (sfs_is_fully_uptodate(folio))
+		folio_mark_uptodate(folio);
+	spin_unlock_irqrestore(&sfs->state_lock, flags);
+}
+
+static struct shmem_folio_state *sfs_alloc(struct inode *inode,
+					   struct folio *folio)
+{
+	struct shmem_folio_state *sfs = folio->private;
+	unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
+	gfp_t gfp = GFP_KERNEL;
+
+	if (sfs || nr_blocks <= 1)
+		return sfs;
+
+	/*
+	 * sfs->state tracks uptodate flag when the block size is smaller
+	 * than the folio size.
+	 */
+	sfs = kzalloc(struct_size(sfs, state, BITS_TO_LONGS(nr_blocks)), gfp);
+	if (!sfs)
+		return sfs;
+
+	spin_lock_init(&sfs->state_lock);
+	if (folio_test_uptodate(folio))
+		bitmap_set(sfs->state, 0, nr_blocks);
+	folio_attach_private(folio, sfs);
+
+	return sfs;
+}
+
+static void sfs_free(struct folio *folio, bool force)
+{
+	if (!folio_test_private(folio))
+		return;
+
+	if (!force)
+		WARN_ON_ONCE(sfs_is_fully_uptodate(folio) !=
+			     folio_test_uptodate(folio));
+
+	kfree(folio_detach_private(folio));
+}
+
+static void shmem_set_range_uptodate(struct folio *folio, size_t off,
+				     size_t len)
+{
+	struct shmem_folio_state *sfs = folio->private;
+
+	if (sfs)
+		sfs_set_range_uptodate(folio, sfs, off, len);
+	else
+		folio_mark_uptodate(folio);
+}
 #ifdef CONFIG_TMPFS
 static unsigned long shmem_default_max_blocks(void)
 {
@@ -1487,7 +1605,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		}
 		folio_zero_range(folio, 0, folio_size(folio));
 		flush_dcache_folio(folio);
-		folio_mark_uptodate(folio);
+		shmem_set_range_uptodate(folio, 0, folio_size(folio));
 	}
 
 	swap = folio_alloc_swap(folio);
@@ -1769,13 +1887,16 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
 	if (!new)
 		return -ENOMEM;
 
+	if (folio_get_private(old))
+		folio_attach_private(new, folio_detach_private(old));
+
 	folio_get(new);
 	folio_copy(new, old);
 	flush_dcache_folio(new);
 
 	__folio_set_locked(new);
 	__folio_set_swapbacked(new);
-	folio_mark_uptodate(new);
+	shmem_set_range_uptodate(new, 0, folio_size(new));
 	new->swap = entry;
 	folio_set_swapcache(new);
 
@@ -2063,6 +2184,12 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 
 alloced:
 	alloced = true;
+
+	if (!sfs_alloc(inode, folio) && folio_test_large(folio)) {
+		error = -ENOMEM;
+		goto unlock;
+	}
+
 	if (folio_test_pmd_mappable(folio) &&
 	    DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE) <
 					folio_next_index(folio) - 1) {
@@ -2104,7 +2231,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 		for (i = 0; i < n; i++)
 			clear_highpage(folio_page(folio, i));
 		flush_dcache_folio(folio);
-		folio_mark_uptodate(folio);
+		shmem_set_range_uptodate(folio, 0, folio_size(folio));
 	}
 
 	/* Perhaps the file has been truncated since we checked */
@@ -2773,8 +2900,8 @@ shmem_write_end(struct file *file, struct address_space *mapping,
 			folio_zero_segments(folio, 0, from,
 					from + copied, folio_size(folio));
 		}
-		folio_mark_uptodate(folio);
 	}
+	shmem_set_range_uptodate(folio, 0, folio_size(folio));
 	folio_mark_dirty(folio);
 	folio_unlock(folio);
 	folio_put(folio);
@@ -2782,6 +2909,59 @@ shmem_write_end(struct file *file, struct address_space *mapping,
 	return copied;
 }
 
+static void shmem_invalidate_folio(struct folio *folio, size_t offset,
+				   size_t len)
+{
+	/*
+	 * If we're invalidating the entire folio, clear the dirty state
+	 * from it and release it to avoid unnecessary buildup of the LRU.
+	 */
+	if (offset == 0 && len == folio_size(folio)) {
+		WARN_ON_ONCE(folio_test_writeback(folio));
+		folio_cancel_dirty(folio);
+		sfs_free(folio, true);
+	}
+}
+
+static bool shmem_release_folio(struct folio *folio, gfp_t gfp_flags)
+{
+	if (folio_test_dirty(folio) && !sfs_is_fully_uptodate(folio))
+		return false;
+
+	sfs_free(folio, false);
+	return true;
+}
+
+/*
+ * shmem_is_partially_uptodate checks whether blocks within a folio are
+ * uptodate or not.
+ *
+ * Returns true if all blocks which correspond to the specified part
+ * of the folio are uptodate.
+ */
+static bool shmem_is_partially_uptodate(struct folio *folio, size_t from,
+					size_t count)
+{
+	struct shmem_folio_state *sfs = folio->private;
+	struct inode *inode = folio->mapping->host;
+	unsigned int first, last;
+
+	if (!sfs)
+		return false;
+
+	/* Caller's range may extend past the end of this folio */
+	count = min(folio_size(folio) - from, count);
+
+	/* First and last blocks in range within folio */
+	first = from >> inode->i_blkbits;
+	last = (from + count - 1) >> inode->i_blkbits;
+
+	if (sfs_get_last_block_uptodate(folio, first, last) != last)
+		return false;
+
+	return true;
+}
+
 static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
 	struct file *file = iocb->ki_filp;
@@ -3533,7 +3713,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
 			goto out_remove_offset;
 		inode->i_op = &shmem_symlink_inode_operations;
 		memcpy(folio_address(folio), symname, len);
-		folio_mark_uptodate(folio);
+		shmem_set_range_uptodate(folio, 0, folio_size(folio));
 		folio_mark_dirty(folio);
 		folio_unlock(folio);
 		folio_put(folio);
@@ -4523,7 +4703,10 @@ static const struct address_space_operations shmem_aops = {
 #ifdef CONFIG_MIGRATION
 	.migrate_folio	= migrate_folio,
 #endif
-	.error_remove_folio = shmem_error_remove_folio,
+	.error_remove_folio    = shmem_error_remove_folio,
+	.invalidate_folio      = shmem_invalidate_folio,
+	.release_folio         = shmem_release_folio,
+	.is_partially_uptodate = shmem_is_partially_uptodate,
 };
 
 static const struct file_operations shmem_file_operations = {
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 03/12] shmem: move folio zero operation to write_begin()
       [not found]   ` <CGME20240515055727eucas1p2413c65b8b227ac0c6007b4600574abd8@eucas1p2.samsung.com>
@ 2024-05-15  5:57     ` Daniel Gomez
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

Simplify the zero-out operation by moving it from write_end() to
write_begin(). If a large folio does not have any block uptodate when we
first get it, zero it out entirely.

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 4818f9fbd328..86ad539b6a0f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -149,6 +149,14 @@ static inline bool sfs_is_fully_uptodate(struct folio *folio)
 	return bitmap_full(sfs->state, i_blocks_per_folio(inode, folio));
 }
 
+static inline bool sfs_is_any_uptodate(struct folio *folio)
+{
+	struct inode *inode = folio->mapping->host;
+	struct shmem_folio_state *sfs = folio->private;
+
+	return !bitmap_empty(sfs->state, i_blocks_per_folio(inode, folio));
+}
+
 static inline bool sfs_is_block_uptodate(struct shmem_folio_state *sfs,
 					 unsigned int block)
 {
@@ -239,6 +247,15 @@ static void sfs_free(struct folio *folio, bool force)
 	kfree(folio_detach_private(folio));
 }
 
+static inline bool shmem_is_any_uptodate(struct folio *folio)
+{
+	struct shmem_folio_state *sfs = folio->private;
+
+	if (folio_test_large(folio) && sfs)
+		return sfs_is_any_uptodate(folio);
+	return folio_test_uptodate(folio);
+}
+
 static void shmem_set_range_uptodate(struct folio *folio, size_t off,
 				     size_t len)
 {
@@ -2872,6 +2889,9 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 	if (ret)
 		return ret;
 
+	if (!shmem_is_any_uptodate(folio))
+		folio_zero_range(folio, 0, folio_size(folio));
+
 	*pagep = folio_file_page(folio, index);
 	if (PageHWPoison(*pagep)) {
 		folio_unlock(folio);
@@ -2894,13 +2914,6 @@ shmem_write_end(struct file *file, struct address_space *mapping,
 	if (pos + copied > inode->i_size)
 		i_size_write(inode, pos + copied);
 
-	if (!folio_test_uptodate(folio)) {
-		if (copied < folio_size(folio)) {
-			size_t from = offset_in_folio(folio, pos);
-			folio_zero_segments(folio, 0, from,
-					from + copied, folio_size(folio));
-		}
-	}
 	shmem_set_range_uptodate(folio, 0, folio_size(folio));
 	folio_mark_dirty(folio);
 	folio_unlock(folio);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 04/12] shmem: exit shmem_get_folio_gfp() if block is uptodate
       [not found]   ` <CGME20240515055728eucas1p181e0ed81b2663eb0eee6d6134c1c1956@eucas1p1.samsung.com>
@ 2024-05-15  5:57     ` Daniel Gomez
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

When we get a folio from the page cache with filemap_get_entry() and it
is uptodate, we exit from shmem_get_folio_gfp(). Replicate the same
behaviour when the block at the index we are operating on is uptodate.
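
Note that tmpfs sets i_blkbits to PAGE_SHIFT, so a "block" here is
effectively one page and the bitmap index is simply the page offset
within the folio (illustrative numbers):

	/*
	 * An order-4 folio covers page cache indices 64..79; a lookup
	 * at index 70 checks bit 70 - 64 = 6 of sfs->state.
	 */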

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 86ad539b6a0f..69f3b98fdf7c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -256,6 +256,16 @@ static inline bool shmem_is_any_uptodate(struct folio *folio)
 	return folio_test_uptodate(folio);
 }
 
+static inline bool shmem_is_block_uptodate(struct folio *folio,
+					   unsigned int block)
+{
+	struct shmem_folio_state *sfs = folio->private;
+
+	if (folio_test_large(folio) && sfs)
+		return sfs_is_block_uptodate(sfs, block);
+	return folio_test_uptodate(folio);
+}
+
 static void shmem_set_range_uptodate(struct folio *folio, size_t off,
 				     size_t len)
 {
@@ -2146,7 +2156,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 		}
 		if (sgp == SGP_WRITE)
 			folio_mark_accessed(folio);
-		if (folio_test_uptodate(folio))
+		if (shmem_is_block_uptodate(folio, index - folio_index(folio)))
 			goto out;
 		/* fallocated folio */
 		if (sgp != SGP_READ)
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 05/12] shmem: clear_highpage() if block is not uptodate
       [not found]   ` <CGME20240515055729eucas1p14e953424ad39bbb923c64163b1bbd4b3@eucas1p1.samsung.com>
@ 2024-05-15  5:57     ` Daniel Gomez
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

clear_highpage() is called for all the subpages (blocks) of a large
folio when the folio is not uptodate. Clear only the subpages that are
not themselves uptodate.

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 69f3b98fdf7c..04992010225f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2256,7 +2256,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 		long i, n = folio_nr_pages(folio);
 
 		for (i = 0; i < n; i++)
-			clear_highpage(folio_page(folio, i));
+			if (!shmem_is_block_uptodate(folio, i))
+				clear_highpage(folio_page(folio, i));
 		flush_dcache_folio(folio);
 		shmem_set_range_uptodate(folio, 0, folio_size(folio));
 	}
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 06/12] shmem: set folio uptodate when reclaim
       [not found]   ` <CGME20240515055731eucas1p12cbbba88e24a011ef5871f90ff25ae73@eucas1p1.samsung.com>
@ 2024-05-15  5:57     ` Daniel Gomez
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

When reclaiming space by splitting a large folio through
shmem_unused_huge_shrink(), the folio is split regardless of its
uptodate status. Mark all the blocks as uptodate in the reclaim path so
that split_folio() can release the folio's private struct
(shmem_folio_state).

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 04992010225f..68fe769d91b1 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -842,6 +842,7 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
 			goto move_back;
 		}
 
+		shmem_set_range_uptodate(folio, 0, folio_size(folio));
 		ret = split_folio(folio);
 		folio_unlock(folio);
 		folio_put(folio);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 07/12] shmem: check if a block is uptodate before splice into pipe
       [not found]   ` <CGME20240515055732eucas1p2302bbca4d60e2e811a5c59e34f83628d@eucas1p2.samsung.com>
@ 2024-05-15  5:57     ` Daniel Gomez
  2024-05-16 13:19       ` kernel test robot
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

The splice_read() path assumes folios are always uptodate. Make sure
all blocks in the given range are uptodate; otherwise, splice the zero
page into the pipe. Maximize the number of blocks that can be spliced
into the pipe at once by extending 'part' up to the last uptodate block
found.
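
A worked example of the clamping (illustrative numbers, 4KiB blocks):

	/*
	 * *ppos is 20KiB into a 2MiB folio and len = 64KiB:
	 *   bfirst = 5, blast = 20
	 * If only blocks 5..9 are uptodate, blast_upd = 9, so
	 *   part = (9 - 5 + 1) * 4096 = 20KiB
	 * is spliced now, and the next iteration finds block 10
	 * !uptodate and splices the zero page instead.
	 */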

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 68fe769d91b1..e06cb6438ef8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3223,8 +3223,30 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 		if (unlikely(*ppos >= isize))
 			break;
 		part = min_t(loff_t, isize - *ppos, len);
+		if (folio && folio_test_large(folio) &&
+		    folio_test_private(folio)) {
+			unsigned long from = offset_in_folio(folio, *ppos);
+			unsigned int bfirst = from >> inode->i_blkbits;
+			unsigned int blast, blast_upd;
+
+			len = min(folio_size(folio) - from, len);
+			blast = (from + len - 1) >> inode->i_blkbits;
+
+			blast_upd = sfs_get_last_block_uptodate(folio, bfirst,
+								blast);
+			if (blast_upd <= blast) {
+				unsigned int bsize = 1 << inode->i_blkbits;
+				unsigned int blks = blast_upd - bfirst + 1;
+				unsigned int bbytes = blks << inode->i_blkbits;
+				unsigned int boff = (*ppos % bsize);
+
+				part = min_t(loff_t, bbytes - boff, len);
+			}
+		}
 
-		if (folio) {
+		if (folio && shmem_is_block_uptodate(
+				     folio, offset_in_folio(folio, *ppos) >>
+						    inode->i_blkbits)) {
 			/*
 			 * If users can be writing to this page using arbitrary
 			 * virtual addresses, take care about potential aliasing
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 08/12] shmem: clear uptodate blocks after PUNCH_HOLE
       [not found]   ` <CGME20240515055733eucas1p2804d2fb5f5bf7d6adb460054f6e9f4d8@eucas1p2.samsung.com>
@ 2024-05-15  5:57     ` Daniel Gomez
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

In the fallocate path with the PUNCH_HOLE mode flag set, clear the
uptodate flag for the blocks covered by the punch. Skip partial blocks,
as they may still contain data.
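
A worked example of the rounding below (illustrative numbers, 4KiB
blocks):

	/*
	 * Punching folio bytes 1000..8999 (off = 1000, len = 8000):
	 *   first_blk = DIV_ROUND_UP_ULL(1000, 4096)       = 1
	 *   last_blk  = DIV_ROUND_DOWN_ULL(9000, 4096) - 1 = 1
	 * Only block 1 (bytes 4096..8191) is cleared; the partially
	 * covered blocks 0 and 2 keep their uptodate state.
	 */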

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 72 insertions(+), 6 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index e06cb6438ef8..d5e6c8eba983 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -209,6 +209,28 @@ static void sfs_set_range_uptodate(struct folio *folio,
 	spin_unlock_irqrestore(&sfs->state_lock, flags);
 }
 
+static void sfs_clear_range_uptodate(struct folio *folio,
+				     struct shmem_folio_state *sfs, size_t off,
+				     size_t len)
+{
+	struct inode *inode = folio->mapping->host;
+	unsigned int first_blk, last_blk;
+	unsigned long flags;
+
+	first_blk = DIV_ROUND_UP_ULL(off, 1 << inode->i_blkbits);
+	last_blk = DIV_ROUND_DOWN_ULL(off + len, 1 << inode->i_blkbits) - 1;
+	if (last_blk == UINT_MAX)
+		return;
+
+	if (first_blk > last_blk)
+		return;
+
+	spin_lock_irqsave(&sfs->state_lock, flags);
+	bitmap_clear(sfs->state, first_blk, last_blk - first_blk + 1);
+	folio_clear_uptodate(folio);
+	spin_unlock_irqrestore(&sfs->state_lock, flags);
+}
+
 static struct shmem_folio_state *sfs_alloc(struct inode *inode,
 					   struct folio *folio)
 {
@@ -276,6 +298,19 @@ static void shmem_set_range_uptodate(struct folio *folio, size_t off,
 	else
 		folio_mark_uptodate(folio);
 }
+
+static void shmem_clear_range_uptodate(struct folio *folio, size_t off,
+				     size_t len)
+{
+	struct shmem_folio_state *sfs = folio->private;
+
+	if (sfs)
+		sfs_clear_range_uptodate(folio, sfs, off, len);
+	else
+		folio_clear_uptodate(folio);
+
+}
+
 #ifdef CONFIG_TMPFS
 static unsigned long shmem_default_max_blocks(void)
 {
@@ -1103,12 +1138,33 @@ static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
 	return folio;
 }
 
+static void shmem_clear(struct folio *folio, loff_t start, loff_t end, int mode)
+{
+	loff_t pos = folio_pos(folio);
+	unsigned int offset, length;
+
+	if (!(mode & FALLOC_FL_PUNCH_HOLE) || !(folio_test_large(folio)))
+		return;
+
+	if (pos < start)
+		offset = start - pos;
+	else
+		offset = 0;
+	length = folio_size(folio);
+	if (pos + length <= (u64)end)
+		length = length - offset;
+	else
+		length = end + 1 - pos - offset;
+
+	shmem_clear_range_uptodate(folio, offset, length);
+}
+
 /*
  * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
  */
 static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
-								 bool unfalloc)
+			     bool unfalloc, int mode)
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct shmem_inode_info *info = SHMEM_I(inode);
@@ -1166,6 +1222,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 	if (folio) {
 		same_folio = lend < folio_pos(folio) + folio_size(folio);
 		folio_mark_dirty(folio);
+		shmem_clear(folio, lstart, lend, mode);
 		if (!truncate_inode_partial_folio(folio, lstart, lend)) {
 			start = folio_next_index(folio);
 			if (same_folio)
@@ -1255,9 +1312,17 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 	shmem_recalc_inode(inode, 0, -nr_swaps_freed);
 }
 
+static void shmem_truncate_range_mode(struct inode *inode, loff_t lstart,
+				      loff_t lend, int mode)
+{
+	shmem_undo_range(inode, lstart, lend, false, mode);
+	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
+	inode_inc_iversion(inode);
+}
+
 void shmem_truncate_range(struct inode *inode, loff_t lstart, loff_t lend)
 {
-	shmem_undo_range(inode, lstart, lend, false);
+	shmem_undo_range(inode, lstart, lend, false, 0);
 	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
 	inode_inc_iversion(inode);
 }
@@ -3342,7 +3407,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		if ((u64)unmap_end > (u64)unmap_start)
 			unmap_mapping_range(mapping, unmap_start,
 					    1 + unmap_end - unmap_start, 0);
-		shmem_truncate_range(inode, offset, offset + len - 1);
+		shmem_truncate_range_mode(inode, offset, offset + len - 1, mode);
 		/* No need to unmap again: hole-punching leaves COWed pages */
 
 		spin_lock(&inode->i_lock);
@@ -3408,9 +3473,10 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 			info->fallocend = undo_fallocend;
 			/* Remove the !uptodate folios we added */
 			if (index > start) {
-				shmem_undo_range(inode,
-				    (loff_t)start << PAGE_SHIFT,
-				    ((loff_t)index << PAGE_SHIFT) - 1, true);
+				shmem_undo_range(
+					inode, (loff_t)start << PAGE_SHIFT,
+					((loff_t)index << PAGE_SHIFT) - 1, true,
+					0);
 			}
 			goto undone;
 		}
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 09/12] shmem: enable per-block uptodate
       [not found]   ` <CGME20240515055735eucas1p2a967b4eebc8e059588cd62139f006b0d@eucas1p2.samsung.com>
@ 2024-05-15  5:57     ` Daniel Gomez
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

In the write_end() function, mark only the blocks that are being written
as uptodate.

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index d5e6c8eba983..7a6ad678e2ff 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2991,7 +2991,7 @@ shmem_write_end(struct file *file, struct address_space *mapping,
 	if (pos + copied > inode->i_size)
 		i_size_write(inode, pos + copied);
 
-	shmem_set_range_uptodate(folio, 0, folio_size(folio));
+	shmem_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
 	folio_mark_dirty(folio);
 	folio_unlock(folio);
 	folio_put(folio);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 10/12] shmem: add order arg to shmem_alloc_folio()
       [not found]   ` <CGME20240515055736eucas1p1bfa9549398e766532d143ba9314bee18@eucas1p1.samsung.com>
@ 2024-05-15  5:57     ` Daniel Gomez
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

Add a folio order argument to shmem_alloc_folio(). The return path now
uses page_rmappable_folio(), which supports both order-0 and high-order
folios.

As the order requested may not match the order of the folio returned
when allocating high-order folios, make sure the number of pages is
calculated after getting the folio.

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 7a6ad678e2ff..d531018ffece 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1828,18 +1828,18 @@ static struct folio *shmem_alloc_hugefolio(gfp_t gfp,
 	return page_rmappable_folio(page);
 }
 
-static struct folio *shmem_alloc_folio(gfp_t gfp,
-		struct shmem_inode_info *info, pgoff_t index)
+static struct folio *shmem_alloc_folio(gfp_t gfp, struct shmem_inode_info *info,
+				       pgoff_t index, unsigned int order)
 {
 	struct mempolicy *mpol;
 	pgoff_t ilx;
 	struct page *page;
 
-	mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
-	page = alloc_pages_mpol(gfp, 0, mpol, ilx, numa_node_id());
+	mpol = shmem_get_pgoff_policy(info, index, order, &ilx);
+	page = alloc_pages_mpol(gfp, order, mpol, ilx, numa_node_id());
 	mpol_cond_put(mpol);
 
-	return (struct folio *)page;
+	return page_rmappable_folio(page);
 }
 
 static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
@@ -1848,6 +1848,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	unsigned int order = 0;
 	struct folio *folio;
 	long pages;
 	int error;
@@ -1856,7 +1857,6 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
 		huge = false;
 
 	if (huge) {
-		pages = HPAGE_PMD_NR;
 		index = round_down(index, HPAGE_PMD_NR);
 
 		/*
@@ -1875,12 +1875,13 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
 		if (!folio)
 			count_vm_event(THP_FILE_FALLBACK);
 	} else {
-		pages = 1;
-		folio = shmem_alloc_folio(gfp, info, index);
+		folio = shmem_alloc_folio(gfp, info, index, order);
 	}
 	if (!folio)
 		return ERR_PTR(-ENOMEM);
 
+	pages = folio_nr_pages(folio);
+
 	__folio_set_locked(folio);
 	__folio_set_swapbacked(folio);
 
@@ -1976,7 +1977,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
 	 */
 	gfp &= ~GFP_CONSTRAINT_MASK;
 	VM_BUG_ON_FOLIO(folio_test_large(old), old);
-	new = shmem_alloc_folio(gfp, info, index);
+	new = shmem_alloc_folio(gfp, info, index, folio_order(old));
 	if (!new)
 		return -ENOMEM;
 
@@ -2855,7 +2856,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
 
 	if (!*foliop) {
 		ret = -ENOMEM;
-		folio = shmem_alloc_folio(gfp, info, pgoff);
+		folio = shmem_alloc_folio(gfp, info, pgoff, 0);
 		if (!folio)
 			goto out_unacct_blocks;
 
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 11/12] shmem: add file length arg in shmem_get_folio() path
       [not found]   ` <CGME20240515055738eucas1p15335a32c790b731aa5857193bbddf92d@eucas1p1.samsung.com>
@ 2024-05-15  5:57     ` Daniel Gomez
  2024-05-15 17:47       ` kernel test robot
  2024-05-17 16:17       ` Darrick J. Wong
  0 siblings, 2 replies; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

In preparation for large folios in the write and fallocate paths, add a
file length argument to the shmem_get_folio() path so the folio order
can be calculated from the write size. Use order-0 (PAGE_SIZE) for the
read, page cache read, and VM fault paths.

This enables high-order folios in the write and fallocate paths once the
folio order is calculated based on the length.

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 fs/xfs/scrub/xfile.c     |  6 +++---
 fs/xfs/xfs_buf_mem.c     |  3 ++-
 include/linux/shmem_fs.h |  2 +-
 mm/khugepaged.c          |  3 ++-
 mm/shmem.c               | 35 ++++++++++++++++++++---------------
 mm/userfaultfd.c         |  2 +-
 6 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 8cdd863db585..4905f5e4cb5d 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -127,7 +127,7 @@ xfile_load(
 		unsigned int	offset;
 
 		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
-				SGP_READ) < 0)
+				SGP_READ, PAGE_SIZE) < 0)
 			break;
 		if (!folio) {
 			/*
@@ -197,7 +197,7 @@ xfile_store(
 		unsigned int	offset;
 
 		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
-				SGP_CACHE) < 0)
+				SGP_CACHE, PAGE_SIZE) < 0)
 			break;
 		if (filemap_check_wb_err(inode->i_mapping, 0)) {
 			folio_unlock(folio);
@@ -268,7 +268,7 @@ xfile_get_folio(
 
 	pflags = memalloc_nofs_save();
 	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
-			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
+			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ, PAGE_SIZE);
 	memalloc_nofs_restore(pflags);
 	if (error)
 		return ERR_PTR(error);
diff --git a/fs/xfs/xfs_buf_mem.c b/fs/xfs/xfs_buf_mem.c
index 9bb2d24de709..784c81d35a1f 100644
--- a/fs/xfs/xfs_buf_mem.c
+++ b/fs/xfs/xfs_buf_mem.c
@@ -149,7 +149,8 @@ xmbuf_map_page(
 		return -ENOMEM;
 	}
 
-	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE);
+	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE,
+				PAGE_SIZE);
 	if (error)
 		return error;
 
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 3fb18f7eb73e..bc59b4a00228 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -142,7 +142,7 @@ enum sgp_type {
 };
 
 int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
-		enum sgp_type sgp);
+		enum sgp_type sgp, size_t len);
 struct folio *shmem_read_folio_gfp(struct address_space *mapping,
 		pgoff_t index, gfp_t gfp);
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 38830174608f..947770ded68c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1863,7 +1863,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 				xas_unlock_irq(&xas);
 				/* swap in or instantiate fallocated page */
 				if (shmem_get_folio(mapping->host, index,
-						&folio, SGP_NOALLOC)) {
+						    &folio, SGP_NOALLOC,
+						    PAGE_SIZE)) {
 					result = SCAN_FAIL;
 					goto xa_unlocked;
 				}
diff --git a/mm/shmem.c b/mm/shmem.c
index d531018ffece..fcd2c9befe19 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1134,7 +1134,7 @@ static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
 	 * (although in some cases this is just a waste of time).
 	 */
 	folio = NULL;
-	shmem_get_folio(inode, index, &folio, SGP_READ);
+	shmem_get_folio(inode, index, &folio, SGP_READ, PAGE_SIZE);
 	return folio;
 }
 
@@ -1844,7 +1844,7 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, struct shmem_inode_info *info,
 
 static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
 		struct inode *inode, pgoff_t index,
-		struct mm_struct *fault_mm, bool huge)
+		struct mm_struct *fault_mm, bool huge, size_t len)
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct shmem_inode_info *info = SHMEM_I(inode);
@@ -2173,7 +2173,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
  */
 static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 		struct folio **foliop, enum sgp_type sgp, gfp_t gfp,
-		struct vm_fault *vmf, vm_fault_t *fault_type)
+		struct vm_fault *vmf, vm_fault_t *fault_type, size_t len)
 {
 	struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
 	struct mm_struct *fault_mm;
@@ -2258,7 +2258,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 		huge_gfp = vma_thp_gfp_mask(vma);
 		huge_gfp = limit_gfp_mask(huge_gfp, gfp);
 		folio = shmem_alloc_and_add_folio(huge_gfp,
-				inode, index, fault_mm, true);
+				inode, index, fault_mm, true, len);
 		if (!IS_ERR(folio)) {
 			count_vm_event(THP_FILE_ALLOC);
 			goto alloced;
@@ -2267,7 +2267,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 			goto repeat;
 	}
 
-	folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false);
+	folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false,
+					  len);
 	if (IS_ERR(folio)) {
 		error = PTR_ERR(folio);
 		if (error == -EEXIST)
@@ -2377,10 +2378,10 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
  * Return: 0 if successful, else a negative error code.
  */
 int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
-		enum sgp_type sgp)
+		enum sgp_type sgp, size_t len)
 {
 	return shmem_get_folio_gfp(inode, index, foliop, sgp,
-			mapping_gfp_mask(inode->i_mapping), NULL, NULL);
+			mapping_gfp_mask(inode->i_mapping), NULL, NULL, len);
 }
 EXPORT_SYMBOL_GPL(shmem_get_folio);
 
@@ -2475,7 +2476,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
 
 	WARN_ON_ONCE(vmf->page != NULL);
 	err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
-				  gfp, vmf, &ret);
+				  gfp, vmf, &ret, PAGE_SIZE);
 	if (err)
 		return vmf_error(err);
 	if (folio) {
@@ -2954,6 +2955,9 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 	struct folio *folio;
 	int ret = 0;
 
+	if (!mapping_large_folio_support(mapping))
+		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
+
 	/* i_rwsem is held by caller */
 	if (unlikely(info->seals & (F_SEAL_GROW |
 				   F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) {
@@ -2963,7 +2967,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 			return -EPERM;
 	}
 
-	ret = shmem_get_folio(inode, index, &folio, SGP_WRITE);
+	ret = shmem_get_folio(inode, index, &folio, SGP_WRITE, len);
 	if (ret)
 		return ret;
 
@@ -3083,7 +3087,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 				break;
 		}
 
-		error = shmem_get_folio(inode, index, &folio, SGP_READ);
+		error = shmem_get_folio(inode, index, &folio, SGP_READ, PAGE_SIZE);
 		if (error) {
 			if (error == -EINVAL)
 				error = 0;
@@ -3260,7 +3264,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 			break;
 
 		error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio,
-					SGP_READ);
+					SGP_READ, PAGE_SIZE);
 		if (error) {
 			if (error == -EINVAL)
 				error = 0;
@@ -3469,7 +3473,8 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 			error = -ENOMEM;
 		else
 			error = shmem_get_folio(inode, index, &folio,
-						SGP_FALLOC);
+						SGP_FALLOC,
+						(end - index) << PAGE_SHIFT);
 		if (error) {
 			info->fallocend = undo_fallocend;
 			/* Remove the !uptodate folios we added */
@@ -3822,7 +3827,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
 	} else {
 		inode_nohighmem(inode);
 		inode->i_mapping->a_ops = &shmem_aops;
-		error = shmem_get_folio(inode, 0, &folio, SGP_WRITE);
+		error = shmem_get_folio(inode, 0, &folio, SGP_WRITE, PAGE_SIZE);
 		if (error)
 			goto out_remove_offset;
 		inode->i_op = &shmem_symlink_inode_operations;
@@ -3868,7 +3873,7 @@ static const char *shmem_get_link(struct dentry *dentry, struct inode *inode,
 			return ERR_PTR(-ECHILD);
 		}
 	} else {
-		error = shmem_get_folio(inode, 0, &folio, SGP_READ);
+		error = shmem_get_folio(inode, 0, &folio, SGP_READ, PAGE_SIZE);
 		if (error)
 			return ERR_PTR(error);
 		if (!folio)
@@ -5255,7 +5260,7 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
 	int error;
 
 	error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
-				    gfp, NULL, NULL);
+				    gfp, NULL, NULL, PAGE_SIZE);
 	if (error)
 		return ERR_PTR(error);
 
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 3c3539c573e7..540a0c2d4325 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -359,7 +359,7 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
 	struct page *page;
 	int ret;
 
-	ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC);
+	ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC, PAGE_SIZE);
 	/* Our caller expects us to return -EFAULT if we failed to find folio */
 	if (ret == -ENOENT)
 		ret = -EFAULT;
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 12/12] shmem: add large folio support to the write and fallocate paths
       [not found]   ` <CGME20240515055740eucas1p1bf112e73a7009a0f9b2bbf09c989a51b@eucas1p1.samsung.com>
@ 2024-05-15  5:57     ` Daniel Gomez
  2024-05-15 18:59       ` kernel test robot
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Gomez @ 2024-05-15  5:57 UTC (permalink / raw)
  To: hughd, akpm, willy, jack, mcgrof
  Cc: linux-mm, linux-xfs, djwong, Pankaj Raghav, dagmcr, yosryahmed,
	baolin.wang, ritesh.list, lsf-pc, david, chandan.babu,
	linux-kernel, brauner, Daniel Gomez

Add large folio support for the shmem write and fallocate paths,
matching the high-order preference mechanism used in the iomap buffered
I/O path, as in __filemap_get_folio().

Add shmem_mapping_size_order() to get a hint for the folio order based
on the write size, taking the mapping requirements into account.

Swap does not support high-order folios for now, so fall back to
order-0 when swap is enabled.

Skip the high-order folio allocation loop when the reclaim path returns
with no space left (ENOSPC).

Add the __GFP_COMP flag to the high-order folio allocation path to fix
a memory leak.
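
A worked example of the order selection (illustrative numbers, 4KiB
pages):

	/*
	 * A 300KiB write at index 0: ilog2(300 << 10) = 18, so
	 * order = 18 - PAGE_SHIFT = 6 and a 256KiB folio is tried
	 * first. At a misaligned index such as 48, the order drops
	 * to __ffs(48) = 4, i.e. a 64KiB folio. On failure other
	 * than -ENOSPC, the allocation retries at successively
	 * lower orders (skipping order-1).
	 */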

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index fcd2c9befe19..9308a334a940 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1836,23 +1836,63 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, struct shmem_inode_info *info,
 	struct page *page;
 
 	mpol = shmem_get_pgoff_policy(info, index, order, &ilx);
-	page = alloc_pages_mpol(gfp, order, mpol, ilx, numa_node_id());
+	page = alloc_pages_mpol(gfp | __GFP_COMP, order, mpol, ilx,
+				numa_node_id());
 	mpol_cond_put(mpol);
 
 	return page_rmappable_folio(page);
 }
 
+/**
+ * shmem_mapping_size_order - Get maximum folio order for the given file size.
+ * @mapping: Target address_space.
+ * @index: The page index.
+ * @size: The suggested size of the folio to create.
+ *
+ * This returns a high order for folios (when supported) based on the file size
+ * which the mapping currently allows at the given index. The index is relevant
+ * due to alignment considerations the mapping might have. The returned order
+ * may be less than the size passed.
+ *
+ * Like __filemap_get_folio order calculation.
+ *
+ * Return: The order.
+ */
+static inline unsigned int
+shmem_mapping_size_order(struct address_space *mapping, pgoff_t index,
+			 size_t size, struct shmem_sb_info *sbinfo)
+{
+	unsigned int order = ilog2(size);
+
+	if ((order <= PAGE_SHIFT) ||
+	    (!mapping_large_folio_support(mapping) || !sbinfo->noswap))
+		return 0;
+
+	order -= PAGE_SHIFT;
+
+	/* If we're not aligned, allocate a smaller folio */
+	if (index & ((1UL << order) - 1))
+		order = __ffs(index);
+
+	order = min_t(size_t, order, MAX_PAGECACHE_ORDER);
+
+	/* Order-1 not supported due to THP dependency */
+	return (order == 1) ? 0 : order;
+}
+
 static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
 		struct inode *inode, pgoff_t index,
 		struct mm_struct *fault_mm, bool huge, size_t len)
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	unsigned int order = 0;
+	unsigned int order = shmem_mapping_size_order(mapping, index, len,
+						      SHMEM_SB(inode->i_sb));
 	struct folio *folio;
 	long pages;
 	int error;
 
+neworder:
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
 		huge = false;
 
@@ -1937,6 +1977,11 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
 unlock:
 	folio_unlock(folio);
 	folio_put(folio);
+	if ((error != -ENOSPC) && (order > 0)) {
+		if (--order == 1)
+			order = 0;
+		goto neworder;
+	}
 	return ERR_PTR(error);
 }
 
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 11/12] shmem: add file length arg in shmem_get_folio() path
  2024-05-15  5:57     ` [PATCH 11/12] shmem: add file length arg in shmem_get_folio() path Daniel Gomez
@ 2024-05-15 17:47       ` kernel test robot
  2024-05-17 16:17       ` Darrick J. Wong
  1 sibling, 0 replies; 19+ messages in thread
From: kernel test robot @ 2024-05-15 17:47 UTC (permalink / raw)
  To: Daniel Gomez, hughd, akpm, willy, jack, mcgrof
  Cc: oe-kbuild-all, linux-mm, linux-xfs, djwong, Pankaj Raghav,
	dagmcr, yosryahmed, baolin.wang, ritesh.list, lsf-pc, david,
	chandan.babu, linux-kernel, brauner, Daniel Gomez

Hi Daniel,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on xfs-linux/for-next brauner-vfs/vfs.all linus/master v6.9 next-20240515]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Daniel-Gomez/splice-don-t-check-for-uptodate-if-partially-uptodate-is-impl/20240515-135925
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240515055719.32577-12-da.gomez%40samsung.com
patch subject: [PATCH 11/12] shmem: add file length arg in shmem_get_folio() path
config: openrisc-defconfig (https://download.01.org/0day-ci/archive/20240516/202405160144.a9ad9CX5-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240516/202405160144.a9ad9CX5-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405160144.a9ad9CX5-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> mm/shmem.c:2382: warning: Function parameter or struct member 'len' not described in 'shmem_get_folio'


vim +2382 mm/shmem.c

^1da177e4c3f41 Linus Torvalds          2005-04-16  2356  
d7468609ee0f90 Christoph Hellwig       2024-02-19  2357  /**
d7468609ee0f90 Christoph Hellwig       2024-02-19  2358   * shmem_get_folio - find, and lock a shmem folio.
d7468609ee0f90 Christoph Hellwig       2024-02-19  2359   * @inode:	inode to search
d7468609ee0f90 Christoph Hellwig       2024-02-19  2360   * @index:	the page index.
d7468609ee0f90 Christoph Hellwig       2024-02-19  2361   * @foliop:	pointer to the folio if found
d7468609ee0f90 Christoph Hellwig       2024-02-19  2362   * @sgp:	SGP_* flags to control behavior
d7468609ee0f90 Christoph Hellwig       2024-02-19  2363   *
d7468609ee0f90 Christoph Hellwig       2024-02-19  2364   * Looks up the page cache entry at @inode & @index.  If a folio is
d7468609ee0f90 Christoph Hellwig       2024-02-19  2365   * present, it is returned locked with an increased refcount.
d7468609ee0f90 Christoph Hellwig       2024-02-19  2366   *
9d8b36744935f8 Christoph Hellwig       2024-02-19  2367   * If the caller modifies data in the folio, it must call folio_mark_dirty()
9d8b36744935f8 Christoph Hellwig       2024-02-19  2368   * before unlocking the folio to ensure that the folio is not reclaimed.
9d8b36744935f8 Christoph Hellwig       2024-02-19  2369   * There is no need to reserve space before calling folio_mark_dirty().
9d8b36744935f8 Christoph Hellwig       2024-02-19  2370   *
d7468609ee0f90 Christoph Hellwig       2024-02-19  2371   * When no folio is found, the behavior depends on @sgp:
8d4dd9d741c330 Akira Yokosawa          2024-02-27  2372   *  - for SGP_READ, *@foliop is %NULL and 0 is returned
8d4dd9d741c330 Akira Yokosawa          2024-02-27  2373   *  - for SGP_NOALLOC, *@foliop is %NULL and -ENOENT is returned
d7468609ee0f90 Christoph Hellwig       2024-02-19  2374   *  - for all other flags a new folio is allocated, inserted into the
d7468609ee0f90 Christoph Hellwig       2024-02-19  2375   *    page cache and returned locked in @foliop.
d7468609ee0f90 Christoph Hellwig       2024-02-19  2376   *
d7468609ee0f90 Christoph Hellwig       2024-02-19  2377   * Context: May sleep.
d7468609ee0f90 Christoph Hellwig       2024-02-19  2378   * Return: 0 if successful, else a negative error code.
d7468609ee0f90 Christoph Hellwig       2024-02-19  2379   */
4e1fc793ad9892 Matthew Wilcox (Oracle  2022-09-02  2380) int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
02efe2fbe45ffd Daniel Gomez            2024-05-15  2381  		enum sgp_type sgp, size_t len)
4e1fc793ad9892 Matthew Wilcox (Oracle  2022-09-02 @2382) {
4e1fc793ad9892 Matthew Wilcox (Oracle  2022-09-02  2383) 	return shmem_get_folio_gfp(inode, index, foliop, sgp,
02efe2fbe45ffd Daniel Gomez            2024-05-15  2384  			mapping_gfp_mask(inode->i_mapping), NULL, NULL, len);
4e1fc793ad9892 Matthew Wilcox (Oracle  2022-09-02  2385) }
d7468609ee0f90 Christoph Hellwig       2024-02-19  2386  EXPORT_SYMBOL_GPL(shmem_get_folio);
4e1fc793ad9892 Matthew Wilcox (Oracle  2022-09-02  2387) 

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 12/12] shmem: add large folio support to the write and fallocate paths
  2024-05-15  5:57     ` [PATCH 12/12] shmem: add large folio support to the write and fallocate paths Daniel Gomez
@ 2024-05-15 18:59       ` kernel test robot
  0 siblings, 0 replies; 19+ messages in thread
From: kernel test robot @ 2024-05-15 18:59 UTC (permalink / raw)
  To: Daniel Gomez, hughd, akpm, willy, jack, mcgrof
  Cc: oe-kbuild-all, linux-mm, linux-xfs, djwong, Pankaj Raghav,
	dagmcr, yosryahmed, baolin.wang, ritesh.list, lsf-pc, david,
	chandan.babu, linux-kernel, brauner, Daniel Gomez

Hi Daniel,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on xfs-linux/for-next brauner-vfs/vfs.all linus/master v6.9 next-20240515]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Daniel-Gomez/splice-don-t-check-for-uptodate-if-partially-uptodate-is-impl/20240515-135925
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240515055719.32577-13-da.gomez%40samsung.com
patch subject: [PATCH 12/12] shmem: add large folio support to the write and fallocate paths
config: openrisc-defconfig (https://download.01.org/0day-ci/archive/20240516/202405160245.2EBqOCyg-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240516/202405160245.2EBqOCyg-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405160245.2EBqOCyg-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> mm/shmem.c:1864: warning: Function parameter or struct member 'sbinfo' not described in 'shmem_mapping_size_order'
   mm/shmem.c:2427: warning: Function parameter or struct member 'len' not described in 'shmem_get_folio'


vim +1864 mm/shmem.c

  1845	
  1846	/**
  1847	 * shmem_mapping_size_order - Get maximum folio order for the given file size.
  1848	 * @mapping: Target address_space.
  1849	 * @index: The page index.
  1850	 * @size: The suggested size of the folio to create.
  1851	 *
  1852	 * This returns a high order for folios (when supported) based on the file size
  1853	 * which the mapping currently allows at the given index. The index is relevant
  1854	 * due to alignment considerations the mapping might have. The returned order
  1855	 * may be less than the size passed.
  1856	 *
  1857	 * Like __filemap_get_folio order calculation.
  1858	 *
  1859	 * Return: The order.
  1860	 */
  1861	static inline unsigned int
  1862	shmem_mapping_size_order(struct address_space *mapping, pgoff_t index,
  1863				 size_t size, struct shmem_sb_info *sbinfo)
> 1864	{
  1865		unsigned int order = ilog2(size);
  1866	
  1867		if ((order <= PAGE_SHIFT) ||
  1868		    (!mapping_large_folio_support(mapping) || !sbinfo->noswap))
  1869			return 0;
  1870	
  1871		order -= PAGE_SHIFT;
  1872	
  1873		/* If we're not aligned, allocate a smaller folio */
  1874		if (index & ((1UL << order) - 1))
  1875			order = __ffs(index);
  1876	
  1877		order = min_t(size_t, order, MAX_PAGECACHE_ORDER);
  1878	
  1879		/* Order-1 not supported due to THP dependency */
  1880		return (order == 1) ? 0 : order;
  1881	}
  1882	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 07/12] shmem: check if a block is uptodate before splice into pipe
  2024-05-15  5:57     ` [PATCH 07/12] shmem: check if a block is uptodate before splice into pipe Daniel Gomez
@ 2024-05-16 13:19       ` kernel test robot
  0 siblings, 0 replies; 19+ messages in thread
From: kernel test robot @ 2024-05-16 13:19 UTC (permalink / raw)
  To: Daniel Gomez, hughd, akpm, willy, jack, mcgrof
  Cc: oe-kbuild-all, linux-mm, linux-xfs, djwong, Pankaj Raghav,
	dagmcr, yosryahmed, baolin.wang, ritesh.list, lsf-pc, david,
	chandan.babu, linux-kernel, brauner, Daniel Gomez

Hi Daniel,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on xfs-linux/for-next brauner-vfs/vfs.all linus/master v6.9 next-20240516]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Daniel-Gomez/splice-don-t-check-for-uptodate-if-partially-uptodate-is-impl/20240515-135925
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240515055719.32577-8-da.gomez%40samsung.com
patch subject: [PATCH 07/12] shmem: check if a block is uptodate before splice into pipe
config: arm-s5pv210_defconfig (https://download.01.org/0day-ci/archive/20240516/202405162045.kaXgB2n3-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240516/202405162045.kaXgB2n3-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405162045.kaXgB2n3-lkp@intel.com/

All errors (new ones prefixed by >>):

   arm-linux-gnueabi-ld: mm/shmem.o: in function `shmem_file_splice_read':
>> mm/shmem.c:3240:(.text+0x5224): undefined reference to `__aeabi_ldivmod'


vim +3240 mm/shmem.c

  3174	
  3175	static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
  3176					      struct pipe_inode_info *pipe,
  3177					      size_t len, unsigned int flags)
  3178	{
  3179		struct inode *inode = file_inode(in);
  3180		struct address_space *mapping = inode->i_mapping;
  3181		struct folio *folio = NULL;
  3182		size_t total_spliced = 0, used, npages, n, part;
  3183		loff_t isize;
  3184		int error = 0;
  3185	
  3186		/* Work out how much data we can actually add into the pipe */
  3187		used = pipe_occupancy(pipe->head, pipe->tail);
  3188		npages = max_t(ssize_t, pipe->max_usage - used, 0);
  3189		len = min_t(size_t, len, npages * PAGE_SIZE);
  3190	
  3191		do {
  3192			if (*ppos >= i_size_read(inode))
  3193				break;
  3194	
  3195			error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio,
  3196						SGP_READ);
  3197			if (error) {
  3198				if (error == -EINVAL)
  3199					error = 0;
  3200				break;
  3201			}
  3202			if (folio) {
  3203				folio_unlock(folio);
  3204	
  3205				if (folio_test_hwpoison(folio) ||
  3206				    (folio_test_large(folio) &&
  3207				     folio_test_has_hwpoisoned(folio))) {
  3208					error = -EIO;
  3209					break;
  3210				}
  3211			}
  3212	
  3213			/*
  3214			 * i_size must be checked after we know the pages are Uptodate.
  3215			 *
  3216			 * Checking i_size after the check allows us to calculate
  3217			 * the correct value for "nr", which means the zero-filled
  3218			 * part of the page is not copied back to userspace (unless
  3219			 * another truncate extends the file - this is desired though).
  3220			 */
  3221			isize = i_size_read(inode);
  3222			if (unlikely(*ppos >= isize))
  3223				break;
  3224			part = min_t(loff_t, isize - *ppos, len);
  3225			if (folio && folio_test_large(folio) &&
  3226			    folio_test_private(folio)) {
  3227				unsigned long from = offset_in_folio(folio, *ppos);
  3228				unsigned int bfirst = from >> inode->i_blkbits;
  3229				unsigned int blast, blast_upd;
  3230	
  3231				len = min(folio_size(folio) - from, len);
  3232				blast = (from + len - 1) >> inode->i_blkbits;
  3233	
  3234				blast_upd = sfs_get_last_block_uptodate(folio, bfirst,
  3235									blast);
  3236				if (blast_upd <= blast) {
  3237					unsigned int bsize = 1 << inode->i_blkbits;
  3238					unsigned int blks = blast_upd - bfirst + 1;
  3239					unsigned int bbytes = blks << inode->i_blkbits;
> 3240					unsigned int boff = (*ppos % bsize);
  3241	
  3242					part = min_t(loff_t, bbytes - boff, len);
  3243				}
  3244			}
  3245	
  3246			if (folio && shmem_is_block_uptodate(
  3247					     folio, offset_in_folio(folio, *ppos) >>
  3248							    inode->i_blkbits)) {
  3249				/*
  3250				 * If users can be writing to this page using arbitrary
  3251				 * virtual addresses, take care about potential aliasing
  3252				 * before reading the page on the kernel side.
  3253				 */
  3254				if (mapping_writably_mapped(mapping))
  3255					flush_dcache_folio(folio);
  3256				folio_mark_accessed(folio);
  3257				/*
  3258				 * Ok, we have the page, and it's up-to-date, so we can
  3259				 * now splice it into the pipe.
  3260				 */
  3261				n = splice_folio_into_pipe(pipe, folio, *ppos, part);
  3262				folio_put(folio);
  3263				folio = NULL;
  3264			} else {
  3265				n = splice_zeropage_into_pipe(pipe, *ppos, part);
  3266			}
  3267	
  3268			if (!n)
  3269				break;
  3270			len -= n;
  3271			total_spliced += n;
  3272			*ppos += n;
  3273			in->f_ra.prev_pos = *ppos;
  3274			if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
  3275				break;
  3276	
  3277			cond_resched();
  3278		} while (len);
  3279	
  3280		if (folio)
  3281			folio_put(folio);
  3282	
  3283		file_accessed(in);
  3284		return total_spliced ? total_spliced : error;
  3285	}
  3286	
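
The link failure comes from the 64-bit modulo at the marked line: *ppos is a
loff_t, and 32-bit arm has no native 64-bit division, so GCC emits a call to
the libgcc helper __aeabi_ldivmod, which the kernel does not provide. Since
bsize is a power of two here (1 << inode->i_blkbits), a minimal sketch of a
fix is to mask instead of dividing (do_div() would also work, but masking
avoids the division entirely):

-				unsigned int boff = (*ppos % bsize);
+				unsigned int boff = *ppos & (bsize - 1);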

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 11/12] shmem: add file length arg in shmem_get_folio() path
  2024-05-15  5:57     ` [PATCH 11/12] shmem: add file length arg in shmem_get_folio() path Daniel Gomez
  2024-05-15 17:47       ` kernel test robot
@ 2024-05-17 16:17       ` Darrick J. Wong
  2024-05-21 11:38         ` Daniel Gomez
  1 sibling, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2024-05-17 16:17 UTC (permalink / raw)
  To: Daniel Gomez
  Cc: hughd, akpm, willy, jack, mcgrof, linux-mm, linux-xfs,
	Pankaj Raghav, dagmcr, yosryahmed, baolin.wang, ritesh.list,
	lsf-pc, david, chandan.babu, linux-kernel, brauner

On Wed, May 15, 2024 at 05:57:36AM +0000, Daniel Gomez wrote:
> In preparation for large folio in the write and fallocate paths, add
> file length argument in shmem_get_folio() path to be able to calculate
> the folio order based on the file size. Use of order-0 (PAGE_SIZE) for
> read, page cache read, and vm fault.
> 
> This enables high order folios in the write and fallocate path once the
> folio order is calculated based on the length.
> 
> Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> ---
>  fs/xfs/scrub/xfile.c     |  6 +++---
>  fs/xfs/xfs_buf_mem.c     |  3 ++-
>  include/linux/shmem_fs.h |  2 +-
>  mm/khugepaged.c          |  3 ++-
>  mm/shmem.c               | 35 ++++++++++++++++++++---------------
>  mm/userfaultfd.c         |  2 +-
>  6 files changed, 29 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 8cdd863db585..4905f5e4cb5d 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -127,7 +127,7 @@ xfile_load(
>  		unsigned int	offset;
>  
>  		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> -				SGP_READ) < 0)
> +				SGP_READ, PAGE_SIZE) < 0)

I suppose I /did/ say during LSFMM that for the current users of xfile.c
and xfs_buf_mem.c the order of the folio being returned doesn't really
matter, but why wouldn't the last argument here be "roundup_64(count,
PAGE_SIZE)" ?  Shouldn't we at least hint to the page cache about the
folio order that we actually want instead of limiting it to order-0?

(Also it seems a little odd to me that the @index is in units of pgoff_t
but @len is in bytes.)

>  			break;
>  		if (!folio) {
>  			/*
> @@ -197,7 +197,7 @@ xfile_store(
>  		unsigned int	offset;
>  
>  		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> -				SGP_CACHE) < 0)
> +				SGP_CACHE, PAGE_SIZE) < 0)
>  			break;
>  		if (filemap_check_wb_err(inode->i_mapping, 0)) {
>  			folio_unlock(folio);
> @@ -268,7 +268,7 @@ xfile_get_folio(
>  
>  	pflags = memalloc_nofs_save();
>  	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> -			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
> +			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ, PAGE_SIZE);
>  	memalloc_nofs_restore(pflags);
>  	if (error)
>  		return ERR_PTR(error);
> diff --git a/fs/xfs/xfs_buf_mem.c b/fs/xfs/xfs_buf_mem.c
> index 9bb2d24de709..784c81d35a1f 100644
> --- a/fs/xfs/xfs_buf_mem.c
> +++ b/fs/xfs/xfs_buf_mem.c
> @@ -149,7 +149,8 @@ xmbuf_map_page(
>  		return -ENOMEM;
>  	}
>  
> -	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE);
> +	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE,
> +				PAGE_SIZE);

This is ok unless someone wants to use a different XMBUF_BLOCKSIZE.
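
If it ever does, a sketch of the more future-proof call would pass the buffer
block size constant instead of assuming it equals PAGE_SIZE:

	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE,
				XMBUF_BLOCKSIZE);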

--D

>  	if (error)
>  		return error;
>  
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 3fb18f7eb73e..bc59b4a00228 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -142,7 +142,7 @@ enum sgp_type {
>  };
>  
>  int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
> -		enum sgp_type sgp);
> +		enum sgp_type sgp, size_t len);
>  struct folio *shmem_read_folio_gfp(struct address_space *mapping,
>  		pgoff_t index, gfp_t gfp);
>  
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 38830174608f..947770ded68c 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1863,7 +1863,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  				xas_unlock_irq(&xas);
>  				/* swap in or instantiate fallocated page */
>  				if (shmem_get_folio(mapping->host, index,
> -						&folio, SGP_NOALLOC)) {
> +						    &folio, SGP_NOALLOC,
> +						    PAGE_SIZE)) {
>  					result = SCAN_FAIL;
>  					goto xa_unlocked;
>  				}
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d531018ffece..fcd2c9befe19 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1134,7 +1134,7 @@ static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
>  	 * (although in some cases this is just a waste of time).
>  	 */
>  	folio = NULL;
> -	shmem_get_folio(inode, index, &folio, SGP_READ);
> +	shmem_get_folio(inode, index, &folio, SGP_READ, PAGE_SIZE);
>  	return folio;
>  }
>  
> @@ -1844,7 +1844,7 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, struct shmem_inode_info *info,
>  
>  static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
>  		struct inode *inode, pgoff_t index,
> -		struct mm_struct *fault_mm, bool huge)
> +		struct mm_struct *fault_mm, bool huge, size_t len)
>  {
>  	struct address_space *mapping = inode->i_mapping;
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> @@ -2173,7 +2173,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>   */
>  static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>  		struct folio **foliop, enum sgp_type sgp, gfp_t gfp,
> -		struct vm_fault *vmf, vm_fault_t *fault_type)
> +		struct vm_fault *vmf, vm_fault_t *fault_type, size_t len)
>  {
>  	struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
>  	struct mm_struct *fault_mm;
> @@ -2258,7 +2258,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>  		huge_gfp = vma_thp_gfp_mask(vma);
>  		huge_gfp = limit_gfp_mask(huge_gfp, gfp);
>  		folio = shmem_alloc_and_add_folio(huge_gfp,
> -				inode, index, fault_mm, true);
> +				inode, index, fault_mm, true, len);
>  		if (!IS_ERR(folio)) {
>  			count_vm_event(THP_FILE_ALLOC);
>  			goto alloced;
> @@ -2267,7 +2267,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>  			goto repeat;
>  	}
>  
> -	folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false);
> +	folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false,
> +					  len);
>  	if (IS_ERR(folio)) {
>  		error = PTR_ERR(folio);
>  		if (error == -EEXIST)
> @@ -2377,10 +2378,10 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>   * Return: 0 if successful, else a negative error code.
>   */
>  int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
> -		enum sgp_type sgp)
> +		enum sgp_type sgp, size_t len)
>  {
>  	return shmem_get_folio_gfp(inode, index, foliop, sgp,
> -			mapping_gfp_mask(inode->i_mapping), NULL, NULL);
> +			mapping_gfp_mask(inode->i_mapping), NULL, NULL, len);
>  }
>  EXPORT_SYMBOL_GPL(shmem_get_folio);
>  
> @@ -2475,7 +2476,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
>  
>  	WARN_ON_ONCE(vmf->page != NULL);
>  	err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
> -				  gfp, vmf, &ret);
> +				  gfp, vmf, &ret, PAGE_SIZE);
>  	if (err)
>  		return vmf_error(err);
>  	if (folio) {
> @@ -2954,6 +2955,9 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
>  	struct folio *folio;
>  	int ret = 0;
>  
> +	if (!mapping_large_folio_support(mapping))
> +		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
> +
>  	/* i_rwsem is held by caller */
>  	if (unlikely(info->seals & (F_SEAL_GROW |
>  				   F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) {
> @@ -2963,7 +2967,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
>  			return -EPERM;
>  	}
>  
> -	ret = shmem_get_folio(inode, index, &folio, SGP_WRITE);
> +	ret = shmem_get_folio(inode, index, &folio, SGP_WRITE, len);
>  	if (ret)
>  		return ret;
>  
> @@ -3083,7 +3087,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  				break;
>  		}
>  
> -		error = shmem_get_folio(inode, index, &folio, SGP_READ);
> +		error = shmem_get_folio(inode, index, &folio, SGP_READ, PAGE_SIZE);
>  		if (error) {
>  			if (error == -EINVAL)
>  				error = 0;
> @@ -3260,7 +3264,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
>  			break;
>  
>  		error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio,
> -					SGP_READ);
> +					SGP_READ, PAGE_SIZE);
>  		if (error) {
>  			if (error == -EINVAL)
>  				error = 0;
> @@ -3469,7 +3473,8 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
>  			error = -ENOMEM;
>  		else
>  			error = shmem_get_folio(inode, index, &folio,
> -						SGP_FALLOC);
> +						SGP_FALLOC,
> +						(end - index) << PAGE_SHIFT);
>  		if (error) {
>  			info->fallocend = undo_fallocend;
>  			/* Remove the !uptodate folios we added */
> @@ -3822,7 +3827,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
>  	} else {
>  		inode_nohighmem(inode);
>  		inode->i_mapping->a_ops = &shmem_aops;
> -		error = shmem_get_folio(inode, 0, &folio, SGP_WRITE);
> +		error = shmem_get_folio(inode, 0, &folio, SGP_WRITE, PAGE_SIZE);
>  		if (error)
>  			goto out_remove_offset;
>  		inode->i_op = &shmem_symlink_inode_operations;
> @@ -3868,7 +3873,7 @@ static const char *shmem_get_link(struct dentry *dentry, struct inode *inode,
>  			return ERR_PTR(-ECHILD);
>  		}
>  	} else {
> -		error = shmem_get_folio(inode, 0, &folio, SGP_READ);
> +		error = shmem_get_folio(inode, 0, &folio, SGP_READ, PAGE_SIZE);
>  		if (error)
>  			return ERR_PTR(error);
>  		if (!folio)
> @@ -5255,7 +5260,7 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
>  	int error;
>  
>  	error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
> -				    gfp, NULL, NULL);
> +				    gfp, NULL, NULL, PAGE_SIZE);
>  	if (error)
>  		return ERR_PTR(error);
>  
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 3c3539c573e7..540a0c2d4325 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -359,7 +359,7 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
>  	struct page *page;
>  	int ret;
>  
> -	ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC);
> +	ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC, PAGE_SIZE);
>  	/* Our caller expects us to return -EFAULT if we failed to find folio */
>  	if (ret == -ENOENT)
>  		ret = -EFAULT;
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 11/12] shmem: add file length arg in shmem_get_folio() path
  2024-05-17 16:17       ` Darrick J. Wong
@ 2024-05-21 11:38         ` Daniel Gomez
  2024-05-21 16:36           ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Gomez @ 2024-05-21 11:38 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: hughd, akpm, willy, jack, mcgrof, linux-mm, linux-xfs,
	Pankaj Raghav, dagmcr, yosryahmed, baolin.wang, ritesh.list,
	lsf-pc, david, chandan.babu, linux-kernel, brauner

On Fri, May 17, 2024 at 09:17:41AM -0700, Darrick J. Wong wrote:
> On Wed, May 15, 2024 at 05:57:36AM +0000, Daniel Gomez wrote:
> > In preparation for large folio in the write and fallocate paths, add
> > file length argument in shmem_get_folio() path to be able to calculate
> > the folio order based on the file size. Use of order-0 (PAGE_SIZE) for
> > read, page cache read, and vm fault.
> > 
> > This enables high order folios in the write and fallocate path once the
> > folio order is calculated based on the length.
> > 
> > Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> > ---
> >  fs/xfs/scrub/xfile.c     |  6 +++---
> >  fs/xfs/xfs_buf_mem.c     |  3 ++-
> >  include/linux/shmem_fs.h |  2 +-
> >  mm/khugepaged.c          |  3 ++-
> >  mm/shmem.c               | 35 ++++++++++++++++++++---------------
> >  mm/userfaultfd.c         |  2 +-
> >  6 files changed, 29 insertions(+), 22 deletions(-)
> > 
> > diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> > index 8cdd863db585..4905f5e4cb5d 100644
> > --- a/fs/xfs/scrub/xfile.c
> > +++ b/fs/xfs/scrub/xfile.c
> > @@ -127,7 +127,7 @@ xfile_load(
> >  		unsigned int	offset;
> >  
> >  		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> > -				SGP_READ) < 0)
> > +				SGP_READ, PAGE_SIZE) < 0)
> 
> I suppose I /did/ say during LSFMM that for the current users of xfile.c
> and xfs_buf_mem.c the order of the folio being returned doesn't really
I'm not sure I understood you correctly. Could you please elaborate on this?

> matter, but why wouldn't the last argument here be "roundup_64(count,
> PAGE_SIZE)" ?  Shouldn't we at least hint to the page cache about the
> folio order that we actually want instead of limiting it to order-0?

For v2, I'll include your suggestions. I think we can also enable large folios
in xfile_get_folio(); please check below:

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 8cdd863db585..df8b495b4939 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -127,7 +127,7 @@ xfile_load(
                unsigned int    offset;

                if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
-                               SGP_READ) < 0)
+                               SGP_READ, roundup_64(count, PAGE_SIZE)) < 0)
                        break;
                if (!folio) {
                        /*
@@ -197,7 +197,7 @@ xfile_store(
                unsigned int    offset;

                if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
-                               SGP_CACHE) < 0)
+                               SGP_CACHE, roundup_64(count, PAGE_SIZE)) < 0)
                        break;
                if (filemap_check_wb_err(inode->i_mapping, 0)) {
                        folio_unlock(folio);
@@ -268,7 +268,8 @@ xfile_get_folio(

        pflags = memalloc_nofs_save();
        error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
-                       (flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
+                       (flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ,
+                       roundup_64(i_size_read(inode), PAGE_SIZE));
        memalloc_nofs_restore(pflags);
        if (error)
                return ERR_PTR(error);

> 
> (Also it seems a little odd to me that the @index is in units of pgoff_t
> but @len is in bytes.)

I extended shmem_get_folio() with @len to calculate the folio order based on
size (in bytes). This is passed to ilog2(), although I'm planning to use
get_order() instead (after fixing the issues mentioned during the discussion).
@index is used for __ffs() (same as in filemap).
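
For reference, the two helpers round in opposite directions, which may be one
of the issues mentioned; a small illustration, assuming 4K pages:

	/* len = 5 KiB on a 4 KiB-page system (PAGE_SHIFT == 12) */
	ilog2(SZ_4K + SZ_1K) - PAGE_SHIFT;	/* 0: rounds down -> order-0 */
	get_order(SZ_4K + SZ_1K);		/* 1: rounds up   -> order-1 */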

Would you use loff_t for @len instead? Or what's your suggestion?

Thanks,
Daniel

> 
> >  			break;
> >  		if (!folio) {
> >  			/*
> > @@ -197,7 +197,7 @@ xfile_store(
> >  		unsigned int	offset;
> >  
> >  		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> > -				SGP_CACHE) < 0)
> > +				SGP_CACHE, PAGE_SIZE) < 0)
> >  			break;
> >  		if (filemap_check_wb_err(inode->i_mapping, 0)) {
> >  			folio_unlock(folio);
> > @@ -268,7 +268,7 @@ xfile_get_folio(
> >  
> >  	pflags = memalloc_nofs_save();
> >  	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> > -			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
> > +			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ, PAGE_SIZE);
> >  	memalloc_nofs_restore(pflags);
> >  	if (error)
> >  		return ERR_PTR(error);
> > diff --git a/fs/xfs/xfs_buf_mem.c b/fs/xfs/xfs_buf_mem.c
> > index 9bb2d24de709..784c81d35a1f 100644
> > --- a/fs/xfs/xfs_buf_mem.c
> > +++ b/fs/xfs/xfs_buf_mem.c
> > @@ -149,7 +149,8 @@ xmbuf_map_page(
> >  		return -ENOMEM;
> >  	}
> >  
> > -	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE);
> > +	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE,
> > +				PAGE_SIZE);
> 
> This is ok unless someone wants to use a different XMBUF_BLOCKSIZE.
> 
> --D
> 
> >  	if (error)
> >  		return error;
> >  
> > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> > index 3fb18f7eb73e..bc59b4a00228 100644
> > --- a/include/linux/shmem_fs.h
> > +++ b/include/linux/shmem_fs.h
> > @@ -142,7 +142,7 @@ enum sgp_type {
> >  };
> >  
> >  int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
> > -		enum sgp_type sgp);
> > +		enum sgp_type sgp, size_t len);
> >  struct folio *shmem_read_folio_gfp(struct address_space *mapping,
> >  		pgoff_t index, gfp_t gfp);
> >  
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 38830174608f..947770ded68c 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1863,7 +1863,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> >  				xas_unlock_irq(&xas);
> >  				/* swap in or instantiate fallocated page */
> >  				if (shmem_get_folio(mapping->host, index,
> > -						&folio, SGP_NOALLOC)) {
> > +						    &folio, SGP_NOALLOC,
> > +						    PAGE_SIZE)) {
> >  					result = SCAN_FAIL;
> >  					goto xa_unlocked;
> >  				}
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index d531018ffece..fcd2c9befe19 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1134,7 +1134,7 @@ static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
> >  	 * (although in some cases this is just a waste of time).
> >  	 */
> >  	folio = NULL;
> > -	shmem_get_folio(inode, index, &folio, SGP_READ);
> > +	shmem_get_folio(inode, index, &folio, SGP_READ, PAGE_SIZE);
> >  	return folio;
> >  }
> >  
> > @@ -1844,7 +1844,7 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, struct shmem_inode_info *info,
> >  
> >  static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
> >  		struct inode *inode, pgoff_t index,
> > -		struct mm_struct *fault_mm, bool huge)
> > +		struct mm_struct *fault_mm, bool huge, size_t len)
> >  {
> >  	struct address_space *mapping = inode->i_mapping;
> >  	struct shmem_inode_info *info = SHMEM_I(inode);
> > @@ -2173,7 +2173,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> >   */
> >  static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> >  		struct folio **foliop, enum sgp_type sgp, gfp_t gfp,
> > -		struct vm_fault *vmf, vm_fault_t *fault_type)
> > +		struct vm_fault *vmf, vm_fault_t *fault_type, size_t len)
> >  {
> >  	struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
> >  	struct mm_struct *fault_mm;
> > @@ -2258,7 +2258,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> >  		huge_gfp = vma_thp_gfp_mask(vma);
> >  		huge_gfp = limit_gfp_mask(huge_gfp, gfp);
> >  		folio = shmem_alloc_and_add_folio(huge_gfp,
> > -				inode, index, fault_mm, true);
> > +				inode, index, fault_mm, true, len);
> >  		if (!IS_ERR(folio)) {
> >  			count_vm_event(THP_FILE_ALLOC);
> >  			goto alloced;
> > @@ -2267,7 +2267,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> >  			goto repeat;
> >  	}
> >  
> > -	folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false);
> > +	folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false,
> > +					  len);
> >  	if (IS_ERR(folio)) {
> >  		error = PTR_ERR(folio);
> >  		if (error == -EEXIST)
> > @@ -2377,10 +2378,10 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> >   * Return: 0 if successful, else a negative error code.
> >   */
> >  int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
> > -		enum sgp_type sgp)
> > +		enum sgp_type sgp, size_t len)
> >  {
> >  	return shmem_get_folio_gfp(inode, index, foliop, sgp,
> > -			mapping_gfp_mask(inode->i_mapping), NULL, NULL);
> > +			mapping_gfp_mask(inode->i_mapping), NULL, NULL, len);
> >  }
> >  EXPORT_SYMBOL_GPL(shmem_get_folio);
> >  
> > @@ -2475,7 +2476,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
> >  
> >  	WARN_ON_ONCE(vmf->page != NULL);
> >  	err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
> > -				  gfp, vmf, &ret);
> > +				  gfp, vmf, &ret, PAGE_SIZE);
> >  	if (err)
> >  		return vmf_error(err);
> >  	if (folio) {
> > @@ -2954,6 +2955,9 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> >  	struct folio *folio;
> >  	int ret = 0;
> >  
> > +	if (!mapping_large_folio_support(mapping))
> > +		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
> > +
> >  	/* i_rwsem is held by caller */
> >  	if (unlikely(info->seals & (F_SEAL_GROW |
> >  				   F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) {
> > @@ -2963,7 +2967,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> >  			return -EPERM;
> >  	}
> >  
> > -	ret = shmem_get_folio(inode, index, &folio, SGP_WRITE);
> > +	ret = shmem_get_folio(inode, index, &folio, SGP_WRITE, len);
> >  	if (ret)
> >  		return ret;
> >  
> > @@ -3083,7 +3087,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> >  				break;
> >  		}
> >  
> > -		error = shmem_get_folio(inode, index, &folio, SGP_READ);
> > +		error = shmem_get_folio(inode, index, &folio, SGP_READ, PAGE_SIZE);
> >  		if (error) {
> >  			if (error == -EINVAL)
> >  				error = 0;
> > @@ -3260,7 +3264,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
> >  			break;
> >  
> >  		error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio,
> > -					SGP_READ);
> > +					SGP_READ, PAGE_SIZE);
> >  		if (error) {
> >  			if (error == -EINVAL)
> >  				error = 0;
> > @@ -3469,7 +3473,8 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> >  			error = -ENOMEM;
> >  		else
> >  			error = shmem_get_folio(inode, index, &folio,
> > -						SGP_FALLOC);
> > +						SGP_FALLOC,
> > +						(end - index) << PAGE_SHIFT);
> >  		if (error) {
> >  			info->fallocend = undo_fallocend;
> >  			/* Remove the !uptodate folios we added */
> > @@ -3822,7 +3827,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
> >  	} else {
> >  		inode_nohighmem(inode);
> >  		inode->i_mapping->a_ops = &shmem_aops;
> > -		error = shmem_get_folio(inode, 0, &folio, SGP_WRITE);
> > +		error = shmem_get_folio(inode, 0, &folio, SGP_WRITE, PAGE_SIZE);
> >  		if (error)
> >  			goto out_remove_offset;
> >  		inode->i_op = &shmem_symlink_inode_operations;
> > @@ -3868,7 +3873,7 @@ static const char *shmem_get_link(struct dentry *dentry, struct inode *inode,
> >  			return ERR_PTR(-ECHILD);
> >  		}
> >  	} else {
> > -		error = shmem_get_folio(inode, 0, &folio, SGP_READ);
> > +		error = shmem_get_folio(inode, 0, &folio, SGP_READ, PAGE_SIZE);
> >  		if (error)
> >  			return ERR_PTR(error);
> >  		if (!folio)
> > @@ -5255,7 +5260,7 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
> >  	int error;
> >  
> >  	error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
> > -				    gfp, NULL, NULL);
> > +				    gfp, NULL, NULL, PAGE_SIZE);
> >  	if (error)
> >  		return ERR_PTR(error);
> >  
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index 3c3539c573e7..540a0c2d4325 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -359,7 +359,7 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
> >  	struct page *page;
> >  	int ret;
> >  
> > -	ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC);
> > +	ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC, PAGE_SIZE);
> >  	/* Our caller expects us to return -EFAULT if we failed to find folio */
> >  	if (ret == -ENOENT)
> >  		ret = -EFAULT;
> > -- 
> > 2.43.0
> > 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 11/12] shmem: add file length arg in shmem_get_folio() path
  2024-05-21 11:38         ` Daniel Gomez
@ 2024-05-21 16:36           ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2024-05-21 16:36 UTC (permalink / raw)
  To: Daniel Gomez
  Cc: hughd, akpm, willy, jack, mcgrof, linux-mm, linux-xfs,
	Pankaj Raghav, dagmcr, yosryahmed, baolin.wang, ritesh.list,
	lsf-pc, david, chandan.babu, linux-kernel, brauner

On Tue, May 21, 2024 at 11:38:33AM +0000, Daniel Gomez wrote:
> On Fri, May 17, 2024 at 09:17:41AM -0700, Darrick J. Wong wrote:
> > On Wed, May 15, 2024 at 05:57:36AM +0000, Daniel Gomez wrote:
> > > In preparation for large folio in the write and fallocate paths, add
> > > file length argument in shmem_get_folio() path to be able to calculate
> > > the folio order based on the file size. Use of order-0 (PAGE_SIZE) for
> > > read, page cache read, and vm fault.
> > > 
> > > This enables high order folios in the write and fallocate path once the
> > > folio order is calculated based on the length.
> > > 
> > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> > > ---
> > >  fs/xfs/scrub/xfile.c     |  6 +++---
> > >  fs/xfs/xfs_buf_mem.c     |  3 ++-
> > >  include/linux/shmem_fs.h |  2 +-
> > >  mm/khugepaged.c          |  3 ++-
> > >  mm/shmem.c               | 35 ++++++++++++++++++++---------------
> > >  mm/userfaultfd.c         |  2 +-
> > >  6 files changed, 29 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> > > index 8cdd863db585..4905f5e4cb5d 100644
> > > --- a/fs/xfs/scrub/xfile.c
> > > +++ b/fs/xfs/scrub/xfile.c
> > > @@ -127,7 +127,7 @@ xfile_load(
> > >  		unsigned int	offset;
> > >  
> > >  		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> > > -				SGP_READ) < 0)
> > > +				SGP_READ, PAGE_SIZE) < 0)
> > 
> > I suppose I /did/ say during LSFMM that for the current users of xfile.c
> > and xfs_buf_mem.c the order of the folio being returned doesn't really
> I'm not sure I understood you correctly. Could you please elaborate on this?

Yes, I'll restate what I said in the session last week for those who
weren't there:

Currently, xfile.c and xfs_buf_mem.c are only used by online repair to
stage a recordset while rebuilding an ondisk btree index.  IOWs, they're
ephemeral, so we don't care or need to optimize folio sizing.  Some day
they might be adapted for longer-term usage though, so we might as well
try not to leave too many papercuts.

xfs_buf_mem.c creates in-memory btrees that mimic the ondisk btrees,
albeit with blocksize == PAGE_SIZE, regardless of the fs blocksize.
For this case we probably aren't ever going to care about large folios.

xfile.c is currently used to store fixed-size recordsets, names for
rebuilding directories, and name/value pairs for rebuilding xattr
structures.  Records aren't allowed to be larger than PAGE_SIZE, names
cannot be larger than MAXNAMELEN (255), and xattr values can't be larger
than 64k.

For that last case maybe it might be nice to get a large folio to reduce
processing overhead, but huge xattrs aren't that common.

> > matter, but why wouldn't the last argument here be "roundup_64(count,
> > PAGE_SIZE)" ?  Shouldn't we at least hint to the page cache about the
> > folio order that we actually want instead of limiting it to order-0?
> 
> For v2, I'll include your suggestions. I think we can also enable large folios
> in xfile_get_folio(); please check below:
> 
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 8cdd863db585..df8b495b4939 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -127,7 +127,7 @@ xfile_load(
>                 unsigned int    offset;
> 
>                 if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> -                               SGP_READ) < 0)
> +                               SGP_READ, roundup_64(count, PAGE_SIZE)) < 0)
>                         break;
>                 if (!folio) {
>                         /*
> @@ -197,7 +197,7 @@ xfile_store(
>                 unsigned int    offset;
> 
>                 if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> -                               SGP_CACHE) < 0)
> +                               SGP_CACHE, roundup_64(count, PAGE_SIZE)) < 0)
>                         break;
>                 if (filemap_check_wb_err(inode->i_mapping, 0)) {
>                         folio_unlock(folio);
> @@ -268,7 +268,8 @@ xfile_get_folio(
> 
>         pflags = memalloc_nofs_save();
>         error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> -                       (flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
> +                       (flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ,
> +                       roundup_64(i_size_read(inode), PAGE_SIZE));

I'm not sure why you picked i_size_read here; the xfile could be several
gigabytes long.  xfile_get_folio wants to look at a subset of the xfile,
not all of it.

roundup_64(len, PAGE_SIZE) perhaps?

Also, should the rounding be done inside the shmem code so that callers
don't have to know about that detail?
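
A minimal sketch of that idea, assuming the rounding moves into the
shmem_get_folio() wrapper so callers can pass raw byte counts:

int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
		enum sgp_type sgp, size_t len)
{
	/* Hypothetical: round up here so callers need not know about pages. */
	len = round_up(len, PAGE_SIZE);

	return shmem_get_folio_gfp(inode, index, foliop, sgp,
			mapping_gfp_mask(inode->i_mapping), NULL, NULL, len);
}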

>         memalloc_nofs_restore(pflags);
>         if (error)
>                 return ERR_PTR(error);
> 
> > 
> > (Also it seems a little odd to me that the @index is in units of pgoff_t
> > but @len is in bytes.)
> 
> I extended shmem_get_folio() with @len to calculate the folio order based on
> size (in bytes). This is passed to ilog2(), although I'm planning to use
> get_order() instead (after fixing the issues mentioned during the discussion).
> @index is used for __ffs() (same as in filemap).
> 
> Would you use loff_t for @len instead? Or what's your suggestion?

I was reacting to @index, not @len.  I might've shifted @index to
"loff_t pos" but looking at the existing callsites it doesn't seem worth
the churn.

--D

> Thanks,
> Daniel
> 
> > 
> > >  			break;
> > >  		if (!folio) {
> > >  			/*
> > > @@ -197,7 +197,7 @@ xfile_store(
> > >  		unsigned int	offset;
> > >  
> > >  		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> > > -				SGP_CACHE) < 0)
> > > +				SGP_CACHE, PAGE_SIZE) < 0)
> > >  			break;
> > >  		if (filemap_check_wb_err(inode->i_mapping, 0)) {
> > >  			folio_unlock(folio);
> > > @@ -268,7 +268,7 @@ xfile_get_folio(
> > >  
> > >  	pflags = memalloc_nofs_save();
> > >  	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> > > -			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
> > > +			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ, PAGE_SIZE);
> > >  	memalloc_nofs_restore(pflags);
> > >  	if (error)
> > >  		return ERR_PTR(error);
> > > diff --git a/fs/xfs/xfs_buf_mem.c b/fs/xfs/xfs_buf_mem.c
> > > index 9bb2d24de709..784c81d35a1f 100644
> > > --- a/fs/xfs/xfs_buf_mem.c
> > > +++ b/fs/xfs/xfs_buf_mem.c
> > > @@ -149,7 +149,8 @@ xmbuf_map_page(
> > >  		return -ENOMEM;
> > >  	}
> > >  
> > > -	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE);
> > > +	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio, SGP_CACHE,
> > > +				PAGE_SIZE);
> > 
> > This is ok unless someone wants to use a different XMBUF_BLOCKSIZE.
> > 
> > --D
> > 
> > >  	if (error)
> > >  		return error;
> > >  
> > > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> > > index 3fb18f7eb73e..bc59b4a00228 100644
> > > --- a/include/linux/shmem_fs.h
> > > +++ b/include/linux/shmem_fs.h
> > > @@ -142,7 +142,7 @@ enum sgp_type {
> > >  };
> > >  
> > >  int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
> > > -		enum sgp_type sgp);
> > > +		enum sgp_type sgp, size_t len);
> > >  struct folio *shmem_read_folio_gfp(struct address_space *mapping,
> > >  		pgoff_t index, gfp_t gfp);
> > >  
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index 38830174608f..947770ded68c 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -1863,7 +1863,8 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> > >  				xas_unlock_irq(&xas);
> > >  				/* swap in or instantiate fallocated page */
> > >  				if (shmem_get_folio(mapping->host, index,
> > > -						&folio, SGP_NOALLOC)) {
> > > +						    &folio, SGP_NOALLOC,
> > > +						    PAGE_SIZE)) {
> > >  					result = SCAN_FAIL;
> > >  					goto xa_unlocked;
> > >  				}
> > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > index d531018ffece..fcd2c9befe19 100644
> > > --- a/mm/shmem.c
> > > +++ b/mm/shmem.c
> > > @@ -1134,7 +1134,7 @@ static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
> > >  	 * (although in some cases this is just a waste of time).
> > >  	 */
> > >  	folio = NULL;
> > > -	shmem_get_folio(inode, index, &folio, SGP_READ);
> > > +	shmem_get_folio(inode, index, &folio, SGP_READ, PAGE_SIZE);
> > >  	return folio;
> > >  }
> > >  
> > > @@ -1844,7 +1844,7 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, struct shmem_inode_info *info,
> > >  
> > >  static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
> > >  		struct inode *inode, pgoff_t index,
> > > -		struct mm_struct *fault_mm, bool huge)
> > > +		struct mm_struct *fault_mm, bool huge, size_t len)
> > >  {
> > >  	struct address_space *mapping = inode->i_mapping;
> > >  	struct shmem_inode_info *info = SHMEM_I(inode);
> > > @@ -2173,7 +2173,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> > >   */
> > >  static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> > >  		struct folio **foliop, enum sgp_type sgp, gfp_t gfp,
> > > -		struct vm_fault *vmf, vm_fault_t *fault_type)
> > > +		struct vm_fault *vmf, vm_fault_t *fault_type, size_t len)
> > >  {
> > >  	struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
> > >  	struct mm_struct *fault_mm;
> > > @@ -2258,7 +2258,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> > >  		huge_gfp = vma_thp_gfp_mask(vma);
> > >  		huge_gfp = limit_gfp_mask(huge_gfp, gfp);
> > >  		folio = shmem_alloc_and_add_folio(huge_gfp,
> > > -				inode, index, fault_mm, true);
> > > +				inode, index, fault_mm, true, len);
> > >  		if (!IS_ERR(folio)) {
> > >  			count_vm_event(THP_FILE_ALLOC);
> > >  			goto alloced;
> > > @@ -2267,7 +2267,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> > >  			goto repeat;
> > >  	}
> > >  
> > > -	folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false);
> > > +	folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false,
> > > +					  len);
> > >  	if (IS_ERR(folio)) {
> > >  		error = PTR_ERR(folio);
> > >  		if (error == -EEXIST)
> > > @@ -2377,10 +2378,10 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> > >   * Return: 0 if successful, else a negative error code.
> > >   */
> > >  int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
> > > -		enum sgp_type sgp)
> > > +		enum sgp_type sgp, size_t len)
> > >  {
> > >  	return shmem_get_folio_gfp(inode, index, foliop, sgp,
> > > -			mapping_gfp_mask(inode->i_mapping), NULL, NULL);
> > > +			mapping_gfp_mask(inode->i_mapping), NULL, NULL, len);
> > >  }
> > >  EXPORT_SYMBOL_GPL(shmem_get_folio);
> > >  
> > > @@ -2475,7 +2476,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
> > >  
> > >  	WARN_ON_ONCE(vmf->page != NULL);
> > >  	err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
> > > -				  gfp, vmf, &ret);
> > > +				  gfp, vmf, &ret, PAGE_SIZE);
> > >  	if (err)
> > >  		return vmf_error(err);
> > >  	if (folio) {
> > > @@ -2954,6 +2955,9 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> > >  	struct folio *folio;
> > >  	int ret = 0;
> > >  
> > > +	if (!mapping_large_folio_support(mapping))
> > > +		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
> > > +
> > >  	/* i_rwsem is held by caller */
> > >  	if (unlikely(info->seals & (F_SEAL_GROW |
> > >  				   F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) {
> > > @@ -2963,7 +2967,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
> > >  			return -EPERM;
> > >  	}
> > >  
> > > -	ret = shmem_get_folio(inode, index, &folio, SGP_WRITE);
> > > +	ret = shmem_get_folio(inode, index, &folio, SGP_WRITE, len);
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > @@ -3083,7 +3087,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
> > >  				break;
> > >  		}
> > >  
> > > -		error = shmem_get_folio(inode, index, &folio, SGP_READ);
> > > +		error = shmem_get_folio(inode, index, &folio, SGP_READ, PAGE_SIZE);
> > >  		if (error) {
> > >  			if (error == -EINVAL)
> > >  				error = 0;
> > > @@ -3260,7 +3264,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
> > >  			break;
> > >  
> > >  		error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio,
> > > -					SGP_READ);
> > > +					SGP_READ, PAGE_SIZE);
> > >  		if (error) {
> > >  			if (error == -EINVAL)
> > >  				error = 0;
> > > @@ -3469,7 +3473,8 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> > >  			error = -ENOMEM;
> > >  		else
> > >  			error = shmem_get_folio(inode, index, &folio,
> > > -						SGP_FALLOC);
> > > +						SGP_FALLOC,
> > > +						(end - index) << PAGE_SHIFT);
> > >  		if (error) {
> > >  			info->fallocend = undo_fallocend;
> > >  			/* Remove the !uptodate folios we added */
> > > @@ -3822,7 +3827,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
> > >  	} else {
> > >  		inode_nohighmem(inode);
> > >  		inode->i_mapping->a_ops = &shmem_aops;
> > > -		error = shmem_get_folio(inode, 0, &folio, SGP_WRITE);
> > > +		error = shmem_get_folio(inode, 0, &folio, SGP_WRITE, PAGE_SIZE);
> > >  		if (error)
> > >  			goto out_remove_offset;
> > >  		inode->i_op = &shmem_symlink_inode_operations;
> > > @@ -3868,7 +3873,7 @@ static const char *shmem_get_link(struct dentry *dentry, struct inode *inode,
> > >  			return ERR_PTR(-ECHILD);
> > >  		}
> > >  	} else {
> > > -		error = shmem_get_folio(inode, 0, &folio, SGP_READ);
> > > +		error = shmem_get_folio(inode, 0, &folio, SGP_READ, PAGE_SIZE);
> > >  		if (error)
> > >  			return ERR_PTR(error);
> > >  		if (!folio)
> > > @@ -5255,7 +5260,7 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
> > >  	int error;
> > >  
> > >  	error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
> > > -				    gfp, NULL, NULL);
> > > +				    gfp, NULL, NULL, PAGE_SIZE);
> > >  	if (error)
> > >  		return ERR_PTR(error);
> > >  
> > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > > index 3c3539c573e7..540a0c2d4325 100644
> > > --- a/mm/userfaultfd.c
> > > +++ b/mm/userfaultfd.c
> > > @@ -359,7 +359,7 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
> > >  	struct page *page;
> > >  	int ret;
> > >  
> > > -	ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC);
> > > +	ret = shmem_get_folio(inode, pgoff, &folio, SGP_NOALLOC, PAGE_SIZE);
> > >  	/* Our caller expects us to return -EFAULT if we failed to find folio */
> > >  	if (ret == -ENOENT)
> > >  		ret = -EFAULT;
> > > -- 
> > > 2.43.0
> > > 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread (newest message: 2024-05-21 16:36 UTC)

Thread overview: 19+ messages
     [not found] <CGME20240515055723eucas1p11bf14732f7fac943e688369ff7765f79@eucas1p1.samsung.com>
2024-05-15  5:57 ` [PATCH 00/12] [LSF/MM/BPF RFC] shmem/tmpfs: add large folios support Daniel Gomez
     [not found]   ` <CGME20240515055724eucas1p1c502dbded4dc6ff929c7aff570de80c2@eucas1p1.samsung.com>
2024-05-15  5:57     ` [PATCH 01/12] splice: don't check for uptodate if partially uptodate is impl Daniel Gomez
     [not found]   ` <CGME20240515055726eucas1p2a795fc743373571bfc3349f9e1ef3f9e@eucas1p2.samsung.com>
2024-05-15  5:57     ` [PATCH 02/12] shmem: add per-block uptodate tracking for large folios Daniel Gomez
     [not found]   ` <CGME20240515055727eucas1p2413c65b8b227ac0c6007b4600574abd8@eucas1p2.samsung.com>
2024-05-15  5:57     ` [PATCH 03/12] shmem: move folio zero operation to write_begin() Daniel Gomez
     [not found]   ` <CGME20240515055728eucas1p181e0ed81b2663eb0eee6d6134c1c1956@eucas1p1.samsung.com>
2024-05-15  5:57     ` [PATCH 04/12] shmem: exit shmem_get_folio_gfp() if block is uptodate Daniel Gomez
     [not found]   ` <CGME20240515055729eucas1p14e953424ad39bbb923c64163b1bbd4b3@eucas1p1.samsung.com>
2024-05-15  5:57     ` [PATCH 05/12] shmem: clear_highpage() if block is not uptodate Daniel Gomez
     [not found]   ` <CGME20240515055731eucas1p12cbbba88e24a011ef5871f90ff25ae73@eucas1p1.samsung.com>
2024-05-15  5:57     ` [PATCH 06/12] shmem: set folio uptodate when reclaim Daniel Gomez
     [not found]   ` <CGME20240515055732eucas1p2302bbca4d60e2e811a5c59e34f83628d@eucas1p2.samsung.com>
2024-05-15  5:57     ` [PATCH 07/12] shmem: check if a block is uptodate before splice into pipe Daniel Gomez
2024-05-16 13:19       ` kernel test robot
     [not found]   ` <CGME20240515055733eucas1p2804d2fb5f5bf7d6adb460054f6e9f4d8@eucas1p2.samsung.com>
2024-05-15  5:57     ` [PATCH 08/12] shmem: clear uptodate blocks after PUNCH_HOLE Daniel Gomez
     [not found]   ` <CGME20240515055735eucas1p2a967b4eebc8e059588cd62139f006b0d@eucas1p2.samsung.com>
2024-05-15  5:57     ` [PATCH 09/12] shmem: enable per-block uptodate Daniel Gomez
     [not found]   ` <CGME20240515055736eucas1p1bfa9549398e766532d143ba9314bee18@eucas1p1.samsung.com>
2024-05-15  5:57     ` [PATCH 10/12] shmem: add order arg to shmem_alloc_folio() Daniel Gomez
     [not found]   ` <CGME20240515055738eucas1p15335a32c790b731aa5857193bbddf92d@eucas1p1.samsung.com>
2024-05-15  5:57     ` [PATCH 11/12] shmem: add file length arg in shmem_get_folio() path Daniel Gomez
2024-05-15 17:47       ` kernel test robot
2024-05-17 16:17       ` Darrick J. Wong
2024-05-21 11:38         ` Daniel Gomez
2024-05-21 16:36           ` Darrick J. Wong
     [not found]   ` <CGME20240515055740eucas1p1bf112e73a7009a0f9b2bbf09c989a51b@eucas1p1.samsung.com>
2024-05-15  5:57     ` [PATCH 12/12] shmem: add large folio support to the write and fallocate paths Daniel Gomez
2024-05-15 18:59       ` kernel test robot
