* [PATCH 0/6] shmem: high order folios support in write path
       [not found] <CGME20230915095123eucas1p2c23d8a8d910f5a8e9fd077dd9579ad0a@eucas1p2.samsung.com>
@ 2023-09-15  9:51 ` Daniel Gomez
       [not found]   ` <CGME20230915095124eucas1p1eb0e0ef883f6316cf14c349404a51150@eucas1p1.samsung.com>
                     ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Daniel Gomez @ 2023-09-15  9:51 UTC (permalink / raw)
  To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
  Cc: gost.dev, Pankaj Raghav, Daniel Gomez

This series adds support for high order folios in the shmem write
path.

This is a continuation of the shmem work from Luis here [1]
following Matthew Wilcox's suggestion [2] regarding the path to take
for the folio allocation order calculation.

[1] RFC v2 add support for blocksize > PAGE_SIZE
https://lore.kernel.org/all/ZHBowMEDfyrAAOWH@bombadil.infradead.org/T/#md3e93ab46ce2ad9254e1eb54ffe71211988b5632
[2] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/

Patches have been tested and sent from next-230911. They do apply
cleanly to the latest next-230914.

fsx and fstests have been run on tmpfs with noswap, with the
following results:
- fsx: 2-day run, 21.5B operations
- fstests: same results as the baseline for next-230911 [3][4][5]

[3] Baseline next-230911 failures are: generic/080 generic/126
generic/193 generic/633 generic/689
[4] fstests logs baseline: https://gitlab.com/-/snippets/3598621
[5] fstests logs patches: https://gitlab.com/-/snippets/3598628

There are at least 2 cases/topics to handle that I'd appreciate
feedback on:
1. With the new strategy, you might end up with a folio order matching
HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
THP is enabled.
2. When the above (1.) occurs, the code skips the huge path, so
xa_find with hindex is skipped.

Daniel

Daniel Gomez (5):
  filemap: make the folio order calculation shareable
  shmem: drop BLOCKS_PER_PAGE macro
  shmem: add order parameter support to shmem_alloc_folio
  shmem: add file length in shmem_get_folio path
  shmem: add large folios support to the write path

Luis Chamberlain (1):
  shmem: account for large order folios

 fs/iomap/buffered-io.c   |  6 ++-
 include/linux/pagemap.h  | 42 ++++++++++++++++---
 include/linux/shmem_fs.h |  2 +-
 mm/filemap.c             |  8 ----
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 91 +++++++++++++++++++++++++---------------
 6 files changed, 100 insertions(+), 51 deletions(-)

--
2.39.2


* [PATCH 1/6] filemap: make the folio order calculation shareable
       [not found]   ` <CGME20230915095124eucas1p1eb0e0ef883f6316cf14c349404a51150@eucas1p1.samsung.com>
@ 2023-09-15  9:51     ` Daniel Gomez
  2023-09-15 13:40       ` Matthew Wilcox
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Gomez @ 2023-09-15  9:51 UTC (permalink / raw)
  To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
  Cc: gost.dev, Pankaj Raghav, Daniel Gomez

To make the code that clamps the folio order in the __filemap_get_folio
routine reusable by others, move and merge it into a new helper,
mapping_size_order(), called from fgf_set_order(). This way, when
mapping the size at a given index, the calculated order is already
valid and ready to be used when the order is retrieved from fgp_flags
with FGF_GET_ORDER.
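
For illustration only (editor's sketch, not part of the patch): a
write-path caller of the reworked helper, based on the iomap hunk
below. example_get_folio() is a hypothetical name.

	static struct folio *example_get_folio(struct address_space *mapping,
					       loff_t pos, size_t len)
	{
		pgoff_t index = pos >> PAGE_SHIFT;
		fgf_t fgp = FGP_WRITEBEGIN;

		/* Encode the pre-clamped folio order into the fgf flags */
		fgp |= fgf_set_order(mapping, index, len);

		return __filemap_get_folio(mapping, index, fgp,
					   mapping_gfp_mask(mapping));
	}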

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 fs/iomap/buffered-io.c  |  6 ++++--
 include/linux/pagemap.h | 42 ++++++++++++++++++++++++++++++++++++-----
 mm/filemap.c            |  8 --------
 3 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ae8673ce08b1..bfd9a22a9464 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -546,12 +546,14 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
 struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos, size_t len)
 {
 	fgf_t fgp = FGP_WRITEBEGIN | FGP_NOFS;
+	pgoff_t index = pos >> PAGE_SHIFT;
+	struct address_space *mapping = iter->inode->i_mapping;
 
 	if (iter->flags & IOMAP_NOWAIT)
 		fgp |= FGP_NOWAIT;
-	fgp |= fgf_set_order(len);
+	fgp |= fgf_set_order(mapping, index, len);
 
-	return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
+	return __filemap_get_folio(mapping, index,
 			fgp, mapping_gfp_mask(iter->inode->i_mapping));
 }
 EXPORT_SYMBOL_GPL(iomap_get_folio);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 351c3b7f93a1..7af5636eb32a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -576,6 +576,39 @@ typedef unsigned int __bitwise fgf_t;
 
 #define FGP_WRITEBEGIN		(FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE)
 
+/**
+ * mapping_size_order - Get maximum folio order for the given file size.
+ * @mapping: Target address_space.
+ * @index: The page index.
+ * @size: The suggested size of the folio to create.
+ *
+ * This returns a high order for folios (when supported) based on the file size
+ * which the mapping currently allows at the given index. The index is relevant
+ * due to alignment considerations the mapping might have. The returned order
+ * may be less than the size passed.
+ *
+ * Return: The order.
+ */
+static inline unsigned int mapping_size_order(struct address_space *mapping,
+						 pgoff_t index, size_t size)
+{
+	unsigned int order = ilog2(size);
+
+	if ((order <= PAGE_SHIFT) || (!mapping_large_folio_support(mapping)))
+		return 0;
+	else
+		order = order - PAGE_SHIFT;
+
+	/* If we're not aligned, allocate a smaller folio */
+	if (index & ((1UL << order) - 1))
+		order = __ffs(index);
+
+	order = min_t(size_t, order, MAX_PAGECACHE_ORDER);
+
+	/* Order-1 not supported due to THP dependency */
+	return (order == 1) ? 0 : order;
+}
+
 /**
  * fgf_set_order - Encode a length in the fgf_t flags.
  * @size: The suggested size of the folio to create.
@@ -587,13 +620,12 @@ typedef unsigned int __bitwise fgf_t;
  * due to alignment constraints, memory pressure, or the presence of
  * other folios at nearby indices.
  */
-static inline fgf_t fgf_set_order(size_t size)
+static inline fgf_t fgf_set_order(struct address_space *mapping, pgoff_t index,
+				  size_t size)
 {
-	unsigned int shift = ilog2(size);
+	unsigned int order = mapping_size_order(mapping, index, size);
 
-	if (shift <= PAGE_SHIFT)
-		return 0;
-	return (__force fgf_t)((shift - PAGE_SHIFT) << 26);
+	return (__force fgf_t)(order << 26);
 }
 
 void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
diff --git a/mm/filemap.c b/mm/filemap.c
index 582f5317ff71..e285fffa9bcf 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1917,14 +1917,6 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
 			fgp_flags |= FGP_LOCK;
 
-		if (!mapping_large_folio_support(mapping))
-			order = 0;
-		if (order > MAX_PAGECACHE_ORDER)
-			order = MAX_PAGECACHE_ORDER;
-		/* If we're not aligned, allocate a smaller folio */
-		if (index & ((1UL << order) - 1))
-			order = __ffs(index);
-
 		do {
 			gfp_t alloc_gfp = gfp;
 
-- 
2.39.2


* [PATCH 2/6] shmem: drop BLOCKS_PER_PAGE macro
       [not found]   ` <CGME20230915095126eucas1p2cf75674dab8a81228f493a7200f4a1ba@eucas1p2.samsung.com>
@ 2023-09-15  9:51     ` Daniel Gomez
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Gomez @ 2023-09-15  9:51 UTC (permalink / raw)
  To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
  Cc: gost.dev, Pankaj Raghav, Daniel Gomez

Commit [1] replaced all uses of BLOCKS_PER_PAGE with the generic
PAGE_SECTORS, but the definition was not removed. Drop the now-unused
macro.

[1] e09764cff44b5 ("shmem: quota support").

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 13c27c343820..8b3823e4d344 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -84,7 +84,6 @@ static struct vfsmount *shm_mnt;
 
 #include "internal.h"
 
-#define BLOCKS_PER_PAGE  (PAGE_SIZE/512)
 #define VM_ACCT(size)    (PAGE_ALIGN(size) >> PAGE_SHIFT)
 
 /* Pretend that each entry is of this size in directory's i_size */
-- 
2.39.2


* [PATCH 3/6] shmem: account for large order folios
       [not found]   ` <CGME20230915095128eucas1p2885c3add58d82413d9c1d17832d3d281@eucas1p2.samsung.com>
@ 2023-09-15  9:51     ` Daniel Gomez
  2023-09-15 12:14       ` Matthew Wilcox
  2023-09-15 13:44       ` Matthew Wilcox
  0 siblings, 2 replies; 29+ messages in thread
From: Daniel Gomez @ 2023-09-15  9:51 UTC (permalink / raw)
  To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
  Cc: gost.dev, Pankaj Raghav, Daniel Gomez

From: Luis Chamberlain <mcgrof@kernel.org>

shmem uses the shmem_inode_info fields 'alloced' and 'swapped' to
account for allocated pages and swapped pages. In preparation for
large order folios, adjust the accounting to use folio_nr_pages().

This should produce no functional changes yet, as larger order
folios are not yet used or supported in shmem.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 8b3823e4d344..836d44584796 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -869,16 +869,16 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 						pgoff_t start, pgoff_t end)
 {
 	XA_STATE(xas, &mapping->i_pages, start);
-	struct page *page;
+	struct folio *folio;
 	unsigned long swapped = 0;
 	unsigned long max = end - 1;
 
 	rcu_read_lock();
-	xas_for_each(&xas, page, max) {
-		if (xas_retry(&xas, page))
+	xas_for_each(&xas, folio, max) {
+		if (xas_retry(&xas, folio))
 			continue;
-		if (xa_is_value(page))
-			swapped++;
+		if (xa_is_value(folio))
+			swapped += (folio_nr_pages(folio));
 		if (xas.xa_index == max)
 			break;
 		if (need_resched()) {
@@ -1006,10 +1006,12 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			folio = fbatch.folios[i];
 
 			if (xa_is_value(folio)) {
+				long swaps_freed;
 				if (unfalloc)
 					continue;
-				nr_swaps_freed += !shmem_free_swap(mapping,
-							indices[i], folio);
+				swaps_freed = folio_nr_pages(folio);
+				if (!shmem_free_swap(mapping, indices[i], folio))
+					nr_swaps_freed += swaps_freed;
 				continue;
 			}
 
@@ -1075,14 +1077,16 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			folio = fbatch.folios[i];
 
 			if (xa_is_value(folio)) {
+				long swaps_freed;
 				if (unfalloc)
 					continue;
+				swaps_freed = folio_nr_pages(folio);
 				if (shmem_free_swap(mapping, indices[i], folio)) {
 					/* Swap was replaced by page: retry */
 					index = indices[i];
 					break;
 				}
-				nr_swaps_freed++;
+				nr_swaps_freed += swaps_freed;
 				continue;
 			}
 
@@ -1528,7 +1532,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	if (add_to_swap_cache(folio, swap,
 			__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
 			NULL) == 0) {
-		shmem_recalc_inode(inode, 0, 1);
+		shmem_recalc_inode(inode, 0, folio_nr_pages(folio));
 		swap_shmem_alloc(swap);
 		shmem_delete_from_page_cache(folio, swp_to_radix_entry(swap));
 
@@ -1801,6 +1805,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 	struct address_space *mapping = inode->i_mapping;
 	swp_entry_t swapin_error;
 	void *old;
+	long num_swap_pages;
 
 	swapin_error = make_poisoned_swp_entry();
 	old = xa_cmpxchg_irq(&mapping->i_pages, index,
@@ -1810,13 +1815,14 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 		return;
 
 	folio_wait_writeback(folio);
+	num_swap_pages = folio_nr_pages(folio);
 	delete_from_swap_cache(folio);
 	/*
 	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks
 	 * won't be 0 when inode is released and thus trigger WARN_ON(i_blocks)
 	 * in shmem_evict_inode().
 	 */
-	shmem_recalc_inode(inode, -1, -1);
+	shmem_recalc_inode(inode, num_swap_pages, num_swap_pages);
 	swap_free(swap);
 }
 
@@ -1903,7 +1909,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	if (error)
 		goto failed;
 
-	shmem_recalc_inode(inode, 0, -1);
+	shmem_recalc_inode(inode, 0, folio_nr_pages(folio));
 
 	if (sgp == SGP_WRITE)
 		folio_mark_accessed(folio);
@@ -2663,7 +2669,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
 	if (ret)
 		goto out_delete_from_cache;
 
-	shmem_recalc_inode(inode, 1, 0);
+	shmem_recalc_inode(inode, folio_nr_pages(folio), 0);
 	folio_unlock(folio);
 	return 0;
 out_delete_from_cache:
-- 
2.39.2


* [PATCH 4/6] shmem: add order parameter support to shmem_alloc_folio
       [not found]   ` <CGME20230915095129eucas1p1383d75c6d62056afbb20b78a3ec15234@eucas1p1.samsung.com>
@ 2023-09-15  9:51     ` Daniel Gomez
  2023-09-15 12:19       ` Matthew Wilcox
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Gomez @ 2023-09-15  9:51 UTC (permalink / raw)
  To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
  Cc: gost.dev, Pankaj Raghav, Daniel Gomez

In preparation for high order folio support in the write path, add an
order parameter when allocating a folio. This covers the write path
when huge support is not enabled, and also the fallback when huge
support is enabled but the huge page allocation fails.

Use order 0 for the non-write paths, such as reads or swap-in, as
these currently lack high order folio support.
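
As a sketch of the resulting call sites (editor's illustration, taken
from the hunks below):

	/* write path and its fallback: order chosen by the caller */
	folio = shmem_alloc_folio(gfp, info, index, order);

	/* read, swap-in and replace paths: keep order 0 for now */
	new = shmem_alloc_folio(gfp, info, index, 0);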

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 836d44584796..ee297d8874d3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1669,20 +1669,21 @@ static struct folio *shmem_alloc_hugefolio(gfp_t gfp,
 }
 
 static struct folio *shmem_alloc_folio(gfp_t gfp,
-			struct shmem_inode_info *info, pgoff_t index)
+			struct shmem_inode_info *info, pgoff_t index,
+			unsigned int order)
 {
 	struct vm_area_struct pvma;
 	struct folio *folio;
 
 	shmem_pseudo_vma_init(&pvma, info, index);
-	folio = vma_alloc_folio(gfp, 0, &pvma, 0, false);
+	folio = vma_alloc_folio(gfp, order, &pvma, 0, false);
 	shmem_pseudo_vma_destroy(&pvma);
 
 	return folio;
 }
 
 static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
-		pgoff_t index, bool huge)
+		pgoff_t index, bool huge, unsigned int *order)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct folio *folio;
@@ -1691,7 +1692,7 @@ static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
 
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
 		huge = false;
-	nr = huge ? HPAGE_PMD_NR : 1;
+	nr = huge ? HPAGE_PMD_NR : 1U << *order;
 
 	err = shmem_inode_acct_block(inode, nr);
 	if (err)
@@ -1700,7 +1701,7 @@ static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
 	if (huge)
 		folio = shmem_alloc_hugefolio(gfp, info, index);
 	else
-		folio = shmem_alloc_folio(gfp, info, index);
+		folio = shmem_alloc_folio(gfp, info, index, *order);
 	if (folio) {
 		__folio_set_locked(folio);
 		__folio_set_swapbacked(folio);
@@ -1750,7 +1751,7 @@ static int shmem_replace_folio(struct folio **foliop, gfp_t gfp,
 	 */
 	gfp &= ~GFP_CONSTRAINT_MASK;
 	VM_BUG_ON_FOLIO(folio_test_large(old), old);
-	new = shmem_alloc_folio(gfp, info, index);
+	new = shmem_alloc_folio(gfp, info, index, 0);
 	if (!new)
 		return -ENOMEM;
 
@@ -1961,6 +1962,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	int error;
 	int once = 0;
 	int alloced = 0;
+	unsigned int order = 0;
 
 	if (index > (MAX_LFS_FILESIZE >> PAGE_SHIFT))
 		return -EFBIG;
@@ -2036,10 +2038,12 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 
 	huge_gfp = vma_thp_gfp_mask(vma);
 	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
-	folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true);
+	folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true,
+					   &order);
 	if (IS_ERR(folio)) {
 alloc_nohuge:
-		folio = shmem_alloc_and_acct_folio(gfp, inode, index, false);
+		folio = shmem_alloc_and_acct_folio(gfp, inode, index, false,
+						   &order);
 	}
 	if (IS_ERR(folio)) {
 		int retry = 5;
@@ -2602,7 +2606,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
 
 	if (!*foliop) {
 		ret = -ENOMEM;
-		folio = shmem_alloc_folio(gfp, info, pgoff);
+		folio = shmem_alloc_folio(gfp, info, pgoff, 0);
 		if (!folio)
 			goto out_unacct_blocks;
 
-- 
2.39.2


* [PATCH 5/6] shmem: add file length in shmem_get_folio path
       [not found]   ` <CGME20230915095131eucas1p1010e364cd1c351e5b7379954bd237a3d@eucas1p1.samsung.com>
@ 2023-09-15  9:51     ` Daniel Gomez
  2023-09-15 15:37       ` Matthew Wilcox
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Gomez @ 2023-09-15  9:51 UTC (permalink / raw)
  To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
  Cc: gost.dev, Pankaj Raghav, Daniel Gomez

Pass the file length down the shmem_get_folio path so that the folio
order can be calculated from the file size when allocation occurs on
the write path. Use a length of 0 for the non-write paths.

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 include/linux/shmem_fs.h |  2 +-
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 28 ++++++++++++++++------------
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 6b0c626620f5..b3509e7f1054 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -133,7 +133,7 @@ enum sgp_type {
 };
 
 int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
-		enum sgp_type sgp);
+		enum sgp_type sgp, size_t len);
 struct folio *shmem_read_folio_gfp(struct address_space *mapping,
 		pgoff_t index, gfp_t gfp);
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 88433cc25d8a..e5d3feff6de6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1856,7 +1856,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 				xas_unlock_irq(&xas);
 				/* swap in or instantiate fallocated page */
 				if (shmem_get_folio(mapping->host, index,
-						&folio, SGP_NOALLOC)) {
+						&folio, SGP_NOALLOC, 0)) {
 					result = SCAN_FAIL;
 					goto xa_unlocked;
 				}
diff --git a/mm/shmem.c b/mm/shmem.c
index ee297d8874d3..adff74751065 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -969,7 +969,7 @@ static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
 	 * (although in some cases this is just a waste of time).
 	 */
 	folio = NULL;
-	shmem_get_folio(inode, index, &folio, SGP_READ);
+	shmem_get_folio(inode, index, &folio, SGP_READ, 0);
 	return folio;
 }
 
@@ -1950,7 +1950,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 		struct folio **foliop, enum sgp_type sgp, gfp_t gfp,
 		struct vm_area_struct *vma, struct vm_fault *vmf,
-		vm_fault_t *fault_type)
+		vm_fault_t *fault_type, size_t len)
 {
 	struct address_space *mapping = inode->i_mapping;
 	struct shmem_inode_info *info = SHMEM_I(inode);
@@ -2164,10 +2164,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 }
 
 int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
-		enum sgp_type sgp)
+		enum sgp_type sgp, size_t len)
 {
 	return shmem_get_folio_gfp(inode, index, foliop, sgp,
-			mapping_gfp_mask(inode->i_mapping), NULL, NULL, NULL);
+			mapping_gfp_mask(inode->i_mapping),
+			NULL, NULL, NULL, len);
 }
 
 /*
@@ -2251,7 +2252,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
 	}
 
 	err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
-				  gfp, vma, vmf, &ret);
+				  gfp, vma, vmf, &ret, i_size_read(inode));
 	if (err)
 		return vmf_error(err);
 	if (folio)
@@ -2702,6 +2703,9 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 	struct folio *folio;
 	int ret = 0;
 
+	if (!mapping_large_folio_support(mapping))
+		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
+
 	/* i_rwsem is held by caller */
 	if (unlikely(info->seals & (F_SEAL_GROW |
 				   F_SEAL_WRITE | F_SEAL_FUTURE_WRITE))) {
@@ -2711,7 +2715,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
 			return -EPERM;
 	}
 
-	ret = shmem_get_folio(inode, index, &folio, SGP_WRITE);
+	ret = shmem_get_folio(inode, index, &folio, SGP_WRITE, len);
 
 	if (ret)
 		return ret;
@@ -2783,7 +2787,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 				break;
 		}
 
-		error = shmem_get_folio(inode, index, &folio, SGP_READ);
+		error = shmem_get_folio(inode, index, &folio, SGP_READ, 0);
 		if (error) {
 			if (error == -EINVAL)
 				error = 0;
@@ -2960,7 +2964,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 			break;
 
 		error = shmem_get_folio(inode, *ppos / PAGE_SIZE, &folio,
-					SGP_READ);
+					SGP_READ, 0);
 		if (error) {
 			if (error == -EINVAL)
 				error = 0;
@@ -3147,7 +3151,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 			error = -ENOMEM;
 		else
 			error = shmem_get_folio(inode, index, &folio,
-						SGP_FALLOC);
+						SGP_FALLOC, 0);
 		if (error) {
 			info->fallocend = undo_fallocend;
 			/* Remove the !uptodate folios we added */
@@ -3502,7 +3506,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
 		inode->i_op = &shmem_short_symlink_operations;
 	} else {
 		inode_nohighmem(inode);
-		error = shmem_get_folio(inode, 0, &folio, SGP_WRITE);
+		error = shmem_get_folio(inode, 0, &folio, SGP_WRITE, 0);
 		if (error)
 			goto out_remove_offset;
 		inode->i_mapping->a_ops = &shmem_aops;
@@ -3550,7 +3554,7 @@ static const char *shmem_get_link(struct dentry *dentry,
 			return ERR_PTR(-ECHILD);
 		}
 	} else {
-		error = shmem_get_folio(inode, 0, &folio, SGP_READ);
+		error = shmem_get_folio(inode, 0, &folio, SGP_READ, 0);
 		if (error)
 			return ERR_PTR(error);
 		if (!folio)
@@ -4923,7 +4927,7 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
 
 	BUG_ON(!shmem_mapping(mapping));
 	error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
-				  gfp, NULL, NULL, NULL);
+				  gfp, NULL, NULL, NULL, i_size_read(inode));
 	if (error)
 		return ERR_PTR(error);
 
-- 
2.39.2


* [PATCH 6/6] shmem: add large folios support to the write path
       [not found]   ` <CGME20230915095133eucas1p267bade2888b7fcd2e1ea8e13e21c495f@eucas1p2.samsung.com>
@ 2023-09-15  9:51     ` Daniel Gomez
  2023-09-15 18:26       ` Yosry Ahmed
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Gomez @ 2023-09-15  9:51 UTC (permalink / raw)
  To: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm
  Cc: gost.dev, Pankaj Raghav, Daniel Gomez

Add large folio support to the shmem write path, matching the same
high order preference mechanism used by the iomap buffered IO path in
__filemap_get_folio().

Use mapping_size_order() to get a hint for the order of the folio
based on the file size, which takes care of the mapping requirements.

Swap does not support high order folios for now, so force order 0
when swap is enabled.
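
In short, the order selection implemented below behaves as follows
(editor's summary of the hunks):

	/*
	 * swap enabled -> order 0 (swap cannot handle large folios yet)
	 * order == 1   -> order 0 (order-1 folios are not supported)
	 * otherwise    -> mapping_size_order(mapping, index, len),
	 *                 reduced by one on each page cache conflict
	 *                 before retrying via the alloc_nohuge label
	 */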

Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
 mm/shmem.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index adff74751065..26ca555b1669 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1683,13 +1683,19 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
 }
 
 static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
-		pgoff_t index, bool huge, unsigned int *order)
+		pgoff_t index, bool huge, unsigned int *order,
+		struct shmem_sb_info *sbinfo)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct folio *folio;
 	int nr;
 	int err;
 
+	if (!sbinfo->noswap)
+		*order = 0;
+	else
+		*order = (*order == 1) ? 0 : *order;
+
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
 		huge = false;
 	nr = huge ? HPAGE_PMD_NR : 1U << *order;
@@ -2032,6 +2038,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 		return 0;
 	}
 
+	order = mapping_size_order(inode->i_mapping, index, len);
+
 	if (!shmem_is_huge(inode, index, false,
 			   vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0))
 		goto alloc_nohuge;
@@ -2039,11 +2047,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	huge_gfp = vma_thp_gfp_mask(vma);
 	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
 	folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true,
-					   &order);
+					   &order, sbinfo);
 	if (IS_ERR(folio)) {
 alloc_nohuge:
 		folio = shmem_alloc_and_acct_folio(gfp, inode, index, false,
-						   &order);
+						   &order, sbinfo);
 	}
 	if (IS_ERR(folio)) {
 		int retry = 5;
@@ -2147,6 +2155,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	if (folio_test_large(folio)) {
 		folio_unlock(folio);
 		folio_put(folio);
+		if (order > 0)
+			order--;
 		goto alloc_nohuge;
 	}
 unlock:
-- 
2.39.2


* Re: [PATCH 3/6] shmem: account for large order folios
  2023-09-15  9:51     ` [PATCH 3/6] shmem: account for large order folios Daniel Gomez
@ 2023-09-15 12:14       ` Matthew Wilcox
  2023-09-15 13:44       ` Matthew Wilcox
  1 sibling, 0 replies; 29+ messages in thread
From: Matthew Wilcox @ 2023-09-15 12:14 UTC (permalink / raw)
  To: Daniel Gomez
  Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Fri, Sep 15, 2023 at 09:51:26AM +0000, Daniel Gomez wrote:
> @@ -1810,13 +1815,14 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
>  		return;
>  
>  	folio_wait_writeback(folio);
> +	num_swap_pages = folio_nr_pages(folio);
>  	delete_from_swap_cache(folio);
>  	/*
>  	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks
>  	 * won't be 0 when inode is released and thus trigger WARN_ON(i_blocks)
>  	 * in shmem_evict_inode().
>  	 */
> -	shmem_recalc_inode(inode, -1, -1);
> +	shmem_recalc_inode(inode, num_swap_pages, num_swap_pages);

Shouldn't that be -num_swap_pages?

>  	swap_free(swap);
>  }
>  
> @@ -1903,7 +1909,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>  	if (error)
>  		goto failed;
>  
> -	shmem_recalc_inode(inode, 0, -1);
> +	shmem_recalc_inode(inode, 0, folio_nr_pages(folio));
>  
>  	if (sgp == SGP_WRITE)
>  		folio_mark_accessed(folio);

Also here.


* Re: [PATCH 4/6] shmem: add order parameter support to shmem_alloc_folio
  2023-09-15  9:51     ` [PATCH 4/6] shmem: add order parameter support to shmem_alloc_folio Daniel Gomez
@ 2023-09-15 12:19       ` Matthew Wilcox
  0 siblings, 0 replies; 29+ messages in thread
From: Matthew Wilcox @ 2023-09-15 12:19 UTC (permalink / raw)
  To: Daniel Gomez
  Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Fri, Sep 15, 2023 at 09:51:28AM +0000, Daniel Gomez wrote:
> In preparation for high order folio support in the write path, add an
> order parameter when allocating a folio. This covers the write path
> when huge support is not enabled, and also the fallback when huge
> support is enabled but the huge page allocation fails.

>  static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
> -		pgoff_t index, bool huge)
> +		pgoff_t index, bool huge, unsigned int *order)

I don't understand why you keep the 'huge' parameter when you could just
pass PMD_ORDER.  And I don't understand why you're passing a pointer to
the order instead of just passing the order.



* Re: [PATCH 1/6] filemap: make the folio order calculation shareable
  2023-09-15  9:51     ` [PATCH 1/6] filemap: make the folio order calculation shareable Daniel Gomez
@ 2023-09-15 13:40       ` Matthew Wilcox
  2023-09-18  8:41         ` Daniel Gomez
  2023-09-18 18:09         ` Luis Chamberlain
  0 siblings, 2 replies; 29+ messages in thread
From: Matthew Wilcox @ 2023-09-15 13:40 UTC (permalink / raw)
  To: Daniel Gomez
  Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Fri, Sep 15, 2023 at 09:51:23AM +0000, Daniel Gomez wrote:
> To make the code that clamps the folio order in the __filemap_get_folio
> routine reusable by others, move and merge it into a new helper,
> mapping_size_order(), called from fgf_set_order(). This way, when
> mapping the size at a given index, the calculated order is already
> valid and ready to be used when the order is retrieved from fgp_flags
> with FGF_GET_ORDER.
> 
> Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> ---
>  fs/iomap/buffered-io.c  |  6 ++++--
>  include/linux/pagemap.h | 42 ++++++++++++++++++++++++++++++++++++-----
>  mm/filemap.c            |  8 --------
>  3 files changed, 41 insertions(+), 15 deletions(-)

That seems like a lot of extra code to add in order to avoid copying
six lines of code and one comment into the shmem code.

It's not wrong, but it seems like a bad tradeoff to me.


* Re: [PATCH 3/6] shmem: account for large order folios
  2023-09-15  9:51     ` [PATCH 3/6] shmem: account for large order folios Daniel Gomez
  2023-09-15 12:14       ` Matthew Wilcox
@ 2023-09-15 13:44       ` Matthew Wilcox
  1 sibling, 0 replies; 29+ messages in thread
From: Matthew Wilcox @ 2023-09-15 13:44 UTC (permalink / raw)
  To: Daniel Gomez
  Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Fri, Sep 15, 2023 at 09:51:26AM +0000, Daniel Gomez wrote:
> +	xas_for_each(&xas, folio, max) {
> +		if (xas_retry(&xas, folio))
>  			continue;
> -		if (xa_is_value(page))
> -			swapped++;
> +		if (xa_is_value(folio))
> +			swapped += (folio_nr_pages(folio));

Unnecessary parens.

> @@ -1006,10 +1006,12 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>  			folio = fbatch.folios[i];
>  
>  			if (xa_is_value(folio)) {
> +				long swaps_freed;
>  				if (unfalloc)
>  					continue;
> -				nr_swaps_freed += !shmem_free_swap(mapping,
> -							indices[i], folio);
> +				swaps_freed = folio_nr_pages(folio);
> +				if (!shmem_free_swap(mapping, indices[i], folio))
> +					nr_swaps_freed += swaps_freed;

Broader change (indeed, in a separate patch), why not make
shmem_free_swap() return the number of pages freed, rather than
returning an errno?

> @@ -1075,14 +1077,16 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>  			folio = fbatch.folios[i];
>  
>  			if (xa_is_value(folio)) {
> +				long swaps_freed;
>  				if (unfalloc)
>  					continue;
> +				swaps_freed = folio_nr_pages(folio);
>  				if (shmem_free_swap(mapping, indices[i], folio)) {
>  					/* Swap was replaced by page: retry */
>  					index = indices[i];
>  					break;
>  				}
> -				nr_swaps_freed++;
> +				nr_swaps_freed += swaps_freed;
>  				continue;

... seems like both callers would prefer that.
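
A minimal sketch of that suggested interface (editor's illustration,
not actual kernel code; it assumes the entry's order can be read back
with xa_get_order()):

	static long shmem_free_swap(struct address_space *mapping,
				    pgoff_t index, void *radswap)
	{
		long nr = 1L << xa_get_order(&mapping->i_pages, index);
		void *old;

		old = xa_cmpxchg_irq(&mapping->i_pages, index, radswap,
				     NULL, 0);
		if (old != radswap)
			return 0;	/* swap was replaced by a page */
		free_swap_and_cache(radix_to_swp_entry(radswap));
		return nr;
	}

Both call sites could then do 'nr_swaps_freed += shmem_free_swap(...)'
and treat a return of 0 as the swap-replaced-by-page case.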



* Re: [PATCH 0/6] shmem: high order folios support in write path
  2023-09-15  9:51 ` [PATCH 0/6] shmem: high order folios support in write path Daniel Gomez
                     ` (5 preceding siblings ...)
       [not found]   ` <CGME20230915095133eucas1p267bade2888b7fcd2e1ea8e13e21c495f@eucas1p2.samsung.com>
@ 2023-09-15 15:29   ` David Hildenbrand
  2023-09-15 15:34     ` Matthew Wilcox
  2023-09-18  7:32     ` Daniel Gomez
  6 siblings, 2 replies; 29+ messages in thread
From: David Hildenbrand @ 2023-09-15 15:29 UTC (permalink / raw)
  To: Daniel Gomez, minchan, senozhatsky, axboe, djwong, willy, hughd,
	akpm, mcgrof, linux-kernel, linux-block, linux-xfs,
	linux-fsdevel, linux-mm
  Cc: gost.dev, Pankaj Raghav

On 15.09.23 11:51, Daniel Gomez wrote:
> This series adds support for high order folios in the shmem write
> path.
> 
> This is a continuation of the shmem work from Luis here [1]
> following Matthew Wilcox's suggestion [2] regarding the path to take
> for the folio allocation order calculation.
> 
> [1] RFC v2 add support for blocksize > PAGE_SIZE
> https://lore.kernel.org/all/ZHBowMEDfyrAAOWH@bombadil.infradead.org/T/#md3e93ab46ce2ad9254e1eb54ffe71211988b5632
> [2] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/
> 
> Patches have been tested and sent from next-230911. They do apply
> cleanly to the latest next-230914.
> 
> fsx and fstests have been run on tmpfs with noswap, with the
> following results:
> - fsx: 2-day run, 21.5B operations
> - fstests: same results as the baseline for next-230911 [3][4][5]
> 
> [3] Baseline next-230911 failures are: generic/080 generic/126
> generic/193 generic/633 generic/689
> [4] fstests logs baseline: https://gitlab.com/-/snippets/3598621
> [5] fstests logs patches: https://gitlab.com/-/snippets/3598628
> 
> There are at least 2 cases/topics to handle that I'd appreciate
> feedback on:
> 1. With the new strategy, you might end up with a folio order matching
> HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
> THP is enabled.
> 2. When the above (1.) occurs, the code skips the huge path, so
> xa_find with hindex is skipped.

Similar to large anon folios (but different to large non-shmem folios in 
the pagecache), this can result in memory waste.

We discussed that topic in the last bi-weekly mm meeting, and also how 
to eventually configure that for shmem.

Refer to [1] for a summary.

[1] https://lkml.kernel.org/r/4966f496-9f71-460c-b2ab-8661384ce626@arm.com

-- 
Cheers,

David / dhildenb



* Re: [PATCH 0/6] shmem: high order folios support in write path
  2023-09-15 15:29   ` [PATCH 0/6] shmem: high order folios support in " David Hildenbrand
@ 2023-09-15 15:34     ` Matthew Wilcox
  2023-09-15 15:36       ` David Hildenbrand
  2023-09-18  7:32     ` Daniel Gomez
  1 sibling, 1 reply; 29+ messages in thread
From: Matthew Wilcox @ 2023-09-15 15:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Daniel Gomez, minchan, senozhatsky, axboe, djwong, hughd, akpm,
	mcgrof, linux-kernel, linux-block, linux-xfs, linux-fsdevel,
	linux-mm, gost.dev, Pankaj Raghav

On Fri, Sep 15, 2023 at 05:29:51PM +0200, David Hildenbrand wrote:
> On 15.09.23 11:51, Daniel Gomez wrote:
> > This series adds support for high order folios in the shmem write
> > path.
> > There are at least 2 cases/topics to handle that I'd appreciate
> > feedback on:
> > 1. With the new strategy, you might end up with a folio order matching
> > HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
> > THP is enabled.
> > 2. When the above (1.) occurs, the code skips the huge path, so
> > xa_find with hindex is skipped.
> 
> Similar to large anon folios (but different to large non-shmem folios in the
> pagecache), this can result in memory waste.

No, it can't.  This patchset triggers only on write, not on read or page
fault, and it's conservative, so it will only allocate folios which are
entirely covered by the write.  IOW this is memory we must allocate in
order to satisfy the write; we're just allocating it in larger chunks
when we can.
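
A worked example of that conservatism (editor's note, assuming 4KiB
pages and the mapping_size_order() logic from patch 1):

	/*
	 * 64KiB write at pos 0:    ilog2(65536) - PAGE_SHIFT = 4 and
	 *                          index 0 is aligned -> one order-4 folio
	 * 64KiB write at pos 4KiB: index 1 is not order-4 aligned, so
	 *                          order = __ffs(1) = 0 -> order-0 folios
	 */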


* Re: [PATCH 0/6] shmem: high order folios support in write path
  2023-09-15 15:34     ` Matthew Wilcox
@ 2023-09-15 15:36       ` David Hildenbrand
  2023-09-15 15:40         ` Matthew Wilcox
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand @ 2023-09-15 15:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Daniel Gomez, minchan, senozhatsky, axboe, djwong, hughd, akpm,
	mcgrof, linux-kernel, linux-block, linux-xfs, linux-fsdevel,
	linux-mm, gost.dev, Pankaj Raghav

On 15.09.23 17:34, Matthew Wilcox wrote:
> On Fri, Sep 15, 2023 at 05:29:51PM +0200, David Hildenbrand wrote:
>> On 15.09.23 11:51, Daniel Gomez wrote:
>>> This series adds support for high order folios in the shmem write
>>> path.
>>> There are at least 2 cases/topics to handle that I'd appreciate
>>> feedback on:
>>> 1. With the new strategy, you might end up with a folio order matching
>>> HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
>>> THP is enabled.
>>> 2. When the above (1.) occurs, the code skips the huge path, so
>>> xa_find with hindex is skipped.
>>
>> Similar to large anon folios (but different to large non-shmem folios in the
>> pagecache), this can result in memory waste.
> 
> No, it can't.  This patchset triggers only on write, not on read or page
> fault, and it's conservative, so it will only allocate folios which are
> entirely covered by the write.  IOW this is memory we must allocate in
> order to satisfy the write; we're just allocating it in larger chunks
> when we can.

Oh, good! I was assuming you would eventually over-allocate on the write 
path.

-- 
Cheers,

David / dhildenb



* Re: [PATCH 5/6] shmem: add file length in shmem_get_folio path
  2023-09-15  9:51     ` [PATCH 5/6] shmem: add file length in shmem_get_folio path Daniel Gomez
@ 2023-09-15 15:37       ` Matthew Wilcox
  0 siblings, 0 replies; 29+ messages in thread
From: Matthew Wilcox @ 2023-09-15 15:37 UTC (permalink / raw)
  To: Daniel Gomez
  Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Fri, Sep 15, 2023 at 09:51:29AM +0000, Daniel Gomez wrote:
> @@ -2251,7 +2252,7 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
>  	}
>  
>  	err = shmem_get_folio_gfp(inode, vmf->pgoff, &folio, SGP_CACHE,
> -				  gfp, vma, vmf, &ret);
> +				  gfp, vma, vmf, &ret, i_size_read(inode));

Surely this should be PAGE_SIZE not i_size?

> @@ -4923,7 +4927,7 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
>  
>  	BUG_ON(!shmem_mapping(mapping));
>  	error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
> -				  gfp, NULL, NULL, NULL);
> +				  gfp, NULL, NULL, NULL, i_size_read(inode));

Same here?


* Re: [PATCH 0/6] shmem: high order folios support in write path
  2023-09-15 15:36       ` David Hildenbrand
@ 2023-09-15 15:40         ` Matthew Wilcox
  2023-09-15 15:43           ` David Hildenbrand
  0 siblings, 1 reply; 29+ messages in thread
From: Matthew Wilcox @ 2023-09-15 15:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Daniel Gomez, minchan, senozhatsky, axboe, djwong, hughd, akpm,
	mcgrof, linux-kernel, linux-block, linux-xfs, linux-fsdevel,
	linux-mm, gost.dev, Pankaj Raghav

On Fri, Sep 15, 2023 at 05:36:27PM +0200, David Hildenbrand wrote:
> On 15.09.23 17:34, Matthew Wilcox wrote:
> > No, it can't.  This patchset triggers only on write, not on read or page
> > fault, and it's conservative, so it will only allocate folios which are
> > entirely covered by the write.  IOW this is memory we must allocate in
> > order to satisfy the write; we're just allocating it in larger chunks
> > when we can.
> 
> Oh, good! I was assuming you would eventually over-allocate on the write
> path.

We might!  But that would be a different patchset, and it would be
subject to its own discussion.

Something else I've been wondering about is possibly reallocating the
pages on a write.  This would apply to both normal files and shmem.
If you read in a file one byte at a time, then overwrite a big chunk of
it with a large single write, that seems like a good signal that maybe
we should manage that part of the file as a single large chunk instead
of individual pages.  Maybe.

Lots of things for people who are obsessed with performance to play
with ;-)


* Re: [PATCH 0/6] shmem: high order folios support in write path
  2023-09-15 15:40         ` Matthew Wilcox
@ 2023-09-15 15:43           ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2023-09-15 15:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Daniel Gomez, minchan, senozhatsky, axboe, djwong, hughd, akpm,
	mcgrof, linux-kernel, linux-block, linux-xfs, linux-fsdevel,
	linux-mm, gost.dev, Pankaj Raghav

On 15.09.23 17:40, Matthew Wilcox wrote:
> On Fri, Sep 15, 2023 at 05:36:27PM +0200, David Hildenbrand wrote:
>> On 15.09.23 17:34, Matthew Wilcox wrote:
>>> No, it can't.  This patchset triggers only on write, not on read or page
>>> fault, and it's conservative, so it will only allocate folios which are
>>> entirely covered by the write.  IOW this is memory we must allocate in
>>> order to satisfy the write; we're just allocating it in larger chunks
>>> when we can.
>>
>> Oh, good! I was assuming you would eventually over-allocate on the write
>> path.
> 
> We might!  But that would be a different patchset, and it would be
> subject to its own discussion.
> 
> Something else I've been wondering about is possibly reallocating the
> pages on a write.  This would apply to both normal files and shmem.
> If you read in a file one byte at a time, then overwrite a big chunk of
> it with a large single write, that seems like a good signal that maybe
> we should manage that part of the file as a single large chunk instead
> of individual pages.  Maybe.
> 
> Lots of things for people who are obsessed with performance to play
> with ;-)

:) Absolutely. ... because if nobody will be consuming that written 
memory any time soon, it might also be the wrong place for a large/huge 
folio.

-- 
Cheers,

David / dhildenb



* Re: [PATCH 6/6] shmem: add large folios support to the write path
  2023-09-15  9:51     ` [PATCH 6/6] shmem: add large folios support to the write path Daniel Gomez
@ 2023-09-15 18:26       ` Yosry Ahmed
  2023-09-18  8:00         ` Daniel Gomez
  0 siblings, 1 reply; 29+ messages in thread
From: Yosry Ahmed @ 2023-09-15 18:26 UTC (permalink / raw)
  To: Daniel Gomez
  Cc: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote:
>
> Add large folio support to the shmem write path, matching the same
> high order preference mechanism used by the iomap buffered IO path in
> __filemap_get_folio().
>
> Use mapping_size_order() to get a hint for the order of the folio
> based on the file size, which takes care of the mapping requirements.
>
> Swap does not support high order folios for now, so force order 0
> when swap is enabled.

I didn't take a close look at the series, but I am not sure I
understand the rationale here. Reclaim will split high order shmem
folios anyway, right?

It seems like we only enable high order folios if the "noswap" mount
option is used, which is fairly recent. I doubt it is widely used.

>
> Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> ---
>  mm/shmem.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index adff74751065..26ca555b1669 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1683,13 +1683,19 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
>  }
>
>  static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
> -               pgoff_t index, bool huge, unsigned int *order)
> +               pgoff_t index, bool huge, unsigned int *order,
> +               struct shmem_sb_info *sbinfo)
>  {
>         struct shmem_inode_info *info = SHMEM_I(inode);
>         struct folio *folio;
>         int nr;
>         int err;
>
> +       if (!sbinfo->noswap)
> +               *order = 0;
> +       else
> +               *order = (*order == 1) ? 0 : *order;
> +
>         if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
>                 huge = false;
>         nr = huge ? HPAGE_PMD_NR : 1U << *order;
> @@ -2032,6 +2038,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>                 return 0;
>         }
>
> +       order = mapping_size_order(inode->i_mapping, index, len);
> +
>         if (!shmem_is_huge(inode, index, false,
>                            vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0))
>                 goto alloc_nohuge;
> @@ -2039,11 +2047,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>         huge_gfp = vma_thp_gfp_mask(vma);
>         huge_gfp = limit_gfp_mask(huge_gfp, gfp);
>         folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true,
> -                                          &order);
> +                                          &order, sbinfo);
>         if (IS_ERR(folio)) {
>  alloc_nohuge:
>                 folio = shmem_alloc_and_acct_folio(gfp, inode, index, false,
> -                                                  &order);
> +                                                  &order, sbinfo);
>         }
>         if (IS_ERR(folio)) {
>                 int retry = 5;
> @@ -2147,6 +2155,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>         if (folio_test_large(folio)) {
>                 folio_unlock(folio);
>                 folio_put(folio);
> +               if (order > 0)
> +                       order--;
>                 goto alloc_nohuge;
>         }
>  unlock:
> --
> 2.39.2
>


* Re: [PATCH 0/6] shmem: high order folios support in write path
  2023-09-15 15:29   ` [PATCH 0/6] shmem: high order folios support in " David Hildenbrand
  2023-09-15 15:34     ` Matthew Wilcox
@ 2023-09-18  7:32     ` Daniel Gomez
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Gomez @ 2023-09-18  7:32 UTC (permalink / raw)
  To: David Hildenbrand, minchan, senozhatsky, axboe, djwong, willy,
	hughd, akpm, mcgrof, linux-kernel, linux-block, linux-xfs,
	linux-fsdevel, linux-mm
  Cc: gost.dev, Pankaj Raghav

On Fri, Sep 15, 2023 at 05:29:51PM +0200, David Hildenbrand wrote:
> On 15.09.23 11:51, Daniel Gomez wrote:
> > This series adds support for high order folios in the shmem write
> > path.
> >
> > This is a continuation of the shmem work from Luis here [1]
> > following Matthew Wilcox's suggestion [2] regarding the path to take
> > for the folio allocation order calculation.
> >
> > [1] RFC v2 add support for blocksize > PAGE_SIZE
> > https://lore.kernel.org/all/ZHBowMEDfyrAAOWH@bombadil.infradead.org/T/#md3e93ab46ce2ad9254e1eb54ffe71211988b5632
> > [2] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/
> >
> > Patches have been tested and sent from next-230911. They do apply
> > cleanly to the latest next-230914.
> >
> > fsx and fstests have been run on tmpfs with noswap, with the
> > following results:
> > - fsx: 2-day run, 21.5B operations
> > - fstests: same results as the baseline for next-230911 [3][4][5]
> >
> > [3] Baseline next-230911 failures are: generic/080 generic/126
> > generic/193 generic/633 generic/689
> > [4] fstests logs baseline: https://gitlab.com/-/snippets/3598621
> > [5] fstests logs patches: https://gitlab.com/-/snippets/3598628
> >
> > There are at least 2 cases/topics to handle that I'd appreciate
> > feedback on:
> > 1. With the new strategy, you might end up with a folio order matching
> > HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
> > THP is enabled.
> > 2. When the above (1.) occurs, the code skips the huge path, so
> > xa_find with hindex is skipped.
>
> Similar to large anon folios (but different to large non-shmem folios in the
> pagecache), this can result in memory waste.
>
> We discussed that topic in the last bi-weekly mm meeting, and also how to
> eventually configure that for shmem.
>
> Refer to [1] for a summary.
>
> [1] https://lkml.kernel.org/r/4966f496-9f71-460c-b2ab-8661384ce626@arm.com

Thanks for the summary David (I was missing linux-MM from kvack in lei).

I think using PMD_ORDER-1 as the max would suffice here to honor the
'huge' flag, although we would end up with a different max value than
the pagecache/readahead path.
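
A minimal sketch of that clamp (editor's illustration, reusing the
existing THP macro):

	order = min_t(unsigned int, order, HPAGE_PMD_ORDER - 1);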
>
> --
> Cheers,
>
> David / dhildenb
>


* Re: [PATCH 6/6] shmem: add large folios support to the write path
  2023-09-15 18:26       ` Yosry Ahmed
@ 2023-09-18  8:00         ` Daniel Gomez
  2023-09-18 18:55           ` Yosry Ahmed
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Gomez @ 2023-09-18  8:00 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote:
> On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> >
> > Add large folio support to the shmem write path, matching the same
> > high order preference mechanism used by the iomap buffered IO path in
> > __filemap_get_folio().
> >
> > Use mapping_size_order() to get a hint for the order of the folio
> > based on the file size, which takes care of the mapping requirements.
> >
> > Swap does not support high order folios for now, so force order 0
> > when swap is enabled.
>
> I didn't take a close look at the series, but I am not sure I
> understand the rationale here. Reclaim will split high order shmem
> folios anyway, right?

For context, this is part of the enablement of large block sizes (LBS)
effort [1][2][3], so the assumption here is that the kernel will
reclaim memory with the same (large) block sizes that were written to
the device.

I'll add more context in the V2.

[1] https://kernelnewbies.org/KernelProjects/large-block-size
[2] https://docs.google.com/spreadsheets/d/e/2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8/pubhtml#
[3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/
>
> It seems like we only enable high order folios if the "noswap" mount
> option is used, which is fairly recent. I doubt it is widely used.

For now, I skipped the swap path as it currently lacks support for
high order folios. But I'm looking into it as part of the LBS
effort (please check the spreadsheet at [2] for that).
>
> >
> > Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> > ---
> >  mm/shmem.c | 16 +++++++++++++---
> >  1 file changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index adff74751065..26ca555b1669 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1683,13 +1683,19 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
> >  }
> >
> >  static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
> > -               pgoff_t index, bool huge, unsigned int *order)
> > +               pgoff_t index, bool huge, unsigned int *order,
> > +               struct shmem_sb_info *sbinfo)
> >  {
> >         struct shmem_inode_info *info = SHMEM_I(inode);
> >         struct folio *folio;
> >         int nr;
> >         int err;
> >
> > +       if (!sbinfo->noswap)
> > +               *order = 0;
> > +       else
> > +               *order = (*order == 1) ? 0 : *order;
> > +
> >         if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> >                 huge = false;
> >         nr = huge ? HPAGE_PMD_NR : 1U << *order;
> > @@ -2032,6 +2038,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> >                 return 0;
> >         }
> >
> > +       order = mapping_size_order(inode->i_mapping, index, len);
> > +
> >         if (!shmem_is_huge(inode, index, false,
> >                            vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0))
> >                 goto alloc_nohuge;
> > @@ -2039,11 +2047,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> >         huge_gfp = vma_thp_gfp_mask(vma);
> >         huge_gfp = limit_gfp_mask(huge_gfp, gfp);
> >         folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true,
> > -                                          &order);
> > +                                          &order, sbinfo);
> >         if (IS_ERR(folio)) {
> >  alloc_nohuge:
> >                 folio = shmem_alloc_and_acct_folio(gfp, inode, index, false,
> > -                                                  &order);
> > +                                                  &order, sbinfo);
> >         }
> >         if (IS_ERR(folio)) {
> >                 int retry = 5;
> > @@ -2147,6 +2155,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> >         if (folio_test_large(folio)) {
> >                 folio_unlock(folio);
> >                 folio_put(folio);
> > +               if (order > 0)
> > +                       order--;
> >                 goto alloc_nohuge;
> >         }
> >  unlock:
> > --
> > 2.39.2
> >


* Re: [PATCH 1/6] filemap: make the folio order calculation shareable
  2023-09-15 13:40       ` Matthew Wilcox
@ 2023-09-18  8:41         ` Daniel Gomez
  2023-09-18 18:09         ` Luis Chamberlain
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Gomez @ 2023-09-18  8:41 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: minchan, senozhatsky, axboe, djwong, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Fri, Sep 15, 2023 at 02:40:07PM +0100, Matthew Wilcox wrote:
> On Fri, Sep 15, 2023 at 09:51:23AM +0000, Daniel Gomez wrote:
> > To make the code that clamps the folio order in the __filemap_get_folio
> > routine reusable by others, move and merge it into a new helper,
> > mapping_size_order(), called from fgf_set_order(). This way, when
> > mapping the size at a given index, the calculated order is already
> > valid and ready to be used when the order is retrieved from fgp_flags
> > with FGF_GET_ORDER.
> >
> > Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> > ---
> >  fs/iomap/buffered-io.c  |  6 ++++--
> >  include/linux/pagemap.h | 42 ++++++++++++++++++++++++++++++++++++-----
> >  mm/filemap.c            |  8 --------
> >  3 files changed, 41 insertions(+), 15 deletions(-)
>
> That seems like a lot of extra code to add in order to avoid copying
> six lines of code and one comment into the shmem code.
>
> It's not wrong, but it seems like a bad tradeoff to me.

I saw some value in sharing the logic, but I'll merge this code
directly into shmem and add a 'Like filemap_' comment similar to
the one in shmem_add_to_page_cache.

Thanks for the quick feedback and review, Matthew. I'll do a V2 with
the changes and retest the series using the latest next tag.


* Re: [PATCH 1/6] filemap: make the folio order calculation shareable
  2023-09-15 13:40       ` Matthew Wilcox
  2023-09-18  8:41         ` Daniel Gomez
@ 2023-09-18 18:09         ` Luis Chamberlain
  2023-09-18 18:24           ` Matthew Wilcox
  1 sibling, 1 reply; 29+ messages in thread
From: Luis Chamberlain @ 2023-09-18 18:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Daniel Gomez, minchan, senozhatsky, axboe, djwong, hughd, akpm,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Fri, Sep 15, 2023 at 02:40:07PM +0100, Matthew Wilcox wrote:
> On Fri, Sep 15, 2023 at 09:51:23AM +0000, Daniel Gomez wrote:
> > To make the code that clamps the folio order in the __filemap_get_folio
> > routine reusable by others, move and merge it into a new helper,
> > mapping_size_order(), called from fgf_set_order(). This way, when
> > mapping the size at a given index, the calculated order is already
> > valid and ready to be used when the order is retrieved from fgp_flags
> > with FGF_GET_ORDER.
> > 
> > Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> > ---
> >  fs/iomap/buffered-io.c  |  6 ++++--
> >  include/linux/pagemap.h | 42 ++++++++++++++++++++++++++++++++++++-----
> >  mm/filemap.c            |  8 --------
> >  3 files changed, 41 insertions(+), 15 deletions(-)
> 
> That seems like a lot of extra code to add in order to avoid copying
> six lines of code and one comment into the shmem code.
> 
> It's not wrong, but it seems like a bad tradeoff to me.

The suggestion to merge came from me, mostly based on later observations
that in the future we may want to extend this with a min order to ensure
the index is aligned to the order. This check would only be useful for
buffered IO for iomap and readahead. It has me wondering: if buffer-heads
support for large order folios comes around, would we need a similar check
there?
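
Something along these lines, as a purely hypothetical sketch (the name
clamp_folio_order() and the min_order parameter are illustrative, not
from any posted patch; it assumes the caller only passes indices that
are already aligned to min_order, as LBS would guarantee):

static unsigned int clamp_folio_order(pgoff_t index, unsigned int order,
				      unsigned int min_order)
{
	if (order > MAX_PAGECACHE_ORDER)
		order = MAX_PAGECACHE_ORDER;
	/* If we're not aligned, allocate a smaller folio */
	if (index & ((1UL << order) - 1))
		order = __ffs(index);
	/* but never below the filesystem's minimum mapping order */
	return max(order, min_order);
}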

So Willy, you would know better if and when a shared piece of code would
be best with all these things in mind.

  Luis

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/6] filemap: make the folio order calculation shareable
  2023-09-18 18:09         ` Luis Chamberlain
@ 2023-09-18 18:24           ` Matthew Wilcox
  2023-09-18 18:42             ` Luis Chamberlain
  0 siblings, 1 reply; 29+ messages in thread
From: Matthew Wilcox @ 2023-09-18 18:24 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Daniel Gomez, minchan, senozhatsky, axboe, djwong, hughd, akpm,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Mon, Sep 18, 2023 at 11:09:00AM -0700, Luis Chamberlain wrote:
> On Fri, Sep 15, 2023 at 02:40:07PM +0100, Matthew Wilcox wrote:
> > On Fri, Sep 15, 2023 at 09:51:23AM +0000, Daniel Gomez wrote:
> > > To make the code that clamps the folio order in the __filemap_get_folio
> > > routine reusable to others, move and merge it to the fgf_set_order
> > > new subroutine (mapping_size_order), so when mapping the size at a
> > > given index, the order calculated is already valid and ready to be
> > > used when order is retrieved from fgp_flags with FGF_GET_ORDER.
> > > 
> > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> > > ---
> > >  fs/iomap/buffered-io.c  |  6 ++++--
> > >  include/linux/pagemap.h | 42 ++++++++++++++++++++++++++++++++++++-----
> > >  mm/filemap.c            |  8 --------
> > >  3 files changed, 41 insertions(+), 15 deletions(-)
> > 
> > That seems like a lot of extra code to add in order to avoid copying
> > six lines of code and one comment into the shmem code.
> > 
> > It's not wrong, but it seems like a bad tradeoff to me.
> 
> The suggestion to merge came from me, mostly based on later observations
> that in the future we may want to extend this with a min order to ensure
> the index is aligned to the order. This check would only be useful for
> buffered IO for iomap and readahead. It has me wondering: if buffer-heads
> support for large order folios comes around, would we need a similar check
> there?
> 
> So Willy, you would know better if and when a shared piece of code would
> be best with all these things in mind.

In my mind, this is fundamentally code which belongs in the page cache
rather than in individual filesystems.  The fly in the ointment is that
shmem has forked the page cache in order to do its own slightly
specialised thing.  I don't see the buffer_head connection; shmem is
an extremely special case, and we shouldn't mess around with other
filesystems to avoid changing shmem.

Ideally, we'd reunify (parts of) shmem and the regular page cache, but
that's a lot of work, probably involving the swap layer changing.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/6] filemap: make the folio order calculation shareable
  2023-09-18 18:24           ` Matthew Wilcox
@ 2023-09-18 18:42             ` Luis Chamberlain
  0 siblings, 0 replies; 29+ messages in thread
From: Luis Chamberlain @ 2023-09-18 18:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Daniel Gomez, minchan, senozhatsky, axboe, djwong, hughd, akpm,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Mon, Sep 18, 2023 at 07:24:55PM +0100, Matthew Wilcox wrote:
> On Mon, Sep 18, 2023 at 11:09:00AM -0700, Luis Chamberlain wrote:
> > On Fri, Sep 15, 2023 at 02:40:07PM +0100, Matthew Wilcox wrote:
> > > On Fri, Sep 15, 2023 at 09:51:23AM +0000, Daniel Gomez wrote:
> > > > To make the code that clamps the folio order in the __filemap_get_folio
> > > > routine reusable to others, move and merge it to the fgf_set_order
> > > > new subroutine (mapping_size_order), so when mapping the size at a
> > > > given index, the order calculated is already valid and ready to be
> > > > used when order is retrieved from fgp_flags with FGF_GET_ORDER.
> > > > 
> > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> > > > ---
> > > >  fs/iomap/buffered-io.c  |  6 ++++--
> > > >  include/linux/pagemap.h | 42 ++++++++++++++++++++++++++++++++++++-----
> > > >  mm/filemap.c            |  8 --------
> > > >  3 files changed, 41 insertions(+), 15 deletions(-)
> > > 
> > > That seems like a lot of extra code to add in order to avoid copying
> > > six lines of code and one comment into the shmem code.
> > > 
> > > It's not wrong, but it seems like a bad tradeoff to me.
> > 
> > The suggestion to merge came from me, mostly based on later observations
> > that in the future we may want to extend this with a min order to ensure
> > the index is aligned to the order. This check would only be useful for
> > buffered IO for iomap and readahead. It has me wondering: if buffer-heads
> > support for large order folios comes around, would we need a similar check
> > there?
> > 
> > So Willy, you would know better if and when a shared piece of code would
> > be best with all these things in mind.
> 
> In my mind, this is fundamentally code which belongs in the page cache
> rather than in individual filesystems.  The fly in the ointment is that
> shmem has forked the page cache in order to do its own slightly
> specialised thing. 

Do we make any effort *now* to try not to make that situation worse? This
being just one example.

> I don't see the buffer_head connection;

I haven't reviewed Hannes' patches yet, but I was wondering if for
buffer-heads there was also a loop to target a high order and retry until
you get the minimum order allowed.

> shmem is
> an extremely special case, and we shouldn't mess around with other
> filesystems to avoid changing shmem.
> 
> Ideally, we'd reunify (parts of) shmem and the regular page cache, but
> that's a lot of work, probably involving the swap layer changing.

Well, the swap effort will indeed be next, so it would be good to know
whether we should avoid making the situation worse as we add more code.

  Luis

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] shmem: add large folios support to the write path
  2023-09-18  8:00         ` Daniel Gomez
@ 2023-09-18 18:55           ` Yosry Ahmed
  2023-09-19 13:27             ` Daniel Gomez
  0 siblings, 1 reply; 29+ messages in thread
From: Yosry Ahmed @ 2023-09-18 18:55 UTC (permalink / raw)
  To: Daniel Gomez
  Cc: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Mon, Sep 18, 2023 at 1:00 AM Daniel Gomez <da.gomez@samsung.com> wrote:
>
> On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote:
> > On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> > >
> > > Add large folio support for shmem write path matching the same high
> > > order preference mechanism used for iomap buffered IO path as used in
> > > __filemap_get_folio().
> > >
> > > Use the __folio_get_max_order to get a hint for the order of the folio
> > > based on file size which takes care of the mapping requirements.
> > >
> > > Swap does not support high order folios for now, so make it order 0 in
> > > case swap is enabled.
> >
> > I didn't take a close look at the series, but I am not sure I
> > understand the rationale here. Reclaim will split high order shmem
> > folios anyway, right?
>
> For context, this is part of the enablement of large block sizes (LBS)
> effort [1][2][3], so the assumption here is that the kernel will
> reclaim memory with the same (large) block sizes that were written to
> the device.
>
> I'll add more context in the V2.
>
> [1] https://kernelnewbies.org/KernelProjects/large-block-size
> [2] https://docs.google.com/spreadsheets/d/e/2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8/pubhtml#
> [3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/
> >
> > It seems like we only enable high order folios if the "noswap" mount
> > option is used, which is fairly recent. I doubt it is widely used.
>
> For now, I skipped the swap path as it currently lacks support for
> high order folios. But I'm currently looking into it as part of the LBS
> effort (please check spreadsheet at [2] for that).

Thanks for the context, but I am not sure I understand.

IIUC we are skipping allocating large folios in shmem if swap is
enabled in this patch. Swap does not support swapping out large folios
as a whole (except THPs), but page reclaim will split those large
folios and swap them out as order-0 pages anyway. So I am not sure I
understand why we need to skip allocating large folios if swap is
enabled.
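
For reference, that split happens on the swap-out path; trimmed down, the
relevant part of shmem_writepage() in this period looks roughly like this
(a sketch, not the full function):

	if (folio_test_large(folio)) {
		/* Ensure the subpages are still dirty */
		folio_test_set_dirty(folio);
		if (split_huge_page(page) < 0)
			goto redirty;	/* split failed: keep it in memory */
		folio = page_folio(page);
		folio_clear_dirty(folio);
	}

So the large folio does reach swap, just as order-0 pages.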


> >
> > >
> > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> > > ---
> > >  mm/shmem.c | 16 +++++++++++++---
> > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > index adff74751065..26ca555b1669 100644
> > > --- a/mm/shmem.c
> > > +++ b/mm/shmem.c
> > > @@ -1683,13 +1683,19 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
> > >  }
> > >
> > >  static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
> > > -               pgoff_t index, bool huge, unsigned int *order)
> > > +               pgoff_t index, bool huge, unsigned int *order,
> > > +               struct shmem_sb_info *sbinfo)
> > >  {
> > >         struct shmem_inode_info *info = SHMEM_I(inode);
> > >         struct folio *folio;
> > >         int nr;
> > >         int err;
> > >
> > > +       if (!sbinfo->noswap)
> > > +               *order = 0;
> > > +       else
> > > +               *order = (*order == 1) ? 0 : *order;
> > > +
> > >         if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> > >                 huge = false;
> > >         nr = huge ? HPAGE_PMD_NR : 1U << *order;
> > > @@ -2032,6 +2038,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> > >                 return 0;
> > >         }
> > >
> > > +       order = mapping_size_order(inode->i_mapping, index, len);
> > > +
> > >         if (!shmem_is_huge(inode, index, false,
> > >                            vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0))
> > >                 goto alloc_nohuge;
> > > @@ -2039,11 +2047,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> > >         huge_gfp = vma_thp_gfp_mask(vma);
> > >         huge_gfp = limit_gfp_mask(huge_gfp, gfp);
> > >         folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true,
> > > -                                          &order);
> > > +                                          &order, sbinfo);
> > >         if (IS_ERR(folio)) {
> > >  alloc_nohuge:
> > >                 folio = shmem_alloc_and_acct_folio(gfp, inode, index, false,
> > > -                                                  &order);
> > > +                                                  &order, sbinfo);
> > >         }
> > >         if (IS_ERR(folio)) {
> > >                 int retry = 5;
> > > @@ -2147,6 +2155,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> > >         if (folio_test_large(folio)) {
> > >                 folio_unlock(folio);
> > >                 folio_put(folio);
> > > +               if (order > 0)
> > > +                       order--;
> > >                 goto alloc_nohuge;
> > >         }
> > >  unlock:
> > > --
> > > 2.39.2
> > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] shmem: add large folios support to the write path
  2023-09-18 18:55           ` Yosry Ahmed
@ 2023-09-19 13:27             ` Daniel Gomez
  2023-09-19 16:00               ` Yosry Ahmed
  0 siblings, 1 reply; 29+ messages in thread
From: Daniel Gomez @ 2023-09-19 13:27 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Mon, Sep 18, 2023 at 11:55:34AM -0700, Yosry Ahmed wrote:
> On Mon, Sep 18, 2023 at 1:00 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> >
> > On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote:
> > > On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> > > >
> > > > Add large folio support for shmem write path matching the same high
> > > > order preference mechanism used for iomap buffered IO path as used in
> > > > __filemap_get_folio().
> > > >
> > > > Use the __folio_get_max_order to get a hint for the order of the folio
> > > > based on file size which takes care of the mapping requirements.
> > > >
> > > > Swap does not support high order folios for now, so make it order 0 in
> > > > case swap is enabled.
> > >
> > > I didn't take a close look at the series, but I am not sure I
> > > understand the rationale here. Reclaim will split high order shmem
> > > folios anyway, right?
> >
> > For context, this is part of the enablement of large block sizes (LBS)
> > effort [1][2][3], so the assumption here is that the kernel will
> > reclaim memory with the same (large) block sizes that were written to
> > the device.
> >
> > I'll add more context in the V2.
> >
> > [1] https://kernelnewbies.org/KernelProjects/large-block-size
> > [2] https://docs.google.com/spreadsheets/d/e/2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8/pubhtml#
> > [3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/
> > >
> > > It seems like we only enable high order folios if the "noswap" mount
> > > option is used, which is fairly recent. I doubt it is widely used.
> >
> > For now, I skipped the swap path as it currently lacks support for
> > high order folios. But I'm currently looking into it as part of the LBS
> > effort (please check spreadsheet at [2] for that).
>
> Thanks for the context, but I am not sure I understand.
>
> IIUC we are skipping allocating large folios in shmem if swap is
> enabled in this patch. Swap does not support swapping out large folios
> as a whole (except THPs), but page reclaim will split those large
> folios and swap them out as order-0 pages anyway. So I am not sure I
> understand why we need to skip allocating large folios if swap is
> enabled.

I lifted the noswap condition and retested it again on top of 230918, and
there is some regression. So, based on the results, I guess the initial
requirement may be the way to go. But what do you think?

Here the logs:
* shmem-large-folios-swap: https://gitlab.com/-/snippets/3600360
* shmem-baseline-swap : https://gitlab.com/-/snippets/3600362

-Failures: generic/080 generic/126 generic/193 generic/633 generic/689
-Failed 5 of 730 tests
\ No newline at end of file
+Failures: generic/080 generic/103 generic/126 generic/193 generic/285 generic/436 generic/619 generic/633 generic/689
+Failed 9 of 730 tests
\ No newline at end of file
>
>
> > >
> > > >
> > > > Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
> > > > ---
> > > >  mm/shmem.c | 16 +++++++++++++---
> > > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > > index adff74751065..26ca555b1669 100644
> > > > --- a/mm/shmem.c
> > > > +++ b/mm/shmem.c
> > > > @@ -1683,13 +1683,19 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
> > > >  }
> > > >
> > > >  static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
> > > > -               pgoff_t index, bool huge, unsigned int *order)
> > > > +               pgoff_t index, bool huge, unsigned int *order,
> > > > +               struct shmem_sb_info *sbinfo)
> > > >  {
> > > >         struct shmem_inode_info *info = SHMEM_I(inode);
> > > >         struct folio *folio;
> > > >         int nr;
> > > >         int err;
> > > >
> > > > +       if (!sbinfo->noswap)
> > > > +               *order = 0;
> > > > +       else
> > > > +               *order = (*order == 1) ? 0 : *order;
> > > > +
> > > >         if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
> > > >                 huge = false;
> > > >         nr = huge ? HPAGE_PMD_NR : 1U << *order;
> > > > @@ -2032,6 +2038,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> > > >                 return 0;
> > > >         }
> > > >
> > > > +       order = mapping_size_order(inode->i_mapping, index, len);
> > > > +
> > > >         if (!shmem_is_huge(inode, index, false,
> > > >                            vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0))
> > > >                 goto alloc_nohuge;
> > > > @@ -2039,11 +2047,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> > > >         huge_gfp = vma_thp_gfp_mask(vma);
> > > >         huge_gfp = limit_gfp_mask(huge_gfp, gfp);
> > > >         folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true,
> > > > -                                          &order);
> > > > +                                          &order, sbinfo);
> > > >         if (IS_ERR(folio)) {
> > > >  alloc_nohuge:
> > > >                 folio = shmem_alloc_and_acct_folio(gfp, inode, index, false,
> > > > -                                                  &order);
> > > > +                                                  &order, sbinfo);
> > > >         }
> > > >         if (IS_ERR(folio)) {
> > > >                 int retry = 5;
> > > > @@ -2147,6 +2155,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
> > > >         if (folio_test_large(folio)) {
> > > >                 folio_unlock(folio);
> > > >                 folio_put(folio);
> > > > +               if (order > 0)
> > > > +                       order--;
> > > >                 goto alloc_nohuge;
> > > >         }
> > > >  unlock:
> > > > --
> > > > 2.39.2
> > > >

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] shmem: add large folios support to the write path
  2023-09-19 13:27             ` Daniel Gomez
@ 2023-09-19 16:00               ` Yosry Ahmed
  2023-09-19 21:46                 ` Luis Chamberlain
  0 siblings, 1 reply; 29+ messages in thread
From: Yosry Ahmed @ 2023-09-19 16:00 UTC (permalink / raw)
  To: Daniel Gomez
  Cc: minchan, senozhatsky, axboe, djwong, willy, hughd, akpm, mcgrof,
	linux-kernel, linux-block, linux-xfs, linux-fsdevel, linux-mm,
	gost.dev, Pankaj Raghav

On Tue, Sep 19, 2023 at 6:27 AM Daniel Gomez <da.gomez@samsung.com> wrote:
>
> On Mon, Sep 18, 2023 at 11:55:34AM -0700, Yosry Ahmed wrote:
> > On Mon, Sep 18, 2023 at 1:00 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> > >
> > > On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote:
> > > > On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> > > > >
> > > > > Add large folio support for shmem write path matching the same high
> > > > > order preference mechanism used for iomap buffered IO path as used in
> > > > > __filemap_get_folio().
> > > > >
> > > > > Use the __folio_get_max_order to get a hint for the order of the folio
> > > > > based on file size which takes care of the mapping requirements.
> > > > >
> > > > > Swap does not support high order folios for now, so make it order 0 in
> > > > > case swap is enabled.
> > > >
> > > > I didn't take a close look at the series, but I am not sure I
> > > > understand the rationale here. Reclaim will split high order shmem
> > > > folios anyway, right?
> > >
> > > For context, this is part of the enablement of large block sizes (LBS)
> > > effort [1][2][3], so the assumption here is that the kernel will
> > > reclaim memory with the same (large) block sizes that were written to
> > > the device.
> > >
> > > I'll add more context in the V2.
> > >
> > > [1] https://kernelnewbies.org/KernelProjects/large-block-size
> > > [2] https://docs.google.com/spreadsheets/d/e/2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8/pubhtml#
> > > [3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/
> > > >
> > > > It seems like we only enable high order folios if the "noswap" mount
> > > > option is used, which is fairly recent. I doubt it is widely used.
> > >
> > > For now, I skipped the swap path as it currently lacks support for
> > > high order folios. But I'm currently looking into it as part of the LBS
> > > effort (please check spreadsheet at [2] for that).
> >
> > Thanks for the context, but I am not sure I understand.
> >
> > IIUC we are skipping allocating large folios in shmem if swap is
> > enabled in this patch. Swap does not support swapping out large folios
> > as a whole (except THPs), but page reclaim will split those large
> > folios and swap them out as order-0 pages anyway. So I am not sure I
> > understand why we need to skip allocating large folios if swap is
> > enabled.
>
> I lifted the noswap condition and retested it again on top of 230918, and
> there is some regression. So, based on the results, I guess the initial
> requirement may be the way to go. But what do you think?
>
> Here the logs:
> * shmem-large-folios-swap: https://gitlab.com/-/snippets/3600360
> * shmem-baseline-swap : https://gitlab.com/-/snippets/3600362
>
> -Failures: generic/080 generic/126 generic/193 generic/633 generic/689
> -Failed 5 of 730 tests
> \ No newline at end of file
> +Failures: generic/080 generic/103 generic/126 generic/193 generic/285 generic/436 generic/619 generic/633 generic/689
> +Failed 9 of 730 tests
> \ No newline at end of file
> >

I am not really familiar with these tests so I cannot really tell
what's going on. I can see "swapfiles are not supported" in the logs
though, so it seems like we are seeing extra failures by just lifting
"noswap" even without actually swapping. I am curious if this is just
hiding a different issue, I would at least try to understand what's
happening.

Anyway, I don't have enough context here to be useful. I was just
making an observation about reclaim splitting shmem folios to swap
them out as order-0 pages, and asking why this is needed based on
that. I will leave it up to you and the reviewers to decide if there's
anything interesting here.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] shmem: add large folios support to the write path
  2023-09-19 16:00               ` Yosry Ahmed
@ 2023-09-19 21:46                 ` Luis Chamberlain
  2023-09-19 21:51                   ` Yosry Ahmed
  0 siblings, 1 reply; 29+ messages in thread
From: Luis Chamberlain @ 2023-09-19 21:46 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Daniel Gomez, minchan, senozhatsky, axboe, djwong, willy, hughd,
	akpm, linux-kernel, linux-block, linux-xfs, linux-fsdevel,
	linux-mm, gost.dev, Pankaj Raghav

On Tue, Sep 19, 2023 at 09:00:16AM -0700, Yosry Ahmed wrote:
> On Tue, Sep 19, 2023 at 6:27 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> >
> > On Mon, Sep 18, 2023 at 11:55:34AM -0700, Yosry Ahmed wrote:
> > > On Mon, Sep 18, 2023 at 1:00 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> > > >
> > > > On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote:
> > > > > On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> > > > > >
> > > > > > Add large folio support for shmem write path matching the same high
> > > > > > order preference mechanism used for iomap buffered IO path as used in
> > > > > > __filemap_get_folio().
> > > > > >
> > > > > > Use the __folio_get_max_order to get a hint for the order of the folio
> > > > > > based on file size which takes care of the mapping requirements.
> > > > > >
> > > > > > Swap does not support high order folios for now, so make it order 0 in
> > > > > > case swap is enabled.
> > > > >
> > > > > I didn't take a close look at the series, but I am not sure I
> > > > > understand the rationale here. Reclaim will split high order shmem
> > > > > folios anyway, right?
> > > >
> > > > For context, this is part of the enablement of large block sizes (LBS)
> > > > effort [1][2][3], so the assumption here is that the kernel will
> > > > reclaim memory with the same (large) block sizes that were written to
> > > > the device.
> > > >
> > > > I'll add more context in the V2.
> > > >
> > > > [1] https://kernelnewbies.org/KernelProjects/large-block-size
> > > > [2] https://docs.google.com/spreadsheets/d/e/2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8/pubhtml#
> > > > [3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/
> > > > >
> > > > > It seems like we only enable high order folios if the "noswap" mount
> > > > > option is used, which is fairly recent. I doubt it is widely used.
> > > >
> > > > For now, I skipped the swap path as it currently lacks support for
> > > > high order folios. But I'm currently looking into it as part of the LBS
> > > > effort (please check spreadsheet at [2] for that).
> > >
> > > Thanks for the context, but I am not sure I understand.
> > >
> > > IIUC we are skipping allocating large folios in shmem if swap is
> > > enabled in this patch. Swap does not support swapping out large folios
> > > as a whole (except THPs), but page reclaim will split those large
> > > folios and swap them out as order-0 pages anyway. So I am not sure I
> > > understand why we need to skip allocating large folios if swap is
> > > enabled.
> >
> > I lifted the noswap condition and retested it again on top of 230918, and
> > there is some regression. So, based on the results, I guess the initial
> > requirement may be the way to go. But what do you think?
> >
> > Here the logs:
> > * shmem-large-folios-swap: https://gitlab.com/-/snippets/3600360
> > * shmem-baseline-swap : https://gitlab.com/-/snippets/3600362
> >
> > -Failures: generic/080 generic/126 generic/193 generic/633 generic/689
> > -Failed 5 of 730 tests
> > \ No newline at end of file
> > +Failures: generic/080 generic/103 generic/126 generic/193 generic/285 generic/436 generic/619 generic/633 generic/689
> > +Failed 9 of 730 tests
> > \ No newline at end of file
> > >
> 
> I am not really familiar with these tests so I cannot really tell
> what's going on. I can see "swapfiles are not supported" in the logs
> though, so it seems like we are seeing extra failures by just lifting
> "noswap" even without actually swapping. I am curious if this is just
> hiding a different issue, I would at least try to understand what's
> happening.
> 
> Anyway, I don't have enough context here to be useful. I was just
> making an observation about reclaim splitting shmem folios to swap
> them out as order-0 pages, and asking why this is needed based on
> that. I will leave it up to you and the reviewers to decide if there's
> anything interesting here.

The tests which are failing seem to be related to permissions; I could not
immediately decipher why, because as you suggest we'd just be doing the
silly thing of splitting large folios on writepage.

I'd prefer we don't require swap until those regressions are fixed.

Note that part of the rationale to enable this work is to eventually
also extend swap code to support large order folios, so it is not like
this would be left as-is. It is just that it may take time to resolve
the kinks with swap.

So I'd stick to noswap for now.

The above tests also don't stress swap much, and if we did, I would
imagine we might see some other undesirable failures.

 Luis

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] shmem: add large folios support to the write path
  2023-09-19 21:46                 ` Luis Chamberlain
@ 2023-09-19 21:51                   ` Yosry Ahmed
  0 siblings, 0 replies; 29+ messages in thread
From: Yosry Ahmed @ 2023-09-19 21:51 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Daniel Gomez, minchan, senozhatsky, axboe, djwong, willy, hughd,
	akpm, linux-kernel, linux-block, linux-xfs, linux-fsdevel,
	linux-mm, gost.dev, Pankaj Raghav

On Tue, Sep 19, 2023 at 2:47 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Tue, Sep 19, 2023 at 09:00:16AM -0700, Yosry Ahmed wrote:
> > On Tue, Sep 19, 2023 at 6:27 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> > >
> > > On Mon, Sep 18, 2023 at 11:55:34AM -0700, Yosry Ahmed wrote:
> > > > On Mon, Sep 18, 2023 at 1:00 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> > > > >
> > > > > On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote:
> > > > > > On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> > > > > > >
> > > > > > > Add large folio support for shmem write path matching the same high
> > > > > > > order preference mechanism used for iomap buffered IO path as used in
> > > > > > > __filemap_get_folio().
> > > > > > >
> > > > > > > Use the __folio_get_max_order to get a hint for the order of the folio
> > > > > > > based on file size which takes care of the mapping requirements.
> > > > > > >
> > > > > > > Swap does not support high order folios for now, so make it order 0 in
> > > > > > > case swap is enabled.
> > > > > >
> > > > > > I didn't take a close look at the series, but I am not sure I
> > > > > > understand the rationale here. Reclaim will split high order shmem
> > > > > > folios anyway, right?
> > > > >
> > > > > For context, this is part of the enablement of large block sizes (LBS)
> > > > > effort [1][2][3], so the assumption here is that the kernel will
> > > > > reclaim memory with the same (large) block sizes that were written to
> > > > > the device.
> > > > >
> > > > > I'll add more context in the V2.
> > > > >
> > > > > [1] https://kernelnewbies.org/KernelProjects/large-block-size
> > > > > [2] https://docs.google.com/spreadsheets/d/e/2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8/pubhtml#
> > > > > [3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/
> > > > > >
> > > > > > It seems like we only enable high order folios if the "noswap" mount
> > > > > > option is used, which is fairly recent. I doubt it is widely used.
> > > > >
> > > > > For now, I skipped the swap path as it currently lacks support for
> > > > > high order folios. But I'm currently looking into it as part of the LBS
> > > > > effort (please check spreadsheet at [2] for that).
> > > >
> > > > Thanks for the context, but I am not sure I understand.
> > > >
> > > > IIUC we are skipping allocating large folios in shmem if swap is
> > > > enabled in this patch. Swap does not support swapping out large folios
> > > > as a whole (except THPs), but page reclaim will split those large
> > > > folios and swap them out as order-0 pages anyway. So I am not sure I
> > > > understand why we need to skip allocating large folios if swap is
> > > > enabled.
> > >
> > > I lifted the noswap condition and retested it again on top of 230918, and
> > > there is some regression. So, based on the results, I guess the initial
> > > requirement may be the way to go. But what do you think?
> > >
> > > Here the logs:
> > > * shmem-large-folios-swap: https://gitlab.com/-/snippets/3600360
> > > * shmem-baseline-swap : https://gitlab.com/-/snippets/3600362
> > >
> > > -Failures: generic/080 generic/126 generic/193 generic/633 generic/689
> > > -Failed 5 of 730 tests
> > > \ No newline at end of file
> > > +Failures: generic/080 generic/103 generic/126 generic/193 generic/285 generic/436 generic/619 generic/633 generic/689
> > > +Failed 9 of 730 tests
> > > \ No newline at end of file
> > > >
> >
> > I am not really familiar with these tests so I cannot really tell
> > what's going on. I can see "swapfiles are not supported" in the logs
> > though, so it seems like we are seeing extra failures by just lifting
> > "noswap" even without actually swapping. I am curious if this is just
> > hiding a different issue, I would at least try to understand what's
> > happening.
> >
> > Anyway, I don't have enough context here to be useful. I was just
> > making an observation about reclaim splitting shmem folios to swap
> > them out as order-0 pages, and asking why this is needed based on
> > that. I will leave it up to you and the reviewers to decide if there's
> > anything interesting here.
>
> The tests which are failing seem be related to permissions, I could not
> immediate decipher why, because as you suggest we'd just be doing the
> silly thing of splitting large folios on writepage.
>
> I'd prefer we don't require swap until those regressions would be fixed.
>
> Note that part of the rationale to enable this work is to eventually
> also extend swap code to support large order folios, so it is not like
> this would be left as-is. It is just that it may take time to resolve
> the kinks with swap.
>
> So I'd stick to nowap for now.
>
> The above tests also don't stress swap too, and if we do that I would
> imagine we might see some other undesirable failures.
>
>  Luis

I thought we already have some notion of exercising swap with large
shmem folios from THPs, so this shouldn't be new, but perhaps I am
missing something.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-09-19 21:52 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20230915095123eucas1p2c23d8a8d910f5a8e9fd077dd9579ad0a@eucas1p2.samsung.com>
2023-09-15  9:51 ` [PATCH 0/6] shmem: high order folios support in write path Daniel Gomez
     [not found]   ` <CGME20230915095124eucas1p1eb0e0ef883f6316cf14c349404a51150@eucas1p1.samsung.com>
2023-09-15  9:51     ` [PATCH 1/6] filemap: make the folio order calculation shareable Daniel Gomez
2023-09-15 13:40       ` Matthew Wilcox
2023-09-18  8:41         ` Daniel Gomez
2023-09-18 18:09         ` Luis Chamberlain
2023-09-18 18:24           ` Matthew Wilcox
2023-09-18 18:42             ` Luis Chamberlain
     [not found]   ` <CGME20230915095126eucas1p2cf75674dab8a81228f493a7200f4a1ba@eucas1p2.samsung.com>
2023-09-15  9:51     ` [PATCH 2/6] shmem: drop BLOCKS_PER_PAGE macro Daniel Gomez
     [not found]   ` <CGME20230915095128eucas1p2885c3add58d82413d9c1d17832d3d281@eucas1p2.samsung.com>
2023-09-15  9:51     ` [PATCH 3/6] shmem: account for large order folios Daniel Gomez
2023-09-15 12:14       ` Matthew Wilcox
2023-09-15 13:44       ` Matthew Wilcox
     [not found]   ` <CGME20230915095129eucas1p1383d75c6d62056afbb20b78a3ec15234@eucas1p1.samsung.com>
2023-09-15  9:51     ` [PATCH 4/6] shmem: add order parameter support to shmem_alloc_folio Daniel Gomez
2023-09-15 12:19       ` Matthew Wilcox
     [not found]   ` <CGME20230915095131eucas1p1010e364cd1c351e5b7379954bd237a3d@eucas1p1.samsung.com>
2023-09-15  9:51     ` [PATCH 5/6] shmem: add file length in shmem_get_folio path Daniel Gomez
2023-09-15 15:37       ` Matthew Wilcox
     [not found]   ` <CGME20230915095133eucas1p267bade2888b7fcd2e1ea8e13e21c495f@eucas1p2.samsung.com>
2023-09-15  9:51     ` [PATCH 6/6] shmem: add large folios support to the write path Daniel Gomez
2023-09-15 18:26       ` Yosry Ahmed
2023-09-18  8:00         ` Daniel Gomez
2023-09-18 18:55           ` Yosry Ahmed
2023-09-19 13:27             ` Daniel Gomez
2023-09-19 16:00               ` Yosry Ahmed
2023-09-19 21:46                 ` Luis Chamberlain
2023-09-19 21:51                   ` Yosry Ahmed
2023-09-15 15:29   ` [PATCH 0/6] shmem: high order folios support in " David Hildenbrand
2023-09-15 15:34     ` Matthew Wilcox
2023-09-15 15:36       ` David Hildenbrand
2023-09-15 15:40         ` Matthew Wilcox
2023-09-15 15:43           ` David Hildenbrand
2023-09-18  7:32     ` Daniel Gomez
