linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/13] enable bs > ps in XFS
@ 2024-02-26  9:49 Pankaj Raghav (Samsung)
  2024-02-26  9:49 ` [PATCH 01/13] mm: Support order-1 folios in the page cache Pankaj Raghav (Samsung)
                   ` (12 more replies)
  0 siblings, 13 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Pankaj Raghav

From: Pankaj Raghav <p.raghav@samsung.com>

This is the first version of the series (2 RFCs posted before) that enables
block size > page size (Large Block Size) in XFS. This version has various
bug fixes and suggestions collected from the previous RFCs [1][2]. The
context and motivation can be found in the cover letter of RFC v1 [1]. We
also recorded a talk about this effort at LPC [3], for anyone who would
like more context.

A lot of emphasis has been put on testing using kdevops. The testing has
been split into regression and progression.

Regression testing:
In regression testing, we ran the whole test suite with SOAK_DURATION set
to 2.5 hours to check for *regressions on existing profiles due to the
page cache changes.

No regression was found with the patches added on top.

*The baseline for regression was created using a SOAK_DURATION of 2.5
hours and about 7-8 XFS test clusters, running fstests in a loop over 70
times. We then scraped the results for critical failures (crashes, XFS or
page cache asserts, or hung tasks) and have reported these to the
community as well [5].

Progression testing:
For progression testing, we tested the 8k, 16k, 32k and 64k block sizes.
To compare with existing support, an ARM VM with a 64k base page size
(without our patches) was used as a reference to identify failures that
are actually due to LBS support on a 4k base page size system.

There are some common failures upstream for bs=64k that need to be
fixed [4]. There are also some tests that assume block size < page size
and need to be fixed. I have a tree with fixes for xfstests here [6],
which I will send to the list soon.

No new failures were found with LBS support.

We've done some preliminary performance tests with fio on XFS with a 4k
block size against pmem and NVMe, using buffered IO and Direct IO, on
vanilla v6.8-rc4 vs v6.8-rc4 with these patches applied, and detected no
regressions.

We also wrote an eBPF tool called blkalgn [7] to check that IO sent to the
device is aligned and is at least a filesystem block in length.

I have also started a discussion with Zi Yan about upstreaming support for
splitting folios to a lower order on truncation, which will improve memory
utilization when a partial truncate happens with LBS support (Patch 9) [8].

The series has been greatly improved (and simplified) since the previous
version. Thanks to Chinner, Darrick, Hannes and willy for your comments.

[1] https://lore.kernel.org/lkml/20230915183848.1018717-1-kernel@pankajraghav.com/
[2] https://lore.kernel.org/linux-xfs/20240213093713.1753368-1-kernel@pankajraghav.com/
[3] https://www.youtube.com/watch?v=ar72r5Xf7x4
[4] https://github.com/linux-kdevops/kdevops/blob/master/docs/xfs-bugs.md
489 non-critical issues and 55 critical issues. We've determined and
reported that the 55 critical issues all fall into 5 common XFS asserts or
hung tasks and 2 memory management asserts.
[5] https://lore.kernel.org/linux-xfs/fe7fec1c-3b08-430f-9c95-ea76b237acf4@samsung.com/
[6] https://github.com/Panky-codes/xfstests/tree/lbs-fixes
[7] https://github.com/iovisor/bcc/pull/4813
[8] https://lore.kernel.org/all/dvamjmlss62p5pf4das7nu5q35ftf4jlk3viwzyyvzasv4qjns@h3omqs7ecstd/

Changes since RFC v2:
- Move order 1 patch above the 1st patch
- Remove order == 1 conditional in `fs: Allow fine-grained control of
folio sizes`. This fixed generic/630 that was reported in the previous version.
- Hide the max order and expose `mapping_set_folio_min_order` instead.
- Add new helpers mapping_align_start_index and DEFINE_READAHEAD_ALIGNED
- don't call `page_cache_ra_order` with min order in do_sync_mmap_readahead
- simplify ondemand readahead with only aligning the start index at the end
- Don't cap ra_pages based on bdi->io_pages
- use `check_mul_overflow` while calculating bytes in validate_fsb
- Remove config lbs option
- Add a warning while mounting an LBS filesystem
- Add Acked-by and Reviewed-by from Hannes and Darrick.

Changes since RFC v1:
- Added willy's patch to enable order-1 folios.
- Unified common page cache effort from Hannes LBS work.
- Added a new helper min_nrpages and added CONFIG_THP for enabling mapping_large_folio_support
- Don't split a folio if it has minorder set. Remove the old code where we set extra pins if it has that requirement.
- Split the code in XFS between the validation of mapping count. Put the icache code changes with enabling bs > ps.
- Added CONFIG_XFS_LBS option
- align the index in do_read_cache_folio()
- Removed truncate changes
- Fixed generic/091 with iomap changes to iomap_dio_zero function.
- Took care of folio truncation scenario in page_cache_ra_unbounded() that happens after read_pages if a folio was found.
- Squashed and moved commits around
- Rebased on top of v6.8-rc4

Dave Chinner (1):
  xfs: expose block size in stat

Hannes Reinecke (1):
  readahead: rework loop in page_cache_ra_unbounded()

Luis Chamberlain (3):
  filemap: align the index to mapping_min_order in the page cache
  readahead: set file_ra_state->ra_pages to be at least
    mapping_min_order
  readahead: align index to mapping_min_order in ondemand_ra and
    force_ra

Matthew Wilcox (Oracle) (2):
  mm: Support order-1 folios in the page cache
  fs: Allow fine-grained control of folio sizes

Pankaj Raghav (6):
  filemap: use mapping_min_order while allocating folios
  readahead: allocate folios with mapping_min_order in
    ra_(unbounded|order)
  mm: do not split a folio if it has minimum folio order requirement
  iomap: fix iomap_dio_zero() for fs bs > system page size
  xfs: make the calculation generic in xfs_sb_validate_fsb_count()
  xfs: enable block size larger than page size support

 fs/iomap/direct-io.c       |  13 ++++-
 fs/xfs/libxfs/xfs_ialloc.c |   5 ++
 fs/xfs/libxfs/xfs_shared.h |   3 ++
 fs/xfs/xfs_icache.c        |   6 ++-
 fs/xfs/xfs_iops.c          |   2 +-
 fs/xfs/xfs_mount.c         |   9 +++-
 fs/xfs/xfs_super.c         |  10 +---
 include/linux/huge_mm.h    |   7 ++-
 include/linux/pagemap.h    | 108 ++++++++++++++++++++++++++++++-------
 mm/filemap.c               |  48 +++++++++++------
 mm/huge_memory.c           |  36 +++++++++++--
 mm/internal.h              |   4 +-
 mm/readahead.c             |  74 ++++++++++++++++++-------
 13 files changed, 246 insertions(+), 79 deletions(-)


base-commit: b401b621758e46812da61fa58a67c3fd8d91de0d
-- 
2.43.0



* [PATCH 01/13] mm: Support order-1 folios in the page cache
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26  9:49 ` [PATCH 02/13] fs: Allow fine-grained control of folio sizes Pankaj Raghav (Samsung)
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Folios of order 1 have no space to store the deferred list.  This is
not a problem for the page cache as file-backed folios are never
placed on the deferred list.  All we need to do is prevent the core
MM from touching the deferred list for order 1 folios and remove the
code which prevented us from allocating order 1 folios.

Link: https://lore.kernel.org/linux-mm/90344ea7-4eec-47ee-5996-0c22f42d6a6a@google.com/
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/huge_mm.h |  7 +++++--
 mm/filemap.c            |  2 --
 mm/huge_memory.c        | 23 ++++++++++++++++++-----
 mm/internal.h           |  4 +---
 mm/readahead.c          |  3 ---
 5 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 5adb86af35fc..916a2a539517 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -263,7 +263,7 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags);
 
-void folio_prep_large_rmappable(struct folio *folio);
+struct folio *folio_prep_large_rmappable(struct folio *folio);
 bool can_split_folio(struct folio *folio, int *pextra_pins);
 int split_huge_page_to_list(struct page *page, struct list_head *list);
 static inline int split_huge_page(struct page *page)
@@ -410,7 +410,10 @@ static inline unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 	return 0;
 }
 
-static inline void folio_prep_large_rmappable(struct folio *folio) {}
+static inline struct folio *folio_prep_large_rmappable(struct folio *folio)
+{
+	return folio;
+}
 
 #define transparent_hugepage_flags 0UL
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 750e779c23db..2b00442b9d19 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1912,8 +1912,6 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 			gfp_t alloc_gfp = gfp;
 
 			err = -ENOMEM;
-			if (order == 1)
-				order = 0;
 			if (order > 0)
 				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
 			folio = filemap_alloc_folio(alloc_gfp, order);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 94c958f7ebb5..81fd1ba57088 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -788,11 +788,15 @@ struct deferred_split *get_deferred_split_queue(struct folio *folio)
 }
 #endif
 
-void folio_prep_large_rmappable(struct folio *folio)
+struct folio *folio_prep_large_rmappable(struct folio *folio)
 {
-	VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
-	INIT_LIST_HEAD(&folio->_deferred_list);
+	if (!folio || !folio_test_large(folio))
+		return folio;
+	if (folio_order(folio) > 1)
+		INIT_LIST_HEAD(&folio->_deferred_list);
 	folio_set_large_rmappable(folio);
+
+	return folio;
 }
 
 static inline bool is_transparent_hugepage(struct folio *folio)
@@ -3082,7 +3086,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	/* Prevent deferred_split_scan() touching ->_refcount */
 	spin_lock(&ds_queue->split_queue_lock);
 	if (folio_ref_freeze(folio, 1 + extra_pins)) {
-		if (!list_empty(&folio->_deferred_list)) {
+		if (folio_order(folio) > 1 &&
+		    !list_empty(&folio->_deferred_list)) {
 			ds_queue->split_queue_len--;
 			list_del(&folio->_deferred_list);
 		}
@@ -3133,6 +3138,9 @@ void folio_undo_large_rmappable(struct folio *folio)
 	struct deferred_split *ds_queue;
 	unsigned long flags;
 
+	if (folio_order(folio) <= 1)
+		return;
+
 	/*
 	 * At this point, there is no one trying to add the folio to
 	 * deferred_list. If folio is not in deferred_list, it's safe
@@ -3158,7 +3166,12 @@ void deferred_split_folio(struct folio *folio)
 #endif
 	unsigned long flags;
 
-	VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
+	/*
+	 * Order 1 folios have no space for a deferred list, but we also
+	 * won't waste much memory by not adding them to the deferred list.
+	 */
+	if (folio_order(folio) <= 1)
+		return;
 
 	/*
 	 * The try_to_unmap() in page reclaim path might reach here too,
diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50f..5174b5b0c344 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -419,9 +419,7 @@ static inline struct folio *page_rmappable_folio(struct page *page)
 {
 	struct folio *folio = (struct folio *)page;
 
-	if (folio && folio_order(folio) > 1)
-		folio_prep_large_rmappable(folio);
-	return folio;
+	return folio_prep_large_rmappable(folio);
 }
 
 static inline void prep_compound_head(struct page *page, unsigned int order)
diff --git a/mm/readahead.c b/mm/readahead.c
index 2648ec4f0494..369c70e2be42 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -516,9 +516,6 @@ void page_cache_ra_order(struct readahead_control *ractl,
 		/* Don't allocate pages past EOF */
 		while (index + (1UL << order) - 1 > limit)
 			order--;
-		/* THP machinery does not support order-1 */
-		if (order == 1)
-			order = 0;
 		err = ra_alloc_folio(ractl, index, mark, order, gfp);
 		if (err)
 			break;
-- 
2.43.0



* [PATCH 02/13] fs: Allow fine-grained control of folio sizes
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
  2024-02-26  9:49 ` [PATCH 01/13] mm: Support order-1 folios in the page cache Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26  9:49 ` [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache Pankaj Raghav (Samsung)
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Pankaj Raghav

From: "Matthew Wilcox (Oracle)" <willy@infradead.org>

Some filesystems want to be able to ensure that folios that are added to
the page cache are at least a certain size.
Add mapping_set_folio_min_order() to allow this level of control.
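
As a rough usage sketch (the filesystem and function name below are made
up for illustration): a filesystem with 16k blocks on a 4k page size
system would call this from its inode constructor roughly as follows:

	/* illustrative only: "examplefs" is a hypothetical filesystem */
	static void examplefs_set_min_folio_order(struct inode *inode,
						  unsigned int blkbits)
	{
		unsigned int min_order = 0;

		if (blkbits > PAGE_SHIFT)
			min_order = blkbits - PAGE_SHIFT;

		/* helper added by this patch */
		mapping_set_folio_min_order(inode->i_mapping, min_order);
	}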

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Co-developed-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 include/linux/pagemap.h | 100 ++++++++++++++++++++++++++++++++--------
 1 file changed, 80 insertions(+), 20 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 2df35e65557d..fc8eb9c94e9c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -202,13 +202,18 @@ enum mapping_flags {
 	AS_EXITING	= 4, 	/* final truncate in progress */
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = 5,
-	AS_LARGE_FOLIO_SUPPORT = 6,
-	AS_RELEASE_ALWAYS,	/* Call ->release_folio(), even if no private data */
-	AS_STABLE_WRITES,	/* must wait for writeback before modifying
+	AS_RELEASE_ALWAYS = 6,	/* Call ->release_folio(), even if no private data */
+	AS_STABLE_WRITES = 7,	/* must wait for writeback before modifying
 				   folio contents */
-	AS_UNMOVABLE,		/* The mapping cannot be moved, ever */
+	AS_FOLIO_ORDER_MIN = 8,
+	AS_FOLIO_ORDER_MAX = 13, /* Bit 8-17 are used for FOLIO_ORDER */
+	AS_UNMOVABLE = 18,		/* The mapping cannot be moved, ever */
 };
 
+#define AS_FOLIO_ORDER_MIN_MASK 0x00001f00
+#define AS_FOLIO_ORDER_MAX_MASK 0x0003e000
+#define AS_FOLIO_ORDER_MASK (AS_FOLIO_ORDER_MIN_MASK | AS_FOLIO_ORDER_MAX_MASK)
+
 /**
  * mapping_set_error - record a writeback error in the address_space
  * @mapping: the mapping in which an error should be set
@@ -344,9 +349,47 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 	m->gfp_mask = mask;
 }
 
+/*
+ * There are some parts of the kernel which assume that PMD entries
+ * are exactly HPAGE_PMD_ORDER.  Those should be fixed, but until then,
+ * limit the maximum allocation order to PMD size.  I'm not aware of any
+ * assumptions about maximum order if THP are disabled, but 8 seems like
+ * a good order (that's 1MB if you're using 4kB pages)
+ */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#define MAX_PAGECACHE_ORDER	HPAGE_PMD_ORDER
+#else
+#define MAX_PAGECACHE_ORDER	8
+#endif
+
+/*
+ * mapping_set_folio_min_order() - Set the minimum folio order
+ * @mapping: The address_space.
+ * @min: Minimum folio order (between 0-MAX_PAGECACHE_ORDER inclusive).
+ *
+ * The filesystem should call this function in its inode constructor to
+ * indicate which base size of folio the VFS can use to cache the contents
+ * of the file.  This should only be used if the filesystem needs special
+ * handling of folio sizes (ie there is something the core cannot know).
+ * Do not tune it based on, eg, i_size.
+ *
+ * Context: This should not be called while the inode is active as it
+ * is non-atomic.
+ */
+static inline void mapping_set_folio_min_order(struct address_space *mapping,
+					       unsigned int min)
+{
+	if (min > MAX_PAGECACHE_ORDER)
+		min = MAX_PAGECACHE_ORDER;
+
+	mapping->flags = (mapping->flags & ~AS_FOLIO_ORDER_MASK) |
+			 (min << AS_FOLIO_ORDER_MIN) |
+			 (MAX_PAGECACHE_ORDER << AS_FOLIO_ORDER_MAX);
+}
+
 /**
  * mapping_set_large_folios() - Indicate the file supports large folios.
- * @mapping: The file.
+ * @mapping: The address_space.
  *
  * The filesystem should call this function in its inode constructor to
  * indicate that the VFS can use large folios to cache the contents of
@@ -357,7 +400,37 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
  */
 static inline void mapping_set_large_folios(struct address_space *mapping)
 {
-	__set_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+	mapping_set_folio_min_order(mapping, 0);
+}
+
+static inline unsigned int mapping_max_folio_order(struct address_space *mapping)
+{
+	return (mapping->flags & AS_FOLIO_ORDER_MAX_MASK) >> AS_FOLIO_ORDER_MAX;
+}
+
+static inline unsigned int mapping_min_folio_order(struct address_space *mapping)
+{
+	return (mapping->flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN;
+}
+
+static inline unsigned long mapping_min_folio_nrpages(struct address_space *mapping)
+{
+	return 1UL << mapping_min_folio_order(mapping);
+}
+
+/**
+ * mapping_align_start_index() - Align starting index based on the min
+ * folio order of the page cache.
+ * @mapping: The address_space.
+ *
+ * Ensure the index used is aligned to the minimum folio order when adding
+ * new folios to the page cache by rounding down to the nearest minimum
+ * folio number of pages.
+ */
+static inline pgoff_t mapping_align_start_index(struct address_space *mapping,
+						pgoff_t index)
+{
+	return round_down(index, mapping_min_folio_nrpages(mapping));
 }
 
 /*
@@ -367,7 +440,7 @@ static inline void mapping_set_large_folios(struct address_space *mapping)
 static inline bool mapping_large_folio_support(struct address_space *mapping)
 {
 	return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
-		test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags);
+	       (mapping_max_folio_order(mapping) > 0);
 }
 
 static inline int filemap_nr_thps(struct address_space *mapping)
@@ -528,19 +601,6 @@ static inline void *detach_page_private(struct page *page)
 	return folio_detach_private(page_folio(page));
 }
 
-/*
- * There are some parts of the kernel which assume that PMD entries
- * are exactly HPAGE_PMD_ORDER.  Those should be fixed, but until then,
- * limit the maximum allocation order to PMD size.  I'm not aware of any
- * assumptions about maximum order if THP are disabled, but 8 seems like
- * a good order (that's 1MB if you're using 4kB pages)
- */
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define MAX_PAGECACHE_ORDER	HPAGE_PMD_ORDER
-#else
-#define MAX_PAGECACHE_ORDER	8
-#endif
-
 #ifdef CONFIG_NUMA
 struct folio *filemap_alloc_folio(gfp_t gfp, unsigned int order);
 #else
-- 
2.43.0



* [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
  2024-02-26  9:49 ` [PATCH 01/13] mm: Support order-1 folios in the page cache Pankaj Raghav (Samsung)
  2024-02-26  9:49 ` [PATCH 02/13] fs: Allow fine-grained control of folio sizes Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26 14:40   ` Matthew Wilcox
  2024-02-26  9:49 ` [PATCH 04/13] filemap: use mapping_min_order while allocating folios Pankaj Raghav (Samsung)
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Pankaj Raghav

From: Luis Chamberlain <mcgrof@kernel.org>

Supporting mapping_min_order implies that we guarantee each folio in the
page cache has at least an order of mapping_min_order. So when adding new
folios to the page cache we must ensure the index used is aligned to
mapping_min_order, as the page cache requires the index to be aligned to
the order of the folio.

A folio of an order higher than min_order is by definition a multiple of
the min_order. If an index is aligned to an order higher than min_order,
it will also be aligned to the min order.

This effectively introduces no functional changes when min order is not
set, other than a few rounding computations that should result in the
same value.
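
As a minimal illustration (the numbers are only an example): with
mapping_min_order = 2, i.e. four pages per folio, an index of 5 is rounded
down to 4 before a folio is added at it:

	/* what mapping_align_start_index() boils down to */
	index = round_down(index, mapping_min_folio_nrpages(mapping));
	/* min order 2 -> 4 pages per folio, so index 5 becomes 4 */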

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 include/linux/pagemap.h |  8 ++++++++
 mm/filemap.c            | 22 +++++++++++++---------
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index fc8eb9c94e9c..fe8e1fbb667d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1328,6 +1328,14 @@ struct readahead_control {
 		._index = i,						\
 	}
 
+#define DEFINE_READAHEAD_ALIGNED(ractl, f, r, m, i)			\
+	struct readahead_control ractl = {				\
+		.file = f,						\
+		.mapping = m,						\
+		.ra = r,						\
+		._index = mapping_align_start_index(m, i),		\
+	}
+
 #define VM_READAHEAD_PAGES	(SZ_128K / PAGE_SIZE)
 
 void page_cache_ra_unbounded(struct readahead_control *,
diff --git a/mm/filemap.c b/mm/filemap.c
index 2b00442b9d19..bdf4f65f597c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2478,11 +2478,11 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
 	struct file *filp = iocb->ki_filp;
 	struct address_space *mapping = filp->f_mapping;
 	struct file_ra_state *ra = &filp->f_ra;
-	pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;
-	pgoff_t last_index;
+	pgoff_t index, last_index;
 	struct folio *folio;
 	int err = 0;
 
+	index = mapping_align_start_index(mapping, iocb->ki_pos >> PAGE_SHIFT);
 	/* "last_index" is the index of the page beyond the end of the read */
 	last_index = DIV_ROUND_UP(iocb->ki_pos + count, PAGE_SIZE);
 retry:
@@ -2500,8 +2500,7 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
 	if (!folio_batch_count(fbatch)) {
 		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
 			return -EAGAIN;
-		err = filemap_create_folio(filp, mapping,
-				iocb->ki_pos >> PAGE_SHIFT, fbatch);
+		err = filemap_create_folio(filp, mapping, index, fbatch);
 		if (err == AOP_TRUNCATED_PAGE)
 			goto retry;
 		return err;
@@ -3093,7 +3092,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 	struct file *file = vmf->vma->vm_file;
 	struct file_ra_state *ra = &file->f_ra;
 	struct address_space *mapping = file->f_mapping;
-	DEFINE_READAHEAD(ractl, file, ra, mapping, vmf->pgoff);
+	DEFINE_READAHEAD_ALIGNED(ractl, file, ra, mapping, vmf->pgoff);
 	struct file *fpin = NULL;
 	unsigned long vm_flags = vmf->vma->vm_flags;
 	unsigned int mmap_miss;
@@ -3147,7 +3146,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 	ra->start = max_t(long, 0, vmf->pgoff - ra->ra_pages / 2);
 	ra->size = ra->ra_pages;
 	ra->async_size = ra->ra_pages / 4;
-	ractl._index = ra->start;
+	ractl._index = mapping_align_start_index(mapping, ra->start);
 	page_cache_ra_order(&ractl, ra, 0);
 	return fpin;
 }
@@ -3162,7 +3161,7 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
 {
 	struct file *file = vmf->vma->vm_file;
 	struct file_ra_state *ra = &file->f_ra;
-	DEFINE_READAHEAD(ractl, file, ra, file->f_mapping, vmf->pgoff);
+	DEFINE_READAHEAD_ALIGNED(ractl, file, ra, file->f_mapping, vmf->pgoff);
 	struct file *fpin = NULL;
 	unsigned int mmap_miss;
 
@@ -3211,11 +3210,12 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	struct file *fpin = NULL;
 	struct address_space *mapping = file->f_mapping;
 	struct inode *inode = mapping->host;
-	pgoff_t max_idx, index = vmf->pgoff;
+	pgoff_t max_idx, index;
 	struct folio *folio;
 	vm_fault_t ret = 0;
 	bool mapping_locked = false;
 
+	index = mapping_align_start_index(mapping, vmf->pgoff);
 	max_idx = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
 	if (unlikely(index >= max_idx))
 		return VM_FAULT_SIGBUS;
@@ -3321,7 +3321,10 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		return VM_FAULT_SIGBUS;
 	}
 
-	vmf->page = folio_file_page(folio, index);
+	VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping),
+			folio);
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
 	return ret | VM_FAULT_LOCKED;
 
 page_not_uptodate:
@@ -3657,6 +3660,7 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
 	struct folio *folio;
 	int err;
 
+	index = mapping_align_start_index(mapping, index);
 	if (!filler)
 		filler = mapping->a_ops->read_folio;
 repeat:
-- 
2.43.0



* [PATCH 04/13] filemap: use mapping_min_order while allocating folios
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
                   ` (2 preceding siblings ...)
  2024-02-26  9:49 ` [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26 14:47   ` Matthew Wilcox
  2024-02-26  9:49 ` [PATCH 05/13] readahead: set file_ra_state->ra_pages to be at least mapping_min_order Pankaj Raghav (Samsung)
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Pankaj Raghav

From: Pankaj Raghav <p.raghav@samsung.com>

filemap_create_folio() and do_read_cache_folio() were always allocating
folios of order 0. __filemap_get_folio() was trying to allocate higher
order folios when fgp_flags had a higher order hint set, but it would fall
back to an order-0 folio if the higher order memory allocation failed.

As we bring in the notion of mapping_min_order, make sure these functions
allocate folios of at least mapping_min_order, as we need to guarantee it
in the page cache.

Add some additional VM_BUG_ON() checks in page_cache_delete[batch] and
__filemap_add_folio to catch errors where we delete or add folios that
have an order less than min_order.
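
As a small worked example of the new __filemap_get_folio() behaviour
(values assumed): with min_order = 2, a plain FGP_CREAT lookup with no
order hint in fgp_flags now starts from an order-2 allocation via
order = max(min_order, FGF_GET_ORDER(fgp_flags)), and the fallback loop
`while (order-- > min_order)` stops retrying at order 2 instead of
dropping all the way down to order 0.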

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 mm/filemap.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index bdf4f65f597c..4b144479c4cb 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -135,6 +135,8 @@ static void page_cache_delete(struct address_space *mapping,
 	xas_set_order(&xas, folio->index, folio_order(folio));
 	nr = folio_nr_pages(folio);
 
+	VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping),
+			folio);
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 
 	xas_store(&xas, shadow);
@@ -305,6 +307,8 @@ static void page_cache_delete_batch(struct address_space *mapping,
 
 		WARN_ON_ONCE(!folio_test_locked(folio));
 
+		VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping),
+				folio);
 		folio->mapping = NULL;
 		/* Leave folio->index set: truncation lookup relies on it */
 
@@ -896,6 +900,8 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 			}
 		}
 
+		VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping),
+				folio);
 		xas_store(&xas, folio);
 		if (xas_error(&xas))
 			goto unlock;
@@ -1847,6 +1853,9 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp)
 {
 	struct folio *folio;
+	unsigned int min_order = mapping_min_folio_order(mapping);
+
+	index = mapping_align_start_index(mapping, index);
 
 repeat:
 	folio = filemap_get_entry(mapping, index);
@@ -1886,7 +1895,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		folio_wait_stable(folio);
 no_page:
 	if (!folio && (fgp_flags & FGP_CREAT)) {
-		unsigned order = FGF_GET_ORDER(fgp_flags);
+		unsigned int order = max(min_order, FGF_GET_ORDER(fgp_flags));
 		int err;
 
 		if ((fgp_flags & FGP_WRITE) && mapping_can_writeback(mapping))
@@ -1912,8 +1921,13 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 			gfp_t alloc_gfp = gfp;
 
 			err = -ENOMEM;
+			if (order < min_order)
+				order = min_order;
 			if (order > 0)
 				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
+
+			VM_BUG_ON(index & ((1UL << order) - 1));
+
 			folio = filemap_alloc_folio(alloc_gfp, order);
 			if (!folio)
 				continue;
@@ -1927,7 +1941,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 				break;
 			folio_put(folio);
 			folio = NULL;
-		} while (order-- > 0);
+		} while (order-- > min_order);
 
 		if (err == -EEXIST)
 			goto repeat;
@@ -2422,7 +2436,8 @@ static int filemap_create_folio(struct file *file,
 	struct folio *folio;
 	int error;
 
-	folio = filemap_alloc_folio(mapping_gfp_mask(mapping), 0);
+	folio = filemap_alloc_folio(mapping_gfp_mask(mapping),
+				    mapping_min_folio_order(mapping));
 	if (!folio)
 		return -ENOMEM;
 
@@ -3666,7 +3681,8 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
 repeat:
 	folio = filemap_get_folio(mapping, index);
 	if (IS_ERR(folio)) {
-		folio = filemap_alloc_folio(gfp, 0);
+		folio = filemap_alloc_folio(gfp,
+					    mapping_min_folio_order(mapping));
 		if (!folio)
 			return ERR_PTR(-ENOMEM);
 		err = filemap_add_folio(mapping, folio, index, gfp);
-- 
2.43.0



* [PATCH 05/13] readahead: set file_ra_state->ra_pages to be at least mapping_min_order
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
                   ` (3 preceding siblings ...)
  2024-02-26  9:49 ` [PATCH 04/13] filemap: use mapping_min_order while allocating folios Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26 14:49   ` Matthew Wilcox
  2024-02-26  9:49 ` [PATCH 06/13] readahead: align index to mapping_min_order in ondemand_ra and force_ra Pankaj Raghav (Samsung)
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Pankaj Raghav

From: Luis Chamberlain <mcgrof@kernel.org>

Set the file_ra_state->ra_pages in file_ra_state_init() to be at least
mapping_min_order of pages if the bdi->ra_pages is less than that.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 mm/readahead.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/readahead.c b/mm/readahead.c
index 369c70e2be42..8a610b78d94b 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -138,7 +138,11 @@
 void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
 {
+	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
+
 	ra->ra_pages = inode_to_bdi(mapping->host)->ra_pages;
+	if (ra->ra_pages < min_nrpages)
+		ra->ra_pages = min_nrpages;
 	ra->prev_pos = -1;
 }
 EXPORT_SYMBOL_GPL(file_ra_state_init);
-- 
2.43.0



* [PATCH 06/13] readahead: align index to mapping_min_order in ondemand_ra and force_ra
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
                   ` (4 preceding siblings ...)
  2024-02-26  9:49 ` [PATCH 05/13] readahead: set file_ra_state->ra_pages to be at least mapping_min_order Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26  9:49 ` [PATCH 07/13] readahead: rework loop in page_cache_ra_unbounded() Pankaj Raghav (Samsung)
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Pankaj Raghav

From: Luis Chamberlain <mcgrof@kernel.org>

Align the ra->start and ra->size to mapping_min_order in
ondemand_readahead(), and align the index to mapping_min_order in
force_page_cache_ra(). This will ensure that the folios allocated for
readahead that are added to the page cache are aligned to
mapping_min_order.
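
A small worked example for force_page_cache_ra() (numbers assumed): with
min_nrpages = 4, a request for nr_to_read = 8 pages starting at index 5
becomes a request for 9 pages starting at index 4, so the originally
requested range is still covered:

	new_index = mapping_align_start_index(mapping, index);	/* 5 -> 4 */
	nr_to_read += index - new_index;			/* 8 -> 9 */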

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 mm/readahead.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 8a610b78d94b..325a25e4ee3a 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -313,7 +313,9 @@ void force_page_cache_ra(struct readahead_control *ractl,
 	struct address_space *mapping = ractl->mapping;
 	struct file_ra_state *ra = ractl->ra;
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
-	unsigned long max_pages, index;
+	unsigned long max_pages;
+	pgoff_t index, new_index;
+	unsigned long min_nrpages = mapping_min_folio_nrpages(mapping);
 
 	if (unlikely(!mapping->a_ops->read_folio && !mapping->a_ops->readahead))
 		return;
@@ -323,7 +325,14 @@ void force_page_cache_ra(struct readahead_control *ractl,
 	 * be up to the optimal hardware IO size
 	 */
 	index = readahead_index(ractl);
+	new_index = mapping_align_start_index(mapping, index);
+	if (new_index != index) {
+		nr_to_read += index - new_index;
+		index = new_index;
+	}
+
 	max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
+	max_pages = max_t(unsigned long, max_pages, min_nrpages);
 	nr_to_read = min_t(unsigned long, nr_to_read, max_pages);
 	while (nr_to_read) {
 		unsigned long this_chunk = (2 * 1024 * 1024) / PAGE_SIZE;
@@ -331,6 +340,7 @@ void force_page_cache_ra(struct readahead_control *ractl,
 		if (this_chunk > nr_to_read)
 			this_chunk = nr_to_read;
 		ractl->_index = index;
+		VM_BUG_ON(!IS_ALIGNED(index, min_nrpages));
 		do_page_cache_ra(ractl, this_chunk, 0);
 
 		index += this_chunk;
@@ -557,8 +567,11 @@ static void ondemand_readahead(struct readahead_control *ractl,
 	unsigned long add_pages;
 	pgoff_t index = readahead_index(ractl);
 	pgoff_t expected, prev_index;
-	unsigned int order = folio ? folio_order(folio) : 0;
+	unsigned int min_order = mapping_min_folio_order(ractl->mapping);
+	unsigned int min_nrpages = mapping_min_folio_nrpages(ractl->mapping);
+	unsigned int order = folio ? folio_order(folio) : min_order;
 
+	VM_BUG_ON(!IS_ALIGNED(index, min_nrpages));
 	/*
 	 * If the request exceeds the readahead window, allow the read to
 	 * be up to the optimal hardware IO size
@@ -580,7 +593,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
 			1UL << order);
 	if (index == expected || index == (ra->start + ra->size)) {
 		ra->start += ra->size;
-		ra->size = get_next_ra_size(ra, max_pages);
+		ra->size = max(get_next_ra_size(ra, max_pages), min_nrpages);
 		ra->async_size = ra->size;
 		goto readit;
 	}
@@ -605,7 +618,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
 		ra->start = start;
 		ra->size = start - index;	/* old async_size */
 		ra->size += req_size;
-		ra->size = get_next_ra_size(ra, max_pages);
+		ra->size = max(get_next_ra_size(ra, max_pages), min_nrpages);
 		ra->async_size = ra->size;
 		goto readit;
 	}
@@ -642,7 +655,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
 
 initial_readahead:
 	ra->start = index;
-	ra->size = get_init_ra_size(req_size, max_pages);
+	ra->size = max(min_nrpages, get_init_ra_size(req_size, max_pages));
 	ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
 
 readit:
@@ -653,7 +666,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
 	 * Take care of maximum IO pages as above.
 	 */
 	if (index == ra->start && ra->size == ra->async_size) {
-		add_pages = get_next_ra_size(ra, max_pages);
+		add_pages = max(get_next_ra_size(ra, max_pages), min_nrpages);
 		if (ra->size + add_pages <= max_pages) {
 			ra->async_size = add_pages;
 			ra->size += add_pages;
@@ -663,7 +676,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
 		}
 	}
 
-	ractl->_index = ra->start;
+	ractl->_index = mapping_align_start_index(ractl->mapping, ra->start);
 	page_cache_ra_order(ractl, ra, order);
 }
 
-- 
2.43.0



* [PATCH 07/13] readahead: rework loop in page_cache_ra_unbounded()
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
                   ` (5 preceding siblings ...)
  2024-02-26  9:49 ` [PATCH 06/13] readahead: align index to mapping_min_order in ondemand_ra and force_ra Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26  9:49 ` [PATCH 08/13] readahead: allocate folios with mapping_min_order in ra_(unbounded|order) Pankaj Raghav (Samsung)
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Pankaj Raghav

From: Hannes Reinecke <hare@suse.de>

Rework the loop in page_cache_ra_unbounded() to advance with
the number of pages in a folio instead of just one page at a time.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Co-developed-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
---
 mm/readahead.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 325a25e4ee3a..ef0004147952 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -212,7 +212,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	struct address_space *mapping = ractl->mapping;
 	unsigned long index = readahead_index(ractl);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
-	unsigned long i;
+	unsigned long i = 0;
 
 	/*
 	 * Partway through the readahead operation, we will have added
@@ -230,7 +230,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
-	for (i = 0; i < nr_to_read; i++) {
+	while (i < nr_to_read) {
 		struct folio *folio = xa_load(&mapping->i_pages, index + i);
 
 		if (folio && !xa_is_value(folio)) {
@@ -243,8 +243,8 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			 * not worth getting one just for that.
 			 */
 			read_pages(ractl);
-			ractl->_index++;
-			i = ractl->_index + ractl->_nr_pages - index - 1;
+			ractl->_index += folio_nr_pages(folio);
+			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
 
@@ -256,13 +256,14 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			folio_put(folio);
 			read_pages(ractl);
 			ractl->_index++;
-			i = ractl->_index + ractl->_nr_pages - index - 1;
+			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
 		if (i == nr_to_read - lookahead_size)
 			folio_set_readahead(folio);
 		ractl->_workingset |= folio_test_workingset(folio);
-		ractl->_nr_pages++;
+		ractl->_nr_pages += folio_nr_pages(folio);
+		i += folio_nr_pages(folio);
 	}
 
 	/*
-- 
2.43.0



* [PATCH 08/13] readahead: allocate folios with mapping_min_order in ra_(unbounded|order)
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
                   ` (6 preceding siblings ...)
  2024-02-26  9:49 ` [PATCH 07/13] readahead: rework loop in page_cache_ra_unbounded() Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26  9:49 ` [PATCH 09/13] mm: do not split a folio if it has minimum folio order requirement Pankaj Raghav (Samsung)
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Pankaj Raghav

From: Pankaj Raghav <p.raghav@samsung.com>

Allocate folios with at least mapping_min_order in
page_cache_ra_unbounded() and page_cache_ra_order() as we need to
guarantee a minimum order in the page cache.

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 mm/readahead.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index ef0004147952..73aef3f080ba 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -213,6 +213,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	unsigned long index = readahead_index(ractl);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
 	unsigned long i = 0;
+	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
 
 	/*
 	 * Partway through the readahead operation, we will have added
@@ -234,6 +235,8 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 		struct folio *folio = xa_load(&mapping->i_pages, index + i);
 
 		if (folio && !xa_is_value(folio)) {
+			long nr_pages = folio_nr_pages(folio);
+
 			/*
 			 * Page already present?  Kick off the current batch
 			 * of contiguous pages before continuing with the
@@ -243,19 +246,31 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 			 * not worth getting one just for that.
 			 */
 			read_pages(ractl);
-			ractl->_index += folio_nr_pages(folio);
+
+			/*
+			 * Move the ractl->_index by at least min_pages
+			 * if the folio got truncated to respect the
+			 * alignment constraint in the page cache.
+			 *
+			 */
+			if (mapping != folio->mapping)
+				nr_pages = min_nrpages;
+
+			VM_BUG_ON_FOLIO(nr_pages < min_nrpages, folio);
+			ractl->_index += nr_pages;
 			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
 
-		folio = filemap_alloc_folio(gfp_mask, 0);
+		folio = filemap_alloc_folio(gfp_mask,
+					    mapping_min_folio_order(mapping));
 		if (!folio)
 			break;
 		if (filemap_add_folio(mapping, folio, index + i,
 					gfp_mask) < 0) {
 			folio_put(folio);
 			read_pages(ractl);
-			ractl->_index++;
+			ractl->_index += min_nrpages;
 			i = ractl->_index + ractl->_nr_pages - index;
 			continue;
 		}
@@ -503,6 +518,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
 {
 	struct address_space *mapping = ractl->mapping;
 	pgoff_t index = readahead_index(ractl);
+	unsigned int min_order = mapping_min_folio_order(mapping);
 	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
 	pgoff_t mark = index + ra->size - ra->async_size;
 	int err = 0;
@@ -529,8 +545,13 @@ void page_cache_ra_order(struct readahead_control *ractl,
 		if (index & ((1UL << order) - 1))
 			order = __ffs(index);
 		/* Don't allocate pages past EOF */
-		while (index + (1UL << order) - 1 > limit)
+		while (order > min_order && index + (1UL << order) - 1 > limit)
 			order--;
+
+		if (order < min_order)
+			order = min_order;
+
+		VM_BUG_ON(index & ((1UL << order) - 1));
 		err = ra_alloc_folio(ractl, index, mark, order, gfp);
 		if (err)
 			break;
-- 
2.43.0



* [PATCH 09/13] mm: do not split a folio if it has minimum folio order requirement
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
                   ` (7 preceding siblings ...)
  2024-02-26  9:49 ` [PATCH 08/13] readahead: allocate folios with mapping_min_order in ra_(unbounded|order) Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26  9:49 ` [PATCH 10/13] iomap: fix iomap_dio_zero() for fs bs > system page size Pankaj Raghav (Samsung)
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Pankaj Raghav

From: Pankaj Raghav <p.raghav@samsung.com>

As we don't have a way to split a folio to any given lower folio order
yet, avoid splitting the folio in split_huge_page_to_list() if it has a
minimum folio order requirement.

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 mm/huge_memory.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 81fd1ba57088..6ec3417638a1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3030,6 +3030,19 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 			goto out;
 		}
 
+		/*
+		 * Do not split if mapping has minimum folio order
+		 * requirement.
+		 *
+		 * XXX: Once we have support for splitting to any lower
+		 * folio order, then it could be split based on the
+		 * min_folio_order.
+		 */
+		if (mapping_min_folio_order(mapping)) {
+			ret = -EAGAIN;
+			goto out;
+		}
+
 		gfp = current_gfp_context(mapping_gfp_mask(mapping) &
 							GFP_RECLAIM_MASK);
 
-- 
2.43.0



* [PATCH 10/13] iomap: fix iomap_dio_zero() for fs bs > system page size
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
                   ` (8 preceding siblings ...)
  2024-02-26  9:49 ` [PATCH 09/13] mm: do not split a folio if it has minimum folio order requirement Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26 17:58   ` Matthew Wilcox
  2024-02-26  9:49 ` [PATCH 11/13] xfs: expose block size in stat Pankaj Raghav (Samsung)
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Pankaj Raghav

From: Pankaj Raghav <p.raghav@samsung.com>

iomap_dio_zero() will pad an fs block with zeroes if the direct IO size is
less than the fs block size. iomap_dio_zero() has an implicit assumption
that fs block size < page size. This is true for most filesystems at the
moment.

If the block size > page size, this will send the contents of the pages
next to the zero page (as len > PAGE_SIZE) to the underlying block device,
causing FS corruption.

iomap is a generic infrastructure and it should not make any assumptions
about the fs block size and the page size of the system.
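
As a concrete example (sizes assumed): with a 16k block size on a 4k page
size system, a 4k direct write into an unwritten 16k block can leave
len = 12k to zero. A single __bio_add_page(bio, ZERO_PAGE(0), len, 0)
would then also map the two pages that happen to sit after the zero page
in memory; the loop added here instead adds three PAGE_SIZE chunks that
all point at ZERO_PAGE(0).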

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/iomap/direct-io.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index bcd3f8cf5ea4..04f6c5548136 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -239,14 +239,23 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 	struct page *page = ZERO_PAGE(0);
 	struct bio *bio;
 
-	bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
+	WARN_ON_ONCE(len > (BIO_MAX_VECS * PAGE_SIZE));
+
+	bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS,
+				  REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
 	fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
 				  GFP_KERNEL);
+
 	bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
 
-	__bio_add_page(bio, page, len, 0);
+	while (len) {
+		unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE);
+
+		__bio_add_page(bio, page, io_len, 0);
+		len -= io_len;
+	}
 	iomap_dio_submit_bio(iter, dio, bio, pos);
 }
 
-- 
2.43.0



* [PATCH 11/13] xfs: expose block size in stat
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
                   ` (9 preceding siblings ...)
  2024-02-26  9:49 ` [PATCH 10/13] iomap: fix iomap_dio_zero() for fs bs > system page size Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26 12:44   ` Dave Chinner
  2024-02-26  9:49 ` [PATCH 12/13] xfs: make the calculation generic in xfs_sb_validate_fsb_count() Pankaj Raghav (Samsung)
  2024-02-26  9:49 ` [PATCH 13/13] xfs: enable block size larger than page size support Pankaj Raghav (Samsung)
  12 siblings, 1 reply; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Dave Chinner

From: Dave Chinner <dchinner@redhat.com>

For block size larger than page size, the unit of efficient IO is
the block size, not the page size. Leaving stat() to report
PAGE_SIZE as the block size causes test programs like fsx to issue
illegal ranges for operations that require block size alignment
(e.g. fallocate() insert range). Hence update the preferred IO size
to reflect the block size in this case.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
dd2d535e3fb29d ("xfs: cleanup calculating the stat optimal I/O size")]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 fs/xfs/xfs_iops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a0d77f5f512e..1b4edfad464f 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -543,7 +543,7 @@ xfs_stat_blksize(
 			return 1U << mp->m_allocsize_log;
 	}
 
-	return PAGE_SIZE;
+	return max_t(unsigned long, PAGE_SIZE, mp->m_sb.sb_blocksize);
 }
 
 STATIC int
-- 
2.43.0



* [PATCH 12/13] xfs: make the calculation generic in xfs_sb_validate_fsb_count()
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
                   ` (10 preceding siblings ...)
  2024-02-26  9:49 ` [PATCH 11/13] xfs: expose block size in stat Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26 12:47   ` Dave Chinner
  2024-02-26 13:21   ` Matthew Wilcox
  2024-02-26  9:49 ` [PATCH 13/13] xfs: enable block size larger than page size support Pankaj Raghav (Samsung)
  12 siblings, 2 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Pankaj Raghav

From: Pankaj Raghav <p.raghav@samsung.com>

Instead of assuming that PAGE_SHIFT is always higher than the blocklog,
make the calculation generic so that page cache count can be calculated
correctly for LBS.
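
For example (sizes assumed): with a 16k block size (sb_blocklog = 14) on a
4k page size system (PAGE_SHIFT = 12), the old expression
nblocks >> (PAGE_SHIFT - sbp->sb_blocklog) would shift by a negative
count, which is undefined behaviour. Computing the byte count first (with
overflow checking) and then shifting right by PAGE_SHIFT gives the page
cache index limit for any combination of block size and page size.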

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
---
 fs/xfs/xfs_mount.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index aabb25dc3efa..69af3b06be99 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -133,9 +133,15 @@ xfs_sb_validate_fsb_count(
 {
 	ASSERT(PAGE_SHIFT >= sbp->sb_blocklog);
 	ASSERT(sbp->sb_blocklog >= BBSHIFT);
+	uint64_t mapping_count;
+	uint64_t bytes;
 
+	if (check_mul_overflow(nblocks, (1 << sbp->sb_blocklog), &bytes))
+		return -EFBIG;
+
+	mapping_count = bytes >> PAGE_SHIFT;
 	/* Limited by ULONG_MAX of page cache index */
-	if (nblocks >> (PAGE_SHIFT - sbp->sb_blocklog) > ULONG_MAX)
+	if (mapping_count > ULONG_MAX)
 		return -EFBIG;
 	return 0;
 }
-- 
2.43.0



* [PATCH 13/13] xfs: enable block size larger than page size support
  2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
                   ` (11 preceding siblings ...)
  2024-02-26  9:49 ` [PATCH 12/13] xfs: make the calculation generic in xfs_sb_validate_fsb_count() Pankaj Raghav (Samsung)
@ 2024-02-26  9:49 ` Pankaj Raghav (Samsung)
  2024-02-26 13:26   ` Matthew Wilcox
  12 siblings, 1 reply; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-26  9:49 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, david, chandan.babu, akpm, mcgrof, ziy, hare,
	djwong, gost.dev, linux-mm, willy, Pankaj Raghav

From: Pankaj Raghav <p.raghav@samsung.com>

The page cache now has the ability to have a minimum order when allocating
a folio, which is a prerequisite for adding support for block size > page
size.
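
For example (sizes assumed): a filesystem with a 16k block size mounted on
a 4k page size system gets min_folio_order = sb_blocklog - PAGE_SHIFT =
14 - 12 = 2, so every folio the page cache allocates for its inodes spans
at least four pages, i.e. one filesystem block.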

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 fs/xfs/libxfs/xfs_ialloc.c |  5 +++++
 fs/xfs/libxfs/xfs_shared.h |  3 +++
 fs/xfs/xfs_icache.c        |  6 ++++--
 fs/xfs/xfs_mount.c         |  1 -
 fs/xfs/xfs_super.c         | 10 ++--------
 5 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 2361a22035b0..c040bd6271fd 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -2892,6 +2892,11 @@ xfs_ialloc_setup_geometry(
 		igeo->ialloc_align = mp->m_dalign;
 	else
 		igeo->ialloc_align = 0;
+
+	if (mp->m_sb.sb_blocksize > PAGE_SIZE)
+		igeo->min_folio_order = mp->m_sb.sb_blocklog - PAGE_SHIFT;
+	else
+		igeo->min_folio_order = 0;
 }
 
 /* Compute the location of the root directory inode that is laid out by mkfs. */
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 4220d3584c1b..67ed406e7a81 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -188,6 +188,9 @@ struct xfs_ino_geometry {
 	/* precomputed value for di_flags2 */
 	uint64_t	new_diflags2;
 
+	/* minimum folio order of a page cache allocation */
+	unsigned int	min_folio_order;
+
 };
 
 #endif /* __XFS_SHARED_H__ */
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index dba514a2c84d..a1857000e2cd 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -88,7 +88,8 @@ xfs_inode_alloc(
 	/* VFS doesn't initialise i_mode or i_state! */
 	VFS_I(ip)->i_mode = 0;
 	VFS_I(ip)->i_state = 0;
-	mapping_set_large_folios(VFS_I(ip)->i_mapping);
+	mapping_set_folio_min_order(VFS_I(ip)->i_mapping,
+				    M_IGEO(mp)->min_folio_order);
 
 	XFS_STATS_INC(mp, vn_active);
 	ASSERT(atomic_read(&ip->i_pincount) == 0);
@@ -323,7 +324,8 @@ xfs_reinit_inode(
 	inode->i_rdev = dev;
 	inode->i_uid = uid;
 	inode->i_gid = gid;
-	mapping_set_large_folios(inode->i_mapping);
+	mapping_set_folio_min_order(inode->i_mapping,
+				    M_IGEO(mp)->min_folio_order);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 69af3b06be99..c7df1857195c 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -131,7 +131,6 @@ xfs_sb_validate_fsb_count(
 	xfs_sb_t	*sbp,
 	uint64_t	nblocks)
 {
-	ASSERT(PAGE_SHIFT >= sbp->sb_blocklog);
 	ASSERT(sbp->sb_blocklog >= BBSHIFT);
 	uint64_t mapping_count;
 	uint64_t bytes;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 5a2512d20bd0..685ce7bf7324 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1625,16 +1625,10 @@ xfs_fs_fill_super(
 		goto out_free_sb;
 	}
 
-	/*
-	 * Until this is fixed only page-sized or smaller data blocks work.
-	 */
 	if (mp->m_sb.sb_blocksize > PAGE_SIZE) {
 		xfs_warn(mp,
-		"File system with blocksize %d bytes. "
-		"Only pagesize (%ld) or less will currently work.",
-				mp->m_sb.sb_blocksize, PAGE_SIZE);
-		error = -ENOSYS;
-		goto out_free_sb;
+"EXPERIMENTAL: Filesystem with Large Block Size (%d bytes) enabled.",
+			mp->m_sb.sb_blocksize);
 	}
 
 	/* Ensure this filesystem fits in the page cache limits */
-- 
2.43.0



* Re: [PATCH 11/13] xfs: expose block size in stat
  2024-02-26  9:49 ` [PATCH 11/13] xfs: expose block size in stat Pankaj Raghav (Samsung)
@ 2024-02-26 12:44   ` Dave Chinner
  2024-02-27  8:53     ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2024-02-26 12:44 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: linux-xfs, linux-fsdevel, linux-kernel, chandan.babu, akpm,
	mcgrof, ziy, hare, djwong, gost.dev, linux-mm, willy,
	Dave Chinner

On Mon, Feb 26, 2024 at 10:49:34AM +0100, Pankaj Raghav (Samsung) wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> For block size larger than page size, the unit of efficient IO is
> the block size, not the page size. Leaving stat() to report
> PAGE_SIZE as the block size causes test programs like fsx to issue
> illegal ranges for operations that require block size alignment
> (e.g. fallocate() insert range). Hence update the preferred IO size
> to reflect the block size in this case.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> dd2d535e3fb29d ("xfs: cleanup calculating the stat optimal I/O size")]
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Something screwed up there, and you haven't put your own SOB on
this.

> ---
>  fs/xfs/xfs_iops.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index a0d77f5f512e..1b4edfad464f 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -543,7 +543,7 @@ xfs_stat_blksize(
>  			return 1U << mp->m_allocsize_log;
>  	}
>  
> -	return PAGE_SIZE;
> +	return max_t(unsigned long, PAGE_SIZE, mp->m_sb.sb_blocksize);
>  }

This function returns a uint32_t, same type as
mp->m_sb.sb_blocksize. The comparison should use uint32_t casts,
not unsigned long.

Also, this bears no resemblance to the original patch I wrote back in
2018. Please remove my SOB from it - you can state that "this change
is based on a patch originally from Dave Chinner" to credit the
history of it, but it's certainly not the patch I wrote 6 years ago
and so my SOB does not belong on it.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 12/13] xfs: make the calculation generic in xfs_sb_validate_fsb_count()
  2024-02-26  9:49 ` [PATCH 12/13] xfs: make the calculation generic in xfs_sb_validate_fsb_count() Pankaj Raghav (Samsung)
@ 2024-02-26 12:47   ` Dave Chinner
  2024-02-26 13:21   ` Matthew Wilcox
  1 sibling, 0 replies; 35+ messages in thread
From: Dave Chinner @ 2024-02-26 12:47 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: linux-xfs, linux-fsdevel, linux-kernel, chandan.babu, akpm,
	mcgrof, ziy, hare, djwong, gost.dev, linux-mm, willy,
	Pankaj Raghav

On Mon, Feb 26, 2024 at 10:49:35AM +0100, Pankaj Raghav (Samsung) wrote:
> From: Pankaj Raghav <p.raghav@samsung.com>
> 
> Instead of assuming that PAGE_SHIFT is always higher than the blocklog,
> make the calculation generic so that page cache count can be calculated
> correctly for LBS.
> 
> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
> ---
>  fs/xfs/xfs_mount.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index aabb25dc3efa..69af3b06be99 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -133,9 +133,15 @@ xfs_sb_validate_fsb_count(
>  {
>  	ASSERT(PAGE_SHIFT >= sbp->sb_blocklog);
>  	ASSERT(sbp->sb_blocklog >= BBSHIFT);
> +	uint64_t mapping_count;
> +	uint64_t bytes;
>  
> +	if (check_mul_overflow(nblocks, (1 << sbp->sb_blocklog), &bytes))
> +		return -EFBIG;
> +
> +	mapping_count = bytes >> PAGE_SHIFT;

max_index, not a "mapping count". Also, put this after this comment:

>  	/* Limited by ULONG_MAX of page cache index */

So it is obvious what the max_index we are calculating belongs to.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 12/13] xfs: make the calculation generic in xfs_sb_validate_fsb_count()
  2024-02-26  9:49 ` [PATCH 12/13] xfs: make the calculation generic in xfs_sb_validate_fsb_count() Pankaj Raghav (Samsung)
  2024-02-26 12:47   ` Dave Chinner
@ 2024-02-26 13:21   ` Matthew Wilcox
  2024-02-27  8:44     ` Pankaj Raghav (Samsung)
  1 sibling, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2024-02-26 13:21 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: linux-xfs, linux-fsdevel, linux-kernel, david, chandan.babu,
	akpm, mcgrof, ziy, hare, djwong, gost.dev, linux-mm,
	Pankaj Raghav

On Mon, Feb 26, 2024 at 10:49:35AM +0100, Pankaj Raghav (Samsung) wrote:
> +	if (check_mul_overflow(nblocks, (1 << sbp->sb_blocklog), &bytes))

Why would you not use check_shl_overflow()?

> +		return -EFBIG;
> +
> +	mapping_count = bytes >> PAGE_SHIFT;
>  	/* Limited by ULONG_MAX of page cache index */
> -	if (nblocks >> (PAGE_SHIFT - sbp->sb_blocklog) > ULONG_MAX)
> +	if (mapping_count > ULONG_MAX)
>  		return -EFBIG;
>  	return 0;
>  }
> -- 
> 2.43.0
> 


* Re: [PATCH 13/13] xfs: enable block size larger than page size support
  2024-02-26  9:49 ` [PATCH 13/13] xfs: enable block size larger than page size support Pankaj Raghav (Samsung)
@ 2024-02-26 13:26   ` Matthew Wilcox
  2024-02-26 21:18     ` Dave Chinner
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2024-02-26 13:26 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: linux-xfs, linux-fsdevel, linux-kernel, david, chandan.babu,
	akpm, mcgrof, ziy, hare, djwong, gost.dev, linux-mm,
	Pankaj Raghav

On Mon, Feb 26, 2024 at 10:49:36AM +0100, Pankaj Raghav (Samsung) wrote:
> @@ -1625,16 +1625,10 @@ xfs_fs_fill_super(
>  		goto out_free_sb;
>  	}
>  
> -	/*
> -	 * Until this is fixed only page-sized or smaller data blocks work.
> -	 */
>  	if (mp->m_sb.sb_blocksize > PAGE_SIZE) {
>  		xfs_warn(mp,
> -		"File system with blocksize %d bytes. "
> -		"Only pagesize (%ld) or less will currently work.",
> -				mp->m_sb.sb_blocksize, PAGE_SIZE);
> -		error = -ENOSYS;
> -		goto out_free_sb;
> +"EXPERIMENTAL: Filesystem with Large Block Size (%d bytes) enabled.",
> +			mp->m_sb.sb_blocksize);

WARN seems a little high for this.  xfs_notice() or xfs_info() would
seem more appropriate:

#define KERN_WARNING    KERN_SOH "4"    /* warning conditions */
#define KERN_NOTICE     KERN_SOH "5"    /* normal but significant condition */
#define KERN_INFO       KERN_SOH "6"    /* informational */



* Re: [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache
  2024-02-26  9:49 ` [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache Pankaj Raghav (Samsung)
@ 2024-02-26 14:40   ` Matthew Wilcox
  2024-02-27 10:06     ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2024-02-26 14:40 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: linux-xfs, linux-fsdevel, linux-kernel, david, chandan.babu,
	akpm, mcgrof, ziy, hare, djwong, gost.dev, linux-mm,
	Pankaj Raghav

On Mon, Feb 26, 2024 at 10:49:26AM +0100, Pankaj Raghav (Samsung) wrote:
> From: Luis Chamberlain <mcgrof@kernel.org>
> 
> Supporting mapping_min_order implies that we guarantee each folio in the
> page cache has at least an order of mapping_min_order. So when adding new
> folios to the page cache we must ensure the index used is aligned to the
> mapping_min_order as the page cache requires the index to be aligned to
> the order of the folio.

This seems like a remarkably complicated way of achieving:

diff --git a/mm/filemap.c b/mm/filemap.c
index 5603ced05fb7..36105dad4440 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2427,9 +2427,11 @@ static int filemap_update_page(struct kiocb *iocb,
 }
 
 static int filemap_create_folio(struct file *file,
-		struct address_space *mapping, pgoff_t index,
+		struct address_space *mapping, loff_t pos,
 		struct folio_batch *fbatch)
 {
+	pgoff_t index;
+	unsigned int min_order;
 	struct folio *folio;
 	int error;
 
@@ -2451,6 +2453,8 @@ static int filemap_create_folio(struct file *file,
 	 * well to keep locking rules simple.
 	 */
 	filemap_invalidate_lock_shared(mapping);
+	min_order = mapping_min_folio_order(mapping);
+	index = (pos >> (min_order + PAGE_SHIFT)) << min_order;
 	error = filemap_add_folio(mapping, folio, index,
 			mapping_gfp_constraint(mapping, GFP_KERNEL));
 	if (error == -EEXIST)
@@ -2511,8 +2515,7 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
 	if (!folio_batch_count(fbatch)) {
 		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
 			return -EAGAIN;
-		err = filemap_create_folio(filp, mapping,
-				iocb->ki_pos >> PAGE_SHIFT, fbatch);
+		err = filemap_create_folio(filp, mapping, iocb->ki_pos, fbatch);
 		if (err == AOP_TRUNCATED_PAGE)
 			goto retry;
 		return err;


* Re: [PATCH 04/13] filemap: use mapping_min_order while allocating folios
  2024-02-26  9:49 ` [PATCH 04/13] filemap: use mapping_min_order while allocating folios Pankaj Raghav (Samsung)
@ 2024-02-26 14:47   ` Matthew Wilcox
  2024-02-27 12:09     ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2024-02-26 14:47 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: linux-xfs, linux-fsdevel, linux-kernel, david, chandan.babu,
	akpm, mcgrof, ziy, hare, djwong, gost.dev, linux-mm,
	Pankaj Raghav

On Mon, Feb 26, 2024 at 10:49:27AM +0100, Pankaj Raghav (Samsung) wrote:
> Add some additional VM_BUG_ON() in page_cache_delete[batch] and
> __filemap_add_folio to catch errors where we delete or add folios that
> has order less than min_order.

I don't understand why we need these checks in the deletion path.  The
add path, yes, absolutely.  But the delete path?

> @@ -896,6 +900,8 @@ noinline int __filemap_add_folio(struct address_space *mapping,
>  			}
>  		}
>  
> +		VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping),
> +				folio);

But I don't understand why you put it here, while we're holding the
xa_lock.  That seems designed to cause maximum disruption.  Why not put
it at the beginning of the function with all the other VM_BUG_ON_FOLIO?

> @@ -1847,6 +1853,9 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
>  		fgf_t fgp_flags, gfp_t gfp)
>  {
>  	struct folio *folio;
> +	unsigned int min_order = mapping_min_folio_order(mapping);
> +
> +	index = mapping_align_start_index(mapping, index);

I would not do this here.

>  repeat:
>  	folio = filemap_get_entry(mapping, index);
> @@ -1886,7 +1895,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
>  		folio_wait_stable(folio);
>  no_page:
>  	if (!folio && (fgp_flags & FGP_CREAT)) {
> -		unsigned order = FGF_GET_ORDER(fgp_flags);
> +		unsigned int order = max(min_order, FGF_GET_ORDER(fgp_flags));
>  		int err;

Put it here instead.

>  		if ((fgp_flags & FGP_WRITE) && mapping_can_writeback(mapping))
> @@ -1912,8 +1921,13 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
>  			gfp_t alloc_gfp = gfp;
>  
>  			err = -ENOMEM;
> +			if (order < min_order)
> +				order = min_order;
>  			if (order > 0)
>  				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
> +
> +			VM_BUG_ON(index & ((1UL << order) - 1));

Then you don't need this BUG_ON because it's obvious you just did it.
And the one in filemap_add_folio() would catch it anyway.



* Re: [PATCH 05/13] readahead: set file_ra_state->ra_pages to be at least mapping_min_order
  2024-02-26  9:49 ` [PATCH 05/13] readahead: set file_ra_state->ra_pages to be at least mapping_min_order Pankaj Raghav (Samsung)
@ 2024-02-26 14:49   ` Matthew Wilcox
  2024-02-27 12:42     ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2024-02-26 14:49 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: linux-xfs, linux-fsdevel, linux-kernel, david, chandan.babu,
	akpm, mcgrof, ziy, hare, djwong, gost.dev, linux-mm,
	Pankaj Raghav

On Mon, Feb 26, 2024 at 10:49:28AM +0100, Pankaj Raghav (Samsung) wrote:
> From: Luis Chamberlain <mcgrof@kernel.org>
> 
> Set the file_ra_state->ra_pages in file_ra_state_init() to be at least
> mapping_min_order of pages if the bdi->ra_pages is less than that.

Don't we rather want to round up to a multiple of mapping_min_nrpages?

>  file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
>  {
> +	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
> +
>  	ra->ra_pages = inode_to_bdi(mapping->host)->ra_pages;
> +	if (ra->ra_pages < min_nrpages)
> +		ra->ra_pages = min_nrpages;
>  	ra->prev_pos = -1;


* Re: [PATCH 10/13] iomap: fix iomap_dio_zero() for fs bs > system page size
  2024-02-26  9:49 ` [PATCH 10/13] iomap: fix iomap_dio_zero() for fs bs > system page size Pankaj Raghav (Samsung)
@ 2024-02-26 17:58   ` Matthew Wilcox
  2024-02-27  9:33     ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Wilcox @ 2024-02-26 17:58 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: linux-xfs, linux-fsdevel, linux-kernel, david, chandan.babu,
	akpm, mcgrof, ziy, hare, djwong, gost.dev, linux-mm,
	Pankaj Raghav

On Mon, Feb 26, 2024 at 10:49:33AM +0100, Pankaj Raghav (Samsung) wrote:
> +++ b/fs/iomap/direct-io.c
> @@ -239,14 +239,23 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
>  	struct page *page = ZERO_PAGE(0);
>  	struct bio *bio;
>  
> -	bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
> +	WARN_ON_ONCE(len > (BIO_MAX_VECS * PAGE_SIZE));
> +
> +	bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS,
> +				  REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
>  	fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
>  				  GFP_KERNEL);
> +
>  	bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
>  	bio->bi_private = dio;
>  	bio->bi_end_io = iomap_dio_bio_end_io;
>  
> -	__bio_add_page(bio, page, len, 0);
> +	while (len) {
> +		unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE);
> +
> +		__bio_add_page(bio, page, io_len, 0);
> +		len -= io_len;
> +	}

I thought we were going to use the huge_zero_page for this?


* Re: [PATCH 13/13] xfs: enable block size larger than page size support
  2024-02-26 13:26   ` Matthew Wilcox
@ 2024-02-26 21:18     ` Dave Chinner
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Chinner @ 2024-02-26 21:18 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Pankaj Raghav (Samsung),
	linux-xfs, linux-fsdevel, linux-kernel, chandan.babu, akpm,
	mcgrof, ziy, hare, djwong, gost.dev, linux-mm, Pankaj Raghav

On Mon, Feb 26, 2024 at 01:26:30PM +0000, Matthew Wilcox wrote:
> On Mon, Feb 26, 2024 at 10:49:36AM +0100, Pankaj Raghav (Samsung) wrote:
> > @@ -1625,16 +1625,10 @@ xfs_fs_fill_super(
> >  		goto out_free_sb;
> >  	}
> >  
> > -	/*
> > -	 * Until this is fixed only page-sized or smaller data blocks work.
> > -	 */
> >  	if (mp->m_sb.sb_blocksize > PAGE_SIZE) {
> >  		xfs_warn(mp,
> > -		"File system with blocksize %d bytes. "
> > -		"Only pagesize (%ld) or less will currently work.",
> > -				mp->m_sb.sb_blocksize, PAGE_SIZE);
> > -		error = -ENOSYS;
> > -		goto out_free_sb;
> > +"EXPERIMENTAL: Filesystem with Large Block Size (%d bytes) enabled.",
> > +			mp->m_sb.sb_blocksize);
> 
> WARN seems a little high for this.  xfs_notice() or xfs_info() would
> seem more appropriate:

Nope, warning level is correct and consistent with what we've used
for these experimental warnings.

	xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");

i.e. a message that says "Expect things not to work correctly in
your filesystem" is definitely worth warning level messaging.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 12/13] xfs: make the calculation generic in xfs_sb_validate_fsb_count()
  2024-02-26 13:21   ` Matthew Wilcox
@ 2024-02-27  8:44     ` Pankaj Raghav (Samsung)
  0 siblings, 0 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-27  8:44 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-fsdevel, linux-kernel, david, chandan.babu,
	akpm, mcgrof, ziy, hare, djwong, gost.dev, linux-mm,
	Pankaj Raghav

On Mon, Feb 26, 2024 at 01:21:56PM +0000, Matthew Wilcox wrote:
> On Mon, Feb 26, 2024 at 10:49:35AM +0100, Pankaj Raghav (Samsung) wrote:
> > +	if (check_mul_overflow(nblocks, (1 << sbp->sb_blocklog), &bytes))
> 
> Why would you not use check_shl_overflow()?

This looks better than check_mul_overflow. I will use this in the next
version.
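
Something along these lines then (a rough, untested sketch, also folding
in Dave's suggestion to name it max_index):

	uint64_t	max_bytes;
	uint64_t	max_index;

	if (check_shl_overflow(nblocks, sbp->sb_blocklog, &max_bytes))
		return -EFBIG;

	/* Limited by ULONG_MAX of page cache index */
	max_index = max_bytes >> PAGE_SHIFT;
	if (max_index > ULONG_MAX)
		return -EFBIG;
	return 0;
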
> 
> > +		return -EFBIG;
> > +
> > +	mapping_count = bytes >> PAGE_SHIFT;
> >  	/* Limited by ULONG_MAX of page cache index */
> > -	if (nblocks >> (PAGE_SHIFT - sbp->sb_blocklog) > ULONG_MAX)
> > +	if (mapping_count > ULONG_MAX)
> >  		return -EFBIG;
> >  	return 0;
> >  }
> > -- 
> > 2.43.0
> > 


* Re: [PATCH 11/13] xfs: expose block size in stat
  2024-02-26 12:44   ` Dave Chinner
@ 2024-02-27  8:53     ` Pankaj Raghav (Samsung)
  0 siblings, 0 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-27  8:53 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-xfs, linux-fsdevel, linux-kernel, chandan.babu, akpm,
	mcgrof, ziy, hare, djwong, gost.dev, linux-mm, willy,
	Dave Chinner

On Mon, Feb 26, 2024 at 11:44:16PM +1100, Dave Chinner wrote:
> On Mon, Feb 26, 2024 at 10:49:34AM +0100, Pankaj Raghav (Samsung) wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > For block size larger than page size, the unit of efficient IO is
> > the block size, not the page size. Leaving stat() to report
> > PAGE_SIZE as the block size causes test programs like fsx to issue
> > illegal ranges for operations that require block size alignment
> > (e.g. fallocate() insert range). Hence update the preferred IO size
> > to reflect the block size in this case.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > dd2d535e3fb29d ("xfs: cleanup calculating the stat optimal I/O size")]
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> 
> Something screwed up there, and you haven't put your own SOB on
> this.
Oops. I will add it.

> 
> > ---
> >  fs/xfs/xfs_iops.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index a0d77f5f512e..1b4edfad464f 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -543,7 +543,7 @@ xfs_stat_blksize(
> >  			return 1U << mp->m_allocsize_log;
> >  	}
> >  
> > -	return PAGE_SIZE;
> > +	return max_t(unsigned long, PAGE_SIZE, mp->m_sb.sb_blocksize);
> >  }
> 
> This function returns a uint32_t, same type as
> mp->m_sb.sb_blocksize. The comparison should use uint32_t casts,
> not unsigned long.
> 
Yeah. Something like this instead of using unsigned long:

return max_t(uint32_t, PAGE_SIZE, mp->m_sb.sb_blocksize);

> Also, this bears no resemblance to the original patch I wrote back in
> 2018. Please remove my SOB from it - you can state that "this change
> is based on a patch originally from Dave Chinner" to credit the
> history of it, but it's certainly not the patch I wrote 6 years ago
> and so my SOB does not belong on it.
Ok.

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 10/13] iomap: fix iomap_dio_zero() for fs bs > system page size
  2024-02-26 17:58   ` Matthew Wilcox
@ 2024-02-27  9:33     ` Pankaj Raghav (Samsung)
  0 siblings, 0 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-27  9:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-fsdevel, linux-kernel, david, chandan.babu,
	akpm, mcgrof, ziy, hare, djwong, gost.dev, linux-mm,
	Pankaj Raghav

> I thought we were going to use the huge_zero_page for this?

Yes. We discussed that getting the huge_zero_page might fail, so we
concluded that we need an API that can return a folio of arbitrary order
and that will not fail:
```
your point about it possibly failing is correct.  so i think we need an
api which definitely returns a folio, but it might be of arbitrary
order.
```

I couldn't come up with an implementation of your latter suggestion, so
I told Darrick we should use this patch for now, and add the
arbitrary-order zero folio as a later enhancement.

If we want to use mm_huge_zero_page, then this should work:

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 04f6c5548136..b6a3f52f48da 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -237,10 +237,17 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 {
        struct inode *inode = file_inode(dio->iocb->ki_filp);
        struct page *page = ZERO_PAGE(0);
+       struct folio *folio = NULL;
        struct bio *bio;
 
        WARN_ON_ONCE(len > (BIO_MAX_VECS * PAGE_SIZE));
 
+       if (len > PAGE_SIZE) {
+               page = mm_get_huge_zero_page(current->mm);
+               if (!page)
+                       page = ZERO_PAGE(0);
+       }
+
        bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS,
                                  REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
        fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
@@ -249,13 +256,15 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
        bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
        bio->bi_private = dio;
        bio->bi_end_io = iomap_dio_bio_end_io;
+       folio = page_folio(page);
 
        while (len) {
-               unsigned int io_len = min_t(unsigned int, len, PAGE_SIZE);
+               size_t size = min(len, folio_size(folio));
 
-               __bio_add_page(bio, page, io_len, 0);
-               len -= io_len;
+               bio_add_folio_nofail(bio, folio, size, 0);
+               len -= size;
        }
+
        iomap_dio_submit_bio(iter, dio, bio, pos);
 }

Let me know if we should go with this, or keep the original patch and
add a ZERO_FOLIO_ORDER API that cannot fail as a later enhancement.


* Re: [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache
  2024-02-26 14:40   ` Matthew Wilcox
@ 2024-02-27 10:06     ` Pankaj Raghav (Samsung)
  2024-02-27 16:22       ` Kent Overstreet
  0 siblings, 1 reply; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-27 10:06 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-fsdevel, linux-kernel, david, chandan.babu,
	akpm, mcgrof, ziy, hare, djwong, gost.dev, linux-mm,
	Pankaj Raghav

On Mon, Feb 26, 2024 at 02:40:42PM +0000, Matthew Wilcox wrote:
> On Mon, Feb 26, 2024 at 10:49:26AM +0100, Pankaj Raghav (Samsung) wrote:
> > From: Luis Chamberlain <mcgrof@kernel.org>
> > 
> > Supporting mapping_min_order implies that we guarantee each folio in the
> > page cache has at least an order of mapping_min_order. So when adding new
> > folios to the page cache we must ensure the index used is aligned to the
> > mapping_min_order as the page cache requires the index to be aligned to
> > the order of the folio.
> 
> This seems like a remarkably complicated way of achieving:
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 5603ced05fb7..36105dad4440 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2427,9 +2427,11 @@ static int filemap_update_page(struct kiocb *iocb,
>  }
>  
>  static int filemap_create_folio(struct file *file,
> -		struct address_space *mapping, pgoff_t index,
> +		struct address_space *mapping, loff_t pos,
>  		struct folio_batch *fbatch)
>  {
> +	pgoff_t index;
> +	unsigned int min_order;
>  	struct folio *folio;
>  	int error;
>  
> @@ -2451,6 +2453,8 @@ static int filemap_create_folio(struct file *file,
>  	 * well to keep locking rules simple.
>  	 */
>  	filemap_invalidate_lock_shared(mapping);
> +	min_order = mapping_min_folio_order(mapping);
> +	index = (pos >> (min_order + PAGE_SHIFT)) << min_order;

That is some cool mathfu. I will add a comment here as it might not be
that obvious to some people (i.e me).
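
Something like this, perhaps (just a sketch of the comment I have in
mind; mapping_align_start_index() is a round_down(), so the two forms
are equivalent):

	/*
	 * pos >> (min_order + PAGE_SHIFT) is the offset in units of
	 * min-order folios; shifting it back up by min_order gives the
	 * page index rounded down to a min_order boundary, i.e. the
	 * same as round_down(pos >> PAGE_SHIFT, 1UL << min_order).
	 */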

Thanks.

>  	error = filemap_add_folio(mapping, folio, index,
>  			mapping_gfp_constraint(mapping, GFP_KERNEL));
>  	if (error == -EEXIST)
> @@ -2511,8 +2515,7 @@ static int filemap_get_pages(struct kiocb *iocb, size_t count,
>  	if (!folio_batch_count(fbatch)) {
>  		if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
>  			return -EAGAIN;
> -		err = filemap_create_folio(filp, mapping,
> -				iocb->ki_pos >> PAGE_SHIFT, fbatch);
> +		err = filemap_create_folio(filp, mapping, iocb->ki_pos, fbatch);
>  		if (err == AOP_TRUNCATED_PAGE)
>  			goto retry;
>  		return err;


* Re: [PATCH 04/13] filemap: use mapping_min_order while allocating folios
  2024-02-26 14:47   ` Matthew Wilcox
@ 2024-02-27 12:09     ` Pankaj Raghav (Samsung)
  0 siblings, 0 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-27 12:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-fsdevel, linux-kernel, david, chandan.babu,
	akpm, mcgrof, ziy, hare, djwong, gost.dev, linux-mm,
	Pankaj Raghav

On Mon, Feb 26, 2024 at 02:47:33PM +0000, Matthew Wilcox wrote:
> On Mon, Feb 26, 2024 at 10:49:27AM +0100, Pankaj Raghav (Samsung) wrote:
> > Add some additional VM_BUG_ON() in page_cache_delete[batch] and
> > __filemap_add_folio to catch errors where we delete or add folios that
> > has order less than min_order.
> 
> I don't understand why we need these checks in the deletion path.  The
> add path, yes, absolutely.  But the delete path?
I think we initially added it to catch a folio split that could leave
the page cache with folios below the minimum order. But it is not
critical anymore because of the changes in the split_folio path. I will
remove the checks.

> 
> > @@ -896,6 +900,8 @@ noinline int __filemap_add_folio(struct address_space *mapping,
> >  			}
> >  		}
> >  
> > +		VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping),
> > +				folio);
> 
> But I don't understand why you put it here, while we're holding the
> xa_lock.  That seems designed to cause maximum disruption.  Why not put
> it at the beginning of the function with all the other VM_BUG_ON_FOLIO?

Yeah. That makes sense as the folio itself is not changing.

> 
> > @@ -1847,6 +1853,9 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
> >  		fgf_t fgp_flags, gfp_t gfp)
> >  {
> >  	struct folio *folio;
> > +	unsigned int min_order = mapping_min_folio_order(mapping);
> > +
> > +	index = mapping_align_start_index(mapping, index);
> 
> I would not do this here.
> 
> >  repeat:
> >  	folio = filemap_get_entry(mapping, index);
> > @@ -1886,7 +1895,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
> >  		folio_wait_stable(folio);
> >  no_page:
> >  	if (!folio && (fgp_flags & FGP_CREAT)) {
> > -		unsigned order = FGF_GET_ORDER(fgp_flags);
> > +		unsigned int order = max(min_order, FGF_GET_ORDER(fgp_flags));
> >  		int err;
> 
> Put it here instead.
> 
> >  		if ((fgp_flags & FGP_WRITE) && mapping_can_writeback(mapping))
> > @@ -1912,8 +1921,13 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
> >  			gfp_t alloc_gfp = gfp;
> >  
> >  			err = -ENOMEM;
> > +			if (order < min_order)
> > +				order = min_order;
> >  			if (order > 0)
> >  				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
> > +
> > +			VM_BUG_ON(index & ((1UL << order) - 1));
> 
> Then you don't need this BUG_ON because it's obvious you just did it.
> And the one in filemap_add_folio() would catch it anyway.

I agree. I will change it in the next revision.
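
Roughly like this, then (an untested sketch of the rework you are
suggesting):

 no_page:
 	if (!folio && (fgp_flags & FGP_CREAT)) {
-		unsigned order = FGF_GET_ORDER(fgp_flags);
+		unsigned int min_order = mapping_min_folio_order(mapping);
+		unsigned int order = max(min_order, FGF_GET_ORDER(fgp_flags));
 		int err;
 
+		index = mapping_align_start_index(mapping, index);
+
 		if ((fgp_flags & FGP_WRITE) && mapping_can_writeback(mapping))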


* Re: [PATCH 05/13] readahead: set file_ra_state->ra_pages to be at least mapping_min_order
  2024-02-26 14:49   ` Matthew Wilcox
@ 2024-02-27 12:42     ` Pankaj Raghav (Samsung)
  0 siblings, 0 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-27 12:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-fsdevel, linux-kernel, david, chandan.babu,
	akpm, mcgrof, ziy, hare, djwong, gost.dev, linux-mm,
	Pankaj Raghav

On Mon, Feb 26, 2024 at 02:49:04PM +0000, Matthew Wilcox wrote:
> On Mon, Feb 26, 2024 at 10:49:28AM +0100, Pankaj Raghav (Samsung) wrote:
> > From: Luis Chamberlain <mcgrof@kernel.org>
> > 
> > Set the file_ra_state->ra_pages in file_ra_state_init() to be at least
> > mapping_min_order of pages if the bdi->ra_pages is less than that.
> 
> Don't we rather want to round up to a multiple of mapping_min_nrpages?

Hmm. That will definitely be more explicit. We might already be reading
a multiple of min_nrpages anyway, going beyond ra_pages (if it is not a
multiple of min_nrpages).

I will do this instead:

diff --git a/mm/readahead.c b/mm/readahead.c
index 73aef3f080ba..4e3a6f763f5c 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -138,11 +138,8 @@
 void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
 {
-       unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
-
-       ra->ra_pages = inode_to_bdi(mapping->host)->ra_pages;
-       if (ra->ra_pages < min_nrpages)
-               ra->ra_pages = min_nrpages;
+       ra->ra_pages = round_up(inode_to_bdi(mapping->host)->ra_pages,
+                               mapping_min_folio_nrpages(mapping));
        ra->prev_pos = -1;
 }

> 
> >  file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
> >  {
> > +	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
> > +
> >  	ra->ra_pages = inode_to_bdi(mapping->host)->ra_pages;
> > +	if (ra->ra_pages < min_nrpages)
> > +		ra->ra_pages = min_nrpages;
> >  	ra->prev_pos = -1;


* Re: [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache
  2024-02-27 10:06     ` Pankaj Raghav (Samsung)
@ 2024-02-27 16:22       ` Kent Overstreet
  2024-02-27 16:36         ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 35+ messages in thread
From: Kent Overstreet @ 2024-02-27 16:22 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: Matthew Wilcox, linux-xfs, linux-fsdevel, linux-kernel, david,
	chandan.babu, akpm, mcgrof, ziy, hare, djwong, gost.dev,
	linux-mm, Pankaj Raghav

On Tue, Feb 27, 2024 at 11:06:37AM +0100, Pankaj Raghav (Samsung) wrote:
> On Mon, Feb 26, 2024 at 02:40:42PM +0000, Matthew Wilcox wrote:
> > On Mon, Feb 26, 2024 at 10:49:26AM +0100, Pankaj Raghav (Samsung) wrote:
> > > From: Luis Chamberlain <mcgrof@kernel.org>
> > > 
> > > Supporting mapping_min_order implies that we guarantee each folio in the
> > > page cache has at least an order of mapping_min_order. So when adding new
> > > folios to the page cache we must ensure the index used is aligned to the
> > > mapping_min_order as the page cache requires the index to be aligned to
> > > the order of the folio.
> > 
> > This seems like a remarkably complicated way of achieving:
> > 
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 5603ced05fb7..36105dad4440 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -2427,9 +2427,11 @@ static int filemap_update_page(struct kiocb *iocb,
> >  }
> >  
> >  static int filemap_create_folio(struct file *file,
> > -		struct address_space *mapping, pgoff_t index,
> > +		struct address_space *mapping, loff_t pos,
> >  		struct folio_batch *fbatch)
> >  {
> > +	pgoff_t index;
> > +	unsigned int min_order;
> >  	struct folio *folio;
> >  	int error;
> >  
> > @@ -2451,6 +2453,8 @@ static int filemap_create_folio(struct file *file,
> >  	 * well to keep locking rules simple.
> >  	 */
> >  	filemap_invalidate_lock_shared(mapping);
> > +	min_order = mapping_min_folio_order(mapping);
> > +	index = (pos >> (min_order + PAGE_SHIFT)) << min_order;
> 
> That is some cool mathfu. I will add a comment here as it might not be
> that obvious to some people (i.e me).

you guys are both wrong, just use rounddown()


* Re: [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache
  2024-02-27 16:22       ` Kent Overstreet
@ 2024-02-27 16:36         ` Pankaj Raghav (Samsung)
  2024-02-27 16:40           ` Kent Overstreet
  0 siblings, 1 reply; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-27 16:36 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Matthew Wilcox, linux-xfs, linux-fsdevel, linux-kernel, david,
	chandan.babu, akpm, mcgrof, ziy, hare, djwong, gost.dev,
	linux-mm, Pankaj Raghav

On Tue, Feb 27, 2024 at 11:22:24AM -0500, Kent Overstreet wrote:
> On Tue, Feb 27, 2024 at 11:06:37AM +0100, Pankaj Raghav (Samsung) wrote:
> > On Mon, Feb 26, 2024 at 02:40:42PM +0000, Matthew Wilcox wrote:
> > > On Mon, Feb 26, 2024 at 10:49:26AM +0100, Pankaj Raghav (Samsung) wrote:
> > > > From: Luis Chamberlain <mcgrof@kernel.org>
> > > > 
> > > > Supporting mapping_min_order implies that we guarantee each folio in the
> > > > page cache has at least an order of mapping_min_order. So when adding new
> > > > folios to the page cache we must ensure the index used is aligned to the
> > > > mapping_min_order as the page cache requires the index to be aligned to
> > > > the order of the folio.
> > > 
> > > This seems like a remarkably complicated way of achieving:
> > > 
> > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > index 5603ced05fb7..36105dad4440 100644
> > > --- a/mm/filemap.c
> > > +++ b/mm/filemap.c
> > > @@ -2427,9 +2427,11 @@ static int filemap_update_page(struct kiocb *iocb,
> > >  }
> > >  
> > >  static int filemap_create_folio(struct file *file,
> > > -		struct address_space *mapping, pgoff_t index,
> > > +		struct address_space *mapping, loff_t pos,
> > >  		struct folio_batch *fbatch)
> > >  {
> > > +	pgoff_t index;
> > > +	unsigned int min_order;
> > >  	struct folio *folio;
> > >  	int error;
> > >  
> > > @@ -2451,6 +2453,8 @@ static int filemap_create_folio(struct file *file,
> > >  	 * well to keep locking rules simple.
> > >  	 */
> > >  	filemap_invalidate_lock_shared(mapping);
> > > +	min_order = mapping_min_folio_order(mapping);
> > > +	index = (pos >> (min_order + PAGE_SHIFT)) << min_order;
> > 
> > That is some cool mathfu. I will add a comment here as it might not be
> > that obvious to some people (i.e me).
> 
> you guys are both wrong, just use rounddown()

Umm, what do you mean just use rounddown? rounddown to ...?

We need to get an index that is in PAGE units but aligned to min_order
pages.

The original patch did this:

index = mapping_align_start_index(mapping, iocb->ki_pos >> PAGE_SHIFT);

Which is essentially a rounddown operation (probably this is what you
are suggesting?).

So what willy is proposing will do the same. To me, what I proposed is
less complicated but to willy it is the other way around.
 


* Re: [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache
  2024-02-27 16:36         ` Pankaj Raghav (Samsung)
@ 2024-02-27 16:40           ` Kent Overstreet
  2024-02-27 16:55             ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 35+ messages in thread
From: Kent Overstreet @ 2024-02-27 16:40 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: Matthew Wilcox, linux-xfs, linux-fsdevel, linux-kernel, david,
	chandan.babu, akpm, mcgrof, ziy, hare, djwong, gost.dev,
	linux-mm, Pankaj Raghav

On Tue, Feb 27, 2024 at 05:36:09PM +0100, Pankaj Raghav (Samsung) wrote:
> On Tue, Feb 27, 2024 at 11:22:24AM -0500, Kent Overstreet wrote:
> > On Tue, Feb 27, 2024 at 11:06:37AM +0100, Pankaj Raghav (Samsung) wrote:
> > > On Mon, Feb 26, 2024 at 02:40:42PM +0000, Matthew Wilcox wrote:
> > > > On Mon, Feb 26, 2024 at 10:49:26AM +0100, Pankaj Raghav (Samsung) wrote:
> > > > > From: Luis Chamberlain <mcgrof@kernel.org>
> > > > > 
> > > > > Supporting mapping_min_order implies that we guarantee each folio in the
> > > > > page cache has at least an order of mapping_min_order. So when adding new
> > > > > folios to the page cache we must ensure the index used is aligned to the
> > > > > mapping_min_order as the page cache requires the index to be aligned to
> > > > > the order of the folio.
> > > > 
> > > > This seems like a remarkably complicated way of achieving:
> > > > 
> > > > diff --git a/mm/filemap.c b/mm/filemap.c
> > > > index 5603ced05fb7..36105dad4440 100644
> > > > --- a/mm/filemap.c
> > > > +++ b/mm/filemap.c
> > > > @@ -2427,9 +2427,11 @@ static int filemap_update_page(struct kiocb *iocb,
> > > >  }
> > > >  
> > > >  static int filemap_create_folio(struct file *file,
> > > > -		struct address_space *mapping, pgoff_t index,
> > > > +		struct address_space *mapping, loff_t pos,
> > > >  		struct folio_batch *fbatch)
> > > >  {
> > > > +	pgoff_t index;
> > > > +	unsigned int min_order;
> > > >  	struct folio *folio;
> > > >  	int error;
> > > >  
> > > > @@ -2451,6 +2453,8 @@ static int filemap_create_folio(struct file *file,
> > > >  	 * well to keep locking rules simple.
> > > >  	 */
> > > >  	filemap_invalidate_lock_shared(mapping);
> > > > +	min_order = mapping_min_folio_order(mapping);
> > > > +	index = (pos >> (min_order + PAGE_SHIFT)) << min_order;
> > > 
> > > That is some cool mathfu. I will add a comment here as it might not be
> > > that obvious to some people (i.e me).
> > 
> > you guys are both wrong, just use rounddown()
> 
> Umm, what do you mean just use rounddown? rounddown to ...?
> 
> We need to get an index that is in PAGE units but aligned to min_order
> pages.
> 
> The original patch did this:
> 
> index = mapping_align_start_index(mapping, iocb->ki_pos >> PAGE_SHIFT);
> 
> Which is essentially a rounddown operation (probably this is what you
> are suggesting?).
> 
> So what willy is proposing will do the same. To me, what I proposed is
> less complicated but to willy it is the other way around.

Ok, I just found the code for mapping_align_start_index() - it is just a
round_down().

Never mind; patch looks fine (aside from perhaps some quibbling over
whether the round_down() should be done before calling readahead or
within readahead; I think that might have been more what willy was
keying in on).


* Re: [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache
  2024-02-27 16:40           ` Kent Overstreet
@ 2024-02-27 16:55             ` Pankaj Raghav (Samsung)
  2024-02-27 17:02               ` Kent Overstreet
  0 siblings, 1 reply; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-27 16:55 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Matthew Wilcox, linux-xfs, linux-fsdevel, linux-kernel, david,
	chandan.babu, akpm, mcgrof, ziy, hare, djwong, gost.dev,
	linux-mm, Pankaj Raghav

> > > 
> > > you guys are both wrong, just use rounddown()
> > 
> > Umm, what do you mean just use rounddown? rounddown to ...?
> > 
> > We need to get an index that is in PAGE units but aligned to min_order
> > pages.
> > 
> > The original patch did this:
> > 
> > index = mapping_align_start_index(mapping, iocb->ki_pos >> PAGE_SHIFT);
> > 
> > Which is essentially a rounddown operation (probably this is what you
> > are suggesting?).
> > 
> > So what willy is proposing will do the same. To me, what I proposed is
> > less complicated but to willy it is the other way around.
> 
> Ok, I just found the code for mapping_align_start_index() - it is just a
> round_down().
> 
> Never mind; patch looks fine (aside from perhaps some quibbling over
> whether the round_down() should be done before calling readahead or
> within readahead; I think that might have been more what willy was
> keying in on).

Yeah, exactly.

I have one question while I have you here. 

When we have this support in the page cache, do you think bcachefs can make
use of this support to enable bs > ps in bcachefs as it already makes use 
of large folios? 
Do you think it is just a simple mapping_set_large_folios ->
mapping_set_folio_min_order(.., block_size order) or does it require more
effort?
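
i.e., roughly something like this (purely illustrative; the exact
bcachefs call site and the block size variable here are my guesses):

	/* wherever bcachefs currently calls mapping_set_large_folios() */
	unsigned int min_order = 0;

	if (block_bytes > PAGE_SIZE)
		min_order = ilog2(block_bytes) - PAGE_SHIFT;

	mapping_set_folio_min_order(inode->i_mapping, min_order);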


* Re: [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache
  2024-02-27 16:55             ` Pankaj Raghav (Samsung)
@ 2024-02-27 17:02               ` Kent Overstreet
  2024-02-27 17:09                 ` Pankaj Raghav (Samsung)
  0 siblings, 1 reply; 35+ messages in thread
From: Kent Overstreet @ 2024-02-27 17:02 UTC (permalink / raw)
  To: Pankaj Raghav (Samsung)
  Cc: Matthew Wilcox, linux-xfs, linux-fsdevel, linux-kernel, david,
	chandan.babu, akpm, mcgrof, ziy, hare, djwong, gost.dev,
	linux-mm, Pankaj Raghav

On Tue, Feb 27, 2024 at 05:55:35PM +0100, Pankaj Raghav (Samsung) wrote:
> > > > 
> > > > you guys are both wrong, just use rounddown()
> > > 
> > > Umm, what do you mean just use rounddown? rounddown to ...?
> > > 
> > > We need to get index that are in PAGE units but aligned to min_order
> > > pages.
> > > 
> > > The original patch did this:
> > > 
> > > index = mapping_align_start_index(mapping, iocb->ki_pos >> PAGE_SHIFT);
> > > 
> > > Which is essentially a rounddown operation (probably this is what you
> > > are suggesting?).
> > > 
> > > So what willy is proposing will do the same. To me, what I proposed is
> > > less complicated but to willy it is the other way around.
> > 
> > Ok, I just found the code for mapping_align_start_index() - it is just a
> > round_down().
> > 
> > Never mind; patch looks fine (aside from perhaps some quibbling over
> > whether the round_down()) should be done before calling readahead or
> > within readahead; I think that might have been more what willy was
> > keying in on)
> 
> Yeah, exactly.
> 
> I have one question while I have you here. 
> 
> When we have this support in the page cache, do you think bcachefs can make
> use of this support to enable bs > ps in bcachefs as it already makes use 
> of large folios? 

Yes, of course.

> Do you think it is just a simple mapping_set_large_folios ->
> mapping_set_folio_min_order(.., block_size order) or does it require more
> effort?

I think that's all that would be required. There's very little in the
way of references to PAGE_SIZE in bcachefs.


* Re: [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache
  2024-02-27 17:02               ` Kent Overstreet
@ 2024-02-27 17:09                 ` Pankaj Raghav (Samsung)
  0 siblings, 0 replies; 35+ messages in thread
From: Pankaj Raghav (Samsung) @ 2024-02-27 17:09 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Matthew Wilcox, linux-xfs, linux-fsdevel, linux-kernel, david,
	chandan.babu, akpm, mcgrof, ziy, hare, djwong, gost.dev,
	linux-mm, Pankaj Raghav

> > 
> > I have one question while I have you here. 
> > 
> > When we have this support in the page cache, do you think bcachefs can make
> > use of this support to enable bs > ps in bcachefs as it already makes use 
> > of large folios? 
> 
> Yes, of course.
> 
> > Do you think it is just a simple mapping_set_large_folios ->
> > mapping_set_folio_min_order(.., block_size order) or does it require more
> > effort?
> 
> I think that's all that would be required. There's very little in the
> way of references to PAGE_SIZE in bcachefs.

Sweet. I will take a look at it once we get this upstream.

--
Pankaj


Thread overview: 35+ messages
2024-02-26  9:49 [PATCH 00/13] enable bs > ps in XFS Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 01/13] mm: Support order-1 folios in the page cache Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 02/13] fs: Allow fine-grained control of folio sizes Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 03/13] filemap: align the index to mapping_min_order in the page cache Pankaj Raghav (Samsung)
2024-02-26 14:40   ` Matthew Wilcox
2024-02-27 10:06     ` Pankaj Raghav (Samsung)
2024-02-27 16:22       ` Kent Overstreet
2024-02-27 16:36         ` Pankaj Raghav (Samsung)
2024-02-27 16:40           ` Kent Overstreet
2024-02-27 16:55             ` Pankaj Raghav (Samsung)
2024-02-27 17:02               ` Kent Overstreet
2024-02-27 17:09                 ` Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 04/13] filemap: use mapping_min_order while allocating folios Pankaj Raghav (Samsung)
2024-02-26 14:47   ` Matthew Wilcox
2024-02-27 12:09     ` Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 05/13] readahead: set file_ra_state->ra_pages to be at least mapping_min_order Pankaj Raghav (Samsung)
2024-02-26 14:49   ` Matthew Wilcox
2024-02-27 12:42     ` Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 06/13] readahead: align index to mapping_min_order in ondemand_ra and force_ra Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 07/13] readahead: rework loop in page_cache_ra_unbounded() Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 08/13] readahead: allocate folios with mapping_min_order in ra_(unbounded|order) Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 09/13] mm: do not split a folio if it has minimum folio order requirement Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 10/13] iomap: fix iomap_dio_zero() for fs bs > system page size Pankaj Raghav (Samsung)
2024-02-26 17:58   ` Matthew Wilcox
2024-02-27  9:33     ` Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 11/13] xfs: expose block size in stat Pankaj Raghav (Samsung)
2024-02-26 12:44   ` Dave Chinner
2024-02-27  8:53     ` Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 12/13] xfs: make the calculation generic in xfs_sb_validate_fsb_count() Pankaj Raghav (Samsung)
2024-02-26 12:47   ` Dave Chinner
2024-02-26 13:21   ` Matthew Wilcox
2024-02-27  8:44     ` Pankaj Raghav (Samsung)
2024-02-26  9:49 ` [PATCH 13/13] xfs: enable block size larger than page size support Pankaj Raghav (Samsung)
2024-02-26 13:26   ` Matthew Wilcox
2024-02-26 21:18     ` Dave Chinner
