* [RFC 0/8] shmem: add support for blocksize > PAGE_SIZE
@ 2023-04-21 21:43 Luis Chamberlain
  2023-04-21 21:43 ` [RFC 1/8] shmem: replace BLOCKS_PER_PAGE with PAGE_SECTORS Luis Chamberlain
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Luis Chamberlain @ 2023-04-21 21:43 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner, djwong
  Cc: p.raghav, da.gomez, a.manzanares, dave, yosryahmed, keescook,
	hare, kbusch, mcgrof, patches, linux-block, linux-fsdevel,
	linux-mm, linux-kernel

This is an initial attempt to add support for block size > PAGE_SIZE for tmpfs.
Why would you want this? It helps us experiment with higher order folio uses
with fs APIs and helps us test corner cases which would likely need to be
accounted for sooner or later if and when filesystems enable support for this.
Better to review early and burn early than to continue on in the wrong
direction, so I'm looking for early feedback.

I have other patches to convert shmem_write_begin() and shmem_file_read_iter()
to folios too but those are not yet working. In the swap world the next
thing to look at would be to convert swap_cluster_readahead() to folios.

If folks want to experiment with tmpfs, brd or with things related to larger
block sizes I've put a branch up with this, Hannes's brd patches, and some
still work-in-progress patches on my large-block-20230421 branch [0]. Similarly
you can also use kdevops [1] with CONFIG_QEMU_ENABLE_EXTRA_DRIVE_LARGEIO
enabled to get everything set up with just the following, as that branch is
what kdevops uses:

  make
  make bringup
  make linux

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h=large-block-20230421
[1] https://github.com/linux-kdevops/kdevops

Luis Chamberlain (8):
  shmem: replace BLOCKS_PER_PAGE with PAGE_SECTORS
  shmem: convert to use folio_test_hwpoison()
  shmem: account for high order folios
  shmem: add helpers to get block size
  shmem: account for larger block sizes for shmem_default_max_blocks()
  shmem: consider block size in shmem_default_max_inodes()
  shmem: add high order page support
  shmem: add support to customize block size on multiple PAGE_SIZE

 include/linux/shmem_fs.h |   3 +
 mm/shmem.c               | 146 +++++++++++++++++++++++++++++----------
 2 files changed, 114 insertions(+), 35 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC 1/8] shmem: replace BLOCKS_PER_PAGE with PAGE_SECTORS
  2023-04-21 21:43 [RFC 0/8] shmem: add support for blocksize > PAGE_SIZE Luis Chamberlain
@ 2023-04-21 21:43 ` Luis Chamberlain
  2023-04-21 21:43 ` [RFC 2/8] shmem: convert to use folio_test_hwpoison() Luis Chamberlain
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2023-04-21 21:43 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner, djwong
  Cc: p.raghav, da.gomez, a.manzanares, dave, yosryahmed, keescook,
	hare, kbusch, mcgrof, patches, linux-block, linux-fsdevel,
	linux-mm, linux-kernel

Instead of having our own macro, use the generic PAGE_SECTORS.
It also makes it clearer what we are trying to compute here on
the inode->i_blocks. We get the inode size, as defined in
__inode_get_bytes(), by:

(inode->i_blocks << SECTOR_SHIFT) + inode->i_bytes
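
For reference, PAGE_SECTORS comes from include/linux/blkdev.h and is the
number of 512-byte sectors per page, so on a 4 KiB page system it is 8:

  #define SECTOR_SHIFT		9
  #define PAGE_SECTORS_SHIFT	(PAGE_SHIFT - SECTOR_SHIFT)
  #define PAGE_SECTORS		(1 << PAGE_SECTORS_SHIFT)

which is exactly what our local BLOCKS_PER_PAGE (PAGE_SIZE/512) computed.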

This produces no functional changes.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index b5d102a2a766..5bf92d571092 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -86,7 +86,6 @@ static struct vfsmount *shm_mnt;
 
 #include "internal.h"
 
-#define BLOCKS_PER_PAGE  (PAGE_SIZE/512)
 #define VM_ACCT(size)    (PAGE_ALIGN(size) >> PAGE_SHIFT)
 
 /* Pretend that each entry is of this size in directory's i_size */
@@ -363,7 +362,7 @@ static void shmem_recalc_inode(struct inode *inode)
 	freed = info->alloced - info->swapped - inode->i_mapping->nrpages;
 	if (freed > 0) {
 		info->alloced -= freed;
-		inode->i_blocks -= freed * BLOCKS_PER_PAGE;
+		inode->i_blocks -= freed * PAGE_SECTORS;
 		shmem_inode_unacct_blocks(inode, freed);
 	}
 }
@@ -381,7 +380,7 @@ bool shmem_charge(struct inode *inode, long pages)
 
 	spin_lock_irqsave(&info->lock, flags);
 	info->alloced += pages;
-	inode->i_blocks += pages * BLOCKS_PER_PAGE;
+	inode->i_blocks += pages * PAGE_SECTORS;
 	shmem_recalc_inode(inode);
 	spin_unlock_irqrestore(&info->lock, flags);
 
@@ -397,7 +396,7 @@ void shmem_uncharge(struct inode *inode, long pages)
 
 	spin_lock_irqsave(&info->lock, flags);
 	info->alloced -= pages;
-	inode->i_blocks -= pages * BLOCKS_PER_PAGE;
+	inode->i_blocks -= pages * PAGE_SECTORS;
 	shmem_recalc_inode(inode);
 	spin_unlock_irqrestore(&info->lock, flags);
 
@@ -2002,7 +2001,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 
 	spin_lock_irq(&info->lock);
 	info->alloced += folio_nr_pages(folio);
-	inode->i_blocks += (blkcnt_t)BLOCKS_PER_PAGE << folio_order(folio);
+	inode->i_blocks += (blkcnt_t) PAGE_SECTORS << folio_order(folio);
 	shmem_recalc_inode(inode);
 	spin_unlock_irq(&info->lock);
 	alloced = true;
@@ -2659,7 +2658,7 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
 
 	spin_lock_irq(&info->lock);
 	info->alloced++;
-	inode->i_blocks += BLOCKS_PER_PAGE;
+	inode->i_blocks += PAGE_SECTORS;
 	shmem_recalc_inode(inode);
 	spin_unlock_irq(&info->lock);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 2/8] shmem: convert to use folio_test_hwpoison()
  2023-04-21 21:43 [RFC 0/8] shmem: add support for blocksize > PAGE_SIZE Luis Chamberlain
  2023-04-21 21:43 ` [RFC 1/8] shmem: replace BLOCKS_PER_PAGE with PAGE_SECTORS Luis Chamberlain
@ 2023-04-21 21:43 ` Luis Chamberlain
  2023-04-21 22:42   ` Matthew Wilcox
  2023-04-21 21:43 ` [RFC 3/8] shmem: account for high order folios Luis Chamberlain
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2023-04-21 21:43 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner, djwong
  Cc: p.raghav, da.gomez, a.manzanares, dave, yosryahmed, keescook,
	hare, kbusch, mcgrof, patches, linux-block, linux-fsdevel,
	linux-mm, linux-kernel

The PageHWPoison() call can be converted over to the respective folio call
folio_test_hwpoison(). This introduces no functional changes.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 5bf92d571092..6f117c3cbe89 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3483,7 +3483,7 @@ static const char *shmem_get_link(struct dentry *dentry,
 		folio = filemap_get_folio(inode->i_mapping, 0);
 		if (IS_ERR(folio))
 			return ERR_PTR(-ECHILD);
-		if (PageHWPoison(folio_page(folio, 0)) ||
+		if (folio_test_hwpoison(folio) ||
 		    !folio_test_uptodate(folio)) {
 			folio_put(folio);
 			return ERR_PTR(-ECHILD);
@@ -3494,7 +3494,7 @@ static const char *shmem_get_link(struct dentry *dentry,
 			return ERR_PTR(error);
 		if (!folio)
 			return ERR_PTR(-ECHILD);
-		if (PageHWPoison(folio_page(folio, 0))) {
+		if (folio_test_hwpoison(folio)) {
 			folio_unlock(folio);
 			folio_put(folio);
 			return ERR_PTR(-ECHILD);
@@ -4672,7 +4672,7 @@ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 		return &folio->page;
 
 	page = folio_file_page(folio, index);
-	if (PageHWPoison(page)) {
+	if (folio_test_hwpoison(folio)) {
 		folio_put(folio);
 		return ERR_PTR(-EIO);
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 3/8] shmem: account for high order folios
  2023-04-21 21:43 [RFC 0/8] shmem: add support for blocksize > PAGE_SIZE Luis Chamberlain
  2023-04-21 21:43 ` [RFC 1/8] shmem: replace BLOCKS_PER_PAGE with PAGE_SECTORS Luis Chamberlain
  2023-04-21 21:43 ` [RFC 2/8] shmem: convert to use folio_test_hwpoison() Luis Chamberlain
@ 2023-04-21 21:43 ` Luis Chamberlain
  2023-04-21 22:46   ` Matthew Wilcox
  2023-04-21 21:43 ` [RFC 4/8] shmem: add helpers to get block size Luis Chamberlain
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2023-04-21 21:43 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner, djwong
  Cc: p.raghav, da.gomez, a.manzanares, dave, yosryahmed, keescook,
	hare, kbusch, mcgrof, patches, linux-block, linux-fsdevel,
	linux-mm, linux-kernel

shmem uses the shmem_inode_info members alloced and swapped to account
for allocated pages and swapped pages. In preparation for high order
folios, adjust the accounting to use folio_nr_pages().

This should produce no functional changes yet as higher order
folios are not yet used or supported in shmem.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 6f117c3cbe89..d76e86ff356e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -806,15 +806,15 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 						pgoff_t start, pgoff_t end)
 {
 	XA_STATE(xas, &mapping->i_pages, start);
-	struct page *page;
+	struct folio *folio;
 	unsigned long swapped = 0;
 
 	rcu_read_lock();
-	xas_for_each(&xas, page, end - 1) {
-		if (xas_retry(&xas, page))
+	xas_for_each(&xas, folio, end - 1) {
+		if (xas_retry(&xas, folio))
 			continue;
-		if (xa_is_value(page))
-			swapped++;
+		if (xa_is_value(folio))
+			swapped+=(folio_nr_pages(folio));
 
 		if (need_resched()) {
 			xas_pause(&xas);
@@ -941,10 +941,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			folio = fbatch.folios[i];
 
 			if (xa_is_value(folio)) {
+				long swaps_freed = 0;
 				if (unfalloc)
 					continue;
-				nr_swaps_freed += !shmem_free_swap(mapping,
-							indices[i], folio);
+				swaps_freed = folio_nr_pages(folio);
+				if (!shmem_free_swap(mapping, indices[i], folio)) {
+					if (swaps_freed > 1)
+						pr_warn("swaps freed > 1 -- %lu\n", swaps_freed);
+					nr_swaps_freed += swaps_freed;
+				}
 				continue;
 			}
 
@@ -1010,14 +1015,18 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 			folio = fbatch.folios[i];
 
 			if (xa_is_value(folio)) {
+				long swaps_freed = 0;
 				if (unfalloc)
 					continue;
+				swaps_freed = folio_nr_pages(folio);
 				if (shmem_free_swap(mapping, indices[i], folio)) {
 					/* Swap was replaced by page: retry */
 					index = indices[i];
 					break;
 				}
-				nr_swaps_freed++;
+				if (swaps_freed > 1)
+					pr_warn("swaps freed > 1 -- %lu\n", swaps_freed);
+				nr_swaps_freed+=swaps_freed;
 				continue;
 			}
 
@@ -1448,7 +1457,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 			NULL) == 0) {
 		spin_lock_irq(&info->lock);
 		shmem_recalc_inode(inode);
-		info->swapped++;
+		info->swapped+=folio_nr_pages(folio);
 		spin_unlock_irq(&info->lock);
 
 		swap_shmem_alloc(swap);
@@ -1723,6 +1732,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	swp_entry_t swapin_error;
 	void *old;
+	long num_swap_pages;
 
 	swapin_error = make_swapin_error_entry();
 	old = xa_cmpxchg_irq(&mapping->i_pages, index,
@@ -1732,6 +1742,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 		return;
 
 	folio_wait_writeback(folio);
+	num_swap_pages = folio_nr_pages(folio);
 	delete_from_swap_cache(folio);
 	spin_lock_irq(&info->lock);
 	/*
@@ -1739,8 +1750,8 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 	 * be 0 when inode is released and thus trigger WARN_ON(inode->i_blocks) in
 	 * shmem_evict_inode.
 	 */
-	info->alloced--;
-	info->swapped--;
+	info->alloced-=num_swap_pages;
+	info->swapped-=num_swap_pages;
 	shmem_recalc_inode(inode);
 	spin_unlock_irq(&info->lock);
 	swap_free(swap);
@@ -1830,7 +1841,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		goto failed;
 
 	spin_lock_irq(&info->lock);
-	info->swapped--;
+	info->swapped-= folio_nr_pages(folio);
 	shmem_recalc_inode(inode);
 	spin_unlock_irq(&info->lock);
 
@@ -2657,8 +2668,8 @@ int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
 		goto out_delete_from_cache;
 
 	spin_lock_irq(&info->lock);
-	info->alloced++;
-	inode->i_blocks += PAGE_SECTORS;
+	info->alloced += folio_nr_pages(folio);
+	inode->i_blocks += PAGE_SECTORS << folio_order(folio);
 	shmem_recalc_inode(inode);
 	spin_unlock_irq(&info->lock);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 4/8] shmem: add helpers to get block size
  2023-04-21 21:43 [RFC 0/8] shmem: add support for blocksize > PAGE_SIZE Luis Chamberlain
                   ` (2 preceding siblings ...)
  2023-04-21 21:43 ` [RFC 3/8] shmem: account for high order folios Luis Chamberlain
@ 2023-04-21 21:43 ` Luis Chamberlain
  2023-04-21 22:49   ` Matthew Wilcox
  2023-04-21 21:43 ` [RFC 5/8] shmem: account for larger block sizes for shmem_default_max_blocks() Luis Chamberlain
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2023-04-21 21:43 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner, djwong
  Cc: p.raghav, da.gomez, a.manzanares, dave, yosryahmed, keescook,
	hare, kbusch, mcgrof, patches, linux-block, linux-fsdevel,
	linux-mm, linux-kernel

Stuff the block size as a struct shmem_sb_info member when CONFIG_TMPFS
is enabled, but keep the current static value for now, and use helpers
to get the blocksize. This will make the subsequent change easier to read.

The static value of PAGE_SIZE is still used for the block size for now.

The struct super_block s_blocksize_bits represents the blocksize as a
power of two; since the block size is always PAGE_SIZE this is PAGE_SHIFT
today, but to help this scale a bit better later we can use __ffs() for
it instead.

This commit introduces no functional changes other than using __ffs() for
s_blocksize_bits and extending struct shmem_sb_info with the blocksize.
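
For a power-of-two blocksize __ffs() gives exactly the block order, for
example assuming 4 KiB pages:

  __ffs(4096)  == 12 == PAGE_SHIFT
  __ffs(65536) == 16

so s_blocksize_bits stays at PAGE_SHIFT until a larger blocksize is
actually used.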

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 include/linux/shmem_fs.h |  3 +++
 mm/shmem.c               | 24 +++++++++++++++++++++---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 9029abd29b1c..89e471fcde1d 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -36,6 +36,9 @@ struct shmem_inode_info {
 #define SHMEM_FL_INHERITED		(FS_NODUMP_FL | FS_NOATIME_FL)
 
 struct shmem_sb_info {
+#ifdef CONFIG_TMPFS
+	u64 blocksize;
+#endif
 	unsigned long max_blocks;   /* How many blocks are allowed */
 	struct percpu_counter used_blocks;  /* How many are allocated */
 	unsigned long max_inodes;   /* How many inodes are allowed */
diff --git a/mm/shmem.c b/mm/shmem.c
index d76e86ff356e..162384b58a5c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -125,7 +125,17 @@ struct shmem_options {
 #define SHMEM_SEEN_NOSWAP 16
 };
 
+static u64 shmem_default_bsize(void)
+{
+	return PAGE_SIZE;
+}
+
 #ifdef CONFIG_TMPFS
+static u64 shmem_sb_blocksize(struct shmem_sb_info *sbinfo)
+{
+	return sbinfo->blocksize;
+}
+
 static unsigned long shmem_default_max_blocks(void)
 {
 	return totalram_pages() / 2;
@@ -137,6 +147,12 @@ static unsigned long shmem_default_max_inodes(void)
 
 	return min(nr_pages - totalhigh_pages(), nr_pages / 2);
 }
+#else
+static u64 shmem_sb_blocksize(struct shmem_sb_info *sbinfo)
+{
+	return shmem_default_bsize();
+}
+
 #endif
 
 static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
@@ -3190,7 +3206,7 @@ static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
 	struct shmem_sb_info *sbinfo = SHMEM_SB(dentry->d_sb);
 
 	buf->f_type = TMPFS_MAGIC;
-	buf->f_bsize = PAGE_SIZE;
+	buf->f_bsize = shmem_sb_blocksize(sbinfo);
 	buf->f_namelen = NAME_MAX;
 	if (sbinfo->max_blocks) {
 		buf->f_blocks = sbinfo->max_blocks;
@@ -4100,6 +4116,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	}
 	sb->s_export_op = &shmem_export_ops;
 	sb->s_flags |= SB_NOSEC | SB_I_VERSION;
+	sbinfo->blocksize = shmem_default_bsize();
 #else
 	sb->s_flags |= SB_NOUSER;
 #endif
@@ -4125,8 +4142,9 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	INIT_LIST_HEAD(&sbinfo->shrinklist);
 
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
-	sb->s_blocksize = PAGE_SIZE;
-	sb->s_blocksize_bits = PAGE_SHIFT;
+	sb->s_blocksize = shmem_sb_blocksize(sbinfo);
+	sb->s_blocksize_bits = __ffs(sb->s_blocksize);
+	WARN_ON_ONCE(sb->s_blocksize_bits != PAGE_SHIFT);
 	sb->s_magic = TMPFS_MAGIC;
 	sb->s_op = &shmem_ops;
 	sb->s_time_gran = 1;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 5/8] shmem: account for larger block sizes for shmem_default_max_blocks()
  2023-04-21 21:43 [RFC 0/8] shmem: add support for blocksize > PAGE_SIZE Luis Chamberlain
                   ` (3 preceding siblings ...)
  2023-04-21 21:43 ` [RFC 4/8] shmem: add helpers to get block size Luis Chamberlain
@ 2023-04-21 21:43 ` Luis Chamberlain
  2023-04-21 21:43 ` [RFC 6/8] shmem: consider block size in shmem_default_max_inodes() Luis Chamberlain
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2023-04-21 21:43 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner, djwong
  Cc: p.raghav, da.gomez, a.manzanares, dave, yosryahmed, keescook,
	hare, kbusch, mcgrof, patches, linux-block, linux-fsdevel,
	linux-mm, linux-kernel

If we end up supporting a larger block size than PAGE_SIZE the calculations in
shmem_default_max_blocks() need to be modified to take into account the fact
that multiple pages would be required for a single block.

Today the max number of blocks is computed based on the fact that we
will by default use half of the available memory and each block is of
PAGE_SIZE.

And so we end up with:

totalram_pages() / 2

That's because blocksize == PAGE_SIZE. When blocksize > PAGE_SIZE
we need to consider how many blocks fit into totalram_pages() first,
then just divide by 2. This ends up being:

totalram_pages * PAGE_SIZE / blocksize / 2
totalram_pages * 2^PAGE_SHIFT / 2^bbits / 2
totalram_pages * 2^(PAGE_SHIFT - bbits - 1)

We know bbits > PAGE_SHIFT, so the exponent (PAGE_SHIFT - bbits - 1) is
negative: we would be multiplying by 2^(-some_val). Multiplying by a
negative power of 2 is the same as dividing by the corresponding positive
power of 2, so we factor the -1 out by negating the exponent:

-1 * (PAGE_SHIFT - bbits - 1) = (-PAGE_SHIFT + bbits + 1)
                              = (bbits - PAGE_SHIFT + 1)

And so we end up with:

totalram_pages / 2^(bbits - PAGE_SHIFT + 1)

We use __ffs(blocksize) as this computation is needed early on before
any inode is established.
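
As a quick sanity check with hypothetical numbers: take 4 KiB pages, a
16 KiB blocksize and 16 GiB of RAM. Then bbits = 14, the shift is
14 - 12 + 1 = 3, totalram_pages() is 4M, and 4M >> 3 = 512K blocks,
which at 16 KiB each is 8 GiB, half of memory as expected.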

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 162384b58a5c..b83596467706 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -136,9 +136,11 @@ static u64 shmem_sb_blocksize(struct shmem_sb_info *sbinfo)
 	return sbinfo->blocksize;
 }
 
-static unsigned long shmem_default_max_blocks(void)
+static unsigned long shmem_default_max_blocks(u64 blocksize)
 {
-	return totalram_pages() / 2;
+	if (blocksize == shmem_default_bsize())
+		return totalram_pages() / 2;
+	return totalram_pages() >> (__ffs(blocksize) - PAGE_SHIFT + 1);
 }
 
 static unsigned long shmem_default_max_inodes(void)
@@ -3816,7 +3818,7 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 		}
 		if (*rest)
 			goto bad_value;
-		ctx->blocks = DIV_ROUND_UP(size, PAGE_SIZE);
+		ctx->blocks = DIV_ROUND_UP(size, shmem_default_bsize());
 		ctx->seen |= SHMEM_SEEN_BLOCKS;
 		break;
 	case Opt_nr_blocks:
@@ -4023,7 +4025,7 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(root->d_sb);
 
-	if (sbinfo->max_blocks != shmem_default_max_blocks())
+	if (sbinfo->max_blocks != shmem_default_max_blocks(shmem_default_bsize()))
 		seq_printf(seq, ",size=%luk",
 			sbinfo->max_blocks << (PAGE_SHIFT - 10));
 	if (sbinfo->max_inodes != shmem_default_max_inodes())
@@ -4105,7 +4107,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	 */
 	if (!(sb->s_flags & SB_KERNMOUNT)) {
 		if (!(ctx->seen & SHMEM_SEEN_BLOCKS))
-			ctx->blocks = shmem_default_max_blocks();
+			ctx->blocks = shmem_default_max_blocks(shmem_default_bsize());
 		if (!(ctx->seen & SHMEM_SEEN_INODES))
 			ctx->inodes = shmem_default_max_inodes();
 		if (!(ctx->seen & SHMEM_SEEN_INUMS))
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 6/8] shmem: consider block size in shmem_default_max_inodes()
  2023-04-21 21:43 [RFC 0/8] shmem: add support for blocksize > PAGE_SIZE Luis Chamberlain
                   ` (4 preceding siblings ...)
  2023-04-21 21:43 ` [RFC 5/8] shmem: account for larger block sizes for shmem_default_max_blocks() Luis Chamberlain
@ 2023-04-21 21:43 ` Luis Chamberlain
  2023-04-21 21:43 ` [RFC 7/8] shmem: add high order page support Luis Chamberlain
  2023-04-21 21:44 ` [RFC 8/8] shmem: add support to customize block size on multiple PAGE_SIZE Luis Chamberlain
  7 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2023-04-21 21:43 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner, djwong
  Cc: p.raghav, da.gomez, a.manzanares, dave, yosryahmed, keescook,
	hare, kbusch, mcgrof, patches, linux-block, linux-fsdevel,
	linux-mm, linux-kernel

Today we compute the max number of inodes assuming the smallest possible
inode takes just one block of size PAGE_SIZE. The max number of inodes
therefore depends on the block size, and if we want to support higher
block sizes we end up with fewer inodes.

Account for this in the computation for the max number of inodes.

If the blocksize is greater than PAGE_SIZE, we simply take the number
of usable pages, multiply by the page size and divide by the blocksize.
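
For example, with hypothetical numbers: on a 4 KiB page system with a
64 KiB blocksize, __ffs(blocksize) - PAGE_SHIFT = 16 - 12 = 4, so the
default max inode count drops to 1/16th of the PAGE_SIZE case, matching
the 16 pages the smallest one-block inode would now consume.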

This produces no functional changes right now as we don't support
larger block sizes yet.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index b83596467706..5a64efd1f3c2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -143,11 +143,14 @@ static unsigned long shmem_default_max_blocks(u64 blocksize)
 	return totalram_pages() >> (__ffs(blocksize) - PAGE_SHIFT + 1);
 }
 
-static unsigned long shmem_default_max_inodes(void)
+static unsigned long shmem_default_max_inodes(u64 blocksize)
 {
 	unsigned long nr_pages = totalram_pages();
+	unsigned long pages_for_inodes = min(nr_pages - totalhigh_pages(), nr_pages / 2);
 
-	return min(nr_pages - totalhigh_pages(), nr_pages / 2);
+	if (blocksize == shmem_default_bsize())
+		return pages_for_inodes;
+	return pages_for_inodes >> (__ffs(blocksize) - PAGE_SHIFT);
 }
 #else
 static u64 shmem_sb_blocksize(struct shmem_sb_info *sbinfo)
@@ -4028,7 +4031,7 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
 	if (sbinfo->max_blocks != shmem_default_max_blocks(shmem_default_bsize()))
 		seq_printf(seq, ",size=%luk",
 			sbinfo->max_blocks << (PAGE_SHIFT - 10));
-	if (sbinfo->max_inodes != shmem_default_max_inodes())
+	if (sbinfo->max_inodes != shmem_default_max_inodes(shmem_default_bsize()))
 		seq_printf(seq, ",nr_inodes=%lu", sbinfo->max_inodes);
 	if (sbinfo->mode != (0777 | S_ISVTX))
 		seq_printf(seq, ",mode=%03ho", sbinfo->mode);
@@ -4109,7 +4112,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 		if (!(ctx->seen & SHMEM_SEEN_BLOCKS))
 			ctx->blocks = shmem_default_max_blocks(shmem_default_bsize());
 		if (!(ctx->seen & SHMEM_SEEN_INODES))
-			ctx->inodes = shmem_default_max_inodes();
+			ctx->inodes = shmem_default_max_inodes(shmem_default_bsize());
 		if (!(ctx->seen & SHMEM_SEEN_INUMS))
 			ctx->full_inums = IS_ENABLED(CONFIG_TMPFS_INODE64);
 		sbinfo->noswap = ctx->noswap;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 7/8] shmem: add high order page support
  2023-04-21 21:43 [RFC 0/8] shmem: add support for blocksize > PAGE_SIZE Luis Chamberlain
                   ` (5 preceding siblings ...)
  2023-04-21 21:43 ` [RFC 6/8] shmem: consider block size in shmem_default_max_inodes() Luis Chamberlain
@ 2023-04-21 21:43 ` Luis Chamberlain
  2023-04-21 21:44 ` [RFC 8/8] shmem: add support to customize block size on multiple PAGE_SIZE Luis Chamberlain
  7 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2023-04-21 21:43 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner, djwong
  Cc: p.raghav, da.gomez, a.manzanares, dave, yosryahmed, keescook,
	hare, kbusch, mcgrof, patches, linux-block, linux-fsdevel,
	linux-mm, linux-kernel

To support high order block sizes we want to use high order folios so
that the larger block can be treated atomically. Add support for this
for tmpfs mounts.

Right now this produces no functional changes since we only allow a
single block size matching PAGE_SIZE, and so the order is always 0.
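
For instance, once a 64 KiB blocksize is allowed on a 4 KiB page system,
s_blocksize_bits would be 16 and the computed order would be 16 - 12 = 4,
i.e. a single 16-page folio per block.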

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 5a64efd1f3c2..740b4448f936 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1621,9 +1621,15 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
 {
 	struct vm_area_struct pvma;
 	struct folio *folio;
+	struct inode *inode = &info->vfs_inode;
+	struct super_block *i_sb = inode->i_sb;
+	int order = 0;
+
+	if (!(i_sb->s_flags & SB_KERNMOUNT))
+		order = i_sb->s_blocksize_bits - PAGE_SHIFT;
 
 	shmem_pseudo_vma_init(&pvma, info, index);
-	folio = vma_alloc_folio(gfp, 0, &pvma, 0, false);
+	folio = vma_alloc_folio(gfp, order, &pvma, 0, false);
 	shmem_pseudo_vma_destroy(&pvma);
 
 	return folio;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC 8/8] shmem: add support to customize block size on multiple PAGE_SIZE
  2023-04-21 21:43 [RFC 0/8] shmem: add support for blocksize > PAGE_SIZE Luis Chamberlain
                   ` (6 preceding siblings ...)
  2023-04-21 21:43 ` [RFC 7/8] shmem: add high order page support Luis Chamberlain
@ 2023-04-21 21:44 ` Luis Chamberlain
  2023-04-22  5:10   ` Jane Chu
  7 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2023-04-21 21:44 UTC (permalink / raw)
  To: hughd, akpm, willy, brauner, djwong
  Cc: p.raghav, da.gomez, a.manzanares, dave, yosryahmed, keescook,
	hare, kbusch, mcgrof, patches, linux-block, linux-fsdevel,
	linux-mm, linux-kernel

This allows tmpfs mounts to use a custom block size. We only allow
block sizes of at least PAGE_SIZE, and these must also be a multiple
of PAGE_SIZE.

Only simple tests have been run so far:

time for i in $(seq 1 1000000); do echo $i >> /root/ordered.txt; done

real    0m21.392s
user    0m8.077s
sys     0m13.098s

du -h /root/ordered.txt
6.6M    /root/ordered.txt

sha1sum /root/ordered.txt
2dcc06b7ca3b7dd8b5626af83c1be3cb08ddc76c  /root/ordered.txt

stat /root/ordered.txt
  File: /root/ordered.txt
  Size: 6888896         Blocks: 13456      IO Block: 4096   regular file
Device: 254,1   Inode: 655717      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-04-21 19:34:20.709869093 +0000
Modify: 2023-04-21 19:34:43.833900042 +0000
Change: 2023-04-21 19:34:43.833900042 +0000
 Birth: 2023-04-21 19:34:20.709869093 +0000

8 KiB block size:

sha1sum /root/ordered.txt
mount -t tmpfs            -o size=10M,bsize=$((4096*2)) -o noswap tmpfs /data-tmpfs/
cp /root/ordered.txt /data-tmpfs/
sha1sum /data-tmpfs/ordered.txt
stat /data-tmpfs/ordered.txt
2dcc06b7ca3b7dd8b5626af83c1be3cb08ddc76c  /root/ordered.txt
2dcc06b7ca3b7dd8b5626af83c1be3cb08ddc76c  /data-tmpfs/ordered.txt
  File: /data-tmpfs/ordered.txt
  Size: 6888896         Blocks: 13456      IO Block: 8192   regular file
Device: 0,42    Inode: 2           Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-04-21 19:31:16.078390405 +0000
Modify: 2023-04-21 19:31:16.070391363 +0000
Change: 2023-04-21 19:31:16.070391363 +0000
 Birth: 2023-04-21 19:31:16.034395676 +0000

64 KiB block size:

sha1sum /root/ordered.txt
mount -t tmpfs            -o size=10M,bsize=$((4096*16)) -o noswap tmpfs /data-tmpfs/
cp /root/ordered.txt /data-tmpfs/; sha1sum /data-tmpfs/ordered.txt
stat /data-tmpfs/ordered.txt
2dcc06b7ca3b7dd8b5626af83c1be3cb08ddc76c  /root/ordered.txt
2dcc06b7ca3b7dd8b5626af83c1be3cb08ddc76c  /data-tmpfs/ordered.txt
  File: /data-tmpfs/ordered.txt
  Size: 6888896         Blocks: 13568      IO Block: 65536  regular file
Device: 0,42    Inode: 2           Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-04-21 19:32:14.669796970 +0000
Modify: 2023-04-21 19:32:14.661796959 +0000
Change: 2023-04-21 19:32:14.661796959 +0000
 Birth: 2023-04-21 19:32:14.649796944 +0000

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/shmem.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 740b4448f936..64108c28eebd 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -118,11 +118,13 @@ struct shmem_options {
 	int huge;
 	int seen;
 	bool noswap;
+	u64 blocksize;
 #define SHMEM_SEEN_BLOCKS 1
 #define SHMEM_SEEN_INODES 2
 #define SHMEM_SEEN_HUGE 4
 #define SHMEM_SEEN_INUMS 8
 #define SHMEM_SEEN_NOSWAP 16
+#define SHMEM_SEEN_BLOCKSIZE 32
 };
 
 static u64 shmem_default_bsize(void)
@@ -3779,6 +3781,7 @@ enum shmem_param {
 	Opt_inode32,
 	Opt_inode64,
 	Opt_noswap,
+	Opt_bsize,
 };
 
 static const struct constant_table shmem_param_enums_huge[] = {
@@ -3801,6 +3804,7 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
 	fsparam_flag  ("inode32",	Opt_inode32),
 	fsparam_flag  ("inode64",	Opt_inode64),
 	fsparam_flag  ("noswap",	Opt_noswap),
+	fsparam_u32   ("bsize",		Opt_bsize),
 	{}
 };
 
@@ -3827,7 +3831,14 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 		}
 		if (*rest)
 			goto bad_value;
-		ctx->blocks = DIV_ROUND_UP(size, shmem_default_bsize());
+		if (!(ctx->seen & SHMEM_SEEN_BLOCKSIZE) ||
+		    ctx->blocksize == shmem_default_bsize())
+			ctx->blocks = DIV_ROUND_UP(size, shmem_default_bsize());
+		else {
+			if (size < ctx->blocksize || size % ctx->blocksize != 0)
+				goto bad_value;
+			ctx->blocks = DIV_ROUND_UP(size, ctx->blocksize);
+		}
 		ctx->seen |= SHMEM_SEEN_BLOCKS;
 		break;
 	case Opt_nr_blocks:
@@ -3892,6 +3903,23 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 		ctx->noswap = true;
 		ctx->seen |= SHMEM_SEEN_NOSWAP;
 		break;
+	case Opt_bsize:
+		ctx->blocksize = result.uint_32;
+		ctx->seen |= SHMEM_SEEN_BLOCKSIZE;
+		/* Must be >= PAGE_SIZE */
+		if (ctx->blocksize < PAGE_SIZE)
+			goto bad_value;
+		/*
+		 * We cap this to allow a block to be at least allowed to
+		 * be allocated using the buddy allocator. That's MAX_ORDER
+		 * pages. So 4 MiB on x86_64.
+		 */
+		if (ctx->blocksize > (1 << (MAX_ORDER + PAGE_SHIFT)))
+			goto bad_value;
+		/* The blocksize must be a multiple of the page size so must be aligned */
+		if (!PAGE_ALIGNED(ctx->blocksize))
+			goto bad_value;
+		break;
 	}
 	return 0;
 
@@ -3963,6 +3991,12 @@ static int shmem_reconfigure(struct fs_context *fc)
 	raw_spin_lock(&sbinfo->stat_lock);
 	inodes = sbinfo->max_inodes - sbinfo->free_inodes;
 
+	if (ctx->seen & SHMEM_SEEN_BLOCKSIZE) {
+		if (ctx->blocksize != shmem_sb_blocksize(sbinfo)) {
+			err = "Cannot modify block size on remount";
+			goto out;
+		}
+	}
 	if ((ctx->seen & SHMEM_SEEN_BLOCKS) && ctx->blocks) {
 		if (!sbinfo->max_blocks) {
 			err = "Cannot retroactively limit size";
@@ -4078,6 +4112,8 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
 	shmem_show_mpol(seq, sbinfo->mpol);
 	if (sbinfo->noswap)
 		seq_printf(seq, ",noswap");
+	if (shmem_sb_blocksize(sbinfo) != shmem_default_bsize())
+		seq_printf(seq, ",bsize=%llu", shmem_sb_blocksize(sbinfo));
 	return 0;
 }
 
@@ -4115,10 +4151,12 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	 * but the internal instance is left unlimited.
 	 */
 	if (!(sb->s_flags & SB_KERNMOUNT)) {
+		if (!(ctx->seen & SHMEM_SEEN_BLOCKSIZE))
+			ctx->blocksize = shmem_default_bsize();
 		if (!(ctx->seen & SHMEM_SEEN_BLOCKS))
-			ctx->blocks = shmem_default_max_blocks(shmem_default_bsize());
+			ctx->blocks = shmem_default_max_blocks(ctx->blocksize);
 		if (!(ctx->seen & SHMEM_SEEN_INODES))
-			ctx->inodes = shmem_default_max_inodes(shmem_default_bsize());
+			ctx->inodes = shmem_default_max_inodes(ctx->blocksize);
 		if (!(ctx->seen & SHMEM_SEEN_INUMS))
 			ctx->full_inums = IS_ENABLED(CONFIG_TMPFS_INODE64);
 		sbinfo->noswap = ctx->noswap;
@@ -4127,7 +4165,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	}
 	sb->s_export_op = &shmem_export_ops;
 	sb->s_flags |= SB_NOSEC | SB_I_VERSION;
-	sbinfo->blocksize = shmem_default_bsize();
+	sbinfo->blocksize = ctx->blocksize;
 #else
 	sb->s_flags |= SB_NOUSER;
 #endif
@@ -4155,7 +4193,6 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	sb->s_maxbytes = MAX_LFS_FILESIZE;
 	sb->s_blocksize = shmem_sb_blocksize(sbinfo);
 	sb->s_blocksize_bits = __ffs(sb->s_blocksize);
-	WARN_ON_ONCE(sb->s_blocksize_bits != PAGE_SHIFT);
 	sb->s_magic = TMPFS_MAGIC;
 	sb->s_op = &shmem_ops;
 	sb->s_time_gran = 1;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC 2/8] shmem: convert to use folio_test_hwpoison()
  2023-04-21 21:43 ` [RFC 2/8] shmem: convert to use folio_test_hwpoison() Luis Chamberlain
@ 2023-04-21 22:42   ` Matthew Wilcox
  2023-04-22  3:05     ` Luis Chamberlain
       [not found]     ` <CGME20230425110913eucas1p22cf9d4c7401881999adb12134b985273@eucas1p2.samsung.com>
  0 siblings, 2 replies; 20+ messages in thread
From: Matthew Wilcox @ 2023-04-21 22:42 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hughd, akpm, brauner, djwong, p.raghav, da.gomez, a.manzanares,
	dave, yosryahmed, keescook, hare, kbusch, patches, linux-block,
	linux-fsdevel, linux-mm, linux-kernel

On Fri, Apr 21, 2023 at 02:43:54PM -0700, Luis Chamberlain wrote:
> The PageHWPoison() call can be converted over to the respective folio call
> folio_test_hwpoison(). This introduces no functional changes.

Um, no.  Nobody should use folio_test_hwpoison(), it's a nonsense.

Individual pages are hwpoisoned.  You're only testing the head page
if you use folio_test_hwpoison().  There's folio_has_hwpoisoned() to
test if _any_ page in the folio is poisoned.  But blindly converting
PageHWPoison to folio_test_hwpoison() is wrong.

If anyone knows how to poison folio_test_hwpoison() to make it not
work, I'd appreciate it.

> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  mm/shmem.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 5bf92d571092..6f117c3cbe89 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3483,7 +3483,7 @@ static const char *shmem_get_link(struct dentry *dentry,
>  		folio = filemap_get_folio(inode->i_mapping, 0);
>  		if (IS_ERR(folio))
>  			return ERR_PTR(-ECHILD);
> -		if (PageHWPoison(folio_page(folio, 0)) ||
> +		if (folio_test_hwpoison(folio) ||
>  		    !folio_test_uptodate(folio)) {
>  			folio_put(folio);
>  			return ERR_PTR(-ECHILD);
> @@ -3494,7 +3494,7 @@ static const char *shmem_get_link(struct dentry *dentry,
>  			return ERR_PTR(error);
>  		if (!folio)
>  			return ERR_PTR(-ECHILD);
> -		if (PageHWPoison(folio_page(folio, 0))) {
> +		if (folio_test_hwpoison(folio)) {
>  			folio_unlock(folio);
>  			folio_put(folio);
>  			return ERR_PTR(-ECHILD);
> @@ -4672,7 +4672,7 @@ struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
>  		return &folio->page;
>  
>  	page = folio_file_page(folio, index);
> -	if (PageHWPoison(page)) {
> +	if (folio_test_hwpoison(folio)) {
>  		folio_put(folio);
>  		return ERR_PTR(-EIO);
>  	}
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 3/8] shmem: account for high order folios
  2023-04-21 21:43 ` [RFC 3/8] shmem: account for high order folios Luis Chamberlain
@ 2023-04-21 22:46   ` Matthew Wilcox
  0 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2023-04-21 22:46 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hughd, akpm, brauner, djwong, p.raghav, da.gomez, a.manzanares,
	dave, yosryahmed, keescook, hare, kbusch, patches, linux-block,
	linux-fsdevel, linux-mm, linux-kernel

On Fri, Apr 21, 2023 at 02:43:55PM -0700, Luis Chamberlain wrote:
> -		if (xa_is_value(page))
> -			swapped++;
> +		if (xa_is_value(folio))
> +			swapped+=(folio_nr_pages(folio));

			swapped += folio_nr_pages(folio);

>  			if (xa_is_value(folio)) {
> +				long swaps_freed = 0;
>  				if (unfalloc)
>  					continue;
> -				nr_swaps_freed += !shmem_free_swap(mapping,
> -							indices[i], folio);
> +				swaps_freed = folio_nr_pages(folio);

Why initialise it to 0 when you're about to set it to folio_nr_pages()?

> +				if (!shmem_free_swap(mapping, indices[i], folio)) {
> +					if (swaps_freed > 1)
> +						pr_warn("swaps freed > 1 -- %lu\n", swaps_freed);

Debug code that escaped into this patch?

> -		info->swapped++;
> +		info->swapped+=folio_nr_pages(folio);

Same comment as earlier.

> -	info->alloced--;
> -	info->swapped--;
> +	info->alloced-=num_swap_pages;
> +	info->swapped-=num_swap_pages;

Spacing

> -	info->swapped--;
> +	info->swapped-= folio_nr_pages(folio);

Spacing.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 4/8] shmem: add helpers to get block size
  2023-04-21 21:43 ` [RFC 4/8] shmem: add helpers to get block size Luis Chamberlain
@ 2023-04-21 22:49   ` Matthew Wilcox
  0 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2023-04-21 22:49 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: hughd, akpm, brauner, djwong, p.raghav, da.gomez, a.manzanares,
	dave, yosryahmed, keescook, hare, kbusch, patches, linux-block,
	linux-fsdevel, linux-mm, linux-kernel

On Fri, Apr 21, 2023 at 02:43:56PM -0700, Luis Chamberlain wrote:
>  struct shmem_sb_info {
> +#ifdef CONFIG_TMPFS
> +	u64 blocksize;
> +#endif

u64?  You're planning on supporting a blocksize larger than 2GB?

I would store block_order (in an unsigned char) and then you can avoid
the call to ffs(), at the expense of doing 1UL << sbi->block_order;
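
A minimal sketch of that idea (hypothetical, untested):

	/* in struct shmem_sb_info, replacing the u64 blocksize member */
	unsigned char block_order;	/* blocksize == 1UL << block_order */

	static u64 shmem_sb_blocksize(struct shmem_sb_info *sbinfo)
	{
		return 1UL << sbinfo->block_order;
	}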


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 2/8] shmem: convert to use folio_test_hwpoison()
  2023-04-21 22:42   ` Matthew Wilcox
@ 2023-04-22  3:05     ` Luis Chamberlain
  2023-04-24 21:17       ` Yang Shi
       [not found]     ` <CGME20230425110913eucas1p22cf9d4c7401881999adb12134b985273@eucas1p2.samsung.com>
  1 sibling, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2023-04-22  3:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: hughd, akpm, brauner, djwong, p.raghav, da.gomez, a.manzanares,
	dave, yosryahmed, keescook, hare, kbusch, patches, linux-block,
	linux-fsdevel, linux-mm, linux-kernel

On Fri, Apr 21, 2023 at 11:42:53PM +0100, Matthew Wilcox wrote:
> On Fri, Apr 21, 2023 at 02:43:54PM -0700, Luis Chamberlain wrote:
> > The PageHWPoison() call can be converted over to the respective folio call
> > folio_test_hwpoison(). This introduces no functional changes.
> 
> Um, no.  Nobody should use folio_test_hwpoison(), it's a nonsense.
> 
> Individual pages are hwpoisoned.  You're only testing the head page
> if you use folio_test_hwpoison().  There's folio_has_hwpoisoned() to
> test if _any_ page in the folio is poisoned.  But blindly converting
> PageHWPoison to folio_test_hwpoison() is wrong.

Thanks! I don't see folio_has_hwpoisoned() though.

  Luis

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 8/8] shmem: add support to customize block size on multiple PAGE_SIZE
  2023-04-21 21:44 ` [RFC 8/8] shmem: add support to customize block size on multiple PAGE_SIZE Luis Chamberlain
@ 2023-04-22  5:10   ` Jane Chu
  0 siblings, 0 replies; 20+ messages in thread
From: Jane Chu @ 2023-04-22  5:10 UTC (permalink / raw)
  To: Luis Chamberlain, hughd, akpm, willy, brauner, djwong
  Cc: p.raghav, da.gomez, a.manzanares, dave, yosryahmed, keescook,
	hare, kbusch, patches, linux-block, linux-fsdevel, linux-mm,
	linux-kernel


On 4/21/2023 2:44 PM, Luis Chamberlain wrote:
[..]
> +		/*
> +		 * We cap this to allow a block to be at least allowed to
> +		 * be allocated using the buddy allocator. That's MAX_ORDER
> +		 * pages. So 4 MiB on x86_64.

8 MiB? since MAX_ORDER is 11.

> +		 */
> +		if (ctx->blocksize > (1 << (MAX_ORDER + PAGE_SHIFT)))
> +			goto bad_value; > +
> +		/* The blocksize must be a multiple of the page size so must be aligned */
> +		if (!PAGE_ALIGNED(ctx->blocksize))
> +			goto bad_value;
> +		break;
>   	}
>   	return 0;

-jane

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 2/8] shmem: convert to use folio_test_hwpoison()
  2023-04-22  3:05     ` Luis Chamberlain
@ 2023-04-24 21:17       ` Yang Shi
  2023-04-24 21:36         ` Matthew Wilcox
  0 siblings, 1 reply; 20+ messages in thread
From: Yang Shi @ 2023-04-24 21:17 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Matthew Wilcox, hughd, akpm, brauner, djwong, p.raghav, da.gomez,
	a.manzanares, dave, yosryahmed, keescook, hare, kbusch, patches,
	linux-block, linux-fsdevel, linux-mm, linux-kernel

On Fri, Apr 21, 2023 at 8:05 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Fri, Apr 21, 2023 at 11:42:53PM +0100, Matthew Wilcox wrote:
> > On Fri, Apr 21, 2023 at 02:43:54PM -0700, Luis Chamberlain wrote:
> > > The PageHWPoison() call can be converted over to the respective folio call
> > > folio_test_hwpoison(). This introduces no functional changes.
> >
> > Um, no.  Nobody should use folio_test_hwpoison(), it's a nonsense.
> >
> > Individual pages are hwpoisoned.  You're only testing the head page
> > if you use folio_test_hwpoison().  There's folio_has_hwpoisoned() to
> > test if _any_ page in the folio is poisoned.  But blindly converting
> > PageHWPoison to folio_test_hwpoison() is wrong.
>
> Thanks! I don't see folio_has_hwpoisoned() though.

We do have PageHasHWPoisoned(), which indicates at least one subpage
is hwpoisoned in the huge page.

You may need to add a folio variant.

>
>   Luis
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 2/8] shmem: convert to use folio_test_hwpoison()
  2023-04-24 21:17       ` Yang Shi
@ 2023-04-24 21:36         ` Matthew Wilcox
  2023-04-24 23:05           ` Yang Shi
  0 siblings, 1 reply; 20+ messages in thread
From: Matthew Wilcox @ 2023-04-24 21:36 UTC (permalink / raw)
  To: Yang Shi
  Cc: Luis Chamberlain, hughd, akpm, brauner, djwong, p.raghav,
	da.gomez, a.manzanares, dave, yosryahmed, keescook, hare, kbusch,
	patches, linux-block, linux-fsdevel, linux-mm, linux-kernel

On Mon, Apr 24, 2023 at 02:17:12PM -0700, Yang Shi wrote:
> On Fri, Apr 21, 2023 at 8:05 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> >
> > On Fri, Apr 21, 2023 at 11:42:53PM +0100, Matthew Wilcox wrote:
> > > On Fri, Apr 21, 2023 at 02:43:54PM -0700, Luis Chamberlain wrote:
> > > > The PageHWPoison() call can be converted over to the respective folio call
> > > > folio_test_hwpoison(). This introduces no functional changes.
> > >
> > > Um, no.  Nobody should use folio_test_hwpoison(), it's a nonsense.
> > >
> > > Individual pages are hwpoisoned.  You're only testing the head page
> > > if you use folio_test_hwpoison().  There's folio_has_hwpoisoned() to
> > > test if _any_ page in the folio is poisoned.  But blindly converting
> > > PageHWPoison to folio_test_hwpoison() is wrong.
> >
> > Thanks! I don't see folio_has_hwpoisoned() though.
> 
> We do have PageHasHWPoisoned(), which indicates at least one subpage
> is hwpoisoned in the huge page.
> 
> You may need to add a folio variant.

PAGEFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
        TESTSCFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)

That generates folio_has_hwpoisoned() along with
folio_set_has_hwpoisoned(), folio_clear_has_hwpoisoned(),
folio_test_set_has_hwpoisoned() and folio_test_clear_has_hwpoisoned().

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 2/8] shmem: convert to use folio_test_hwpoison()
  2023-04-24 21:36         ` Matthew Wilcox
@ 2023-04-24 23:05           ` Yang Shi
  0 siblings, 0 replies; 20+ messages in thread
From: Yang Shi @ 2023-04-24 23:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luis Chamberlain, hughd, akpm, brauner, djwong, p.raghav,
	da.gomez, a.manzanares, dave, yosryahmed, keescook, hare, kbusch,
	patches, linux-block, linux-fsdevel, linux-mm, linux-kernel

On Mon, Apr 24, 2023 at 2:37 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Apr 24, 2023 at 02:17:12PM -0700, Yang Shi wrote:
> > On Fri, Apr 21, 2023 at 8:05 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > >
> > > On Fri, Apr 21, 2023 at 11:42:53PM +0100, Matthew Wilcox wrote:
> > > > On Fri, Apr 21, 2023 at 02:43:54PM -0700, Luis Chamberlain wrote:
> > > > > The PageHWPoison() call can be converted over to the respective folio call
> > > > > folio_test_hwpoison(). This introduces no functional changes.
> > > >
> > > > Um, no.  Nobody should use folio_test_hwpoison(), it's a nonsense.
> > > >
> > > > Individual pages are hwpoisoned.  You're only testing the head page
> > > > if you use folio_test_hwpoison().  There's folio_has_hwpoisoned() to
> > > > test if _any_ page in the folio is poisoned.  But blindly converting
> > > > PageHWPoison to folio_test_hwpoison() is wrong.
> > >
> > > Thanks! I don't see folio_has_hwpoisoned() though.
> >
> > We do have PageHasHWPoisoned(), which indicates at least one subpage
> > is hwpoisoned in the huge page.
> >
> > You may need to add a folio variant.
>
> PAGEFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
>         TESTSCFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
>
> That generates folio_has_hwpoisoned() along with
> folio_set_has_hwpoisoned(), folio_clear_has_hwpoisoned(),
> folio_test_set_has_hwpoisoned() and folio_test_clear_has_hwpoisoned().

Oh, yeah, I missed that part. Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 2/8] shmem: convert to use folio_test_hwpoison()
       [not found]     ` <CGME20230425110913eucas1p22cf9d4c7401881999adb12134b985273@eucas1p2.samsung.com>
@ 2023-04-25 11:00       ` Pankaj Raghav
  2023-04-25 22:47         ` Luis Chamberlain
  0 siblings, 1 reply; 20+ messages in thread
From: Pankaj Raghav @ 2023-04-25 11:00 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Luis Chamberlain, hughd, akpm, brauner, djwong, da.gomez,
	a.manzanares, dave, yosryahmed, keescook, hare, kbusch, patches,
	linux-block, linux-fsdevel, linux-mm, linux-kernel, p.raghav


On Fri, Apr 21, 2023 at 11:42:53PM +0100, Matthew Wilcox wrote:
> On Fri, Apr 21, 2023 at 02:43:54PM -0700, Luis Chamberlain wrote:
> > The PageHWPoison() call can be converted over to the respective folio call
> > folio_test_hwpoison(). This introduces no functional changes.
> 
> Um, no.  Nobody should use folio_test_hwpoison(), it's a nonsense.
> 
> Individual pages are hwpoisoned.  You're only testing the head page
> if you use folio_test_hwpoison().  There's folio_has_hwpoisoned() to
> test if _any_ page in the folio is poisoned.  But blindly converting
> PageHWPoison to folio_test_hwpoison() is wrong.

I see a pattern in shmem.c where first the head page is tested and, for
large folios, any of the pages in the folio is tested for the poison
flag. Should we factor it out as a helper in shmem.c and use it here?

static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
...
	if (folio_test_hwpoison(folio) ||
	    (folio_test_large(folio) &&
	     folio_test_has_hwpoisoned(folio))) {
	..
> 
> If anyone knows how to poison folio_test_hwpoison() to make it not
> work, I'd appreciate it.

IMO, I think it would be clearer if folio_test_hwpoison checked whether
any of the pages in the folio is poisoned, and we should have an explicit
helper such as folio_test_head_hwpoison if the callers want to only test
whether the head page is poisoned (although I am not sure if that is
useful).


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC 2/8] shmem: convert to use folio_test_hwpoison()
  2023-04-25 11:00       ` Pankaj Raghav
@ 2023-04-25 22:47         ` Luis Chamberlain
  2023-04-26  7:43           ` Luis Chamberlain
  0 siblings, 1 reply; 20+ messages in thread
From: Luis Chamberlain @ 2023-04-25 22:47 UTC (permalink / raw)
  To: Pankaj Raghav, hughd, willy
  Cc: akpm, brauner, djwong, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, hare, kbusch, patches, linux-block, linux-fsdevel,
	linux-mm, linux-kernel

On Tue, Apr 25, 2023 at 01:00:25PM +0200, Pankaj Raghav wrote:
> On Fri, Apr 21, 2023 at 11:42:53PM +0100, Matthew Wilcox wrote:
> > On Fri, Apr 21, 2023 at 02:43:54PM -0700, Luis Chamberlain wrote:
> > > The PageHWPoison() call can be converted over to the respective folio call
> > > folio_test_hwpoison(). This introduces no functional changes.
> > 
> > Um, no.  Nobody should use folio_test_hwpoison(), it's a nonsense.
> > 
> > Individual pages are hwpoisoned.  You're only testing the head page
> > if you use folio_test_hwpoison().  There's folio_has_hwpoisoned() to
> > test if _any_ page in the folio is poisoned.  But blindly converting
> > PageHWPoison to folio_test_hwpoison() is wrong.
> 
> I see a pattern in shmem.c where first the head is tested and for large
> folios, any of pages in the folio is tested for poison flag. Should we
> factor it out as a helper in shmem.c and use it here?
> 
> static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
> ...
> 	if (folio_test_hwpoison(folio) ||
> 	    (folio_test_large(folio) &&
> 	     folio_test_has_hwpoisoned(folio))) {
> 	..

Hugh's commit 72887c976a7c9e ("shmem: minor fixes to splice-read
implementation") is on point about this :

  "Perhaps that ugliness can be improved at the mm end later"

So how about we put some lipstick on this guy now (notice right above it
a similar compound page check for is_page_hwpoison()):

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1c68d67b832f..6a4a571dbe50 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -883,6 +883,13 @@ static inline bool is_page_hwpoison(struct page *page)
 	return PageHuge(page) && PageHWPoison(compound_head(page));
 }
 
+static inline bool is_folio_hwpoison(struct folio *folio)
+{
+	if (folio_test_hwpoison(folio))
+		return true;
+	return folio_test_large(folio) && folio_test_has_hwpoisoned(folio);
+}
+
 /*
  * For pages that are never mapped to userspace (and aren't PageSlab),
  * page_type may be used.  Because it is initialised to -1, we invert the
diff --git a/mm/shmem.c b/mm/shmem.c
index ef7ad684f4fb..b7f47f6b75d5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3013,9 +3013,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 		if (folio) {
 			folio_unlock(folio);
 
-			if (folio_test_hwpoison(folio) ||
-			    (folio_test_large(folio) &&
-			     folio_test_has_hwpoisoned(folio))) {
+			if (is_folio_hwpoison(folio)) {
 				error = -EIO;
 				break;
 			}

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC 2/8] shmem: convert to use folio_test_hwpoison()
  2023-04-25 22:47         ` Luis Chamberlain
@ 2023-04-26  7:43           ` Luis Chamberlain
  0 siblings, 0 replies; 20+ messages in thread
From: Luis Chamberlain @ 2023-04-26  7:43 UTC (permalink / raw)
  To: Pankaj Raghav, hughd, willy
  Cc: akpm, brauner, djwong, da.gomez, a.manzanares, dave, yosryahmed,
	keescook, hare, kbusch, patches, linux-block, linux-fsdevel,
	linux-mm, linux-kernel

On Tue, Apr 25, 2023 at 03:47:24PM -0700, Luis Chamberlain wrote:
> On Tue, Apr 25, 2023 at 01:00:25PM +0200, Pankaj Raghav wrote:
> > On Fri, Apr 21, 2023 at 11:42:53PM +0100, Matthew Wilcox wrote:
> > > On Fri, Apr 21, 2023 at 02:43:54PM -0700, Luis Chamberlain wrote:
> > > > The PageHWPoison() call can be converted over to the respective folio call
> > > > folio_test_hwpoison(). This introduces no functional changes.
> > > 
> > > Um, no.  Nobody should use folio_test_hwpoison(), it's a nonsense.
> > > 
> > > Individual pages are hwpoisoned.  You're only testing the head page
> > > if you use folio_test_hwpoison().  There's folio_has_hwpoisoned() to
> > > test if _any_ page in the folio is poisoned.  But blindly converting
> > > PageHWPoison to folio_test_hwpoison() is wrong.
> > 
> > I see a pattern in shmem.c where first the head is tested and for large
> > folios, any of pages in the folio is tested for poison flag. Should we
> > factor it out as a helper in shmem.c and use it here?
> > 
> > static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
> > ...
> > 	if (folio_test_hwpoison(folio) ||
> > 	    (folio_test_large(folio) &&
> > 	     folio_test_has_hwpoisoned(folio))) {
> > 	..
> 
> Hugh's commit 72887c976a7c9e ("shmem: minor fixes to splice-read
> implementation") is on point about this :
> 
>   "Perhaps that ugliness can be improved at the mm end later"
> 
> So how about we put some lipstick on this guy now (notice right above it
> a similar compound page check for is_page_hwpoison()):
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 1c68d67b832f..6a4a571dbe50 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -883,6 +883,13 @@ static inline bool is_page_hwpoison(struct page *page)
>  	return PageHuge(page) && PageHWPoison(compound_head(page));
>  }
>  
> +static inline bool is_folio_hwpoison(struct folio *folio)
> +{
> +	if (folio_test_hwpoison(folio))
> +		return true;
> +	return folio_test_large(folio) && folio_test_has_hwpoisoned(folio);
> +}
> +
>  /*
>   * For pages that are never mapped to userspace (and aren't PageSlab),
>   * page_type may be used.  Because it is initialised to -1, we invert the
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ef7ad684f4fb..b7f47f6b75d5 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3013,9 +3013,7 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
>  		if (folio) {
>  			folio_unlock(folio);
>  
> -			if (folio_test_hwpoison(folio) ||
> -			    (folio_test_large(folio) &&
> -			     folio_test_has_hwpoisoned(folio))) {
> +			if (is_folio_hwpoison(folio)) {
>  				error = -EIO;
>  				break;
>  			}

With this, I end up with the following for shmem_file_read_iter().
For some odd reason, without the first hunk some non-SB_KERNMOUNT mounts
end up with a silly inode->i_blkbits.

I must be doing something wrong with the shmem_file_read_iter() conversion
as I end up with an empty new line at the end, but I can't seem to
understand why. Any ideas?

diff --git a/mm/shmem.c b/mm/shmem.c
index 21a4b8522ac5..39ae17774dc3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2541,6 +2541,10 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
 		inode->i_ino = ino;
 		inode_init_owner(idmap, inode, dir, mode);
 		inode->i_blocks = 0;
+		if (sb->s_flags & SB_KERNMOUNT)
+			inode->i_blkbits = PAGE_SHIFT;
+		else
+			inode->i_blkbits = sb->s_blocksize_bits;
 		inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
 		inode->i_generation = get_random_u32();
 		info = SHMEM_I(inode);
@@ -2786,18 +2790,23 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
 	struct address_space *mapping = inode->i_mapping;
+	struct super_block *sb = inode->i_sb;
+	u64 bsize = i_blocksize(inode);
 	pgoff_t index;
 	unsigned long offset;
 	int error = 0;
 	ssize_t retval = 0;
 	loff_t *ppos = &iocb->ki_pos;
 
+	/*
+	 * Although our index is page specific, we can read a blocksize at a
+	 * time as we use a folio per block.
+	 */
 	index = *ppos >> PAGE_SHIFT;
-	offset = *ppos & ~PAGE_MASK;
+	offset = *ppos & (bsize - 1);
 
 	for (;;) {
 		struct folio *folio = NULL;
-		struct page *page = NULL;
 		pgoff_t end_index;
 		unsigned long nr, ret;
 		loff_t i_size = i_size_read(inode);
@@ -2806,7 +2815,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		if (index > end_index)
 			break;
 		if (index == end_index) {
-			nr = i_size & ~PAGE_MASK;
+			nr = i_size & (bsize - 1);
 			if (nr <= offset)
 				break;
 		}
@@ -2819,9 +2828,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		}
 		if (folio) {
 			folio_unlock(folio);
-
-			page = folio_file_page(folio, index);
-			if (PageHWPoison(page)) {
+			if (is_folio_hwpoison(folio)) {
 				folio_put(folio);
 				error = -EIO;
 				break;
@@ -2831,50 +2838,63 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		/*
 		 * We must evaluate after, since reads (unlike writes)
 		 * are called without i_rwsem protection against truncate
+		 *
+		 * nr represents the number of bytes we can read per folio,
+		 * and this will depend on the blocksize set.
 		 */
-		nr = PAGE_SIZE;
+		nr = bsize;
+		WARN_ON(!(sb->s_flags & SB_KERNMOUNT) && folio && bsize != folio_size(folio));
 		i_size = i_size_read(inode);
 		end_index = i_size >> PAGE_SHIFT;
 		if (index == end_index) {
-			nr = i_size & ~PAGE_MASK;
+			nr = i_size & (bsize - 1);
 			if (nr <= offset) {
 				if (folio)
 					folio_put(folio);
 				break;
 			}
 		}
+		/*
+		 * On the first folio read this will amount to blocksize - offset. On subsequent
+		 * reads we can read blocksize at time until iov_iter_count(to) == 0.
+		 *
+		 * The offset represents the base we'll use to do the reads per folio, it
+		 * gets incremented by the number of bytes we read per folio and is aligned
+		 * to the blocksize. After a first offset block the offset would be 0 and
+		 * we'd read a block at a time.
+		 */
 		nr -= offset;
 
 		if (folio) {
 			/*
-			 * If users can be writing to this page using arbitrary
+			 * If users can be writing to this folio using arbitrary
 			 * virtual addresses, take care about potential aliasing
-			 * before reading the page on the kernel side.
+			 * before reading the folio on the kernel side.
 			 */
 			if (mapping_writably_mapped(mapping))
-				flush_dcache_page(page);
+				flush_dcache_folio(folio);
 			/*
-			 * Mark the page accessed if we read the beginning.
+			 * Mark the folio accessed if we read the beginning.
 			 */
 			if (!offset)
 				folio_mark_accessed(folio);
 			/*
-			 * Ok, we have the page, and it's up-to-date, so
+			 * Ok, we have the folio, and it's up-to-date, so
 			 * now we can copy it to user space...
 			 */
-			ret = copy_page_to_iter(page, offset, nr, to);
+			ret = copy_folio_to_iter(folio, offset, nr, to);
 			folio_put(folio);
 
 		} else if (user_backed_iter(to)) {
 			/*
 			 * Copy to user tends to be so well optimized, but
 			 * clear_user() not so much, that it is noticeably
-			 * faster to copy the zero page instead of clearing.
+			 * faster to copy the zero folio instead of clearing.
 			 */
-			ret = copy_page_to_iter(ZERO_PAGE(0), offset, nr, to);
+			ret = copy_folio_to_iter(page_folio(ZERO_PAGE(0)), offset, nr, to);
 		} else {
 			/*
-			 * But submitting the same page twice in a row to
+			 * But submitting the same folio twice in a row to
 			 * splice() - or others? - can result in confusion:
 			 * so don't attempt that optimization on pipes etc.
 			 */
@@ -2883,8 +2903,13 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 
 		retval += ret;
 		offset += ret;
+
+		/*
+		 * Due to usage of folios per blocksize we know this will actually read
+		 * blocksize at a time after the first block read at offset.
+		 */
 		index += offset >> PAGE_SHIFT;
-		offset &= ~PAGE_MASK;
+		offset &= (bsize - 1);
 
 		if (!iov_iter_count(to))
 			break;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-04-26  7:43 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-21 21:43 [RFC 0/8] shmem: add support for blocksize > PAGE_SIZE Luis Chamberlain
2023-04-21 21:43 ` [RFC 1/8] shmem: replace BLOCKS_PER_PAGE with PAGE_SECTORS Luis Chamberlain
2023-04-21 21:43 ` [RFC 2/8] shmem: convert to use folio_test_hwpoison() Luis Chamberlain
2023-04-21 22:42   ` Matthew Wilcox
2023-04-22  3:05     ` Luis Chamberlain
2023-04-24 21:17       ` Yang Shi
2023-04-24 21:36         ` Matthew Wilcox
2023-04-24 23:05           ` Yang Shi
     [not found]     ` <CGME20230425110913eucas1p22cf9d4c7401881999adb12134b985273@eucas1p2.samsung.com>
2023-04-25 11:00       ` Pankaj Raghav
2023-04-25 22:47         ` Luis Chamberlain
2023-04-26  7:43           ` Luis Chamberlain
2023-04-21 21:43 ` [RFC 3/8] shmem: account for high order folios Luis Chamberlain
2023-04-21 22:46   ` Matthew Wilcox
2023-04-21 21:43 ` [RFC 4/8] shmem: add helpers to get block size Luis Chamberlain
2023-04-21 22:49   ` Matthew Wilcox
2023-04-21 21:43 ` [RFC 5/8] shmem: account for larger block sizes for shmem_default_max_blocks()
2023-04-21 21:43 ` [RFC 6/8] shmem: consider block size in shmem_default_max_inodes() Luis Chamberlain
2023-04-21 21:43 ` [RFC 7/8] shmem: add high order page support Luis Chamberlain
2023-04-21 21:44 ` [RFC 8/8] shmem: add support to customize block size on multiple PAGE_SIZE Luis Chamberlain
2023-04-22  5:10   ` Jane Chu
