All of lore.kernel.org
 help / color / mirror / Atom feed
* put the xfs xfile abstraction on a diet v2
@ 2024-01-26 13:28 Christoph Hellwig
  2024-01-26 13:28 ` [PATCH 01/21] mm: move mapping_set_update out of <linux/swap.h> Christoph Hellwig
                   ` (22 more replies)
  0 siblings, 23 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Hi all,

this series refactors and simplifies the code in the xfs xfile
abstraction, which is a thing layer on a kernel-use shmem file.

Do do this is needs a slighly lower level export from shmem.c,
which I combined with improving an assert and documentation there.

One thing I don't really like yet is that xfile is still based on
folios and not pages.  The main stumbling block for that is the
mess around the hwpoison flag - that one still is per-file and not
per-folio, and shmem checks it weirdly often and not really in
at the abstractions levels where I'd expect it and feels very
different from the normal page cache code in filemap.c.  Maybe
I'm just failing to understand why that is done, but especially
without comments explaining it it feels like it could use some
real attention first.

The series is against the xfs for-next branch.

Changes since v1:
 - fix reading i_blocks
 - provide wrappers for reading i_size and i_blocks
 - update the xfile load/store documentation
 - improve a commit message
 - use shmem_kernel_file_setup
 - add a missing folio unlock in the hwpoison path in xfile_get_page
 - fix checking for shmem mappings
 - improve I/O error handling (Darrick)
 - convert to folios (partially from Darrick)

Diffstat:
 Documentation/filesystems/xfs/xfs-online-fsck-design.rst |   12 
 fs/xfs/scrub/rtsummary.c                                 |    6 
 fs/xfs/scrub/trace.h                                     |   81 ++-
 fs/xfs/scrub/xfarray.c                                   |  234 ++++-----
 fs/xfs/scrub/xfarray.h                                   |   11 
 fs/xfs/scrub/xfile.c                                     |  370 +++++----------
 fs/xfs/scrub/xfile.h                                     |   62 --
 include/linux/shmem_fs.h                                 |    6 
 include/linux/swap.h                                     |   10 
 mm/filemap.c                                             |    9 
 mm/internal.h                                            |    4 
 mm/shmem.c                                               |   38 +
 12 files changed, 368 insertions(+), 475 deletions(-)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 01/21] mm: move mapping_set_update out of <linux/swap.h>
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 14:39   ` Matthew Wilcox
  2024-01-26 13:28 ` [PATCH 02/21] shmem: move shmem_mapping out of line Christoph Hellwig
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

mapping_set_update is only used inside mm/.  Move mapping_set_update to
mm/internal.h and turn it into an inline function instead of a macro.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/swap.h | 10 ----------
 mm/filemap.c         |  9 +++++++++
 mm/internal.h        |  4 ++++
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4db00ddad26169..755fc64ba48ded 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -350,16 +350,6 @@ void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg);
 void workingset_refault(struct folio *folio, void *shadow);
 void workingset_activation(struct folio *folio);
 
-/* Only track the nodes of mappings with shadow entries */
-void workingset_update_node(struct xa_node *node);
-extern struct list_lru shadow_nodes;
-#define mapping_set_update(xas, mapping) do {				\
-	if (!dax_mapping(mapping) && !shmem_mapping(mapping)) {		\
-		xas_set_update(xas, workingset_update_node);		\
-		xas_set_lru(xas, &shadow_nodes);			\
-	}								\
-} while (0)
-
 /* linux/mm/page_alloc.c */
 extern unsigned long totalreserve_pages;
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 750e779c23db74..6c8b089f00d26a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -124,6 +124,15 @@
  *    ->private_lock		(zap_pte_range->block_dirty_folio)
  */
 
+static void mapping_set_update(struct xa_state *xas,
+		struct address_space *mapping)
+{
+	if (dax_mapping(mapping) || shmem_mapping(mapping))
+		return;
+	xas_set_update(xas, workingset_update_node);
+	xas_set_lru(xas, &shadow_nodes);
+}
+
 static void page_cache_delete(struct address_space *mapping,
 				   struct folio *folio, void *shadow)
 {
diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50fb6..4398f572485f00 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1266,4 +1266,8 @@ static inline void shrinker_debugfs_remove(struct dentry *debugfs_entry,
 }
 #endif /* CONFIG_SHRINKER_DEBUG */
 
+/* Only track the nodes of mappings with shadow entries */
+void workingset_update_node(struct xa_node *node);
+extern struct list_lru shadow_nodes;
+
 #endif	/* __MM_INTERNAL_H */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 02/21] shmem: move shmem_mapping out of line
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
  2024-01-26 13:28 ` [PATCH 01/21] mm: move mapping_set_update out of <linux/swap.h> Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 14:40   ` Matthew Wilcox
  2024-01-26 13:28 ` [PATCH 03/21] shmem: set a_ops earlier in shmem_symlink Christoph Hellwig
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

shmem_aops really should not be exported to the world.  Move
shmem_mapping and export it as internal for the one semi-legitimate
modular user in udmabuf.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/shmem_fs.h |  6 +-----
 mm/shmem.c               | 11 ++++++++---
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 2caa6b86106aa3..6b96a87e4bc80a 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -97,11 +97,7 @@ extern unsigned long shmem_get_unmapped_area(struct file *, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags);
 extern int shmem_lock(struct file *file, int lock, struct ucounts *ucounts);
 #ifdef CONFIG_SHMEM
-extern const struct address_space_operations shmem_aops;
-static inline bool shmem_mapping(struct address_space *mapping)
-{
-	return mapping->a_ops == &shmem_aops;
-}
+bool shmem_mapping(struct address_space *mapping);
 #else
 static inline bool shmem_mapping(struct address_space *mapping)
 {
diff --git a/mm/shmem.c b/mm/shmem.c
index d7c84ff621860b..f607b0cab7e4e2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -254,7 +254,7 @@ static void shmem_inode_unacct_blocks(struct inode *inode, long pages)
 }
 
 static const struct super_operations shmem_ops;
-const struct address_space_operations shmem_aops;
+static const struct address_space_operations shmem_aops;
 static const struct file_operations shmem_file_operations;
 static const struct inode_operations shmem_inode_operations;
 static const struct inode_operations shmem_dir_inode_operations;
@@ -263,6 +263,12 @@ static const struct vm_operations_struct shmem_vm_ops;
 static const struct vm_operations_struct shmem_anon_vm_ops;
 static struct file_system_type shmem_fs_type;
 
+bool shmem_mapping(struct address_space *mapping)
+{
+	return mapping->a_ops == &shmem_aops;
+}
+EXPORT_SYMBOL_GPL(shmem_mapping);
+
 bool vma_is_anon_shmem(struct vm_area_struct *vma)
 {
 	return vma->vm_ops == &shmem_anon_vm_ops;
@@ -4466,7 +4472,7 @@ static int shmem_error_remove_folio(struct address_space *mapping,
 	return 0;
 }
 
-const struct address_space_operations shmem_aops = {
+static const struct address_space_operations shmem_aops = {
 	.writepage	= shmem_writepage,
 	.dirty_folio	= noop_dirty_folio,
 #ifdef CONFIG_TMPFS
@@ -4478,7 +4484,6 @@ const struct address_space_operations shmem_aops = {
 #endif
 	.error_remove_folio = shmem_error_remove_folio,
 };
-EXPORT_SYMBOL(shmem_aops);
 
 static const struct file_operations shmem_file_operations = {
 	.mmap		= shmem_mmap,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 03/21] shmem: set a_ops earlier in shmem_symlink
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
  2024-01-26 13:28 ` [PATCH 01/21] mm: move mapping_set_update out of <linux/swap.h> Christoph Hellwig
  2024-01-26 13:28 ` [PATCH 02/21] shmem: move shmem_mapping out of line Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 14:41   ` Matthew Wilcox
  2024-01-26 13:28 ` [PATCH 04/21] shmem: move the shmem_mapping assert into shmem_get_folio_gfp Christoph Hellwig
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Set the a_aops in shmem_symlink before reading a folio from the mapping
to prepare for asserting that shmem_get_folio is only called on shmem
mappings.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/shmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index f607b0cab7e4e2..1900916aa84d13 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3506,10 +3506,10 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
 		inode->i_op = &shmem_short_symlink_operations;
 	} else {
 		inode_nohighmem(inode);
+		inode->i_mapping->a_ops = &shmem_aops;
 		error = shmem_get_folio(inode, 0, &folio, SGP_WRITE);
 		if (error)
 			goto out_remove_offset;
-		inode->i_mapping->a_ops = &shmem_aops;
 		inode->i_op = &shmem_symlink_inode_operations;
 		memcpy(folio_address(folio), symname, len);
 		folio_mark_uptodate(folio);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 04/21] shmem: move the shmem_mapping assert into shmem_get_folio_gfp
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 03/21] shmem: set a_ops earlier in shmem_symlink Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 15:09   ` Matthew Wilcox
  2024-01-26 13:28 ` [PATCH 05/21] shmem: export shmem_get_folio Christoph Hellwig
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Move the check that the inode really is a shmemfs one from
shmem_read_folio_gfp to shmem_get_folio_gfp given that shmem_get_folio
can also be called from outside of shmem.c.  Also turn it into a
WARN_ON_ONCE and error return instead of BUG_ON to be less severe.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 mm/shmem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 1900916aa84d13..ad533b2f0721a7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1972,6 +1972,9 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	int error;
 	bool alloced;
 
+	if (WARN_ON_ONCE(!shmem_mapping(inode->i_mapping)))
+		return -EINVAL;
+
 	if (index > (MAX_LFS_FILESIZE >> PAGE_SHIFT))
 		return -EFBIG;
 repeat:
@@ -4915,7 +4918,6 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
 	struct folio *folio;
 	int error;
 
-	BUG_ON(!shmem_mapping(mapping));
 	error = shmem_get_folio_gfp(inode, index, &folio, SGP_CACHE,
 				    gfp, NULL, NULL);
 	if (error)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 05/21] shmem: export shmem_get_folio
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 04/21] shmem: move the shmem_mapping assert into shmem_get_folio_gfp Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 15:15   ` Matthew Wilcox
  2024-01-26 13:28 ` [PATCH 06/21] shmem: export shmem_kernel_file_setup Christoph Hellwig
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Export shmem_get_folio as a slightly lower-level variant of
shmem_read_folio_gfp.  This will be useful for XFS xfile use cases
that want to pass SGP_NOALLOC or get a locked page, which the thin
shmem_read_folio_gfp wrapper can't provide.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 mm/shmem.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index ad533b2f0721a7..dae684cd3c99fb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2137,12 +2137,27 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	return error;
 }
 
+/**
+ * shmem_get_folio - find and get a reference to a shmem folio.
+ * @inode:	inode to search
+ * @index:	the page index.
+ * @foliop:	pointer to the found folio if one was found
+ * @sgp:	SGP_* flags to control behavior
+ *
+ * Looks up the page cache entry at @inode & @index.
+ *
+ * If this function returns a folio, it is returned with an increased refcount.
+ *
+ * Return: The found folio, %NULL if SGP_READ or SGP_NOALLOC was passed in @sgp
+ * and no folio was found at @index, or an ERR_PTR() otherwise.
+ */
 int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
 		enum sgp_type sgp)
 {
 	return shmem_get_folio_gfp(inode, index, foliop, sgp,
 			mapping_gfp_mask(inode->i_mapping), NULL, NULL);
 }
+EXPORT_SYMBOL_GPL(shmem_get_folio);
 
 /*
  * This is like autoremove_wake_function, but it removes the wait queue
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 06/21] shmem: export shmem_kernel_file_setup
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 05/21] shmem: export shmem_get_folio Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 15:45   ` Matthew Wilcox
  2024-01-26 13:28 ` [PATCH 07/21] shmem: document how to "persist" data when using shmem_*file_setup Christoph Hellwig
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

XFS wants to use this for it's internal in-memory data structures and
currently duplicates the functionality.  Export shmem_kernel_file_setup
to allow XFS to switch over to using the proper kernel API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/shmem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index dae684cd3c99fb..e89fb5eccb0c0a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4856,6 +4856,7 @@ struct file *shmem_kernel_file_setup(const char *name, loff_t size, unsigned lon
 {
 	return __shmem_file_setup(shm_mnt, name, size, flags, S_PRIVATE);
 }
+EXPORT_SYMBOL_GPL(shmem_kernel_file_setup);
 
 /**
  * shmem_file_setup - get an unlinked file living in tmpfs
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 07/21] shmem: document how to "persist" data when using shmem_*file_setup
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 06/21] shmem: export shmem_kernel_file_setup Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 15:49   ` Matthew Wilcox
  2024-01-26 13:28 ` [PATCH 08/21] xfs: remove xfile_stat Christoph Hellwig
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Add a blurb that simply dirtying the folio will persist data for in-kernel
shmem files.  This is what most of the callers already do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 mm/shmem.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index e89fb5eccb0c0a..f7d6848fb294d6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2150,6 +2150,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
  *
  * Return: The found folio, %NULL if SGP_READ or SGP_NOALLOC was passed in @sgp
  * and no folio was found at @index, or an ERR_PTR() otherwise.
+ *
+ * If the caller modifies data in the returned folio, it must call
+ * folio_mark_dirty() on the locked folio before dropping the reference to
+ * ensure the folio is not reclaimed.  Unlike for normal file systems there is
+ * no need to reserve space for users of shmem_*file_setup().
  */
 int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
 		enum sgp_type sgp)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 08/21] xfs: remove xfile_stat
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 07/21] shmem: document how to "persist" data when using shmem_*file_setup Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 13:28 ` [PATCH 09/21] xfs: remove the xfile_pread/pwrite APIs Christoph Hellwig
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

vfs_getattr is needed to query inode attributes for unknown underlying
file systems.  But shmemfs is well known for users of shmem_file_setup
and shmem_read_mapping_page_gfp that rely on it not needing specific
inode revalidation and having a normal mapping.  Remove the detour
through the getattr method and an extra wrapper, and just read the
inode size and i_bytes directly in the scrub tracing code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/trace.h | 34 ++++++++++------------------------
 fs/xfs/scrub/xfile.c | 19 -------------------
 fs/xfs/scrub/xfile.h |  7 -------
 3 files changed, 10 insertions(+), 50 deletions(-)

diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 6bbb4e8639dca6..260b8fe0a80296 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -861,18 +861,11 @@ TRACE_EVENT(xfile_destroy,
 		__field(loff_t, size)
 	),
 	TP_fast_assign(
-		struct xfile_stat	statbuf;
-		int			ret;
-
-		ret = xfile_stat(xf, &statbuf);
-		if (!ret) {
-			__entry->bytes = statbuf.bytes;
-			__entry->size = statbuf.size;
-		} else {
-			__entry->bytes = -1;
-			__entry->size = -1;
-		}
-		__entry->ino = file_inode(xf->file)->i_ino;
+		struct inode		*inode = file_inode(xf->file);
+
+		__entry->ino = inode->i_ino;
+		__entry->bytes = inode->i_blocks << SECTOR_SHIFT;
+		__entry->size = i_size_read(inode);
 	),
 	TP_printk("xfino 0x%lx mem_bytes 0x%llx isize 0x%llx",
 		  __entry->ino,
@@ -891,19 +884,12 @@ DECLARE_EVENT_CLASS(xfile_class,
 		__field(unsigned long long, bytecount)
 	),
 	TP_fast_assign(
-		struct xfile_stat	statbuf;
-		int			ret;
-
-		ret = xfile_stat(xf, &statbuf);
-		if (!ret) {
-			__entry->bytes_used = statbuf.bytes;
-			__entry->size = statbuf.size;
-		} else {
-			__entry->bytes_used = -1;
-			__entry->size = -1;
-		}
-		__entry->ino = file_inode(xf->file)->i_ino;
+		struct inode		*inode = file_inode(xf->file);
+
+		__entry->ino = inode->i_ino;
+		__entry->bytes_used = inode->i_blocks << SECTOR_SHIFT;
 		__entry->pos = pos;
+		__entry->size = i_size_read(inode);
 		__entry->bytecount = bytecount;
 	),
 	TP_printk("xfino 0x%lx mem_bytes 0x%llx pos 0x%llx bytecount 0x%llx isize 0x%llx",
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 090c3ead43fdf1..87654cdd5ac6f9 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -291,25 +291,6 @@ xfile_seek_data(
 	return ret;
 }
 
-/* Query stat information for an xfile. */
-int
-xfile_stat(
-	struct xfile		*xf,
-	struct xfile_stat	*statbuf)
-{
-	struct kstat		ks;
-	int			error;
-
-	error = vfs_getattr_nosec(&xf->file->f_path, &ks,
-			STATX_SIZE | STATX_BLOCKS, AT_STATX_DONT_SYNC);
-	if (error)
-		return error;
-
-	statbuf->size = ks.size;
-	statbuf->bytes = ks.blocks << SECTOR_SHIFT;
-	return 0;
-}
-
 /*
  * Grab the (locked) page for a memory object.  The object cannot span a page
  * boundary.  Returns 0 (and a locked page) if successful, -ENOTBLK if we
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index d56643b0f429e1..c602d11560d8ee 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -63,13 +63,6 @@ xfile_obj_store(struct xfile *xf, const void *buf, size_t count, loff_t pos)
 
 loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
 
-struct xfile_stat {
-	loff_t			size;
-	unsigned long long	bytes;
-};
-
-int xfile_stat(struct xfile *xf, struct xfile_stat *statbuf);
-
 int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
 		struct xfile_page *xbuf);
 int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 09/21] xfs: remove the xfile_pread/pwrite APIs
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 08/21] xfs: remove xfile_stat Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 16:21   ` Matthew Wilcox
  2024-01-26 13:28 ` [PATCH 10/21] xfs: don't try to handle non-update pages in xfile_obj_load Christoph Hellwig
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

All current and pending xfile users use the xfile_obj_load
and xfile_obj_store API, so make those the actual implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 .../xfs/xfs-online-fsck-design.rst            | 10 +---
 fs/xfs/scrub/rtsummary.c                      |  6 +--
 fs/xfs/scrub/trace.h                          |  4 +-
 fs/xfs/scrub/xfarray.c                        | 18 +++----
 fs/xfs/scrub/xfile.c                          | 54 +++++++++----------
 fs/xfs/scrub/xfile.h                          | 32 +----------
 6 files changed, 42 insertions(+), 82 deletions(-)

diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 352516feef6ffe..324d5ec921e8e5 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -1915,19 +1915,13 @@ four of those five higher level data structures.
 The fifth use case is discussed in the :ref:`realtime summary <rtsummary>` case
 study.
 
-The most general storage interface supported by the xfile enables the reading
-and writing of arbitrary quantities of data at arbitrary offsets in the xfile.
-This capability is provided by ``xfile_pread`` and ``xfile_pwrite`` functions,
-which behave similarly to their userspace counterparts.
 XFS is very record-based, which suggests that the ability to load and store
 complete records is important.
 To support these cases, a pair of ``xfile_obj_load`` and ``xfile_obj_store``
-functions are provided to read and persist objects into an xfile.
-They are internally the same as pread and pwrite, except that they treat any
-error as an out of memory error.
+functions are provided to read and persist objects into an xfile that unlike
+the pread and pwrite system calls treat any error as an out of memory error.
 For online repair, squashing error conditions in this manner is an acceptable
 behavior because the only reaction is to abort the operation back to userspace.
-All five xfile usecases can be serviced by these four functions.
 
 However, no discussion of file access idioms is complete without answering the
 question, "But what about mmap?"
diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c
index fabd0ed9dfa676..30b5a3952513f8 100644
--- a/fs/xfs/scrub/rtsummary.c
+++ b/fs/xfs/scrub/rtsummary.c
@@ -118,7 +118,7 @@ xfsum_load(
 	xfs_rtsumoff_t		sumoff,
 	union xfs_suminfo_raw	*rawinfo)
 {
-	return xfile_obj_load(sc->xfile, rawinfo,
+	return xfile_load(sc->xfile, rawinfo,
 			sizeof(union xfs_suminfo_raw),
 			sumoff << XFS_WORDLOG);
 }
@@ -129,7 +129,7 @@ xfsum_store(
 	xfs_rtsumoff_t		sumoff,
 	const union xfs_suminfo_raw rawinfo)
 {
-	return xfile_obj_store(sc->xfile, &rawinfo,
+	return xfile_store(sc->xfile, &rawinfo,
 			sizeof(union xfs_suminfo_raw),
 			sumoff << XFS_WORDLOG);
 }
@@ -141,7 +141,7 @@ xfsum_copyout(
 	union xfs_suminfo_raw	*rawinfo,
 	unsigned int		nr_words)
 {
-	return xfile_obj_load(sc->xfile, rawinfo, nr_words << XFS_WORDLOG,
+	return xfile_load(sc->xfile, rawinfo, nr_words << XFS_WORDLOG,
 			sumoff << XFS_WORDLOG);
 }
 
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 260b8fe0a80296..0327cab606b070 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -903,8 +903,8 @@ DECLARE_EVENT_CLASS(xfile_class,
 DEFINE_EVENT(xfile_class, name, \
 	TP_PROTO(struct xfile *xf, loff_t pos, unsigned long long bytecount), \
 	TP_ARGS(xf, pos, bytecount))
-DEFINE_XFILE_EVENT(xfile_pread);
-DEFINE_XFILE_EVENT(xfile_pwrite);
+DEFINE_XFILE_EVENT(xfile_load);
+DEFINE_XFILE_EVENT(xfile_store);
 DEFINE_XFILE_EVENT(xfile_seek_data);
 DEFINE_XFILE_EVENT(xfile_get_page);
 DEFINE_XFILE_EVENT(xfile_put_page);
diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index f0f532c10a5acc..95ac14bceeadd6 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -136,7 +136,7 @@ xfarray_load(
 	if (idx >= array->nr)
 		return -ENODATA;
 
-	return xfile_obj_load(array->xfile, ptr, array->obj_size,
+	return xfile_load(array->xfile, ptr, array->obj_size,
 			xfarray_pos(array, idx));
 }
 
@@ -152,7 +152,7 @@ xfarray_is_unset(
 	if (array->unset_slots == 0)
 		return false;
 
-	error = xfile_obj_load(array->xfile, temp, array->obj_size, pos);
+	error = xfile_load(array->xfile, temp, array->obj_size, pos);
 	if (!error && xfarray_element_is_null(array, temp))
 		return true;
 
@@ -184,7 +184,7 @@ xfarray_unset(
 		return 0;
 
 	memset(temp, 0, array->obj_size);
-	error = xfile_obj_store(array->xfile, temp, array->obj_size, pos);
+	error = xfile_store(array->xfile, temp, array->obj_size, pos);
 	if (error)
 		return error;
 
@@ -209,7 +209,7 @@ xfarray_store(
 
 	ASSERT(!xfarray_element_is_null(array, ptr));
 
-	ret = xfile_obj_store(array->xfile, ptr, array->obj_size,
+	ret = xfile_store(array->xfile, ptr, array->obj_size,
 			xfarray_pos(array, idx));
 	if (ret)
 		return ret;
@@ -245,12 +245,12 @@ xfarray_store_anywhere(
 	for (pos = 0;
 	     pos < endpos && array->unset_slots > 0;
 	     pos += array->obj_size) {
-		error = xfile_obj_load(array->xfile, temp, array->obj_size,
+		error = xfile_load(array->xfile, temp, array->obj_size,
 				pos);
 		if (error || !xfarray_element_is_null(array, temp))
 			continue;
 
-		error = xfile_obj_store(array->xfile, ptr, array->obj_size,
+		error = xfile_store(array->xfile, ptr, array->obj_size,
 				pos);
 		if (error)
 			return error;
@@ -552,7 +552,7 @@ xfarray_isort(
 	trace_xfarray_isort(si, lo, hi);
 
 	xfarray_sort_bump_loads(si);
-	error = xfile_obj_load(si->array->xfile, scratch, len, lo_pos);
+	error = xfile_load(si->array->xfile, scratch, len, lo_pos);
 	if (error)
 		return error;
 
@@ -560,7 +560,7 @@ xfarray_isort(
 	sort(scratch, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL);
 
 	xfarray_sort_bump_stores(si);
-	return xfile_obj_store(si->array->xfile, scratch, len, lo_pos);
+	return xfile_store(si->array->xfile, scratch, len, lo_pos);
 }
 
 /* Grab a page for sorting records. */
@@ -858,7 +858,7 @@ xfarray_sort_load_cached(
 		if (xfarray_sort_terminated(si, &error))
 			return error;
 
-		return xfile_obj_load(si->array->xfile, ptr,
+		return xfile_load(si->array->xfile, ptr,
 				si->array->obj_size, idx_pos);
 	}
 
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 87654cdd5ac6f9..d65681372a7458 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -118,13 +118,11 @@ xfile_destroy(
 }
 
 /*
- * Read a memory object directly from the xfile's page cache.  Unlike regular
- * pread, we return -E2BIG and -EFBIG for reads that are too large or at too
- * high an offset, instead of truncating the read.  Otherwise, we return
- * bytes read or an error code, like regular pread.
+ * Load an object.  Since we're treating this file as "memory", any error or
+ * short IO is treated as a failure to allocate memory.
  */
-ssize_t
-xfile_pread(
+int
+xfile_load(
 	struct xfile		*xf,
 	void			*buf,
 	size_t			count,
@@ -133,16 +131,15 @@ xfile_pread(
 	struct inode		*inode = file_inode(xf->file);
 	struct address_space	*mapping = inode->i_mapping;
 	struct page		*page = NULL;
-	ssize_t			read = 0;
 	unsigned int		pflags;
 	int			error = 0;
 
 	if (count > MAX_RW_COUNT)
-		return -E2BIG;
+		return -ENOMEM;
 	if (inode->i_sb->s_maxbytes - pos < count)
-		return -EFBIG;
+		return -ENOMEM;
 
-	trace_xfile_pread(xf, pos, count);
+	trace_xfile_load(xf, pos, count);
 
 	pflags = memalloc_nofs_save();
 	while (count > 0) {
@@ -160,8 +157,10 @@ xfile_pread(
 				__GFP_NOWARN);
 		if (IS_ERR(page)) {
 			error = PTR_ERR(page);
-			if (error != -ENOMEM)
+			if (error != -ENOMEM) {
+				error = -ENOMEM;
 				break;
+			}
 
 			memset(buf, 0, len);
 			goto advance;
@@ -185,23 +184,18 @@ xfile_pread(
 		count -= len;
 		pos += len;
 		buf += len;
-		read += len;
 	}
 	memalloc_nofs_restore(pflags);
 
-	if (read > 0)
-		return read;
 	return error;
 }
 
 /*
- * Write a memory object directly to the xfile's page cache.  Unlike regular
- * pwrite, we return -E2BIG and -EFBIG for writes that are too large or at too
- * high an offset, instead of truncating the write.  Otherwise, we return
- * bytes written or an error code, like regular pwrite.
+ * Store an object.  Since we're treating this file as "memory", any error or
+ * short IO is treated as a failure to allocate memory.
  */
-ssize_t
-xfile_pwrite(
+int
+xfile_store(
 	struct xfile		*xf,
 	const void		*buf,
 	size_t			count,
@@ -211,16 +205,15 @@ xfile_pwrite(
 	struct address_space	*mapping = inode->i_mapping;
 	const struct address_space_operations *aops = mapping->a_ops;
 	struct page		*page = NULL;
-	ssize_t			written = 0;
 	unsigned int		pflags;
 	int			error = 0;
 
 	if (count > MAX_RW_COUNT)
-		return -E2BIG;
+		return -ENOMEM;
 	if (inode->i_sb->s_maxbytes - pos < count)
-		return -EFBIG;
+		return -ENOMEM;
 
-	trace_xfile_pwrite(xf, pos, count);
+	trace_xfile_store(xf, pos, count);
 
 	pflags = memalloc_nofs_save();
 	while (count > 0) {
@@ -239,8 +232,10 @@ xfile_pwrite(
 		 */
 		error = aops->write_begin(NULL, mapping, pos, len, &page,
 				&fsdata);
-		if (error)
+		if (error) {
+			error = -ENOMEM;
 			break;
+		}
 
 		/*
 		 * xfile pages must never be mapped into userspace, so we skip
@@ -259,13 +254,14 @@ xfile_pwrite(
 		ret = aops->write_end(NULL, mapping, pos, len, len, page,
 				fsdata);
 		if (ret < 0) {
-			error = ret;
+			error = -ENOMEM;
 			break;
 		}
 
-		written += ret;
-		if (ret != len)
+		if (ret != len) {
+			error = -ENOMEM;
 			break;
+		}
 
 		count -= ret;
 		pos += ret;
@@ -273,8 +269,6 @@ xfile_pwrite(
 	}
 	memalloc_nofs_restore(pflags);
 
-	if (written > 0)
-		return written;
 	return error;
 }
 
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index c602d11560d8ee..465b10f492b66d 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -29,38 +29,10 @@ struct xfile {
 int xfile_create(const char *description, loff_t isize, struct xfile **xfilep);
 void xfile_destroy(struct xfile *xf);
 
-ssize_t xfile_pread(struct xfile *xf, void *buf, size_t count, loff_t pos);
-ssize_t xfile_pwrite(struct xfile *xf, const void *buf, size_t count,
+int xfile_load(struct xfile *xf, void *buf, size_t count, loff_t pos);
+int xfile_store(struct xfile *xf, const void *buf, size_t count,
 		loff_t pos);
 
-/*
- * Load an object.  Since we're treating this file as "memory", any error or
- * short IO is treated as a failure to allocate memory.
- */
-static inline int
-xfile_obj_load(struct xfile *xf, void *buf, size_t count, loff_t pos)
-{
-	ssize_t	ret = xfile_pread(xf, buf, count, pos);
-
-	if (ret < 0 || ret != count)
-		return -ENOMEM;
-	return 0;
-}
-
-/*
- * Store an object.  Since we're treating this file as "memory", any error or
- * short IO is treated as a failure to allocate memory.
- */
-static inline int
-xfile_obj_store(struct xfile *xf, const void *buf, size_t count, loff_t pos)
-{
-	ssize_t	ret = xfile_pwrite(xf, buf, count, pos);
-
-	if (ret < 0 || ret != count)
-		return -ENOMEM;
-	return 0;
-}
-
 loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
 
 int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 10/21] xfs: don't try to handle non-update pages in xfile_obj_load
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 09/21] xfs: remove the xfile_pread/pwrite APIs Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 13:28 ` [PATCH 11/21] xfs: shmem_file_setup can't return NULL Christoph Hellwig
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

shmem_read_mapping_page_gfp always returns an uptodate page or an
ERR_PTR.  Remove the code that tries to handle a non-uptodate page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/xfile.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index d65681372a7458..71c4102f3305fe 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -166,18 +166,14 @@ xfile_load(
 			goto advance;
 		}
 
-		if (PageUptodate(page)) {
-			/*
-			 * xfile pages must never be mapped into userspace, so
-			 * we skip the dcache flush.
-			 */
-			kaddr = kmap_local_page(page);
-			p = kaddr + offset_in_page(pos);
-			memcpy(buf, p, len);
-			kunmap_local(kaddr);
-		} else {
-			memset(buf, 0, len);
-		}
+		/*
+		 * xfile pages must never be mapped into userspace, so
+		 * we skip the dcache flush.
+		 */
+		kaddr = kmap_local_page(page);
+		p = kaddr + offset_in_page(pos);
+		memcpy(buf, p, len);
+		kunmap_local(kaddr);
 		put_page(page);
 
 advance:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 11/21] xfs: shmem_file_setup can't return NULL
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 10/21] xfs: don't try to handle non-update pages in xfile_obj_load Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 13:28 ` [PATCH 12/21] xfs: don't modify file and inode flags for shmem files Christoph Hellwig
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

shmem_file_setup always returns a struct file pointer or an ERR_PTR,
so remove the code to check for a NULL return.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/xfile.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 71c4102f3305fe..7785afacf21809 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -62,15 +62,13 @@ xfile_create(
 {
 	struct inode		*inode;
 	struct xfile		*xf;
-	int			error = -ENOMEM;
+	int			error;
 
 	xf = kmalloc(sizeof(struct xfile), XCHK_GFP_FLAGS);
 	if (!xf)
 		return -ENOMEM;
 
 	xf->file = shmem_file_setup(description, isize, 0);
-	if (!xf->file)
-		goto out_xfile;
 	if (IS_ERR(xf->file)) {
 		error = PTR_ERR(xf->file);
 		goto out_xfile;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 12/21] xfs: don't modify file and inode flags for shmem files
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 11/21] xfs: shmem_file_setup can't return NULL Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 13:28 ` [PATCH 13/21] xfs: don't allow highmem pages in xfile mappings Christoph Hellwig
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

shmem_file_setup is explicitly intended for a file that can be
fully read and written by kernel users without restrictions.  Don't
poke into internals to change random flags in the file or inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/xfile.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 7785afacf21809..7e915385ef0011 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -68,28 +68,13 @@ xfile_create(
 	if (!xf)
 		return -ENOMEM;
 
-	xf->file = shmem_file_setup(description, isize, 0);
+	xf->file = shmem_kernel_file_setup(description, isize, 0);
 	if (IS_ERR(xf->file)) {
 		error = PTR_ERR(xf->file);
 		goto out_xfile;
 	}
 
-	/*
-	 * We want a large sparse file that we can pread, pwrite, and seek.
-	 * xfile users are responsible for keeping the xfile hidden away from
-	 * all other callers, so we skip timestamp updates and security checks.
-	 * Make the inode only accessible by root, just in case the xfile ever
-	 * escapes.
-	 */
-	xf->file->f_mode |= FMODE_PREAD | FMODE_PWRITE | FMODE_NOCMTIME |
-			    FMODE_LSEEK;
-	xf->file->f_flags |= O_RDWR | O_LARGEFILE | O_NOATIME;
 	inode = file_inode(xf->file);
-	inode->i_flags |= S_PRIVATE | S_NOCMTIME | S_NOATIME;
-	inode->i_mode &= ~0177;
-	inode->i_uid = GLOBAL_ROOT_UID;
-	inode->i_gid = GLOBAL_ROOT_GID;
-
 	lockdep_set_class(&inode->i_rwsem, &xfile_i_mutex_key);
 
 	trace_xfile_create(xf);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 13/21] xfs: don't allow highmem pages in xfile mappings
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 12/21] xfs: don't modify file and inode flags for shmem files Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 16:26   ` Matthew Wilcox
  2024-01-26 13:28 ` [PATCH 14/21] xfs: use shmem_get_folio in xfile_obj_store Christoph Hellwig
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

XFS is generally used on 64-bit, non-highmem platforms and xfile
mappings are accessed all the time.  Reduce our pain by not allowing
any highmem mappings in the xfile page cache and remove all the kmap
calls for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/xfarray.c |  3 +--
 fs/xfs/scrub/xfile.c   | 21 +++++++++------------
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index 95ac14bceeadd6..d0f98a43b2ba0a 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -580,7 +580,7 @@ xfarray_sort_get_page(
 	 * xfile pages must never be mapped into userspace, so we skip the
 	 * dcache flush when mapping the page.
 	 */
-	si->page_kaddr = kmap_local_page(si->xfpage.page);
+	si->page_kaddr = page_address(si->xfpage.page);
 	return 0;
 }
 
@@ -592,7 +592,6 @@ xfarray_sort_put_page(
 	if (!si->page_kaddr)
 		return 0;
 
-	kunmap_local(si->page_kaddr);
 	si->page_kaddr = NULL;
 
 	return xfile_put_page(si->array->xfile, &si->xfpage);
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 7e915385ef0011..623bbde91ae3fe 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -77,6 +77,12 @@ xfile_create(
 	inode = file_inode(xf->file);
 	lockdep_set_class(&inode->i_rwsem, &xfile_i_mutex_key);
 
+	/*
+	 * We don't want to bother with kmapping data during repair, so don't
+	 * allow highmem pages to back this mapping.
+	 */
+	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+
 	trace_xfile_create(xf);
 
 	*xfilep = xf;
@@ -126,7 +132,6 @@ xfile_load(
 
 	pflags = memalloc_nofs_save();
 	while (count > 0) {
-		void		*p, *kaddr;
 		unsigned int	len;
 
 		len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
@@ -153,10 +158,7 @@ xfile_load(
 		 * xfile pages must never be mapped into userspace, so
 		 * we skip the dcache flush.
 		 */
-		kaddr = kmap_local_page(page);
-		p = kaddr + offset_in_page(pos);
-		memcpy(buf, p, len);
-		kunmap_local(kaddr);
+		memcpy(buf, page_address(page) + offset_in_page(pos), len);
 		put_page(page);
 
 advance:
@@ -221,14 +223,13 @@ xfile_store(
 		 * the dcache flush.  If the page is not uptodate, zero it
 		 * before writing data.
 		 */
-		kaddr = kmap_local_page(page);
+		kaddr = page_address(page);
 		if (!PageUptodate(page)) {
 			memset(kaddr, 0, PAGE_SIZE);
 			SetPageUptodate(page);
 		}
 		p = kaddr + offset_in_page(pos);
 		memcpy(p, buf, len);
-		kunmap_local(kaddr);
 
 		ret = aops->write_end(NULL, mapping, pos, len, len, page,
 				fsdata);
@@ -314,11 +315,7 @@ xfile_get_page(
 	 * to the caller and make sure the backing store will hold on to them.
 	 */
 	if (!PageUptodate(page)) {
-		void	*kaddr;
-
-		kaddr = kmap_local_page(page);
-		memset(kaddr, 0, PAGE_SIZE);
-		kunmap_local(kaddr);
+		memset(page_address(page), 0, PAGE_SIZE);
 		SetPageUptodate(page);
 	}
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 14/21] xfs: use shmem_get_folio in xfile_obj_store
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 13/21] xfs: don't allow highmem pages in xfile mappings Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 16:29   ` Matthew Wilcox
  2024-01-26 13:28 ` [PATCH 15/21] xfs: use shmem_get_folio in in xfile_load Christoph Hellwig
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Switch to using shmem_get_folio and manually dirtying the page instead
of abusing aops->write_begin and aops->write_end in xfile_get_page.

This simplifies the code by not doing indirect calls of not actually
exported interfaces that don't really fit the use case very well, and
happens to get us large folio support for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/xfile.c | 75 +++++++++++++++++---------------------------
 1 file changed, 29 insertions(+), 46 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 623bbde91ae3fe..c71c853c9ffdd7 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -183,11 +183,7 @@ xfile_store(
 	loff_t			pos)
 {
 	struct inode		*inode = file_inode(xf->file);
-	struct address_space	*mapping = inode->i_mapping;
-	const struct address_space_operations *aops = mapping->a_ops;
-	struct page		*page = NULL;
 	unsigned int		pflags;
-	int			error = 0;
 
 	if (count > MAX_RW_COUNT)
 		return -ENOMEM;
@@ -196,60 +192,47 @@ xfile_store(
 
 	trace_xfile_store(xf, pos, count);
 
+	/*
+	 * Increase the file size first so that shmem_get_folio(..., SGP_CACHE),
+	 * actually allocates a folio instead of erroring out.
+	 */
+	if (pos + count > i_size_read(inode))
+		i_size_write(inode, pos + count);
+
 	pflags = memalloc_nofs_save();
 	while (count > 0) {
-		void		*fsdata = NULL;
-		void		*p, *kaddr;
+		struct folio	*folio;
 		unsigned int	len;
-		int		ret;
+		unsigned int	offset;
 
-		len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
-
-		/*
-		 * We call write_begin directly here to avoid all the freezer
-		 * protection lock-taking that happens in the normal path.
-		 * shmem doesn't support fs freeze, but lockdep doesn't know
-		 * that and will trip over that.
-		 */
-		error = aops->write_begin(NULL, mapping, pos, len, &page,
-				&fsdata);
-		if (error) {
-			error = -ENOMEM;
+		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
+				SGP_CACHE) < 0)
 			break;
-		}
-
-		/*
-		 * xfile pages must never be mapped into userspace, so we skip
-		 * the dcache flush.  If the page is not uptodate, zero it
-		 * before writing data.
-		 */
-		kaddr = page_address(page);
-		if (!PageUptodate(page)) {
-			memset(kaddr, 0, PAGE_SIZE);
-			SetPageUptodate(page);
-		}
-		p = kaddr + offset_in_page(pos);
-		memcpy(p, buf, len);
-
-		ret = aops->write_end(NULL, mapping, pos, len, len, page,
-				fsdata);
-		if (ret < 0) {
-			error = -ENOMEM;
+		if (folio_test_hwpoison(folio) ||
+		    (folio_test_large(folio) &&
+		     folio_test_has_hwpoisoned(folio))) {
+			folio_unlock(folio);
+			folio_put(folio);
 			break;
 		}
 
-		if (ret != len) {
-			error = -ENOMEM;
-			break;
-		}
+		offset = offset_in_folio(folio, pos);
+		len = min_t(ssize_t, count, folio_size(folio) - offset);
+		memcpy(folio_address(folio) + offset, buf, len);
+
+		folio_mark_dirty(folio);
+		folio_unlock(folio);
+		folio_put(folio);
 
-		count -= ret;
-		pos += ret;
-		buf += ret;
+		count -= len;
+		pos += len;
+		buf += len;
 	}
 	memalloc_nofs_restore(pflags);
 
-	return error;
+	if (count)
+		return -ENOMEM;
+	return 0;
 }
 
 /* Find the next written area in the xfile data for a given offset. */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 15/21] xfs: use shmem_get_folio in in xfile_load
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 14/21] xfs: use shmem_get_folio in xfile_obj_store Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 13:28 ` [PATCH 16/21] xfs: improve detection of lost xfile contents Christoph Hellwig
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Switch to using shmem_get_folio in xfile_load instead of using
shmem_read_mapping_page_gfp.  This gets us support for large folios
and also optimized reading from unallocated space, as
shmem_get_folio with SGP_READ won't allocate a page for them just
to zero the content.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/xfile.c | 63 ++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 35 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index c71c853c9ffdd7..077f9ce6e81409 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -34,13 +34,6 @@
  * xfiles assume that the caller will handle all required concurrency
  * management; standard vfs locks (freezer and inode) are not taken.  Reads
  * and writes are satisfied directly from the page cache.
- *
- * NOTE: The current shmemfs implementation has a quirk that in-kernel reads
- * of a hole cause a page to be mapped into the file.  If you are going to
- * create a sparse xfile, please be careful about reading from uninitialized
- * parts of the file.  These pages are !Uptodate and will eventually be
- * reclaimed if not written, but in the short term this boosts memory
- * consumption.
  */
 
 /*
@@ -118,10 +111,7 @@ xfile_load(
 	loff_t			pos)
 {
 	struct inode		*inode = file_inode(xf->file);
-	struct address_space	*mapping = inode->i_mapping;
-	struct page		*page = NULL;
 	unsigned int		pflags;
-	int			error = 0;
 
 	if (count > MAX_RW_COUNT)
 		return -ENOMEM;
@@ -132,43 +122,46 @@ xfile_load(
 
 	pflags = memalloc_nofs_save();
 	while (count > 0) {
+		struct folio	*folio;
 		unsigned int	len;
+		unsigned int	offset;
 
-		len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
-
-		/*
-		 * In-kernel reads of a shmem file cause it to allocate a page
-		 * if the mapping shows a hole.  Therefore, if we hit ENOMEM
-		 * we can continue by zeroing the caller's buffer.
-		 */
-		page = shmem_read_mapping_page_gfp(mapping, pos >> PAGE_SHIFT,
-				__GFP_NOWARN);
-		if (IS_ERR(page)) {
-			error = PTR_ERR(page);
-			if (error != -ENOMEM) {
-				error = -ENOMEM;
+		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
+				SGP_READ) < 0)
+			break;
+		if (!folio) {
+			/*
+			 * No data stored at this offset, just zero the output
+			 * buffer until the next page boundary.
+			 */
+			len = min_t(ssize_t, count,
+				PAGE_SIZE - offset_in_page(pos));
+			memset(buf, 0, len);
+		} else {
+			if (folio_test_hwpoison(folio) ||
+			    (folio_test_large(folio) &&
+			     folio_test_has_hwpoisoned(folio))) {
+				folio_unlock(folio);
+				folio_put(folio);
 				break;
 			}
 
-			memset(buf, 0, len);
-			goto advance;
-		}
-
-		/*
-		 * xfile pages must never be mapped into userspace, so
-		 * we skip the dcache flush.
-		 */
-		memcpy(buf, page_address(page) + offset_in_page(pos), len);
-		put_page(page);
+			offset = offset_in_folio(folio, pos);
+			len = min_t(ssize_t, count, folio_size(folio) - offset);
+			memcpy(buf, folio_address(folio) + offset, len);
 
-advance:
+			folio_unlock(folio);
+			folio_put(folio);
+		}
 		count -= len;
 		pos += len;
 		buf += len;
 	}
 	memalloc_nofs_restore(pflags);
 
-	return error;
+	if (count)
+		return -ENOMEM;
+	return 0;
 }
 
 /*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 16/21] xfs: improve detection of lost xfile contents
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 15/21] xfs: use shmem_get_folio in in xfile_load Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 16:33   ` Matthew Wilcox
  2024-01-26 13:28 ` [PATCH 17/21] xfs: add file_{get,put}_folio Christoph Hellwig
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

From: "Darrick J. Wong" <djwong@kernel.org>

shmem files are weird animals with respect to figuring out if we've lost
any data.  The memory failure code can set HWPoison on a page, but it
also sets writeback errors on the file mapping.  Replace the twisty
multi-line if logic with a single helper that looks in all the places
that I know of where memory errors can show up.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/xfile.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 077f9ce6e81409..2d802c20a8ddfe 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -99,6 +99,31 @@ xfile_destroy(
 	kfree(xf);
 }
 
+/* Has this file lost any of the data stored in it? */
+static inline bool
+xfile_has_lost_data(
+	struct inode		*inode,
+	struct folio		*folio)
+{
+	struct address_space	*mapping = inode->i_mapping;
+
+	/* This folio itself has been poisoned. */
+	if (folio_test_hwpoison(folio))
+		return true;
+
+	/* A base page under this large folio has been poisoned. */
+	if (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))
+		return true;
+
+	/* Data loss has occurred anywhere in this shmem file. */
+	if (test_bit(AS_EIO, &mapping->flags))
+		return true;
+	if (filemap_check_wb_err(mapping, 0))
+		return true;
+
+	return false;
+}
+
 /*
  * Load an object.  Since we're treating this file as "memory", any error or
  * short IO is treated as a failure to allocate memory.
@@ -138,9 +163,7 @@ xfile_load(
 				PAGE_SIZE - offset_in_page(pos));
 			memset(buf, 0, len);
 		} else {
-			if (folio_test_hwpoison(folio) ||
-			    (folio_test_large(folio) &&
-			     folio_test_has_hwpoisoned(folio))) {
+			if (xfile_has_lost_data(inode, folio)) {
 				folio_unlock(folio);
 				folio_put(folio);
 				break;
@@ -201,9 +224,7 @@ xfile_store(
 		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
 				SGP_CACHE) < 0)
 			break;
-		if (folio_test_hwpoison(folio) ||
-		    (folio_test_large(folio) &&
-		     folio_test_has_hwpoisoned(folio))) {
+		if (xfile_has_lost_data(inode, folio)) {
 			folio_unlock(folio);
 			folio_put(folio);
 			break;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 17/21] xfs: add file_{get,put}_folio
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 16/21] xfs: improve detection of lost xfile contents Christoph Hellwig
@ 2024-01-26 13:28 ` Christoph Hellwig
  2024-01-26 16:40   ` Matthew Wilcox
  2024-01-27  1:26   ` Kent Overstreet
  2024-01-26 13:29 ` [PATCH 18/21] xfs: remove xfarray_sortinfo.page_kaddr Christoph Hellwig
                   ` (5 subsequent siblings)
  22 siblings, 2 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:28 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

From: "Darrick J. Wong" <djwong@kernel.org>

Add helper similar to file_{get,set}_page, but which deal with folios
and don't allocate new folio unless explicitly asked to, which map
to shmem_get_folio instead of calling into the aops.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/trace.h |  2 ++
 fs/xfs/scrub/xfile.c | 74 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/xfile.h |  7 +++++
 3 files changed, 83 insertions(+)

diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 0327cab606b070..c61fa7a95ef522 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -908,6 +908,8 @@ DEFINE_XFILE_EVENT(xfile_store);
 DEFINE_XFILE_EVENT(xfile_seek_data);
 DEFINE_XFILE_EVENT(xfile_get_page);
 DEFINE_XFILE_EVENT(xfile_put_page);
+DEFINE_XFILE_EVENT(xfile_get_folio);
+DEFINE_XFILE_EVENT(xfile_put_folio);
 
 TRACE_EVENT(xfarray_create,
 	TP_PROTO(struct xfarray *xfa, unsigned long long required_capacity),
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 2d802c20a8ddfe..1c1db4ae1ba6ee 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -365,3 +365,77 @@ xfile_put_page(
 		return -EIO;
 	return 0;
 }
+
+/*
+ * Grab the (locked) folio for a memory object.  The object cannot span a folio
+ * boundary.  Returns the locked folio if successful, NULL if there was no
+ * folio or it didn't cover the range requested, or an ERR_PTR on failure.
+ */
+struct folio *
+xfile_get_folio(
+	struct xfile		*xf,
+	loff_t			pos,
+	size_t			len,
+	unsigned int		flags)
+{
+	struct inode		*inode = file_inode(xf->file);
+	struct folio		*folio = NULL;
+	unsigned int		pflags;
+	int			error;
+
+	if (inode->i_sb->s_maxbytes - pos < len)
+		return ERR_PTR(-ENOMEM);
+
+	trace_xfile_get_folio(xf, pos, len);
+
+	/*
+	 * Increase the file size first so that shmem_get_folio(..., SGP_CACHE),
+	 * actually allocates a folio instead of erroring out.
+	 */
+	if ((flags & XFILE_ALLOC) && pos + len > i_size_read(inode))
+		i_size_write(inode, pos + len);
+
+	pflags = memalloc_nofs_save();
+	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
+			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
+	memalloc_nofs_restore(pflags);
+	if (error)
+		return ERR_PTR(error);
+
+	if (!folio)
+		return NULL;
+
+	if (len > folio_size(folio) - offset_in_folio(folio, pos)) {
+		folio_unlock(folio);
+		folio_put(folio);
+		return NULL;
+	}
+
+	if (xfile_has_lost_data(inode, folio)) {
+		folio_unlock(folio);
+		folio_put(folio);
+		return ERR_PTR(-EIO);
+	}
+
+	/*
+	 * Mark the folio dirty so that it won't be reclaimed once we drop the
+	 * (potentially last) reference in xfile_put_folio.
+	 */
+	if (flags & XFILE_ALLOC)
+		folio_set_dirty(folio);
+	return folio;
+}
+
+/*
+ * Release the (locked) folio for a memory object.
+ */
+void
+xfile_put_folio(
+	struct xfile		*xf,
+	struct folio		*folio)
+{
+	trace_xfile_put_folio(xf, folio_pos(folio), folio_size(folio));
+
+	folio_unlock(folio);
+	folio_put(folio);
+}
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index 465b10f492b66d..afb75e9fbaf265 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -39,4 +39,11 @@ int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
 		struct xfile_page *xbuf);
 int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
 
+#define XFILE_MAX_FOLIO_SIZE	(PAGE_SIZE << MAX_PAGECACHE_ORDER)
+
+#define XFILE_ALLOC		(1 << 0) /* allocate folio if not present */
+struct folio *xfile_get_folio(struct xfile *xf, loff_t offset, size_t len,
+		unsigned int flags);
+void xfile_put_folio(struct xfile *xf, struct folio *folio);
+
 #endif /* __XFS_SCRUB_XFILE_H__ */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 18/21] xfs: remove xfarray_sortinfo.page_kaddr
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2024-01-26 13:28 ` [PATCH 17/21] xfs: add file_{get,put}_folio Christoph Hellwig
@ 2024-01-26 13:29 ` Christoph Hellwig
  2024-01-26 13:29 ` [PATCH 19/21] xfs: fix a comment in xfarray.c Christoph Hellwig
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:29 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

Now that xfile pages don't need kmapping, there is no need to cache
the kernel virtual address for them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/xfarray.c | 22 ++++------------------
 fs/xfs/scrub/xfarray.h |  1 -
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index d0f98a43b2ba0a..82b2a35a8e8630 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -570,18 +570,7 @@ xfarray_sort_get_page(
 	loff_t			pos,
 	uint64_t		len)
 {
-	int			error;
-
-	error = xfile_get_page(si->array->xfile, pos, len, &si->xfpage);
-	if (error)
-		return error;
-
-	/*
-	 * xfile pages must never be mapped into userspace, so we skip the
-	 * dcache flush when mapping the page.
-	 */
-	si->page_kaddr = page_address(si->xfpage.page);
-	return 0;
+	return xfile_get_page(si->array->xfile, pos, len, &si->xfpage);
 }
 
 /* Release a page we grabbed for sorting records. */
@@ -589,11 +578,8 @@ static inline int
 xfarray_sort_put_page(
 	struct xfarray_sortinfo	*si)
 {
-	if (!si->page_kaddr)
+	if (!xfile_page_cached(&si->xfpage))
 		return 0;
-
-	si->page_kaddr = NULL;
-
 	return xfile_put_page(si->array->xfile, &si->xfpage);
 }
 
@@ -636,7 +622,7 @@ xfarray_pagesort(
 		return error;
 
 	xfarray_sort_bump_heapsorts(si);
-	startp = si->page_kaddr + offset_in_page(lo_pos);
+	startp = page_address(si->xfpage.page) + offset_in_page(lo_pos);
 	sort(startp, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL);
 
 	xfarray_sort_bump_stores(si);
@@ -883,7 +869,7 @@ xfarray_sort_load_cached(
 			return error;
 	}
 
-	memcpy(ptr, si->page_kaddr + offset_in_page(idx_pos),
+	memcpy(ptr, page_address(si->xfpage.page) + offset_in_page(idx_pos),
 			si->array->obj_size);
 	return 0;
 }
diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h
index 62b9c506fdd1b7..6f2862054e194d 100644
--- a/fs/xfs/scrub/xfarray.h
+++ b/fs/xfs/scrub/xfarray.h
@@ -107,7 +107,6 @@ struct xfarray_sortinfo {
 
 	/* Cache a page here for faster access. */
 	struct xfile_page	xfpage;
-	void			*page_kaddr;
 
 #ifdef DEBUG
 	/* Performance statistics. */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 19/21] xfs: fix a comment in xfarray.c
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (17 preceding siblings ...)
  2024-01-26 13:29 ` [PATCH 18/21] xfs: remove xfarray_sortinfo.page_kaddr Christoph Hellwig
@ 2024-01-26 13:29 ` Christoph Hellwig
  2024-01-26 13:29 ` [PATCH 20/21] xfs: convert xfarray_pagesort to deal with large folios Christoph Hellwig
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:29 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

xfiles are shmem files, not memfds.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/xfarray.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index 82b2a35a8e8630..379e1db22269c7 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -16,7 +16,7 @@
  * Large Arrays of Fixed-Size Records
  * ==================================
  *
- * This memory array uses an xfile (which itself is a memfd "file") to store
+ * This memory array uses an xfile (which itself is a shmem file) to store
  * large numbers of fixed-size records in memory that can be paged out.  This
  * puts less stress on the memory reclaim algorithms during an online repair
  * because we don't have to pin so much memory.  However, array access is less
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 20/21] xfs: convert xfarray_pagesort to deal with large folios
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (18 preceding siblings ...)
  2024-01-26 13:29 ` [PATCH 19/21] xfs: fix a comment in xfarray.c Christoph Hellwig
@ 2024-01-26 13:29 ` Christoph Hellwig
  2024-01-27  1:09   ` Kent Overstreet
  2024-01-26 13:29 ` [PATCH 21/21] xfs: remove xfile_{get,put}_page Christoph Hellwig
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:29 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

From: "Darrick J. Wong" <djwong@kernel.org>

Convert xfarray_pagesort to handle large folios by introducing a new
xfile_get_folio routine that can return a folio of arbitrary size, and
using heapsort on the full folio.  This also corrects an off-by-one bug
in the calculation of len in xfarray_pagesort that was papered over by
xfarray_want_pagesort.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/scrub/trace.h   |  43 ++++++++-
 fs/xfs/scrub/xfarray.c | 201 +++++++++++++++++++----------------------
 fs/xfs/scrub/xfarray.h |  10 +-
 3 files changed, 143 insertions(+), 111 deletions(-)

diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index c61fa7a95ef522..3a1a827828dcb9 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -956,7 +956,7 @@ TRACE_EVENT(xfarray_isort,
 		  __entry->hi - __entry->lo)
 );
 
-TRACE_EVENT(xfarray_pagesort,
+TRACE_EVENT(xfarray_foliosort,
 	TP_PROTO(struct xfarray_sortinfo *si, uint64_t lo, uint64_t hi),
 	TP_ARGS(si, lo, hi),
 	TP_STRUCT__entry(
@@ -1027,6 +1027,47 @@ TRACE_EVENT(xfarray_sort,
 		  __entry->bytes)
 );
 
+TRACE_EVENT(xfarray_sort_scan,
+	TP_PROTO(struct xfarray_sortinfo *si, unsigned long long idx),
+	TP_ARGS(si, idx),
+	TP_STRUCT__entry(
+		__field(unsigned long, ino)
+		__field(unsigned long long, nr)
+		__field(size_t, obj_size)
+		__field(unsigned long long, idx)
+		__field(unsigned long long, folio_pos)
+		__field(unsigned long, folio_bytes)
+		__field(unsigned long long, first_idx)
+		__field(unsigned long long, last_idx)
+	),
+	TP_fast_assign(
+		__entry->nr = si->array->nr;
+		__entry->obj_size = si->array->obj_size;
+		__entry->ino = file_inode(si->array->xfile->file)->i_ino;
+		__entry->idx = idx;
+		if (si->folio) {
+			__entry->folio_pos = folio_pos(si->folio);
+			__entry->folio_bytes = folio_size(si->folio);
+			__entry->first_idx = si->first_folio_idx;
+			__entry->last_idx = si->last_folio_idx;
+		} else {
+			__entry->folio_pos = 0;
+			__entry->folio_bytes = 0;
+			__entry->first_idx = 0;
+			__entry->last_idx = 0;
+		}
+	),
+	TP_printk("xfino 0x%lx nr %llu objsz %zu idx %llu folio_pos 0x%llx folio_bytes 0x%lx first_idx %llu last_idx %llu",
+		  __entry->ino,
+		  __entry->nr,
+		  __entry->obj_size,
+		  __entry->idx,
+		  __entry->folio_pos,
+		  __entry->folio_bytes,
+		  __entry->first_idx,
+		  __entry->last_idx)
+);
+
 TRACE_EVENT(xfarray_sort_stats,
 	TP_PROTO(struct xfarray_sortinfo *si, int error),
 	TP_ARGS(si, error),
diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index 379e1db22269c7..17c982a4821d47 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -563,70 +563,42 @@ xfarray_isort(
 	return xfile_store(si->array->xfile, scratch, len, lo_pos);
 }
 
-/* Grab a page for sorting records. */
-static inline int
-xfarray_sort_get_page(
-	struct xfarray_sortinfo	*si,
-	loff_t			pos,
-	uint64_t		len)
-{
-	return xfile_get_page(si->array->xfile, pos, len, &si->xfpage);
-}
-
-/* Release a page we grabbed for sorting records. */
-static inline int
-xfarray_sort_put_page(
-	struct xfarray_sortinfo	*si)
-{
-	if (!xfile_page_cached(&si->xfpage))
-		return 0;
-	return xfile_put_page(si->array->xfile, &si->xfpage);
-}
-
-/* Decide if these records are eligible for in-page sorting. */
-static inline bool
-xfarray_want_pagesort(
-	struct xfarray_sortinfo	*si,
-	xfarray_idx_t		lo,
-	xfarray_idx_t		hi)
-{
-	pgoff_t			lo_page;
-	pgoff_t			hi_page;
-	loff_t			end_pos;
-
-	/* We can only map one page at a time. */
-	lo_page = xfarray_pos(si->array, lo) >> PAGE_SHIFT;
-	end_pos = xfarray_pos(si->array, hi) + si->array->obj_size - 1;
-	hi_page = end_pos >> PAGE_SHIFT;
-
-	return lo_page == hi_page;
-}
-
-/* Sort a bunch of records that all live in the same memory page. */
+/*
+ * Sort the records from lo to hi (inclusive) if they are all backed by the
+ * same memory folio.  Returns 1 if it sorted, 0 if it did not, or a negative
+ * errno.
+ */
 STATIC int
-xfarray_pagesort(
+xfarray_foliosort(
 	struct xfarray_sortinfo	*si,
 	xfarray_idx_t		lo,
 	xfarray_idx_t		hi)
 {
+	struct folio		*folio;
 	void			*startp;
 	loff_t			lo_pos = xfarray_pos(si->array, lo);
-	uint64_t		len = xfarray_pos(si->array, hi - lo);
-	int			error = 0;
+	uint64_t		len = xfarray_pos(si->array, hi - lo + 1);
 
-	trace_xfarray_pagesort(si, lo, hi);
+	/* No single folio could back this many records. */
+	if (len > XFILE_MAX_FOLIO_SIZE)
+		return 0;
 
 	xfarray_sort_bump_loads(si);
-	error = xfarray_sort_get_page(si, lo_pos, len);
-	if (error)
-		return error;
+	folio = xfile_get_folio(si->array->xfile, lo_pos, len, XFILE_ALLOC);
+	if (IS_ERR(folio))
+		return PTR_ERR(folio);
+	if (!folio)
+		return 0;
+
+	trace_xfarray_foliosort(si, lo, hi);
 
 	xfarray_sort_bump_heapsorts(si);
-	startp = page_address(si->xfpage.page) + offset_in_page(lo_pos);
+	startp = folio_address(folio) + offset_in_folio(folio, lo_pos);
 	sort(startp, hi - lo + 1, si->array->obj_size, si->cmp_fn, NULL);
 
 	xfarray_sort_bump_stores(si);
-	return xfarray_sort_put_page(si);
+	xfile_put_folio(si->array->xfile, folio);
+	return 1;
 }
 
 /* Return a pointer to the xfarray pivot record within the sortinfo struct. */
@@ -814,63 +786,78 @@ xfarray_qsort_push(
 	return 0;
 }
 
+static inline void
+xfarray_sort_scan_done(
+	struct xfarray_sortinfo	*si)
+{
+	if (si->folio)
+		xfile_put_folio(si->array->xfile, si->folio);
+	si->folio = NULL;
+}
+
 /*
- * Load an element from the array into the first scratchpad and cache the page,
- * if possible.
+ * Cache the folio backing the start of the given array element.  If the array
+ * element is contained entirely within the folio, return a pointer to the
+ * cached folio.  Otherwise, load the element into the scratchpad and return a
+ * pointer to the scratchpad.
  */
 static inline int
-xfarray_sort_load_cached(
+xfarray_sort_scan(
 	struct xfarray_sortinfo	*si,
 	xfarray_idx_t		idx,
-	void			*ptr)
+	void			**ptrp)
 {
 	loff_t			idx_pos = xfarray_pos(si->array, idx);
-	pgoff_t			startpage;
-	pgoff_t			endpage;
 	int			error = 0;
 
-	/*
-	 * If this load would split a page, release the cached page, if any,
-	 * and perform a traditional read.
-	 */
-	startpage = idx_pos >> PAGE_SHIFT;
-	endpage = (idx_pos + si->array->obj_size - 1) >> PAGE_SHIFT;
-	if (startpage != endpage) {
-		error = xfarray_sort_put_page(si);
-		if (error)
-			return error;
+	if (xfarray_sort_terminated(si, &error))
+		return error;
 
-		if (xfarray_sort_terminated(si, &error))
-			return error;
+	trace_xfarray_sort_scan(si, idx);
 
-		return xfile_load(si->array->xfile, ptr,
-				si->array->obj_size, idx_pos);
-	}
+	/* If the cached folio doesn't cover this index, release it. */
+	if (si->folio &&
+	    (idx < si->first_folio_idx || idx > si->last_folio_idx))
+		xfarray_sort_scan_done(si);
 
-	/* If the cached page is not the one we want, release it. */
-	if (xfile_page_cached(&si->xfpage) &&
-	    xfile_page_index(&si->xfpage) != startpage) {
-		error = xfarray_sort_put_page(si);
-		if (error)
-			return error;
+	/* Grab the first folio that backs this array element. */
+	if (!si->folio) {
+		loff_t		next_pos;
+
+		si->folio = xfile_get_folio(si->array->xfile, idx_pos,
+				si->array->obj_size, XFILE_ALLOC);
+		if (IS_ERR(si->folio))
+			return PTR_ERR(si->folio);
+
+		si->first_folio_idx = xfarray_idx(si->array,
+				folio_pos(si->folio) + si->array->obj_size - 1);
+
+		next_pos = folio_pos(si->folio) + folio_size(si->folio);
+		si->last_folio_idx = xfarray_idx(si->array, next_pos - 1);
+		if (xfarray_pos(si->array, si->last_folio_idx + 1) > next_pos)
+			si->last_folio_idx--;
+
+		trace_xfarray_sort_scan(si, idx);
 	}
 
 	/*
-	 * If we don't have a cached page (and we know the load is contained
-	 * in a single page) then grab it.
+	 * If this folio still doesn't cover the desired element, it must cross
+	 * a folio boundary.  Read into the scratchpad and we're done.
 	 */
-	if (!xfile_page_cached(&si->xfpage)) {
-		if (xfarray_sort_terminated(si, &error))
-			return error;
+	if (idx < si->first_folio_idx || idx > si->last_folio_idx) {
+		void		*temp = xfarray_scratch(si->array);
 
-		error = xfarray_sort_get_page(si, startpage << PAGE_SHIFT,
-				PAGE_SIZE);
+		error = xfile_load(si->array->xfile, temp, si->array->obj_size,
+				idx_pos);
 		if (error)
 			return error;
+
+		*ptrp = temp;
+		return 0;
 	}
 
-	memcpy(ptr, page_address(si->xfpage.page) + offset_in_page(idx_pos),
-			si->array->obj_size);
+	/* Otherwise return a pointer to the array element in the folio. */
+	*ptrp = folio_address(si->folio) + offset_in_folio(si->folio, idx_pos);
 	return 0;
 }
 
@@ -937,6 +924,8 @@ xfarray_sort(
 	pivot = xfarray_sortinfo_pivot(si);
 
 	while (si->stack_depth >= 0) {
+		int		ret;
+
 		lo = si_lo[si->stack_depth];
 		hi = si_hi[si->stack_depth];
 
@@ -949,13 +938,13 @@ xfarray_sort(
 		}
 
 		/*
-		 * If directly mapping the page and sorting can solve our
+		 * If directly mapping the folio and sorting can solve our
 		 * problems, we're done.
 		 */
-		if (xfarray_want_pagesort(si, lo, hi)) {
-			error = xfarray_pagesort(si, lo, hi);
-			if (error)
-				goto out_free;
+		ret = xfarray_foliosort(si, lo, hi);
+		if (ret < 0)
+			goto out_free;
+		if (ret == 1) {
 			si->stack_depth--;
 			continue;
 		}
@@ -980,25 +969,24 @@ xfarray_sort(
 		 * than the pivot is on the right side of the range.
 		 */
 		while (lo < hi) {
+			void	*p;
+
 			/*
 			 * Decrement hi until it finds an a[hi] less than the
 			 * pivot value.
 			 */
-			error = xfarray_sort_load_cached(si, hi, scratch);
+			error = xfarray_sort_scan(si, hi, &p);
 			if (error)
 				goto out_free;
-			while (xfarray_sort_cmp(si, scratch, pivot) >= 0 &&
-								lo < hi) {
+			while (xfarray_sort_cmp(si, p, pivot) >= 0 && lo < hi) {
 				hi--;
-				error = xfarray_sort_load_cached(si, hi,
-						scratch);
+				error = xfarray_sort_scan(si, hi, &p);
 				if (error)
 					goto out_free;
 			}
-			error = xfarray_sort_put_page(si);
-			if (error)
-				goto out_free;
-
+			if (p != scratch)
+				memcpy(scratch, p, si->array->obj_size);
+			xfarray_sort_scan_done(si);
 			if (xfarray_sort_terminated(si, &error))
 				goto out_free;
 
@@ -1013,21 +1001,18 @@ xfarray_sort(
 			 * Increment lo until it finds an a[lo] greater than
 			 * the pivot value.
 			 */
-			error = xfarray_sort_load_cached(si, lo, scratch);
+			error = xfarray_sort_scan(si, lo, &p);
 			if (error)
 				goto out_free;
-			while (xfarray_sort_cmp(si, scratch, pivot) <= 0 &&
-								lo < hi) {
+			while (xfarray_sort_cmp(si, p, pivot) <= 0 && lo < hi) {
 				lo++;
-				error = xfarray_sort_load_cached(si, lo,
-						scratch);
+				error = xfarray_sort_scan(si, lo, &p);
 				if (error)
 					goto out_free;
 			}
-			error = xfarray_sort_put_page(si);
-			if (error)
-				goto out_free;
-
+			if (p != scratch)
+				memcpy(scratch, p, si->array->obj_size);
+			xfarray_sort_scan_done(si);
 			if (xfarray_sort_terminated(si, &error))
 				goto out_free;
 
diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h
index 6f2862054e194d..ec643cc9fc1432 100644
--- a/fs/xfs/scrub/xfarray.h
+++ b/fs/xfs/scrub/xfarray.h
@@ -105,8 +105,14 @@ struct xfarray_sortinfo {
 	/* XFARRAY_SORT_* flags; see below. */
 	unsigned int		flags;
 
-	/* Cache a page here for faster access. */
-	struct xfile_page	xfpage;
+	/* Cache a folio here for faster scanning for pivots */
+	struct folio		*folio;
+
+	/* First array index in folio that is completely readable */
+	xfarray_idx_t		first_folio_idx;
+
+	/* Last array index in folio that is completely readable */
+	xfarray_idx_t		last_folio_idx;
 
 #ifdef DEBUG
 	/* Performance statistics. */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 21/21] xfs: remove xfile_{get,put}_page
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (19 preceding siblings ...)
  2024-01-26 13:29 ` [PATCH 20/21] xfs: convert xfarray_pagesort to deal with large folios Christoph Hellwig
@ 2024-01-26 13:29 ` Christoph Hellwig
  2024-01-26 14:15 ` put the xfs xfile abstraction on a diet v2 Matthew Wilcox
  2024-01-26 16:47 ` Matthew Wilcox
  22 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 13:29 UTC (permalink / raw)
  To: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton
  Cc: linux-xfs, linux-mm

From: "Darrick J. Wong" <djwong@kernel.org>

These functions aren't used anymore, so get rid of them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 .../xfs/xfs-online-fsck-design.rst            |   2 +-
 fs/xfs/scrub/trace.h                          |   2 -
 fs/xfs/scrub/xfile.c                          | 104 ------------------
 fs/xfs/scrub/xfile.h                          |  20 ----
 4 files changed, 1 insertion(+), 127 deletions(-)

diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 324d5ec921e8e5..6d91b68dd23b71 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -1941,7 +1941,7 @@ mapping it into kernel address space, and dropping the folio lock.
 These long term users *must* be responsive to memory reclaim by hooking into
 the shrinker infrastructure to know when to release folios.
 
-The ``xfile_get_page`` and ``xfile_put_page`` functions are provided to
+The ``xfile_get_folio`` and ``xfile_put_folio`` functions are provided to
 retrieve the (locked) folio that backs part of an xfile and to release it.
 The only code to use these folio lease functions are the xfarray
 :ref:`sorting<xfarray_sort>` algorithms and the :ref:`in-memory
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 3a1a827828dcb9..ae6b2385a8cbe5 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -906,8 +906,6 @@ DEFINE_EVENT(xfile_class, name, \
 DEFINE_XFILE_EVENT(xfile_load);
 DEFINE_XFILE_EVENT(xfile_store);
 DEFINE_XFILE_EVENT(xfile_seek_data);
-DEFINE_XFILE_EVENT(xfile_get_page);
-DEFINE_XFILE_EVENT(xfile_put_page);
 DEFINE_XFILE_EVENT(xfile_get_folio);
 DEFINE_XFILE_EVENT(xfile_put_folio);
 
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 1c1db4ae1ba6ee..341d0b55ddfdfc 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -262,110 +262,6 @@ xfile_seek_data(
 	return ret;
 }
 
-/*
- * Grab the (locked) page for a memory object.  The object cannot span a page
- * boundary.  Returns 0 (and a locked page) if successful, -ENOTBLK if we
- * cannot grab the page, or the usual negative errno.
- */
-int
-xfile_get_page(
-	struct xfile		*xf,
-	loff_t			pos,
-	unsigned int		len,
-	struct xfile_page	*xfpage)
-{
-	struct inode		*inode = file_inode(xf->file);
-	struct address_space	*mapping = inode->i_mapping;
-	const struct address_space_operations *aops = mapping->a_ops;
-	struct page		*page = NULL;
-	void			*fsdata = NULL;
-	loff_t			key = round_down(pos, PAGE_SIZE);
-	unsigned int		pflags;
-	int			error;
-
-	if (inode->i_sb->s_maxbytes - pos < len)
-		return -ENOMEM;
-	if (len > PAGE_SIZE - offset_in_page(pos))
-		return -ENOTBLK;
-
-	trace_xfile_get_page(xf, pos, len);
-
-	pflags = memalloc_nofs_save();
-
-	/*
-	 * We call write_begin directly here to avoid all the freezer
-	 * protection lock-taking that happens in the normal path.  shmem
-	 * doesn't support fs freeze, but lockdep doesn't know that and will
-	 * trip over that.
-	 */
-	error = aops->write_begin(NULL, mapping, key, PAGE_SIZE, &page,
-			&fsdata);
-	if (error)
-		goto out_pflags;
-
-	/* We got the page, so make sure we push out EOF. */
-	if (i_size_read(inode) < pos + len)
-		i_size_write(inode, pos + len);
-
-	/*
-	 * If the page isn't up to date, fill it with zeroes before we hand it
-	 * to the caller and make sure the backing store will hold on to them.
-	 */
-	if (!PageUptodate(page)) {
-		memset(page_address(page), 0, PAGE_SIZE);
-		SetPageUptodate(page);
-	}
-
-	/*
-	 * Mark each page dirty so that the contents are written to some
-	 * backing store when we drop this buffer, and take an extra reference
-	 * to prevent the xfile page from being swapped or removed from the
-	 * page cache by reclaim if the caller unlocks the page.
-	 */
-	set_page_dirty(page);
-	get_page(page);
-
-	xfpage->page = page;
-	xfpage->fsdata = fsdata;
-	xfpage->pos = key;
-out_pflags:
-	memalloc_nofs_restore(pflags);
-	return error;
-}
-
-/*
- * Release the (locked) page for a memory object.  Returns 0 or a negative
- * errno.
- */
-int
-xfile_put_page(
-	struct xfile		*xf,
-	struct xfile_page	*xfpage)
-{
-	struct inode		*inode = file_inode(xf->file);
-	struct address_space	*mapping = inode->i_mapping;
-	const struct address_space_operations *aops = mapping->a_ops;
-	unsigned int		pflags;
-	int			ret;
-
-	trace_xfile_put_page(xf, xfpage->pos, PAGE_SIZE);
-
-	/* Give back the reference that we took in xfile_get_page. */
-	put_page(xfpage->page);
-
-	pflags = memalloc_nofs_save();
-	ret = aops->write_end(NULL, mapping, xfpage->pos, PAGE_SIZE, PAGE_SIZE,
-			xfpage->page, xfpage->fsdata);
-	memalloc_nofs_restore(pflags);
-	memset(xfpage, 0, sizeof(struct xfile_page));
-
-	if (ret < 0)
-		return ret;
-	if (ret != PAGE_SIZE)
-		return -EIO;
-	return 0;
-}
-
 /*
  * Grab the (locked) folio for a memory object.  The object cannot span a folio
  * boundary.  Returns the locked folio if successful, NULL if there was no
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index afb75e9fbaf265..76d78dba7e3478 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -6,22 +6,6 @@
 #ifndef __XFS_SCRUB_XFILE_H__
 #define __XFS_SCRUB_XFILE_H__
 
-struct xfile_page {
-	struct page		*page;
-	void			*fsdata;
-	loff_t			pos;
-};
-
-static inline bool xfile_page_cached(const struct xfile_page *xfpage)
-{
-	return xfpage->page != NULL;
-}
-
-static inline pgoff_t xfile_page_index(const struct xfile_page *xfpage)
-{
-	return xfpage->page->index;
-}
-
 struct xfile {
 	struct file		*file;
 };
@@ -35,10 +19,6 @@ int xfile_store(struct xfile *xf, const void *buf, size_t count,
 
 loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
 
-int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
-		struct xfile_page *xbuf);
-int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
-
 #define XFILE_MAX_FOLIO_SIZE	(PAGE_SIZE << MAX_PAGECACHE_ORDER)
 
 #define XFILE_ALLOC		(1 << 0) /* allocate folio if not present */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: put the xfs xfile abstraction on a diet v2
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (20 preceding siblings ...)
  2024-01-26 13:29 ` [PATCH 21/21] xfs: remove xfile_{get,put}_page Christoph Hellwig
@ 2024-01-26 14:15 ` Matthew Wilcox
  2024-01-26 14:18   ` Christoph Hellwig
  2024-01-26 16:47 ` Matthew Wilcox
  22 siblings, 1 reply; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 14:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:42PM +0100, Christoph Hellwig wrote:
> One thing I don't really like yet is that xfile is still based on
> folios and not pages.  The main stumbling block for that is the
> mess around the hwpoison flag - that one still is per-file and not
> per-folio, and shmem checks it weirdly often and not really in

hwpoison is per page not per file.  That's intrinsic to, well, hardware
poison, it affects an entire page (I'd love to support sub-page poison,
but not enough to spend my time working on memory-poison.c).

In general, I think there's a lack of understanding of hwpoison, and
I include myself in that.  Mostly I blame Intel for this; limiting the
hardware support to the higher end machines means that most of us just
don't care about it.

Why even bother checking for hwpoison in xfiles?  If you have flaky
hardware, well, maybe there's a reason you're having to fsck, and crashing
during a fsck might encourage the user to replace their hardware with
stuff that works.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: put the xfs xfile abstraction on a diet v2
  2024-01-26 14:15 ` put the xfs xfile abstraction on a diet v2 Matthew Wilcox
@ 2024-01-26 14:18   ` Christoph Hellwig
  0 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-26 14:18 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:15:44PM +0000, Matthew Wilcox wrote:
> hwpoison is per page not per file.  That's intrinsic to, well, hardware
> poison, it affects an entire page (I'd love to support sub-page poison,
> but not enough to spend my time working on memory-poison.c).
> 
> In general, I think there's a lack of understanding of hwpoison, and
> I include myself in that.  Mostly I blame Intel for this; limiting the
> hardware support to the higher end machines means that most of us just
> don't care about it.
> 
> Why even bother checking for hwpoison in xfiles?  If you have flaky
> hardware, well, maybe there's a reason you're having to fsck, and crashing
> during a fsck might encourage the user to replace their hardware with
> stuff that works.

But the sentence is stale actually - we're using folios now after Darrick
coded up a helper check all the hwpoison cases and others.  I should
have removed it from the commit log.  Crashing is never a good idea
I think if we can easily avoid it.

Note that I still find the difference in hwpoison checking in shmem
vs filemap rather confusing.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 01/21] mm: move mapping_set_update out of <linux/swap.h>
  2024-01-26 13:28 ` [PATCH 01/21] mm: move mapping_set_update out of <linux/swap.h> Christoph Hellwig
@ 2024-01-26 14:39   ` Matthew Wilcox
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 14:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:43PM +0100, Christoph Hellwig wrote:
> mapping_set_update is only used inside mm/.  Move mapping_set_update to
> mm/internal.h and turn it into an inline function instead of a macro.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 02/21] shmem: move shmem_mapping out of line
  2024-01-26 13:28 ` [PATCH 02/21] shmem: move shmem_mapping out of line Christoph Hellwig
@ 2024-01-26 14:40   ` Matthew Wilcox
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 14:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:44PM +0100, Christoph Hellwig wrote:
> shmem_aops really should not be exported to the world.  Move
> shmem_mapping and export it as internal for the one semi-legitimate
> modular user in udmabuf.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/21] shmem: set a_ops earlier in shmem_symlink
  2024-01-26 13:28 ` [PATCH 03/21] shmem: set a_ops earlier in shmem_symlink Christoph Hellwig
@ 2024-01-26 14:41   ` Matthew Wilcox
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 14:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:45PM +0100, Christoph Hellwig wrote:
> Set the a_aops in shmem_symlink before reading a folio from the mapping

s/a_aops/a_ops/

> to prepare for asserting that shmem_get_folio is only called on shmem
> mappings.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 04/21] shmem: move the shmem_mapping assert into shmem_get_folio_gfp
  2024-01-26 13:28 ` [PATCH 04/21] shmem: move the shmem_mapping assert into shmem_get_folio_gfp Christoph Hellwig
@ 2024-01-26 15:09   ` Matthew Wilcox
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 15:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:46PM +0100, Christoph Hellwig wrote:
> Move the check that the inode really is a shmemfs one from
> shmem_read_folio_gfp to shmem_get_folio_gfp given that shmem_get_folio
> can also be called from outside of shmem.c.  Also turn it into a
> WARN_ON_ONCE and error return instead of BUG_ON to be less severe.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 05/21] shmem: export shmem_get_folio
  2024-01-26 13:28 ` [PATCH 05/21] shmem: export shmem_get_folio Christoph Hellwig
@ 2024-01-26 15:15   ` Matthew Wilcox
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 15:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:47PM +0100, Christoph Hellwig wrote:
> Export shmem_get_folio as a slightly lower-level variant of
> shmem_read_folio_gfp.  This will be useful for XFS xfile use cases
> that want to pass SGP_NOALLOC or get a locked page, which the thin
> shmem_read_folio_gfp wrapper can't provide.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/21] shmem: export shmem_kernel_file_setup
  2024-01-26 13:28 ` [PATCH 06/21] shmem: export shmem_kernel_file_setup Christoph Hellwig
@ 2024-01-26 15:45   ` Matthew Wilcox
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 15:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:48PM +0100, Christoph Hellwig wrote:
> XFS wants to use this for it's internal in-memory data structures and
> currently duplicates the functionality.  Export shmem_kernel_file_setup
> to allow XFS to switch over to using the proper kernel API.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 07/21] shmem: document how to "persist" data when using shmem_*file_setup
  2024-01-26 13:28 ` [PATCH 07/21] shmem: document how to "persist" data when using shmem_*file_setup Christoph Hellwig
@ 2024-01-26 15:49   ` Matthew Wilcox
  2024-01-28 16:54     ` Christoph Hellwig
  0 siblings, 1 reply; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 15:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:49PM +0100, Christoph Hellwig wrote:
> +++ b/mm/shmem.c
> @@ -2150,6 +2150,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>   *
>   * Return: The found folio, %NULL if SGP_READ or SGP_NOALLOC was passed in @sgp
>   * and no folio was found at @index, or an ERR_PTR() otherwise.
> + *
> + * If the caller modifies data in the returned folio, it must call
> + * folio_mark_dirty() on the locked folio before dropping the reference to
> + * ensure the folio is not reclaimed.  Unlike for normal file systems there is
> + * no need to reserve space for users of shmem_*file_setup().

This doesn't quite make sense to me.  Do you mean:

 * If the caller modifies data in the folio, it must call folio_mark_dirty()
 * before unlocking the folio to ensure that the folio is not reclaimed.
 * There is no equivalent to write_begin/write_end for shmem.

(also this should go before the Return: section; the return section
should be the last thing in the kernel-doc)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 09/21] xfs: remove the xfile_pread/pwrite APIs
  2024-01-26 13:28 ` [PATCH 09/21] xfs: remove the xfile_pread/pwrite APIs Christoph Hellwig
@ 2024-01-26 16:21   ` Matthew Wilcox
  2024-01-26 16:48     ` Darrick J. Wong
  0 siblings, 1 reply; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 16:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:51PM +0100, Christoph Hellwig wrote:
> +++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
> @@ -1915,19 +1915,13 @@ four of those five higher level data structures.
>  The fifth use case is discussed in the :ref:`realtime summary <rtsummary>` case
>  study.
>  
> -The most general storage interface supported by the xfile enables the reading
> -and writing of arbitrary quantities of data at arbitrary offsets in the xfile.
> -This capability is provided by ``xfile_pread`` and ``xfile_pwrite`` functions,
> -which behave similarly to their userspace counterparts.
>  XFS is very record-based, which suggests that the ability to load and store
>  complete records is important.
>  To support these cases, a pair of ``xfile_obj_load`` and ``xfile_obj_store``
> -functions are provided to read and persist objects into an xfile.
> -They are internally the same as pread and pwrite, except that they treat any
> -error as an out of memory error.
> +functions are provided to read and persist objects into an xfile that unlike
> +the pread and pwrite system calls treat any error as an out of memory error.

It's a bit weird to refer to the pread and pwrite system calls now.
I'd just say:

+functions are provided to read and persist objects into an xfile that
+treat any error as an out of memory error.

I wonder if we shouldn't also change:

-Programmatic access (e.g. pread and pwrite) uses this mechanism.
+Object load and store use this mechanism.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 13/21] xfs: don't allow highmem pages in xfile mappings
  2024-01-26 13:28 ` [PATCH 13/21] xfs: don't allow highmem pages in xfile mappings Christoph Hellwig
@ 2024-01-26 16:26   ` Matthew Wilcox
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 16:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:55PM +0100, Christoph Hellwig wrote:
> XFS is generally used on 64-bit, non-highmem platforms and xfile
> mappings are accessed all the time.  Reduce our pain by not allowing
> any highmem mappings in the xfile page cache and remove all the kmap
> calls for it.

Ummm ...

> +++ b/fs/xfs/scrub/xfile.c
> @@ -77,6 +77,12 @@ xfile_create(
>  	inode = file_inode(xf->file);
>  	lockdep_set_class(&inode->i_rwsem, &xfile_i_mutex_key);
>  
> +	/*
> +	 * We don't want to bother with kmapping data during repair, so don't
> +	 * allow highmem pages to back this mapping.
> +	 */
> +	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);

HIGHUSER includes __GFP_HIGHMEM.

Do you want GFP_USER?  Compared to GFP_KERNEL, it includes the "cpuset
memory allocation policy" which ... I have no idea what it means, honestly.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 14/21] xfs: use shmem_get_folio in xfile_obj_store
  2024-01-26 13:28 ` [PATCH 14/21] xfs: use shmem_get_folio in xfile_obj_store Christoph Hellwig
@ 2024-01-26 16:29   ` Matthew Wilcox
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 16:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:56PM +0100, Christoph Hellwig wrote:
> +		if (folio_test_hwpoison(folio) ||
> +		    (folio_test_large(folio) &&
> +		     folio_test_has_hwpoisoned(folio))) {

I need to put in a helper for this.  I'm seeing this pattern crop up in
a few places and I don't like it.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 16/21] xfs: improve detection of lost xfile contents
  2024-01-26 13:28 ` [PATCH 16/21] xfs: improve detection of lost xfile contents Christoph Hellwig
@ 2024-01-26 16:33   ` Matthew Wilcox
  2024-01-26 16:50     ` Darrick J. Wong
  2024-01-28 16:55     ` Christoph Hellwig
  0 siblings, 2 replies; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 16:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:58PM +0100, Christoph Hellwig wrote:
> +/* Has this file lost any of the data stored in it? */
> +static inline bool
> +xfile_has_lost_data(
> +	struct inode		*inode,
> +	struct folio		*folio)
> +{
> +	struct address_space	*mapping = inode->i_mapping;
> +
> +	/* This folio itself has been poisoned. */
> +	if (folio_test_hwpoison(folio))
> +		return true;
> +
> +	/* A base page under this large folio has been poisoned. */
> +	if (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))
> +		return true;
> +
> +	/* Data loss has occurred anywhere in this shmem file. */
> +	if (test_bit(AS_EIO, &mapping->flags))
> +		return true;
> +	if (filemap_check_wb_err(mapping, 0))
> +		return true;
> +
> +	return false;
> +}

This is too much.  filemap_check_wb_err() will do just fine for your
needs unless you really want to get fine-grained and perhaps try to
reconstruct the contents of the file.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 17/21] xfs: add file_{get,put}_folio
  2024-01-26 13:28 ` [PATCH 17/21] xfs: add file_{get,put}_folio Christoph Hellwig
@ 2024-01-26 16:40   ` Matthew Wilcox
  2024-01-26 16:55     ` Darrick J. Wong
  2024-01-27  1:26   ` Kent Overstreet
  1 sibling, 1 reply; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 16:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:59PM +0100, Christoph Hellwig wrote:
> +	/*
> +	 * Mark the folio dirty so that it won't be reclaimed once we drop the
> +	 * (potentially last) reference in xfile_put_folio.
> +	 */
> +	if (flags & XFILE_ALLOC)
> +		folio_set_dirty(folio);

What I can't tell from skimming the code is whether we ever get the folio
and don't modify it.  If we do, it might make sense to not set dirty here,
but instead pass a bool to xfile_put_folio().  Or have the caller dirty
the folio if they actually modify it.  But perhaps that never happens
in practice and this is simple and works every time.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: put the xfs xfile abstraction on a diet v2
  2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
                   ` (21 preceding siblings ...)
  2024-01-26 14:15 ` put the xfs xfile abstraction on a diet v2 Matthew Wilcox
@ 2024-01-26 16:47 ` Matthew Wilcox
  22 siblings, 0 replies; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-26 16:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:42PM +0100, Christoph Hellwig wrote:
> this series refactors and simplifies the code in the xfs xfile
> abstraction, which is a thing layer on a kernel-use shmem file.

OK, I've finished reviewing the series.  I'm not offering R-b for the
xfiles portions as I feel I don't understand the code well enough
to do a high quality review.

In general it looks good, and I only had minor quibbles; the overall
direction is good.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 09/21] xfs: remove the xfile_pread/pwrite APIs
  2024-01-26 16:21   ` Matthew Wilcox
@ 2024-01-26 16:48     ` Darrick J. Wong
  0 siblings, 0 replies; 48+ messages in thread
From: Darrick J. Wong @ 2024-01-26 16:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Chandan Babu R, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 04:21:52PM +0000, Matthew Wilcox wrote:
> On Fri, Jan 26, 2024 at 02:28:51PM +0100, Christoph Hellwig wrote:
> > +++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
> > @@ -1915,19 +1915,13 @@ four of those five higher level data structures.
> >  The fifth use case is discussed in the :ref:`realtime summary <rtsummary>` case
> >  study.
> >  
> > -The most general storage interface supported by the xfile enables the reading
> > -and writing of arbitrary quantities of data at arbitrary offsets in the xfile.
> > -This capability is provided by ``xfile_pread`` and ``xfile_pwrite`` functions,
> > -which behave similarly to their userspace counterparts.
> >  XFS is very record-based, which suggests that the ability to load and store
> >  complete records is important.
> >  To support these cases, a pair of ``xfile_obj_load`` and ``xfile_obj_store``

Nit: s/xfile_obj_XXXX/xfile_XXXX/ here.

> > -functions are provided to read and persist objects into an xfile.
> > -They are internally the same as pread and pwrite, except that they treat any
> > -error as an out of memory error.
> > +functions are provided to read and persist objects into an xfile that unlike
> > +the pread and pwrite system calls treat any error as an out of memory error.
> 
> It's a bit weird to refer to the pread and pwrite system calls now.
> I'd just say:
> 
> +functions are provided to read and persist objects into an xfile that
> +treat any error as an out of memory error.
> 
> I wonder if we shouldn't also change:
> 
> -Programmatic access (e.g. pread and pwrite) uses this mechanism.
> +Object load and store use this mechanism.

Yes.

--D

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 16/21] xfs: improve detection of lost xfile contents
  2024-01-26 16:33   ` Matthew Wilcox
@ 2024-01-26 16:50     ` Darrick J. Wong
  2024-01-28 16:55     ` Christoph Hellwig
  1 sibling, 0 replies; 48+ messages in thread
From: Darrick J. Wong @ 2024-01-26 16:50 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Chandan Babu R, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 04:33:40PM +0000, Matthew Wilcox wrote:
> On Fri, Jan 26, 2024 at 02:28:58PM +0100, Christoph Hellwig wrote:
> > +/* Has this file lost any of the data stored in it? */
> > +static inline bool
> > +xfile_has_lost_data(
> > +	struct inode		*inode,
> > +	struct folio		*folio)
> > +{
> > +	struct address_space	*mapping = inode->i_mapping;
> > +
> > +	/* This folio itself has been poisoned. */
> > +	if (folio_test_hwpoison(folio))
> > +		return true;
> > +
> > +	/* A base page under this large folio has been poisoned. */
> > +	if (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))
> > +		return true;
> > +
> > +	/* Data loss has occurred anywhere in this shmem file. */
> > +	if (test_bit(AS_EIO, &mapping->flags))
> > +		return true;
> > +	if (filemap_check_wb_err(mapping, 0))
> > +		return true;
> > +
> > +	return false;
> > +}
> 
> This is too much.  filemap_check_wb_err() will do just fine for your
> needs unless you really want to get fine-grained and perhaps try to
> reconstruct the contents of the file.

Hah no, we're not going to implement online fsck for xfiles. ;)

Online fsck might be a little special in that any data loss anywhere in
an xfile needs to result in the repair aborting without touching the
ondisk metadata, and the sooner we realise this the better.

--D

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 17/21] xfs: add file_{get,put}_folio
  2024-01-26 16:40   ` Matthew Wilcox
@ 2024-01-26 16:55     ` Darrick J. Wong
  0 siblings, 0 replies; 48+ messages in thread
From: Darrick J. Wong @ 2024-01-26 16:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Chandan Babu R, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 04:40:57PM +0000, Matthew Wilcox wrote:
> On Fri, Jan 26, 2024 at 02:28:59PM +0100, Christoph Hellwig wrote:
> > +	/*
> > +	 * Mark the folio dirty so that it won't be reclaimed once we drop the
> > +	 * (potentially last) reference in xfile_put_folio.
> > +	 */
> > +	if (flags & XFILE_ALLOC)
> > +		folio_set_dirty(folio);
> 
> What I can't tell from skimming the code is whether we ever get the folio
> and don't modify it.  If we do, it might make sense to not set dirty here,
> but instead pass a bool to xfile_put_folio().  Or have the caller dirty
> the folio if they actually modify it.  But perhaps that never happens
> in practice and this is simple and works every time.

Generally we won't be allocating an xfile folio without storing data to it.
It's possible that there could be dirty folios containing zeroes (e.g.
an xfarray that stored a bunch of array elements then nullified all of
them) but setting dirty early is simpler.

(and all the users of xfiles are only interested in ephemeral datasets
so most of the dirty pages and the entire file will get flushed out
quickly)

--D

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 20/21] xfs: convert xfarray_pagesort to deal with large folios
  2024-01-26 13:29 ` [PATCH 20/21] xfs: convert xfarray_pagesort to deal with large folios Christoph Hellwig
@ 2024-01-27  1:09   ` Kent Overstreet
  0 siblings, 0 replies; 48+ messages in thread
From: Kent Overstreet @ 2024-01-27  1:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:29:02PM +0100, Christoph Hellwig wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
> 
> Convert xfarray_pagesort to handle large folios by introducing a new
> xfile_get_folio routine that can return a folio of arbitrary size, and
> using heapsort on the full folio.  This also corrects an off-by-one bug
> in the calculation of len in xfarray_pagesort that was papered over by
> xfarray_want_pagesort.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

not the fastest way this could be done - but it was performance tested
and according to Darrick was only 2x slower than the "use a normal sort
routine on a contiguously mapped buffer" version, which really isn't
that bad. So -

Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 17/21] xfs: add file_{get,put}_folio
  2024-01-26 13:28 ` [PATCH 17/21] xfs: add file_{get,put}_folio Christoph Hellwig
  2024-01-26 16:40   ` Matthew Wilcox
@ 2024-01-27  1:26   ` Kent Overstreet
  2024-01-27  1:32     ` Darrick J. Wong
  1 sibling, 1 reply; 48+ messages in thread
From: Kent Overstreet @ 2024-01-27  1:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 02:28:59PM +0100, Christoph Hellwig wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
> 
> Add helper similar to file_{get,set}_page, but which deal with folios
> and don't allocate new folio unless explicitly asked to, which map
> to shmem_get_folio instead of calling into the aops.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

looks boilerplatey to my eyes, but this is all new conceptual stuff and
the implementation will be evolving, so...

one nit: that's really not the right place for memalloc_nofs_save(), can
we try to start figuring out the proper locations for those?

Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>

> ---
>  fs/xfs/scrub/trace.h |  2 ++
>  fs/xfs/scrub/xfile.c | 74 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/scrub/xfile.h |  7 +++++
>  3 files changed, 83 insertions(+)
> 
> diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> index 0327cab606b070..c61fa7a95ef522 100644
> --- a/fs/xfs/scrub/trace.h
> +++ b/fs/xfs/scrub/trace.h
> @@ -908,6 +908,8 @@ DEFINE_XFILE_EVENT(xfile_store);
>  DEFINE_XFILE_EVENT(xfile_seek_data);
>  DEFINE_XFILE_EVENT(xfile_get_page);
>  DEFINE_XFILE_EVENT(xfile_put_page);
> +DEFINE_XFILE_EVENT(xfile_get_folio);
> +DEFINE_XFILE_EVENT(xfile_put_folio);
>  
>  TRACE_EVENT(xfarray_create,
>  	TP_PROTO(struct xfarray *xfa, unsigned long long required_capacity),
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index 2d802c20a8ddfe..1c1db4ae1ba6ee 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -365,3 +365,77 @@ xfile_put_page(
>  		return -EIO;
>  	return 0;
>  }
> +
> +/*
> + * Grab the (locked) folio for a memory object.  The object cannot span a folio
> + * boundary.  Returns the locked folio if successful, NULL if there was no
> + * folio or it didn't cover the range requested, or an ERR_PTR on failure.
> + */
> +struct folio *
> +xfile_get_folio(
> +	struct xfile		*xf,
> +	loff_t			pos,
> +	size_t			len,
> +	unsigned int		flags)
> +{
> +	struct inode		*inode = file_inode(xf->file);
> +	struct folio		*folio = NULL;
> +	unsigned int		pflags;
> +	int			error;
> +
> +	if (inode->i_sb->s_maxbytes - pos < len)
> +		return ERR_PTR(-ENOMEM);
> +
> +	trace_xfile_get_folio(xf, pos, len);
> +
> +	/*
> +	 * Increase the file size first so that shmem_get_folio(..., SGP_CACHE),
> +	 * actually allocates a folio instead of erroring out.
> +	 */
> +	if ((flags & XFILE_ALLOC) && pos + len > i_size_read(inode))
> +		i_size_write(inode, pos + len);
> +
> +	pflags = memalloc_nofs_save();
> +	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> +			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
> +	memalloc_nofs_restore(pflags);
> +	if (error)
> +		return ERR_PTR(error);
> +
> +	if (!folio)
> +		return NULL;
> +
> +	if (len > folio_size(folio) - offset_in_folio(folio, pos)) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +		return NULL;
> +	}
> +
> +	if (xfile_has_lost_data(inode, folio)) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +		return ERR_PTR(-EIO);
> +	}
> +
> +	/*
> +	 * Mark the folio dirty so that it won't be reclaimed once we drop the
> +	 * (potentially last) reference in xfile_put_folio.
> +	 */
> +	if (flags & XFILE_ALLOC)
> +		folio_set_dirty(folio);
> +	return folio;
> +}
> +
> +/*
> + * Release the (locked) folio for a memory object.
> + */
> +void
> +xfile_put_folio(
> +	struct xfile		*xf,
> +	struct folio		*folio)
> +{
> +	trace_xfile_put_folio(xf, folio_pos(folio), folio_size(folio));
> +
> +	folio_unlock(folio);
> +	folio_put(folio);
> +}
> diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
> index 465b10f492b66d..afb75e9fbaf265 100644
> --- a/fs/xfs/scrub/xfile.h
> +++ b/fs/xfs/scrub/xfile.h
> @@ -39,4 +39,11 @@ int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
>  		struct xfile_page *xbuf);
>  int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
>  
> +#define XFILE_MAX_FOLIO_SIZE	(PAGE_SIZE << MAX_PAGECACHE_ORDER)
> +
> +#define XFILE_ALLOC		(1 << 0) /* allocate folio if not present */
> +struct folio *xfile_get_folio(struct xfile *xf, loff_t offset, size_t len,
> +		unsigned int flags);
> +void xfile_put_folio(struct xfile *xf, struct folio *folio);
> +
>  #endif /* __XFS_SCRUB_XFILE_H__ */
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 17/21] xfs: add file_{get,put}_folio
  2024-01-27  1:26   ` Kent Overstreet
@ 2024-01-27  1:32     ` Darrick J. Wong
  0 siblings, 0 replies; 48+ messages in thread
From: Darrick J. Wong @ 2024-01-27  1:32 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Christoph Hellwig, Chandan Babu R, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 08:26:22PM -0500, Kent Overstreet wrote:
> On Fri, Jan 26, 2024 at 02:28:59PM +0100, Christoph Hellwig wrote:
> > From: "Darrick J. Wong" <djwong@kernel.org>
> > 
> > Add helper similar to file_{get,set}_page, but which deal with folios
> > and don't allocate new folio unless explicitly asked to, which map
> > to shmem_get_folio instead of calling into the aops.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> looks boilerplatey to my eyes, but this is all new conceptual stuff and
> the implementation will be evolving, so...
> 
> one nit: that's really not the right place for memalloc_nofs_save(), can
> we try to start figuring out the proper locations for those?

I /think/ this is actually unnecessary since the xfs scrub code will
have already allocated a (possibly empty) transaction, which will have
set PF_MEMALLOC_NOFS.  But.  I'd rather concentrate on merging the code
and fixing the correctness bugs; and later we can find and remove the
unnecessary bits.

(Yeah shameful copy pasta from shmem.c.)

--D

> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
> 
> > ---
> >  fs/xfs/scrub/trace.h |  2 ++
> >  fs/xfs/scrub/xfile.c | 74 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/scrub/xfile.h |  7 +++++
> >  3 files changed, 83 insertions(+)
> > 
> > diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
> > index 0327cab606b070..c61fa7a95ef522 100644
> > --- a/fs/xfs/scrub/trace.h
> > +++ b/fs/xfs/scrub/trace.h
> > @@ -908,6 +908,8 @@ DEFINE_XFILE_EVENT(xfile_store);
> >  DEFINE_XFILE_EVENT(xfile_seek_data);
> >  DEFINE_XFILE_EVENT(xfile_get_page);
> >  DEFINE_XFILE_EVENT(xfile_put_page);
> > +DEFINE_XFILE_EVENT(xfile_get_folio);
> > +DEFINE_XFILE_EVENT(xfile_put_folio);
> >  
> >  TRACE_EVENT(xfarray_create,
> >  	TP_PROTO(struct xfarray *xfa, unsigned long long required_capacity),
> > diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> > index 2d802c20a8ddfe..1c1db4ae1ba6ee 100644
> > --- a/fs/xfs/scrub/xfile.c
> > +++ b/fs/xfs/scrub/xfile.c
> > @@ -365,3 +365,77 @@ xfile_put_page(
> >  		return -EIO;
> >  	return 0;
> >  }
> > +
> > +/*
> > + * Grab the (locked) folio for a memory object.  The object cannot span a folio
> > + * boundary.  Returns the locked folio if successful, NULL if there was no
> > + * folio or it didn't cover the range requested, or an ERR_PTR on failure.
> > + */
> > +struct folio *
> > +xfile_get_folio(
> > +	struct xfile		*xf,
> > +	loff_t			pos,
> > +	size_t			len,
> > +	unsigned int		flags)
> > +{
> > +	struct inode		*inode = file_inode(xf->file);
> > +	struct folio		*folio = NULL;
> > +	unsigned int		pflags;
> > +	int			error;
> > +
> > +	if (inode->i_sb->s_maxbytes - pos < len)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	trace_xfile_get_folio(xf, pos, len);
> > +
> > +	/*
> > +	 * Increase the file size first so that shmem_get_folio(..., SGP_CACHE),
> > +	 * actually allocates a folio instead of erroring out.
> > +	 */
> > +	if ((flags & XFILE_ALLOC) && pos + len > i_size_read(inode))
> > +		i_size_write(inode, pos + len);
> > +
> > +	pflags = memalloc_nofs_save();
> > +	error = shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> > +			(flags & XFILE_ALLOC) ? SGP_CACHE : SGP_READ);
> > +	memalloc_nofs_restore(pflags);
> > +	if (error)
> > +		return ERR_PTR(error);
> > +
> > +	if (!folio)
> > +		return NULL;
> > +
> > +	if (len > folio_size(folio) - offset_in_folio(folio, pos)) {
> > +		folio_unlock(folio);
> > +		folio_put(folio);
> > +		return NULL;
> > +	}
> > +
> > +	if (xfile_has_lost_data(inode, folio)) {
> > +		folio_unlock(folio);
> > +		folio_put(folio);
> > +		return ERR_PTR(-EIO);
> > +	}
> > +
> > +	/*
> > +	 * Mark the folio dirty so that it won't be reclaimed once we drop the
> > +	 * (potentially last) reference in xfile_put_folio.
> > +	 */
> > +	if (flags & XFILE_ALLOC)
> > +		folio_set_dirty(folio);
> > +	return folio;
> > +}
> > +
> > +/*
> > + * Release the (locked) folio for a memory object.
> > + */
> > +void
> > +xfile_put_folio(
> > +	struct xfile		*xf,
> > +	struct folio		*folio)
> > +{
> > +	trace_xfile_put_folio(xf, folio_pos(folio), folio_size(folio));
> > +
> > +	folio_unlock(folio);
> > +	folio_put(folio);
> > +}
> > diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
> > index 465b10f492b66d..afb75e9fbaf265 100644
> > --- a/fs/xfs/scrub/xfile.h
> > +++ b/fs/xfs/scrub/xfile.h
> > @@ -39,4 +39,11 @@ int xfile_get_page(struct xfile *xf, loff_t offset, unsigned int len,
> >  		struct xfile_page *xbuf);
> >  int xfile_put_page(struct xfile *xf, struct xfile_page *xbuf);
> >  
> > +#define XFILE_MAX_FOLIO_SIZE	(PAGE_SIZE << MAX_PAGECACHE_ORDER)
> > +
> > +#define XFILE_ALLOC		(1 << 0) /* allocate folio if not present */
> > +struct folio *xfile_get_folio(struct xfile *xf, loff_t offset, size_t len,
> > +		unsigned int flags);
> > +void xfile_put_folio(struct xfile *xf, struct folio *folio);
> > +
> >  #endif /* __XFS_SCRUB_XFILE_H__ */
> > -- 
> > 2.39.2
> > 
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 07/21] shmem: document how to "persist" data when using shmem_*file_setup
  2024-01-26 15:49   ` Matthew Wilcox
@ 2024-01-28 16:54     ` Christoph Hellwig
  2024-01-29 15:56       ` Matthew Wilcox
  0 siblings, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-28 16:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 03:49:30PM +0000, Matthew Wilcox wrote:
> This doesn't quite make sense to me.  Do you mean:
> 
>  * If the caller modifies data in the folio, it must call folio_mark_dirty()
>  * before unlocking the folio to ensure that the folio is not reclaimed.
>  * There is no equivalent to write_begin/write_end for shmem.
> 
> (also this should go before the Return: section; the return section
> should be the last thing in the kernel-doc)

So the first sentence and moving the section makes total sense.
The second sentence I don't think is very useful.  write_begin/write_end
are relaly just a way for generic_perform_write to do the space
reservation and extending i_size and not really methods in the classic
sense.  They should go away from a_ops and certainly don't end up
being mentioned in shmem.c.

What I have now is this:

If the caller modifies data in the folio, it must call folio_mark_dirty()
before unlocking the folio to ensure that the folio is not reclaimed.
These is no need to reserve space before calling folio_mark_dirty().

but I'm open to further improvements.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 16/21] xfs: improve detection of lost xfile contents
  2024-01-26 16:33   ` Matthew Wilcox
  2024-01-26 16:50     ` Darrick J. Wong
@ 2024-01-28 16:55     ` Christoph Hellwig
  2024-01-28 20:33       ` Matthew Wilcox
  1 sibling, 1 reply; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-28 16:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, linux-xfs, linux-mm

On Fri, Jan 26, 2024 at 04:33:40PM +0000, Matthew Wilcox wrote:
> > +static inline bool
> > +xfile_has_lost_data(
> > +	struct inode		*inode,
> > +	struct folio		*folio)
> > +{
> > +	struct address_space	*mapping = inode->i_mapping;
> > +
> > +	/* This folio itself has been poisoned. */
> > +	if (folio_test_hwpoison(folio))
> > +		return true;
> > +
> > +	/* A base page under this large folio has been poisoned. */
> > +	if (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))
> > +		return true;
> > +
> > +	/* Data loss has occurred anywhere in this shmem file. */
> > +	if (test_bit(AS_EIO, &mapping->flags))
> > +		return true;
> > +	if (filemap_check_wb_err(mapping, 0))
> > +		return true;
> > +
> > +	return false;
> > +}
> 
> This is too much.  filemap_check_wb_err() will do just fine for your
> needs unless you really want to get fine-grained and perhaps try to
> reconstruct the contents of the file.

As in only call filemap_check_wb_err and do away with all the
hwpoisoned checks and the extra AS_EIO check?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 16/21] xfs: improve detection of lost xfile contents
  2024-01-28 16:55     ` Christoph Hellwig
@ 2024-01-28 20:33       ` Matthew Wilcox
  0 siblings, 0 replies; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-28 20:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Sun, Jan 28, 2024 at 05:55:49PM +0100, Christoph Hellwig wrote:
> On Fri, Jan 26, 2024 at 04:33:40PM +0000, Matthew Wilcox wrote:
> > > +static inline bool
> > > +xfile_has_lost_data(
> > > +	struct inode		*inode,
> > > +	struct folio		*folio)
> > > +{
> > > +	struct address_space	*mapping = inode->i_mapping;
> > > +
> > > +	/* This folio itself has been poisoned. */
> > > +	if (folio_test_hwpoison(folio))
> > > +		return true;
> > > +
> > > +	/* A base page under this large folio has been poisoned. */
> > > +	if (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))
> > > +		return true;
> > > +
> > > +	/* Data loss has occurred anywhere in this shmem file. */
> > > +	if (test_bit(AS_EIO, &mapping->flags))
> > > +		return true;
> > > +	if (filemap_check_wb_err(mapping, 0))
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > 
> > This is too much.  filemap_check_wb_err() will do just fine for your
> > needs unless you really want to get fine-grained and perhaps try to
> > reconstruct the contents of the file.
> 
> As in only call filemap_check_wb_err and do away with all the
> hwpoisoned checks and the extra AS_EIO check?

Yes, that's what i meant.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 07/21] shmem: document how to "persist" data when using shmem_*file_setup
  2024-01-28 16:54     ` Christoph Hellwig
@ 2024-01-29 15:56       ` Matthew Wilcox
  2024-01-29 16:04         ` Christoph Hellwig
  0 siblings, 1 reply; 48+ messages in thread
From: Matthew Wilcox @ 2024-01-29 15:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Darrick J. Wong, Hugh Dickins, Andrew Morton,
	linux-xfs, linux-mm

On Sun, Jan 28, 2024 at 05:54:34PM +0100, Christoph Hellwig wrote:
> On Fri, Jan 26, 2024 at 03:49:30PM +0000, Matthew Wilcox wrote:
> > This doesn't quite make sense to me.  Do you mean:
> > 
> >  * If the caller modifies data in the folio, it must call folio_mark_dirty()
> >  * before unlocking the folio to ensure that the folio is not reclaimed.
> >  * There is no equivalent to write_begin/write_end for shmem.
> > 
> > (also this should go before the Return: section; the return section
> > should be the last thing in the kernel-doc)
> 
> So the first sentence and moving the section makes total sense.
> The second sentence I don't think is very useful.  write_begin/write_end
> are relaly just a way for generic_perform_write to do the space
> reservation and extending i_size and not really methods in the classic
> sense.  They should go away from a_ops and certainly don't end up
> being mentioned in shmem.c.
> 
> What I have now is this:
> 
> If the caller modifies data in the folio, it must call folio_mark_dirty()
> before unlocking the folio to ensure that the folio is not reclaimed.
> These is no need to reserve space before calling folio_mark_dirty().

That's totally fine with me.

Could I trouble you to elaborate on what you'd like to see a filesystem
like ubifs do to replace write_begin/write_end?  After my recent patches,
those are the only places in ubifs that have a struct page reference.
I've been holding off on converting those and writepage because we have
plans to eliminate them, but I'm not sure how much longer we can hold off.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 07/21] shmem: document how to "persist" data when using shmem_*file_setup
  2024-01-29 15:56       ` Matthew Wilcox
@ 2024-01-29 16:04         ` Christoph Hellwig
  0 siblings, 0 replies; 48+ messages in thread
From: Christoph Hellwig @ 2024-01-29 16:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, linux-xfs, linux-mm

On Mon, Jan 29, 2024 at 03:56:58PM +0000, Matthew Wilcox wrote:
> Could I trouble you to elaborate on what you'd like to see a filesystem
> like ubifs do to replace write_begin/write_end?  After my recent patches,
> those are the only places in ubifs that have a struct page reference.
> I've been holding off on converting those and writepage because we have
> plans to eliminate them, but I'm not sure how much longer we can hold off.

I think the abstraction is okay for anyone using generic_perform_write.
The problem with them is that they are not generic operations called
by higher level code, but callbacks that depend on caller context.
So in perfect world they would be passed directly to
generic_perform_write and the few other users and not go through aops.

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2024-01-29 16:04 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-26 13:28 put the xfs xfile abstraction on a diet v2 Christoph Hellwig
2024-01-26 13:28 ` [PATCH 01/21] mm: move mapping_set_update out of <linux/swap.h> Christoph Hellwig
2024-01-26 14:39   ` Matthew Wilcox
2024-01-26 13:28 ` [PATCH 02/21] shmem: move shmem_mapping out of line Christoph Hellwig
2024-01-26 14:40   ` Matthew Wilcox
2024-01-26 13:28 ` [PATCH 03/21] shmem: set a_ops earlier in shmem_symlink Christoph Hellwig
2024-01-26 14:41   ` Matthew Wilcox
2024-01-26 13:28 ` [PATCH 04/21] shmem: move the shmem_mapping assert into shmem_get_folio_gfp Christoph Hellwig
2024-01-26 15:09   ` Matthew Wilcox
2024-01-26 13:28 ` [PATCH 05/21] shmem: export shmem_get_folio Christoph Hellwig
2024-01-26 15:15   ` Matthew Wilcox
2024-01-26 13:28 ` [PATCH 06/21] shmem: export shmem_kernel_file_setup Christoph Hellwig
2024-01-26 15:45   ` Matthew Wilcox
2024-01-26 13:28 ` [PATCH 07/21] shmem: document how to "persist" data when using shmem_*file_setup Christoph Hellwig
2024-01-26 15:49   ` Matthew Wilcox
2024-01-28 16:54     ` Christoph Hellwig
2024-01-29 15:56       ` Matthew Wilcox
2024-01-29 16:04         ` Christoph Hellwig
2024-01-26 13:28 ` [PATCH 08/21] xfs: remove xfile_stat Christoph Hellwig
2024-01-26 13:28 ` [PATCH 09/21] xfs: remove the xfile_pread/pwrite APIs Christoph Hellwig
2024-01-26 16:21   ` Matthew Wilcox
2024-01-26 16:48     ` Darrick J. Wong
2024-01-26 13:28 ` [PATCH 10/21] xfs: don't try to handle non-update pages in xfile_obj_load Christoph Hellwig
2024-01-26 13:28 ` [PATCH 11/21] xfs: shmem_file_setup can't return NULL Christoph Hellwig
2024-01-26 13:28 ` [PATCH 12/21] xfs: don't modify file and inode flags for shmem files Christoph Hellwig
2024-01-26 13:28 ` [PATCH 13/21] xfs: don't allow highmem pages in xfile mappings Christoph Hellwig
2024-01-26 16:26   ` Matthew Wilcox
2024-01-26 13:28 ` [PATCH 14/21] xfs: use shmem_get_folio in xfile_obj_store Christoph Hellwig
2024-01-26 16:29   ` Matthew Wilcox
2024-01-26 13:28 ` [PATCH 15/21] xfs: use shmem_get_folio in in xfile_load Christoph Hellwig
2024-01-26 13:28 ` [PATCH 16/21] xfs: improve detection of lost xfile contents Christoph Hellwig
2024-01-26 16:33   ` Matthew Wilcox
2024-01-26 16:50     ` Darrick J. Wong
2024-01-28 16:55     ` Christoph Hellwig
2024-01-28 20:33       ` Matthew Wilcox
2024-01-26 13:28 ` [PATCH 17/21] xfs: add file_{get,put}_folio Christoph Hellwig
2024-01-26 16:40   ` Matthew Wilcox
2024-01-26 16:55     ` Darrick J. Wong
2024-01-27  1:26   ` Kent Overstreet
2024-01-27  1:32     ` Darrick J. Wong
2024-01-26 13:29 ` [PATCH 18/21] xfs: remove xfarray_sortinfo.page_kaddr Christoph Hellwig
2024-01-26 13:29 ` [PATCH 19/21] xfs: fix a comment in xfarray.c Christoph Hellwig
2024-01-26 13:29 ` [PATCH 20/21] xfs: convert xfarray_pagesort to deal with large folios Christoph Hellwig
2024-01-27  1:09   ` Kent Overstreet
2024-01-26 13:29 ` [PATCH 21/21] xfs: remove xfile_{get,put}_page Christoph Hellwig
2024-01-26 14:15 ` put the xfs xfile abstraction on a diet v2 Matthew Wilcox
2024-01-26 14:18   ` Christoph Hellwig
2024-01-26 16:47 ` Matthew Wilcox

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.