On 17.03.24 17:53, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> This is a regression test for "mm/madvise: make
> MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly".
>
> Cc: David Hildenbrand <david@redhat.com>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
Thanks for including this test, very helpful!
It's my first time reading fstests code, so I can't give feedback of
much value. That said, nothing jumped out at me :)
--
Cheers,
David / dhildenb
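For context on what the test exercises: MADV_POPULATE_READ and
MADV_POPULATE_WRITE (Linux 5.14+) prefault a mapped range, and the fix
under test makes them retry correctly when the fault path returns
VM_FAULT_RETRY. A minimal userspace sketch of the interface being
regression-tested (the file path is made up and this is not the fstests
test itself, just an illustration of the madvise call):

#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ	22	/* <linux/mman.h>, Linux >= 5.14 */
#endif

int main(void)
{
	size_t len = 16 * 4096;
	int fd = open("/tmp/popread.dat", O_RDWR | O_CREAT, 0644);

	if (fd < 0)
		err(1, "open");
	if (ftruncate(fd, len) < 0)
		err(1, "ftruncate");

	char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		err(1, "mmap");

	/*
	 * Prefault the whole mapping. With the VM_FAULT_RETRY fix this
	 * completes even when the fault handler has to drop and retake
	 * mmap_lock (e.g. while waiting on a locked page).
	 */
	if (madvise(p, len, MADV_POPULATE_READ) < 0)
		err(1, "madvise(MADV_POPULATE_READ)");

	munmap(p, len);
	close(fd);
	return 0;
}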
On Tue, Mar 19, 2024 at 09:46:00AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> The count is used purely to allocate the correct number of bvecs for
> submitting IO. Rename it to b_bvec_count.

Well, I think we should just kill it, as it simply is the rounded up
length in PAGE_SIZE units. The patch below passes a quick xfstests run
and is on top of this series:

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 2a6796c48454f7..8ecf88b5504c18 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -67,27 +67,17 @@ static inline bool xfs_buf_is_uncached(struct xfs_buf *bp)
 }
 
 /*
- * Return true if the buffer is vmapped.
- *
- * b_addr is always set, so we have to look at bp->b_bvec_count to determine if
- * the buffer was vmalloc()d or not.
+ * See comment above xfs_buf_alloc_folios() about the constraints placed on
+ * allocating vmapped buffers.
  */
-static inline int
-xfs_buf_is_vmapped(
-	struct xfs_buf	*bp)
+static inline unsigned int xfs_buf_vmap_len(struct xfs_buf *bp)
 {
-	return bp->b_bvec_count > 1;
+	return roundup(BBTOB(bp->b_length), PAGE_SIZE);
 }
 
-/*
- * See comment above xfs_buf_alloc_folios() about the constraints placed on
- * allocating vmapped buffers.
- */
-static inline int
-xfs_buf_vmap_len(
-	struct xfs_buf	*bp)
+static inline unsigned int xfs_buf_nr_pages(struct xfs_buf *bp)
 {
-	return (bp->b_bvec_count * PAGE_SIZE);
+	return DIV_ROUND_UP(BBTOB(bp->b_length), PAGE_SIZE);
 }
 
 /*
@@ -304,13 +294,15 @@ xfs_buf_free(
 		goto free;
 	}
 
-	if (!(bp->b_flags & _XBF_KMEM))
-		mm_account_reclaimed_pages(bp->b_bvec_count);
-
-	if (bp->b_flags & _XBF_FOLIOS)
-		__folio_put(kmem_to_folio(bp->b_addr));
-	else
+	if (bp->b_flags & _XBF_FOLIOS) {
+		/* XXX: should this pass xfs_buf_nr_pages()? */
+		mm_account_reclaimed_pages(1);
+		folio_put(kmem_to_folio(bp->b_addr));
+	} else {
+		if (!(bp->b_flags & _XBF_KMEM))
+			mm_account_reclaimed_pages(xfs_buf_nr_pages(bp));
 		kvfree(bp->b_addr);
+	}
 
 	bp->b_flags &= _XBF_KMEM | _XBF_FOLIOS;
 
@@ -341,7 +333,6 @@ xfs_buf_alloc_kmem(
 		bp->b_addr = NULL;
 		return -ENOMEM;
 	}
-	bp->b_bvec_count = 1;
 	bp->b_flags |= _XBF_KMEM;
 	return 0;
 }
@@ -369,7 +360,6 @@ xfs_buf_alloc_folio(
 		return false;
 
 	bp->b_addr = folio_address(folio);
-	bp->b_bvec_count = 1;
 	bp->b_flags |= _XBF_FOLIOS;
 	return true;
 }
@@ -441,7 +431,6 @@ xfs_buf_alloc_folios(
 			count);
 		return -ENOMEM;
 	}
-	bp->b_bvec_count = count;
 	return 0;
 }
 
@@ -1470,7 +1459,9 @@ xfs_buf_bio_end_io(
 		cmpxchg(&bp->b_io_error, 0, error);
 	}
 
-	if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
+	if (!bp->b_error &&
+	    (bp->b_flags & XBF_READ) &&
+	    is_vmalloc_addr(bp->b_addr))
 		invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
 
 	if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
@@ -1485,6 +1476,7 @@ xfs_buf_ioapply_map(
 	unsigned int	*buf_offset,
 	blk_opf_t	op)
 {
+	unsigned int	nr_vecs = 1;
 	struct bio	*bio;
 	int		size;
 
@@ -1494,7 +1486,9 @@ xfs_buf_ioapply_map(
 
 	atomic_inc(&bp->b_io_remaining);
 
-	bio = bio_alloc(bp->b_target->bt_bdev, bp->b_bvec_count, op, GFP_NOIO);
+	if (is_vmalloc_addr(bp->b_addr))
+		nr_vecs = xfs_buf_nr_pages(bp);
+	bio = bio_alloc(bp->b_target->bt_bdev, nr_vecs, op, GFP_NOIO);
 	bio->bi_iter.bi_sector = bp->b_maps[map].bm_bn;
 	bio->bi_end_io = xfs_buf_bio_end_io;
 	bio->bi_private = bp;
@@ -1511,7 +1505,7 @@ xfs_buf_ioapply_map(
 		*buf_offset += len;
 	} while (size);
 
-	if (xfs_buf_is_vmapped(bp))
+	if (is_vmalloc_addr(bp->b_addr))
 		flush_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
 
 	submit_bio(bio);
 }
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 32688525890bec..ad92d11f4ae173 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -195,7 +195,6 @@ struct xfs_buf {
 	int			b_map_count;
 	atomic_t		b_pin_count;	/* pin count */
 	atomic_t		b_io_remaining;	/* #outstanding I/O requests */
-	unsigned int		b_bvec_count;	/* bvecs needed for IO */
 	int			b_error;	/* error code on I/O */
 
 	/*
diff --git a/fs/xfs/xfs_buf_mem.c b/fs/xfs/xfs_buf_mem.c
index 30d53ddd6e6980..f082b1a64fc950 100644
--- a/fs/xfs/xfs_buf_mem.c
+++ b/fs/xfs/xfs_buf_mem.c
@@ -169,7 +169,6 @@ xmbuf_map_folio(
 	unlock_page(page);
 
 	bp->b_addr = page_address(page);
-	bp->b_bvec_count = 1;
 	return 0;
 }
 
@@ -182,7 +181,6 @@ xmbuf_unmap_folio(
 	folio_put(kmem_to_folio(bp->b_addr));
 
 	bp->b_addr = NULL;
-	bp->b_bvec_count = 0;
 }
 
 /* Is this a valid daddr within the buftarg? */
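For readers unfamiliar with the two helpers the replacement introduces:
roundup() rounds a byte count up to a multiple of its second argument,
while DIV_ROUND_UP() returns the number of whole units needed, so
xfs_buf_vmap_len() == xfs_buf_nr_pages() * PAGE_SIZE. A tiny userspace
sketch of that relationship (the macros mirror the kernel's math
helpers, but this is illustrative, not kernel code):

#include <assert.h>
#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
#define roundup(x, y)		(DIV_ROUND_UP(x, y) * (y))

int main(void)
{
	unsigned int page_size = 4096;
	unsigned int len = 6000;	/* a 6000 byte buffer */

	assert(DIV_ROUND_UP(len, page_size) == 2);	/* xfs_buf_nr_pages() analogue */
	assert(roundup(len, page_size) == 8192);	/* xfs_buf_vmap_len() analogue */
	printf("pages=%u vmap_len=%u\n",
			DIV_ROUND_UP(len, page_size),
			roundup(len, page_size));
	return 0;
}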
On Tue, Mar 19, 2024 at 01:15:21PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> It really is a unique snowflake, so peel off from normal buffer
> recovery earlier and shuffle all the unique bits into the inode
> buffer recovery function.
>
> Also, it looks like the handling of mismatched inode cluster buffer
> sizes is wrong - we have to write the recovered buffer -before- we
> mark it stale as we're not supposed to write stale buffers. I don't
> think we check that anywhere in the buffer IO path, but let's do it
> the right way anyway.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_buf_item_recover.c | 99 ++++++++++++++++++++++-------------
> 1 file changed, 63 insertions(+), 36 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index dba57ee6fa6d..f994a303ad0a 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -229,7 +229,7 @@ xlog_recover_validate_buf_type(
> * just avoid the verification stage for non-crc filesystems
> */
> if (!xfs_has_crc(mp))
> - return;
> + return 0;
>
> magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
> magic16 = be16_to_cpu(*(__be16*)bp->b_addr);
> @@ -407,7 +407,7 @@ xlog_recover_validate_buf_type(
> * skipped.
> */
> if (current_lsn == NULLCOMMITLSN)
> - return 0;;
> + return 0;
Looks like these two should be in the previous patch.
Otherwise this looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a strategic IS_ENABLED to let the compiler eliminate the unused
non-crc code if CONFIG_XFS_SUPPORT_V4 is disabled. This saves almost
20k worth of .text for my .config:

$ size xfs.o.*
   text	   data	    bss	    dec	    hex	filename
1351126	 294836	    592	1646554	 191fda	xfs.o.new
1371453	 294868	    592	1666913	 196f61	xfs.o.old

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_mount.h |  7 ++++++-
 fs/xfs/xfs_super.c | 22 +++++++++++++---------
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e880aa48de68bb..24fe6e7913c49f 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -327,6 +327,12 @@ static inline void xfs_add_ ## name (struct xfs_mount *mp) \
 	xfs_sb_version_add ## name(&mp->m_sb); \
 }
 
+static inline bool xfs_has_crc(struct xfs_mount *mp)
+{
+	return IS_ENABLED(CONFIG_XFS_SUPPORT_V4) &&
+		(mp->m_features & XFS_FEAT_CRC);
+}
+
 /* Superblock features */
 __XFS_ADD_FEAT(attr, ATTR)
 __XFS_HAS_FEAT(nlink, NLINK)
@@ -341,7 +347,6 @@ __XFS_HAS_FEAT(lazysbcount, LAZYSBCOUNT)
 __XFS_ADD_FEAT(attr2, ATTR2)
 __XFS_HAS_FEAT(parent, PARENT)
 __XFS_ADD_FEAT(projid32, PROJID32)
-__XFS_HAS_FEAT(crc, CRC)
 __XFS_HAS_FEAT(v3inodes, V3INODES)
 __XFS_HAS_FEAT(pquotino, PQUOTINO)
 __XFS_HAS_FEAT(ftype, FTYPE)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 6828c48b15e9bd..7d972e1179255b 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1580,17 +1580,21 @@ xfs_fs_fill_super(
 	if (error)
 		goto out_free_sb;
 
-	/* V4 support is undergoing deprecation. */
-	if (!xfs_has_crc(mp)) {
-#ifdef CONFIG_XFS_SUPPORT_V4
+	/*
+	 * V4 support is undergoing deprecation.
+	 *
+	 * Note: this has to use an open coded m_features check as xfs_has_crc
+	 * always returns false for !CONFIG_XFS_SUPPORT_V4.
+	 */
+	if (!(mp->m_features & XFS_FEAT_CRC)) {
+		if (!IS_ENABLED(CONFIG_XFS_SUPPORT_V4)) {
+			xfs_warn(mp,
+	"Deprecated V4 format (crc=0) not supported by kernel.");
+			error = -EINVAL;
+			goto out_free_sb;
+		}
 		xfs_warn_once(mp,
 	"Deprecated V4 format (crc=0) will not be supported after September 2030.");
-#else
-		xfs_warn(mp,
-	"Deprecated V4 format (crc=0) not supported by kernel.");
-		error = -EINVAL;
-		goto out_free_sb;
-#endif
 	}
 
 	/* ASCII case insensitivity is undergoing deprecation. */
-- 
2.39.2
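For readers who haven't seen the IS_ENABLED() idiom: it expands to a
compile-time constant 0 or 1, so guarded branches are removed by
constant folding while the disabled code is still parsed and type
checked, unlike #ifdef. A generic sketch of the pattern (the config
symbol and helper functions here are invented for illustration, not
part of the patch above):

#include <linux/kconfig.h>

int foo_legacy_path(void);	/* invented helpers */
int foo_modern_path(void);

/*
 * Illustrative only: with CONFIG_FOO_LEGACY=n the branch below folds
 * away at compile time and foo_legacy_path() can be discarded by the
 * linker if nothing else calls it -- the same mechanism that sheds the
 * ~20k of non-crc text in the patch above.
 */
static int foo_init(int legacy_mode)
{
	if (IS_ENABLED(CONFIG_FOO_LEGACY) && legacy_mode)
		return foo_legacy_path();
	return foo_modern_path();
}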
Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de>
This looks mostly good to me. One question: xfs_buf_free passes the folio count to mm_account_reclaimed_pages, but the name and implementation suggest it actually wants a count in page-size units.
So while this looks good to me,
> + for (i = 0; i < bp->b_folio_count; i++) {
> + if (bp->b_folios[i])
> + __folio_put(bp->b_folios[i]);
The __folio_put here really needs to be folio_put or page alloc
debugging gets very unhappy.
But even with that fixed on top of this patch the first mount just hangs
without a useful kernel backtrace in /proc/*/stack, although running
with the entire series applied it does pass the basic sanity checking
so far.
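On the __folio_put() point above: folio_put() drops a reference and
frees the folio only when the count reaches zero, whereas __folio_put()
is the "refcount already hit zero" slow path. Calling it directly skips
the decrement, which is what upsets page allocation debugging. The mm
definition is essentially this (paraphrased from include/linux/mm.h
for illustration):

static inline void folio_put(struct folio *folio)
{
	if (folio_put_testzero(folio))	/* decrement and test for zero */
		__folio_put(folio);	/* actually free the folio */
}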
On Mon, Mar 18, 2024 at 11:38:40PM -0700, Christoph Hellwig wrote:
> On Tue, Mar 19, 2024 at 09:45:53AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Convert the use of struct pages to struct folio everywhere. This
> > is just direct API conversion, no actual logic or code changes
> > should result.
> >
> > Note: this conversion currently assumes only single page folios are
> > allocated, and because some of the MM interfaces we use take
> > pointers to arrays of struct pages, the address of single page
> > folios and struct pages are the same. e.g alloc_pages_bulk_array(),
> > vm_map_ram(), etc.
>
> .. and this goes away by the end of the series. Maybe that's worth
> mentioning here?

Ah, yeah. I remembered to update the cover letter with this
information, so 1 out of 2 ain't bad :)

> Otherwise this looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com
On Tue, Mar 19, 2024 at 09:45:53AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Convert the use of struct pages to struct folio everywhere. This
> is just direct API conversion, no actual logic or code changes
> should result.
>
> Note: this conversion currently assumes only single page folios are
> allocated, and because some of the MM interfaces we use take
> pointers to arrays of struct pages, the address of single page
> folios and struct pages are the same. e.g alloc_pages_bulk_array(),
> vm_map_ram(), etc.
.. and this goes away by the end of the series. Maybe that's worth
mentioning here?
Otherwise this looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
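The single-page-folio note quoted above works because an order-0 folio
and its struct page are the same object at the same address, so arrays
of one can be handed to interfaces expecting the other. A hedged sketch
of the property the conversion leans on (illustrative kernel-style
code, not part of the series):

/*
 * For order-0 allocations there are no tail pages, so page_folio()
 * is effectively a type-safe cast rather than a lookup: the folio and
 * the page share an address. This holds only until multi-page folios
 * are introduced later in the series.
 */
static struct folio *single_page_folio_example(gfp_t gfp)
{
	struct page *page = alloc_page(gfp);

	if (!page)
		return NULL;
	return page_folio(page);
}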
Didn't this just get merged into the for-next tree already?
Hello,

syzbot found the following issue on:

HEAD commit:    906a93befec8 Merge tag 'efi-fixes-for-v6.9-1' of git://git..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12d6ea6e180000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5206351398500a90
dashboard link: https://syzkaller.appspot.com/bug?extid=b44399433a41aaed7a9f
compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-906a93be.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/f096ab7eaede/vmlinux-906a93be.xz
kernel image: https://storage.googleapis.com/syzbot-assets/52e0859d6157/bzImage-906a93be.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b44399433a41aaed7a9f@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.8.0-syzkaller-11405-g906a93befec8 #0 Not tainted
------------------------------------------------------
kswapd0/109 is trying to acquire lock:
ffff888022fc0958 (&qinf->qi_tree_lock){+.+.}-{3:3}, at: xfs_qm_dqfree_one+0x6f/0x1a0 fs/xfs/xfs_qm.c:1654

but task is already holding lock:
ffffffff8d930be0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x166/0x19a0 mm/vmscan.c:6782

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (fs_reclaim){+.+.}-{0:0}:
       __fs_reclaim_acquire mm/page_alloc.c:3698 [inline]
       fs_reclaim_acquire+0x102/0x160 mm/page_alloc.c:3712
       might_alloc include/linux/sched/mm.h:312 [inline]
       slab_pre_alloc_hook mm/slub.c:3746 [inline]
       slab_alloc_node mm/slub.c:3827 [inline]
       kmem_cache_alloc+0x4f/0x320 mm/slub.c:3852
       radix_tree_node_alloc.constprop.0+0x7c/0x350 lib/radix-tree.c:276
       radix_tree_extend+0x1a2/0x4d0 lib/radix-tree.c:425
       __radix_tree_create lib/radix-tree.c:613 [inline]
       radix_tree_insert+0x499/0x630 lib/radix-tree.c:712
       xfs_qm_dqget_cache_insert.constprop.0+0x38/0x2c0 fs/xfs/xfs_dquot.c:826
       xfs_qm_dqget+0x182/0x4a0 fs/xfs/xfs_dquot.c:901
       xfs_qm_scall_setqlim+0x172/0x1980 fs/xfs/xfs_qm_syscalls.c:300
       xfs_fs_set_dqblk+0x166/0x1e0 fs/xfs/xfs_quotaops.c:267
       quota_setquota+0x4c5/0x5f0 fs/quota/quota.c:310
       do_quotactl+0xb03/0x13e0 fs/quota/quota.c:802
       __do_sys_quotactl fs/quota/quota.c:961 [inline]
       __se_sys_quotactl fs/quota/quota.c:917 [inline]
       __x64_sys_quotactl+0x1b4/0x440 fs/quota/quota.c:917
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xd2/0x260 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x6d/0x75

-> #0 (&qinf->qi_tree_lock){+.+.}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:3134 [inline]
       check_prevs_add kernel/locking/lockdep.c:3253 [inline]
       validate_chain kernel/locking/lockdep.c:3869 [inline]
       __lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137
       lock_acquire kernel/locking/lockdep.c:5754 [inline]
       lock_acquire+0x1b1/0x540 kernel/locking/lockdep.c:5719
       __mutex_lock_common kernel/locking/mutex.c:608 [inline]
       __mutex_lock+0x175/0x9c0 kernel/locking/mutex.c:752
       xfs_qm_dqfree_one+0x6f/0x1a0 fs/xfs/xfs_qm.c:1654
       xfs_qm_shrink_scan+0x25c/0x3f0 fs/xfs/xfs_qm.c:531
       do_shrink_slab+0x44f/0x1160 mm/shrinker.c:435
       shrink_slab+0x18a/0x1310 mm/shrinker.c:662
       shrink_one+0x493/0x7c0 mm/vmscan.c:4774
       shrink_many mm/vmscan.c:4835 [inline]
       lru_gen_shrink_node mm/vmscan.c:4935 [inline]
       shrink_node+0x231f/0x3a80 mm/vmscan.c:5894
       kswapd_shrink_node mm/vmscan.c:6704 [inline]
       balance_pgdat+0x9a0/0x19a0 mm/vmscan.c:6895
       kswapd+0x5ea/0xb90 mm/vmscan.c:7164
       kthread+0x2c1/0x3a0 kernel/kthread.c:388
       ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(&qinf->qi_tree_lock);
                               lock(fs_reclaim);
  lock(&qinf->qi_tree_lock);

 *** DEADLOCK ***

1 lock held by kswapd0/109:
 #0: ffffffff8d930be0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x166/0x19a0 mm/vmscan.c:6782

stack backtrace:
CPU: 3 PID: 109 Comm: kswapd0 Not tainted 6.8.0-syzkaller-11405-g906a93befec8 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114
 check_noncircular+0x31a/0x400 kernel/locking/lockdep.c:2187
 check_prev_add kernel/locking/lockdep.c:3134 [inline]
 check_prevs_add kernel/locking/lockdep.c:3253 [inline]
 validate_chain kernel/locking/lockdep.c:3869 [inline]
 __lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137
 lock_acquire kernel/locking/lockdep.c:5754 [inline]
 lock_acquire+0x1b1/0x540 kernel/locking/lockdep.c:5719
 __mutex_lock_common kernel/locking/mutex.c:608 [inline]
 __mutex_lock+0x175/0x9c0 kernel/locking/mutex.c:752
 xfs_qm_dqfree_one+0x6f/0x1a0 fs/xfs/xfs_qm.c:1654
 xfs_qm_shrink_scan+0x25c/0x3f0 fs/xfs/xfs_qm.c:531
 do_shrink_slab+0x44f/0x1160 mm/shrinker.c:435
 shrink_slab+0x18a/0x1310 mm/shrinker.c:662
 shrink_one+0x493/0x7c0 mm/vmscan.c:4774
 shrink_many mm/vmscan.c:4835 [inline]
 lru_gen_shrink_node mm/vmscan.c:4935 [inline]
 shrink_node+0x231f/0x3a80 mm/vmscan.c:5894
 kswapd_shrink_node mm/vmscan.c:6704 [inline]
 balance_pgdat+0x9a0/0x19a0 mm/vmscan.c:6895
 kswapd+0x5ea/0xb90 mm/vmscan.c:7164
 kthread+0x2c1/0x3a0 kernel/kthread.c:388
 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
 </TASK>

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup
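The report above is a classic shrinker lock inversion: qi_tree_lock is
taken both from within fs reclaim (the dquot shrinker) and around a
GFP_KERNEL radix tree node allocation that can itself enter fs reclaim.
Not a claim about how this particular report will be fixed, but the
conventional way to break this class of cycle is to scope the
allocation with memalloc_nofs_save() so it cannot recurse into
filesystem reclaim; a paraphrased sketch (the function below is a
simplified stand-in, not the real xfs_qm_dqget_cache_insert()):

#include <linux/sched/mm.h>

static int dqget_cache_insert_sketch(struct xfs_quotainfo *qi,
		struct radix_tree_root *tree, unsigned long id, void *dqp)
{
	unsigned int nofs_flags;
	int error;

	mutex_lock(&qi->qi_tree_lock);
	/* allocations in this section behave as GFP_NOFS */
	nofs_flags = memalloc_nofs_save();
	error = radix_tree_insert(tree, id, dqp);
	memalloc_nofs_restore(nofs_flags);
	mutex_unlock(&qi->qi_tree_lock);
	return error;
}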
From: Dave Chinner <dchinner@redhat.com>

Rather than passing a mix of xfs_mount and xlog (and sometimes both)
through the call chain of buffer log item recovery, just pass the
struct xlog and pull the xfs_mount from that only at the leaf
functions where it is needed.

This makes it all just a little cleaner.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item_recover.c | 94 +++++++++++++++++------------------
 1 file changed, 47 insertions(+), 47 deletions(-)

diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index 9225baa62755..edd03b08c969 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -209,7 +209,7 @@ xlog_recover_buf_commit_pass1(
  */
 static int
 xlog_recover_validate_buf_type(
-	struct xfs_mount	*mp,
+	struct xlog		*log,
 	struct xfs_buf		*bp,
 	struct xfs_buf_log_format *buf_f,
 	xfs_lsn_t		current_lsn)
@@ -228,7 +228,7 @@ xlog_recover_validate_buf_type(
 	 * inconsistent state resulting in verification failures. Hence for now
 	 * just avoid the verification stage for non-crc filesystems
 	 */
-	if (!xfs_has_crc(mp))
+	if (!xfs_has_crc(log->l_mp))
 		return 0;
 
 	magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
@@ -396,7 +396,7 @@ xlog_recover_validate_buf_type(
 		break;
 #endif /* CONFIG_XFS_RT */
 	default:
-		xfs_warn(mp, "Unknown buffer type %d!",
+		xfs_warn(log->l_mp, "Unknown buffer type %d!",
 			 xfs_blft_from_flags(buf_f));
 		break;
 	}
@@ -410,7 +410,7 @@ xlog_recover_validate_buf_type(
 		return 0;
 
 	if (warnmsg) {
-		xfs_warn(mp, warnmsg);
+		xfs_warn(log->l_mp, warnmsg);
 		xfs_buf_corruption_error(bp, __this_address);
 		return -EFSCORRUPTED;
 	}
@@ -428,7 +428,7 @@ xlog_recover_validate_buf_type(
 	 */
 	ASSERT(bp->b_ops);
 	bp->b_flags |= _XBF_LOGRECOVERY;
-	xfs_buf_item_init(bp, mp);
+	xfs_buf_item_init(bp, log->l_mp);
 	bp->b_log_item->bli_item.li_lsn = current_lsn;
 	return 0;
 }
@@ -441,7 +441,7 @@ xlog_recover_validate_buf_type(
  */
 static void
 xlog_recover_buffer(
-	struct xfs_mount	*mp,
+	struct xlog		*log,
 	struct xlog_recover_item *item,
 	struct xfs_buf		*bp,
 	struct xfs_buf_log_format *buf_f)
@@ -450,7 +450,7 @@ xlog_recover_buffer(
 	int			bit;
 	int			nbits;
 
-	trace_xfs_log_recover_buf_reg_buf(mp->m_log, buf_f);
+	trace_xfs_log_recover_buf_reg_buf(log, buf_f);
 
 	bit = 0;
 	i = 1;	/* 0 is the buf format structure */
@@ -492,14 +492,14 @@
 
 static int
 xlog_recover_do_reg_buffer(
-	struct xfs_mount	*mp,
+	struct xlog		*log,
 	struct xlog_recover_item *item,
 	struct xfs_buf		*bp,
 	struct xfs_buf_log_format *buf_f,
 	xfs_lsn_t		current_lsn)
 {
-	xlog_recover_buffer(mp, item, bp, buf_f);
-	return xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn);
+	xlog_recover_buffer(log, item, bp, buf_f);
+	return xlog_recover_validate_buf_type(log, bp, buf_f, current_lsn);
 }
 
 /*
@@ -513,7 +513,6 @@ xlog_recover_do_reg_buffer(
  */
 static bool
 xlog_recover_this_dquot_buffer(
-	struct xfs_mount	*mp,
 	struct xlog		*log,
 	struct xlog_recover_item *item,
 	struct xfs_buf		*bp,
@@ -526,7 +525,7 @@ xlog_recover_this_dquot_buffer(
 	/*
 	 * Filesystems are required to send in quota flags at mount time.
 	 */
-	if (!mp->m_qflags)
+	if (!log->l_mp->m_qflags)
 		return false;
 
 	type = 0;
@@ -550,7 +549,7 @@
  */
 static int
 xlog_recover_verify_dquot_buf_item(
-	struct xfs_mount	*mp,
+	struct xlog		*log,
 	struct xlog_recover_item *item,
 	struct xfs_buf		*bp,
 	struct xfs_buf_log_format *buf_f)
@@ -564,7 +563,7 @@ xlog_recover_verify_dquot_buf_item(
 	case XFS_BLFT_GDQUOT_BUF:
 		break;
 	default:
-		xfs_alert(mp,
+		xfs_alert(log->l_mp,
 	"XFS: dquot buffer log format type mismatch in %s.",
 				__func__);
 		xfs_buf_corruption_error(bp, __this_address);
@@ -573,19 +572,19 @@
 
 	for (i = 1; i < item->ri_total; i++) {
 		if (item->ri_buf[i].i_addr == NULL) {
-			xfs_alert(mp,
+			xfs_alert(log->l_mp,
 				"XFS: NULL dquot in %s.", __func__);
 			return -EFSCORRUPTED;
 		}
 		if (item->ri_buf[i].i_len < sizeof(struct xfs_disk_dquot)) {
-			xfs_alert(mp,
+			xfs_alert(log->l_mp,
 				"XFS: dquot too small (%d) in %s.",
 				item->ri_buf[i].i_len, __func__);
 			return -EFSCORRUPTED;
 		}
-		fa = xfs_dquot_verify(mp, item->ri_buf[i].i_addr, -1);
+		fa = xfs_dquot_verify(log->l_mp, item->ri_buf[i].i_addr, -1);
 		if (fa) {
-			xfs_alert(mp,
+			xfs_alert(log->l_mp,
 	"dquot corrupt at %pS trying to replay into block 0x%llx",
 				fa, xfs_buf_daddr(bp));
 			return -EFSCORRUPTED;
@@ -612,11 +611,12 @@
  */
 static int
 xlog_recover_do_inode_buffer(
-	struct xfs_mount	*mp,
+	struct xlog		*log,
 	struct xlog_recover_item *item,
 	struct xfs_buf		*bp,
 	struct xfs_buf_log_format *buf_f)
 {
+	struct xfs_sb		*sbp = &log->l_mp->m_sb;
 	int			i;
 	int			item_index = 0;
 	int			bit = 0;
@@ -628,7 +628,7 @@ xlog_recover_do_inode_buffer(
 	xfs_agino_t		*logged_nextp;
 	xfs_agino_t		*buffer_nextp;
 
-	trace_xfs_log_recover_buf_inode_buf(mp->m_log, buf_f);
+	trace_xfs_log_recover_buf_inode_buf(log, buf_f);
 
 	/*
 	 * If the magic number doesn't match, something has gone wrong. Don't
@@ -641,12 +641,12 @@ xlog_recover_do_inode_buffer(
 	 * Post recovery validation only works properly on CRC enabled
 	 * filesystems.
 	 */
-	if (xfs_has_crc(mp))
+	if (xfs_has_crc(log->l_mp))
 		bp->b_ops = &xfs_inode_buf_ops;
 
-	inodes_per_buf = BBTOB(bp->b_length) >> mp->m_sb.sb_inodelog;
+	inodes_per_buf = BBTOB(bp->b_length) >> sbp->sb_inodelog;
 	for (i = 0; i < inodes_per_buf; i++) {
-		next_unlinked_offset = (i * mp->m_sb.sb_inodesize) +
+		next_unlinked_offset = (i * sbp->sb_inodesize) +
 			offsetof(struct xfs_dinode, di_next_unlinked);
 
 		while (next_unlinked_offset >=
@@ -695,8 +695,8 @@ xlog_recover_do_inode_buffer(
 		 */
 		logged_nextp = item->ri_buf[item_index].i_addr +
 				next_unlinked_offset - reg_buf_offset;
-		if (XFS_IS_CORRUPT(mp, *logged_nextp == 0)) {
-			xfs_alert(mp,
+		if (XFS_IS_CORRUPT(log->l_mp, *logged_nextp == 0)) {
+			xfs_alert(log->l_mp,
 		"Bad inode buffer log record (ptr = "PTR_FMT", bp = "PTR_FMT"). "
 		"Trying to replay bad (0) inode di_next_unlinked field.",
 				item, bp);
@@ -711,8 +711,8 @@ xlog_recover_do_inode_buffer(
 		 * have to leave the inode in a consistent state for whoever
 		 * reads it next....
 		 */
-		xfs_dinode_calc_crc(mp,
-				xfs_buf_offset(bp, i * mp->m_sb.sb_inodesize));
+		xfs_dinode_calc_crc(log->l_mp,
+				xfs_buf_offset(bp, i * sbp->sb_inodesize));
 
 	}
@@ -734,7 +734,7 @@ xlog_recover_do_inode_buffer(
 	 * it stale so that it won't be found on overlapping buffer lookups and
 	 * caller knows not to queue it for delayed write.
 	 */
-	if (BBTOB(bp->b_length) != M_IGEO(mp)->inode_cluster_size) {
+	if (BBTOB(bp->b_length) != M_IGEO(log->l_mp)->inode_cluster_size) {
 		int error;
 
 		error = xfs_bwrite(bp);
@@ -792,7 +792,7 @@ xlog_recovery_is_dir_buf(
  */
 static void
 xlog_recover_do_partial_dabuf(
-	struct xfs_mount	*mp,
+	struct xlog		*log,
 	struct xlog_recover_item *item,
 	struct xfs_buf		*bp,
 	struct xfs_buf_log_format *buf_f)
@@ -802,7 +802,7 @@ xlog_recover_do_partial_dabuf(
 	 * and rely on post pass2 recovery cache purge to clean these out of
 	 * memory.
 	 */
-	xlog_recover_buffer(mp, item, bp, buf_f);
+	xlog_recover_buffer(log, item, bp, buf_f);
 }
 
 /*
@@ -827,7 +827,7 @@ xlog_recover_do_partial_dabuf(
  */
 static xfs_lsn_t
 xlog_recover_get_buf_lsn(
-	struct xfs_mount	*mp,
+	struct xlog		*log,
 	struct xfs_buf		*bp,
 	struct xfs_buf_log_format *buf_f)
 {
@@ -839,7 +839,7 @@ xlog_recover_get_buf_lsn(
 	uint16_t		blft;
 
 	/* v4 filesystems always recover immediately */
-	if (!xfs_has_crc(mp))
+	if (!xfs_has_crc(log->l_mp))
 		goto recover_immediately;
 
 	/*
@@ -916,7 +916,7 @@ xlog_recover_get_buf_lsn(
 	 * the relevant UUID in the superblock.
 	 */
 	lsn = be64_to_cpu(((struct xfs_dsb *)blk)->sb_lsn);
-	if (xfs_has_metauuid(mp))
+	if (xfs_has_metauuid(log->l_mp))
 		uuid = &((struct xfs_dsb *)blk)->sb_meta_uuid;
 	else
 		uuid = &((struct xfs_dsb *)blk)->sb_uuid;
@@ -926,7 +926,7 @@ xlog_recover_get_buf_lsn(
 	}
 
 	if (lsn != (xfs_lsn_t)-1) {
-		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
+		if (!uuid_equal(&log->l_mp->m_sb.sb_meta_uuid, uuid))
 			goto recover_immediately;
 		return lsn;
 	}
@@ -945,7 +945,7 @@ xlog_recover_get_buf_lsn(
 	}
 
 	if (lsn != (xfs_lsn_t)-1) {
-		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
+		if (!uuid_equal(&log->l_mp->m_sb.sb_meta_uuid, uuid))
 			goto recover_immediately;
 		return lsn;
 	}
@@ -977,14 +977,14 @@
  */
 static bool
 xlog_recover_this_buffer(
-	struct xfs_mount	*mp,
+	struct xlog		*log,
 	struct xfs_buf		*bp,
 	struct xfs_buf_log_format *buf_f,
 	xfs_lsn_t		current_lsn)
 {
 	xfs_lsn_t		lsn;
 
-	lsn = xlog_recover_get_buf_lsn(mp, bp, buf_f);
+	lsn = xlog_recover_get_buf_lsn(log, bp, buf_f);
 	if (!lsn)
 		return true;
 	if (lsn == -1)
@@ -992,8 +992,8 @@ xlog_recover_this_buffer(
 	if (XFS_LSN_CMP(lsn, current_lsn) < 0)
 		return true;
 
-	trace_xfs_log_recover_buf_skip(mp->m_log, buf_f);
-	xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN);
+	trace_xfs_log_recover_buf_skip(log, buf_f);
+	xlog_recover_validate_buf_type(log, bp, buf_f, NULLCOMMITLSN);
 
 	/*
 	 * We're skipping replay of this buffer log item due to the log
@@ -1065,22 +1065,22 @@ xlog_recover_buf_commit_pass2(
 	 * to simplify the rest of the code.
 	 */
 	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
-		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
+		error = xlog_recover_do_inode_buffer(log, item, bp, buf_f);
 		if (error || (bp->b_flags & XBF_STALE))
 			goto out_release;
 		goto out_write;
 	}
 
 	if (buf_f->blf_flags & XFS_BLF_DQUOT_BUF) {
-		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
+		if (!xlog_recover_this_dquot_buffer(log, item, bp, buf_f))
 			goto out_release;
 
-		error = xlog_recover_verify_dquot_buf_item(mp, item, bp, buf_f);
+		error = xlog_recover_verify_dquot_buf_item(log, item, bp, buf_f);
 		if (error)
 			goto out_release;
 
-		xlog_recover_buffer(mp, item, bp, buf_f);
-		error = xlog_recover_validate_buf_type(mp, bp, buf_f,
+		xlog_recover_buffer(log, item, bp, buf_f);
+		error = xlog_recover_validate_buf_type(log, bp, buf_f,
 				NULLCOMMITLSN);
 		if (error)
 			goto out_release;
@@ -1095,14 +1095,14 @@ xlog_recover_buf_commit_pass2(
 	 */
 	if (xlog_recovery_is_dir_buf(buf_f) &&
 	    mp->m_dir_geo->blksize != BBTOB(buf_f->blf_len)) {
-		xlog_recover_do_partial_dabuf(mp, item, bp, buf_f);
+		xlog_recover_do_partial_dabuf(log, item, bp, buf_f);
 		goto out_write;
 	}
 
 	/*
 	 * Whole buffer recovery, dependent on the LSN in the on-disk structure.
 	 */
-	if (!xlog_recover_this_buffer(mp, bp, buf_f, current_lsn)) {
+	if (!xlog_recover_this_buffer(log, bp, buf_f, current_lsn)) {
 		/*
 		 * We may have verified this buffer even though we aren't
 		 * recovering it. Return the verifier error for early detection
@@ -1113,7 +1113,7 @@ xlog_recover_buf_commit_pass2(
 	}
 
-	error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
+	error = xlog_recover_do_reg_buffer(log, item, bp, buf_f, current_lsn);
 	if (error)
 		goto out_release;
-- 
2.43.0
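The shape of this refactor is a common plumbing pattern: thread the
most specific context object (the xlog) through the call chain and
derive the broader context (log->l_mp) only at the leaf call sites
that need it, rather than passing (mount, log) pairs around. A generic
userspace sketch of the idiom, with invented names:

#include <stdio.h>

struct mount { const char *m_name; };
struct log   { struct mount *l_mp; };	/* back pointer, like xlog->l_mp */

/* leaf: the only place the wider mount context is needed */
static void leaf_report(struct log *log, const char *msg)
{
	printf("%s: %s\n", log->l_mp->m_name, msg);
}

/* mid-layers take only the log, never (mount, log) pairs */
static void recover_one_item(struct log *log)
{
	leaf_report(log, "recovering item");
}

int main(void)
{
	struct mount m = { "xfs0" };
	struct log l = { &m };

	recover_one_item(&l);
	return 0;
}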
From: Dave Chinner <dchinner@redhat.com>

When a compound buffer is logged (e.g. fragmented large directory
block) we record it in the log as a series of separate buffer log
format items in the journal. These get recovered individually, and
because they are for non-contiguous extent ranges, we cannot use
buffer addresses to detect that the buffer format items are from the
same directory block.

Further, we cannot use LSN checks to determine if the partial block
buffers should be recovered - apart from the first buffer we don't
have a header with an LSN in it to check.

Finally, we cannot add a verifier to a partial block buffer because,
again, it will fail the verifier checks and report corruption. We
already skip this step due to bad magic number detection, but we
should be able to do better here.

The one thing we can rely on, though, is that each buffer format item
is written consecutively in the journal. They are built at commit time
into a single log iovec and chained into the iclog write log vector
chain as an unbroken sequence. Hence all the parts of a compound
buffer should be consecutive buf log format items in the transaction
being recovered.

Unfortunately, we don't have the information available in recovery to
do a full compound buffer instantiation for recovery. We only have the
fragments that contained modifications in the journal, and so there
may be missing fragments that are still clean and hence are not in the
journal. Hence we cannot use journal state to rebuild the compound
buffer entirely and hence recover it as a complete entity and run a
verifier over it before writeback.

Hence the first thing we need to do is detect such partial buffer
recovery situations and track whether we need to skip all the partial
buffers due to the LSN check in the initial header fragment read from
disk.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item_recover.c | 178 +++++++++++++++++++++++++++-------
 1 file changed, 143 insertions(+), 35 deletions(-)

diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index 90740fcf2fbe..9225baa62755 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -434,18 +434,17 @@ xlog_recover_validate_buf_type(
 }
 
 /*
- * Perform a 'normal' buffer recovery.  Each logged region of the
+ * Perform recovery of the logged regions.  Each logged region of the
  * buffer should be copied over the corresponding region in the
  * given buffer.  The bitmap in the buf log format structure indicates
  * where to place the logged data.
  */
-static int
-xlog_recover_do_reg_buffer(
+static void
+xlog_recover_buffer(
 	struct xfs_mount	*mp,
 	struct xlog_recover_item *item,
 	struct xfs_buf		*bp,
-	struct xfs_buf_log_format *buf_f,
-	xfs_lsn_t		current_lsn)
+	struct xfs_buf_log_format *buf_f)
 {
 	int			i;
 	int			bit;
@@ -489,7 +488,17 @@ xlog_recover_do_reg_buffer(
 
 	/* Shouldn't be any more regions */
 	ASSERT(i == item->ri_total);
+}
 
+static int
+xlog_recover_do_reg_buffer(
+	struct xfs_mount	*mp,
+	struct xlog_recover_item *item,
+	struct xfs_buf		*bp,
+	struct xfs_buf_log_format *buf_f,
+	xfs_lsn_t		current_lsn)
+{
+	xlog_recover_buffer(mp, item, bp, buf_f);
 	return xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn);
 }
 
@@ -735,6 +744,67 @@ xlog_recover_do_inode_buffer(
 	return 0;
 }
 
+static bool
+xlog_recovery_is_dir_buf(
+	struct xfs_buf_log_format *buf_f)
+{
+	switch (xfs_blft_from_flags(buf_f)) {
+	case XFS_BLFT_DIR_BLOCK_BUF:
+	case XFS_BLFT_DIR_DATA_BUF:
+	case XFS_BLFT_DIR_FREE_BUF:
+	case XFS_BLFT_DIR_LEAF1_BUF:
+	case XFS_BLFT_DIR_LEAFN_BUF:
+	case XFS_BLFT_DA_NODE_BUF:
+		return true;
+	default:
+		break;
+	}
+	return false;
+}
+
+/*
+ * Partial dabuf recovery.
+ *
+ * There are two main cases here - a buffer that contains the dabuf header and
+ * hence can be magic number and LSN checked, and then everything else.
+ *
+ * We can determine if the former should be replayed or not via LSN checks, but
+ * we cannot do that with the latter, so the only choice we have here is to
+ * always recover the changes regardless of whether this means metadata on disk
+ * will go backwards in time. This, at least, means that the changes in each
+ * checkpoint are applied consistently to the dabuf and we don't do really
+ * stupid things like skip the header fragment replay and then replay all the
+ * other changes to the dabuf block.
+ *
+ * While this is not ideal, finishing log recovery should then replay all the
+ * remaining changes across this buffer and so bring it back to being consistent
+ * on disk at the completion of recovery. Hence this "going backwards in time"
+ * situation will only be relevant to failed journal replay situations. These
+ * are rare and will require xfs_repair to be run, anyway, so the inconsistency
+ * that results will be corrected before the filesystem goes back into service,
+ * anyway.
+ *
+ * Important: This partial fragment recovery relies on log recovery purging the
+ * buffer cache after completion of this recovery phase. These partial buffers
+ * are never used at runtime (discontiguous buffers will be used instead), so
+ * they must be removed from the buffer cache to prevent them from causing
+ * overlapping range lookup failures for the entire dabuf range.
+ */
+static void
+xlog_recover_do_partial_dabuf(
+	struct xfs_mount	*mp,
+	struct xlog_recover_item *item,
+	struct xfs_buf		*bp,
+	struct xfs_buf_log_format *buf_f)
+{
+	/*
+	 * Always recover without verification or write verifiers. Use delwri
+	 * and rely on post pass2 recovery cache purge to clean these out of
+	 * memory.
+	 */
+	xlog_recover_buffer(mp, item, bp, buf_f);
+}
+
 /*
  * V5 filesystems know the age of the buffer on disk being recovered. We can
  * have newer objects on disk than we are replaying, and so for these cases we
@@ -886,6 +956,54 @@ xlog_recover_get_buf_lsn(
 }
 
+/*
+ * Recover the buffer only if we get an LSN from it and it's less than the lsn
+ * of the transaction we are replaying.
+ *
+ * Note that we have to be extremely careful of readahead here. Readahead does
+ * not attach verfiers to the buffers so if we don't actually do any replay
+ * after readahead because of the LSN we found in the buffer if more recent than
+ * that current transaction then we need to attach the verifier directly.
+ * Failure to do so can lead to future recovery actions (e.g. EFI and unlinked
+ * list recovery) can operate on the buffers and they won't get the verifier
+ * attached. This can lead to blocks on disk having the correct content but a
+ * stale CRC.
+ *
+ * It is safe to assume these clean buffers are currently up to date. If the
+ * buffer is dirtied by a later transaction being replayed, then the verifier
+ * will be reset to match whatever recover turns that buffer into.
+ *
+ * Return true if the buffer needs to be recovered, false if it doesn't.
+ */
+static bool
+xlog_recover_this_buffer(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp,
+	struct xfs_buf_log_format *buf_f,
+	xfs_lsn_t		current_lsn)
+{
+	xfs_lsn_t		lsn;
+
+	lsn = xlog_recover_get_buf_lsn(mp, bp, buf_f);
+	if (!lsn)
+		return true;
+	if (lsn == -1)
+		return true;
+	if (XFS_LSN_CMP(lsn, current_lsn) < 0)
+		return true;
+
+	trace_xfs_log_recover_buf_skip(mp->m_log, buf_f);
+	xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN);
+
+	/*
+	 * We're skipping replay of this buffer log item due to the log
+	 * item LSN being behind the ondisk buffer. Verify the buffer
+	 * contents since we aren't going to run the write verifier.
+	 */
+	if (bp->b_ops)
+		bp->b_ops->verify_read(bp);
+	return false;
+}
+
 /*
  * This routine replays a modification made to a buffer at runtime.
  * There are actually two types of buffer, regular and inode, which
@@ -920,7 +1038,6 @@ xlog_recover_buf_commit_pass2(
 	struct xfs_mount	*mp = log->l_mp;
 	struct xfs_buf		*bp;
 	int			error;
-	xfs_lsn_t		lsn;
 
 	/*
 	 * In this pass we only want to recover all the buffers which have
@@ -962,7 +1079,8 @@ xlog_recover_buf_commit_pass2(
 		if (error)
 			goto out_release;
 
-		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
+		xlog_recover_buffer(mp, item, bp, buf_f);
+		error = xlog_recover_validate_buf_type(mp, bp, buf_f,
 				NULLCOMMITLSN);
 		if (error)
 			goto out_release;
@@ -970,41 +1088,31 @@ xlog_recover_buf_commit_pass2(
 	}
 
 	/*
-	 * Recover the buffer only if we get an LSN from it and it's less than
-	 * the lsn of the transaction we are replaying.
-	 *
-	 * Note that we have to be extremely careful of readahead here.
-	 * Readahead does not attach verfiers to the buffers so if we don't
-	 * actually do any replay after readahead because of the LSN we found
-	 * in the buffer if more recent than that current transaction then we
-	 * need to attach the verifier directly. Failure to do so can lead to
-	 * future recovery actions (e.g. EFI and unlinked list recovery) can
-	 * operate on the buffers and they won't get the verifier attached. This
-	 * can lead to blocks on disk having the correct content but a stale
-	 * CRC.
-	 *
-	 * It is safe to assume these clean buffers are currently up to date.
-	 * If the buffer is dirtied by a later transaction being replayed, then
-	 * the verifier will be reset to match whatever recover turns that
-	 * buffer into.
+	 * Directory buffers can be larger than a single filesystem block and
+	 * if they are they can be fragmented. There are lots of concerns about
+	 * recovering these, so push them out of line where the concerns can be
+	 * documented clearly.
 	 */
-	lsn = xlog_recover_get_buf_lsn(mp, bp, buf_f);
-	if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
-		trace_xfs_log_recover_buf_skip(log, buf_f);
-		xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN);
+	if (xlog_recovery_is_dir_buf(buf_f) &&
+	    mp->m_dir_geo->blksize != BBTOB(buf_f->blf_len)) {
+		xlog_recover_do_partial_dabuf(mp, item, bp, buf_f);
+		goto out_write;
+	}
 
+	/*
+	 * Whole buffer recovery, dependent on the LSN in the on-disk structure.
+	 */
+	if (!xlog_recover_this_buffer(mp, bp, buf_f, current_lsn)) {
 		/*
-		 * We're skipping replay of this buffer log item due to the log
-		 * item LSN being behind the ondisk buffer. Verify the buffer
-		 * contents since we aren't going to run the write verifier.
+		 * We may have verified this buffer even though we aren't
+		 * recovering it. Return the verifier error for early detection
+		 * of recovery inconsistencies.
 		 */
-		if (bp->b_ops) {
-			bp->b_ops->verify_read(bp);
-			error = bp->b_error;
-		}
+		error = bp->b_error;
 		goto out_release;
 	}
 
+	error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
 	if (error)
 		goto out_release;
-- 
2.43.0
Recently Zorro tripped over a failure with 64kB directory blocks on
s390x via generic/648. Recovery was reporting failures like this:

XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
XFS (loop3): Starting recovery (logdev: internal)
XFS (loop3): Bad dir block magic!
XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
....

Or it was succeeding and later operations were detecting directory
block corruption during directory operations, such as:

XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020
XFS (loop3): Unmount and run xfs_repair
XFS (loop3): First 128 bytes of corrupted metadata buffer:
00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3........... 
....

Further triage and diagnosis pointed to the fact that the test was
generating a discontiguous (multi-extent) directory block and that
directory block was not being recovered correctly when it was
encountered.

Zorro captured a trace, and what we saw in the trace was a specific
pattern of buffer log items being processed through every phase of
recovery:

xfs_log_recover_buf_not_cancel: dev 7:0 daddr 0x2c2ce0, bbcount 0x10, flags 0x5000, size 2, map_size 2
xfs_log_recover_item_recover: dev 7:0 tid 0xce3ce480 lsn 0x300014178, pass 1, item 0x8ea70fc0, item type XFS_LI_BUF item region count/total 2/2
xfs_log_recover_buf_not_cancel: dev 7:0 daddr 0x331fb0, bbcount 0x58, flags 0x5000, size 2, map_size 11
xfs_log_recover_item_recover: dev 7:0 tid 0xce3ce480 lsn 0x300014178, pass 1, item 0x8f36c040, item type XFS_LI_BUF item region count/total 2/2

The item addresses, tid and LSN change, but the order of the two buf
log items does not. These are both "flags 0x5000", which means both log
items are XFS_BLFT_DIR_BLOCK_BUF types, they are both partial directory
block buffers, and they are discontiguous. They also have different
types of log items both before and after them, so it is likely these
are two extents within the same compound buffer.

The way we log compound buffers is that we create a buf log format
item for each extent in the buffer, and then we log each range as a
separate buf log format item. IOWs, to recovery, these fragments of
the directory block appear just like complete regular buffers that
need to be recovered.

Hence what we see above is that the first buffer (daddr 0x2c2ce0,
bbcount 0x10) is the first extent in the directory block, containing
the header and magic number, so it recovers and verifies just fine.
The second buffer is the tail of the directory block, and it does not
contain a magic number because it starts mid-way through the directory
block. Hence the magic number verification fails and the buffer is not
recovered.

Compound buffers were logged this way so that they didn't require a
change of log format to recover. Prior to compound buffers, the
directory code kept its own dabuf structure to map multiple extents in
a single directory block, and they got logged as separate buffer log
format items as well. So the problem isn't directly related to the
conversion of dabufs to compound buffers - the problem is related to
the buffer recovery verification code not knowing that directory
buffer fragments are valid recovery targets.

Hence the fixes in this patchset are to log recovery, and do not
change runtime behaviour at all.

The first thing we do is change the buffer recovery code to consider a
type mismatch between the BLF and the buffer contents as a fatal error
instead of a warning. If we just warn and continue, the recovered
metadata may still be corrupt and so we should just abort with
-EFSCORRUPTED when this occurs. That addresses the silent recovery
success followed by runtime detection of directory corruption issue
that was encountered.

We then need to untangle the buffer recovery code a bit. Inode buffer,
dquot buffer and regular buffer recovery are all a bit different, but
they are tightly intertwined. Neither dquot nor inode buffer recovery
needs discontiguous buffer recovery detection, and they also have
different constraints, so separate them out. We also always recover
inode and dquot buffers, so we don't need to check magic numbers or
decode internal LSNs to determine if they should be recovered.

With that done, we can then add code to the general buffer recovery to
detect partial block recovery situations. We check the BLF type to
determine if it is a directory buffer, and add a path for recovery of
partial directory block items. This allows recovery of regions of
directory blocks that do not start at offset 0 in the directory block.
This fixes the initial "bad dir block magic" issue reported, and
results in correct recovery of discontiguous directory blocks. IOWs,
this appears to be a log recovery problem and not a runtime issue. I
think the fix will be to allow directory blocks to fail the magic
number check if and only if the buffer length does not match the
directory block size (i.e. it is a partial directory block fragment
being recovered).

This passes repeated looping over '-g enospc -g recoveryloop' on 64kB
directory block size configurations, so the change to recovery hasn't
caused any obvious regressions in fixing this issue.

Thoughts?
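To make the detection logic described above concrete, here is the
check the series adds, distilled into one predicate (a reading aid
built from the patch hunks in this thread, not the committed code):

/*
 * A buf log format item is a partial directory block fragment when it
 * carries a directory BLF type but its logged length does not cover a
 * whole directory block. Such fragments may lack the magic number
 * header, so they are replayed unconditionally instead of being
 * magic/LSN checked.
 */
static bool
is_partial_dir_fragment(
	struct xfs_mount		*mp,
	struct xfs_buf_log_format	*buf_f)
{
	return xlog_recovery_is_dir_buf(buf_f) &&
	       mp->m_dir_geo->blksize != BBTOB(buf_f->blf_len);
}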
From: Dave Chinner <dchinner@redhat.com>

Dquot buffers are only logged when the dquot cluster is allocated.
Recovery of them is always done and not conditional on an LSN found
in the buffer because there aren't dquots stamped in the buffer when
the initialisation is replayed after allocation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_log_format.h |   6 ++
 fs/xfs/xfs_buf_item_recover.c  | 129 +++++++++++++++++----------------
 2 files changed, 72 insertions(+), 63 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 16872972e1e9..5ac0c3066930 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -508,6 +508,9 @@ struct xfs_log_dinode {
 #define	XFS_BLF_PDQUOT_BUF	(1<<3)
 #define	XFS_BLF_GDQUOT_BUF	(1<<4)
 
+#define XFS_BLF_DQUOT_BUF \
+	(XFS_BLF_UDQUOT_BUF | XFS_BLF_PDQUOT_BUF | XFS_BLF_GDQUOT_BUF)
+
 /*
  * This is the structure used to lay out a buf log item in the log.  The data
  * map describes which 128 byte chunks of the buffer have been logged.
@@ -571,6 +574,9 @@ enum xfs_blft {
 	XFS_BLFT_MAX_BUF = (1 << XFS_BLFT_BITS),
 };
 
+#define XFS_BLFT_DQUOT_BUF \
+	(XFS_BLFT_UDQUOT_BUF | XFS_BLFT_PDQUOT_BUF | XFS_BLFT_GDQUOT_BUF)
+
 static inline void
 xfs_blft_to_flags(struct xfs_buf_log_format *blf, enum xfs_blft type)
 {
diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index f994a303ad0a..90740fcf2fbe 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -450,8 +450,6 @@ xlog_recover_do_reg_buffer(
 	int			i;
 	int			bit;
 	int			nbits;
-	xfs_failaddr_t		fa;
-	const size_t		size_disk_dquot = sizeof(struct xfs_disk_dquot);
 
 	trace_xfs_log_recover_buf_reg_buf(mp->m_log, buf_f);
 
@@ -481,39 +479,10 @@ xlog_recover_do_reg_buffer(
 		if (item->ri_buf[i].i_len < (nbits << XFS_BLF_SHIFT))
 			nbits = item->ri_buf[i].i_len >> XFS_BLF_SHIFT;
 
-		/*
-		 * Do a sanity check if this is a dquot buffer. Just checking
-		 * the first dquot in the buffer should do. XXXThis is
-		 * probably a good thing to do for other buf types also.
-		 */
-		fa = NULL;
-		if (buf_f->blf_flags &
-		   (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
-			if (item->ri_buf[i].i_addr == NULL) {
-				xfs_alert(mp,
-					"XFS: NULL dquot in %s.", __func__);
-				goto next;
-			}
-			if (item->ri_buf[i].i_len < size_disk_dquot) {
-				xfs_alert(mp,
-					"XFS: dquot too small (%d) in %s.",
-					item->ri_buf[i].i_len, __func__);
-				goto next;
-			}
-			fa = xfs_dquot_verify(mp, item->ri_buf[i].i_addr, -1);
-			if (fa) {
-				xfs_alert(mp,
-	"dquot corrupt at %pS trying to replay into block 0x%llx",
-					fa, xfs_buf_daddr(bp));
-				goto next;
-			}
-		}
-
 		memcpy(xfs_buf_offset(bp,
 			(uint)bit << XFS_BLF_SHIFT),	/* dest */
 			item->ri_buf[i].i_addr,		/* source */
 			nbits<<XFS_BLF_SHIFT);		/* length */
- next:
 		i++;
 		bit += nbits;
 	}
@@ -566,6 +535,56 @@ xlog_recover_this_dquot_buffer(
 	return true;
 }
 
+/*
+ * Do a sanity check of each region in the item to ensure we are actually
+ * recovering a dquot buffer item.
+ */
+static int
+xlog_recover_verify_dquot_buf_item(
+	struct xfs_mount	*mp,
+	struct xlog_recover_item *item,
+	struct xfs_buf		*bp,
+	struct xfs_buf_log_format *buf_f)
+{
+	xfs_failaddr_t		fa;
+	int			i;
+
+	switch (xfs_blft_from_flags(buf_f)) {
+	case XFS_BLFT_UDQUOT_BUF:
+	case XFS_BLFT_PDQUOT_BUF:
+	case XFS_BLFT_GDQUOT_BUF:
+		break;
+	default:
+		xfs_alert(mp,
+	"XFS: dquot buffer log format type mismatch in %s.",
+				__func__);
+		xfs_buf_corruption_error(bp, __this_address);
+		return -EFSCORRUPTED;
+	}
+
+	for (i = 1; i < item->ri_total; i++) {
+		if (item->ri_buf[i].i_addr == NULL) {
+			xfs_alert(mp,
+				"XFS: NULL dquot in %s.", __func__);
+			return -EFSCORRUPTED;
+		}
+		if (item->ri_buf[i].i_len < sizeof(struct xfs_disk_dquot)) {
+			xfs_alert(mp,
+				"XFS: dquot too small (%d) in %s.",
+				item->ri_buf[i].i_len, __func__);
+			return -EFSCORRUPTED;
+		}
+		fa = xfs_dquot_verify(mp, item->ri_buf[i].i_addr, -1);
+		if (fa) {
+			xfs_alert(mp,
+"dquot corrupt at %pS trying to replay into block 0x%llx",
+				fa, xfs_buf_daddr(bp));
+			return -EFSCORRUPTED;
+		}
+	}
+	return 0;
+}
+
 /*
  * Perform recovery for a buffer full of inodes. We don't have inode cluster
  * buffer specific LSNs, so we always recover inode buffers if they contain
@@ -743,7 +762,6 @@ xlog_recover_get_buf_lsn(
 	struct xfs_buf_log_format *buf_f)
 {
 	uint32_t		magic32;
-	uint16_t		magic16;
 	uint16_t		magicda;
 	void			*blk = bp->b_addr;
 	uuid_t			*uuid;
@@ -862,27 +880,7 @@ xlog_recover_get_buf_lsn(
 		return lsn;
 	}
 
-	/*
-	 * We do individual object checks on dquot and inode buffers as they
-	 * have their own individual LSN records. Also, we could have a stale
-	 * buffer here, so we have to at least recognise these buffer types.
-	 *
-	 * A notd complexity here is inode unlinked list processing - it logs
-	 * the inode directly in the buffer, but we don't know which inodes have
-	 * been modified, and there is no global buffer LSN. Hence we need to
-	 * recover all inode buffer types immediately. This problem will be
-	 * fixed by logical logging of the unlinked list modifications.
-	 */
-	magic16 = be16_to_cpu(*(__be16 *)blk);
-	switch (magic16) {
-	case XFS_DQUOT_MAGIC:
-		goto recover_immediately;
-	default:
-		break;
-	}
-
 	/* unknown buffer contents, recover immediately */
-
 recover_immediately:
 	return (xfs_lsn_t)-1;
 
@@ -956,6 +954,21 @@ xlog_recover_buf_commit_pass2(
 		goto out_write;
 	}
 
+	if (buf_f->blf_flags & XFS_BLF_DQUOT_BUF) {
+		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
+			goto out_release;
+
+		error = xlog_recover_verify_dquot_buf_item(mp, item, bp, buf_f);
+		if (error)
+			goto out_release;
+
+		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
+				NULLCOMMITLSN);
+		if (error)
+			goto out_release;
+		goto out_write;
+	}
+
 	/*
 	 * Recover the buffer only if we get an LSN from it and it's less than
 	 * the lsn of the transaction we are replaying.
@@ -992,17 +1005,7 @@ xlog_recover_buf_commit_pass2(
 		goto out_release;
 	}
 
-	if (buf_f->blf_flags &
-	   (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
-		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
-			goto out_release;
-
-		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
-				NULLCOMMITLSN);
-	} else {
-		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
-				current_lsn);
-	}
+	error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
 	if (error)
 		goto out_release;
-- 
2.43.0
From: Dave Chinner <dchinner@redhat.com>

It really is a unique snowflake, so peel off from normal buffer
recovery earlier and shuffle all the unique bits into the inode
buffer recovery function.

Also, it looks like the handling of mismatched inode cluster buffer
sizes is wrong - we have to write the recovered buffer -before- we
mark it stale as we're not supposed to write stale buffers. I don't
think we check that anywhere in the buffer IO path, but let's do it
the right way anyway.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item_recover.c | 99 ++++++++++++++++++++++-------------
 1 file changed, 63 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index dba57ee6fa6d..f994a303ad0a 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -229,7 +229,7 @@ xlog_recover_validate_buf_type(
 	 * just avoid the verification stage for non-crc filesystems
 	 */
 	if (!xfs_has_crc(mp))
-		return;
+		return 0;
 
 	magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
 	magic16 = be16_to_cpu(*(__be16*)bp->b_addr);
@@ -407,7 +407,7 @@ xlog_recover_validate_buf_type(
 	 * skipped.
 	 */
 	if (current_lsn == NULLCOMMITLSN)
-		return 0;;
+		return 0;
 
 	if (warnmsg) {
 		xfs_warn(mp, warnmsg);
@@ -567,18 +567,22 @@ xlog_recover_this_dquot_buffer(
 }
 
 /*
- * Perform recovery for a buffer full of inodes. In these buffers, the only
- * data which should be recovered is that which corresponds to the
- * di_next_unlinked pointers in the on disk inode structures. The rest of the
- * data for the inodes is always logged through the inodes themselves rather
- * than the inode buffer and is recovered in xlog_recover_inode_pass2().
+ * Perform recovery for a buffer full of inodes. We don't have inode cluster
+ * buffer specific LSNs, so we always recover inode buffers if they contain
+ * inodes.
+ *
+ * In these buffers, the only inode data which should be recovered is that which
+ * corresponds to the di_next_unlinked pointers in the on disk inode structures.
+ * The rest of the data for the inodes is always logged through the inodes
+ * themselves rather than the inode buffer and is recovered in
+ * xlog_recover_inode_pass2().
  *
  * The only time when buffers full of inodes are fully recovered is when the
- * buffer is full of newly allocated inodes. In this case the buffer will
- * not be marked as an inode buffer and so will be sent to
- * xlog_recover_do_reg_buffer() below during recovery.
+ * buffer is full of newly allocated inodes. In this case the buffer will not
+ * be marked as an inode buffer and so xlog_recover_do_reg_buffer() will be used
+ * instead.
  */
-STATIC int
+static int
 xlog_recover_do_inode_buffer(
 	struct xfs_mount	*mp,
 	struct xlog_recover_item *item,
@@ -598,6 +602,13 @@ xlog_recover_do_inode_buffer(
 
 	trace_xfs_log_recover_buf_inode_buf(mp->m_log, buf_f);
 
+	/*
+	 * If the magic number doesn't match, something has gone wrong. Don't
+	 * recover the buffer.
+	 */
+	if (cpu_to_be16(XFS_DINODE_MAGIC) != *((__be16 *)bp->b_addr))
+		return -EFSCORRUPTED;
+
 	/*
 	 * Post recovery validation only works properly on CRC enabled
 	 * filesystems.
@@ -677,6 +688,31 @@ xlog_recover_do_inode_buffer(
 
 	}
 
+	/*
+	 * Make sure that only inode buffers with good sizes remain valid after
+	 * recovering this buffer item.
+	 *
+	 * The kernel moves inodes in buffers of 1 block or inode_cluster_size
+	 * bytes, whichever is bigger. The inode buffers in the log can be a
+	 * different size if the log was generated by an older kernel using
+	 * unclustered inode buffers or a newer kernel running with a different
+	 * inode cluster size. Regardless, if the inode buffer size isn't
+	 * max(blocksize, inode_cluster_size) for *our* value of
+	 * inode_cluster_size, then we need to keep the buffer out of the buffer
+	 * cache so that the buffer won't overlap with future reads of those
+	 * inodes.
+	 *
+	 * To achieve this, we write the buffer to recover the inodes then mark
+	 * it stale so that it won't be found on overlapping buffer lookups and
+	 * caller knows not to queue it for delayed write.
+	 */
+	if (BBTOB(bp->b_length) != M_IGEO(mp)->inode_cluster_size) {
+		int error;
+
+		error = xfs_bwrite(bp);
+		xfs_buf_stale(bp);
+		return error;
+	}
 	return 0;
 }
 
@@ -840,7 +876,6 @@ xlog_recover_get_buf_lsn(
 	magic16 = be16_to_cpu(*(__be16 *)blk);
 	switch (magic16) {
 	case XFS_DQUOT_MAGIC:
-	case XFS_DINODE_MAGIC:
 		goto recover_immediately;
 	default:
 		break;
@@ -910,6 +945,17 @@ xlog_recover_buf_commit_pass2(
 	if (error)
 		return error;
 
+	/*
+	 * Inode buffer recovery is quite unique, so go out separate ways here
+	 * to simplify the rest of the code.
+	 */
+	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
+		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
+		if (error || (bp->b_flags & XBF_STALE))
+			goto out_release;
+		goto out_write;
+	}
+
 	/*
 	 * Recover the buffer only if we get an LSN from it and it's less than
 	 * the lsn of the transaction we are replaying.
@@ -946,9 +992,7 @@ xlog_recover_buf_commit_pass2(
 		goto out_release;
 	}
 
-	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
-		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
-	} else if (buf_f->blf_flags &
+	if (buf_f->blf_flags &
 	   (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
 		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
 			goto out_release;
@@ -965,28 +1009,11 @@ xlog_recover_buf_commit_pass2(
 	/*
 	 * Perform delayed write on the buffer. Asynchronous writes will be
 	 * slower when taking into account all the buffers to be flushed.
-	 *
-	 * Also make sure that only inode buffers with good sizes stay in
-	 * the buffer cache. The kernel moves inodes in buffers of 1 block
-	 * or inode_cluster_size bytes, whichever is bigger. The inode
-	 * buffers in the log can be a different size if the log was generated
-	 * by an older kernel using unclustered inode buffers or a newer kernel
-	 * running with a different inode cluster size. Regardless, if
-	 * the inode buffer size isn't max(blocksize, inode_cluster_size)
-	 * for *our* value of inode_cluster_size, then we need to keep
-	 * the buffer out of the buffer cache so that the buffer won't
-	 * overlap with future reads of those inodes.
 	 */
-	if (XFS_DINODE_MAGIC ==
-	    be16_to_cpu(*((__be16 *)xfs_buf_offset(bp, 0))) &&
-	    (BBTOB(bp->b_length) != M_IGEO(log->l_mp)->inode_cluster_size)) {
-		xfs_buf_stale(bp);
-		error = xfs_bwrite(bp);
-	} else {
-		ASSERT(bp->b_mount == mp);
-		bp->b_flags |= _XBF_LOGRECOVERY;
-		xfs_buf_delwri_queue(bp, buffer_list);
-	}
+out_write:
+	ASSERT(bp->b_mount == mp);
+	bp->b_flags |= _XBF_LOGRECOVERY;
+	xfs_buf_delwri_queue(bp, buffer_list);
 
 out_release:
 	xfs_buf_relse(bp);
-- 
2.43.0
From: Dave Chinner <dchinner@redhat.com> We detect when a buffer log format type and the magic number in the buffer do not match. We issue a warning, but do not return an error nor do we write back the recovered buffer. If no further recover action is performed on that buffer, then recovery has left the buffer in an inconsistent (out of date) state on disk. i.e. the structure is corrupt on disk. If this mismatch occurs, return a -EFSCORRUPTED error and cause recovery to abort instead of letting recovery corrupt the filesystem and continue onwards. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/xfs_buf_item_recover.c | 51 +++++++++++++++++------------------ 1 file changed, 24 insertions(+), 27 deletions(-) diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index d74bf7bb7794..dba57ee6fa6d 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -207,7 +207,7 @@ xlog_recover_buf_commit_pass1( * the first 32 bits of the buffer (most blocks), * inside a struct xfs_da_blkinfo at the start of the buffer. */ -static void +static int xlog_recover_validate_buf_type( struct xfs_mount *mp, struct xfs_buf *bp, @@ -407,11 +407,12 @@ xlog_recover_validate_buf_type( * skipped. */ if (current_lsn == NULLCOMMITLSN) - return; + return 0;; if (warnmsg) { xfs_warn(mp, warnmsg); - ASSERT(0); + xfs_buf_corruption_error(bp, __this_address); + return -EFSCORRUPTED; } /* @@ -425,14 +426,11 @@ xlog_recover_validate_buf_type( * the buffer. Therefore, initialize a bli purely to carry the LSN to * the verifier. */ - if (bp->b_ops) { - struct xfs_buf_log_item *bip; - - bp->b_flags |= _XBF_LOGRECOVERY; - xfs_buf_item_init(bp, mp); - bip = bp->b_log_item; - bip->bli_item.li_lsn = current_lsn; - } + ASSERT(bp->b_ops); + bp->b_flags |= _XBF_LOGRECOVERY; + xfs_buf_item_init(bp, mp); + bp->b_log_item->bli_item.li_lsn = current_lsn; + return 0; } /* @@ -441,7 +439,7 @@ xlog_recover_validate_buf_type( * given buffer. The bitmap in the buf log format structure indicates * where to place the logged data. */ -STATIC void +static int xlog_recover_do_reg_buffer( struct xfs_mount *mp, struct xlog_recover_item *item, @@ -523,20 +521,20 @@ xlog_recover_do_reg_buffer( /* Shouldn't be any more regions */ ASSERT(i == item->ri_total); - xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn); + return xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn); } /* - * Perform a dquot buffer recovery. + * Test if this dquot buffer item should be recovered. * Simple algorithm: if we have found a QUOTAOFF log item of the same type * (ie. USR or GRP), then just toss this buffer away; don't recover it. * Else, treat it as a regular buffer and do recovery. * - * Return false if the buffer was tossed and true if we recovered the buffer to - * indicate to the caller if the buffer needs writing. + * Return false if the buffer should be tossed and true if the buffer needs + * to be recovered. 
  */
-STATIC bool
-xlog_recover_do_dquot_buffer(
+static bool
+xlog_recover_this_dquot_buffer(
 	struct xfs_mount	*mp,
 	struct xlog		*log,
 	struct xlog_recover_item *item,
@@ -565,8 +563,6 @@ xlog_recover_do_dquot_buffer(
 	 */
 	if (log->l_quotaoffs_flag & type)
 		return false;
-
-	xlog_recover_do_reg_buffer(mp, item, bp, buf_f, NULLCOMMITLSN);
 	return true;
 }
 
@@ -952,18 +948,19 @@ xlog_recover_buf_commit_pass2(
 
 	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
 		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
-		if (error)
-			goto out_release;
 	} else if (buf_f->blf_flags &
 	    (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
-		bool	dirty;
-
-		dirty = xlog_recover_do_dquot_buffer(mp, log, item, bp, buf_f);
-		if (!dirty)
+		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
 			goto out_release;
+
+		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
+				NULLCOMMITLSN);
 	} else {
-		xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
+		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
+				current_lsn);
 	}
+	if (error)
+		goto out_release;
 
 	/*
 	 * Perform delayed write on the buffer. Asynchronous writes will be
--
2.43.0
From: Zhang Yi <yi.zhang@huawei.com>

The current clone operation could be non-atomic if the destination
offset of a file is beyond EOF; the user could get a file with
corrupted (zeroed) data on crash. The problem stems from
preallocation.

If you write some data into a file at [A, B) (the position letters
increase in order), xfs may preallocate some blocks, so we get a
delayed extent [A, D). Then the writeback path allocates blocks but
converts only [A, C) of this delayed extent for lack of enough
contiguous physical blocks, so the extent [C, D) remains delayed.
After that, both the in-memory and the on-disk file size are B.

If we clone a file range into [E, F) from another file,
xfs_reflink_zero_posteof() would call iomap_zero_range() to zero out
the range [B, E) beyond EOF and flush that range. Since [C, D) is
still a delayed extent, it will be zeroed and the file's in-memory
and on-disk size will be updated to D after flushing and before doing
the clone operation. This is wrong, because the user can see the size
change and read zeros in the middle of the clone operation.

We need to keep the in-memory and on-disk size unchanged before the
clone operation starts, so instead of writing zeroes through the page
cache for delayed ranges beyond EOF, we convert these ranges to
unwritten and invalidate any cached data over that range beyond EOF.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ccf83e72d8ca..1a6d05830433 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1035,6 +1035,24 @@ xfs_buffered_write_iomap_begin(
 	}
 
 	if (imap.br_startoff <= offset_fsb) {
+		/*
+		 * For zeroing out a delayed allocation extent, we trim it if
+		 * it is partially beyond the EOF block, or convert it to an
+		 * unwritten extent if it is entirely beyond the EOF block.
+		 */
+		if ((flags & IOMAP_ZERO) &&
+		    isnullstartblock(imap.br_startblock)) {
+			xfs_fileoff_t eof_fsb = XFS_B_TO_FSB(mp, XFS_ISIZE(ip));
+
+			if (offset_fsb >= eof_fsb)
+				goto convert_delay;
+			if (end_fsb > eof_fsb) {
+				end_fsb = eof_fsb;
+				xfs_trim_extent(&imap, offset_fsb,
+						end_fsb - offset_fsb);
+			}
+		}
+
 		/*
 		 * For reflink files we may need a delalloc reservation when
 		 * overwriting shared extents. This includes zeroing of
@@ -1158,6 +1176,17 @@ xfs_buffered_write_iomap_begin(
 	xfs_iunlock(ip, lockmode);
 	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq);
 
+convert_delay:
+	xfs_iunlock(ip, lockmode);
+	truncate_pagecache(inode, offset);
+	error = xfs_bmapi_convert_delalloc(ip, XFS_DATA_FORK, offset,
+			iomap, NULL);
+	if (error)
+		return error;
+
+	trace_xfs_iomap_alloc(ip, offset, count, XFS_DATA_FORK, &imap);
+	return 0;
+
 found_cow:
 	seq = xfs_iomap_inode_sequence(ip, 0);
 	if (imap.br_startoff <= offset_fsb) {
--
2.39.2
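As an illustration of the failure window described above, a minimal userspace reproducer might look like the sketch below. This is an untested illustration, not the reproducer used for the patch: the mount point, file names, write size and clone offset are all assumed values, and actually observing the transient size change additionally requires fragmented free space so that writeback leaves a delayed extent behind.

#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	static char buf[1 << 16];
	struct file_clone_range fcr = { 0 };
	int src, dst;

	src = open("/mnt/src", O_CREAT | O_RDWR | O_TRUNC, 0644);
	dst = open("/mnt/dst", O_CREAT | O_RDWR | O_TRUNC, 0644);
	if (src < 0 || dst < 0)
		return 1;

	/* Buffered writes; xfs may speculatively preallocate delalloc
	 * blocks extending well past the 64k actually written. */
	memset(buf, 0xab, sizeof(buf));
	pwrite(src, buf, sizeof(buf), 0);
	pwrite(dst, buf, sizeof(buf), 0);

	/* Clone into dst far beyond its EOF.  Before this patch, zeroing
	 * the post-EOF range could write back still-delayed blocks and
	 * transiently push the file size past the old EOF before the
	 * clone itself ran. */
	fcr.src_fd = src;
	fcr.src_offset = 0;
	fcr.src_length = sizeof(buf);
	fcr.dest_offset = 16 << 20;
	if (ioctl(dst, FICLONERANGE, &fcr))
		perror("FICLONERANGE");

	close(src);
	close(dst);
	return 0;
}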
From: Zhang Yi <yi.zhang@huawei.com>

Unsharing and zeroing can only happen within EOF, so there is never a
need to perform post-EOF pagecache truncation if write begin fails.
Also, a partial write can theoretically never happen in
iomap_write_end(), so remove both of them.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 093c4515b22a..7e32a204650b 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -786,7 +786,6 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
 
 out_unlock:
 	__iomap_put_folio(iter, pos, 0, folio);
-	iomap_write_failed(iter->inode, pos, len);
 
 	return status;
 }
@@ -863,8 +862,6 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 
 	if (old_size < pos)
 		pagecache_isize_extended(iter->inode, old_size, pos);
-	if (ret < len)
-		iomap_write_failed(iter->inode, pos + ret, len - ret);
 	return ret;
 }
 
@@ -912,8 +909,10 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		}
 
 		status = iomap_write_begin(iter, pos, bytes, &folio);
-		if (unlikely(status))
+		if (unlikely(status)) {
+			iomap_write_failed(iter->inode, pos, bytes);
 			break;
+		}
 		if (iter->iomap.flags & IOMAP_F_STALE)
 			break;
 
@@ -927,6 +926,9 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
 		status = iomap_write_end(iter, pos, bytes, copied, folio);
 
+		if (status < bytes)
+			iomap_write_failed(iter->inode, pos + status,
+					   bytes - status);
 		if (unlikely(copied != status))
 			iov_iter_revert(i, copied - status);
 
--
2.39.2
From: Zhang Yi <yi.zhang@huawei.com>

xfs_bmapi_convert_delalloc() only attempts to allocate the entire
delalloc extent, so it may require multiple invocations to allocate
the target offset. xfs_convert_blocks() adds a loop to do this job,
and we call it in the writeback path, but xfs_convert_blocks() isn't
a common helper. Let's do the looping in xfs_bmapi_convert_delalloc()
itself and drop xfs_convert_blocks(), preparing for converting
post-EOF delalloc blocks in the buffered write begin path.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 34 +++++++++++++++++++++++--
 fs/xfs/xfs_aops.c        | 54 +++++++++++-----------------------------
 2 files changed, 46 insertions(+), 42 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 07dc35de8ce5..042e8d3ab0ba 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4516,8 +4516,8 @@ xfs_bmapi_write(
  * invocations to allocate the target offset if a large enough physical extent
  * is not available.
  */
-int
-xfs_bmapi_convert_delalloc(
+static int
+__xfs_bmapi_convert_delalloc(
 	struct xfs_inode	*ip,
 	int			whichfork,
 	xfs_off_t		offset,
@@ -4648,6 +4648,36 @@ xfs_bmapi_convert_delalloc(
 	return error;
 }
 
+/*
+ * Pass in a delalloc extent and convert it to real extents, return the real
+ * extent that maps offset_fsb in iomap.
+ */
+int
+xfs_bmapi_convert_delalloc(
+	struct xfs_inode	*ip,
+	int			whichfork,
+	loff_t			offset,
+	struct iomap		*iomap,
+	unsigned int		*seq)
+{
+	int			error;
+
+	/*
+	 * Attempt to allocate whatever delalloc extent currently backs offset
+	 * and put the result into iomap. Allocate in a loop because it may
+	 * take several attempts to allocate real blocks for a contiguous
+	 * delalloc extent if free space is sufficiently fragmented.
+	 */
+	do {
+		error = __xfs_bmapi_convert_delalloc(ip, whichfork, offset,
+				iomap, seq);
+		if (error)
+			return error;
+	} while (iomap->offset + iomap->length <= offset);
+
+	return 0;
+}
+
 int
 xfs_bmapi_remap(
 	struct xfs_trans	*tp,
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 813f85156b0c..6479e0dac69d 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -233,45 +233,6 @@ xfs_imap_valid(
 	return true;
 }
 
-/*
- * Pass in a dellalloc extent and convert it to real extents, return the real
- * extent that maps offset_fsb in wpc->iomap.
- *
- * The current page is held locked so nothing could have removed the block
- * backing offset_fsb, although it could have moved from the COW to the data
- * fork by another thread.
- */
-static int
-xfs_convert_blocks(
-	struct iomap_writepage_ctx *wpc,
-	struct xfs_inode	*ip,
-	int			whichfork,
-	loff_t			offset)
-{
-	int			error;
-	unsigned		*seq;
-
-	if (whichfork == XFS_COW_FORK)
-		seq = &XFS_WPC(wpc)->cow_seq;
-	else
-		seq = &XFS_WPC(wpc)->data_seq;
-
-	/*
-	 * Attempt to allocate whatever delalloc extent currently backs offset
-	 * and put the result into wpc->iomap. Allocate in a loop because it
-	 * may take several attempts to allocate real blocks for a contiguous
-	 * delalloc extent if free space is sufficiently fragmented.
- */
-	do {
-		error = xfs_bmapi_convert_delalloc(ip, whichfork, offset,
-				&wpc->iomap, seq);
-		if (error)
-			return error;
-	} while (wpc->iomap.offset + wpc->iomap.length <= offset);
-
-	return 0;
-}
-
 static int
 xfs_map_blocks(
 	struct iomap_writepage_ctx *wpc,
@@ -289,6 +250,7 @@ xfs_map_blocks(
 	struct xfs_iext_cursor	icur;
 	int			retries = 0;
 	int			error = 0;
+	unsigned int		*seq;
 
 	if (xfs_is_shutdown(mp))
 		return -EIO;
@@ -386,7 +348,19 @@ xfs_map_blocks(
 	trace_xfs_map_blocks_found(ip, offset, count, whichfork, &imap);
 	return 0;
 allocate_blocks:
-	error = xfs_convert_blocks(wpc, ip, whichfork, offset);
+	/*
+	 * Convert a delalloc extent to a real one. The current page is held
+	 * locked so nothing could have removed the block backing offset_fsb,
+	 * although it could have moved from the COW to the data fork by
+	 * another thread.
+	 */
+	if (whichfork == XFS_COW_FORK)
+		seq = &XFS_WPC(wpc)->cow_seq;
+	else
+		seq = &XFS_WPC(wpc)->data_seq;
+
+	error = xfs_bmapi_convert_delalloc(ip, whichfork, offset,
+			&wpc->iomap, seq);
 	if (error) {
 		/*
 		 * If we failed to find the extent in the COW fork we might have
--
2.39.2
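To make the convergence condition of the folded-in loop concrete, here is a small standalone model of the pattern. It is not XFS code: convert_some() is a made-up stand-in for __xfs_bmapi_convert_delalloc() that only makes 4k of progress per call, as if free space were badly fragmented.

#include <stdio.h>

struct mapping {
	long long offset;	/* start of the range converted by this call */
	long long length;	/* bytes converted by this call */
};

/* Stand-in allocator: converts the delalloc extent from its current
 * start, but only one 4k block at a time. */
static int convert_some(long long *cursor, struct mapping *map)
{
	map->offset = *cursor;
	map->length = 4096;
	*cursor += map->length;
	return 0;
}

int main(void)
{
	struct mapping map;
	long long cursor = 0;
	long long target = 20000;	/* offset we need a real mapping for */

	/* Same shape as the loop now inside xfs_bmapi_convert_delalloc():
	 * retry until the returned mapping extends past the target offset. */
	do {
		if (convert_some(&cursor, &map))
			return 1;
	} while (map.offset + map.length <= target);

	printf("converged: [%lld, %lld) covers %lld\n",
	       map.offset, map.offset + map.length, target);
	return 0;
}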
From: Zhang Yi <yi.zhang@huawei.com>

Commit 1aa91d9c9933 ("xfs: Add async buffered write support") replaced
xfs_ilock(XFS_ILOCK_EXCL) with xfs_ilock_for_iomap() when locking the
inode for writing, and a new variable, lockmode, is used to indicate
the lock mode. Although lockmode should always be XFS_ILOCK_EXCL, it's
still better to use this variable instead of using XFS_ILOCK_EXCL
directly when unlocking the inode.

Fixes: 1aa91d9c9933 ("xfs: Add async buffered write support")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 18c8f168b153..ccf83e72d8ca 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1149,13 +1149,13 @@ xfs_buffered_write_iomap_begin(
 	 * them out if the write happens to fail.
 	 */
 	seq = xfs_iomap_inode_sequence(ip, IOMAP_F_NEW);
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(ip, lockmode);
 	trace_xfs_iomap_alloc(ip, offset, count, allocfork, &imap);
 	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, IOMAP_F_NEW, seq);
 
 found_imap:
 	seq = xfs_iomap_inode_sequence(ip, 0);
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(ip, lockmode);
 	return xfs_bmbt_to_iomap(ip, iomap, &imap, flags, 0, seq);
 
 found_cow:
@@ -1165,17 +1165,17 @@ xfs_buffered_write_iomap_begin(
 		if (error)
 			goto out_unlock;
 		seq = xfs_iomap_inode_sequence(ip, IOMAP_F_SHARED);
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+		xfs_iunlock(ip, lockmode);
 		return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags,
 					 IOMAP_F_SHARED, seq);
 	}
 
 	xfs_trim_extent(&cmap, offset_fsb, imap.br_startoff - offset_fsb);
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(ip, lockmode);
 	return xfs_bmbt_to_iomap(ip, iomap, &cmap, flags, 0, seq);
 
 out_unlock:
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(ip, lockmode);
 	return error;
 }
 
--
2.39.2
From: Zhang Yi <yi.zhang@huawei.com>

Increasing i_size in iomap_zero_range() and iomap_unshare_iter() is
not needed; the caller should handle it. In particular, when
truncating a partial block, we should not increase i_size beyond the
new EOF here. This doesn't affect xfs and gfs2 now because they set
the new file size after zeroing out, so a transient increase in
i_size doesn't matter, but it will affect ext4 because it sets the
file size before truncating. So move the i_size updating logic to
iomap_write_iter().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 50 +++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 7e32a204650b..e9112dc78d15 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -837,32 +837,13 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
 		size_t copied, struct folio *folio)
 {
 	const struct iomap *srcmap = iomap_iter_srcmap(iter);
-	loff_t old_size = iter->inode->i_size;
-	size_t ret;
-
-	if (srcmap->type == IOMAP_INLINE) {
-		ret = iomap_write_end_inline(iter, folio, pos, copied);
-	} else if (srcmap->flags & IOMAP_F_BUFFER_HEAD) {
-		ret = block_write_end(NULL, iter->inode->i_mapping, pos, len,
-				copied, &folio->page, NULL);
-	} else {
-		ret = __iomap_write_end(iter->inode, pos, len, copied, folio);
-	}
-
-	/*
-	 * Update the in-memory inode size after copying the data into the page
-	 * cache. It's up to the file system to write the updated size to disk,
-	 * preferably after I/O completion so that no stale data is exposed.
-	 */
-	if (pos + ret > old_size) {
-		i_size_write(iter->inode, pos + ret);
-		iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
-	}
-	__iomap_put_folio(iter, pos, ret, folio);
-	if (old_size < pos)
-		pagecache_isize_extended(iter->inode, old_size, pos);
-	return ret;
+	if (srcmap->type == IOMAP_INLINE)
+		return iomap_write_end_inline(iter, folio, pos, copied);
+	if (srcmap->flags & IOMAP_F_BUFFER_HEAD)
+		return block_write_end(NULL, iter->inode->i_mapping, pos, len,
+				copied, &folio->page, NULL);
+	return __iomap_write_end(iter->inode, pos, len, copied, folio);
 }
 
 static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
@@ -877,6 +858,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 
 	do {
 		struct folio *folio;
+		loff_t old_size;
 		size_t offset;		/* Offset into folio */
 		size_t bytes;		/* Bytes to write to folio */
 		size_t copied;		/* Bytes copied from user */
@@ -926,6 +908,22 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
 		status = iomap_write_end(iter, pos, bytes, copied, folio);
 
+		/*
+		 * Update the in-memory inode size after copying the data into
+		 * the page cache. It's up to the file system to write the
+		 * updated size to disk, preferably after I/O completion so that
+		 * no stale data is exposed. Only once that's done can we
+		 * unlock and release the folio.
+		 */
+		old_size = iter->inode->i_size;
+		if (pos + status > old_size) {
+			i_size_write(iter->inode, pos + status);
+			iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
+		}
+		__iomap_put_folio(iter, pos, status, folio);
+
+		if (old_size < pos)
+			pagecache_isize_extended(iter->inode, old_size, pos);
 		if (status < bytes)
 			iomap_write_failed(iter->inode, pos + status,
 					   bytes - status);
@@ -1298,6 +1296,7 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
 			bytes = folio_size(folio) - offset;
 
 		bytes = iomap_write_end(iter, pos, bytes, bytes, folio);
+		__iomap_put_folio(iter, pos, bytes, folio);
 		if (WARN_ON_ONCE(bytes == 0))
 			return -EIO;
 
@@ -1362,6 +1361,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
 		folio_mark_accessed(folio);
 
 		bytes = iomap_write_end(iter, pos, bytes, bytes, folio);
+		__iomap_put_folio(iter, pos, bytes, folio);
 		if (WARN_ON_ONCE(bytes == 0))
 			return -EIO;
 
--
2.39.2
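The ordering argument in the moved comment can be modeled in userspace. The sketch below is an assumption-laden analogy, not kernel code: a mutex stands in for the folio lock, a plain variable stands in for i_size, and the comments name the kernel calls each step corresponds to.

#include <pthread.h>
#include <stdio.h>

static long long isize;				/* models inode->i_size */
static pthread_mutex_t folio_lock = PTHREAD_MUTEX_INITIALIZER;

static void write_end(long long pos, long long written)
{
	long long old_size = isize;

	/* i_size_write(): publish the new size while the folio is still
	 * locked, so any reader that can see the folio also sees a size
	 * covering the data just copied into it. */
	if (pos + written > old_size)
		isize = pos + written;

	/* __iomap_put_folio(): only now unlock and release the folio. */
	pthread_mutex_unlock(&folio_lock);

	/* pagecache_isize_extended(): with the new size visible, let the
	 * mm revisit the page that straddled the old EOF. */
	if (old_size < pos)
		printf("EOF moved from %lld, write at %lld\n", old_size, pos);
}

int main(void)
{
	pthread_mutex_lock(&folio_lock);	/* iomap_write_begin() */
	write_end(0, 4096);
	printf("i_size = %lld\n", isize);
	return 0;
}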
From: Zhang Yi <yi.zhang@huawei.com>

Allow callers to pass a NULL seq argument if they don't care about
the fork sequence number.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index f362345467fa..07dc35de8ce5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4574,7 +4574,8 @@ xfs_bmapi_convert_delalloc(
 	if (!isnullstartblock(bma.got.br_startblock)) {
 		xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags,
 				xfs_iomap_inode_sequence(ip, flags));
-		*seq = READ_ONCE(ifp->if_seq);
+		if (seq)
+			*seq = READ_ONCE(ifp->if_seq);
 		goto out_trans_cancel;
 	}
 
@@ -4623,7 +4624,8 @@ xfs_bmapi_convert_delalloc(
 	ASSERT(!isnullstartblock(bma.got.br_startblock));
 	xfs_bmbt_to_iomap(ip, iomap, &bma.got, 0, flags,
 			xfs_iomap_inode_sequence(ip, flags));
-	*seq = READ_ONCE(ifp->if_seq);
+	if (seq)
+		*seq = READ_ONCE(ifp->if_seq);
 
 	if (whichfork == XFS_COW_FORK)
 		xfs_refcount_alloc_cow_extent(tp, bma.blkno, bma.length);
--
2.39.2
From: Zhang Yi <yi.zhang@huawei.com>

Since iomap_write_end() can never return a partial write length, the
comparison between written, copied and bytes becomes useless; just
merge them into the branch that handles the case where nothing was
written.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/buffered-io.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 004673ea8bc1..f2fb89056259 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -937,11 +937,6 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 
 		if (old_size < pos)
 			pagecache_isize_extended(iter->inode, old_size, pos);
-		if (written < bytes)
-			iomap_write_failed(iter->inode, pos + written,
-					bytes - written);
-		if (unlikely(copied != written))
-			iov_iter_revert(i, copied - written);
 
 		cond_resched();
 		if (unlikely(written == 0)) {
@@ -951,6 +946,9 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 			 * halfway through, might be a race with munmap,
 			 * might be severe memory pressure.
 			 */
+			iomap_write_failed(iter->inode, pos, bytes);
+			iov_iter_revert(i, copied);
+
 			if (chunk > PAGE_SIZE)
 				chunk /= 2;
 			if (copied) {
--
2.43.0