All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/18 V4] XFS: LARP state machine and recovery rework
@ 2022-05-09  0:41 Dave Chinner
  2022-05-09  0:41 ` [PATCH 01/18] xfs: avoid empty xattr transaction when attrs are inline Dave Chinner
                   ` (18 more replies)
  0 siblings, 19 replies; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

This patchset aims to simplify the state machine that the new logged
attributes require to do their stuff. It also reworks the
attribute logging and recovery algorithms to simplify them and to
avoid leaving incomplete attributes around after recovery.

When I first dug into this code, I modified the state machine
to try to simplify the set and replace operations as there was
a lot of duplicate code in them. I also wasn't completely happy with
the different transistions for replace operations when larp mode was
enabled. I simplified the state machien and renumbered the states so
that we can iterate through the same state progression for different
attr formats just by having different inital leaf and node states.

Once I'd largely done this and started testing it, I realised that
recovery wasn't setting the initial state properly, so I started
digging into the recovery code. At this point, I realised the
recovery code didn't work correctly in all cases and could often
leave unremovable incomplete attrs sitting around. The issues are
larger documented in the last patch in the series, so I won't go
over them here, just read that patch.

However, to get, the whole replace operation for LARP=1 needed to
change. Luckily, that turned out to be pretty simple because it was
largely already broken down into component operations in the state
machine. hence I just needed to add new "remove" initial states to
the set_iter state machine, and that allowed the new algorithm to
function.

Then I realised that I'd just implemented the remove_iter algorithm
in the set_iter state machine, so I removed the remove_iter code and
it's states altogether and just pointed remove ops at the set-iter
remove initial states. The code now uses the XFS_DA_OP_RENAME flag
to determine if it should follow up an add or remove with a remove
or add, and it all largely just works. All runtime algorithms run
throught he same state machine just with different initial states
and state progressions.

And with the last patch in the series, attr item log recovery uses
that same state machine, too. It has a few quirks that need to be
handled, so I added the XFS_DA_OP_RECOVERY flag to allow the right
thing to be done with the INCOMPLETE flag deep in the guts of the
attr lookup code. And so recovery with LARP=1 now seems to mostly
work.

This version passes fstests, several recoveryloop passes and the
targeted error injection based attr recovery test that Catherine
wrote. There's a fair bit of re-org in the patch series since V2,
but most of that is pulling stuff from the last patch and putting it
in the right place in the series. hence the only real logic iand bug
fixes changes occurred in the last patch that changes the logging 
and recovery algorithm.

Comments, reviews and, most especially, testing welcome.

A compose of the patchset I've been testing based on the current
for-next tree can be found here:

git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git xfs-5.19-compose

Version 4:
- fixed failure to collapse empty node form structures back to
  shortform and/or removing the attr fork altogether if empty.
  Several subsequent patches needed rebasing after this, so I
  decided a new version of the patchset was warranted.
- added patch to ensure leaf format verifiers catch empty leaf nodes
  being written to disk. This would have caught the node format
  collapse bug this version fixes.

Version 3:
- https://lore.kernel.org/linux-xfs/20220506223532.GK1098723@dread.disaster.area/
- rebased on 5.18-rc2 + for-next + rebased larp v29
- added state asserts to xfs_attr_dela_state_set_replace()
- only roll the transactions in ALLOC_RMT when we need to do more
  extent allocation for the remote attr value container.
- removed unnecessary attr->xattri_blkcnt check in ALLOC_RMT
- added comments to commit message to explain why we are combining
  the set and remove paths.
- preserved and isolated the state path save/restore code for
  avoiding repeated name entry path lookups when rolling transactins
  across remove operations.  Left a big comment for Future Dave to
  re-enable the optimisation.
- fixed a transient attr fork removal bug in
  xfs_attr3_leaf_to_shortform() in the new removal algorithm which
  can result in the attr fork being removed between the remove and
  set ops in a REPLACE operation.
- moved defer operation setup changes ("split replace from set op")
  to early on in the series and combined it with the refactoring
  done immediately afterwards in the last patch of the series. This
  allows for cleanly fixing the log recovery state initialisation
  problem the patchset had.
- pulled the state initialisation for log recovery up into the
  patches that introduce the state machine changes for the given
  operations.
- Lots of changes to log recovery algorithm change in last patch.

Version 2:
https://lore.kernel.org/linux-xfs/20220414094434.2508781-1-david@fromorbit.com/
- fixed attrd initialisation for LARP=1 mode
- fixed REPLACE->REMOVE_OLD state transition for LARP=1 mode
- added more comments to describe the assumptions that allow
  xfs_attr_remove_leaf_attr() to work for both modes


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 01/18] xfs: avoid empty xattr transaction when attrs are inline
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-09 16:43   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 02/18] xfs: initialise attrd item to zero Dave Chinner
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

generic/642 triggered a reproducable assert failure in
xlog_cil_commit() that resulted from a xfs_attr_set() committing
an empty but dirty transaction. When the CIL is empty and this
occurs, xlog_cil_commit() tries a background push and this triggers
a "pushing an empty CIL" assert.

XFS: Assertion failed: !list_empty(&cil->xc_cil), file: fs/xfs/xfs_log_cil.c, line: 1274
Call Trace:
 <TASK>
 xlog_cil_commit+0xa5a/0xad0
 __xfs_trans_commit+0xb8/0x330
 xfs_trans_commit+0x10/0x20
 xfs_attr_set+0x3e2/0x4c0
 xfs_xattr_set+0x8d/0xe0
 __vfs_setxattr+0x6b/0x90
 __vfs_setxattr_noperm+0x76/0x220
 __vfs_setxattr_locked+0xdf/0x100
 vfs_setxattr+0x94/0x170
 setxattr+0x110/0x200
 path_setxattr+0xbf/0xe0
 __x64_sys_setxattr+0x2b/0x30
 do_syscall_64+0x35/0x80

The problem is related to the breakdown of attribute addition in
xfs_attr_set_iter() and how it is called from deferred operations.
When we have a pure leaf xattr insert, we add the xattr to the leaf
and set the next state to XFS_DAS_FOUND_LBLK and return -EAGAIN.
This requeues the xattr defered work, rolls the transaction and
runs xfs_attr_set_iter() again. This then checks the xattr for
being remote (it's not) and whether a replace op is being done (this
is a create op) and if neither are true it returns without having
done anything.

xfs_xattri_finish_update() then unconditionally sets the transaction
dirty, and the deferops finishes and returns to __xfs_trans_commit()
which sees the transaction dirty and tries to commit it by calling
xlog_cil_commit(). The transaction is empty, and then the assert
fires if this happens when the CIL is empty.

This patch addresses the structure of xfs_attr_set_iter() that
requires re-entry on leaf add even when nothing will be done. This
gets rid of the trailing empty transaction and so doesn't trigger
the XFS_TRANS_DIRTY assignment in xfs_xattri_finish_update()
incorrectly. Addressing that is for a different patch.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 39 +++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 48b7e7efbb30..b3d918195160 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -315,6 +315,7 @@ xfs_attr_leaf_addname(
 {
 	struct xfs_da_args	*args = attr->xattri_da_args;
 	struct xfs_inode	*dp = args->dp;
+	enum xfs_delattr_state	next_state = XFS_DAS_UNINIT;
 	int			error;
 
 	if (xfs_attr_is_leaf(dp)) {
@@ -335,37 +336,35 @@ xfs_attr_leaf_addname(
 			 * when we come back, we'll be a node, so we'll fall
 			 * down into the node handling code below
 			 */
-			trace_xfs_attr_set_iter_return(
-				attr->xattri_dela_state, args->dp);
-			return -EAGAIN;
+			error = -EAGAIN;
+			goto out;
 		}
-
-		if (error)
-			return error;
-
-		attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
+		next_state = XFS_DAS_FOUND_LBLK;
 	} else {
 		error = xfs_attr_node_addname_find_attr(attr);
 		if (error)
 			return error;
 
+		next_state = XFS_DAS_FOUND_NBLK;
 		error = xfs_attr_node_addname(attr);
-		if (error)
-			return error;
-
-		/*
-		 * If addname was successful, and we dont need to alloc or
-		 * remove anymore blks, we're done.
-		 */
-		if (!args->rmtblkno &&
-		    !(args->op_flags & XFS_DA_OP_RENAME))
-			return 0;
+	}
+	if (error)
+		return error;
 
-		attr->xattri_dela_state = XFS_DAS_FOUND_NBLK;
+	/*
+	 * We need to commit and roll if we need to allocate remote xattr blocks
+	 * or perform more xattr manipulations. Otherwise there is nothing more
+	 * to do and we can return success.
+	 */
+	if (args->rmtblkno ||
+	    (args->op_flags & XFS_DA_OP_RENAME)) {
+		attr->xattri_dela_state = next_state;
+		error = -EAGAIN;
 	}
 
+out:
 	trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp);
-	return -EAGAIN;
+	return error;
 }
 
 /*
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 02/18] xfs: initialise attrd item to zero
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
  2022-05-09  0:41 ` [PATCH 01/18] xfs: avoid empty xattr transaction when attrs are inline Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-09 16:43   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 03/18] xfs: make xattri_leaf_bp more useful Dave Chinner
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

On the first allocation of a attrd item, xfs_trans_add_item() fires
an assert like so:

 XFS (pmem0): EXPERIMENTAL logged extended attributes feature added. Use at your own risk!
 XFS: Assertion failed: !test_bit(XFS_LI_DIRTY, &lip->li_flags), file: fs/xfs/xfs_trans.c, line: 683
 ------------[ cut here ]------------
 kernel BUG at fs/xfs/xfs_message.c:102!
 Call Trace:
  <TASK>
  xfs_trans_add_item+0x17e/0x190
  xfs_trans_get_attrd+0x67/0x90
  xfs_attr_create_done+0x13/0x20
  xfs_defer_finish_noroll+0x100/0x690
  __xfs_trans_commit+0x144/0x330
  xfs_trans_commit+0x10/0x20
  xfs_attr_set+0x3e2/0x4c0
  xfs_initxattrs+0xaa/0xe0
  security_inode_init_security+0xb0/0x130
  xfs_init_security+0x18/0x20
  xfs_generic_create+0x13a/0x340
  xfs_vn_create+0x17/0x20
  path_openat+0xff3/0x12f0
  do_filp_open+0xb2/0x150

The attrd log item is allocated via kmem_cache_alloc, and
xfs_log_item_init() does not zero the entire log item structure - it
assumes that the structure is already all zeros as it only
initialises non-zero fields. Fix the attr items to be allocated
via the *zalloc methods.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/xfs_attr_item.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 9061adce3f16..5f8680b05079 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -721,7 +721,7 @@ xfs_trans_get_attrd(struct xfs_trans		*tp,
 
 	ASSERT(tp != NULL);
 
-	attrdp = kmem_cache_alloc(xfs_attrd_cache, GFP_NOFS | __GFP_NOFAIL);
+	attrdp = kmem_cache_zalloc(xfs_attrd_cache, GFP_NOFS | __GFP_NOFAIL);
 
 	xfs_log_item_init(tp->t_mountp, &attrdp->attrd_item, XFS_LI_ATTRD,
 			  &xfs_attrd_item_ops);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 03/18] xfs: make xattri_leaf_bp more useful
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
  2022-05-09  0:41 ` [PATCH 01/18] xfs: avoid empty xattr transaction when attrs are inline Dave Chinner
  2022-05-09  0:41 ` [PATCH 02/18] xfs: initialise attrd item to zero Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 22:58   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 04/18] xfs: rework deferred attribute operation setup Dave Chinner
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We currently set it and hold it when converting from short to leaf
form, then release it only to immediately look it back up again
to do the leaf insert.

Do a bit of refactoring to xfs_attr_leaf_try_add() to avoid this
messy handling of the newly allocated leaf buffer.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 50 +++++++++++++++++++++++++---------------
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index b3d918195160..a4b0b20a3bab 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -319,7 +319,15 @@ xfs_attr_leaf_addname(
 	int			error;
 
 	if (xfs_attr_is_leaf(dp)) {
+
+		/*
+		 * Use the leaf buffer we may already hold locked as a result of
+		 * a sf-to-leaf conversion. The held buffer is no longer valid
+		 * after this call, regardless of the result.
+		 */
 		error = xfs_attr_leaf_try_add(args, attr->xattri_leaf_bp);
+		attr->xattri_leaf_bp = NULL;
+
 		if (error == -ENOSPC) {
 			error = xfs_attr3_leaf_to_node(args);
 			if (error)
@@ -341,6 +349,8 @@ xfs_attr_leaf_addname(
 		}
 		next_state = XFS_DAS_FOUND_LBLK;
 	} else {
+		ASSERT(!attr->xattri_leaf_bp);
+
 		error = xfs_attr_node_addname_find_attr(attr);
 		if (error)
 			return error;
@@ -396,12 +406,6 @@ xfs_attr_set_iter(
 		 */
 		if (xfs_attr_is_shortform(dp))
 			return xfs_attr_sf_addname(attr);
-		if (attr->xattri_leaf_bp != NULL) {
-			xfs_trans_bhold_release(args->trans,
-						attr->xattri_leaf_bp);
-			attr->xattri_leaf_bp = NULL;
-		}
-
 		return xfs_attr_leaf_addname(attr);
 
 	case XFS_DAS_FOUND_LBLK:
@@ -992,18 +996,31 @@ xfs_attr_leaf_try_add(
 	struct xfs_da_args	*args,
 	struct xfs_buf		*bp)
 {
-	int			retval;
+	int			error;
 
 	/*
-	 * Look up the given attribute in the leaf block.  Figure out if
-	 * the given flags produce an error or call for an atomic rename.
+	 * If the caller provided a buffer to us, it is locked and held in
+	 * the transaction because it just did a shortform to leaf conversion.
+	 * Hence we don't need to read it again. Otherwise read in the leaf
+	 * buffer.
 	 */
-	retval = xfs_attr_leaf_hasname(args, &bp);
-	if (retval != -ENOATTR && retval != -EEXIST)
-		return retval;
-	if (retval == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
+	if (bp) {
+		xfs_trans_bhold_release(args->trans, bp);
+	} else {
+		error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp);
+		if (error)
+			return error;
+	}
+
+	/*
+	 * Look up the xattr name to set the insertion point for the new xattr.
+	 */
+	error = xfs_attr3_leaf_lookup_int(bp, args);
+	if (error != -ENOATTR && error != -EEXIST)
 		goto out_brelse;
-	if (retval == -EEXIST) {
+	if (error == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
+		goto out_brelse;
+	if (error == -EEXIST) {
 		if (args->attr_flags & XATTR_CREATE)
 			goto out_brelse;
 
@@ -1023,14 +1040,11 @@ xfs_attr_leaf_try_add(
 		args->rmtvaluelen = 0;
 	}
 
-	/*
-	 * Add the attribute to the leaf block
-	 */
 	return xfs_attr3_leaf_add(bp, args);
 
 out_brelse:
 	xfs_trans_brelse(args->trans, bp);
-	return retval;
+	return error;
 }
 
 /*
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 04/18] xfs: rework deferred attribute operation setup
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (2 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 03/18] xfs: make xattri_leaf_bp more useful Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 23:04   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 05/18] xfs: separate out initial attr_set states Dave Chinner
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Logged attribute intents only have set and remove types - there is
no type of the replace operation. We should ahve a separate type for
a replace operation, as it needs to perform operations that neither
SET or REMOVE can perform.

Add this type to the intent items and rearrange the deferred
operation setup to reflect the different operations we are
performing.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c       | 165 +++++++++++++++++++--------------
 fs/xfs/libxfs/xfs_attr.h       |   2 -
 fs/xfs/libxfs/xfs_log_format.h |   1 +
 fs/xfs/xfs_attr_item.c         |   9 +-
 fs/xfs/xfs_trace.h             |   4 +
 5 files changed, 110 insertions(+), 71 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index a4b0b20a3bab..817e15740f9c 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -671,6 +671,81 @@ xfs_attr_lookup(
 	return xfs_attr_node_hasname(args, NULL);
 }
 
+static int
+xfs_attr_item_init(
+	struct xfs_da_args	*args,
+	unsigned int		op_flags,	/* op flag (set or remove) */
+	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
+{
+
+	struct xfs_attr_item	*new;
+
+	new = kmem_zalloc(sizeof(struct xfs_attr_item), KM_NOFS);
+	new->xattri_op_flags = op_flags;
+	new->xattri_da_args = args;
+
+	*attr = new;
+	return 0;
+}
+
+/* Sets an attribute for an inode as a deferred operation */
+static int
+xfs_attr_defer_add(
+	struct xfs_da_args	*args)
+{
+	struct xfs_attr_item	*new;
+	int			error = 0;
+
+	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
+	if (error)
+		return error;
+
+	new->xattri_dela_state = XFS_DAS_UNINIT;
+	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
+	trace_xfs_attr_defer_add(new->xattri_dela_state, args->dp);
+
+	return 0;
+}
+
+/* Sets an attribute for an inode as a deferred operation */
+static int
+xfs_attr_defer_replace(
+	struct xfs_da_args	*args)
+{
+	struct xfs_attr_item	*new;
+	int			error = 0;
+
+	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REPLACE, &new);
+	if (error)
+		return error;
+
+	new->xattri_dela_state = XFS_DAS_UNINIT;
+	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
+	trace_xfs_attr_defer_replace(new->xattri_dela_state, args->dp);
+
+	return 0;
+}
+
+/* Removes an attribute for an inode as a deferred operation */
+static int
+xfs_attr_defer_remove(
+	struct xfs_da_args	*args)
+{
+
+	struct xfs_attr_item	*new;
+	int			error;
+
+	error  = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
+	if (error)
+		return error;
+
+	new->xattri_dela_state = XFS_DAS_UNINIT;
+	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
+	trace_xfs_attr_defer_remove(new->xattri_dela_state, args->dp);
+
+	return 0;
+}
+
 /*
  * Note: If args->value is NULL the attribute will be removed, just like the
  * Linux ->setattr API.
@@ -759,29 +834,35 @@ xfs_attr_set(
 	}
 
 	error = xfs_attr_lookup(args);
-	if (args->value) {
-		if (error == -EEXIST && (args->attr_flags & XATTR_CREATE))
-			goto out_trans_cancel;
-		if (error == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
-			goto out_trans_cancel;
-		if (error != -ENOATTR && error != -EEXIST)
+	switch (error) {
+	case -EEXIST:
+		/* if no value, we are performing a remove operation */
+		if (!args->value) {
+			error = xfs_attr_defer_remove(args);
+			break;
+		}
+		/* Pure create fails if the attr already exists */
+		if (args->attr_flags & XATTR_CREATE)
 			goto out_trans_cancel;
 
-		error = xfs_attr_set_deferred(args);
-		if (error)
+		error = xfs_attr_defer_replace(args);
+		break;
+	case -ENOATTR:
+		/* Can't remove what isn't there. */
+		if (!args->value)
 			goto out_trans_cancel;
 
-		/* shortform attribute has already been committed */
-		if (!args->trans)
-			goto out_unlock;
-	} else {
-		if (error != -EEXIST)
+		/* Pure replace fails if no existing attr to replace. */
+		if (args->attr_flags & XATTR_REPLACE)
 			goto out_trans_cancel;
 
-		error = xfs_attr_remove_deferred(args);
-		if (error)
-			goto out_trans_cancel;
+		error = xfs_attr_defer_add(args);
+		break;
+	default:
+		goto out_trans_cancel;
 	}
+	if (error)
+		goto out_trans_cancel;
 
 	/*
 	 * If this is a synchronous mount, make sure that the
@@ -845,58 +926,6 @@ xfs_attrd_destroy_cache(void)
 	xfs_attrd_cache = NULL;
 }
 
-STATIC int
-xfs_attr_item_init(
-	struct xfs_da_args	*args,
-	unsigned int		op_flags,	/* op flag (set or remove) */
-	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
-{
-
-	struct xfs_attr_item	*new;
-
-	new = kmem_zalloc(sizeof(struct xfs_attr_item), KM_NOFS);
-	new->xattri_op_flags = op_flags;
-	new->xattri_da_args = args;
-
-	*attr = new;
-	return 0;
-}
-
-/* Sets an attribute for an inode as a deferred operation */
-int
-xfs_attr_set_deferred(
-	struct xfs_da_args	*args)
-{
-	struct xfs_attr_item	*new;
-	int			error = 0;
-
-	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
-	if (error)
-		return error;
-
-	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
-
-	return 0;
-}
-
-/* Removes an attribute for an inode as a deferred operation */
-int
-xfs_attr_remove_deferred(
-	struct xfs_da_args	*args)
-{
-
-	struct xfs_attr_item	*new;
-	int			error;
-
-	error  = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
-	if (error)
-		return error;
-
-	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
-
-	return 0;
-}
-
 /*========================================================================
  * External routines when attribute list is inside the inode
  *========================================================================*/
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index f6c13d2bfbcd..c9c867e3406c 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -521,8 +521,6 @@ bool xfs_attr_namecheck(const void *name, size_t length);
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
-int xfs_attr_set_deferred(struct xfs_da_args *args);
-int xfs_attr_remove_deferred(struct xfs_da_args *args);
 
 extern struct kmem_cache	*xfs_attri_cache;
 extern struct kmem_cache	*xfs_attrd_cache;
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index a27492e99673..f7edd1ecf6d9 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -908,6 +908,7 @@ struct xfs_icreate_log {
  */
 #define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
 #define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
+#define XFS_ATTR_OP_FLAGS_REPLACE	3	/* Replace the attribute */
 #define XFS_ATTR_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
 
 /*
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 5f8680b05079..fe1e37696634 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -311,6 +311,7 @@ xfs_xattri_finish_update(
 
 	switch (op) {
 	case XFS_ATTR_OP_FLAGS_SET:
+	case XFS_ATTR_OP_FLAGS_REPLACE:
 		error = xfs_attr_set_iter(attr);
 		break;
 	case XFS_ATTR_OP_FLAGS_REMOVE:
@@ -500,8 +501,14 @@ xfs_attri_validate(
 		return false;
 
 	/* alfi_op_flags should be either a set or remove */
-	if (op != XFS_ATTR_OP_FLAGS_SET && op != XFS_ATTR_OP_FLAGS_REMOVE)
+	switch (op) {
+	case XFS_ATTR_OP_FLAGS_SET:
+	case XFS_ATTR_OP_FLAGS_REPLACE:
+	case XFS_ATTR_OP_FLAGS_REMOVE:
+		break;
+	default:
 		return false;
+	}
 
 	if (attrp->alfi_value_len > XATTR_SIZE_MAX)
 		return false;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index fec4198b738b..01ce0401aa32 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -4154,6 +4154,10 @@ DEFINE_DAS_STATE_EVENT(xfs_attr_leaf_addname_return);
 DEFINE_DAS_STATE_EVENT(xfs_attr_node_addname_return);
 DEFINE_DAS_STATE_EVENT(xfs_attr_remove_iter_return);
 DEFINE_DAS_STATE_EVENT(xfs_attr_rmtval_remove_return);
+DEFINE_DAS_STATE_EVENT(xfs_attr_defer_add);
+DEFINE_DAS_STATE_EVENT(xfs_attr_defer_replace);
+DEFINE_DAS_STATE_EVENT(xfs_attr_defer_remove);
+
 
 TRACE_EVENT(xfs_force_shutdown,
 	TP_PROTO(struct xfs_mount *mp, int ptag, int flags, const char *fname,
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 05/18] xfs: separate out initial attr_set states
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (3 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 04/18] xfs: rework deferred attribute operation setup Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 23:12   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 06/18] xfs: kill XFS_DAC_LEAF_ADDNAME_INIT Dave Chinner
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We current use XFS_DAS_UNINIT for several steps in the attr_set
state machine. We use it for setting shortform xattrs, converting
from shortform to leaf, leaf add, leaf-to-node and leaf add. All of
these things are essentially known before we start the state machine
iterating, so we really should separate them out:

XFS_DAS_SF_ADD:
	- tries to do a shortform add
	- on success -> done
	- on ENOSPC converts to leaf, -> XFS_DAS_LEAF_ADD
	- on error, dies.

XFS_DAS_LEAF_ADD:
	- tries to do leaf add
	- on success:
		- inline attr -> done
		- remote xattr || REPLACE -> XFS_DAS_FOUND_LBLK
	- on ENOSPC converts to node, -> XFS_DAS_NODE_ADD
	- on error, dies

XFS_DAS_NODE_ADD:
	- tries to do node add
	- on success:
		- inline attr -> done
		- remote xattr || REPLACE -> XFS_DAS_FOUND_NBLK
	- on error, dies

This makes it easier to understand how the state machine starts
up and sets us up on the path to further state machine
simplifications.

This also converts the DAS state tracepoints to use strings rather
than numbers, as converting between enums and numbers requires
manual counting rather than just reading the name.

This also introduces a XFS_DAS_DONE state so that we can trace
successful operation completions easily.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c  | 161 +++++++++++++++++++-------------------
 fs/xfs/libxfs/xfs_attr.h  |  80 ++++++++++++++++---
 fs/xfs/libxfs/xfs_defer.c |   2 +
 fs/xfs/xfs_acl.c          |   4 +-
 fs/xfs/xfs_attr_item.c    |  13 ++-
 fs/xfs/xfs_ioctl.c        |   4 +-
 fs/xfs/xfs_trace.h        |  22 +++++-
 fs/xfs/xfs_xattr.c        |   2 +-
 8 files changed, 185 insertions(+), 103 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 817e15740f9c..edc31075fde4 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -59,7 +59,7 @@ STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
  */
 STATIC int xfs_attr_node_get(xfs_da_args_t *args);
 STATIC void xfs_attr_restore_rmt_blk(struct xfs_da_args *args);
-STATIC int xfs_attr_node_addname(struct xfs_attr_item *attr);
+static int xfs_attr_node_try_addname(struct xfs_attr_item *attr);
 STATIC int xfs_attr_node_addname_find_attr(struct xfs_attr_item *attr);
 STATIC int xfs_attr_node_addname_clear_incomplete(struct xfs_attr_item *attr);
 STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
@@ -224,6 +224,11 @@ xfs_init_attr_trans(
 	}
 }
 
+/*
+ * Add an attr to a shortform fork. If there is no space,
+ * xfs_attr_shortform_addname() will convert to leaf format and return -ENOSPC.
+ * to use.
+ */
 STATIC int
 xfs_attr_try_sf_addname(
 	struct xfs_inode	*dp,
@@ -255,20 +260,7 @@ xfs_attr_try_sf_addname(
 	return error;
 }
 
-/*
- * Check to see if the attr should be upgraded from non-existent or shortform to
- * single-leaf-block attribute list.
- */
-static inline bool
-xfs_attr_is_shortform(
-	struct xfs_inode    *ip)
-{
-	return ip->i_afp->if_format == XFS_DINODE_FMT_LOCAL ||
-	       (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
-		ip->i_afp->if_nextents == 0);
-}
-
-STATIC int
+static int
 xfs_attr_sf_addname(
 	struct xfs_attr_item		*attr)
 {
@@ -276,14 +268,12 @@ xfs_attr_sf_addname(
 	struct xfs_inode		*dp = args->dp;
 	int				error = 0;
 
-	/*
-	 * Try to add the attr to the attribute list in the inode.
-	 */
 	error = xfs_attr_try_sf_addname(dp, args);
-
-	/* Should only be 0, -EEXIST or -ENOSPC */
-	if (error != -ENOSPC)
-		return error;
+	if (error != -ENOSPC) {
+		ASSERT(!error || error == -EEXIST);
+		attr->xattri_dela_state = XFS_DAS_DONE;
+		goto out;
+	}
 
 	/*
 	 * It won't fit in the shortform, transform to a leaf block.  GROT:
@@ -299,64 +289,42 @@ xfs_attr_sf_addname(
 	 * with the write verifier.
 	 */
 	xfs_trans_bhold(args->trans, attr->xattri_leaf_bp);
-
-	/*
-	 * We're still in XFS_DAS_UNINIT state here.  We've converted
-	 * the attr fork to leaf format and will restart with the leaf
-	 * add.
-	 */
-	trace_xfs_attr_sf_addname_return(XFS_DAS_UNINIT, args->dp);
-	return -EAGAIN;
+	attr->xattri_dela_state = XFS_DAS_LEAF_ADD;
+	error = -EAGAIN;
+out:
+	trace_xfs_attr_sf_addname_return(attr->xattri_dela_state, args->dp);
+	return error;
 }
 
-STATIC int
+static int
 xfs_attr_leaf_addname(
 	struct xfs_attr_item	*attr)
 {
 	struct xfs_da_args	*args = attr->xattri_da_args;
-	struct xfs_inode	*dp = args->dp;
-	enum xfs_delattr_state	next_state = XFS_DAS_UNINIT;
 	int			error;
 
-	if (xfs_attr_is_leaf(dp)) {
-
-		/*
-		 * Use the leaf buffer we may already hold locked as a result of
-		 * a sf-to-leaf conversion. The held buffer is no longer valid
-		 * after this call, regardless of the result.
-		 */
-		error = xfs_attr_leaf_try_add(args, attr->xattri_leaf_bp);
-		attr->xattri_leaf_bp = NULL;
+	ASSERT(xfs_attr_is_leaf(args->dp));
 
-		if (error == -ENOSPC) {
-			error = xfs_attr3_leaf_to_node(args);
-			if (error)
-				return error;
-
-			/*
-			 * Finish any deferred work items and roll the
-			 * transaction once more.  The goal here is to call
-			 * node_addname with the inode and transaction in the
-			 * same state (inode locked and joined, transaction
-			 * clean) no matter how we got to this step.
-			 *
-			 * At this point, we are still in XFS_DAS_UNINIT, but
-			 * when we come back, we'll be a node, so we'll fall
-			 * down into the node handling code below
-			 */
-			error = -EAGAIN;
-			goto out;
-		}
-		next_state = XFS_DAS_FOUND_LBLK;
-	} else {
-		ASSERT(!attr->xattri_leaf_bp);
+	/*
+	 * Use the leaf buffer we may already hold locked as a result of
+	 * a sf-to-leaf conversion. The held buffer is no longer valid
+	 * after this call, regardless of the result.
+	 */
+	error = xfs_attr_leaf_try_add(args, attr->xattri_leaf_bp);
+	attr->xattri_leaf_bp = NULL;
 
-		error = xfs_attr_node_addname_find_attr(attr);
+	if (error == -ENOSPC) {
+		error = xfs_attr3_leaf_to_node(args);
 		if (error)
 			return error;
 
-		next_state = XFS_DAS_FOUND_NBLK;
-		error = xfs_attr_node_addname(attr);
+		/*
+		 * We're not in leaf format anymore, so roll the transaction and
+		 * retry the add to the newly allocated node block.
+		 */
+		attr->xattri_dela_state = XFS_DAS_NODE_ADD;
+		error = -EAGAIN;
+		goto out;
 	}
 	if (error)
 		return error;
@@ -368,15 +336,46 @@ xfs_attr_leaf_addname(
 	 */
 	if (args->rmtblkno ||
 	    (args->op_flags & XFS_DA_OP_RENAME)) {
-		attr->xattri_dela_state = next_state;
+		attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
 		error = -EAGAIN;
+	} else {
+		attr->xattri_dela_state = XFS_DAS_DONE;
 	}
-
 out:
 	trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp);
 	return error;
 }
 
+static int
+xfs_attr_node_addname(
+	struct xfs_attr_item	*attr)
+{
+	struct xfs_da_args	*args = attr->xattri_da_args;
+	int			error;
+
+	ASSERT(!attr->xattri_leaf_bp);
+
+	error = xfs_attr_node_addname_find_attr(attr);
+	if (error)
+		return error;
+
+	error = xfs_attr_node_try_addname(attr);
+	if (error)
+		return error;
+
+	if (args->rmtblkno ||
+	    (args->op_flags & XFS_DA_OP_RENAME)) {
+		attr->xattri_dela_state = XFS_DAS_FOUND_NBLK;
+		error = -EAGAIN;
+	} else {
+		attr->xattri_dela_state = XFS_DAS_DONE;
+	}
+
+	trace_xfs_attr_node_addname_return(attr->xattri_dela_state, args->dp);
+	return error;
+}
+
+
 /*
  * Set the attribute specified in @args.
  * This routine is meant to function as a delayed operation, and may return
@@ -397,16 +396,14 @@ xfs_attr_set_iter(
 	/* State machine switch */
 	switch (attr->xattri_dela_state) {
 	case XFS_DAS_UNINIT:
-		/*
-		 * If the fork is shortform, attempt to add the attr. If there
-		 * is no space, this converts to leaf format and returns
-		 * -EAGAIN with the leaf buffer held across the roll. The caller
-		 * will deal with a transaction roll error, but otherwise
-		 * release the hold once we return with a clean transaction.
-		 */
-		if (xfs_attr_is_shortform(dp))
-			return xfs_attr_sf_addname(attr);
+		ASSERT(0);
+		return -EFSCORRUPTED;
+	case XFS_DAS_SF_ADD:
+		return xfs_attr_sf_addname(attr);
+	case XFS_DAS_LEAF_ADD:
 		return xfs_attr_leaf_addname(attr);
+	case XFS_DAS_NODE_ADD:
+		return xfs_attr_node_addname(attr);
 
 	case XFS_DAS_FOUND_LBLK:
 		/*
@@ -700,7 +697,7 @@ xfs_attr_defer_add(
 	if (error)
 		return error;
 
-	new->xattri_dela_state = XFS_DAS_UNINIT;
+	new->xattri_dela_state = xfs_attr_init_add_state(args);
 	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
 	trace_xfs_attr_defer_add(new->xattri_dela_state, args->dp);
 
@@ -719,7 +716,7 @@ xfs_attr_defer_replace(
 	if (error)
 		return error;
 
-	new->xattri_dela_state = XFS_DAS_UNINIT;
+	new->xattri_dela_state = xfs_attr_init_replace_state(args);
 	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
 	trace_xfs_attr_defer_replace(new->xattri_dela_state, args->dp);
 
@@ -1262,8 +1259,8 @@ xfs_attr_node_addname_find_attr(
  * to handle this, and recall the function until a successful error code is
  *returned.
  */
-STATIC int
-xfs_attr_node_addname(
+static int
+xfs_attr_node_try_addname(
 	struct xfs_attr_item		*attr)
 {
 	struct xfs_da_args		*args = attr->xattri_da_args;
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index c9c867e3406c..ad52b5dc59e4 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -443,21 +443,44 @@ struct xfs_attr_list_context {
  * to where it was and resume executing where it left off.
  */
 enum xfs_delattr_state {
-	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
-	XFS_DAS_RMTBLK,		      /* Removing remote blks */
-	XFS_DAS_RM_NAME,	      /* Remove attr name */
-	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
-	XFS_DAS_FOUND_LBLK,	      /* We found leaf blk for attr */
-	XFS_DAS_FOUND_NBLK,	      /* We found node blk for attr */
-	XFS_DAS_FLIP_LFLAG,	      /* Flipped leaf INCOMPLETE attr flag */
-	XFS_DAS_RM_LBLK,	      /* A rename is removing leaf blocks */
-	XFS_DAS_RD_LEAF,	      /* Read in the new leaf */
-	XFS_DAS_ALLOC_NODE,	      /* We are allocating node blocks */
-	XFS_DAS_FLIP_NFLAG,	      /* Flipped node INCOMPLETE attr flag */
-	XFS_DAS_RM_NBLK,	      /* A rename is removing node blocks */
-	XFS_DAS_CLR_FLAG,	      /* Clear incomplete flag */
+	XFS_DAS_UNINIT		= 0,	/* No state has been set yet */
+	XFS_DAS_SF_ADD,			/* Initial shortform set iter state */
+	XFS_DAS_LEAF_ADD,		/* Initial leaf form set iter state */
+	XFS_DAS_NODE_ADD,		/* Initial node form set iter state */
+	XFS_DAS_RMTBLK,			/* Removing remote blks */
+	XFS_DAS_RM_NAME,		/* Remove attr name */
+	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
+	XFS_DAS_FOUND_LBLK,		/* We found leaf blk for attr */
+	XFS_DAS_FOUND_NBLK,		/* We found node blk for attr */
+	XFS_DAS_FLIP_LFLAG,		/* Flipped leaf INCOMPLETE attr flag */
+	XFS_DAS_RM_LBLK,		/* A rename is removing leaf blocks */
+	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
+	XFS_DAS_ALLOC_NODE,		/* We are allocating node blocks */
+	XFS_DAS_FLIP_NFLAG,		/* Flipped node INCOMPLETE attr flag */
+	XFS_DAS_RM_NBLK,		/* A rename is removing node blocks */
+	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
+	XFS_DAS_DONE,			/* finished operation */
 };
 
+#define XFS_DAS_STRINGS	\
+	{ XFS_DAS_UNINIT,	"XFS_DAS_UNINIT" }, \
+	{ XFS_DAS_SF_ADD,	"XFS_DAS_SF_ADD" }, \
+	{ XFS_DAS_LEAF_ADD,	"XFS_DAS_LEAF_ADD" }, \
+	{ XFS_DAS_NODE_ADD,	"XFS_DAS_NODE_ADD" }, \
+	{ XFS_DAS_RMTBLK,	"XFS_DAS_RMTBLK" }, \
+	{ XFS_DAS_RM_NAME,	"XFS_DAS_RM_NAME" }, \
+	{ XFS_DAS_RM_SHRINK,	"XFS_DAS_RM_SHRINK" }, \
+	{ XFS_DAS_FOUND_LBLK,	"XFS_DAS_FOUND_LBLK" }, \
+	{ XFS_DAS_FOUND_NBLK,	"XFS_DAS_FOUND_NBLK" }, \
+	{ XFS_DAS_FLIP_LFLAG,	"XFS_DAS_FLIP_LFLAG" }, \
+	{ XFS_DAS_RM_LBLK,	"XFS_DAS_RM_LBLK" }, \
+	{ XFS_DAS_RD_LEAF,	"XFS_DAS_RD_LEAF" }, \
+	{ XFS_DAS_ALLOC_NODE,	"XFS_DAS_ALLOC_NODE" }, \
+	{ XFS_DAS_FLIP_NFLAG,	"XFS_DAS_FLIP_NFLAG" }, \
+	{ XFS_DAS_RM_NBLK,	"XFS_DAS_RM_NBLK" }, \
+	{ XFS_DAS_CLR_FLAG,	"XFS_DAS_CLR_FLAG" }, \
+	{ XFS_DAS_DONE,		"XFS_DAS_DONE" }
+
 /*
  * Defines for xfs_attr_item.xattri_flags
  */
@@ -530,4 +553,35 @@ void xfs_attri_destroy_cache(void);
 int __init xfs_attrd_init_cache(void);
 void xfs_attrd_destroy_cache(void);
 
+/*
+ * Check to see if the attr should be upgraded from non-existent or shortform to
+ * single-leaf-block attribute list.
+ */
+static inline bool
+xfs_attr_is_shortform(
+	struct xfs_inode    *ip)
+{
+	return ip->i_afp->if_format == XFS_DINODE_FMT_LOCAL ||
+	       (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
+		ip->i_afp->if_nextents == 0);
+}
+
+static inline enum xfs_delattr_state
+xfs_attr_init_add_state(struct xfs_da_args *args)
+{
+	if (!args->dp->i_afp)
+		return XFS_DAS_DONE;
+	if (xfs_attr_is_shortform(args->dp))
+		return XFS_DAS_SF_ADD;
+	if (xfs_attr_is_leaf(args->dp))
+		return XFS_DAS_LEAF_ADD;
+	return XFS_DAS_NODE_ADD;
+}
+
+static inline enum xfs_delattr_state
+xfs_attr_init_replace_state(struct xfs_da_args *args)
+{
+	return xfs_attr_init_add_state(args);
+}
+
 #endif	/* __XFS_ATTR_H__ */
diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index b2ecc272f9e4..ceb222b4f261 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -23,6 +23,8 @@
 #include "xfs_bmap.h"
 #include "xfs_alloc.h"
 #include "xfs_buf.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 
 static struct kmem_cache	*xfs_defer_pending_cache;
diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index 5c52ee869272..3df9c1782ead 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -10,12 +10,12 @@
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
 #include "xfs_inode.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_trace.h"
 #include "xfs_error.h"
 #include "xfs_acl.h"
-#include "xfs_da_format.h"
-#include "xfs_da_btree.h"
 #include "xfs_trans.h"
 
 #include <linux/posix_acl_xattr.h>
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index fe1e37696634..5bfb33746e37 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -570,10 +570,21 @@ xfs_attri_item_recover(
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 	args->attr_filter = attrp->alfi_attr_flags;
 
-	if (attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET) {
+	switch (attrp->alfi_op_flags & XFS_ATTR_OP_FLAGS_TYPE_MASK) {
+	case XFS_ATTR_OP_FLAGS_SET:
+	case XFS_ATTR_OP_FLAGS_REPLACE:
 		args->value = attrip->attri_value;
 		args->valuelen = attrp->alfi_value_len;
 		args->total = xfs_attr_calc_size(args, &local);
+		attr->xattri_dela_state = xfs_attr_init_add_state(args);
+		break;
+	case XFS_ATTR_OP_FLAGS_REMOVE:
+		attr->xattri_dela_state = XFS_DAS_UNINIT;
+		break;
+	default:
+		ASSERT(0);
+		error = -EFSCORRUPTED;
+		goto out;
 	}
 
 	xfs_init_attr_trans(args, &tres, &total);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index e9eadc7337ce..0e5cb7936206 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -15,6 +15,8 @@
 #include "xfs_iwalk.h"
 #include "xfs_itable.h"
 #include "xfs_error.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_bmap.h"
 #include "xfs_bmap_util.h"
@@ -35,8 +37,6 @@
 #include "xfs_health.h"
 #include "xfs_reflink.h"
 #include "xfs_ioctl.h"
-#include "xfs_da_format.h"
-#include "xfs_da_btree.h"
 
 #include <linux/mount.h>
 #include <linux/namei.h>
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 01ce0401aa32..8f722be25c29 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -4129,6 +4129,23 @@ DEFINE_ICLOG_EVENT(xlog_iclog_want_sync);
 DEFINE_ICLOG_EVENT(xlog_iclog_wait_on);
 DEFINE_ICLOG_EVENT(xlog_iclog_write);
 
+TRACE_DEFINE_ENUM(XFS_DAS_UNINIT);
+TRACE_DEFINE_ENUM(XFS_DAS_SF_ADD);
+TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ADD);
+TRACE_DEFINE_ENUM(XFS_DAS_NODE_ADD);
+TRACE_DEFINE_ENUM(XFS_DAS_RMTBLK);
+TRACE_DEFINE_ENUM(XFS_DAS_RM_NAME);
+TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
+TRACE_DEFINE_ENUM(XFS_DAS_FOUND_LBLK);
+TRACE_DEFINE_ENUM(XFS_DAS_FOUND_NBLK);
+TRACE_DEFINE_ENUM(XFS_DAS_FLIP_LFLAG);
+TRACE_DEFINE_ENUM(XFS_DAS_RM_LBLK);
+TRACE_DEFINE_ENUM(XFS_DAS_RD_LEAF);
+TRACE_DEFINE_ENUM(XFS_DAS_ALLOC_NODE);
+TRACE_DEFINE_ENUM(XFS_DAS_FLIP_NFLAG);
+TRACE_DEFINE_ENUM(XFS_DAS_RM_NBLK);
+TRACE_DEFINE_ENUM(XFS_DAS_CLR_FLAG);
+
 DECLARE_EVENT_CLASS(xfs_das_state_class,
 	TP_PROTO(int das, struct xfs_inode *ip),
 	TP_ARGS(das, ip),
@@ -4140,8 +4157,9 @@ DECLARE_EVENT_CLASS(xfs_das_state_class,
 		__entry->das = das;
 		__entry->ino = ip->i_ino;
 	),
-	TP_printk("state change %d ino 0x%llx",
-		  __entry->das, __entry->ino)
+	TP_printk("state change %s ino 0x%llx",
+		  __print_symbolic(__entry->das, XFS_DAS_STRINGS),
+		  __entry->ino)
 )
 
 #define DEFINE_DAS_STATE_EVENT(name) \
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 0d050f8829ef..7a044afd4c46 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -12,9 +12,9 @@
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
 #include "xfs_inode.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_acl.h"
-#include "xfs_da_btree.h"
 
 #include <linux/posix_acl_xattr.h>
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 06/18] xfs: kill XFS_DAC_LEAF_ADDNAME_INIT
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (4 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 05/18] xfs: separate out initial attr_set states Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 23:15   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 07/18] xfs: consolidate leaf/node states in xfs_attr_set_iter Dave Chinner
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We re-enter the XFS_DAS_FOUND_LBLK state when we have to allocate
multiple extents for a remote xattr. We currently have a flag
called XFS_DAC_LEAF_ADDNAME_INIT to avoid running the remote attr
hole finding code more than once.

However, for the node format tree, we have a separate state for this
so we never reenter the state machine at XFS_DAS_FOUND_NBLK and so
it does not need a special flag to skip over the remote attr hold
finding code.

Convert the leaf block code to use the same state machine as the
node blocks and kill the  XFS_DAC_LEAF_ADDNAME_INIT flag.

This further points out that this "ALLOC" state is only traversed
if we have remote xattrs or we are doing a rename operation. Rename
both the leaf and node alloc states to _ALLOC_RMT to indicate they
are iterating to do allocation of remote xattr blocks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 45 ++++++++++++++++++++--------------------
 fs/xfs/libxfs/xfs_attr.h |  6 ++++--
 fs/xfs/xfs_trace.h       |  3 ++-
 3 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index edc31075fde4..ab8a884af512 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -406,40 +406,41 @@ xfs_attr_set_iter(
 		return xfs_attr_node_addname(attr);
 
 	case XFS_DAS_FOUND_LBLK:
+		/*
+		 * Find space for remote blocks and fall into the allocation
+		 * state.
+		 */
+		if (args->rmtblkno > 0) {
+			error = xfs_attr_rmtval_find_space(attr);
+			if (error)
+				return error;
+		}
+		attr->xattri_dela_state = XFS_DAS_LEAF_ALLOC_RMT;
+		fallthrough;
+	case XFS_DAS_LEAF_ALLOC_RMT:
+
 		/*
 		 * If there was an out-of-line value, allocate the blocks we
 		 * identified for its storage and copy the value.  This is done
 		 * after we create the attribute so that we don't overflow the
 		 * maximum size of a transaction and/or hit a deadlock.
 		 */
-
-		/* Open coded xfs_attr_rmtval_set without trans handling */
-		if ((attr->xattri_flags & XFS_DAC_LEAF_ADDNAME_INIT) == 0) {
-			attr->xattri_flags |= XFS_DAC_LEAF_ADDNAME_INIT;
-			if (args->rmtblkno > 0) {
-				error = xfs_attr_rmtval_find_space(attr);
+		if (args->rmtblkno > 0) {
+			if (attr->xattri_blkcnt > 0) {
+				error = xfs_attr_rmtval_set_blk(attr);
 				if (error)
 					return error;
+				trace_xfs_attr_set_iter_return(
+						attr->xattri_dela_state,
+						args->dp);
+				return -EAGAIN;
 			}
-		}
 
-		/*
-		 * Repeat allocating remote blocks for the attr value until
-		 * blkcnt drops to zero.
-		 */
-		if (attr->xattri_blkcnt > 0) {
-			error = xfs_attr_rmtval_set_blk(attr);
+			error = xfs_attr_rmtval_set_value(args);
 			if (error)
 				return error;
-			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
-						       args->dp);
-			return -EAGAIN;
 		}
 
-		error = xfs_attr_rmtval_set_value(args);
-		if (error)
-			return error;
-
 		/*
 		 * If this is not a rename, clear the incomplete flag and we're
 		 * done.
@@ -534,15 +535,15 @@ xfs_attr_set_iter(
 				return error;
 		}
 
+		attr->xattri_dela_state = XFS_DAS_NODE_ALLOC_RMT;
 		fallthrough;
-	case XFS_DAS_ALLOC_NODE:
+	case XFS_DAS_NODE_ALLOC_RMT:
 		/*
 		 * If there was an out-of-line value, allocate the blocks we
 		 * identified for its storage and copy the value.  This is done
 		 * after we create the attribute so that we don't overflow the
 		 * maximum size of a transaction and/or hit a deadlock.
 		 */
-		attr->xattri_dela_state = XFS_DAS_ALLOC_NODE;
 		if (args->rmtblkno > 0) {
 			if (attr->xattri_blkcnt > 0) {
 				error = xfs_attr_rmtval_set_blk(attr);
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index ad52b5dc59e4..d016af4dbf81 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -451,11 +451,12 @@ enum xfs_delattr_state {
 	XFS_DAS_RM_NAME,		/* Remove attr name */
 	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
 	XFS_DAS_FOUND_LBLK,		/* We found leaf blk for attr */
+	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
 	XFS_DAS_FOUND_NBLK,		/* We found node blk for attr */
+	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
 	XFS_DAS_FLIP_LFLAG,		/* Flipped leaf INCOMPLETE attr flag */
 	XFS_DAS_RM_LBLK,		/* A rename is removing leaf blocks */
 	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
-	XFS_DAS_ALLOC_NODE,		/* We are allocating node blocks */
 	XFS_DAS_FLIP_NFLAG,		/* Flipped node INCOMPLETE attr flag */
 	XFS_DAS_RM_NBLK,		/* A rename is removing node blocks */
 	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
@@ -471,11 +472,12 @@ enum xfs_delattr_state {
 	{ XFS_DAS_RM_NAME,	"XFS_DAS_RM_NAME" }, \
 	{ XFS_DAS_RM_SHRINK,	"XFS_DAS_RM_SHRINK" }, \
 	{ XFS_DAS_FOUND_LBLK,	"XFS_DAS_FOUND_LBLK" }, \
+	{ XFS_DAS_LEAF_ALLOC_RMT, "XFS_DAS_LEAF_ALLOC_RMT" }, \
 	{ XFS_DAS_FOUND_NBLK,	"XFS_DAS_FOUND_NBLK" }, \
+	{ XFS_DAS_NODE_ALLOC_RMT, "XFS_DAS_NODE_ALLOC_RMT" },  \
 	{ XFS_DAS_FLIP_LFLAG,	"XFS_DAS_FLIP_LFLAG" }, \
 	{ XFS_DAS_RM_LBLK,	"XFS_DAS_RM_LBLK" }, \
 	{ XFS_DAS_RD_LEAF,	"XFS_DAS_RD_LEAF" }, \
-	{ XFS_DAS_ALLOC_NODE,	"XFS_DAS_ALLOC_NODE" }, \
 	{ XFS_DAS_FLIP_NFLAG,	"XFS_DAS_FLIP_NFLAG" }, \
 	{ XFS_DAS_RM_NBLK,	"XFS_DAS_RM_NBLK" }, \
 	{ XFS_DAS_CLR_FLAG,	"XFS_DAS_CLR_FLAG" }, \
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 8f722be25c29..067ab31d7a20 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -4137,11 +4137,12 @@ TRACE_DEFINE_ENUM(XFS_DAS_RMTBLK);
 TRACE_DEFINE_ENUM(XFS_DAS_RM_NAME);
 TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
 TRACE_DEFINE_ENUM(XFS_DAS_FOUND_LBLK);
+TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_FOUND_NBLK);
+TRACE_DEFINE_ENUM(XFS_DAS_NODE_ALLOC_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_FLIP_LFLAG);
 TRACE_DEFINE_ENUM(XFS_DAS_RM_LBLK);
 TRACE_DEFINE_ENUM(XFS_DAS_RD_LEAF);
-TRACE_DEFINE_ENUM(XFS_DAS_ALLOC_NODE);
 TRACE_DEFINE_ENUM(XFS_DAS_FLIP_NFLAG);
 TRACE_DEFINE_ENUM(XFS_DAS_RM_NBLK);
 TRACE_DEFINE_ENUM(XFS_DAS_CLR_FLAG);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 07/18] xfs: consolidate leaf/node states in xfs_attr_set_iter
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (5 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 06/18] xfs: kill XFS_DAC_LEAF_ADDNAME_INIT Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 23:20   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 08/18] xfs: split remote attr setting out from replace path Dave Chinner
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The operations performed from XFS_DAS_FOUND_LBLK through to
XFS_DAS_RM_LBLK are now identical to XFS_DAS_FOUND_NBLK through to
XFS_DAS_RM_NBLK. We can collapse these down into a single set of
code.

To do this, define the states that leaf and node run through as
separate sets of sequential states. Then as we move to the next
state, we can use increments rather than specific state assignments
to move through the states. This means the state progression is set
by the initial state that enters the series and we don't need to
duplicate the code anymore.

At the exit point of the series we need to select the correct leaf
or node state, but that can also be done by state increment rather
than assignment.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 127 ++++++---------------------------------
 fs/xfs/libxfs/xfs_attr.h |   9 ++-
 2 files changed, 27 insertions(+), 109 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index ab8a884af512..be580da62f08 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -394,6 +394,7 @@ xfs_attr_set_iter(
 	struct xfs_mount		*mp = args->dp->i_mount;
 
 	/* State machine switch */
+next_state:
 	switch (attr->xattri_dela_state) {
 	case XFS_DAS_UNINIT:
 		ASSERT(0);
@@ -406,6 +407,7 @@ xfs_attr_set_iter(
 		return xfs_attr_node_addname(attr);
 
 	case XFS_DAS_FOUND_LBLK:
+	case XFS_DAS_FOUND_NBLK:
 		/*
 		 * Find space for remote blocks and fall into the allocation
 		 * state.
@@ -415,9 +417,10 @@ xfs_attr_set_iter(
 			if (error)
 				return error;
 		}
-		attr->xattri_dela_state = XFS_DAS_LEAF_ALLOC_RMT;
+		attr->xattri_dela_state++;
 		fallthrough;
 	case XFS_DAS_LEAF_ALLOC_RMT:
+	case XFS_DAS_NODE_ALLOC_RMT:
 
 		/*
 		 * If there was an out-of-line value, allocate the blocks we
@@ -466,16 +469,18 @@ xfs_attr_set_iter(
 				return error;
 			/*
 			 * Commit the flag value change and start the next trans
-			 * in series.
+			 * in series at FLIP_FLAG.
 			 */
-			attr->xattri_dela_state = XFS_DAS_FLIP_LFLAG;
+			attr->xattri_dela_state++;
 			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
 						       args->dp);
 			return -EAGAIN;
 		}
 
+		attr->xattri_dela_state++;
 		fallthrough;
 	case XFS_DAS_FLIP_LFLAG:
+	case XFS_DAS_FLIP_NFLAG:
 		/*
 		 * Dismantle the "old" attribute/value pair by removing a
 		 * "remote" value (if it exists).
@@ -485,10 +490,10 @@ xfs_attr_set_iter(
 		if (error)
 			return error;
 
+		attr->xattri_dela_state++;
 		fallthrough;
 	case XFS_DAS_RM_LBLK:
-		/* Set state in case xfs_attr_rmtval_remove returns -EAGAIN */
-		attr->xattri_dela_state = XFS_DAS_RM_LBLK;
+	case XFS_DAS_RM_NBLK:
 		if (args->rmtblkno) {
 			error = xfs_attr_rmtval_remove(attr);
 			if (error == -EAGAIN)
@@ -503,7 +508,16 @@ xfs_attr_set_iter(
 			return -EAGAIN;
 		}
 
-		fallthrough;
+		/*
+		 * This is the end of the shared leaf/node sequence. We need
+		 * to continue at the next state in the sequence, but we can't
+		 * easily just fall through. So we increment to the next state
+		 * and then jump back to switch statement to evaluate the next
+		 * state correctly.
+		 */
+		attr->xattri_dela_state++;
+		goto next_state;
+
 	case XFS_DAS_RD_LEAF:
 		/*
 		 * This is the last step for leaf format. Read the block with
@@ -524,106 +538,6 @@ xfs_attr_set_iter(
 
 		return error;
 
-	case XFS_DAS_FOUND_NBLK:
-		/*
-		 * Find space for remote blocks and fall into the allocation
-		 * state.
-		 */
-		if (args->rmtblkno > 0) {
-			error = xfs_attr_rmtval_find_space(attr);
-			if (error)
-				return error;
-		}
-
-		attr->xattri_dela_state = XFS_DAS_NODE_ALLOC_RMT;
-		fallthrough;
-	case XFS_DAS_NODE_ALLOC_RMT:
-		/*
-		 * If there was an out-of-line value, allocate the blocks we
-		 * identified for its storage and copy the value.  This is done
-		 * after we create the attribute so that we don't overflow the
-		 * maximum size of a transaction and/or hit a deadlock.
-		 */
-		if (args->rmtblkno > 0) {
-			if (attr->xattri_blkcnt > 0) {
-				error = xfs_attr_rmtval_set_blk(attr);
-				if (error)
-					return error;
-				trace_xfs_attr_set_iter_return(
-					attr->xattri_dela_state, args->dp);
-				return -EAGAIN;
-			}
-
-			error = xfs_attr_rmtval_set_value(args);
-			if (error)
-				return error;
-		}
-
-		/*
-		 * If this was not a rename, clear the incomplete flag and we're
-		 * done.
-		 */
-		if (!(args->op_flags & XFS_DA_OP_RENAME)) {
-			if (args->rmtblkno > 0)
-				error = xfs_attr3_leaf_clearflag(args);
-			goto out;
-		}
-
-		/*
-		 * If this is an atomic rename operation, we must "flip" the
-		 * incomplete flags on the "new" and "old" attribute/value pairs
-		 * so that one disappears and one appears atomically.  Then we
-		 * must remove the "old" attribute/value pair.
-		 *
-		 * In a separate transaction, set the incomplete flag on the
-		 * "old" attr and clear the incomplete flag on the "new" attr.
-		 */
-		if (!xfs_has_larp(mp)) {
-			error = xfs_attr3_leaf_flipflags(args);
-			if (error)
-				goto out;
-			/*
-			 * Commit the flag value change and start the next trans
-			 * in series
-			 */
-			attr->xattri_dela_state = XFS_DAS_FLIP_NFLAG;
-			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
-						       args->dp);
-			return -EAGAIN;
-		}
-
-		fallthrough;
-	case XFS_DAS_FLIP_NFLAG:
-		/*
-		 * Dismantle the "old" attribute/value pair by removing a
-		 * "remote" value (if it exists).
-		 */
-		xfs_attr_restore_rmt_blk(args);
-
-		error = xfs_attr_rmtval_invalidate(args);
-		if (error)
-			return error;
-
-		fallthrough;
-	case XFS_DAS_RM_NBLK:
-		/* Set state in case xfs_attr_rmtval_remove returns -EAGAIN */
-		attr->xattri_dela_state = XFS_DAS_RM_NBLK;
-		if (args->rmtblkno) {
-			error = xfs_attr_rmtval_remove(attr);
-			if (error == -EAGAIN)
-				trace_xfs_attr_set_iter_return(
-					attr->xattri_dela_state, args->dp);
-
-			if (error)
-				return error;
-
-			attr->xattri_dela_state = XFS_DAS_CLR_FLAG;
-			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
-						       args->dp);
-			return -EAGAIN;
-		}
-
-		fallthrough;
 	case XFS_DAS_CLR_FLAG:
 		/*
 		 * The last state for node format. Look up the old attr and
@@ -635,7 +549,6 @@ xfs_attr_set_iter(
 		ASSERT(0);
 		break;
 	}
-out:
 	return error;
 }
 
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index d016af4dbf81..37db61649217 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -450,16 +450,21 @@ enum xfs_delattr_state {
 	XFS_DAS_RMTBLK,			/* Removing remote blks */
 	XFS_DAS_RM_NAME,		/* Remove attr name */
 	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
+
+	/* Leaf state set sequence */
 	XFS_DAS_FOUND_LBLK,		/* We found leaf blk for attr */
 	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
-	XFS_DAS_FOUND_NBLK,		/* We found node blk for attr */
-	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
 	XFS_DAS_FLIP_LFLAG,		/* Flipped leaf INCOMPLETE attr flag */
 	XFS_DAS_RM_LBLK,		/* A rename is removing leaf blocks */
 	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
+
+	/* Node state set sequence, must match leaf state above */
+	XFS_DAS_FOUND_NBLK,		/* We found node blk for attr */
+	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
 	XFS_DAS_FLIP_NFLAG,		/* Flipped node INCOMPLETE attr flag */
 	XFS_DAS_RM_NBLK,		/* A rename is removing node blocks */
 	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
+
 	XFS_DAS_DONE,			/* finished operation */
 };
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 08/18] xfs: split remote attr setting out from replace path
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (6 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 07/18] xfs: consolidate leaf/node states in xfs_attr_set_iter Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 23:22   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 09/18] xfs: XFS_DAS_LEAF_REPLACE state only needed if !LARP Dave Chinner
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When we set a new xattr, we have three exit paths:

	1. nothing else to do
	2. allocate and set the remote xattr value
	3. perform the rest of a replace operation

Currently we push both 2 and 3 into the same state, regardless of
whether we just set a remote attribute or not. Once we've set the
remote xattr, we have two exit states:

	1. nothing else to do
	2. perform the rest of a replace operation

Hence we can split the remote xattr allocation and setting into
their own states and factor it out of xfs_attr_set_iter() to further
clean up the state machine and the implementation of the state
machine.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 113 +++++++++++++++++++++------------------
 fs/xfs/libxfs/xfs_attr.h |  14 +++--
 fs/xfs/xfs_trace.h       |   9 ++--
 3 files changed, 77 insertions(+), 59 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index be580da62f08..d2b29f7e103a 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -334,9 +334,11 @@ xfs_attr_leaf_addname(
 	 * or perform more xattr manipulations. Otherwise there is nothing more
 	 * to do and we can return success.
 	 */
-	if (args->rmtblkno ||
-	    (args->op_flags & XFS_DA_OP_RENAME)) {
-		attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
+	if (args->rmtblkno) {
+		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
+		error = -EAGAIN;
+	} else if (args->op_flags & XFS_DA_OP_RENAME) {
+		attr->xattri_dela_state = XFS_DAS_LEAF_REPLACE;
 		error = -EAGAIN;
 	} else {
 		attr->xattri_dela_state = XFS_DAS_DONE;
@@ -363,9 +365,11 @@ xfs_attr_node_addname(
 	if (error)
 		return error;
 
-	if (args->rmtblkno ||
-	    (args->op_flags & XFS_DA_OP_RENAME)) {
-		attr->xattri_dela_state = XFS_DAS_FOUND_NBLK;
+	if (args->rmtblkno) {
+		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
+		error = -EAGAIN;
+	} else if (args->op_flags & XFS_DA_OP_RENAME) {
+		attr->xattri_dela_state = XFS_DAS_NODE_REPLACE;
 		error = -EAGAIN;
 	} else {
 		attr->xattri_dela_state = XFS_DAS_DONE;
@@ -375,6 +379,40 @@ xfs_attr_node_addname(
 	return error;
 }
 
+static int
+xfs_attr_rmtval_alloc(
+	struct xfs_attr_item		*attr)
+{
+	struct xfs_da_args              *args = attr->xattri_da_args;
+	int				error = 0;
+
+	/*
+	 * If there was an out-of-line value, allocate the blocks we
+	 * identified for its storage and copy the value.  This is done
+	 * after we create the attribute so that we don't overflow the
+	 * maximum size of a transaction and/or hit a deadlock.
+	 */
+	if (attr->xattri_blkcnt > 0) {
+		error = xfs_attr_rmtval_set_blk(attr);
+		if (error)
+			return error;
+		error = -EAGAIN;
+		goto out;
+	}
+
+	error = xfs_attr_rmtval_set_value(args);
+	if (error)
+		return error;
+
+	/* If this is not a rename, clear the incomplete flag and we're done. */
+	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
+		error = xfs_attr3_leaf_clearflag(args);
+		attr->xattri_dela_state = XFS_DAS_DONE;
+	}
+out:
+	trace_xfs_attr_rmtval_alloc(attr->xattri_dela_state, args->dp);
+	return error;
+}
 
 /*
  * Set the attribute specified in @args.
@@ -406,54 +444,26 @@ xfs_attr_set_iter(
 	case XFS_DAS_NODE_ADD:
 		return xfs_attr_node_addname(attr);
 
-	case XFS_DAS_FOUND_LBLK:
-	case XFS_DAS_FOUND_NBLK:
-		/*
-		 * Find space for remote blocks and fall into the allocation
-		 * state.
-		 */
-		if (args->rmtblkno > 0) {
-			error = xfs_attr_rmtval_find_space(attr);
-			if (error)
-				return error;
-		}
+	case XFS_DAS_LEAF_SET_RMT:
+	case XFS_DAS_NODE_SET_RMT:
+		error = xfs_attr_rmtval_find_space(attr);
+		if (error)
+			return error;
 		attr->xattri_dela_state++;
 		fallthrough;
+
 	case XFS_DAS_LEAF_ALLOC_RMT:
 	case XFS_DAS_NODE_ALLOC_RMT:
-
-		/*
-		 * If there was an out-of-line value, allocate the blocks we
-		 * identified for its storage and copy the value.  This is done
-		 * after we create the attribute so that we don't overflow the
-		 * maximum size of a transaction and/or hit a deadlock.
-		 */
-		if (args->rmtblkno > 0) {
-			if (attr->xattri_blkcnt > 0) {
-				error = xfs_attr_rmtval_set_blk(attr);
-				if (error)
-					return error;
-				trace_xfs_attr_set_iter_return(
-						attr->xattri_dela_state,
-						args->dp);
-				return -EAGAIN;
-			}
-
-			error = xfs_attr_rmtval_set_value(args);
-			if (error)
-				return error;
-		}
-
-		/*
-		 * If this is not a rename, clear the incomplete flag and we're
-		 * done.
-		 */
-		if (!(args->op_flags & XFS_DA_OP_RENAME)) {
-			if (args->rmtblkno > 0)
-				error = xfs_attr3_leaf_clearflag(args);
+		error = xfs_attr_rmtval_alloc(attr);
+		if (error)
 			return error;
-		}
+		if (attr->xattri_dela_state == XFS_DAS_DONE)
+			break;
+		attr->xattri_dela_state++;
+		fallthrough;
 
+	case XFS_DAS_LEAF_REPLACE:
+	case XFS_DAS_NODE_REPLACE:
 		/*
 		 * If this is an atomic rename operation, we must "flip" the
 		 * incomplete flags on the "new" and "old" attribute/value pairs
@@ -471,10 +481,9 @@ xfs_attr_set_iter(
 			 * Commit the flag value change and start the next trans
 			 * in series at FLIP_FLAG.
 			 */
+			error = -EAGAIN;
 			attr->xattri_dela_state++;
-			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
-						       args->dp);
-			return -EAGAIN;
+			break;
 		}
 
 		attr->xattri_dela_state++;
@@ -549,6 +558,8 @@ xfs_attr_set_iter(
 		ASSERT(0);
 		break;
 	}
+
+	trace_xfs_attr_set_iter_return(attr->xattri_dela_state, args->dp);
 	return error;
 }
 
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 37db61649217..1749fd8f7ddd 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -452,15 +452,17 @@ enum xfs_delattr_state {
 	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
 
 	/* Leaf state set sequence */
-	XFS_DAS_FOUND_LBLK,		/* We found leaf blk for attr */
+	XFS_DAS_LEAF_SET_RMT,		/* set a remote xattr from a leaf */
 	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
+	XFS_DAS_LEAF_REPLACE,		/* Perform replace ops on a leaf */
 	XFS_DAS_FLIP_LFLAG,		/* Flipped leaf INCOMPLETE attr flag */
 	XFS_DAS_RM_LBLK,		/* A rename is removing leaf blocks */
 	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
 
 	/* Node state set sequence, must match leaf state above */
-	XFS_DAS_FOUND_NBLK,		/* We found node blk for attr */
+	XFS_DAS_NODE_SET_RMT,		/* set a remote xattr from a node */
 	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
+	XFS_DAS_NODE_REPLACE,		/* Perform replace ops on a node */
 	XFS_DAS_FLIP_NFLAG,		/* Flipped node INCOMPLETE attr flag */
 	XFS_DAS_RM_NBLK,		/* A rename is removing node blocks */
 	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
@@ -476,13 +478,15 @@ enum xfs_delattr_state {
 	{ XFS_DAS_RMTBLK,	"XFS_DAS_RMTBLK" }, \
 	{ XFS_DAS_RM_NAME,	"XFS_DAS_RM_NAME" }, \
 	{ XFS_DAS_RM_SHRINK,	"XFS_DAS_RM_SHRINK" }, \
-	{ XFS_DAS_FOUND_LBLK,	"XFS_DAS_FOUND_LBLK" }, \
+	{ XFS_DAS_LEAF_SET_RMT,	"XFS_DAS_LEAF_SET_RMT" }, \
 	{ XFS_DAS_LEAF_ALLOC_RMT, "XFS_DAS_LEAF_ALLOC_RMT" }, \
-	{ XFS_DAS_FOUND_NBLK,	"XFS_DAS_FOUND_NBLK" }, \
-	{ XFS_DAS_NODE_ALLOC_RMT, "XFS_DAS_NODE_ALLOC_RMT" },  \
+	{ XFS_DAS_LEAF_REPLACE,	"XFS_DAS_LEAF_REPLACE" }, \
 	{ XFS_DAS_FLIP_LFLAG,	"XFS_DAS_FLIP_LFLAG" }, \
 	{ XFS_DAS_RM_LBLK,	"XFS_DAS_RM_LBLK" }, \
 	{ XFS_DAS_RD_LEAF,	"XFS_DAS_RD_LEAF" }, \
+	{ XFS_DAS_NODE_SET_RMT,	"XFS_DAS_NODE_SET_RMT" }, \
+	{ XFS_DAS_NODE_ALLOC_RMT, "XFS_DAS_NODE_ALLOC_RMT" },  \
+	{ XFS_DAS_NODE_REPLACE,	"XFS_DAS_NODE_REPLACE" },  \
 	{ XFS_DAS_FLIP_NFLAG,	"XFS_DAS_FLIP_NFLAG" }, \
 	{ XFS_DAS_RM_NBLK,	"XFS_DAS_RM_NBLK" }, \
 	{ XFS_DAS_CLR_FLAG,	"XFS_DAS_CLR_FLAG" }, \
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 067ab31d7a20..cb9122327114 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -4136,13 +4136,15 @@ TRACE_DEFINE_ENUM(XFS_DAS_NODE_ADD);
 TRACE_DEFINE_ENUM(XFS_DAS_RMTBLK);
 TRACE_DEFINE_ENUM(XFS_DAS_RM_NAME);
 TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
-TRACE_DEFINE_ENUM(XFS_DAS_FOUND_LBLK);
+TRACE_DEFINE_ENUM(XFS_DAS_LEAF_SET_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
-TRACE_DEFINE_ENUM(XFS_DAS_FOUND_NBLK);
-TRACE_DEFINE_ENUM(XFS_DAS_NODE_ALLOC_RMT);
+TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REPLACE);
 TRACE_DEFINE_ENUM(XFS_DAS_FLIP_LFLAG);
 TRACE_DEFINE_ENUM(XFS_DAS_RM_LBLK);
 TRACE_DEFINE_ENUM(XFS_DAS_RD_LEAF);
+TRACE_DEFINE_ENUM(XFS_DAS_NODE_SET_RMT);
+TRACE_DEFINE_ENUM(XFS_DAS_NODE_ALLOC_RMT);
+TRACE_DEFINE_ENUM(XFS_DAS_NODE_REPLACE);
 TRACE_DEFINE_ENUM(XFS_DAS_FLIP_NFLAG);
 TRACE_DEFINE_ENUM(XFS_DAS_RM_NBLK);
 TRACE_DEFINE_ENUM(XFS_DAS_CLR_FLAG);
@@ -4172,6 +4174,7 @@ DEFINE_DAS_STATE_EVENT(xfs_attr_set_iter_return);
 DEFINE_DAS_STATE_EVENT(xfs_attr_leaf_addname_return);
 DEFINE_DAS_STATE_EVENT(xfs_attr_node_addname_return);
 DEFINE_DAS_STATE_EVENT(xfs_attr_remove_iter_return);
+DEFINE_DAS_STATE_EVENT(xfs_attr_rmtval_alloc);
 DEFINE_DAS_STATE_EVENT(xfs_attr_rmtval_remove_return);
 DEFINE_DAS_STATE_EVENT(xfs_attr_defer_add);
 DEFINE_DAS_STATE_EVENT(xfs_attr_defer_replace);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 09/18] xfs: XFS_DAS_LEAF_REPLACE state only needed if !LARP
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (7 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 08/18] xfs: split remote attr setting out from replace path Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 23:24   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 10/18] xfs: remote xattr removal in xfs_attr_set_iter() is conditional Dave Chinner
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We can skip the REPLACE state when LARP is enabled, but that means
the XFS_DAS_FLIP_LFLAG state is now poorly named - it indicates
something that has been done rather than what the state is going to
do. Rename it to "REMOVE_OLD" to indicate that we are now going to
perform removal of the old attr.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 81 +++++++++++++++++++++++++---------------
 fs/xfs/libxfs/xfs_attr.h | 44 +++++++++++-----------
 fs/xfs/xfs_trace.h       |  4 +-
 3 files changed, 75 insertions(+), 54 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index d2b29f7e103a..0f4636e2e246 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -296,6 +296,26 @@ xfs_attr_sf_addname(
 	return error;
 }
 
+/*
+ * When we bump the state to REPLACE, we may actually need to skip over the
+ * state. When LARP mode is enabled, we don't need to run the atomic flags flip,
+ * so we skip straight over the REPLACE state and go on to REMOVE_OLD.
+ */
+static void
+xfs_attr_dela_state_set_replace(
+	struct xfs_attr_item	*attr,
+	enum xfs_delattr_state	replace)
+{
+	struct xfs_da_args	*args = attr->xattri_da_args;
+
+	ASSERT(replace == XFS_DAS_LEAF_REPLACE ||
+			replace == XFS_DAS_NODE_REPLACE);
+
+	attr->xattri_dela_state = replace;
+	if (xfs_has_larp(args->dp->i_mount))
+		attr->xattri_dela_state++;
+}
+
 static int
 xfs_attr_leaf_addname(
 	struct xfs_attr_item	*attr)
@@ -338,7 +358,7 @@ xfs_attr_leaf_addname(
 		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
 		error = -EAGAIN;
 	} else if (args->op_flags & XFS_DA_OP_RENAME) {
-		attr->xattri_dela_state = XFS_DAS_LEAF_REPLACE;
+		xfs_attr_dela_state_set_replace(attr, XFS_DAS_LEAF_REPLACE);
 		error = -EAGAIN;
 	} else {
 		attr->xattri_dela_state = XFS_DAS_DONE;
@@ -369,7 +389,7 @@ xfs_attr_node_addname(
 		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
 		error = -EAGAIN;
 	} else if (args->op_flags & XFS_DA_OP_RENAME) {
-		attr->xattri_dela_state = XFS_DAS_NODE_REPLACE;
+		xfs_attr_dela_state_set_replace(attr, XFS_DAS_NODE_REPLACE);
 		error = -EAGAIN;
 	} else {
 		attr->xattri_dela_state = XFS_DAS_DONE;
@@ -396,8 +416,11 @@ xfs_attr_rmtval_alloc(
 		error = xfs_attr_rmtval_set_blk(attr);
 		if (error)
 			return error;
-		error = -EAGAIN;
-		goto out;
+		/* Roll the transaction only if there is more to allocate. */
+		if (attr->xattri_blkcnt > 0) {
+			error = -EAGAIN;
+			goto out;
+		}
 	}
 
 	error = xfs_attr_rmtval_set_value(args);
@@ -408,6 +431,13 @@ xfs_attr_rmtval_alloc(
 	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
 		error = xfs_attr3_leaf_clearflag(args);
 		attr->xattri_dela_state = XFS_DAS_DONE;
+	} else {
+		/*
+		 * We are running a REPLACE operation, so we need to bump the
+		 * state to the step in that operation.
+		 */
+		attr->xattri_dela_state++;
+		xfs_attr_dela_state_set_replace(attr, attr->xattri_dela_state);
 	}
 out:
 	trace_xfs_attr_rmtval_alloc(attr->xattri_dela_state, args->dp);
@@ -429,7 +459,6 @@ xfs_attr_set_iter(
 	struct xfs_inode		*dp = args->dp;
 	struct xfs_buf			*bp = NULL;
 	int				forkoff, error = 0;
-	struct xfs_mount		*mp = args->dp->i_mount;
 
 	/* State machine switch */
 next_state:
@@ -459,37 +488,29 @@ xfs_attr_set_iter(
 			return error;
 		if (attr->xattri_dela_state == XFS_DAS_DONE)
 			break;
-		attr->xattri_dela_state++;
-		fallthrough;
+		goto next_state;
 
 	case XFS_DAS_LEAF_REPLACE:
 	case XFS_DAS_NODE_REPLACE:
 		/*
-		 * If this is an atomic rename operation, we must "flip" the
-		 * incomplete flags on the "new" and "old" attribute/value pairs
-		 * so that one disappears and one appears atomically.  Then we
-		 * must remove the "old" attribute/value pair.
-		 *
-		 * In a separate transaction, set the incomplete flag on the
-		 * "old" attr and clear the incomplete flag on the "new" attr.
+		 * We must "flip" the incomplete flags on the "new" and "old"
+		 * attribute/value pairs so that one disappears and one appears
+		 * atomically.  Then we must remove the "old" attribute/value
+		 * pair.
 		 */
-		if (!xfs_has_larp(mp)) {
-			error = xfs_attr3_leaf_flipflags(args);
-			if (error)
-				return error;
-			/*
-			 * Commit the flag value change and start the next trans
-			 * in series at FLIP_FLAG.
-			 */
-			error = -EAGAIN;
-			attr->xattri_dela_state++;
-			break;
-		}
-
+		error = xfs_attr3_leaf_flipflags(args);
+		if (error)
+			return error;
+		/*
+		 * Commit the flag value change and start the next trans
+		 * in series at REMOVE_OLD.
+		 */
+		error = -EAGAIN;
 		attr->xattri_dela_state++;
-		fallthrough;
-	case XFS_DAS_FLIP_LFLAG:
-	case XFS_DAS_FLIP_NFLAG:
+		break;
+
+	case XFS_DAS_LEAF_REMOVE_OLD:
+	case XFS_DAS_NODE_REMOVE_OLD:
 		/*
 		 * Dismantle the "old" attribute/value pair by removing a
 		 * "remote" value (if it exists).
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 1749fd8f7ddd..3f1234272f3a 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -455,7 +455,7 @@ enum xfs_delattr_state {
 	XFS_DAS_LEAF_SET_RMT,		/* set a remote xattr from a leaf */
 	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
 	XFS_DAS_LEAF_REPLACE,		/* Perform replace ops on a leaf */
-	XFS_DAS_FLIP_LFLAG,		/* Flipped leaf INCOMPLETE attr flag */
+	XFS_DAS_LEAF_REMOVE_OLD,	/* Start removing old attr from leaf */
 	XFS_DAS_RM_LBLK,		/* A rename is removing leaf blocks */
 	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
 
@@ -463,7 +463,7 @@ enum xfs_delattr_state {
 	XFS_DAS_NODE_SET_RMT,		/* set a remote xattr from a node */
 	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
 	XFS_DAS_NODE_REPLACE,		/* Perform replace ops on a node */
-	XFS_DAS_FLIP_NFLAG,		/* Flipped node INCOMPLETE attr flag */
+	XFS_DAS_NODE_REMOVE_OLD,	/* Start removing old attr from node */
 	XFS_DAS_RM_NBLK,		/* A rename is removing node blocks */
 	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
 
@@ -471,26 +471,26 @@ enum xfs_delattr_state {
 };
 
 #define XFS_DAS_STRINGS	\
-	{ XFS_DAS_UNINIT,	"XFS_DAS_UNINIT" }, \
-	{ XFS_DAS_SF_ADD,	"XFS_DAS_SF_ADD" }, \
-	{ XFS_DAS_LEAF_ADD,	"XFS_DAS_LEAF_ADD" }, \
-	{ XFS_DAS_NODE_ADD,	"XFS_DAS_NODE_ADD" }, \
-	{ XFS_DAS_RMTBLK,	"XFS_DAS_RMTBLK" }, \
-	{ XFS_DAS_RM_NAME,	"XFS_DAS_RM_NAME" }, \
-	{ XFS_DAS_RM_SHRINK,	"XFS_DAS_RM_SHRINK" }, \
-	{ XFS_DAS_LEAF_SET_RMT,	"XFS_DAS_LEAF_SET_RMT" }, \
-	{ XFS_DAS_LEAF_ALLOC_RMT, "XFS_DAS_LEAF_ALLOC_RMT" }, \
-	{ XFS_DAS_LEAF_REPLACE,	"XFS_DAS_LEAF_REPLACE" }, \
-	{ XFS_DAS_FLIP_LFLAG,	"XFS_DAS_FLIP_LFLAG" }, \
-	{ XFS_DAS_RM_LBLK,	"XFS_DAS_RM_LBLK" }, \
-	{ XFS_DAS_RD_LEAF,	"XFS_DAS_RD_LEAF" }, \
-	{ XFS_DAS_NODE_SET_RMT,	"XFS_DAS_NODE_SET_RMT" }, \
-	{ XFS_DAS_NODE_ALLOC_RMT, "XFS_DAS_NODE_ALLOC_RMT" },  \
-	{ XFS_DAS_NODE_REPLACE,	"XFS_DAS_NODE_REPLACE" },  \
-	{ XFS_DAS_FLIP_NFLAG,	"XFS_DAS_FLIP_NFLAG" }, \
-	{ XFS_DAS_RM_NBLK,	"XFS_DAS_RM_NBLK" }, \
-	{ XFS_DAS_CLR_FLAG,	"XFS_DAS_CLR_FLAG" }, \
-	{ XFS_DAS_DONE,		"XFS_DAS_DONE" }
+	{ XFS_DAS_UNINIT,		"XFS_DAS_UNINIT" }, \
+	{ XFS_DAS_SF_ADD,		"XFS_DAS_SF_ADD" }, \
+	{ XFS_DAS_LEAF_ADD,		"XFS_DAS_LEAF_ADD" }, \
+	{ XFS_DAS_NODE_ADD,		"XFS_DAS_NODE_ADD" }, \
+	{ XFS_DAS_RMTBLK,		"XFS_DAS_RMTBLK" }, \
+	{ XFS_DAS_RM_NAME,		"XFS_DAS_RM_NAME" }, \
+	{ XFS_DAS_RM_SHRINK,		"XFS_DAS_RM_SHRINK" }, \
+	{ XFS_DAS_LEAF_SET_RMT,		"XFS_DAS_LEAF_SET_RMT" }, \
+	{ XFS_DAS_LEAF_ALLOC_RMT,	"XFS_DAS_LEAF_ALLOC_RMT" }, \
+	{ XFS_DAS_LEAF_REPLACE,		"XFS_DAS_LEAF_REPLACE" }, \
+	{ XFS_DAS_LEAF_REMOVE_OLD,	"XFS_DAS_LEAF_REMOVE_OLD" }, \
+	{ XFS_DAS_RM_LBLK,		"XFS_DAS_RM_LBLK" }, \
+	{ XFS_DAS_RD_LEAF,		"XFS_DAS_RD_LEAF" }, \
+	{ XFS_DAS_NODE_SET_RMT,		"XFS_DAS_NODE_SET_RMT" }, \
+	{ XFS_DAS_NODE_ALLOC_RMT,	"XFS_DAS_NODE_ALLOC_RMT" },  \
+	{ XFS_DAS_NODE_REPLACE,		"XFS_DAS_NODE_REPLACE" },  \
+	{ XFS_DAS_NODE_REMOVE_OLD,	"XFS_DAS_NODE_REMOVE_OLD" }, \
+	{ XFS_DAS_RM_NBLK,		"XFS_DAS_RM_NBLK" }, \
+	{ XFS_DAS_CLR_FLAG,		"XFS_DAS_CLR_FLAG" }, \
+	{ XFS_DAS_DONE,			"XFS_DAS_DONE" }
 
 /*
  * Defines for xfs_attr_item.xattri_flags
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index cb9122327114..b528c0f375c2 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -4139,13 +4139,13 @@ TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_SET_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REPLACE);
-TRACE_DEFINE_ENUM(XFS_DAS_FLIP_LFLAG);
+TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE_OLD);
 TRACE_DEFINE_ENUM(XFS_DAS_RM_LBLK);
 TRACE_DEFINE_ENUM(XFS_DAS_RD_LEAF);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_SET_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_ALLOC_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_REPLACE);
-TRACE_DEFINE_ENUM(XFS_DAS_FLIP_NFLAG);
+TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE_OLD);
 TRACE_DEFINE_ENUM(XFS_DAS_RM_NBLK);
 TRACE_DEFINE_ENUM(XFS_DAS_CLR_FLAG);
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 10/18] xfs: remote xattr removal in xfs_attr_set_iter() is conditional
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (8 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 09/18] xfs: XFS_DAS_LEAF_REPLACE state only needed if !LARP Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 23:26   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 11/18] xfs: clean up final attr removal in xfs_attr_set_iter Dave Chinner
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We may not have a remote value for the old xattr we have to remove,
so skip over the remote value removal states and go straight to
the xattr name removal in the leaf/node block.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 59 ++++++++++++++++++++--------------------
 fs/xfs/libxfs/xfs_attr.h |  8 +++---
 fs/xfs/xfs_trace.h       |  4 +--
 3 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 0f4636e2e246..d65abaead9a1 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -495,15 +495,14 @@ xfs_attr_set_iter(
 		/*
 		 * We must "flip" the incomplete flags on the "new" and "old"
 		 * attribute/value pairs so that one disappears and one appears
-		 * atomically.  Then we must remove the "old" attribute/value
-		 * pair.
+		 * atomically.
 		 */
 		error = xfs_attr3_leaf_flipflags(args);
 		if (error)
 			return error;
 		/*
-		 * Commit the flag value change and start the next trans
-		 * in series at REMOVE_OLD.
+		 * We must commit the flag value change now to make it atomic
+		 * and then we can start the next trans in series at REMOVE_OLD.
 		 */
 		error = -EAGAIN;
 		attr->xattri_dela_state++;
@@ -512,41 +511,43 @@ xfs_attr_set_iter(
 	case XFS_DAS_LEAF_REMOVE_OLD:
 	case XFS_DAS_NODE_REMOVE_OLD:
 		/*
-		 * Dismantle the "old" attribute/value pair by removing a
-		 * "remote" value (if it exists).
+		 * If we have a remote attr, start the process of removing it
+		 * by invalidating any cached buffers.
+		 *
+		 * If we don't have a remote attr, we skip the remote block
+		 * removal state altogether with a second state increment.
 		 */
 		xfs_attr_restore_rmt_blk(args);
-		error = xfs_attr_rmtval_invalidate(args);
-		if (error)
-			return error;
-
-		attr->xattri_dela_state++;
-		fallthrough;
-	case XFS_DAS_RM_LBLK:
-	case XFS_DAS_RM_NBLK:
 		if (args->rmtblkno) {
-			error = xfs_attr_rmtval_remove(attr);
-			if (error == -EAGAIN)
-				trace_xfs_attr_set_iter_return(
-					attr->xattri_dela_state, args->dp);
+			error = xfs_attr_rmtval_invalidate(args);
 			if (error)
 				return error;
-
-			attr->xattri_dela_state = XFS_DAS_RD_LEAF;
-			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
-						       args->dp);
-			return -EAGAIN;
+		} else {
+			attr->xattri_dela_state++;
 		}
 
+		attr->xattri_dela_state++;
+		goto next_state;
+
+	case XFS_DAS_LEAF_REMOVE_RMT:
+	case XFS_DAS_NODE_REMOVE_RMT:
+		error = xfs_attr_rmtval_remove(attr);
+		if (error == -EAGAIN)
+			break;
+		if (error)
+			return error;
+
 		/*
-		 * This is the end of the shared leaf/node sequence. We need
-		 * to continue at the next state in the sequence, but we can't
-		 * easily just fall through. So we increment to the next state
-		 * and then jump back to switch statement to evaluate the next
-		 * state correctly.
+		 * We've finished removing the remote attr blocks, so commit the
+		 * transaction and move on to removing the attr name from the
+		 * leaf/node block. Removing the attr might require a full
+		 * transaction reservation for btree block freeing, so we
+		 * can't do that in the same transaction where we removed the
+		 * remote attr blocks.
 		 */
+		error = -EAGAIN;
 		attr->xattri_dela_state++;
-		goto next_state;
+		break;
 
 	case XFS_DAS_RD_LEAF:
 		/*
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 3f1234272f3a..d67535e4ce5a 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -456,7 +456,7 @@ enum xfs_delattr_state {
 	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
 	XFS_DAS_LEAF_REPLACE,		/* Perform replace ops on a leaf */
 	XFS_DAS_LEAF_REMOVE_OLD,	/* Start removing old attr from leaf */
-	XFS_DAS_RM_LBLK,		/* A rename is removing leaf blocks */
+	XFS_DAS_LEAF_REMOVE_RMT,	/* A rename is removing remote blocks */
 	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
 
 	/* Node state set sequence, must match leaf state above */
@@ -464,7 +464,7 @@ enum xfs_delattr_state {
 	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
 	XFS_DAS_NODE_REPLACE,		/* Perform replace ops on a node */
 	XFS_DAS_NODE_REMOVE_OLD,	/* Start removing old attr from node */
-	XFS_DAS_RM_NBLK,		/* A rename is removing node blocks */
+	XFS_DAS_NODE_REMOVE_RMT,	/* A rename is removing remote blocks */
 	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
 
 	XFS_DAS_DONE,			/* finished operation */
@@ -482,13 +482,13 @@ enum xfs_delattr_state {
 	{ XFS_DAS_LEAF_ALLOC_RMT,	"XFS_DAS_LEAF_ALLOC_RMT" }, \
 	{ XFS_DAS_LEAF_REPLACE,		"XFS_DAS_LEAF_REPLACE" }, \
 	{ XFS_DAS_LEAF_REMOVE_OLD,	"XFS_DAS_LEAF_REMOVE_OLD" }, \
-	{ XFS_DAS_RM_LBLK,		"XFS_DAS_RM_LBLK" }, \
+	{ XFS_DAS_LEAF_REMOVE_RMT,	"XFS_DAS_LEAF_REMOVE_RMT" }, \
 	{ XFS_DAS_RD_LEAF,		"XFS_DAS_RD_LEAF" }, \
 	{ XFS_DAS_NODE_SET_RMT,		"XFS_DAS_NODE_SET_RMT" }, \
 	{ XFS_DAS_NODE_ALLOC_RMT,	"XFS_DAS_NODE_ALLOC_RMT" },  \
 	{ XFS_DAS_NODE_REPLACE,		"XFS_DAS_NODE_REPLACE" },  \
 	{ XFS_DAS_NODE_REMOVE_OLD,	"XFS_DAS_NODE_REMOVE_OLD" }, \
-	{ XFS_DAS_RM_NBLK,		"XFS_DAS_RM_NBLK" }, \
+	{ XFS_DAS_NODE_REMOVE_RMT,	"XFS_DAS_NODE_REMOVE_RMT" }, \
 	{ XFS_DAS_CLR_FLAG,		"XFS_DAS_CLR_FLAG" }, \
 	{ XFS_DAS_DONE,			"XFS_DAS_DONE" }
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index b528c0f375c2..793d2a86ab2c 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -4140,13 +4140,13 @@ TRACE_DEFINE_ENUM(XFS_DAS_LEAF_SET_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REPLACE);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE_OLD);
-TRACE_DEFINE_ENUM(XFS_DAS_RM_LBLK);
+TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_RD_LEAF);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_SET_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_ALLOC_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_REPLACE);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE_OLD);
-TRACE_DEFINE_ENUM(XFS_DAS_RM_NBLK);
+TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_CLR_FLAG);
 
 DECLARE_EVENT_CLASS(xfs_das_state_class,
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 11/18] xfs: clean up final attr removal in xfs_attr_set_iter
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (9 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 10/18] xfs: remote xattr removal in xfs_attr_set_iter() is conditional Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 23:29   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 12/18] xfs: xfs_attr_set_iter() does not need to return EAGAIN Dave Chinner
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Clean up the final leaf/node states in xfs_attr_set_iter() to
further simplify the high level state machine and to set the
completion state correctly. As we are adding a separate state
for node format removal, we need to ensure that node formats
are collapsed back to shortformi or empty correctly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 149 ++++++++++++++++++++++-----------------
 fs/xfs/libxfs/xfs_attr.h |  12 ++--
 fs/xfs/xfs_trace.h       |   5 +-
 3 files changed, 94 insertions(+), 72 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index d65abaead9a1..5707ac4f44a7 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -61,7 +61,7 @@ STATIC int xfs_attr_node_get(xfs_da_args_t *args);
 STATIC void xfs_attr_restore_rmt_blk(struct xfs_da_args *args);
 static int xfs_attr_node_try_addname(struct xfs_attr_item *attr);
 STATIC int xfs_attr_node_addname_find_attr(struct xfs_attr_item *attr);
-STATIC int xfs_attr_node_addname_clear_incomplete(struct xfs_attr_item *attr);
+STATIC int xfs_attr_node_remove_attr(struct xfs_attr_item *attr);
 STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 				 struct xfs_da_state **state);
 STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
@@ -444,6 +444,77 @@ xfs_attr_rmtval_alloc(
 	return error;
 }
 
+/*
+ * Remove the original attr we have just replaced. This is dependent on the
+ * original lookup and insert placing the old attr in args->blkno/args->index
+ * and the new attr in args->blkno2/args->index2.
+ */
+static int
+xfs_attr_leaf_remove_attr(
+	struct xfs_attr_item		*attr)
+{
+	struct xfs_da_args              *args = attr->xattri_da_args;
+	struct xfs_inode		*dp = args->dp;
+	struct xfs_buf			*bp = NULL;
+	int				forkoff;
+	int				error;
+
+	error = xfs_attr3_leaf_read(args->trans, args->dp, args->blkno,
+				   &bp);
+	if (error)
+		return error;
+
+	xfs_attr3_leaf_remove(bp, args);
+
+	forkoff = xfs_attr_shortform_allfit(bp, dp);
+	if (forkoff)
+		error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
+		/* bp is gone due to xfs_da_shrink_inode */
+
+	return error;
+}
+
+/*
+ * Shrink an attribute from leaf to shortform. Used by the node format remove
+ * path when the node format collapses to a single block and so we have to check
+ * if it can be collapsed further.
+ */
+static int
+xfs_attr_leaf_shrink(
+	struct xfs_da_args	*args,
+	struct xfs_da_state     *state)
+{
+	struct xfs_inode	*dp = args->dp;
+	int			error, forkoff;
+	struct xfs_buf		*bp;
+
+	if (!xfs_attr_is_leaf(dp))
+		return 0;
+
+	/*
+	 * Have to get rid of the copy of this dabuf in the state.
+	 */
+	if (state) {
+		ASSERT(state->path.active == 1);
+		ASSERT(state->path.blk[0].bp);
+		state->path.blk[0].bp = NULL;
+	}
+
+	error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp);
+	if (error)
+		return error;
+
+	forkoff = xfs_attr_shortform_allfit(bp, dp);
+	if (forkoff) {
+		error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
+		/* bp is gone due to xfs_da_shrink_inode */
+	} else {
+		xfs_trans_brelse(args->trans, bp);
+	}
+
+	return error;
+}
+
 /*
  * Set the attribute specified in @args.
  * This routine is meant to function as a delayed operation, and may return
@@ -456,9 +527,7 @@ xfs_attr_set_iter(
 	struct xfs_attr_item		*attr)
 {
 	struct xfs_da_args              *args = attr->xattri_da_args;
-	struct xfs_inode		*dp = args->dp;
-	struct xfs_buf			*bp = NULL;
-	int				forkoff, error = 0;
+	int				error = 0;
 
 	/* State machine switch */
 next_state:
@@ -549,32 +618,16 @@ xfs_attr_set_iter(
 		attr->xattri_dela_state++;
 		break;
 
-	case XFS_DAS_RD_LEAF:
-		/*
-		 * This is the last step for leaf format. Read the block with
-		 * the old attr, remove the old attr, check for shortform
-		 * conversion and return.
-		 */
-		error = xfs_attr3_leaf_read(args->trans, args->dp, args->blkno,
-					   &bp);
-		if (error)
-			return error;
-
-		xfs_attr3_leaf_remove(bp, args);
-
-		forkoff = xfs_attr_shortform_allfit(bp, dp);
-		if (forkoff)
-			error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
-			/* bp is gone due to xfs_da_shrink_inode */
-
-		return error;
+	case XFS_DAS_LEAF_REMOVE_ATTR:
+		error = xfs_attr_leaf_remove_attr(attr);
+		attr->xattri_dela_state = XFS_DAS_DONE;
+		break;
 
-	case XFS_DAS_CLR_FLAG:
-		/*
-		 * The last state for node format. Look up the old attr and
-		 * remove it.
-		 */
-		error = xfs_attr_node_addname_clear_incomplete(attr);
+	case XFS_DAS_NODE_REMOVE_ATTR:
+		error = xfs_attr_node_remove_attr(attr);
+		if (!error)
+			error = xfs_attr_leaf_shrink(args, NULL);
+		attr->xattri_dela_state = XFS_DAS_DONE;
 		break;
 	default:
 		ASSERT(0);
@@ -1269,8 +1322,8 @@ xfs_attr_node_try_addname(
 }
 
 
-STATIC int
-xfs_attr_node_addname_clear_incomplete(
+static int
+xfs_attr_node_remove_attr(
 	struct xfs_attr_item		*attr)
 {
 	struct xfs_da_args		*args = attr->xattri_da_args;
@@ -1311,38 +1364,6 @@ xfs_attr_node_addname_clear_incomplete(
 	return retval;
 }
 
-/*
- * Shrink an attribute from leaf to shortform
- */
-STATIC int
-xfs_attr_node_shrink(
-	struct xfs_da_args	*args,
-	struct xfs_da_state     *state)
-{
-	struct xfs_inode	*dp = args->dp;
-	int			error, forkoff;
-	struct xfs_buf		*bp;
-
-	/*
-	 * Have to get rid of the copy of this dabuf in the state.
-	 */
-	ASSERT(state->path.active == 1);
-	ASSERT(state->path.blk[0].bp);
-	state->path.blk[0].bp = NULL;
-
-	error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp);
-	if (error)
-		return error;
-
-	forkoff = xfs_attr_shortform_allfit(bp, dp);
-	if (forkoff) {
-		error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
-		/* bp is gone due to xfs_da_shrink_inode */
-	} else
-		xfs_trans_brelse(args->trans, bp);
-
-	return error;
-}
 
 /*
  * Mark an attribute entry INCOMPLETE and save pointers to the relevant buffers
@@ -1551,7 +1572,7 @@ xfs_attr_remove_iter(
 		 * transaction.
 		 */
 		if (xfs_attr_is_leaf(dp))
-			error = xfs_attr_node_shrink(args, state);
+			error = xfs_attr_leaf_shrink(args, state);
 		ASSERT(error != -EAGAIN);
 		break;
 	default:
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index d67535e4ce5a..c318260f17d4 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -451,21 +451,21 @@ enum xfs_delattr_state {
 	XFS_DAS_RM_NAME,		/* Remove attr name */
 	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
 
-	/* Leaf state set sequence */
+	/* Leaf state set/replace sequence */
 	XFS_DAS_LEAF_SET_RMT,		/* set a remote xattr from a leaf */
 	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
 	XFS_DAS_LEAF_REPLACE,		/* Perform replace ops on a leaf */
 	XFS_DAS_LEAF_REMOVE_OLD,	/* Start removing old attr from leaf */
 	XFS_DAS_LEAF_REMOVE_RMT,	/* A rename is removing remote blocks */
-	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
+	XFS_DAS_LEAF_REMOVE_ATTR,	/* Remove the old attr from a leaf */
 
-	/* Node state set sequence, must match leaf state above */
+	/* Node state set/replace sequence, must match leaf state above */
 	XFS_DAS_NODE_SET_RMT,		/* set a remote xattr from a node */
 	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
 	XFS_DAS_NODE_REPLACE,		/* Perform replace ops on a node */
 	XFS_DAS_NODE_REMOVE_OLD,	/* Start removing old attr from node */
 	XFS_DAS_NODE_REMOVE_RMT,	/* A rename is removing remote blocks */
-	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
+	XFS_DAS_NODE_REMOVE_ATTR,	/* Remove the old attr from a node */
 
 	XFS_DAS_DONE,			/* finished operation */
 };
@@ -483,13 +483,13 @@ enum xfs_delattr_state {
 	{ XFS_DAS_LEAF_REPLACE,		"XFS_DAS_LEAF_REPLACE" }, \
 	{ XFS_DAS_LEAF_REMOVE_OLD,	"XFS_DAS_LEAF_REMOVE_OLD" }, \
 	{ XFS_DAS_LEAF_REMOVE_RMT,	"XFS_DAS_LEAF_REMOVE_RMT" }, \
-	{ XFS_DAS_RD_LEAF,		"XFS_DAS_RD_LEAF" }, \
+	{ XFS_DAS_LEAF_REMOVE_ATTR,	"XFS_DAS_LEAF_REMOVE_ATTR" }, \
 	{ XFS_DAS_NODE_SET_RMT,		"XFS_DAS_NODE_SET_RMT" }, \
 	{ XFS_DAS_NODE_ALLOC_RMT,	"XFS_DAS_NODE_ALLOC_RMT" },  \
 	{ XFS_DAS_NODE_REPLACE,		"XFS_DAS_NODE_REPLACE" },  \
 	{ XFS_DAS_NODE_REMOVE_OLD,	"XFS_DAS_NODE_REMOVE_OLD" }, \
 	{ XFS_DAS_NODE_REMOVE_RMT,	"XFS_DAS_NODE_REMOVE_RMT" }, \
-	{ XFS_DAS_CLR_FLAG,		"XFS_DAS_CLR_FLAG" }, \
+	{ XFS_DAS_NODE_REMOVE_ATTR,	"XFS_DAS_NODE_REMOVE_ATTR" }, \
 	{ XFS_DAS_DONE,			"XFS_DAS_DONE" }
 
 /*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 793d2a86ab2c..260760ce2d05 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -4141,13 +4141,14 @@ TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REPLACE);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE_OLD);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE_RMT);
-TRACE_DEFINE_ENUM(XFS_DAS_RD_LEAF);
+TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE_ATTR);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_SET_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_ALLOC_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_REPLACE);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE_OLD);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE_RMT);
-TRACE_DEFINE_ENUM(XFS_DAS_CLR_FLAG);
+TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE_ATTR);
+TRACE_DEFINE_ENUM(XFS_DAS_DONE);
 
 DECLARE_EVENT_CLASS(xfs_das_state_class,
 	TP_PROTO(int das, struct xfs_inode *ip),
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 12/18] xfs: xfs_attr_set_iter() does not need to return EAGAIN
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (10 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 11/18] xfs: clean up final attr removal in xfs_attr_set_iter Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 23:30   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 13/18] xfs: introduce attr remove initial states into xfs_attr_set_iter Dave Chinner
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Now that the full xfs_attr_set_iter() state machine always
terminates with either the state being XFS_DAS_DONE on success or
an error on failure, we can get rid of the need for it to return
-EAGAIN whenever it needs to roll the transaction before running
the next state.

That is, we don't need to spray -EAGAIN return states everywhere,
the caller just check the state machine state for completion to
determine what action should be taken next. This greatly simplifies
the code within the state machine implementation as it now only has
to handle 0 for success or -errno for error and it doesn't need to
tell the caller to retry.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 86 +++++++++++++++++-----------------------
 fs/xfs/xfs_attr_item.c   |  2 +
 2 files changed, 38 insertions(+), 50 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 5707ac4f44a7..89e68d9e22c0 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -290,7 +290,6 @@ xfs_attr_sf_addname(
 	 */
 	xfs_trans_bhold(args->trans, attr->xattri_leaf_bp);
 	attr->xattri_dela_state = XFS_DAS_LEAF_ADD;
-	error = -EAGAIN;
 out:
 	trace_xfs_attr_sf_addname_return(attr->xattri_dela_state, args->dp);
 	return error;
@@ -343,7 +342,6 @@ xfs_attr_leaf_addname(
 		 * retry the add to the newly allocated node block.
 		 */
 		attr->xattri_dela_state = XFS_DAS_NODE_ADD;
-		error = -EAGAIN;
 		goto out;
 	}
 	if (error)
@@ -354,20 +352,24 @@ xfs_attr_leaf_addname(
 	 * or perform more xattr manipulations. Otherwise there is nothing more
 	 * to do and we can return success.
 	 */
-	if (args->rmtblkno) {
+	if (args->rmtblkno)
 		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
-		error = -EAGAIN;
-	} else if (args->op_flags & XFS_DA_OP_RENAME) {
+	else if (args->op_flags & XFS_DA_OP_RENAME)
 		xfs_attr_dela_state_set_replace(attr, XFS_DAS_LEAF_REPLACE);
-		error = -EAGAIN;
-	} else {
+	else
 		attr->xattri_dela_state = XFS_DAS_DONE;
-	}
 out:
 	trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp);
 	return error;
 }
 
+/*
+ * Add an entry to a node format attr tree.
+ *
+ * Note that we might still have a leaf here - xfs_attr_is_leaf() cannot tell
+ * the difference between leaf + remote attr blocks and a node format tree,
+ * so we may still end up having to convert from leaf to node format here.
+ */
 static int
 xfs_attr_node_addname(
 	struct xfs_attr_item	*attr)
@@ -382,19 +384,26 @@ xfs_attr_node_addname(
 		return error;
 
 	error = xfs_attr_node_try_addname(attr);
+	if (error == -ENOSPC) {
+		error = xfs_attr3_leaf_to_node(args);
+		if (error)
+			return error;
+		/*
+		 * No state change, we really are in node form now
+		 * but we need the transaction rolled to continue.
+		 */
+		goto out;
+	}
 	if (error)
 		return error;
 
-	if (args->rmtblkno) {
+	if (args->rmtblkno)
 		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
-		error = -EAGAIN;
-	} else if (args->op_flags & XFS_DA_OP_RENAME) {
+	else if (args->op_flags & XFS_DA_OP_RENAME)
 		xfs_attr_dela_state_set_replace(attr, XFS_DAS_NODE_REPLACE);
-		error = -EAGAIN;
-	} else {
+	else
 		attr->xattri_dela_state = XFS_DAS_DONE;
-	}
-
+out:
 	trace_xfs_attr_node_addname_return(attr->xattri_dela_state, args->dp);
 	return error;
 }
@@ -417,10 +426,8 @@ xfs_attr_rmtval_alloc(
 		if (error)
 			return error;
 		/* Roll the transaction only if there is more to allocate. */
-		if (attr->xattri_blkcnt > 0) {
-			error = -EAGAIN;
+		if (attr->xattri_blkcnt > 0)
 			goto out;
-		}
 	}
 
 	error = xfs_attr_rmtval_set_value(args);
@@ -516,11 +523,12 @@ xfs_attr_leaf_shrink(
 }
 
 /*
- * Set the attribute specified in @args.
- * This routine is meant to function as a delayed operation, and may return
- * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
- * to handle this, and recall the function until a successful error code is
- * returned.
+ * Run the attribute operation specified in @attr.
+ *
+ * This routine is meant to function as a delayed operation and will set the
+ * state to XFS_DAS_DONE when the operation is complete.  Calling functions will
+ * need to handle this, and recall the function until either an error or
+ * XFS_DAS_DONE is detected.
  */
 int
 xfs_attr_set_iter(
@@ -573,7 +581,6 @@ xfs_attr_set_iter(
 		 * We must commit the flag value change now to make it atomic
 		 * and then we can start the next trans in series at REMOVE_OLD.
 		 */
-		error = -EAGAIN;
 		attr->xattri_dela_state++;
 		break;
 
@@ -601,8 +608,10 @@ xfs_attr_set_iter(
 	case XFS_DAS_LEAF_REMOVE_RMT:
 	case XFS_DAS_NODE_REMOVE_RMT:
 		error = xfs_attr_rmtval_remove(attr);
-		if (error == -EAGAIN)
+		if (error == -EAGAIN) {
+			error = 0;
 			break;
+		}
 		if (error)
 			return error;
 
@@ -614,7 +623,6 @@ xfs_attr_set_iter(
 		 * can't do that in the same transaction where we removed the
 		 * remote attr blocks.
 		 */
-		error = -EAGAIN;
 		attr->xattri_dela_state++;
 		break;
 
@@ -1250,14 +1258,6 @@ xfs_attr_node_addname_find_attr(
  * This will involve walking down the Btree, and may involve splitting
  * leaf nodes and even splitting intermediate nodes up to and including
  * the root node (a special case of an intermediate node).
- *
- * "Remote" attribute values confuse the issue and atomic rename operations
- * add a whole extra layer of confusion on top of that.
- *
- * This routine is meant to function as a delayed operation, and may return
- * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
- * to handle this, and recall the function until a successful error code is
- *returned.
  */
 static int
 xfs_attr_node_try_addname(
@@ -1279,24 +1279,10 @@ xfs_attr_node_try_addname(
 			/*
 			 * Its really a single leaf node, but it had
 			 * out-of-line values so it looked like it *might*
-			 * have been a b-tree.
+			 * have been a b-tree. Let the caller deal with this.
 			 */
 			xfs_da_state_free(state);
-			state = NULL;
-			error = xfs_attr3_leaf_to_node(args);
-			if (error)
-				goto out;
-
-			/*
-			 * Now that we have converted the leaf to a node, we can
-			 * roll the transaction, and try xfs_attr3_leaf_add
-			 * again on re-entry.  No need to set dela_state to do
-			 * this. dela_state is still unset by this function at
-			 * this point.
-			 */
-			trace_xfs_attr_node_addname_return(
-					attr->xattri_dela_state, args->dp);
-			return -EAGAIN;
+			return -ENOSPC;
 		}
 
 		/*
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 5bfb33746e37..740a55d07660 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -313,6 +313,8 @@ xfs_xattri_finish_update(
 	case XFS_ATTR_OP_FLAGS_SET:
 	case XFS_ATTR_OP_FLAGS_REPLACE:
 		error = xfs_attr_set_iter(attr);
+		if (!error && attr->xattri_dela_state != XFS_DAS_DONE)
+			error = -EAGAIN;
 		break;
 	case XFS_ATTR_OP_FLAGS_REMOVE:
 		ASSERT(XFS_IFORK_Q(args->dp));
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 13/18] xfs: introduce attr remove initial states into xfs_attr_set_iter
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (11 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 12/18] xfs: xfs_attr_set_iter() does not need to return EAGAIN Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 23:37   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 14/18] xfs: switch attr remove to xfs_attri_set_iter Dave Chinner
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We need to merge the add and remove code paths to enable safe
recovery of replace operations. Hoist the initial remove states from
xfs_attr_remove_iter into xfs_attr_set_iter. We will make use of
them in the next patches.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 139 ++++++++++++++++++++++-----------------
 fs/xfs/libxfs/xfs_attr.h |   4 ++
 fs/xfs/xfs_trace.h       |   3 +
 3 files changed, 84 insertions(+), 62 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 89e68d9e22c0..a6a9b1f8dce6 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -451,6 +451,68 @@ xfs_attr_rmtval_alloc(
 	return error;
 }
 
+/*
+ * Mark an attribute entry INCOMPLETE and save pointers to the relevant buffers
+ * for later deletion of the entry.
+ */
+static int
+xfs_attr_leaf_mark_incomplete(
+	struct xfs_da_args	*args,
+	struct xfs_da_state	*state)
+{
+	int			error;
+
+	/*
+	 * Fill in disk block numbers in the state structure
+	 * so that we can get the buffers back after we commit
+	 * several transactions in the following calls.
+	 */
+	error = xfs_attr_fillstate(state);
+	if (error)
+		return error;
+
+	/*
+	 * Mark the attribute as INCOMPLETE
+	 */
+	return xfs_attr3_leaf_setflag(args);
+}
+
+/*
+ * Initial setup for xfs_attr_node_removename.  Make sure the attr is there and
+ * the blocks are valid.  Attr keys with remote blocks will be marked
+ * incomplete.
+ */
+static
+int xfs_attr_node_removename_setup(
+	struct xfs_attr_item		*attr)
+{
+	struct xfs_da_args		*args = attr->xattri_da_args;
+	struct xfs_da_state		**state = &attr->xattri_da_state;
+	int				error;
+
+	error = xfs_attr_node_hasname(args, state);
+	if (error != -EEXIST)
+		goto out;
+	error = 0;
+
+	ASSERT((*state)->path.blk[(*state)->path.active - 1].bp != NULL);
+	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
+		XFS_ATTR_LEAF_MAGIC);
+
+	if (args->rmtblkno > 0) {
+		error = xfs_attr_leaf_mark_incomplete(args, *state);
+		if (error)
+			goto out;
+
+		error = xfs_attr_rmtval_invalidate(args);
+	}
+out:
+	if (error)
+		xfs_da_state_free(*state);
+
+	return error;
+}
+
 /*
  * Remove the original attr we have just replaced. This is dependent on the
  * original lookup and insert placing the old attr in args->blkno/args->index
@@ -550,6 +612,21 @@ xfs_attr_set_iter(
 	case XFS_DAS_NODE_ADD:
 		return xfs_attr_node_addname(attr);
 
+	case XFS_DAS_SF_REMOVE:
+		attr->xattri_dela_state = XFS_DAS_DONE;
+		return xfs_attr_sf_removename(args);
+	case XFS_DAS_LEAF_REMOVE:
+		attr->xattri_dela_state = XFS_DAS_DONE;
+		return xfs_attr_leaf_removename(args);
+	case XFS_DAS_NODE_REMOVE:
+		error = xfs_attr_node_removename_setup(attr);
+		if (error)
+			return error;
+		attr->xattri_dela_state = XFS_DAS_NODE_REMOVE_RMT;
+		if (args->rmtblkno == 0)
+			attr->xattri_dela_state++;
+		break;
+
 	case XFS_DAS_LEAF_SET_RMT:
 	case XFS_DAS_NODE_SET_RMT:
 		error = xfs_attr_rmtval_find_space(attr);
@@ -1351,68 +1428,6 @@ xfs_attr_node_remove_attr(
 }
 
 
-/*
- * Mark an attribute entry INCOMPLETE and save pointers to the relevant buffers
- * for later deletion of the entry.
- */
-STATIC int
-xfs_attr_leaf_mark_incomplete(
-	struct xfs_da_args	*args,
-	struct xfs_da_state	*state)
-{
-	int			error;
-
-	/*
-	 * Fill in disk block numbers in the state structure
-	 * so that we can get the buffers back after we commit
-	 * several transactions in the following calls.
-	 */
-	error = xfs_attr_fillstate(state);
-	if (error)
-		return error;
-
-	/*
-	 * Mark the attribute as INCOMPLETE
-	 */
-	return xfs_attr3_leaf_setflag(args);
-}
-
-/*
- * Initial setup for xfs_attr_node_removename.  Make sure the attr is there and
- * the blocks are valid.  Attr keys with remote blocks will be marked
- * incomplete.
- */
-STATIC
-int xfs_attr_node_removename_setup(
-	struct xfs_attr_item		*attr)
-{
-	struct xfs_da_args		*args = attr->xattri_da_args;
-	struct xfs_da_state		**state = &attr->xattri_da_state;
-	int				error;
-
-	error = xfs_attr_node_hasname(args, state);
-	if (error != -EEXIST)
-		goto out;
-	error = 0;
-
-	ASSERT((*state)->path.blk[(*state)->path.active - 1].bp != NULL);
-	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
-		XFS_ATTR_LEAF_MAGIC);
-
-	if (args->rmtblkno > 0) {
-		error = xfs_attr_leaf_mark_incomplete(args, *state);
-		if (error)
-			goto out;
-
-		error = xfs_attr_rmtval_invalidate(args);
-	}
-out:
-	if (error)
-		xfs_da_state_free(*state);
-
-	return error;
-}
-
 STATIC int
 xfs_attr_node_removename(
 	struct xfs_da_args	*args,
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index c318260f17d4..7ea7c7fa31ac 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -451,6 +451,10 @@ enum xfs_delattr_state {
 	XFS_DAS_RM_NAME,		/* Remove attr name */
 	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
 
+	XFS_DAS_SF_REMOVE,		/* Initial shortform set iter state */
+	XFS_DAS_LEAF_REMOVE,		/* Initial leaf form set iter state */
+	XFS_DAS_NODE_REMOVE,		/* Initial node form set iter state */
+
 	/* Leaf state set/replace sequence */
 	XFS_DAS_LEAF_SET_RMT,		/* set a remote xattr from a leaf */
 	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 260760ce2d05..01b047d86cd1 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -4136,6 +4136,9 @@ TRACE_DEFINE_ENUM(XFS_DAS_NODE_ADD);
 TRACE_DEFINE_ENUM(XFS_DAS_RMTBLK);
 TRACE_DEFINE_ENUM(XFS_DAS_RM_NAME);
 TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
+TRACE_DEFINE_ENUM(XFS_DAS_SF_REMOVE);
+TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE);
+TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_SET_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REPLACE);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 14/18] xfs: switch attr remove to xfs_attri_set_iter
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (12 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 13/18] xfs: introduce attr remove initial states into xfs_attr_set_iter Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 23:40   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 15/18] xfs: remove xfs_attri_remove_iter Dave Chinner
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Now that xfs_attri_set_iter() has initial states for removing
attributes, switch the pure attribute removal code over to using it.
This requires attrs being removed to always be marked as INCOMPLETE
before we start the removal due to the fact we look up the attr to
remove again in xfs_attr_node_remove_attr().

Note: this drops the fillstate/refillstate optimisations from
the remove path that avoid having to look up the path again after
setting the incomplete flag and removeing remote attrs. Restoring
that optimisation to this path is future Dave's problem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 21 +++++++++------------
 fs/xfs/libxfs/xfs_attr.h | 10 ++++++++++
 fs/xfs/xfs_attr_item.c   | 31 +++++++------------------------
 3 files changed, 26 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index a6a9b1f8dce6..b935727cf517 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -499,13 +499,11 @@ int xfs_attr_node_removename_setup(
 	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
 		XFS_ATTR_LEAF_MAGIC);
 
-	if (args->rmtblkno > 0) {
-		error = xfs_attr_leaf_mark_incomplete(args, *state);
-		if (error)
-			goto out;
-
+	error = xfs_attr_leaf_mark_incomplete(args, *state);
+	if (error)
+		goto out;
+	if (args->rmtblkno > 0)
 		error = xfs_attr_rmtval_invalidate(args);
-	}
 out:
 	if (error)
 		xfs_da_state_free(*state);
@@ -821,7 +819,7 @@ xfs_attr_defer_remove(
 	if (error)
 		return error;
 
-	new->xattri_dela_state = XFS_DAS_UNINIT;
+	new->xattri_dela_state = xfs_attr_init_remove_state(args);
 	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
 	trace_xfs_attr_defer_remove(new->xattri_dela_state, args->dp);
 
@@ -1391,16 +1389,15 @@ xfs_attr_node_remove_attr(
 {
 	struct xfs_da_args		*args = attr->xattri_da_args;
 	struct xfs_da_state		*state = NULL;
-	struct xfs_mount		*mp = args->dp->i_mount;
 	int				retval = 0;
 	int				error = 0;
 
 	/*
-	 * Re-find the "old" attribute entry after any split ops. The INCOMPLETE
-	 * flag means that we will find the "old" attr, not the "new" one.
+	 * The attr we are removing has already been marked incomplete, so
+	 * we need to set the filter appropriately to re-find the "old"
+	 * attribute entry after any split ops.
 	 */
-	if (!xfs_has_larp(mp))
-		args->attr_filter |= XFS_ATTR_INCOMPLETE;
+	args->attr_filter |= XFS_ATTR_INCOMPLETE;
 	state = xfs_da_state_alloc(args);
 	state->inleaf = 0;
 	error = xfs_da3_node_lookup_int(state, &retval);
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 7ea7c7fa31ac..6bef522533a4 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -593,6 +593,16 @@ xfs_attr_init_add_state(struct xfs_da_args *args)
 	return XFS_DAS_NODE_ADD;
 }
 
+static inline enum xfs_delattr_state
+xfs_attr_init_remove_state(struct xfs_da_args *args)
+{
+	if (xfs_attr_is_shortform(args->dp))
+		return XFS_DAS_SF_REMOVE;
+	if (xfs_attr_is_leaf(args->dp))
+		return XFS_DAS_LEAF_REMOVE;
+	return XFS_DAS_NODE_REMOVE;
+}
+
 static inline enum xfs_delattr_state
 xfs_attr_init_replace_state(struct xfs_da_args *args)
 {
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 740a55d07660..fb9549e7ea96 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -296,12 +296,9 @@ xfs_attrd_item_release(
 STATIC int
 xfs_xattri_finish_update(
 	struct xfs_attr_item		*attr,
-	struct xfs_attrd_log_item	*attrdp,
-	uint32_t			op_flags)
+	struct xfs_attrd_log_item	*attrdp)
 {
 	struct xfs_da_args		*args = attr->xattri_da_args;
-	unsigned int			op = op_flags &
-					     XFS_ATTR_OP_FLAGS_TYPE_MASK;
 	int				error;
 
 	if (XFS_TEST_ERROR(false, args->dp->i_mount, XFS_ERRTAG_LARP)) {
@@ -309,22 +306,9 @@ xfs_xattri_finish_update(
 		goto out;
 	}
 
-	switch (op) {
-	case XFS_ATTR_OP_FLAGS_SET:
-	case XFS_ATTR_OP_FLAGS_REPLACE:
-		error = xfs_attr_set_iter(attr);
-		if (!error && attr->xattri_dela_state != XFS_DAS_DONE)
-			error = -EAGAIN;
-		break;
-	case XFS_ATTR_OP_FLAGS_REMOVE:
-		ASSERT(XFS_IFORK_Q(args->dp));
-		error = xfs_attr_remove_iter(attr);
-		break;
-	default:
-		error = -EFSCORRUPTED;
-		break;
-	}
-
+	error = xfs_attr_set_iter(attr);
+	if (!error && attr->xattri_dela_state != XFS_DAS_DONE)
+		error = -EAGAIN;
 out:
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
@@ -432,8 +416,7 @@ xfs_attr_finish_item(
 	 */
 	attr->xattri_da_args->trans = tp;
 
-	error = xfs_xattri_finish_update(attr, done_item,
-					 attr->xattri_op_flags);
+	error = xfs_xattri_finish_update(attr, done_item);
 	if (error != -EAGAIN)
 		kmem_free(attr);
 
@@ -581,7 +564,7 @@ xfs_attri_item_recover(
 		attr->xattri_dela_state = xfs_attr_init_add_state(args);
 		break;
 	case XFS_ATTR_OP_FLAGS_REMOVE:
-		attr->xattri_dela_state = XFS_DAS_UNINIT;
+		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
 		break;
 	default:
 		ASSERT(0);
@@ -600,7 +583,7 @@ xfs_attri_item_recover(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	ret = xfs_xattri_finish_update(attr, done_item, attrp->alfi_op_flags);
+	ret = xfs_xattri_finish_update(attr, done_item);
 	if (ret == -EAGAIN) {
 		/* There's more work to do, so add it to this transaction */
 		xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 15/18] xfs: remove xfs_attri_remove_iter
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (13 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 14/18] xfs: switch attr remove to xfs_attri_set_iter Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 23:42   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 16/18] xfs: use XFS_DA_OP flags in deferred attr ops Dave Chinner
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_attri_remove_iter is not used anymore, so remove it and all the
infrastructure it uses and is needed to drive it. THe
xfs_attr_refillstate() function now throws an unused warning, so
isolate the xfs_attr_fillstate()/xfs_attr_refillstate() code pair
with an #if 0 and a comment explaining why we want to keep this code
and restore the optimisation it provides in the near future.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 418 +++++++++++++--------------------------
 1 file changed, 139 insertions(+), 279 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index b935727cf517..8be76f8d11c5 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -64,10 +64,6 @@ STATIC int xfs_attr_node_addname_find_attr(struct xfs_attr_item *attr);
 STATIC int xfs_attr_node_remove_attr(struct xfs_attr_item *attr);
 STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 				 struct xfs_da_state **state);
-STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
-STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
-STATIC int xfs_attr_node_removename(struct xfs_da_args *args,
-				    struct xfs_da_state *state);
 
 int
 xfs_inode_hasattr(
@@ -100,6 +96,123 @@ xfs_attr_is_leaf(
 	return imap.br_startoff == 0 && imap.br_blockcount == 1;
 }
 
+/*
+ * XXX (dchinner): name path state saving and refilling is an optimisation to
+ * avoid needing to look up name entries after rolling transactions removing
+ * remote xattr blocks between the name entry lookup and name entry removal.
+ * This optimisation got sidelined when combining the set and remove state
+ * machines, but the code has been left in place because it is worthwhile to
+ * restore the optimisation once the combined state machine paths have settled.
+ *
+ * This comment is a public service announcement to remind Future Dave that he
+ * still needs to restore this code to working order.
+ */
+#if 0
+/*
+ * Fill in the disk block numbers in the state structure for the buffers
+ * that are attached to the state structure.
+ * This is done so that we can quickly reattach ourselves to those buffers
+ * after some set of transaction commits have released these buffers.
+ */
+static int
+xfs_attr_fillstate(xfs_da_state_t *state)
+{
+	xfs_da_state_path_t *path;
+	xfs_da_state_blk_t *blk;
+	int level;
+
+	trace_xfs_attr_fillstate(state->args);
+
+	/*
+	 * Roll down the "path" in the state structure, storing the on-disk
+	 * block number for those buffers in the "path".
+	 */
+	path = &state->path;
+	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
+	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
+		if (blk->bp) {
+			blk->disk_blkno = xfs_buf_daddr(blk->bp);
+			blk->bp = NULL;
+		} else {
+			blk->disk_blkno = 0;
+		}
+	}
+
+	/*
+	 * Roll down the "altpath" in the state structure, storing the on-disk
+	 * block number for those buffers in the "altpath".
+	 */
+	path = &state->altpath;
+	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
+	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
+		if (blk->bp) {
+			blk->disk_blkno = xfs_buf_daddr(blk->bp);
+			blk->bp = NULL;
+		} else {
+			blk->disk_blkno = 0;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Reattach the buffers to the state structure based on the disk block
+ * numbers stored in the state structure.
+ * This is done after some set of transaction commits have released those
+ * buffers from our grip.
+ */
+static int
+xfs_attr_refillstate(xfs_da_state_t *state)
+{
+	xfs_da_state_path_t *path;
+	xfs_da_state_blk_t *blk;
+	int level, error;
+
+	trace_xfs_attr_refillstate(state->args);
+
+	/*
+	 * Roll down the "path" in the state structure, storing the on-disk
+	 * block number for those buffers in the "path".
+	 */
+	path = &state->path;
+	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
+	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
+		if (blk->disk_blkno) {
+			error = xfs_da3_node_read_mapped(state->args->trans,
+					state->args->dp, blk->disk_blkno,
+					&blk->bp, XFS_ATTR_FORK);
+			if (error)
+				return error;
+		} else {
+			blk->bp = NULL;
+		}
+	}
+
+	/*
+	 * Roll down the "altpath" in the state structure, storing the on-disk
+	 * block number for those buffers in the "altpath".
+	 */
+	path = &state->altpath;
+	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
+	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
+		if (blk->disk_blkno) {
+			error = xfs_da3_node_read_mapped(state->args->trans,
+					state->args->dp, blk->disk_blkno,
+					&blk->bp, XFS_ATTR_FORK);
+			if (error)
+				return error;
+		} else {
+			blk->bp = NULL;
+		}
+	}
+
+	return 0;
+}
+#else
+static int xfs_attr_fillstate(xfs_da_state_t *state) { return 0; }
+#endif
+
 /*========================================================================
  * Overall external interface routines.
  *========================================================================*/
@@ -548,25 +661,16 @@ xfs_attr_leaf_remove_attr(
  */
 static int
 xfs_attr_leaf_shrink(
-	struct xfs_da_args	*args,
-	struct xfs_da_state     *state)
+	struct xfs_da_args	*args)
 {
 	struct xfs_inode	*dp = args->dp;
-	int			error, forkoff;
 	struct xfs_buf		*bp;
+	int			forkoff;
+	int			error;
 
 	if (!xfs_attr_is_leaf(dp))
 		return 0;
 
-	/*
-	 * Have to get rid of the copy of this dabuf in the state.
-	 */
-	if (state) {
-		ASSERT(state->path.active == 1);
-		ASSERT(state->path.blk[0].bp);
-		state->path.blk[0].bp = NULL;
-	}
-
 	error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp);
 	if (error)
 		return error;
@@ -709,7 +813,7 @@ xfs_attr_set_iter(
 	case XFS_DAS_NODE_REMOVE_ATTR:
 		error = xfs_attr_node_remove_attr(attr);
 		if (!error)
-			error = xfs_attr_leaf_shrink(args, NULL);
+			error = xfs_attr_leaf_shrink(args);
 		attr->xattri_dela_state = XFS_DAS_DONE;
 		break;
 	default:
@@ -1382,6 +1486,24 @@ xfs_attr_node_try_addname(
 	return error;
 }
 
+static int
+xfs_attr_node_removename(
+	struct xfs_da_args	*args,
+	struct xfs_da_state	*state)
+{
+	struct xfs_da_state_blk	*blk;
+	int			retval;
+
+	/*
+	 * Remove the name and update the hashvals in the tree.
+	 */
+	blk = &state->path.blk[state->path.active-1];
+	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
+	retval = xfs_attr3_leaf_remove(blk->bp, args);
+	xfs_da3_fixhashpath(state, &state->path);
+
+	return retval;
+}
 
 static int
 xfs_attr_node_remove_attr(
@@ -1424,268 +1546,6 @@ xfs_attr_node_remove_attr(
 	return retval;
 }
 
-
-STATIC int
-xfs_attr_node_removename(
-	struct xfs_da_args	*args,
-	struct xfs_da_state	*state)
-{
-	struct xfs_da_state_blk	*blk;
-	int			retval;
-
-	/*
-	 * Remove the name and update the hashvals in the tree.
-	 */
-	blk = &state->path.blk[state->path.active-1];
-	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
-	retval = xfs_attr3_leaf_remove(blk->bp, args);
-	xfs_da3_fixhashpath(state, &state->path);
-
-	return retval;
-}
-
-/*
- * Remove the attribute specified in @args.
- *
- * This will involve walking down the Btree, and may involve joining
- * leaf nodes and even joining intermediate nodes up to and including
- * the root node (a special case of an intermediate node).
- *
- * This routine is meant to function as either an in-line or delayed operation,
- * and may return -EAGAIN when the transaction needs to be rolled.  Calling
- * functions will need to handle this, and call the function until a
- * successful error code is returned.
- */
-int
-xfs_attr_remove_iter(
-	struct xfs_attr_item		*attr)
-{
-	struct xfs_da_args		*args = attr->xattri_da_args;
-	struct xfs_da_state		*state = attr->xattri_da_state;
-	int				retval, error = 0;
-	struct xfs_inode		*dp = args->dp;
-
-	trace_xfs_attr_node_removename(args);
-
-	switch (attr->xattri_dela_state) {
-	case XFS_DAS_UNINIT:
-		if (!xfs_inode_hasattr(dp))
-			return -ENOATTR;
-
-		/*
-		 * Shortform or leaf formats don't require transaction rolls and
-		 * thus state transitions. Call the right helper and return.
-		 */
-		if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL)
-			return xfs_attr_sf_removename(args);
-
-		if (xfs_attr_is_leaf(dp))
-			return xfs_attr_leaf_removename(args);
-
-		/*
-		 * Node format may require transaction rolls. Set up the
-		 * state context and fall into the state machine.
-		 */
-		if (!attr->xattri_da_state) {
-			error = xfs_attr_node_removename_setup(attr);
-			if (error)
-				return error;
-			state = attr->xattri_da_state;
-		}
-
-		fallthrough;
-	case XFS_DAS_RMTBLK:
-		attr->xattri_dela_state = XFS_DAS_RMTBLK;
-
-		/*
-		 * If there is an out-of-line value, de-allocate the blocks.
-		 * This is done before we remove the attribute so that we don't
-		 * overflow the maximum size of a transaction and/or hit a
-		 * deadlock.
-		 */
-		if (args->rmtblkno > 0) {
-			/*
-			 * May return -EAGAIN. Roll and repeat until all remote
-			 * blocks are removed.
-			 */
-			error = xfs_attr_rmtval_remove(attr);
-			if (error == -EAGAIN) {
-				trace_xfs_attr_remove_iter_return(
-					attr->xattri_dela_state, args->dp);
-				return error;
-			} else if (error) {
-				goto out;
-			}
-
-			/*
-			 * Refill the state structure with buffers (the prior
-			 * calls released our buffers) and close out this
-			 * transaction before proceeding.
-			 */
-			ASSERT(args->rmtblkno == 0);
-			error = xfs_attr_refillstate(state);
-			if (error)
-				goto out;
-
-			attr->xattri_dela_state = XFS_DAS_RM_NAME;
-			trace_xfs_attr_remove_iter_return(
-					attr->xattri_dela_state, args->dp);
-			return -EAGAIN;
-		}
-
-		fallthrough;
-	case XFS_DAS_RM_NAME:
-		/*
-		 * If we came here fresh from a transaction roll, reattach all
-		 * the buffers to the current transaction.
-		 */
-		if (attr->xattri_dela_state == XFS_DAS_RM_NAME) {
-			error = xfs_attr_refillstate(state);
-			if (error)
-				goto out;
-		}
-
-		retval = xfs_attr_node_removename(args, state);
-
-		/*
-		 * Check to see if the tree needs to be collapsed. If so, roll
-		 * the transacton and fall into the shrink state.
-		 */
-		if (retval && (state->path.active > 1)) {
-			error = xfs_da3_join(state);
-			if (error)
-				goto out;
-
-			attr->xattri_dela_state = XFS_DAS_RM_SHRINK;
-			trace_xfs_attr_remove_iter_return(
-					attr->xattri_dela_state, args->dp);
-			return -EAGAIN;
-		}
-
-		fallthrough;
-	case XFS_DAS_RM_SHRINK:
-		/*
-		 * If the result is small enough, push it all into the inode.
-		 * This is our final state so it's safe to return a dirty
-		 * transaction.
-		 */
-		if (xfs_attr_is_leaf(dp))
-			error = xfs_attr_leaf_shrink(args, state);
-		ASSERT(error != -EAGAIN);
-		break;
-	default:
-		ASSERT(0);
-		error = -EINVAL;
-		goto out;
-	}
-out:
-	if (state)
-		xfs_da_state_free(state);
-	return error;
-}
-
-/*
- * Fill in the disk block numbers in the state structure for the buffers
- * that are attached to the state structure.
- * This is done so that we can quickly reattach ourselves to those buffers
- * after some set of transaction commits have released these buffers.
- */
-STATIC int
-xfs_attr_fillstate(xfs_da_state_t *state)
-{
-	xfs_da_state_path_t *path;
-	xfs_da_state_blk_t *blk;
-	int level;
-
-	trace_xfs_attr_fillstate(state->args);
-
-	/*
-	 * Roll down the "path" in the state structure, storing the on-disk
-	 * block number for those buffers in the "path".
-	 */
-	path = &state->path;
-	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
-	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
-		if (blk->bp) {
-			blk->disk_blkno = xfs_buf_daddr(blk->bp);
-			blk->bp = NULL;
-		} else {
-			blk->disk_blkno = 0;
-		}
-	}
-
-	/*
-	 * Roll down the "altpath" in the state structure, storing the on-disk
-	 * block number for those buffers in the "altpath".
-	 */
-	path = &state->altpath;
-	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
-	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
-		if (blk->bp) {
-			blk->disk_blkno = xfs_buf_daddr(blk->bp);
-			blk->bp = NULL;
-		} else {
-			blk->disk_blkno = 0;
-		}
-	}
-
-	return 0;
-}
-
-/*
- * Reattach the buffers to the state structure based on the disk block
- * numbers stored in the state structure.
- * This is done after some set of transaction commits have released those
- * buffers from our grip.
- */
-STATIC int
-xfs_attr_refillstate(xfs_da_state_t *state)
-{
-	xfs_da_state_path_t *path;
-	xfs_da_state_blk_t *blk;
-	int level, error;
-
-	trace_xfs_attr_refillstate(state->args);
-
-	/*
-	 * Roll down the "path" in the state structure, storing the on-disk
-	 * block number for those buffers in the "path".
-	 */
-	path = &state->path;
-	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
-	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
-		if (blk->disk_blkno) {
-			error = xfs_da3_node_read_mapped(state->args->trans,
-					state->args->dp, blk->disk_blkno,
-					&blk->bp, XFS_ATTR_FORK);
-			if (error)
-				return error;
-		} else {
-			blk->bp = NULL;
-		}
-	}
-
-	/*
-	 * Roll down the "altpath" in the state structure, storing the on-disk
-	 * block number for those buffers in the "altpath".
-	 */
-	path = &state->altpath;
-	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
-	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
-		if (blk->disk_blkno) {
-			error = xfs_da3_node_read_mapped(state->args->trans,
-					state->args->dp, blk->disk_blkno,
-					&blk->bp, XFS_ATTR_FORK);
-			if (error)
-				return error;
-		} else {
-			blk->bp = NULL;
-		}
-	}
-
-	return 0;
-}
-
 /*
  * Retrieve the attribute data from a node attribute list.
  *
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 16/18] xfs: use XFS_DA_OP flags in deferred attr ops
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (14 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 15/18] xfs: remove xfs_attri_remove_iter Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 22:20   ` [PATCH 16/18 v2] " Dave Chinner
  2022-05-09  0:41 ` [PATCH 17/18] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework Dave Chinner
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We currently store the high level attr operation in
args->attr_flags. This falgs are what the VFS is telling us to do,
but don't necessarily match what we are doing in the low level
modification state machine. e.g. XATTR_REPLACE implies both
XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME because it is doing both a
remove and adding a new attr.

However, deep in the individual state machine operations, we check
errors against this high level VFS op flags, not the low level
XFS_DA_OP flags. Indeed, we don't even have a low level flag for
a REMOVE operation, so the only way we know we are doing a remove
is the complete absence of XATTR_REPLACE, XATTR_CREATE,
XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME. And because there are other
flags in these fields, this is a pain to check if we need to.

As the XFS_DA_OP flags are only needed once the deferred operations
are set up, set these flags appropriately when we set the initial
operation state. We also introduce a XFS_DA_OP_REMOVE flag to make
it easy to know that we are doing a remove operation.

With these, we can remove the use of XATTR_REPLACE and XATTR_CREATE
in low level lookup operations, and manipulate the low level flags
according to the low level context that is operating. e.g. log
recovery does not have a VFS xattr operation state to copy into
args->attr_flags, and the low level state machine ops we do for
recovery do not match the high level VFS operations that were in
progress when the system failed...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_attr.c      | 124 +++++++++++++++++++---------------
 fs/xfs/libxfs/xfs_attr.h      |   3 +
 fs/xfs/libxfs/xfs_attr_leaf.c |   2 +-
 fs/xfs/libxfs/xfs_da_btree.h  |   8 ++-
 4 files changed, 77 insertions(+), 60 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 8be76f8d11c5..344497e37813 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -467,7 +467,7 @@ xfs_attr_leaf_addname(
 	 */
 	if (args->rmtblkno)
 		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
-	else if (args->op_flags & XFS_DA_OP_RENAME)
+	else if (args->op_flags & XFS_DA_OP_REPLACE)
 		xfs_attr_dela_state_set_replace(attr, XFS_DAS_LEAF_REPLACE);
 	else
 		attr->xattri_dela_state = XFS_DAS_DONE;
@@ -512,7 +512,7 @@ xfs_attr_node_addname(
 
 	if (args->rmtblkno)
 		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
-	else if (args->op_flags & XFS_DA_OP_RENAME)
+	else if (args->op_flags & XFS_DA_OP_REPLACE)
 		xfs_attr_dela_state_set_replace(attr, XFS_DAS_NODE_REPLACE);
 	else
 		attr->xattri_dela_state = XFS_DAS_DONE;
@@ -548,7 +548,7 @@ xfs_attr_rmtval_alloc(
 		return error;
 
 	/* If this is not a rename, clear the incomplete flag and we're done. */
-	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
+	if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
 		error = xfs_attr3_leaf_clearflag(args);
 		attr->xattri_dela_state = XFS_DAS_DONE;
 	} else {
@@ -967,8 +967,6 @@ xfs_attr_set(
 
 	if (args->value) {
 		XFS_STATS_INC(mp, xs_attr_set);
-
-		args->op_flags |= XFS_DA_OP_ADDNAME;
 		args->total = xfs_attr_calc_size(args, &local);
 
 		/*
@@ -1126,28 +1124,41 @@ static inline int xfs_attr_sf_totsize(struct xfs_inode *dp)
  * Add a name to the shortform attribute list structure
  * This is the external routine.
  */
-STATIC int
-xfs_attr_shortform_addname(xfs_da_args_t *args)
+static int
+xfs_attr_shortform_addname(
+	struct xfs_da_args	*args)
 {
-	int newsize, forkoff, retval;
+	int			newsize, forkoff;
+	int			error;
 
 	trace_xfs_attr_sf_addname(args);
 
-	retval = xfs_attr_shortform_lookup(args);
-	if (retval == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
-		return retval;
-	if (retval == -EEXIST) {
-		if (args->attr_flags & XATTR_CREATE)
-			return retval;
-		retval = xfs_attr_sf_removename(args);
-		if (retval)
-			return retval;
+	error = xfs_attr_shortform_lookup(args);
+	switch (error) {
+	case -ENOATTR:
+		if (args->op_flags & XFS_DA_OP_REPLACE)
+			return error;
+		break;
+	case -EEXIST:
+		if (!(args->op_flags & XFS_DA_OP_REPLACE))
+			return error;
+
+		error = xfs_attr_sf_removename(args);
+		if (error)
+			return error;
+
 		/*
-		 * Since we have removed the old attr, clear ATTR_REPLACE so
-		 * that the leaf format add routine won't trip over the attr
-		 * not being around.
+		 * Since we have removed the old attr, clear XFS_DA_OP_REPLACE
+		 * so that the new attr doesn't fit in shortform format, the
+		 * leaf format add routine won't trip over the attr not being
+		 * around.
 		 */
-		args->attr_flags &= ~XATTR_REPLACE;
+		args->op_flags &= ~XFS_DA_OP_REPLACE;
+		break;
+	case 0:
+		break;
+	default:
+		return error;
 	}
 
 	if (args->namelen >= XFS_ATTR_SF_ENTSIZE_MAX ||
@@ -1229,28 +1240,30 @@ xfs_attr_leaf_try_add(
 	 * Look up the xattr name to set the insertion point for the new xattr.
 	 */
 	error = xfs_attr3_leaf_lookup_int(bp, args);
-	if (error != -ENOATTR && error != -EEXIST)
-		goto out_brelse;
-	if (error == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
-		goto out_brelse;
-	if (error == -EEXIST) {
-		if (args->attr_flags & XATTR_CREATE)
+	switch (error) {
+	case -ENOATTR:
+		if (args->op_flags & XFS_DA_OP_REPLACE)
+			goto out_brelse;
+		break;
+	case -EEXIST:
+		if (!(args->op_flags & XFS_DA_OP_REPLACE))
 			goto out_brelse;
 
 		trace_xfs_attr_leaf_replace(args);
-
-		/* save the attribute state for later removal*/
-		args->op_flags |= XFS_DA_OP_RENAME;	/* an atomic rename */
-		xfs_attr_save_rmt_blk(args);
-
 		/*
-		 * clear the remote attr state now that it is saved so that the
-		 * values reflect the state of the attribute we are about to
+		 * Save the existing remote attr state so that the current
+		 * values reflect the state of the new attribute we are about to
 		 * add, not the attribute we just found and will remove later.
 		 */
+		xfs_attr_save_rmt_blk(args);
 		args->rmtblkno = 0;
 		args->rmtblkcnt = 0;
 		args->rmtvaluelen = 0;
+		break;
+	case 0:
+		break;
+	default:
+		goto out_brelse;
 	}
 
 	return xfs_attr3_leaf_add(bp, args);
@@ -1389,46 +1402,45 @@ xfs_attr_node_hasname(
 
 STATIC int
 xfs_attr_node_addname_find_attr(
-	 struct xfs_attr_item		*attr)
+	 struct xfs_attr_item	*attr)
 {
-	struct xfs_da_args		*args = attr->xattri_da_args;
-	int				retval;
+	struct xfs_da_args	*args = attr->xattri_da_args;
+	int			error;
 
 	/*
 	 * Search to see if name already exists, and get back a pointer
 	 * to where it should go.
 	 */
-	retval = xfs_attr_node_hasname(args, &attr->xattri_da_state);
-	if (retval != -ENOATTR && retval != -EEXIST)
-		goto error;
-
-	if (retval == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
-		goto error;
-	if (retval == -EEXIST) {
-		if (args->attr_flags & XATTR_CREATE)
+	error = xfs_attr_node_hasname(args, &attr->xattri_da_state);
+	switch (error) {
+	case -ENOATTR:
+		if (args->op_flags & XFS_DA_OP_REPLACE)
+			goto error;
+		break;
+	case -EEXIST:
+		if (!(args->op_flags & XFS_DA_OP_REPLACE))
 			goto error;
 
-		trace_xfs_attr_node_replace(args);
-
-		/* save the attribute state for later removal*/
-		args->op_flags |= XFS_DA_OP_RENAME;	/* atomic rename op */
-		xfs_attr_save_rmt_blk(args);
 
+		trace_xfs_attr_node_replace(args);
 		/*
-		 * clear the remote attr state now that it is saved so that the
-		 * values reflect the state of the attribute we are about to
+		 * Save the existing remote attr state so that the current
+		 * values reflect the state of the new attribute we are about to
 		 * add, not the attribute we just found and will remove later.
 		 */
-		args->rmtblkno = 0;
-		args->rmtblkcnt = 0;
-		args->rmtvaluelen = 0;
+		xfs_attr_save_rmt_blk(args);
+		break;
+	case 0:
+		break;
+	default:
+		goto error;
 	}
 
 	return 0;
 error:
 	if (attr->xattri_da_state)
 		xfs_da_state_free(attr->xattri_da_state);
-	return retval;
+	return error;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 6bef522533a4..e93efc8b11cd 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -584,6 +584,7 @@ xfs_attr_is_shortform(
 static inline enum xfs_delattr_state
 xfs_attr_init_add_state(struct xfs_da_args *args)
 {
+	args->op_flags |= XFS_DA_OP_ADDNAME;
 	if (!args->dp->i_afp)
 		return XFS_DAS_DONE;
 	if (xfs_attr_is_shortform(args->dp))
@@ -596,6 +597,7 @@ xfs_attr_init_add_state(struct xfs_da_args *args)
 static inline enum xfs_delattr_state
 xfs_attr_init_remove_state(struct xfs_da_args *args)
 {
+	args->op_flags |= XFS_DA_OP_REMOVE;
 	if (xfs_attr_is_shortform(args->dp))
 		return XFS_DAS_SF_REMOVE;
 	if (xfs_attr_is_leaf(args->dp))
@@ -606,6 +608,7 @@ xfs_attr_init_remove_state(struct xfs_da_args *args)
 static inline enum xfs_delattr_state
 xfs_attr_init_replace_state(struct xfs_da_args *args)
 {
+	args->op_flags |= XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE;
 	return xfs_attr_init_add_state(args);
 }
 
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index e90bfd9d7551..53d02ce9ed78 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -1492,7 +1492,7 @@ xfs_attr3_leaf_add_work(
 	entry->flags = args->attr_filter;
 	if (tmp)
 		entry->flags |= XFS_ATTR_LOCAL;
-	if (args->op_flags & XFS_DA_OP_RENAME) {
+	if (args->op_flags & XFS_DA_OP_REPLACE) {
 		if (!xfs_has_larp(mp))
 			entry->flags |= XFS_ATTR_INCOMPLETE;
 		if ((args->blkno2 == args->blkno) &&
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index deb368d041e3..468ca70cd35d 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -85,19 +85,21 @@ typedef struct xfs_da_args {
  * Operation flags:
  */
 #define XFS_DA_OP_JUSTCHECK	(1u << 0) /* check for ok with no space */
-#define XFS_DA_OP_RENAME	(1u << 1) /* this is an atomic rename op */
+#define XFS_DA_OP_REPLACE	(1u << 1) /* this is an atomic replace op */
 #define XFS_DA_OP_ADDNAME	(1u << 2) /* this is an add operation */
 #define XFS_DA_OP_OKNOENT	(1u << 3) /* lookup op, ENOENT ok, else die */
 #define XFS_DA_OP_CILOOKUP	(1u << 4) /* lookup returns CI name if found */
 #define XFS_DA_OP_NOTIME	(1u << 5) /* don't update inode timestamps */
+#define XFS_DA_OP_REMOVE	(1u << 6) /* this is a remove operation */
 
 #define XFS_DA_OP_FLAGS \
 	{ XFS_DA_OP_JUSTCHECK,	"JUSTCHECK" }, \
-	{ XFS_DA_OP_RENAME,	"RENAME" }, \
+	{ XFS_DA_OP_REPLACE,	"REPLACE" }, \
 	{ XFS_DA_OP_ADDNAME,	"ADDNAME" }, \
 	{ XFS_DA_OP_OKNOENT,	"OKNOENT" }, \
 	{ XFS_DA_OP_CILOOKUP,	"CILOOKUP" }, \
-	{ XFS_DA_OP_NOTIME,	"NOTIME" }
+	{ XFS_DA_OP_NOTIME,	"NOTIME" }, \
+	{ XFS_DA_OP_REMOVE,	"REMOVE" }
 
 /*
  * Storage for holding state during Btree searches and split/join ops.
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 17/18] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (15 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 16/18] xfs: use XFS_DA_OP flags in deferred attr ops Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 22:31   ` Alli
  2022-05-10 23:53   ` Darrick J. Wong
  2022-05-09  0:41 ` [PATCH 18/18] xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify Dave Chinner
  2022-05-10 22:27 ` [PATCH 19/18] xfs: can't use kmem_zalloc() for attribute buffers Dave Chinner
  18 siblings, 2 replies; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We can't use the same algorithm for replacing an existing attribute
when logging attributes. The existing algorithm is essentially:

1. create new attr w/ INCOMPLETE
2. atomically flip INCOMPLETE flags between old + new attribute
3. remove old attr which is marked w/ INCOMPLETE

This algorithm guarantees that we see either the old or new
attribute, and if we fail after the atomic flag flip, we don't have
to recover the removal of the old attr because we never see
INCOMPLETE attributes in lookups.

For logged attributes, however, this does not work. The logged
attribute intents do not track the work that has been done as the
transaction rolls, and hence the only recovery mechanism we have is
"run the replace operation from scratch".

This is further exacerbated by the attempt to avoid needing the
INCOMPLETE flag to create an atomic swap. This means we can create
a second active attribute of the same name before we remove the
original. If we fail at any point after the create but before the
removal has completed, we end up with duplicate attributes in
the attr btree and recovery only tries to replace one of them.

There are several other failure modes where we can leave partially
allocated remote attributes that expose stale data, partially free
remote attributes that enable UAF based stale data exposure, etc.

TO fix this, we need a different algorithm for replace operations
when LARP is enabled. Luckily, it's not that complex if we take the
right first step. That is, the first thing we log is the attri
intent with the new name/value pair and mark the old attr as
INCOMPLETE in the same transaction.

From there, we then remove the old attr and keep relogging the
new name/value in the intent, such that we always know that we have
to create the new attr in recovery. Once the old attr is removed,
we then run a normal ATTR_CREATE operation relogging the intent as
we go. If the new attr is local, then it gets created in a single
atomic transaction that also logs the final intent done. If the new
attr is remote, the we set INCOMPLETE on the new attr while we
allocate and set the remote value, and then we clear the INCOMPLETE
flag at in the last transaction taht logs the final intent done.

If we fail at any point in this algorithm, log recovery will always
see the same state on disk: the new name/value in the intent, and
either an INCOMPLETE attr or no attr in the attr btree. If we find
an INCOMPLETE attr, we run the full replace starting with removing
the INCOMPLETE attr. If we don't find it, then we simply create the
new attr.

Notably, recovery of a failed create that has an INCOMPLETE flag set
is now the same - we start with the lookup of the INCOMPLETE attr,
and if that exists then we do the full replace recovery process,
otherwise we just create the new attr.

Hence changing the way we do the replace operation when LARP is
enabled allows us to use the same log recovery algorithm for both
the ATTR_CREATE and ATTR_REPLACE operations. This is also the same
algorithm we use for runtime ATTR_REPLACE operations (except for the
step setting up the initial conditions).

The result is that:

- ATTR_CREATE uses the same algorithm regardless of whether LARP is
  enabled or not
- ATTR_REPLACE with larp=0 is identical to the old algorithm
- ATTR_REPLACE with larp=1 runs an unmodified attr removal algorithm
  from the larp=0 code and then runs the unmodified ATTR_CREATE
  code.
- log recovery when larp=1 runs the same ATTR_REPLACE algorithm as
  it uses at runtime.

Because the state machine is now quite clean, changing the algorithm
is really just a case of changing the initial state and how the
states link together for the ATTR_REPLACE case. Hence it's not a
huge amoutn of code for what is a fairly substantial rework
of the attr logging and recovery algorithm....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_attr.c      | 97 +++++++++++++++++++++--------------
 fs/xfs/libxfs/xfs_attr.h      | 49 +++++++++++-------
 fs/xfs/libxfs/xfs_attr_leaf.c | 44 +++++++++++++---
 fs/xfs/libxfs/xfs_da_btree.h  |  4 +-
 fs/xfs/xfs_attr_item.c        |  8 ++-
 fs/xfs/xfs_trace.h            |  7 +--
 6 files changed, 137 insertions(+), 72 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 344497e37813..2f6b9bfd7768 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -69,9 +69,12 @@ int
 xfs_inode_hasattr(
 	struct xfs_inode	*ip)
 {
-	if (!XFS_IFORK_Q(ip) ||
-	    (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
-	     ip->i_afp->if_nextents == 0))
+	if (!XFS_IFORK_Q(ip))
+		return 0;
+	if (!ip->i_afp)
+		return 0;
+	if (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
+	    ip->i_afp->if_nextents == 0)
 		return 0;
 	return 1;
 }
@@ -409,23 +412,30 @@ xfs_attr_sf_addname(
 }
 
 /*
- * When we bump the state to REPLACE, we may actually need to skip over the
- * state. When LARP mode is enabled, we don't need to run the atomic flags flip,
- * so we skip straight over the REPLACE state and go on to REMOVE_OLD.
+ * Handle the state change on completion of a multi-state attr operation.
+ *
+ * If the XFS_DA_OP_REPLACE flag is set, this means the operation was the first
+ * modification in a attr replace operation and we still have to do the second
+ * state, indicated by @replace_state.
+ *
+ * We consume the XFS_DA_OP_REPLACE flag so that when we are called again on
+ * completion of the second half of the attr replace operation we correctly
+ * signal that it is done.
  */
-static void
-xfs_attr_dela_state_set_replace(
+static enum xfs_delattr_state
+xfs_attr_complete_op(
 	struct xfs_attr_item	*attr,
-	enum xfs_delattr_state	replace)
+	enum xfs_delattr_state	replace_state)
 {
 	struct xfs_da_args	*args = attr->xattri_da_args;
+	bool			do_replace = args->op_flags & XFS_DA_OP_REPLACE;
 
-	ASSERT(replace == XFS_DAS_LEAF_REPLACE ||
-			replace == XFS_DAS_NODE_REPLACE);
-
-	attr->xattri_dela_state = replace;
-	if (xfs_has_larp(args->dp->i_mount))
-		attr->xattri_dela_state++;
+	args->op_flags &= ~XFS_DA_OP_REPLACE;
+	if (do_replace) {
+		args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
+		return replace_state;
+	}
+	return XFS_DAS_DONE;
 }
 
 static int
@@ -467,10 +477,9 @@ xfs_attr_leaf_addname(
 	 */
 	if (args->rmtblkno)
 		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
-	else if (args->op_flags & XFS_DA_OP_REPLACE)
-		xfs_attr_dela_state_set_replace(attr, XFS_DAS_LEAF_REPLACE);
 	else
-		attr->xattri_dela_state = XFS_DAS_DONE;
+		attr->xattri_dela_state = xfs_attr_complete_op(attr,
+							XFS_DAS_LEAF_REPLACE);
 out:
 	trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp);
 	return error;
@@ -512,10 +521,9 @@ xfs_attr_node_addname(
 
 	if (args->rmtblkno)
 		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
-	else if (args->op_flags & XFS_DA_OP_REPLACE)
-		xfs_attr_dela_state_set_replace(attr, XFS_DAS_NODE_REPLACE);
 	else
-		attr->xattri_dela_state = XFS_DAS_DONE;
+		attr->xattri_dela_state = xfs_attr_complete_op(attr,
+							XFS_DAS_NODE_REPLACE);
 out:
 	trace_xfs_attr_node_addname_return(attr->xattri_dela_state, args->dp);
 	return error;
@@ -547,18 +555,15 @@ xfs_attr_rmtval_alloc(
 	if (error)
 		return error;
 
-	/* If this is not a rename, clear the incomplete flag and we're done. */
-	if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
+	attr->xattri_dela_state = xfs_attr_complete_op(attr,
+						++attr->xattri_dela_state);
+	/*
+	 * If we are not doing a rename, we've finished the operation but still
+	 * have to clear the incomplete flag protecting the new attr from
+	 * exposing partially initialised state if we crash during creation.
+	 */
+	if (attr->xattri_dela_state == XFS_DAS_DONE)
 		error = xfs_attr3_leaf_clearflag(args);
-		attr->xattri_dela_state = XFS_DAS_DONE;
-	} else {
-		/*
-		 * We are running a REPLACE operation, so we need to bump the
-		 * state to the step in that operation.
-		 */
-		attr->xattri_dela_state++;
-		xfs_attr_dela_state_set_replace(attr, attr->xattri_dela_state);
-	}
 out:
 	trace_xfs_attr_rmtval_alloc(attr->xattri_dela_state, args->dp);
 	return error;
@@ -715,13 +720,24 @@ xfs_attr_set_iter(
 		return xfs_attr_node_addname(attr);
 
 	case XFS_DAS_SF_REMOVE:
-		attr->xattri_dela_state = XFS_DAS_DONE;
-		return xfs_attr_sf_removename(args);
+		error = xfs_attr_sf_removename(args);
+		attr->xattri_dela_state = xfs_attr_complete_op(attr,
+						xfs_attr_init_add_state(args));
+		break;
 	case XFS_DAS_LEAF_REMOVE:
-		attr->xattri_dela_state = XFS_DAS_DONE;
-		return xfs_attr_leaf_removename(args);
+		error = xfs_attr_leaf_removename(args);
+		attr->xattri_dela_state = xfs_attr_complete_op(attr,
+						xfs_attr_init_add_state(args));
+		break;
 	case XFS_DAS_NODE_REMOVE:
 		error = xfs_attr_node_removename_setup(attr);
+		if (error == -ENOATTR &&
+		    (args->op_flags & XFS_DA_OP_RECOVERY)) {
+			attr->xattri_dela_state = xfs_attr_complete_op(attr,
+						xfs_attr_init_add_state(args));
+			error = 0;
+			break;
+		}
 		if (error)
 			return error;
 		attr->xattri_dela_state = XFS_DAS_NODE_REMOVE_RMT;
@@ -807,14 +823,16 @@ xfs_attr_set_iter(
 
 	case XFS_DAS_LEAF_REMOVE_ATTR:
 		error = xfs_attr_leaf_remove_attr(attr);
-		attr->xattri_dela_state = XFS_DAS_DONE;
+		attr->xattri_dela_state = xfs_attr_complete_op(attr,
+						xfs_attr_init_add_state(args));
 		break;
 
 	case XFS_DAS_NODE_REMOVE_ATTR:
 		error = xfs_attr_node_remove_attr(attr);
 		if (!error)
 			error = xfs_attr_leaf_shrink(args);
-		attr->xattri_dela_state = XFS_DAS_DONE;
+		attr->xattri_dela_state = xfs_attr_complete_op(attr,
+						xfs_attr_init_add_state(args));
 		break;
 	default:
 		ASSERT(0);
@@ -1316,9 +1334,10 @@ xfs_attr_leaf_removename(
 	dp = args->dp;
 
 	error = xfs_attr_leaf_hasname(args, &bp);
-
 	if (error == -ENOATTR) {
 		xfs_trans_brelse(args->trans, bp);
+		if (args->op_flags & XFS_DA_OP_RECOVERY)
+			return 0;
 		return error;
 	} else if (error != -EEXIST)
 		return error;
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index e93efc8b11cd..7467d31cb3f1 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -444,18 +444,23 @@ struct xfs_attr_list_context {
  */
 enum xfs_delattr_state {
 	XFS_DAS_UNINIT		= 0,	/* No state has been set yet */
-	XFS_DAS_SF_ADD,			/* Initial shortform set iter state */
-	XFS_DAS_LEAF_ADD,		/* Initial leaf form set iter state */
-	XFS_DAS_NODE_ADD,		/* Initial node form set iter state */
-	XFS_DAS_RMTBLK,			/* Removing remote blks */
-	XFS_DAS_RM_NAME,		/* Remove attr name */
-	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
-
-	XFS_DAS_SF_REMOVE,		/* Initial shortform set iter state */
-	XFS_DAS_LEAF_REMOVE,		/* Initial leaf form set iter state */
-	XFS_DAS_NODE_REMOVE,		/* Initial node form set iter state */
-
-	/* Leaf state set/replace sequence */
+
+	/*
+	 * Initial sequence states. The replace setup code relies on the
+	 * ADD and REMOVE states for a specific format to be sequential so
+	 * that we can transform the initial operation to be performed
+	 * according to the xfs_has_larp() state easily.
+	 */
+	XFS_DAS_SF_ADD,			/* Initial sf add state */
+	XFS_DAS_SF_REMOVE,		/* Initial sf replace/remove state */
+
+	XFS_DAS_LEAF_ADD,		/* Initial leaf add state */
+	XFS_DAS_LEAF_REMOVE,		/* Initial leaf replace/remove state */
+
+	XFS_DAS_NODE_ADD,		/* Initial node add state */
+	XFS_DAS_NODE_REMOVE,		/* Initial node replace/remove state */
+
+	/* Leaf state set/replace/remove sequence */
 	XFS_DAS_LEAF_SET_RMT,		/* set a remote xattr from a leaf */
 	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
 	XFS_DAS_LEAF_REPLACE,		/* Perform replace ops on a leaf */
@@ -463,7 +468,7 @@ enum xfs_delattr_state {
 	XFS_DAS_LEAF_REMOVE_RMT,	/* A rename is removing remote blocks */
 	XFS_DAS_LEAF_REMOVE_ATTR,	/* Remove the old attr from a leaf */
 
-	/* Node state set/replace sequence, must match leaf state above */
+	/* Node state sequence, must match leaf state above */
 	XFS_DAS_NODE_SET_RMT,		/* set a remote xattr from a node */
 	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
 	XFS_DAS_NODE_REPLACE,		/* Perform replace ops on a node */
@@ -477,11 +482,11 @@ enum xfs_delattr_state {
 #define XFS_DAS_STRINGS	\
 	{ XFS_DAS_UNINIT,		"XFS_DAS_UNINIT" }, \
 	{ XFS_DAS_SF_ADD,		"XFS_DAS_SF_ADD" }, \
+	{ XFS_DAS_SF_REMOVE,		"XFS_DAS_SF_REMOVE" }, \
 	{ XFS_DAS_LEAF_ADD,		"XFS_DAS_LEAF_ADD" }, \
+	{ XFS_DAS_LEAF_REMOVE,		"XFS_DAS_LEAF_REMOVE" }, \
 	{ XFS_DAS_NODE_ADD,		"XFS_DAS_NODE_ADD" }, \
-	{ XFS_DAS_RMTBLK,		"XFS_DAS_RMTBLK" }, \
-	{ XFS_DAS_RM_NAME,		"XFS_DAS_RM_NAME" }, \
-	{ XFS_DAS_RM_SHRINK,		"XFS_DAS_RM_SHRINK" }, \
+	{ XFS_DAS_NODE_REMOVE,		"XFS_DAS_NODE_REMOVE" }, \
 	{ XFS_DAS_LEAF_SET_RMT,		"XFS_DAS_LEAF_SET_RMT" }, \
 	{ XFS_DAS_LEAF_ALLOC_RMT,	"XFS_DAS_LEAF_ALLOC_RMT" }, \
 	{ XFS_DAS_LEAF_REPLACE,		"XFS_DAS_LEAF_REPLACE" }, \
@@ -525,8 +530,7 @@ struct xfs_attr_item {
 	enum xfs_delattr_state		xattri_dela_state;
 
 	/*
-	 * Indicates if the attr operation is a set or a remove
-	 * XFS_ATTR_OP_FLAGS_{SET,REMOVE}
+	 * Attr operation being performed - XFS_ATTR_OP_FLAGS_*
 	 */
 	unsigned int			xattri_op_flags;
 
@@ -605,10 +609,19 @@ xfs_attr_init_remove_state(struct xfs_da_args *args)
 	return XFS_DAS_NODE_REMOVE;
 }
 
+/*
+ * If we are logging the attributes, then we have to start with removal of the
+ * old attribute so that there is always consistent state that we can recover
+ * from if the system goes down part way through. We always log the new attr
+ * value, so even when we remove the attr first we still have the information in
+ * the log to finish the replace operation atomically.
+ */
 static inline enum xfs_delattr_state
 xfs_attr_init_replace_state(struct xfs_da_args *args)
 {
 	args->op_flags |= XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE;
+	if (xfs_has_larp(args->dp->i_mount))
+		return xfs_attr_init_remove_state(args);
 	return xfs_attr_init_add_state(args);
 }
 
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 53d02ce9ed78..d15e92858bf0 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -446,6 +446,14 @@ xfs_attr3_leaf_read(
  * Namespace helper routines
  *========================================================================*/
 
+/*
+ * If we are in log recovery, then we want the lookup to ignore the INCOMPLETE
+ * flag on disk - if there's an incomplete attr then recovery needs to tear it
+ * down. If there's no incomplete attr, then recovery needs to tear that attr
+ * down to replace it with the attr that has been logged. In this case, the
+ * INCOMPLETE flag will not be set in attr->attr_filter, but rather
+ * XFS_DA_OP_RECOVERY will be set in args->op_flags.
+ */
 static bool
 xfs_attr_match(
 	struct xfs_da_args	*args,
@@ -453,14 +461,18 @@ xfs_attr_match(
 	unsigned char		*name,
 	int			flags)
 {
+
 	if (args->namelen != namelen)
 		return false;
 	if (memcmp(args->name, name, namelen) != 0)
 		return false;
-	/*
-	 * If we are looking for incomplete entries, show only those, else only
-	 * show complete entries.
-	 */
+
+	/* Recovery ignores the INCOMPLETE flag. */
+	if ((args->op_flags & XFS_DA_OP_RECOVERY) &&
+	    args->attr_filter == (flags & XFS_ATTR_NSP_ONDISK_MASK))
+		return true;
+
+	/* All remaining matches need to be filtered by INCOMPLETE state. */
 	if (args->attr_filter !=
 	    (flags & (XFS_ATTR_NSP_ONDISK_MASK | XFS_ATTR_INCOMPLETE)))
 		return false;
@@ -799,6 +811,14 @@ xfs_attr_sf_removename(
 	sf = (struct xfs_attr_shortform *)dp->i_afp->if_u1.if_data;
 
 	error = xfs_attr_sf_findname(args, &sfe, &base);
+
+	/*
+	 * If we are recovering an operation, finding nothing to
+	 * remove is not an error - it just means there was nothing
+	 * to clean up.
+	 */
+	if (error == -ENOATTR && (args->op_flags & XFS_DA_OP_RECOVERY))
+		return 0;
 	if (error != -EEXIST)
 		return error;
 	size = xfs_attr_sf_entsize(sfe);
@@ -819,7 +839,7 @@ xfs_attr_sf_removename(
 	totsize -= size;
 	if (totsize == sizeof(xfs_attr_sf_hdr_t) && xfs_has_attr2(mp) &&
 	    (dp->i_df.if_format != XFS_DINODE_FMT_BTREE) &&
-	    !(args->op_flags & XFS_DA_OP_ADDNAME)) {
+	    !(args->op_flags & (XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE))) {
 		xfs_attr_fork_remove(dp, args->trans);
 	} else {
 		xfs_idata_realloc(dp, -size, XFS_ATTR_FORK);
@@ -1128,9 +1148,17 @@ xfs_attr3_leaf_to_shortform(
 		goto out;
 
 	if (forkoff == -1) {
-		ASSERT(xfs_has_attr2(dp->i_mount));
-		ASSERT(dp->i_df.if_format != XFS_DINODE_FMT_BTREE);
-		xfs_attr_fork_remove(dp, args->trans);
+		/*
+		 * Don't remove the attr fork if this operation is the first
+		 * part of a attr replace operations. We're going to add a new
+		 * attr immediately, so we need to keep the attr fork around in
+		 * this case.
+		 */
+		if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
+			ASSERT(xfs_has_attr2(dp->i_mount));
+			ASSERT(dp->i_df.if_format != XFS_DINODE_FMT_BTREE);
+			xfs_attr_fork_remove(dp, args->trans);
+		}
 		goto out;
 	}
 
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index 468ca70cd35d..ed2303e4d46a 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -91,6 +91,7 @@ typedef struct xfs_da_args {
 #define XFS_DA_OP_CILOOKUP	(1u << 4) /* lookup returns CI name if found */
 #define XFS_DA_OP_NOTIME	(1u << 5) /* don't update inode timestamps */
 #define XFS_DA_OP_REMOVE	(1u << 6) /* this is a remove operation */
+#define XFS_DA_OP_RECOVERY	(1u << 7) /* Log recovery operation */
 
 #define XFS_DA_OP_FLAGS \
 	{ XFS_DA_OP_JUSTCHECK,	"JUSTCHECK" }, \
@@ -99,7 +100,8 @@ typedef struct xfs_da_args {
 	{ XFS_DA_OP_OKNOENT,	"OKNOENT" }, \
 	{ XFS_DA_OP_CILOOKUP,	"CILOOKUP" }, \
 	{ XFS_DA_OP_NOTIME,	"NOTIME" }, \
-	{ XFS_DA_OP_REMOVE,	"REMOVE" }
+	{ XFS_DA_OP_REMOVE,	"REMOVE" }, \
+	{ XFS_DA_OP_RECOVERY,	"RECOVERY" }
 
 /*
  * Storage for holding state during Btree searches and split/join ops.
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index fb9549e7ea96..50ad3aa891ee 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -554,6 +554,7 @@ xfs_attri_item_recover(
 	args->namelen = attrp->alfi_name_len;
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 	args->attr_filter = attrp->alfi_attr_flags;
+	args->op_flags = XFS_DA_OP_RECOVERY | XFS_DA_OP_OKNOENT;
 
 	switch (attrp->alfi_op_flags & XFS_ATTR_OP_FLAGS_TYPE_MASK) {
 	case XFS_ATTR_OP_FLAGS_SET:
@@ -561,9 +562,14 @@ xfs_attri_item_recover(
 		args->value = attrip->attri_value;
 		args->valuelen = attrp->alfi_value_len;
 		args->total = xfs_attr_calc_size(args, &local);
-		attr->xattri_dela_state = xfs_attr_init_add_state(args);
+		if (xfs_inode_hasattr(args->dp))
+			attr->xattri_dela_state = xfs_attr_init_replace_state(args);
+		else
+			attr->xattri_dela_state = xfs_attr_init_add_state(args);
 		break;
 	case XFS_ATTR_OP_FLAGS_REMOVE:
+		if (!xfs_inode_hasattr(args->dp))
+			goto out;
 		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
 		break;
 	default:
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 01b047d86cd1..d32026585c1b 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -4131,13 +4131,10 @@ DEFINE_ICLOG_EVENT(xlog_iclog_write);
 
 TRACE_DEFINE_ENUM(XFS_DAS_UNINIT);
 TRACE_DEFINE_ENUM(XFS_DAS_SF_ADD);
-TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ADD);
-TRACE_DEFINE_ENUM(XFS_DAS_NODE_ADD);
-TRACE_DEFINE_ENUM(XFS_DAS_RMTBLK);
-TRACE_DEFINE_ENUM(XFS_DAS_RM_NAME);
-TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
 TRACE_DEFINE_ENUM(XFS_DAS_SF_REMOVE);
+TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ADD);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE);
+TRACE_DEFINE_ENUM(XFS_DAS_NODE_ADD);
 TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_SET_RMT);
 TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 18/18] xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (16 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 17/18] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework Dave Chinner
@ 2022-05-09  0:41 ` Dave Chinner
  2022-05-10 22:31   ` Alli
  2022-05-10 23:54   ` Darrick J. Wong
  2022-05-10 22:27 ` [PATCH 19/18] xfs: can't use kmem_zalloc() for attribute buffers Dave Chinner
  18 siblings, 2 replies; 56+ messages in thread
From: Dave Chinner @ 2022-05-09  0:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_repair flags these as a corruption error, so the verifier should
catch software bugs that result in empty leaf blocks being written
to disk, too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_attr_leaf.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index d15e92858bf0..15a990409463 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -310,6 +310,15 @@ xfs_attr3_leaf_verify(
 	if (fa)
 		return fa;
 
+	/*
+	 * Empty leaf blocks should never occur;  they imply the existence of a
+	 * software bug that needs fixing. xfs_repair also flags them as a
+	 * corruption that needs fixing, so we should never let these go to
+	 * disk.
+	 */
+	if (ichdr.count == 0)
+		return __this_address;
+
 	/*
 	 * firstused is the block offset of the first name info structure.
 	 * Make sure it doesn't go off the block or crash into the header.
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 01/18] xfs: avoid empty xattr transaction when attrs are inline
  2022-05-09  0:41 ` [PATCH 01/18] xfs: avoid empty xattr transaction when attrs are inline Dave Chinner
@ 2022-05-09 16:43   ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-09 16:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:21AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> generic/642 triggered a reproducable assert failure in
> xlog_cil_commit() that resulted from a xfs_attr_set() committing
> an empty but dirty transaction. When the CIL is empty and this
> occurs, xlog_cil_commit() tries a background push and this triggers
> a "pushing an empty CIL" assert.
> 
> XFS: Assertion failed: !list_empty(&cil->xc_cil), file: fs/xfs/xfs_log_cil.c, line: 1274
> Call Trace:
>  <TASK>
>  xlog_cil_commit+0xa5a/0xad0
>  __xfs_trans_commit+0xb8/0x330
>  xfs_trans_commit+0x10/0x20
>  xfs_attr_set+0x3e2/0x4c0
>  xfs_xattr_set+0x8d/0xe0
>  __vfs_setxattr+0x6b/0x90
>  __vfs_setxattr_noperm+0x76/0x220
>  __vfs_setxattr_locked+0xdf/0x100
>  vfs_setxattr+0x94/0x170
>  setxattr+0x110/0x200
>  path_setxattr+0xbf/0xe0
>  __x64_sys_setxattr+0x2b/0x30
>  do_syscall_64+0x35/0x80
> 
> The problem is related to the breakdown of attribute addition in
> xfs_attr_set_iter() and how it is called from deferred operations.
> When we have a pure leaf xattr insert, we add the xattr to the leaf
> and set the next state to XFS_DAS_FOUND_LBLK and return -EAGAIN.
> This requeues the xattr defered work, rolls the transaction and
> runs xfs_attr_set_iter() again. This then checks the xattr for
> being remote (it's not) and whether a replace op is being done (this
> is a create op) and if neither are true it returns without having
> done anything.
> 
> xfs_xattri_finish_update() then unconditionally sets the transaction
> dirty, and the deferops finishes and returns to __xfs_trans_commit()
> which sees the transaction dirty and tries to commit it by calling
> xlog_cil_commit(). The transaction is empty, and then the assert
> fires if this happens when the CIL is empty.
> 
> This patch addresses the structure of xfs_attr_set_iter() that
> requires re-entry on leaf add even when nothing will be done. This
> gets rid of the trailing empty transaction and so doesn't trigger
> the XFS_TRANS_DIRTY assignment in xfs_xattri_finish_update()
> incorrectly. Addressing that is for a different patch.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>

Looks ok to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr.c | 39 +++++++++++++++++++--------------------
>  1 file changed, 19 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 48b7e7efbb30..b3d918195160 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -315,6 +315,7 @@ xfs_attr_leaf_addname(
>  {
>  	struct xfs_da_args	*args = attr->xattri_da_args;
>  	struct xfs_inode	*dp = args->dp;
> +	enum xfs_delattr_state	next_state = XFS_DAS_UNINIT;
>  	int			error;
>  
>  	if (xfs_attr_is_leaf(dp)) {
> @@ -335,37 +336,35 @@ xfs_attr_leaf_addname(
>  			 * when we come back, we'll be a node, so we'll fall
>  			 * down into the node handling code below
>  			 */
> -			trace_xfs_attr_set_iter_return(
> -				attr->xattri_dela_state, args->dp);
> -			return -EAGAIN;
> +			error = -EAGAIN;
> +			goto out;
>  		}
> -
> -		if (error)
> -			return error;
> -
> -		attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
> +		next_state = XFS_DAS_FOUND_LBLK;
>  	} else {
>  		error = xfs_attr_node_addname_find_attr(attr);
>  		if (error)
>  			return error;
>  
> +		next_state = XFS_DAS_FOUND_NBLK;
>  		error = xfs_attr_node_addname(attr);
> -		if (error)
> -			return error;
> -
> -		/*
> -		 * If addname was successful, and we dont need to alloc or
> -		 * remove anymore blks, we're done.
> -		 */
> -		if (!args->rmtblkno &&
> -		    !(args->op_flags & XFS_DA_OP_RENAME))
> -			return 0;
> +	}
> +	if (error)
> +		return error;
>  
> -		attr->xattri_dela_state = XFS_DAS_FOUND_NBLK;
> +	/*
> +	 * We need to commit and roll if we need to allocate remote xattr blocks
> +	 * or perform more xattr manipulations. Otherwise there is nothing more
> +	 * to do and we can return success.
> +	 */
> +	if (args->rmtblkno ||
> +	    (args->op_flags & XFS_DA_OP_RENAME)) {
> +		attr->xattri_dela_state = next_state;
> +		error = -EAGAIN;
>  	}
>  
> +out:
>  	trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp);
> -	return -EAGAIN;
> +	return error;
>  }
>  
>  /*
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 02/18] xfs: initialise attrd item to zero
  2022-05-09  0:41 ` [PATCH 02/18] xfs: initialise attrd item to zero Dave Chinner
@ 2022-05-09 16:43   ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-09 16:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:22AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> On the first allocation of a attrd item, xfs_trans_add_item() fires
> an assert like so:
> 
>  XFS (pmem0): EXPERIMENTAL logged extended attributes feature added. Use at your own risk!
>  XFS: Assertion failed: !test_bit(XFS_LI_DIRTY, &lip->li_flags), file: fs/xfs/xfs_trans.c, line: 683
>  ------------[ cut here ]------------
>  kernel BUG at fs/xfs/xfs_message.c:102!
>  Call Trace:
>   <TASK>
>   xfs_trans_add_item+0x17e/0x190
>   xfs_trans_get_attrd+0x67/0x90
>   xfs_attr_create_done+0x13/0x20
>   xfs_defer_finish_noroll+0x100/0x690
>   __xfs_trans_commit+0x144/0x330
>   xfs_trans_commit+0x10/0x20
>   xfs_attr_set+0x3e2/0x4c0
>   xfs_initxattrs+0xaa/0xe0
>   security_inode_init_security+0xb0/0x130
>   xfs_init_security+0x18/0x20
>   xfs_generic_create+0x13a/0x340
>   xfs_vn_create+0x17/0x20
>   path_openat+0xff3/0x12f0
>   do_filp_open+0xb2/0x150
> 
> The attrd log item is allocated via kmem_cache_alloc, and
> xfs_log_item_init() does not zero the entire log item structure - it
> assumes that the structure is already all zeros as it only
> initialises non-zero fields. Fix the attr items to be allocated
> via the *zalloc methods.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_attr_item.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 9061adce3f16..5f8680b05079 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -721,7 +721,7 @@ xfs_trans_get_attrd(struct xfs_trans		*tp,
>  
>  	ASSERT(tp != NULL);
>  
> -	attrdp = kmem_cache_alloc(xfs_attrd_cache, GFP_NOFS | __GFP_NOFAIL);
> +	attrdp = kmem_cache_zalloc(xfs_attrd_cache, GFP_NOFS | __GFP_NOFAIL);
>  
>  	xfs_log_item_init(tp->t_mountp, &attrdp->attrd_item, XFS_LI_ATTRD,
>  			  &xfs_attrd_item_ops);
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 16/18 v2] xfs: use XFS_DA_OP flags in deferred attr ops
  2022-05-09  0:41 ` [PATCH 16/18] xfs: use XFS_DA_OP flags in deferred attr ops Dave Chinner
@ 2022-05-10 22:20   ` Dave Chinner
  2022-05-10 23:47     ` Darrick J. Wong
  2022-05-10 23:49     ` Alli
  0 siblings, 2 replies; 56+ messages in thread
From: Dave Chinner @ 2022-05-10 22:20 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We currently store the high level attr operation in
args->attr_flags. This falgs are what the VFS is telling us to do,
but don't necessarily match what we are doing in the low level
modification state machine. e.g. XATTR_REPLACE implies both
XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME because it is doing both a
remove and adding a new attr.

However, deep in the individual state machine operations, we check
errors against this high level VFS op flags, not the low level
XFS_DA_OP flags. Indeed, we don't even have a low level flag for
a REMOVE operation, so the only way we know we are doing a remove
is the complete absence of XATTR_REPLACE, XATTR_CREATE,
XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME. And because there are other
flags in these fields, this is a pain to check if we need to.

As the XFS_DA_OP flags are only needed once the deferred operations
are set up, set these flags appropriately when we set the initial
operation state. We also introduce a XFS_DA_OP_REMOVE flag to make
it easy to know that we are doing a remove operation.

With these, we can remove the use of XATTR_REPLACE and XATTR_CREATE
in low level lookup operations, and manipulate the low level flags
according to the low level context that is operating. e.g. log
recovery does not have a VFS xattr operation state to copy into
args->attr_flags, and the low level state machine ops we do for
recovery do not match the high level VFS operations that were in
progress when the system failed...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
Version 2 (now that vger has finally got this to me after 2 days):
- fixed bug by that removed clearing remote block values after save
  for node format blocks. Clearing is now done by
  xfs_attr_save_rmt_blk() for both callers.

 fs/xfs/libxfs/xfs_attr.c      | 136 +++++++++++++++++++++++-------------------
 fs/xfs/libxfs/xfs_attr.h      |   3 +
 fs/xfs/libxfs/xfs_attr_leaf.c |   2 +-
 fs/xfs/libxfs/xfs_da_btree.h  |   8 ++-
 4 files changed, 83 insertions(+), 66 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 8be76f8d11c5..a36364b27aa1 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -467,7 +467,7 @@ xfs_attr_leaf_addname(
 	 */
 	if (args->rmtblkno)
 		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
-	else if (args->op_flags & XFS_DA_OP_RENAME)
+	else if (args->op_flags & XFS_DA_OP_REPLACE)
 		xfs_attr_dela_state_set_replace(attr, XFS_DAS_LEAF_REPLACE);
 	else
 		attr->xattri_dela_state = XFS_DAS_DONE;
@@ -512,7 +512,7 @@ xfs_attr_node_addname(
 
 	if (args->rmtblkno)
 		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
-	else if (args->op_flags & XFS_DA_OP_RENAME)
+	else if (args->op_flags & XFS_DA_OP_REPLACE)
 		xfs_attr_dela_state_set_replace(attr, XFS_DAS_NODE_REPLACE);
 	else
 		attr->xattri_dela_state = XFS_DAS_DONE;
@@ -548,7 +548,7 @@ xfs_attr_rmtval_alloc(
 		return error;
 
 	/* If this is not a rename, clear the incomplete flag and we're done. */
-	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
+	if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
 		error = xfs_attr3_leaf_clearflag(args);
 		attr->xattri_dela_state = XFS_DAS_DONE;
 	} else {
@@ -967,8 +967,6 @@ xfs_attr_set(
 
 	if (args->value) {
 		XFS_STATS_INC(mp, xs_attr_set);
-
-		args->op_flags |= XFS_DA_OP_ADDNAME;
 		args->total = xfs_attr_calc_size(args, &local);
 
 		/*
@@ -1126,28 +1124,41 @@ static inline int xfs_attr_sf_totsize(struct xfs_inode *dp)
  * Add a name to the shortform attribute list structure
  * This is the external routine.
  */
-STATIC int
-xfs_attr_shortform_addname(xfs_da_args_t *args)
+static int
+xfs_attr_shortform_addname(
+	struct xfs_da_args	*args)
 {
-	int newsize, forkoff, retval;
+	int			newsize, forkoff;
+	int			error;
 
 	trace_xfs_attr_sf_addname(args);
 
-	retval = xfs_attr_shortform_lookup(args);
-	if (retval == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
-		return retval;
-	if (retval == -EEXIST) {
-		if (args->attr_flags & XATTR_CREATE)
-			return retval;
-		retval = xfs_attr_sf_removename(args);
-		if (retval)
-			return retval;
+	error = xfs_attr_shortform_lookup(args);
+	switch (error) {
+	case -ENOATTR:
+		if (args->op_flags & XFS_DA_OP_REPLACE)
+			return error;
+		break;
+	case -EEXIST:
+		if (!(args->op_flags & XFS_DA_OP_REPLACE))
+			return error;
+
+		error = xfs_attr_sf_removename(args);
+		if (error)
+			return error;
+
 		/*
-		 * Since we have removed the old attr, clear ATTR_REPLACE so
-		 * that the leaf format add routine won't trip over the attr
-		 * not being around.
+		 * Since we have removed the old attr, clear XFS_DA_OP_REPLACE
+		 * so that the new attr doesn't fit in shortform format, the
+		 * leaf format add routine won't trip over the attr not being
+		 * around.
 		 */
-		args->attr_flags &= ~XATTR_REPLACE;
+		args->op_flags &= ~XFS_DA_OP_REPLACE;
+		break;
+	case 0:
+		break;
+	default:
+		return error;
 	}
 
 	if (args->namelen >= XFS_ATTR_SF_ENTSIZE_MAX ||
@@ -1170,8 +1181,8 @@ xfs_attr_shortform_addname(xfs_da_args_t *args)
  * External routines when attribute list is one block
  *========================================================================*/
 
-/* Store info about a remote block */
-STATIC void
+/* Save the current remote block info and clear the current pointers. */
+static void
 xfs_attr_save_rmt_blk(
 	struct xfs_da_args	*args)
 {
@@ -1180,10 +1191,13 @@ xfs_attr_save_rmt_blk(
 	args->rmtblkno2 = args->rmtblkno;
 	args->rmtblkcnt2 = args->rmtblkcnt;
 	args->rmtvaluelen2 = args->rmtvaluelen;
+	args->rmtblkno = 0;
+	args->rmtblkcnt = 0;
+	args->rmtvaluelen = 0;
 }
 
 /* Set stored info about a remote block */
-STATIC void
+static void
 xfs_attr_restore_rmt_blk(
 	struct xfs_da_args	*args)
 {
@@ -1229,28 +1243,27 @@ xfs_attr_leaf_try_add(
 	 * Look up the xattr name to set the insertion point for the new xattr.
 	 */
 	error = xfs_attr3_leaf_lookup_int(bp, args);
-	if (error != -ENOATTR && error != -EEXIST)
-		goto out_brelse;
-	if (error == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
-		goto out_brelse;
-	if (error == -EEXIST) {
-		if (args->attr_flags & XATTR_CREATE)
+	switch (error) {
+	case -ENOATTR:
+		if (args->op_flags & XFS_DA_OP_REPLACE)
+			goto out_brelse;
+		break;
+	case -EEXIST:
+		if (!(args->op_flags & XFS_DA_OP_REPLACE))
 			goto out_brelse;
 
 		trace_xfs_attr_leaf_replace(args);
-
-		/* save the attribute state for later removal*/
-		args->op_flags |= XFS_DA_OP_RENAME;	/* an atomic rename */
-		xfs_attr_save_rmt_blk(args);
-
 		/*
-		 * clear the remote attr state now that it is saved so that the
-		 * values reflect the state of the attribute we are about to
+		 * Save the existing remote attr state so that the current
+		 * values reflect the state of the new attribute we are about to
 		 * add, not the attribute we just found and will remove later.
 		 */
-		args->rmtblkno = 0;
-		args->rmtblkcnt = 0;
-		args->rmtvaluelen = 0;
+		xfs_attr_save_rmt_blk(args);
+		break;
+	case 0:
+		break;
+	default:
+		goto out_brelse;
 	}
 
 	return xfs_attr3_leaf_add(bp, args);
@@ -1389,46 +1402,45 @@ xfs_attr_node_hasname(
 
 STATIC int
 xfs_attr_node_addname_find_attr(
-	 struct xfs_attr_item		*attr)
+	 struct xfs_attr_item	*attr)
 {
-	struct xfs_da_args		*args = attr->xattri_da_args;
-	int				retval;
+	struct xfs_da_args	*args = attr->xattri_da_args;
+	int			error;
 
 	/*
 	 * Search to see if name already exists, and get back a pointer
 	 * to where it should go.
 	 */
-	retval = xfs_attr_node_hasname(args, &attr->xattri_da_state);
-	if (retval != -ENOATTR && retval != -EEXIST)
-		goto error;
-
-	if (retval == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
-		goto error;
-	if (retval == -EEXIST) {
-		if (args->attr_flags & XATTR_CREATE)
+	error = xfs_attr_node_hasname(args, &attr->xattri_da_state);
+	switch (error) {
+	case -ENOATTR:
+		if (args->op_flags & XFS_DA_OP_REPLACE)
+			goto error;
+		break;
+	case -EEXIST:
+		if (!(args->op_flags & XFS_DA_OP_REPLACE))
 			goto error;
 
-		trace_xfs_attr_node_replace(args);
-
-		/* save the attribute state for later removal*/
-		args->op_flags |= XFS_DA_OP_RENAME;	/* atomic rename op */
-		xfs_attr_save_rmt_blk(args);
 
+		trace_xfs_attr_node_replace(args);
 		/*
-		 * clear the remote attr state now that it is saved so that the
-		 * values reflect the state of the attribute we are about to
+		 * Save the existing remote attr state so that the current
+		 * values reflect the state of the new attribute we are about to
 		 * add, not the attribute we just found and will remove later.
 		 */
-		args->rmtblkno = 0;
-		args->rmtblkcnt = 0;
-		args->rmtvaluelen = 0;
+		xfs_attr_save_rmt_blk(args);
+		break;
+	case 0:
+		break;
+	default:
+		goto error;
 	}
 
 	return 0;
 error:
 	if (attr->xattri_da_state)
 		xfs_da_state_free(attr->xattri_da_state);
-	return retval;
+	return error;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 6bef522533a4..e93efc8b11cd 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -584,6 +584,7 @@ xfs_attr_is_shortform(
 static inline enum xfs_delattr_state
 xfs_attr_init_add_state(struct xfs_da_args *args)
 {
+	args->op_flags |= XFS_DA_OP_ADDNAME;
 	if (!args->dp->i_afp)
 		return XFS_DAS_DONE;
 	if (xfs_attr_is_shortform(args->dp))
@@ -596,6 +597,7 @@ xfs_attr_init_add_state(struct xfs_da_args *args)
 static inline enum xfs_delattr_state
 xfs_attr_init_remove_state(struct xfs_da_args *args)
 {
+	args->op_flags |= XFS_DA_OP_REMOVE;
 	if (xfs_attr_is_shortform(args->dp))
 		return XFS_DAS_SF_REMOVE;
 	if (xfs_attr_is_leaf(args->dp))
@@ -606,6 +608,7 @@ xfs_attr_init_remove_state(struct xfs_da_args *args)
 static inline enum xfs_delattr_state
 xfs_attr_init_replace_state(struct xfs_da_args *args)
 {
+	args->op_flags |= XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE;
 	return xfs_attr_init_add_state(args);
 }
 
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index e90bfd9d7551..53d02ce9ed78 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -1492,7 +1492,7 @@ xfs_attr3_leaf_add_work(
 	entry->flags = args->attr_filter;
 	if (tmp)
 		entry->flags |= XFS_ATTR_LOCAL;
-	if (args->op_flags & XFS_DA_OP_RENAME) {
+	if (args->op_flags & XFS_DA_OP_REPLACE) {
 		if (!xfs_has_larp(mp))
 			entry->flags |= XFS_ATTR_INCOMPLETE;
 		if ((args->blkno2 == args->blkno) &&
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index deb368d041e3..468ca70cd35d 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -85,19 +85,21 @@ typedef struct xfs_da_args {
  * Operation flags:
  */
 #define XFS_DA_OP_JUSTCHECK	(1u << 0) /* check for ok with no space */
-#define XFS_DA_OP_RENAME	(1u << 1) /* this is an atomic rename op */
+#define XFS_DA_OP_REPLACE	(1u << 1) /* this is an atomic replace op */
 #define XFS_DA_OP_ADDNAME	(1u << 2) /* this is an add operation */
 #define XFS_DA_OP_OKNOENT	(1u << 3) /* lookup op, ENOENT ok, else die */
 #define XFS_DA_OP_CILOOKUP	(1u << 4) /* lookup returns CI name if found */
 #define XFS_DA_OP_NOTIME	(1u << 5) /* don't update inode timestamps */
+#define XFS_DA_OP_REMOVE	(1u << 6) /* this is a remove operation */
 
 #define XFS_DA_OP_FLAGS \
 	{ XFS_DA_OP_JUSTCHECK,	"JUSTCHECK" }, \
-	{ XFS_DA_OP_RENAME,	"RENAME" }, \
+	{ XFS_DA_OP_REPLACE,	"REPLACE" }, \
 	{ XFS_DA_OP_ADDNAME,	"ADDNAME" }, \
 	{ XFS_DA_OP_OKNOENT,	"OKNOENT" }, \
 	{ XFS_DA_OP_CILOOKUP,	"CILOOKUP" }, \
-	{ XFS_DA_OP_NOTIME,	"NOTIME" }
+	{ XFS_DA_OP_NOTIME,	"NOTIME" }, \
+	{ XFS_DA_OP_REMOVE,	"REMOVE" }
 
 /*
  * Storage for holding state during Btree searches and split/join ops.
Dave Chinner
david@fromorbit.com

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 19/18] xfs: can't use kmem_zalloc() for attribute buffers
  2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
                   ` (17 preceding siblings ...)
  2022-05-09  0:41 ` [PATCH 18/18] xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify Dave Chinner
@ 2022-05-10 22:27 ` Dave Chinner
  2022-05-10 23:59   ` Darrick J. Wong
  2022-05-11  0:54   ` Alli
  18 siblings, 2 replies; 56+ messages in thread
From: Dave Chinner @ 2022-05-10 22:27 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Because when running fsmark workloads with 64kB xattrs, heap
allocation of >64kB buffers for the attr item name/value buffer
will fail and deadlock.

....
 XFS: fs_mark(8414) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
 XFS: fs_mark(8417) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
 XFS: fs_mark(8409) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
 XFS: fs_mark(8428) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
 XFS: fs_mark(8430) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
 XFS: fs_mark(8437) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
 XFS: fs_mark(8433) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
 XFS: fs_mark(8406) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
 XFS: fs_mark(8412) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
 XFS: fs_mark(8432) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
 XFS: fs_mark(8424) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
....

I'd use kvmalloc(), but if we are doing 15,000 64kB xattr creates a
second, the attempt to use kmalloc() in kvmalloc() results in a huge
amount of direct reclaim work that is guaranteed to fail occurs
before it falls back to vmalloc:

- 48.19% xfs_attr_create_intent
  - 46.89% xfs_attri_init
     - kvmalloc_node
	- 46.04% __kmalloc_node
	   - kmalloc_large_node
	      - 45.99% __alloc_pages
		 - 39.39% __alloc_pages_slowpath.constprop.0
		    - 38.89% __alloc_pages_direct_compact
		       - 38.71% try_to_compact_pages
			  - compact_zone_order
			  - compact_zone
			     - 21.09% isolate_migratepages_block
				  10.31% PageHuge
				  5.82% set_pfnblock_flags_mask
				  0.86% get_pfnblock_flags_mask
			     - 4.48% __reset_isolation_suitable
				  4.44% __reset_isolation_pfn
			     - 3.56% __pageblock_pfn_to_page
				  1.33% pfn_to_online_page
			       2.83% get_pfnblock_flags_mask
			     - 0.87% migrate_pages
				  0.86% compaction_alloc
			       0.84% find_suitable_fallback
		 - 6.60% get_page_from_freelist
		      4.99% clear_page_erms
		    - 1.19% _raw_spin_lock_irqsave
		       - do_raw_spin_lock
			    __pv_queued_spin_lock_slowpath
	- 0.86% __vmalloc_node_range
	     0.65% __alloc_pages_bulk

So lift xlog_cil_kvmalloc(), rename it to xlog_kvmalloc() and use
that instead because it has sane fail-fast behaviour for the
embedded kmalloc attempt. It also provides __GFP_NOFAIL guarantees
that kvmalloc() won't do, either....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_item.c | 35 +++++++++++++++--------------------
 fs/xfs/xfs_log_cil.c   | 35 +----------------------------------
 fs/xfs/xfs_log_priv.h  | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 54 deletions(-)

diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 56f678c965b7..e8ac88d9fd14 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -44,7 +44,7 @@ xfs_attri_item_free(
 	struct xfs_attri_log_item	*attrip)
 {
 	kmem_free(attrip->attri_item.li_lv_shadow);
-	kmem_free(attrip);
+	kvfree(attrip);
 }
 
 /*
@@ -119,11 +119,11 @@ xfs_attri_item_format(
 			sizeof(struct xfs_attri_log_format));
 	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
 			attrip->attri_name,
-			xlog_calc_iovec_len(attrip->attri_name_len));
+			attrip->attri_name_len);
 	if (attrip->attri_value_len > 0)
 		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
 				attrip->attri_value,
-				xlog_calc_iovec_len(attrip->attri_value_len));
+				attrip->attri_value_len);
 }
 
 /*
@@ -163,26 +163,21 @@ xfs_attri_init(
 
 {
 	struct xfs_attri_log_item	*attrip;
-	uint32_t			name_vec_len = 0;
-	uint32_t			value_vec_len = 0;
-	uint32_t			buffer_size;
-
-	if (name_len)
-		name_vec_len = xlog_calc_iovec_len(name_len);
-	if (value_len)
-		value_vec_len = xlog_calc_iovec_len(value_len);
-
-	buffer_size = name_vec_len + value_vec_len;
+	uint32_t			buffer_size = name_len + value_len;
 
 	if (buffer_size) {
-		attrip = kmem_zalloc(sizeof(struct xfs_attri_log_item) +
-				    buffer_size, KM_NOFS);
-		if (attrip == NULL)
-			return NULL;
+		/*
+		 * This could be over 64kB in length, so we have to use
+		 * kvmalloc() for this. But kvmalloc() utterly sucks, so we
+		 * use own version.
+		 */
+		attrip = xlog_kvmalloc(sizeof(struct xfs_attri_log_item) +
+					buffer_size);
 	} else {
-		attrip = kmem_cache_zalloc(xfs_attri_cache,
-					  GFP_NOFS | __GFP_NOFAIL);
+		attrip = kmem_cache_alloc(xfs_attri_cache,
+					GFP_NOFS | __GFP_NOFAIL);
 	}
+	memset(attrip, 0, sizeof(struct xfs_attri_log_item));
 
 	attrip->attri_name_len = name_len;
 	if (name_len)
@@ -195,7 +190,7 @@ xfs_attri_init(
 	if (value_len)
 		attrip->attri_value = ((char *)attrip) +
 				sizeof(struct xfs_attri_log_item) +
-				name_vec_len;
+				name_len;
 	else
 		attrip->attri_value = NULL;
 
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 42ace9b091d8..b4023693b89f 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -219,39 +219,6 @@ xlog_cil_iovec_space(
 			sizeof(uint64_t));
 }
 
-/*
- * shadow buffers can be large, so we need to use kvmalloc() here to ensure
- * success. Unfortunately, kvmalloc() only allows GFP_KERNEL contexts to fall
- * back to vmalloc, so we can't actually do anything useful with gfp flags to
- * control the kmalloc() behaviour within kvmalloc(). Hence kmalloc() will do
- * direct reclaim and compaction in the slow path, both of which are
- * horrendously expensive. We just want kmalloc to fail fast and fall back to
- * vmalloc if it can't get somethign straight away from the free lists or buddy
- * allocator. Hence we have to open code kvmalloc outselves here.
- *
- * Also, we are in memalloc_nofs_save task context here, so despite the use of
- * GFP_KERNEL here, we are actually going to be doing GFP_NOFS allocations. This
- * is actually the only way to make vmalloc() do GFP_NOFS allocations, so lets
- * just all pretend this is a GFP_KERNEL context operation....
- */
-static inline void *
-xlog_cil_kvmalloc(
-	size_t		buf_size)
-{
-	gfp_t		flags = GFP_KERNEL;
-	void		*p;
-
-	flags &= ~__GFP_DIRECT_RECLAIM;
-	flags |= __GFP_NOWARN | __GFP_NORETRY;
-	do {
-		p = kmalloc(buf_size, flags);
-		if (!p)
-			p = vmalloc(buf_size);
-	} while (!p);
-
-	return p;
-}
-
 /*
  * Allocate or pin log vector buffers for CIL insertion.
  *
@@ -368,7 +335,7 @@ xlog_cil_alloc_shadow_bufs(
 			 * storage.
 			 */
 			kmem_free(lip->li_lv_shadow);
-			lv = xlog_cil_kvmalloc(buf_size);
+			lv = xlog_kvmalloc(buf_size);
 
 			memset(lv, 0, xlog_cil_iovec_space(niovecs));
 
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 4aa95b68450a..46f989641eda 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -679,4 +679,38 @@ xlog_valid_lsn(
  */
 void xlog_cil_pcp_dead(struct xlog *log, unsigned int cpu);
 
+/*
+ * Log vector and shadow buffers can be large, so we need to use kvmalloc() here
+ * to ensure success. Unfortunately, kvmalloc() only allows GFP_KERNEL contexts
+ * to fall back to vmalloc, so we can't actually do anything useful with gfp
+ * flags to control the kmalloc() behaviour within kvmalloc(). Hence kmalloc()
+ * will do direct reclaim and compaction in the slow path, both of which are
+ * horrendously expensive. We just want kmalloc to fail fast and fall back to
+ * vmalloc if it can't get somethign straight away from the free lists or
+ * buddy allocator. Hence we have to open code kvmalloc outselves here.
+ *
+ * This assumes that the caller uses memalloc_nofs_save task context here, so
+ * despite the use of GFP_KERNEL here, we are going to be doing GFP_NOFS
+ * allocations. This is actually the only way to make vmalloc() do GFP_NOFS
+ * allocations, so lets just all pretend this is a GFP_KERNEL context
+ * operation....
+ */
+static inline void *
+xlog_kvmalloc(
+	size_t		buf_size)
+{
+	gfp_t		flags = GFP_KERNEL;
+	void		*p;
+
+	flags &= ~__GFP_DIRECT_RECLAIM;
+	flags |= __GFP_NOWARN | __GFP_NORETRY;
+	do {
+		p = kmalloc(buf_size, flags);
+		if (!p)
+			p = vmalloc(buf_size);
+	} while (!p);
+
+	return p;
+}
+
 #endif	/* __XFS_LOG_PRIV_H__ */
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 17/18] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework
  2022-05-09  0:41 ` [PATCH 17/18] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework Dave Chinner
@ 2022-05-10 22:31   ` Alli
  2022-05-10 23:53   ` Darrick J. Wong
  1 sibling, 0 replies; 56+ messages in thread
From: Alli @ 2022-05-10 22:31 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Mon, 2022-05-09 at 10:41 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We can't use the same algorithm for replacing an existing attribute
> when logging attributes. The existing algorithm is essentially:
> 
> 1. create new attr w/ INCOMPLETE
> 2. atomically flip INCOMPLETE flags between old + new attribute
> 3. remove old attr which is marked w/ INCOMPLETE
> 
> This algorithm guarantees that we see either the old or new
> attribute, and if we fail after the atomic flag flip, we don't have
> to recover the removal of the old attr because we never see
> INCOMPLETE attributes in lookups.
> 
> For logged attributes, however, this does not work. The logged
> attribute intents do not track the work that has been done as the
> transaction rolls, and hence the only recovery mechanism we have is
> "run the replace operation from scratch".
> 
> This is further exacerbated by the attempt to avoid needing the
> INCOMPLETE flag to create an atomic swap. This means we can create
> a second active attribute of the same name before we remove the
> original. If we fail at any point after the create but before the
> removal has completed, we end up with duplicate attributes in
> the attr btree and recovery only tries to replace one of them.
> 
> There are several other failure modes where we can leave partially
> allocated remote attributes that expose stale data, partially free
> remote attributes that enable UAF based stale data exposure, etc.
> 
> TO fix this, we need a different algorithm for replace operations
> when LARP is enabled. Luckily, it's not that complex if we take the
> right first step. That is, the first thing we log is the attri
> intent with the new name/value pair and mark the old attr as
> INCOMPLETE in the same transaction.
> 
> From there, we then remove the old attr and keep relogging the
> new name/value in the intent, such that we always know that we have
> to create the new attr in recovery. Once the old attr is removed,
> we then run a normal ATTR_CREATE operation relogging the intent as
> we go. If the new attr is local, then it gets created in a single
> atomic transaction that also logs the final intent done. If the new
> attr is remote, the we set INCOMPLETE on the new attr while we
> allocate and set the remote value, and then we clear the INCOMPLETE
> flag at in the last transaction taht logs the final intent done.
> 
> If we fail at any point in this algorithm, log recovery will always
> see the same state on disk: the new name/value in the intent, and
> either an INCOMPLETE attr or no attr in the attr btree. If we find
> an INCOMPLETE attr, we run the full replace starting with removing
> the INCOMPLETE attr. If we don't find it, then we simply create the
> new attr.
> 
> Notably, recovery of a failed create that has an INCOMPLETE flag set
> is now the same - we start with the lookup of the INCOMPLETE attr,
> and if that exists then we do the full replace recovery process,
> otherwise we just create the new attr.
> 
> Hence changing the way we do the replace operation when LARP is
> enabled allows us to use the same log recovery algorithm for both
> the ATTR_CREATE and ATTR_REPLACE operations. This is also the same
> algorithm we use for runtime ATTR_REPLACE operations (except for the
> step setting up the initial conditions).
> 
> The result is that:
> 
> - ATTR_CREATE uses the same algorithm regardless of whether LARP is
>   enabled or not
> - ATTR_REPLACE with larp=0 is identical to the old algorithm
> - ATTR_REPLACE with larp=1 runs an unmodified attr removal algorithm
>   from the larp=0 code and then runs the unmodified ATTR_CREATE
>   code.
> - log recovery when larp=1 runs the same ATTR_REPLACE algorithm as
>   it uses at runtime.
> 
> Because the state machine is now quite clean, changing the algorithm
> is really just a case of changing the initial state and how the
> states link together for the ATTR_REPLACE case. Hence it's not a
> huge amoutn of code for what is a fairly substantial rework
> of the attr logging and recovery algorithm....
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Ok, I think this is looking good, and with the new fixes tests seem to
pass in my setup.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c      | 97 +++++++++++++++++++++----------
> ----
>  fs/xfs/libxfs/xfs_attr.h      | 49 +++++++++++-------
>  fs/xfs/libxfs/xfs_attr_leaf.c | 44 +++++++++++++---
>  fs/xfs/libxfs/xfs_da_btree.h  |  4 +-
>  fs/xfs/xfs_attr_item.c        |  8 ++-
>  fs/xfs/xfs_trace.h            |  7 +--
>  6 files changed, 137 insertions(+), 72 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 344497e37813..2f6b9bfd7768 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -69,9 +69,12 @@ int
>  xfs_inode_hasattr(
>  	struct xfs_inode	*ip)
>  {
> -	if (!XFS_IFORK_Q(ip) ||
> -	    (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
> -	     ip->i_afp->if_nextents == 0))
> +	if (!XFS_IFORK_Q(ip))
> +		return 0;
> +	if (!ip->i_afp)
> +		return 0;
> +	if (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
> +	    ip->i_afp->if_nextents == 0)
>  		return 0;
>  	return 1;
>  }
> @@ -409,23 +412,30 @@ xfs_attr_sf_addname(
>  }
>  
>  /*
> - * When we bump the state to REPLACE, we may actually need to skip
> over the
> - * state. When LARP mode is enabled, we don't need to run the atomic
> flags flip,
> - * so we skip straight over the REPLACE state and go on to
> REMOVE_OLD.
> + * Handle the state change on completion of a multi-state attr
> operation.
> + *
> + * If the XFS_DA_OP_REPLACE flag is set, this means the operation
> was the first
> + * modification in a attr replace operation and we still have to do
> the second
> + * state, indicated by @replace_state.
> + *
> + * We consume the XFS_DA_OP_REPLACE flag so that when we are called
> again on
> + * completion of the second half of the attr replace operation we
> correctly
> + * signal that it is done.
>   */
> -static void
> -xfs_attr_dela_state_set_replace(
> +static enum xfs_delattr_state
> +xfs_attr_complete_op(
>  	struct xfs_attr_item	*attr,
> -	enum xfs_delattr_state	replace)
> +	enum xfs_delattr_state	replace_state)
>  {
>  	struct xfs_da_args	*args = attr->xattri_da_args;
> +	bool			do_replace = args->op_flags &
> XFS_DA_OP_REPLACE;
>  
> -	ASSERT(replace == XFS_DAS_LEAF_REPLACE ||
> -			replace == XFS_DAS_NODE_REPLACE);
> -
> -	attr->xattri_dela_state = replace;
> -	if (xfs_has_larp(args->dp->i_mount))
> -		attr->xattri_dela_state++;
> +	args->op_flags &= ~XFS_DA_OP_REPLACE;
> +	if (do_replace) {
> +		args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
> +		return replace_state;
> +	}
> +	return XFS_DAS_DONE;
>  }
>  
>  static int
> @@ -467,10 +477,9 @@ xfs_attr_leaf_addname(
>  	 */
>  	if (args->rmtblkno)
>  		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
> -	else if (args->op_flags & XFS_DA_OP_REPLACE)
> -		xfs_attr_dela_state_set_replace(attr,
> XFS_DAS_LEAF_REPLACE);
>  	else
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> +		attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +							XFS_DAS_LEAF_RE
> PLACE);
>  out:
>  	trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state,
> args->dp);
>  	return error;
> @@ -512,10 +521,9 @@ xfs_attr_node_addname(
>  
>  	if (args->rmtblkno)
>  		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
> -	else if (args->op_flags & XFS_DA_OP_REPLACE)
> -		xfs_attr_dela_state_set_replace(attr,
> XFS_DAS_NODE_REPLACE);
>  	else
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> +		attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +							XFS_DAS_NODE_RE
> PLACE);
>  out:
>  	trace_xfs_attr_node_addname_return(attr->xattri_dela_state,
> args->dp);
>  	return error;
> @@ -547,18 +555,15 @@ xfs_attr_rmtval_alloc(
>  	if (error)
>  		return error;
>  
> -	/* If this is not a rename, clear the incomplete flag and we're
> done. */
> -	if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
> +	attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +						++attr-
> >xattri_dela_state);
> +	/*
> +	 * If we are not doing a rename, we've finished the operation
> but still
> +	 * have to clear the incomplete flag protecting the new attr
> from
> +	 * exposing partially initialised state if we crash during
> creation.
> +	 */
> +	if (attr->xattri_dela_state == XFS_DAS_DONE)
>  		error = xfs_attr3_leaf_clearflag(args);
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> -	} else {
> -		/*
> -		 * We are running a REPLACE operation, so we need to
> bump the
> -		 * state to the step in that operation.
> -		 */
> -		attr->xattri_dela_state++;
> -		xfs_attr_dela_state_set_replace(attr, attr-
> >xattri_dela_state);
> -	}
>  out:
>  	trace_xfs_attr_rmtval_alloc(attr->xattri_dela_state, args->dp);
>  	return error;
> @@ -715,13 +720,24 @@ xfs_attr_set_iter(
>  		return xfs_attr_node_addname(attr);
>  
>  	case XFS_DAS_SF_REMOVE:
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> -		return xfs_attr_sf_removename(args);
> +		error = xfs_attr_sf_removename(args);
> +		attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +						xfs_attr_init_add_state
> (args));
> +		break;
>  	case XFS_DAS_LEAF_REMOVE:
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> -		return xfs_attr_leaf_removename(args);
> +		error = xfs_attr_leaf_removename(args);
> +		attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +						xfs_attr_init_add_state
> (args));
> +		break;
>  	case XFS_DAS_NODE_REMOVE:
>  		error = xfs_attr_node_removename_setup(attr);
> +		if (error == -ENOATTR &&
> +		    (args->op_flags & XFS_DA_OP_RECOVERY)) {
> +			attr->xattri_dela_state =
> xfs_attr_complete_op(attr,
> +						xfs_attr_init_add_state
> (args));
> +			error = 0;
> +			break;
> +		}
>  		if (error)
>  			return error;
>  		attr->xattri_dela_state = XFS_DAS_NODE_REMOVE_RMT;
> @@ -807,14 +823,16 @@ xfs_attr_set_iter(
>  
>  	case XFS_DAS_LEAF_REMOVE_ATTR:
>  		error = xfs_attr_leaf_remove_attr(attr);
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> +		attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +						xfs_attr_init_add_state
> (args));
>  		break;
>  
>  	case XFS_DAS_NODE_REMOVE_ATTR:
>  		error = xfs_attr_node_remove_attr(attr);
>  		if (!error)
>  			error = xfs_attr_leaf_shrink(args);
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> +		attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +						xfs_attr_init_add_state
> (args));
>  		break;
>  	default:
>  		ASSERT(0);
> @@ -1316,9 +1334,10 @@ xfs_attr_leaf_removename(
>  	dp = args->dp;
>  
>  	error = xfs_attr_leaf_hasname(args, &bp);
> -
>  	if (error == -ENOATTR) {
>  		xfs_trans_brelse(args->trans, bp);
> +		if (args->op_flags & XFS_DA_OP_RECOVERY)
> +			return 0;
>  		return error;
>  	} else if (error != -EEXIST)
>  		return error;
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index e93efc8b11cd..7467d31cb3f1 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -444,18 +444,23 @@ struct xfs_attr_list_context {
>   */
>  enum xfs_delattr_state {
>  	XFS_DAS_UNINIT		= 0,	/* No state has been set yet
> */
> -	XFS_DAS_SF_ADD,			/* Initial shortform set iter
> state */
> -	XFS_DAS_LEAF_ADD,		/* Initial leaf form set iter state
> */
> -	XFS_DAS_NODE_ADD,		/* Initial node form set iter state
> */
> -	XFS_DAS_RMTBLK,			/* Removing remote blks */
> -	XFS_DAS_RM_NAME,		/* Remove attr name */
> -	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree
> */
> -
> -	XFS_DAS_SF_REMOVE,		/* Initial shortform set iter
> state */
> -	XFS_DAS_LEAF_REMOVE,		/* Initial leaf form set iter
> state */
> -	XFS_DAS_NODE_REMOVE,		/* Initial node form set iter
> state */
> -
> -	/* Leaf state set/replace sequence */
> +
> +	/*
> +	 * Initial sequence states. The replace setup code relies on
> the
> +	 * ADD and REMOVE states for a specific format to be sequential
> so
> +	 * that we can transform the initial operation to be performed
> +	 * according to the xfs_has_larp() state easily.
> +	 */
> +	XFS_DAS_SF_ADD,			/* Initial sf add state */
> +	XFS_DAS_SF_REMOVE,		/* Initial sf replace/remove
> state */
> +
> +	XFS_DAS_LEAF_ADD,		/* Initial leaf add state */
> +	XFS_DAS_LEAF_REMOVE,		/* Initial leaf
> replace/remove state */
> +
> +	XFS_DAS_NODE_ADD,		/* Initial node add state */
> +	XFS_DAS_NODE_REMOVE,		/* Initial node
> replace/remove state */
> +
> +	/* Leaf state set/replace/remove sequence */
>  	XFS_DAS_LEAF_SET_RMT,		/* set a remote xattr from a
> leaf */
>  	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote
> blocks */
>  	XFS_DAS_LEAF_REPLACE,		/* Perform replace ops on a
> leaf */
> @@ -463,7 +468,7 @@ enum xfs_delattr_state {
>  	XFS_DAS_LEAF_REMOVE_RMT,	/* A rename is removing remote blocks
> */
>  	XFS_DAS_LEAF_REMOVE_ATTR,	/* Remove the old attr from a leaf */
>  
> -	/* Node state set/replace sequence, must match leaf state above
> */
> +	/* Node state sequence, must match leaf state above */
>  	XFS_DAS_NODE_SET_RMT,		/* set a remote xattr from a
> node */
>  	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote
> blocks */
>  	XFS_DAS_NODE_REPLACE,		/* Perform replace ops on a
> node */
> @@ -477,11 +482,11 @@ enum xfs_delattr_state {
>  #define XFS_DAS_STRINGS	\
>  	{ XFS_DAS_UNINIT,		"XFS_DAS_UNINIT" }, \
>  	{ XFS_DAS_SF_ADD,		"XFS_DAS_SF_ADD" }, \
> +	{ XFS_DAS_SF_REMOVE,		"XFS_DAS_SF_REMOVE" }, \
>  	{ XFS_DAS_LEAF_ADD,		"XFS_DAS_LEAF_ADD" }, \
> +	{ XFS_DAS_LEAF_REMOVE,		"XFS_DAS_LEAF_REMOVE" }, \
>  	{ XFS_DAS_NODE_ADD,		"XFS_DAS_NODE_ADD" }, \
> -	{ XFS_DAS_RMTBLK,		"XFS_DAS_RMTBLK" }, \
> -	{ XFS_DAS_RM_NAME,		"XFS_DAS_RM_NAME" }, \
> -	{ XFS_DAS_RM_SHRINK,		"XFS_DAS_RM_SHRINK" }, \
> +	{ XFS_DAS_NODE_REMOVE,		"XFS_DAS_NODE_REMOVE" }, \
>  	{ XFS_DAS_LEAF_SET_RMT,		"XFS_DAS_LEAF_SET_RMT" }, \
>  	{ XFS_DAS_LEAF_ALLOC_RMT,	"XFS_DAS_LEAF_ALLOC_RMT" }, \
>  	{ XFS_DAS_LEAF_REPLACE,		"XFS_DAS_LEAF_REPLACE" }, \
> @@ -525,8 +530,7 @@ struct xfs_attr_item {
>  	enum xfs_delattr_state		xattri_dela_state;
>  
>  	/*
> -	 * Indicates if the attr operation is a set or a remove
> -	 * XFS_ATTR_OP_FLAGS_{SET,REMOVE}
> +	 * Attr operation being performed - XFS_ATTR_OP_FLAGS_*
>  	 */
>  	unsigned int			xattri_op_flags;
>  
> @@ -605,10 +609,19 @@ xfs_attr_init_remove_state(struct xfs_da_args
> *args)
>  	return XFS_DAS_NODE_REMOVE;
>  }
>  
> +/*
> + * If we are logging the attributes, then we have to start with
> removal of the
> + * old attribute so that there is always consistent state that we
> can recover
> + * from if the system goes down part way through. We always log the
> new attr
> + * value, so even when we remove the attr first we still have the
> information in
> + * the log to finish the replace operation atomically.
> + */
>  static inline enum xfs_delattr_state
>  xfs_attr_init_replace_state(struct xfs_da_args *args)
>  {
>  	args->op_flags |= XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE;
> +	if (xfs_has_larp(args->dp->i_mount))
> +		return xfs_attr_init_remove_state(args);
>  	return xfs_attr_init_add_state(args);
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c
> b/fs/xfs/libxfs/xfs_attr_leaf.c
> index 53d02ce9ed78..d15e92858bf0 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -446,6 +446,14 @@ xfs_attr3_leaf_read(
>   * Namespace helper routines
>  
> *====================================================================
> ====*/
>  
> +/*
> + * If we are in log recovery, then we want the lookup to ignore the
> INCOMPLETE
> + * flag on disk - if there's an incomplete attr then recovery needs
> to tear it
> + * down. If there's no incomplete attr, then recovery needs to tear
> that attr
> + * down to replace it with the attr that has been logged. In this
> case, the
> + * INCOMPLETE flag will not be set in attr->attr_filter, but rather
> + * XFS_DA_OP_RECOVERY will be set in args->op_flags.
> + */
>  static bool
>  xfs_attr_match(
>  	struct xfs_da_args	*args,
> @@ -453,14 +461,18 @@ xfs_attr_match(
>  	unsigned char		*name,
>  	int			flags)
>  {
> +
>  	if (args->namelen != namelen)
>  		return false;
>  	if (memcmp(args->name, name, namelen) != 0)
>  		return false;
> -	/*
> -	 * If we are looking for incomplete entries, show only those,
> else only
> -	 * show complete entries.
> -	 */
> +
> +	/* Recovery ignores the INCOMPLETE flag. */
> +	if ((args->op_flags & XFS_DA_OP_RECOVERY) &&
> +	    args->attr_filter == (flags & XFS_ATTR_NSP_ONDISK_MASK))
> +		return true;
> +
> +	/* All remaining matches need to be filtered by INCOMPLETE
> state. */
>  	if (args->attr_filter !=
>  	    (flags & (XFS_ATTR_NSP_ONDISK_MASK | XFS_ATTR_INCOMPLETE)))
>  		return false;
> @@ -799,6 +811,14 @@ xfs_attr_sf_removename(
>  	sf = (struct xfs_attr_shortform *)dp->i_afp->if_u1.if_data;
>  
>  	error = xfs_attr_sf_findname(args, &sfe, &base);
> +
> +	/*
> +	 * If we are recovering an operation, finding nothing to
> +	 * remove is not an error - it just means there was nothing
> +	 * to clean up.
> +	 */
> +	if (error == -ENOATTR && (args->op_flags & XFS_DA_OP_RECOVERY))
> +		return 0;
>  	if (error != -EEXIST)
>  		return error;
>  	size = xfs_attr_sf_entsize(sfe);
> @@ -819,7 +839,7 @@ xfs_attr_sf_removename(
>  	totsize -= size;
>  	if (totsize == sizeof(xfs_attr_sf_hdr_t) && xfs_has_attr2(mp)
> &&
>  	    (dp->i_df.if_format != XFS_DINODE_FMT_BTREE) &&
> -	    !(args->op_flags & XFS_DA_OP_ADDNAME)) {
> +	    !(args->op_flags & (XFS_DA_OP_ADDNAME |
> XFS_DA_OP_REPLACE))) {
>  		xfs_attr_fork_remove(dp, args->trans);
>  	} else {
>  		xfs_idata_realloc(dp, -size, XFS_ATTR_FORK);
> @@ -1128,9 +1148,17 @@ xfs_attr3_leaf_to_shortform(
>  		goto out;
>  
>  	if (forkoff == -1) {
> -		ASSERT(xfs_has_attr2(dp->i_mount));
> -		ASSERT(dp->i_df.if_format != XFS_DINODE_FMT_BTREE);
> -		xfs_attr_fork_remove(dp, args->trans);
> +		/*
> +		 * Don't remove the attr fork if this operation is the
> first
> +		 * part of a attr replace operations. We're going to
> add a new
> +		 * attr immediately, so we need to keep the attr fork
> around in
> +		 * this case.
> +		 */
> +		if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
> +			ASSERT(xfs_has_attr2(dp->i_mount));
> +			ASSERT(dp->i_df.if_format !=
> XFS_DINODE_FMT_BTREE);
> +			xfs_attr_fork_remove(dp, args->trans);
> +		}
>  		goto out;
>  	}
>  
> diff --git a/fs/xfs/libxfs/xfs_da_btree.h
> b/fs/xfs/libxfs/xfs_da_btree.h
> index 468ca70cd35d..ed2303e4d46a 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.h
> +++ b/fs/xfs/libxfs/xfs_da_btree.h
> @@ -91,6 +91,7 @@ typedef struct xfs_da_args {
>  #define XFS_DA_OP_CILOOKUP	(1u << 4) /* lookup returns CI name if
> found */
>  #define XFS_DA_OP_NOTIME	(1u << 5) /* don't update inode
> timestamps */
>  #define XFS_DA_OP_REMOVE	(1u << 6) /* this is a remove operation
> */
> +#define XFS_DA_OP_RECOVERY	(1u << 7) /* Log recovery operation */
>  
>  #define XFS_DA_OP_FLAGS \
>  	{ XFS_DA_OP_JUSTCHECK,	"JUSTCHECK" }, \
> @@ -99,7 +100,8 @@ typedef struct xfs_da_args {
>  	{ XFS_DA_OP_OKNOENT,	"OKNOENT" }, \
>  	{ XFS_DA_OP_CILOOKUP,	"CILOOKUP" }, \
>  	{ XFS_DA_OP_NOTIME,	"NOTIME" }, \
> -	{ XFS_DA_OP_REMOVE,	"REMOVE" }
> +	{ XFS_DA_OP_REMOVE,	"REMOVE" }, \
> +	{ XFS_DA_OP_RECOVERY,	"RECOVERY" }
>  
>  /*
>   * Storage for holding state during Btree searches and split/join
> ops.
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index fb9549e7ea96..50ad3aa891ee 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -554,6 +554,7 @@ xfs_attri_item_recover(
>  	args->namelen = attrp->alfi_name_len;
>  	args->hashval = xfs_da_hashname(args->name, args->namelen);
>  	args->attr_filter = attrp->alfi_attr_flags;
> +	args->op_flags = XFS_DA_OP_RECOVERY | XFS_DA_OP_OKNOENT;
>  
>  	switch (attrp->alfi_op_flags & XFS_ATTR_OP_FLAGS_TYPE_MASK) {
>  	case XFS_ATTR_OP_FLAGS_SET:
> @@ -561,9 +562,14 @@ xfs_attri_item_recover(
>  		args->value = attrip->attri_value;
>  		args->valuelen = attrp->alfi_value_len;
>  		args->total = xfs_attr_calc_size(args, &local);
> -		attr->xattri_dela_state =
> xfs_attr_init_add_state(args);
> +		if (xfs_inode_hasattr(args->dp))
> +			attr->xattri_dela_state =
> xfs_attr_init_replace_state(args);
> +		else
> +			attr->xattri_dela_state =
> xfs_attr_init_add_state(args);
>  		break;
>  	case XFS_ATTR_OP_FLAGS_REMOVE:
> +		if (!xfs_inode_hasattr(args->dp))
> +			goto out;
>  		attr->xattri_dela_state =
> xfs_attr_init_remove_state(args);
>  		break;
>  	default:
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 01b047d86cd1..d32026585c1b 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -4131,13 +4131,10 @@ DEFINE_ICLOG_EVENT(xlog_iclog_write);
>  
>  TRACE_DEFINE_ENUM(XFS_DAS_UNINIT);
>  TRACE_DEFINE_ENUM(XFS_DAS_SF_ADD);
> -TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ADD);
> -TRACE_DEFINE_ENUM(XFS_DAS_NODE_ADD);
> -TRACE_DEFINE_ENUM(XFS_DAS_RMTBLK);
> -TRACE_DEFINE_ENUM(XFS_DAS_RM_NAME);
> -TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
>  TRACE_DEFINE_ENUM(XFS_DAS_SF_REMOVE);
> +TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ADD);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE);
> +TRACE_DEFINE_ENUM(XFS_DAS_NODE_ADD);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_SET_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 18/18] xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify
  2022-05-09  0:41 ` [PATCH 18/18] xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify Dave Chinner
@ 2022-05-10 22:31   ` Alli
  2022-05-10 23:54   ` Darrick J. Wong
  1 sibling, 0 replies; 56+ messages in thread
From: Alli @ 2022-05-10 22:31 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Mon, 2022-05-09 at 10:41 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_repair flags these as a corruption error, so the verifier should
> catch software bugs that result in empty leaf blocks being written
> to disk, too.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Ok, looks good
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
>  fs/xfs/libxfs/xfs_attr_leaf.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c
> b/fs/xfs/libxfs/xfs_attr_leaf.c
> index d15e92858bf0..15a990409463 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -310,6 +310,15 @@ xfs_attr3_leaf_verify(
>  	if (fa)
>  		return fa;
>  
> +	/*
> +	 * Empty leaf blocks should never occur;  they imply the
> existence of a
> +	 * software bug that needs fixing. xfs_repair also flags them
> as a
> +	 * corruption that needs fixing, so we should never let these
> go to
> +	 * disk.
> +	 */
> +	if (ichdr.count == 0)
> +		return __this_address;
> +
>  	/*
>  	 * firstused is the block offset of the first name info
> structure.
>  	 * Make sure it doesn't go off the block or crash into the
> header.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 03/18] xfs: make xattri_leaf_bp more useful
  2022-05-09  0:41 ` [PATCH 03/18] xfs: make xattri_leaf_bp more useful Dave Chinner
@ 2022-05-10 22:58   ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 22:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:23AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We currently set it and hold it when converting from short to leaf
> form, then release it only to immediately look it back up again
> to do the leaf insert.
> 
> Do a bit of refactoring to xfs_attr_leaf_try_add() to avoid this
> messy handling of the newly allocated leaf buffer.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>

Took me a while to figure this out, but looks fine to me.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr.c | 50 +++++++++++++++++++++++++---------------
>  1 file changed, 32 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index b3d918195160..a4b0b20a3bab 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -319,7 +319,15 @@ xfs_attr_leaf_addname(
>  	int			error;
>  
>  	if (xfs_attr_is_leaf(dp)) {
> +
> +		/*
> +		 * Use the leaf buffer we may already hold locked as a result of
> +		 * a sf-to-leaf conversion. The held buffer is no longer valid
> +		 * after this call, regardless of the result.
> +		 */
>  		error = xfs_attr_leaf_try_add(args, attr->xattri_leaf_bp);
> +		attr->xattri_leaf_bp = NULL;
> +
>  		if (error == -ENOSPC) {
>  			error = xfs_attr3_leaf_to_node(args);
>  			if (error)
> @@ -341,6 +349,8 @@ xfs_attr_leaf_addname(
>  		}
>  		next_state = XFS_DAS_FOUND_LBLK;
>  	} else {
> +		ASSERT(!attr->xattri_leaf_bp);
> +
>  		error = xfs_attr_node_addname_find_attr(attr);
>  		if (error)
>  			return error;
> @@ -396,12 +406,6 @@ xfs_attr_set_iter(
>  		 */
>  		if (xfs_attr_is_shortform(dp))
>  			return xfs_attr_sf_addname(attr);
> -		if (attr->xattri_leaf_bp != NULL) {
> -			xfs_trans_bhold_release(args->trans,
> -						attr->xattri_leaf_bp);
> -			attr->xattri_leaf_bp = NULL;
> -		}
> -
>  		return xfs_attr_leaf_addname(attr);
>  
>  	case XFS_DAS_FOUND_LBLK:
> @@ -992,18 +996,31 @@ xfs_attr_leaf_try_add(
>  	struct xfs_da_args	*args,
>  	struct xfs_buf		*bp)
>  {
> -	int			retval;
> +	int			error;
>  
>  	/*
> -	 * Look up the given attribute in the leaf block.  Figure out if
> -	 * the given flags produce an error or call for an atomic rename.
> +	 * If the caller provided a buffer to us, it is locked and held in
> +	 * the transaction because it just did a shortform to leaf conversion.
> +	 * Hence we don't need to read it again. Otherwise read in the leaf
> +	 * buffer.
>  	 */
> -	retval = xfs_attr_leaf_hasname(args, &bp);
> -	if (retval != -ENOATTR && retval != -EEXIST)
> -		return retval;
> -	if (retval == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
> +	if (bp) {
> +		xfs_trans_bhold_release(args->trans, bp);
> +	} else {
> +		error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp);
> +		if (error)
> +			return error;
> +	}
> +
> +	/*
> +	 * Look up the xattr name to set the insertion point for the new xattr.
> +	 */
> +	error = xfs_attr3_leaf_lookup_int(bp, args);
> +	if (error != -ENOATTR && error != -EEXIST)
>  		goto out_brelse;
> -	if (retval == -EEXIST) {
> +	if (error == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
> +		goto out_brelse;
> +	if (error == -EEXIST) {
>  		if (args->attr_flags & XATTR_CREATE)
>  			goto out_brelse;
>  
> @@ -1023,14 +1040,11 @@ xfs_attr_leaf_try_add(
>  		args->rmtvaluelen = 0;
>  	}
>  
> -	/*
> -	 * Add the attribute to the leaf block
> -	 */
>  	return xfs_attr3_leaf_add(bp, args);
>  
>  out_brelse:
>  	xfs_trans_brelse(args->trans, bp);
> -	return retval;
> +	return error;
>  }
>  
>  /*
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 04/18] xfs: rework deferred attribute operation setup
  2022-05-09  0:41 ` [PATCH 04/18] xfs: rework deferred attribute operation setup Dave Chinner
@ 2022-05-10 23:04   ` Darrick J. Wong
  2022-05-11  0:57     ` Dave Chinner
  0 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:24AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Logged attribute intents only have set and remove types - there is
> no type of the replace operation. We should ahve a separate type for

"..no separate intent type for a replace operation."?

Also, "ahve" -> "have".

> a replace operation, as it needs to perform operations that neither
> SET or REMOVE can perform.
> 
> Add this type to the intent items and rearrange the deferred
> operation setup to reflect the different operations we are
> performing.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c       | 165 +++++++++++++++++++--------------
>  fs/xfs/libxfs/xfs_attr.h       |   2 -
>  fs/xfs/libxfs/xfs_log_format.h |   1 +
>  fs/xfs/xfs_attr_item.c         |   9 +-
>  fs/xfs/xfs_trace.h             |   4 +
>  5 files changed, 110 insertions(+), 71 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index a4b0b20a3bab..817e15740f9c 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -671,6 +671,81 @@ xfs_attr_lookup(
>  	return xfs_attr_node_hasname(args, NULL);
>  }
>  
> +static int
> +xfs_attr_item_init(
> +	struct xfs_da_args	*args,
> +	unsigned int		op_flags,	/* op flag (set or remove) */
> +	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
> +{
> +
> +	struct xfs_attr_item	*new;
> +
> +	new = kmem_zalloc(sizeof(struct xfs_attr_item), KM_NOFS);

Can this fail?

> +	new->xattri_op_flags = op_flags;
> +	new->xattri_da_args = args;
> +
> +	*attr = new;
> +	return 0;

And if it can't, then there's no need for a return value, AFAICT.  I
looked at the end of xfs-5.19-compose and there's no other return
statements in this function.

And then you don't need any of the error handling in this patch at all,
right?

*OH*, this is just a hoist of the stuff at the end.  Could someone add a
patch on the end of ... whatever patchset there is to clean that up?

In the meantime, with the commit message typos cleaned up,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +}
> +
> +/* Sets an attribute for an inode as a deferred operation */
> +static int
> +xfs_attr_defer_add(
> +	struct xfs_da_args	*args)
> +{
> +	struct xfs_attr_item	*new;
> +	int			error = 0;
> +
> +	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
> +	if (error)
> +		return error;
> +
> +	new->xattri_dela_state = XFS_DAS_UNINIT;
> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> +	trace_xfs_attr_defer_add(new->xattri_dela_state, args->dp);
> +
> +	return 0;
> +}
> +
> +/* Sets an attribute for an inode as a deferred operation */
> +static int
> +xfs_attr_defer_replace(
> +	struct xfs_da_args	*args)
> +{
> +	struct xfs_attr_item	*new;
> +	int			error = 0;
> +
> +	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REPLACE, &new);
> +	if (error)
> +		return error;
> +
> +	new->xattri_dela_state = XFS_DAS_UNINIT;
> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> +	trace_xfs_attr_defer_replace(new->xattri_dela_state, args->dp);
> +
> +	return 0;
> +}
> +
> +/* Removes an attribute for an inode as a deferred operation */
> +static int
> +xfs_attr_defer_remove(
> +	struct xfs_da_args	*args)
> +{
> +
> +	struct xfs_attr_item	*new;
> +	int			error;
> +
> +	error  = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
> +	if (error)
> +		return error;
> +
> +	new->xattri_dela_state = XFS_DAS_UNINIT;
> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> +	trace_xfs_attr_defer_remove(new->xattri_dela_state, args->dp);
> +
> +	return 0;
> +}
> +
>  /*
>   * Note: If args->value is NULL the attribute will be removed, just like the
>   * Linux ->setattr API.
> @@ -759,29 +834,35 @@ xfs_attr_set(
>  	}
>  
>  	error = xfs_attr_lookup(args);
> -	if (args->value) {
> -		if (error == -EEXIST && (args->attr_flags & XATTR_CREATE))
> -			goto out_trans_cancel;
> -		if (error == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
> -			goto out_trans_cancel;
> -		if (error != -ENOATTR && error != -EEXIST)
> +	switch (error) {
> +	case -EEXIST:
> +		/* if no value, we are performing a remove operation */
> +		if (!args->value) {
> +			error = xfs_attr_defer_remove(args);
> +			break;
> +		}
> +		/* Pure create fails if the attr already exists */
> +		if (args->attr_flags & XATTR_CREATE)
>  			goto out_trans_cancel;
>  
> -		error = xfs_attr_set_deferred(args);
> -		if (error)
> +		error = xfs_attr_defer_replace(args);
> +		break;
> +	case -ENOATTR:
> +		/* Can't remove what isn't there. */
> +		if (!args->value)
>  			goto out_trans_cancel;
>  
> -		/* shortform attribute has already been committed */
> -		if (!args->trans)
> -			goto out_unlock;
> -	} else {
> -		if (error != -EEXIST)
> +		/* Pure replace fails if no existing attr to replace. */
> +		if (args->attr_flags & XATTR_REPLACE)
>  			goto out_trans_cancel;
>  
> -		error = xfs_attr_remove_deferred(args);
> -		if (error)
> -			goto out_trans_cancel;
> +		error = xfs_attr_defer_add(args);
> +		break;
> +	default:
> +		goto out_trans_cancel;
>  	}
> +	if (error)
> +		goto out_trans_cancel;
>  
>  	/*
>  	 * If this is a synchronous mount, make sure that the
> @@ -845,58 +926,6 @@ xfs_attrd_destroy_cache(void)
>  	xfs_attrd_cache = NULL;
>  }
>  
> -STATIC int
> -xfs_attr_item_init(
> -	struct xfs_da_args	*args,
> -	unsigned int		op_flags,	/* op flag (set or remove) */
> -	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
> -{
> -
> -	struct xfs_attr_item	*new;
> -
> -	new = kmem_zalloc(sizeof(struct xfs_attr_item), KM_NOFS);
> -	new->xattri_op_flags = op_flags;
> -	new->xattri_da_args = args;
> -
> -	*attr = new;
> -	return 0;
> -}
> -
> -/* Sets an attribute for an inode as a deferred operation */
> -int
> -xfs_attr_set_deferred(
> -	struct xfs_da_args	*args)
> -{
> -	struct xfs_attr_item	*new;
> -	int			error = 0;
> -
> -	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
> -	if (error)
> -		return error;
> -
> -	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> -
> -	return 0;
> -}
> -
> -/* Removes an attribute for an inode as a deferred operation */
> -int
> -xfs_attr_remove_deferred(
> -	struct xfs_da_args	*args)
> -{
> -
> -	struct xfs_attr_item	*new;
> -	int			error;
> -
> -	error  = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
> -	if (error)
> -		return error;
> -
> -	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> -
> -	return 0;
> -}
> -
>  /*========================================================================
>   * External routines when attribute list is inside the inode
>   *========================================================================*/
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index f6c13d2bfbcd..c9c867e3406c 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -521,8 +521,6 @@ bool xfs_attr_namecheck(const void *name, size_t length);
>  int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>  void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
>  			 unsigned int *total);
> -int xfs_attr_set_deferred(struct xfs_da_args *args);
> -int xfs_attr_remove_deferred(struct xfs_da_args *args);
>  
>  extern struct kmem_cache	*xfs_attri_cache;
>  extern struct kmem_cache	*xfs_attrd_cache;
> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> index a27492e99673..f7edd1ecf6d9 100644
> --- a/fs/xfs/libxfs/xfs_log_format.h
> +++ b/fs/xfs/libxfs/xfs_log_format.h
> @@ -908,6 +908,7 @@ struct xfs_icreate_log {
>   */
>  #define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
>  #define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
> +#define XFS_ATTR_OP_FLAGS_REPLACE	3	/* Replace the attribute */
>  #define XFS_ATTR_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
>  
>  /*
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 5f8680b05079..fe1e37696634 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -311,6 +311,7 @@ xfs_xattri_finish_update(
>  
>  	switch (op) {
>  	case XFS_ATTR_OP_FLAGS_SET:
> +	case XFS_ATTR_OP_FLAGS_REPLACE:
>  		error = xfs_attr_set_iter(attr);
>  		break;
>  	case XFS_ATTR_OP_FLAGS_REMOVE:
> @@ -500,8 +501,14 @@ xfs_attri_validate(
>  		return false;
>  
>  	/* alfi_op_flags should be either a set or remove */
> -	if (op != XFS_ATTR_OP_FLAGS_SET && op != XFS_ATTR_OP_FLAGS_REMOVE)
> +	switch (op) {
> +	case XFS_ATTR_OP_FLAGS_SET:
> +	case XFS_ATTR_OP_FLAGS_REPLACE:
> +	case XFS_ATTR_OP_FLAGS_REMOVE:
> +		break;
> +	default:
>  		return false;
> +	}
>  
>  	if (attrp->alfi_value_len > XATTR_SIZE_MAX)
>  		return false;
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index fec4198b738b..01ce0401aa32 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -4154,6 +4154,10 @@ DEFINE_DAS_STATE_EVENT(xfs_attr_leaf_addname_return);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_node_addname_return);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_remove_iter_return);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_rmtval_remove_return);
> +DEFINE_DAS_STATE_EVENT(xfs_attr_defer_add);
> +DEFINE_DAS_STATE_EVENT(xfs_attr_defer_replace);
> +DEFINE_DAS_STATE_EVENT(xfs_attr_defer_remove);
> +
>  
>  TRACE_EVENT(xfs_force_shutdown,
>  	TP_PROTO(struct xfs_mount *mp, int ptag, int flags, const char *fname,
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 05/18] xfs: separate out initial attr_set states
  2022-05-09  0:41 ` [PATCH 05/18] xfs: separate out initial attr_set states Dave Chinner
@ 2022-05-10 23:12   ` Darrick J. Wong
  2022-05-11  1:06     ` Dave Chinner
  0 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:25AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We current use XFS_DAS_UNINIT for several steps in the attr_set
> state machine. We use it for setting shortform xattrs, converting
> from shortform to leaf, leaf add, leaf-to-node and leaf add. All of
> these things are essentially known before we start the state machine
> iterating, so we really should separate them out:
> 
> XFS_DAS_SF_ADD:
> 	- tries to do a shortform add
> 	- on success -> done
> 	- on ENOSPC converts to leaf, -> XFS_DAS_LEAF_ADD
> 	- on error, dies.
> 
> XFS_DAS_LEAF_ADD:
> 	- tries to do leaf add
> 	- on success:
> 		- inline attr -> done
> 		- remote xattr || REPLACE -> XFS_DAS_FOUND_LBLK
> 	- on ENOSPC converts to node, -> XFS_DAS_NODE_ADD
> 	- on error, dies
> 
> XFS_DAS_NODE_ADD:
> 	- tries to do node add
> 	- on success:
> 		- inline attr -> done
> 		- remote xattr || REPLACE -> XFS_DAS_FOUND_NBLK
> 	- on error, dies
> 
> This makes it easier to understand how the state machine starts
> up and sets us up on the path to further state machine
> simplifications.

Yes!!

> This also converts the DAS state tracepoints to use strings rather
> than numbers, as converting between enums and numbers requires
> manual counting rather than just reading the name.
> 
> This also introduces a XFS_DAS_DONE state so that we can trace
> successful operation completions easily.

Yes!!

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c  | 161 +++++++++++++++++++-------------------
>  fs/xfs/libxfs/xfs_attr.h  |  80 ++++++++++++++++---
>  fs/xfs/libxfs/xfs_defer.c |   2 +
>  fs/xfs/xfs_acl.c          |   4 +-
>  fs/xfs/xfs_attr_item.c    |  13 ++-
>  fs/xfs/xfs_ioctl.c        |   4 +-
>  fs/xfs/xfs_trace.h        |  22 +++++-
>  fs/xfs/xfs_xattr.c        |   2 +-
>  8 files changed, 185 insertions(+), 103 deletions(-)
> 

<snip>

> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index c9c867e3406c..ad52b5dc59e4 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -530,4 +553,35 @@ void xfs_attri_destroy_cache(void);
>  int __init xfs_attrd_init_cache(void);
>  void xfs_attrd_destroy_cache(void);
>  
> +/*
> + * Check to see if the attr should be upgraded from non-existent or shortform to
> + * single-leaf-block attribute list.
> + */
> +static inline bool
> +xfs_attr_is_shortform(
> +	struct xfs_inode    *ip)
> +{
> +	return ip->i_afp->if_format == XFS_DINODE_FMT_LOCAL ||
> +	       (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
> +		ip->i_afp->if_nextents == 0);
> +}
> +
> +static inline enum xfs_delattr_state
> +xfs_attr_init_add_state(struct xfs_da_args *args)
> +{
> +	if (!args->dp->i_afp)
> +		return XFS_DAS_DONE;

If we're in add/replace attr call without an attr fork, why do we go
straight to finished?

--D

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 06/18] xfs: kill XFS_DAC_LEAF_ADDNAME_INIT
  2022-05-09  0:41 ` [PATCH 06/18] xfs: kill XFS_DAC_LEAF_ADDNAME_INIT Dave Chinner
@ 2022-05-10 23:15   ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:26AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We re-enter the XFS_DAS_FOUND_LBLK state when we have to allocate
> multiple extents for a remote xattr. We currently have a flag
> called XFS_DAC_LEAF_ADDNAME_INIT to avoid running the remote attr
> hole finding code more than once.
> 
> However, for the node format tree, we have a separate state for this
> so we never reenter the state machine at XFS_DAS_FOUND_NBLK and so
> it does not need a special flag to skip over the remote attr hold
> finding code.
> 
> Convert the leaf block code to use the same state machine as the
> node blocks and kill the  XFS_DAC_LEAF_ADDNAME_INIT flag.
> 
> This further points out that this "ALLOC" state is only traversed
> if we have remote xattrs or we are doing a rename operation. Rename
> both the leaf and node alloc states to _ALLOC_RMT to indicate they
> are iterating to do allocation of remote xattr blocks.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>

I agree this is a good idea to have rmtval allocation as separate
states in the machine.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr.c | 45 ++++++++++++++++++++--------------------
>  fs/xfs/libxfs/xfs_attr.h |  6 ++++--
>  fs/xfs/xfs_trace.h       |  3 ++-
>  3 files changed, 29 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index edc31075fde4..ab8a884af512 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -406,40 +406,41 @@ xfs_attr_set_iter(
>  		return xfs_attr_node_addname(attr);
>  
>  	case XFS_DAS_FOUND_LBLK:
> +		/*
> +		 * Find space for remote blocks and fall into the allocation
> +		 * state.
> +		 */
> +		if (args->rmtblkno > 0) {
> +			error = xfs_attr_rmtval_find_space(attr);
> +			if (error)
> +				return error;
> +		}
> +		attr->xattri_dela_state = XFS_DAS_LEAF_ALLOC_RMT;
> +		fallthrough;
> +	case XFS_DAS_LEAF_ALLOC_RMT:
> +
>  		/*
>  		 * If there was an out-of-line value, allocate the blocks we
>  		 * identified for its storage and copy the value.  This is done
>  		 * after we create the attribute so that we don't overflow the
>  		 * maximum size of a transaction and/or hit a deadlock.
>  		 */
> -
> -		/* Open coded xfs_attr_rmtval_set without trans handling */
> -		if ((attr->xattri_flags & XFS_DAC_LEAF_ADDNAME_INIT) == 0) {
> -			attr->xattri_flags |= XFS_DAC_LEAF_ADDNAME_INIT;
> -			if (args->rmtblkno > 0) {
> -				error = xfs_attr_rmtval_find_space(attr);
> +		if (args->rmtblkno > 0) {
> +			if (attr->xattri_blkcnt > 0) {
> +				error = xfs_attr_rmtval_set_blk(attr);
>  				if (error)
>  					return error;
> +				trace_xfs_attr_set_iter_return(
> +						attr->xattri_dela_state,
> +						args->dp);
> +				return -EAGAIN;
>  			}
> -		}
>  
> -		/*
> -		 * Repeat allocating remote blocks for the attr value until
> -		 * blkcnt drops to zero.
> -		 */
> -		if (attr->xattri_blkcnt > 0) {
> -			error = xfs_attr_rmtval_set_blk(attr);
> +			error = xfs_attr_rmtval_set_value(args);
>  			if (error)
>  				return error;
> -			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
> -						       args->dp);
> -			return -EAGAIN;
>  		}
>  
> -		error = xfs_attr_rmtval_set_value(args);
> -		if (error)
> -			return error;
> -
>  		/*
>  		 * If this is not a rename, clear the incomplete flag and we're
>  		 * done.
> @@ -534,15 +535,15 @@ xfs_attr_set_iter(
>  				return error;
>  		}
>  
> +		attr->xattri_dela_state = XFS_DAS_NODE_ALLOC_RMT;
>  		fallthrough;
> -	case XFS_DAS_ALLOC_NODE:
> +	case XFS_DAS_NODE_ALLOC_RMT:
>  		/*
>  		 * If there was an out-of-line value, allocate the blocks we
>  		 * identified for its storage and copy the value.  This is done
>  		 * after we create the attribute so that we don't overflow the
>  		 * maximum size of a transaction and/or hit a deadlock.
>  		 */
> -		attr->xattri_dela_state = XFS_DAS_ALLOC_NODE;
>  		if (args->rmtblkno > 0) {
>  			if (attr->xattri_blkcnt > 0) {
>  				error = xfs_attr_rmtval_set_blk(attr);
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index ad52b5dc59e4..d016af4dbf81 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -451,11 +451,12 @@ enum xfs_delattr_state {
>  	XFS_DAS_RM_NAME,		/* Remove attr name */
>  	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
>  	XFS_DAS_FOUND_LBLK,		/* We found leaf blk for attr */
> +	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
>  	XFS_DAS_FOUND_NBLK,		/* We found node blk for attr */
> +	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
>  	XFS_DAS_FLIP_LFLAG,		/* Flipped leaf INCOMPLETE attr flag */
>  	XFS_DAS_RM_LBLK,		/* A rename is removing leaf blocks */
>  	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
> -	XFS_DAS_ALLOC_NODE,		/* We are allocating node blocks */
>  	XFS_DAS_FLIP_NFLAG,		/* Flipped node INCOMPLETE attr flag */
>  	XFS_DAS_RM_NBLK,		/* A rename is removing node blocks */
>  	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
> @@ -471,11 +472,12 @@ enum xfs_delattr_state {
>  	{ XFS_DAS_RM_NAME,	"XFS_DAS_RM_NAME" }, \
>  	{ XFS_DAS_RM_SHRINK,	"XFS_DAS_RM_SHRINK" }, \
>  	{ XFS_DAS_FOUND_LBLK,	"XFS_DAS_FOUND_LBLK" }, \
> +	{ XFS_DAS_LEAF_ALLOC_RMT, "XFS_DAS_LEAF_ALLOC_RMT" }, \
>  	{ XFS_DAS_FOUND_NBLK,	"XFS_DAS_FOUND_NBLK" }, \
> +	{ XFS_DAS_NODE_ALLOC_RMT, "XFS_DAS_NODE_ALLOC_RMT" },  \
>  	{ XFS_DAS_FLIP_LFLAG,	"XFS_DAS_FLIP_LFLAG" }, \
>  	{ XFS_DAS_RM_LBLK,	"XFS_DAS_RM_LBLK" }, \
>  	{ XFS_DAS_RD_LEAF,	"XFS_DAS_RD_LEAF" }, \
> -	{ XFS_DAS_ALLOC_NODE,	"XFS_DAS_ALLOC_NODE" }, \
>  	{ XFS_DAS_FLIP_NFLAG,	"XFS_DAS_FLIP_NFLAG" }, \
>  	{ XFS_DAS_RM_NBLK,	"XFS_DAS_RM_NBLK" }, \
>  	{ XFS_DAS_CLR_FLAG,	"XFS_DAS_CLR_FLAG" }, \
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 8f722be25c29..067ab31d7a20 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -4137,11 +4137,12 @@ TRACE_DEFINE_ENUM(XFS_DAS_RMTBLK);
>  TRACE_DEFINE_ENUM(XFS_DAS_RM_NAME);
>  TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
>  TRACE_DEFINE_ENUM(XFS_DAS_FOUND_LBLK);
> +TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_FOUND_NBLK);
> +TRACE_DEFINE_ENUM(XFS_DAS_NODE_ALLOC_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_FLIP_LFLAG);
>  TRACE_DEFINE_ENUM(XFS_DAS_RM_LBLK);
>  TRACE_DEFINE_ENUM(XFS_DAS_RD_LEAF);
> -TRACE_DEFINE_ENUM(XFS_DAS_ALLOC_NODE);
>  TRACE_DEFINE_ENUM(XFS_DAS_FLIP_NFLAG);
>  TRACE_DEFINE_ENUM(XFS_DAS_RM_NBLK);
>  TRACE_DEFINE_ENUM(XFS_DAS_CLR_FLAG);
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 07/18] xfs: consolidate leaf/node states in xfs_attr_set_iter
  2022-05-09  0:41 ` [PATCH 07/18] xfs: consolidate leaf/node states in xfs_attr_set_iter Dave Chinner
@ 2022-05-10 23:20   ` Darrick J. Wong
  2022-05-11  1:09     ` Dave Chinner
  0 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:27AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The operations performed from XFS_DAS_FOUND_LBLK through to
> XFS_DAS_RM_LBLK are now identical to XFS_DAS_FOUND_NBLK through to
> XFS_DAS_RM_NBLK. We can collapse these down into a single set of
> code.
> 
> To do this, define the states that leaf and node run through as
> separate sets of sequential states. Then as we move to the next
> state, we can use increments rather than specific state assignments
> to move through the states. This means the state progression is set
> by the initial state that enters the series and we don't need to
> duplicate the code anymore.
> 
> At the exit point of the series we need to select the correct leaf
> or node state, but that can also be done by state increment rather
> than assignment.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c | 127 ++++++---------------------------------
>  fs/xfs/libxfs/xfs_attr.h |   9 ++-
>  2 files changed, 27 insertions(+), 109 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index ab8a884af512..be580da62f08 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -394,6 +394,7 @@ xfs_attr_set_iter(
>  	struct xfs_mount		*mp = args->dp->i_mount;
>  
>  	/* State machine switch */
> +next_state:
>  	switch (attr->xattri_dela_state) {
>  	case XFS_DAS_UNINIT:
>  		ASSERT(0);
> @@ -406,6 +407,7 @@ xfs_attr_set_iter(
>  		return xfs_attr_node_addname(attr);
>  
>  	case XFS_DAS_FOUND_LBLK:
> +	case XFS_DAS_FOUND_NBLK:
>  		/*
>  		 * Find space for remote blocks and fall into the allocation
>  		 * state.
> @@ -415,9 +417,10 @@ xfs_attr_set_iter(
>  			if (error)
>  				return error;
>  		}
> -		attr->xattri_dela_state = XFS_DAS_LEAF_ALLOC_RMT;
> +		attr->xattri_dela_state++;
>  		fallthrough;
>  	case XFS_DAS_LEAF_ALLOC_RMT:
> +	case XFS_DAS_NODE_ALLOC_RMT:
>  
>  		/*
>  		 * If there was an out-of-line value, allocate the blocks we
> @@ -466,16 +469,18 @@ xfs_attr_set_iter(
>  				return error;
>  			/*
>  			 * Commit the flag value change and start the next trans
> -			 * in series.
> +			 * in series at FLIP_FLAG.
>  			 */
> -			attr->xattri_dela_state = XFS_DAS_FLIP_LFLAG;
> +			attr->xattri_dela_state++;
>  			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
>  						       args->dp);
>  			return -EAGAIN;
>  		}
>  
> +		attr->xattri_dela_state++;
>  		fallthrough;
>  	case XFS_DAS_FLIP_LFLAG:
> +	case XFS_DAS_FLIP_NFLAG:
>  		/*
>  		 * Dismantle the "old" attribute/value pair by removing a
>  		 * "remote" value (if it exists).
> @@ -485,10 +490,10 @@ xfs_attr_set_iter(
>  		if (error)
>  			return error;
>  
> +		attr->xattri_dela_state++;
>  		fallthrough;
>  	case XFS_DAS_RM_LBLK:
> -		/* Set state in case xfs_attr_rmtval_remove returns -EAGAIN */
> -		attr->xattri_dela_state = XFS_DAS_RM_LBLK;
> +	case XFS_DAS_RM_NBLK:
>  		if (args->rmtblkno) {
>  			error = xfs_attr_rmtval_remove(attr);
>  			if (error == -EAGAIN)
> @@ -503,7 +508,16 @@ xfs_attr_set_iter(
>  			return -EAGAIN;
>  		}
>  
> -		fallthrough;
> +		/*
> +		 * This is the end of the shared leaf/node sequence. We need
> +		 * to continue at the next state in the sequence, but we can't
> +		 * easily just fall through. So we increment to the next state
> +		 * and then jump back to switch statement to evaluate the next
> +		 * state correctly.
> +		 */
> +		attr->xattri_dela_state++;
> +		goto next_state;
> +
>  	case XFS_DAS_RD_LEAF:
>  		/*
>  		 * This is the last step for leaf format. Read the block with
> @@ -524,106 +538,6 @@ xfs_attr_set_iter(
>  
>  		return error;
>  
> -	case XFS_DAS_FOUND_NBLK:
> -		/*
> -		 * Find space for remote blocks and fall into the allocation
> -		 * state.
> -		 */
> -		if (args->rmtblkno > 0) {
> -			error = xfs_attr_rmtval_find_space(attr);
> -			if (error)
> -				return error;
> -		}
> -
> -		attr->xattri_dela_state = XFS_DAS_NODE_ALLOC_RMT;
> -		fallthrough;
> -	case XFS_DAS_NODE_ALLOC_RMT:
> -		/*
> -		 * If there was an out-of-line value, allocate the blocks we
> -		 * identified for its storage and copy the value.  This is done
> -		 * after we create the attribute so that we don't overflow the
> -		 * maximum size of a transaction and/or hit a deadlock.
> -		 */
> -		if (args->rmtblkno > 0) {
> -			if (attr->xattri_blkcnt > 0) {
> -				error = xfs_attr_rmtval_set_blk(attr);
> -				if (error)
> -					return error;
> -				trace_xfs_attr_set_iter_return(
> -					attr->xattri_dela_state, args->dp);
> -				return -EAGAIN;
> -			}
> -
> -			error = xfs_attr_rmtval_set_value(args);
> -			if (error)
> -				return error;
> -		}
> -
> -		/*
> -		 * If this was not a rename, clear the incomplete flag and we're
> -		 * done.
> -		 */
> -		if (!(args->op_flags & XFS_DA_OP_RENAME)) {
> -			if (args->rmtblkno > 0)
> -				error = xfs_attr3_leaf_clearflag(args);
> -			goto out;
> -		}
> -
> -		/*
> -		 * If this is an atomic rename operation, we must "flip" the
> -		 * incomplete flags on the "new" and "old" attribute/value pairs
> -		 * so that one disappears and one appears atomically.  Then we
> -		 * must remove the "old" attribute/value pair.
> -		 *
> -		 * In a separate transaction, set the incomplete flag on the
> -		 * "old" attr and clear the incomplete flag on the "new" attr.
> -		 */
> -		if (!xfs_has_larp(mp)) {
> -			error = xfs_attr3_leaf_flipflags(args);
> -			if (error)
> -				goto out;
> -			/*
> -			 * Commit the flag value change and start the next trans
> -			 * in series
> -			 */
> -			attr->xattri_dela_state = XFS_DAS_FLIP_NFLAG;
> -			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
> -						       args->dp);
> -			return -EAGAIN;
> -		}
> -
> -		fallthrough;
> -	case XFS_DAS_FLIP_NFLAG:
> -		/*
> -		 * Dismantle the "old" attribute/value pair by removing a
> -		 * "remote" value (if it exists).
> -		 */
> -		xfs_attr_restore_rmt_blk(args);
> -
> -		error = xfs_attr_rmtval_invalidate(args);
> -		if (error)
> -			return error;
> -
> -		fallthrough;
> -	case XFS_DAS_RM_NBLK:
> -		/* Set state in case xfs_attr_rmtval_remove returns -EAGAIN */
> -		attr->xattri_dela_state = XFS_DAS_RM_NBLK;
> -		if (args->rmtblkno) {
> -			error = xfs_attr_rmtval_remove(attr);
> -			if (error == -EAGAIN)
> -				trace_xfs_attr_set_iter_return(
> -					attr->xattri_dela_state, args->dp);
> -
> -			if (error)
> -				return error;
> -
> -			attr->xattri_dela_state = XFS_DAS_CLR_FLAG;
> -			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
> -						       args->dp);
> -			return -EAGAIN;
> -		}
> -
> -		fallthrough;
>  	case XFS_DAS_CLR_FLAG:
>  		/*
>  		 * The last state for node format. Look up the old attr and
> @@ -635,7 +549,6 @@ xfs_attr_set_iter(
>  		ASSERT(0);
>  		break;
>  	}
> -out:
>  	return error;
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index d016af4dbf81..37db61649217 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -450,16 +450,21 @@ enum xfs_delattr_state {
>  	XFS_DAS_RMTBLK,			/* Removing remote blks */
>  	XFS_DAS_RM_NAME,		/* Remove attr name */
>  	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
> +
> +	/* Leaf state set sequence */

I think this comment should note that the state increment operations of
xfs_attr_set_iter requires that the exact order of the values
FOUND_[LN]BLK through RM_[LN]BLK must be preserved exactly.

Question: Are we supposed to be able to dela_state++ our way from
RM_LBLK to RD_LEAF and from RM_NBLK to CLR_FLAG?

With that comment added,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>  	XFS_DAS_FOUND_LBLK,		/* We found leaf blk for attr */
>  	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
> -	XFS_DAS_FOUND_NBLK,		/* We found node blk for attr */
> -	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
>  	XFS_DAS_FLIP_LFLAG,		/* Flipped leaf INCOMPLETE attr flag */
>  	XFS_DAS_RM_LBLK,		/* A rename is removing leaf blocks */
>  	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
> +
> +	/* Node state set sequence, must match leaf state above */
> +	XFS_DAS_FOUND_NBLK,		/* We found node blk for attr */
> +	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
>  	XFS_DAS_FLIP_NFLAG,		/* Flipped node INCOMPLETE attr flag */
>  	XFS_DAS_RM_NBLK,		/* A rename is removing node blocks */
>  	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
> +
>  	XFS_DAS_DONE,			/* finished operation */
>  };
>  
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 08/18] xfs: split remote attr setting out from replace path
  2022-05-09  0:41 ` [PATCH 08/18] xfs: split remote attr setting out from replace path Dave Chinner
@ 2022-05-10 23:22   ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:28AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we set a new xattr, we have three exit paths:
> 
> 	1. nothing else to do
> 	2. allocate and set the remote xattr value
> 	3. perform the rest of a replace operation
> 
> Currently we push both 2 and 3 into the same state, regardless of
> whether we just set a remote attribute or not. Once we've set the
> remote xattr, we have two exit states:
> 
> 	1. nothing else to do
> 	2. perform the rest of a replace operation
> 
> Hence we can split the remote xattr allocation and setting into
> their own states and factor it out of xfs_attr_set_iter() to further
> clean up the state machine and the implementation of the state
> machine.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>

Pretty straightforward,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr.c | 113 +++++++++++++++++++++------------------
>  fs/xfs/libxfs/xfs_attr.h |  14 +++--
>  fs/xfs/xfs_trace.h       |   9 ++--
>  3 files changed, 77 insertions(+), 59 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index be580da62f08..d2b29f7e103a 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -334,9 +334,11 @@ xfs_attr_leaf_addname(
>  	 * or perform more xattr manipulations. Otherwise there is nothing more
>  	 * to do and we can return success.
>  	 */
> -	if (args->rmtblkno ||
> -	    (args->op_flags & XFS_DA_OP_RENAME)) {
> -		attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
> +	if (args->rmtblkno) {
> +		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
> +		error = -EAGAIN;
> +	} else if (args->op_flags & XFS_DA_OP_RENAME) {
> +		attr->xattri_dela_state = XFS_DAS_LEAF_REPLACE;
>  		error = -EAGAIN;
>  	} else {
>  		attr->xattri_dela_state = XFS_DAS_DONE;
> @@ -363,9 +365,11 @@ xfs_attr_node_addname(
>  	if (error)
>  		return error;
>  
> -	if (args->rmtblkno ||
> -	    (args->op_flags & XFS_DA_OP_RENAME)) {
> -		attr->xattri_dela_state = XFS_DAS_FOUND_NBLK;
> +	if (args->rmtblkno) {
> +		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
> +		error = -EAGAIN;
> +	} else if (args->op_flags & XFS_DA_OP_RENAME) {
> +		attr->xattri_dela_state = XFS_DAS_NODE_REPLACE;
>  		error = -EAGAIN;
>  	} else {
>  		attr->xattri_dela_state = XFS_DAS_DONE;
> @@ -375,6 +379,40 @@ xfs_attr_node_addname(
>  	return error;
>  }
>  
> +static int
> +xfs_attr_rmtval_alloc(
> +	struct xfs_attr_item		*attr)
> +{
> +	struct xfs_da_args              *args = attr->xattri_da_args;
> +	int				error = 0;
> +
> +	/*
> +	 * If there was an out-of-line value, allocate the blocks we
> +	 * identified for its storage and copy the value.  This is done
> +	 * after we create the attribute so that we don't overflow the
> +	 * maximum size of a transaction and/or hit a deadlock.
> +	 */
> +	if (attr->xattri_blkcnt > 0) {
> +		error = xfs_attr_rmtval_set_blk(attr);
> +		if (error)
> +			return error;
> +		error = -EAGAIN;
> +		goto out;
> +	}
> +
> +	error = xfs_attr_rmtval_set_value(args);
> +	if (error)
> +		return error;
> +
> +	/* If this is not a rename, clear the incomplete flag and we're done. */
> +	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
> +		error = xfs_attr3_leaf_clearflag(args);
> +		attr->xattri_dela_state = XFS_DAS_DONE;
> +	}
> +out:
> +	trace_xfs_attr_rmtval_alloc(attr->xattri_dela_state, args->dp);
> +	return error;
> +}
>  
>  /*
>   * Set the attribute specified in @args.
> @@ -406,54 +444,26 @@ xfs_attr_set_iter(
>  	case XFS_DAS_NODE_ADD:
>  		return xfs_attr_node_addname(attr);
>  
> -	case XFS_DAS_FOUND_LBLK:
> -	case XFS_DAS_FOUND_NBLK:
> -		/*
> -		 * Find space for remote blocks and fall into the allocation
> -		 * state.
> -		 */
> -		if (args->rmtblkno > 0) {
> -			error = xfs_attr_rmtval_find_space(attr);
> -			if (error)
> -				return error;
> -		}
> +	case XFS_DAS_LEAF_SET_RMT:
> +	case XFS_DAS_NODE_SET_RMT:
> +		error = xfs_attr_rmtval_find_space(attr);
> +		if (error)
> +			return error;
>  		attr->xattri_dela_state++;
>  		fallthrough;
> +
>  	case XFS_DAS_LEAF_ALLOC_RMT:
>  	case XFS_DAS_NODE_ALLOC_RMT:
> -
> -		/*
> -		 * If there was an out-of-line value, allocate the blocks we
> -		 * identified for its storage and copy the value.  This is done
> -		 * after we create the attribute so that we don't overflow the
> -		 * maximum size of a transaction and/or hit a deadlock.
> -		 */
> -		if (args->rmtblkno > 0) {
> -			if (attr->xattri_blkcnt > 0) {
> -				error = xfs_attr_rmtval_set_blk(attr);
> -				if (error)
> -					return error;
> -				trace_xfs_attr_set_iter_return(
> -						attr->xattri_dela_state,
> -						args->dp);
> -				return -EAGAIN;
> -			}
> -
> -			error = xfs_attr_rmtval_set_value(args);
> -			if (error)
> -				return error;
> -		}
> -
> -		/*
> -		 * If this is not a rename, clear the incomplete flag and we're
> -		 * done.
> -		 */
> -		if (!(args->op_flags & XFS_DA_OP_RENAME)) {
> -			if (args->rmtblkno > 0)
> -				error = xfs_attr3_leaf_clearflag(args);
> +		error = xfs_attr_rmtval_alloc(attr);
> +		if (error)
>  			return error;
> -		}
> +		if (attr->xattri_dela_state == XFS_DAS_DONE)
> +			break;
> +		attr->xattri_dela_state++;
> +		fallthrough;
>  
> +	case XFS_DAS_LEAF_REPLACE:
> +	case XFS_DAS_NODE_REPLACE:
>  		/*
>  		 * If this is an atomic rename operation, we must "flip" the
>  		 * incomplete flags on the "new" and "old" attribute/value pairs
> @@ -471,10 +481,9 @@ xfs_attr_set_iter(
>  			 * Commit the flag value change and start the next trans
>  			 * in series at FLIP_FLAG.
>  			 */
> +			error = -EAGAIN;
>  			attr->xattri_dela_state++;
> -			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
> -						       args->dp);
> -			return -EAGAIN;
> +			break;
>  		}
>  
>  		attr->xattri_dela_state++;
> @@ -549,6 +558,8 @@ xfs_attr_set_iter(
>  		ASSERT(0);
>  		break;
>  	}
> +
> +	trace_xfs_attr_set_iter_return(attr->xattri_dela_state, args->dp);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 37db61649217..1749fd8f7ddd 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -452,15 +452,17 @@ enum xfs_delattr_state {
>  	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
>  
>  	/* Leaf state set sequence */
> -	XFS_DAS_FOUND_LBLK,		/* We found leaf blk for attr */
> +	XFS_DAS_LEAF_SET_RMT,		/* set a remote xattr from a leaf */
>  	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
> +	XFS_DAS_LEAF_REPLACE,		/* Perform replace ops on a leaf */
>  	XFS_DAS_FLIP_LFLAG,		/* Flipped leaf INCOMPLETE attr flag */
>  	XFS_DAS_RM_LBLK,		/* A rename is removing leaf blocks */
>  	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
>  
>  	/* Node state set sequence, must match leaf state above */
> -	XFS_DAS_FOUND_NBLK,		/* We found node blk for attr */
> +	XFS_DAS_NODE_SET_RMT,		/* set a remote xattr from a node */
>  	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
> +	XFS_DAS_NODE_REPLACE,		/* Perform replace ops on a node */
>  	XFS_DAS_FLIP_NFLAG,		/* Flipped node INCOMPLETE attr flag */
>  	XFS_DAS_RM_NBLK,		/* A rename is removing node blocks */
>  	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
> @@ -476,13 +478,15 @@ enum xfs_delattr_state {
>  	{ XFS_DAS_RMTBLK,	"XFS_DAS_RMTBLK" }, \
>  	{ XFS_DAS_RM_NAME,	"XFS_DAS_RM_NAME" }, \
>  	{ XFS_DAS_RM_SHRINK,	"XFS_DAS_RM_SHRINK" }, \
> -	{ XFS_DAS_FOUND_LBLK,	"XFS_DAS_FOUND_LBLK" }, \
> +	{ XFS_DAS_LEAF_SET_RMT,	"XFS_DAS_LEAF_SET_RMT" }, \
>  	{ XFS_DAS_LEAF_ALLOC_RMT, "XFS_DAS_LEAF_ALLOC_RMT" }, \
> -	{ XFS_DAS_FOUND_NBLK,	"XFS_DAS_FOUND_NBLK" }, \
> -	{ XFS_DAS_NODE_ALLOC_RMT, "XFS_DAS_NODE_ALLOC_RMT" },  \
> +	{ XFS_DAS_LEAF_REPLACE,	"XFS_DAS_LEAF_REPLACE" }, \
>  	{ XFS_DAS_FLIP_LFLAG,	"XFS_DAS_FLIP_LFLAG" }, \
>  	{ XFS_DAS_RM_LBLK,	"XFS_DAS_RM_LBLK" }, \
>  	{ XFS_DAS_RD_LEAF,	"XFS_DAS_RD_LEAF" }, \
> +	{ XFS_DAS_NODE_SET_RMT,	"XFS_DAS_NODE_SET_RMT" }, \
> +	{ XFS_DAS_NODE_ALLOC_RMT, "XFS_DAS_NODE_ALLOC_RMT" },  \
> +	{ XFS_DAS_NODE_REPLACE,	"XFS_DAS_NODE_REPLACE" },  \
>  	{ XFS_DAS_FLIP_NFLAG,	"XFS_DAS_FLIP_NFLAG" }, \
>  	{ XFS_DAS_RM_NBLK,	"XFS_DAS_RM_NBLK" }, \
>  	{ XFS_DAS_CLR_FLAG,	"XFS_DAS_CLR_FLAG" }, \
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 067ab31d7a20..cb9122327114 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -4136,13 +4136,15 @@ TRACE_DEFINE_ENUM(XFS_DAS_NODE_ADD);
>  TRACE_DEFINE_ENUM(XFS_DAS_RMTBLK);
>  TRACE_DEFINE_ENUM(XFS_DAS_RM_NAME);
>  TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
> -TRACE_DEFINE_ENUM(XFS_DAS_FOUND_LBLK);
> +TRACE_DEFINE_ENUM(XFS_DAS_LEAF_SET_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
> -TRACE_DEFINE_ENUM(XFS_DAS_FOUND_NBLK);
> -TRACE_DEFINE_ENUM(XFS_DAS_NODE_ALLOC_RMT);
> +TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REPLACE);
>  TRACE_DEFINE_ENUM(XFS_DAS_FLIP_LFLAG);
>  TRACE_DEFINE_ENUM(XFS_DAS_RM_LBLK);
>  TRACE_DEFINE_ENUM(XFS_DAS_RD_LEAF);
> +TRACE_DEFINE_ENUM(XFS_DAS_NODE_SET_RMT);
> +TRACE_DEFINE_ENUM(XFS_DAS_NODE_ALLOC_RMT);
> +TRACE_DEFINE_ENUM(XFS_DAS_NODE_REPLACE);
>  TRACE_DEFINE_ENUM(XFS_DAS_FLIP_NFLAG);
>  TRACE_DEFINE_ENUM(XFS_DAS_RM_NBLK);
>  TRACE_DEFINE_ENUM(XFS_DAS_CLR_FLAG);
> @@ -4172,6 +4174,7 @@ DEFINE_DAS_STATE_EVENT(xfs_attr_set_iter_return);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_leaf_addname_return);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_node_addname_return);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_remove_iter_return);
> +DEFINE_DAS_STATE_EVENT(xfs_attr_rmtval_alloc);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_rmtval_remove_return);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_defer_add);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_defer_replace);
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 09/18] xfs: XFS_DAS_LEAF_REPLACE state only needed if !LARP
  2022-05-09  0:41 ` [PATCH 09/18] xfs: XFS_DAS_LEAF_REPLACE state only needed if !LARP Dave Chinner
@ 2022-05-10 23:24   ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:29AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We can skip the REPLACE state when LARP is enabled, but that means
> the XFS_DAS_FLIP_LFLAG state is now poorly named - it indicates
> something that has been done rather than what the state is going to
> do. Rename it to "REMOVE_OLD" to indicate that we are now going to
> perform removal of the old attr.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>

Neat cleanup!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr.c | 81 +++++++++++++++++++++++++---------------
>  fs/xfs/libxfs/xfs_attr.h | 44 +++++++++++-----------
>  fs/xfs/xfs_trace.h       |  4 +-
>  3 files changed, 75 insertions(+), 54 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index d2b29f7e103a..0f4636e2e246 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -296,6 +296,26 @@ xfs_attr_sf_addname(
>  	return error;
>  }
>  
> +/*
> + * When we bump the state to REPLACE, we may actually need to skip over the
> + * state. When LARP mode is enabled, we don't need to run the atomic flags flip,
> + * so we skip straight over the REPLACE state and go on to REMOVE_OLD.
> + */
> +static void
> +xfs_attr_dela_state_set_replace(
> +	struct xfs_attr_item	*attr,
> +	enum xfs_delattr_state	replace)
> +{
> +	struct xfs_da_args	*args = attr->xattri_da_args;
> +
> +	ASSERT(replace == XFS_DAS_LEAF_REPLACE ||
> +			replace == XFS_DAS_NODE_REPLACE);
> +
> +	attr->xattri_dela_state = replace;
> +	if (xfs_has_larp(args->dp->i_mount))
> +		attr->xattri_dela_state++;
> +}
> +
>  static int
>  xfs_attr_leaf_addname(
>  	struct xfs_attr_item	*attr)
> @@ -338,7 +358,7 @@ xfs_attr_leaf_addname(
>  		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
>  		error = -EAGAIN;
>  	} else if (args->op_flags & XFS_DA_OP_RENAME) {
> -		attr->xattri_dela_state = XFS_DAS_LEAF_REPLACE;
> +		xfs_attr_dela_state_set_replace(attr, XFS_DAS_LEAF_REPLACE);
>  		error = -EAGAIN;
>  	} else {
>  		attr->xattri_dela_state = XFS_DAS_DONE;
> @@ -369,7 +389,7 @@ xfs_attr_node_addname(
>  		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
>  		error = -EAGAIN;
>  	} else if (args->op_flags & XFS_DA_OP_RENAME) {
> -		attr->xattri_dela_state = XFS_DAS_NODE_REPLACE;
> +		xfs_attr_dela_state_set_replace(attr, XFS_DAS_NODE_REPLACE);
>  		error = -EAGAIN;
>  	} else {
>  		attr->xattri_dela_state = XFS_DAS_DONE;
> @@ -396,8 +416,11 @@ xfs_attr_rmtval_alloc(
>  		error = xfs_attr_rmtval_set_blk(attr);
>  		if (error)
>  			return error;
> -		error = -EAGAIN;
> -		goto out;
> +		/* Roll the transaction only if there is more to allocate. */
> +		if (attr->xattri_blkcnt > 0) {
> +			error = -EAGAIN;
> +			goto out;
> +		}
>  	}
>  
>  	error = xfs_attr_rmtval_set_value(args);
> @@ -408,6 +431,13 @@ xfs_attr_rmtval_alloc(
>  	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
>  		error = xfs_attr3_leaf_clearflag(args);
>  		attr->xattri_dela_state = XFS_DAS_DONE;
> +	} else {
> +		/*
> +		 * We are running a REPLACE operation, so we need to bump the
> +		 * state to the step in that operation.
> +		 */
> +		attr->xattri_dela_state++;
> +		xfs_attr_dela_state_set_replace(attr, attr->xattri_dela_state);
>  	}
>  out:
>  	trace_xfs_attr_rmtval_alloc(attr->xattri_dela_state, args->dp);
> @@ -429,7 +459,6 @@ xfs_attr_set_iter(
>  	struct xfs_inode		*dp = args->dp;
>  	struct xfs_buf			*bp = NULL;
>  	int				forkoff, error = 0;
> -	struct xfs_mount		*mp = args->dp->i_mount;
>  
>  	/* State machine switch */
>  next_state:
> @@ -459,37 +488,29 @@ xfs_attr_set_iter(
>  			return error;
>  		if (attr->xattri_dela_state == XFS_DAS_DONE)
>  			break;
> -		attr->xattri_dela_state++;
> -		fallthrough;
> +		goto next_state;
>  
>  	case XFS_DAS_LEAF_REPLACE:
>  	case XFS_DAS_NODE_REPLACE:
>  		/*
> -		 * If this is an atomic rename operation, we must "flip" the
> -		 * incomplete flags on the "new" and "old" attribute/value pairs
> -		 * so that one disappears and one appears atomically.  Then we
> -		 * must remove the "old" attribute/value pair.
> -		 *
> -		 * In a separate transaction, set the incomplete flag on the
> -		 * "old" attr and clear the incomplete flag on the "new" attr.
> +		 * We must "flip" the incomplete flags on the "new" and "old"
> +		 * attribute/value pairs so that one disappears and one appears
> +		 * atomically.  Then we must remove the "old" attribute/value
> +		 * pair.
>  		 */
> -		if (!xfs_has_larp(mp)) {
> -			error = xfs_attr3_leaf_flipflags(args);
> -			if (error)
> -				return error;
> -			/*
> -			 * Commit the flag value change and start the next trans
> -			 * in series at FLIP_FLAG.
> -			 */
> -			error = -EAGAIN;
> -			attr->xattri_dela_state++;
> -			break;
> -		}
> -
> +		error = xfs_attr3_leaf_flipflags(args);
> +		if (error)
> +			return error;
> +		/*
> +		 * Commit the flag value change and start the next trans
> +		 * in series at REMOVE_OLD.
> +		 */
> +		error = -EAGAIN;
>  		attr->xattri_dela_state++;
> -		fallthrough;
> -	case XFS_DAS_FLIP_LFLAG:
> -	case XFS_DAS_FLIP_NFLAG:
> +		break;
> +
> +	case XFS_DAS_LEAF_REMOVE_OLD:
> +	case XFS_DAS_NODE_REMOVE_OLD:
>  		/*
>  		 * Dismantle the "old" attribute/value pair by removing a
>  		 * "remote" value (if it exists).
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 1749fd8f7ddd..3f1234272f3a 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -455,7 +455,7 @@ enum xfs_delattr_state {
>  	XFS_DAS_LEAF_SET_RMT,		/* set a remote xattr from a leaf */
>  	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
>  	XFS_DAS_LEAF_REPLACE,		/* Perform replace ops on a leaf */
> -	XFS_DAS_FLIP_LFLAG,		/* Flipped leaf INCOMPLETE attr flag */
> +	XFS_DAS_LEAF_REMOVE_OLD,	/* Start removing old attr from leaf */
>  	XFS_DAS_RM_LBLK,		/* A rename is removing leaf blocks */
>  	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
>  
> @@ -463,7 +463,7 @@ enum xfs_delattr_state {
>  	XFS_DAS_NODE_SET_RMT,		/* set a remote xattr from a node */
>  	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
>  	XFS_DAS_NODE_REPLACE,		/* Perform replace ops on a node */
> -	XFS_DAS_FLIP_NFLAG,		/* Flipped node INCOMPLETE attr flag */
> +	XFS_DAS_NODE_REMOVE_OLD,	/* Start removing old attr from node */
>  	XFS_DAS_RM_NBLK,		/* A rename is removing node blocks */
>  	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
>  
> @@ -471,26 +471,26 @@ enum xfs_delattr_state {
>  };
>  
>  #define XFS_DAS_STRINGS	\
> -	{ XFS_DAS_UNINIT,	"XFS_DAS_UNINIT" }, \
> -	{ XFS_DAS_SF_ADD,	"XFS_DAS_SF_ADD" }, \
> -	{ XFS_DAS_LEAF_ADD,	"XFS_DAS_LEAF_ADD" }, \
> -	{ XFS_DAS_NODE_ADD,	"XFS_DAS_NODE_ADD" }, \
> -	{ XFS_DAS_RMTBLK,	"XFS_DAS_RMTBLK" }, \
> -	{ XFS_DAS_RM_NAME,	"XFS_DAS_RM_NAME" }, \
> -	{ XFS_DAS_RM_SHRINK,	"XFS_DAS_RM_SHRINK" }, \
> -	{ XFS_DAS_LEAF_SET_RMT,	"XFS_DAS_LEAF_SET_RMT" }, \
> -	{ XFS_DAS_LEAF_ALLOC_RMT, "XFS_DAS_LEAF_ALLOC_RMT" }, \
> -	{ XFS_DAS_LEAF_REPLACE,	"XFS_DAS_LEAF_REPLACE" }, \
> -	{ XFS_DAS_FLIP_LFLAG,	"XFS_DAS_FLIP_LFLAG" }, \
> -	{ XFS_DAS_RM_LBLK,	"XFS_DAS_RM_LBLK" }, \
> -	{ XFS_DAS_RD_LEAF,	"XFS_DAS_RD_LEAF" }, \
> -	{ XFS_DAS_NODE_SET_RMT,	"XFS_DAS_NODE_SET_RMT" }, \
> -	{ XFS_DAS_NODE_ALLOC_RMT, "XFS_DAS_NODE_ALLOC_RMT" },  \
> -	{ XFS_DAS_NODE_REPLACE,	"XFS_DAS_NODE_REPLACE" },  \
> -	{ XFS_DAS_FLIP_NFLAG,	"XFS_DAS_FLIP_NFLAG" }, \
> -	{ XFS_DAS_RM_NBLK,	"XFS_DAS_RM_NBLK" }, \
> -	{ XFS_DAS_CLR_FLAG,	"XFS_DAS_CLR_FLAG" }, \
> -	{ XFS_DAS_DONE,		"XFS_DAS_DONE" }
> +	{ XFS_DAS_UNINIT,		"XFS_DAS_UNINIT" }, \
> +	{ XFS_DAS_SF_ADD,		"XFS_DAS_SF_ADD" }, \
> +	{ XFS_DAS_LEAF_ADD,		"XFS_DAS_LEAF_ADD" }, \
> +	{ XFS_DAS_NODE_ADD,		"XFS_DAS_NODE_ADD" }, \
> +	{ XFS_DAS_RMTBLK,		"XFS_DAS_RMTBLK" }, \
> +	{ XFS_DAS_RM_NAME,		"XFS_DAS_RM_NAME" }, \
> +	{ XFS_DAS_RM_SHRINK,		"XFS_DAS_RM_SHRINK" }, \
> +	{ XFS_DAS_LEAF_SET_RMT,		"XFS_DAS_LEAF_SET_RMT" }, \
> +	{ XFS_DAS_LEAF_ALLOC_RMT,	"XFS_DAS_LEAF_ALLOC_RMT" }, \
> +	{ XFS_DAS_LEAF_REPLACE,		"XFS_DAS_LEAF_REPLACE" }, \
> +	{ XFS_DAS_LEAF_REMOVE_OLD,	"XFS_DAS_LEAF_REMOVE_OLD" }, \
> +	{ XFS_DAS_RM_LBLK,		"XFS_DAS_RM_LBLK" }, \
> +	{ XFS_DAS_RD_LEAF,		"XFS_DAS_RD_LEAF" }, \
> +	{ XFS_DAS_NODE_SET_RMT,		"XFS_DAS_NODE_SET_RMT" }, \
> +	{ XFS_DAS_NODE_ALLOC_RMT,	"XFS_DAS_NODE_ALLOC_RMT" },  \
> +	{ XFS_DAS_NODE_REPLACE,		"XFS_DAS_NODE_REPLACE" },  \
> +	{ XFS_DAS_NODE_REMOVE_OLD,	"XFS_DAS_NODE_REMOVE_OLD" }, \
> +	{ XFS_DAS_RM_NBLK,		"XFS_DAS_RM_NBLK" }, \
> +	{ XFS_DAS_CLR_FLAG,		"XFS_DAS_CLR_FLAG" }, \
> +	{ XFS_DAS_DONE,			"XFS_DAS_DONE" }
>  
>  /*
>   * Defines for xfs_attr_item.xattri_flags
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index cb9122327114..b528c0f375c2 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -4139,13 +4139,13 @@ TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_SET_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REPLACE);
> -TRACE_DEFINE_ENUM(XFS_DAS_FLIP_LFLAG);
> +TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE_OLD);
>  TRACE_DEFINE_ENUM(XFS_DAS_RM_LBLK);
>  TRACE_DEFINE_ENUM(XFS_DAS_RD_LEAF);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_SET_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_ALLOC_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_REPLACE);
> -TRACE_DEFINE_ENUM(XFS_DAS_FLIP_NFLAG);
> +TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE_OLD);
>  TRACE_DEFINE_ENUM(XFS_DAS_RM_NBLK);
>  TRACE_DEFINE_ENUM(XFS_DAS_CLR_FLAG);
>  
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 10/18] xfs: remote xattr removal in xfs_attr_set_iter() is conditional
  2022-05-09  0:41 ` [PATCH 10/18] xfs: remote xattr removal in xfs_attr_set_iter() is conditional Dave Chinner
@ 2022-05-10 23:26   ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:30AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We may not have a remote value for the old xattr we have to remove,
> so skip over the remote value removal states and go straight to
> the xattr name removal in the leaf/node block.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>

Makes sense,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr.c | 59 ++++++++++++++++++++--------------------
>  fs/xfs/libxfs/xfs_attr.h |  8 +++---
>  fs/xfs/xfs_trace.h       |  4 +--
>  3 files changed, 36 insertions(+), 35 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 0f4636e2e246..d65abaead9a1 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -495,15 +495,14 @@ xfs_attr_set_iter(
>  		/*
>  		 * We must "flip" the incomplete flags on the "new" and "old"
>  		 * attribute/value pairs so that one disappears and one appears
> -		 * atomically.  Then we must remove the "old" attribute/value
> -		 * pair.
> +		 * atomically.
>  		 */
>  		error = xfs_attr3_leaf_flipflags(args);
>  		if (error)
>  			return error;
>  		/*
> -		 * Commit the flag value change and start the next trans
> -		 * in series at REMOVE_OLD.
> +		 * We must commit the flag value change now to make it atomic
> +		 * and then we can start the next trans in series at REMOVE_OLD.
>  		 */
>  		error = -EAGAIN;
>  		attr->xattri_dela_state++;
> @@ -512,41 +511,43 @@ xfs_attr_set_iter(
>  	case XFS_DAS_LEAF_REMOVE_OLD:
>  	case XFS_DAS_NODE_REMOVE_OLD:
>  		/*
> -		 * Dismantle the "old" attribute/value pair by removing a
> -		 * "remote" value (if it exists).
> +		 * If we have a remote attr, start the process of removing it
> +		 * by invalidating any cached buffers.
> +		 *
> +		 * If we don't have a remote attr, we skip the remote block
> +		 * removal state altogether with a second state increment.
>  		 */
>  		xfs_attr_restore_rmt_blk(args);
> -		error = xfs_attr_rmtval_invalidate(args);
> -		if (error)
> -			return error;
> -
> -		attr->xattri_dela_state++;
> -		fallthrough;
> -	case XFS_DAS_RM_LBLK:
> -	case XFS_DAS_RM_NBLK:
>  		if (args->rmtblkno) {
> -			error = xfs_attr_rmtval_remove(attr);
> -			if (error == -EAGAIN)
> -				trace_xfs_attr_set_iter_return(
> -					attr->xattri_dela_state, args->dp);
> +			error = xfs_attr_rmtval_invalidate(args);
>  			if (error)
>  				return error;
> -
> -			attr->xattri_dela_state = XFS_DAS_RD_LEAF;
> -			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
> -						       args->dp);
> -			return -EAGAIN;
> +		} else {
> +			attr->xattri_dela_state++;
>  		}
>  
> +		attr->xattri_dela_state++;
> +		goto next_state;
> +
> +	case XFS_DAS_LEAF_REMOVE_RMT:
> +	case XFS_DAS_NODE_REMOVE_RMT:
> +		error = xfs_attr_rmtval_remove(attr);
> +		if (error == -EAGAIN)
> +			break;
> +		if (error)
> +			return error;
> +
>  		/*
> -		 * This is the end of the shared leaf/node sequence. We need
> -		 * to continue at the next state in the sequence, but we can't
> -		 * easily just fall through. So we increment to the next state
> -		 * and then jump back to switch statement to evaluate the next
> -		 * state correctly.
> +		 * We've finished removing the remote attr blocks, so commit the
> +		 * transaction and move on to removing the attr name from the
> +		 * leaf/node block. Removing the attr might require a full
> +		 * transaction reservation for btree block freeing, so we
> +		 * can't do that in the same transaction where we removed the
> +		 * remote attr blocks.
>  		 */
> +		error = -EAGAIN;
>  		attr->xattri_dela_state++;
> -		goto next_state;
> +		break;
>  
>  	case XFS_DAS_RD_LEAF:
>  		/*
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 3f1234272f3a..d67535e4ce5a 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -456,7 +456,7 @@ enum xfs_delattr_state {
>  	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
>  	XFS_DAS_LEAF_REPLACE,		/* Perform replace ops on a leaf */
>  	XFS_DAS_LEAF_REMOVE_OLD,	/* Start removing old attr from leaf */
> -	XFS_DAS_RM_LBLK,		/* A rename is removing leaf blocks */
> +	XFS_DAS_LEAF_REMOVE_RMT,	/* A rename is removing remote blocks */
>  	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
>  
>  	/* Node state set sequence, must match leaf state above */
> @@ -464,7 +464,7 @@ enum xfs_delattr_state {
>  	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
>  	XFS_DAS_NODE_REPLACE,		/* Perform replace ops on a node */
>  	XFS_DAS_NODE_REMOVE_OLD,	/* Start removing old attr from node */
> -	XFS_DAS_RM_NBLK,		/* A rename is removing node blocks */
> +	XFS_DAS_NODE_REMOVE_RMT,	/* A rename is removing remote blocks */
>  	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
>  
>  	XFS_DAS_DONE,			/* finished operation */
> @@ -482,13 +482,13 @@ enum xfs_delattr_state {
>  	{ XFS_DAS_LEAF_ALLOC_RMT,	"XFS_DAS_LEAF_ALLOC_RMT" }, \
>  	{ XFS_DAS_LEAF_REPLACE,		"XFS_DAS_LEAF_REPLACE" }, \
>  	{ XFS_DAS_LEAF_REMOVE_OLD,	"XFS_DAS_LEAF_REMOVE_OLD" }, \
> -	{ XFS_DAS_RM_LBLK,		"XFS_DAS_RM_LBLK" }, \
> +	{ XFS_DAS_LEAF_REMOVE_RMT,	"XFS_DAS_LEAF_REMOVE_RMT" }, \
>  	{ XFS_DAS_RD_LEAF,		"XFS_DAS_RD_LEAF" }, \
>  	{ XFS_DAS_NODE_SET_RMT,		"XFS_DAS_NODE_SET_RMT" }, \
>  	{ XFS_DAS_NODE_ALLOC_RMT,	"XFS_DAS_NODE_ALLOC_RMT" },  \
>  	{ XFS_DAS_NODE_REPLACE,		"XFS_DAS_NODE_REPLACE" },  \
>  	{ XFS_DAS_NODE_REMOVE_OLD,	"XFS_DAS_NODE_REMOVE_OLD" }, \
> -	{ XFS_DAS_RM_NBLK,		"XFS_DAS_RM_NBLK" }, \
> +	{ XFS_DAS_NODE_REMOVE_RMT,	"XFS_DAS_NODE_REMOVE_RMT" }, \
>  	{ XFS_DAS_CLR_FLAG,		"XFS_DAS_CLR_FLAG" }, \
>  	{ XFS_DAS_DONE,			"XFS_DAS_DONE" }
>  
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index b528c0f375c2..793d2a86ab2c 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -4140,13 +4140,13 @@ TRACE_DEFINE_ENUM(XFS_DAS_LEAF_SET_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REPLACE);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE_OLD);
> -TRACE_DEFINE_ENUM(XFS_DAS_RM_LBLK);
> +TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_RD_LEAF);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_SET_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_ALLOC_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_REPLACE);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE_OLD);
> -TRACE_DEFINE_ENUM(XFS_DAS_RM_NBLK);
> +TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_CLR_FLAG);
>  
>  DECLARE_EVENT_CLASS(xfs_das_state_class,
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 11/18] xfs: clean up final attr removal in xfs_attr_set_iter
  2022-05-09  0:41 ` [PATCH 11/18] xfs: clean up final attr removal in xfs_attr_set_iter Dave Chinner
@ 2022-05-10 23:29   ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:31AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Clean up the final leaf/node states in xfs_attr_set_iter() to
> further simplify the high level state machine and to set the
> completion state correctly. As we are adding a separate state
> for node format removal, we need to ensure that node formats
> are collapsed back to shortformi or empty correctly.

Nit: shortform

With that fixed, this is a nice disentangling of more of the state
machine.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c | 149 ++++++++++++++++++++++-----------------
>  fs/xfs/libxfs/xfs_attr.h |  12 ++--
>  fs/xfs/xfs_trace.h       |   5 +-
>  3 files changed, 94 insertions(+), 72 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index d65abaead9a1..5707ac4f44a7 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -61,7 +61,7 @@ STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>  STATIC void xfs_attr_restore_rmt_blk(struct xfs_da_args *args);
>  static int xfs_attr_node_try_addname(struct xfs_attr_item *attr);
>  STATIC int xfs_attr_node_addname_find_attr(struct xfs_attr_item *attr);
> -STATIC int xfs_attr_node_addname_clear_incomplete(struct xfs_attr_item *attr);
> +STATIC int xfs_attr_node_remove_attr(struct xfs_attr_item *attr);
>  STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>  				 struct xfs_da_state **state);
>  STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> @@ -444,6 +444,77 @@ xfs_attr_rmtval_alloc(
>  	return error;
>  }
>  
> +/*
> + * Remove the original attr we have just replaced. This is dependent on the
> + * original lookup and insert placing the old attr in args->blkno/args->index
> + * and the new attr in args->blkno2/args->index2.
> + */
> +static int
> +xfs_attr_leaf_remove_attr(
> +	struct xfs_attr_item		*attr)
> +{
> +	struct xfs_da_args              *args = attr->xattri_da_args;
> +	struct xfs_inode		*dp = args->dp;
> +	struct xfs_buf			*bp = NULL;
> +	int				forkoff;
> +	int				error;
> +
> +	error = xfs_attr3_leaf_read(args->trans, args->dp, args->blkno,
> +				   &bp);
> +	if (error)
> +		return error;
> +
> +	xfs_attr3_leaf_remove(bp, args);
> +
> +	forkoff = xfs_attr_shortform_allfit(bp, dp);
> +	if (forkoff)
> +		error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
> +		/* bp is gone due to xfs_da_shrink_inode */
> +
> +	return error;
> +}
> +
> +/*
> + * Shrink an attribute from leaf to shortform. Used by the node format remove
> + * path when the node format collapses to a single block and so we have to check
> + * if it can be collapsed further.
> + */
> +static int
> +xfs_attr_leaf_shrink(
> +	struct xfs_da_args	*args,
> +	struct xfs_da_state     *state)
> +{
> +	struct xfs_inode	*dp = args->dp;
> +	int			error, forkoff;
> +	struct xfs_buf		*bp;
> +
> +	if (!xfs_attr_is_leaf(dp))
> +		return 0;
> +
> +	/*
> +	 * Have to get rid of the copy of this dabuf in the state.
> +	 */
> +	if (state) {
> +		ASSERT(state->path.active == 1);
> +		ASSERT(state->path.blk[0].bp);
> +		state->path.blk[0].bp = NULL;
> +	}
> +
> +	error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp);
> +	if (error)
> +		return error;
> +
> +	forkoff = xfs_attr_shortform_allfit(bp, dp);
> +	if (forkoff) {
> +		error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
> +		/* bp is gone due to xfs_da_shrink_inode */
> +	} else {
> +		xfs_trans_brelse(args->trans, bp);
> +	}
> +
> +	return error;
> +}
> +
>  /*
>   * Set the attribute specified in @args.
>   * This routine is meant to function as a delayed operation, and may return
> @@ -456,9 +527,7 @@ xfs_attr_set_iter(
>  	struct xfs_attr_item		*attr)
>  {
>  	struct xfs_da_args              *args = attr->xattri_da_args;
> -	struct xfs_inode		*dp = args->dp;
> -	struct xfs_buf			*bp = NULL;
> -	int				forkoff, error = 0;
> +	int				error = 0;
>  
>  	/* State machine switch */
>  next_state:
> @@ -549,32 +618,16 @@ xfs_attr_set_iter(
>  		attr->xattri_dela_state++;
>  		break;
>  
> -	case XFS_DAS_RD_LEAF:
> -		/*
> -		 * This is the last step for leaf format. Read the block with
> -		 * the old attr, remove the old attr, check for shortform
> -		 * conversion and return.
> -		 */
> -		error = xfs_attr3_leaf_read(args->trans, args->dp, args->blkno,
> -					   &bp);
> -		if (error)
> -			return error;
> -
> -		xfs_attr3_leaf_remove(bp, args);
> -
> -		forkoff = xfs_attr_shortform_allfit(bp, dp);
> -		if (forkoff)
> -			error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
> -			/* bp is gone due to xfs_da_shrink_inode */
> -
> -		return error;
> +	case XFS_DAS_LEAF_REMOVE_ATTR:
> +		error = xfs_attr_leaf_remove_attr(attr);
> +		attr->xattri_dela_state = XFS_DAS_DONE;
> +		break;
>  
> -	case XFS_DAS_CLR_FLAG:
> -		/*
> -		 * The last state for node format. Look up the old attr and
> -		 * remove it.
> -		 */
> -		error = xfs_attr_node_addname_clear_incomplete(attr);
> +	case XFS_DAS_NODE_REMOVE_ATTR:
> +		error = xfs_attr_node_remove_attr(attr);
> +		if (!error)
> +			error = xfs_attr_leaf_shrink(args, NULL);
> +		attr->xattri_dela_state = XFS_DAS_DONE;
>  		break;
>  	default:
>  		ASSERT(0);
> @@ -1269,8 +1322,8 @@ xfs_attr_node_try_addname(
>  }
>  
>  
> -STATIC int
> -xfs_attr_node_addname_clear_incomplete(
> +static int
> +xfs_attr_node_remove_attr(
>  	struct xfs_attr_item		*attr)
>  {
>  	struct xfs_da_args		*args = attr->xattri_da_args;
> @@ -1311,38 +1364,6 @@ xfs_attr_node_addname_clear_incomplete(
>  	return retval;
>  }
>  
> -/*
> - * Shrink an attribute from leaf to shortform
> - */
> -STATIC int
> -xfs_attr_node_shrink(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state     *state)
> -{
> -	struct xfs_inode	*dp = args->dp;
> -	int			error, forkoff;
> -	struct xfs_buf		*bp;
> -
> -	/*
> -	 * Have to get rid of the copy of this dabuf in the state.
> -	 */
> -	ASSERT(state->path.active == 1);
> -	ASSERT(state->path.blk[0].bp);
> -	state->path.blk[0].bp = NULL;
> -
> -	error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp);
> -	if (error)
> -		return error;
> -
> -	forkoff = xfs_attr_shortform_allfit(bp, dp);
> -	if (forkoff) {
> -		error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
> -		/* bp is gone due to xfs_da_shrink_inode */
> -	} else
> -		xfs_trans_brelse(args->trans, bp);
> -
> -	return error;
> -}
>  
>  /*
>   * Mark an attribute entry INCOMPLETE and save pointers to the relevant buffers
> @@ -1551,7 +1572,7 @@ xfs_attr_remove_iter(
>  		 * transaction.
>  		 */
>  		if (xfs_attr_is_leaf(dp))
> -			error = xfs_attr_node_shrink(args, state);
> +			error = xfs_attr_leaf_shrink(args, state);
>  		ASSERT(error != -EAGAIN);
>  		break;
>  	default:
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index d67535e4ce5a..c318260f17d4 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -451,21 +451,21 @@ enum xfs_delattr_state {
>  	XFS_DAS_RM_NAME,		/* Remove attr name */
>  	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
>  
> -	/* Leaf state set sequence */
> +	/* Leaf state set/replace sequence */
>  	XFS_DAS_LEAF_SET_RMT,		/* set a remote xattr from a leaf */
>  	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
>  	XFS_DAS_LEAF_REPLACE,		/* Perform replace ops on a leaf */
>  	XFS_DAS_LEAF_REMOVE_OLD,	/* Start removing old attr from leaf */
>  	XFS_DAS_LEAF_REMOVE_RMT,	/* A rename is removing remote blocks */
> -	XFS_DAS_RD_LEAF,		/* Read in the new leaf */
> +	XFS_DAS_LEAF_REMOVE_ATTR,	/* Remove the old attr from a leaf */
>  
> -	/* Node state set sequence, must match leaf state above */
> +	/* Node state set/replace sequence, must match leaf state above */
>  	XFS_DAS_NODE_SET_RMT,		/* set a remote xattr from a node */
>  	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
>  	XFS_DAS_NODE_REPLACE,		/* Perform replace ops on a node */
>  	XFS_DAS_NODE_REMOVE_OLD,	/* Start removing old attr from node */
>  	XFS_DAS_NODE_REMOVE_RMT,	/* A rename is removing remote blocks */
> -	XFS_DAS_CLR_FLAG,		/* Clear incomplete flag */
> +	XFS_DAS_NODE_REMOVE_ATTR,	/* Remove the old attr from a node */
>  
>  	XFS_DAS_DONE,			/* finished operation */
>  };
> @@ -483,13 +483,13 @@ enum xfs_delattr_state {
>  	{ XFS_DAS_LEAF_REPLACE,		"XFS_DAS_LEAF_REPLACE" }, \
>  	{ XFS_DAS_LEAF_REMOVE_OLD,	"XFS_DAS_LEAF_REMOVE_OLD" }, \
>  	{ XFS_DAS_LEAF_REMOVE_RMT,	"XFS_DAS_LEAF_REMOVE_RMT" }, \
> -	{ XFS_DAS_RD_LEAF,		"XFS_DAS_RD_LEAF" }, \
> +	{ XFS_DAS_LEAF_REMOVE_ATTR,	"XFS_DAS_LEAF_REMOVE_ATTR" }, \
>  	{ XFS_DAS_NODE_SET_RMT,		"XFS_DAS_NODE_SET_RMT" }, \
>  	{ XFS_DAS_NODE_ALLOC_RMT,	"XFS_DAS_NODE_ALLOC_RMT" },  \
>  	{ XFS_DAS_NODE_REPLACE,		"XFS_DAS_NODE_REPLACE" },  \
>  	{ XFS_DAS_NODE_REMOVE_OLD,	"XFS_DAS_NODE_REMOVE_OLD" }, \
>  	{ XFS_DAS_NODE_REMOVE_RMT,	"XFS_DAS_NODE_REMOVE_RMT" }, \
> -	{ XFS_DAS_CLR_FLAG,		"XFS_DAS_CLR_FLAG" }, \
> +	{ XFS_DAS_NODE_REMOVE_ATTR,	"XFS_DAS_NODE_REMOVE_ATTR" }, \
>  	{ XFS_DAS_DONE,			"XFS_DAS_DONE" }
>  
>  /*
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 793d2a86ab2c..260760ce2d05 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -4141,13 +4141,14 @@ TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REPLACE);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE_OLD);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE_RMT);
> -TRACE_DEFINE_ENUM(XFS_DAS_RD_LEAF);
> +TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE_ATTR);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_SET_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_ALLOC_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_REPLACE);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE_OLD);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE_RMT);
> -TRACE_DEFINE_ENUM(XFS_DAS_CLR_FLAG);
> +TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE_ATTR);
> +TRACE_DEFINE_ENUM(XFS_DAS_DONE);
>  
>  DECLARE_EVENT_CLASS(xfs_das_state_class,
>  	TP_PROTO(int das, struct xfs_inode *ip),
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 12/18] xfs: xfs_attr_set_iter() does not need to return EAGAIN
  2022-05-09  0:41 ` [PATCH 12/18] xfs: xfs_attr_set_iter() does not need to return EAGAIN Dave Chinner
@ 2022-05-10 23:30   ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:30 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:32AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Now that the full xfs_attr_set_iter() state machine always
> terminates with either the state being XFS_DAS_DONE on success or
> an error on failure, we can get rid of the need for it to return
> -EAGAIN whenever it needs to roll the transaction before running
> the next state.
> 
> That is, we don't need to spray -EAGAIN return states everywhere,
> the caller just check the state machine state for completion to
> determine what action should be taken next. This greatly simplifies
> the code within the state machine implementation as it now only has
> to handle 0 for success or -errno for error and it doesn't need to
> tell the caller to retry.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>

LGTM
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr.c | 86 +++++++++++++++++-----------------------
>  fs/xfs/xfs_attr_item.c   |  2 +
>  2 files changed, 38 insertions(+), 50 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 5707ac4f44a7..89e68d9e22c0 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -290,7 +290,6 @@ xfs_attr_sf_addname(
>  	 */
>  	xfs_trans_bhold(args->trans, attr->xattri_leaf_bp);
>  	attr->xattri_dela_state = XFS_DAS_LEAF_ADD;
> -	error = -EAGAIN;
>  out:
>  	trace_xfs_attr_sf_addname_return(attr->xattri_dela_state, args->dp);
>  	return error;
> @@ -343,7 +342,6 @@ xfs_attr_leaf_addname(
>  		 * retry the add to the newly allocated node block.
>  		 */
>  		attr->xattri_dela_state = XFS_DAS_NODE_ADD;
> -		error = -EAGAIN;
>  		goto out;
>  	}
>  	if (error)
> @@ -354,20 +352,24 @@ xfs_attr_leaf_addname(
>  	 * or perform more xattr manipulations. Otherwise there is nothing more
>  	 * to do and we can return success.
>  	 */
> -	if (args->rmtblkno) {
> +	if (args->rmtblkno)
>  		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
> -		error = -EAGAIN;
> -	} else if (args->op_flags & XFS_DA_OP_RENAME) {
> +	else if (args->op_flags & XFS_DA_OP_RENAME)
>  		xfs_attr_dela_state_set_replace(attr, XFS_DAS_LEAF_REPLACE);
> -		error = -EAGAIN;
> -	} else {
> +	else
>  		attr->xattri_dela_state = XFS_DAS_DONE;
> -	}
>  out:
>  	trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp);
>  	return error;
>  }
>  
> +/*
> + * Add an entry to a node format attr tree.
> + *
> + * Note that we might still have a leaf here - xfs_attr_is_leaf() cannot tell
> + * the difference between leaf + remote attr blocks and a node format tree,
> + * so we may still end up having to convert from leaf to node format here.
> + */
>  static int
>  xfs_attr_node_addname(
>  	struct xfs_attr_item	*attr)
> @@ -382,19 +384,26 @@ xfs_attr_node_addname(
>  		return error;
>  
>  	error = xfs_attr_node_try_addname(attr);
> +	if (error == -ENOSPC) {
> +		error = xfs_attr3_leaf_to_node(args);
> +		if (error)
> +			return error;
> +		/*
> +		 * No state change, we really are in node form now
> +		 * but we need the transaction rolled to continue.
> +		 */
> +		goto out;
> +	}
>  	if (error)
>  		return error;
>  
> -	if (args->rmtblkno) {
> +	if (args->rmtblkno)
>  		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
> -		error = -EAGAIN;
> -	} else if (args->op_flags & XFS_DA_OP_RENAME) {
> +	else if (args->op_flags & XFS_DA_OP_RENAME)
>  		xfs_attr_dela_state_set_replace(attr, XFS_DAS_NODE_REPLACE);
> -		error = -EAGAIN;
> -	} else {
> +	else
>  		attr->xattri_dela_state = XFS_DAS_DONE;
> -	}
> -
> +out:
>  	trace_xfs_attr_node_addname_return(attr->xattri_dela_state, args->dp);
>  	return error;
>  }
> @@ -417,10 +426,8 @@ xfs_attr_rmtval_alloc(
>  		if (error)
>  			return error;
>  		/* Roll the transaction only if there is more to allocate. */
> -		if (attr->xattri_blkcnt > 0) {
> -			error = -EAGAIN;
> +		if (attr->xattri_blkcnt > 0)
>  			goto out;
> -		}
>  	}
>  
>  	error = xfs_attr_rmtval_set_value(args);
> @@ -516,11 +523,12 @@ xfs_attr_leaf_shrink(
>  }
>  
>  /*
> - * Set the attribute specified in @args.
> - * This routine is meant to function as a delayed operation, and may return
> - * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
> - * to handle this, and recall the function until a successful error code is
> - * returned.
> + * Run the attribute operation specified in @attr.
> + *
> + * This routine is meant to function as a delayed operation and will set the
> + * state to XFS_DAS_DONE when the operation is complete.  Calling functions will
> + * need to handle this, and recall the function until either an error or
> + * XFS_DAS_DONE is detected.
>   */
>  int
>  xfs_attr_set_iter(
> @@ -573,7 +581,6 @@ xfs_attr_set_iter(
>  		 * We must commit the flag value change now to make it atomic
>  		 * and then we can start the next trans in series at REMOVE_OLD.
>  		 */
> -		error = -EAGAIN;
>  		attr->xattri_dela_state++;
>  		break;
>  
> @@ -601,8 +608,10 @@ xfs_attr_set_iter(
>  	case XFS_DAS_LEAF_REMOVE_RMT:
>  	case XFS_DAS_NODE_REMOVE_RMT:
>  		error = xfs_attr_rmtval_remove(attr);
> -		if (error == -EAGAIN)
> +		if (error == -EAGAIN) {
> +			error = 0;
>  			break;
> +		}
>  		if (error)
>  			return error;
>  
> @@ -614,7 +623,6 @@ xfs_attr_set_iter(
>  		 * can't do that in the same transaction where we removed the
>  		 * remote attr blocks.
>  		 */
> -		error = -EAGAIN;
>  		attr->xattri_dela_state++;
>  		break;
>  
> @@ -1250,14 +1258,6 @@ xfs_attr_node_addname_find_attr(
>   * This will involve walking down the Btree, and may involve splitting
>   * leaf nodes and even splitting intermediate nodes up to and including
>   * the root node (a special case of an intermediate node).
> - *
> - * "Remote" attribute values confuse the issue and atomic rename operations
> - * add a whole extra layer of confusion on top of that.
> - *
> - * This routine is meant to function as a delayed operation, and may return
> - * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
> - * to handle this, and recall the function until a successful error code is
> - *returned.
>   */
>  static int
>  xfs_attr_node_try_addname(
> @@ -1279,24 +1279,10 @@ xfs_attr_node_try_addname(
>  			/*
>  			 * Its really a single leaf node, but it had
>  			 * out-of-line values so it looked like it *might*
> -			 * have been a b-tree.
> +			 * have been a b-tree. Let the caller deal with this.
>  			 */
>  			xfs_da_state_free(state);
> -			state = NULL;
> -			error = xfs_attr3_leaf_to_node(args);
> -			if (error)
> -				goto out;
> -
> -			/*
> -			 * Now that we have converted the leaf to a node, we can
> -			 * roll the transaction, and try xfs_attr3_leaf_add
> -			 * again on re-entry.  No need to set dela_state to do
> -			 * this. dela_state is still unset by this function at
> -			 * this point.
> -			 */
> -			trace_xfs_attr_node_addname_return(
> -					attr->xattri_dela_state, args->dp);
> -			return -EAGAIN;
> +			return -ENOSPC;
>  		}
>  
>  		/*
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 5bfb33746e37..740a55d07660 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -313,6 +313,8 @@ xfs_xattri_finish_update(
>  	case XFS_ATTR_OP_FLAGS_SET:
>  	case XFS_ATTR_OP_FLAGS_REPLACE:
>  		error = xfs_attr_set_iter(attr);
> +		if (!error && attr->xattri_dela_state != XFS_DAS_DONE)
> +			error = -EAGAIN;
>  		break;
>  	case XFS_ATTR_OP_FLAGS_REMOVE:
>  		ASSERT(XFS_IFORK_Q(args->dp));
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 13/18] xfs: introduce attr remove initial states into xfs_attr_set_iter
  2022-05-09  0:41 ` [PATCH 13/18] xfs: introduce attr remove initial states into xfs_attr_set_iter Dave Chinner
@ 2022-05-10 23:37   ` Darrick J. Wong
  2022-05-10 23:40     ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:33AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We need to merge the add and remove code paths to enable safe
> recovery of replace operations. Hoist the initial remove states from
> xfs_attr_remove_iter into xfs_attr_set_iter. We will make use of
> them in the next patches.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c | 139 ++++++++++++++++++++++-----------------
>  fs/xfs/libxfs/xfs_attr.h |   4 ++
>  fs/xfs/xfs_trace.h       |   3 +
>  3 files changed, 84 insertions(+), 62 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 89e68d9e22c0..a6a9b1f8dce6 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -451,6 +451,68 @@ xfs_attr_rmtval_alloc(
>  	return error;
>  }
>  
> +/*
> + * Mark an attribute entry INCOMPLETE and save pointers to the relevant buffers
> + * for later deletion of the entry.
> + */
> +static int
> +xfs_attr_leaf_mark_incomplete(
> +	struct xfs_da_args	*args,
> +	struct xfs_da_state	*state)
> +{
> +	int			error;
> +
> +	/*
> +	 * Fill in disk block numbers in the state structure
> +	 * so that we can get the buffers back after we commit
> +	 * several transactions in the following calls.
> +	 */
> +	error = xfs_attr_fillstate(state);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * Mark the attribute as INCOMPLETE
> +	 */
> +	return xfs_attr3_leaf_setflag(args);
> +}
> +
> +/*
> + * Initial setup for xfs_attr_node_removename.  Make sure the attr is there and
> + * the blocks are valid.  Attr keys with remote blocks will be marked
> + * incomplete.
> + */
> +static
> +int xfs_attr_node_removename_setup(
> +	struct xfs_attr_item		*attr)
> +{
> +	struct xfs_da_args		*args = attr->xattri_da_args;
> +	struct xfs_da_state		**state = &attr->xattri_da_state;
> +	int				error;
> +
> +	error = xfs_attr_node_hasname(args, state);
> +	if (error != -EEXIST)
> +		goto out;
> +	error = 0;
> +
> +	ASSERT((*state)->path.blk[(*state)->path.active - 1].bp != NULL);
> +	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
> +		XFS_ATTR_LEAF_MAGIC);
> +
> +	if (args->rmtblkno > 0) {
> +		error = xfs_attr_leaf_mark_incomplete(args, *state);
> +		if (error)
> +			goto out;
> +
> +		error = xfs_attr_rmtval_invalidate(args);
> +	}
> +out:
> +	if (error)
> +		xfs_da_state_free(*state);
> +
> +	return error;
> +}
> +
>  /*
>   * Remove the original attr we have just replaced. This is dependent on the
>   * original lookup and insert placing the old attr in args->blkno/args->index
> @@ -550,6 +612,21 @@ xfs_attr_set_iter(
>  	case XFS_DAS_NODE_ADD:
>  		return xfs_attr_node_addname(attr);
>  
> +	case XFS_DAS_SF_REMOVE:

I think attr removal is broken until the next patch, which adds
xfs_attr_init_remove_state to set dela_state to DAS_*_REMOVE, right?

The rest of the state machine changes look solid.

With that moved up by one patch to fix the bisection issue,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +		attr->xattri_dela_state = XFS_DAS_DONE;
> +		return xfs_attr_sf_removename(args);
> +	case XFS_DAS_LEAF_REMOVE:
> +		attr->xattri_dela_state = XFS_DAS_DONE;
> +		return xfs_attr_leaf_removename(args);
> +	case XFS_DAS_NODE_REMOVE:
> +		error = xfs_attr_node_removename_setup(attr);
> +		if (error)
> +			return error;
> +		attr->xattri_dela_state = XFS_DAS_NODE_REMOVE_RMT;
> +		if (args->rmtblkno == 0)
> +			attr->xattri_dela_state++;
> +		break;
> +
>  	case XFS_DAS_LEAF_SET_RMT:
>  	case XFS_DAS_NODE_SET_RMT:
>  		error = xfs_attr_rmtval_find_space(attr);
> @@ -1351,68 +1428,6 @@ xfs_attr_node_remove_attr(
>  }
>  
>  
> -/*
> - * Mark an attribute entry INCOMPLETE and save pointers to the relevant buffers
> - * for later deletion of the entry.
> - */
> -STATIC int
> -xfs_attr_leaf_mark_incomplete(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	*state)
> -{
> -	int			error;
> -
> -	/*
> -	 * Fill in disk block numbers in the state structure
> -	 * so that we can get the buffers back after we commit
> -	 * several transactions in the following calls.
> -	 */
> -	error = xfs_attr_fillstate(state);
> -	if (error)
> -		return error;
> -
> -	/*
> -	 * Mark the attribute as INCOMPLETE
> -	 */
> -	return xfs_attr3_leaf_setflag(args);
> -}
> -
> -/*
> - * Initial setup for xfs_attr_node_removename.  Make sure the attr is there and
> - * the blocks are valid.  Attr keys with remote blocks will be marked
> - * incomplete.
> - */
> -STATIC
> -int xfs_attr_node_removename_setup(
> -	struct xfs_attr_item		*attr)
> -{
> -	struct xfs_da_args		*args = attr->xattri_da_args;
> -	struct xfs_da_state		**state = &attr->xattri_da_state;
> -	int				error;
> -
> -	error = xfs_attr_node_hasname(args, state);
> -	if (error != -EEXIST)
> -		goto out;
> -	error = 0;
> -
> -	ASSERT((*state)->path.blk[(*state)->path.active - 1].bp != NULL);
> -	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
> -		XFS_ATTR_LEAF_MAGIC);
> -
> -	if (args->rmtblkno > 0) {
> -		error = xfs_attr_leaf_mark_incomplete(args, *state);
> -		if (error)
> -			goto out;
> -
> -		error = xfs_attr_rmtval_invalidate(args);
> -	}
> -out:
> -	if (error)
> -		xfs_da_state_free(*state);
> -
> -	return error;
> -}
> -
>  STATIC int
>  xfs_attr_node_removename(
>  	struct xfs_da_args	*args,
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index c318260f17d4..7ea7c7fa31ac 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -451,6 +451,10 @@ enum xfs_delattr_state {
>  	XFS_DAS_RM_NAME,		/* Remove attr name */
>  	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
>  
> +	XFS_DAS_SF_REMOVE,		/* Initial shortform set iter state */
> +	XFS_DAS_LEAF_REMOVE,		/* Initial leaf form set iter state */
> +	XFS_DAS_NODE_REMOVE,		/* Initial node form set iter state */
> +
>  	/* Leaf state set/replace sequence */
>  	XFS_DAS_LEAF_SET_RMT,		/* set a remote xattr from a leaf */
>  	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 260760ce2d05..01b047d86cd1 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -4136,6 +4136,9 @@ TRACE_DEFINE_ENUM(XFS_DAS_NODE_ADD);
>  TRACE_DEFINE_ENUM(XFS_DAS_RMTBLK);
>  TRACE_DEFINE_ENUM(XFS_DAS_RM_NAME);
>  TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
> +TRACE_DEFINE_ENUM(XFS_DAS_SF_REMOVE);
> +TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE);
> +TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_SET_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REPLACE);
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 14/18] xfs: switch attr remove to xfs_attri_set_iter
  2022-05-09  0:41 ` [PATCH 14/18] xfs: switch attr remove to xfs_attri_set_iter Dave Chinner
@ 2022-05-10 23:40   ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:34AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Now that xfs_attri_set_iter() has initial states for removing
> attributes, switch the pure attribute removal code over to using it.
> This requires attrs being removed to always be marked as INCOMPLETE
> before we start the removal due to the fact we look up the attr to
> remove again in xfs_attr_node_remove_attr().
> 
> Note: this drops the fillstate/refillstate optimisations from
> the remove path that avoid having to look up the path again after
> setting the incomplete flag and removeing remote attrs. Restoring

'removing'

> that optimisation to this path is future Dave's problem.

LOL.

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>

Stupid nit: Probably ought to  ^^^ be a space between the 'son' and the
bracket.

With that stuff cleaned up,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> ---
>  fs/xfs/libxfs/xfs_attr.c | 21 +++++++++------------
>  fs/xfs/libxfs/xfs_attr.h | 10 ++++++++++
>  fs/xfs/xfs_attr_item.c   | 31 +++++++------------------------
>  3 files changed, 26 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index a6a9b1f8dce6..b935727cf517 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -499,13 +499,11 @@ int xfs_attr_node_removename_setup(
>  	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
>  		XFS_ATTR_LEAF_MAGIC);
>  
> -	if (args->rmtblkno > 0) {
> -		error = xfs_attr_leaf_mark_incomplete(args, *state);
> -		if (error)
> -			goto out;
> -
> +	error = xfs_attr_leaf_mark_incomplete(args, *state);
> +	if (error)
> +		goto out;
> +	if (args->rmtblkno > 0)
>  		error = xfs_attr_rmtval_invalidate(args);
> -	}
>  out:
>  	if (error)
>  		xfs_da_state_free(*state);
> @@ -821,7 +819,7 @@ xfs_attr_defer_remove(
>  	if (error)
>  		return error;
>  
> -	new->xattri_dela_state = XFS_DAS_UNINIT;
> +	new->xattri_dela_state = xfs_attr_init_remove_state(args);
>  	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
>  	trace_xfs_attr_defer_remove(new->xattri_dela_state, args->dp);
>  
> @@ -1391,16 +1389,15 @@ xfs_attr_node_remove_attr(
>  {
>  	struct xfs_da_args		*args = attr->xattri_da_args;
>  	struct xfs_da_state		*state = NULL;
> -	struct xfs_mount		*mp = args->dp->i_mount;
>  	int				retval = 0;
>  	int				error = 0;
>  
>  	/*
> -	 * Re-find the "old" attribute entry after any split ops. The INCOMPLETE
> -	 * flag means that we will find the "old" attr, not the "new" one.
> +	 * The attr we are removing has already been marked incomplete, so
> +	 * we need to set the filter appropriately to re-find the "old"
> +	 * attribute entry after any split ops.
>  	 */
> -	if (!xfs_has_larp(mp))
> -		args->attr_filter |= XFS_ATTR_INCOMPLETE;
> +	args->attr_filter |= XFS_ATTR_INCOMPLETE;
>  	state = xfs_da_state_alloc(args);
>  	state->inleaf = 0;
>  	error = xfs_da3_node_lookup_int(state, &retval);
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 7ea7c7fa31ac..6bef522533a4 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -593,6 +593,16 @@ xfs_attr_init_add_state(struct xfs_da_args *args)
>  	return XFS_DAS_NODE_ADD;
>  }
>  
> +static inline enum xfs_delattr_state
> +xfs_attr_init_remove_state(struct xfs_da_args *args)
> +{
> +	if (xfs_attr_is_shortform(args->dp))
> +		return XFS_DAS_SF_REMOVE;
> +	if (xfs_attr_is_leaf(args->dp))
> +		return XFS_DAS_LEAF_REMOVE;
> +	return XFS_DAS_NODE_REMOVE;
> +}
> +
>  static inline enum xfs_delattr_state
>  xfs_attr_init_replace_state(struct xfs_da_args *args)
>  {
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 740a55d07660..fb9549e7ea96 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -296,12 +296,9 @@ xfs_attrd_item_release(
>  STATIC int
>  xfs_xattri_finish_update(
>  	struct xfs_attr_item		*attr,
> -	struct xfs_attrd_log_item	*attrdp,
> -	uint32_t			op_flags)
> +	struct xfs_attrd_log_item	*attrdp)
>  {
>  	struct xfs_da_args		*args = attr->xattri_da_args;
> -	unsigned int			op = op_flags &
> -					     XFS_ATTR_OP_FLAGS_TYPE_MASK;
>  	int				error;
>  
>  	if (XFS_TEST_ERROR(false, args->dp->i_mount, XFS_ERRTAG_LARP)) {
> @@ -309,22 +306,9 @@ xfs_xattri_finish_update(
>  		goto out;
>  	}
>  
> -	switch (op) {
> -	case XFS_ATTR_OP_FLAGS_SET:
> -	case XFS_ATTR_OP_FLAGS_REPLACE:
> -		error = xfs_attr_set_iter(attr);
> -		if (!error && attr->xattri_dela_state != XFS_DAS_DONE)
> -			error = -EAGAIN;
> -		break;
> -	case XFS_ATTR_OP_FLAGS_REMOVE:
> -		ASSERT(XFS_IFORK_Q(args->dp));
> -		error = xfs_attr_remove_iter(attr);
> -		break;
> -	default:
> -		error = -EFSCORRUPTED;
> -		break;
> -	}
> -
> +	error = xfs_attr_set_iter(attr);
> +	if (!error && attr->xattri_dela_state != XFS_DAS_DONE)
> +		error = -EAGAIN;
>  out:
>  	/*
>  	 * Mark the transaction dirty, even on error. This ensures the
> @@ -432,8 +416,7 @@ xfs_attr_finish_item(
>  	 */
>  	attr->xattri_da_args->trans = tp;
>  
> -	error = xfs_xattri_finish_update(attr, done_item,
> -					 attr->xattri_op_flags);
> +	error = xfs_xattri_finish_update(attr, done_item);
>  	if (error != -EAGAIN)
>  		kmem_free(attr);
>  
> @@ -581,7 +564,7 @@ xfs_attri_item_recover(
>  		attr->xattri_dela_state = xfs_attr_init_add_state(args);
>  		break;
>  	case XFS_ATTR_OP_FLAGS_REMOVE:
> -		attr->xattri_dela_state = XFS_DAS_UNINIT;
> +		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
>  		break;
>  	default:
>  		ASSERT(0);
> @@ -600,7 +583,7 @@ xfs_attri_item_recover(
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	xfs_trans_ijoin(tp, ip, 0);
>  
> -	ret = xfs_xattri_finish_update(attr, done_item, attrp->alfi_op_flags);
> +	ret = xfs_xattri_finish_update(attr, done_item);
>  	if (ret == -EAGAIN) {
>  		/* There's more work to do, so add it to this transaction */
>  		xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 13/18] xfs: introduce attr remove initial states into xfs_attr_set_iter
  2022-05-10 23:37   ` Darrick J. Wong
@ 2022-05-10 23:40     ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, May 10, 2022 at 04:37:36PM -0700, Darrick J. Wong wrote:
> On Mon, May 09, 2022 at 10:41:33AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > We need to merge the add and remove code paths to enable safe
> > recovery of replace operations. Hoist the initial remove states from
> > xfs_attr_remove_iter into xfs_attr_set_iter. We will make use of
> > them in the next patches.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_attr.c | 139 ++++++++++++++++++++++-----------------
> >  fs/xfs/libxfs/xfs_attr.h |   4 ++
> >  fs/xfs/xfs_trace.h       |   3 +
> >  3 files changed, 84 insertions(+), 62 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 89e68d9e22c0..a6a9b1f8dce6 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -451,6 +451,68 @@ xfs_attr_rmtval_alloc(
> >  	return error;
> >  }
> >  
> > +/*
> > + * Mark an attribute entry INCOMPLETE and save pointers to the relevant buffers
> > + * for later deletion of the entry.
> > + */
> > +static int
> > +xfs_attr_leaf_mark_incomplete(
> > +	struct xfs_da_args	*args,
> > +	struct xfs_da_state	*state)
> > +{
> > +	int			error;
> > +
> > +	/*
> > +	 * Fill in disk block numbers in the state structure
> > +	 * so that we can get the buffers back after we commit
> > +	 * several transactions in the following calls.
> > +	 */
> > +	error = xfs_attr_fillstate(state);
> > +	if (error)
> > +		return error;
> > +
> > +	/*
> > +	 * Mark the attribute as INCOMPLETE
> > +	 */
> > +	return xfs_attr3_leaf_setflag(args);
> > +}
> > +
> > +/*
> > + * Initial setup for xfs_attr_node_removename.  Make sure the attr is there and
> > + * the blocks are valid.  Attr keys with remote blocks will be marked
> > + * incomplete.
> > + */
> > +static
> > +int xfs_attr_node_removename_setup(
> > +	struct xfs_attr_item		*attr)
> > +{
> > +	struct xfs_da_args		*args = attr->xattri_da_args;
> > +	struct xfs_da_state		**state = &attr->xattri_da_state;
> > +	int				error;
> > +
> > +	error = xfs_attr_node_hasname(args, state);
> > +	if (error != -EEXIST)
> > +		goto out;
> > +	error = 0;
> > +
> > +	ASSERT((*state)->path.blk[(*state)->path.active - 1].bp != NULL);
> > +	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
> > +		XFS_ATTR_LEAF_MAGIC);
> > +
> > +	if (args->rmtblkno > 0) {
> > +		error = xfs_attr_leaf_mark_incomplete(args, *state);
> > +		if (error)
> > +			goto out;
> > +
> > +		error = xfs_attr_rmtval_invalidate(args);
> > +	}
> > +out:
> > +	if (error)
> > +		xfs_da_state_free(*state);
> > +
> > +	return error;
> > +}
> > +
> >  /*
> >   * Remove the original attr we have just replaced. This is dependent on the
> >   * original lookup and insert placing the old attr in args->blkno/args->index
> > @@ -550,6 +612,21 @@ xfs_attr_set_iter(
> >  	case XFS_DAS_NODE_ADD:
> >  		return xfs_attr_node_addname(attr);
> >  
> > +	case XFS_DAS_SF_REMOVE:
> 
> I think attr removal is broken until the next patch, which adds
> xfs_attr_init_remove_state to set dela_state to DAS_*_REMOVE, right?

Aha, I was looking at the wrong tab, the old xfs_attr_remove_iter code
is still in place, so ignore this question...

> The rest of the state machine changes look solid.
> 
> With that moved up by one patch to fix the bisection issue,
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

..and go straight to the RVB tag.

--D

> 
> --D
> 
> > +		attr->xattri_dela_state = XFS_DAS_DONE;
> > +		return xfs_attr_sf_removename(args);
> > +	case XFS_DAS_LEAF_REMOVE:
> > +		attr->xattri_dela_state = XFS_DAS_DONE;
> > +		return xfs_attr_leaf_removename(args);
> > +	case XFS_DAS_NODE_REMOVE:
> > +		error = xfs_attr_node_removename_setup(attr);
> > +		if (error)
> > +			return error;
> > +		attr->xattri_dela_state = XFS_DAS_NODE_REMOVE_RMT;
> > +		if (args->rmtblkno == 0)
> > +			attr->xattri_dela_state++;
> > +		break;
> > +
> >  	case XFS_DAS_LEAF_SET_RMT:
> >  	case XFS_DAS_NODE_SET_RMT:
> >  		error = xfs_attr_rmtval_find_space(attr);
> > @@ -1351,68 +1428,6 @@ xfs_attr_node_remove_attr(
> >  }
> >  
> >  
> > -/*
> > - * Mark an attribute entry INCOMPLETE and save pointers to the relevant buffers
> > - * for later deletion of the entry.
> > - */
> > -STATIC int
> > -xfs_attr_leaf_mark_incomplete(
> > -	struct xfs_da_args	*args,
> > -	struct xfs_da_state	*state)
> > -{
> > -	int			error;
> > -
> > -	/*
> > -	 * Fill in disk block numbers in the state structure
> > -	 * so that we can get the buffers back after we commit
> > -	 * several transactions in the following calls.
> > -	 */
> > -	error = xfs_attr_fillstate(state);
> > -	if (error)
> > -		return error;
> > -
> > -	/*
> > -	 * Mark the attribute as INCOMPLETE
> > -	 */
> > -	return xfs_attr3_leaf_setflag(args);
> > -}
> > -
> > -/*
> > - * Initial setup for xfs_attr_node_removename.  Make sure the attr is there and
> > - * the blocks are valid.  Attr keys with remote blocks will be marked
> > - * incomplete.
> > - */
> > -STATIC
> > -int xfs_attr_node_removename_setup(
> > -	struct xfs_attr_item		*attr)
> > -{
> > -	struct xfs_da_args		*args = attr->xattri_da_args;
> > -	struct xfs_da_state		**state = &attr->xattri_da_state;
> > -	int				error;
> > -
> > -	error = xfs_attr_node_hasname(args, state);
> > -	if (error != -EEXIST)
> > -		goto out;
> > -	error = 0;
> > -
> > -	ASSERT((*state)->path.blk[(*state)->path.active - 1].bp != NULL);
> > -	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
> > -		XFS_ATTR_LEAF_MAGIC);
> > -
> > -	if (args->rmtblkno > 0) {
> > -		error = xfs_attr_leaf_mark_incomplete(args, *state);
> > -		if (error)
> > -			goto out;
> > -
> > -		error = xfs_attr_rmtval_invalidate(args);
> > -	}
> > -out:
> > -	if (error)
> > -		xfs_da_state_free(*state);
> > -
> > -	return error;
> > -}
> > -
> >  STATIC int
> >  xfs_attr_node_removename(
> >  	struct xfs_da_args	*args,
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index c318260f17d4..7ea7c7fa31ac 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -451,6 +451,10 @@ enum xfs_delattr_state {
> >  	XFS_DAS_RM_NAME,		/* Remove attr name */
> >  	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
> >  
> > +	XFS_DAS_SF_REMOVE,		/* Initial shortform set iter state */
> > +	XFS_DAS_LEAF_REMOVE,		/* Initial leaf form set iter state */
> > +	XFS_DAS_NODE_REMOVE,		/* Initial node form set iter state */
> > +
> >  	/* Leaf state set/replace sequence */
> >  	XFS_DAS_LEAF_SET_RMT,		/* set a remote xattr from a leaf */
> >  	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index 260760ce2d05..01b047d86cd1 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -4136,6 +4136,9 @@ TRACE_DEFINE_ENUM(XFS_DAS_NODE_ADD);
> >  TRACE_DEFINE_ENUM(XFS_DAS_RMTBLK);
> >  TRACE_DEFINE_ENUM(XFS_DAS_RM_NAME);
> >  TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
> > +TRACE_DEFINE_ENUM(XFS_DAS_SF_REMOVE);
> > +TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE);
> > +TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE);
> >  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_SET_RMT);
> >  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
> >  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REPLACE);
> > -- 
> > 2.35.1
> > 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 15/18] xfs: remove xfs_attri_remove_iter
  2022-05-09  0:41 ` [PATCH 15/18] xfs: remove xfs_attri_remove_iter Dave Chinner
@ 2022-05-10 23:42   ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:35AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_attri_remove_iter is not used anymore, so remove it and all the
> infrastructure it uses and is needed to drive it. THe
> xfs_attr_refillstate() function now throws an unused warning, so
> isolate the xfs_attr_fillstate()/xfs_attr_refillstate() code pair
> with an #if 0 and a comment explaining why we want to keep this code
> and restore the optimisation it provides in the near future.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson<allison.henderson@oracle.com>

Heh.  The state machine traversal is muuuch easier to read now...
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr.c | 418 +++++++++++++--------------------------
>  1 file changed, 139 insertions(+), 279 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index b935727cf517..8be76f8d11c5 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -64,10 +64,6 @@ STATIC int xfs_attr_node_addname_find_attr(struct xfs_attr_item *attr);
>  STATIC int xfs_attr_node_remove_attr(struct xfs_attr_item *attr);
>  STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>  				 struct xfs_da_state **state);
> -STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> -STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> -STATIC int xfs_attr_node_removename(struct xfs_da_args *args,
> -				    struct xfs_da_state *state);
>  
>  int
>  xfs_inode_hasattr(
> @@ -100,6 +96,123 @@ xfs_attr_is_leaf(
>  	return imap.br_startoff == 0 && imap.br_blockcount == 1;
>  }
>  
> +/*
> + * XXX (dchinner): name path state saving and refilling is an optimisation to
> + * avoid needing to look up name entries after rolling transactions removing
> + * remote xattr blocks between the name entry lookup and name entry removal.
> + * This optimisation got sidelined when combining the set and remove state
> + * machines, but the code has been left in place because it is worthwhile to
> + * restore the optimisation once the combined state machine paths have settled.
> + *
> + * This comment is a public service announcement to remind Future Dave that he
> + * still needs to restore this code to working order.
> + */
> +#if 0
> +/*
> + * Fill in the disk block numbers in the state structure for the buffers
> + * that are attached to the state structure.
> + * This is done so that we can quickly reattach ourselves to those buffers
> + * after some set of transaction commits have released these buffers.
> + */
> +static int
> +xfs_attr_fillstate(xfs_da_state_t *state)
> +{
> +	xfs_da_state_path_t *path;
> +	xfs_da_state_blk_t *blk;
> +	int level;
> +
> +	trace_xfs_attr_fillstate(state->args);
> +
> +	/*
> +	 * Roll down the "path" in the state structure, storing the on-disk
> +	 * block number for those buffers in the "path".
> +	 */
> +	path = &state->path;
> +	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
> +	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
> +		if (blk->bp) {
> +			blk->disk_blkno = xfs_buf_daddr(blk->bp);
> +			blk->bp = NULL;
> +		} else {
> +			blk->disk_blkno = 0;
> +		}
> +	}
> +
> +	/*
> +	 * Roll down the "altpath" in the state structure, storing the on-disk
> +	 * block number for those buffers in the "altpath".
> +	 */
> +	path = &state->altpath;
> +	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
> +	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
> +		if (blk->bp) {
> +			blk->disk_blkno = xfs_buf_daddr(blk->bp);
> +			blk->bp = NULL;
> +		} else {
> +			blk->disk_blkno = 0;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Reattach the buffers to the state structure based on the disk block
> + * numbers stored in the state structure.
> + * This is done after some set of transaction commits have released those
> + * buffers from our grip.
> + */
> +static int
> +xfs_attr_refillstate(xfs_da_state_t *state)
> +{
> +	xfs_da_state_path_t *path;
> +	xfs_da_state_blk_t *blk;
> +	int level, error;
> +
> +	trace_xfs_attr_refillstate(state->args);
> +
> +	/*
> +	 * Roll down the "path" in the state structure, storing the on-disk
> +	 * block number for those buffers in the "path".
> +	 */
> +	path = &state->path;
> +	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
> +	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
> +		if (blk->disk_blkno) {
> +			error = xfs_da3_node_read_mapped(state->args->trans,
> +					state->args->dp, blk->disk_blkno,
> +					&blk->bp, XFS_ATTR_FORK);
> +			if (error)
> +				return error;
> +		} else {
> +			blk->bp = NULL;
> +		}
> +	}
> +
> +	/*
> +	 * Roll down the "altpath" in the state structure, storing the on-disk
> +	 * block number for those buffers in the "altpath".
> +	 */
> +	path = &state->altpath;
> +	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
> +	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
> +		if (blk->disk_blkno) {
> +			error = xfs_da3_node_read_mapped(state->args->trans,
> +					state->args->dp, blk->disk_blkno,
> +					&blk->bp, XFS_ATTR_FORK);
> +			if (error)
> +				return error;
> +		} else {
> +			blk->bp = NULL;
> +		}
> +	}
> +
> +	return 0;
> +}
> +#else
> +static int xfs_attr_fillstate(xfs_da_state_t *state) { return 0; }
> +#endif
> +
>  /*========================================================================
>   * Overall external interface routines.
>   *========================================================================*/
> @@ -548,25 +661,16 @@ xfs_attr_leaf_remove_attr(
>   */
>  static int
>  xfs_attr_leaf_shrink(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state     *state)
> +	struct xfs_da_args	*args)
>  {
>  	struct xfs_inode	*dp = args->dp;
> -	int			error, forkoff;
>  	struct xfs_buf		*bp;
> +	int			forkoff;
> +	int			error;
>  
>  	if (!xfs_attr_is_leaf(dp))
>  		return 0;
>  
> -	/*
> -	 * Have to get rid of the copy of this dabuf in the state.
> -	 */
> -	if (state) {
> -		ASSERT(state->path.active == 1);
> -		ASSERT(state->path.blk[0].bp);
> -		state->path.blk[0].bp = NULL;
> -	}
> -
>  	error = xfs_attr3_leaf_read(args->trans, args->dp, 0, &bp);
>  	if (error)
>  		return error;
> @@ -709,7 +813,7 @@ xfs_attr_set_iter(
>  	case XFS_DAS_NODE_REMOVE_ATTR:
>  		error = xfs_attr_node_remove_attr(attr);
>  		if (!error)
> -			error = xfs_attr_leaf_shrink(args, NULL);
> +			error = xfs_attr_leaf_shrink(args);
>  		attr->xattri_dela_state = XFS_DAS_DONE;
>  		break;
>  	default:
> @@ -1382,6 +1486,24 @@ xfs_attr_node_try_addname(
>  	return error;
>  }
>  
> +static int
> +xfs_attr_node_removename(
> +	struct xfs_da_args	*args,
> +	struct xfs_da_state	*state)
> +{
> +	struct xfs_da_state_blk	*blk;
> +	int			retval;
> +
> +	/*
> +	 * Remove the name and update the hashvals in the tree.
> +	 */
> +	blk = &state->path.blk[state->path.active-1];
> +	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
> +	retval = xfs_attr3_leaf_remove(blk->bp, args);
> +	xfs_da3_fixhashpath(state, &state->path);
> +
> +	return retval;
> +}
>  
>  static int
>  xfs_attr_node_remove_attr(
> @@ -1424,268 +1546,6 @@ xfs_attr_node_remove_attr(
>  	return retval;
>  }
>  
> -
> -STATIC int
> -xfs_attr_node_removename(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	*state)
> -{
> -	struct xfs_da_state_blk	*blk;
> -	int			retval;
> -
> -	/*
> -	 * Remove the name and update the hashvals in the tree.
> -	 */
> -	blk = &state->path.blk[state->path.active-1];
> -	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
> -	retval = xfs_attr3_leaf_remove(blk->bp, args);
> -	xfs_da3_fixhashpath(state, &state->path);
> -
> -	return retval;
> -}
> -
> -/*
> - * Remove the attribute specified in @args.
> - *
> - * This will involve walking down the Btree, and may involve joining
> - * leaf nodes and even joining intermediate nodes up to and including
> - * the root node (a special case of an intermediate node).
> - *
> - * This routine is meant to function as either an in-line or delayed operation,
> - * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> - * functions will need to handle this, and call the function until a
> - * successful error code is returned.
> - */
> -int
> -xfs_attr_remove_iter(
> -	struct xfs_attr_item		*attr)
> -{
> -	struct xfs_da_args		*args = attr->xattri_da_args;
> -	struct xfs_da_state		*state = attr->xattri_da_state;
> -	int				retval, error = 0;
> -	struct xfs_inode		*dp = args->dp;
> -
> -	trace_xfs_attr_node_removename(args);
> -
> -	switch (attr->xattri_dela_state) {
> -	case XFS_DAS_UNINIT:
> -		if (!xfs_inode_hasattr(dp))
> -			return -ENOATTR;
> -
> -		/*
> -		 * Shortform or leaf formats don't require transaction rolls and
> -		 * thus state transitions. Call the right helper and return.
> -		 */
> -		if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL)
> -			return xfs_attr_sf_removename(args);
> -
> -		if (xfs_attr_is_leaf(dp))
> -			return xfs_attr_leaf_removename(args);
> -
> -		/*
> -		 * Node format may require transaction rolls. Set up the
> -		 * state context and fall into the state machine.
> -		 */
> -		if (!attr->xattri_da_state) {
> -			error = xfs_attr_node_removename_setup(attr);
> -			if (error)
> -				return error;
> -			state = attr->xattri_da_state;
> -		}
> -
> -		fallthrough;
> -	case XFS_DAS_RMTBLK:
> -		attr->xattri_dela_state = XFS_DAS_RMTBLK;
> -
> -		/*
> -		 * If there is an out-of-line value, de-allocate the blocks.
> -		 * This is done before we remove the attribute so that we don't
> -		 * overflow the maximum size of a transaction and/or hit a
> -		 * deadlock.
> -		 */
> -		if (args->rmtblkno > 0) {
> -			/*
> -			 * May return -EAGAIN. Roll and repeat until all remote
> -			 * blocks are removed.
> -			 */
> -			error = xfs_attr_rmtval_remove(attr);
> -			if (error == -EAGAIN) {
> -				trace_xfs_attr_remove_iter_return(
> -					attr->xattri_dela_state, args->dp);
> -				return error;
> -			} else if (error) {
> -				goto out;
> -			}
> -
> -			/*
> -			 * Refill the state structure with buffers (the prior
> -			 * calls released our buffers) and close out this
> -			 * transaction before proceeding.
> -			 */
> -			ASSERT(args->rmtblkno == 0);
> -			error = xfs_attr_refillstate(state);
> -			if (error)
> -				goto out;
> -
> -			attr->xattri_dela_state = XFS_DAS_RM_NAME;
> -			trace_xfs_attr_remove_iter_return(
> -					attr->xattri_dela_state, args->dp);
> -			return -EAGAIN;
> -		}
> -
> -		fallthrough;
> -	case XFS_DAS_RM_NAME:
> -		/*
> -		 * If we came here fresh from a transaction roll, reattach all
> -		 * the buffers to the current transaction.
> -		 */
> -		if (attr->xattri_dela_state == XFS_DAS_RM_NAME) {
> -			error = xfs_attr_refillstate(state);
> -			if (error)
> -				goto out;
> -		}
> -
> -		retval = xfs_attr_node_removename(args, state);
> -
> -		/*
> -		 * Check to see if the tree needs to be collapsed. If so, roll
> -		 * the transacton and fall into the shrink state.
> -		 */
> -		if (retval && (state->path.active > 1)) {
> -			error = xfs_da3_join(state);
> -			if (error)
> -				goto out;
> -
> -			attr->xattri_dela_state = XFS_DAS_RM_SHRINK;
> -			trace_xfs_attr_remove_iter_return(
> -					attr->xattri_dela_state, args->dp);
> -			return -EAGAIN;
> -		}
> -
> -		fallthrough;
> -	case XFS_DAS_RM_SHRINK:
> -		/*
> -		 * If the result is small enough, push it all into the inode.
> -		 * This is our final state so it's safe to return a dirty
> -		 * transaction.
> -		 */
> -		if (xfs_attr_is_leaf(dp))
> -			error = xfs_attr_leaf_shrink(args, state);
> -		ASSERT(error != -EAGAIN);
> -		break;
> -	default:
> -		ASSERT(0);
> -		error = -EINVAL;
> -		goto out;
> -	}
> -out:
> -	if (state)
> -		xfs_da_state_free(state);
> -	return error;
> -}
> -
> -/*
> - * Fill in the disk block numbers in the state structure for the buffers
> - * that are attached to the state structure.
> - * This is done so that we can quickly reattach ourselves to those buffers
> - * after some set of transaction commits have released these buffers.
> - */
> -STATIC int
> -xfs_attr_fillstate(xfs_da_state_t *state)
> -{
> -	xfs_da_state_path_t *path;
> -	xfs_da_state_blk_t *blk;
> -	int level;
> -
> -	trace_xfs_attr_fillstate(state->args);
> -
> -	/*
> -	 * Roll down the "path" in the state structure, storing the on-disk
> -	 * block number for those buffers in the "path".
> -	 */
> -	path = &state->path;
> -	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
> -	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
> -		if (blk->bp) {
> -			blk->disk_blkno = xfs_buf_daddr(blk->bp);
> -			blk->bp = NULL;
> -		} else {
> -			blk->disk_blkno = 0;
> -		}
> -	}
> -
> -	/*
> -	 * Roll down the "altpath" in the state structure, storing the on-disk
> -	 * block number for those buffers in the "altpath".
> -	 */
> -	path = &state->altpath;
> -	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
> -	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
> -		if (blk->bp) {
> -			blk->disk_blkno = xfs_buf_daddr(blk->bp);
> -			blk->bp = NULL;
> -		} else {
> -			blk->disk_blkno = 0;
> -		}
> -	}
> -
> -	return 0;
> -}
> -
> -/*
> - * Reattach the buffers to the state structure based on the disk block
> - * numbers stored in the state structure.
> - * This is done after some set of transaction commits have released those
> - * buffers from our grip.
> - */
> -STATIC int
> -xfs_attr_refillstate(xfs_da_state_t *state)
> -{
> -	xfs_da_state_path_t *path;
> -	xfs_da_state_blk_t *blk;
> -	int level, error;
> -
> -	trace_xfs_attr_refillstate(state->args);
> -
> -	/*
> -	 * Roll down the "path" in the state structure, storing the on-disk
> -	 * block number for those buffers in the "path".
> -	 */
> -	path = &state->path;
> -	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
> -	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
> -		if (blk->disk_blkno) {
> -			error = xfs_da3_node_read_mapped(state->args->trans,
> -					state->args->dp, blk->disk_blkno,
> -					&blk->bp, XFS_ATTR_FORK);
> -			if (error)
> -				return error;
> -		} else {
> -			blk->bp = NULL;
> -		}
> -	}
> -
> -	/*
> -	 * Roll down the "altpath" in the state structure, storing the on-disk
> -	 * block number for those buffers in the "altpath".
> -	 */
> -	path = &state->altpath;
> -	ASSERT((path->active >= 0) && (path->active < XFS_DA_NODE_MAXDEPTH));
> -	for (blk = path->blk, level = 0; level < path->active; blk++, level++) {
> -		if (blk->disk_blkno) {
> -			error = xfs_da3_node_read_mapped(state->args->trans,
> -					state->args->dp, blk->disk_blkno,
> -					&blk->bp, XFS_ATTR_FORK);
> -			if (error)
> -				return error;
> -		} else {
> -			blk->bp = NULL;
> -		}
> -	}
> -
> -	return 0;
> -}
> -
>  /*
>   * Retrieve the attribute data from a node attribute list.
>   *
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 16/18 v2] xfs: use XFS_DA_OP flags in deferred attr ops
  2022-05-10 22:20   ` [PATCH 16/18 v2] " Dave Chinner
@ 2022-05-10 23:47     ` Darrick J. Wong
  2022-05-10 23:49     ` Alli
  1 sibling, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, May 11, 2022 at 08:20:15AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We currently store the high level attr operation in
> args->attr_flags. This falgs are what the VFS is telling us to do,

"This flag contains what the VFS is telling us to do..."

> but don't necessarily match what we are doing in the low level
> modification state machine. e.g. XATTR_REPLACE implies both
> XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME because it is doing both a
> remove and adding a new attr.
> 
> However, deep in the individual state machine operations, we check
> errors against this high level VFS op flags, not the low level
> XFS_DA_OP flags. Indeed, we don't even have a low level flag for
> a REMOVE operation, so the only way we know we are doing a remove
> is the complete absence of XATTR_REPLACE, XATTR_CREATE,
> XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME. And because there are other
> flags in these fields, this is a pain to check if we need to.
> 
> As the XFS_DA_OP flags are only needed once the deferred operations
> are set up, set these flags appropriately when we set the initial
> operation state. We also introduce a XFS_DA_OP_REMOVE flag to make
> it easy to know that we are doing a remove operation.

Oh goody!

> With these, we can remove the use of XATTR_REPLACE and XATTR_CREATE
> in low level lookup operations, and manipulate the low level flags
> according to the low level context that is operating. e.g. log
> recovery does not have a VFS xattr operation state to copy into
> args->attr_flags, and the low level state machine ops we do for
> recovery do not match the high level VFS operations that were in
> progress when the system failed...
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> Version 2 (now that vger has finally got this to me after 2 days):
> - fixed bug by that removed clearing remote block values after save
>   for node format blocks. Clearing is now done by
>   xfs_attr_save_rmt_blk() for both callers.
> 

With the minor sentence nit above fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>  fs/xfs/libxfs/xfs_attr.c      | 136 +++++++++++++++++++++++-------------------
>  fs/xfs/libxfs/xfs_attr.h      |   3 +
>  fs/xfs/libxfs/xfs_attr_leaf.c |   2 +-
>  fs/xfs/libxfs/xfs_da_btree.h  |   8 ++-
>  4 files changed, 83 insertions(+), 66 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 8be76f8d11c5..a36364b27aa1 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -467,7 +467,7 @@ xfs_attr_leaf_addname(
>  	 */
>  	if (args->rmtblkno)
>  		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
> -	else if (args->op_flags & XFS_DA_OP_RENAME)
> +	else if (args->op_flags & XFS_DA_OP_REPLACE)
>  		xfs_attr_dela_state_set_replace(attr, XFS_DAS_LEAF_REPLACE);
>  	else
>  		attr->xattri_dela_state = XFS_DAS_DONE;
> @@ -512,7 +512,7 @@ xfs_attr_node_addname(
>  
>  	if (args->rmtblkno)
>  		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
> -	else if (args->op_flags & XFS_DA_OP_RENAME)
> +	else if (args->op_flags & XFS_DA_OP_REPLACE)
>  		xfs_attr_dela_state_set_replace(attr, XFS_DAS_NODE_REPLACE);
>  	else
>  		attr->xattri_dela_state = XFS_DAS_DONE;
> @@ -548,7 +548,7 @@ xfs_attr_rmtval_alloc(
>  		return error;
>  
>  	/* If this is not a rename, clear the incomplete flag and we're done. */
> -	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
> +	if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
>  		error = xfs_attr3_leaf_clearflag(args);
>  		attr->xattri_dela_state = XFS_DAS_DONE;
>  	} else {
> @@ -967,8 +967,6 @@ xfs_attr_set(
>  
>  	if (args->value) {
>  		XFS_STATS_INC(mp, xs_attr_set);
> -
> -		args->op_flags |= XFS_DA_OP_ADDNAME;
>  		args->total = xfs_attr_calc_size(args, &local);
>  
>  		/*
> @@ -1126,28 +1124,41 @@ static inline int xfs_attr_sf_totsize(struct xfs_inode *dp)
>   * Add a name to the shortform attribute list structure
>   * This is the external routine.
>   */
> -STATIC int
> -xfs_attr_shortform_addname(xfs_da_args_t *args)
> +static int
> +xfs_attr_shortform_addname(
> +	struct xfs_da_args	*args)
>  {
> -	int newsize, forkoff, retval;
> +	int			newsize, forkoff;
> +	int			error;
>  
>  	trace_xfs_attr_sf_addname(args);
>  
> -	retval = xfs_attr_shortform_lookup(args);
> -	if (retval == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
> -		return retval;
> -	if (retval == -EEXIST) {
> -		if (args->attr_flags & XATTR_CREATE)
> -			return retval;
> -		retval = xfs_attr_sf_removename(args);
> -		if (retval)
> -			return retval;
> +	error = xfs_attr_shortform_lookup(args);
> +	switch (error) {
> +	case -ENOATTR:
> +		if (args->op_flags & XFS_DA_OP_REPLACE)
> +			return error;
> +		break;
> +	case -EEXIST:
> +		if (!(args->op_flags & XFS_DA_OP_REPLACE))
> +			return error;
> +
> +		error = xfs_attr_sf_removename(args);
> +		if (error)
> +			return error;
> +
>  		/*
> -		 * Since we have removed the old attr, clear ATTR_REPLACE so
> -		 * that the leaf format add routine won't trip over the attr
> -		 * not being around.
> +		 * Since we have removed the old attr, clear XFS_DA_OP_REPLACE
> +		 * so that the new attr doesn't fit in shortform format, the
> +		 * leaf format add routine won't trip over the attr not being
> +		 * around.
>  		 */
> -		args->attr_flags &= ~XATTR_REPLACE;
> +		args->op_flags &= ~XFS_DA_OP_REPLACE;
> +		break;
> +	case 0:
> +		break;
> +	default:
> +		return error;
>  	}
>  
>  	if (args->namelen >= XFS_ATTR_SF_ENTSIZE_MAX ||
> @@ -1170,8 +1181,8 @@ xfs_attr_shortform_addname(xfs_da_args_t *args)
>   * External routines when attribute list is one block
>   *========================================================================*/
>  
> -/* Store info about a remote block */
> -STATIC void
> +/* Save the current remote block info and clear the current pointers. */
> +static void
>  xfs_attr_save_rmt_blk(
>  	struct xfs_da_args	*args)
>  {
> @@ -1180,10 +1191,13 @@ xfs_attr_save_rmt_blk(
>  	args->rmtblkno2 = args->rmtblkno;
>  	args->rmtblkcnt2 = args->rmtblkcnt;
>  	args->rmtvaluelen2 = args->rmtvaluelen;
> +	args->rmtblkno = 0;
> +	args->rmtblkcnt = 0;
> +	args->rmtvaluelen = 0;
>  }
>  
>  /* Set stored info about a remote block */
> -STATIC void
> +static void
>  xfs_attr_restore_rmt_blk(
>  	struct xfs_da_args	*args)
>  {
> @@ -1229,28 +1243,27 @@ xfs_attr_leaf_try_add(
>  	 * Look up the xattr name to set the insertion point for the new xattr.
>  	 */
>  	error = xfs_attr3_leaf_lookup_int(bp, args);
> -	if (error != -ENOATTR && error != -EEXIST)
> -		goto out_brelse;
> -	if (error == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
> -		goto out_brelse;
> -	if (error == -EEXIST) {
> -		if (args->attr_flags & XATTR_CREATE)
> +	switch (error) {
> +	case -ENOATTR:
> +		if (args->op_flags & XFS_DA_OP_REPLACE)
> +			goto out_brelse;
> +		break;
> +	case -EEXIST:
> +		if (!(args->op_flags & XFS_DA_OP_REPLACE))
>  			goto out_brelse;
>  
>  		trace_xfs_attr_leaf_replace(args);
> -
> -		/* save the attribute state for later removal*/
> -		args->op_flags |= XFS_DA_OP_RENAME;	/* an atomic rename */
> -		xfs_attr_save_rmt_blk(args);
> -
>  		/*
> -		 * clear the remote attr state now that it is saved so that the
> -		 * values reflect the state of the attribute we are about to
> +		 * Save the existing remote attr state so that the current
> +		 * values reflect the state of the new attribute we are about to
>  		 * add, not the attribute we just found and will remove later.
>  		 */
> -		args->rmtblkno = 0;
> -		args->rmtblkcnt = 0;
> -		args->rmtvaluelen = 0;
> +		xfs_attr_save_rmt_blk(args);
> +		break;
> +	case 0:
> +		break;
> +	default:
> +		goto out_brelse;
>  	}
>  
>  	return xfs_attr3_leaf_add(bp, args);
> @@ -1389,46 +1402,45 @@ xfs_attr_node_hasname(
>  
>  STATIC int
>  xfs_attr_node_addname_find_attr(
> -	 struct xfs_attr_item		*attr)
> +	 struct xfs_attr_item	*attr)
>  {
> -	struct xfs_da_args		*args = attr->xattri_da_args;
> -	int				retval;
> +	struct xfs_da_args	*args = attr->xattri_da_args;
> +	int			error;
>  
>  	/*
>  	 * Search to see if name already exists, and get back a pointer
>  	 * to where it should go.
>  	 */
> -	retval = xfs_attr_node_hasname(args, &attr->xattri_da_state);
> -	if (retval != -ENOATTR && retval != -EEXIST)
> -		goto error;
> -
> -	if (retval == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
> -		goto error;
> -	if (retval == -EEXIST) {
> -		if (args->attr_flags & XATTR_CREATE)
> +	error = xfs_attr_node_hasname(args, &attr->xattri_da_state);
> +	switch (error) {
> +	case -ENOATTR:
> +		if (args->op_flags & XFS_DA_OP_REPLACE)
> +			goto error;
> +		break;
> +	case -EEXIST:
> +		if (!(args->op_flags & XFS_DA_OP_REPLACE))
>  			goto error;
>  
> -		trace_xfs_attr_node_replace(args);
> -
> -		/* save the attribute state for later removal*/
> -		args->op_flags |= XFS_DA_OP_RENAME;	/* atomic rename op */
> -		xfs_attr_save_rmt_blk(args);
>  
> +		trace_xfs_attr_node_replace(args);
>  		/*
> -		 * clear the remote attr state now that it is saved so that the
> -		 * values reflect the state of the attribute we are about to
> +		 * Save the existing remote attr state so that the current
> +		 * values reflect the state of the new attribute we are about to
>  		 * add, not the attribute we just found and will remove later.
>  		 */
> -		args->rmtblkno = 0;
> -		args->rmtblkcnt = 0;
> -		args->rmtvaluelen = 0;
> +		xfs_attr_save_rmt_blk(args);
> +		break;
> +	case 0:
> +		break;
> +	default:
> +		goto error;
>  	}
>  
>  	return 0;
>  error:
>  	if (attr->xattri_da_state)
>  		xfs_da_state_free(attr->xattri_da_state);
> -	return retval;
> +	return error;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 6bef522533a4..e93efc8b11cd 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -584,6 +584,7 @@ xfs_attr_is_shortform(
>  static inline enum xfs_delattr_state
>  xfs_attr_init_add_state(struct xfs_da_args *args)
>  {
> +	args->op_flags |= XFS_DA_OP_ADDNAME;
>  	if (!args->dp->i_afp)
>  		return XFS_DAS_DONE;
>  	if (xfs_attr_is_shortform(args->dp))
> @@ -596,6 +597,7 @@ xfs_attr_init_add_state(struct xfs_da_args *args)
>  static inline enum xfs_delattr_state
>  xfs_attr_init_remove_state(struct xfs_da_args *args)
>  {
> +	args->op_flags |= XFS_DA_OP_REMOVE;
>  	if (xfs_attr_is_shortform(args->dp))
>  		return XFS_DAS_SF_REMOVE;
>  	if (xfs_attr_is_leaf(args->dp))
> @@ -606,6 +608,7 @@ xfs_attr_init_remove_state(struct xfs_da_args *args)
>  static inline enum xfs_delattr_state
>  xfs_attr_init_replace_state(struct xfs_da_args *args)
>  {
> +	args->op_flags |= XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE;
>  	return xfs_attr_init_add_state(args);
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index e90bfd9d7551..53d02ce9ed78 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -1492,7 +1492,7 @@ xfs_attr3_leaf_add_work(
>  	entry->flags = args->attr_filter;
>  	if (tmp)
>  		entry->flags |= XFS_ATTR_LOCAL;
> -	if (args->op_flags & XFS_DA_OP_RENAME) {
> +	if (args->op_flags & XFS_DA_OP_REPLACE) {
>  		if (!xfs_has_larp(mp))
>  			entry->flags |= XFS_ATTR_INCOMPLETE;
>  		if ((args->blkno2 == args->blkno) &&
> diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
> index deb368d041e3..468ca70cd35d 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.h
> +++ b/fs/xfs/libxfs/xfs_da_btree.h
> @@ -85,19 +85,21 @@ typedef struct xfs_da_args {
>   * Operation flags:
>   */
>  #define XFS_DA_OP_JUSTCHECK	(1u << 0) /* check for ok with no space */
> -#define XFS_DA_OP_RENAME	(1u << 1) /* this is an atomic rename op */
> +#define XFS_DA_OP_REPLACE	(1u << 1) /* this is an atomic replace op */
>  #define XFS_DA_OP_ADDNAME	(1u << 2) /* this is an add operation */
>  #define XFS_DA_OP_OKNOENT	(1u << 3) /* lookup op, ENOENT ok, else die */
>  #define XFS_DA_OP_CILOOKUP	(1u << 4) /* lookup returns CI name if found */
>  #define XFS_DA_OP_NOTIME	(1u << 5) /* don't update inode timestamps */
> +#define XFS_DA_OP_REMOVE	(1u << 6) /* this is a remove operation */
>  
>  #define XFS_DA_OP_FLAGS \
>  	{ XFS_DA_OP_JUSTCHECK,	"JUSTCHECK" }, \
> -	{ XFS_DA_OP_RENAME,	"RENAME" }, \
> +	{ XFS_DA_OP_REPLACE,	"REPLACE" }, \
>  	{ XFS_DA_OP_ADDNAME,	"ADDNAME" }, \
>  	{ XFS_DA_OP_OKNOENT,	"OKNOENT" }, \
>  	{ XFS_DA_OP_CILOOKUP,	"CILOOKUP" }, \
> -	{ XFS_DA_OP_NOTIME,	"NOTIME" }
> +	{ XFS_DA_OP_NOTIME,	"NOTIME" }, \
> +	{ XFS_DA_OP_REMOVE,	"REMOVE" }
>  
>  /*
>   * Storage for holding state during Btree searches and split/join ops.
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 16/18 v2] xfs: use XFS_DA_OP flags in deferred attr ops
  2022-05-10 22:20   ` [PATCH 16/18 v2] " Dave Chinner
  2022-05-10 23:47     ` Darrick J. Wong
@ 2022-05-10 23:49     ` Alli
  1 sibling, 0 replies; 56+ messages in thread
From: Alli @ 2022-05-10 23:49 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Wed, 2022-05-11 at 08:20 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We currently store the high level attr operation in
> args->attr_flags. This falgs are what the VFS is telling us to do,
> but don't necessarily match what we are doing in the low level
> modification state machine. e.g. XATTR_REPLACE implies both
> XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME because it is doing both a
> remove and adding a new attr.
> 
> However, deep in the individual state machine operations, we check
> errors against this high level VFS op flags, not the low level
> XFS_DA_OP flags. Indeed, we don't even have a low level flag for
> a REMOVE operation, so the only way we know we are doing a remove
> is the complete absence of XATTR_REPLACE, XATTR_CREATE,
> XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME. And because there are other
> flags in these fields, this is a pain to check if we need to.
> 
> As the XFS_DA_OP flags are only needed once the deferred operations
> are set up, set these flags appropriately when we set the initial
> operation state. We also introduce a XFS_DA_OP_REMOVE flag to make
> it easy to know that we are doing a remove operation.
> 
> With these, we can remove the use of XATTR_REPLACE and XATTR_CREATE
> in low level lookup operations, and manipulate the low level flags
> according to the low level context that is operating. e.g. log
> recovery does not have a VFS xattr operation state to copy into
> args->attr_flags, and the low level state machine ops we do for
> recovery do not match the high level VFS operations that were in
> progress when the system failed...
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
I popped this in place of the previous version and it ran ok for me
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
> Version 2 (now that vger has finally got this to me after 2 days):
> - fixed bug by that removed clearing remote block values after save
>   for node format blocks. Clearing is now done by
>   xfs_attr_save_rmt_blk() for both callers.
> 
>  fs/xfs/libxfs/xfs_attr.c      | 136 +++++++++++++++++++++++---------
> ----------
>  fs/xfs/libxfs/xfs_attr.h      |   3 +
>  fs/xfs/libxfs/xfs_attr_leaf.c |   2 +-
>  fs/xfs/libxfs/xfs_da_btree.h  |   8 ++-
>  4 files changed, 83 insertions(+), 66 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 8be76f8d11c5..a36364b27aa1 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -467,7 +467,7 @@ xfs_attr_leaf_addname(
>  	 */
>  	if (args->rmtblkno)
>  		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
> -	else if (args->op_flags & XFS_DA_OP_RENAME)
> +	else if (args->op_flags & XFS_DA_OP_REPLACE)
>  		xfs_attr_dela_state_set_replace(attr,
> XFS_DAS_LEAF_REPLACE);
>  	else
>  		attr->xattri_dela_state = XFS_DAS_DONE;
> @@ -512,7 +512,7 @@ xfs_attr_node_addname(
>  
>  	if (args->rmtblkno)
>  		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
> -	else if (args->op_flags & XFS_DA_OP_RENAME)
> +	else if (args->op_flags & XFS_DA_OP_REPLACE)
>  		xfs_attr_dela_state_set_replace(attr,
> XFS_DAS_NODE_REPLACE);
>  	else
>  		attr->xattri_dela_state = XFS_DAS_DONE;
> @@ -548,7 +548,7 @@ xfs_attr_rmtval_alloc(
>  		return error;
>  
>  	/* If this is not a rename, clear the incomplete flag and we're
> done. */
> -	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
> +	if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
>  		error = xfs_attr3_leaf_clearflag(args);
>  		attr->xattri_dela_state = XFS_DAS_DONE;
>  	} else {
> @@ -967,8 +967,6 @@ xfs_attr_set(
>  
>  	if (args->value) {
>  		XFS_STATS_INC(mp, xs_attr_set);
> -
> -		args->op_flags |= XFS_DA_OP_ADDNAME;
>  		args->total = xfs_attr_calc_size(args, &local);
>  
>  		/*
> @@ -1126,28 +1124,41 @@ static inline int xfs_attr_sf_totsize(struct
> xfs_inode *dp)
>   * Add a name to the shortform attribute list structure
>   * This is the external routine.
>   */
> -STATIC int
> -xfs_attr_shortform_addname(xfs_da_args_t *args)
> +static int
> +xfs_attr_shortform_addname(
> +	struct xfs_da_args	*args)
>  {
> -	int newsize, forkoff, retval;
> +	int			newsize, forkoff;
> +	int			error;
>  
>  	trace_xfs_attr_sf_addname(args);
>  
> -	retval = xfs_attr_shortform_lookup(args);
> -	if (retval == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
> -		return retval;
> -	if (retval == -EEXIST) {
> -		if (args->attr_flags & XATTR_CREATE)
> -			return retval;
> -		retval = xfs_attr_sf_removename(args);
> -		if (retval)
> -			return retval;
> +	error = xfs_attr_shortform_lookup(args);
> +	switch (error) {
> +	case -ENOATTR:
> +		if (args->op_flags & XFS_DA_OP_REPLACE)
> +			return error;
> +		break;
> +	case -EEXIST:
> +		if (!(args->op_flags & XFS_DA_OP_REPLACE))
> +			return error;
> +
> +		error = xfs_attr_sf_removename(args);
> +		if (error)
> +			return error;
> +
>  		/*
> -		 * Since we have removed the old attr, clear
> ATTR_REPLACE so
> -		 * that the leaf format add routine won't trip over the
> attr
> -		 * not being around.
> +		 * Since we have removed the old attr, clear
> XFS_DA_OP_REPLACE
> +		 * so that the new attr doesn't fit in shortform
> format, the
> +		 * leaf format add routine won't trip over the attr not
> being
> +		 * around.
>  		 */
> -		args->attr_flags &= ~XATTR_REPLACE;
> +		args->op_flags &= ~XFS_DA_OP_REPLACE;
> +		break;
> +	case 0:
> +		break;
> +	default:
> +		return error;
>  	}
>  
>  	if (args->namelen >= XFS_ATTR_SF_ENTSIZE_MAX ||
> @@ -1170,8 +1181,8 @@ xfs_attr_shortform_addname(xfs_da_args_t *args)
>   * External routines when attribute list is one block
>  
> *====================================================================
> ====*/
>  
> -/* Store info about a remote block */
> -STATIC void
> +/* Save the current remote block info and clear the current
> pointers. */
> +static void
>  xfs_attr_save_rmt_blk(
>  	struct xfs_da_args	*args)
>  {
> @@ -1180,10 +1191,13 @@ xfs_attr_save_rmt_blk(
>  	args->rmtblkno2 = args->rmtblkno;
>  	args->rmtblkcnt2 = args->rmtblkcnt;
>  	args->rmtvaluelen2 = args->rmtvaluelen;
> +	args->rmtblkno = 0;
> +	args->rmtblkcnt = 0;
> +	args->rmtvaluelen = 0;
>  }
>  
>  /* Set stored info about a remote block */
> -STATIC void
> +static void
>  xfs_attr_restore_rmt_blk(
>  	struct xfs_da_args	*args)
>  {
> @@ -1229,28 +1243,27 @@ xfs_attr_leaf_try_add(
>  	 * Look up the xattr name to set the insertion point for the
> new xattr.
>  	 */
>  	error = xfs_attr3_leaf_lookup_int(bp, args);
> -	if (error != -ENOATTR && error != -EEXIST)
> -		goto out_brelse;
> -	if (error == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
> -		goto out_brelse;
> -	if (error == -EEXIST) {
> -		if (args->attr_flags & XATTR_CREATE)
> +	switch (error) {
> +	case -ENOATTR:
> +		if (args->op_flags & XFS_DA_OP_REPLACE)
> +			goto out_brelse;
> +		break;
> +	case -EEXIST:
> +		if (!(args->op_flags & XFS_DA_OP_REPLACE))
>  			goto out_brelse;
>  
>  		trace_xfs_attr_leaf_replace(args);
> -
> -		/* save the attribute state for later removal*/
> -		args->op_flags |= XFS_DA_OP_RENAME;	/* an atomic
> rename */
> -		xfs_attr_save_rmt_blk(args);
> -
>  		/*
> -		 * clear the remote attr state now that it is saved so
> that the
> -		 * values reflect the state of the attribute we are
> about to
> +		 * Save the existing remote attr state so that the
> current
> +		 * values reflect the state of the new attribute we are
> about to
>  		 * add, not the attribute we just found and will remove
> later.
>  		 */
> -		args->rmtblkno = 0;
> -		args->rmtblkcnt = 0;
> -		args->rmtvaluelen = 0;
> +		xfs_attr_save_rmt_blk(args);
> +		break;
> +	case 0:
> +		break;
> +	default:
> +		goto out_brelse;
>  	}
>  
>  	return xfs_attr3_leaf_add(bp, args);
> @@ -1389,46 +1402,45 @@ xfs_attr_node_hasname(
>  
>  STATIC int
>  xfs_attr_node_addname_find_attr(
> -	 struct xfs_attr_item		*attr)
> +	 struct xfs_attr_item	*attr)
>  {
> -	struct xfs_da_args		*args = attr->xattri_da_args;
> -	int				retval;
> +	struct xfs_da_args	*args = attr->xattri_da_args;
> +	int			error;
>  
>  	/*
>  	 * Search to see if name already exists, and get back a pointer
>  	 * to where it should go.
>  	 */
> -	retval = xfs_attr_node_hasname(args, &attr->xattri_da_state);
> -	if (retval != -ENOATTR && retval != -EEXIST)
> -		goto error;
> -
> -	if (retval == -ENOATTR && (args->attr_flags & XATTR_REPLACE))
> -		goto error;
> -	if (retval == -EEXIST) {
> -		if (args->attr_flags & XATTR_CREATE)
> +	error = xfs_attr_node_hasname(args, &attr->xattri_da_state);
> +	switch (error) {
> +	case -ENOATTR:
> +		if (args->op_flags & XFS_DA_OP_REPLACE)
> +			goto error;
> +		break;
> +	case -EEXIST:
> +		if (!(args->op_flags & XFS_DA_OP_REPLACE))
>  			goto error;
>  
> -		trace_xfs_attr_node_replace(args);
> -
> -		/* save the attribute state for later removal*/
> -		args->op_flags |= XFS_DA_OP_RENAME;	/* atomic
> rename op */
> -		xfs_attr_save_rmt_blk(args);
>  
> +		trace_xfs_attr_node_replace(args);
>  		/*
> -		 * clear the remote attr state now that it is saved so
> that the
> -		 * values reflect the state of the attribute we are
> about to
> +		 * Save the existing remote attr state so that the
> current
> +		 * values reflect the state of the new attribute we are
> about to
>  		 * add, not the attribute we just found and will remove
> later.
>  		 */
> -		args->rmtblkno = 0;
> -		args->rmtblkcnt = 0;
> -		args->rmtvaluelen = 0;
> +		xfs_attr_save_rmt_blk(args);
> +		break;
> +	case 0:
> +		break;
> +	default:
> +		goto error;
>  	}
>  
>  	return 0;
>  error:
>  	if (attr->xattri_da_state)
>  		xfs_da_state_free(attr->xattri_da_state);
> -	return retval;
> +	return error;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 6bef522533a4..e93efc8b11cd 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -584,6 +584,7 @@ xfs_attr_is_shortform(
>  static inline enum xfs_delattr_state
>  xfs_attr_init_add_state(struct xfs_da_args *args)
>  {
> +	args->op_flags |= XFS_DA_OP_ADDNAME;
>  	if (!args->dp->i_afp)
>  		return XFS_DAS_DONE;
>  	if (xfs_attr_is_shortform(args->dp))
> @@ -596,6 +597,7 @@ xfs_attr_init_add_state(struct xfs_da_args *args)
>  static inline enum xfs_delattr_state
>  xfs_attr_init_remove_state(struct xfs_da_args *args)
>  {
> +	args->op_flags |= XFS_DA_OP_REMOVE;
>  	if (xfs_attr_is_shortform(args->dp))
>  		return XFS_DAS_SF_REMOVE;
>  	if (xfs_attr_is_leaf(args->dp))
> @@ -606,6 +608,7 @@ xfs_attr_init_remove_state(struct xfs_da_args
> *args)
>  static inline enum xfs_delattr_state
>  xfs_attr_init_replace_state(struct xfs_da_args *args)
>  {
> +	args->op_flags |= XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE;
>  	return xfs_attr_init_add_state(args);
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c
> b/fs/xfs/libxfs/xfs_attr_leaf.c
> index e90bfd9d7551..53d02ce9ed78 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -1492,7 +1492,7 @@ xfs_attr3_leaf_add_work(
>  	entry->flags = args->attr_filter;
>  	if (tmp)
>  		entry->flags |= XFS_ATTR_LOCAL;
> -	if (args->op_flags & XFS_DA_OP_RENAME) {
> +	if (args->op_flags & XFS_DA_OP_REPLACE) {
>  		if (!xfs_has_larp(mp))
>  			entry->flags |= XFS_ATTR_INCOMPLETE;
>  		if ((args->blkno2 == args->blkno) &&
> diff --git a/fs/xfs/libxfs/xfs_da_btree.h
> b/fs/xfs/libxfs/xfs_da_btree.h
> index deb368d041e3..468ca70cd35d 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.h
> +++ b/fs/xfs/libxfs/xfs_da_btree.h
> @@ -85,19 +85,21 @@ typedef struct xfs_da_args {
>   * Operation flags:
>   */
>  #define XFS_DA_OP_JUSTCHECK	(1u << 0) /* check for ok with no space
> */
> -#define XFS_DA_OP_RENAME	(1u << 1) /* this is an atomic rename
> op */
> +#define XFS_DA_OP_REPLACE	(1u << 1) /* this is an atomic replace
> op */
>  #define XFS_DA_OP_ADDNAME	(1u << 2) /* this is an add operation
> */
>  #define XFS_DA_OP_OKNOENT	(1u << 3) /* lookup op, ENOENT ok, else
> die */
>  #define XFS_DA_OP_CILOOKUP	(1u << 4) /* lookup returns CI name if
> found */
>  #define XFS_DA_OP_NOTIME	(1u << 5) /* don't update inode
> timestamps */
> +#define XFS_DA_OP_REMOVE	(1u << 6) /* this is a remove operation
> */
>  
>  #define XFS_DA_OP_FLAGS \
>  	{ XFS_DA_OP_JUSTCHECK,	"JUSTCHECK" }, \
> -	{ XFS_DA_OP_RENAME,	"RENAME" }, \
> +	{ XFS_DA_OP_REPLACE,	"REPLACE" }, \
>  	{ XFS_DA_OP_ADDNAME,	"ADDNAME" }, \
>  	{ XFS_DA_OP_OKNOENT,	"OKNOENT" }, \
>  	{ XFS_DA_OP_CILOOKUP,	"CILOOKUP" }, \
> -	{ XFS_DA_OP_NOTIME,	"NOTIME" }
> +	{ XFS_DA_OP_NOTIME,	"NOTIME" }, \
> +	{ XFS_DA_OP_REMOVE,	"REMOVE" }
>  
>  /*
>   * Storage for holding state during Btree searches and split/join
> ops.
> Dave Chinner
> david@fromorbit.com


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 17/18] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework
  2022-05-09  0:41 ` [PATCH 17/18] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework Dave Chinner
  2022-05-10 22:31   ` Alli
@ 2022-05-10 23:53   ` Darrick J. Wong
  2022-05-11  1:14     ` Dave Chinner
  1 sibling, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:37AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We can't use the same algorithm for replacing an existing attribute
> when logging attributes. The existing algorithm is essentially:
> 
> 1. create new attr w/ INCOMPLETE
> 2. atomically flip INCOMPLETE flags between old + new attribute
> 3. remove old attr which is marked w/ INCOMPLETE
> 
> This algorithm guarantees that we see either the old or new
> attribute, and if we fail after the atomic flag flip, we don't have
> to recover the removal of the old attr because we never see
> INCOMPLETE attributes in lookups.
> 
> For logged attributes, however, this does not work. The logged
> attribute intents do not track the work that has been done as the
> transaction rolls, and hence the only recovery mechanism we have is
> "run the replace operation from scratch".
> 
> This is further exacerbated by the attempt to avoid needing the
> INCOMPLETE flag to create an atomic swap. This means we can create
> a second active attribute of the same name before we remove the
> original. If we fail at any point after the create but before the
> removal has completed, we end up with duplicate attributes in
> the attr btree and recovery only tries to replace one of them.
> 
> There are several other failure modes where we can leave partially
> allocated remote attributes that expose stale data, partially free
> remote attributes that enable UAF based stale data exposure, etc.
> 
> TO fix this, we need a different algorithm for replace operations
> when LARP is enabled. Luckily, it's not that complex if we take the
> right first step. That is, the first thing we log is the attri
> intent with the new name/value pair and mark the old attr as
> INCOMPLETE in the same transaction.
> 
> From there, we then remove the old attr and keep relogging the
> new name/value in the intent, such that we always know that we have
> to create the new attr in recovery. Once the old attr is removed,
> we then run a normal ATTR_CREATE operation relogging the intent as
> we go. If the new attr is local, then it gets created in a single
> atomic transaction that also logs the final intent done. If the new
> attr is remote, the we set INCOMPLETE on the new attr while we
> allocate and set the remote value, and then we clear the INCOMPLETE
> flag at in the last transaction taht logs the final intent done.
> 
> If we fail at any point in this algorithm, log recovery will always
> see the same state on disk: the new name/value in the intent, and
> either an INCOMPLETE attr or no attr in the attr btree. If we find
> an INCOMPLETE attr, we run the full replace starting with removing
> the INCOMPLETE attr. If we don't find it, then we simply create the
> new attr.

...assuming that the INCOMPLETE attr is preserved until we're completely
done unmapping the rmtval blocks, right?

> Notably, recovery of a failed create that has an INCOMPLETE flag set
> is now the same - we start with the lookup of the INCOMPLETE attr,
> and if that exists then we do the full replace recovery process,
> otherwise we just create the new attr.
> 
> Hence changing the way we do the replace operation when LARP is
> enabled allows us to use the same log recovery algorithm for both
> the ATTR_CREATE and ATTR_REPLACE operations. This is also the same
> algorithm we use for runtime ATTR_REPLACE operations (except for the
> step setting up the initial conditions).
> 
> The result is that:
> 
> - ATTR_CREATE uses the same algorithm regardless of whether LARP is
>   enabled or not
> - ATTR_REPLACE with larp=0 is identical to the old algorithm
> - ATTR_REPLACE with larp=1 runs an unmodified attr removal algorithm
>   from the larp=0 code and then runs the unmodified ATTR_CREATE
>   code.
> - log recovery when larp=1 runs the same ATTR_REPLACE algorithm as
>   it uses at runtime.
> 
> Because the state machine is now quite clean, changing the algorithm
> is really just a case of changing the initial state and how the
> states link together for the ATTR_REPLACE case. Hence it's not a
> huge amoutn of code for what is a fairly substantial rework

"amount"

> of the attr logging and recovery algorithm....
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c      | 97 +++++++++++++++++++++--------------
>  fs/xfs/libxfs/xfs_attr.h      | 49 +++++++++++-------
>  fs/xfs/libxfs/xfs_attr_leaf.c | 44 +++++++++++++---
>  fs/xfs/libxfs/xfs_da_btree.h  |  4 +-
>  fs/xfs/xfs_attr_item.c        |  8 ++-
>  fs/xfs/xfs_trace.h            |  7 +--
>  6 files changed, 137 insertions(+), 72 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 344497e37813..2f6b9bfd7768 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -69,9 +69,12 @@ int
>  xfs_inode_hasattr(
>  	struct xfs_inode	*ip)
>  {
> -	if (!XFS_IFORK_Q(ip) ||
> -	    (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
> -	     ip->i_afp->if_nextents == 0))
> +	if (!XFS_IFORK_Q(ip))
> +		return 0;
> +	if (!ip->i_afp)
> +		return 0;

Thank you.

> +	if (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
> +	    ip->i_afp->if_nextents == 0)
>  		return 0;
>  	return 1;
>  }
> @@ -409,23 +412,30 @@ xfs_attr_sf_addname(
>  }
>  
>  /*
> - * When we bump the state to REPLACE, we may actually need to skip over the
> - * state. When LARP mode is enabled, we don't need to run the atomic flags flip,
> - * so we skip straight over the REPLACE state and go on to REMOVE_OLD.
> + * Handle the state change on completion of a multi-state attr operation.
> + *
> + * If the XFS_DA_OP_REPLACE flag is set, this means the operation was the first
> + * modification in a attr replace operation and we still have to do the second
> + * state, indicated by @replace_state.
> + *
> + * We consume the XFS_DA_OP_REPLACE flag so that when we are called again on
> + * completion of the second half of the attr replace operation we correctly
> + * signal that it is done.
>   */
> -static void
> -xfs_attr_dela_state_set_replace(
> +static enum xfs_delattr_state
> +xfs_attr_complete_op(
>  	struct xfs_attr_item	*attr,
> -	enum xfs_delattr_state	replace)
> +	enum xfs_delattr_state	replace_state)
>  {
>  	struct xfs_da_args	*args = attr->xattri_da_args;
> +	bool			do_replace = args->op_flags & XFS_DA_OP_REPLACE;
>  
> -	ASSERT(replace == XFS_DAS_LEAF_REPLACE ||
> -			replace == XFS_DAS_NODE_REPLACE);
> -
> -	attr->xattri_dela_state = replace;
> -	if (xfs_has_larp(args->dp->i_mount))
> -		attr->xattri_dela_state++;
> +	args->op_flags &= ~XFS_DA_OP_REPLACE;
> +	if (do_replace) {
> +		args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
> +		return replace_state;
> +	}
> +	return XFS_DAS_DONE;
>  }
>  
>  static int
> @@ -467,10 +477,9 @@ xfs_attr_leaf_addname(
>  	 */
>  	if (args->rmtblkno)
>  		attr->xattri_dela_state = XFS_DAS_LEAF_SET_RMT;
> -	else if (args->op_flags & XFS_DA_OP_REPLACE)
> -		xfs_attr_dela_state_set_replace(attr, XFS_DAS_LEAF_REPLACE);
>  	else
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> +		attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +							XFS_DAS_LEAF_REPLACE);
>  out:
>  	trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp);
>  	return error;
> @@ -512,10 +521,9 @@ xfs_attr_node_addname(
>  
>  	if (args->rmtblkno)
>  		attr->xattri_dela_state = XFS_DAS_NODE_SET_RMT;
> -	else if (args->op_flags & XFS_DA_OP_REPLACE)
> -		xfs_attr_dela_state_set_replace(attr, XFS_DAS_NODE_REPLACE);
>  	else
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> +		attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +							XFS_DAS_NODE_REPLACE);
>  out:
>  	trace_xfs_attr_node_addname_return(attr->xattri_dela_state, args->dp);
>  	return error;
> @@ -547,18 +555,15 @@ xfs_attr_rmtval_alloc(
>  	if (error)
>  		return error;
>  
> -	/* If this is not a rename, clear the incomplete flag and we're done. */
> -	if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
> +	attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +						++attr->xattri_dela_state);
> +	/*
> +	 * If we are not doing a rename, we've finished the operation but still
> +	 * have to clear the incomplete flag protecting the new attr from
> +	 * exposing partially initialised state if we crash during creation.
> +	 */
> +	if (attr->xattri_dela_state == XFS_DAS_DONE)
>  		error = xfs_attr3_leaf_clearflag(args);
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> -	} else {
> -		/*
> -		 * We are running a REPLACE operation, so we need to bump the
> -		 * state to the step in that operation.
> -		 */
> -		attr->xattri_dela_state++;
> -		xfs_attr_dela_state_set_replace(attr, attr->xattri_dela_state);
> -	}
>  out:
>  	trace_xfs_attr_rmtval_alloc(attr->xattri_dela_state, args->dp);
>  	return error;
> @@ -715,13 +720,24 @@ xfs_attr_set_iter(
>  		return xfs_attr_node_addname(attr);
>  
>  	case XFS_DAS_SF_REMOVE:
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> -		return xfs_attr_sf_removename(args);
> +		error = xfs_attr_sf_removename(args);
> +		attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +						xfs_attr_init_add_state(args));
> +		break;
>  	case XFS_DAS_LEAF_REMOVE:
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> -		return xfs_attr_leaf_removename(args);
> +		error = xfs_attr_leaf_removename(args);
> +		attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +						xfs_attr_init_add_state(args));
> +		break;
>  	case XFS_DAS_NODE_REMOVE:
>  		error = xfs_attr_node_removename_setup(attr);
> +		if (error == -ENOATTR &&
> +		    (args->op_flags & XFS_DA_OP_RECOVERY)) {
> +			attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +						xfs_attr_init_add_state(args));
> +			error = 0;
> +			break;
> +		}
>  		if (error)
>  			return error;
>  		attr->xattri_dela_state = XFS_DAS_NODE_REMOVE_RMT;
> @@ -807,14 +823,16 @@ xfs_attr_set_iter(
>  
>  	case XFS_DAS_LEAF_REMOVE_ATTR:
>  		error = xfs_attr_leaf_remove_attr(attr);
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> +		attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +						xfs_attr_init_add_state(args));
>  		break;
>  
>  	case XFS_DAS_NODE_REMOVE_ATTR:
>  		error = xfs_attr_node_remove_attr(attr);
>  		if (!error)
>  			error = xfs_attr_leaf_shrink(args);
> -		attr->xattri_dela_state = XFS_DAS_DONE;
> +		attr->xattri_dela_state = xfs_attr_complete_op(attr,
> +						xfs_attr_init_add_state(args));
>  		break;
>  	default:
>  		ASSERT(0);
> @@ -1316,9 +1334,10 @@ xfs_attr_leaf_removename(
>  	dp = args->dp;
>  
>  	error = xfs_attr_leaf_hasname(args, &bp);
> -
>  	if (error == -ENOATTR) {
>  		xfs_trans_brelse(args->trans, bp);
> +		if (args->op_flags & XFS_DA_OP_RECOVERY)
> +			return 0;
>  		return error;
>  	} else if (error != -EEXIST)
>  		return error;
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index e93efc8b11cd..7467d31cb3f1 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -444,18 +444,23 @@ struct xfs_attr_list_context {
>   */
>  enum xfs_delattr_state {
>  	XFS_DAS_UNINIT		= 0,	/* No state has been set yet */
> -	XFS_DAS_SF_ADD,			/* Initial shortform set iter state */
> -	XFS_DAS_LEAF_ADD,		/* Initial leaf form set iter state */
> -	XFS_DAS_NODE_ADD,		/* Initial node form set iter state */
> -	XFS_DAS_RMTBLK,			/* Removing remote blks */
> -	XFS_DAS_RM_NAME,		/* Remove attr name */
> -	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
> -
> -	XFS_DAS_SF_REMOVE,		/* Initial shortform set iter state */
> -	XFS_DAS_LEAF_REMOVE,		/* Initial leaf form set iter state */
> -	XFS_DAS_NODE_REMOVE,		/* Initial node form set iter state */
> -
> -	/* Leaf state set/replace sequence */
> +
> +	/*
> +	 * Initial sequence states. The replace setup code relies on the
> +	 * ADD and REMOVE states for a specific format to be sequential so
> +	 * that we can transform the initial operation to be performed
> +	 * according to the xfs_has_larp() state easily.

Aha, there's that comment I was asking about back in patch 7.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +	 */
> +	XFS_DAS_SF_ADD,			/* Initial sf add state */
> +	XFS_DAS_SF_REMOVE,		/* Initial sf replace/remove state */
> +
> +	XFS_DAS_LEAF_ADD,		/* Initial leaf add state */
> +	XFS_DAS_LEAF_REMOVE,		/* Initial leaf replace/remove state */
> +
> +	XFS_DAS_NODE_ADD,		/* Initial node add state */
> +	XFS_DAS_NODE_REMOVE,		/* Initial node replace/remove state */
> +
> +	/* Leaf state set/replace/remove sequence */
>  	XFS_DAS_LEAF_SET_RMT,		/* set a remote xattr from a leaf */
>  	XFS_DAS_LEAF_ALLOC_RMT,		/* We are allocating remote blocks */
>  	XFS_DAS_LEAF_REPLACE,		/* Perform replace ops on a leaf */
> @@ -463,7 +468,7 @@ enum xfs_delattr_state {
>  	XFS_DAS_LEAF_REMOVE_RMT,	/* A rename is removing remote blocks */
>  	XFS_DAS_LEAF_REMOVE_ATTR,	/* Remove the old attr from a leaf */
>  
> -	/* Node state set/replace sequence, must match leaf state above */
> +	/* Node state sequence, must match leaf state above */
>  	XFS_DAS_NODE_SET_RMT,		/* set a remote xattr from a node */
>  	XFS_DAS_NODE_ALLOC_RMT,		/* We are allocating remote blocks */
>  	XFS_DAS_NODE_REPLACE,		/* Perform replace ops on a node */
> @@ -477,11 +482,11 @@ enum xfs_delattr_state {
>  #define XFS_DAS_STRINGS	\
>  	{ XFS_DAS_UNINIT,		"XFS_DAS_UNINIT" }, \
>  	{ XFS_DAS_SF_ADD,		"XFS_DAS_SF_ADD" }, \
> +	{ XFS_DAS_SF_REMOVE,		"XFS_DAS_SF_REMOVE" }, \
>  	{ XFS_DAS_LEAF_ADD,		"XFS_DAS_LEAF_ADD" }, \
> +	{ XFS_DAS_LEAF_REMOVE,		"XFS_DAS_LEAF_REMOVE" }, \
>  	{ XFS_DAS_NODE_ADD,		"XFS_DAS_NODE_ADD" }, \
> -	{ XFS_DAS_RMTBLK,		"XFS_DAS_RMTBLK" }, \
> -	{ XFS_DAS_RM_NAME,		"XFS_DAS_RM_NAME" }, \
> -	{ XFS_DAS_RM_SHRINK,		"XFS_DAS_RM_SHRINK" }, \
> +	{ XFS_DAS_NODE_REMOVE,		"XFS_DAS_NODE_REMOVE" }, \
>  	{ XFS_DAS_LEAF_SET_RMT,		"XFS_DAS_LEAF_SET_RMT" }, \
>  	{ XFS_DAS_LEAF_ALLOC_RMT,	"XFS_DAS_LEAF_ALLOC_RMT" }, \
>  	{ XFS_DAS_LEAF_REPLACE,		"XFS_DAS_LEAF_REPLACE" }, \
> @@ -525,8 +530,7 @@ struct xfs_attr_item {
>  	enum xfs_delattr_state		xattri_dela_state;
>  
>  	/*
> -	 * Indicates if the attr operation is a set or a remove
> -	 * XFS_ATTR_OP_FLAGS_{SET,REMOVE}
> +	 * Attr operation being performed - XFS_ATTR_OP_FLAGS_*
>  	 */
>  	unsigned int			xattri_op_flags;
>  
> @@ -605,10 +609,19 @@ xfs_attr_init_remove_state(struct xfs_da_args *args)
>  	return XFS_DAS_NODE_REMOVE;
>  }
>  
> +/*
> + * If we are logging the attributes, then we have to start with removal of the
> + * old attribute so that there is always consistent state that we can recover
> + * from if the system goes down part way through. We always log the new attr
> + * value, so even when we remove the attr first we still have the information in
> + * the log to finish the replace operation atomically.
> + */
>  static inline enum xfs_delattr_state
>  xfs_attr_init_replace_state(struct xfs_da_args *args)
>  {
>  	args->op_flags |= XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE;
> +	if (xfs_has_larp(args->dp->i_mount))
> +		return xfs_attr_init_remove_state(args);
>  	return xfs_attr_init_add_state(args);
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index 53d02ce9ed78..d15e92858bf0 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -446,6 +446,14 @@ xfs_attr3_leaf_read(
>   * Namespace helper routines
>   *========================================================================*/
>  
> +/*
> + * If we are in log recovery, then we want the lookup to ignore the INCOMPLETE
> + * flag on disk - if there's an incomplete attr then recovery needs to tear it
> + * down. If there's no incomplete attr, then recovery needs to tear that attr
> + * down to replace it with the attr that has been logged. In this case, the
> + * INCOMPLETE flag will not be set in attr->attr_filter, but rather
> + * XFS_DA_OP_RECOVERY will be set in args->op_flags.
> + */
>  static bool
>  xfs_attr_match(
>  	struct xfs_da_args	*args,
> @@ -453,14 +461,18 @@ xfs_attr_match(
>  	unsigned char		*name,
>  	int			flags)
>  {
> +
>  	if (args->namelen != namelen)
>  		return false;
>  	if (memcmp(args->name, name, namelen) != 0)
>  		return false;
> -	/*
> -	 * If we are looking for incomplete entries, show only those, else only
> -	 * show complete entries.
> -	 */
> +
> +	/* Recovery ignores the INCOMPLETE flag. */
> +	if ((args->op_flags & XFS_DA_OP_RECOVERY) &&
> +	    args->attr_filter == (flags & XFS_ATTR_NSP_ONDISK_MASK))
> +		return true;
> +
> +	/* All remaining matches need to be filtered by INCOMPLETE state. */
>  	if (args->attr_filter !=
>  	    (flags & (XFS_ATTR_NSP_ONDISK_MASK | XFS_ATTR_INCOMPLETE)))
>  		return false;
> @@ -799,6 +811,14 @@ xfs_attr_sf_removename(
>  	sf = (struct xfs_attr_shortform *)dp->i_afp->if_u1.if_data;
>  
>  	error = xfs_attr_sf_findname(args, &sfe, &base);
> +
> +	/*
> +	 * If we are recovering an operation, finding nothing to
> +	 * remove is not an error - it just means there was nothing
> +	 * to clean up.
> +	 */
> +	if (error == -ENOATTR && (args->op_flags & XFS_DA_OP_RECOVERY))
> +		return 0;
>  	if (error != -EEXIST)
>  		return error;
>  	size = xfs_attr_sf_entsize(sfe);
> @@ -819,7 +839,7 @@ xfs_attr_sf_removename(
>  	totsize -= size;
>  	if (totsize == sizeof(xfs_attr_sf_hdr_t) && xfs_has_attr2(mp) &&
>  	    (dp->i_df.if_format != XFS_DINODE_FMT_BTREE) &&
> -	    !(args->op_flags & XFS_DA_OP_ADDNAME)) {
> +	    !(args->op_flags & (XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE))) {
>  		xfs_attr_fork_remove(dp, args->trans);
>  	} else {
>  		xfs_idata_realloc(dp, -size, XFS_ATTR_FORK);
> @@ -1128,9 +1148,17 @@ xfs_attr3_leaf_to_shortform(
>  		goto out;
>  
>  	if (forkoff == -1) {
> -		ASSERT(xfs_has_attr2(dp->i_mount));
> -		ASSERT(dp->i_df.if_format != XFS_DINODE_FMT_BTREE);
> -		xfs_attr_fork_remove(dp, args->trans);
> +		/*
> +		 * Don't remove the attr fork if this operation is the first
> +		 * part of a attr replace operations. We're going to add a new
> +		 * attr immediately, so we need to keep the attr fork around in
> +		 * this case.
> +		 */
> +		if (!(args->op_flags & XFS_DA_OP_REPLACE)) {
> +			ASSERT(xfs_has_attr2(dp->i_mount));
> +			ASSERT(dp->i_df.if_format != XFS_DINODE_FMT_BTREE);
> +			xfs_attr_fork_remove(dp, args->trans);
> +		}
>  		goto out;
>  	}
>  
> diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
> index 468ca70cd35d..ed2303e4d46a 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.h
> +++ b/fs/xfs/libxfs/xfs_da_btree.h
> @@ -91,6 +91,7 @@ typedef struct xfs_da_args {
>  #define XFS_DA_OP_CILOOKUP	(1u << 4) /* lookup returns CI name if found */
>  #define XFS_DA_OP_NOTIME	(1u << 5) /* don't update inode timestamps */
>  #define XFS_DA_OP_REMOVE	(1u << 6) /* this is a remove operation */
> +#define XFS_DA_OP_RECOVERY	(1u << 7) /* Log recovery operation */
>  
>  #define XFS_DA_OP_FLAGS \
>  	{ XFS_DA_OP_JUSTCHECK,	"JUSTCHECK" }, \
> @@ -99,7 +100,8 @@ typedef struct xfs_da_args {
>  	{ XFS_DA_OP_OKNOENT,	"OKNOENT" }, \
>  	{ XFS_DA_OP_CILOOKUP,	"CILOOKUP" }, \
>  	{ XFS_DA_OP_NOTIME,	"NOTIME" }, \
> -	{ XFS_DA_OP_REMOVE,	"REMOVE" }
> +	{ XFS_DA_OP_REMOVE,	"REMOVE" }, \
> +	{ XFS_DA_OP_RECOVERY,	"RECOVERY" }
>  
>  /*
>   * Storage for holding state during Btree searches and split/join ops.
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index fb9549e7ea96..50ad3aa891ee 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -554,6 +554,7 @@ xfs_attri_item_recover(
>  	args->namelen = attrp->alfi_name_len;
>  	args->hashval = xfs_da_hashname(args->name, args->namelen);
>  	args->attr_filter = attrp->alfi_attr_flags;
> +	args->op_flags = XFS_DA_OP_RECOVERY | XFS_DA_OP_OKNOENT;
>  
>  	switch (attrp->alfi_op_flags & XFS_ATTR_OP_FLAGS_TYPE_MASK) {
>  	case XFS_ATTR_OP_FLAGS_SET:
> @@ -561,9 +562,14 @@ xfs_attri_item_recover(
>  		args->value = attrip->attri_value;
>  		args->valuelen = attrp->alfi_value_len;
>  		args->total = xfs_attr_calc_size(args, &local);
> -		attr->xattri_dela_state = xfs_attr_init_add_state(args);
> +		if (xfs_inode_hasattr(args->dp))
> +			attr->xattri_dela_state = xfs_attr_init_replace_state(args);
> +		else
> +			attr->xattri_dela_state = xfs_attr_init_add_state(args);
>  		break;
>  	case XFS_ATTR_OP_FLAGS_REMOVE:
> +		if (!xfs_inode_hasattr(args->dp))
> +			goto out;
>  		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
>  		break;
>  	default:
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 01b047d86cd1..d32026585c1b 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -4131,13 +4131,10 @@ DEFINE_ICLOG_EVENT(xlog_iclog_write);
>  
>  TRACE_DEFINE_ENUM(XFS_DAS_UNINIT);
>  TRACE_DEFINE_ENUM(XFS_DAS_SF_ADD);
> -TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ADD);
> -TRACE_DEFINE_ENUM(XFS_DAS_NODE_ADD);
> -TRACE_DEFINE_ENUM(XFS_DAS_RMTBLK);
> -TRACE_DEFINE_ENUM(XFS_DAS_RM_NAME);
> -TRACE_DEFINE_ENUM(XFS_DAS_RM_SHRINK);
>  TRACE_DEFINE_ENUM(XFS_DAS_SF_REMOVE);
> +TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ADD);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_REMOVE);
> +TRACE_DEFINE_ENUM(XFS_DAS_NODE_ADD);
>  TRACE_DEFINE_ENUM(XFS_DAS_NODE_REMOVE);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_SET_RMT);
>  TRACE_DEFINE_ENUM(XFS_DAS_LEAF_ALLOC_RMT);
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 18/18] xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify
  2022-05-09  0:41 ` [PATCH 18/18] xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify Dave Chinner
  2022-05-10 22:31   ` Alli
@ 2022-05-10 23:54   ` Darrick J. Wong
  1 sibling, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, May 09, 2022 at 10:41:38AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_repair flags these as a corruption error, so the verifier should
> catch software bugs that result in empty leaf blocks being written
> to disk, too.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr_leaf.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index d15e92858bf0..15a990409463 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -310,6 +310,15 @@ xfs_attr3_leaf_verify(
>  	if (fa)
>  		return fa;
>  
> +	/*
> +	 * Empty leaf blocks should never occur;  they imply the existence of a
> +	 * software bug that needs fixing. xfs_repair also flags them as a
> +	 * corruption that needs fixing, so we should never let these go to
> +	 * disk.
> +	 */
> +	if (ichdr.count == 0)
> +		return __this_address;
> +
>  	/*
>  	 * firstused is the block offset of the first name info structure.
>  	 * Make sure it doesn't go off the block or crash into the header.
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 19/18] xfs: can't use kmem_zalloc() for attribute buffers
  2022-05-10 22:27 ` [PATCH 19/18] xfs: can't use kmem_zalloc() for attribute buffers Dave Chinner
@ 2022-05-10 23:59   ` Darrick J. Wong
  2022-05-11  0:54     ` Dave Chinner
  2022-05-11  0:54   ` Alli
  1 sibling, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-10 23:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, May 11, 2022 at 08:27:16AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Because when running fsmark workloads with 64kB xattrs, heap
> allocation of >64kB buffers for the attr item name/value buffer
> will fail and deadlock.
> 
> ....
>  XFS: fs_mark(8414) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8417) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8409) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8428) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8430) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8437) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8433) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8406) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8412) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8432) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8424) possible memory allocation deadlock size 65768 in kmem_alloc (mode:0x2d40)
> ....
> 
> I'd use kvmalloc(), but if we are doing 15,000 64kB xattr creates a
> second, the attempt to use kmalloc() in kvmalloc() results in a huge
> amount of direct reclaim work that is guaranteed to fail occurs
> before it falls back to vmalloc:
> 
> - 48.19% xfs_attr_create_intent
>   - 46.89% xfs_attri_init
>      - kvmalloc_node
> 	- 46.04% __kmalloc_node
> 	   - kmalloc_large_node
> 	      - 45.99% __alloc_pages
> 		 - 39.39% __alloc_pages_slowpath.constprop.0
> 		    - 38.89% __alloc_pages_direct_compact
> 		       - 38.71% try_to_compact_pages
> 			  - compact_zone_order
> 			  - compact_zone
> 			     - 21.09% isolate_migratepages_block
> 				  10.31% PageHuge
> 				  5.82% set_pfnblock_flags_mask
> 				  0.86% get_pfnblock_flags_mask
> 			     - 4.48% __reset_isolation_suitable
> 				  4.44% __reset_isolation_pfn
> 			     - 3.56% __pageblock_pfn_to_page
> 				  1.33% pfn_to_online_page
> 			       2.83% get_pfnblock_flags_mask
> 			     - 0.87% migrate_pages
> 				  0.86% compaction_alloc
> 			       0.84% find_suitable_fallback
> 		 - 6.60% get_page_from_freelist
> 		      4.99% clear_page_erms
> 		    - 1.19% _raw_spin_lock_irqsave
> 		       - do_raw_spin_lock
> 			    __pv_queued_spin_lock_slowpath
> 	- 0.86% __vmalloc_node_range
> 	     0.65% __alloc_pages_bulk
> 
> So lift xlog_cil_kvmalloc(), rename it to xlog_kvmalloc() and use
> that instead because it has sane fail-fast behaviour for the
> embedded kmalloc attempt. It also provides __GFP_NOFAIL guarantees
> that kvmalloc() won't do, either....
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_attr_item.c | 35 +++++++++++++++--------------------
>  fs/xfs/xfs_log_cil.c   | 35 +----------------------------------
>  fs/xfs/xfs_log_priv.h  | 34 ++++++++++++++++++++++++++++++++++
>  3 files changed, 50 insertions(+), 54 deletions(-)
> 
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 56f678c965b7..e8ac88d9fd14 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -44,7 +44,7 @@ xfs_attri_item_free(
>  	struct xfs_attri_log_item	*attrip)
>  {
>  	kmem_free(attrip->attri_item.li_lv_shadow);
> -	kmem_free(attrip);
> +	kvfree(attrip);
>  }
>  
>  /*
> @@ -119,11 +119,11 @@ xfs_attri_item_format(
>  			sizeof(struct xfs_attri_log_format));
>  	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
>  			attrip->attri_name,
> -			xlog_calc_iovec_len(attrip->attri_name_len));
> +			attrip->attri_name_len);

Are we fixing these because the xlog_{copy,finish}_iovec functions do
the rounding themselves now?

>  	if (attrip->attri_value_len > 0)
>  		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
>  				attrip->attri_value,
> -				xlog_calc_iovec_len(attrip->attri_value_len));
> +				attrip->attri_value_len);
>  }
>  
>  /*
> @@ -163,26 +163,21 @@ xfs_attri_init(
>  
>  {
>  	struct xfs_attri_log_item	*attrip;
> -	uint32_t			name_vec_len = 0;
> -	uint32_t			value_vec_len = 0;
> -	uint32_t			buffer_size;
> -
> -	if (name_len)
> -		name_vec_len = xlog_calc_iovec_len(name_len);
> -	if (value_len)
> -		value_vec_len = xlog_calc_iovec_len(value_len);

...and we don't need to bloat up the internal structures anymore either,
right?

> -
> -	buffer_size = name_vec_len + value_vec_len;
> +	uint32_t			buffer_size = name_len + value_len;
>  
>  	if (buffer_size) {
> -		attrip = kmem_zalloc(sizeof(struct xfs_attri_log_item) +
> -				    buffer_size, KM_NOFS);
> -		if (attrip == NULL)
> -			return NULL;
> +		/*
> +		 * This could be over 64kB in length, so we have to use
> +		 * kvmalloc() for this. But kvmalloc() utterly sucks, so we
> +		 * use own version.
> +		 */
> +		attrip = xlog_kvmalloc(sizeof(struct xfs_attri_log_item) +
> +					buffer_size);
>  	} else {
> -		attrip = kmem_cache_zalloc(xfs_attri_cache,
> -					  GFP_NOFS | __GFP_NOFAIL);
> +		attrip = kmem_cache_alloc(xfs_attri_cache,
> +					GFP_NOFS | __GFP_NOFAIL);
>  	}
> +	memset(attrip, 0, sizeof(struct xfs_attri_log_item));

I wonder if this memset should be right after the xlog_kvmalloc and
leave the kmem_cache_zalloc alone?

Looks solid otherwise.

--D

>  
>  	attrip->attri_name_len = name_len;
>  	if (name_len)
> @@ -195,7 +190,7 @@ xfs_attri_init(
>  	if (value_len)
>  		attrip->attri_value = ((char *)attrip) +
>  				sizeof(struct xfs_attri_log_item) +
> -				name_vec_len;
> +				name_len;
>  	else
>  		attrip->attri_value = NULL;
>  
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 42ace9b091d8..b4023693b89f 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -219,39 +219,6 @@ xlog_cil_iovec_space(
>  			sizeof(uint64_t));
>  }
>  
> -/*
> - * shadow buffers can be large, so we need to use kvmalloc() here to ensure
> - * success. Unfortunately, kvmalloc() only allows GFP_KERNEL contexts to fall
> - * back to vmalloc, so we can't actually do anything useful with gfp flags to
> - * control the kmalloc() behaviour within kvmalloc(). Hence kmalloc() will do
> - * direct reclaim and compaction in the slow path, both of which are
> - * horrendously expensive. We just want kmalloc to fail fast and fall back to
> - * vmalloc if it can't get somethign straight away from the free lists or buddy
> - * allocator. Hence we have to open code kvmalloc outselves here.
> - *
> - * Also, we are in memalloc_nofs_save task context here, so despite the use of
> - * GFP_KERNEL here, we are actually going to be doing GFP_NOFS allocations. This
> - * is actually the only way to make vmalloc() do GFP_NOFS allocations, so lets
> - * just all pretend this is a GFP_KERNEL context operation....
> - */
> -static inline void *
> -xlog_cil_kvmalloc(
> -	size_t		buf_size)
> -{
> -	gfp_t		flags = GFP_KERNEL;
> -	void		*p;
> -
> -	flags &= ~__GFP_DIRECT_RECLAIM;
> -	flags |= __GFP_NOWARN | __GFP_NORETRY;
> -	do {
> -		p = kmalloc(buf_size, flags);
> -		if (!p)
> -			p = vmalloc(buf_size);
> -	} while (!p);
> -
> -	return p;
> -}
> -
>  /*
>   * Allocate or pin log vector buffers for CIL insertion.
>   *
> @@ -368,7 +335,7 @@ xlog_cil_alloc_shadow_bufs(
>  			 * storage.
>  			 */
>  			kmem_free(lip->li_lv_shadow);
> -			lv = xlog_cil_kvmalloc(buf_size);
> +			lv = xlog_kvmalloc(buf_size);
>  
>  			memset(lv, 0, xlog_cil_iovec_space(niovecs));
>  
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 4aa95b68450a..46f989641eda 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -679,4 +679,38 @@ xlog_valid_lsn(
>   */
>  void xlog_cil_pcp_dead(struct xlog *log, unsigned int cpu);
>  
> +/*
> + * Log vector and shadow buffers can be large, so we need to use kvmalloc() here
> + * to ensure success. Unfortunately, kvmalloc() only allows GFP_KERNEL contexts
> + * to fall back to vmalloc, so we can't actually do anything useful with gfp
> + * flags to control the kmalloc() behaviour within kvmalloc(). Hence kmalloc()
> + * will do direct reclaim and compaction in the slow path, both of which are
> + * horrendously expensive. We just want kmalloc to fail fast and fall back to
> + * vmalloc if it can't get somethign straight away from the free lists or
> + * buddy allocator. Hence we have to open code kvmalloc outselves here.
> + *
> + * This assumes that the caller uses memalloc_nofs_save task context here, so
> + * despite the use of GFP_KERNEL here, we are going to be doing GFP_NOFS
> + * allocations. This is actually the only way to make vmalloc() do GFP_NOFS
> + * allocations, so lets just all pretend this is a GFP_KERNEL context
> + * operation....
> + */
> +static inline void *
> +xlog_kvmalloc(
> +	size_t		buf_size)
> +{
> +	gfp_t		flags = GFP_KERNEL;
> +	void		*p;
> +
> +	flags &= ~__GFP_DIRECT_RECLAIM;
> +	flags |= __GFP_NOWARN | __GFP_NORETRY;
> +	do {
> +		p = kmalloc(buf_size, flags);
> +		if (!p)
> +			p = vmalloc(buf_size);
> +	} while (!p);
> +
> +	return p;
> +}
> +
>  #endif	/* __XFS_LOG_PRIV_H__ */
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 19/18] xfs: can't use kmem_zalloc() for attribute buffers
  2022-05-10 22:27 ` [PATCH 19/18] xfs: can't use kmem_zalloc() for attribute buffers Dave Chinner
  2022-05-10 23:59   ` Darrick J. Wong
@ 2022-05-11  0:54   ` Alli
  1 sibling, 0 replies; 56+ messages in thread
From: Alli @ 2022-05-11  0:54 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Wed, 2022-05-11 at 08:27 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Because when running fsmark workloads with 64kB xattrs, heap
> allocation of >64kB buffers for the attr item name/value buffer
> will fail and deadlock.
> 
> ....
>  XFS: fs_mark(8414) possible memory allocation deadlock size 65768 in
> kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8417) possible memory allocation deadlock size 65768 in
> kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8409) possible memory allocation deadlock size 65768 in
> kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8428) possible memory allocation deadlock size 65768 in
> kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8430) possible memory allocation deadlock size 65768 in
> kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8437) possible memory allocation deadlock size 65768 in
> kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8433) possible memory allocation deadlock size 65768 in
> kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8406) possible memory allocation deadlock size 65768 in
> kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8412) possible memory allocation deadlock size 65768 in
> kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8432) possible memory allocation deadlock size 65768 in
> kmem_alloc (mode:0x2d40)
>  XFS: fs_mark(8424) possible memory allocation deadlock size 65768 in
> kmem_alloc (mode:0x2d40)
> ....
> 
> I'd use kvmalloc(), but if we are doing 15,000 64kB xattr creates a
> second, the attempt to use kmalloc() in kvmalloc() results in a huge
> amount of direct reclaim work that is guaranteed to fail occurs
> before it falls back to vmalloc:
> 
> - 48.19% xfs_attr_create_intent
>   - 46.89% xfs_attri_init
>      - kvmalloc_node
> 	- 46.04% __kmalloc_node
> 	   - kmalloc_large_node
> 	      - 45.99% __alloc_pages
> 		 - 39.39% __alloc_pages_slowpath.constprop.0
> 		    - 38.89% __alloc_pages_direct_compact
> 		       - 38.71% try_to_compact_pages
> 			  - compact_zone_order
> 			  - compact_zone
> 			     - 21.09% isolate_migratepages_block
> 				  10.31% PageHuge
> 				  5.82% set_pfnblock_flags_mask
> 				  0.86% get_pfnblock_flags_mask
> 			     - 4.48% __reset_isolation_suitable
> 				  4.44% __reset_isolation_pfn
> 			     - 3.56% __pageblock_pfn_to_page
> 				  1.33% pfn_to_online_page
> 			       2.83% get_pfnblock_flags_mask
> 			     - 0.87% migrate_pages
> 				  0.86% compaction_alloc
> 			       0.84% find_suitable_fallback
> 		 - 6.60% get_page_from_freelist
> 		      4.99% clear_page_erms
> 		    - 1.19% _raw_spin_lock_irqsave
> 		       - do_raw_spin_lock
> 			    __pv_queued_spin_lock_slowpath
> 	- 0.86% __vmalloc_node_range
> 	     0.65% __alloc_pages_bulk
> 
> So lift xlog_cil_kvmalloc(), rename it to xlog_kvmalloc() and use
> that instead because it has sane fail-fast behaviour for the
> embedded kmalloc attempt. It also provides __GFP_NOFAIL guarantees
> that kvmalloc() won't do, either....
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
This looks ok to me, didnt run across any test failures
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
>  fs/xfs/xfs_attr_item.c | 35 +++++++++++++++--------------------
>  fs/xfs/xfs_log_cil.c   | 35 +----------------------------------
>  fs/xfs/xfs_log_priv.h  | 34 ++++++++++++++++++++++++++++++++++
>  3 files changed, 50 insertions(+), 54 deletions(-)
> 
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 56f678c965b7..e8ac88d9fd14 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -44,7 +44,7 @@ xfs_attri_item_free(
>  	struct xfs_attri_log_item	*attrip)
>  {
>  	kmem_free(attrip->attri_item.li_lv_shadow);
> -	kmem_free(attrip);
> +	kvfree(attrip);
>  }
>  
>  /*
> @@ -119,11 +119,11 @@ xfs_attri_item_format(
>  			sizeof(struct xfs_attri_log_format));
>  	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
>  			attrip->attri_name,
> -			xlog_calc_iovec_len(attrip->attri_name_len));
> +			attrip->attri_name_len);
>  	if (attrip->attri_value_len > 0)
>  		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
>  				attrip->attri_value,
> -				xlog_calc_iovec_len(attrip-
> >attri_value_len));
> +				attrip->attri_value_len);
>  }
>  
>  /*
> @@ -163,26 +163,21 @@ xfs_attri_init(
>  
>  {
>  	struct xfs_attri_log_item	*attrip;
> -	uint32_t			name_vec_len = 0;
> -	uint32_t			value_vec_len = 0;
> -	uint32_t			buffer_size;
> -
> -	if (name_len)
> -		name_vec_len = xlog_calc_iovec_len(name_len);
> -	if (value_len)
> -		value_vec_len = xlog_calc_iovec_len(value_len);
> -
> -	buffer_size = name_vec_len + value_vec_len;
> +	uint32_t			buffer_size = name_len + value_len;
>  
>  	if (buffer_size) {
> -		attrip = kmem_zalloc(sizeof(struct xfs_attri_log_item)
> +
> -				    buffer_size, KM_NOFS);
> -		if (attrip == NULL)
> -			return NULL;
> +		/*
> +		 * This could be over 64kB in length, so we have to use
> +		 * kvmalloc() for this. But kvmalloc() utterly sucks,
> so we
> +		 * use own version.
> +		 */
> +		attrip = xlog_kvmalloc(sizeof(struct
> xfs_attri_log_item) +
> +					buffer_size);
>  	} else {
> -		attrip = kmem_cache_zalloc(xfs_attri_cache,
> -					  GFP_NOFS | __GFP_NOFAIL);
> +		attrip = kmem_cache_alloc(xfs_attri_cache,
> +					GFP_NOFS | __GFP_NOFAIL);
>  	}
> +	memset(attrip, 0, sizeof(struct xfs_attri_log_item));
>  
>  	attrip->attri_name_len = name_len;
>  	if (name_len)
> @@ -195,7 +190,7 @@ xfs_attri_init(
>  	if (value_len)
>  		attrip->attri_value = ((char *)attrip) +
>  				sizeof(struct xfs_attri_log_item) +
> -				name_vec_len;
> +				name_len;
>  	else
>  		attrip->attri_value = NULL;
>  
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 42ace9b091d8..b4023693b89f 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -219,39 +219,6 @@ xlog_cil_iovec_space(
>  			sizeof(uint64_t));
>  }
>  
> -/*
> - * shadow buffers can be large, so we need to use kvmalloc() here to
> ensure
> - * success. Unfortunately, kvmalloc() only allows GFP_KERNEL
> contexts to fall
> - * back to vmalloc, so we can't actually do anything useful with gfp
> flags to
> - * control the kmalloc() behaviour within kvmalloc(). Hence
> kmalloc() will do
> - * direct reclaim and compaction in the slow path, both of which are
> - * horrendously expensive. We just want kmalloc to fail fast and
> fall back to
> - * vmalloc if it can't get somethign straight away from the free
> lists or buddy
> - * allocator. Hence we have to open code kvmalloc outselves here.
> - *
> - * Also, we are in memalloc_nofs_save task context here, so despite
> the use of
> - * GFP_KERNEL here, we are actually going to be doing GFP_NOFS
> allocations. This
> - * is actually the only way to make vmalloc() do GFP_NOFS
> allocations, so lets
> - * just all pretend this is a GFP_KERNEL context operation....
> - */
> -static inline void *
> -xlog_cil_kvmalloc(
> -	size_t		buf_size)
> -{
> -	gfp_t		flags = GFP_KERNEL;
> -	void		*p;
> -
> -	flags &= ~__GFP_DIRECT_RECLAIM;
> -	flags |= __GFP_NOWARN | __GFP_NORETRY;
> -	do {
> -		p = kmalloc(buf_size, flags);
> -		if (!p)
> -			p = vmalloc(buf_size);
> -	} while (!p);
> -
> -	return p;
> -}
> -
>  /*
>   * Allocate or pin log vector buffers for CIL insertion.
>   *
> @@ -368,7 +335,7 @@ xlog_cil_alloc_shadow_bufs(
>  			 * storage.
>  			 */
>  			kmem_free(lip->li_lv_shadow);
> -			lv = xlog_cil_kvmalloc(buf_size);
> +			lv = xlog_kvmalloc(buf_size);
>  
>  			memset(lv, 0, xlog_cil_iovec_space(niovecs));
>  
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 4aa95b68450a..46f989641eda 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -679,4 +679,38 @@ xlog_valid_lsn(
>   */
>  void xlog_cil_pcp_dead(struct xlog *log, unsigned int cpu);
>  
> +/*
> + * Log vector and shadow buffers can be large, so we need to use
> kvmalloc() here
> + * to ensure success. Unfortunately, kvmalloc() only allows
> GFP_KERNEL contexts
> + * to fall back to vmalloc, so we can't actually do anything useful
> with gfp
> + * flags to control the kmalloc() behaviour within kvmalloc(). Hence
> kmalloc()
> + * will do direct reclaim and compaction in the slow path, both of
> which are
> + * horrendously expensive. We just want kmalloc to fail fast and
> fall back to
> + * vmalloc if it can't get somethign straight away from the free
> lists or
> + * buddy allocator. Hence we have to open code kvmalloc outselves
> here.
> + *
> + * This assumes that the caller uses memalloc_nofs_save task context
> here, so
> + * despite the use of GFP_KERNEL here, we are going to be doing
> GFP_NOFS
> + * allocations. This is actually the only way to make vmalloc() do
> GFP_NOFS
> + * allocations, so lets just all pretend this is a GFP_KERNEL
> context
> + * operation....
> + */
> +static inline void *
> +xlog_kvmalloc(
> +	size_t		buf_size)
> +{
> +	gfp_t		flags = GFP_KERNEL;
> +	void		*p;
> +
> +	flags &= ~__GFP_DIRECT_RECLAIM;
> +	flags |= __GFP_NOWARN | __GFP_NORETRY;
> +	do {
> +		p = kmalloc(buf_size, flags);
> +		if (!p)
> +			p = vmalloc(buf_size);
> +	} while (!p);
> +
> +	return p;
> +}
> +
>  #endif	/* __XFS_LOG_PRIV_H__ */


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 19/18] xfs: can't use kmem_zalloc() for attribute buffers
  2022-05-10 23:59   ` Darrick J. Wong
@ 2022-05-11  0:54     ` Dave Chinner
  2022-05-11  1:10       ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-11  0:54 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, May 10, 2022 at 04:59:31PM -0700, Darrick J. Wong wrote:
> On Wed, May 11, 2022 at 08:27:16AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Because when running fsmark workloads with 64kB xattrs, heap
> > allocation of >64kB buffers for the attr item name/value buffer
> > will fail and deadlock.
.....
> > @@ -119,11 +119,11 @@ xfs_attri_item_format(
> >  			sizeof(struct xfs_attri_log_format));
> >  	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
> >  			attrip->attri_name,
> > -			xlog_calc_iovec_len(attrip->attri_name_len));
> > +			attrip->attri_name_len);
> 
> Are we fixing these because the xlog_{copy,finish}_iovec functions do
> the rounding themselves now?

Yes, I forgot that I cleaned this up here when I noticed it.
Probably should mention it in the commit log:

"We also clean up the attribute name and value lengths as they no
longer need to be rounded out to sizes compatible with log vectors."

> 
> >  	if (attrip->attri_value_len > 0)
> >  		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
> >  				attrip->attri_value,
> > -				xlog_calc_iovec_len(attrip->attri_value_len));
> > +				attrip->attri_value_len);
> >  }
> >  
> >  /*
> > @@ -163,26 +163,21 @@ xfs_attri_init(
> >  
> >  {
> >  	struct xfs_attri_log_item	*attrip;
> > -	uint32_t			name_vec_len = 0;
> > -	uint32_t			value_vec_len = 0;
> > -	uint32_t			buffer_size;
> > -
> > -	if (name_len)
> > -		name_vec_len = xlog_calc_iovec_len(name_len);
> > -	if (value_len)
> > -		value_vec_len = xlog_calc_iovec_len(value_len);
> 
> ...and we don't need to bloat up the internal structures anymore either,
> right?

Yup, because we only copy out the exact length of the name/val now.

> 
> > -
> > -	buffer_size = name_vec_len + value_vec_len;
> > +	uint32_t			buffer_size = name_len + value_len;
> >  
> >  	if (buffer_size) {
> > -		attrip = kmem_zalloc(sizeof(struct xfs_attri_log_item) +
> > -				    buffer_size, KM_NOFS);
> > -		if (attrip == NULL)
> > -			return NULL;
> > +		/*
> > +		 * This could be over 64kB in length, so we have to use
> > +		 * kvmalloc() for this. But kvmalloc() utterly sucks, so we
> > +		 * use own version.
> > +		 */
> > +		attrip = xlog_kvmalloc(sizeof(struct xfs_attri_log_item) +
> > +					buffer_size);
> >  	} else {
> > -		attrip = kmem_cache_zalloc(xfs_attri_cache,
> > -					  GFP_NOFS | __GFP_NOFAIL);
> > +		attrip = kmem_cache_alloc(xfs_attri_cache,
> > +					GFP_NOFS | __GFP_NOFAIL);
> >  	}
> > +	memset(attrip, 0, sizeof(struct xfs_attri_log_item));
> 
> I wonder if this memset should be right after the xlog_kvmalloc and
> leave the kmem_cache_zalloc alone?

Then we memset the header twice in the common small attr case, and
the xfs_attri_log_item header is not exactly what you'd call small
(224 bytes).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 04/18] xfs: rework deferred attribute operation setup
  2022-05-10 23:04   ` Darrick J. Wong
@ 2022-05-11  0:57     ` Dave Chinner
  0 siblings, 0 replies; 56+ messages in thread
From: Dave Chinner @ 2022-05-11  0:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, May 10, 2022 at 04:04:51PM -0700, Darrick J. Wong wrote:
> On Mon, May 09, 2022 at 10:41:24AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Logged attribute intents only have set and remove types - there is
> > no type of the replace operation. We should ahve a separate type for
> 
> "..no separate intent type for a replace operation."?
> 
> Also, "ahve" -> "have".
> 
> > a replace operation, as it needs to perform operations that neither
> > SET or REMOVE can perform.
> > 
> > Add this type to the intent items and rearrange the deferred
> > operation setup to reflect the different operations we are
> > performing.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Allison Henderson<allison.henderson@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_attr.c       | 165 +++++++++++++++++++--------------
> >  fs/xfs/libxfs/xfs_attr.h       |   2 -
> >  fs/xfs/libxfs/xfs_log_format.h |   1 +
> >  fs/xfs/xfs_attr_item.c         |   9 +-
> >  fs/xfs/xfs_trace.h             |   4 +
> >  5 files changed, 110 insertions(+), 71 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index a4b0b20a3bab..817e15740f9c 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -671,6 +671,81 @@ xfs_attr_lookup(
> >  	return xfs_attr_node_hasname(args, NULL);
> >  }
> >  
> > +static int
> > +xfs_attr_item_init(
> > +	struct xfs_da_args	*args,
> > +	unsigned int		op_flags,	/* op flag (set or remove) */
> > +	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
> > +{
> > +
> > +	struct xfs_attr_item	*new;
> > +
> > +	new = kmem_zalloc(sizeof(struct xfs_attr_item), KM_NOFS);
> 
> Can this fail?
> 
> > +	new->xattri_op_flags = op_flags;
> > +	new->xattri_da_args = args;
> > +
> > +	*attr = new;
> > +	return 0;
> 
> And if it can't, then there's no need for a return value, AFAICT.  I
> looked at the end of xfs-5.19-compose and there's no other return
> statements in this function.
> 
> And then you don't need any of the error handling in this patch at all,
> right?
> 
> *OH*, this is just a hoist of the stuff at the end.  Could someone add a
> patch on the end of ... whatever patchset there is to clean that up?
> 
> In the meantime, with the commit message typos cleaned up,
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks.

I'm making notes of all the things that need cleaning up. there's a
few things already, that probably won't happen until the next cycle
at this point...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 05/18] xfs: separate out initial attr_set states
  2022-05-10 23:12   ` Darrick J. Wong
@ 2022-05-11  1:06     ` Dave Chinner
  2022-05-11  1:08       ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-11  1:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, May 10, 2022 at 04:12:34PM -0700, Darrick J. Wong wrote:
> On Mon, May 09, 2022 at 10:41:25AM +1000, Dave Chinner wrote:
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index c9c867e3406c..ad52b5dc59e4 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -530,4 +553,35 @@ void xfs_attri_destroy_cache(void);
> >  int __init xfs_attrd_init_cache(void);
> >  void xfs_attrd_destroy_cache(void);
> >  
> > +/*
> > + * Check to see if the attr should be upgraded from non-existent or shortform to
> > + * single-leaf-block attribute list.
> > + */
> > +static inline bool
> > +xfs_attr_is_shortform(
> > +	struct xfs_inode    *ip)
> > +{
> > +	return ip->i_afp->if_format == XFS_DINODE_FMT_LOCAL ||
> > +	       (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
> > +		ip->i_afp->if_nextents == 0);
> > +}
> > +
> > +static inline enum xfs_delattr_state
> > +xfs_attr_init_add_state(struct xfs_da_args *args)
> > +{
> > +	if (!args->dp->i_afp)
> > +		return XFS_DAS_DONE;
> 
> If we're in add/replace attr call without an attr fork, why do we go
> straight to finished?

I suspect I've fixed all the issues that triggered crashes here
because args->dp->i_afp was null. THere were transient states in a
replace operaiton when the remove takes away the last attr, removes
the attr fork, then calls the ADD operation. The add operation
assumes that the attr fork has already been set up, and so bad
things happened here.

This also occurred when setting up recovery operations - recovery of
an add/replace could start from that same "there's no attr fork"
condition, and so calling xfs_inode_has_attr() or
xfs_attr_is_shortform() direct from the reocovery setup code would
go splat because ip->i_afp was null.

I'm going to leave this for the moment (cleanup note made) because I
don't want to have to find out that I missed a corner case somewhere
they hard way right now. It's basically there to stop log recovery
crashing hard, which only occurs when the experimental larp code is
running, so I think this is safe to leave for a later cleanup.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 05/18] xfs: separate out initial attr_set states
  2022-05-11  1:06     ` Dave Chinner
@ 2022-05-11  1:08       ` Darrick J. Wong
  2022-05-11  1:38         ` Dave Chinner
  0 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-11  1:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, May 11, 2022 at 11:06:51AM +1000, Dave Chinner wrote:
> On Tue, May 10, 2022 at 04:12:34PM -0700, Darrick J. Wong wrote:
> > On Mon, May 09, 2022 at 10:41:25AM +1000, Dave Chinner wrote:
> > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > index c9c867e3406c..ad52b5dc59e4 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > @@ -530,4 +553,35 @@ void xfs_attri_destroy_cache(void);
> > >  int __init xfs_attrd_init_cache(void);
> > >  void xfs_attrd_destroy_cache(void);
> > >  
> > > +/*
> > > + * Check to see if the attr should be upgraded from non-existent or shortform to
> > > + * single-leaf-block attribute list.
> > > + */
> > > +static inline bool
> > > +xfs_attr_is_shortform(
> > > +	struct xfs_inode    *ip)
> > > +{
> > > +	return ip->i_afp->if_format == XFS_DINODE_FMT_LOCAL ||
> > > +	       (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
> > > +		ip->i_afp->if_nextents == 0);
> > > +}
> > > +
> > > +static inline enum xfs_delattr_state
> > > +xfs_attr_init_add_state(struct xfs_da_args *args)
> > > +{
> > > +	if (!args->dp->i_afp)
> > > +		return XFS_DAS_DONE;
> > 
> > If we're in add/replace attr call without an attr fork, why do we go
> > straight to finished?
> 
> I suspect I've fixed all the issues that triggered crashes here
> because args->dp->i_afp was null. THere were transient states in a
> replace operaiton when the remove takes away the last attr, removes
> the attr fork, then calls the ADD operation. The add operation
> assumes that the attr fork has already been set up, and so bad
> things happened here.
> 
> This also occurred when setting up recovery operations - recovery of
> an add/replace could start from that same "there's no attr fork"
> condition, and so calling xfs_inode_has_attr() or
> xfs_attr_is_shortform() direct from the reocovery setup code would
> go splat because ip->i_afp was null.
> 
> I'm going to leave this for the moment (cleanup note made) because I
> don't want to have to find out that I missed a corner case somewhere
> they hard way right now. It's basically there to stop log recovery
> crashing hard, which only occurs when the experimental larp code is
> running, so I think this is safe to leave for a later cleanup.

Hmm, in that case, can this become:

	if (!args->dp->i_afp) {
		ASSERT(0);
		return XFS_DAS_DONE;
	}

And then you can add:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 07/18] xfs: consolidate leaf/node states in xfs_attr_set_iter
  2022-05-10 23:20   ` Darrick J. Wong
@ 2022-05-11  1:09     ` Dave Chinner
  0 siblings, 0 replies; 56+ messages in thread
From: Dave Chinner @ 2022-05-11  1:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, May 10, 2022 at 04:20:19PM -0700, Darrick J. Wong wrote:
> On Mon, May 09, 2022 at 10:41:27AM +1000, Dave Chinner wrote:
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index d016af4dbf81..37db61649217 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -450,16 +450,21 @@ enum xfs_delattr_state {
> >  	XFS_DAS_RMTBLK,			/* Removing remote blks */
> >  	XFS_DAS_RM_NAME,		/* Remove attr name */
> >  	XFS_DAS_RM_SHRINK,		/* We are shrinking the tree */
> > +
> > +	/* Leaf state set sequence */
> 
> I think this comment should note that the state increment operations of
> xfs_attr_set_iter requires that the exact order of the values
> FOUND_[LN]BLK through RM_[LN]BLK must be preserved exactly.

Happens later, as you've already noticed, when....

> Question: Are we supposed to be able to dela_state++ our way from
> RM_LBLK to RD_LEAF and from RM_NBLK to CLR_FLAG?

... we are finally able to dela_state++ our way right through the
leaf/node operations.

> With that comment added,
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 19/18] xfs: can't use kmem_zalloc() for attribute buffers
  2022-05-11  0:54     ` Dave Chinner
@ 2022-05-11  1:10       ` Darrick J. Wong
  0 siblings, 0 replies; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-11  1:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, May 11, 2022 at 10:54:20AM +1000, Dave Chinner wrote:
> On Tue, May 10, 2022 at 04:59:31PM -0700, Darrick J. Wong wrote:
> > On Wed, May 11, 2022 at 08:27:16AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Because when running fsmark workloads with 64kB xattrs, heap
> > > allocation of >64kB buffers for the attr item name/value buffer
> > > will fail and deadlock.
> .....
> > > @@ -119,11 +119,11 @@ xfs_attri_item_format(
> > >  			sizeof(struct xfs_attri_log_format));
> > >  	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
> > >  			attrip->attri_name,
> > > -			xlog_calc_iovec_len(attrip->attri_name_len));
> > > +			attrip->attri_name_len);
> > 
> > Are we fixing these because the xlog_{copy,finish}_iovec functions do
> > the rounding themselves now?
> 
> Yes, I forgot that I cleaned this up here when I noticed it.
> Probably should mention it in the commit log:
> 
> "We also clean up the attribute name and value lengths as they no
> longer need to be rounded out to sizes compatible with log vectors."

Ok.

> > 
> > >  	if (attrip->attri_value_len > 0)
> > >  		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
> > >  				attrip->attri_value,
> > > -				xlog_calc_iovec_len(attrip->attri_value_len));
> > > +				attrip->attri_value_len);
> > >  }
> > >  
> > >  /*
> > > @@ -163,26 +163,21 @@ xfs_attri_init(
> > >  
> > >  {
> > >  	struct xfs_attri_log_item	*attrip;
> > > -	uint32_t			name_vec_len = 0;
> > > -	uint32_t			value_vec_len = 0;
> > > -	uint32_t			buffer_size;
> > > -
> > > -	if (name_len)
> > > -		name_vec_len = xlog_calc_iovec_len(name_len);
> > > -	if (value_len)
> > > -		value_vec_len = xlog_calc_iovec_len(value_len);
> > 
> > ...and we don't need to bloat up the internal structures anymore either,
> > right?
> 
> Yup, because we only copy out the exact length of the name/val now.

<nod>

> > 
> > > -
> > > -	buffer_size = name_vec_len + value_vec_len;
> > > +	uint32_t			buffer_size = name_len + value_len;
> > >  
> > >  	if (buffer_size) {
> > > -		attrip = kmem_zalloc(sizeof(struct xfs_attri_log_item) +
> > > -				    buffer_size, KM_NOFS);
> > > -		if (attrip == NULL)
> > > -			return NULL;
> > > +		/*
> > > +		 * This could be over 64kB in length, so we have to use
> > > +		 * kvmalloc() for this. But kvmalloc() utterly sucks, so we
> > > +		 * use own version.
> > > +		 */
> > > +		attrip = xlog_kvmalloc(sizeof(struct xfs_attri_log_item) +
> > > +					buffer_size);
> > >  	} else {
> > > -		attrip = kmem_cache_zalloc(xfs_attri_cache,
> > > -					  GFP_NOFS | __GFP_NOFAIL);
> > > +		attrip = kmem_cache_alloc(xfs_attri_cache,
> > > +					GFP_NOFS | __GFP_NOFAIL);
> > >  	}
> > > +	memset(attrip, 0, sizeof(struct xfs_attri_log_item));
> > 
> > I wonder if this memset should be right after the xlog_kvmalloc and
> > leave the kmem_cache_zalloc alone?
> 
> Then we memset the header twice in the common small attr case, and
> the xfs_attri_log_item header is not exactly what you'd call small
> (224 bytes).

Eh, ok, NBD.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 17/18] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework
  2022-05-10 23:53   ` Darrick J. Wong
@ 2022-05-11  1:14     ` Dave Chinner
  0 siblings, 0 replies; 56+ messages in thread
From: Dave Chinner @ 2022-05-11  1:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, May 10, 2022 at 04:53:47PM -0700, Darrick J. Wong wrote:
> On Mon, May 09, 2022 at 10:41:37AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > We can't use the same algorithm for replacing an existing attribute
> > when logging attributes. The existing algorithm is essentially:
> > 
> > 1. create new attr w/ INCOMPLETE
> > 2. atomically flip INCOMPLETE flags between old + new attribute
> > 3. remove old attr which is marked w/ INCOMPLETE
> > 
> > This algorithm guarantees that we see either the old or new
> > attribute, and if we fail after the atomic flag flip, we don't have
> > to recover the removal of the old attr because we never see
> > INCOMPLETE attributes in lookups.
> > 
> > For logged attributes, however, this does not work. The logged
> > attribute intents do not track the work that has been done as the
> > transaction rolls, and hence the only recovery mechanism we have is
> > "run the replace operation from scratch".
> > 
> > This is further exacerbated by the attempt to avoid needing the
> > INCOMPLETE flag to create an atomic swap. This means we can create
> > a second active attribute of the same name before we remove the
> > original. If we fail at any point after the create but before the
> > removal has completed, we end up with duplicate attributes in
> > the attr btree and recovery only tries to replace one of them.
> > 
> > There are several other failure modes where we can leave partially
> > allocated remote attributes that expose stale data, partially free
> > remote attributes that enable UAF based stale data exposure, etc.
> > 
> > TO fix this, we need a different algorithm for replace operations
> > when LARP is enabled. Luckily, it's not that complex if we take the
> > right first step. That is, the first thing we log is the attri
> > intent with the new name/value pair and mark the old attr as
> > INCOMPLETE in the same transaction.
> > 
> > From there, we then remove the old attr and keep relogging the
> > new name/value in the intent, such that we always know that we have
> > to create the new attr in recovery. Once the old attr is removed,
> > we then run a normal ATTR_CREATE operation relogging the intent as
> > we go. If the new attr is local, then it gets created in a single
> > atomic transaction that also logs the final intent done. If the new
> > attr is remote, the we set INCOMPLETE on the new attr while we
> > allocate and set the remote value, and then we clear the INCOMPLETE
> > flag at in the last transaction taht logs the final intent done.
> > 
> > If we fail at any point in this algorithm, log recovery will always
> > see the same state on disk: the new name/value in the intent, and
> > either an INCOMPLETE attr or no attr in the attr btree. If we find
> > an INCOMPLETE attr, we run the full replace starting with removing
> > the INCOMPLETE attr. If we don't find it, then we simply create the
> > new attr.
> 
> ...assuming that the INCOMPLETE attr is preserved until we're completely
> done unmapping the rmtval blocks, right?

Yes, it is preserved - the INCOMPLETE flag is held in
the name entry and that's not removed until after the rmtval blocks
have been fully removed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 05/18] xfs: separate out initial attr_set states
  2022-05-11  1:08       ` Darrick J. Wong
@ 2022-05-11  1:38         ` Dave Chinner
  2022-05-11  8:35           ` Dave Chinner
  0 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-11  1:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, May 10, 2022 at 06:08:48PM -0700, Darrick J. Wong wrote:
> On Wed, May 11, 2022 at 11:06:51AM +1000, Dave Chinner wrote:
> > On Tue, May 10, 2022 at 04:12:34PM -0700, Darrick J. Wong wrote:
> > > On Mon, May 09, 2022 at 10:41:25AM +1000, Dave Chinner wrote:
> > > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > > index c9c867e3406c..ad52b5dc59e4 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > > @@ -530,4 +553,35 @@ void xfs_attri_destroy_cache(void);
> > > >  int __init xfs_attrd_init_cache(void);
> > > >  void xfs_attrd_destroy_cache(void);
> > > >  
> > > > +/*
> > > > + * Check to see if the attr should be upgraded from non-existent or shortform to
> > > > + * single-leaf-block attribute list.
> > > > + */
> > > > +static inline bool
> > > > +xfs_attr_is_shortform(
> > > > +	struct xfs_inode    *ip)
> > > > +{
> > > > +	return ip->i_afp->if_format == XFS_DINODE_FMT_LOCAL ||
> > > > +	       (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
> > > > +		ip->i_afp->if_nextents == 0);
> > > > +}
> > > > +
> > > > +static inline enum xfs_delattr_state
> > > > +xfs_attr_init_add_state(struct xfs_da_args *args)
> > > > +{
> > > > +	if (!args->dp->i_afp)
> > > > +		return XFS_DAS_DONE;
> > > 
> > > If we're in add/replace attr call without an attr fork, why do we go
> > > straight to finished?
> > 
> > I suspect I've fixed all the issues that triggered crashes here
> > because args->dp->i_afp was null. THere were transient states in a
> > replace operaiton when the remove takes away the last attr, removes
> > the attr fork, then calls the ADD operation. The add operation
> > assumes that the attr fork has already been set up, and so bad
> > things happened here.
> > 
> > This also occurred when setting up recovery operations - recovery of
> > an add/replace could start from that same "there's no attr fork"
> > condition, and so calling xfs_inode_has_attr() or
> > xfs_attr_is_shortform() direct from the reocovery setup code would
> > go splat because ip->i_afp was null.
> > 
> > I'm going to leave this for the moment (cleanup note made) because I
> > don't want to have to find out that I missed a corner case somewhere
> > they hard way right now. It's basically there to stop log recovery
> > crashing hard, which only occurs when the experimental larp code is
> > running, so I think this is safe to leave for a later cleanup.
> 
> Hmm, in that case, can this become:
> 
> 	if (!args->dp->i_afp) {
> 		ASSERT(0);
> 		return XFS_DAS_DONE;
> 	}

OK.

> And then you can add:
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 05/18] xfs: separate out initial attr_set states
  2022-05-11  1:38         ` Dave Chinner
@ 2022-05-11  8:35           ` Dave Chinner
  2022-05-11 15:39             ` Darrick J. Wong
  0 siblings, 1 reply; 56+ messages in thread
From: Dave Chinner @ 2022-05-11  8:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, May 11, 2022 at 11:38:51AM +1000, Dave Chinner wrote:
> On Tue, May 10, 2022 at 06:08:48PM -0700, Darrick J. Wong wrote:
> > On Wed, May 11, 2022 at 11:06:51AM +1000, Dave Chinner wrote:
> > > On Tue, May 10, 2022 at 04:12:34PM -0700, Darrick J. Wong wrote:
> > > > On Mon, May 09, 2022 at 10:41:25AM +1000, Dave Chinner wrote:
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > > > index c9c867e3406c..ad52b5dc59e4 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > > > @@ -530,4 +553,35 @@ void xfs_attri_destroy_cache(void);
> > > > >  int __init xfs_attrd_init_cache(void);
> > > > >  void xfs_attrd_destroy_cache(void);
> > > > >  
> > > > > +/*
> > > > > + * Check to see if the attr should be upgraded from non-existent or shortform to
> > > > > + * single-leaf-block attribute list.
> > > > > + */
> > > > > +static inline bool
> > > > > +xfs_attr_is_shortform(
> > > > > +	struct xfs_inode    *ip)
> > > > > +{
> > > > > +	return ip->i_afp->if_format == XFS_DINODE_FMT_LOCAL ||
> > > > > +	       (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
> > > > > +		ip->i_afp->if_nextents == 0);
> > > > > +}
> > > > > +
> > > > > +static inline enum xfs_delattr_state
> > > > > +xfs_attr_init_add_state(struct xfs_da_args *args)
> > > > > +{
> > > > > +	if (!args->dp->i_afp)
> > > > > +		return XFS_DAS_DONE;
> > > > 
> > > > If we're in add/replace attr call without an attr fork, why do we go
> > > > straight to finished?
> > > 
> > > I suspect I've fixed all the issues that triggered crashes here
> > > because args->dp->i_afp was null. THere were transient states in a
> > > replace operaiton when the remove takes away the last attr, removes
> > > the attr fork, then calls the ADD operation. The add operation
> > > assumes that the attr fork has already been set up, and so bad
> > > things happened here.
> > > 
> > > This also occurred when setting up recovery operations - recovery of
> > > an add/replace could start from that same "there's no attr fork"
> > > condition, and so calling xfs_inode_has_attr() or
> > > xfs_attr_is_shortform() direct from the reocovery setup code would
> > > go splat because ip->i_afp was null.
> > > 
> > > I'm going to leave this for the moment (cleanup note made) because I
> > > don't want to have to find out that I missed a corner case somewhere
> > > they hard way right now. It's basically there to stop log recovery
> > > crashing hard, which only occurs when the experimental larp code is
> > > running, so I think this is safe to leave for a later cleanup.
> > 
> > Hmm, in that case, can this become:
> > 
> > 	if (!args->dp->i_afp) {
> > 		ASSERT(0);
> > 		return XFS_DAS_DONE;
> > 	}
> 
> OK.

Ok, now generic/051 has reminded me exactly what this was for.

Shortform attr remove will remove the attr and the attr fork from
this code:

        case XFS_DAS_SF_REMOVE:                                                  
                error = xfs_attr_sf_removename(args);                            
                attr->xattri_dela_state = xfs_attr_complete_op(attr,             
                                                xfs_attr_init_add_state(args));  
                break;                                                           

But if we are doing this as part of a REPLACE operation and we
still need to add the new attr, it calls xfs_attr_init_add_state()
to get the add state we should start with. That then hits the
null args->dp->i_afp case because the fork got removed.

This can't happen if we are doing a replace op, so we'd then check
if it's a shortform attr fork and return XFS_DAS_SF_ADD for the
replace to then execute. But it's not a replace op, so we can
have a null attr fork.

I'm going to restore the old code with a comment so that I don't
forget this again.

/*
 * If called from the completion of a attr remove to determine
 * the next state, the attribute fork may be null. This can occur on
 * a pure remove, but we grab the next state before we check if a
 * replace operation is being performed. Hence if the attr fork is
 * null, it's a pure remove operation and we are done.
 */

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 05/18] xfs: separate out initial attr_set states
  2022-05-11  8:35           ` Dave Chinner
@ 2022-05-11 15:39             ` Darrick J. Wong
  2022-05-12  0:57               ` Dave Chinner
  0 siblings, 1 reply; 56+ messages in thread
From: Darrick J. Wong @ 2022-05-11 15:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, May 11, 2022 at 06:35:13PM +1000, Dave Chinner wrote:
> On Wed, May 11, 2022 at 11:38:51AM +1000, Dave Chinner wrote:
> > On Tue, May 10, 2022 at 06:08:48PM -0700, Darrick J. Wong wrote:
> > > On Wed, May 11, 2022 at 11:06:51AM +1000, Dave Chinner wrote:
> > > > On Tue, May 10, 2022 at 04:12:34PM -0700, Darrick J. Wong wrote:
> > > > > On Mon, May 09, 2022 at 10:41:25AM +1000, Dave Chinner wrote:
> > > > > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > > > > index c9c867e3406c..ad52b5dc59e4 100644
> > > > > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > > > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > > > > @@ -530,4 +553,35 @@ void xfs_attri_destroy_cache(void);
> > > > > >  int __init xfs_attrd_init_cache(void);
> > > > > >  void xfs_attrd_destroy_cache(void);
> > > > > >  
> > > > > > +/*
> > > > > > + * Check to see if the attr should be upgraded from non-existent or shortform to
> > > > > > + * single-leaf-block attribute list.
> > > > > > + */
> > > > > > +static inline bool
> > > > > > +xfs_attr_is_shortform(
> > > > > > +	struct xfs_inode    *ip)
> > > > > > +{
> > > > > > +	return ip->i_afp->if_format == XFS_DINODE_FMT_LOCAL ||
> > > > > > +	       (ip->i_afp->if_format == XFS_DINODE_FMT_EXTENTS &&
> > > > > > +		ip->i_afp->if_nextents == 0);
> > > > > > +}
> > > > > > +
> > > > > > +static inline enum xfs_delattr_state
> > > > > > +xfs_attr_init_add_state(struct xfs_da_args *args)
> > > > > > +{
> > > > > > +	if (!args->dp->i_afp)
> > > > > > +		return XFS_DAS_DONE;
> > > > > 
> > > > > If we're in add/replace attr call without an attr fork, why do we go
> > > > > straight to finished?
> > > > 
> > > > I suspect I've fixed all the issues that triggered crashes here
> > > > because args->dp->i_afp was null. THere were transient states in a
> > > > replace operaiton when the remove takes away the last attr, removes
> > > > the attr fork, then calls the ADD operation. The add operation
> > > > assumes that the attr fork has already been set up, and so bad
> > > > things happened here.
> > > > 
> > > > This also occurred when setting up recovery operations - recovery of
> > > > an add/replace could start from that same "there's no attr fork"
> > > > condition, and so calling xfs_inode_has_attr() or
> > > > xfs_attr_is_shortform() direct from the reocovery setup code would
> > > > go splat because ip->i_afp was null.
> > > > 
> > > > I'm going to leave this for the moment (cleanup note made) because I
> > > > don't want to have to find out that I missed a corner case somewhere
> > > > they hard way right now. It's basically there to stop log recovery
> > > > crashing hard, which only occurs when the experimental larp code is
> > > > running, so I think this is safe to leave for a later cleanup.
> > > 
> > > Hmm, in that case, can this become:
> > > 
> > > 	if (!args->dp->i_afp) {
> > > 		ASSERT(0);
> > > 		return XFS_DAS_DONE;
> > > 	}
> > 
> > OK.
> 
> Ok, now generic/051 has reminded me exactly what this was for.
> 
> Shortform attr remove will remove the attr and the attr fork from
> this code:
> 
>         case XFS_DAS_SF_REMOVE:                                                  
>                 error = xfs_attr_sf_removename(args);                            
>                 attr->xattri_dela_state = xfs_attr_complete_op(attr,             
>                                                 xfs_attr_init_add_state(args));  
>                 break;                                                           
> 
> But if we are doing this as part of a REPLACE operation and we
> still need to add the new attr, it calls xfs_attr_init_add_state()
> to get the add state we should start with. That then hits the
> null args->dp->i_afp case because the fork got removed.
> 
> This can't happen if we are doing a replace op, so we'd then check
> if it's a shortform attr fork and return XFS_DAS_SF_ADD for the
> replace to then execute. But it's not a replace op, so we can
> have a null attr fork.
> 
> I'm going to restore the old code with a comment so that I don't
> forget this again.
> 
> /*
>  * If called from the completion of a attr remove to determine
>  * the next state, the attribute fork may be null. This can occur on
>  * a pure remove, but we grab the next state before we check if a
>  * replace operation is being performed. Hence if the attr fork is
>  * null, it's a pure remove operation and we are done.
>  */

Ahh, I see -- sf_removename will /never/ kill i_afp if we're doing a
DA_OP_REPLACE or ADDNAME, and leaf_removename also won't do that if
we're doing DA_OP_REPLACE.  IOWs, only a removexattr operation can
result in i_afp being freed.

And the XATTR_CREATE operation always guarantee that i_afp is non-null
before we start, so xfs_attr_defer_add should never be called with
args->dp->i_afp == NULL, hence it'll never hit that state.

Would you mind adding a sentence to the comment?

"A pure create ensures the existence of i_afp and never encounters this
state."

FBO of maintainers who aren't quite as uptodate on how xattrs work? ;)

(Admittedly all this will probably go away if we stop freeing i_afp, but
I wasn't going to push on that until the LARP stuff settles...)

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 05/18] xfs: separate out initial attr_set states
  2022-05-11 15:39             ` Darrick J. Wong
@ 2022-05-12  0:57               ` Dave Chinner
  0 siblings, 0 replies; 56+ messages in thread
From: Dave Chinner @ 2022-05-12  0:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, May 11, 2022 at 08:39:52AM -0700, Darrick J. Wong wrote:
> On Wed, May 11, 2022 at 06:35:13PM +1000, Dave Chinner wrote:
> > On Wed, May 11, 2022 at 11:38:51AM +1000, Dave Chinner wrote:
> > > On Tue, May 10, 2022 at 06:08:48PM -0700, Darrick J. Wong wrote:
> > > > On Wed, May 11, 2022 at 11:06:51AM +1000, Dave Chinner wrote:
> > > > > I'm going to leave this for the moment (cleanup note made) because I
> > > > > don't want to have to find out that I missed a corner case somewhere
> > > > > they hard way right now. It's basically there to stop log recovery
> > > > > crashing hard, which only occurs when the experimental larp code is
> > > > > running, so I think this is safe to leave for a later cleanup.
> > > > 
> > > > Hmm, in that case, can this become:
> > > > 
> > > > 	if (!args->dp->i_afp) {
> > > > 		ASSERT(0);
> > > > 		return XFS_DAS_DONE;
> > > > 	}
> > > 
> > > OK.
> > 
> > Ok, now generic/051 has reminded me exactly what this was for.
> > 
> > Shortform attr remove will remove the attr and the attr fork from
> > this code:
> > 
> >         case XFS_DAS_SF_REMOVE:                                                  
> >                 error = xfs_attr_sf_removename(args);                            
> >                 attr->xattri_dela_state = xfs_attr_complete_op(attr,             
> >                                                 xfs_attr_init_add_state(args));  
> >                 break;                                                           
> > 
> > But if we are doing this as part of a REPLACE operation and we
> > still need to add the new attr, it calls xfs_attr_init_add_state()
> > to get the add state we should start with. That then hits the
> > null args->dp->i_afp case because the fork got removed.
> > 
> > This can't happen if we are doing a replace op, so we'd then check
> > if it's a shortform attr fork and return XFS_DAS_SF_ADD for the
> > replace to then execute. But it's not a replace op, so we can
> > have a null attr fork.
> > 
> > I'm going to restore the old code with a comment so that I don't
> > forget this again.
> > 
> > /*
> >  * If called from the completion of a attr remove to determine
> >  * the next state, the attribute fork may be null. This can occur on
> >  * a pure remove, but we grab the next state before we check if a
> >  * replace operation is being performed. Hence if the attr fork is
> >  * null, it's a pure remove operation and we are done.
> >  */
> 
> Ahh, I see -- sf_removename will /never/ kill i_afp if we're doing a
> DA_OP_REPLACE or ADDNAME, and leaf_removename also won't do that if
> we're doing DA_OP_REPLACE.  IOWs, only a removexattr operation can
> result in i_afp being freed.
> 
> And the XATTR_CREATE operation always guarantee that i_afp is non-null
> before we start, so xfs_attr_defer_add should never be called with
> args->dp->i_afp == NULL, hence it'll never hit that state.
> 
> Would you mind adding a sentence to the comment?
> 
> "A pure create ensures the existence of i_afp and never encounters this
> state."

Sure.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2022-05-12  0:57 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-09  0:41 [PATCH 00/18 V4] XFS: LARP state machine and recovery rework Dave Chinner
2022-05-09  0:41 ` [PATCH 01/18] xfs: avoid empty xattr transaction when attrs are inline Dave Chinner
2022-05-09 16:43   ` Darrick J. Wong
2022-05-09  0:41 ` [PATCH 02/18] xfs: initialise attrd item to zero Dave Chinner
2022-05-09 16:43   ` Darrick J. Wong
2022-05-09  0:41 ` [PATCH 03/18] xfs: make xattri_leaf_bp more useful Dave Chinner
2022-05-10 22:58   ` Darrick J. Wong
2022-05-09  0:41 ` [PATCH 04/18] xfs: rework deferred attribute operation setup Dave Chinner
2022-05-10 23:04   ` Darrick J. Wong
2022-05-11  0:57     ` Dave Chinner
2022-05-09  0:41 ` [PATCH 05/18] xfs: separate out initial attr_set states Dave Chinner
2022-05-10 23:12   ` Darrick J. Wong
2022-05-11  1:06     ` Dave Chinner
2022-05-11  1:08       ` Darrick J. Wong
2022-05-11  1:38         ` Dave Chinner
2022-05-11  8:35           ` Dave Chinner
2022-05-11 15:39             ` Darrick J. Wong
2022-05-12  0:57               ` Dave Chinner
2022-05-09  0:41 ` [PATCH 06/18] xfs: kill XFS_DAC_LEAF_ADDNAME_INIT Dave Chinner
2022-05-10 23:15   ` Darrick J. Wong
2022-05-09  0:41 ` [PATCH 07/18] xfs: consolidate leaf/node states in xfs_attr_set_iter Dave Chinner
2022-05-10 23:20   ` Darrick J. Wong
2022-05-11  1:09     ` Dave Chinner
2022-05-09  0:41 ` [PATCH 08/18] xfs: split remote attr setting out from replace path Dave Chinner
2022-05-10 23:22   ` Darrick J. Wong
2022-05-09  0:41 ` [PATCH 09/18] xfs: XFS_DAS_LEAF_REPLACE state only needed if !LARP Dave Chinner
2022-05-10 23:24   ` Darrick J. Wong
2022-05-09  0:41 ` [PATCH 10/18] xfs: remote xattr removal in xfs_attr_set_iter() is conditional Dave Chinner
2022-05-10 23:26   ` Darrick J. Wong
2022-05-09  0:41 ` [PATCH 11/18] xfs: clean up final attr removal in xfs_attr_set_iter Dave Chinner
2022-05-10 23:29   ` Darrick J. Wong
2022-05-09  0:41 ` [PATCH 12/18] xfs: xfs_attr_set_iter() does not need to return EAGAIN Dave Chinner
2022-05-10 23:30   ` Darrick J. Wong
2022-05-09  0:41 ` [PATCH 13/18] xfs: introduce attr remove initial states into xfs_attr_set_iter Dave Chinner
2022-05-10 23:37   ` Darrick J. Wong
2022-05-10 23:40     ` Darrick J. Wong
2022-05-09  0:41 ` [PATCH 14/18] xfs: switch attr remove to xfs_attri_set_iter Dave Chinner
2022-05-10 23:40   ` Darrick J. Wong
2022-05-09  0:41 ` [PATCH 15/18] xfs: remove xfs_attri_remove_iter Dave Chinner
2022-05-10 23:42   ` Darrick J. Wong
2022-05-09  0:41 ` [PATCH 16/18] xfs: use XFS_DA_OP flags in deferred attr ops Dave Chinner
2022-05-10 22:20   ` [PATCH 16/18 v2] " Dave Chinner
2022-05-10 23:47     ` Darrick J. Wong
2022-05-10 23:49     ` Alli
2022-05-09  0:41 ` [PATCH 17/18] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework Dave Chinner
2022-05-10 22:31   ` Alli
2022-05-10 23:53   ` Darrick J. Wong
2022-05-11  1:14     ` Dave Chinner
2022-05-09  0:41 ` [PATCH 18/18] xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify Dave Chinner
2022-05-10 22:31   ` Alli
2022-05-10 23:54   ` Darrick J. Wong
2022-05-10 22:27 ` [PATCH 19/18] xfs: can't use kmem_zalloc() for attribute buffers Dave Chinner
2022-05-10 23:59   ` Darrick J. Wong
2022-05-11  0:54     ` Dave Chinner
2022-05-11  1:10       ` Darrick J. Wong
2022-05-11  0:54   ` Alli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.