* [PATCH v13 00/10] xfs: Delayed Attributes
@ 2020-10-23  6:34 Allison Henderson
  2020-10-23  6:34 ` [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step Allison Henderson
                   ` (9 more replies)
  0 siblings, 10 replies; 58+ messages in thread
From: Allison Henderson @ 2020-10-23  6:34 UTC
  To: linux-xfs

Hi all,

This set is a subset of a larger series for parent pointers. Delayed attributes
allow attribute operations (set and remove) to be logged and committed in the same
way that other delayed operations do. This allows more complex operations (like
parent pointers) to be broken up into multiple smaller transactions. To do
this, the existing attr operations must be modified to operate as either a
delayed operation or an inline operation, since older filesystems will not be
able to use the new log entries.  This means that they cannot roll, commit, or
finish transactions.  Instead, they return -EAGAIN to allow the calling
function to handle the transaction. In this series, we focus on only the clean
up and refactoring needed to accomplish this. We will introduce delayed attrs
and parent pointers in a later set.

At the moment, I would like people to focus their review efforts on just this
"delayed attribute" sub-series, as I think that is a more conservative use of people's
review time.  I also think the set is a bit much to manage all at once, and we
need to get the infrastructure ironed out before we focus too much on anything
that depends on it. But I do have the extended series for folks that want to
see the bigger picture of where this is going.

To help organize the set, I've arranged the patches into a few mini sets.
I thought this would help reviewers break the review into smaller pieces. For
reviewing purposes, the set can be broken up into 2 phases:

Delay Ready Attributes: (patches 1-4)
These are the remaining patches belonging to the "Delay Ready" series that
we've been working with.  In these patches, transaction handling is removed
from the attr routines, and replaced with a state machine that allows a high
level function to roll the transaction and repeatedly recall the attr routines
until they are finished.  The attr set/remove routines are now also
compatible with use as a .finish_item callback.
  xfs: Add helper xfs_attr_node_remove_step
  xfs: Add delay ready attr remove routines
  xfs: Add delay ready attr set routines
  xfs: Rename __xfs_attr_rmtval_remove
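A rough sketch of the "resume where we left off" idea, loosely modeled on the XFS_DAS_UNINIT / XFS_DAS_RM_SHRINK switch that patch 02 adds to xfs_attr_node_removename_iter(). The types, the `need_join` field and the counters are hypothetical; the real code only shrinks when the attr fork fits in one block, which this sketch reduces to one unconditional shrink step.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum demo_state {
	DEMO_UNINIT = 0,	/* nothing done yet */
	DEMO_RM_SHRINK,		/* entry removed; shrink still pending */
};

struct demo_ctx {
	enum demo_state	state;
	bool		need_join;	/* hypothetical: tree must be collapsed */
	int		removed;	/* times the removal step ran */
	int		shrunk;		/* times the shrink step ran */
};

static int demo_remove_iter(struct demo_ctx *ctx)
{
	switch (ctx->state) {
	case DEMO_UNINIT:
		ctx->removed++;
		if (ctx->need_join) {
			/* Record where to resume, then ask for a roll. */
			ctx->state = DEMO_RM_SHRINK;
			return -EAGAIN;
		}
		/* fall through: no roll needed, go straight to the shrink */
	case DEMO_RM_SHRINK:
		ctx->shrunk++;
		return 0;
	default:
		assert(0);
		return -EINVAL;
	}
}
```

On the second call the saved state skips the removal step, so it runs exactly once even though the function is entered twice.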

Delayed Attributes: (patches 5 - 10)
These patches go on to fully implement delayed attributes.  New attr intent and
done items are introduced for use in the existing logging infrastructure.  A
feature bit is added to toggle the feature on and off, and an error tag is added
to test the log replay.
  xfs: Set up infastructure for deferred attribute operations
  xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR
  xfs: Enable delayed attributes
  xfs: Remove unused xfs_attr_*_args
  xfs: Add delayed attributes error tag
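As an illustration of a feature bit gating the new log items (following the v13 note that the create-intent routines return null when delayed attrs are not enabled), here is a small sketch. The bit value and every `demo_*` name are assumptions for illustration only; the real value of XFS_SB_FEAT_INCOMPAT_LOG_DELATTR is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit value; not the real XFS_SB_FEAT_INCOMPAT_LOG_DELATTR. */
#define DEMO_FEAT_INCOMPAT_LOG_DELATTR	(1u << 4)

/* Hypothetical superblock holding only the incompat feature mask. */
struct demo_sb {
	uint32_t	features_incompat;
};

/* Hypothetical attr intent log item. */
struct demo_intent {
	bool		logged;
};

static bool demo_sb_has_delattr(const struct demo_sb *sb)
{
	return (sb->features_incompat & DEMO_FEAT_INCOMPAT_LOG_DELATTR) != 0;
}

/*
 * Return NULL when the feature is off, so no intent is logged and the
 * operation runs inline; otherwise hand back the intent to be logged.
 */
static struct demo_intent *
demo_create_intent(const struct demo_sb *sb, struct demo_intent *slot)
{
	assert(slot != NULL);
	if (!demo_sb_has_delattr(sb))
		return NULL;
	slot->logged = true;
	return slot;
}
```

Returning NULL rather than an error lets the same deferred-ops path drive both modes: a caller that gets no intent simply skips logging.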

Updates since v12: Mostly integrating review feedback.  I've refactored
xfs_attr_node_removename as discussed in the 2nd patch, and consolidated the
xfs_attr_item members into a single args pointer.  I've gotten the dfops
mechanics to drive both delayed and non-delayed operations.  Lastly, the
xfs_attr_*_args functions are removed at the end of the set since the dfops
machinery replaces them.  I did explore reorganizing xfs_da_args, though I think
that's big enough to be a separate project, and I wanted to get a v13 out before
too much time gets away.  I also updated the extended parent pointer series to
keep it functional for now.

xfs: Add helper xfs_attr_node_remove_step
  New

xfs: Add delay ready attr remove routines
  Fixed typo in commit message
  Rebase adjustments
  Refactored xfs_attr_node_removename to xfs_attr_node_removename_iter
  Added state XFS_DAC_UNINIT to avoid warnings
  Found I could remove blk from dac.  Removed to simplify

xfs: Add delay ready attr set routines
  Rebase adjustments

xfs: Set up infastructure for deferred attribute operations
  Collapsed xfs_attr_item members into a single args pointer
  Modified xfs_attr_create_intent routines to return null when delayed attrs not enabled
  Modified xfs_trans_attr and xfs_attr_finish_item to avoid handling intent and
    done items when delayed attrs not enabled
  Moved xfs_sb_version_hasdelattr stub from "xfs: Add feature bit
    XFS_SB_FEAT_INCOMPAT_LOG_DELATTR" patch to this patch.  Used in managing
    when intents are recorded
  Removed args initialization logic from xfs_attr_finish_item (no longer needed)
  Simplified xfs_attr_recover to eliminate looping logic


xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  collapsed parameters into a single args pointer

xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR
  Rebase adjustments: Fill in xfs_sb_version_hasdelattr stub from earlier patch

xfs: Enable delayed attributes
  Removed logic to test for feature bit which is now handled by delayed attr mechanics

xfs: Remove unused xfs_attr_*_args
   New


This series can be viewed on github here:
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_v13

As well as the extended delayed attribute and parent pointer series:
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_v13_extended

And the test cases:
https://github.com/allisonhenderson/xfs_work/tree/pptr_xfstests

In order to run the test cases, you will need to have the corresponding xfsprogs
changes as well, which can be found here:
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_xfsprogs_v13
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_xfsprogs_v13_extended

To run the xfs attribute tests, run:
check -g attr

To run the attribute tests with delayed attributes enabled, run:
export MKFS_OPTIONS="-n delattr"
check -g attr

To run the parent pointer tests, run:
check -g parent

I've also made the corresponding updates to the user space side, and ported
anything it needs to seat correctly.

Questions, comments and feedback appreciated!

Thanks all!
Allison 

Allison Collins (5):
  xfs: Add helper xfs_attr_node_remove_step
  xfs: Rename __xfs_attr_rmtval_remove
  xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  xfs: Enable delayed attributes
  xfs: Add delayed attributes error tag

Allison Henderson (5):
  xfs: Add delay ready attr remove routines
  xfs: Add delay ready attr set routines
  xfs: Set up infastructure for deferred attribute operations
  xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR
  xfs: Remove unused xfs_attr_*_args

 fs/xfs/Makefile                 |   1 +
 fs/xfs/libxfs/xfs_attr.c        | 550 +++++++++++++++++++----------
 fs/xfs/libxfs/xfs_attr.h        | 218 +++++++++++-
 fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
 fs/xfs/libxfs/xfs_attr_remote.c | 110 +++---
 fs/xfs/libxfs/xfs_attr_remote.h |   7 +-
 fs/xfs/libxfs/xfs_defer.c       |   1 +
 fs/xfs/libxfs/xfs_defer.h       |   3 +
 fs/xfs/libxfs/xfs_errortag.h    |   4 +-
 fs/xfs/libxfs/xfs_format.h      |  11 +-
 fs/xfs/libxfs/xfs_fs.h          |   1 +
 fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
 fs/xfs/libxfs/xfs_log_recover.h |   2 +
 fs/xfs/libxfs/xfs_sb.c          |   2 +
 fs/xfs/libxfs/xfs_types.h       |   1 +
 fs/xfs/scrub/common.c           |   2 +
 fs/xfs/xfs_acl.c                |   2 +
 fs/xfs/xfs_attr_inactive.c      |   2 +-
 fs/xfs/xfs_attr_item.c          | 758 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_attr_item.h          |  76 ++++
 fs/xfs/xfs_attr_list.c          |   1 +
 fs/xfs/xfs_error.c              |   3 +
 fs/xfs/xfs_ioctl.c              |   2 +
 fs/xfs/xfs_ioctl32.c            |   2 +
 fs/xfs/xfs_iops.c               |   2 +
 fs/xfs/xfs_log.c                |   4 +
 fs/xfs/xfs_log_recover.c        |   2 +
 fs/xfs/xfs_ondisk.h             |   2 +
 fs/xfs/xfs_super.c              |   3 +
 fs/xfs/xfs_trace.h              |   1 -
 fs/xfs/xfs_xattr.c              |   1 +
 31 files changed, 1591 insertions(+), 229 deletions(-)
 create mode 100644 fs/xfs/xfs_attr_item.c
 create mode 100644 fs/xfs/xfs_attr_item.h

-- 
2.7.4



* [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step
  2020-10-23  6:34 [PATCH v13 00/10] xfs: Delayed Attributes Allison Henderson
@ 2020-10-23  6:34 ` Allison Henderson
  2020-10-27  7:03   ` Chandan Babu R
                     ` (2 more replies)
  2020-10-23  6:34 ` [PATCH v13 02/10] xfs: Add delay ready attr remove routines Allison Henderson
                   ` (8 subsequent siblings)
  9 siblings, 3 replies; 58+ messages in thread
From: Allison Henderson @ 2020-10-23  6:34 UTC
  To: linux-xfs

From: Allison Collins <allison.henderson@oracle.com>

This patch adds a new helper function xfs_attr_node_remove_step.  This
will help simplify and modularize the calling function
xfs_attr_node_removename.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 46 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index fd8e641..f4d39bf 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1228,19 +1228,14 @@ xfs_attr_node_remove_rmt(
  * the root node (a special case of an intermediate node).
  */
 STATIC int
-xfs_attr_node_removename(
-	struct xfs_da_args	*args)
+xfs_attr_node_remove_step(
+	struct xfs_da_args	*args,
+	struct xfs_da_state	*state)
 {
-	struct xfs_da_state	*state;
 	struct xfs_da_state_blk	*blk;
 	int			retval, error;
 	struct xfs_inode	*dp = args->dp;
 
-	trace_xfs_attr_node_removename(args);
-
-	error = xfs_attr_node_removename_setup(args, &state);
-	if (error)
-		goto out;
 
 	/*
 	 * If there is an out-of-line value, de-allocate the blocks.
@@ -1250,7 +1245,7 @@ xfs_attr_node_removename(
 	if (args->rmtblkno > 0) {
 		error = xfs_attr_node_remove_rmt(args, state);
 		if (error)
-			goto out;
+			return error;
 	}
 
 	/*
@@ -1267,18 +1262,45 @@ xfs_attr_node_removename(
 	if (retval && (state->path.active > 1)) {
 		error = xfs_da3_join(state);
 		if (error)
-			goto out;
+			return error;
 		error = xfs_defer_finish(&args->trans);
 		if (error)
-			goto out;
+			return error;
 		/*
 		 * Commit the Btree join operation and start a new trans.
 		 */
 		error = xfs_trans_roll_inode(&args->trans, dp);
 		if (error)
-			goto out;
+			return error;
 	}
 
+	return error;
+}
+
+/*
+ * Remove a name from a B-tree attribute list.
+ *
+ * This routine will find the blocks of the name to remove, remove them and
+ * shirnk the tree if needed.
+ */
+STATIC int
+xfs_attr_node_removename(
+	struct xfs_da_args	*args)
+{
+	struct xfs_da_state	*state;
+	int			error;
+	struct xfs_inode	*dp = args->dp;
+
+	trace_xfs_attr_node_removename(args);
+
+	error = xfs_attr_node_removename_setup(args, &state);
+	if (error)
+		goto out;
+
+	error = xfs_attr_node_remove_step(args, state);
+	if (error)
+		goto out;
+
 	/*
 	 * If the result is small enough, push it all into the inode.
 	 */
-- 
2.7.4



* [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-10-23  6:34 [PATCH v13 00/10] xfs: Delayed Attributes Allison Henderson
  2020-10-23  6:34 ` [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step Allison Henderson
@ 2020-10-23  6:34 ` Allison Henderson
  2020-10-27  9:59   ` Chandan Babu R
                     ` (2 more replies)
  2020-10-23  6:34 ` [PATCH v13 03/10] xfs: Add delay ready attr set routines Allison Henderson
                   ` (7 subsequent siblings)
  9 siblings, 3 replies; 58+ messages in thread
From: Allison Henderson @ 2020-10-23  6:34 UTC
  To: linux-xfs

This patch modifies the attr remove routines to be delay ready. This
means they no longer roll or commit transactions, but instead return
-EAGAIN to have the calling routine roll and refresh the transaction. In
this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
uses a state-machine-like switch to keep track of where it was
when EAGAIN was returned. xfs_attr_node_removename has also been
modified to use the switch, and a new version of xfs_attr_remove_args
consists of a simple loop to refresh the transaction until the operation
is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
transaction wherever the existing code used to.

Calls to xfs_attr_rmtval_remove are replaced with the delay ready
version __xfs_attr_rmtval_remove. We will rename
__xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
done.

xfs_attr_rmtval_remove itself is still in use by the set routines (used
during a rename).  To preserve existing behavior, we modify
xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is set,
similar to what xfs_attr_remove_args does here.  Once we transition the
set routines to be delay ready, xfs_attr_rmtval_remove will no longer be
used and will be removed.

This patch also adds a new struct xfs_delattr_context, which we will use
to keep track of the current state of an attribute operation. The new
xfs_delattr_state enum is used to track various operations that are in
progress so that we know not to repeat them, and resume where we left
off before EAGAIN was returned to cycle out the transaction. Other
members take the place of local variables that need to retain their
values across multiple function recalls.  See xfs_attr.h for a more
detailed diagram of the states.
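A sketch of why the context needs both a saved pointer and an init flag: setup runs once, the state it allocates survives each -EAGAIN return, and it is freed only when the operation completes, as xfs_attr_node_removename_iter() does with da_state and XFS_DAC_NODE_RMVNAME_INIT. Names such as `demo_delattr_ctx` and `rolls_wanted` are hypothetical, and calloc()/free() stand in for the da_state allocation and xfs_da_state_free().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_NODE_RMVNAME_INIT	0x02	/* setup has already run */

/* Hypothetical stand-in for struct xfs_da_state. */
struct demo_da_state {
	int	path_active;
};

/* Hypothetical context: fields replace locals that must survive recalls. */
struct demo_delattr_ctx {
	struct demo_da_state	*da_state;
	unsigned int		flags;
	int			rolls_wanted;	/* -EAGAINs before completion */
	int			setup_calls;	/* instrumentation only */
};

static int demo_setup(struct demo_delattr_ctx *dac)
{
	dac->setup_calls++;
	dac->da_state = calloc(1, sizeof(*dac->da_state));
	return dac->da_state ? 0 : -ENOMEM;
}

static int demo_removename_iter(struct demo_delattr_ctx *dac)
{
	int	error = 0;

	assert(dac != NULL);
	if (!(dac->flags & DEMO_NODE_RMVNAME_INIT)) {
		dac->flags |= DEMO_NODE_RMVNAME_INIT;
		error = demo_setup(dac);
		if (error)
			goto out;
	}

	if (dac->rolls_wanted > 0) {
		dac->rolls_wanted--;
		return -EAGAIN;		/* keep da_state for the next call */
	}
out:
	free(dac->da_state);		/* released only when the op ends */
	dac->da_state = NULL;
	return error;
}
```

Returning early on -EAGAIN, before the cleanup label, is what keeps the state alive across transaction rolls.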

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
 fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
 fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
 fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
 fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
 fs/xfs/xfs_attr_inactive.c      |   2 +-
 6 files changed, 241 insertions(+), 74 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index f4d39bf..6ca94cb 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
  */
 STATIC int xfs_attr_node_get(xfs_da_args_t *args);
 STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
-STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
+STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
 STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 				 struct xfs_da_state **state);
 STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
@@ -264,6 +264,33 @@ xfs_attr_set_shortform(
 }
 
 /*
+ * Checks to see if a delayed attribute transaction should be rolled.  If so,
+ * also checks for a defer finish.  Transaction is finished and rolled as
+ * needed, and returns true or false if the delayed operation should continue.
+ */
+int
+xfs_attr_trans_roll(
+	struct xfs_delattr_context	*dac)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	int				error = 0;
+
+	if (dac->flags & XFS_DAC_DEFER_FINISH) {
+		/*
+		 * The caller wants us to finish all the deferred ops so that we
+		 * avoid pinning the log tail with a large number of deferred
+		 * ops.
+		 */
+		dac->flags &= ~XFS_DAC_DEFER_FINISH;
+		error = xfs_defer_finish(&args->trans);
+		if (error)
+			return error;
+	}
+
+	return xfs_trans_roll_inode(&args->trans, args->dp);
+}
+
+/*
  * Set the attribute specified in @args.
  */
 int
@@ -364,23 +391,54 @@ xfs_has_attr(
  */
 int
 xfs_attr_remove_args(
-	struct xfs_da_args      *args)
+	struct xfs_da_args	*args)
 {
-	struct xfs_inode	*dp = args->dp;
-	int			error;
+	int				error = 0;
+	struct xfs_delattr_context	dac = {
+		.da_args	= args,
+	};
+
+	do {
+		error = xfs_attr_remove_iter(&dac);
+		if (error != -EAGAIN)
+			break;
+
+		error = xfs_attr_trans_roll(&dac);
+		if (error)
+			return error;
+
+	} while (true);
+
+	return error;
+}
+
+/*
+ * Remove the attribute specified in @args.
+ *
+ * This function may return -EAGAIN to signal that the transaction needs to be
+ * rolled.  Callers should continue calling this function until they receive a
+ * return value other than -EAGAIN.
+ */
+int
+xfs_attr_remove_iter(
+	struct xfs_delattr_context	*dac)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_inode		*dp = args->dp;
+
+	if (dac->dela_state == XFS_DAS_RM_SHRINK)
+		goto node;
 
 	if (!xfs_inode_hasattr(dp)) {
-		error = -ENOATTR;
+		return -ENOATTR;
 	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
 		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
-		error = xfs_attr_shortform_remove(args);
+		return xfs_attr_shortform_remove(args);
 	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
-		error = xfs_attr_leaf_removename(args);
-	} else {
-		error = xfs_attr_node_removename(args);
+		return xfs_attr_leaf_removename(args);
 	}
-
-	return error;
+node:
+	return  xfs_attr_node_removename_iter(dac);
 }
 
 /*
@@ -1178,10 +1236,11 @@ xfs_attr_leaf_mark_incomplete(
  */
 STATIC
 int xfs_attr_node_removename_setup(
-	struct xfs_da_args	*args,
-	struct xfs_da_state	**state)
+	struct xfs_delattr_context	*dac,
+	struct xfs_da_state		**state)
 {
-	int			error;
+	struct xfs_da_args		*args = dac->da_args;
+	int				error;
 
 	error = xfs_attr_node_hasname(args, state);
 	if (error != -EEXIST)
@@ -1191,6 +1250,12 @@ int xfs_attr_node_removename_setup(
 	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
 		XFS_ATTR_LEAF_MAGIC);
 
+	/*
+	 * Store state in the context in case we need to cycle out the
+	 * transaction
+	 */
+	dac->da_state = *state;
+
 	if (args->rmtblkno > 0) {
 		error = xfs_attr_leaf_mark_incomplete(args, *state);
 		if (error)
@@ -1203,13 +1268,16 @@ int xfs_attr_node_removename_setup(
 }
 
 STATIC int
-xfs_attr_node_remove_rmt(
-	struct xfs_da_args	*args,
-	struct xfs_da_state	*state)
+xfs_attr_node_remove_rmt (
+	struct xfs_delattr_context	*dac,
+	struct xfs_da_state		*state)
 {
-	int			error = 0;
+	int				error = 0;
 
-	error = xfs_attr_rmtval_remove(args);
+	/*
+	 * May return -EAGAIN to request that the caller recall this function
+	 */
+	error = __xfs_attr_rmtval_remove(dac);
 	if (error)
 		return error;
 
@@ -1221,21 +1289,27 @@ xfs_attr_node_remove_rmt(
 }
 
 /*
- * Remove a name from a B-tree attribute list.
+ * Step through removing a name from a B-tree attribute list.
  *
  * This will involve walking down the Btree, and may involve joining
  * leaf nodes and even joining intermediate nodes up to and including
  * the root node (a special case of an intermediate node).
+ *
+ * This routine is meant to function as either an inline or delayed operation,
+ * and may return -EAGAIN when the transaction needs to be rolled.  Calling
+ * functions will need to handle this, and recall the function until a
+ * successful error code is returned.
  */
 STATIC int
 xfs_attr_node_remove_step(
-	struct xfs_da_args	*args,
-	struct xfs_da_state	*state)
+	struct xfs_delattr_context	*dac)
 {
-	struct xfs_da_state_blk	*blk;
-	int			retval, error;
-	struct xfs_inode	*dp = args->dp;
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_state		*state;
+	struct xfs_da_state_blk		*blk;
+	int				retval, error = 0;
 
+	state = dac->da_state;
 
 	/*
 	 * If there is an out-of-line value, de-allocate the blocks.
@@ -1243,7 +1317,10 @@ xfs_attr_node_remove_step(
 	 * overflow the maximum size of a transaction and/or hit a deadlock.
 	 */
 	if (args->rmtblkno > 0) {
-		error = xfs_attr_node_remove_rmt(args, state);
+		/*
+		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
+		 */
+		error = xfs_attr_node_remove_rmt(dac, state);
 		if (error)
 			return error;
 	}
@@ -1257,21 +1334,18 @@ xfs_attr_node_remove_step(
 	xfs_da3_fixhashpath(state, &state->path);
 
 	/*
-	 * Check to see if the tree needs to be collapsed.
+	 * Check to see if the tree needs to be collapsed.  Set the flag to
+	 * indicate that the calling function needs to move to the shrink
+	 * operation
 	 */
 	if (retval && (state->path.active > 1)) {
 		error = xfs_da3_join(state);
 		if (error)
 			return error;
-		error = xfs_defer_finish(&args->trans);
-		if (error)
-			return error;
-		/*
-		 * Commit the Btree join operation and start a new trans.
-		 */
-		error = xfs_trans_roll_inode(&args->trans, dp);
-		if (error)
-			return error;
+
+		dac->flags |= XFS_DAC_DEFER_FINISH;
+		dac->dela_state = XFS_DAS_RM_SHRINK;
+		return -EAGAIN;
 	}
 
 	return error;
@@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
  *
  * This routine will find the blocks of the name to remove, remove them and
  * shirnk the tree if needed.
+ *
+ * This routine is meant to function as either an inline or delayed operation,
+ * and may return -EAGAIN when the transaction needs to be rolled.  Calling
+ * functions will need to handle this, and recall the function until a
+ * successful error code is returned.
  */
 STATIC int
-xfs_attr_node_removename(
-	struct xfs_da_args	*args)
+xfs_attr_node_removename_iter(
+	struct xfs_delattr_context	*dac)
 {
-	struct xfs_da_state	*state;
-	int			error;
-	struct xfs_inode	*dp = args->dp;
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_state		*state;
+	int				error;
+	struct xfs_inode		*dp = args->dp;
 
 	trace_xfs_attr_node_removename(args);
+	state = dac->da_state;
 
-	error = xfs_attr_node_removename_setup(args, &state);
-	if (error)
-		goto out;
+	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
+		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
+		error = xfs_attr_node_removename_setup(dac, &state);
+		if (error)
+			goto out;
+	}
 
-	error = xfs_attr_node_remove_step(args, state);
-	if (error)
-		goto out;
+	switch (dac->dela_state) {
+	case XFS_DAS_UNINIT:
+		error = xfs_attr_node_remove_step(dac);
+		if (error)
+			break;
 
-	/*
-	 * If the result is small enough, push it all into the inode.
-	 */
-	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
-		error = xfs_attr_node_shrink(args, state);
+		/* do not break, proceed to shrink if needed */
+	case XFS_DAS_RM_SHRINK:
+		/*
+		 * If the result is small enough, push it all into the inode.
+		 */
+		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
+			error = xfs_attr_node_shrink(args, state);
 
+		break;
+	default:
+		ASSERT(0);
+		return -EINVAL;
+	}
+
+	if (error == -EAGAIN)
+		return error;
 out:
 	if (state)
 		xfs_da_state_free(state);
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 3e97a93..64dcf0f 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -74,6 +74,74 @@ struct xfs_attr_list_context {
 };
 
 
+/*
+ * ========================================================================
+ * Structure used to pass context around among the delayed routines.
+ * ========================================================================
+ */
+
+/*
+ * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
+ * states indicate places where the function would return -EAGAIN, and then
+ * immediately resume from after being recalled by the calling function. States
+ * marked as a "subroutine state" indicate that they belong to a subroutine, and
+ * so the calling function needs to pass them back to that subroutine to allow
+ * it to finish where it left off. But they otherwise do not have a role in the
+ * calling function other than just passing through.
+ *
+ * xfs_attr_remove_iter()
+ *	  XFS_DAS_RM_SHRINK ─┐
+ *	  (subroutine state) │
+ *	                     └─>xfs_attr_node_removename()
+ *	                                      │
+ *	                                      v
+ *	                                   need to
+ *	                                shrink tree? ─n─┐
+ *	                                      │         │
+ *	                                      y         │
+ *	                                      │         │
+ *	                                      v         │
+ *	                              XFS_DAS_RM_SHRINK │
+ *	                                      │         │
+ *	                                      v         │
+ *	                                     done <─────┘
+ *
+ */
+
+/*
+ * Enum values for xfs_delattr_context.dela_state
+ *
+ * These values are used by delayed attribute operations to keep track  of where
+ * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
+ * calling function to roll the transaction, and then recall the subroutine to
+ * finish the operation.  The enum is then used by the subroutine to jump back
+ * to where it was and resume executing where it left off.
+ */
+enum xfs_delattr_state {
+	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
+	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
+};
+
+/*
+ * Defines for xfs_delattr_context.flags
+ */
+#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
+#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
+
+/*
+ * Context used for keeping track of delayed attribute operations
+ */
+struct xfs_delattr_context {
+	struct xfs_da_args      *da_args;
+
+	/* Used in xfs_attr_node_removename to roll through removing blocks */
+	struct xfs_da_state     *da_state;
+
+	/* Used to keep track of current state of delayed operation */
+	unsigned int            flags;
+	enum xfs_delattr_state  dela_state;
+};
+
 /*========================================================================
  * Function prototypes for the kernel.
  *========================================================================*/
@@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_args(struct xfs_da_args *args);
 int xfs_has_attr(struct xfs_da_args *args);
 int xfs_attr_remove_args(struct xfs_da_args *args);
+int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
+int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
 bool xfs_attr_namecheck(const void *name, size_t length);
+void xfs_delattr_context_init(struct xfs_delattr_context *dac,
+			      struct xfs_da_args *args);
 
 #endif	/* __XFS_ATTR_H__ */
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index bb128db..338377e 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -19,8 +19,8 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_bmap.h"
 #include "xfs_attr_sf.h"
-#include "xfs_attr_remote.h"
 #include "xfs_attr.h"
+#include "xfs_attr_remote.h"
 #include "xfs_attr_leaf.h"
 #include "xfs_error.h"
 #include "xfs_trace.h"
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 48d8e9c..1426c15 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
  */
 int
 xfs_attr_rmtval_remove(
-	struct xfs_da_args      *args)
+	struct xfs_da_args		*args)
 {
-	int			error;
-	int			retval;
+	int				error;
+	struct xfs_delattr_context	dac  = {
+		.da_args	= args,
+	};
 
 	trace_xfs_attr_rmtval_remove(args);
 
@@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
 	 * Keep de-allocating extents until the remote-value region is gone.
 	 */
 	do {
-		retval = __xfs_attr_rmtval_remove(args);
-		if (retval && retval != -EAGAIN)
-			return retval;
+		error = __xfs_attr_rmtval_remove(&dac);
+		if (error != -EAGAIN)
+			break;
 
-		/*
-		 * Close out trans and start the next one in the chain.
-		 */
-		error = xfs_trans_roll_inode(&args->trans, args->dp);
+		error = xfs_attr_trans_roll(&dac);
 		if (error)
 			return error;
-	} while (retval == -EAGAIN);
 
-	return 0;
+	} while (true);
+
+	return error;
 }
 
 /*
@@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
  */
 int
 __xfs_attr_rmtval_remove(
-	struct xfs_da_args	*args)
+	struct xfs_delattr_context	*dac)
 {
-	int			error, done;
+	struct xfs_da_args		*args = dac->da_args;
+	int				error, done;
 
 	/*
 	 * Unmap value blocks for this attr.
@@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
 	if (error)
 		return error;
 
-	error = xfs_defer_finish(&args->trans);
-	if (error)
-		return error;
-
-	if (!done)
+	if (!done) {
+		dac->flags |= XFS_DAC_DEFER_FINISH;
 		return -EAGAIN;
+	}
 
 	return error;
 }
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index 9eee615..002fd30 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
 int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
 		xfs_buf_flags_t incore_flags);
 int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
-int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
+int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
 #endif /* __XFS_ATTR_REMOTE_H__ */
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index bfad669..aaa7e66 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -15,10 +15,10 @@
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
 #include "xfs_inode.h"
+#include "xfs_attr.h"
 #include "xfs_attr_remote.h"
 #include "xfs_trans.h"
 #include "xfs_bmap.h"
-#include "xfs_attr.h"
 #include "xfs_attr_leaf.h"
 #include "xfs_quota.h"
 #include "xfs_dir2.h"
-- 
2.7.4



* [PATCH v13 03/10] xfs: Add delay ready attr set routines
  2020-10-23  6:34 [PATCH v13 00/10] xfs: Delayed Attributes Allison Henderson
  2020-10-23  6:34 ` [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step Allison Henderson
  2020-10-23  6:34 ` [PATCH v13 02/10] xfs: Add delay ready attr remove routines Allison Henderson
@ 2020-10-23  6:34 ` Allison Henderson
  2020-10-27 13:32   ` Chandan Babu R
  2020-11-10 23:10   ` Darrick J. Wong
  2020-10-23  6:34 ` [PATCH v13 04/10] xfs: Rename __xfs_attr_rmtval_remove Allison Henderson
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 58+ messages in thread
From: Allison Henderson @ 2020-10-23  6:34 UTC
  To: linux-xfs

This patch modifies the attr set routines to be delay ready. This means
they no longer roll or commit transactions, but instead return -EAGAIN
to have the calling routine roll and refresh the transaction.  In this
series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
state machine like switch to keep track of where it was when EAGAIN was
returned. See xfs_attr.h for a more detailed diagram of the states.

Two new helper functions have been added: xfs_attr_rmtval_set_init and
xfs_attr_rmtval_set_blk.  They provide a subset of logic similar to
xfs_attr_rmtval_set, but they store the current block in the delayed attr
context to allow the caller to roll the transaction between allocations.
This helps to simplify and consolidate code used by
xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
now become a simple loop to refresh the transaction until the operation
is completed.  Lastly, xfs_attr_rmtval_remove is no longer used, and is
removed.
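A minimal sketch of the incremental allocation idea: an init step records where allocation starts, and each later call maps one block, advances the cursor kept in the context, and returns -EAGAIN so the caller can roll between allocations. The `demo_*` names are hypothetical analogues of xfs_attr_rmtval_set_init() and xfs_attr_rmtval_set_blk(); their real signatures, and the exact point at which the real code returns -EAGAIN, are assumptions here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical context fields standing in for the delayed attr context. */
struct demo_rmt_ctx {
	long	lblkno;		/* next logical block to map */
	int	blkcnt;		/* blocks still to map */
	int	mapped;		/* blocks mapped so far */
};

/* Record where allocation starts and how much is needed. */
static void demo_rmtval_set_init(struct demo_rmt_ctx *ctx, long start, int count)
{
	assert(count >= 0);
	ctx->lblkno = start;
	ctx->blkcnt = count;
	ctx->mapped = 0;
}

/* Map one block and advance the cursor kept in the context. */
static int demo_rmtval_set_blk(struct demo_rmt_ctx *ctx)
{
	ctx->lblkno++;
	ctx->blkcnt--;
	ctx->mapped++;
	return 0;
}

/* Delay-ready step: one allocation per call, -EAGAIN so the caller can roll. */
static int demo_set_iter(struct demo_rmt_ctx *ctx)
{
	int	error;

	if (ctx->blkcnt > 0) {
		error = demo_rmtval_set_blk(ctx);
		if (error)
			return error;
		return -EAGAIN;
	}
	return 0;
}
```

Three blocks take three -EAGAIN round trips; the fourth call finds nothing left and returns 0. Because the cursor lives in the context rather than in locals, each recall picks up at the next block.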

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c        | 370 ++++++++++++++++++++++++++--------------
 fs/xfs/libxfs/xfs_attr.h        | 126 +++++++++++++-
 fs/xfs/libxfs/xfs_attr_remote.c |  99 +++++++----
 fs/xfs/libxfs/xfs_attr_remote.h |   4 +
 fs/xfs/xfs_trace.h              |   1 -
 5 files changed, 439 insertions(+), 161 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 6ca94cb..95c98d7 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -44,7 +44,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
  * Internal routines when attribute list is one block.
  */
 STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
-STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
+STATIC int xfs_attr_leaf_addname(struct xfs_delattr_context *dac);
 STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
 STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
 
@@ -52,12 +52,15 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
  * Internal routines when attribute list is more than one block.
  */
 STATIC int xfs_attr_node_get(xfs_da_args_t *args);
-STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
+STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
 STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
 STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 				 struct xfs_da_state **state);
 STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
 STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
+STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
+STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
+			     struct xfs_buf **leaf_bp);
 
 int
 xfs_inode_hasattr(
@@ -218,8 +221,11 @@ xfs_attr_is_shortform(
 
 /*
  * Attempts to set an attr in shortform, or converts short form to leaf form if
- * there is not enough room.  If the attr is set, the transaction is committed
- * and set to NULL.
+ * there is not enough room.  This function is meant to operate as a helper
+ * routine to the delayed attribute functions.  It returns -EAGAIN to indicate
+ * that the calling function should roll the transaction, and then proceed to
+ * add the attr in leaf form.  This subroutine does not expect to be recalled
+ * again like the other delayed attr routines do.
  */
 STATIC int
 xfs_attr_set_shortform(
@@ -227,16 +233,16 @@ xfs_attr_set_shortform(
 	struct xfs_buf		**leaf_bp)
 {
 	struct xfs_inode	*dp = args->dp;
-	int			error, error2 = 0;
+	int			error = 0;
 
 	/*
 	 * Try to add the attr to the attribute list in the inode.
 	 */
 	error = xfs_attr_try_sf_addname(dp, args);
+
+	/* Should only be 0, -EEXIST or -ENOSPC */
 	if (error != -ENOSPC) {
-		error2 = xfs_trans_commit(args->trans);
-		args->trans = NULL;
-		return error ? error : error2;
+		return error;
 	}
 	/*
 	 * It won't fit in the shortform, transform to a leaf block.  GROT:
@@ -249,18 +255,10 @@ xfs_attr_set_shortform(
 	/*
 	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
 	 * push cannot grab the half-baked leaf buffer and run into problems
-	 * with the write verifier. Once we're done rolling the transaction we
-	 * can release the hold and add the attr to the leaf.
+	 * with the write verifier.
 	 */
 	xfs_trans_bhold(args->trans, *leaf_bp);
-	error = xfs_defer_finish(&args->trans);
-	xfs_trans_bhold_release(args->trans, *leaf_bp);
-	if (error) {
-		xfs_trans_brelse(args->trans, *leaf_bp);
-		return error;
-	}
-
-	return 0;
+	return -EAGAIN;
 }
 
 /*
@@ -268,7 +266,7 @@ xfs_attr_set_shortform(
  * also checks for a defer finish.  Transaction is finished and rolled as
  * needed, and returns true or false if the delayed operation should continue.
  */
-int
+STATIC int
 xfs_attr_trans_roll(
 	struct xfs_delattr_context	*dac)
 {
@@ -297,61 +295,130 @@ int
 xfs_attr_set_args(
 	struct xfs_da_args	*args)
 {
-	struct xfs_inode	*dp = args->dp;
-	struct xfs_buf          *leaf_bp = NULL;
-	int			error = 0;
+	struct xfs_buf			*leaf_bp = NULL;
+	int				error = 0;
+	struct xfs_delattr_context	dac = {
+		.da_args	= args,
+	};
+
+	do {
+		error = xfs_attr_set_iter(&dac, &leaf_bp);
+		if (error != -EAGAIN)
+			break;
+
+		error = xfs_attr_trans_roll(&dac);
+		if (error)
+			return error;
+
+		if (leaf_bp) {
+			xfs_trans_bjoin(args->trans, leaf_bp);
+			xfs_trans_bhold(args->trans, leaf_bp);
+		}
+
+	} while (true);
+
+	return error;
+}
+
+/*
+ * Set the attribute specified in @args.
+ * This routine is meant to function as a delayed operation, and may return
+ * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
+ * to handle this, and recall the function until a successful error code is
+ * returned.
+ */
+STATIC int
+xfs_attr_set_iter(
+	struct xfs_delattr_context	*dac,
+	struct xfs_buf			**leaf_bp)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_inode		*dp = args->dp;
+	int				error = 0;
+
+	/* State machine switch */
+	switch (dac->dela_state) {
+	case XFS_DAS_FLIP_LFLAG:
+	case XFS_DAS_FOUND_LBLK:
+		goto das_leaf;
+	case XFS_DAS_FOUND_NBLK:
+	case XFS_DAS_FLIP_NFLAG:
+	case XFS_DAS_ALLOC_NODE:
+		goto das_node;
+	default:
+		break;
+	}
 
 	/*
 	 * If the attribute list is already in leaf format, jump straight to
 	 * leaf handling.  Otherwise, try to add the attribute to the shortform
 	 * list; if there's no room then convert the list to leaf format and try
-	 * again.
+	 * again. No need to set state as we will be in leaf form when we come
+	 * back.
 	 */
 	if (xfs_attr_is_shortform(dp)) {
 
 		/*
-		 * If the attr was successfully set in shortform, the
-		 * transaction is committed and set to NULL.  Otherwise, is it
-		 * converted from shortform to leaf, and the transaction is
-		 * retained.
+		 * If the attr was successfully set in shortform, no need to
+		 * continue.  Otherwise, it is converted from shortform to leaf
+		 * and -EAGAIN is returned.
 		 */
-		error = xfs_attr_set_shortform(args, &leaf_bp);
-		if (error || !args->trans)
-			return error;
+		error = xfs_attr_set_shortform(args, leaf_bp);
+		if (error == -EAGAIN)
+			dac->flags |= XFS_DAC_DEFER_FINISH;
+
+		return error;
 	}
 
-	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
-		error = xfs_attr_leaf_addname(args);
-		if (error != -ENOSPC)
-			return error;
+	/*
+	 * After a shortform to leaf conversion, we need to hold the leaf and
+	 * cycle out the transaction.  When we get back, we need to release
+	 * the leaf.
+	 */
+	if (*leaf_bp != NULL) {
+		xfs_trans_bhold_release(args->trans, *leaf_bp);
+		*leaf_bp = NULL;
+	}
 
-		/*
-		 * Promote the attribute list to the Btree format.
-		 */
-		error = xfs_attr3_leaf_to_node(args);
-		if (error)
-			return error;
+	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
+		error = xfs_attr_leaf_try_add(args, *leaf_bp);
+		switch (error) {
+		case -ENOSPC:
+			/*
+			 * Promote the attribute list to the Btree format.
+			 */
+			error = xfs_attr3_leaf_to_node(args);
+			if (error)
+				return error;
 
-		/*
-		 * Finish any deferred work items and roll the transaction once
-		 * more.  The goal here is to call node_addname with the inode
-		 * and transaction in the same state (inode locked and joined,
-		 * transaction clean) no matter how we got to this step.
-		 */
-		error = xfs_defer_finish(&args->trans);
-		if (error)
+			/*
+			 * Finish any deferred work items and roll the
+			 * transaction once more.  The goal here is to call
+			 * node_addname with the inode and transaction in the
+			 * same state (inode locked and joined, transaction
+			 * clean) no matter how we got to this step.
+			 */
+			dac->flags |= XFS_DAC_DEFER_FINISH;
+			return -EAGAIN;
+		case 0:
+			dac->dela_state = XFS_DAS_FOUND_LBLK;
+			return -EAGAIN;
+		default:
 			return error;
+		}
+das_leaf:
+		error = xfs_attr_leaf_addname(dac);
+		if (error == -ENOSPC)
+			/*
+			 * No need to set state.  We will be in node form when
+			 * we are recalled.
+			 */
+			return -EAGAIN;
 
-		/*
-		 * Commit the current trans (including the inode) and
-		 * start a new one.
-		 */
-		error = xfs_trans_roll_inode(&args->trans, dp);
-		if (error)
-			return error;
+		return error;
 	}
-
-	error = xfs_attr_node_addname(args);
+das_node:
+	error = xfs_attr_node_addname(dac);
 	return error;
 }
 
@@ -723,28 +790,30 @@ xfs_attr_leaf_try_add(
  *
  * This leaf block cannot have a "remote" value, we only call this routine
  * if bmap_one_block() says there is only one block (ie: no remote blks).
+ *
+ * This routine is meant to function as a delayed operation, and may return
+ * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
+ * to handle this, and recall the function until a successful error code is
+ * returned.
  */
 STATIC int
 xfs_attr_leaf_addname(
-	struct xfs_da_args	*args)
+	struct xfs_delattr_context	*dac)
 {
-	int			error, forkoff;
-	struct xfs_buf		*bp = NULL;
-	struct xfs_inode	*dp = args->dp;
-
-	trace_xfs_attr_leaf_addname(args);
-
-	error = xfs_attr_leaf_try_add(args, bp);
-	if (error)
-		return error;
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_buf			*bp = NULL;
+	int				error, forkoff;
+	struct xfs_inode		*dp = args->dp;
 
-	/*
-	 * Commit the transaction that added the attr name so that
-	 * later routines can manage their own transactions.
-	 */
-	error = xfs_trans_roll_inode(&args->trans, dp);
-	if (error)
-		return error;
+	/* State machine switch */
+	switch (dac->dela_state) {
+	case XFS_DAS_FLIP_LFLAG:
+		goto das_flip_flag;
+	case XFS_DAS_RM_LBLK:
+		goto das_rm_lblk;
+	default:
+		break;
+	}
 
 	/*
 	 * If there was an out-of-line value, allocate the blocks we
@@ -752,12 +821,34 @@ xfs_attr_leaf_addname(
 	 * after we create the attribute so that we don't overflow the
 	 * maximum size of a transaction and/or hit a deadlock.
 	 */
-	if (args->rmtblkno > 0) {
-		error = xfs_attr_rmtval_set(args);
+
+	/* Open coded xfs_attr_rmtval_set without trans handling */
+	if ((dac->flags & XFS_DAC_LEAF_ADDNAME_INIT) == 0) {
+		dac->flags |= XFS_DAC_LEAF_ADDNAME_INIT;
+		if (args->rmtblkno > 0) {
+			error = xfs_attr_rmtval_find_space(dac);
+			if (error)
+				return error;
+		}
+	}
+
+	/*
+	 * Roll through the "value", allocating blocks on disk as
+	 * required.
+	 */
+	if (dac->blkcnt > 0) {
+		error = xfs_attr_rmtval_set_blk(dac);
 		if (error)
 			return error;
+
+		dac->flags |= XFS_DAC_DEFER_FINISH;
+		return -EAGAIN;
 	}
 
+	error = xfs_attr_rmtval_set_value(args);
+	if (error)
+		return error;
+
 	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
 		/*
 		 * Added a "remote" value, just clear the incomplete flag.
@@ -777,29 +868,29 @@ xfs_attr_leaf_addname(
 	 * In a separate transaction, set the incomplete flag on the "old" attr
 	 * and clear the incomplete flag on the "new" attr.
 	 */
-
 	error = xfs_attr3_leaf_flipflags(args);
 	if (error)
 		return error;
 	/*
 	 * Commit the flag value change and start the next trans in series.
 	 */
-	error = xfs_trans_roll_inode(&args->trans, args->dp);
-	if (error)
-		return error;
-
+	dac->dela_state = XFS_DAS_FLIP_LFLAG;
+	return -EAGAIN;
+das_flip_flag:
 	/*
 	 * Dismantle the "old" attribute/value pair by removing a "remote" value
 	 * (if it exists).
 	 */
 	xfs_attr_restore_rmt_blk(args);
 
+	error = xfs_attr_rmtval_invalidate(args);
+	if (error)
+		return error;
+das_rm_lblk:
 	if (args->rmtblkno) {
-		error = xfs_attr_rmtval_invalidate(args);
-		if (error)
-			return error;
-
-		error = xfs_attr_rmtval_remove(args);
+		error = __xfs_attr_rmtval_remove(dac);
+		if (error == -EAGAIN)
+			dac->dela_state = XFS_DAS_RM_LBLK;
 		if (error)
 			return error;
 	}
@@ -965,23 +1056,38 @@ xfs_attr_node_hasname(
  *
  * "Remote" attribute values confuse the issue and atomic rename operations
  * add a whole extra layer of confusion on top of that.
+ *
+ * This routine is meant to function as a delayed operation, and may return
+ * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
+ * to handle this, and recall the function until a successful error code is
+ * returned.
  */
 STATIC int
 xfs_attr_node_addname(
-	struct xfs_da_args	*args)
+	struct xfs_delattr_context	*dac)
 {
-	struct xfs_da_state	*state;
-	struct xfs_da_state_blk	*blk;
-	struct xfs_inode	*dp;
-	int			retval, error;
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_state		*state = NULL;
+	struct xfs_da_state_blk		*blk;
+	int				retval = 0;
+	int				error = 0;
 
 	trace_xfs_attr_node_addname(args);
 
-	/*
-	 * Fill in bucket of arguments/results/context to carry around.
-	 */
-	dp = args->dp;
-restart:
+	/* State machine switch */
+	switch (dac->dela_state) {
+	case XFS_DAS_FLIP_NFLAG:
+		goto das_flip_flag;
+	case XFS_DAS_FOUND_NBLK:
+		goto das_found_nblk;
+	case XFS_DAS_ALLOC_NODE:
+		goto das_alloc_node;
+	case XFS_DAS_RM_NBLK:
+		goto das_rm_nblk;
+	default:
+		break;
+	}
+
 	/*
 	 * Search to see if name already exists, and get back a pointer
 	 * to where it should go.
@@ -1027,19 +1133,13 @@ xfs_attr_node_addname(
 			error = xfs_attr3_leaf_to_node(args);
 			if (error)
 				goto out;
-			error = xfs_defer_finish(&args->trans);
-			if (error)
-				goto out;
 
 			/*
-			 * Commit the node conversion and start the next
-			 * trans in the chain.
+			 * Restart routine from the top.  No need to set the
+			 * state.
 			 */
-			error = xfs_trans_roll_inode(&args->trans, dp);
-			if (error)
-				goto out;
-
-			goto restart;
+			dac->flags |= XFS_DAC_DEFER_FINISH;
+			return -EAGAIN;
 		}
 
 		/*
@@ -1051,9 +1151,7 @@ xfs_attr_node_addname(
 		error = xfs_da3_split(state);
 		if (error)
 			goto out;
-		error = xfs_defer_finish(&args->trans);
-		if (error)
-			goto out;
+		dac->flags |= XFS_DAC_DEFER_FINISH;
 	} else {
 		/*
 		 * Addition succeeded, update Btree hashvals.
@@ -1068,13 +1166,9 @@ xfs_attr_node_addname(
 	xfs_da_state_free(state);
 	state = NULL;
 
-	/*
-	 * Commit the leaf addition or btree split and start the next
-	 * trans in the chain.
-	 */
-	error = xfs_trans_roll_inode(&args->trans, dp);
-	if (error)
-		goto out;
+	dac->dela_state = XFS_DAS_FOUND_NBLK;
+	return -EAGAIN;
+das_found_nblk:
 
 	/*
 	 * If there was an out-of-line value, allocate the blocks we
@@ -1083,7 +1177,27 @@ xfs_attr_node_addname(
 	 * maximum size of a transaction and/or hit a deadlock.
 	 */
 	if (args->rmtblkno > 0) {
-		error = xfs_attr_rmtval_set(args);
+		/* Open coded xfs_attr_rmtval_set without trans handling */
+		error = xfs_attr_rmtval_find_space(dac);
+		if (error)
+			return error;
+
+		/*
+		 * Roll through the "value", allocating blocks on disk as
+		 * required.
+		 */
+das_alloc_node:
+		if (dac->blkcnt > 0) {
+			error = xfs_attr_rmtval_set_blk(dac);
+			if (error)
+				return error;
+
+			dac->flags |= XFS_DAC_DEFER_FINISH;
+			dac->dela_state = XFS_DAS_ALLOC_NODE;
+			return -EAGAIN;
+		}
+
+		error = xfs_attr_rmtval_set_value(args);
 		if (error)
 			return error;
 	}
@@ -1113,22 +1227,28 @@ xfs_attr_node_addname(
 	/*
 	 * Commit the flag value change and start the next trans in series
 	 */
-	error = xfs_trans_roll_inode(&args->trans, args->dp);
-	if (error)
-		goto out;
-
+	dac->dela_state = XFS_DAS_FLIP_NFLAG;
+	return -EAGAIN;
+das_flip_flag:
 	/*
 	 * Dismantle the "old" attribute/value pair by removing a "remote" value
 	 * (if it exists).
 	 */
 	xfs_attr_restore_rmt_blk(args);
 
+	error = xfs_attr_rmtval_invalidate(args);
+	if (error)
+		return error;
+
+das_rm_nblk:
 	if (args->rmtblkno) {
-		error = xfs_attr_rmtval_invalidate(args);
-		if (error)
-			return error;
+		error = __xfs_attr_rmtval_remove(dac);
+
+		if (error == -EAGAIN) {
+			dac->dela_state = XFS_DAS_RM_NBLK;
+			return -EAGAIN;
+		}
 
-		error = xfs_attr_rmtval_remove(args);
 		if (error)
 			return error;
 	}
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 64dcf0f..501f9df 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -106,6 +106,118 @@ struct xfs_attr_list_context {
  *	                                      v         │
  *	                                     done <─────┘
  *
+ *
+ * Below is a state machine diagram for attr set operations.
+ *
+ *  xfs_attr_set_iter()
+ *             │
+ *             v
+ *   ┌───n── fork has
+ *   │	    only 1 blk?
+ *   │		│
+ *   │		y
+ *   │		│
+ *   │		v
+ *   │	xfs_attr_leaf_try_add()
+ *   │		│
+ *   │		v
+ *   │	     had enough
+ *   ├───n────space?
+ *   │		│
+ *   │		y
+ *   │		│
+ *   │		v
+ *   │	XFS_DAS_FOUND_LBLK ──┐
+ *   │	                     │
+ *   │	XFS_DAS_FLIP_LFLAG ──┤
+ *   │	(subroutine state)   │
+ *   │		             │
+ *   │		             └─>xfs_attr_leaf_addname()
+ *   │		                      │
+ *   │		                      v
+ *   │		                   was this
+ *   │		                   a rename? ──n─┐
+ *   │		                      │          │
+ *   │		                      y          │
+ *   │		                      │          │
+ *   │		                      v          │
+ *   │		                flip incomplete  │
+ *   │		                    flag         │
+ *   │		                      │          │
+ *   │		                      v          │
+ *   │		              XFS_DAS_FLIP_LFLAG │
+ *   │		                      │          │
+ *   │		                      v          │
+ *   │		                    remove       │
+ *   │		XFS_DAS_RM_LBLK ─> old name      │
+ *   │		         ^            │          │
+ *   │		         │            v          │
+ *   │		         └──────y── more to      │
+ *   │		                    remove       │
+ *   │		                      │          │
+ *   │		                      n          │
+ *   │		                      │          │
+ *   │		                      v          │
+ *   │		                     done <──────┘
+ *   └──> XFS_DAS_FOUND_NBLK ──┐
+ *	  (subroutine state)   │
+ *	                       │
+ *	  XFS_DAS_ALLOC_NODE ──┤
+ *	  (subroutine state)   │
+ *	                       │
+ *	  XFS_DAS_FLIP_NFLAG ──┤
+ *	  (subroutine state)   │
+ *	                       │
+ *	                       └─>xfs_attr_node_addname()
+ *	                               │
+ *	                               v
+ *	                       find space to store
+ *	                      attr. Split if needed
+ *	                               │
+ *	                               v
+ *	                       XFS_DAS_FOUND_NBLK
+ *	                               │
+ *	                               v
+ *	                 ┌─────n──  need to
+ *	                 │        alloc blks?
+ *	                 │             │
+ *	                 │             y
+ *	                 │             │
+ *	                 │             v
+ *	                 │  ┌─>XFS_DAS_ALLOC_NODE
+ *	                 │  │          │
+ *	                 │  │          v
+ *	                 │  └──y── need to alloc
+ *	                 │         more blocks?
+ *	                 │             │
+ *	                 │             n
+ *	                 │             │
+ *	                 │             v
+ *	                 │          was this
+ *	                 └────────> a rename? ──n─┐
+ *	                               │          │
+ *	                               y          │
+ *	                               │          │
+ *	                               v          │
+ *	                         flip incomplete  │
+ *	                             flag         │
+ *	                               │          │
+ *	                               v          │
+ *	                       XFS_DAS_FLIP_NFLAG │
+ *	                               │          │
+ *	                               v          │
+ *	                             remove       │
+ *	         XFS_DAS_RM_NBLK ─> old name      │
+ *	                  ^            │          │
+ *	                  │            v          │
+ *	                  └──────y── more to      │
+ *	                             remove       │
+ *	                               │          │
+ *	                               n          │
+ *	                               │          │
+ *	                               v          │
+ *	                              done <──────┘
+ *
  */
 
 /*
@@ -120,6 +232,13 @@ struct xfs_attr_list_context {
 enum xfs_delattr_state {
 	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
 	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
+	XFS_DAS_FOUND_LBLK,	      /* We found leaf blk for attr */
+	XFS_DAS_FOUND_NBLK,	      /* We found node blk for attr */
+	XFS_DAS_FLIP_LFLAG,	      /* Flipped leaf INCOMPLETE attr flag */
+	XFS_DAS_RM_LBLK,	      /* A rename is removing leaf blocks */
+	XFS_DAS_ALLOC_NODE,	      /* We are allocating node blocks */
+	XFS_DAS_FLIP_NFLAG,	      /* Flipped node INCOMPLETE attr flag */
+	XFS_DAS_RM_NBLK,	      /* A rename is removing node blocks */
 };
 
 /*
@@ -127,6 +246,7 @@ enum xfs_delattr_state {
  */
 #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
 #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
+#define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init */
 
 /*
  * Context used for keeping track of delayed attribute operations
@@ -134,6 +254,11 @@ enum xfs_delattr_state {
 struct xfs_delattr_context {
 	struct xfs_da_args      *da_args;
 
+	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
+	struct xfs_bmbt_irec	map;
+	xfs_dablk_t		lblkno;
+	int			blkcnt;
+
 	/* Used in xfs_attr_node_removename to roll through removing blocks */
 	struct xfs_da_state     *da_state;
 
@@ -160,7 +285,6 @@ int xfs_attr_set_args(struct xfs_da_args *args);
 int xfs_has_attr(struct xfs_da_args *args);
 int xfs_attr_remove_args(struct xfs_da_args *args);
 int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
-int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
 bool xfs_attr_namecheck(const void *name, size_t length);
 void xfs_delattr_context_init(struct xfs_delattr_context *dac,
 			      struct xfs_da_args *args);
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 1426c15..5b445e7 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -441,7 +441,7 @@ xfs_attr_rmtval_get(
  * Find a "hole" in the attribute address space large enough for us to drop the
  * new attribute's value into
  */
-STATIC int
+int
 xfs_attr_rmt_find_hole(
 	struct xfs_da_args	*args)
 {
@@ -468,7 +468,7 @@ xfs_attr_rmt_find_hole(
 	return 0;
 }
 
-STATIC int
+int
 xfs_attr_rmtval_set_value(
 	struct xfs_da_args	*args)
 {
@@ -628,6 +628,69 @@ xfs_attr_rmtval_set(
 }
 
 /*
+ * Find a hole for the attr and store it in the delayed attr context.  This
+ * initializes the context to roll through allocating an attr extent for a
+ * delayed attr operation
+ */
+int
+xfs_attr_rmtval_find_space(
+	struct xfs_delattr_context	*dac)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_bmbt_irec		*map = &dac->map;
+	int				error;
+
+	dac->lblkno = 0;
+	dac->blkcnt = 0;
+	args->rmtblkcnt = 0;
+	args->rmtblkno = 0;
+	memset(map, 0, sizeof(struct xfs_bmbt_irec));
+
+	error = xfs_attr_rmt_find_hole(args);
+	if (error)
+		return error;
+
+	dac->blkcnt = args->rmtblkcnt;
+	dac->lblkno = args->rmtblkno;
+
+	return 0;
+}
+
+/*
+ * Write one block of the value associated with an attribute into the
+ * out-of-line buffer that we have defined for it. This is similar to a subset
+ * of xfs_attr_rmtval_set, but records the current block to the delayed attr
+ * context, and leaves transaction handling to the caller.
+ */
+int
+xfs_attr_rmtval_set_blk(
+	struct xfs_delattr_context	*dac)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_inode		*dp = args->dp;
+	struct xfs_bmbt_irec		*map = &dac->map;
+	int nmap;
+	int error;
+
+	nmap = 1;
+	error = xfs_bmapi_write(args->trans, dp, (xfs_fileoff_t)dac->lblkno,
+				dac->blkcnt, XFS_BMAPI_ATTRFORK, args->total,
+				map, &nmap);
+	if (error)
+		return error;
+
+	ASSERT(nmap == 1);
+	ASSERT((map->br_startblock != DELAYSTARTBLOCK) &&
+	       (map->br_startblock != HOLESTARTBLOCK));
+
+	/* roll attribute extent map forwards */
+	dac->lblkno += map->br_blockcount;
+	dac->blkcnt -= map->br_blockcount;
+
+	return 0;
+}
+
+/*
  * Remove the value associated with an attribute by deleting the
  * out-of-line buffer that it is stored on.
  */
@@ -669,38 +732,6 @@ xfs_attr_rmtval_invalidate(
 }
 
 /*
- * Remove the value associated with an attribute by deleting the
- * out-of-line buffer that it is stored on.
- */
-int
-xfs_attr_rmtval_remove(
-	struct xfs_da_args		*args)
-{
-	int				error;
-	struct xfs_delattr_context	dac  = {
-		.da_args	= args,
-	};
-
-	trace_xfs_attr_rmtval_remove(args);
-
-	/*
-	 * Keep de-allocating extents until the remote-value region is gone.
-	 */
-	do {
-		error = __xfs_attr_rmtval_remove(&dac);
-		if (error != -EAGAIN)
-			break;
-
-		error = xfs_attr_trans_roll(&dac);
-		if (error)
-			return error;
-
-	} while (true);
-
-	return error;
-}
-
-/*
  * Remove the value associated with an attribute by deleting the out-of-line
  * buffer that it is stored on. Returns EAGAIN for the caller to refresh the
  * transaction and re-call the function
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index 002fd30..84e2700 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -15,4 +15,8 @@ int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
 		xfs_buf_flags_t incore_flags);
 int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
 int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
+int xfs_attr_rmt_find_hole(struct xfs_da_args *args);
+int xfs_attr_rmtval_set_value(struct xfs_da_args *args);
+int xfs_attr_rmtval_set_blk(struct xfs_delattr_context *dac);
+int xfs_attr_rmtval_find_space(struct xfs_delattr_context *dac);
 #endif /* __XFS_ATTR_REMOTE_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 8695165..e9dde4e 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1925,7 +1925,6 @@ DEFINE_ATTR_EVENT(xfs_attr_refillstate);
 
 DEFINE_ATTR_EVENT(xfs_attr_rmtval_get);
 DEFINE_ATTR_EVENT(xfs_attr_rmtval_set);
-DEFINE_ATTR_EVENT(xfs_attr_rmtval_remove);
 
 #define DEFINE_DA_EVENT(name) \
 DEFINE_EVENT(xfs_da_class, name, \
-- 
2.7.4



* [PATCH v13 04/10] xfs: Rename __xfs_attr_rmtval_remove
  2020-10-23  6:34 [PATCH v13 00/10] xfs: Delayed Attributes Allison Henderson
                   ` (2 preceding siblings ...)
  2020-10-23  6:34 ` [PATCH v13 03/10] xfs: Add delay ready attr set routines Allison Henderson
@ 2020-10-23  6:34 ` Allison Henderson
  2020-10-23  6:34 ` [PATCH v13 05/10] xfs: Set up infastructure for deferred attribute operations Allison Henderson
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-10-23  6:34 UTC (permalink / raw)
  To: linux-xfs

From: Allison Collins <allison.henderson@oracle.com>

Now that xfs_attr_rmtval_remove is gone, rename __xfs_attr_rmtval_remove
to xfs_attr_rmtval_remove.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c        | 7 +++----
 fs/xfs/libxfs/xfs_attr_remote.c | 2 +-
 fs/xfs/libxfs/xfs_attr_remote.h | 3 +--
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 95c98d7..6453178 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -888,7 +888,7 @@ xfs_attr_leaf_addname(
 		return error;
 das_rm_lblk:
 	if (args->rmtblkno) {
-		error = __xfs_attr_rmtval_remove(dac);
+		error = xfs_attr_rmtval_remove(dac);
 		if (error == -EAGAIN)
 			dac->dela_state = XFS_DAS_RM_LBLK;
 		if (error)
@@ -1242,8 +1242,7 @@ xfs_attr_node_addname(
 
 das_rm_nblk:
 	if (args->rmtblkno) {
-		error = __xfs_attr_rmtval_remove(dac);
-
+		error = xfs_attr_rmtval_remove(dac);
 		if (error == -EAGAIN) {
 			dac->dela_state = XFS_DAS_RM_NBLK;
 			return -EAGAIN;
@@ -1397,7 +1396,7 @@ xfs_attr_node_remove_rmt (
 	/*
 	 * May return -EAGAIN to request that the caller recall this function
 	 */
-	error = __xfs_attr_rmtval_remove(dac);
+	error = xfs_attr_rmtval_remove(dac);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 5b445e7..45c4bc5 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -737,7 +737,7 @@ xfs_attr_rmtval_invalidate(
  * transaction and re-call the function
  */
 int
-__xfs_attr_rmtval_remove(
+xfs_attr_rmtval_remove(
 	struct xfs_delattr_context	*dac)
 {
 	struct xfs_da_args		*args = dac->da_args;
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index 84e2700..6ae91af 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -10,11 +10,10 @@ int xfs_attr3_rmt_blocks(struct xfs_mount *mp, int attrlen);
 
 int xfs_attr_rmtval_get(struct xfs_da_args *args);
 int xfs_attr_rmtval_set(struct xfs_da_args *args);
-int xfs_attr_rmtval_remove(struct xfs_da_args *args);
 int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
 		xfs_buf_flags_t incore_flags);
 int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
-int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
+int xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
 int xfs_attr_rmt_find_hole(struct xfs_da_args *args);
 int xfs_attr_rmtval_set_value(struct xfs_da_args *args);
 int xfs_attr_rmtval_set_blk(struct xfs_delattr_context *dac);
-- 
2.7.4



* [PATCH v13 05/10] xfs: Set up infastructure for deferred attribute operations
  2020-10-23  6:34 [PATCH v13 00/10] xfs: Delayed Attributes Allison Henderson
                   ` (3 preceding siblings ...)
  2020-10-23  6:34 ` [PATCH v13 04/10] xfs: Rename __xfs_attr_rmtval_remove Allison Henderson
@ 2020-10-23  6:34 ` Allison Henderson
  2020-11-10 21:51   ` Darrick J. Wong
  2020-10-23  6:34 ` [PATCH v13 06/10] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred Allison Henderson
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2020-10-23  6:34 UTC (permalink / raw)
  To: linux-xfs

Currently attributes are modified directly across one or more
transactions, but they are not logged or replayed in the event of an
error. The goal of delayed attributes is to enable logging and replaying
of attribute operations using the existing delayed operations
infrastructure.  This will later enable the attributes to become part of
larger multi-part operations that also must first be recorded to the
log.  This is mostly of interest for parent pointers, which would need
to maintain an attribute containing parent inode information any time
an inode is moved, created, or removed.  Parent pointers would then be
of interest to any feature that needs to quickly derive an inode path
from the mount point. Online scrub, NFS lookups, and filesystem grow or
shrink operations are all features that could take advantage of this.

This patch adds two new log item types for setting or removing
attributes as deferred operations.  The xfs_attri_log_item logs an
intent to set or remove an attribute.  The corresponding
xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
freed once the transaction is done.  Both log items use a generic
xfs_attr_log_format structure that contains the attribute name, value,
flags, inode, and an op_flag that indicates whether the operation is a
set or a remove.
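The intent/done pairing can be modeled in a few lines. This is a hypothetical sketch, not the xfs_attr_log_format layout or the recovery code in xfs_attr_item.c: the structures, op codes and demo_needs_replay() are invented for illustration. It shows the general idea behind intent/done log items: an intent with a matching done item has completed, while an intent without one describes an operation that may not have finished and must be replayed during log recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op codes, in the spirit of the op_flag described above. */
enum demo_attr_op { DEMO_ATTR_OP_SET = 1, DEMO_ATTR_OP_REMOVE = 2 };

/* Hypothetical intent record: what we intend to do, and to which inode. */
struct demo_attri {
	unsigned long		id;		/* identifies this intent */
	unsigned long long	ino;		/* inode owning the attr */
	enum demo_attr_op	op;		/* set or remove */
	char			name[32];
	char			value[32];	/* unused for remove */
};

/* Hypothetical done record: refers back to the intent it completes. */
struct demo_attrd {
	unsigned long		intent_id;
};

/*
 * Recovery check: an intent with no matching done record must be
 * replayed; one with a done record is already complete.
 */
static bool demo_needs_replay(const struct demo_attri *ip,
			      const struct demo_attrd *done, size_t ndone)
{
	for (size_t i = 0; i < ndone; i++)
		if (done[i].intent_id == ip->id)
			return false;
	return true;
}
```

Keeping the done record as a small back-reference, rather than a copy of the name and value, keeps the completion cheap to log; only the intent needs to carry enough information to redo the operation.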

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/Makefile                 |   1 +
 fs/xfs/libxfs/xfs_attr.c        |   7 +-
 fs/xfs/libxfs/xfs_attr.h        |  19 +
 fs/xfs/libxfs/xfs_defer.c       |   1 +
 fs/xfs/libxfs/xfs_defer.h       |   3 +
 fs/xfs/libxfs/xfs_format.h      |   5 +
 fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
 fs/xfs/libxfs/xfs_log_recover.h |   2 +
 fs/xfs/libxfs/xfs_types.h       |   1 +
 fs/xfs/scrub/common.c           |   2 +
 fs/xfs/xfs_acl.c                |   2 +
 fs/xfs/xfs_attr_item.c          | 750 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_attr_item.h          |  76 ++++
 fs/xfs/xfs_attr_list.c          |   1 +
 fs/xfs/xfs_ioctl.c              |   2 +
 fs/xfs/xfs_ioctl32.c            |   2 +
 fs/xfs/xfs_iops.c               |   2 +
 fs/xfs/xfs_log.c                |   4 +
 fs/xfs/xfs_log_recover.c        |   2 +
 fs/xfs/xfs_ondisk.h             |   2 +
 fs/xfs/xfs_xattr.c              |   1 +
 21 files changed, 923 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 04611a1..b056cfc 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
 				   xfs_buf_item_recover.o \
 				   xfs_dquot_item_recover.o \
 				   xfs_extfree_item.o \
+				   xfs_attr_item.o \
 				   xfs_icreate_item.o \
 				   xfs_inode_item.o \
 				   xfs_inode_item_recover.o \
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 6453178..760383c 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -24,6 +24,7 @@
 #include "xfs_quota.h"
 #include "xfs_trans_space.h"
 #include "xfs_trace.h"
+#include "xfs_attr_item.h"
 
 /*
  * xfs_attr.c
@@ -59,8 +60,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
 STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
 STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
-STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
-			     struct xfs_buf **leaf_bp);
 
 int
 xfs_inode_hasattr(
@@ -142,7 +141,7 @@ xfs_attr_get(
 /*
  * Calculate how many blocks we need for the new attribute,
  */
-STATIC int
+int
 xfs_attr_calc_size(
 	struct xfs_da_args	*args,
 	int			*local)
@@ -327,7 +326,7 @@ xfs_attr_set_args(
  * to handle this, and recall the function until a successful error code is
  * returned.
  */
-STATIC int
+int
 xfs_attr_set_iter(
 	struct xfs_delattr_context	*dac,
 	struct xfs_buf			**leaf_bp)
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 501f9df..5b4a1ca 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -247,6 +247,7 @@ enum xfs_delattr_state {
 #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
 #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
 #define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
+#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
 
 /*
  * Context used for keeping track of delayed attribute operations
@@ -254,6 +255,9 @@ enum xfs_delattr_state {
 struct xfs_delattr_context {
 	struct xfs_da_args      *da_args;
 
+	/* Used by delayed attributes to hold leaf across transactions */
+	struct xfs_buf		*leaf_bp;
+
 	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
 	struct xfs_bmbt_irec	map;
 	xfs_dablk_t		lblkno;
@@ -267,6 +271,18 @@ struct xfs_delattr_context {
 	enum xfs_delattr_state  dela_state;
 };
 
+/*
+ * List of attrs to commit later.
+ */
+struct xfs_attr_item {
+	struct xfs_delattr_context	xattri_dac;
+	uint32_t			xattri_op_flags;/* attr op set or rm */
+
+	/* used to log this item to an intent */
+	struct list_head		xattri_list;
+};
+
+
 /*========================================================================
  * Function prototypes for the kernel.
  *========================================================================*/
@@ -282,11 +298,14 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_args(struct xfs_da_args *args);
+int xfs_attr_set_iter(struct xfs_delattr_context *dac,
+		      struct xfs_buf **leaf_bp);
 int xfs_has_attr(struct xfs_da_args *args);
 int xfs_attr_remove_args(struct xfs_da_args *args);
 int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
 bool xfs_attr_namecheck(const void *name, size_t length);
 void xfs_delattr_context_init(struct xfs_delattr_context *dac,
 			      struct xfs_da_args *args);
+int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 
 #endif	/* __XFS_ATTR_H__ */
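
xfs_attr_set_iter() is exported here so the deferred-op code can drive it.
Its contract is that it returns -EAGAIN when the caller must roll the
transaction, and the caller keeps calling it until it returns something
else, with progress kept in struct xfs_delattr_context (which now also
carries leaf_bp). A minimal sketch of that calling contract, with
hypothetical model_* states and a counter standing in for the transaction
roll:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states, loosely echoing enum xfs_delattr_state. */
enum model_state { MODEL_INIT, MODEL_ALLOC_BLK, MODEL_COPY_VALUE, MODEL_DONE };

/* Hypothetical stand-in for struct xfs_delattr_context. */
struct model_dac {
	enum model_state	state;
	int			blks_left;	/* remote blocks still to allocate */
};

/*
 * Do one step of work per call.  Returns -EAGAIN when the caller must roll
 * the transaction and call again, 0 when the operation is complete.
 */
static int model_set_iter(struct model_dac *dac)
{
	switch (dac->state) {
	case MODEL_INIT:
		dac->state = MODEL_ALLOC_BLK;
		return -EAGAIN;
	case MODEL_ALLOC_BLK:
		if (--dac->blks_left > 0)
			return -EAGAIN;		/* one block per transaction */
		dac->state = MODEL_COPY_VALUE;
		return -EAGAIN;
	case MODEL_COPY_VALUE:
		dac->state = MODEL_DONE;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Caller side: "roll" (here just count) and recall until done or error. */
static int model_run(struct model_dac *dac, int *rolls)
{
	int error;

	*rolls = 0;
	while ((error = model_set_iter(dac)) == -EAGAIN)
		(*rolls)++;	/* a real caller would roll the transaction here */
	return error;
}
```

Keeping all progress in the context structure is what lets the same step
function be driven either inline or from a deferred operation's
finish_item callback.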
diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index eff4a12..e9caff7 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -178,6 +178,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
 	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
 	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
 	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
+	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
 };
 
 static void
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 05472f7..72a5789 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -19,6 +19,7 @@ enum xfs_defer_ops_type {
 	XFS_DEFER_OPS_TYPE_RMAP,
 	XFS_DEFER_OPS_TYPE_FREE,
 	XFS_DEFER_OPS_TYPE_AGFL_FREE,
+	XFS_DEFER_OPS_TYPE_ATTR,
 	XFS_DEFER_OPS_TYPE_MAX,
 };
 
@@ -63,6 +64,8 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
 extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
 extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
+extern const struct xfs_defer_op_type xfs_attr_defer_type;
+
 
 /*
  * This structure enables a dfops user to detach the chain of deferred
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index dd764da..d419c34 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -584,6 +584,11 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT);
 }
 
+static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)
+{
+	return false;
+}
+
 /*
  * end of superblock version macros
  */
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 8bd00da..de6309d 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -117,7 +117,12 @@ struct xfs_unmount_log_format {
 #define XLOG_REG_TYPE_CUD_FORMAT	24
 #define XLOG_REG_TYPE_BUI_FORMAT	25
 #define XLOG_REG_TYPE_BUD_FORMAT	26
-#define XLOG_REG_TYPE_MAX		26
+#define XLOG_REG_TYPE_ATTRI_FORMAT	27
+#define XLOG_REG_TYPE_ATTRD_FORMAT	28
+#define XLOG_REG_TYPE_ATTR_NAME	29
+#define XLOG_REG_TYPE_ATTR_VALUE	30
+#define XLOG_REG_TYPE_MAX		30
+
 
 /*
  * Flags to log operation header
@@ -240,6 +245,8 @@ typedef struct xfs_trans_header {
 #define	XFS_LI_CUD		0x1243
 #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
 #define	XFS_LI_BUD		0x1245
+#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
+#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
 
 #define XFS_LI_TYPE_DESC \
 	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
@@ -255,7 +262,9 @@ typedef struct xfs_trans_header {
 	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
 	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
 	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
-	{ XFS_LI_BUD,		"XFS_LI_BUD" }
+	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
+	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
+	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
 
 /*
  * Inode Log Item Format definitions.
@@ -863,4 +872,35 @@ struct xfs_icreate_log {
 	__be32		icl_gen;	/* inode generation number to use */
 };
 
+/*
+ * Flags for deferred attribute operations.
+ * Upper bits are flags, the lower byte is the type code.
+ */
+#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
+#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
+#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF	/* Flags type mask */
+
+/*
+ * This is the structure used to lay out an attr log item in the
+ * log.
+ */
+struct xfs_attri_log_format {
+	uint16_t	alfi_type;	/* attri log item type */
+	uint16_t	alfi_size;	/* size of this item */
+	uint32_t	__pad;		/* pad to 64 bit aligned */
+	uint64_t	alfi_id;	/* attri identifier */
+	xfs_ino_t	alfi_ino;	/* the inode for this attr operation */
+	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
+	uint32_t	alfi_name_len;	/* attr name length */
+	uint32_t	alfi_value_len;	/* attr value length */
+	uint32_t	alfi_attr_flags;/* attr flags */
+};
+
+struct xfs_attrd_log_format {
+	uint16_t	alfd_type;	/* attrd log item type */
+	uint16_t	alfd_size;	/* size of this item */
+	uint32_t	__pad;		/* pad to 64 bit aligned */
+	uint64_t	alfd_alf_id;	/* id of corresponding attri */
+};
+
 #endif /* __XFS_LOG_FORMAT_H__ */
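
A userspace mirror of the two new on-disk log structures can be used to
check their layout: the explicit pad after the 16-bit type and size fields
keeps the 64-bit id and inode number naturally aligned, which by the field
list gives 40 bytes for the ATTRI format and 16 for the ATTRD format. The
attri_fmt/attrd_fmt names and the attr_op_type() helper are hypothetical,
and xfs_ino_t is assumed to be 64 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XFS_ATTR_OP_FLAGS_SET		1	/* values copied from the patch */
#define XFS_ATTR_OP_FLAGS_REMOVE	2
#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF

/* Userspace mirror of struct xfs_attri_log_format. */
struct attri_fmt {
	uint16_t	alfi_type;
	uint16_t	alfi_size;
	uint32_t	pad;		/* keeps the 64-bit fields aligned */
	uint64_t	alfi_id;
	uint64_t	alfi_ino;	/* assumes 64-bit xfs_ino_t */
	uint32_t	alfi_op_flags;
	uint32_t	alfi_name_len;
	uint32_t	alfi_value_len;
	uint32_t	alfi_attr_flags;
};

/* Userspace mirror of struct xfs_attrd_log_format. */
struct attrd_fmt {
	uint16_t	alfd_type;
	uint16_t	alfd_size;
	uint32_t	pad;
	uint64_t	alfd_alf_id;
};

_Static_assert(offsetof(struct attri_fmt, alfi_id) == 8, "id at offset 8");
_Static_assert(offsetof(struct attri_fmt, alfi_ino) == 16, "ino at offset 16");
_Static_assert(sizeof(struct attri_fmt) == 40, "attri format is 40 bytes");
_Static_assert(sizeof(struct attrd_fmt) == 16, "attrd format is 16 bytes");

/* Extract the operation type code from the low byte of alfi_op_flags. */
static unsigned int attr_op_type(uint32_t op_flags)
{
	return op_flags & XFS_ATTR_OP_FLAGS_TYPE_MASK;
}
```

Note that xfs_attri_item_recover() in this patch compares alfi_op_flags
directly against XFS_ATTR_OP_FLAGS_SET and XFS_ATTR_OP_FLAGS_REMOVE rather
than masking with XFS_ATTR_OP_FLAGS_TYPE_MASK first, so a record carrying
any upper flag bit would currently be rejected as corrupt.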
diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 3cca2bf..b6e5514 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
 extern const struct xlog_recover_item_ops xlog_rud_item_ops;
 extern const struct xlog_recover_item_ops xlog_cui_item_ops;
 extern const struct xlog_recover_item_ops xlog_cud_item_ops;
+extern const struct xlog_recover_item_ops xlog_attri_item_ops;
+extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
 
 /*
  * Macros, structures, prototypes for internal log manager use.
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index 397d947..860cdd2 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -11,6 +11,7 @@ typedef uint32_t	prid_t;		/* project ID */
 typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
 typedef uint32_t	xfs_agino_t;	/* inode # within allocation grp */
 typedef uint32_t	xfs_extlen_t;	/* extent length in blocks */
+typedef uint32_t	xfs_attrlen_t;	/* attr length */
 typedef uint32_t	xfs_agnumber_t;	/* allocation group number */
 typedef int32_t		xfs_extnum_t;	/* # of extents in a file */
 typedef int16_t		xfs_aextnum_t;	/* # extents in an attribute fork */
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 1887605..9a649d1 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -24,6 +24,8 @@
 #include "xfs_rmap_btree.h"
 #include "xfs_log.h"
 #include "xfs_trans_priv.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_reflink.h"
 #include "scrub/scrub.h"
diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index c544951..cad1db4 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -10,6 +10,8 @@
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
 #include "xfs_inode.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_trace.h"
 #include "xfs_error.h"
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
new file mode 100644
index 0000000..3980066
--- /dev/null
+++ b/fs/xfs/xfs_attr_item.c
@@ -0,0 +1,750 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Allison Collins <allison.henderson@oracle.com>
+ */
+
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_shared.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_trans.h"
+#include "xfs_trans_priv.h"
+#include "xfs_buf_item.h"
+#include "xfs_attr_item.h"
+#include "xfs_log.h"
+#include "xfs_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr.h"
+#include "xfs_shared.h"
+#include "xfs_attr_item.h"
+#include "xfs_alloc.h"
+#include "xfs_bmap.h"
+#include "xfs_trace.h"
+#include "libxfs/xfs_da_format.h"
+#include "xfs_inode.h"
+#include "xfs_quota.h"
+#include "xfs_log_priv.h"
+#include "xfs_log_recover.h"
+
+static const struct xfs_item_ops xfs_attri_item_ops;
+static const struct xfs_item_ops xfs_attrd_item_ops;
+
+static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
+{
+	return container_of(lip, struct xfs_attri_log_item, attri_item);
+}
+
+STATIC void
+xfs_attri_item_free(
+	struct xfs_attri_log_item	*attrip)
+{
+	kmem_free(attrip->attri_item.li_lv_shadow);
+	kmem_free(attrip);
+}
+
+/*
+ * Freeing the attrip requires that we remove it from the AIL if it has already
+ * been placed there. However, the ATTRI may not yet have been placed in the
+ * AIL when called by xfs_attri_release() from ATTRD processing due to the
+ * ordering of committed vs unpin operations in bulk insert operations. Hence
+ * the reference count to ensure only the last caller frees the ATTRI.
+ */
+STATIC void
+xfs_attri_release(
+	struct xfs_attri_log_item	*attrip)
+{
+	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
+	if (atomic_dec_and_test(&attrip->attri_refcount)) {
+		xfs_trans_ail_delete(&attrip->attri_item,
+				     SHUTDOWN_LOG_IO_ERROR);
+		xfs_attri_item_free(attrip);
+	}
+}
+
+/*
+ * This returns the size of the xfs_attri_log_format structure, which is
+ * logged in the first iovec of an attri item.  The name and the optional
+ * value are logged in separate iovecs; see xfs_attri_item_size().
+ */
+static inline int
+xfs_attri_item_sizeof(
+	struct xfs_attri_log_item *attrip)
+{
+	return sizeof(struct xfs_attri_log_format);
+}
+
+STATIC void
+xfs_attri_item_size(
+	struct xfs_log_item	*lip,
+	int			*nvecs,
+	int			*nbytes)
+{
+	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
+
+	*nvecs += 1;
+	*nbytes += xfs_attri_item_sizeof(attrip);
+
+	/* Attr set and remove operations require a name */
+	ASSERT(attrip->attri_name_len > 0);
+
+	*nvecs += 1;
+	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
+
+	/*
+	 * Set ops can accept a value of 0 len to clear an attr value.  Remove
+	 * ops do not need a value at all.  So only account for the value
+	 * when it is needed.
+	 */
+	if (attrip->attri_value_len > 0) {
+		*nvecs += 1;
+		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
+	}
+}
+
+/*
+ * This is called to fill in the log iovecs for the given attri log
+ * item. We use 1 iovec for the attri log format, 1 for the name, and
+ * another for the value if it is present.
+ */
+STATIC void
+xfs_attri_item_format(
+	struct xfs_log_item	*lip,
+	struct xfs_log_vec	*lv)
+{
+	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
+	struct xfs_log_iovec		*vecp = NULL;
+
+	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
+	attrip->attri_format.alfi_size = 1;
+
+	/*
+	 * This size accounting must be done before copying the attrip into the
+	 * iovec.  If we do it after, the wrong size will be recorded to the log
+	 * and we trip across assertion checks for bad region sizes later during
+	 * the log recovery.
+	 */
+
+	ASSERT(attrip->attri_name_len > 0);
+	attrip->attri_format.alfi_size++;
+
+	if (attrip->attri_value_len > 0)
+		attrip->attri_format.alfi_size++;
+
+	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
+			&attrip->attri_format,
+			xfs_attri_item_sizeof(attrip));
+	if (attrip->attri_name_len > 0)
+		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
+				attrip->attri_name,
+				ATTR_NVEC_SIZE(attrip->attri_name_len));
+
+	if (attrip->attri_value_len > 0)
+		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
+				attrip->attri_value,
+				ATTR_NVEC_SIZE(attrip->attri_value_len));
+}
+
+/*
+ * The unpin operation is the last place an ATTRI is manipulated in the log. It
+ * is either inserted in the AIL or aborted in the event of a log I/O error. In
+ * either case, the ATTRI transaction has been successfully committed to make
+ * it this far. Therefore, we expect whoever committed the ATTRI to either
+ * construct and commit the ATTRD or drop the ATTRD's reference in the event of
+ * error. Simply drop the log's ATTRI reference now that the log is done with
+ * it.
+ */
+STATIC void
+xfs_attri_item_unpin(
+	struct xfs_log_item	*lip,
+	int			remove)
+{
+	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
+
+	xfs_attri_release(attrip);
+}
+
+
+STATIC void
+xfs_attri_item_release(
+	struct xfs_log_item	*lip)
+{
+	xfs_attri_release(ATTRI_ITEM(lip));
+}
+
+/*
+ * Allocate and initialize an attri item
+ */
+STATIC struct xfs_attri_log_item *
+xfs_attri_init(
+	struct xfs_mount	*mp)
+
+{
+	struct xfs_attri_log_item	*attrip;
+	uint				size;
+
+	size = (uint)(sizeof(struct xfs_attri_log_item));
+	attrip = kmem_zalloc(size, 0);
+
+	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
+			  &xfs_attri_item_ops);
+	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
+	atomic_set(&attrip->attri_refcount, 2);
+
+	return attrip;
+}
+
+/*
+ * Copy an attr format buffer from the given buf into the destination attr
+ * format structure, rejecting a buffer of the wrong size.
+ */
+STATIC int
+xfs_attri_copy_format(struct xfs_log_iovec *buf,
+		      struct xfs_attri_log_format *dst_attr_fmt)
+{
+	struct xfs_attri_log_format *src_attr_fmt = buf->i_addr;
+	uint len = sizeof(struct xfs_attri_log_format);
+
+	if (buf->i_len != len)
+		return -EFSCORRUPTED;
+
+	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
+	return 0;
+}
+
+static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
+{
+	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
+}
+
+STATIC void
+xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
+{
+	kmem_free(attrdp->attrd_item.li_lv_shadow);
+	kmem_free(attrdp);
+}
+
+/*
+ * This returns the size of the structure needed to log the given attrd
+ * item.  An attrd item needs only 1 iovec, which logs the
+ * xfs_attrd_log_format structure.
+ */
+static inline int
+xfs_attrd_item_sizeof(
+	struct xfs_attrd_log_item *attrdp)
+{
+	return sizeof(struct xfs_attrd_log_format);
+}
+
+STATIC void
+xfs_attrd_item_size(
+	struct xfs_log_item	*lip,
+	int			*nvecs,
+	int			*nbytes)
+{
+	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
+	*nvecs += 1;
+	*nbytes += xfs_attrd_item_sizeof(attrdp);
+}
+
+/*
+ * This is called to fill in the log iovecs for the given attrd log item. We use
+ * only 1 iovec, which is filled by copying the xfs_attrd_log_format structure
+ * embedded in the attrd item.
+ */
+STATIC void
+xfs_attrd_item_format(
+	struct xfs_log_item	*lip,
+	struct xfs_log_vec	*lv)
+{
+	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
+	struct xfs_log_iovec		*vecp = NULL;
+
+	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
+	attrdp->attrd_format.alfd_size = 1;
+
+	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
+			&attrdp->attrd_format, xfs_attrd_item_sizeof(attrdp));
+}
+
+/*
+ * The ATTRD is either committed or aborted if the transaction is cancelled. If
+ * the transaction is cancelled, drop our reference to the ATTRI and free the
+ * ATTRD.
+ */
+STATIC void
+xfs_attrd_item_release(
+	struct xfs_log_item     *lip)
+{
+	struct xfs_attrd_log_item *attrdp = ATTRD_ITEM(lip);
+	xfs_attri_release(attrdp->attrd_attrip);
+	xfs_attrd_item_free(attrdp);
+}
+
+/*
+ * Perform the attr set or remove described by the ATTRI and, when delayed
+ * attributes are enabled, dirty the ATTRD so that it gets logged.  The
+ * transaction is marked dirty whether the operation succeeds or fails, to
+ * support the ATTRI/ATTRD lifecycle rules.
+ */
+int
+xfs_trans_attr(
+	struct xfs_delattr_context	*dac,
+	struct xfs_attrd_log_item	*attrdp,
+	struct xfs_buf			**leaf_bp,
+	uint32_t			op_flags)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	int				error;
+
+	error = xfs_qm_dqattach_locked(args->dp, 0);
+	if (error)
+		return error;
+
+	switch (op_flags) {
+	case XFS_ATTR_OP_FLAGS_SET:
+		args->op_flags |= XFS_DA_OP_ADDNAME;
+		error = xfs_attr_set_iter(dac, leaf_bp);
+		break;
+	case XFS_ATTR_OP_FLAGS_REMOVE:
+		ASSERT(XFS_IFORK_Q((args->dp)));
+		error = xfs_attr_remove_iter(dac);
+		break;
+	default:
+		error = -EFSCORRUPTED;
+		break;
+	}
+
+	/*
+	 * Mark the transaction dirty, even on error. This ensures the
+	 * transaction is aborted, which:
+	 *
+	 * 1.) releases the ATTRI and frees the ATTRD
+	 * 2.) shuts down the filesystem
+	 */
+	args->trans->t_flags |= XFS_TRANS_DIRTY;
+	if (xfs_sb_version_hasdelattr(&args->dp->i_mount->m_sb))
+		set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
+
+	return error;
+}
+
+/* Log an attr to the intent item. */
+STATIC void
+xfs_attr_log_item(
+	struct xfs_trans		*tp,
+	struct xfs_attri_log_item	*attrip,
+	struct xfs_attr_item		*attr)
+{
+	struct xfs_attri_log_format	*attrp;
+
+	tp->t_flags |= XFS_TRANS_DIRTY;
+	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
+
+	/*
+	 * At this point the xfs_attr_item has been constructed, and we've
+	 * created the log intent. Fill in the attri log item and log format
+	 * structure with fields from this xfs_attr_item
+	 */
+	attrp = &attrip->attri_format;
+	attrp->alfi_ino = attr->xattri_dac.da_args->dp->i_ino;
+	attrp->alfi_op_flags = attr->xattri_op_flags;
+	attrp->alfi_value_len = attr->xattri_dac.da_args->valuelen;
+	attrp->alfi_name_len = attr->xattri_dac.da_args->namelen;
+	attrp->alfi_attr_flags = attr->xattri_dac.da_args->attr_filter;
+
+	attrip->attri_name = (void *)attr->xattri_dac.da_args->name;
+	attrip->attri_value = attr->xattri_dac.da_args->value;
+	attrip->attri_name_len = attr->xattri_dac.da_args->namelen;
+	attrip->attri_value_len = attr->xattri_dac.da_args->valuelen;
+}
+
+/* Get an ATTRI. */
+static struct xfs_log_item *
+xfs_attr_create_intent(
+	struct xfs_trans		*tp,
+	struct list_head		*items,
+	unsigned int			count,
+	bool				sort)
+{
+	struct xfs_mount		*mp = tp->t_mountp;
+	struct xfs_attri_log_item	*attrip;
+	struct xfs_attr_item		*attr;
+
+	ASSERT(count == 1);
+
+	if (!xfs_sb_version_hasdelattr(&mp->m_sb))
+		return NULL;
+
+	attrip = xfs_attri_init(mp);
+	xfs_trans_add_item(tp, &attrip->attri_item);
+	list_for_each_entry(attr, items, xattri_list)
+		xfs_attr_log_item(tp, attrip, attr);
+	return &attrip->attri_item;
+}
+
+/* Process an attr. */
+STATIC int
+xfs_attr_finish_item(
+	struct xfs_trans		*tp,
+	struct xfs_log_item		*done,
+	struct list_head		*item,
+	struct xfs_btree_cur		**state)
+{
+	struct xfs_attr_item		*attr;
+	int				error;
+	struct xfs_delattr_context	*dac;
+	struct xfs_attrd_log_item	*attrdp;
+	struct xfs_attri_log_item	*attrip;
+
+	attr = container_of(item, struct xfs_attr_item, xattri_list);
+	dac = &attr->xattri_dac;
+
+	/*
+	 * Always reset trans after EAGAIN cycle
+	 * since the transaction is new
+	 */
+	dac->da_args->trans = tp;
+
+	error = xfs_trans_attr(dac, ATTRD_ITEM(done), &dac->leaf_bp,
+			       attr->xattri_op_flags);
+	/*
+	 * The attrip refers to xfs_attr_item memory to log the name and value
+	 * with the intent item. This already occurred when the intent was
+	 * committed so these fields are no longer accessed. Clear them out of
+	 * caution since we're about to free the xfs_attr_item.
+	 */
+	if (xfs_sb_version_hasdelattr(&dac->da_args->dp->i_mount->m_sb)) {
+		attrdp = ATTRD_ITEM(done);
+		attrip = attrdp->attrd_attrip;
+		attrip->attri_name = NULL;
+		attrip->attri_value = NULL;
+	}
+
+	if (error != -EAGAIN)
+		kmem_free(attr);
+
+	return error;
+}
+
+/* Abort all pending ATTRs. */
+STATIC void
+xfs_attr_abort_intent(
+	struct xfs_log_item		*intent)
+{
+	xfs_attri_release(ATTRI_ITEM(intent));
+}
+
+/* Cancel an attr */
+STATIC void
+xfs_attr_cancel_item(
+	struct list_head		*item)
+{
+	struct xfs_attr_item		*attr;
+
+	attr = container_of(item, struct xfs_attr_item, xattri_list);
+	kmem_free(attr);
+}
+
+/*
+ * The ATTRI is logged only once and cannot be moved in the log, so simply
+ * return the lsn at which it's been logged.
+ */
+STATIC xfs_lsn_t
+xfs_attri_item_committed(
+	struct xfs_log_item	*lip,
+	xfs_lsn_t		lsn)
+{
+	return lsn;
+}
+
+STATIC void
+xfs_attri_item_committing(
+	struct xfs_log_item	*lip,
+	xfs_lsn_t		lsn)
+{
+}
+
+STATIC bool
+xfs_attri_item_match(
+	struct xfs_log_item	*lip,
+	uint64_t		intent_id)
+{
+	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
+}
+
+/*
+ * When the attrd item is committed to disk, all we need to do is delete our
+ * reference to our partner attri item and then free ourselves. Since we're
+ * freeing ourselves we must return -1 to keep the transaction code from
+ * further referencing this item.
+ */
+STATIC xfs_lsn_t
+xfs_attrd_item_committed(
+	struct xfs_log_item	*lip,
+	xfs_lsn_t		lsn)
+{
+	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
+
+	/*
+	 * Drop the ATTRI reference regardless of whether the ATTRD has been
+	 * aborted. Once the ATTRD transaction is constructed, it is the sole
+	 * responsibility of the ATTRD to release the ATTRI (even if the ATTRI
+	 * is aborted due to log I/O error).
+	 */
+	xfs_attri_release(attrdp->attrd_attrip);
+	xfs_attrd_item_free(attrdp);
+
+	return NULLCOMMITLSN;
+}
+
+STATIC void
+xfs_attrd_item_committing(
+	struct xfs_log_item	*lip,
+	xfs_lsn_t		lsn)
+{
+}
+
+
+/*
+ * Allocate and initialize an attrd item
+ */
+struct xfs_attrd_log_item *
+xfs_attrd_init(
+	struct xfs_mount		*mp,
+	struct xfs_attri_log_item	*attrip)
+
+{
+	struct xfs_attrd_log_item	*attrdp;
+	uint				size;
+
+	size = (uint)(sizeof(struct xfs_attrd_log_item));
+	attrdp = kmem_zalloc(size, 0);
+	memset(attrdp, 0, size);
+
+	xfs_log_item_init(mp, &attrdp->attrd_item, XFS_LI_ATTRD,
+			  &xfs_attrd_item_ops);
+	attrdp->attrd_attrip = attrip;
+	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
+
+	return attrdp;
+}
+
+/*
+ * This routine is called to allocate an "attr done" log item.
+ */
+struct xfs_attrd_log_item *
+xfs_trans_get_attrd(struct xfs_trans		*tp,
+		  struct xfs_attri_log_item	*attrip)
+{
+	struct xfs_attrd_log_item		*attrdp;
+
+	ASSERT(tp != NULL);
+
+	attrdp = xfs_attrd_init(tp->t_mountp, attrip);
+	ASSERT(attrdp != NULL);
+
+	xfs_trans_add_item(tp, &attrdp->attrd_item);
+	return attrdp;
+}
+
+static const struct xfs_item_ops xfs_attrd_item_ops = {
+	.iop_size	= xfs_attrd_item_size,
+	.iop_format	= xfs_attrd_item_format,
+	.iop_release    = xfs_attrd_item_release,
+	.iop_committing	= xfs_attrd_item_committing,
+	.iop_committed	= xfs_attrd_item_committed,
+};
+
+
+/* Get an ATTRD so we can process all the attrs. */
+static struct xfs_log_item *
+xfs_attr_create_done(
+	struct xfs_trans		*tp,
+	struct xfs_log_item		*intent,
+	unsigned int			count)
+{
+	if (!xfs_sb_version_hasdelattr(&tp->t_mountp->m_sb))
+		return NULL;
+
+	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
+}
+
+const struct xfs_defer_op_type xfs_attr_defer_type = {
+	.max_items	= 1,
+	.create_intent	= xfs_attr_create_intent,
+	.abort_intent	= xfs_attr_abort_intent,
+	.create_done	= xfs_attr_create_done,
+	.finish_item	= xfs_attr_finish_item,
+	.cancel_item	= xfs_attr_cancel_item,
+};
+
+/*
+ * Process an attr intent item that was recovered from the log.  We need to
+ * replay the attr set or remove operation that it describes.
+ */
+STATIC int
+xfs_attri_item_recover(
+	struct xfs_log_item		*lip,
+	struct list_head		*capture_list)
+{
+	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
+	struct xfs_mount		*mp = lip->li_mountp;
+	struct xfs_inode		*ip;
+	struct xfs_da_args		args;
+	struct xfs_attri_log_format	*attrp;
+	int				error;
+
+	/*
+	 * First check the validity of the attr described by the ATTRI.  If any
+	 * are bad, then assume that all are bad and just toss the ATTRI.
+	 */
+	attrp = &attrip->attri_format;
+	if (!(attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET ||
+	      attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE) ||
+	    (attrp->alfi_value_len > XATTR_SIZE_MAX) ||
+	    (attrp->alfi_name_len > XATTR_NAME_MAX) ||
+	    (attrp->alfi_name_len == 0)) {
+		/*
+		 * This will pull the ATTRI from the AIL and free the memory
+		 * associated with it.
+		 */
+		xfs_attri_release(attrip);
+		return -EFSCORRUPTED;
+	}
+
+	error = xfs_iget(mp, 0, attrp->alfi_ino, 0, 0, &ip);
+	if (error)
+		return error;
+
+	memset(&args, 0, sizeof(args));
+	args.dp = ip;
+	args.name = attrip->attri_name;
+	args.namelen = attrp->alfi_name_len;
+	args.attr_filter = attrp->alfi_attr_flags;
+	if (attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET) {
+		args.value = attrip->attri_value;
+		args.valuelen = attrp->alfi_value_len;
+	}
+
+	error = xfs_attr_set(&args);
+
+	xfs_attri_release(attrip);
+	xfs_irele(ip);
+	return error;
+}
+
+static const struct xfs_item_ops xfs_attri_item_ops = {
+	.iop_size	= xfs_attri_item_size,
+	.iop_format	= xfs_attri_item_format,
+	.iop_unpin	= xfs_attri_item_unpin,
+	.iop_committed	= xfs_attri_item_committed,
+	.iop_committing = xfs_attri_item_committing,
+	.iop_release    = xfs_attri_item_release,
+	.iop_recover	= xfs_attri_item_recover,
+	.iop_match	= xfs_attri_item_match,
+};
+
+
+
+STATIC int
+xlog_recover_attri_commit_pass2(
+	struct xlog                     *log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item        *item,
+	xfs_lsn_t                       lsn)
+{
+	int                             error;
+	struct xfs_mount                *mp = log->l_mp;
+	struct xfs_attri_log_item       *attrip;
+	struct xfs_attri_log_format     *attri_formatp;
+	char				*name = NULL;
+	char				*value = NULL;
+	int				region = 0;
+
+	attri_formatp = item->ri_buf[region].i_addr;
+
+	attrip = xfs_attri_init(mp);
+	error = xfs_attri_copy_format(&item->ri_buf[region],
+				      &attrip->attri_format);
+	if (error) {
+		xfs_attri_item_free(attrip);
+		return error;
+	}
+
+	attrip->attri_name_len = attri_formatp->alfi_name_len;
+	attrip->attri_value_len = attri_formatp->alfi_value_len;
+	attrip = krealloc(attrip, sizeof(struct xfs_attri_log_item) +
+			  attrip->attri_name_len + attrip->attri_value_len,
+			  GFP_NOFS | __GFP_NOFAIL);
+
+	ASSERT(attrip->attri_name_len > 0);
+	region++;
+	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
+	memcpy(name, item->ri_buf[region].i_addr,
+	       attrip->attri_name_len);
+	attrip->attri_name = name;
+
+	if (attrip->attri_value_len > 0) {
+		region++;
+		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
+			attrip->attri_name_len;
+		memcpy(value, item->ri_buf[region].i_addr,
+			attrip->attri_value_len);
+		attrip->attri_value = value;
+	}
+
+	/*
+	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
+	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
+	 * directly and drop the ATTRI reference. Note that
+	 * xfs_trans_ail_insert() takes and drops the AIL lock itself.
+	 */
+	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
+	xfs_attri_release(attrip);
+	return 0;
+}
+
+const struct xlog_recover_item_ops xlog_attri_item_ops = {
+	.item_type	= XFS_LI_ATTRI,
+	.commit_pass2	= xlog_recover_attri_commit_pass2,
+};
+
+/*
+ * This routine is called when an ATTRD format structure is found in a committed
+ * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
+ * it was still in the log. To do this it searches the AIL for the ATTRI with
+ * an id equal to that in the ATTRD format structure. If we find it we drop
+ * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
+ */
+STATIC int
+xlog_recover_attrd_commit_pass2(
+	struct xlog			*log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			lsn)
+{
+	struct xfs_attrd_log_format	*attrd_formatp;
+
+	attrd_formatp = item->ri_buf[0].i_addr;
+	ASSERT((item->ri_buf[0].i_len ==
+				(sizeof(struct xfs_attrd_log_format))));
+
+	xlog_recover_release_intent(log, XFS_LI_ATTRI,
+				    attrd_formatp->alfd_alf_id);
+	return 0;
+}
+
+const struct xlog_recover_item_ops xlog_attrd_item_ops = {
+	.item_type	= XFS_LI_ATTRD,
+	.commit_pass2	= xlog_recover_attrd_commit_pass2,
+};
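
During recovery, xlog_recover_attrd_commit_pass2() cancels the ATTRI whose
alfi_id matches the ATTRD's alfd_alf_id; any ATTRI still in the AIL once
the log has been scanned is replayed through xfs_attri_item_recover(). A
minimal sketch of that matching rule, using a hypothetical flat array in
place of the AIL and of xlog_recover_release_intent():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical recovered intent: its id and whether it is still pending. */
struct model_intent {
	uint64_t	id;	/* alfi_id */
	bool		live;	/* still in the (model) AIL */
};

/*
 * Pass-2 handling of a done item: cancel the live intent whose id matches
 * alfd_alf_id.  Returns true if an intent was cancelled.
 */
static bool model_release_intent(struct model_intent *intents, size_t n,
				 uint64_t done_id)
{
	for (size_t i = 0; i < n; i++) {
		if (intents[i].live && intents[i].id == done_id) {
			intents[i].live = false;
			return true;
		}
	}
	return false;
}

/* Intents left live after the scan are the ones that must be replayed. */
static size_t model_count_replay(const struct model_intent *intents, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (intents[i].live)
			count++;
	return count;
}
```

The id only has to be unique among intents that are live at the same time,
which is why xfs_attri_init() can simply use the item's address.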
diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
new file mode 100644
index 0000000..7dd2572
--- /dev/null
+++ b/fs/xfs/xfs_attr_item.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Allison Collins <allison.henderson@oracle.com>
+ */
+#ifndef	__XFS_ATTR_ITEM_H__
+#define	__XFS_ATTR_ITEM_H__
+
+/* kernel only ATTRI/ATTRD definitions */
+
+struct xfs_mount;
+struct kmem_zone;
+
+/*
+ * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
+ */
+#define	XFS_ATTRI_RECOVERED	1
+
+
+/* iovec length must be 32-bit aligned */
+#define ATTR_NVEC_SIZE(size) \
+	((((size) + sizeof(int32_t) - 1) / sizeof(int32_t)) * \
+	 sizeof(int32_t))
+
+/*
+ * This is the "attr intention" log item.  It is used to log the fact that some
+ * attribute operations need to be processed.  An operation is currently either
+ * a set or remove.  Set or remove operations are described by the xfs_attr_item
+ * which may be logged to this intent.  Intents are used in conjunction with the
+ * "attr done" log item described below.
+ *
+ * The ATTRI is reference counted so that it is not freed prior to both the
+ * ATTRI and ATTRD being committed and unpinned. This ensures the ATTRI is
+ * inserted into the AIL even in the event of out of order ATTRI/ATTRD
+ * processing. In other words, an ATTRI is born with two references:
+ *
+ *      1.) an ATTRI held reference to track ATTRI AIL insertion
+ *      2.) an ATTRD held reference to track ATTRD commit
+ *
+ * On allocation, both references are the responsibility of the caller. Once the
+ * ATTRI is added to and dirtied in a transaction, ownership of reference one
+ * transfers to the transaction. The reference is dropped once the ATTRI is
+ * inserted into the AIL or in the event of failure along the way (e.g., commit
+ * failure, log I/O error, etc.). Note that the caller remains responsible for
+ * the ATTRD reference under all circumstances to this point. The caller has no
+ * means to detect failure once the transaction is committed, however.
+ * Therefore, an ATTRD is required after this point, even in the event of
+ * unrelated failure.
+ *
+ * Once an ATTRD is allocated and dirtied in a transaction, reference two
+ * transfers to the transaction. The ATTRD reference is dropped once it reaches
+ * the unpin handler. Similar to the ATTRI, the reference also drops in the
+ * event of commit failure or log I/O errors. Note that the ATTRD is not
+ * inserted in the AIL, so at this point both the ATTRI and ATTRD are freed.
+ */
+struct xfs_attri_log_item {
+	struct xfs_log_item		attri_item;
+	atomic_t			attri_refcount;
+	int				attri_name_len;
+	void				*attri_name;
+	int				attri_value_len;
+	void				*attri_value;
+	struct xfs_attri_log_format	attri_format;
+};
+
+/*
+ * This is the "attr done" log item.  It is used to log the fact that the attr
+ * operations described in an earlier attri item have been completed.
+ */
+struct xfs_attrd_log_item {
+	struct xfs_attri_log_item	*attrd_attrip;
+	struct xfs_log_item		attrd_item;
+	struct xfs_attrd_log_format	attrd_format;
+};
+
+#endif	/* __XFS_ATTR_ITEM_H__ */
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 8f8837f..d7787a5 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -15,6 +15,7 @@
 #include "xfs_inode.h"
 #include "xfs_trans.h"
 #include "xfs_bmap.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_attr_sf.h"
 #include "xfs_attr_leaf.h"
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 3fbd98f..d5d1959 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -15,6 +15,8 @@
 #include "xfs_iwalk.h"
 #include "xfs_itable.h"
 #include "xfs_error.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_bmap.h"
 #include "xfs_bmap_util.h"
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index c1771e7..62e1534 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -17,6 +17,8 @@
 #include "xfs_itable.h"
 #include "xfs_fsops.h"
 #include "xfs_rtalloc.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_ioctl.h"
 #include "xfs_ioctl32.h"
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 5e16545..5ecc76c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -13,6 +13,8 @@
 #include "xfs_inode.h"
 #include "xfs_acl.h"
 #include "xfs_quota.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_trans.h"
 #include "xfs_trace.h"
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index fa2d05e..3457f22 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1993,6 +1993,10 @@ xlog_print_tic_res(
 	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
 	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
 	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
+	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
+	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
+	    REG_TYPE_STR(ATTR_NAME, "attr_name"),
+	    REG_TYPE_STR(ATTR_VALUE, "attr_value"),
 	};
 	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
 #undef REG_TYPE_STR
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index a8289ad..cb951cd 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1775,6 +1775,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
 	&xlog_cud_item_ops,
 	&xlog_bui_item_ops,
 	&xlog_bud_item_ops,
+	&xlog_attri_item_ops,
+	&xlog_attrd_item_ops,
 };
 
 static const struct xlog_recover_item_ops *
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 0aa87c2..bc9c25e 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -132,6 +132,8 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
 
 	/*
 	 * The v5 superblock format extended several v4 header structures with
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index bca48b3..9b0c790 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -10,6 +10,7 @@
 #include "xfs_log_format.h"
 #include "xfs_da_format.h"
 #include "xfs_inode.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_acl.h"
 #include "xfs_da_btree.h"
-- 
2.7.4

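The comment on struct xfs_attri_log_item in the xfs_attr_item.h hunk describes an intent that is born with two references: one dropped when the ATTRI reaches the AIL (or fails along the way), and one dropped when the ATTRD is unpinned. The order of the two drops does not matter; whichever comes last frees the item. Below is a minimal userspace sketch of that rule using C11 atomics. The attri_model_* names are hypothetical and are not the kernel's atomic_t helpers, and the `released` flag stands in for actually freeing the item so that a caller can observe it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model (not kernel code) of the ATTRI lifetime:
 * one reference tracks AIL insertion of the ATTRI, the other tracks
 * commit/unpin of the matching ATTRD.
 */
struct attri_model {
	atomic_int	refcount;
	bool		released;	/* set when the last reference drops */
};

static void attri_model_init(struct attri_model *ip)
{
	atomic_init(&ip->refcount, 2);	/* ATTRI reference + ATTRD reference */
	ip->released = false;
}

/* Drop one reference; whichever put comes last "frees" the item. */
static void attri_model_put(struct attri_model *ip)
{
	/* atomic_fetch_sub returns the value before the subtraction */
	int old = atomic_fetch_sub(&ip->refcount, 1);

	assert(old > 0);		/* never drop more refs than were taken */
	if (old == 1)
		ip->released = true;
}
```

Because each holder drops exactly one reference, it makes no difference whether the ATTRD unpin or the ATTRI AIL insertion happens first.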

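The header above says an attr name or value iovec length must be 32-bit aligned. The usual way to express that is to round the length up to the next multiple of sizeof(int32_t). The standalone sketch below assumes that is the intended meaning. nvec_size is a hypothetical name, and the mask trick is valid only because the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: round a name/value length up to the next multiple
 * of sizeof(int32_t).  Already-aligned lengths are returned unchanged.
 */
static size_t nvec_size(size_t size)
{
	const size_t align = sizeof(int32_t);

	assert((align & (align - 1)) == 0);	/* mask trick needs a power of two */
	return (size + align - 1) & ~(align - 1);
}
```

With this rounding, a 5-byte name is padded to 8 bytes, while an 8-byte name stays at 8.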

* [PATCH v13 06/10] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  2020-10-23  6:34 [PATCH v13 00/10] xfs: Delayed Attributes Allison Henderson
                   ` (4 preceding siblings ...)
  2020-10-23  6:34 ` [PATCH v13 05/10] xfs: Set up infastructure for deferred attribute operations Allison Henderson
@ 2020-10-23  6:34 ` Allison Henderson
  2020-11-10 20:15   ` Darrick J. Wong
  2020-10-23  6:34 ` [PATCH v13 07/10] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR Allison Henderson
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2020-10-23  6:34 UTC (permalink / raw)
  To: linux-xfs

From: Allison Collins <allison.henderson@oracle.com>

These routines set up and start new deferred attribute operations.
These functions are meant to be called by any routine needing to
initiate a deferred attribute operation, as opposed to the existing
inline operations. A new helper function, xfs_attr_item_init, is also added.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_attr.h |  2 ++
 2 files changed, 56 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 760383c..7fe5554 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -25,6 +25,7 @@
 #include "xfs_trans_space.h"
 #include "xfs_trace.h"
 #include "xfs_attr_item.h"
+#include "xfs_attr.h"
 
 /*
  * xfs_attr.c
@@ -643,6 +644,59 @@ xfs_attr_set(
 	goto out_unlock;
 }
 
+STATIC int
+xfs_attr_item_init(
+	struct xfs_da_args	*args,
+	unsigned int		op_flags,	/* op flag (set or remove) */
+	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
+{
+
+	struct xfs_attr_item	*new;
+
+	new = kmem_alloc_large(sizeof(struct xfs_attr_item), KM_NOFS);
+	memset(new, 0, sizeof(struct xfs_attr_item));
+	new->xattri_op_flags = op_flags;
+	new->xattri_dac.da_args = args;
+
+	*attr = new;
+	return 0;
+}
+
+/* Sets an attribute for an inode as a deferred operation */
+int
+xfs_attr_set_deferred(
+	struct xfs_da_args	*args)
+{
+	struct xfs_attr_item	*new;
+	int			error = 0;
+
+	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
+	if (error)
+		return error;
+
+	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
+
+	return 0;
+}
+
+/* Removes an attribute for an inode as a deferred operation */
+int
+xfs_attr_remove_deferred(
+	struct xfs_da_args	*args)
+{
+
+	struct xfs_attr_item	*new;
+	int			error;
+
+	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
+	if (error)
+		return error;
+
+	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
+
+	return 0;
+}
+
 /*========================================================================
  * External routines when attribute list is inside the inode
  *========================================================================*/
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 5b4a1ca..8a08411 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -307,5 +307,7 @@ bool xfs_attr_namecheck(const void *name, size_t length);
 void xfs_delattr_context_init(struct xfs_delattr_context *dac,
 			      struct xfs_da_args *args);
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
+int xfs_attr_set_deferred(struct xfs_da_args *args);
+int xfs_attr_remove_deferred(struct xfs_da_args *args);
 
 #endif	/* __XFS_ATTR_H__ */
-- 
2.7.4

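xfs_attr_set_deferred and xfs_attr_remove_deferred differ only in the operation flag they pass to xfs_attr_item_init before queueing the new item on the transaction with xfs_defer_add. The sketch below models that flow in userspace with a plain FIFO list. Every type and function in it (attr_item_model, defer_add, attr_queue_op and so on) is a hypothetical stand-in, not the XFS API. The model also checks its allocation and returns -ENOMEM, which is a choice made for this userspace sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types; not the real XFS structures. */
enum attr_op { ATTR_OP_SET = 1, ATTR_OP_REMOVE = 2 };

struct da_args_model {
	const char	*name;
};

struct attr_item_model {
	struct attr_item_model	*next;		/* link on the pending list */
	unsigned int		op_flags;	/* ATTR_OP_SET or ATTR_OP_REMOVE */
	struct da_args_model	*args;
};

/* Pending work queued against a "transaction", kept in FIFO order. */
struct defer_queue_model {
	struct attr_item_model	*head;
	struct attr_item_model	**tailp;
};

static void defer_queue_init(struct defer_queue_model *q)
{
	q->head = NULL;
	q->tailp = &q->head;
}

/* Allocate a zeroed work item that records the operation and its args. */
static int attr_item_init(struct da_args_model *args, unsigned int op_flags,
			  struct attr_item_model **itemp)
{
	struct attr_item_model *new;

	assert(args != NULL);
	new = calloc(1, sizeof(*new));
	if (!new)
		return -ENOMEM;
	new->op_flags = op_flags;
	new->args = args;
	*itemp = new;
	return 0;
}

/* Append an item to the tail of the pending list. */
static void defer_add(struct defer_queue_model *q, struct attr_item_model *item)
{
	item->next = NULL;
	*q->tailp = item;
	q->tailp = &item->next;
}

/* Common body shared by the "set" and "remove" entry points. */
static int attr_queue_op(struct defer_queue_model *q,
			 struct da_args_model *args, unsigned int op_flags)
{
	struct attr_item_model *new;
	int error = attr_item_init(args, op_flags, &new);

	if (error)
		return error;
	defer_add(q, new);
	return 0;
}

/* Free every queued item and reset the list. */
static void defer_queue_free(struct defer_queue_model *q)
{
	struct attr_item_model *item = q->head;

	while (item) {
		struct attr_item_model *next = item->next;

		free(item);
		item = next;
	}
	defer_queue_init(q);
}
```

Funnelling both entry points through one helper keeps the set and remove paths from drifting apart, since only the flag differs between them.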


* [PATCH v13 07/10] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR
  2020-10-23  6:34 [PATCH v13 00/10] xfs: Delayed Attributes Allison Henderson
                   ` (5 preceding siblings ...)
  2020-10-23  6:34 ` [PATCH v13 06/10] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred Allison Henderson
@ 2020-10-23  6:34 ` Allison Henderson
  2020-11-10 20:10   ` Darrick J. Wong
  2020-11-19  2:36   ` Darrick J. Wong
  2020-10-23  6:34 ` [PATCH v13 08/10] xfs: Enable delayed attributes Allison Henderson
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 58+ messages in thread
From: Allison Henderson @ 2020-10-23  6:34 UTC (permalink / raw)
  To: linux-xfs

This patch adds a new feature bit, XFS_SB_FEAT_INCOMPAT_LOG_DELATTR, which
can be used to turn delayed attributes on or off.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h | 8 ++++++--
 fs/xfs/libxfs/xfs_fs.h     | 1 +
 fs/xfs/libxfs/xfs_sb.c     | 2 ++
 fs/xfs/xfs_super.c         | 3 +++
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index d419c34..18b41a7 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -483,7 +483,9 @@ xfs_sb_has_incompat_feature(
 	return (sbp->sb_features_incompat & feature) != 0;
 }
 
-#define XFS_SB_FEAT_INCOMPAT_LOG_ALL 0
+#define XFS_SB_FEAT_INCOMPAT_LOG_DELATTR   (1 << 0)	/* Delayed Attributes */
+#define XFS_SB_FEAT_INCOMPAT_LOG_ALL \
+	(XFS_SB_FEAT_INCOMPAT_LOG_DELATTR)
 #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_LOG_ALL
 static inline bool
 xfs_sb_has_incompat_log_feature(
@@ -586,7 +588,9 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
 
 static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)
 {
-	return false;
+	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
+		(sbp->sb_features_log_incompat &
+		XFS_SB_FEAT_INCOMPAT_LOG_DELATTR));
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 2a2e3cf..f703d95 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -250,6 +250,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_RMAPBT	(1 << 19) /* reverse mapping btree */
 #define XFS_FSOP_GEOM_FLAGS_REFLINK	(1 << 20) /* files can share blocks */
 #define XFS_FSOP_GEOM_FLAGS_BIGTIME	(1 << 21) /* 64-bit nsec timestamps */
+#define XFS_FSOP_GEOM_FLAGS_DELATTR	(1 << 22) /* delayed attributes	    */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 5aeafa5..a0ec327 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1168,6 +1168,8 @@ xfs_fs_geometry(
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_REFLINK;
 	if (xfs_sb_version_hasbigtime(sbp))
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_BIGTIME;
+	if (xfs_sb_version_hasdelattr(sbp))
+		geo->flags |= XFS_FSOP_GEOM_FLAGS_DELATTR;
 	if (xfs_sb_version_hassector(sbp))
 		geo->logsectsize = sbp->sb_logsectsize;
 	else
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index d1b5f2d..bb85884 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1580,6 +1580,9 @@ xfs_fc_fill_super(
 	if (xfs_sb_version_hasinobtcounts(&mp->m_sb))
 		xfs_warn(mp,
  "EXPERIMENTAL inode btree counters feature in use. Use at your own risk!");
+	if (xfs_sb_version_hasdelattr(&mp->m_sb))
+		xfs_alert(mp,
+	"EXPERIMENTAL delayed attrs feature enabled. Use at your own risk!");
 
 	error = xfs_mountfs(mp);
 	if (error)
-- 
2.7.4

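With this patch, xfs_sb_version_hasdelattr reports the feature only on a version 5 superblock whose sb_features_log_incompat has the new bit set. Adding the bit to XFS_SB_FEAT_INCOMPAT_LOG_ALL also removes it from the _UNKNOWN mask, which is what lets code detect log-incompat bits it does not understand. Below is a userspace sketch of both checks. The names are hypothetical, and a plain integer stands in for XFS_SB_VERSION_NUM().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the constants and fields involved. */
#define SB_VERSION_5			5
#define FEAT_INCOMPAT_LOG_DELATTR	(1U << 0)
#define FEAT_INCOMPAT_LOG_ALL		(FEAT_INCOMPAT_LOG_DELATTR)
#define FEAT_INCOMPAT_LOG_UNKNOWN	(~FEAT_INCOMPAT_LOG_ALL)

struct sb_model {
	unsigned int	version;		/* stands in for XFS_SB_VERSION_NUM() */
	uint32_t	features_log_incompat;
};

/* Feature is present only on v5 superblocks with the bit set. */
static bool sb_has_delattr(const struct sb_model *sbp)
{
	return sbp->version == SB_VERSION_5 &&
	       (sbp->features_log_incompat & FEAT_INCOMPAT_LOG_DELATTR);
}

/* Any log-incompat bit outside the known set is "unknown". */
static bool sb_has_unknown_log_incompat(const struct sb_model *sbp)
{
	return (sbp->features_log_incompat & FEAT_INCOMPAT_LOG_UNKNOWN) != 0;
}
```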


* [PATCH v13 08/10] xfs: Enable delayed attributes
  2020-10-23  6:34 [PATCH v13 00/10] xfs: Delayed Attributes Allison Henderson
                   ` (6 preceding siblings ...)
  2020-10-23  6:34 ` [PATCH v13 07/10] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR Allison Henderson
@ 2020-10-23  6:34 ` Allison Henderson
  2020-10-23  6:34 ` [PATCH v13 09/10] xfs: Remove unused xfs_attr_*_args Allison Henderson
  2020-10-23  6:34 ` [PATCH v13 10/10] xfs: Add delayed attributes error tag Allison Henderson
  9 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-10-23  6:34 UTC (permalink / raw)
  To: linux-xfs

From: Allison Collins <allison.henderson@oracle.com>

Finally enable delayed attributes in xfs_attr_set and xfs_attr_remove.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 7fe5554..edd5d10 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -603,9 +603,10 @@ xfs_attr_set(
 		if (error != -ENOATTR && error != -EEXIST)
 			goto out_trans_cancel;
 
-		error = xfs_attr_set_args(args);
+		error = xfs_attr_set_deferred(args);
 		if (error)
 			goto out_trans_cancel;
+
 		/* shortform attribute has already been committed */
 		if (!args->trans)
 			goto out_unlock;
@@ -614,7 +615,7 @@ xfs_attr_set(
 		if (error != -EEXIST)
 			goto out_trans_cancel;
 
-		error = xfs_attr_remove_args(args);
+		error = xfs_attr_remove_deferred(args);
 		if (error)
 			goto out_trans_cancel;
 	}
-- 
2.7.4



* [PATCH v13 09/10] xfs: Remove unused xfs_attr_*_args
  2020-10-23  6:34 [PATCH v13 00/10] xfs: Delayed Attributes Allison Henderson
                   ` (7 preceding siblings ...)
  2020-10-23  6:34 ` [PATCH v13 08/10] xfs: Enable delayed attributes Allison Henderson
@ 2020-10-23  6:34 ` Allison Henderson
  2020-11-10 20:07   ` Darrick J. Wong
  2020-10-23  6:34 ` [PATCH v13 10/10] xfs: Add delayed attributes error tag Allison Henderson
  9 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2020-10-23  6:34 UTC (permalink / raw)
  To: linux-xfs

Remove xfs_attr_set_args, xfs_attr_remove_args, and xfs_attr_trans_roll.
These high-level loops are now driven by the delayed operations code,
and can be removed.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c        | 97 +----------------------------------------
 fs/xfs/libxfs/xfs_attr.h        |  9 ++--
 fs/xfs/libxfs/xfs_attr_remote.c |  4 +-
 3 files changed, 5 insertions(+), 105 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index edd5d10..b5e1e84 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -262,65 +262,6 @@ xfs_attr_set_shortform(
 }
 
 /*
- * Checks to see if a delayed attribute transaction should be rolled.  If so,
- * also checks for a defer finish.  Transaction is finished and rolled as
- * needed, and returns true of false if the delayed operation should continue.
- */
-STATIC int
-xfs_attr_trans_roll(
-	struct xfs_delattr_context	*dac)
-{
-	struct xfs_da_args		*args = dac->da_args;
-	int				error = 0;
-
-	if (dac->flags & XFS_DAC_DEFER_FINISH) {
-		/*
-		 * The caller wants us to finish all the deferred ops so that we
-		 * avoid pinning the log tail with a large number of deferred
-		 * ops.
-		 */
-		dac->flags &= ~XFS_DAC_DEFER_FINISH;
-		error = xfs_defer_finish(&args->trans);
-		if (error)
-			return error;
-	}
-
-	return xfs_trans_roll_inode(&args->trans, args->dp);
-}
-
-/*
- * Set the attribute specified in @args.
- */
-int
-xfs_attr_set_args(
-	struct xfs_da_args	*args)
-{
-	struct xfs_buf			*leaf_bp = NULL;
-	int				error = 0;
-	struct xfs_delattr_context	dac = {
-		.da_args	= args,
-	};
-
-	do {
-		error = xfs_attr_set_iter(&dac, &leaf_bp);
-		if (error != -EAGAIN)
-			break;
-
-		error = xfs_attr_trans_roll(&dac);
-		if (error)
-			return error;
-
-		if (leaf_bp) {
-			xfs_trans_bjoin(args->trans, leaf_bp);
-			xfs_trans_bhold(args->trans, leaf_bp);
-		}
-
-	} while (true);
-
-	return error;
-}
-
-/*
  * Set the attribute specified in @args.
  * This routine is meant to function as a delayed operation, and may return
  * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
@@ -363,11 +304,7 @@ xfs_attr_set_iter(
 		 * continue.  Otherwise, is it converted from shortform to leaf
 		 * and -EAGAIN is returned.
 		 */
-		error = xfs_attr_set_shortform(args, leaf_bp);
-		if (error == -EAGAIN)
-			dac->flags |= XFS_DAC_DEFER_FINISH;
-
-		return error;
+		return xfs_attr_set_shortform(args, leaf_bp);
 	}
 
 	/*
@@ -398,7 +335,6 @@ xfs_attr_set_iter(
 			 * same state (inode locked and joined, transaction
 			 * clean) no matter how we got to this step.
 			 */
-			dac->flags |= XFS_DAC_DEFER_FINISH;
 			return -EAGAIN;
 		case 0:
 			dac->dela_state = XFS_DAS_FOUND_LBLK;
@@ -455,32 +391,6 @@ xfs_has_attr(
 
 /*
  * Remove the attribute specified in @args.
- */
-int
-xfs_attr_remove_args(
-	struct xfs_da_args	*args)
-{
-	int				error = 0;
-	struct xfs_delattr_context	dac = {
-		.da_args	= args,
-	};
-
-	do {
-		error = xfs_attr_remove_iter(&dac);
-		if (error != -EAGAIN)
-			break;
-
-		error = xfs_attr_trans_roll(&dac);
-		if (error)
-			return error;
-
-	} while (true);
-
-	return error;
-}
-
-/*
- * Remove the attribute specified in @args.
  *
  * This function may return -EAGAIN to signal that the transaction needs to be
  * rolled.  Callers should continue calling this function until they receive a
@@ -895,7 +805,6 @@ xfs_attr_leaf_addname(
 		if (error)
 			return error;
 
-		dac->flags |= XFS_DAC_DEFER_FINISH;
 		return -EAGAIN;
 	}
 
@@ -1192,7 +1101,6 @@ xfs_attr_node_addname(
 			 * Restart routine from the top.  No need to set  the
 			 * state
 			 */
-			dac->flags |= XFS_DAC_DEFER_FINISH;
 			return -EAGAIN;
 		}
 
@@ -1205,7 +1113,6 @@ xfs_attr_node_addname(
 		error = xfs_da3_split(state);
 		if (error)
 			goto out;
-		dac->flags |= XFS_DAC_DEFER_FINISH;
 	} else {
 		/*
 		 * Addition succeeded, update Btree hashvals.
@@ -1246,7 +1153,6 @@ xfs_attr_node_addname(
 			if (error)
 				return error;
 
-			dac->flags |= XFS_DAC_DEFER_FINISH;
 			dac->dela_state = XFS_DAS_ALLOC_NODE;
 			return -EAGAIN;
 		}
@@ -1516,7 +1422,6 @@ xfs_attr_node_remove_step(
 		if (error)
 			return error;
 
-		dac->flags |= XFS_DAC_DEFER_FINISH;
 		dac->dela_state = XFS_DAS_RM_SHRINK;
 		return -EAGAIN;
 	}
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 8a08411..6d90301 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -244,10 +244,9 @@ enum xfs_delattr_state {
 /*
  * Defines for xfs_delattr_context.flags
  */
-#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
-#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
-#define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
-#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
+#define XFS_DAC_NODE_RMVNAME_INIT	0x01 /* xfs_attr_node_removename init */
+#define XFS_DAC_LEAF_ADDNAME_INIT	0x02 /* xfs_attr_leaf_addname init*/
+#define XFS_DAC_DELAYED_OP_INIT		0x04 /* delayed operations init*/
 
 /*
  * Context used for keeping track of delayed attribute operations
@@ -297,11 +296,9 @@ int xfs_inode_hasattr(struct xfs_inode *ip);
 int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
-int xfs_attr_set_args(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_delattr_context *dac,
 		      struct xfs_buf **leaf_bp);
 int xfs_has_attr(struct xfs_da_args *args);
-int xfs_attr_remove_args(struct xfs_da_args *args);
 int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
 bool xfs_attr_namecheck(const void *name, size_t length);
 void xfs_delattr_context_init(struct xfs_delattr_context *dac,
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 45c4bc5..262d1870 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -751,10 +751,8 @@ xfs_attr_rmtval_remove(
 	if (error)
 		return error;
 
-	if (!done) {
-		dac->flags |= XFS_DAC_DEFER_FINISH;
+	if (!done)
 		return -EAGAIN;
-	}
 
 	return error;
 }
-- 
2.7.4

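This patch deletes the open-coded loops in xfs_attr_set_args and xfs_attr_remove_args because the deferred-operations code now re-invokes the *_iter functions. The contract is the same in both cases: keep calling the iterator until it returns something other than -EAGAIN, and roll the transaction between calls. Below is a minimal model of that contract. The states, op_iter, roll_model and op_run are hypothetical names loosely patterned on the removed code, not the XFS functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states, loosely modelled on enum xfs_delattr_state. */
enum step_state { ST_INIT = 0, ST_RM_SHRINK, ST_DONE };

struct op_ctx_model {
	enum step_state	state;	/* where to resume on the next call */
	int		rolls;	/* how many times the "transaction" rolled */
};

/*
 * Do one bounded unit of work per call.  Returns -EAGAIN when the caller
 * must roll the transaction before calling again, 0 when finished.
 */
static int op_iter(struct op_ctx_model *ctx)
{
	switch (ctx->state) {
	case ST_INIT:
		/* e.g. remove the name, then ask for a roll before shrinking */
		ctx->state = ST_RM_SHRINK;
		return -EAGAIN;
	case ST_RM_SHRINK:
		ctx->state = ST_DONE;
		return 0;
	case ST_DONE:
	default:
		return 0;
	}
}

/* Stand-in for committing the current transaction and starting a new one. */
static int roll_model(struct op_ctx_model *ctx)
{
	ctx->rolls++;
	return 0;
}

/* The driver loop shape that the deferred-ops machinery now provides. */
static int op_run(struct op_ctx_model *ctx)
{
	int error;

	do {
		error = op_iter(ctx);
		if (error != -EAGAIN)
			break;
		error = roll_model(ctx);
	} while (!error);
	return error;
}
```

Because the resume point is stored in the context rather than in locals, any caller that honours -EAGAIN can drive the operation, whether it is this loop or the deferred-ops code.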


* [PATCH v13 10/10] xfs: Add delayed attributes error tag
  2020-10-23  6:34 [PATCH v13 00/10] xfs: Delayed Attributes Allison Henderson
                   ` (8 preceding siblings ...)
  2020-10-23  6:34 ` [PATCH v13 09/10] xfs: Remove unused xfs_attr_*_args Allison Henderson
@ 2020-10-23  6:34 ` Allison Henderson
  9 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-10-23  6:34 UTC (permalink / raw)
  To: linux-xfs

From: Allison Collins <allison.henderson@oracle.com>

This patch adds an error tag that we can use to test delayed attribute
recovery and replay.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_errortag.h | 4 +++-
 fs/xfs/xfs_attr_item.c       | 8 ++++++++
 fs/xfs/xfs_error.c           | 3 +++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
index 53b305d..cb38cbf 100644
--- a/fs/xfs/libxfs/xfs_errortag.h
+++ b/fs/xfs/libxfs/xfs_errortag.h
@@ -56,7 +56,8 @@
 #define XFS_ERRTAG_FORCE_SUMMARY_RECALC			33
 #define XFS_ERRTAG_IUNLINK_FALLBACK			34
 #define XFS_ERRTAG_BUF_IOERROR				35
-#define XFS_ERRTAG_MAX					36
+#define XFS_ERRTAG_DELAYED_ATTR				36
+#define XFS_ERRTAG_MAX					37
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -97,5 +98,6 @@
 #define XFS_RANDOM_FORCE_SUMMARY_RECALC			1
 #define XFS_RANDOM_IUNLINK_FALLBACK			(XFS_RANDOM_DEFAULT/10)
 #define XFS_RANDOM_BUF_IOERROR				XFS_RANDOM_DEFAULT
+#define XFS_RANDOM_DELAYED_ATTR				1
 
 #endif /* __XFS_ERRORTAG_H_ */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 3980066..3e75f2c 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -35,6 +35,8 @@
 #include "xfs_quota.h"
 #include "xfs_log_priv.h"
 #include "xfs_log_recover.h"
+#include "xfs_error.h"
+#include "xfs_errortag.h"
 
 static const struct xfs_item_ops xfs_attri_item_ops;
 static const struct xfs_item_ops xfs_attrd_item_ops;
@@ -310,6 +312,11 @@ xfs_trans_attr(
 	if (error)
 		return error;
 
+	if (XFS_TEST_ERROR(false, args->dp->i_mount, XFS_ERRTAG_DELAYED_ATTR)) {
+		error = -EIO;
+		goto out;
+	}
+
 	switch (op_flags) {
 	case XFS_ATTR_OP_FLAGS_SET:
 		args->op_flags |= XFS_DA_OP_ADDNAME;
@@ -324,6 +331,7 @@ xfs_trans_attr(
 		break;
 	}
 
+out:
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
 	 * transaction is aborted, which:
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index 7f6e208..fc551cb 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -54,6 +54,7 @@ static unsigned int xfs_errortag_random_default[] = {
 	XFS_RANDOM_FORCE_SUMMARY_RECALC,
 	XFS_RANDOM_IUNLINK_FALLBACK,
 	XFS_RANDOM_BUF_IOERROR,
+	XFS_RANDOM_DELAYED_ATTR,
 };
 
 struct xfs_errortag_attr {
@@ -164,6 +165,7 @@ XFS_ERRORTAG_ATTR_RW(force_repair,	XFS_ERRTAG_FORCE_SCRUB_REPAIR);
 XFS_ERRORTAG_ATTR_RW(bad_summary,	XFS_ERRTAG_FORCE_SUMMARY_RECALC);
 XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
 XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
+XFS_ERRORTAG_ATTR_RW(delayed_attr,	XFS_ERRTAG_DELAYED_ATTR);
 
 static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(noerror),
@@ -202,6 +204,7 @@ static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(bad_summary),
 	XFS_ERRORTAG_ATTR_LIST(iunlink_fallback),
 	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
+	XFS_ERRORTAG_ATTR_LIST(delayed_attr),
 	NULL,
 };
 
-- 
2.7.4

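The errortag header describes random factors where "1 means always, 2 means 1/2 time". XFS_RANDOM_DELAYED_ATTR is 1, so with that default factor the injected -EIO in xfs_trans_attr fires on every pass. The sketch below shows only the random-factor rule. The function names and the injectable random source are assumptions made so that the behaviour can be tested, and treating a factor of 0 as "disarmed" is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a "random factor" error-injection check:
 * 0 = tag disarmed, 1 = fire on every call, N = fire roughly 1 in N calls.
 */
static bool errortag_fires(unsigned int randfactor, uint32_t (*rnd)(void))
{
	if (randfactor == 0)
		return false;			/* tag not armed */
	return rnd() % randfactor == 0;
}

/* Deterministic counter used as a fake random source for testing. */
static uint32_t counter_state;

static uint32_t counter_rand(void)
{
	return counter_state++;
}
```

With a factor of 2 and the counter source starting at 0, exactly the even draws fire, so 10 calls yield 5 injected errors.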


* Re: [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step
  2020-10-23  6:34 ` [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step Allison Henderson
@ 2020-10-27  7:03   ` Chandan Babu R
  2020-10-27 22:23     ` Allison Henderson
  2020-10-27 12:15   ` Brian Foster
  2020-11-10 23:12   ` Darrick J. Wong
  2 siblings, 1 reply; 58+ messages in thread
From: Chandan Babu R @ 2020-10-27  7:03 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Friday 23 October 2020 12:04:26 PM IST Allison Henderson wrote:
> From: Allison Collins <allison.henderson@oracle.com>
> 
> This patch adds a new helper function xfs_attr_node_remove_step.  This
> will help simplify and modularize the calling function
> xfs_attr_node_remove.

The above should have been "xfs_attr_node_removename".

The code changes themselves are logically correct.
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>

> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c | 46 ++++++++++++++++++++++++++++++++++------------
>  1 file changed, 34 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index fd8e641..f4d39bf 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -1228,19 +1228,14 @@ xfs_attr_node_remove_rmt(
>   * the root node (a special case of an intermediate node).
>   */
>  STATIC int
> -xfs_attr_node_removename(
> -	struct xfs_da_args	*args)
> +xfs_attr_node_remove_step(
> +	struct xfs_da_args	*args,
> +	struct xfs_da_state	*state)
>  {
> -	struct xfs_da_state	*state;
>  	struct xfs_da_state_blk	*blk;
>  	int			retval, error;
>  	struct xfs_inode	*dp = args->dp;
>  
> -	trace_xfs_attr_node_removename(args);
> -
> -	error = xfs_attr_node_removename_setup(args, &state);
> -	if (error)
> -		goto out;
>  
>  	/*
>  	 * If there is an out-of-line value, de-allocate the blocks.
> @@ -1250,7 +1245,7 @@ xfs_attr_node_removename(
>  	if (args->rmtblkno > 0) {
>  		error = xfs_attr_node_remove_rmt(args, state);
>  		if (error)
> -			goto out;
> +			return error;
>  	}
>  
>  	/*
> @@ -1267,18 +1262,45 @@ xfs_attr_node_removename(
>  	if (retval && (state->path.active > 1)) {
>  		error = xfs_da3_join(state);
>  		if (error)
> -			goto out;
> +			return error;
>  		error = xfs_defer_finish(&args->trans);
>  		if (error)
> -			goto out;
> +			return error;
>  		/*
>  		 * Commit the Btree join operation and start a new trans.
>  		 */
>  		error = xfs_trans_roll_inode(&args->trans, dp);
>  		if (error)
> -			goto out;
> +			return error;
>  	}
>  
> +	return error;
> +}
> +
> +/*
> + * Remove a name from a B-tree attribute list.
> + *
> + * This routine will find the blocks of the name to remove, remove them and
> + * shrink the tree if needed.
> + */
> +STATIC int
> +xfs_attr_node_removename(
> +	struct xfs_da_args	*args)
> +{
> +	struct xfs_da_state	*state;
> +	int			error;
> +	struct xfs_inode	*dp = args->dp;
> +
> +	trace_xfs_attr_node_removename(args);
> +
> +	error = xfs_attr_node_removename_setup(args, &state);
> +	if (error)
> +		goto out;
> +
> +	error = xfs_attr_node_remove_step(args, state);
> +	if (error)
> +		goto out;
> +
>  	/*
>  	 * If the result is small enough, push it all into the inode.
>  	 */
> 


-- 
chandan





* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-10-23  6:34 ` [PATCH v13 02/10] xfs: Add delay ready attr remove routines Allison Henderson
@ 2020-10-27  9:59   ` Chandan Babu R
  2020-10-27 15:32     ` Allison Henderson
  2020-10-27 12:16   ` Brian Foster
  2020-11-10 23:43   ` Darrick J. Wong
  2 siblings, 1 reply; 58+ messages in thread
From: Chandan Babu R @ 2020-10-27  9:59 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Friday 23 October 2020 12:04:27 PM IST Allison Henderson wrote:
> This patch modifies the attr remove routines to be delay ready. This
> means they no longer roll or commit transactions, but instead return
> -EAGAIN to have the calling routine roll and refresh the transaction. In
> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> uses a sort of state machine like switch to keep track of where it was
> when EAGAIN was returned. xfs_attr_node_removename has also been
> modified to use the switch, and a new version of xfs_attr_remove_args
> consists of a simple loop to refresh the transaction until the operation
> is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> transaction wherever the existing code used to.
> 
> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> version __xfs_attr_rmtval_remove. We will rename
> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> done.
> 
> xfs_attr_rmtval_remove itself is still in use by the set routines (used
> during a rename).  For reasons of preserving existing function, we
> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> used and will be removed.
> 
> This patch also adds a new struct xfs_delattr_context, which we will use
> to keep track of the current state of an attribute operation. The new
> xfs_delattr_state enum is used to track various operations that are in
> progress so that we know not to repeat them, and resume where we left
> off before EAGAIN was returned to cycle out the transaction. Other
> members take the place of local variables that need to retain their
> values across multiple function recalls.  See xfs_attr.h for a more
> detailed diagram of the states.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
>  fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
>  fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>  fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
>  fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>  fs/xfs/xfs_attr_inactive.c      |   2 +-
>  6 files changed, 241 insertions(+), 74 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index f4d39bf..6ca94cb 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>   */
>  STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>  STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
> +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>  STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>  				 struct xfs_da_state **state);
>  STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
>  }
>  
>  /*
> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> + * also checks for a defer finish.  Transaction is finished and rolled as
> + * needed, and returns true of false if the delayed operation should continue.
> + */
> +int
> +xfs_attr_trans_roll(
> +	struct xfs_delattr_context	*dac)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error = 0;
> +
> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> +		/*
> +		 * The caller wants us to finish all the deferred ops so that we
> +		 * avoid pinning the log tail with a large number of deferred
> +		 * ops.
> +		 */
> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> +		error = xfs_defer_finish(&args->trans);
> +		if (error)
> +			return error;
> +	}
> +
> +	return xfs_trans_roll_inode(&args->trans, args->dp);
> +}
> +
> +/*
>   * Set the attribute specified in @args.
>   */
>  int
> @@ -364,23 +391,54 @@ xfs_has_attr(
>   */
>  int
>  xfs_attr_remove_args(
> -	struct xfs_da_args      *args)
> +	struct xfs_da_args	*args)
>  {
> -	struct xfs_inode	*dp = args->dp;
> -	int			error;
> +	int				error = 0;

I guess the explicit initialization of "error" can be removed since the
value returned by the call to xfs_attr_remove_iter() will overwrite it.
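A minimal userspace sketch of why the initializer is redundant. It assumes hypothetical stand-ins, mock_remove_iter() and mock_trans_roll(), for xfs_attr_remove_iter() and xfs_attr_trans_roll(); they only count calls and are not XFS APIs. Every path through the loop assigns "error" before reading it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct xfs_delattr_context. */
struct mock_dac {
	int	eagains_left;	/* times the step still asks for a roll */
	int	rolls;		/* transaction rolls performed so far */
	int	roll_error;	/* error injected by the roll, 0 for none */
};

/* Stand-in for xfs_attr_remove_iter(): returns -EAGAIN until done. */
static int mock_remove_iter(struct mock_dac *dac)
{
	if (dac->eagains_left > 0) {
		dac->eagains_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Stand-in for xfs_attr_trans_roll(). */
static int mock_trans_roll(struct mock_dac *dac)
{
	dac->rolls++;
	return dac->roll_error;
}

/* Same shape as xfs_attr_remove_args(), with no initializer on "error". */
static int mock_remove_args(struct mock_dac *dac)
{
	int	error;

	do {
		error = mock_remove_iter(dac);
		if (error != -EAGAIN)
			break;

		error = mock_trans_roll(dac);
		if (error)
			return error;
	} while (true);

	return error;
}
```

With three -EAGAIN returns the loop rolls three times and then returns 0; a failing roll is returned immediately.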

> +	struct xfs_delattr_context	dac = {
> +		.da_args	= args,
> +	};
> +
> +	do {
> +		error = xfs_attr_remove_iter(&dac);
> +		if (error != -EAGAIN)
> +			break;
> +
> +		error = xfs_attr_trans_roll(&dac);
> +		if (error)
> +			return error;
> +
> +	} while (true);
> +
> +	return error;
> +}
> +
> +/*
> + * Remove the attribute specified in @args.
> + *
> + * This function may return -EAGAIN to signal that the transaction needs to be
> + * rolled.  Callers should continue calling this function until they receive a
> + * return value other than -EAGAIN.
> + */
> +int
> +xfs_attr_remove_iter(
> +	struct xfs_delattr_context	*dac)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_inode		*dp = args->dp;
> +
> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> +		goto node;
>  
>  	if (!xfs_inode_hasattr(dp)) {
> -		error = -ENOATTR;
> +		return -ENOATTR;
>  	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
>  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> -		error = xfs_attr_shortform_remove(args);
> +		return xfs_attr_shortform_remove(args);
>  	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> -		error = xfs_attr_leaf_removename(args);
> -	} else {
> -		error = xfs_attr_node_removename(args);
> +		return xfs_attr_leaf_removename(args);
>  	}
> -
> -	return error;
> +node:
> +	return  xfs_attr_node_removename_iter(dac);
>  }
>  
>  /*
> @@ -1178,10 +1236,11 @@ xfs_attr_leaf_mark_incomplete(
>   */
>  STATIC
>  int xfs_attr_node_removename_setup(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	**state)
> +	struct xfs_delattr_context	*dac,
> +	struct xfs_da_state		**state)
>  {
> -	int			error;
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error;
>  
>  	error = xfs_attr_node_hasname(args, state);
>  	if (error != -EEXIST)
> @@ -1191,6 +1250,12 @@ int xfs_attr_node_removename_setup(
>  	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
>  		XFS_ATTR_LEAF_MAGIC);
>  
> +	/*
> +	 * Store state in the context in case we need to cycle out the
> +	 * transaction
> +	 */
> +	dac->da_state = *state;
> +
>  	if (args->rmtblkno > 0) {
>  		error = xfs_attr_leaf_mark_incomplete(args, *state);
>  		if (error)
> @@ -1203,13 +1268,16 @@ int xfs_attr_node_removename_setup(
>  }
>  
>  STATIC int
> -xfs_attr_node_remove_rmt(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	*state)
> +xfs_attr_node_remove_rmt (
> +	struct xfs_delattr_context	*dac,
> +	struct xfs_da_state		*state)
>  {
> -	int			error = 0;
> +	int				error = 0;
>  
> -	error = xfs_attr_rmtval_remove(args);
> +	/*
> +	 * May return -EAGAIN to request that the caller recall this function
> +	 */
> +	error = __xfs_attr_rmtval_remove(dac);
>  	if (error)
>  		return error;
>  
> @@ -1221,21 +1289,27 @@ xfs_attr_node_remove_rmt(
>  }
>  
>  /*
> - * Remove a name from a B-tree attribute list.
> + * Step through removing a name from a B-tree attribute list.
>   *
>   * This will involve walking down the Btree, and may involve joining
>   * leaf nodes and even joining intermediate nodes up to and including
>   * the root node (a special case of an intermediate node).
> + *
> + * This routine is meant to function as either an inline or delayed operation,
> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> + * functions will need to handle this, and call the function again until
> + * a return value other than -EAGAIN is received.
>   */
>  STATIC int
>  xfs_attr_node_remove_step(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	*state)
> +	struct xfs_delattr_context	*dac)
>  {
> -	struct xfs_da_state_blk	*blk;
> -	int			retval, error;
> -	struct xfs_inode	*dp = args->dp;
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_da_state		*state;
> +	struct xfs_da_state_blk		*blk;
> +	int				retval, error = 0;
>  
> +	state = dac->da_state;
>  
>  	/*
>  	 * If there is an out-of-line value, de-allocate the blocks.
> @@ -1243,7 +1317,10 @@ xfs_attr_node_remove_step(
>  	 * overflow the maximum size of a transaction and/or hit a deadlock.
>  	 */
>  	if (args->rmtblkno > 0) {
> -		error = xfs_attr_node_remove_rmt(args, state);
> +		/*
> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> +		 */
> +		error = xfs_attr_node_remove_rmt(dac, state);
>  		if (error)
>  			return error;
>  	}
> @@ -1257,21 +1334,18 @@ xfs_attr_node_remove_step(
>  	xfs_da3_fixhashpath(state, &state->path);
>  
>  	/*
> -	 * Check to see if the tree needs to be collapsed.
> +	 * Check to see if the tree needs to be collapsed.  If so, set the state
> +	 * to indicate that the calling function needs to move on to the shrink
> +	 * operation.
>  	 */
>  	if (retval && (state->path.active > 1)) {
>  		error = xfs_da3_join(state);
>  		if (error)
>  			return error;
> -		error = xfs_defer_finish(&args->trans);
> -		if (error)
> -			return error;
> -		/*
> -		 * Commit the Btree join operation and start a new trans.
> -		 */
> -		error = xfs_trans_roll_inode(&args->trans, dp);
> -		if (error)
> -			return error;
> +
> +		dac->flags |= XFS_DAC_DEFER_FINISH;
> +		dac->dela_state = XFS_DAS_RM_SHRINK;
> +		return -EAGAIN;
>  	}
>  
>  	return error;
> @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
>   *
>   * This routine will find the blocks of the name to remove, remove them and
>   * shirnk the tree if needed.
> + *
> + * This routine is meant to function as either an inline or delayed operation,
> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> + * functions will need to handle this, and call the function again until
> + * a return value other than -EAGAIN is received.
>   */
>  STATIC int
> -xfs_attr_node_removename(
> -	struct xfs_da_args	*args)
> +xfs_attr_node_removename_iter(
> +	struct xfs_delattr_context	*dac)
>  {
> -	struct xfs_da_state	*state;
> -	int			error;
> -	struct xfs_inode	*dp = args->dp;
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_da_state		*state;
> +	int				error;
> +	struct xfs_inode		*dp = args->dp;
>  
>  	trace_xfs_attr_node_removename(args);
> +	state = dac->da_state;
>  
> -	error = xfs_attr_node_removename_setup(args, &state);
> -	if (error)
> -		goto out;
> +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> +		error = xfs_attr_node_removename_setup(dac, &state);
> +		if (error)
> +			goto out;
> +	}
>  
> -	error = xfs_attr_node_remove_step(args, state);
> -	if (error)
> -		goto out;
> +	switch (dac->dela_state) {
> +	case XFS_DAS_UNINIT:
> +		error = xfs_attr_node_remove_step(dac);
> +		if (error)
> +			break;
>  
> -	/*
> -	 * If the result is small enough, push it all into the inode.
> -	 */
> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> -		error = xfs_attr_node_shrink(args, state);
> +		/* do not break, proceed to shrink if needed */
> +	case XFS_DAS_RM_SHRINK:
> +		/*
> +		 * If the result is small enough, push it all into the inode.
> +		 */
> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> +			error = xfs_attr_node_shrink(args, state);
>  
> +		break;
> +	default:
> +		ASSERT(0);
> +		return -EINVAL;

I don't think this is possible in a real-world scenario, but if "state"
were pointing to allocated memory, returning directly here would skip the
xfs_da_state_free() call under the "out" label and leak that memory.

Apart from the above nit, the remaining changes look good to me.
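To illustrate the leak concern, a hedged sketch in which the unexpected-state branch sets an error and breaks, so it falls into the common cleanup instead of returning directly. struct mock_state, mock_state_alloc(), mock_state_free() and the live_states counter are hypothetical stand-ins for struct xfs_da_state and xfs_da_state_free(); this is not the code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

enum mock_delattr_state { MOCK_DAS_UNINIT = 0, MOCK_DAS_RM_SHRINK };

struct mock_state { int dummy; };	/* stand-in for struct xfs_da_state */

static int live_states;			/* allocations not yet freed */

static struct mock_state *mock_state_alloc(void)
{
	struct mock_state *s = calloc(1, sizeof(*s));

	if (s)
		live_states++;
	return s;
}

static void mock_state_free(struct mock_state *s)
{
	free(s);
	live_states--;
}

/* Hypothetical version of the switch in xfs_attr_node_removename_iter(). */
static int mock_removename_iter(int dela_state, int step_error,
				struct mock_state *state)
{
	int	error;

	switch (dela_state) {
	case MOCK_DAS_UNINIT:
	case MOCK_DAS_RM_SHRINK:
		error = step_error;	/* result of the remove/shrink work */
		break;
	default:
		/* the kernel code has ASSERT(0) here; omitted so it can run */
		error = -EINVAL;
		break;			/* reach the cleanup below */
	}

	if (error == -EAGAIN)
		return error;		/* state is kept for the next call */

	/* common cleanup, reached on success and on every other error */
	if (state)
		mock_state_free(state);
	return error;
}
```

An unknown state now returns -EINVAL and still frees the saved state, while -EAGAIN keeps it for the next call.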

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>

> +	}
> +
> +	if (error == -EAGAIN)
> +		return error;
>  out:
>  	if (state)
>  		xfs_da_state_free(state);
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 3e97a93..64dcf0f 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
>  };
>  
>  
> +/*
> + * ========================================================================
> + * Structure used to pass context around among the delayed routines.
> + * ========================================================================
> + */
> +
> +/*
> + * Below is a state machine diagram for attr remove operations. The XFS_DAS_*
> + * states indicate places where the function would return -EAGAIN, and then
> + * immediately resume from after being recalled by the calling function. States
> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> + * so the calling function needs to pass them back to that subroutine to allow
> + * it to finish where it left off. But they otherwise do not have a role in the
> + * calling function other than just passing through.
> + *
> + * xfs_attr_remove_iter()
> + *	  XFS_DAS_RM_SHRINK ─┐
> + *	  (subroutine state) │
> + *	                     └─>xfs_attr_node_removename()
> + *	                                      │
> + *	                                      v
> + *	                                   need to
> + *	                                shrink tree? ─n─┐
> + *	                                      │         │
> + *	                                      y         │
> + *	                                      │         │
> + *	                                      v         │
> + *	                              XFS_DAS_RM_SHRINK │
> + *	                                      │         │
> + *	                                      v         │
> + *	                                     done <─────┘
> + *
> + */
> +
> +/*
> + * Enum values for xfs_delattr_context.dela_state
> + *
> + * These values are used by delayed attribute operations to keep track of where
> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> + * calling function to roll the transaction, and then recall the subroutine to
> + * finish the operation.  The enum is then used by the subroutine to jump back
> + * to where it was and resume executing where it left off.
> + */
> +enum xfs_delattr_state {
> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
> +};
> +
> +/*
> + * Defines for xfs_delattr_context.flags
> + */
> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> +
> +/*
> + * Context used for keeping track of delayed attribute operations
> + */
> +struct xfs_delattr_context {
> +	struct xfs_da_args      *da_args;
> +
> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> +	struct xfs_da_state     *da_state;
> +
> +	/* Used to keep track of current state of delayed operation */
> +	unsigned int            flags;
> +	enum xfs_delattr_state  dela_state;
> +};
> +
>  /*========================================================================
>   * Function prototypes for the kernel.
>   *========================================================================*/
> @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>  int xfs_attr_set_args(struct xfs_da_args *args);
>  int xfs_has_attr(struct xfs_da_args *args);
>  int xfs_attr_remove_args(struct xfs_da_args *args);
> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>  bool xfs_attr_namecheck(const void *name, size_t length);
> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> +			      struct xfs_da_args *args);
>  
>  #endif	/* __XFS_ATTR_H__ */
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index bb128db..338377e 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -19,8 +19,8 @@
>  #include "xfs_bmap_btree.h"
>  #include "xfs_bmap.h"
>  #include "xfs_attr_sf.h"
> -#include "xfs_attr_remote.h"
>  #include "xfs_attr.h"
> +#include "xfs_attr_remote.h"
>  #include "xfs_attr_leaf.h"
>  #include "xfs_error.h"
>  #include "xfs_trace.h"
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> index 48d8e9c..1426c15 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
>   */
>  int
>  xfs_attr_rmtval_remove(
> -	struct xfs_da_args      *args)
> +	struct xfs_da_args		*args)
>  {
> -	int			error;
> -	int			retval;
> +	int				error;
> +	struct xfs_delattr_context	dac  = {
> +		.da_args	= args,
> +	};
>  
>  	trace_xfs_attr_rmtval_remove(args);
>  
> @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
>  	 * Keep de-allocating extents until the remote-value region is gone.
>  	 */
>  	do {
> -		retval = __xfs_attr_rmtval_remove(args);
> -		if (retval && retval != -EAGAIN)
> -			return retval;
> +		error = __xfs_attr_rmtval_remove(&dac);
> +		if (error != -EAGAIN)
> +			break;
>  
> -		/*
> -		 * Close out trans and start the next one in the chain.
> -		 */
> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> +		error = xfs_attr_trans_roll(&dac);
>  		if (error)
>  			return error;
> -	} while (retval == -EAGAIN);
>  
> -	return 0;
> +	} while (true);
> +
> +	return error;
>  }
>  
>  /*
> @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
>   */
>  int
>  __xfs_attr_rmtval_remove(
> -	struct xfs_da_args	*args)
> +	struct xfs_delattr_context	*dac)
>  {
> -	int			error, done;
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error, done;
>  
>  	/*
>  	 * Unmap value blocks for this attr.
> @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
>  	if (error)
>  		return error;
>  
> -	error = xfs_defer_finish(&args->trans);
> -	if (error)
> -		return error;
> -
> -	if (!done)
> +	if (!done) {
> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>  		return -EAGAIN;
> +	}
>  
>  	return error;
>  }
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> index 9eee615..002fd30 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.h
> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>  int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>  		xfs_buf_flags_t incore_flags);
>  int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>  #endif /* __XFS_ATTR_REMOTE_H__ */
> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> index bfad669..aaa7e66 100644
> --- a/fs/xfs/xfs_attr_inactive.c
> +++ b/fs/xfs/xfs_attr_inactive.c
> @@ -15,10 +15,10 @@
>  #include "xfs_da_format.h"
>  #include "xfs_da_btree.h"
>  #include "xfs_inode.h"
> +#include "xfs_attr.h"
>  #include "xfs_attr_remote.h"
>  #include "xfs_trans.h"
>  #include "xfs_bmap.h"
> -#include "xfs_attr.h"
>  #include "xfs_attr_leaf.h"
>  #include "xfs_quota.h"
>  #include "xfs_dir2.h"
> 


-- 
chandan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step
  2020-10-23  6:34 ` [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step Allison Henderson
  2020-10-27  7:03   ` Chandan Babu R
@ 2020-10-27 12:15   ` Brian Foster
  2020-10-27 15:33     ` Allison Henderson
  2020-11-10 23:12   ` Darrick J. Wong
  2 siblings, 1 reply; 58+ messages in thread
From: Brian Foster @ 2020-10-27 12:15 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Oct 22, 2020 at 11:34:26PM -0700, Allison Henderson wrote:
> From: Allison Collins <allison.henderson@oracle.com>
> 
> This patch adds a new helper function xfs_attr_node_remove_step.  This
> will help simplify and modularize the calling function
> xfs_attr_node_removename.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c | 46 ++++++++++++++++++++++++++++++++++------------
>  1 file changed, 34 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index fd8e641..f4d39bf 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
...
> @@ -1267,18 +1262,45 @@ xfs_attr_node_removename(
>  	if (retval && (state->path.active > 1)) {
>  		error = xfs_da3_join(state);
>  		if (error)
> -			goto out;
> +			return error;
>  		error = xfs_defer_finish(&args->trans);
>  		if (error)
> -			goto out;
> +			return error;
>  		/*
>  		 * Commit the Btree join operation and start a new trans.
>  		 */
>  		error = xfs_trans_roll_inode(&args->trans, dp);
>  		if (error)
> -			goto out;
> +			return error;
>  	}
>  
> +	return error;
> +}
> +
> +/*
> + * Remove a name from a B-tree attribute list.
> + *
> + * This routine will find the blocks of the name to remove, remove them and
> + * shirnk the tree if needed.
> + */
> +STATIC int
> +xfs_attr_node_removename(
> +	struct xfs_da_args	*args)
> +{
> +	struct xfs_da_state	*state;

It irks me a little bit that we have to dig down into a couple of functions
to grok that state allocation is the first step, or otherwise occurs
before we can take the error path. Since we already check for state in
the out path, can we just initialize it here (state = NULL) so the logic
is clear? Otherwise the patch LGTM:

Reviewed-by: Brian Foster <bfoster@redhat.com>
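A minimal sketch of the suggested initialization, using hypothetical stand-ins: mock_setup() plays the role of xfs_attr_node_removename_setup() and, by assumption, can fail before allocating anything. Because "state" starts as NULL, the shared "out" label is safe from any path without reading the helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct mock_state { int dummy; };	/* stand-in for struct xfs_da_state */

static int frees;			/* number of states released */

/* Stand-in setup helper: fails with -ENOENT before allocating if asked. */
static int mock_setup(int fail_early, struct mock_state **state)
{
	if (fail_early)
		return -ENOENT;
	*state = calloc(1, sizeof(**state));
	return *state ? 0 : -ENOMEM;
}

static int mock_removename(int fail_early)
{
	struct mock_state	*state = NULL;	/* makes "out" safe on any path */
	int			error;

	error = mock_setup(fail_early, &state);
	if (error)
		goto out;

	/* the remove step and shrink would go here */
out:
	if (state) {
		free(state);
		frees++;
	}
	return error;
}
```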

> +	int			error;
> +	struct xfs_inode	*dp = args->dp;
> +
> +	trace_xfs_attr_node_removename(args);
> +
> +	error = xfs_attr_node_removename_setup(args, &state);
> +	if (error)
> +		goto out;
> +
> +	error = xfs_attr_node_remove_step(args, state);
> +	if (error)
> +		goto out;
> +
>  	/*
>  	 * If the result is small enough, push it all into the inode.
>  	 */
> -- 
> 2.7.4
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-10-23  6:34 ` [PATCH v13 02/10] xfs: Add delay ready attr remove routines Allison Henderson
  2020-10-27  9:59   ` Chandan Babu R
@ 2020-10-27 12:16   ` Brian Foster
  2020-10-27 22:27     ` Allison Henderson
  2020-11-10 23:15     ` Darrick J. Wong
  2020-11-10 23:43   ` Darrick J. Wong
  2 siblings, 2 replies; 58+ messages in thread
From: Brian Foster @ 2020-10-27 12:16 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Oct 22, 2020 at 11:34:27PM -0700, Allison Henderson wrote:
> This patch modifies the attr remove routines to be delay ready. This
> means they no longer roll or commit transactions, but instead return
> -EAGAIN to have the calling routine roll and refresh the transaction. In
> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> uses a state-machine-like switch to keep track of where it was
> when EAGAIN was returned. xfs_attr_node_removename has also been
> modified to use the switch, and a new version of xfs_attr_remove_args
> consists of a simple loop to refresh the transaction until the operation
> is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> transaction wherever the existing code used to.
> 
> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> version __xfs_attr_rmtval_remove. We will rename
> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> done.
> 
> xfs_attr_rmtval_remove itself is still in use by the set routines (used
> during a rename).  For reasons of preserving existing function, we
> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> set, similar to what xfs_attr_remove_args does here.  Once we transition
> the set routines to be delay ready, xfs_attr_rmtval_remove will no longer
> be used and will be removed.
> 
> This patch also adds a new struct xfs_delattr_context, which we will use
> to keep track of the current state of an attribute operation. The new
> xfs_delattr_state enum is used to track various operations that are in
> progress so that we know not to repeat them, and resume where we left
> off before EAGAIN was returned to cycle out the transaction. Other
> members take the place of local variables that need to retain their
> values across multiple function recalls.  See xfs_attr.h for a more
> detailed diagram of the states.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
>  fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
>  fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>  fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
>  fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>  fs/xfs/xfs_attr_inactive.c      |   2 +-
>  6 files changed, 241 insertions(+), 74 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index f4d39bf..6ca94cb 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>   */
>  STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>  STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
> +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>  STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>  				 struct xfs_da_state **state);
>  STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
>  }
>  
>  /*
> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> + * also checks for a defer finish.  Transaction is finished and rolled as
> + * needed, and returns 0 on success or a negative error code.
> + */
> +int
> +xfs_attr_trans_roll(
> +	struct xfs_delattr_context	*dac)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error = 0;
> +
> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> +		/*
> +		 * The caller wants us to finish all the deferred ops so that we
> +		 * avoid pinning the log tail with a large number of deferred
> +		 * ops.
> +		 */
> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> +		error = xfs_defer_finish(&args->trans);
> +		if (error)
> +			return error;
> +	}
> +

It seems like some comments on the previous version weren't addressed.
I.e., the spurious transaction roll here when a dfops finish occurs..?
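One hedged way to avoid the extra roll, assuming (as in kernels of this era) that xfs_defer_finish() already leaves the caller with a clean, rolled transaction, is to either finish the deferred ops or roll, never both. The mock_* functions and counters below are hypothetical stand-ins, not XFS APIs.

```c
#include <assert.h>

#define MOCK_DAC_DEFER_FINISH	0x01	/* mirrors XFS_DAC_DEFER_FINISH */

struct mock_dac {
	unsigned int	flags;
	int		finishes;	/* calls to the xfs_defer_finish() stand-in */
	int		rolls;		/* calls to the xfs_trans_roll_inode() stand-in */
};

/* Stand-in: assumed to finish deferred work and return a clean transaction. */
static int mock_defer_finish(struct mock_dac *dac)
{
	dac->finishes++;
	return 0;
}

static int mock_trans_roll_inode(struct mock_dac *dac)
{
	dac->rolls++;
	return 0;
}

/* Finish the deferred ops when requested, otherwise just roll. */
static int mock_attr_trans_roll(struct mock_dac *dac)
{
	if (dac->flags & MOCK_DAC_DEFER_FINISH) {
		dac->flags &= ~MOCK_DAC_DEFER_FINISH;
		return mock_defer_finish(dac);
	}
	return mock_trans_roll_inode(dac);
}
```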

> +	return xfs_trans_roll_inode(&args->trans, args->dp);
> +}
> +
> +/*
>   * Set the attribute specified in @args.
>   */
>  int
> @@ -364,23 +391,54 @@ xfs_has_attr(
>   */
>  int
>  xfs_attr_remove_args(
> -	struct xfs_da_args      *args)
> +	struct xfs_da_args	*args)
>  {
> -	struct xfs_inode	*dp = args->dp;
> -	int			error;
> +	int				error = 0;
> +	struct xfs_delattr_context	dac = {
> +		.da_args	= args,
> +	};
> +
> +	do {
> +		error = xfs_attr_remove_iter(&dac);
> +		if (error != -EAGAIN)
> +			break;
> +
> +		error = xfs_attr_trans_roll(&dac);
> +		if (error)
> +			return error;
> +
> +	} while (true);
> +
> +	return error;
> +}
> +
> +/*
> + * Remove the attribute specified in @args.
> + *
> + * This function may return -EAGAIN to signal that the transaction needs to be
> + * rolled.  Callers should continue calling this function until they receive a
> + * return value other than -EAGAIN.
> + */
> +int
> +xfs_attr_remove_iter(
> +	struct xfs_delattr_context	*dac)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_inode		*dp = args->dp;
> +
> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> +		goto node;
>  
>  	if (!xfs_inode_hasattr(dp)) {
> -		error = -ENOATTR;
> +		return -ENOATTR;
>  	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
>  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> -		error = xfs_attr_shortform_remove(args);
> +		return xfs_attr_shortform_remove(args);
>  	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> -		error = xfs_attr_leaf_removename(args);
> -	} else {
> -		error = xfs_attr_node_removename(args);
> +		return xfs_attr_leaf_removename(args);
>  	}
> -
> -	return error;
> +node:
> +	return  xfs_attr_node_removename_iter(dac);
>  }
>  
>  /*
> @@ -1178,10 +1236,11 @@ xfs_attr_leaf_mark_incomplete(
>   */
>  STATIC
>  int xfs_attr_node_removename_setup(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	**state)
> +	struct xfs_delattr_context	*dac,
> +	struct xfs_da_state		**state)
>  {
> -	int			error;
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error;
>  
>  	error = xfs_attr_node_hasname(args, state);
>  	if (error != -EEXIST)
> @@ -1191,6 +1250,12 @@ int xfs_attr_node_removename_setup(
>  	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
>  		XFS_ATTR_LEAF_MAGIC);
>  
> +	/*
> +	 * Store state in the context in case we need to cycle out the
> +	 * transaction
> +	 */
> +	dac->da_state = *state;
> +
>  	if (args->rmtblkno > 0) {
>  		error = xfs_attr_leaf_mark_incomplete(args, *state);
>  		if (error)
> @@ -1203,13 +1268,16 @@ int xfs_attr_node_removename_setup(
>  }
>  
>  STATIC int
> -xfs_attr_node_remove_rmt(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	*state)
> +xfs_attr_node_remove_rmt (

Extra space		   ^

> +	struct xfs_delattr_context	*dac,
> +	struct xfs_da_state		*state)
>  {
> -	int			error = 0;
> +	int				error = 0;
>  
> -	error = xfs_attr_rmtval_remove(args);
> +	/*
> +	 * May return -EAGAIN to request that the caller recall this function
> +	 */
> +	error = __xfs_attr_rmtval_remove(dac);
>  	if (error)
>  		return error;
>  
> @@ -1221,21 +1289,27 @@ xfs_attr_node_remove_rmt(
>  }
>  
>  /*
> - * Remove a name from a B-tree attribute list.
> + * Step through removing a name from a B-tree attribute list.
>   *
>   * This will involve walking down the Btree, and may involve joining
>   * leaf nodes and even joining intermediate nodes up to and including
>   * the root node (a special case of an intermediate node).
> + *
> + * This routine is meant to function as either an inline or delayed operation,
> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> + * functions will need to handle this, and call the function again until
> + * a return value other than -EAGAIN is received.
>   */
>  STATIC int
>  xfs_attr_node_remove_step(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	*state)
> +	struct xfs_delattr_context	*dac)
>  {
> -	struct xfs_da_state_blk	*blk;
> -	int			retval, error;
> -	struct xfs_inode	*dp = args->dp;
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_da_state		*state;
> +	struct xfs_da_state_blk		*blk;
> +	int				retval, error = 0;
>  
> +	state = dac->da_state;
>  
>  	/*
>  	 * If there is an out-of-line value, de-allocate the blocks.
> @@ -1243,7 +1317,10 @@ xfs_attr_node_remove_step(
>  	 * overflow the maximum size of a transaction and/or hit a deadlock.
>  	 */
>  	if (args->rmtblkno > 0) {
> -		error = xfs_attr_node_remove_rmt(args, state);
> +		/*
> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> +		 */
> +		error = xfs_attr_node_remove_rmt(dac, state);
>  		if (error)
>  			return error;
>  	}
> @@ -1257,21 +1334,18 @@ xfs_attr_node_remove_step(
>  	xfs_da3_fixhashpath(state, &state->path);
>  
>  	/*
> -	 * Check to see if the tree needs to be collapsed.
> +	 * Check to see if the tree needs to be collapsed.  If so, set the state
> +	 * to indicate that the calling function needs to move on to the shrink
> +	 * operation.
>  	 */
>  	if (retval && (state->path.active > 1)) {
>  		error = xfs_da3_join(state);
>  		if (error)
>  			return error;
> -		error = xfs_defer_finish(&args->trans);
> -		if (error)
> -			return error;
> -		/*
> -		 * Commit the Btree join operation and start a new trans.
> -		 */
> -		error = xfs_trans_roll_inode(&args->trans, dp);
> -		if (error)
> -			return error;
> +
> +		dac->flags |= XFS_DAC_DEFER_FINISH;
> +		dac->dela_state = XFS_DAS_RM_SHRINK;
> +		return -EAGAIN;
>  	}
>  
>  	return error;
> @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
>   *
>   * This routine will find the blocks of the name to remove, remove them and
>   * shirnk the tree if needed.
> + *
> + * This routine is meant to function as either an inline or delayed operation,
> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> + * functions will need to handle this, and call the function again until
> + * a return value other than -EAGAIN is received.
>   */
>  STATIC int
> -xfs_attr_node_removename(
> -	struct xfs_da_args	*args)
> +xfs_attr_node_removename_iter(
> +	struct xfs_delattr_context	*dac)
>  {
> -	struct xfs_da_state	*state;
> -	int			error;
> -	struct xfs_inode	*dp = args->dp;
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_da_state		*state;
> +	int				error;
> +	struct xfs_inode		*dp = args->dp;
>  
>  	trace_xfs_attr_node_removename(args);
> +	state = dac->da_state;
>  
> -	error = xfs_attr_node_removename_setup(args, &state);
> -	if (error)
> -		goto out;
> +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> +		error = xfs_attr_node_removename_setup(dac, &state);
> +		if (error)
> +			goto out;
> +	}
>  
> -	error = xfs_attr_node_remove_step(args, state);
> -	if (error)
> -		goto out;
> +	switch (dac->dela_state) {
> +	case XFS_DAS_UNINIT:
> +		error = xfs_attr_node_remove_step(dac);
> +		if (error)
> +			break;
>  

I think there's a bit more preliminary refactoring to do here to isolate
the state management to this one function. I.e., from the discussion on
the previous version, we'd ideally pull the logic that checks for the
subsequent shrink state out of xfs_attr_node_remove_step() and lift it
into this branch. See the pseudocode in the previous discussion for an
example of what I mean:

  https://lore.kernel.org/linux-xfs/20200901170020.GC174813@bfoster/

The general goal of that is to refactor the existing code such that all
of the state transitions and whatnot are shown in one place and the rest
is broken down into smaller functional helpers.

Brian
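A rough userspace sketch of the refactoring idea, under stated assumptions: the helper mock_remove_step() only does the work and reports through an out-parameter whether a join happened, while mock_removename_iter() owns every flag and state transition. All names are hypothetical, and this is not the pseudocode from the linked thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MOCK_DAC_DEFER_FINISH		0x01
#define MOCK_DAC_NODE_RMVNAME_INIT	0x02

enum mock_delattr_state { MOCK_DAS_UNINIT = 0, MOCK_DAS_RM_SHRINK };

struct mock_dac {
	unsigned int		flags;
	enum mock_delattr_state	dela_state;
	bool			tree_needs_join;	/* simulated tree condition */
	int			setups;		/* times one-time setup ran */
	int			shrinks;	/* times the shrink step ran */
};

/* Work-only helper: removes the name, joins if needed, reports the join. */
static int mock_remove_step(struct mock_dac *dac, bool *joined)
{
	*joined = dac->tree_needs_join;
	return 0;
}

static int mock_removename_iter(struct mock_dac *dac)
{
	bool	joined;
	int	error;

	if (!(dac->flags & MOCK_DAC_NODE_RMVNAME_INIT)) {
		dac->flags |= MOCK_DAC_NODE_RMVNAME_INIT;
		dac->setups++;			/* setup runs only once */
	}

	switch (dac->dela_state) {
	case MOCK_DAS_UNINIT:
		error = mock_remove_step(dac, &joined);
		if (error)
			return error;
		if (joined) {
			/* every state change lives here, not in the helper */
			dac->flags |= MOCK_DAC_DEFER_FINISH;
			dac->dela_state = MOCK_DAS_RM_SHRINK;
			return -EAGAIN;
		}
		/* fall through */
	case MOCK_DAS_RM_SHRINK:
		dac->shrinks++;		/* stands in for the shrink check */
		return 0;
	default:
		return -EINVAL;
	}
}
```

When a join occurs, the first call returns -EAGAIN and the recall resumes at the shrink step without repeating setup; otherwise the shrink step runs in the same call.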

> -	/*
> -	 * If the result is small enough, push it all into the inode.
> -	 */
> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> -		error = xfs_attr_node_shrink(args, state);
> +		/* do not break, proceed to shrink if needed */
> +	case XFS_DAS_RM_SHRINK:
> +		/*
> +		 * If the result is small enough, push it all into the inode.
> +		 */
> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> +			error = xfs_attr_node_shrink(args, state);
>  
> +		break;
> +	default:
> +		ASSERT(0);
> +		return -EINVAL;
> +	}
> +
> +	if (error == -EAGAIN)
> +		return error;
>  out:
>  	if (state)
>  		xfs_da_state_free(state);
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 3e97a93..64dcf0f 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
>  };
>  
>  
> +/*
> + * ========================================================================
> + * Structure used to pass context around among the delayed routines.
> + * ========================================================================
> + */
> +
> +/*
> + * Below is a state machine diagram for attr remove operations. The XFS_DAS_*
> + * states indicate places where the function would return -EAGAIN, and then
> + * immediately resume from after being recalled by the calling function. States
> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> + * so the calling function needs to pass them back to that subroutine to allow
> + * it to finish where it left off. But they otherwise do not have a role in the
> + * calling function other than just passing through.
> + *
> + * xfs_attr_remove_iter()
> + *	  XFS_DAS_RM_SHRINK ─┐
> + *	  (subroutine state) │
> + *	                     └─>xfs_attr_node_removename()
> + *	                                      │
> + *	                                      v
> + *	                                   need to
> + *	                                shrink tree? ─n─┐
> + *	                                      │         │
> + *	                                      y         │
> + *	                                      │         │
> + *	                                      v         │
> + *	                              XFS_DAS_RM_SHRINK │
> + *	                                      │         │
> + *	                                      v         │
> + *	                                     done <─────┘
> + *
> + */
> +
> +/*
> + * Enum values for xfs_delattr_context.dela_state
> + *
> + * These values are used by delayed attribute operations to keep track of where
> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> + * calling function to roll the transaction, and then recall the subroutine to
> + * finish the operation.  The enum is then used by the subroutine to jump back
> + * to where it was and resume executing where it left off.
> + */
> +enum xfs_delattr_state {
> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
> +};
> +
> +/*
> + * Defines for xfs_delattr_context.flags
> + */
> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> +
> +/*
> + * Context used for keeping track of delayed attribute operations
> + */
> +struct xfs_delattr_context {
> +	struct xfs_da_args      *da_args;
> +
> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> +	struct xfs_da_state     *da_state;
> +
> +	/* Used to keep track of current state of delayed operation */
> +	unsigned int            flags;
> +	enum xfs_delattr_state  dela_state;
> +};
> +
>  /*========================================================================
>   * Function prototypes for the kernel.
>   *========================================================================*/
> @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>  int xfs_attr_set_args(struct xfs_da_args *args);
>  int xfs_has_attr(struct xfs_da_args *args);
>  int xfs_attr_remove_args(struct xfs_da_args *args);
> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>  bool xfs_attr_namecheck(const void *name, size_t length);
> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> +			      struct xfs_da_args *args);
>  
>  #endif	/* __XFS_ATTR_H__ */
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index bb128db..338377e 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -19,8 +19,8 @@
>  #include "xfs_bmap_btree.h"
>  #include "xfs_bmap.h"
>  #include "xfs_attr_sf.h"
> -#include "xfs_attr_remote.h"
>  #include "xfs_attr.h"
> +#include "xfs_attr_remote.h"
>  #include "xfs_attr_leaf.h"
>  #include "xfs_error.h"
>  #include "xfs_trace.h"
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> index 48d8e9c..1426c15 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
>   */
>  int
>  xfs_attr_rmtval_remove(
> -	struct xfs_da_args      *args)
> +	struct xfs_da_args		*args)
>  {
> -	int			error;
> -	int			retval;
> +	int				error;
> +	struct xfs_delattr_context	dac = {
> +		.da_args	= args,
> +	};
>  
>  	trace_xfs_attr_rmtval_remove(args);
>  
> @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
>  	 * Keep de-allocating extents until the remote-value region is gone.
>  	 */
>  	do {
> -		retval = __xfs_attr_rmtval_remove(args);
> -		if (retval && retval != -EAGAIN)
> -			return retval;
> +		error = __xfs_attr_rmtval_remove(&dac);
> +		if (error != -EAGAIN)
> +			break;
>  
> -		/*
> -		 * Close out trans and start the next one in the chain.
> -		 */
> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> +		error = xfs_attr_trans_roll(&dac);
>  		if (error)
>  			return error;
> -	} while (retval == -EAGAIN);
>  
> -	return 0;
> +	} while (true);
> +
> +	return error;
>  }
>  
>  /*
> @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
>   */
>  int
>  __xfs_attr_rmtval_remove(
> -	struct xfs_da_args	*args)
> +	struct xfs_delattr_context	*dac)
>  {
> -	int			error, done;
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error, done;
>  
>  	/*
>  	 * Unmap value blocks for this attr.
> @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
>  	if (error)
>  		return error;
>  
> -	error = xfs_defer_finish(&args->trans);
> -	if (error)
> -		return error;
> -
> -	if (!done)
> +	if (!done) {
> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>  		return -EAGAIN;
> +	}
>  
>  	return error;
>  }
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> index 9eee615..002fd30 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.h
> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>  int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>  		xfs_buf_flags_t incore_flags);
>  int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>  #endif /* __XFS_ATTR_REMOTE_H__ */
> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> index bfad669..aaa7e66 100644
> --- a/fs/xfs/xfs_attr_inactive.c
> +++ b/fs/xfs/xfs_attr_inactive.c
> @@ -15,10 +15,10 @@
>  #include "xfs_da_format.h"
>  #include "xfs_da_btree.h"
>  #include "xfs_inode.h"
> +#include "xfs_attr.h"
>  #include "xfs_attr_remote.h"
>  #include "xfs_trans.h"
>  #include "xfs_bmap.h"
> -#include "xfs_attr.h"
>  #include "xfs_attr_leaf.h"
>  #include "xfs_quota.h"
>  #include "xfs_dir2.h"
> -- 
> 2.7.4
> 



* Re: [PATCH v13 03/10] xfs: Add delay ready attr set routines
  2020-10-23  6:34 ` [PATCH v13 03/10] xfs: Add delay ready attr set routines Allison Henderson
@ 2020-10-27 13:32   ` Chandan Babu R
  2020-11-10 21:57     ` Darrick J. Wong
  2020-11-10 23:10   ` Darrick J. Wong
  1 sibling, 1 reply; 58+ messages in thread
From: Chandan Babu R @ 2020-10-27 13:32 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Friday 23 October 2020 12:04:28 PM IST Allison Henderson wrote:
> This patch modifies the attr set routines to be delay ready. This means
> they no longer roll or commit transactions, but instead return -EAGAIN
> to have the calling routine roll and refresh the transaction.  In this
> series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
> state-machine-like switch to keep track of where it was when -EAGAIN was
> returned. See xfs_attr.h for a more detailed diagram of the states.
> 
> Two new helper functions have been added: xfs_attr_rmtval_set_init and
> xfs_attr_rmtval_set_blk.  They provide a subset of logic similar to
> xfs_attr_rmtval_set, but they store the current block in the delay attr
> context to allow the caller to roll the transaction between allocations.
> This helps to simplify and consolidate code used by
> xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
> now become a simple loop to refresh the transaction until the operation
> is completed.  Lastly, xfs_attr_rmtval_remove is no longer used, and is
> removed.

One nit. xfs_attr_rmtval_remove()'s prototype declaration needs to be removed
from xfs_attr_remote.h.

> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c        | 370 ++++++++++++++++++++++++++--------------
>  fs/xfs/libxfs/xfs_attr.h        | 126 +++++++++++++-
>  fs/xfs/libxfs/xfs_attr_remote.c |  99 +++++++----
>  fs/xfs/libxfs/xfs_attr_remote.h |   4 +
>  fs/xfs/xfs_trace.h              |   1 -
>  5 files changed, 439 insertions(+), 161 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 6ca94cb..95c98d7 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -44,7 +44,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
>   * Internal routines when attribute list is one block.
>   */
>  STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
> -STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
> +STATIC int xfs_attr_leaf_addname(struct xfs_delattr_context *dac);
>  STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
>  STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>  
> @@ -52,12 +52,15 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>   * Internal routines when attribute list is more than one block.
>   */
>  STATIC int xfs_attr_node_get(xfs_da_args_t *args);
> -STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> +STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
>  STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>  STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>  				 struct xfs_da_state **state);
>  STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>  STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
> +STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> +			     struct xfs_buf **leaf_bp);
>  
>  int
>  xfs_inode_hasattr(
> @@ -218,8 +221,11 @@ xfs_attr_is_shortform(
>  
>  /*
>   * Attempts to set an attr in shortform, or converts short form to leaf form if
> - * there is not enough room.  If the attr is set, the transaction is committed
> - * and set to NULL.
> + * there is not enough room.  This function is meant to operate as a helper
> + * routine to the delayed attribute functions.  It returns -EAGAIN to indicate
> + * that the calling function should roll the transaction, and then proceed to
> + * add the attr in leaf form.  This subroutine does not expect to be recalled
> + * again like the other delayed attr routines do.
>   */
>  STATIC int
>  xfs_attr_set_shortform(
> @@ -227,16 +233,16 @@ xfs_attr_set_shortform(
>  	struct xfs_buf		**leaf_bp)
>  {
>  	struct xfs_inode	*dp = args->dp;
> -	int			error, error2 = 0;
> +	int			error = 0;
>  
>  	/*
>  	 * Try to add the attr to the attribute list in the inode.
>  	 */
>  	error = xfs_attr_try_sf_addname(dp, args);
> +
> +	/* Should only be 0, -EEXIST or -ENOSPC */
>  	if (error != -ENOSPC) {
> -		error2 = xfs_trans_commit(args->trans);
> -		args->trans = NULL;
> -		return error ? error : error2;
> +		return error;
>  	}
>  	/*
>  	 * It won't fit in the shortform, transform to a leaf block.  GROT:
> @@ -249,18 +255,10 @@ xfs_attr_set_shortform(
>  	/*
>  	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
>  	 * push cannot grab the half-baked leaf buffer and run into problems
> -	 * with the write verifier. Once we're done rolling the transaction we
> -	 * can release the hold and add the attr to the leaf.
> +	 * with the write verifier.
>  	 */
>  	xfs_trans_bhold(args->trans, *leaf_bp);
> -	error = xfs_defer_finish(&args->trans);
> -	xfs_trans_bhold_release(args->trans, *leaf_bp);
> -	if (error) {
> -		xfs_trans_brelse(args->trans, *leaf_bp);
> -		return error;
> -	}
> -
> -	return 0;
> +	return -EAGAIN;
>  }
>  
>  /*
> @@ -268,7 +266,7 @@ xfs_attr_set_shortform(
>   * also checks for a defer finish.  Transaction is finished and rolled as
>   * needed, and returns true of false if the delayed operation should continue.
>   */
> -int
> +STATIC int
>  xfs_attr_trans_roll(
>  	struct xfs_delattr_context	*dac)
>  {
> @@ -297,61 +295,130 @@ int
>  xfs_attr_set_args(
>  	struct xfs_da_args	*args)
>  {
> -	struct xfs_inode	*dp = args->dp;
> -	struct xfs_buf          *leaf_bp = NULL;
> -	int			error = 0;
> +	struct xfs_buf			*leaf_bp = NULL;
> +	int				error = 0;
> +	struct xfs_delattr_context	dac = {
> +		.da_args	= args,
> +	};
> +
> +	do {
> +		error = xfs_attr_set_iter(&dac, &leaf_bp);
> +		if (error != -EAGAIN)
> +			break;
> +
> +		error = xfs_attr_trans_roll(&dac);
> +		if (error)
> +			return error;
> +
> +		if (leaf_bp) {
> +			xfs_trans_bjoin(args->trans, leaf_bp);
> +			xfs_trans_bhold(args->trans, leaf_bp);
> +		}

When xfs_attr_set_iter() causes a "short form" attribute list to be converted
to "leaf form", leaf_bp would point to an xfs_buf which has been added to the
transaction and also XFS_BLI_HOLD flag is set on the buffer (last statement in
xfs_attr_set_shortform()). XFS_BLI_HOLD flag makes sure that the new
transaction allocated by xfs_attr_trans_roll() would continue to have leaf_bp
in the transaction's item list. Hence I think the above calls to
xfs_trans_bjoin() and xfs_trans_bhold() are not required. Please let me know
if I am missing something obvious here.


-- 
chandan





* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-10-27  9:59   ` Chandan Babu R
@ 2020-10-27 15:32     ` Allison Henderson
  2020-10-28 12:04       ` Chandan Babu R
  0 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2020-10-27 15:32 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs



On 10/27/20 2:59 AM, Chandan Babu R wrote:
> On Friday 23 October 2020 12:04:27 PM IST Allison Henderson wrote:
>> This patch modifies the attr remove routines to be delay ready. This
>> means they no longer roll or commit transactions, but instead return
>> -EAGAIN to have the calling routine roll and refresh the transaction. In
>> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
>> uses a state-machine-like switch to keep track of where it was when
>> -EAGAIN was returned. xfs_attr_node_removename has also been
>> modified to use the switch, and a new version of xfs_attr_remove_args
>> consists of a simple loop to refresh the transaction until the operation
>> is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
>> transaction wherever the existing code used to.
>>
>> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
>> version __xfs_attr_rmtval_remove. We will rename
>> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
>> done.
>>
>> xfs_attr_rmtval_remove itself is still in use by the set routines (used
>> during a rename).  To preserve existing behavior, we modify
>> xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is set,
>> similar to what xfs_attr_remove_args does here.  Once we transition the
>> set routines to be delay ready, xfs_attr_rmtval_remove will no longer be
>> used and will be removed.
>>
>> This patch also adds a new struct xfs_delattr_context, which we will use
>> to keep track of the current state of an attribute operation. The new
>> xfs_delattr_state enum is used to track various operations that are in
>> progress so that we know not to repeat them, and resume where we left
>> off before -EAGAIN was returned to cycle out the transaction. Other
>> members take the place of local variables that need to retain their
>> values across multiple function recalls.  See xfs_attr.h for a more
>> detailed diagram of the states.
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
>>   fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
>>   fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>>   fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
>>   fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>>   fs/xfs/xfs_attr_inactive.c      |   2 +-
>>   6 files changed, 241 insertions(+), 74 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index f4d39bf..6ca94cb 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>    */
>>   STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>>   STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
>> -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
>> +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>>   STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>   				 struct xfs_da_state **state);
>>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>> @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
>>   }
>>   
>>   /*
>> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
>> + * also checks for a defer finish.  Transaction is finished and rolled as
>> + * needed.  Returns 0 on success or a negative error code on failure.
>> + */
>> +int
>> +xfs_attr_trans_roll(
>> +	struct xfs_delattr_context	*dac)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error = 0;
>> +
>> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
>> +		/*
>> +		 * The caller wants us to finish all the deferred ops so that we
>> +		 * avoid pinning the log tail with a large number of deferred
>> +		 * ops.
>> +		 */
>> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
>> +		error = xfs_defer_finish(&args->trans);
>> +		if (error)
>> +			return error;
>> +	}
>> +
>> +	return xfs_trans_roll_inode(&args->trans, args->dp);
>> +}
>> +
>> +/*
>>    * Set the attribute specified in @args.
>>    */
>>   int
>> @@ -364,23 +391,54 @@ xfs_has_attr(
>>    */
>>   int
>>   xfs_attr_remove_args(
>> -	struct xfs_da_args      *args)
>> +	struct xfs_da_args	*args)
>>   {
>> -	struct xfs_inode	*dp = args->dp;
>> -	int			error;
>> +	int				error = 0;
> 
> I guess the explicit initialization of "error" can be removed since the
> value returned by the call to xfs_attr_remove_iter() will overwrite it.
Sure, will fix
> 
>> +	struct xfs_delattr_context	dac = {
>> +		.da_args	= args,
>> +	};
>> +
>> +	do {
>> +		error = xfs_attr_remove_iter(&dac);
>> +		if (error != -EAGAIN)
>> +			break;
>> +
>> +		error = xfs_attr_trans_roll(&dac);
>> +		if (error)
>> +			return error;
>> +
>> +	} while (true);
>> +
>> +	return error;
>> +}
>> +
>> +/*
>> + * Remove the attribute specified in @args.
>> + *
>> + * This function may return -EAGAIN to signal that the transaction needs to be
>> + * rolled.  Callers should continue calling this function until they receive a
>> + * return value other than -EAGAIN.
>> + */
>> +int
>> +xfs_attr_remove_iter(
>> +	struct xfs_delattr_context	*dac)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_inode		*dp = args->dp;
>> +
>> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
>> +		goto node;
>>   
>>   	if (!xfs_inode_hasattr(dp)) {
>> -		error = -ENOATTR;
>> +		return -ENOATTR;
>>   	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
>>   		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
>> -		error = xfs_attr_shortform_remove(args);
>> +		return xfs_attr_shortform_remove(args);
>>   	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>> -		error = xfs_attr_leaf_removename(args);
>> -	} else {
>> -		error = xfs_attr_node_removename(args);
>> +		return xfs_attr_leaf_removename(args);
>>   	}
>> -
>> -	return error;
>> +node:
>> +	return xfs_attr_node_removename_iter(dac);
>>   }
>>   
>>   /*
>> @@ -1178,10 +1236,11 @@ xfs_attr_leaf_mark_incomplete(
>>    */
>>   STATIC
>>   int xfs_attr_node_removename_setup(
>> -	struct xfs_da_args	*args,
>> -	struct xfs_da_state	**state)
>> +	struct xfs_delattr_context	*dac,
>> +	struct xfs_da_state		**state)
>>   {
>> -	int			error;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error;
>>   
>>   	error = xfs_attr_node_hasname(args, state);
>>   	if (error != -EEXIST)
>> @@ -1191,6 +1250,12 @@ int xfs_attr_node_removename_setup(
>>   	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
>>   		XFS_ATTR_LEAF_MAGIC);
>>   
>> +	/*
>> +	 * Store state in the context in case we need to cycle out the
>> +	 * transaction.
>> +	 */
>> +	dac->da_state = *state;
>> +
>>   	if (args->rmtblkno > 0) {
>>   		error = xfs_attr_leaf_mark_incomplete(args, *state);
>>   		if (error)
>> @@ -1203,13 +1268,16 @@ int xfs_attr_node_removename_setup(
>>   }
>>   
>>   STATIC int
>> -xfs_attr_node_remove_rmt(
>> -	struct xfs_da_args	*args,
>> -	struct xfs_da_state	*state)
>> +xfs_attr_node_remove_rmt(
>> +	struct xfs_delattr_context	*dac,
>> +	struct xfs_da_state		*state)
>>   {
>> -	int			error = 0;
>> +	int				error = 0;
>>   
>> -	error = xfs_attr_rmtval_remove(args);
>> +	/*
>> +	 * May return -EAGAIN to request that the caller recall this function
>> +	 */
>> +	error = __xfs_attr_rmtval_remove(dac);
>>   	if (error)
>>   		return error;
>>   
>> @@ -1221,21 +1289,27 @@ xfs_attr_node_remove_rmt(
>>   }
>>   
>>   /*
>> - * Remove a name from a B-tree attribute list.
>> + * Step through removing a name from a B-tree attribute list.
>>    *
>>    * This will involve walking down the Btree, and may involve joining
>>    * leaf nodes and even joining intermediate nodes up to and including
>>    * the root node (a special case of an intermediate node).
>> + *
>> + * This routine is meant to function as either an inline or delayed operation,
>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>> + * functions will need to handle this, and call the function again until it
>> + * returns something other than -EAGAIN.
>>    */
>>   STATIC int
>>   xfs_attr_node_remove_step(
>> -	struct xfs_da_args	*args,
>> -	struct xfs_da_state	*state)
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	struct xfs_da_state_blk	*blk;
>> -	int			retval, error;
>> -	struct xfs_inode	*dp = args->dp;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_da_state		*state;
>> +	struct xfs_da_state_blk		*blk;
>> +	int				retval, error = 0;
>>   
>> +	state = dac->da_state;
>>   
>>   	/*
>>   	 * If there is an out-of-line value, de-allocate the blocks.
>> @@ -1243,7 +1317,10 @@ xfs_attr_node_remove_step(
>>   	 * overflow the maximum size of a transaction and/or hit a deadlock.
>>   	 */
>>   	if (args->rmtblkno > 0) {
>> -		error = xfs_attr_node_remove_rmt(args, state);
>> +		/*
>> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
>> +		 */
>> +		error = xfs_attr_node_remove_rmt(dac, state);
>>   		if (error)
>>   			return error;
>>   	}
>> @@ -1257,21 +1334,18 @@ xfs_attr_node_remove_step(
>>   	xfs_da3_fixhashpath(state, &state->path);
>>   
>>   	/*
>> -	 * Check to see if the tree needs to be collapsed.
>> +	 * Check to see if the tree needs to be collapsed.  If so, set the
>> +	 * state to indicate that the calling function needs to move on to the
>> +	 * shrink operation.
>>   	 */
>>   	if (retval && (state->path.active > 1)) {
>>   		error = xfs_da3_join(state);
>>   		if (error)
>>   			return error;
>> -		error = xfs_defer_finish(&args->trans);
>> -		if (error)
>> -			return error;
>> -		/*
>> -		 * Commit the Btree join operation and start a new trans.
>> -		 */
>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>> -		if (error)
>> -			return error;
>> +
>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>> +		dac->dela_state = XFS_DAS_RM_SHRINK;
>> +		return -EAGAIN;
>>   	}
>>   
>>   	return error;
>> @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
>>    *
>>    * This routine will find the blocks of the name to remove, remove them and
>>    * shirnk the tree if needed.
>> + *
>> + * This routine is meant to function as either an inline or delayed operation,
>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>> + * functions will need to handle this, and call the function again until it
>> + * returns something other than -EAGAIN.
>>    */
>>   STATIC int
>> -xfs_attr_node_removename(
>> -	struct xfs_da_args	*args)
>> +xfs_attr_node_removename_iter(
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	struct xfs_da_state	*state;
>> -	int			error;
>> -	struct xfs_inode	*dp = args->dp;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_da_state		*state;
>> +	int				error;
>> +	struct xfs_inode		*dp = args->dp;
>>   
>>   	trace_xfs_attr_node_removename(args);
>> +	state = dac->da_state;
>>   
>> -	error = xfs_attr_node_removename_setup(args, &state);
>> -	if (error)
>> -		goto out;
>> +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
>> +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
>> +		error = xfs_attr_node_removename_setup(dac, &state);
>> +		if (error)
>> +			goto out;
>> +	}
>>   
>> -	error = xfs_attr_node_remove_step(args, state);
>> -	if (error)
>> -		goto out;
>> +	switch (dac->dela_state) {
>> +	case XFS_DAS_UNINIT:
>> +		error = xfs_attr_node_remove_step(dac);
>> +		if (error)
>> +			break;
>>   
>> -	/*
>> -	 * If the result is small enough, push it all into the inode.
>> -	 */
>> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>> -		error = xfs_attr_node_shrink(args, state);
>> +		/* do not break, proceed to shrink if needed */
>> +	case XFS_DAS_RM_SHRINK:
>> +		/*
>> +		 * If the result is small enough, push it all into the inode.
>> +		 */
>> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>> +			error = xfs_attr_node_shrink(args, state);
>>   
>> +		break;
>> +	default:
>> +		ASSERT(0);
>> +		return -EINVAL;
> 
> I don't think it is possible in a real-world scenario, but if "state" were
> pointing to allocated memory then the above return statement might leak the
> corresponding memory.
Hmm, trying to follow you here... I'm assuming you meant dela_state
instead of state, since that's what controls the switch.  The dac
structure is zeroed when allocated to avoid this.  Most of the time when
this switch executes, dela_state is zero.  I did have to add
XFS_DAS_UNINIT in the last revision, per the previous suggestion, or
it generates warnings.
> 
> Apart from the above nit, the remaining changes look good to me.
Ok, thanks for the review!
Allison

> 
> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
> 
>> +	}
>> +
>> +	if (error == -EAGAIN)
>> +		return error;
>>   out:
>>   	if (state)
>>   		xfs_da_state_free(state);
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index 3e97a93..64dcf0f 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
>>   };
>>   
>>   
>> +/*
>> + * ========================================================================
>> + * Structure used to pass context around among the delayed routines.
>> + * ========================================================================
>> + */
>> +
>> +/*
>> + * Below is a state machine diagram for attr remove operations. The XFS_DAS_*
>> + * states indicate places where the function returns -EAGAIN and then resumes
>> + * from that point once it is called again by the calling function. States
>> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
>> + * so the calling function needs to pass them back to that subroutine to allow
>> + * it to finish where it left off. But they otherwise do not have a role in the
>> + * calling function other than just passing through.
>> + *
>> + * xfs_attr_remove_iter()
>> + *	  XFS_DAS_RM_SHRINK ─┐
>> + *	  (subroutine state) │
>> + *	                     └─>xfs_attr_node_removename()
>> + *	                                      │
>> + *	                                      v
>> + *	                                   need to
>> + *	                                shrink tree? ─n─┐
>> + *	                                      │         │
>> + *	                                      y         │
>> + *	                                      │         │
>> + *	                                      v         │
>> + *	                              XFS_DAS_RM_SHRINK │
>> + *	                                      │         │
>> + *	                                      v         │
>> + *	                                     done <─────┘
>> + *
>> + */
>> +
>> +/*
>> + * Enum values for xfs_delattr_context.dela_state
>> + *
>> + * These values are used by delayed attribute operations to keep track of where
>> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
>> + * calling function to roll the transaction, and then recall the subroutine to
>> + * finish the operation.  The enum is then used by the subroutine to jump back
>> + * to where it was and resume executing where it left off.
>> + */
>> +enum xfs_delattr_state {
>> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
>> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
>> +};
>> +
>> +/*
>> + * Defines for xfs_delattr_context.flags
>> + */
>> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>> +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>> +
>> +/*
>> + * Context used for keeping track of delayed attribute operations
>> + */
>> +struct xfs_delattr_context {
>> +	struct xfs_da_args      *da_args;
>> +
>> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
>> +	struct xfs_da_state     *da_state;
>> +
>> +	/* Used to keep track of current state of delayed operation */
>> +	unsigned int            flags;
>> +	enum xfs_delattr_state  dela_state;
>> +};
>> +
>>   /*========================================================================
>>    * Function prototypes for the kernel.
>>    *========================================================================*/
>> @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>>   int xfs_attr_set_args(struct xfs_da_args *args);
>>   int xfs_has_attr(struct xfs_da_args *args);
>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>>   bool xfs_attr_namecheck(const void *name, size_t length);
>> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>> +			      struct xfs_da_args *args);
>>   
>>   #endif	/* __XFS_ATTR_H__ */
>> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
>> index bb128db..338377e 100644
>> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
>> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
>> @@ -19,8 +19,8 @@
>>   #include "xfs_bmap_btree.h"
>>   #include "xfs_bmap.h"
>>   #include "xfs_attr_sf.h"
>> -#include "xfs_attr_remote.h"
>>   #include "xfs_attr.h"
>> +#include "xfs_attr_remote.h"
>>   #include "xfs_attr_leaf.h"
>>   #include "xfs_error.h"
>>   #include "xfs_trace.h"
>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
>> index 48d8e9c..1426c15 100644
>> --- a/fs/xfs/libxfs/xfs_attr_remote.c
>> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
>> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
>>    */
>>   int
>>   xfs_attr_rmtval_remove(
>> -	struct xfs_da_args      *args)
>> +	struct xfs_da_args		*args)
>>   {
>> -	int			error;
>> -	int			retval;
>> +	int				error;
>> +	struct xfs_delattr_context	dac = {
>> +		.da_args	= args,
>> +	};
>>   
>>   	trace_xfs_attr_rmtval_remove(args);
>>   
>> @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
>>   	 * Keep de-allocating extents until the remote-value region is gone.
>>   	 */
>>   	do {
>> -		retval = __xfs_attr_rmtval_remove(args);
>> -		if (retval && retval != -EAGAIN)
>> -			return retval;
>> +		error = __xfs_attr_rmtval_remove(&dac);
>> +		if (error != -EAGAIN)
>> +			break;
>>   
>> -		/*
>> -		 * Close out trans and start the next one in the chain.
>> -		 */
>> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
>> +		error = xfs_attr_trans_roll(&dac);
>>   		if (error)
>>   			return error;
>> -	} while (retval == -EAGAIN);
>>   
>> -	return 0;
>> +	} while (true);
>> +
>> +	return error;
>>   }
>>   
>>   /*
>> @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
>>    */
>>   int
>>   __xfs_attr_rmtval_remove(
>> -	struct xfs_da_args	*args)
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	int			error, done;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error, done;
>>   
>>   	/*
>>   	 * Unmap value blocks for this attr.
>> @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
>>   	if (error)
>>   		return error;
>>   
>> -	error = xfs_defer_finish(&args->trans);
>> -	if (error)
>> -		return error;
>> -
>> -	if (!done)
>> +	if (!done) {
>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>   		return -EAGAIN;
>> +	}
>>   
>>   	return error;
>>   }
>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
>> index 9eee615..002fd30 100644
>> --- a/fs/xfs/libxfs/xfs_attr_remote.h
>> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
>> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>>   int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>>   		xfs_buf_flags_t incore_flags);
>>   int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
>> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
>> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>>   #endif /* __XFS_ATTR_REMOTE_H__ */
>> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
>> index bfad669..aaa7e66 100644
>> --- a/fs/xfs/xfs_attr_inactive.c
>> +++ b/fs/xfs/xfs_attr_inactive.c
>> @@ -15,10 +15,10 @@
>>   #include "xfs_da_format.h"
>>   #include "xfs_da_btree.h"
>>   #include "xfs_inode.h"
>> +#include "xfs_attr.h"
>>   #include "xfs_attr_remote.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_bmap.h"
>> -#include "xfs_attr.h"
>>   #include "xfs_attr_leaf.h"
>>   #include "xfs_quota.h"
>>   #include "xfs_dir2.h"
>>
> 
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step
  2020-10-27 12:15   ` Brian Foster
@ 2020-10-27 15:33     ` Allison Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-10-27 15:33 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs



On 10/27/20 5:15 AM, Brian Foster wrote:
> On Thu, Oct 22, 2020 at 11:34:26PM -0700, Allison Henderson wrote:
>> From: Allison Collins <allison.henderson@oracle.com>
>>
>> This patch adds a new helper function xfs_attr_node_remove_step.  This
>> will help simplify and modularize the calling function
>> xfs_attr_node_remove.
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c | 46 ++++++++++++++++++++++++++++++++++------------
>>   1 file changed, 34 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index fd8e641..f4d39bf 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
> ...
>> @@ -1267,18 +1262,45 @@ xfs_attr_node_removename(
>>   	if (retval && (state->path.active > 1)) {
>>   		error = xfs_da3_join(state);
>>   		if (error)
>> -			goto out;
>> +			return error;
>>   		error = xfs_defer_finish(&args->trans);
>>   		if (error)
>> -			goto out;
>> +			return error;
>>   		/*
>>   		 * Commit the Btree join operation and start a new trans.
>>   		 */
>>   		error = xfs_trans_roll_inode(&args->trans, dp);
>>   		if (error)
>> -			goto out;
>> +			return error;
>>   	}
>>   
>> +	return error;
>> +}
>> +
>> +/*
>> + * Remove a name from a B-tree attribute list.
>> + *
>> + * This routine will find the blocks of the name to remove, remove them and
>> + * shirnk the tree if needed.
>> + */
>> +STATIC int
>> +xfs_attr_node_removename(
>> +	struct xfs_da_args	*args)
>> +{
>> +	struct xfs_da_state	*state;
> 
> It urks me a little bit that we have to dig down into a couple functions
> to grok that state allocation is the first step or otherwise occurs
> before we potentially use the error path. Since we already check for
> state in the out path, can we just initialize this as *state = NULL
> here so the logic is clear? Otherwise the patch LGTM:
> 
Sure, will add.  Thanks!
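To illustrate Brian's point, here is a minimal userspace sketch. The model_* names and struct model_da_state are hypothetical stand-ins, not kernel code. If the setup helper can fail before it ever writes *state, only an explicit NULL initializer makes the "if (state)" test on the out path safe.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xfs_da_state. */
struct model_da_state {
	int	active;
};

/* Number of states released by the out path, so callers can observe cleanup. */
static int model_frees;

/*
 * Hypothetical stand-in for xfs_attr_node_removename_setup().  When
 * fail_early is set it returns an error without ever writing *statep,
 * which is the case an uninitialized local would get wrong.
 */
static int model_setup(int fail_early, struct model_da_state **statep)
{
	if (fail_early)
		return -ENOENT;
	*statep = calloc(1, sizeof(**statep));
	if (!*statep)
		return -ENOMEM;
	(*statep)->active = 1;
	return 0;
}

/* Shape of xfs_attr_node_removename() with the suggested NULL init. */
static int model_removename(int fail_early)
{
	struct model_da_state	*state = NULL;	/* safe to test at "out:" */
	int			error;

	error = model_setup(fail_early, &state);
	if (error)
		goto out;

	assert(state->active == 1);	/* the remove step would run here */
out:
	if (state) {
		free(state);
		model_frees++;
	}
	return error;
}
```

Calling model_removename(1) takes the early-failure path and frees nothing; model_removename(0) allocates and releases exactly one state.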

Allison
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
>> +	int			error;
>> +	struct xfs_inode	*dp = args->dp;
>> +
>> +	trace_xfs_attr_node_removename(args);
>> +
>> +	error = xfs_attr_node_removename_setup(args, &state);
>> +	if (error)
>> +		goto out;
>> +
>> +	error = xfs_attr_node_remove_step(args, state);
>> +	if (error)
>> +		goto out;
>> +
>>   	/*
>>   	 * If the result is small enough, push it all into the inode.
>>   	 */
>> -- 
>> 2.7.4
>>
> 


* Re: [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step
  2020-10-27  7:03   ` Chandan Babu R
@ 2020-10-27 22:23     ` Allison Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-10-27 22:23 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs



On 10/27/20 12:03 AM, Chandan Babu R wrote:
> On Friday 23 October 2020 12:04:26 PM IST Allison Henderson wrote:
>> From: Allison Collins <allison.henderson@oracle.com>
>>
>> This patch adds a new helper function xfs_attr_node_remove_step.  This
>> will help simplify and modularize the calling function
>> xfs_attr_node_remove.
> 
> The above should have been "xfs_attr_node_removename".
Sure, will fix. Thanks!

Allison

> 
> The code changes themselves are logically correct.
> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
> 
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c | 46 ++++++++++++++++++++++++++++++++++------------
>>   1 file changed, 34 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index fd8e641..f4d39bf 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -1228,19 +1228,14 @@ xfs_attr_node_remove_rmt(
>>    * the root node (a special case of an intermediate node).
>>    */
>>   STATIC int
>> -xfs_attr_node_removename(
>> -	struct xfs_da_args	*args)
>> +xfs_attr_node_remove_step(
>> +	struct xfs_da_args	*args,
>> +	struct xfs_da_state	*state)
>>   {
>> -	struct xfs_da_state	*state;
>>   	struct xfs_da_state_blk	*blk;
>>   	int			retval, error;
>>   	struct xfs_inode	*dp = args->dp;
>>   
>> -	trace_xfs_attr_node_removename(args);
>> -
>> -	error = xfs_attr_node_removename_setup(args, &state);
>> -	if (error)
>> -		goto out;
>>   
>>   	/*
>>   	 * If there is an out-of-line value, de-allocate the blocks.
>> @@ -1250,7 +1245,7 @@ xfs_attr_node_removename(
>>   	if (args->rmtblkno > 0) {
>>   		error = xfs_attr_node_remove_rmt(args, state);
>>   		if (error)
>> -			goto out;
>> +			return error;
>>   	}
>>   
>>   	/*
>> @@ -1267,18 +1262,45 @@ xfs_attr_node_removename(
>>   	if (retval && (state->path.active > 1)) {
>>   		error = xfs_da3_join(state);
>>   		if (error)
>> -			goto out;
>> +			return error;
>>   		error = xfs_defer_finish(&args->trans);
>>   		if (error)
>> -			goto out;
>> +			return error;
>>   		/*
>>   		 * Commit the Btree join operation and start a new trans.
>>   		 */
>>   		error = xfs_trans_roll_inode(&args->trans, dp);
>>   		if (error)
>> -			goto out;
>> +			return error;
>>   	}
>>   
>> +	return error;
>> +}
>> +
>> +/*
>> + * Remove a name from a B-tree attribute list.
>> + *
>> + * This routine will find the blocks of the name to remove, remove them and
>> + * shirnk the tree if needed.
>> + */
>> +STATIC int
>> +xfs_attr_node_removename(
>> +	struct xfs_da_args	*args)
>> +{
>> +	struct xfs_da_state	*state;
>> +	int			error;
>> +	struct xfs_inode	*dp = args->dp;
>> +
>> +	trace_xfs_attr_node_removename(args);
>> +
>> +	error = xfs_attr_node_removename_setup(args, &state);
>> +	if (error)
>> +		goto out;
>> +
>> +	error = xfs_attr_node_remove_step(args, state);
>> +	if (error)
>> +		goto out;
>> +
>>   	/*
>>   	 * If the result is small enough, push it all into the inode.
>>   	 */
>>
> 
> 


* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-10-27 12:16   ` Brian Foster
@ 2020-10-27 22:27     ` Allison Henderson
  2020-10-28 12:28       ` Brian Foster
  2020-11-10 23:15     ` Darrick J. Wong
  1 sibling, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2020-10-27 22:27 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs



On 10/27/20 5:16 AM, Brian Foster wrote:
> On Thu, Oct 22, 2020 at 11:34:27PM -0700, Allison Henderson wrote:
>> This patch modifies the attr remove routines to be delay ready. This
>> means they no longer roll or commit transactions, but instead return
>> -EAGAIN to have the calling routine roll and refresh the transaction. In
>> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
>> uses a sort of state machine like switch to keep track of where it was
>> when EAGAIN was returned. xfs_attr_node_removename has also been
>> modified to use the switch, and a new version of xfs_attr_remove_args
>> consists of a simple loop to refresh the transaction until the operation
>> is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
>> transaction where ever the existing code used to.
>>
>> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
>> version __xfs_attr_rmtval_remove. We will rename
>> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
>> done.
>>
>> xfs_attr_rmtval_remove itself is still in use by the set routines (used
>> during a rename).  For reasons of preserving existing function, we
>> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
>> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
>> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
>> used and will be removed.
>>
>> This patch also adds a new struct xfs_delattr_context, which we will use
>> to keep track of the current state of an attribute operation. The new
>> xfs_delattr_state enum is used to track various operations that are in
>> progress so that we know not to repeat them, and resume where we left
>> off before EAGAIN was returned to cycle out the transaction. Other
>> members take the place of local variables that need to retain their
>> values across multiple function recalls.  See xfs_attr.h for a more
>> detailed diagram of the states.
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
>>   fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
>>   fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>>   fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
>>   fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>>   fs/xfs/xfs_attr_inactive.c      |   2 +-
>>   6 files changed, 241 insertions(+), 74 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index f4d39bf..6ca94cb 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>    */
>>   STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>>   STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
>> -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
>> +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>>   STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>   				 struct xfs_da_state **state);
>>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>> @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
>>   }
>>   
>>   /*
>> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
>> + * also checks for a defer finish.  Transaction is finished and rolled as
>> + * needed.  Returns 0 on success or a negative error code.
>> + */
>> +int
>> +xfs_attr_trans_roll(
>> +	struct xfs_delattr_context	*dac)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error = 0;
>> +
>> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
>> +		/*
>> +		 * The caller wants us to finish all the deferred ops so that we
>> +		 * avoid pinning the log tail with a large number of deferred
>> +		 * ops.
>> +		 */
>> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
>> +		error = xfs_defer_finish(&args->trans);
>> +		if (error)
>> +			return error;
>> +	}
>> +
> 
> It seems like some comments on the previous version weren't addressed.
> I.e., the spurious transaction roll here when a dfops finish occurs..?
OK, I got the impression from the last review that we wanted most of 
this looping mechanism to go away, so most of the changes in this set 
are focused there, and I forgot to come back and touch this up.  Most of 
this code disappears in patch 9 now, though.  I wasn't really sure what 
people would think of that just yet, so I left it as a patch at the end 
of the series in case people had different thoughts after seeing the 
implementation.
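A rough userspace model of the spurious roll Brian mentions, assuming xfs_defer_finish() hands back a clean transaction that it has already rolled. All model_* names are hypothetical and only count rolls. The second helper is one possible restructuring, sketched here and not taken from the series, that makes the finish and roll paths mutually exclusive.

```c
#include <assert.h>

#define MODEL_DEFER_FINISH	0x01	/* mirrors XFS_DAC_DEFER_FINISH */

/* Hypothetical transaction: counts rolls and pending deferred work. */
struct model_trans {
	int	rolls;
	int	pending_defer_ops;
};

/* Models xfs_defer_finish(): drains deferred work and rolls once. */
static int model_defer_finish(struct model_trans *tp)
{
	tp->pending_defer_ops = 0;
	tp->rolls++;
	return 0;
}

/* Models xfs_trans_roll_inode(): commit and start a new transaction. */
static int model_trans_roll(struct model_trans *tp)
{
	tp->rolls++;
	return 0;
}

/* v13 shape: a defer finish is always followed by another roll. */
static int model_attr_trans_roll_v13(struct model_trans *tp,
				     unsigned int *flags)
{
	int	error;

	if (*flags & MODEL_DEFER_FINISH) {
		*flags &= ~MODEL_DEFER_FINISH;
		error = model_defer_finish(tp);
		if (error)
			return error;
	}
	return model_trans_roll(tp);
}

/* Alternative shape: the defer finish already rolled, so stop there. */
static int model_attr_trans_roll(struct model_trans *tp, unsigned int *flags)
{
	if (*flags & MODEL_DEFER_FINISH) {
		*flags &= ~MODEL_DEFER_FINISH;
		return model_defer_finish(tp);
	}
	return model_trans_roll(tp);
}
```

With the flag set, the v13 shape costs two rolls per -EAGAIN and the alternative costs one; with the flag clear, both roll exactly once.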

> 
>> +	return xfs_trans_roll_inode(&args->trans, args->dp);
>> +}
>> +
>> +/*
>>    * Set the attribute specified in @args.
>>    */
>>   int
>> @@ -364,23 +391,54 @@ xfs_has_attr(
>>    */
>>   int
>>   xfs_attr_remove_args(
>> -	struct xfs_da_args      *args)
>> +	struct xfs_da_args	*args)
>>   {
>> -	struct xfs_inode	*dp = args->dp;
>> -	int			error;
>> +	int				error = 0;
>> +	struct xfs_delattr_context	dac = {
>> +		.da_args	= args,
>> +	};
>> +
>> +	do {
>> +		error = xfs_attr_remove_iter(&dac);
>> +		if (error != -EAGAIN)
>> +			break;
>> +
>> +		error = xfs_attr_trans_roll(&dac);
>> +		if (error)
>> +			return error;
>> +
>> +	} while (true);
>> +
>> +	return error;
>> +}
>> +
>> +/*
>> + * Remove the attribute specified in @args.
>> + *
>> + * This function may return -EAGAIN to signal that the transaction needs to be
>> + * rolled.  Callers should continue calling this function until they receive a
>> + * return value other than -EAGAIN.
>> + */
>> +int
>> +xfs_attr_remove_iter(
>> +	struct xfs_delattr_context	*dac)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_inode		*dp = args->dp;
>> +
>> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
>> +		goto node;
>>   
>>   	if (!xfs_inode_hasattr(dp)) {
>> -		error = -ENOATTR;
>> +		return -ENOATTR;
>>   	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
>>   		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
>> -		error = xfs_attr_shortform_remove(args);
>> +		return xfs_attr_shortform_remove(args);
>>   	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>> -		error = xfs_attr_leaf_removename(args);
>> -	} else {
>> -		error = xfs_attr_node_removename(args);
>> +		return xfs_attr_leaf_removename(args);
>>   	}
>> -
>> -	return error;
>> +node:
>> +	return  xfs_attr_node_removename_iter(dac);
>>   }
>>   
>>   /*
>> @@ -1178,10 +1236,11 @@ xfs_attr_leaf_mark_incomplete(
>>    */
>>   STATIC
>>   int xfs_attr_node_removename_setup(
>> -	struct xfs_da_args	*args,
>> -	struct xfs_da_state	**state)
>> +	struct xfs_delattr_context	*dac,
>> +	struct xfs_da_state		**state)
>>   {
>> -	int			error;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error;
>>   
>>   	error = xfs_attr_node_hasname(args, state);
>>   	if (error != -EEXIST)
>> @@ -1191,6 +1250,12 @@ int xfs_attr_node_removename_setup(
>>   	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
>>   		XFS_ATTR_LEAF_MAGIC);
>>   
>> +	/*
>> +	 * Store state in the context in case we need to cycle out the
>> +	 * transaction.
>> +	 */
>> +	dac->da_state = *state;
>> +
>>   	if (args->rmtblkno > 0) {
>>   		error = xfs_attr_leaf_mark_incomplete(args, *state);
>>   		if (error)
>> @@ -1203,13 +1268,16 @@ int xfs_attr_node_removename_setup(
>>   }
>>   
>>   STATIC int
>> -xfs_attr_node_remove_rmt(
>> -	struct xfs_da_args	*args,
>> -	struct xfs_da_state	*state)
>> +xfs_attr_node_remove_rmt (
> 
> Extra space		   ^
OK, will fix.

> 
>> +	struct xfs_delattr_context	*dac,
>> +	struct xfs_da_state		*state)
>>   {
>> -	int			error = 0;
>> +	int				error = 0;
>>   
>> -	error = xfs_attr_rmtval_remove(args);
>> +	/*
>> +	 * May return -EAGAIN to request that the caller recall this function
>> +	 */
>> +	error = __xfs_attr_rmtval_remove(dac);
>>   	if (error)
>>   		return error;
>>   
>> @@ -1221,21 +1289,27 @@ xfs_attr_node_remove_rmt(
>>   }
>>   
>>   /*
>> - * Remove a name from a B-tree attribute list.
>> + * Step through removing a name from a B-tree attribute list.
>>    *
>>    * This will involve walking down the Btree, and may involve joining
>>    * leaf nodes and even joining intermediate nodes up to and including
>>    * the root node (a special case of an intermediate node).
>> + *
>> + * This routine is meant to function as either an inline or delayed operation,
>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>> + * functions will need to handle this, and recall the function until a
>> + * successful error code is returned.
>>    */
>>   STATIC int
>>   xfs_attr_node_remove_step(
>> -	struct xfs_da_args	*args,
>> -	struct xfs_da_state	*state)
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	struct xfs_da_state_blk	*blk;
>> -	int			retval, error;
>> -	struct xfs_inode	*dp = args->dp;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_da_state		*state;
>> +	struct xfs_da_state_blk		*blk;
>> +	int				retval, error = 0;
>>   
>> +	state = dac->da_state;
>>   
>>   	/*
>>   	 * If there is an out-of-line value, de-allocate the blocks.
>> @@ -1243,7 +1317,10 @@ xfs_attr_node_remove_step(
>>   	 * overflow the maximum size of a transaction and/or hit a deadlock.
>>   	 */
>>   	if (args->rmtblkno > 0) {
>> -		error = xfs_attr_node_remove_rmt(args, state);
>> +		/*
>> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
>> +		 */
>> +		error = xfs_attr_node_remove_rmt(dac, state);
>>   		if (error)
>>   			return error;
>>   	}
>> @@ -1257,21 +1334,18 @@ xfs_attr_node_remove_step(
>>   	xfs_da3_fixhashpath(state, &state->path);
>>   
>>   	/*
>> -	 * Check to see if the tree needs to be collapsed.
>> +	 * Check to see if the tree needs to be collapsed.  Set the flag to
>> +	 * indicate that the calling function needs to move to the shrink
>> +	 * operation.
>>   	 */
>>   	if (retval && (state->path.active > 1)) {
>>   		error = xfs_da3_join(state);
>>   		if (error)
>>   			return error;
>> -		error = xfs_defer_finish(&args->trans);
>> -		if (error)
>> -			return error;
>> -		/*
>> -		 * Commit the Btree join operation and start a new trans.
>> -		 */
>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>> -		if (error)
>> -			return error;
>> +
>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>> +		dac->dela_state = XFS_DAS_RM_SHRINK;
>> +		return -EAGAIN;
>>   	}
>>   
>>   	return error;
>> @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
>>    *
>>    * This routine will find the blocks of the name to remove, remove them and
>>    * shirnk the tree if needed.
>> + *
>> + * This routine is meant to function as either an inline or delayed operation,
>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>> + * functions will need to handle this, and recall the function until a
>> + * successful error code is returned.
>>    */
>>   STATIC int
>> -xfs_attr_node_removename(
>> -	struct xfs_da_args	*args)
>> +xfs_attr_node_removename_iter(
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	struct xfs_da_state	*state;
>> -	int			error;
>> -	struct xfs_inode	*dp = args->dp;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_da_state		*state;
>> +	int				error;
>> +	struct xfs_inode		*dp = args->dp;
>>   
>>   	trace_xfs_attr_node_removename(args);
>> +	state = dac->da_state;
>>   
>> -	error = xfs_attr_node_removename_setup(args, &state);
>> -	if (error)
>> -		goto out;
>> +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
>> +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
>> +		error = xfs_attr_node_removename_setup(dac, &state);
>> +		if (error)
>> +			goto out;
>> +	}
>>   
>> -	error = xfs_attr_node_remove_step(args, state);
>> -	if (error)
>> -		goto out;
>> +	switch (dac->dela_state) {
>> +	case XFS_DAS_UNINIT:
>> +		error = xfs_attr_node_remove_step(dac);
>> +		if (error)
>> +			break;
>>   
> 
> I think there's a bit more preliminary refactoring to do here to isolate
> the state management to this one function. I.e., from the discussion on
> the previous version, we'd ideally pull the logic that checks for the
> subsequent shrink state out of xfs_attr_node_remove_step() and lift it
> into this branch. See the pseudocode in the previous discussion for an
> example of what I mean:
> 
>    https://lore.kernel.org/linux-xfs/20200901170020.GC174813@bfoster/
> 
> The general goal of that is to refactor the existing code such that all
> of the state transitions and whatnot are shown in one place and the rest
> is broken down into smaller functional helpers.
> 
> Brian

Yes, I did see the pseudocode, though I wasn't able to get it through 
the test cases quite the way it appears, because whether or not we 
proceed to shrink is determined by the leaf remove that's buried in the 
step function.  Otherwise we run into some failed asserts in the shrink 
routines for "state->path.active == 1".  Alternatively, we can add another 
helper to pull it up to this scope.  Then the switch ends up looking 
like this:

	switch (dac->dela_state) {
	case XFS_DAS_UNINIT:
		/*
		 * Repeatedly remove remote blocks, remove the entry and
		 * join.  Returns -EAGAIN, or 0 for completion of the step.
		 */
		error = xfs_attr_node_remove_step(dac);
		if (error)
			break;

		retval = xfs_attr_node_remove_cleanup(dac);

		/*
		 * Check to see if the tree needs to be collapsed.  Set the
		 * flag to indicate that the calling function needs to move
		 * to the shrink operation.
		 */
		if (retval && (state->path.active > 1)) {
			error = xfs_da3_join(state);
			if (error)
				return error;

			dac->flags |= XFS_DAC_DEFER_FINISH;
			dac->dela_state = XFS_DAS_RM_SHRINK;
			return -EAGAIN;
		}

		/* check whether to shrink or return success */
		if (!error && xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
			dac->dela_state = XFS_DAS_RM_SHRINK;
			error = -EAGAIN;
		}
		break;
	case XFS_DAS_RM_SHRINK:
		/*
		 * If the result is small enough, push it all into the inode.
		 */
		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
			error = xfs_attr_node_shrink(args, state);
		break;
	default:
		ASSERT(0);
		return -EINVAL;
	}

	if (error == -EAGAIN)
		return error;

And then we have this little cleanup helper that we pull out of 
remove_step:

STATIC int
xfs_attr_node_remove_cleanup(
	struct xfs_delattr_context	*dac)
{
	struct xfs_da_args		*args = dac->da_args;
	struct xfs_da_state_blk		*blk;
	int				retval;

	/*
	 * Remove the name and update the hashvals in the tree.
	 */
	blk = &dac->da_state->path.blk[dac->da_state->path.active - 1];
	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
	retval = xfs_attr3_leaf_remove(blk->bp, args);
	xfs_da3_fixhashpath(dac->da_state, &dac->da_state->path);

	return retval;
}


This configuration seems to get through the test cases.  It's not quite 
as tidy, but it does get all the state handling into this scope.  If 
people prefer it this way, shall I add the extra helper and make these 
adjustments?
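For anyone weighing this layout, here is a small userspace model of the same resume-after-EAGAIN flow. All names are hypothetical: remote-extent unmapping, the join, and the shrink are reduced to counters and flags, and model_remove() plays the role of the xfs_attr_remove_args() roll loop. It is a sketch of the control flow only, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>

/* Mirrors enum xfs_delattr_state from the patch. */
enum model_dela_state { MODEL_UNINIT = 0, MODEL_RM_SHRINK };

#define MODEL_FLAG_DEFER_FINISH	0x01	/* mirrors XFS_DAC_DEFER_FINISH */

/* Hypothetical stand-in for struct xfs_delattr_context. */
struct model_ctx {
	enum model_dela_state	state;
	unsigned int		flags;
	int			rmt_blocks;	/* remote extents still mapped */
	int			needs_join;	/* leaf remove left tree joinable */
	int			one_block;	/* attr fork fits in one block */
	int			shrunk;		/* shrink into the inode happened */
};

/* One pass of the iterator; -EAGAIN asks the caller to roll and recall. */
static int model_removename_iter(struct model_ctx *c)
{
	switch (c->state) {
	case MODEL_UNINIT:
		if (c->rmt_blocks > 0) {
			c->rmt_blocks--;	/* unmap one extent per trans */
			c->flags |= MODEL_FLAG_DEFER_FINISH;
			return -EAGAIN;
		}
		/* leaf remove and hash fixup would happen here */
		if (c->needs_join) {
			c->needs_join = 0;	/* xfs_da3_join() equivalent */
			c->flags |= MODEL_FLAG_DEFER_FINISH;
			c->state = MODEL_RM_SHRINK;
			return -EAGAIN;
		}
		if (c->one_block) {
			c->state = MODEL_RM_SHRINK;	/* roll before shrink */
			return -EAGAIN;
		}
		return 0;
	case MODEL_RM_SHRINK:
		if (c->one_block)
			c->shrunk = 1;
		return 0;
	default:
		assert(0);
		return -EINVAL;
	}
}

/* Driver loop in the style of xfs_attr_remove_args(); counts rolls. */
static int model_remove(struct model_ctx *c, int *rolls)
{
	int	error;

	*rolls = 0;
	while ((error = model_removename_iter(c)) == -EAGAIN) {
		c->flags &= ~MODEL_FLAG_DEFER_FINISH;	/* finish + roll */
		(*rolls)++;
	}
	return error;
}
```

The join case enters MODEL_RM_SHRINK without having checked one_block, which is why the model keeps the one-block test inside the shrink state.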

Allison

> 
>> -	/*
>> -	 * If the result is small enough, push it all into the inode.
>> -	 */
>> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>> -		error = xfs_attr_node_shrink(args, state);
>> +		/* do not break, proceed to shrink if needed */
>> +	case XFS_DAS_RM_SHRINK:
>> +		/*
>> +		 * If the result is small enough, push it all into the inode.
>> +		 */
>> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>> +			error = xfs_attr_node_shrink(args, state);
>>   
>> +		break;
>> +	default:
>> +		ASSERT(0);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (error == -EAGAIN)
>> +		return error;
>>   out:
>>   	if (state)
>>   		xfs_da_state_free(state);
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index 3e97a93..64dcf0f 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
>>   };
>>   
>>   
>> +/*
>> + * ========================================================================
>> + * Structure used to pass context around among the delayed routines.
>> + * ========================================================================
>> + */
>> +
>> +/*
>> + * Below is a state machine diagram for attr remove operations. The XFS_DAS_*
>> + * states indicate places where the function would return -EAGAIN, and then
>> + * immediately resume from after being recalled by the calling function. States
>> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
>> + * so the calling function needs to pass them back to that subroutine to allow
>> + * it to finish where it left off. But they otherwise do not have a role in the
>> + * calling function other than just passing through.
>> + *
>> + * xfs_attr_remove_iter()
>> + *	  XFS_DAS_RM_SHRINK ─┐
>> + *	  (subroutine state) │
>> + *	                     └─>xfs_attr_node_removename()
>> + *	                                      │
>> + *	                                      v
>> + *	                                   need to
>> + *	                                shrink tree? ─n─┐
>> + *	                                      │         │
>> + *	                                      y         │
>> + *	                                      │         │
>> + *	                                      v         │
>> + *	                              XFS_DAS_RM_SHRINK │
>> + *	                                      │         │
>> + *	                                      v         │
>> + *	                                     done <─────┘
>> + *
>> + */
>> +
>> +/*
>> + * Enum values for xfs_delattr_context.da_state
>> + *
>> + * These values are used by delayed attribute operations to keep track of where
>> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
>> + * calling function to roll the transaction, and then recall the subroutine to
>> + * finish the operation.  The enum is then used by the subroutine to jump back
>> + * to where it was and resume executing where it left off.
>> + */
>> +enum xfs_delattr_state {
>> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
>> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
>> +};
>> +
>> +/*
>> + * Defines for xfs_delattr_context.flags
>> + */
>> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>> +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>> +
>> +/*
>> + * Context used for keeping track of delayed attribute operations
>> + */
>> +struct xfs_delattr_context {
>> +	struct xfs_da_args      *da_args;
>> +
>> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
>> +	struct xfs_da_state     *da_state;
>> +
>> +	/* Used to keep track of current state of delayed operation */
>> +	unsigned int            flags;
>> +	enum xfs_delattr_state  dela_state;
>> +};
>> +
>>   /*========================================================================
>>    * Function prototypes for the kernel.
>>    *========================================================================*/
>> @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>>   int xfs_attr_set_args(struct xfs_da_args *args);
>>   int xfs_has_attr(struct xfs_da_args *args);
>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>>   bool xfs_attr_namecheck(const void *name, size_t length);
>> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>> +			      struct xfs_da_args *args);
>>   
>>   #endif	/* __XFS_ATTR_H__ */
>> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
>> index bb128db..338377e 100644
>> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
>> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
>> @@ -19,8 +19,8 @@
>>   #include "xfs_bmap_btree.h"
>>   #include "xfs_bmap.h"
>>   #include "xfs_attr_sf.h"
>> -#include "xfs_attr_remote.h"
>>   #include "xfs_attr.h"
>> +#include "xfs_attr_remote.h"
>>   #include "xfs_attr_leaf.h"
>>   #include "xfs_error.h"
>>   #include "xfs_trace.h"
>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
>> index 48d8e9c..1426c15 100644
>> --- a/fs/xfs/libxfs/xfs_attr_remote.c
>> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
>> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
>>    */
>>   int
>>   xfs_attr_rmtval_remove(
>> -	struct xfs_da_args      *args)
>> +	struct xfs_da_args		*args)
>>   {
>> -	int			error;
>> -	int			retval;
>> +	int				error;
>> +	struct xfs_delattr_context	dac  = {
>> +		.da_args	= args,
>> +	};
>>   
>>   	trace_xfs_attr_rmtval_remove(args);
>>   
>> @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
>>   	 * Keep de-allocating extents until the remote-value region is gone.
>>   	 */
>>   	do {
>> -		retval = __xfs_attr_rmtval_remove(args);
>> -		if (retval && retval != -EAGAIN)
>> -			return retval;
>> +		error = __xfs_attr_rmtval_remove(&dac);
>> +		if (error != -EAGAIN)
>> +			break;
>>   
>> -		/*
>> -		 * Close out trans and start the next one in the chain.
>> -		 */
>> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
>> +		error = xfs_attr_trans_roll(&dac);
>>   		if (error)
>>   			return error;
>> -	} while (retval == -EAGAIN);
>>   
>> -	return 0;
>> +	} while (true);
>> +
>> +	return error;
>>   }
>>   
>>   /*
>> @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
>>    */
>>   int
>>   __xfs_attr_rmtval_remove(
>> -	struct xfs_da_args	*args)
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	int			error, done;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error, done;
>>   
>>   	/*
>>   	 * Unmap value blocks for this attr.
>> @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
>>   	if (error)
>>   		return error;
>>   
>> -	error = xfs_defer_finish(&args->trans);
>> -	if (error)
>> -		return error;
>> -
>> -	if (!done)
>> +	if (!done) {
>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>   		return -EAGAIN;
>> +	}
>>   
>>   	return error;
>>   }
>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
>> index 9eee615..002fd30 100644
>> --- a/fs/xfs/libxfs/xfs_attr_remote.h
>> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
>> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>>   int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>>   		xfs_buf_flags_t incore_flags);
>>   int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
>> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
>> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>>   #endif /* __XFS_ATTR_REMOTE_H__ */
>> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
>> index bfad669..aaa7e66 100644
>> --- a/fs/xfs/xfs_attr_inactive.c
>> +++ b/fs/xfs/xfs_attr_inactive.c
>> @@ -15,10 +15,10 @@
>>   #include "xfs_da_format.h"
>>   #include "xfs_da_btree.h"
>>   #include "xfs_inode.h"
>> +#include "xfs_attr.h"
>>   #include "xfs_attr_remote.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_bmap.h"
>> -#include "xfs_attr.h"
>>   #include "xfs_attr_leaf.h"
>>   #include "xfs_quota.h"
>>   #include "xfs_dir2.h"
>> -- 
>> 2.7.4
>>
> 


* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-10-27 15:32     ` Allison Henderson
@ 2020-10-28 12:04       ` Chandan Babu R
  2020-10-29  1:29         ` Allison Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Chandan Babu R @ 2020-10-28 12:04 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Tuesday 27 October 2020 9:02:05 PM IST Allison Henderson wrote:
> 
> On 10/27/20 2:59 AM, Chandan Babu R wrote:
> > On Friday 23 October 2020 12:04:27 PM IST Allison Henderson wrote:
> >> This patch modifies the attr remove routines to be delay ready. This
> >> means they no longer roll or commit transactions, but instead return
> >> -EAGAIN to have the calling routine roll and refresh the transaction. In
> >> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> >> uses a sort of state machine like switch to keep track of where it was
> >> when EAGAIN was returned. xfs_attr_node_removename has also been
> >> modified to use the switch, and a new version of xfs_attr_remove_args
> >> consists of a simple loop to refresh the transaction until the operation
> >> is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> >> transaction where ever the existing code used to.
> >>
> >> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> >> version __xfs_attr_rmtval_remove. We will rename
> >> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> >> done.
> >>
> >> xfs_attr_rmtval_remove itself is still in use by the set routines (used
> >> during a rename).  For reasons of preserving existing function, we
> >> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> >> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> >> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> >> used and will be removed.
> >>
> >> This patch also adds a new struct xfs_delattr_context, which we will use
> >> to keep track of the current state of an attribute operation. The new
> >> xfs_delattr_state enum is used to track various operations that are in
> >> progress so that we know not to repeat them, and resume where we left
> >> off before EAGAIN was returned to cycle out the transaction. Other
> >> members take the place of local variables that need to retain their
> >> values across multiple function recalls.  See xfs_attr.h for a more
> >> detailed diagram of the states.
> >>
> >> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> >> ---
> >>   fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
> >>   fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
> >>   fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
> >>   fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
> >>   fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
> >>   fs/xfs/xfs_attr_inactive.c      |   2 +-
> >>   6 files changed, 241 insertions(+), 74 deletions(-)
> >>
> >> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> >> index f4d39bf..6ca94cb 100644
> >> --- a/fs/xfs/libxfs/xfs_attr.c
> >> +++ b/fs/xfs/libxfs/xfs_attr.c
> >> @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
> >>    */
> >>   STATIC int xfs_attr_node_get(xfs_da_args_t *args);
> >>   STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> >> -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
> >> +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
> >>   STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
> >>   				 struct xfs_da_state **state);
> >>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> >> @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
> >>   }
> >>   
> >>   /*
> >> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> >> + * also checks for a defer finish.  Transaction is finished and rolled as
> >> + * needed.  Returns 0 on success or a negative error code on failure.
> >> + */
> >> +int
> >> +xfs_attr_trans_roll(
> >> +	struct xfs_delattr_context	*dac)
> >> +{
> >> +	struct xfs_da_args		*args = dac->da_args;
> >> +	int				error = 0;
> >> +
> >> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> >> +		/*
> >> +		 * The caller wants us to finish all the deferred ops so that we
> >> +		 * avoid pinning the log tail with a large number of deferred
> >> +		 * ops.
> >> +		 */
> >> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> >> +		error = xfs_defer_finish(&args->trans);
> >> +		if (error)
> >> +			return error;
> >> +	}
> >> +
> >> +	return xfs_trans_roll_inode(&args->trans, args->dp);
> >> +}
> >> +
> >> +/*
> >>    * Set the attribute specified in @args.
> >>    */
> >>   int
> >> @@ -364,23 +391,54 @@ xfs_has_attr(
> >>    */
> >>   int
> >>   xfs_attr_remove_args(
> >> -	struct xfs_da_args      *args)
> >> +	struct xfs_da_args	*args)
> >>   {
> >> -	struct xfs_inode	*dp = args->dp;
> >> -	int			error;
> >> +	int				error = 0;
> > 
> > I guess the explicit initialization of "error" can be removed since the
> > value returned by the call to xfs_attr_remove_iter() will overwrite it.
> Sure, will fix
> > 
> >> +	struct xfs_delattr_context	dac = {
> >> +		.da_args	= args,
> >> +	};
> >> +
> >> +	do {
> >> +		error = xfs_attr_remove_iter(&dac);
> >> +		if (error != -EAGAIN)
> >> +			break;
> >> +
> >> +		error = xfs_attr_trans_roll(&dac);
> >> +		if (error)
> >> +			return error;
> >> +
> >> +	} while (true);
> >> +
> >> +	return error;
> >> +}
> >> +
> >> +/*
> >> + * Remove the attribute specified in @args.
> >> + *
> >> + * This function may return -EAGAIN to signal that the transaction needs to be
> >> + * rolled.  Callers should continue calling this function until they receive a
> >> + * return value other than -EAGAIN.
> >> + */
> >> +int
> >> +xfs_attr_remove_iter(
> >> +	struct xfs_delattr_context	*dac)
> >> +{
> >> +	struct xfs_da_args		*args = dac->da_args;
> >> +	struct xfs_inode		*dp = args->dp;
> >> +
> >> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> >> +		goto node;
> >>   
> >>   	if (!xfs_inode_hasattr(dp)) {
> >> -		error = -ENOATTR;
> >> +		return -ENOATTR;
> >>   	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
> >>   		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> >> -		error = xfs_attr_shortform_remove(args);
> >> +		return xfs_attr_shortform_remove(args);
> >>   	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> >> -		error = xfs_attr_leaf_removename(args);
> >> -	} else {
> >> -		error = xfs_attr_node_removename(args);
> >> +		return xfs_attr_leaf_removename(args);
> >>   	}
> >> -
> >> -	return error;
> >> +node:
> >> +	return  xfs_attr_node_removename_iter(dac);
> >>   }
> >>   
> >>   /*
> >> @@ -1178,10 +1236,11 @@ xfs_attr_leaf_mark_incomplete(
> >>    */
> >>   STATIC
> >>   int xfs_attr_node_removename_setup(
> >> -	struct xfs_da_args	*args,
> >> -	struct xfs_da_state	**state)
> >> +	struct xfs_delattr_context	*dac,
> >> +	struct xfs_da_state		**state)
> >>   {
> >> -	int			error;
> >> +	struct xfs_da_args		*args = dac->da_args;
> >> +	int				error;
> >>   
> >>   	error = xfs_attr_node_hasname(args, state);
> >>   	if (error != -EEXIST)
> >> @@ -1191,6 +1250,12 @@ int xfs_attr_node_removename_setup(
> >>   	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
> >>   		XFS_ATTR_LEAF_MAGIC);
> >>   
> >> +	/*
> >> +	 * Store state in the context in case we need to cycle out the
> >> +	 * transaction
> >> +	 */
> >> +	dac->da_state = *state;
> >> +
> >>   	if (args->rmtblkno > 0) {
> >>   		error = xfs_attr_leaf_mark_incomplete(args, *state);
> >>   		if (error)
> >> @@ -1203,13 +1268,16 @@ int xfs_attr_node_removename_setup(
> >>   }
> >>   
> >>   STATIC int
> >> -xfs_attr_node_remove_rmt(
> >> -	struct xfs_da_args	*args,
> >> -	struct xfs_da_state	*state)
> >> +xfs_attr_node_remove_rmt (
> >> +	struct xfs_delattr_context	*dac,
> >> +	struct xfs_da_state		*state)
> >>   {
> >> -	int			error = 0;
> >> +	int				error = 0;
> >>   
> >> -	error = xfs_attr_rmtval_remove(args);
> >> +	/*
> >> +	 * May return -EAGAIN to request that the caller recall this function
> >> +	 */
> >> +	error = __xfs_attr_rmtval_remove(dac);
> >>   	if (error)
> >>   		return error;
> >>   
> >> @@ -1221,21 +1289,27 @@ xfs_attr_node_remove_rmt(
> >>   }
> >>   
> >>   /*
> >> - * Remove a name from a B-tree attribute list.
> >> + * Step through removing a name from a B-tree attribute list.
> >>    *
> >>    * This will involve walking down the Btree, and may involve joining
> >>    * leaf nodes and even joining intermediate nodes up to and including
> >>    * the root node (a special case of an intermediate node).
> >> + *
> >> + * This routine is meant to function as either an inline or delayed operation,
> >> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> >> + * functions will need to handle this, and recall the function until a
> >> + * successful error code is returned.
> >>    */
> >>   STATIC int
> >>   xfs_attr_node_remove_step(
> >> -	struct xfs_da_args	*args,
> >> -	struct xfs_da_state	*state)
> >> +	struct xfs_delattr_context	*dac)
> >>   {
> >> -	struct xfs_da_state_blk	*blk;
> >> -	int			retval, error;
> >> -	struct xfs_inode	*dp = args->dp;
> >> +	struct xfs_da_args		*args = dac->da_args;
> >> +	struct xfs_da_state		*state;
> >> +	struct xfs_da_state_blk		*blk;
> >> +	int				retval, error = 0;
> >>   
> >> +	state = dac->da_state;
> >>   
> >>   	/*
> >>   	 * If there is an out-of-line value, de-allocate the blocks.
> >> @@ -1243,7 +1317,10 @@ xfs_attr_node_remove_step(
> >>   	 * overflow the maximum size of a transaction and/or hit a deadlock.
> >>   	 */
> >>   	if (args->rmtblkno > 0) {
> >> -		error = xfs_attr_node_remove_rmt(args, state);
> >> +		/*
> >> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> >> +		 */
> >> +		error = xfs_attr_node_remove_rmt(dac, state);
> >>   		if (error)
> >>   			return error;
> >>   	}
> >> @@ -1257,21 +1334,18 @@ xfs_attr_node_remove_step(
> >>   	xfs_da3_fixhashpath(state, &state->path);
> >>   
> >>   	/*
> >> -	 * Check to see if the tree needs to be collapsed.
> >> +	 * Check to see if the tree needs to be collapsed.  Set the flag to
> >> +	 * indicate that the calling function needs to move to the shrink
> >> +	 * operation.
> >>   	 */
> >>   	if (retval && (state->path.active > 1)) {
> >>   		error = xfs_da3_join(state);
> >>   		if (error)
> >>   			return error;
> >> -		error = xfs_defer_finish(&args->trans);
> >> -		if (error)
> >> -			return error;
> >> -		/*
> >> -		 * Commit the Btree join operation and start a new trans.
> >> -		 */
> >> -		error = xfs_trans_roll_inode(&args->trans, dp);
> >> -		if (error)
> >> -			return error;
> >> +
> >> +		dac->flags |= XFS_DAC_DEFER_FINISH;
> >> +		dac->dela_state = XFS_DAS_RM_SHRINK;
> >> +		return -EAGAIN;
> >>   	}
> >>   
> >>   	return error;
> >> @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
> >>    *
> >>    * This routine will find the blocks of the name to remove, remove them and
> >>    * shirnk the tree if needed.
> >> + *
> >> + * This routine is meant to function as either an inline or delayed operation,
> >> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> >> + * functions will need to handle this, and recall the function until a
> >> + * successful error code is returned.
> >>    */
> >>   STATIC int
> >> -xfs_attr_node_removename(
> >> -	struct xfs_da_args	*args)
> >> +xfs_attr_node_removename_iter(
> >> +	struct xfs_delattr_context	*dac)
> >>   {
> >> -	struct xfs_da_state	*state;
> >> -	int			error;
> >> -	struct xfs_inode	*dp = args->dp;
> >> +	struct xfs_da_args		*args = dac->da_args;
> >> +	struct xfs_da_state		*state;
> >> +	int				error;
> >> +	struct xfs_inode		*dp = args->dp;
> >>   
> >>   	trace_xfs_attr_node_removename(args);
> >> +	state = dac->da_state;
> >>   
> >> -	error = xfs_attr_node_removename_setup(args, &state);
> >> -	if (error)
> >> -		goto out;
> >> +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> >> +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> >> +		error = xfs_attr_node_removename_setup(dac, &state);
> >> +		if (error)
> >> +			goto out;
> >> +	}
> >>   
> >> -	error = xfs_attr_node_remove_step(args, state);
> >> -	if (error)
> >> -		goto out;
> >> +	switch (dac->dela_state) {
> >> +	case XFS_DAS_UNINIT:
> >> +		error = xfs_attr_node_remove_step(dac);
> >> +		if (error)
> >> +			break;
> >>   
> >> -	/*
> >> -	 * If the result is small enough, push it all into the inode.
> >> -	 */
> >> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> >> -		error = xfs_attr_node_shrink(args, state);
> >> +		/* do not break, proceed to shrink if needed */
> >> +	case XFS_DAS_RM_SHRINK:
> >> +		/*
> >> +		 * If the result is small enough, push it all into the inode.
> >> +		 */
> >> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> >> +			error = xfs_attr_node_shrink(args, state);
> >>   
> >> +		break;
> >> +	default:
> >> +		ASSERT(0);
> >> +		return -EINVAL;
> > 
> > I don't think this can happen in a real-world scenario, but if "state" were
> > pointing to allocated memory, the early return above would leak that
> > memory.
> Hmm, trying to follow you here... I'm assuming you meant dela_state
> instead of state, since that's what controls the switch.  The dac
> structure is zeroed when allocated to avoid this.  Most of the time when
> this switch executes, dela_state is zero.  I did have to add
> XFS_DAS_UNINIT, per the suggestion on the previous revision, though,
> or it generates warnings.
> >

Sorry, I should have clarified that I was referring to the memory pointed
to by dac->da_state. If dac->da_state points to a valid allocation and
dac->dela_state is neither XFS_DAS_UNINIT nor XFS_DAS_RM_SHRINK, the code
under the "default" clause executes and returns -EINVAL directly, skipping
the xfs_da_state_free() call at the "out" label. That would leak the memory
pointed to by dac->da_state.
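For illustration only, here is a minimal userspace sketch of the leak described above. It is not XFS code. demo_ctx, demo_state_alloc() and demo_state_free() are hypothetical stand-ins for struct xfs_delattr_context, the da_state allocation and xfs_da_state_free(). The -EAGAIN path, which intentionally keeps da_state for the next call, is omitted.

The first function mirrors the patch, where the default case returns early. The second sends the invalid state through the common exit so the state is always freed. That is one possible fix, not necessarily the one adopted upstream.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; DEMO_BOGUS models an unexpected dela_state. */
enum demo_state { DEMO_UNINIT = 0, DEMO_RM_SHRINK, DEMO_BOGUS };

struct demo_ctx {
	void		*da_state;	/* heap memory, like dac->da_state */
	enum demo_state	dela_state;
};

static int demo_live_allocs;	/* number of allocations not yet freed */

static void *demo_state_alloc(void)
{
	void *p = malloc(16);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_state_free(void *p)
{
	if (!p)
		return;
	assert(demo_live_allocs > 0);
	free(p);
	demo_live_allocs--;
}

/* Mirrors the patch: the default case returns before the cleanup below. */
static int demo_iter_leaky(struct demo_ctx *ctx)
{
	switch (ctx->dela_state) {
	case DEMO_UNINIT:
	case DEMO_RM_SHRINK:
		break;
	default:
		return -EINVAL;		/* ctx->da_state is leaked here */
	}

	demo_state_free(ctx->da_state);	/* plays the role of the "out:" label */
	ctx->da_state = NULL;
	return 0;
}

/* Same switch, but an invalid state also goes through the cleanup. */
static int demo_iter_fixed(struct demo_ctx *ctx)
{
	int error = 0;

	switch (ctx->dela_state) {
	case DEMO_UNINIT:
	case DEMO_RM_SHRINK:
		break;
	default:
		error = -EINVAL;
		break;
	}

	demo_state_free(ctx->da_state);
	ctx->da_state = NULL;
	return error;
}
```

In kernel code, the equivalent change would keep the ASSERT(0) and set error before jumping to the existing "out" label.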


> > Apart from the above nit, the remaining changes look good to me.
> Ok, thanks for the review!
> Allison
> 
> > 
> > Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
> > 
> >> +	}
> >> +
> >> +	if (error == -EAGAIN)
> >> +		return error;
> >>   out:
> >>   	if (state)
> >>   		xfs_da_state_free(state);
> >> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> >> index 3e97a93..64dcf0f 100644
> >> --- a/fs/xfs/libxfs/xfs_attr.h
> >> +++ b/fs/xfs/libxfs/xfs_attr.h
> >> @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
> >>   };
> >>   
> >>   
> >> +/*
> >> + * ========================================================================
> >> + * Structure used to pass context around among the delayed routines.
> >> + * ========================================================================
> >> + */
> >> +
> >> +/*
> >> + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> >> + * states indicate places where the function would return -EAGAIN, and then
> >> + * immediately resume from after being recalled by the calling function. States
> >> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> >> + * so the calling function needs to pass them back to that subroutine to allow
> >> + * it to finish where it left off. But they otherwise do not have a role in the
> >> + * calling function other than just passing through.
> >> + *
> >> + * xfs_attr_remove_iter()
> >> + *	  XFS_DAS_RM_SHRINK ─┐
> >> + *	  (subroutine state) │
> >> + *	                     └─>xfs_attr_node_removename()
> >> + *	                                      │
> >> + *	                                      v
> >> + *	                                   need to
> >> + *	                                shrink tree? ─n─┐
> >> + *	                                      │         │
> >> + *	                                      y         │
> >> + *	                                      │         │
> >> + *	                                      v         │
> >> + *	                              XFS_DAS_RM_SHRINK │
> >> + *	                                      │         │
> >> + *	                                      v         │
> >> + *	                                     done <─────┘
> >> + *
> >> + */
> >> +
> >> +/*
> >> + * Enum values for xfs_delattr_context.dela_state
> >> + *
> >> + * These values are used by delayed attribute operations to keep track of where
> >> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> >> + * calling function to roll the transaction, and then recall the subroutine to
> >> + * finish the operation.  The enum is then used by the subroutine to jump back
> >> + * to where it was and resume executing where it left off.
> >> + */
> >> +enum xfs_delattr_state {
> >> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
> >> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
> >> +};
> >> +
> >> +/*
> >> + * Defines for xfs_delattr_context.flags
> >> + */
> >> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> >> +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> >> +
> >> +/*
> >> + * Context used for keeping track of delayed attribute operations
> >> + */
> >> +struct xfs_delattr_context {
> >> +	struct xfs_da_args      *da_args;
> >> +
> >> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> >> +	struct xfs_da_state     *da_state;
> >> +
> >> +	/* Used to keep track of current state of delayed operation */
> >> +	unsigned int            flags;
> >> +	enum xfs_delattr_state  dela_state;
> >> +};
> >> +
> >>   /*========================================================================
> >>    * Function prototypes for the kernel.
> >>    *========================================================================*/
> >> @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
> >>   int xfs_attr_set_args(struct xfs_da_args *args);
> >>   int xfs_has_attr(struct xfs_da_args *args);
> >>   int xfs_attr_remove_args(struct xfs_da_args *args);
> >> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> >> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
> >>   bool xfs_attr_namecheck(const void *name, size_t length);
> >> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> >> +			      struct xfs_da_args *args);
> >>   
> >>   #endif	/* __XFS_ATTR_H__ */
> >> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> >> index bb128db..338377e 100644
> >> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> >> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> >> @@ -19,8 +19,8 @@
> >>   #include "xfs_bmap_btree.h"
> >>   #include "xfs_bmap.h"
> >>   #include "xfs_attr_sf.h"
> >> -#include "xfs_attr_remote.h"
> >>   #include "xfs_attr.h"
> >> +#include "xfs_attr_remote.h"
> >>   #include "xfs_attr_leaf.h"
> >>   #include "xfs_error.h"
> >>   #include "xfs_trace.h"
> >> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> >> index 48d8e9c..1426c15 100644
> >> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> >> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> >> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
> >>    */
> >>   int
> >>   xfs_attr_rmtval_remove(
> >> -	struct xfs_da_args      *args)
> >> +	struct xfs_da_args		*args)
> >>   {
> >> -	int			error;
> >> -	int			retval;
> >> +	int				error;
> >> +	struct xfs_delattr_context	dac  = {
> >> +		.da_args	= args,
> >> +	};
> >>   
> >>   	trace_xfs_attr_rmtval_remove(args);
> >>   
> >> @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
> >>   	 * Keep de-allocating extents until the remote-value region is gone.
> >>   	 */
> >>   	do {
> >> -		retval = __xfs_attr_rmtval_remove(args);
> >> -		if (retval && retval != -EAGAIN)
> >> -			return retval;
> >> +		error = __xfs_attr_rmtval_remove(&dac);
> >> +		if (error != -EAGAIN)
> >> +			break;
> >>   
> >> -		/*
> >> -		 * Close out trans and start the next one in the chain.
> >> -		 */
> >> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> >> +		error = xfs_attr_trans_roll(&dac);
> >>   		if (error)
> >>   			return error;
> >> -	} while (retval == -EAGAIN);
> >>   
> >> -	return 0;
> >> +	} while (true);
> >> +
> >> +	return error;
> >>   }
> >>   
> >>   /*
> >> @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
> >>    */
> >>   int
> >>   __xfs_attr_rmtval_remove(
> >> -	struct xfs_da_args	*args)
> >> +	struct xfs_delattr_context	*dac)
> >>   {
> >> -	int			error, done;
> >> +	struct xfs_da_args		*args = dac->da_args;
> >> +	int				error, done;
> >>   
> >>   	/*
> >>   	 * Unmap value blocks for this attr.
> >> @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
> >>   	if (error)
> >>   		return error;
> >>   
> >> -	error = xfs_defer_finish(&args->trans);
> >> -	if (error)
> >> -		return error;
> >> -
> >> -	if (!done)
> >> +	if (!done) {
> >> +		dac->flags |= XFS_DAC_DEFER_FINISH;
> >>   		return -EAGAIN;
> >> +	}
> >>   
> >>   	return error;
> >>   }
> >> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> >> index 9eee615..002fd30 100644
> >> --- a/fs/xfs/libxfs/xfs_attr_remote.h
> >> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> >> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
> >>   int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
> >>   		xfs_buf_flags_t incore_flags);
> >>   int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> >> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> >> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> >>   #endif /* __XFS_ATTR_REMOTE_H__ */
> >> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> >> index bfad669..aaa7e66 100644
> >> --- a/fs/xfs/xfs_attr_inactive.c
> >> +++ b/fs/xfs/xfs_attr_inactive.c
> >> @@ -15,10 +15,10 @@
> >>   #include "xfs_da_format.h"
> >>   #include "xfs_da_btree.h"
> >>   #include "xfs_inode.h"
> >> +#include "xfs_attr.h"
> >>   #include "xfs_attr_remote.h"
> >>   #include "xfs_trans.h"
> >>   #include "xfs_bmap.h"
> >> -#include "xfs_attr.h"
> >>   #include "xfs_attr_leaf.h"
> >>   #include "xfs_quota.h"
> >>   #include "xfs_dir2.h"
> >>
> > 
> > 
> 


-- 
chandan





* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-10-27 22:27     ` Allison Henderson
@ 2020-10-28 12:28       ` Brian Foster
  2020-10-29  1:03         ` Allison Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Brian Foster @ 2020-10-28 12:28 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Tue, Oct 27, 2020 at 03:27:20PM -0700, Allison Henderson wrote:
> 
> 
> On 10/27/20 5:16 AM, Brian Foster wrote:
> > On Thu, Oct 22, 2020 at 11:34:27PM -0700, Allison Henderson wrote:
> > > This patch modifies the attr remove routines to be delay ready. This
> > > means they no longer roll or commit transactions, but instead return
> > > -EAGAIN to have the calling routine roll and refresh the transaction. In
> > > this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> > > uses a sort of state machine like switch to keep track of where it was
> > > when EAGAIN was returned. xfs_attr_node_removename has also been
> > > modified to use the switch, and a new version of xfs_attr_remove_args
> > > consists of a simple loop to refresh the transaction until the operation
> > > is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > > transaction where ever the existing code used to.
> > > 
> > > Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> > > version __xfs_attr_rmtval_remove. We will rename
> > > __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> > > done.
> > > 
> > > xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > > during a rename).  For reasons of preserving existing function, we
> > > modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> > > set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> > > the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> > > used and will be removed.
> > > 
> > > This patch also adds a new struct xfs_delattr_context, which we will use
> > > to keep track of the current state of an attribute operation. The new
> > > xfs_delattr_state enum is used to track various operations that are in
> > > progress so that we know not to repeat them, and resume where we left
> > > off before EAGAIN was returned to cycle out the transaction. Other
> > > members take the place of local variables that need to retain their
> > > values across multiple function recalls.  See xfs_attr.h for a more
> > > detailed diagram of the states.
> > > 
> > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > ---
> > >   fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
> > >   fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
> > >   fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
> > >   fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
> > >   fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
> > >   fs/xfs/xfs_attr_inactive.c      |   2 +-
> > >   6 files changed, 241 insertions(+), 74 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > index f4d39bf..6ca94cb 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > +++ b/fs/xfs/libxfs/xfs_attr.c
...
> > > @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
> > >    *
> > >    * This routine will find the blocks of the name to remove, remove them and
> > >    * shirnk the tree if needed.
> > > + *
> > > + * This routine is meant to function as either an inline or delayed operation,
> > > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > > + * functions will need to handle this, and recall the function until a
> > > + * successful error code is returned.
> > >    */
> > >   STATIC int
> > > -xfs_attr_node_removename(
> > > -	struct xfs_da_args	*args)
> > > +xfs_attr_node_removename_iter(
> > > +	struct xfs_delattr_context	*dac)
> > >   {
> > > -	struct xfs_da_state	*state;
> > > -	int			error;
> > > -	struct xfs_inode	*dp = args->dp;
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	struct xfs_da_state		*state;
> > > +	int				error;
> > > +	struct xfs_inode		*dp = args->dp;
> > >   	trace_xfs_attr_node_removename(args);
> > > +	state = dac->da_state;
> > > -	error = xfs_attr_node_removename_setup(args, &state);
> > > -	if (error)
> > > -		goto out;
> > > +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > > +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> > > +		error = xfs_attr_node_removename_setup(dac, &state);
> > > +		if (error)
> > > +			goto out;
> > > +	}
> > > -	error = xfs_attr_node_remove_step(args, state);
> > > -	if (error)
> > > -		goto out;
> > > +	switch (dac->dela_state) {
> > > +	case XFS_DAS_UNINIT:
> > > +		error = xfs_attr_node_remove_step(dac);
> > > +		if (error)
> > > +			break;
> > 
> > I think there's a bit more preliminary refactoring to do here to isolate
> > the state management to this one function. I.e., from the discussion on
> > the previous version, we'd ideally pull the logic that checks for the
> > subsequent shrink state out of xfs_attr_node_remove_step() and lift it
> > into this branch. See the pseudocode in the previous discussion for an
> > example of what I mean:
> > 
> >    https://urldefense.com/v3/__https://lore.kernel.org/linux-xfs/20200901170020.GC174813@bfoster/__;!!GqivPVa7Brio!JKxU3Z07HVj0V1YFesXrveRWnoGuWqiTZuaIDiG9UFmSxz-aFGsZPpLtjIOjSht7WL_h$
> > 
> > The general goal of that is to refactor the existing code such that all
> > of the state transitions and whatnot are shown in one place and the rest
> > is broken down into smaller functional helpers.
> > 
> > Brian
> 
> Yes, I did see the pseudocode, though I wasn't able to get it through the
> test cases quite as written, because whether or not we proceed to shrink
> is determined by the leaf remove that's buried in the step function.
> Otherwise we run into some failed asserts in the shrink routines for
> "state->path.active == 1". Alternatively, we can add another helper to pull
> it up to this scope.  Then the switch ends up looking like this:
> 

Ok. That was more focused on just showing an approach that collects all
of the state transition logic in one place. The factoring below seems
generally reasonable to me..

> 	switch (dac->dela_state) {
> 	case XFS_DAS_UNINIT:
> 		/*
> 		 * repeatedly remove remote blocks, remove the entry and
> 		 * join. returns -EAGAIN or 0 for completion of the step.
> 		 */
> 		error = xfs_attr_node_remove_step(dac);
> 		if (error)
> 			break;
>
> 		retval = xfs_attr_node_remove_cleanup(dac);
>
> 		/*
> 		 * Check to see if the tree needs to be collapsed.  Set the
> 		 * flag to indicate that the calling function needs to move
> 		 * the to shrink operation
> 		 */
> 		if (retval && (state->path.active > 1)) {
> 			error = xfs_da3_join(state);
> 			if (error)
> 				return error;
>
> 			dac->dela_state = XFS_DAS_RM_SHRINK;
> 			return -EAGAIN;
> 		}
>
> 		/* check whether to shrink or return success */
> 		if (!error && xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> 			dac->dela_state = XFS_DAS_RM_SHRINK;
> 			error = -EAGAIN;
> 		}

... though what is the purpose of this hunk? This seems to diverge from
the current logic, but maybe I'm missing something.

>
> 		break;
> 	case XFS_DAS_RM_SHRINK:
> 		/*
> 		 * If the result is small enough, push it all into the inode.
> 		 */
> 		error = xfs_attr_node_shrink(args, state);
> 		break;
> 	default:
> 		ASSERT(0);
> 		return -EINVAL;
> 	}
>
> 	if (error == -EAGAIN)
> 		return error;
>
> And then we have this little cleanup helper that we pull out of
> remove_step:
>
> STATIC int
> xfs_attr_node_remove_cleanup(
> 	struct xfs_delattr_context	*dac)
> {
> 	struct xfs_da_args		*args = dac->da_args;
> 	struct xfs_da_state_blk		*blk;
> 	int				retval;
>
> 	/*
> 	 * Remove the name and update the hashvals in the tree.
> 	 */
> 	blk = &dac->da_state->path.blk[dac->da_state->path.active - 1];
> 	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
> 	retval = xfs_attr3_leaf_remove(blk->bp, args);
> 	xfs_da3_fixhashpath(dac->da_state, &dac->da_state->path);
>
> 	return retval;
> }
>
> This configuration seems to get through the test cases.  It's not quite as
> tidy, but it does get all the state handling into this scope.  If people
> prefer it this way I can add in the extra helper and make these adjustments?
> 

Makes sense. I think the goal should be to pull the state management
into one (or as few) place(s) as technically possible and we should
refactor however necessary to accomplish that (please just pull the
refactoring out into preliminary patches to facilitate review). Thanks.
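For illustration, here is a minimal userspace sketch of the shape being discussed. All dela_state transitions live in one _iter() function, and small helpers do the actual work. A driver loop rolls the "transaction" whenever -EAGAIN comes back, and finishes deferred work first if the flag is set.

Everything here is hypothetical. The demo_* names stand in for the XFS code, and the counters stand in for xfs_defer_finish(), xfs_trans_roll_inode() and xfs_attr_node_shrink(). The sketch models control flow only, not XFS on-disk semantics. Following the factoring proposed above, the join decision sits in the iterator's switch rather than inside the step helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for xfs_delattr_state and the XFS_DAC_* flags. */
enum demo_das { DEMO_DAS_UNINIT = 0, DEMO_DAS_RM_SHRINK };
#define DEMO_DAC_DEFER_FINISH	0x01

struct demo_dac {
	unsigned int	flags;
	enum demo_das	dela_state;
	int		rmt_blocks;	/* remote value batches left to unmap */
	bool		needs_join;	/* leaf removal left the btree needing a join */
	bool		fits_inline;	/* result small enough to shrink */
	/* Counters standing in for the real side effects. */
	int		defer_finishes;	/* xfs_defer_finish() calls */
	int		rolls;		/* xfs_trans_roll_inode() calls */
	bool		shrunk;		/* xfs_attr_node_shrink() ran */
};

/* Helper: unmap one batch; asks for a roll until nothing is left. */
static int demo_rmtval_remove(struct demo_dac *dac)
{
	assert(dac->rmt_blocks > 0);
	if (--dac->rmt_blocks > 0) {
		dac->flags |= DEMO_DAC_DEFER_FINISH;
		return -EAGAIN;
	}
	return 0;
}

/* All dela_state transitions live here; helpers only do the work. */
static int demo_removename_iter(struct demo_dac *dac)
{
	int error;

	switch (dac->dela_state) {
	case DEMO_DAS_UNINIT:
		if (dac->rmt_blocks > 0) {
			error = demo_rmtval_remove(dac);
			if (error)
				return error;	/* -EAGAIN: recall after the roll */
		}
		if (dac->needs_join) {
			/* Stands in for xfs_da3_join(); commit it before shrinking. */
			dac->needs_join = false;
			dac->flags |= DEMO_DAC_DEFER_FINISH;
			dac->dela_state = DEMO_DAS_RM_SHRINK;
			return -EAGAIN;
		}
		/* fall through */
	case DEMO_DAS_RM_SHRINK:
		if (dac->fits_inline)
			dac->shrunk = true;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Like xfs_attr_trans_roll(): finish deferred work if asked, then roll. */
static int demo_trans_roll(struct demo_dac *dac)
{
	if (dac->flags & DEMO_DAC_DEFER_FINISH) {
		dac->flags &= ~DEMO_DAC_DEFER_FINISH;
		dac->defer_finishes++;
	}
	dac->rolls++;
	return 0;
}

/* Like xfs_attr_remove_args(): loop until the iterator stops asking to roll. */
static int demo_remove(struct demo_dac *dac)
{
	int error;

	for (;;) {
		error = demo_removename_iter(dac);
		if (error != -EAGAIN)
			return error;
		error = demo_trans_roll(dac);
		if (error)
			return error;
	}
}
```

With rmt_blocks = 3 and needs_join set, the driver rolls three times before the shrink runs: twice for the remote value and once after the join. The iterator itself never commits anything.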

Brian

> Allison
> 
> > 
> > > -	/*
> > > -	 * If the result is small enough, push it all into the inode.
> > > -	 */
> > > -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> > > -		error = xfs_attr_node_shrink(args, state);
> > > +		/* do not break, proceed to shrink if needed */
> > > +	case XFS_DAS_RM_SHRINK:
> > > +		/*
> > > +		 * If the result is small enough, push it all into the inode.
> > > +		 */
> > > +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> > > +			error = xfs_attr_node_shrink(args, state);
> > > +		break;
> > > +	default:
> > > +		ASSERT(0);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (error == -EAGAIN)
> > > +		return error;
> > >   out:
> > >   	if (state)
> > >   		xfs_da_state_free(state);
> > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > index 3e97a93..64dcf0f 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
> > >   };
> > > +/*
> > > + * ========================================================================
> > > + * Structure used to pass context around among the delayed routines.
> > > + * ========================================================================
> > > + */
> > > +
> > > +/*
> > > + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> > > + * states indicate places where the function would return -EAGAIN, and then
> > > + * immediately resume from after being recalled by the calling function. States
> > > + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> > > + * so the calling function needs to pass them back to that subroutine to allow
> > > + * it to finish where it left off. But they otherwise do not have a role in the
> > > + * calling function other than just passing through.
> > > + *
> > > + * xfs_attr_remove_iter()
> > > + *	  XFS_DAS_RM_SHRINK ─┐
> > > + *	  (subroutine state) │
> > > + *	                     └─>xfs_attr_node_removename()
> > > + *	                                      │
> > > + *	                                      v
> > > + *	                                   need to
> > > + *	                                shrink tree? ─n─┐
> > > + *	                                      │         │
> > > + *	                                      y         │
> > > + *	                                      │         │
> > > + *	                                      v         │
> > > + *	                              XFS_DAS_RM_SHRINK │
> > > + *	                                      │         │
> > > + *	                                      v         │
> > > + *	                                     done <─────┘
> > > + *
> > > + */
> > > +
> > > +/*
> > > + * Enum values for xfs_delattr_context.da_state
> > > + *
> > > + * These values are used by delayed attribute operations to keep track  of where
> > > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > > + * calling function to roll the transaction, and then recall the subroutine to
> > > + * finish the operation.  The enum is then used by the subroutine to jump back
> > > + * to where it was and resume executing where it left off.
> > > + */
> > > +enum xfs_delattr_state {
> > > +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
> > > +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
> > > +};
> > > +
> > > +/*
> > > + * Defines for xfs_delattr_context.flags
> > > + */
> > > +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> > > +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> > > +
> > > +/*
> > > + * Context used for keeping track of delayed attribute operations
> > > + */
> > > +struct xfs_delattr_context {
> > > +	struct xfs_da_args      *da_args;
> > > +
> > > +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> > > +	struct xfs_da_state     *da_state;
> > > +
> > > +	/* Used to keep track of current state of delayed operation */
> > > +	unsigned int            flags;
> > > +	enum xfs_delattr_state  dela_state;
> > > +};
> > > +
> > >   /*========================================================================
> > >    * Function prototypes for the kernel.
> > >    *========================================================================*/
> > > @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
> > >   int xfs_attr_set_args(struct xfs_da_args *args);
> > >   int xfs_has_attr(struct xfs_da_args *args);
> > >   int xfs_attr_remove_args(struct xfs_da_args *args);
> > > +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> > > +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
> > >   bool xfs_attr_namecheck(const void *name, size_t length);
> > > +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> > > +			      struct xfs_da_args *args);
> > >   #endif	/* __XFS_ATTR_H__ */
> > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > index bb128db..338377e 100644
> > > --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> > > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > @@ -19,8 +19,8 @@
> > >   #include "xfs_bmap_btree.h"
> > >   #include "xfs_bmap.h"
> > >   #include "xfs_attr_sf.h"
> > > -#include "xfs_attr_remote.h"
> > >   #include "xfs_attr.h"
> > > +#include "xfs_attr_remote.h"
> > >   #include "xfs_attr_leaf.h"
> > >   #include "xfs_error.h"
> > >   #include "xfs_trace.h"
> > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> > > index 48d8e9c..1426c15 100644
> > > --- a/fs/xfs/libxfs/xfs_attr_remote.c
> > > +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> > > @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
> > >    */
> > >   int
> > >   xfs_attr_rmtval_remove(
> > > -	struct xfs_da_args      *args)
> > > +	struct xfs_da_args		*args)
> > >   {
> > > -	int			error;
> > > -	int			retval;
> > > +	int				error;
> > > +	struct xfs_delattr_context	dac  = {
> > > +		.da_args	= args,
> > > +	};
> > >   	trace_xfs_attr_rmtval_remove(args);
> > > @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
> > >   	 * Keep de-allocating extents until the remote-value region is gone.
> > >   	 */
> > >   	do {
> > > -		retval = __xfs_attr_rmtval_remove(args);
> > > -		if (retval && retval != -EAGAIN)
> > > -			return retval;
> > > +		error = __xfs_attr_rmtval_remove(&dac);
> > > +		if (error != -EAGAIN)
> > > +			break;
> > > -		/*
> > > -		 * Close out trans and start the next one in the chain.
> > > -		 */
> > > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > +		error = xfs_attr_trans_roll(&dac);
> > >   		if (error)
> > >   			return error;
> > > -	} while (retval == -EAGAIN);
> > > -	return 0;
> > > +	} while (true);
> > > +
> > > +	return error;
> > >   }
> > >   /*
> > > @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
> > >    */
> > >   int
> > >   __xfs_attr_rmtval_remove(
> > > -	struct xfs_da_args	*args)
> > > +	struct xfs_delattr_context	*dac)
> > >   {
> > > -	int			error, done;
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	int				error, done;
> > >   	/*
> > >   	 * Unmap value blocks for this attr.
> > > @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
> > >   	if (error)
> > >   		return error;
> > > -	error = xfs_defer_finish(&args->trans);
> > > -	if (error)
> > > -		return error;
> > > -
> > > -	if (!done)
> > > +	if (!done) {
> > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > >   		return -EAGAIN;
> > > +	}
> > >   	return error;
> > >   }
> > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> > > index 9eee615..002fd30 100644
> > > --- a/fs/xfs/libxfs/xfs_attr_remote.h
> > > +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> > > @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > >   int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
> > >   		xfs_buf_flags_t incore_flags);
> > >   int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> > > -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > > +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> > >   #endif /* __XFS_ATTR_REMOTE_H__ */
> > > diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> > > index bfad669..aaa7e66 100644
> > > --- a/fs/xfs/xfs_attr_inactive.c
> > > +++ b/fs/xfs/xfs_attr_inactive.c
> > > @@ -15,10 +15,10 @@
> > >   #include "xfs_da_format.h"
> > >   #include "xfs_da_btree.h"
> > >   #include "xfs_inode.h"
> > > +#include "xfs_attr.h"
> > >   #include "xfs_attr_remote.h"
> > >   #include "xfs_trans.h"
> > >   #include "xfs_bmap.h"
> > > -#include "xfs_attr.h"
> > >   #include "xfs_attr_leaf.h"
> > >   #include "xfs_quota.h"
> > >   #include "xfs_dir2.h"
> > > -- 
> > > 2.7.4
> > > 
> > 
> 



* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-10-28 12:28       ` Brian Foster
@ 2020-10-29  1:03         ` Allison Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-10-29  1:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs



On 10/28/20 5:28 AM, Brian Foster wrote:
> On Tue, Oct 27, 2020 at 03:27:20PM -0700, Allison Henderson wrote:
>>
>>
>> On 10/27/20 5:16 AM, Brian Foster wrote:
>>> On Thu, Oct 22, 2020 at 11:34:27PM -0700, Allison Henderson wrote:
>>>> This patch modifies the attr remove routines to be delay ready. This
>>>> means they no longer roll or commit transactions, but instead return
>>>> -EAGAIN to have the calling routine roll and refresh the transaction. In
>>>> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
>>>> uses a sort of state machine like switch to keep track of where it was
>>>> when EAGAIN was returned. xfs_attr_node_removename has also been
>>>> modified to use the switch, and a new version of xfs_attr_remove_args
>>>> consists of a simple loop to refresh the transaction until the operation
>>>> is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
>>>> transaction wherever the existing code used to.
>>>>
>>>> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
>>>> version __xfs_attr_rmtval_remove. We will rename
>>>> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
>>>> done.
>>>>
>>>> xfs_attr_rmtval_remove itself is still in use by the set routines (used
>>>> during a rename).  For reasons of preserving existing function, we
>>>> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
>>>> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
>>>> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
>>>> used and will be removed.
>>>>
>>>> This patch also adds a new struct xfs_delattr_context, which we will use
>>>> to keep track of the current state of an attribute operation. The new
>>>> xfs_delattr_state enum is used to track various operations that are in
>>>> progress so that we know not to repeat them, and resume where we left
>>>> off before EAGAIN was returned to cycle out the transaction. Other
>>>> members take the place of local variables that need to retain their
>>>> values across multiple function recalls.  See xfs_attr.h for a more
>>>> detailed diagram of the states.
>>>>
>>>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>>>> ---
>>>>    fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
>>>>    fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
>>>>    fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>>>>    fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
>>>>    fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>>>>    fs/xfs/xfs_attr_inactive.c      |   2 +-
>>>>    6 files changed, 241 insertions(+), 74 deletions(-)
>>>>
>>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>>> index f4d39bf..6ca94cb 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr.c
> ...
>>>> @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
>>>>     *
>>>>     * This routine will find the blocks of the name to remove, remove them and
>>>>     * shirnk the tree if needed.
>>>> + *
>>>> + * This routine is meant to function as either an inline or delayed operation,
>>>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>>>> + * functions will need to handle this, and recall the function until a
>>>> + * successful error code is returned.
>>>>     */
>>>>    STATIC int
>>>> -xfs_attr_node_removename(
>>>> -	struct xfs_da_args	*args)
>>>> +xfs_attr_node_removename_iter(
>>>> +	struct xfs_delattr_context	*dac)
>>>>    {
>>>> -	struct xfs_da_state	*state;
>>>> -	int			error;
>>>> -	struct xfs_inode	*dp = args->dp;
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	struct xfs_da_state		*state;
>>>> +	int				error;
>>>> +	struct xfs_inode		*dp = args->dp;
>>>>    	trace_xfs_attr_node_removename(args);
>>>> +	state = dac->da_state;
>>>> -	error = xfs_attr_node_removename_setup(args, &state);
>>>> -	if (error)
>>>> -		goto out;
>>>> +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
>>>> +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
>>>> +		error = xfs_attr_node_removename_setup(dac, &state);
>>>> +		if (error)
>>>> +			goto out;
>>>> +	}
>>>> -	error = xfs_attr_node_remove_step(args, state);
>>>> -	if (error)
>>>> -		goto out;
>>>> +	switch (dac->dela_state) {
>>>> +	case XFS_DAS_UNINIT:
>>>> +		error = xfs_attr_node_remove_step(dac);
>>>> +		if (error)
>>>> +			break;
>>>
>>> I think there's a bit more preliminary refactoring to do here to isolate
>>> the state management to this one function. I.e., from the discussion on
>>> the previous version, we'd ideally pull the logic that checks for the
>>> subsequent shrink state out of xfs_attr_node_remove_step() and lift it
>>> into this branch. See the pseudocode in the previous discussion for an
>>> example of what I mean:
>>>
>>>     https://urldefense.com/v3/__https://lore.kernel.org/linux-xfs/20200901170020.GC174813@bfoster/__;!!GqivPVa7Brio!JKxU3Z07HVj0V1YFesXrveRWnoGuWqiTZuaIDiG9UFmSxz-aFGsZPpLtjIOjSht7WL_h$
>>>
>>> The general goal of that is to refactor the existing code such that all
>>> of the state transitions and whatnot are shown in one place and the rest
>>> is broken down into smaller functional helpers.
>>>
>>> Brian
>>
>> Yes, I did see the pseudocode, though I wasn't able to get it through the
>> test cases quite the way it appears, because whether or not we proceed to
>> shrink is determined by the leaf remove that's buried in the step function.
>> Otherwise we run into some failed asserts in the shrink routines for
>> "state->path.active == 1". Alternatively, we can add another helper to pull
>> it up to this scope. Then the switch ends up looking like this:
>>
> 
> Ok. That was more focused on just showing an approach that collects all
> of the state transition logic in one place. The factoring below seems
> generally reasonable to me..
> 
>>          switch (dac->dela_state) {
>>          case XFS_DAS_UNINIT:
>>                  /*
>>                   * Repeatedly remove remote blocks, remove the entry and
>>                   * join. Returns -EAGAIN or 0 for completion of the step.
>>                   */
>>                  error = xfs_attr_node_remove_step(dac);
>>                  if (error)
>>                          break;
>>
>>                  retval = xfs_attr_node_remove_cleanup(dac);
>>
>>                  /*
>>                   * Check to see if the tree needs to be collapsed. Set the
>>                   * state to indicate that the calling function needs to
>>                   * move to the shrink operation.
>>                   */
>>                  if (retval && (state->path.active > 1)) {
>>                          error = xfs_da3_join(state);
>>                          if (error)
>>                                  return error;
>>
>>                          dac->dela_state = XFS_DAS_RM_SHRINK;
>>                          return -EAGAIN;
>>                  }
>>
>>                  /* check whether to shrink or return success */
>>                  if (!error && xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>>                          dac->dela_state = XFS_DAS_RM_SHRINK;
>>                          error = -EAGAIN;
>>                  }
> 
> ... though what is the purpose of this hunk? This seems to diverge from
> the current logic, but maybe I'm missing something.
> 
That's just me trying to get as close to the pseudocode as possible,
but if the concern is just to keep the state management in this scope, I
may move it back down into the XFS_DAS_RM_SHRINK case where it was;
I think it looks a little cleaner there.


>>
>>                  break;
>>          case XFS_DAS_RM_SHRINK:
>>                  /*
>>                   * If the result is small enough, push it all into the
>>                   * inode.
>>                   */
>>
So it would go back down here like this, which is more akin to the 
original code flow:


		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>>                  error = xfs_attr_node_shrink(args, state);
>>                  break;
>>          default:
>>                  ASSERT(0);
>>                  return -EINVAL;
>>          }
>>
>>          if (error == -EAGAIN)
>>                  return error;
>>
>> And then we have this little cleanup helper that we pull out of
>> remove_step:
>>
>> STATIC int
>> xfs_attr_node_remove_cleanup(
>>          struct xfs_delattr_context      *dac)
>> {
>>          struct xfs_da_args              *args = dac->da_args;
>>          struct xfs_da_state_blk         *blk;
>>          int                             retval;
>>
>>          /*
>>           * Remove the name and update the hashvals in the tree.
>>           */
>>          blk = &dac->da_state->path.blk[dac->da_state->path.active - 1];
>>          ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
>>          retval = xfs_attr3_leaf_remove(blk->bp, args);
>>          xfs_da3_fixhashpath(dac->da_state, &dac->da_state->path);
>>
>>          return retval;
>> }
>>
>>
>> This configuration seems to get through the test cases.  It's not quite as
>> tidy, but it does get all the state handling into this scope.  If people
>> prefer it this way I can add in the extra helper and make these adjustments?
>>
> 
> Makes sense. I think the goal should be to pull the state management
> into one (or as few) place(s) as technically possible and we should
> refactor however necessary to accomplish that (please just pull the
> refactoring out into preliminary patches to facilitate review). Thanks.
> 
> Brian
Sure, I'll add in another patch for that extra helper in the next set. 
Thanks for the review!

Allison

> 
>> Allison
>>
>>>
>>>> -	/*
>>>> -	 * If the result is small enough, push it all into the inode.
>>>> -	 */
>>>> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>>>> -		error = xfs_attr_node_shrink(args, state);
>>>> +		/* do not break, proceed to shrink if needed */
>>>> +	case XFS_DAS_RM_SHRINK:
>>>> +		/*
>>>> +		 * If the result is small enough, push it all into the inode.
>>>> +		 */
>>>> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>>>> +			error = xfs_attr_node_shrink(args, state);
>>>> +		break;
>>>> +	default:
>>>> +		ASSERT(0);
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	if (error == -EAGAIN)
>>>> +		return error;
>>>>    out:
>>>>    	if (state)
>>>>    		xfs_da_state_free(state);
>>>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>>>> index 3e97a93..64dcf0f 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr.h
>>>> +++ b/fs/xfs/libxfs/xfs_attr.h
>>>> @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
>>>>    };
>>>> +/*
>>>> + * ========================================================================
>>>> + * Structure used to pass context around among the delayed routines.
>>>> + * ========================================================================
>>>> + */
>>>> +
>>>> +/*
>>>> + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
>>>> + * states indicate places where the function would return -EAGAIN, and then
>>>> + * immediately resume from after being recalled by the calling function. States
>>>> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
>>>> + * so the calling function needs to pass them back to that subroutine to allow
>>>> + * it to finish where it left off. But they otherwise do not have a role in the
>>>> + * calling function other than just passing through.
>>>> + *
>>>> + * xfs_attr_remove_iter()
>>>> + *	  XFS_DAS_RM_SHRINK ─┐
>>>> + *	  (subroutine state) │
>>>> + *	                     └─>xfs_attr_node_removename()
>>>> + *	                                      │
>>>> + *	                                      v
>>>> + *	                                   need to
>>>> + *	                                shrink tree? ─n─┐
>>>> + *	                                      │         │
>>>> + *	                                      y         │
>>>> + *	                                      │         │
>>>> + *	                                      v         │
>>>> + *	                              XFS_DAS_RM_SHRINK │
>>>> + *	                                      │         │
>>>> + *	                                      v         │
>>>> + *	                                     done <─────┘
>>>> + *
>>>> + */
>>>> +
>>>> +/*
>>>> + * Enum values for xfs_delattr_context.da_state
>>>> + *
>>>> + * These values are used by delayed attribute operations to keep track  of where
>>>> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
>>>> + * calling function to roll the transaction, and then recall the subroutine to
>>>> + * finish the operation.  The enum is then used by the subroutine to jump back
>>>> + * to where it was and resume executing where it left off.
>>>> + */
>>>> +enum xfs_delattr_state {
>>>> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
>>>> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
>>>> +};
>>>> +
>>>> +/*
>>>> + * Defines for xfs_delattr_context.flags
>>>> + */
>>>> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>>>> +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>>>> +
>>>> +/*
>>>> + * Context used for keeping track of delayed attribute operations
>>>> + */
>>>> +struct xfs_delattr_context {
>>>> +	struct xfs_da_args      *da_args;
>>>> +
>>>> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
>>>> +	struct xfs_da_state     *da_state;
>>>> +
>>>> +	/* Used to keep track of current state of delayed operation */
>>>> +	unsigned int            flags;
>>>> +	enum xfs_delattr_state  dela_state;
>>>> +};
>>>> +
>>>>    /*========================================================================
>>>>     * Function prototypes for the kernel.
>>>>     *========================================================================*/
>>>> @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>>>>    int xfs_attr_set_args(struct xfs_da_args *args);
>>>>    int xfs_has_attr(struct xfs_da_args *args);
>>>>    int xfs_attr_remove_args(struct xfs_da_args *args);
>>>> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>>>> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>>>>    bool xfs_attr_namecheck(const void *name, size_t length);
>>>> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>>> +			      struct xfs_da_args *args);
>>>>    #endif	/* __XFS_ATTR_H__ */
>>>> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
>>>> index bb128db..338377e 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
>>>> @@ -19,8 +19,8 @@
>>>>    #include "xfs_bmap_btree.h"
>>>>    #include "xfs_bmap.h"
>>>>    #include "xfs_attr_sf.h"
>>>> -#include "xfs_attr_remote.h"
>>>>    #include "xfs_attr.h"
>>>> +#include "xfs_attr_remote.h"
>>>>    #include "xfs_attr_leaf.h"
>>>>    #include "xfs_error.h"
>>>>    #include "xfs_trace.h"
>>>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
>>>> index 48d8e9c..1426c15 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr_remote.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
>>>> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
>>>>     */
>>>>    int
>>>>    xfs_attr_rmtval_remove(
>>>> -	struct xfs_da_args      *args)
>>>> +	struct xfs_da_args		*args)
>>>>    {
>>>> -	int			error;
>>>> -	int			retval;
>>>> +	int				error;
>>>> +	struct xfs_delattr_context	dac  = {
>>>> +		.da_args	= args,
>>>> +	};
>>>>    	trace_xfs_attr_rmtval_remove(args);
>>>> @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
>>>>    	 * Keep de-allocating extents until the remote-value region is gone.
>>>>    	 */
>>>>    	do {
>>>> -		retval = __xfs_attr_rmtval_remove(args);
>>>> -		if (retval && retval != -EAGAIN)
>>>> -			return retval;
>>>> +		error = __xfs_attr_rmtval_remove(&dac);
>>>> +		if (error != -EAGAIN)
>>>> +			break;
>>>> -		/*
>>>> -		 * Close out trans and start the next one in the chain.
>>>> -		 */
>>>> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
>>>> +		error = xfs_attr_trans_roll(&dac);
>>>>    		if (error)
>>>>    			return error;
>>>> -	} while (retval == -EAGAIN);
>>>> -	return 0;
>>>> +	} while (true);
>>>> +
>>>> +	return error;
>>>>    }
>>>>    /*
>>>> @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
>>>>     */
>>>>    int
>>>>    __xfs_attr_rmtval_remove(
>>>> -	struct xfs_da_args	*args)
>>>> +	struct xfs_delattr_context	*dac)
>>>>    {
>>>> -	int			error, done;
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	int				error, done;
>>>>    	/*
>>>>    	 * Unmap value blocks for this attr.
>>>> @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
>>>>    	if (error)
>>>>    		return error;
>>>> -	error = xfs_defer_finish(&args->trans);
>>>> -	if (error)
>>>> -		return error;
>>>> -
>>>> -	if (!done)
>>>> +	if (!done) {
>>>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>>>    		return -EAGAIN;
>>>> +	}
>>>>    	return error;
>>>>    }
>>>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
>>>> index 9eee615..002fd30 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr_remote.h
>>>> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
>>>> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>>>>    int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>>>>    		xfs_buf_flags_t incore_flags);
>>>>    int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
>>>> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
>>>> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>>>>    #endif /* __XFS_ATTR_REMOTE_H__ */
>>>> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
>>>> index bfad669..aaa7e66 100644
>>>> --- a/fs/xfs/xfs_attr_inactive.c
>>>> +++ b/fs/xfs/xfs_attr_inactive.c
>>>> @@ -15,10 +15,10 @@
>>>>    #include "xfs_da_format.h"
>>>>    #include "xfs_da_btree.h"
>>>>    #include "xfs_inode.h"
>>>> +#include "xfs_attr.h"
>>>>    #include "xfs_attr_remote.h"
>>>>    #include "xfs_trans.h"
>>>>    #include "xfs_bmap.h"
>>>> -#include "xfs_attr.h"
>>>>    #include "xfs_attr_leaf.h"
>>>>    #include "xfs_quota.h"
>>>>    #include "xfs_dir2.h"
>>>> -- 
>>>> 2.7.4
>>>>
>>>
>>
> 


* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-10-28 12:04       ` Chandan Babu R
@ 2020-10-29  1:29         ` Allison Henderson
  2020-11-14  0:53           ` Darrick J. Wong
  0 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2020-10-29  1:29 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs



On 10/28/20 5:04 AM, Chandan Babu R wrote:
> On Tuesday 27 October 2020 9:02:05 PM IST Allison Henderson wrote:
>>
>> On 10/27/20 2:59 AM, Chandan Babu R wrote:
>>> On Friday 23 October 2020 12:04:27 PM IST Allison Henderson wrote:
>>>> This patch modifies the attr remove routines to be delay ready. This
>>>> means they no longer roll or commit transactions, but instead return
>>>> -EAGAIN to have the calling routine roll and refresh the transaction. In
>>>> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
>>>> uses a sort of state machine like switch to keep track of where it was
>>>> when EAGAIN was returned. xfs_attr_node_removename has also been
>>>> modified to use the switch, and a new version of xfs_attr_remove_args
>>>> consists of a simple loop to refresh the transaction until the operation
>>>> is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
>>>> transaction wherever the existing code used to.
>>>>
>>>> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
>>>> version __xfs_attr_rmtval_remove. We will rename
>>>> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
>>>> done.
>>>>
>>>> xfs_attr_rmtval_remove itself is still in use by the set routines (used
>>>> during a rename).  For reasons of preserving existing function, we
>>>> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
>>>> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
>>>> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
>>>> used and will be removed.
>>>>
>>>> This patch also adds a new struct xfs_delattr_context, which we will use
>>>> to keep track of the current state of an attribute operation. The new
>>>> xfs_delattr_state enum is used to track various operations that are in
>>>> progress so that we know not to repeat them, and resume where we left
>>>> off before EAGAIN was returned to cycle out the transaction. Other
>>>> members take the place of local variables that need to retain their
>>>> values across multiple function recalls.  See xfs_attr.h for a more
>>>> detailed diagram of the states.
>>>>
>>>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>>>> ---
>>>>    fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
>>>>    fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
>>>>    fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>>>>    fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
>>>>    fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>>>>    fs/xfs/xfs_attr_inactive.c      |   2 +-
>>>>    6 files changed, 241 insertions(+), 74 deletions(-)
>>>>
>>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>>> index f4d39bf..6ca94cb 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr.c
>>>> @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>>>     */
>>>>    STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>>>>    STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
>>>> -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
>>>> +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>>>>    STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>>>    				 struct xfs_da_state **state);
>>>>    STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>>> @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
>>>>    }
>>>>    
>>>>    /*
>>>> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
>>>> + * also checks for a defer finish.  Transaction is finished and rolled as
>>>> + * needed, and returns true of false if the delayed operation should continue.
>>>> + */
>>>> +int
>>>> +xfs_attr_trans_roll(
>>>> +	struct xfs_delattr_context	*dac)
>>>> +{
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	int				error = 0;
>>>> +
>>>> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
>>>> +		/*
>>>> +		 * The caller wants us to finish all the deferred ops so that we
>>>> +		 * avoid pinning the log tail with a large number of deferred
>>>> +		 * ops.
>>>> +		 */
>>>> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
>>>> +		error = xfs_defer_finish(&args->trans);
>>>> +		if (error)
>>>> +			return error;
>>>> +	}
>>>> +
>>>> +	return xfs_trans_roll_inode(&args->trans, args->dp);
>>>> +}
>>>> +
>>>> +/*
>>>>     * Set the attribute specified in @args.
>>>>     */
>>>>    int
>>>> @@ -364,23 +391,54 @@ xfs_has_attr(
>>>>     */
>>>>    int
>>>>    xfs_attr_remove_args(
>>>> -	struct xfs_da_args      *args)
>>>> +	struct xfs_da_args	*args)
>>>>    {
>>>> -	struct xfs_inode	*dp = args->dp;
>>>> -	int			error;
>>>> +	int				error = 0;
>>>
>>> I guess the explicit initialization of "error" can be removed since the
>>> value returned by the call to xfs_attr_remove_iter() will overwrite it.
>> Sure, will fix
>>>
>>>> +	struct xfs_delattr_context	dac = {
>>>> +		.da_args	= args,
>>>> +	};
>>>> +
>>>> +	do {
>>>> +		error = xfs_attr_remove_iter(&dac);
>>>> +		if (error != -EAGAIN)
>>>> +			break;
>>>> +
>>>> +		error = xfs_attr_trans_roll(&dac);
>>>> +		if (error)
>>>> +			return error;
>>>> +
>>>> +	} while (true);
>>>> +
>>>> +	return error;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Remove the attribute specified in @args.
>>>> + *
>>>> + * This function may return -EAGAIN to signal that the transaction needs to be
>>>> + * rolled.  Callers should continue calling this function until they receive a
>>>> + * return value other than -EAGAIN.
>>>> + */
>>>> +int
>>>> +xfs_attr_remove_iter(
>>>> +	struct xfs_delattr_context	*dac)
>>>> +{
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	struct xfs_inode		*dp = args->dp;
>>>> +
>>>> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
>>>> +		goto node;
>>>>    
>>>>    	if (!xfs_inode_hasattr(dp)) {
>>>> -		error = -ENOATTR;
>>>> +		return -ENOATTR;
>>>>    	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
>>>>    		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
>>>> -		error = xfs_attr_shortform_remove(args);
>>>> +		return xfs_attr_shortform_remove(args);
>>>>    	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>>>> -		error = xfs_attr_leaf_removename(args);
>>>> -	} else {
>>>> -		error = xfs_attr_node_removename(args);
>>>> +		return xfs_attr_leaf_removename(args);
>>>>    	}
>>>> -
>>>> -	return error;
>>>> +node:
>>>> +	return  xfs_attr_node_removename_iter(dac);
>>>>    }
>>>>    
>>>>    /*
>>>> @@ -1178,10 +1236,11 @@ xfs_attr_leaf_mark_incomplete(
>>>>     */
>>>>    STATIC
>>>>    int xfs_attr_node_removename_setup(
>>>> -	struct xfs_da_args	*args,
>>>> -	struct xfs_da_state	**state)
>>>> +	struct xfs_delattr_context	*dac,
>>>> +	struct xfs_da_state		**state)
>>>>    {
>>>> -	int			error;
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	int				error;
>>>>    
>>>>    	error = xfs_attr_node_hasname(args, state);
>>>>    	if (error != -EEXIST)
>>>> @@ -1191,6 +1250,12 @@ int xfs_attr_node_removename_setup(
>>>>    	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
>>>>    		XFS_ATTR_LEAF_MAGIC);
>>>>    
>>>> +	/*
>>>> +	 * Store state in the context in case we need to cycle out the
>>>> +	 * transaction.
>>>> +	 */
>>>> +	dac->da_state = *state;
>>>> +
>>>>    	if (args->rmtblkno > 0) {
>>>>    		error = xfs_attr_leaf_mark_incomplete(args, *state);
>>>>    		if (error)
>>>> @@ -1203,13 +1268,16 @@ int xfs_attr_node_removename_setup(
>>>>    }
>>>>    
>>>>    STATIC int
>>>> -xfs_attr_node_remove_rmt(
>>>> -	struct xfs_da_args	*args,
>>>> -	struct xfs_da_state	*state)
>>>> +xfs_attr_node_remove_rmt (
>>>> +	struct xfs_delattr_context	*dac,
>>>> +	struct xfs_da_state		*state)
>>>>    {
>>>> -	int			error = 0;
>>>> +	int				error = 0;
>>>>    
>>>> -	error = xfs_attr_rmtval_remove(args);
>>>> +	/*
>>>> +	 * May return -EAGAIN to request that the caller recall this function
>>>> +	 */
>>>> +	error = __xfs_attr_rmtval_remove(dac);
>>>>    	if (error)
>>>>    		return error;
>>>>    
>>>> @@ -1221,21 +1289,27 @@ xfs_attr_node_remove_rmt(
>>>>    }
>>>>    
>>>>    /*
>>>> - * Remove a name from a B-tree attribute list.
>>>> + * Step through removing a name from a B-tree attribute list.
>>>>     *
>>>>     * This will involve walking down the Btree, and may involve joining
>>>>     * leaf nodes and even joining intermediate nodes up to and including
>>>>     * the root node (a special case of an intermediate node).
>>>> + *
>>>> + * This routine is meant to function as either an inline or delayed operation,
>>>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>>>> + * functions will need to handle this, and recall the function until a
>>>> + * successful error code is returned.
>>>>     */
>>>>    STATIC int
>>>>    xfs_attr_node_remove_step(
>>>> -	struct xfs_da_args	*args,
>>>> -	struct xfs_da_state	*state)
>>>> +	struct xfs_delattr_context	*dac)
>>>>    {
>>>> -	struct xfs_da_state_blk	*blk;
>>>> -	int			retval, error;
>>>> -	struct xfs_inode	*dp = args->dp;
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	struct xfs_da_state		*state;
>>>> +	struct xfs_da_state_blk		*blk;
>>>> +	int				retval, error = 0;
>>>>    
>>>> +	state = dac->da_state;
>>>>    
>>>>    	/*
>>>>    	 * If there is an out-of-line value, de-allocate the blocks.
>>>> @@ -1243,7 +1317,10 @@ xfs_attr_node_remove_step(
>>>>    	 * overflow the maximum size of a transaction and/or hit a deadlock.
>>>>    	 */
>>>>    	if (args->rmtblkno > 0) {
>>>> -		error = xfs_attr_node_remove_rmt(args, state);
>>>> +		/*
>>>> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
>>>> +		 */
>>>> +		error = xfs_attr_node_remove_rmt(dac, state);
>>>>    		if (error)
>>>>    			return error;
>>>>    	}
>>>> @@ -1257,21 +1334,18 @@ xfs_attr_node_remove_step(
>>>>    	xfs_da3_fixhashpath(state, &state->path);
>>>>    
>>>>    	/*
>>>> -	 * Check to see if the tree needs to be collapsed.
>>>> +	 * Check to see if the tree needs to be collapsed.  Set the flag to
>>>> +	 * indicate that the calling function needs to move on to the shrink
>>>> +	 * operation.
>>>>    	 */
>>>>    	if (retval && (state->path.active > 1)) {
>>>>    		error = xfs_da3_join(state);
>>>>    		if (error)
>>>>    			return error;
>>>> -		error = xfs_defer_finish(&args->trans);
>>>> -		if (error)
>>>> -			return error;
>>>> -		/*
>>>> -		 * Commit the Btree join operation and start a new trans.
>>>> -		 */
>>>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>>>> -		if (error)
>>>> -			return error;
>>>> +
>>>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>>> +		dac->dela_state = XFS_DAS_RM_SHRINK;
>>>> +		return -EAGAIN;
>>>>    	}
>>>>    
>>>>    	return error;
>>>> @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
>>>>     *
>>>>     * This routine will find the blocks of the name to remove, remove them and
>>>>     * shirnk the tree if needed.
>>>> + *
>>>> + * This routine is meant to function as either an inline or delayed operation,
>>>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>>>> + * functions will need to handle this, and recall the function until a
>>>> + * successful error code is returned.
>>>>     */
>>>>    STATIC int
>>>> -xfs_attr_node_removename(
>>>> -	struct xfs_da_args	*args)
>>>> +xfs_attr_node_removename_iter(
>>>> +	struct xfs_delattr_context	*dac)
>>>>    {
>>>> -	struct xfs_da_state	*state;
>>>> -	int			error;
>>>> -	struct xfs_inode	*dp = args->dp;
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	struct xfs_da_state		*state;
>>>> +	int				error;
>>>> +	struct xfs_inode		*dp = args->dp;
>>>>    
>>>>    	trace_xfs_attr_node_removename(args);
>>>> +	state = dac->da_state;
>>>>    
>>>> -	error = xfs_attr_node_removename_setup(args, &state);
>>>> -	if (error)
>>>> -		goto out;
>>>> +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
>>>> +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
>>>> +		error = xfs_attr_node_removename_setup(dac, &state);
>>>> +		if (error)
>>>> +			goto out;
>>>> +	}
>>>>    
>>>> -	error = xfs_attr_node_remove_step(args, state);
>>>> -	if (error)
>>>> -		goto out;
>>>> +	switch (dac->dela_state) {
>>>> +	case XFS_DAS_UNINIT:
>>>> +		error = xfs_attr_node_remove_step(dac);
>>>> +		if (error)
>>>> +			break;
>>>>    
>>>> -	/*
>>>> -	 * If the result is small enough, push it all into the inode.
>>>> -	 */
>>>> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>>>> -		error = xfs_attr_node_shrink(args, state);
>>>> +		/* do not break, proceed to shrink if needed */
>>>> +	case XFS_DAS_RM_SHRINK:
>>>> +		/*
>>>> +		 * If the result is small enough, push it all into the inode.
>>>> +		 */
>>>> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>>>> +			error = xfs_attr_node_shrink(args, state);
>>>>    
>>>> +		break;
>>>> +	default:
>>>> +		ASSERT(0);
>>>> +		return -EINVAL;
>>>
>>> I don't think it is possible in a real world scenario, but if "state" were
>>> pointing to allocated memory then the above return value might leak the
>>> corresponding memory.
>> Hmm, trying to follow you here... I'm assuming you meant dela_state
>> instead of state, since that's what controls the switch.  The dac
>> structure is zeroed when allocated to avoid this.  Most of the time when
>> this switch executes, dela_state is zero.  I did have to add the
>> XFS_DAS_UNINIT case from the previous suggestion in the last revision,
>> though, or it generates warnings.
>>>
> 
> Sorry, I should have clarified that I was referring to the allocated
> memory pointed to by dac->da_state. If dac->da_state was pointing to a valid
> memory location and dac->dela_state's value is equal to neither
> XFS_DAS_UNINIT nor XFS_DAS_RM_SHRINK, then the code under the "default" clause
> will execute, causing -EINVAL to be returned. This could leak the memory
> pointed to by dac->da_state.

Oooh, ok I see it.  We should set error to -EINVAL and goto out. 
Ideally it should never happen, but that should be the proper error 
handling if it did.  Thanks for the catch  :-)
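
For illustration, a rough userspace sketch of that pattern. All demo_* names are hypothetical stand-ins, not the real XFS structures: the iterator records where to resume in dela_state before returning -EAGAIN, and the default case now sets error and jumps to the cleanup label instead of returning directly, so the saved state is still freed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model of xfs_attr_node_removename_iter(); the demo_*
 * names are illustrative only and are not XFS code.
 */
enum demo_state {
	DEMO_UNINIT = 0,	/* no state recorded yet */
	DEMO_RM_SHRINK,		/* resume at the shrink step */
};

struct demo_ctx {
	int		*da_state;	/* heap state kept across -EAGAIN returns */
	enum demo_state	dela_state;	/* where to resume on the next call */
	int		init_done;	/* setup already ran */
	int		needs_join;	/* pretend the tree must be collapsed */
	int		shrunk;		/* shrink step ran */
	int		frees;		/* times da_state was freed */
};

static int demo_removename_iter(struct demo_ctx *ctx)
{
	int error = 0;

	if (!ctx->init_done) {
		ctx->init_done = 1;
		ctx->da_state = calloc(1, sizeof(*ctx->da_state));
		if (!ctx->da_state)
			return -ENOMEM;
	}

	switch (ctx->dela_state) {
	case DEMO_UNINIT:
		if (ctx->needs_join) {
			/* record where to resume, then ask the caller to roll */
			ctx->dela_state = DEMO_RM_SHRINK;
			error = -EAGAIN;
			break;
		}
		/* fall through */
	case DEMO_RM_SHRINK:
		ctx->shrunk = 1;
		break;
	default:
		/* unexpected state: report it, but still release da_state */
		error = -EINVAL;
		goto out;
	}

	if (error == -EAGAIN)
		return error;	/* keep da_state for the next call */
out:
	if (ctx->da_state) {
		free(ctx->da_state);
		ctx->da_state = NULL;
		ctx->frees++;
	}
	return error;
}
```

With needs_join set, the first call returns -EAGAIN and keeps the state; the second call resumes at the shrink step and frees it. An out-of-range dela_state returns -EINVAL and the state is still freed.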

Allison
> 
> 
>>> Apart from the above nit, the remaining changes look good to me.
>> Ok, thanks for the review!
>> Allison
>>
>>>
>>> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
>>>
>>>> +	}
>>>> +
>>>> +	if (error == -EAGAIN)
>>>> +		return error;
>>>>    out:
>>>>    	if (state)
>>>>    		xfs_da_state_free(state);
>>>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>>>> index 3e97a93..64dcf0f 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr.h
>>>> +++ b/fs/xfs/libxfs/xfs_attr.h
>>>> @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
>>>>    };
>>>>    
>>>>    
>>>> +/*
>>>> + * ========================================================================
>>>> + * Structure used to pass context around among the delayed routines.
>>>> + * ========================================================================
>>>> + */
>>>> +
>>>> +/*
>>>> + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
>>>> + * states indicate places where the function would return -EAGAIN, and then
>>>> + * immediately resume from after being recalled by the calling function. States
>>>> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
>>>> + * so the calling function needs to pass them back to that subroutine to allow
>>>> + * it to finish where it left off. But they otherwise do not have a role in the
>>>> + * calling function other than just passing through.
>>>> + *
>>>> + * xfs_attr_remove_iter()
>>>> + *	  XFS_DAS_RM_SHRINK ─┐
>>>> + *	  (subroutine state) │
>>>> + *	                     └─>xfs_attr_node_removename()
>>>> + *	                                      │
>>>> + *	                                      v
>>>> + *	                                   need to
>>>> + *	                                shrink tree? ─n─┐
>>>> + *	                                      │         │
>>>> + *	                                      y         │
>>>> + *	                                      │         │
>>>> + *	                                      v         │
>>>> + *	                              XFS_DAS_RM_SHRINK │
>>>> + *	                                      │         │
>>>> + *	                                      v         │
>>>> + *	                                     done <─────┘
>>>> + *
>>>> + */
>>>> +
>>>> +/*
>>>> + * Enum values for xfs_delattr_context.da_state
>>>> + *
>>>> + * These values are used by delayed attribute operations to keep track  of where
>>>> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
>>>> + * calling function to roll the transaction, and then recall the subroutine to
>>>> + * finish the operation.  The enum is then used by the subroutine to jump back
>>>> + * to where it was and resume executing where it left off.
>>>> + */
>>>> +enum xfs_delattr_state {
>>>> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
>>>> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
>>>> +};
>>>> +
>>>> +/*
>>>> + * Defines for xfs_delattr_context.flags
>>>> + */
>>>> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>>>> +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>>>> +
>>>> +/*
>>>> + * Context used for keeping track of delayed attribute operations
>>>> + */
>>>> +struct xfs_delattr_context {
>>>> +	struct xfs_da_args      *da_args;
>>>> +
>>>> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
>>>> +	struct xfs_da_state     *da_state;
>>>> +
>>>> +	/* Used to keep track of current state of delayed operation */
>>>> +	unsigned int            flags;
>>>> +	enum xfs_delattr_state  dela_state;
>>>> +};
>>>> +
>>>>    /*========================================================================
>>>>     * Function prototypes for the kernel.
>>>>     *========================================================================*/
>>>> @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>>>>    int xfs_attr_set_args(struct xfs_da_args *args);
>>>>    int xfs_has_attr(struct xfs_da_args *args);
>>>>    int xfs_attr_remove_args(struct xfs_da_args *args);
>>>> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>>>> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>>>>    bool xfs_attr_namecheck(const void *name, size_t length);
>>>> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>>> +			      struct xfs_da_args *args);
>>>>    
>>>>    #endif	/* __XFS_ATTR_H__ */
>>>> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
>>>> index bb128db..338377e 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
>>>> @@ -19,8 +19,8 @@
>>>>    #include "xfs_bmap_btree.h"
>>>>    #include "xfs_bmap.h"
>>>>    #include "xfs_attr_sf.h"
>>>> -#include "xfs_attr_remote.h"
>>>>    #include "xfs_attr.h"
>>>> +#include "xfs_attr_remote.h"
>>>>    #include "xfs_attr_leaf.h"
>>>>    #include "xfs_error.h"
>>>>    #include "xfs_trace.h"
>>>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
>>>> index 48d8e9c..1426c15 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr_remote.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
>>>> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
>>>>     */
>>>>    int
>>>>    xfs_attr_rmtval_remove(
>>>> -	struct xfs_da_args      *args)
>>>> +	struct xfs_da_args		*args)
>>>>    {
>>>> -	int			error;
>>>> -	int			retval;
>>>> +	int				error;
>>>> +	struct xfs_delattr_context	dac  = {
>>>> +		.da_args	= args,
>>>> +	};
>>>>    
>>>>    	trace_xfs_attr_rmtval_remove(args);
>>>>    
>>>> @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
>>>>    	 * Keep de-allocating extents until the remote-value region is gone.
>>>>    	 */
>>>>    	do {
>>>> -		retval = __xfs_attr_rmtval_remove(args);
>>>> -		if (retval && retval != -EAGAIN)
>>>> -			return retval;
>>>> +		error = __xfs_attr_rmtval_remove(&dac);
>>>> +		if (error != -EAGAIN)
>>>> +			break;
>>>>    
>>>> -		/*
>>>> -		 * Close out trans and start the next one in the chain.
>>>> -		 */
>>>> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
>>>> +		error = xfs_attr_trans_roll(&dac);
>>>>    		if (error)
>>>>    			return error;
>>>> -	} while (retval == -EAGAIN);
>>>>    
>>>> -	return 0;
>>>> +	} while (true);
>>>> +
>>>> +	return error;
>>>>    }
>>>>    
>>>>    /*
>>>> @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
>>>>     */
>>>>    int
>>>>    __xfs_attr_rmtval_remove(
>>>> -	struct xfs_da_args	*args)
>>>> +	struct xfs_delattr_context	*dac)
>>>>    {
>>>> -	int			error, done;
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	int				error, done;
>>>>    
>>>>    	/*
>>>>    	 * Unmap value blocks for this attr.
>>>> @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
>>>>    	if (error)
>>>>    		return error;
>>>>    
>>>> -	error = xfs_defer_finish(&args->trans);
>>>> -	if (error)
>>>> -		return error;
>>>> -
>>>> -	if (!done)
>>>> +	if (!done) {
>>>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>>>    		return -EAGAIN;
>>>> +	}
>>>>    
>>>>    	return error;
>>>>    }
>>>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
>>>> index 9eee615..002fd30 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr_remote.h
>>>> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
>>>> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>>>>    int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>>>>    		xfs_buf_flags_t incore_flags);
>>>>    int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
>>>> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
>>>> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>>>>    #endif /* __XFS_ATTR_REMOTE_H__ */
>>>> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
>>>> index bfad669..aaa7e66 100644
>>>> --- a/fs/xfs/xfs_attr_inactive.c
>>>> +++ b/fs/xfs/xfs_attr_inactive.c
>>>> @@ -15,10 +15,10 @@
>>>>    #include "xfs_da_format.h"
>>>>    #include "xfs_da_btree.h"
>>>>    #include "xfs_inode.h"
>>>> +#include "xfs_attr.h"
>>>>    #include "xfs_attr_remote.h"
>>>>    #include "xfs_trans.h"
>>>>    #include "xfs_bmap.h"
>>>> -#include "xfs_attr.h"
>>>>    #include "xfs_attr_leaf.h"
>>>>    #include "xfs_quota.h"
>>>>    #include "xfs_dir2.h"
>>>>
>>>
>>>
>>
> 
> 

* Re: [PATCH v13 09/10] xfs: Remove unused xfs_attr_*_args
  2020-10-23  6:34 ` [PATCH v13 09/10] xfs: Remove unused xfs_attr_*_args Allison Henderson
@ 2020-11-10 20:07   ` Darrick J. Wong
  2020-11-13  1:27     ` Allison Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-10 20:07 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Oct 22, 2020 at 11:34:34PM -0700, Allison Henderson wrote:
> Remove xfs_attr_set_args, xfs_attr_remove_args, and xfs_attr_trans_roll.
> These high level loops are now driven by the delayed operations code,
> and can be removed.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c        | 97 +----------------------------------------
>  fs/xfs/libxfs/xfs_attr.h        |  9 ++--
>  fs/xfs/libxfs/xfs_attr_remote.c |  4 +-
>  3 files changed, 5 insertions(+), 105 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index edd5d10..b5e1e84 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -262,65 +262,6 @@ xfs_attr_set_shortform(
>  }
>  
>  /*
> - * Checks to see if a delayed attribute transaction should be rolled.  If so,
> - * also checks for a defer finish.  Transaction is finished and rolled as
> - * needed, and returns true of false if the delayed operation should continue.
> - */
> -STATIC int
> -xfs_attr_trans_roll(
> -	struct xfs_delattr_context	*dac)
> -{
> -	struct xfs_da_args		*args = dac->da_args;
> -	int				error = 0;
> -
> -	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> -		/*
> -		 * The caller wants us to finish all the deferred ops so that we
> -		 * avoid pinning the log tail with a large number of deferred
> -		 * ops.
> -		 */
> -		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> -		error = xfs_defer_finish(&args->trans);
> -		if (error)
> -			return error;
> -	}
> -
> -	return xfs_trans_roll_inode(&args->trans, args->dp);
> -}
> -
> -/*
> - * Set the attribute specified in @args.
> - */
> -int
> -xfs_attr_set_args(
> -	struct xfs_da_args	*args)
> -{
> -	struct xfs_buf			*leaf_bp = NULL;
> -	int				error = 0;
> -	struct xfs_delattr_context	dac = {
> -		.da_args	= args,
> -	};
> -
> -	do {
> -		error = xfs_attr_set_iter(&dac, &leaf_bp);

Now that there's only one caller of xfs_attr_set_iter and it passes
&dac->leaf_bp, I think you can get rid of this second parameter, right?
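
A rough sketch of what that could look like, using hypothetical demo_* names rather than the real XFS types: the buffer pointer is a field of the context, the iterator takes only the context, and a driver loop shaped like the xfs_attr_set_args() loop this patch removes calls it until it stops returning -EAGAIN, rolling in between.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model: the leaf buffer pointer lives in the context, so the
 * iterator needs no second parameter.  demo_* names are illustrative only.
 */
struct demo_buf {
	int	joined;		/* rejoined to the new transaction after a roll */
};

struct demo_set_ctx {
	struct demo_buf	*leaf_bp;	/* replaces the separate leaf_bp argument */
	int		steps_left;	/* -EAGAIN returns before the set completes */
	int		rolls;		/* transaction rolls done by the driver */
};

static struct demo_buf demo_leaf;

/* One step of the set; returns -EAGAIN while the caller must roll and recall. */
static int demo_set_iter(struct demo_set_ctx *ctx)
{
	if (ctx->steps_left > 0) {
		ctx->leaf_bp = &demo_leaf;	/* e.g. a leaf produced by conversion */
		ctx->steps_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Stand-in for rolling the transaction and rejoining the held buffer. */
static int demo_roll(struct demo_set_ctx *ctx)
{
	ctx->rolls++;
	if (ctx->leaf_bp)
		ctx->leaf_bp->joined = 1;
	return 0;
}

/* Driver loop in the style of the removed xfs_attr_set_args(). */
static int demo_set_args(struct demo_set_ctx *ctx)
{
	int error;

	for (;;) {
		error = demo_set_iter(ctx);
		if (error != -EAGAIN)
			return error;
		error = demo_roll(ctx);
		if (error)
			return error;
	}
}
```

Because the driver reads the buffer straight from the context, nothing has to be passed back through an extra out-parameter.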

It's nice to see so much code disappear now that we track attr
operations with deferred ops.  Everything else looks ok here. :)

--D

> -		if (error != -EAGAIN)
> -			break;
> -
> -		error = xfs_attr_trans_roll(&dac);
> -		if (error)
> -			return error;
> -
> -		if (leaf_bp) {
> -			xfs_trans_bjoin(args->trans, leaf_bp);
> -			xfs_trans_bhold(args->trans, leaf_bp);
> -		}
> -
> -	} while (true);
> -
> -	return error;
> -}
> -
> -/*
>   * Set the attribute specified in @args.
>   * This routine is meant to function as a delayed operation, and may return
>   * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
> @@ -363,11 +304,7 @@ xfs_attr_set_iter(
>  		 * continue.  Otherwise, is it converted from shortform to leaf
>  		 * and -EAGAIN is returned.
>  		 */
> -		error = xfs_attr_set_shortform(args, leaf_bp);
> -		if (error == -EAGAIN)
> -			dac->flags |= XFS_DAC_DEFER_FINISH;
> -
> -		return error;
> +		return xfs_attr_set_shortform(args, leaf_bp);
>  	}
>  
>  	/*
> @@ -398,7 +335,6 @@ xfs_attr_set_iter(
>  			 * same state (inode locked and joined, transaction
>  			 * clean) no matter how we got to this step.
>  			 */
> -			dac->flags |= XFS_DAC_DEFER_FINISH;
>  			return -EAGAIN;
>  		case 0:
>  			dac->dela_state = XFS_DAS_FOUND_LBLK;
> @@ -455,32 +391,6 @@ xfs_has_attr(
>  
>  /*
>   * Remove the attribute specified in @args.
> - */
> -int
> -xfs_attr_remove_args(
> -	struct xfs_da_args	*args)
> -{
> -	int				error = 0;
> -	struct xfs_delattr_context	dac = {
> -		.da_args	= args,
> -	};
> -
> -	do {
> -		error = xfs_attr_remove_iter(&dac);
> -		if (error != -EAGAIN)
> -			break;
> -
> -		error = xfs_attr_trans_roll(&dac);
> -		if (error)
> -			return error;
> -
> -	} while (true);
> -
> -	return error;
> -}
> -
> -/*
> - * Remove the attribute specified in @args.
>   *
>   * This function may return -EAGAIN to signal that the transaction needs to be
>   * rolled.  Callers should continue calling this function until they receive a
> @@ -895,7 +805,6 @@ xfs_attr_leaf_addname(
>  		if (error)
>  			return error;
>  
> -		dac->flags |= XFS_DAC_DEFER_FINISH;
>  		return -EAGAIN;
>  	}
>  
> @@ -1192,7 +1101,6 @@ xfs_attr_node_addname(
>  			 * Restart routine from the top.  No need to set  the
>  			 * state
>  			 */
> -			dac->flags |= XFS_DAC_DEFER_FINISH;
>  			return -EAGAIN;
>  		}
>  
> @@ -1205,7 +1113,6 @@ xfs_attr_node_addname(
>  		error = xfs_da3_split(state);
>  		if (error)
>  			goto out;
> -		dac->flags |= XFS_DAC_DEFER_FINISH;
>  	} else {
>  		/*
>  		 * Addition succeeded, update Btree hashvals.
> @@ -1246,7 +1153,6 @@ xfs_attr_node_addname(
>  			if (error)
>  				return error;
>  
> -			dac->flags |= XFS_DAC_DEFER_FINISH;
>  			dac->dela_state = XFS_DAS_ALLOC_NODE;
>  			return -EAGAIN;
>  		}
> @@ -1516,7 +1422,6 @@ xfs_attr_node_remove_step(
>  		if (error)
>  			return error;
>  
> -		dac->flags |= XFS_DAC_DEFER_FINISH;
>  		dac->dela_state = XFS_DAS_RM_SHRINK;
>  		return -EAGAIN;
>  	}
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 8a08411..6d90301 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -244,10 +244,9 @@ enum xfs_delattr_state {
>  /*
>   * Defines for xfs_delattr_context.flags
>   */
> -#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> -#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> -#define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
> -#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
> +#define XFS_DAC_NODE_RMVNAME_INIT	0x01 /* xfs_attr_node_removename init */
> +#define XFS_DAC_LEAF_ADDNAME_INIT	0x02 /* xfs_attr_leaf_addname init*/
> +#define XFS_DAC_DELAYED_OP_INIT		0x04 /* delayed operations init*/
>  
>  /*
>   * Context used for keeping track of delayed attribute operations
> @@ -297,11 +296,9 @@ int xfs_inode_hasattr(struct xfs_inode *ip);
>  int xfs_attr_get_ilocked(struct xfs_da_args *args);
>  int xfs_attr_get(struct xfs_da_args *args);
>  int xfs_attr_set(struct xfs_da_args *args);
> -int xfs_attr_set_args(struct xfs_da_args *args);
>  int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>  		      struct xfs_buf **leaf_bp);
>  int xfs_has_attr(struct xfs_da_args *args);
> -int xfs_attr_remove_args(struct xfs_da_args *args);
>  int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>  bool xfs_attr_namecheck(const void *name, size_t length);
>  void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> index 45c4bc5..262d1870 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> @@ -751,10 +751,8 @@ xfs_attr_rmtval_remove(
>  	if (error)
>  		return error;
>  
> -	if (!done) {
> -		dac->flags |= XFS_DAC_DEFER_FINISH;
> +	if (!done)
>  		return -EAGAIN;
> -	}
>  
>  	return error;
>  }
> -- 
> 2.7.4
> 

* Re: [PATCH v13 07/10] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR
  2020-10-23  6:34 ` [PATCH v13 07/10] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR Allison Henderson
@ 2020-11-10 20:10   ` Darrick J. Wong
  2020-11-13  1:27     ` Allison Henderson
  2020-11-19  2:36   ` Darrick J. Wong
  1 sibling, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-10 20:10 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Oct 22, 2020 at 11:34:32PM -0700, Allison Henderson wrote:
> This patch adds a new feature bit, XFS_SB_FEAT_INCOMPAT_LOG_DELATTR, which
> can be used to turn delayed attributes on and off.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_format.h | 8 ++++++--
>  fs/xfs/libxfs/xfs_fs.h     | 1 +
>  fs/xfs/libxfs/xfs_sb.c     | 2 ++
>  fs/xfs/xfs_super.c         | 3 +++
>  4 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index d419c34..18b41a7 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -483,7 +483,9 @@ xfs_sb_has_incompat_feature(
>  	return (sbp->sb_features_incompat & feature) != 0;
>  }
>  
> -#define XFS_SB_FEAT_INCOMPAT_LOG_ALL 0
> +#define XFS_SB_FEAT_INCOMPAT_LOG_DELATTR   (1 << 0)	/* Delayed Attributes */
> +#define XFS_SB_FEAT_INCOMPAT_LOG_ALL \
> +	(XFS_SB_FEAT_INCOMPAT_LOG_DELATTR)
>  #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_LOG_ALL
>  static inline bool
>  xfs_sb_has_incompat_log_feature(
> @@ -586,7 +588,9 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
>  
>  static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)
>  {
> -	return false;
> +	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
> +		(sbp->sb_features_log_incompat &
> +		XFS_SB_FEAT_INCOMPAT_LOG_DELATTR));

This change and the EXPERIMENTAL warning should go in whichever patch
defines xfs_sb_version_hasdelattr.

>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 2a2e3cf..f703d95 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -250,6 +250,7 @@ typedef struct xfs_fsop_resblks {
>  #define XFS_FSOP_GEOM_FLAGS_RMAPBT	(1 << 19) /* reverse mapping btree */
>  #define XFS_FSOP_GEOM_FLAGS_REFLINK	(1 << 20) /* files can share blocks */
>  #define XFS_FSOP_GEOM_FLAGS_BIGTIME	(1 << 21) /* 64-bit nsec timestamps */
> +#define XFS_FSOP_GEOM_FLAGS_DELATTR	(1 << 22) /* delayed attributes	    */
>  
>  /*
>   * Minimum and maximum sizes need for growth checks.
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 5aeafa5..a0ec327 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -1168,6 +1168,8 @@ xfs_fs_geometry(
>  		geo->flags |= XFS_FSOP_GEOM_FLAGS_REFLINK;
>  	if (xfs_sb_version_hasbigtime(sbp))
>  		geo->flags |= XFS_FSOP_GEOM_FLAGS_BIGTIME;
> +	if (xfs_sb_version_hasdelattr(sbp))
> +		geo->flags |= XFS_FSOP_GEOM_FLAGS_DELATTR;

These changes to the geometry ioctl should be a separate patch.

IOWs, the only change in this patch should be adding
XFS_SB_FEAT_INCOMPAT_LOG_DELATTR to the _ALL #define.
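
To spell out why adding the bit to the _ALL mask is the functional change: the _UNKNOWN mask is defined as the complement of _ALL, so any bit outside _ALL is reported as unknown. A small sketch with illustrative DEMO_ copies of the macros (not the kernel's definitions), where ~known_mask plays the role of XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of the patch's macros, renamed with a DEMO_ prefix. */
#define DEMO_LOG_DELATTR	(1u << 0)		/* delayed attributes */
#define DEMO_LOG_ALL		(DEMO_LOG_DELATTR)	/* every bit this code knows */

/* True if @features sets any bit outside @known_mask (i.e. in ~known_mask). */
static bool demo_has_unknown(uint32_t features, uint32_t known_mask)
{
	return (features & ~known_mask) != 0;
}
```

With the old empty _ALL mask (0), a superblock carrying the delattr bit is flagged as unknown; once the bit is in _ALL it is not, while any other bit still is.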

--D

>  	if (xfs_sb_version_hassector(sbp))
>  		geo->logsectsize = sbp->sb_logsectsize;
>  	else
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index d1b5f2d..bb85884 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1580,6 +1580,9 @@ xfs_fc_fill_super(
>  	if (xfs_sb_version_hasinobtcounts(&mp->m_sb))
>  		xfs_warn(mp,
>   "EXPERIMENTAL inode btree counters feature in use. Use at your own risk!");
> +	if (xfs_sb_version_hasdelattr(&mp->m_sb))
> +		xfs_alert(mp,
> +	"EXPERIMENTAL delayed attrs feature enabled. Use at your own risk!");
>  
>  	error = xfs_mountfs(mp);
>  	if (error)
> -- 
> 2.7.4
> 

* Re: [PATCH v13 06/10] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  2020-10-23  6:34 ` [PATCH v13 06/10] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred Allison Henderson
@ 2020-11-10 20:15   ` Darrick J. Wong
  2020-11-13  1:27     ` Allison Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-10 20:15 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Oct 22, 2020 at 11:34:31PM -0700, Allison Henderson wrote:
> From: Allison Collins <allison.henderson@oracle.com>
> 
> These routines set up and start new deferred attribute operations.
> These functions are meant to be called by any routine needing to
> initiate a deferred attribute operation as opposed to the existing
> inline operations. New helper function xfs_attr_item_init also added.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_attr.h |  2 ++
>  2 files changed, 56 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 760383c..7fe5554 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -25,6 +25,7 @@
>  #include "xfs_trans_space.h"
>  #include "xfs_trace.h"
>  #include "xfs_attr_item.h"
> +#include "xfs_attr.h"
>  
>  /*
>   * xfs_attr.c
> @@ -643,6 +644,59 @@ xfs_attr_set(
>  	goto out_unlock;
>  }
>  
> +STATIC int
> +xfs_attr_item_init(
> +	struct xfs_da_args	*args,
> +	unsigned int		op_flags,	/* op flag (set or remove) */
> +	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
> +{
> +
> +	struct xfs_attr_item	*new;
> +
> +	new = kmem_alloc_large(sizeof(struct xfs_attr_item), KM_NOFS);

I don't think we need _large allocations for struct xfs_attr_item, right?

> +	memset(new, 0, sizeof(struct xfs_attr_item));

Use kmem_zalloc and you won't have to memset.  Better yet, zalloc will
get you memory that's been pre-zeroed in the background.
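
A userspace analogue of that suggestion, using calloc() in place of kmem_zalloc() since the kernel allocator isn't available outside the kernel; struct demo_attr_item is a hypothetical stand-in for struct xfs_attr_item, not its real layout. The sketch also checks for allocation failure, because calloc() can return NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct xfs_attr_item. */
struct demo_attr_item {
	void			*da_args;	/* caller's attr args */
	unsigned int		op_flags;	/* set or remove */
	struct demo_attr_item	*next;		/* stand-in for the defer list link */
};

static int demo_attr_item_init(void *args, unsigned int op_flags,
			       struct demo_attr_item **attr)
{
	/* calloc() returns zeroed memory, so no separate memset() is needed */
	struct demo_attr_item *new = calloc(1, sizeof(*new));

	if (!new)
		return -ENOMEM;
	new->op_flags = op_flags;
	new->da_args = args;
	*attr = new;
	return 0;
}
```

Every field the function does not set explicitly (here, next) starts out zeroed, which is exactly what the separate memset() was providing.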

> +	new->xattri_op_flags = op_flags;
> +	new->xattri_dac.da_args = args;
> +
> +	*attr = new;
> +	return 0;
> +}
> +
> +/* Sets an attribute for an inode as a deferred operation */
> +int
> +xfs_attr_set_deferred(
> +	struct xfs_da_args	*args)
> +{
> +	struct xfs_attr_item	*new;
> +	int			error = 0;
> +
> +	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
> +	if (error)
> +		return error;
> +
> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> +
> +	return 0;
> +}

The changes in "xfs: enable delayed attributes" should be moved to this
patch so that these new functions immediately have callers.

(Also see the reply I sent to the next patch, which will avoid weird
regressions if someone's bisect lands in the middle of this series...)

--D

> +
> +/* Removes an attribute for an inode as a deferred operation */
> +int
> +xfs_attr_remove_deferred(
> +	struct xfs_da_args	*args)
> +{
> +
> +	struct xfs_attr_item	*new;
> +	int			error;
> +
> +	error  = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
> +	if (error)
> +		return error;
> +
> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> +
> +	return 0;
> +}
> +
>  /*========================================================================
>   * External routines when attribute list is inside the inode
>   *========================================================================*/
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 5b4a1ca..8a08411 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -307,5 +307,7 @@ bool xfs_attr_namecheck(const void *name, size_t length);
>  void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>  			      struct xfs_da_args *args);
>  int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
> +int xfs_attr_set_deferred(struct xfs_da_args *args);
> +int xfs_attr_remove_deferred(struct xfs_da_args *args);
>  
>  #endif	/* __XFS_ATTR_H__ */
> -- 
> 2.7.4
> 


* Re: [PATCH v13 05/10] xfs: Set up infastructure for deferred attribute operations
  2020-10-23  6:34 ` [PATCH v13 05/10] xfs: Set up infastructure for deferred attribute operations Allison Henderson
@ 2020-11-10 21:51   ` Darrick J. Wong
  2020-11-11  3:44     ` Darrick J. Wong
  2020-11-13  1:32     ` Allison Henderson
  0 siblings, 2 replies; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-10 21:51 UTC
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Oct 22, 2020 at 11:34:30PM -0700, Allison Henderson wrote:
> Currently attributes are modified directly across one or more
> transactions. But they are not logged or replayed in the event of an
> error. The goal of delayed attributes is to enable logging and replaying
> of attribute operations using the existing delayed operations
> infrastructure.  This will later enable the attributes to become part of
> larger multi part operations that also must first be recorded to the
> log.  This is mostly of interest in the scheme of parent pointers which
> would need to maintain an attribute containing parent inode information
> any time an inode is moved, created, or removed.  Parent pointers would
> then be of interest to any feature that would need to quickly derive an
> inode path from the mount point. Online scrub, NFS lookups, and fs grow
> or shrink operations are all features that could take advantage of this.
> 
> This patch adds two new log item types for setting or removing
> attributes as deferred operations.  The xfs_attri_log_item logs an
> intent to set or remove an attribute.  The corresponding
> xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
> freed once the transaction is done.  Both log items use a generic
> xfs_attr_log_format structure that contains the attribute name, value,
> flags, inode, and an op_flag that indicates whether the operation is a set
> or remove.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/Makefile                 |   1 +
>  fs/xfs/libxfs/xfs_attr.c        |   7 +-
>  fs/xfs/libxfs/xfs_attr.h        |  19 +
>  fs/xfs/libxfs/xfs_defer.c       |   1 +
>  fs/xfs/libxfs/xfs_defer.h       |   3 +
>  fs/xfs/libxfs/xfs_format.h      |   5 +
>  fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
>  fs/xfs/libxfs/xfs_log_recover.h |   2 +
>  fs/xfs/libxfs/xfs_types.h       |   1 +
>  fs/xfs/scrub/common.c           |   2 +
>  fs/xfs/xfs_acl.c                |   2 +
>  fs/xfs/xfs_attr_item.c          | 750 ++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_attr_item.h          |  76 ++++
>  fs/xfs/xfs_attr_list.c          |   1 +
>  fs/xfs/xfs_ioctl.c              |   2 +
>  fs/xfs/xfs_ioctl32.c            |   2 +
>  fs/xfs/xfs_iops.c               |   2 +
>  fs/xfs/xfs_log.c                |   4 +
>  fs/xfs/xfs_log_recover.c        |   2 +
>  fs/xfs/xfs_ondisk.h             |   2 +
>  fs/xfs/xfs_xattr.c              |   1 +
>  21 files changed, 923 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 04611a1..b056cfc 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
>  				   xfs_buf_item_recover.o \
>  				   xfs_dquot_item_recover.o \
>  				   xfs_extfree_item.o \
> +				   xfs_attr_item.o \
>  				   xfs_icreate_item.o \
>  				   xfs_inode_item.o \
>  				   xfs_inode_item_recover.o \
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 6453178..760383c 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -24,6 +24,7 @@
>  #include "xfs_quota.h"
>  #include "xfs_trans_space.h"
>  #include "xfs_trace.h"
> +#include "xfs_attr_item.h"
>  
>  /*
>   * xfs_attr.c
> @@ -59,8 +60,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>  STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>  STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>  STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
> -STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> -			     struct xfs_buf **leaf_bp);
>  
>  int
>  xfs_inode_hasattr(
> @@ -142,7 +141,7 @@ xfs_attr_get(
>  /*
>   * Calculate how many blocks we need for the new attribute,
>   */
> -STATIC int
> +int
>  xfs_attr_calc_size(
>  	struct xfs_da_args	*args,
>  	int			*local)
> @@ -327,7 +326,7 @@ xfs_attr_set_args(
>   * to handle this, and recall the function until a successful error code is
>   * returned.
>   */
> -STATIC int
> +int
>  xfs_attr_set_iter(
>  	struct xfs_delattr_context	*dac,
>  	struct xfs_buf			**leaf_bp)
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 501f9df..5b4a1ca 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -247,6 +247,7 @@ enum xfs_delattr_state {
>  #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>  #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>  #define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
> +#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
>  
>  /*
>   * Context used for keeping track of delayed attribute operations
> @@ -254,6 +255,9 @@ enum xfs_delattr_state {
>  struct xfs_delattr_context {
>  	struct xfs_da_args      *da_args;
>  
> +	/* Used by delayed attributes to hold leaf across transactions */

"Used by xfs_attr_set to hold a leaf buffer across a transaction roll" ?

> +	struct xfs_buf		*leaf_bp;
> +
>  	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
>  	struct xfs_bmbt_irec	map;
>  	xfs_dablk_t		lblkno;
> @@ -267,6 +271,18 @@ struct xfs_delattr_context {
>  	enum xfs_delattr_state  dela_state;
>  };
>  
> +/*
> + * List of attrs to commit later.
> + */
> +struct xfs_attr_item {
> +	struct xfs_delattr_context	xattri_dac;
> +	uint32_t			xattri_op_flags;/* attr op set or rm */

The comment for xattri_op_flags should be more direct in mentioning that
it takes XFS_ATTR_OP_FLAGS_{SET,REMOVE}.

(Alternately you could define an enum for the incore state tracker that
causes the appropriate XFS_ATTR_OP_FLAG* to be set on the log item in
xfs_attr_create_intent to avoid mixing of the flag namespaces, but that
is a lot of paper-pushing...)
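Here is a rough sketch of the enum alternative. It assumes a hypothetical incore enum xfs_attr_incore_op, which is not part of the patch, that xfs_attr_create_intent() would translate into the on-disk XFS_ATTR_OP_FLAGS_* value:

```c
#include <assert.h>
#include <stdint.h>

/* On-disk op codes, as defined in the patch's xfs_log_format.h hunk. */
#define XFS_ATTR_OP_FLAGS_SET		1
#define XFS_ATTR_OP_FLAGS_REMOVE	2
#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF

/* Hypothetical incore operation type, kept out of the log flag namespace. */
enum xfs_attr_incore_op {
	XFS_ATTR_INCORE_SET,
	XFS_ATTR_INCORE_REMOVE,
};

/* Translate the incore op into the value written to alfi_op_flags. */
uint32_t
xfs_attr_op_to_log_flags(enum xfs_attr_incore_op op)
{
	switch (op) {
	case XFS_ATTR_INCORE_SET:
		return XFS_ATTR_OP_FLAGS_SET;
	case XFS_ATTR_INCORE_REMOVE:
		return XFS_ATTR_OP_FLAGS_REMOVE;
	}
	assert(0 && "unknown incore attr op");
	return 0;
}
```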

> +
> +	/* used to log this item to an intent */
> +	struct list_head		xattri_list;
> +};

Ok, so going back to a confusing comment I had from the last series,
I'm glad that you've moved all the attr code to be deferred operations.

Can you move all the xfs_delattr_context fields into xfs_attr_item?
AFAICT (from git diff'ing the entire branch :P) we never allocate an
xfs_delattr_context on its own; we only ever access the one that's
embedded in xfs_attr_item, right?
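This is a minimal sketch of what folding the context into the item might look like. The xattri_* field names are hypothetical, only a subset of the xfs_delattr_context fields is shown, and container_of() is reimplemented with offsetof() for userspace. The defer code only ever hands back the embedded list_head, so a single step recovers the item, and with it the former context:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct list_head { struct list_head *next, *prev; };
struct xfs_da_args;
struct xfs_buf;

/* Hypothetical layout: former xfs_delattr_context fields live directly here. */
struct xfs_attr_item {
	struct xfs_da_args	*xattri_da_args;
	struct xfs_buf		*xattri_leaf_bp;
	int			xattri_dela_state;
	unsigned int		xattri_flags;
	uint32_t		xattri_op_flags;
	struct list_head	xattri_list;
};

/* The defer code passes back only the list_head; recover the owning item. */
struct xfs_attr_item *
xfs_attr_item_from_list(struct list_head *item)
{
	return container_of(item, struct xfs_attr_item, xattri_list);
}
```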

> +
> +
>  /*========================================================================
>   * Function prototypes for the kernel.
>   *========================================================================*/
> @@ -282,11 +298,14 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
>  int xfs_attr_get(struct xfs_da_args *args);
>  int xfs_attr_set(struct xfs_da_args *args);
>  int xfs_attr_set_args(struct xfs_da_args *args);
> +int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> +		      struct xfs_buf **leaf_bp);
>  int xfs_has_attr(struct xfs_da_args *args);
>  int xfs_attr_remove_args(struct xfs_da_args *args);
>  int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>  bool xfs_attr_namecheck(const void *name, size_t length);
>  void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>  			      struct xfs_da_args *args);
> +int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>  
>  #endif	/* __XFS_ATTR_H__ */
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index eff4a12..e9caff7 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -178,6 +178,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
>  	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
>  	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
>  	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
> +	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
>  };
>  
>  static void
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 05472f7..72a5789 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -19,6 +19,7 @@ enum xfs_defer_ops_type {
>  	XFS_DEFER_OPS_TYPE_RMAP,
>  	XFS_DEFER_OPS_TYPE_FREE,
>  	XFS_DEFER_OPS_TYPE_AGFL_FREE,
> +	XFS_DEFER_OPS_TYPE_ATTR,
>  	XFS_DEFER_OPS_TYPE_MAX,
>  };
>  
> @@ -63,6 +64,8 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
>  extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
>  extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
>  extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
> +extern const struct xfs_defer_op_type xfs_attr_defer_type;
> +
>  
>  /*
>   * This structure enables a dfops user to detach the chain of deferred
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index dd764da..d419c34 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -584,6 +584,11 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
>  		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT);
>  }
>  
> +static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)
> +{
> +	return false;
> +}
> +
>  /*
>   * end of superblock version macros
>   */
> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> index 8bd00da..de6309d 100644
> --- a/fs/xfs/libxfs/xfs_log_format.h
> +++ b/fs/xfs/libxfs/xfs_log_format.h
> @@ -117,7 +117,12 @@ struct xfs_unmount_log_format {
>  #define XLOG_REG_TYPE_CUD_FORMAT	24
>  #define XLOG_REG_TYPE_BUI_FORMAT	25
>  #define XLOG_REG_TYPE_BUD_FORMAT	26
> -#define XLOG_REG_TYPE_MAX		26
> +#define XLOG_REG_TYPE_ATTRI_FORMAT	27
> +#define XLOG_REG_TYPE_ATTRD_FORMAT	28
> +#define XLOG_REG_TYPE_ATTR_NAME	29
> +#define XLOG_REG_TYPE_ATTR_VALUE	30
> +#define XLOG_REG_TYPE_MAX		30
> +
>  
>  /*
>   * Flags to log operation header
> @@ -240,6 +245,8 @@ typedef struct xfs_trans_header {
>  #define	XFS_LI_CUD		0x1243
>  #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
>  #define	XFS_LI_BUD		0x1245
> +#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
> +#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
>  
>  #define XFS_LI_TYPE_DESC \
>  	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
> @@ -255,7 +262,9 @@ typedef struct xfs_trans_header {
>  	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
>  	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
>  	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
> -	{ XFS_LI_BUD,		"XFS_LI_BUD" }
> +	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
> +	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
> +	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
>  
>  /*
>   * Inode Log Item Format definitions.
> @@ -863,4 +872,35 @@ struct xfs_icreate_log {
>  	__be32		icl_gen;	/* inode generation number to use */
>  };
>  
> +/*
> + * Flags for deferred attribute operations.
> + * Upper bits are flags, lower byte is type code
> + */
> +#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
> +#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
> +#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF	/* Flags type mask */
> +
> +/*
> + * This is the structure used to lay out an attr log item in the
> + * log.
> + */
> +struct xfs_attri_log_format {
> +	uint16_t	alfi_type;	/* attri log item type */
> +	uint16_t	alfi_size;	/* size of this item */
> +	uint32_t	__pad;		/* pad to 64 bit aligned */
> +	uint64_t	alfi_id;	/* attri identifier */
> +	xfs_ino_t	alfi_ino;	/* the inode for this attr operation */

This is an ondisk structure; please use only explicitly sized data
types like uint64_t.
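Below is a sketch of both log formats with alfi_ino as uint64_t. It includes compile-time size checks modeled on the XFS_CHECK_STRUCT_SIZE() checks in xfs_ondisk.h. The 40- and 16-byte sizes are what these field lists work out to on common ABIs; they are not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ATTRI log format with alfi_ino changed from xfs_ino_t to uint64_t. */
struct xfs_attri_log_format {
	uint16_t	alfi_type;	/* attri log item type */
	uint16_t	alfi_size;	/* size of this item */
	uint32_t	__pad;		/* pad to 64 bit aligned */
	uint64_t	alfi_id;	/* attri identifier */
	uint64_t	alfi_ino;	/* the inode for this attr operation */
	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
	uint32_t	alfi_name_len;	/* attr name length */
	uint32_t	alfi_value_len;	/* attr value length */
	uint32_t	alfi_attr_flags;/* attr flags */
};

struct xfs_attrd_log_format {
	uint16_t	alfd_type;	/* attrd log item type */
	uint16_t	alfd_size;	/* size of this item */
	uint32_t	__pad;		/* pad to 64 bit aligned */
	uint64_t	alfd_alf_id;	/* id of corresponding attri */
};

/* Compile-time layout checks; any ABI-dependent padding would trip these. */
_Static_assert(sizeof(struct xfs_attri_log_format) == 40, "attri size");
_Static_assert(offsetof(struct xfs_attri_log_format, alfi_ino) == 16,
	       "alfi_ino offset");
_Static_assert(sizeof(struct xfs_attrd_log_format) == 16, "attrd size");
```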

> +	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
> +	uint32_t	alfi_name_len;	/* attr name length */
> +	uint32_t	alfi_value_len;	/* attr value length */
> +	uint32_t	alfi_attr_flags;/* attr flags */
> +};
> +
> +struct xfs_attrd_log_format {
> +	uint16_t	alfd_type;	/* attrd log item type */
> +	uint16_t	alfd_size;	/* size of this item */
> +	uint32_t	__pad;		/* pad to 64 bit aligned */
> +	uint64_t	alfd_alf_id;	/* id of corresponding attrd */

"..of corresponding attri"

> +};
> +
>  #endif /* __XFS_LOG_FORMAT_H__ */
> diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
> index 3cca2bf..b6e5514 100644
> --- a/fs/xfs/libxfs/xfs_log_recover.h
> +++ b/fs/xfs/libxfs/xfs_log_recover.h
> @@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
>  extern const struct xlog_recover_item_ops xlog_rud_item_ops;
>  extern const struct xlog_recover_item_ops xlog_cui_item_ops;
>  extern const struct xlog_recover_item_ops xlog_cud_item_ops;
> +extern const struct xlog_recover_item_ops xlog_attri_item_ops;
> +extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
>  
>  /*
>   * Macros, structures, prototypes for internal log manager use.
> diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
> index 397d947..860cdd2 100644
> --- a/fs/xfs/libxfs/xfs_types.h
> +++ b/fs/xfs/libxfs/xfs_types.h
> @@ -11,6 +11,7 @@ typedef uint32_t	prid_t;		/* project ID */
>  typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
>  typedef uint32_t	xfs_agino_t;	/* inode # within allocation grp */
>  typedef uint32_t	xfs_extlen_t;	/* extent length in blocks */
> +typedef uint32_t	xfs_attrlen_t;	/* attr length */

This doesn't get used anywhere.

>  typedef uint32_t	xfs_agnumber_t;	/* allocation group number */
>  typedef int32_t		xfs_extnum_t;	/* # of extents in a file */
>  typedef int16_t		xfs_aextnum_t;	/* # extents in an attribute fork */
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 1887605..9a649d1 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -24,6 +24,8 @@
>  #include "xfs_rmap_btree.h"
>  #include "xfs_log.h"
>  #include "xfs_trans_priv.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_reflink.h"
>  #include "scrub/scrub.h"
> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> index c544951..cad1db4 100644
> --- a/fs/xfs/xfs_acl.c
> +++ b/fs/xfs/xfs_acl.c
> @@ -10,6 +10,8 @@
>  #include "xfs_trans_resv.h"
>  #include "xfs_mount.h"
>  #include "xfs_inode.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_trace.h"
>  #include "xfs_error.h"
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> new file mode 100644
> index 0000000..3980066
> --- /dev/null
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -0,0 +1,750 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2019 Oracle.  All Rights Reserved.

2019 -> 2020.

> + * Author: Allison Collins <allison.henderson@oracle.com>
> + */
> +
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_format.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_bit.h"
> +#include "xfs_shared.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_trans.h"
> +#include "xfs_trans_priv.h"
> +#include "xfs_buf_item.h"
> +#include "xfs_attr_item.h"
> +#include "xfs_log.h"
> +#include "xfs_btree.h"
> +#include "xfs_rmap.h"
> +#include "xfs_inode.h"
> +#include "xfs_icache.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_attr.h"
> +#include "xfs_shared.h"
> +#include "xfs_attr_item.h"
> +#include "xfs_alloc.h"
> +#include "xfs_bmap.h"
> +#include "xfs_trace.h"
> +#include "libxfs/xfs_da_format.h"
> +#include "xfs_inode.h"
> +#include "xfs_quota.h"
> +#include "xfs_log_priv.h"
> +#include "xfs_log_recover.h"
> +
> +static const struct xfs_item_ops xfs_attri_item_ops;
> +static const struct xfs_item_ops xfs_attrd_item_ops;
> +
> +static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
> +{
> +	return container_of(lip, struct xfs_attri_log_item, attri_item);
> +}
> +
> +STATIC void
> +xfs_attri_item_free(
> +	struct xfs_attri_log_item	*attrip)
> +{
> +	kmem_free(attrip->attri_item.li_lv_shadow);
> +	kmem_free(attrip);
> +}
> +
> +/*
> + * Freeing the attrip requires that we remove it from the AIL if it has already
> + * been placed there. However, the ATTRI may not yet have been placed in the
> + * AIL when called by xfs_attri_release() from ATTRD processing due to the
> + * ordering of committed vs unpin operations in bulk insert operations. Hence
> + * the reference count to ensure only the last caller frees the ATTRI.
> + */
> +STATIC void
> +xfs_attri_release(
> +	struct xfs_attri_log_item	*attrip)
> +{
> +	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
> +	if (atomic_dec_and_test(&attrip->attri_refcount)) {
> +		xfs_trans_ail_delete(&attrip->attri_item,
> +				     SHUTDOWN_LOG_IO_ERROR);
> +		xfs_attri_item_free(attrip);
> +	}
> +}
> +
> +/*
> + * This returns the number of iovecs needed to log the given attri item. We
> + * only need 1 iovec for an attri item.  It just logs the attr_log_format
> + * structure.
> + */
> +static inline int
> +xfs_attri_item_sizeof(
> +	struct xfs_attri_log_item *attrip)
> +{
> +	return sizeof(struct xfs_attri_log_format);
> +}

Please get rid of this trivial oneliner.
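This is roughly what xfs_attri_item_size() looks like with the helper replaced by a plain sizeof(). It is a userspace sketch: the signature is simplified, and attr_nvec_size() is a hypothetical stand-in. It assumes ATTR_NVEC_SIZE() rounds a region up to a 4-byte multiple, since the macro's definition in xfs_attr_item.h is not quoted here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy of the on-disk format; only its sizeof() is used below. */
struct xfs_attri_log_format {
	uint16_t	alfi_type;
	uint16_t	alfi_size;
	uint32_t	__pad;
	uint64_t	alfi_id;
	uint64_t	alfi_ino;
	uint32_t	alfi_op_flags;
	uint32_t	alfi_name_len;
	uint32_t	alfi_value_len;
	uint32_t	alfi_attr_flags;
};

/* Assumption: log regions are padded up to a 4-byte multiple. */
static size_t
attr_nvec_size(size_t len)
{
	return (len + sizeof(int32_t) - 1) & ~(sizeof(int32_t) - 1);
}

/* Mirrors xfs_attri_item_size() with the sizeof() oneliner inlined. */
void
attri_item_size(uint32_t name_len, uint32_t value_len, int *nvecs, int *nbytes)
{
	*nvecs += 1;
	*nbytes += sizeof(struct xfs_attri_log_format);

	/* Set and remove operations always carry a name. */
	assert(name_len > 0);
	*nvecs += 1;
	*nbytes += attr_nvec_size(name_len);

	/* A value region is only present for a non-empty value. */
	if (value_len > 0) {
		*nvecs += 1;
		*nbytes += attr_nvec_size(value_len);
	}
}
```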

> +
> +STATIC void
> +xfs_attri_item_size(
> +	struct xfs_log_item	*lip,
> +	int			*nvecs,
> +	int			*nbytes)
> +{
> +	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
> +
> +	*nvecs += 1;
> +	*nbytes += xfs_attri_item_sizeof(attrip);
> +
> +	/* Attr set and remove operations require a name */
> +	ASSERT(attrip->attri_name_len > 0);
> +
> +	*nvecs += 1;
> +	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
> +
> +	/*
> +	 * Set ops can accept a value of 0 len to clear an attr value.  Remove
> +	 * ops do not need a value at all.  So only account for the value
> +	 * when it is needed.
> +	 */
> +	if (attrip->attri_value_len > 0) {
> +		*nvecs += 1;
> +		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
> +	}
> +}
> +
> +/*
> + * This is called to fill in the log iovecs for the given attri log
> + * item. We use  1 iovec for the attri_format_item, 1 for the name, and
> + * another for the value if it is present
> + */
> +STATIC void
> +xfs_attri_item_format(
> +	struct xfs_log_item	*lip,
> +	struct xfs_log_vec	*lv)
> +{
> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> +	struct xfs_log_iovec		*vecp = NULL;
> +
> +	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
> +	attrip->attri_format.alfi_size = 1;
> +
> +	/*
> +	 * This size accounting must be done before copying the attrip into the
> +	 * iovec.  If we do it after, the wrong size will be recorded to the log
> +	 * and we trip across assertion checks for bad region sizes later during
> +	 * the log recovery.
> +	 */
> +
> +	ASSERT(attrip->attri_name_len > 0);
> +	attrip->attri_format.alfi_size++;
> +
> +	if (attrip->attri_value_len > 0)
> +		attrip->attri_format.alfi_size++;
> +
> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
> +			&attrip->attri_format,
> +			xfs_attri_item_sizeof(attrip));
> +	if (attrip->attri_name_len > 0)

I thought we required attri_name_len > 0 always?
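If the name is always present, the format path can copy it unconditionally. The following userspace sketch shows the resulting region layout. struct fake_lv and fake_copy_iovec() are hypothetical stand-ins for the log vector and xlog_copy_iovec(). The region type values come from the xfs_log_format.h hunk earlier in this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Region type codes from the patch's xfs_log_format.h hunk. */
#define XLOG_REG_TYPE_ATTRI_FORMAT	27
#define XLOG_REG_TYPE_ATTR_NAME		29
#define XLOG_REG_TYPE_ATTR_VALUE	30

/* Hypothetical stand-in for a log vector: records region types in order. */
struct fake_lv {
	int	types[3];
	int	nregions;
};

static void
fake_copy_iovec(struct fake_lv *lv, int type)
{
	assert(lv->nregions < 3);
	lv->types[lv->nregions++] = type;
}

/*
 * Mirrors the region layout of xfs_attri_item_format(): alfi_size counts
 * the regions, and the name region is emitted without a length check
 * because name_len > 0 is already asserted.
 */
uint16_t
attri_format_regions(struct fake_lv *lv, uint32_t name_len, uint32_t value_len)
{
	uint16_t	alfi_size = 2;	/* format structure + name */

	assert(name_len > 0);
	if (value_len > 0)
		alfi_size++;

	fake_copy_iovec(lv, XLOG_REG_TYPE_ATTRI_FORMAT);
	fake_copy_iovec(lv, XLOG_REG_TYPE_ATTR_NAME);
	if (value_len > 0)
		fake_copy_iovec(lv, XLOG_REG_TYPE_ATTR_VALUE);
	return alfi_size;
}
```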

> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
> +				attrip->attri_name,
> +				ATTR_NVEC_SIZE(attrip->attri_name_len));
> +
> +	if (attrip->attri_value_len > 0)
> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
> +				attrip->attri_value,
> +				ATTR_NVEC_SIZE(attrip->attri_value_len));
> +}
> +
> +/*
> + * The unpin operation is the last place an ATTRI is manipulated in the log. It
> + * is either inserted in the AIL or aborted in the event of a log I/O error. In
> + * either case, the ATTRI transaction has been successfully committed to make
> + * it this far. Therefore, we expect whoever committed the ATTRI to either
> + * construct and commit the ATTRD or drop the ATTRD's reference in the event of
> + * error. Simply drop the log's ATTRI reference now that the log is done with
> + * it.
> + */
> +STATIC void
> +xfs_attri_item_unpin(
> +	struct xfs_log_item	*lip,
> +	int			remove)
> +{
> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> +
> +	xfs_attri_release(attrip);

Nit: this could be shortened to xfs_attri_release(ATTRI_ITEM(lip)).
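The unpin path drops the log's reference through xfs_attri_release(). The two-reference scheme described in the comment above that function can be modeled in userspace with C11 atomics. This is a simplified sketch: struct attri_item and the frees counter are hypothetical, and the AIL removal is only noted in a comment. One reference belongs to the log and is dropped at unpin. The other is dropped by the ATTRD, or on abort:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for xfs_attri_log_item: only the refcount matters. */
struct attri_item {
	atomic_int	refcount;
};

static int frees;	/* counts frees so the lifecycle can be observed */

struct attri_item *
attri_alloc(void)
{
	struct attri_item *ip = calloc(1, sizeof(*ip));

	assert(ip != NULL);
	/* One reference for the log (unpin), one for the ATTRD/abort path. */
	atomic_init(&ip->refcount, 2);
	return ip;
}

/* Drop one reference; only the last holder frees the item. */
void
attri_release(struct attri_item *ip)
{
	assert(atomic_load(&ip->refcount) > 0);
	/* atomic_fetch_sub() returns the old value, so 1 means we hit zero. */
	if (atomic_fetch_sub(&ip->refcount, 1) == 1) {
		/* The kernel would also call xfs_trans_ail_delete() here. */
		free(ip);
		frees++;
	}
}
```

Whichever release happens second performs the free, so the ordering between unpin and ATTRD processing does not matter.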

> +}
> +
> +
> +STATIC void
> +xfs_attri_item_release(
> +	struct xfs_log_item	*lip)
> +{
> +	xfs_attri_release(ATTRI_ITEM(lip));
> +}
> +
> +/*
> + * Allocate and initialize an attri item
> + */
> +STATIC struct xfs_attri_log_item *
> +xfs_attri_init(
> +	struct xfs_mount	*mp)
> +
> +{
> +	struct xfs_attri_log_item	*attrip;
> +	uint				size;

Can you line up the *mp in the parameter list with the *attrip in the
local variables?

> +
> +	size = (uint)(sizeof(struct xfs_attri_log_item));

kmem_zalloc takes a size_t parameter (which is the return type of sizeof);
no need to do all this casting.
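Here is a sketch of xfs_attri_init() without the casts, using sizeof(*attrip) directly. calloc() stands in for kmem_zalloc(), the item struct is cut down to a hypothetical subset, and xfs_log_item_init() and the refcount setup are omitted:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical subset of struct xfs_attri_log_item. */
struct attri_log_item {
	struct {
		uint64_t	alfi_id;
	} attri_format;
	uint32_t		attri_name_len;
};

struct attri_log_item *
attri_init(void)
{
	struct attri_log_item	*attrip;

	/* sizeof(*attrip) is already a size_t; no (uint) cast is needed. */
	attrip = calloc(1, sizeof(*attrip));
	assert(attrip != NULL);

	/* The item's address doubles as the intent id that the ATTRD records. */
	attrip->attri_format.alfi_id = (uintptr_t)attrip;
	return attrip;
}
```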

> +	attrip = kmem_zalloc(size, 0);
> +
> +	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
> +			  &xfs_attri_item_ops);
> +	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
> +	atomic_set(&attrip->attri_refcount, 2);
> +
> +	return attrip;
> +}
> +
> +/*
> + * Copy an attr format buffer from the given buf, and into the destination attr
> + * format structure.
> + */
> +STATIC int
> +xfs_attri_copy_format(struct xfs_log_iovec *buf,
> +		      struct xfs_attri_log_format *dst_attr_fmt)
> +{
> +	struct xfs_attri_log_format *src_attr_fmt = buf->i_addr;
> +	uint len = sizeof(struct xfs_attri_log_format);

Please fix the indentation and the alignment of the parameter names.
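The following userspace sketch of xfs_attri_copy_format() uses the usual one-parameter-per-line layout and keeps the length check. The struct xfs_log_iovec stand-in and the fallback EFSCORRUPTED value are assumptions; in the kernel, EFSCORRUPTED is defined as EUCLEAN:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#ifdef EUCLEAN
#define EFSCORRUPTED	EUCLEAN		/* same mapping the kernel uses */
#else
#define EFSCORRUPTED	117		/* hypothetical fallback value */
#endif

/* Userspace stand-in for struct xfs_log_iovec. */
struct xfs_log_iovec {
	void	*i_addr;
	int	i_len;
};

struct xfs_attri_log_format {
	uint16_t	alfi_type;
	uint16_t	alfi_size;
	uint32_t	__pad;
	uint64_t	alfi_id;
	uint64_t	alfi_ino;
	uint32_t	alfi_op_flags;
	uint32_t	alfi_name_len;
	uint32_t	alfi_value_len;
	uint32_t	alfi_attr_flags;
};

int
xfs_attri_copy_format(
	struct xfs_log_iovec		*buf,
	struct xfs_attri_log_format	*dst_attr_fmt)
{
	struct xfs_attri_log_format	*src_attr_fmt = buf->i_addr;
	size_t				len = sizeof(*src_attr_fmt);

	/* A region of the wrong size cannot be a valid ATTRI format. */
	if ((size_t)buf->i_len != len)
		return -EFSCORRUPTED;

	memcpy(dst_attr_fmt, src_attr_fmt, len);
	return 0;
}
```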

> +
> +	if (buf->i_len != len)
> +		return -EFSCORRUPTED;
> +
> +	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
> +	return 0;
> +}
> +
> +static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
> +{
> +	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
> +}
> +
> +STATIC void
> +xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
> +{
> +	kmem_free(attrdp->attrd_item.li_lv_shadow);
> +	kmem_free(attrdp);
> +}
> +
> +/*
> + * This returns the number of iovecs needed to log the given attrd item.
> + * We only need 1 iovec for an attrd item.  It just logs the attr_log_format
> + * structure.
> + */
> +static inline int
> +xfs_attrd_item_sizeof(
> +	struct xfs_attrd_log_item *attrdp)
> +{
> +	return sizeof(struct xfs_attrd_log_format);
> +}
> +
> +STATIC void
> +xfs_attrd_item_size(
> +	struct xfs_log_item	*lip,
> +	int			*nvecs,
> +	int			*nbytes)
> +{
> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);

Variable name alignment between the parameter list and the local vars.

> +	*nvecs += 1;

Space between local variable declaration and the first line of code.

> +	*nbytes += xfs_attrd_item_sizeof(attrdp);

No need for a oneliner function for sizeof.

> +}
> +
> +/*
> + * This is called to fill in the log iovecs for the given attrd log item. We use
> + * only 1 iovec for the attrd_format, and we point that at the attr_log_format
> + * structure embedded in the attrd item.
> + */
> +STATIC void
> +xfs_attrd_item_format(
> +	struct xfs_log_item	*lip,
> +	struct xfs_log_vec	*lv)
> +{
> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> +	struct xfs_log_iovec		*vecp = NULL;
> +
> +	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
> +	attrdp->attrd_format.alfd_size = 1;
> +
> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
> +			&attrdp->attrd_format, xfs_attrd_item_sizeof(attrdp));
> +}
> +
> +/*
> + * The ATTRD is either committed or aborted if the transaction is cancelled. If
> + * the transaction is cancelled, drop our reference to the ATTRI and free the
> + * ATTRD.
> + */
> +STATIC void
> +xfs_attrd_item_release(
> +	struct xfs_log_item     *lip)
> +{
> +	struct xfs_attrd_log_item *attrdp = ATTRD_ITEM(lip);
> +	xfs_attri_release(attrdp->attrd_attrip);

Space between the variable declaration and the first line of code.

> +	xfs_attrd_item_free(attrdp);
> +}
> +
> +/*
> + * Log an ATTRI it to the ATTRD when the attr op is done.  An attr operation

I don't know what "Log an ATTRI it to the ATTRD" means.  I think this is
the function that performs one step of an attribute update intent and
then tags the attrd item dirty, right?

> + * may be a set or a remove.  Note that the transaction is marked dirty
> + * regardless of whether the operation succeeds or fails to support the
> + * ATTRI/ATTRD lifecycle rules.
> + */
> +int
> +xfs_trans_attr(
> +	struct xfs_delattr_context	*dac,
> +	struct xfs_attrd_log_item	*attrdp,
> +	struct xfs_buf			**leaf_bp,
> +	uint32_t			op_flags)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error;
> +
> +	error = xfs_qm_dqattach_locked(args->dp, 0);
> +	if (error)
> +		return error;
> +
> +	switch (op_flags) {
> +	case XFS_ATTR_OP_FLAGS_SET:
> +		args->op_flags |= XFS_DA_OP_ADDNAME;
> +		error = xfs_attr_set_iter(dac, leaf_bp);
> +		break;
> +	case XFS_ATTR_OP_FLAGS_REMOVE:
> +		ASSERT(XFS_IFORK_Q((args->dp)));

No need for the double parentheses around args->dp.

> +		error = xfs_attr_remove_iter(dac);
> +		break;
> +	default:
> +		error = -EFSCORRUPTED;
> +		break;
> +	}
> +
> +	/*
> +	 * Mark the transaction dirty, even on error. This ensures the
> +	 * transaction is aborted, which:
> +	 *
> +	 * 1.) releases the ATTRI and frees the ATTRD
> +	 * 2.) shuts down the filesystem
> +	 */
> +	args->trans->t_flags |= XFS_TRANS_DIRTY;
> +	if (xfs_sb_version_hasdelattr(&args->dp->i_mount->m_sb))
> +		set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);

This could probably be:

	if (attrdp)
		set_bit(...);
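Below is a sketch of xfs_trans_attr() with the suggested attrdp NULL check. Everything here is a hypothetical stand-in: fake_trans, fake_attrd, and the stubbed set/remove iterators. It assumes that the caller passes a NULL attrdp when no ATTRD was created because the feature bit is off:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifdef EUCLEAN
#define EFSCORRUPTED	EUCLEAN
#else
#define EFSCORRUPTED	117		/* hypothetical fallback value */
#endif

#define XFS_ATTR_OP_FLAGS_SET		1
#define XFS_ATTR_OP_FLAGS_REMOVE	2

/* Hypothetical stand-ins for the transaction and ATTRD dirty state. */
struct fake_trans { bool dirty; };
struct fake_attrd { bool dirty; };

/* Stubs for xfs_attr_set_iter()/xfs_attr_remove_iter(). */
static int fake_set_iter(void) { return -EAGAIN; }
static int fake_remove_iter(void) { return 0; }

int
fake_trans_attr(struct fake_trans *tp, struct fake_attrd *attrdp,
		uint32_t op_flags)
{
	int	error;

	switch (op_flags) {
	case XFS_ATTR_OP_FLAGS_SET:
		error = fake_set_iter();
		break;
	case XFS_ATTR_OP_FLAGS_REMOVE:
		error = fake_remove_iter();
		break;
	default:
		error = -EFSCORRUPTED;
		break;
	}

	/* Dirty the transaction even on error so that a failure aborts it. */
	tp->dirty = true;
	/* Only dirty the done item if one exists. */
	if (attrdp)
		attrdp->dirty = true;
	return error;
}
```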

> +
> +	return error;
> +}
> +
> +/* Log an attr to the intent item. */
> +STATIC void
> +xfs_attr_log_item(
> +	struct xfs_trans		*tp,
> +	struct xfs_attri_log_item	*attrip,
> +	struct xfs_attr_item		*attr)
> +{
> +	struct xfs_attri_log_format	*attrp;
> +
> +	tp->t_flags |= XFS_TRANS_DIRTY;
> +	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
> +
> +	/*
> +	 * At this point the xfs_attr_item has been constructed, and we've
> +	 * created the log intent. Fill in the attri log item and log format
> +	 * structure with fields from this xfs_attr_item
> +	 */
> +	attrp = &attrip->attri_format;
> +	attrp->alfi_ino = attr->xattri_dac.da_args->dp->i_ino;
> +	attrp->alfi_op_flags = attr->xattri_op_flags;
> +	attrp->alfi_value_len = attr->xattri_dac.da_args->valuelen;
> +	attrp->alfi_name_len = attr->xattri_dac.da_args->namelen;
> +	attrp->alfi_attr_flags = attr->xattri_dac.da_args->attr_filter;
> +
> +	attrip->attri_name = (void *)attr->xattri_dac.da_args->name;
> +	attrip->attri_value = attr->xattri_dac.da_args->value;
> +	attrip->attri_name_len = attr->xattri_dac.da_args->namelen;
> +	attrip->attri_value_len = attr->xattri_dac.da_args->valuelen;
> +}
> +
> +/* Get an ATTRI. */
> +static struct xfs_log_item *
> +xfs_attr_create_intent(
> +	struct xfs_trans		*tp,
> +	struct list_head		*items,
> +	unsigned int			count,
> +	bool				sort)
> +{
> +	struct xfs_mount		*mp = tp->t_mountp;
> +	struct xfs_attri_log_item	*attrip;
> +	struct xfs_attr_item		*attr;
> +
> +	ASSERT(count == 1);
> +
> +	if (!xfs_sb_version_hasdelattr(&mp->m_sb))
> +		return NULL;
> +
> +	attrip = xfs_attri_init(mp);
> +	xfs_trans_add_item(tp, &attrip->attri_item);
> +	list_for_each_entry(attr, items, xattri_list)
> +		xfs_attr_log_item(tp, attrip, attr);
> +	return &attrip->attri_item;
> +}
> +
> +/* Process an attr. */
> +STATIC int
> +xfs_attr_finish_item(
> +	struct xfs_trans		*tp,
> +	struct xfs_log_item		*done,
> +	struct list_head		*item,
> +	struct xfs_btree_cur		**state)
> +{
> +	struct xfs_attr_item		*attr;
> +	int				error;
> +	struct xfs_delattr_context	*dac;
> +	struct xfs_attrd_log_item	*attrdp;
> +	struct xfs_attri_log_item	*attrip;
> +
> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
> +	dac = &attr->xattri_dac;
> +
> +	/*
> +	 * Always reset trans after EAGAIN cycle
> +	 * since the transaction is new
> +	 */
> +	dac->da_args->trans = tp;
> +
> +	error = xfs_trans_attr(dac, ATTRD_ITEM(done), &dac->leaf_bp,
> +			       attr->xattri_op_flags);
> +	/*
> +	 * The attrip refers to xfs_attr_item memory to log the name and value
> +	 * with the intent item. This already occurred when the intent was
> +	 * committed so these fields are no longer accessed.

Can you clear the attri_{name,value} pointers after you've logged the
intent item so that we don't have to clear them here?

> Clear them out of
> +	 * caution since we're about to free the xfs_attr_item.
> +	 */
> +	if (xfs_sb_version_hasdelattr(&dac->da_args->dp->i_mount->m_sb)) {
> +		attrdp = (struct xfs_attrd_log_item *)done;

attrdp = ATTRD_ITEM(done)?
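xfs_attr_finish_item() handles -EAGAIN by refreshing da_args->trans on every call and freeing the item only when the error is not -EAGAIN. A small userspace model can illustrate this. The driver loop fake_defer_finish() is hypothetical; it only stands in for the defer machinery handing the item a fresh transaction:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a transaction id and an item with a step counter. */
struct fake_trans { int id; };
struct fake_attr_item {
	struct fake_trans	*trans;		/* like dac->da_args->trans */
	int			steps_left;	/* steps before the op completes */
};

static int items_freed;

/* Modeled on xfs_attr_finish_item(): free the item only once it is done. */
int
fake_finish_item(struct fake_trans *tp, struct fake_attr_item *attr)
{
	int	error;

	/* Every -EAGAIN cycle comes back with a new transaction. */
	attr->trans = tp;

	error = (--attr->steps_left > 0) ? -EAGAIN : 0;
	if (error != -EAGAIN) {
		free(attr);
		items_freed++;
	}
	return error;
}

/* Hypothetical caller: roll to a fresh transaction while -EAGAIN comes back. */
int
fake_defer_finish(struct fake_attr_item *attr, int *rolls)
{
	struct fake_trans	trans[8];
	int			i = 0, error;

	do {
		trans[i].id = i;
		error = fake_finish_item(&trans[i], attr);
		i++;
	} while (error == -EAGAIN && i < 8);
	*rolls = i - 1;
	return error;
}
```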

> +		attrip = attrdp->attrd_attrip;
> +		attrip->attri_name = NULL;
> +		attrip->attri_value = NULL;
> +	}
> +
> +	if (error != -EAGAIN)
> +		kmem_free(attr);
> +
> +	return error;
> +}
> +
> +/* Abort all pending ATTRs. */
> +STATIC void
> +xfs_attr_abort_intent(
> +	struct xfs_log_item		*intent)
> +{
> +	xfs_attri_release(ATTRI_ITEM(intent));
> +}
> +
> +/* Cancel an attr */
> +STATIC void
> +xfs_attr_cancel_item(
> +	struct list_head		*item)
> +{
> +	struct xfs_attr_item		*attr;
> +
> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
> +	kmem_free(attr);
> +}
> +
> +/*
> + * The ATTRI is logged only once and cannot be moved in the log, so simply
> + * return the lsn at which it's been logged.
> + */
> +STATIC xfs_lsn_t
> +xfs_attri_item_committed(
> +	struct xfs_log_item	*lip,
> +	xfs_lsn_t		lsn)
> +{
> +	return lsn;
> +}

You can omit this function because the default is "return lsn;" if you
don't provide one.  See xfs_trans_committed_bulk.

> +
> +STATIC void
> +xfs_attri_item_committing(
> +	struct xfs_log_item	*lip,
> +	xfs_lsn_t		lsn)
> +{
> +}

This function isn't required if it doesn't do anything.  See
xfs_log_commit_cil.

> +
> +STATIC bool
> +xfs_attri_item_match(
> +	struct xfs_log_item	*lip,
> +	uint64_t		intent_id)
> +{
> +	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
> +}
> +
> +/*
> + * When the attrd item is committed to disk, all we need to do is delete our
> + * reference to our partner attri item and then free ourselves. Since we're
> + * freeing ourselves we must return -1 to keep the transaction code from
> + * further referencing this item.
> + */
> +STATIC xfs_lsn_t
> +xfs_attrd_item_committed(
> +	struct xfs_log_item	*lip,
> +	xfs_lsn_t		lsn)
> +{
> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> +
> +	/*
> +	 * Drop the ATTRI reference regardless of whether the ATTRD has been
> +	 * aborted. Once the ATTRD transaction is constructed, it is the sole
> +	 * responsibility of the ATTRD to release the ATTRI (even if the ATTRI
> +	 * is aborted due to log I/O error).
> +	 */
> +	xfs_attri_release(attrdp->attrd_attrip);
> +	xfs_attrd_item_free(attrdp);
> +
> +	return NULLCOMMITLSN;
> +}

If you set XFS_ITEM_RELEASE_WHEN_COMMITTED in the attrd item ops,
xfs_trans_committed_bulk will call ->iop_release instead of
->iop_committed and you therefore don't need this function.
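The model below is a simplified userspace sketch of the dispatch that these comments rely on. It is loosely based on xfs_log_commit_cil() and xfs_trans_committed_bulk(). The structures, names, and flag value are assumptions, and AIL insertion is reduced to returning an LSN:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t xfs_lsn_t;

#define XFS_ITEM_RELEASE_WHEN_COMMITTED	(1 << 0)	/* value is illustrative */

struct log_item;

/* Cut-down item ops: NULL hooks fall back to the default behavior. */
struct item_ops {
	unsigned int	flags;
	void		(*iop_committing)(struct log_item *, xfs_lsn_t);
	xfs_lsn_t	(*iop_committed)(struct log_item *, xfs_lsn_t);
	void		(*iop_release)(struct log_item *);
};

struct log_item {
	const struct item_ops	*li_ops;
	int			released;
};

/* Commit side: the hook is only called when the item provides one. */
void
model_commit(struct log_item *lip, xfs_lsn_t lsn)
{
	if (lip->li_ops->iop_committing)
		lip->li_ops->iop_committing(lip, lsn);
}

/* Completion side: returns the item's LSN, or -1 if it needs no more work. */
xfs_lsn_t
model_committed(struct log_item *lip, xfs_lsn_t commit_lsn)
{
	if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
		lip->li_ops->iop_release(lip);
		return -1;
	}
	if (lip->li_ops->iop_committed)
		return lip->li_ops->iop_committed(lip, commit_lsn);
	return commit_lsn;
}

static void count_release(struct log_item *lip) { lip->released++; }

/* ATTRI: no committing/committed hooks. ATTRD: released on commit. */
const struct item_ops attri_ops = { .iop_release = count_release };
const struct item_ops attrd_ops = {
	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
	.iop_release	= count_release,
};
```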

> +
> +STATIC void
> +xfs_attrd_item_committing(
> +	struct xfs_log_item	*lip,
> +	xfs_lsn_t		lsn)
> +{
> +}

Same comment as xfs_attri_item_committing.

> +
> +
> +/*
> + * Allocate and initialize an attrd item
> + */
> +struct xfs_attrd_log_item *
> +xfs_attrd_init(
> +	struct xfs_mount		*mp,
> +	struct xfs_attri_log_item	*attrip)
> +
> +{
> +	struct xfs_attrd_log_item	*attrdp;
> +	uint				size;
> +
> +	size = (uint)(sizeof(struct xfs_attrd_log_item));

Same comment about sizeof and size_t as in xfs_attri_init.

> +	attrdp = kmem_zalloc(size, 0);
> +	memset(attrdp, 0, size);

No need to memset-zero something you just zalloc'd.
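A userspace analogue of the two allocation nits above. It uses calloc() in place of kmem_zalloc(), on the assumption that both hand back zero-filled memory. struct done_item and done_item_alloc are hypothetical. Note that sizeof already yields a size_t, so the (uint) cast and the separate size variable can go too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xfs_attrd_log_item. */
struct done_item {
	void		*intent;
	unsigned long	id;
	char		pad[32];
};

/*
 * calloc() returns zeroed memory, so no memset() follows it, and
 * sizeof(*d) is already a size_t, so no cast is needed.
 */
static struct done_item *done_item_alloc(void *intent, unsigned long id)
{
	struct done_item *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->intent = intent;
	d->id = id;
	return d;
}
```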

> +
> +	xfs_log_item_init(mp, &attrdp->attrd_item, XFS_LI_ATTRD,
> +			  &xfs_attrd_item_ops);
> +	attrdp->attrd_attrip = attrip;
> +	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
> +
> +	return attrdp;
> +}
> +
> +/*
> + * This routine is called to allocate an "attr free done" log item.
> + */
> +struct xfs_attrd_log_item *
> +xfs_trans_get_attrd(struct xfs_trans		*tp,
> +		  struct xfs_attri_log_item	*attrip)
> +{
> +	struct xfs_attrd_log_item		*attrdp;
> +
> +	ASSERT(tp != NULL);
> +
> +	attrdp = xfs_attrd_init(tp->t_mountp, attrip);
> +	ASSERT(attrdp != NULL);

You could fold xfs_attrd_init into this function since there's only one
caller.

> +
> +	xfs_trans_add_item(tp, &attrdp->attrd_item);
> +	return attrdp;
> +}
> +
> +static const struct xfs_item_ops xfs_attrd_item_ops = {
> +	.iop_size	= xfs_attrd_item_size,
> +	.iop_format	= xfs_attrd_item_format,
> +	.iop_release    = xfs_attrd_item_release,
> +	.iop_committing	= xfs_attrd_item_committing,
> +	.iop_committed	= xfs_attrd_item_committed,
> +};
> +
> +
> +/* Get an ATTRD so we can process all the attrs. */
> +static struct xfs_log_item *
> +xfs_attr_create_done(
> +	struct xfs_trans		*tp,
> +	struct xfs_log_item		*intent,
> +	unsigned int			count)
> +{
> +	if (!xfs_sb_version_hasdelattr(&tp->t_mountp->m_sb))
> +		return NULL;

This is probably better expressed as:

	if (!intent)
		return NULL;

Since we don't need a log intent done item if there's no log intent
item.
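A minimal sketch of the suggested check, keyed on the intent pointer rather than on a superblock feature bit. The types and model_create_done are hypothetical and avoid allocation by taking caller-provided storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's xfs_log_item types. */
struct log_item {
	int	type;
};

struct done_item {
	struct log_item		item;
	const struct log_item	*intent;
};

/* No intent was logged, so there is nothing for a done item to cancel. */
static struct log_item *
model_create_done(struct done_item *storage, const struct log_item *intent)
{
	if (!intent)
		return NULL;
	storage->intent = intent;
	return &storage->item;
}
```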

> +
> +	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
> +}
> +
> +const struct xfs_defer_op_type xfs_attr_defer_type = {
> +	.max_items	= 1,
> +	.create_intent	= xfs_attr_create_intent,
> +	.abort_intent	= xfs_attr_abort_intent,
> +	.create_done	= xfs_attr_create_done,
> +	.finish_item	= xfs_attr_finish_item,
> +	.cancel_item	= xfs_attr_cancel_item,
> +};
> +
> +/*
> + * Process an attr intent item that was recovered from the log.  We need to
> + * delete the attr that it describes.
> + */
> +STATIC int
> +xfs_attri_item_recover(
> +	struct xfs_log_item		*lip,
> +	struct list_head		*capture_list)
> +{
> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> +	struct xfs_mount		*mp = lip->li_mountp;
> +	struct xfs_inode		*ip;
> +	struct xfs_da_args		args;
> +	struct xfs_attri_log_format	*attrp;
> +	int				error;
> +
> +	/*
> +	 * First check the validity of the attr described by the ATTRI.  If any
> +	 * are bad, then assume that all are bad and just toss the ATTRI.
> +	 */
> +	attrp = &attrip->attri_format;
> +	if (!(attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET ||
> +	      attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE) ||
> +	    (attrp->alfi_value_len > XATTR_SIZE_MAX) ||
> +	    (attrp->alfi_name_len > XATTR_NAME_MAX) ||
> +	    (attrp->alfi_name_len == 0)) {

This needs to call xfs_verify_ino() on attrp->alfi_ino.

This also needs to check for xfs_sb_version_hasdelattr().

I would refactor this into a separate validation predicate to eliminate
the multi-line if statement.  I will post a series cleaning up the other
log items' recover functions shortly.
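One possible shape for the separate validation predicate, written as a self-contained userspace sketch. struct attri_fmt is a simplified stand-in for struct xfs_attri_log_format. The MODEL_OP_* values and the ino_check_fn callback (standing in for xfs_verify_ino()) are hypothetical, and the has_delattr argument stands in for the superblock feature check. The XATTR limits match <linux/limits.h> but are redefined so the sketch builds anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_XATTR_NAME_MAX	255
#define MODEL_XATTR_SIZE_MAX	65536

/* Hypothetical op codes standing in for XFS_ATTR_OP_FLAGS_SET/REMOVE. */
enum { MODEL_OP_SET = 1, MODEL_OP_REMOVE = 2 };

/* Simplified stand-in for struct xfs_attri_log_format. */
struct attri_fmt {
	uint64_t	ino;
	uint32_t	op_flags;
	uint32_t	name_len;
	uint32_t	value_len;
};

/* Hypothetical inode check; the kernel would call xfs_verify_ino(). */
typedef bool (*ino_check_fn)(uint64_t ino);

/* Toy checker for this sketch only: treats inode 0 as invalid. */
static bool nonzero_ino(uint64_t ino)
{
	return ino != 0;
}

/* One predicate replaces the multi-line if; false means -EFSCORRUPTED. */
static bool attri_validate(const struct attri_fmt *f, bool has_delattr,
			   ino_check_fn ino_ok)
{
	if (!has_delattr)
		return false;
	if (f->op_flags != MODEL_OP_SET && f->op_flags != MODEL_OP_REMOVE)
		return false;
	if (f->name_len == 0 || f->name_len > MODEL_XATTR_NAME_MAX)
		return false;
	if (f->value_len > MODEL_XATTR_SIZE_MAX)
		return false;
	return ino_ok(f->ino);
}
```

The recover function would then shrink to "if (!attri_validate(...)) return -EFSCORRUPTED;".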

> +		/*
> +		 * This will pull the ATTRI from the AIL and free the memory
> +		 * associated with it.
> +		 */
> +		xfs_attri_release(attrip);

No need to call xfs_attri_release; one of the 5.10 cleanups was to
recognize that the log recovery code does this for you automatically.

> +		return -EFSCORRUPTED;
> +	}
> +
> +	error = xfs_iget(mp, 0, attrp->alfi_ino, 0, 0, &ip);
> +	if (error)
> +		return error;

I /think/ this needs to call xfs_qm_dqattach here, for reasons I'll get
into shortly.

In the meantime, this /definitely/ needs to do:

	if (VFS_I(ip)->i_nlink == 0)
		xfs_iflags_set(ip, XFS_IRECOVERY);

Because the IRECOVERY flag prevents inode inactivation from triggering
on an unlinked inode while we're still performing log recovery.

If you want to steal the xlog_recover_iget helper from the atomic
swapext series[0] please feel free. :)

[0] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?id=51e23b9c9d9674a78dc97c5848c9efb4461e074d

> +	memset(&args, 0, sizeof(args));
> +	args.dp = ip;
> +	args.name = attrip->attri_name;
> +	args.namelen = attrp->alfi_name_len;
> +	args.attr_filter = attrp->alfi_attr_flags;
> +	if (attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET) {
> +		args.value = attrip->attri_value;
> +		args.valuelen = attrp->alfi_value_len;
> +	}
> +
> +	error = xfs_attr_set(&args);

Er...

> +
> +	xfs_attri_release(attrip);

The transaction commit will take care of releasing attrip.

> +	xfs_irele(ip);
> +	return error;
> +}
> +
> +static const struct xfs_item_ops xfs_attri_item_ops = {
> +	.iop_size	= xfs_attri_item_size,
> +	.iop_format	= xfs_attri_item_format,
> +	.iop_unpin	= xfs_attri_item_unpin,
> +	.iop_committed	= xfs_attri_item_committed,
> +	.iop_committing = xfs_attri_item_committing,
> +	.iop_release    = xfs_attri_item_release,
> +	.iop_recover	= xfs_attri_item_recover,
> +	.iop_match	= xfs_attri_item_match,

This needs an ->iop_relog method so that we can relog the attri log item
if the log starts to fill up.

> +};
> +
> +
> +
> +STATIC int
> +xlog_recover_attri_commit_pass2(
> +	struct xlog                     *log,
> +	struct list_head		*buffer_list,
> +	struct xlog_recover_item        *item,
> +	xfs_lsn_t                       lsn)
> +{
> +	int                             error;
> +	struct xfs_mount                *mp = log->l_mp;
> +	struct xfs_attri_log_item       *attrip;
> +	struct xfs_attri_log_format     *attri_formatp;
> +	char				*name = NULL;
> +	char				*value = NULL;
> +	int				region = 0;
> +
> +	attri_formatp = item->ri_buf[region].i_addr;

Please check the __pad field for zeroes here.
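A small sketch of the padding check. Requiring reserved fields to be zero on disk catches garbage during recovery and presumably keeps them free for future use. struct attri_hdr mirrors only the idea of the leading fields of the ATTRI format; its layout here is illustrative, not the real on-disk definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in; the point is the reserved pad field. */
struct attri_hdr {
	uint16_t	type;
	uint16_t	size;
	uint32_t	pad;	/* reserved, must be zero */
	uint64_t	id;
};

/*
 * Returns false if the recovered region is the wrong length or any
 * reserved bits are set; the caller would then fail with -EFSCORRUPTED.
 */
static bool attri_hdr_ok(const struct attri_hdr *h, size_t len)
{
	if (len != sizeof(*h))
		return false;
	return h->pad == 0;
}
```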

> +	attrip = xfs_attri_init(mp);
> +	error = xfs_attri_copy_format(&item->ri_buf[region],
> +				      &attrip->attri_format);
> +	if (error) {
> +		xfs_attri_item_free(attrip);
> +		return error;
> +	}
> +
> +	attrip->attri_name_len = attri_formatp->alfi_name_len;
> +	attrip->attri_value_len = attri_formatp->alfi_value_len;
> +	attrip = krealloc(attrip, sizeof(struct xfs_attri_log_item) +
> +			  attrip->attri_name_len + attrip->attri_value_len,
> +			  GFP_NOFS | __GFP_NOFAIL);
> +
> +	ASSERT(attrip->attri_name_len > 0);

If attri_name_len is zero, reject the whole thing with EFSCORRUPTED.

> +	region++;
> +	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
> +	memcpy(name, item->ri_buf[region].i_addr,
> +	       attrip->attri_name_len);
> +	attrip->attri_name = name;
> +
> +	if (attrip->attri_value_len > 0) {
> +		region++;
> +		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
> +			attrip->attri_name_len;
> +		memcpy(value, item->ri_buf[region].i_addr,
> +			attrip->attri_value_len);
> +		attrip->attri_value = value;
> +	}

Question: is it valid for an attri item to have value_len > 0 for an
XFS_ATTR_OP_FLAGS_REMOVE operation?

Granted, that level of validation might be better left to the _recover
function.

> +
> +	/*
> +	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
> +	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
> +	 * directly and drop the ATTRI reference. Note that
> +	 * xfs_trans_ail_update() drops the AIL lock.
> +	 */
> +	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
> +	xfs_attri_release(attrip);
> +	return 0;
> +}
> +
> +const struct xlog_recover_item_ops xlog_attri_item_ops = {
> +	.item_type	= XFS_LI_ATTRI,
> +	.commit_pass2	= xlog_recover_attri_commit_pass2,
> +};
> +
> +/*
> + * This routine is called when an ATTRD format structure is found in a committed
> + * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
> + * it was still in the log. To do this it searches the AIL for the ATTRI with
> + * an id equal to that in the ATTRD format structure. If we find it we drop
> + * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
> + */
> +STATIC int
> +xlog_recover_attrd_commit_pass2(
> +	struct xlog			*log,
> +	struct list_head		*buffer_list,
> +	struct xlog_recover_item	*item,
> +	xfs_lsn_t			lsn)
> +{
> +	struct xfs_attrd_log_format	*attrd_formatp;
> +
> +	attrd_formatp = item->ri_buf[0].i_addr;
> +	ASSERT((item->ri_buf[0].i_len ==
> +				(sizeof(struct xfs_attrd_log_format))));
> +
> +	xlog_recover_release_intent(log, XFS_LI_ATTRI,
> +				    attrd_formatp->alfd_alf_id);
> +	return 0;
> +}
> +
> +const struct xlog_recover_item_ops xlog_attrd_item_ops = {
> +	.item_type	= XFS_LI_ATTRD,
> +	.commit_pass2	= xlog_recover_attrd_commit_pass2,
> +};
> diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
> new file mode 100644
> index 0000000..7dd2572
> --- /dev/null
> +++ b/fs/xfs/xfs_attr_item.h
> @@ -0,0 +1,76 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> + * Author: Allison Collins <allison.henderson@oracle.com>
> + */
> +#ifndef	__XFS_ATTR_ITEM_H__
> +#define	__XFS_ATTR_ITEM_H__
> +
> +/* kernel only ATTRI/ATTRD definitions */
> +
> +struct xfs_mount;
> +struct kmem_zone;
> +
> +/*
> + * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
> + */
> +#define	XFS_ATTRI_RECOVERED	1
> +
> +
> +/* iovec length must be 32-bit aligned */
> +#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
> +				size + sizeof(int32_t) - \
> +				(size % sizeof(int32_t)))

Can you turn this into a static inline helper?

And use one of the roundup() variants to ensure the proper alignment
instead of this open-coded stuff? :)
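A sketch of the static inline version, compared against the macro as posted. model_round_up() is a userspace stand-in for the kernel's round_up() (power-of-two alignment only, which sizeof(int32_t) satisfies). xfs_attr_nvec_size is a hypothetical name. Besides not parenthesizing its argument, the posted macro adds 4 extra bytes to sizes that are already 32-bit aligned but not equal to 4: ATTR_NVEC_SIZE(8) yields 12, not 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The macro as posted, copied for comparison. */
#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
				size + sizeof(int32_t) - \
				(size % sizeof(int32_t)))

/* Stand-in for round_up(): valid only when @align is a power of two. */
static inline size_t model_round_up(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* iovec lengths must be padded to a 32-bit boundary. */
static inline size_t xfs_attr_nvec_size(size_t size)
{
	return model_round_up(size, sizeof(int32_t));
}
```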

> +
> +/*
> + * This is the "attr intention" log item.  It is used to log the fact that some
> + * attribute operations need to be processed.  An operation is currently either
> + * a set or remove.  Set or remove operations are described by the xfs_attr_item
> + * which may be logged to this intent.  Intents are used in conjunction with the
> + * "attr done" log item described below.
> + *
> + * The ATTRI is reference counted so that it is not freed prior to both the
> + * ATTRI and ATTRD being committed and unpinned. This ensures the ATTRI is
> + * inserted into the AIL even in the event of out of order ATTRI/ATTRD
> + * processing. In other words, an ATTRI is born with two references:
> + *
> + *      1.) an ATTRI held reference to track ATTRI AIL insertion
> + *      2.) an ATTRD held reference to track ATTRD commit
> + *
> + * On allocation, both references are the responsibility of the caller. Once the
> + * ATTRI is added to and dirtied in a transaction, ownership of reference one
> + * transfers to the transaction. The reference is dropped once the ATTRI is
> + * inserted to the AIL or in the event of failure along the way (e.g., commit
> + * failure, log I/O error, etc.). Note that the caller remains responsible for
> + * the ATTRD reference under all circumstances to this point. The caller has no
> + * means to detect failure once the transaction is committed, however.
> + * Therefore, an ATTRD is required after this point, even in the event of
> + * unrelated failure.
> + *
> + * Once an ATTRD is allocated and dirtied in a transaction, reference two
> + * transfers to the transaction. The ATTRD reference is dropped once it reaches
> + * the unpin handler. Similar to the ATTRI, the reference also drops in the
> + * event of commit failure or log I/O errors. Note that the ATTRD is not
> + * inserted in the AIL, so at this point both the ATTRI and ATTRD are freed.

I don't think it's necessary to document the entire log intent/log done
refcount state machine here; it'll do to record just the bits that are
specific to delayed xattr operations.

> + */
> +struct xfs_attri_log_item {
> +	struct xfs_log_item		attri_item;
> +	atomic_t			attri_refcount;
> +	int				attri_name_len;
> +	void				*attri_name;
> +	int				attri_value_len;
> +	void				*attri_value;

Please compress this structure a bit by moving the two pointers to be
adjacent instead of interspersed with ints.

Ok, now on to digesting the new state machine...

--D

> +	struct xfs_attri_log_format	attri_format;
> +};
> +
> +/*
> + * This is the "attr done" log item.  It is used to log the fact that some attrs
> + * earlier mentioned in an attri item have been freed.
> + */
> +struct xfs_attrd_log_item {
> +	struct xfs_attri_log_item	*attrd_attrip;
> +	struct xfs_log_item		attrd_item;
> +	struct xfs_attrd_log_format	attrd_format;
> +};
> +
> +#endif	/* __XFS_ATTR_ITEM_H__ */
> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> index 8f8837f..d7787a5 100644
> --- a/fs/xfs/xfs_attr_list.c
> +++ b/fs/xfs/xfs_attr_list.c
> @@ -15,6 +15,7 @@
>  #include "xfs_inode.h"
>  #include "xfs_trans.h"
>  #include "xfs_bmap.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_attr_sf.h"
>  #include "xfs_attr_leaf.h"
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 3fbd98f..d5d1959 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -15,6 +15,8 @@
>  #include "xfs_iwalk.h"
>  #include "xfs_itable.h"
>  #include "xfs_error.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_bmap.h"
>  #include "xfs_bmap_util.h"
> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> index c1771e7..62e1534 100644
> --- a/fs/xfs/xfs_ioctl32.c
> +++ b/fs/xfs/xfs_ioctl32.c
> @@ -17,6 +17,8 @@
>  #include "xfs_itable.h"
>  #include "xfs_fsops.h"
>  #include "xfs_rtalloc.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_ioctl.h"
>  #include "xfs_ioctl32.h"
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 5e16545..5ecc76c 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -13,6 +13,8 @@
>  #include "xfs_inode.h"
>  #include "xfs_acl.h"
>  #include "xfs_quota.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_trans.h"
>  #include "xfs_trace.h"
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index fa2d05e..3457f22 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1993,6 +1993,10 @@ xlog_print_tic_res(
>  	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
>  	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
>  	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
> +	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
> +	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
> +	    REG_TYPE_STR(ATTR_NAME, "attr_name"),
> +	    REG_TYPE_STR(ATTR_VALUE, "attr_value"),
>  	};
>  	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
>  #undef REG_TYPE_STR
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index a8289ad..cb951cd 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -1775,6 +1775,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
>  	&xlog_cud_item_ops,
>  	&xlog_bui_item_ops,
>  	&xlog_bud_item_ops,
> +	&xlog_attri_item_ops,
> +	&xlog_attrd_item_ops,
>  };
>  
>  static const struct xlog_recover_item_ops *
> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> index 0aa87c2..bc9c25e 100644
> --- a/fs/xfs/xfs_ondisk.h
> +++ b/fs/xfs/xfs_ondisk.h
> @@ -132,6 +132,8 @@ xfs_check_ondisk_structs(void)
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
>  
>  	/*
>  	 * The v5 superblock format extended several v4 header structures with
> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> index bca48b3..9b0c790 100644
> --- a/fs/xfs/xfs_xattr.c
> +++ b/fs/xfs/xfs_xattr.c
> @@ -10,6 +10,7 @@
>  #include "xfs_log_format.h"
>  #include "xfs_da_format.h"
>  #include "xfs_inode.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_acl.h"
>  #include "xfs_da_btree.h"
> -- 
> 2.7.4
> 


* Re: [PATCH v13 03/10] xfs: Add delay ready attr set routines
  2020-10-27 13:32   ` Chandan Babu R
@ 2020-11-10 21:57     ` Darrick J. Wong
  2020-11-13  1:33       ` Allison Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-10 21:57 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Allison Henderson, linux-xfs

On Tue, Oct 27, 2020 at 07:02:55PM +0530, Chandan Babu R wrote:
> On Friday 23 October 2020 12:04:28 PM IST Allison Henderson wrote:
> > This patch modifies the attr set routines to be delay ready. This means
> > they no longer roll or commit transactions, but instead return -EAGAIN
> > to have the calling routine roll and refresh the transaction.  In this
> > series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
> > state machine like switch to keep track of where it was when EAGAIN was
> > returned. See xfs_attr.h for a more detailed diagram of the states.
> > 
> > Two new helper functions have been added: xfs_attr_rmtval_set_init and
> > xfs_attr_rmtval_set_blk.  They provide a subset of logic similar to
> > xfs_attr_rmtval_set, but they store the current block in the delay attr
> > context to allow the caller to roll the transaction between allocations.
> > This helps to simplify and consolidate code used by
> > xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
> > now become a simple loop to refresh the transaction until the operation
> > is completed.  Lastly, xfs_attr_rmtval_remove is no longer used, and is
> > removed.
> 
> One nit. xfs_attr_rmtval_remove()'s prototype declaration needs to be removed
> from xfs_attr_remote.h.
> 
> > 
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_attr.c        | 370 ++++++++++++++++++++++++++--------------
> >  fs/xfs/libxfs/xfs_attr.h        | 126 +++++++++++++-
> >  fs/xfs/libxfs/xfs_attr_remote.c |  99 +++++++----
> >  fs/xfs/libxfs/xfs_attr_remote.h |   4 +
> >  fs/xfs/xfs_trace.h              |   1 -
> >  5 files changed, 439 insertions(+), 161 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 6ca94cb..95c98d7 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -44,7 +44,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
> >   * Internal routines when attribute list is one block.
> >   */
> >  STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
> > -STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
> > +STATIC int xfs_attr_leaf_addname(struct xfs_delattr_context *dac);
> >  STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
> >  STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
> >  
> > @@ -52,12 +52,15 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
> >   * Internal routines when attribute list is more than one block.
> >   */
> >  STATIC int xfs_attr_node_get(xfs_da_args_t *args);
> > -STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> > +STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
> >  STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
> >  STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
> >  				 struct xfs_da_state **state);
> >  STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> >  STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> > +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
> > +STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> > +			     struct xfs_buf **leaf_bp);
> >  
> >  int
> >  xfs_inode_hasattr(
> > @@ -218,8 +221,11 @@ xfs_attr_is_shortform(
> >  
> >  /*
> >   * Attempts to set an attr in shortform, or converts short form to leaf form if
> > - * there is not enough room.  If the attr is set, the transaction is committed
> > - * and set to NULL.
> > + * there is not enough room.  This function is meant to operate as a helper
> > + * routine to the delayed attribute functions.  It returns -EAGAIN to indicate
> > + * that the calling function should roll the transaction, and then proceed to
> > + * add the attr in leaf form.  This subroutine does not expect to be recalled
> > + * again like the other delayed attr routines do.
> >   */
> >  STATIC int
> >  xfs_attr_set_shortform(
> > @@ -227,16 +233,16 @@ xfs_attr_set_shortform(
> >  	struct xfs_buf		**leaf_bp)
> >  {
> >  	struct xfs_inode	*dp = args->dp;
> > -	int			error, error2 = 0;
> > +	int			error = 0;
> >  
> >  	/*
> >  	 * Try to add the attr to the attribute list in the inode.
> >  	 */
> >  	error = xfs_attr_try_sf_addname(dp, args);
> > +
> > +	/* Should only be 0, -EEXIST or ENOSPC */
> >  	if (error != -ENOSPC) {
> > -		error2 = xfs_trans_commit(args->trans);
> > -		args->trans = NULL;
> > -		return error ? error : error2;
> > +		return error;
> >  	}
> >  	/*
> >  	 * It won't fit in the shortform, transform to a leaf block.  GROT:
> > @@ -249,18 +255,10 @@ xfs_attr_set_shortform(
> >  	/*
> >  	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
> >  	 * push cannot grab the half-baked leaf buffer and run into problems
> > -	 * with the write verifier. Once we're done rolling the transaction we
> > -	 * can release the hold and add the attr to the leaf.
> > +	 * with the write verifier.
> >  	 */
> >  	xfs_trans_bhold(args->trans, *leaf_bp);
> > -	error = xfs_defer_finish(&args->trans);
> > -	xfs_trans_bhold_release(args->trans, *leaf_bp);
> > -	if (error) {
> > -		xfs_trans_brelse(args->trans, *leaf_bp);
> > -		return error;
> > -	}
> > -
> > -	return 0;
> > +	return -EAGAIN;
> >  }
> >  
> >  /*
> > @@ -268,7 +266,7 @@ xfs_attr_set_shortform(
> >   * also checks for a defer finish.  Transaction is finished and rolled as
> >   * needed, and returns true of false if the delayed operation should continue.
> >   */
> > -int
> > +STATIC int
> >  xfs_attr_trans_roll(
> >  	struct xfs_delattr_context	*dac)
> >  {
> > @@ -297,61 +295,130 @@ int
> >  xfs_attr_set_args(
> >  	struct xfs_da_args	*args)
> >  {
> > -	struct xfs_inode	*dp = args->dp;
> > -	struct xfs_buf          *leaf_bp = NULL;
> > -	int			error = 0;
> > +	struct xfs_buf			*leaf_bp = NULL;
> > +	int				error = 0;
> > +	struct xfs_delattr_context	dac = {
> > +		.da_args	= args,
> > +	};
> > +
> > +	do {
> > +		error = xfs_attr_set_iter(&dac, &leaf_bp);
> > +		if (error != -EAGAIN)
> > +			break;
> > +
> > +		error = xfs_attr_trans_roll(&dac);
> > +		if (error)
> > +			return error;
> > +
> > +		if (leaf_bp) {
> > +			xfs_trans_bjoin(args->trans, leaf_bp);
> > +			xfs_trans_bhold(args->trans, leaf_bp);
> > +		}
> 
> When xfs_attr_set_iter() causes a "short form" attribute list to be converted
> to "leaf form", leaf_bp would point to an xfs_buf which has been added to the
> transaction and also XFS_BLI_HOLD flag is set on the buffer (last statement in
> xfs_attr_set_shortform()). XFS_BLI_HOLD flag makes sure that the new
> transaction allocated by xfs_attr_trans_roll() would continue to have leaf_bp
> in the transaction's item list. Hence I think the above calls to
> xfs_trans_bjoin() and xfs_trans_bhold() are not required.

I /think/ the defer ops will rejoin the buffer each time it rolls, which
means that xfs_attr_trans_roll returns with the buffer already joined to
the transaction?  And I think you're right that the bhold isn't needed,
because holding is dictated by the lower levels (i.e. _set_iter).

> Please let me know if I am missing something obvious here.

The entire function goes away by the end of the series. :)

--D

> 
> -- 
> chandan
> 
> 
> 


* Re: [PATCH v13 03/10] xfs: Add delay ready attr set routines
  2020-10-23  6:34 ` [PATCH v13 03/10] xfs: Add delay ready attr set routines Allison Henderson
  2020-10-27 13:32   ` Chandan Babu R
@ 2020-11-10 23:10   ` Darrick J. Wong
  2020-11-13  1:38     ` Allison Henderson
  1 sibling, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-10 23:10 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Oct 22, 2020 at 11:34:28PM -0700, Allison Henderson wrote:
> This patch modifies the attr set routines to be delay ready. This means
> they no longer roll or commit transactions, but instead return -EAGAIN
> to have the calling routine roll and refresh the transaction.  In this
> series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
> state machine like switch to keep track of where it was when EAGAIN was
> returned. See xfs_attr.h for a more detailed diagram of the states.
> 
> Two new helper functions have been added: xfs_attr_rmtval_set_init and
> xfs_attr_rmtval_set_blk.  They provide a subset of logic similar to
> xfs_attr_rmtval_set, but they store the current block in the delay attr
> context to allow the caller to roll the transaction between allocations.
> This helps to simplify and consolidate code used by
> xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
> now become a simple loop to refresh the transaction until the operation
> is completed.  Lastly, xfs_attr_rmtval_remove is no longer used, and is
> removed.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c        | 370 ++++++++++++++++++++++++++--------------
>  fs/xfs/libxfs/xfs_attr.h        | 126 +++++++++++++-
>  fs/xfs/libxfs/xfs_attr_remote.c |  99 +++++++----
>  fs/xfs/libxfs/xfs_attr_remote.h |   4 +
>  fs/xfs/xfs_trace.h              |   1 -
>  5 files changed, 439 insertions(+), 161 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 6ca94cb..95c98d7 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -44,7 +44,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
>   * Internal routines when attribute list is one block.
>   */
>  STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
> -STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
> +STATIC int xfs_attr_leaf_addname(struct xfs_delattr_context *dac);
>  STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
>  STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>  
> @@ -52,12 +52,15 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>   * Internal routines when attribute list is more than one block.
>   */
>  STATIC int xfs_attr_node_get(xfs_da_args_t *args);
> -STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> +STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
>  STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>  STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>  				 struct xfs_da_state **state);
>  STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>  STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
> +STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> +			     struct xfs_buf **leaf_bp);
>  
>  int
>  xfs_inode_hasattr(
> @@ -218,8 +221,11 @@ xfs_attr_is_shortform(
>  
>  /*
>   * Attempts to set an attr in shortform, or converts short form to leaf form if
> - * there is not enough room.  If the attr is set, the transaction is committed
> - * and set to NULL.
> + * there is not enough room.  This function is meant to operate as a helper
> + * routine to the delayed attribute functions.  It returns -EAGAIN to indicate
> + * that the calling function should roll the transaction, and then proceed to
> + * add the attr in leaf form.  This subroutine does not expect to be recalled
> + * again like the other delayed attr routines do.
>   */
>  STATIC int
>  xfs_attr_set_shortform(
> @@ -227,16 +233,16 @@ xfs_attr_set_shortform(
>  	struct xfs_buf		**leaf_bp)
>  {
>  	struct xfs_inode	*dp = args->dp;
> -	int			error, error2 = 0;
> +	int			error = 0;
>  
>  	/*
>  	 * Try to add the attr to the attribute list in the inode.
>  	 */
>  	error = xfs_attr_try_sf_addname(dp, args);
> +
> +	/* Should only be 0, -EEXIST or ENOSPC */

Nit: "...or -ENOSPC"

Also, this comment could go a couple of lines up:

	/*
	 * Try to add the attr to the attribute list in the inode.
	 * This should only return 0, -EEXIST, or -ENOSPC.
	 */
	error = xfs_attr_try_sf_addname(dp, args);
	if (error != -ENOSPC)
		return error;


>  	if (error != -ENOSPC) {
> -		error2 = xfs_trans_commit(args->trans);
> -		args->trans = NULL;
> -		return error ? error : error2;
> +		return error;
>  	}
>  	/*
>  	 * It won't fit in the shortform, transform to a leaf block.  GROT:
> @@ -249,18 +255,10 @@ xfs_attr_set_shortform(
>  	/*
>  	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
>  	 * push cannot grab the half-baked leaf buffer and run into problems
> -	 * with the write verifier. Once we're done rolling the transaction we
> -	 * can release the hold and add the attr to the leaf.
> +	 * with the write verifier.
>  	 */
>  	xfs_trans_bhold(args->trans, *leaf_bp);
> -	error = xfs_defer_finish(&args->trans);
> -	xfs_trans_bhold_release(args->trans, *leaf_bp);
> -	if (error) {
> -		xfs_trans_brelse(args->trans, *leaf_bp);
> -		return error;
> -	}
> -
> -	return 0;
> +	return -EAGAIN;

What state are we in when return -EAGAIN here?  Are we still in
XFS_DAS_UNINIT, but with an attr fork that is no longer in local format,
which means that we skip the xfs_attr_is_shortform branch next time
around?
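A toy model of the sequence the question describes. It assumes (this is what is being asked, not a confirmed fact) that the state stays at XFS_DAS_UNINIT across the shortform-to-leaf conversion, and that re-entry is steered purely by the attr fork format. Every name below is hypothetical; the driver loop only mimics the shape of xfs_attr_set_args().

```c
#include <assert.h>
#include <errno.h>

/* Heavily simplified model; not the xfs_attr.c state machine. */
enum fork_fmt { FMT_SHORTFORM, FMT_LEAF };
enum das_state { DAS_UNINIT, DAS_FOUND_LBLK };

struct model_ctx {
	enum das_state	state;
	enum fork_fmt	fmt;
	int		sf_free;	/* bytes left in the shortform area */
	int		rolls;		/* transaction rolls requested */
};

static int model_set_iter(struct model_ctx *c, int attr_size)
{
	switch (c->state) {
	case DAS_FOUND_LBLK:
		goto leaf;
	default:
		break;
	}

	if (c->fmt == FMT_SHORTFORM) {
		if (attr_size <= c->sf_free) {
			c->sf_free -= attr_size;
			return 0;		/* fits: done, no roll */
		}
		c->fmt = FMT_LEAF;		/* convert, state left UNINIT */
		return -EAGAIN;
	}

	/* Leaf path: pretend the add needs one more roll to finish. */
	c->state = DAS_FOUND_LBLK;
	return -EAGAIN;
leaf:
	return 0;
}

/* Driver loop: roll on -EAGAIN, stop on success or a real error. */
static int model_set_args(struct model_ctx *c, int attr_size)
{
	int error;

	while ((error = model_set_iter(c, attr_size)) == -EAGAIN)
		c->rolls++;		/* stands in for xfs_attr_trans_roll() */
	return error;
}
```

In the overflow case, the second call skips the shortform branch only because the fork format changed, not because the state advanced.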

>  }
>  
>  /*
> @@ -268,7 +266,7 @@ xfs_attr_set_shortform(
>   * also checks for a defer finish.  Transaction is finished and rolled as
>   * needed, and returns true of false if the delayed operation should continue.
>   */
> -int
> +STATIC int
>  xfs_attr_trans_roll(
>  	struct xfs_delattr_context	*dac)
>  {
> @@ -297,61 +295,130 @@ int
>  xfs_attr_set_args(
>  	struct xfs_da_args	*args)
>  {
> -	struct xfs_inode	*dp = args->dp;
> -	struct xfs_buf          *leaf_bp = NULL;
> -	int			error = 0;
> +	struct xfs_buf			*leaf_bp = NULL;
> +	int				error = 0;
> +	struct xfs_delattr_context	dac = {
> +		.da_args	= args,
> +	};
> +
> +	do {
> +		error = xfs_attr_set_iter(&dac, &leaf_bp);
> +		if (error != -EAGAIN)
> +			break;
> +
> +		error = xfs_attr_trans_roll(&dac);
> +		if (error)
> +			return error;
> +
> +		if (leaf_bp) {
> +			xfs_trans_bjoin(args->trans, leaf_bp);
> +			xfs_trans_bhold(args->trans, leaf_bp);
> +		}
> +
> +	} while (true);
> +
> +	return error;
> +}
> +
> +/*
> + * Set the attribute specified in @args.
> + * This routine is meant to function as a delayed operation, and may return
> + * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
> + * to handle this, and recall the function until a successful error code is
> + * returned.
> + */
> +STATIC int
> +xfs_attr_set_iter(
> +	struct xfs_delattr_context	*dac,
> +	struct xfs_buf			**leaf_bp)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_inode		*dp = args->dp;
> +	int				error = 0;
> +
> +	/* State machine switch */
> +	switch (dac->dela_state) {
> +	case XFS_DAS_FLIP_LFLAG:
> +	case XFS_DAS_FOUND_LBLK:

Do we need to catch XFS_DAS_RM_LBLK here?

> +		goto das_leaf;
> +	case XFS_DAS_FOUND_NBLK:
> +	case XFS_DAS_FLIP_NFLAG:
> +	case XFS_DAS_ALLOC_NODE:
> +		goto das_node;
> +	default:
> +		break;
> +	}
>  
>  	/*
>  	 * If the attribute list is already in leaf format, jump straight to
>  	 * leaf handling.  Otherwise, try to add the attribute to the shortform
>  	 * list; if there's no room then convert the list to leaf format and try
> -	 * again.
> +	 * again. No need to set state as we will be in leaf form when we come
> +	 * back
>  	 */
>  	if (xfs_attr_is_shortform(dp)) {
>  
>  		/*
> -		 * If the attr was successfully set in shortform, the
> -		 * transaction is committed and set to NULL.  Otherwise, is it
> -		 * converted from shortform to leaf, and the transaction is
> -		 * retained.
> +		 * If the attr was successfully set in shortform, no need to
> +		 * continue.  Otherwise, it is converted from shortform to leaf
> +		 * and -EAGAIN is returned.
>  		 */
> -		error = xfs_attr_set_shortform(args, &leaf_bp);
> -		if (error || !args->trans)
> -			return error;
> +		error = xfs_attr_set_shortform(args, leaf_bp);
> +		if (error == -EAGAIN)
> +			dac->flags |= XFS_DAC_DEFER_FINISH;
> +
> +		return error;
>  	}
>  
> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> -		error = xfs_attr_leaf_addname(args);
> -		if (error != -ENOSPC)
> -			return error;
> +	/*
> +	 * After a shortform to leaf conversion, we need to hold the leaf and
> +	 * cycle out the transaction.  When we get back, we need to release
> +	 * the leaf.

"...to release the hold on the leaf buffer."

> +	 */
> +	if (*leaf_bp != NULL) {
> +		xfs_trans_bhold_release(args->trans, *leaf_bp);
> +		*leaf_bp = NULL;
> +	}
>  
> -		/*
> -		 * Promote the attribute list to the Btree format.
> -		 */
> -		error = xfs_attr3_leaf_to_node(args);
> -		if (error)
> -			return error;
> +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> +		error = xfs_attr_leaf_try_add(args, *leaf_bp);
> +		switch (error) {
> +		case -ENOSPC:
> +			/*
> +			 * Promote the attribute list to the Btree format.
> +			 */
> +			error = xfs_attr3_leaf_to_node(args);
> +			if (error)
> +				return error;
>  
> -		/*
> -		 * Finish any deferred work items and roll the transaction once
> -		 * more.  The goal here is to call node_addname with the inode
> -		 * and transaction in the same state (inode locked and joined,
> -		 * transaction clean) no matter how we got to this step.
> -		 */
> -		error = xfs_defer_finish(&args->trans);
> -		if (error)
> +			/*
> +			 * Finish any deferred work items and roll the
> +			 * transaction once more.  The goal here is to call
> +			 * node_addname with the inode and transaction in the
> +			 * same state (inode locked and joined, transaction
> +			 * clean) no matter how we got to this step.
> +			 */
> +			dac->flags |= XFS_DAC_DEFER_FINISH;
> +			return -EAGAIN;

What state should we be in at this -EAGAIN return?  Is it
XFS_DAS_UNINIT, but with more than one extent in the attr fork?

/me is wishing these would get turned into explicit states, since afaict
we don't unlock the inode and so we should find it in /exactly/ the
state that the delattr_context says it should be in.

> +		case 0:
> +			dac->dela_state = XFS_DAS_FOUND_LBLK;
> +			return -EAGAIN;
> +		default:
>  			return error;
> +		}
> +das_leaf:

The only way to get to this block of code is by jumping to das_leaf,
from the switch statement above, right?  If so, then shouldn't it be up
there in the switch statement?

> +		error = xfs_attr_leaf_addname(dac);
> +		if (error == -ENOSPC)
> +			/*
> +			 * No need to set state.  We will be in node form when
> +			 * we are recalled
> +			 */
> +			return -EAGAIN;

How do we get to node form?

> -		/*
> -		 * Commit the current trans (including the inode) and
> -		 * start a new one.
> -		 */
> -		error = xfs_trans_roll_inode(&args->trans, dp);
> -		if (error)
> -			return error;
> +		return error;
>  	}
> -
> -	error = xfs_attr_node_addname(args);
> +das_node:
> +	error = xfs_attr_node_addname(dac);
>  	return error;

Similarly, I think the only way to get to this block of code is if we're in
the initial state (XFS_DAS_UNINIT?) and the inode wasn't in short
format; or if we jumped here via DAS_{FOUND_NBLK,FLIP_NFLAG,ALLOC_NODE},
right?

I think you could straighten this out a bit further (I left out the
comments):

	switch (dac->dela_state) {
	case XFS_DAS_FLIP_LFLAG:
	case XFS_DAS_FOUND_LBLK:
		error = xfs_attr_leaf_addname(dac);
		if (error == -ENOSPC)
			return -EAGAIN;
		return error;
	case XFS_DAS_FOUND_NBLK:
	case XFS_DAS_FLIP_NFLAG:
	case XFS_DAS_ALLOC_NODE:
		return xfs_attr_node_addname(dac);
	case XFS_DAS_UNINIT:
		break;
	default:
		...assert on the XFS_DAS_RM_* flags...
	}

	if (xfs_attr_is_shortform(dp))
		return xfs_attr_set_shortform(args, leaf_bp);

	if (*leaf_bp != NULL) {
		...release bhold...
	}

	if (!xfs_bmap_one_block(...))
		return xfs_attr_node_addname(dac);

	error = xfs_attr_leaf_try_add(args, *leaf_bp);
	switch (error) {
	...handle -ENOSPC and 0...
	}
	return error;
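
For what it's worth, the restructured flow above can be modelled in a few lines. This is a hedged sketch with hypothetical toy names. A "fork shape" field stands in for xfs_attr_is_shortform()/xfs_bmap_one_block(), the resumed-state handlers are stubs that finish immediately, and the leaf buffer hold/release is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, cut-down states and fork shapes (not the XFS enums). */
enum toy_das { TOY_DAS_UNINIT, TOY_DAS_FOUND_LBLK, TOY_DAS_FOUND_NBLK };
enum toy_fmt { TOY_FMT_SHORT, TOY_FMT_LEAF, TOY_FMT_NODE };

struct toy_dac {
	enum toy_das	state;
	enum toy_fmt	fmt;		/* current shape of the attr fork */
	int		room;		/* free entries in shortform/leaf */
	int		leaf_room;	/* room a fresh leaf gets on conversion */
};

/* Resumed-state handlers; stubs that complete the operation. */
static int toy_leaf_addname(struct toy_dac *dac) { (void)dac; return 0; }
static int toy_node_addname(struct toy_dac *dac) { (void)dac; return 0; }

/* Straight-line variant: dispatch resumed states, then classify once. */
static int toy_set_iter(struct toy_dac *dac)
{
	switch (dac->state) {
	case TOY_DAS_FOUND_LBLK:
		return toy_leaf_addname(dac);
	case TOY_DAS_FOUND_NBLK:
		return toy_node_addname(dac);
	case TOY_DAS_UNINIT:
		break;
	}

	if (dac->fmt == TOY_FMT_SHORT) {
		if (dac->room > 0)
			return 0;			/* added in shortform */
		dac->fmt = TOY_FMT_LEAF;		/* sf -> leaf, then roll */
		dac->room = dac->leaf_room;
		return -EAGAIN;
	}

	if (dac->fmt == TOY_FMT_NODE)
		return toy_node_addname(dac);

	if (dac->room == 0) {				/* try_add hit -ENOSPC */
		dac->fmt = TOY_FMT_NODE;		/* leaf -> node, then roll */
		return -EAGAIN;
	}
	dac->room--;					/* try_add succeeded */
	dac->state = TOY_DAS_FOUND_LBLK;
	return -EAGAIN;
}

/* Count how many calls the caller's roll loop needs. */
static int toy_count_calls(struct toy_dac *dac)
{
	int calls = 0;
	int error;

	do {
		error = toy_set_iter(dac);
		calls++;
	} while (error == -EAGAIN);
	assert(error == 0);
	return calls;
}
```

Each branch is visible at a single level, which is the readability gain the suggestion is after. A full shortform fork takes three calls: convert, try_add, then the resumed leaf path.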

>  }
>  
> @@ -723,28 +790,30 @@ xfs_attr_leaf_try_add(
>   *
>   * This leaf block cannot have a "remote" value, we only call this routine
>   * if bmap_one_block() says there is only one block (ie: no remote blks).
> + *
> + * This routine is meant to function as a delayed operation, and may return
> + * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
> + * to handle this, and recall the function until a successful error code is
> + * returned.
>   */
>  STATIC int
>  xfs_attr_leaf_addname(
> -	struct xfs_da_args	*args)
> +	struct xfs_delattr_context	*dac)
>  {
> -	int			error, forkoff;
> -	struct xfs_buf		*bp = NULL;
> -	struct xfs_inode	*dp = args->dp;
> -
> -	trace_xfs_attr_leaf_addname(args);
> -
> -	error = xfs_attr_leaf_try_add(args, bp);
> -	if (error)
> -		return error;
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_buf			*bp = NULL;
> +	int				error, forkoff;
> +	struct xfs_inode		*dp = args->dp;
>  
> -	/*
> -	 * Commit the transaction that added the attr name so that
> -	 * later routines can manage their own transactions.
> -	 */
> -	error = xfs_trans_roll_inode(&args->trans, dp);
> -	if (error)
> -		return error;
> +	/* State machine switch */
> +	switch (dac->dela_state) {
> +	case XFS_DAS_FLIP_LFLAG:
> +		goto das_flip_flag;
> +	case XFS_DAS_RM_LBLK:
> +		goto das_rm_lblk;
> +	default:
> +		break;
> +	}
>  
>  	/*
>  	 * If there was an out-of-line value, allocate the blocks we
> @@ -752,12 +821,34 @@ xfs_attr_leaf_addname(
>  	 * after we create the attribute so that we don't overflow the
>  	 * maximum size of a transaction and/or hit a deadlock.
>  	 */
> -	if (args->rmtblkno > 0) {
> -		error = xfs_attr_rmtval_set(args);
> +
> +	/* Open coded xfs_attr_rmtval_set without trans handling */
> +	if ((dac->flags & XFS_DAC_LEAF_ADDNAME_INIT) == 0) {
> +		dac->flags |= XFS_DAC_LEAF_ADDNAME_INIT;
> +		if (args->rmtblkno > 0) {
> +			error = xfs_attr_rmtval_find_space(dac);
> +			if (error)
> +				return error;
> +		}
> +	}
> +
> +	/*
> +	 * Roll through the "value", allocating blocks on disk as
> +	 * required.
> +	 */
> +	if (dac->blkcnt > 0) {
> +		error = xfs_attr_rmtval_set_blk(dac);
>  		if (error)
>  			return error;
> +
> +		dac->flags |= XFS_DAC_DEFER_FINISH;
> +		return -EAGAIN;

What state are we in here?  FOUND_LBLK, with blkcnt slowly decreasing?
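
Going by the quoted code, that does appear to be the case. set_iter sets XFS_DAS_FOUND_LBLK before calling in, and XFS_DAC_LEAF_ADDNAME_INIT keeps the space search from running twice. Progress is therefore carried entirely by the lblkno/blkcnt cursor. Below is a toy sketch of that cursor. toy_bmapi_write() is a hypothetical stand-in for xfs_bmapi_write() that maps at most max_len blocks per call.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical cursor mirroring the lblkno/blkcnt context fields. */
struct toy_rmt_cursor {
	unsigned int	lblkno;		/* next logical block to map */
	int		blkcnt;		/* blocks still to allocate */
};

/* Stand-in allocator: maps up to max_len blocks per call. */
static int toy_bmapi_write(unsigned int off, int len, int max_len)
{
	(void)off;
	return len < max_len ? len : max_len;
}

/* Like xfs_attr_rmtval_set_blk(): map one extent, roll the cursor. */
static void toy_set_blk(struct toy_rmt_cursor *cur, int max_len)
{
	int mapped = toy_bmapi_write(cur->lblkno, cur->blkcnt, max_len);

	assert(mapped > 0);
	cur->lblkno += mapped;
	cur->blkcnt -= mapped;
}

/* One call per transaction; -EAGAIN while anything was just mapped. */
static int toy_alloc_step(struct toy_rmt_cursor *cur, int max_len)
{
	if (cur->blkcnt > 0) {
		toy_set_blk(cur, max_len);
		return -EAGAIN;
	}
	return 0;
}
```

As in the patch, the call that maps the last extent still returns -EAGAIN. Only the following call sees blkcnt == 0 and moves on.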

>  	}
>  
> +	error = xfs_attr_rmtval_set_value(args);
> +	if (error)
> +		return error;
> +
>  	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
>  		/*
>  		 * Added a "remote" value, just clear the incomplete flag.
> @@ -777,29 +868,29 @@ xfs_attr_leaf_addname(
>  	 * In a separate transaction, set the incomplete flag on the "old" attr
>  	 * and clear the incomplete flag on the "new" attr.
>  	 */
> -
>  	error = xfs_attr3_leaf_flipflags(args);
>  	if (error)
>  		return error;
>  	/*
>  	 * Commit the flag value change and start the next trans in series.
>  	 */
> -	error = xfs_trans_roll_inode(&args->trans, args->dp);
> -	if (error)
> -		return error;
> -
> +	dac->dela_state = XFS_DAS_FLIP_LFLAG;
> +	return -EAGAIN;
> +das_flip_flag:
>  	/*
>  	 * Dismantle the "old" attribute/value pair by removing a "remote" value
>  	 * (if it exists).
>  	 */
>  	xfs_attr_restore_rmt_blk(args);
>  
> +	error = xfs_attr_rmtval_invalidate(args);
> +	if (error)
> +		return error;
> +das_rm_lblk:
>  	if (args->rmtblkno) {
> -		error = xfs_attr_rmtval_invalidate(args);
> -		if (error)
> -			return error;
> -
> -		error = xfs_attr_rmtval_remove(args);
> +		error = __xfs_attr_rmtval_remove(dac);
> +		if (error == -EAGAIN)
> +			dac->dela_state = XFS_DAS_RM_LBLK;
>  		if (error)
>  			return error;
>  	}
> @@ -965,23 +1056,38 @@ xfs_attr_node_hasname(
>   *
>   * "Remote" attribute values confuse the issue and atomic rename operations
>   * add a whole extra layer of confusion on top of that.
> + *
> + * This routine is meant to function as a delayed operation, and may return
> + * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
> + * to handle this, and recall the function until a successful error code is
> + * returned.
>   */
>  STATIC int
>  xfs_attr_node_addname(
> -	struct xfs_da_args	*args)
> +	struct xfs_delattr_context	*dac)
>  {
> -	struct xfs_da_state	*state;
> -	struct xfs_da_state_blk	*blk;
> -	struct xfs_inode	*dp;
> -	int			retval, error;
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_da_state		*state = NULL;
> +	struct xfs_da_state_blk		*blk;
> +	int				retval = 0;
> +	int				error = 0;
>  
>  	trace_xfs_attr_node_addname(args);
>  
> -	/*
> -	 * Fill in bucket of arguments/results/context to carry around.
> -	 */
> -	dp = args->dp;
> -restart:
> +	/* State machine switch */
> +	switch (dac->dela_state) {
> +	case XFS_DAS_FLIP_NFLAG:
> +		goto das_flip_flag;
> +	case XFS_DAS_FOUND_NBLK:
> +		goto das_found_nblk;
> +	case XFS_DAS_ALLOC_NODE:
> +		goto das_alloc_node;
> +	case XFS_DAS_RM_NBLK:
> +		goto das_rm_nblk;
> +	default:
> +		break;
> +	}
> +
>  	/*
>  	 * Search to see if name already exists, and get back a pointer
>  	 * to where it should go.
> @@ -1027,19 +1133,13 @@ xfs_attr_node_addname(
>  			error = xfs_attr3_leaf_to_node(args);
>  			if (error)
>  				goto out;
> -			error = xfs_defer_finish(&args->trans);
> -			if (error)
> -				goto out;
>  
>  			/*
> -			 * Commit the node conversion and start the next
> -			 * trans in the chain.
> +			 * Restart routine from the top.  No need to set the
> +			 * state
>  			 */
> -			error = xfs_trans_roll_inode(&args->trans, dp);
> -			if (error)
> -				goto out;
> -
> -			goto restart;
> +			dac->flags |= XFS_DAC_DEFER_FINISH;
> +			return -EAGAIN;

What state are we in here?  Are we still in the same state that we were
at the start of the function, but ready to try xfs_attr3_leaf_add again?

>  		}
>  
>  		/*
> @@ -1051,9 +1151,7 @@ xfs_attr_node_addname(
>  		error = xfs_da3_split(state);
>  		if (error)
>  			goto out;
> -		error = xfs_defer_finish(&args->trans);
> -		if (error)
> -			goto out;
> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>  	} else {
>  		/*
>  		 * Addition succeeded, update Btree hashvals.
> @@ -1068,13 +1166,9 @@ xfs_attr_node_addname(
>  	xfs_da_state_free(state);
>  	state = NULL;
>  
> -	/*
> -	 * Commit the leaf addition or btree split and start the next
> -	 * trans in the chain.
> -	 */
> -	error = xfs_trans_roll_inode(&args->trans, dp);
> -	if (error)
> -		goto out;
> +	dac->dela_state = XFS_DAS_FOUND_NBLK;
> +	return -EAGAIN;
> +das_found_nblk:
>  
>  	/*
>  	 * If there was an out-of-line value, allocate the blocks we
> @@ -1083,7 +1177,27 @@ xfs_attr_node_addname(
>  	 * maximum size of a transaction and/or hit a deadlock.
>  	 */
>  	if (args->rmtblkno > 0) {
> -		error = xfs_attr_rmtval_set(args);
> +		/* Open coded xfs_attr_rmtval_set without trans handling */
> +		error = xfs_attr_rmtval_find_space(dac);
> +		if (error)
> +			return error;
> +
> +		/*
> +		 * Roll through the "value", allocating blocks on disk as
> +		 * required.
> +		 */
> +das_alloc_node:
> +		if (dac->blkcnt > 0) {
> +			error = xfs_attr_rmtval_set_blk(dac);
> +			if (error)
> +				return error;
> +
> +			dac->flags |= XFS_DAC_DEFER_FINISH;
> +			dac->dela_state = XFS_DAS_ALLOC_NODE;
> +			return -EAGAIN;
> +		}
> +
> +		error = xfs_attr_rmtval_set_value(args);
>  		if (error)
>  			return error;
>  	}
> @@ -1113,22 +1227,28 @@ xfs_attr_node_addname(
>  	/*
>  	 * Commit the flag value change and start the next trans in series
>  	 */
> -	error = xfs_trans_roll_inode(&args->trans, args->dp);
> -	if (error)
> -		goto out;
> -
> +	dac->dela_state = XFS_DAS_FLIP_NFLAG;
> +	return -EAGAIN;
> +das_flip_flag:
>  	/*
>  	 * Dismantle the "old" attribute/value pair by removing a "remote" value
>  	 * (if it exists).
>  	 */
>  	xfs_attr_restore_rmt_blk(args);
>  
> +	error = xfs_attr_rmtval_invalidate(args);
> +	if (error)
> +		return error;
> +
> +das_rm_nblk:
>  	if (args->rmtblkno) {
> -		error = xfs_attr_rmtval_invalidate(args);
> -		if (error)
> -			return error;
> +		error = __xfs_attr_rmtval_remove(dac);
> +
> +		if (error == -EAGAIN) {
> +			dac->dela_state = XFS_DAS_RM_NBLK;
> +			return -EAGAIN;
> +		}
>  
> -		error = xfs_attr_rmtval_remove(args);
>  		if (error)
>  			return error;
>  	}
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 64dcf0f..501f9df 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -106,6 +106,118 @@ struct xfs_attr_list_context {
>   *	                                      v         │
>   *	                                     done <─────┘
>   *
> + *
> + * Below is a state machine diagram for attr set operations.
> + *
> + *  xfs_attr_set_iter()
> + *             │
> + *             v

I think this diagram is missing the part where we attempt to add a
shortform attr?

--D

> + *   ┌───n── fork has
> + *   │	    only 1 blk?
> + *   │		│
> + *   │		y
> + *   │		│
> + *   │		v
> + *   │	xfs_attr_leaf_try_add()
> + *   │		│
> + *   │		v
> + *   │	     had enough
> + *   ├───n────space?
> + *   │		│
> + *   │		y
> + *   │		│
> + *   │		v
> + *   │	XFS_DAS_FOUND_LBLK ──┐
> + *   │	                     │
> + *   │	XFS_DAS_FLIP_LFLAG ──┤
> + *   │	(subroutine state)   │
> + *   │		             │
> + *   │		             └─>xfs_attr_leaf_addname()
> + *   │		                      │
> + *   │		                      v
> + *   │		                   was this
> + *   │		                   a rename? ──n─┐
> + *   │		                      │          │
> + *   │		                      y          │
> + *   │		                      │          │
> + *   │		                      v          │
> + *   │		                flip incomplete  │
> + *   │		                    flag         │
> + *   │		                      │          │
> + *   │		                      v          │
> + *   │		              XFS_DAS_FLIP_LFLAG │
> + *   │		                      │          │
> + *   │		                      v          │
> + *   │		                    remove       │
> + *   │		XFS_DAS_RM_LBLK ─> old name      │
> + *   │		         ^            │          │
> + *   │		         │            v          │
> + *   │		         └──────y── more to      │
> + *   │		                    remove       │
> + *   │		                      │          │
> + *   │		                      n          │
> + *   │		                      │          │
> + *   │		                      v          │
> + *   │		                     done <──────┘
> + *   └──> XFS_DAS_FOUND_NBLK ──┐
> + *	  (subroutine state)   │
> + *	                       │
> + *	  XFS_DAS_ALLOC_NODE ──┤
> + *	  (subroutine state)   │
> + *	                       │
> + *	  XFS_DAS_FLIP_NFLAG ──┤
> + *	  (subroutine state)   │
> + *	                       │
> + *	                       └─>xfs_attr_node_addname()
> + *	                               │
> + *	                               v
> + *	                       find space to store
> + *	                      attr. Split if needed
> + *	                               │
> + *	                               v
> + *	                       XFS_DAS_FOUND_NBLK
> + *	                               │
> + *	                               v
> + *	                 ┌─────n──  need to
> + *	                 │        alloc blks?
> + *	                 │             │
> + *	                 │             y
> + *	                 │             │
> + *	                 │             v
> + *	                 │  ┌─>XFS_DAS_ALLOC_NODE
> + *	                 │  │          │
> + *	                 │  │          v
> + *	                 │  └──y── need to alloc
> + *	                 │         more blocks?
> + *	                 │             │
> + *	                 │             n
> + *	                 │             │
> + *	                 │             v
> + *	                 │          was this
> + *	                 └────────> a rename? ──n─┐
> + *	                               │          │
> + *	                               y          │
> + *	                               │          │
> + *	                               v          │
> + *	                         flip incomplete  │
> + *	                             flag         │
> + *	                               │          │
> + *	                               v          │
> + *	                       XFS_DAS_FLIP_NFLAG │
> + *	                               │          │
> + *	                               v          │
> + *	                             remove       │
> + *	         XFS_DAS_RM_NBLK ─> old name      │
> + *	                  ^            │          │
> + *	                  │            v          │
> + *	                  └──────y── more to      │
> + *	                             remove       │
> + *	                               │          │
> + *	                               n          │
> + *	                               │          │
> + *	                               v          │
> + *	                              done <──────┘
> + *
>   */
>  
>  /*
> @@ -120,6 +232,13 @@ struct xfs_attr_list_context {
>  enum xfs_delattr_state {
>  	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
>  	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
> +	XFS_DAS_FOUND_LBLK,	      /* We found leaf blk for attr */
> +	XFS_DAS_FOUND_NBLK,	      /* We found node blk for attr */
> +	XFS_DAS_FLIP_LFLAG,	      /* Flipped leaf INCOMPLETE attr flag */
> +	XFS_DAS_RM_LBLK,	      /* A rename is removing leaf blocks */
> +	XFS_DAS_ALLOC_NODE,	      /* We are allocating node blocks */
> +	XFS_DAS_FLIP_NFLAG,	      /* Flipped node INCOMPLETE attr flag */
> +	XFS_DAS_RM_NBLK,	      /* A rename is removing node blocks */
>  };
>  
>  /*
> @@ -127,6 +246,7 @@ enum xfs_delattr_state {
>   */
>  #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>  #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> +#define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init */
>  
>  /*
>   * Context used for keeping track of delayed attribute operations
> @@ -134,6 +254,11 @@ enum xfs_delattr_state {
>  struct xfs_delattr_context {
>  	struct xfs_da_args      *da_args;
>  
> +	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
> +	struct xfs_bmbt_irec	map;
> +	xfs_dablk_t		lblkno;
> +	int			blkcnt;
> +
>  	/* Used in xfs_attr_node_removename to roll through removing blocks */
>  	struct xfs_da_state     *da_state;
>  
> @@ -160,7 +285,6 @@ int xfs_attr_set_args(struct xfs_da_args *args);
>  int xfs_has_attr(struct xfs_da_args *args);
>  int xfs_attr_remove_args(struct xfs_da_args *args);
>  int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> -int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>  bool xfs_attr_namecheck(const void *name, size_t length);
>  void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>  			      struct xfs_da_args *args);
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> index 1426c15..5b445e7 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> @@ -441,7 +441,7 @@ xfs_attr_rmtval_get(
>   * Find a "hole" in the attribute address space large enough for us to drop the
>   * new attribute's value into
>   */
> -STATIC int
> +int
>  xfs_attr_rmt_find_hole(
>  	struct xfs_da_args	*args)
>  {
> @@ -468,7 +468,7 @@ xfs_attr_rmt_find_hole(
>  	return 0;
>  }
>  
> -STATIC int
> +int
>  xfs_attr_rmtval_set_value(
>  	struct xfs_da_args	*args)
>  {
> @@ -628,6 +628,69 @@ xfs_attr_rmtval_set(
>  }
>  
>  /*
> + * Find a hole for the attr and store it in the delayed attr context.  This
> + * initializes the context to roll through allocating an attr extent for a
> + * delayed attr operation
> + */
> +int
> +xfs_attr_rmtval_find_space(
> +	struct xfs_delattr_context	*dac)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_bmbt_irec		*map = &dac->map;
> +	int				error;
> +
> +	dac->lblkno = 0;
> +	dac->blkcnt = 0;
> +	args->rmtblkcnt = 0;
> +	args->rmtblkno = 0;
> +	memset(map, 0, sizeof(struct xfs_bmbt_irec));
> +
> +	error = xfs_attr_rmt_find_hole(args);
> +	if (error)
> +		return error;
> +
> +	dac->blkcnt = args->rmtblkcnt;
> +	dac->lblkno = args->rmtblkno;
> +
> +	return 0;
> +}
> +
> +/*
> + * Write one block of the value associated with an attribute into the
> + * out-of-line buffer that we have defined for it. This is similar to a subset
> + * of xfs_attr_rmtval_set, but records the current block to the delayed attr
> + * context, and leaves transaction handling to the caller.
> + */
> +int
> +xfs_attr_rmtval_set_blk(
> +	struct xfs_delattr_context	*dac)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_inode		*dp = args->dp;
> +	struct xfs_bmbt_irec		*map = &dac->map;
> +	int nmap;
> +	int error;
> +
> +	nmap = 1;
> +	error = xfs_bmapi_write(args->trans, dp, (xfs_fileoff_t)dac->lblkno,
> +				dac->blkcnt, XFS_BMAPI_ATTRFORK, args->total,
> +				map, &nmap);
> +	if (error)
> +		return error;
> +
> +	ASSERT(nmap == 1);
> +	ASSERT((map->br_startblock != DELAYSTARTBLOCK) &&
> +	       (map->br_startblock != HOLESTARTBLOCK));
> +
> +	/* roll attribute extent map forwards */
> +	dac->lblkno += map->br_blockcount;
> +	dac->blkcnt -= map->br_blockcount;
> +
> +	return 0;
> +}
> +
> +/*
>   * Remove the value associated with an attribute by deleting the
>   * out-of-line buffer that it is stored on.
>   */
> @@ -669,38 +732,6 @@ xfs_attr_rmtval_invalidate(
>  }
>  
>  /*
> - * Remove the value associated with an attribute by deleting the
> - * out-of-line buffer that it is stored on.
> - */
> -int
> -xfs_attr_rmtval_remove(
> -	struct xfs_da_args		*args)
> -{
> -	int				error;
> -	struct xfs_delattr_context	dac  = {
> -		.da_args	= args,
> -	};
> -
> -	trace_xfs_attr_rmtval_remove(args);
> -
> -	/*
> -	 * Keep de-allocating extents until the remote-value region is gone.
> -	 */
> -	do {
> -		error = __xfs_attr_rmtval_remove(&dac);
> -		if (error != -EAGAIN)
> -			break;
> -
> -		error = xfs_attr_trans_roll(&dac);
> -		if (error)
> -			return error;
> -
> -	} while (true);
> -
> -	return error;
> -}
> -
> -/*
>   * Remove the value associated with an attribute by deleting the out-of-line
>   * buffer that it is stored on. Returns EAGAIN for the caller to refresh the
>   * transaction and re-call the function
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> index 002fd30..84e2700 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.h
> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> @@ -15,4 +15,8 @@ int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>  		xfs_buf_flags_t incore_flags);
>  int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
>  int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> +int xfs_attr_rmt_find_hole(struct xfs_da_args *args);
> +int xfs_attr_rmtval_set_value(struct xfs_da_args *args);
> +int xfs_attr_rmtval_set_blk(struct xfs_delattr_context *dac);
> +int xfs_attr_rmtval_find_space(struct xfs_delattr_context *dac);
>  #endif /* __XFS_ATTR_REMOTE_H__ */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 8695165..e9dde4e 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1925,7 +1925,6 @@ DEFINE_ATTR_EVENT(xfs_attr_refillstate);
>  
>  DEFINE_ATTR_EVENT(xfs_attr_rmtval_get);
>  DEFINE_ATTR_EVENT(xfs_attr_rmtval_set);
> -DEFINE_ATTR_EVENT(xfs_attr_rmtval_remove);
>  
>  #define DEFINE_DA_EVENT(name) \
>  DEFINE_EVENT(xfs_da_class, name, \
> -- 
> 2.7.4
> 


* Re: [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step
  2020-10-23  6:34 ` [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step Allison Henderson
  2020-10-27  7:03   ` Chandan Babu R
  2020-10-27 12:15   ` Brian Foster
@ 2020-11-10 23:12   ` Darrick J. Wong
  2020-11-13  1:38     ` Allison Henderson
  2 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-10 23:12 UTC
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Oct 22, 2020 at 11:34:26PM -0700, Allison Henderson wrote:
> From: Allison Collins <allison.henderson@oracle.com>
> 
> This patch adds a new helper function xfs_attr_node_remove_step.  This
> will help simplify and modularize the calling function
> xfs_attr_node_remove.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>

Looks fine to me, modulo Brian and Chandan's suggestions;
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_attr.c | 46 ++++++++++++++++++++++++++++++++++------------
>  1 file changed, 34 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index fd8e641..f4d39bf 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -1228,19 +1228,14 @@ xfs_attr_node_remove_rmt(
>   * the root node (a special case of an intermediate node).
>   */
>  STATIC int
> -xfs_attr_node_removename(
> -	struct xfs_da_args	*args)
> +xfs_attr_node_remove_step(
> +	struct xfs_da_args	*args,
> +	struct xfs_da_state	*state)
>  {
> -	struct xfs_da_state	*state;
>  	struct xfs_da_state_blk	*blk;
>  	int			retval, error;
>  	struct xfs_inode	*dp = args->dp;
>  
> -	trace_xfs_attr_node_removename(args);
> -
> -	error = xfs_attr_node_removename_setup(args, &state);
> -	if (error)
> -		goto out;
>  
>  	/*
>  	 * If there is an out-of-line value, de-allocate the blocks.
> @@ -1250,7 +1245,7 @@ xfs_attr_node_removename(
>  	if (args->rmtblkno > 0) {
>  		error = xfs_attr_node_remove_rmt(args, state);
>  		if (error)
> -			goto out;
> +			return error;
>  	}
>  
>  	/*
> @@ -1267,18 +1262,45 @@ xfs_attr_node_removename(
>  	if (retval && (state->path.active > 1)) {
>  		error = xfs_da3_join(state);
>  		if (error)
> -			goto out;
> +			return error;
>  		error = xfs_defer_finish(&args->trans);
>  		if (error)
> -			goto out;
> +			return error;
>  		/*
>  		 * Commit the Btree join operation and start a new trans.
>  		 */
>  		error = xfs_trans_roll_inode(&args->trans, dp);
>  		if (error)
> -			goto out;
> +			return error;
>  	}
>  
> +	return error;
> +}
> +
> +/*
> + * Remove a name from a B-tree attribute list.
> + *
> + * This routine will find the blocks of the name to remove, remove them and
> + * shrink the tree if needed.
> + */
> +STATIC int
> +xfs_attr_node_removename(
> +	struct xfs_da_args	*args)
> +{
> +	struct xfs_da_state	*state;
> +	int			error;
> +	struct xfs_inode	*dp = args->dp;
> +
> +	trace_xfs_attr_node_removename(args);
> +
> +	error = xfs_attr_node_removename_setup(args, &state);
> +	if (error)
> +		goto out;
> +
> +	error = xfs_attr_node_remove_step(args, state);
> +	if (error)
> +		goto out;
> +
>  	/*
>  	 * If the result is small enough, push it all into the inode.
>  	 */
> -- 
> 2.7.4
> 


* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-10-27 12:16   ` Brian Foster
  2020-10-27 22:27     ` Allison Henderson
@ 2020-11-10 23:15     ` Darrick J. Wong
  1 sibling, 0 replies; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-10 23:15 UTC
  To: Brian Foster; +Cc: Allison Henderson, linux-xfs

On Tue, Oct 27, 2020 at 08:16:45AM -0400, Brian Foster wrote:
> On Thu, Oct 22, 2020 at 11:34:27PM -0700, Allison Henderson wrote:
> > This patch modifies the attr remove routines to be delay ready. This
> > means they no longer roll or commit transactions, but instead return
> > -EAGAIN to have the calling routine roll and refresh the transaction. In
> > this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> > uses a state-machine-like switch to keep track of where it was
> > when EAGAIN was returned. xfs_attr_node_removename has also been
> > modified to use the switch, and a new version of xfs_attr_remove_args
> > consists of a simple loop to refresh the transaction until the operation
> > is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > transaction wherever the existing code used to.
> > 
> > Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> > version __xfs_attr_rmtval_remove. We will rename
> > __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> > done.
> > 
> > xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > during a rename).  For reasons of preserving existing function, we
> > modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> > set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> > the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> > used and will be removed.
> > 
> > This patch also adds a new struct xfs_delattr_context, which we will use
> > to keep track of the current state of an attribute operation. The new
> > xfs_delattr_state enum is used to track various operations that are in
> > progress so that we know not to repeat them, and resume where we left
> > off before EAGAIN was returned to cycle out the transaction. Other
> > members take the place of local variables that need to retain their
> > values across multiple function recalls.  See xfs_attr.h for a more
> > detailed diagram of the states.
> > 
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
> >  fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
> >  fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
> >  fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
> >  fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
> >  fs/xfs/xfs_attr_inactive.c      |   2 +-
> >  6 files changed, 241 insertions(+), 74 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index f4d39bf..6ca94cb 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
<snip>
> > @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
> >   *
> >   * This routine will find the blocks of the name to remove, remove them and
> >   * shrink the tree if needed.
> > + *
> > + * This routine is meant to function as either an inline or delayed operation,
> > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > + * functions will need to handle this, and recall the function until a
> > + * successful error code is returned.
> >   */
> >  STATIC int
> > -xfs_attr_node_removename(
> > -	struct xfs_da_args	*args)
> > +xfs_attr_node_removename_iter(
> > +	struct xfs_delattr_context	*dac)
> >  {
> > -	struct xfs_da_state	*state;
> > -	int			error;
> > -	struct xfs_inode	*dp = args->dp;
> > +	struct xfs_da_args		*args = dac->da_args;
> > +	struct xfs_da_state		*state;
> > +	int				error;
> > +	struct xfs_inode		*dp = args->dp;
> >  
> >  	trace_xfs_attr_node_removename(args);
> > +	state = dac->da_state;
> >  
> > -	error = xfs_attr_node_removename_setup(args, &state);
> > -	if (error)
> > -		goto out;
> > +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> > +		error = xfs_attr_node_removename_setup(dac, &state);
> > +		if (error)
> > +			goto out;
> > +	}
> >  
> > -	error = xfs_attr_node_remove_step(args, state);
> > -	if (error)
> > -		goto out;
> > +	switch (dac->dela_state) {
> > +	case XFS_DAS_UNINIT:
> > +		error = xfs_attr_node_remove_step(dac);
> > +		if (error)
> > +			break;
> >  
> 
> I think there's a bit more preliminary refactoring to do here to isolate
> the state management to this one function. I.e., from the discussion on
> the previous version, we'd ideally pull the logic that checks for the
> subsequent shrink state out of xfs_attr_node_remove_step() and lift it
> into this branch. See the pseudocode in the previous discussion for an
> example of what I mean:
> 
>   https://lore.kernel.org/linux-xfs/20200901170020.GC174813@bfoster/
> 
> The general goal of that is to refactor the existing code such that all
> of the state transitions and whatnot are shown in one place and the rest
> is broken down into smaller functional helpers.

Agreed.

--D

> Brian
> 
> > -	/*
> > -	 * If the result is small enough, push it all into the inode.
> > -	 */
> > -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> > -		error = xfs_attr_node_shrink(args, state);
> > +		/* do not break, proceed to shrink if needed */
> > +	case XFS_DAS_RM_SHRINK:
> > +		/*
> > +		 * If the result is small enough, push it all into the inode.
> > +		 */
> > +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> > +			error = xfs_attr_node_shrink(args, state);
> >  
> > +		break;
> > +	default:
> > +		ASSERT(0);
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (error == -EAGAIN)
> > +		return error;
> >  out:
> >  	if (state)
> >  		xfs_da_state_free(state);
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index 3e97a93..64dcf0f 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
> >  };
> >  
> >  
> > +/*
> > + * ========================================================================
> > + * Structure used to pass context around among the delayed routines.
> > + * ========================================================================
> > + */
> > +
> > +/*
> > + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> > + * states indicate places where the function would return -EAGAIN, and then
> > + * immediately resume from after being recalled by the calling function. States
> > + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> > + * so the calling function needs to pass them back to that subroutine to allow
> > + * it to finish where it left off. But they otherwise do not have a role in the
> > + * calling function other than just passing through.
> > + *
> > + * xfs_attr_remove_iter()
> > + *	  XFS_DAS_RM_SHRINK ─┐
> > + *	  (subroutine state) │
> > + *	                     └─>xfs_attr_node_removename()
> > + *	                                      │
> > + *	                                      v
> > + *	                                   need to
> > + *	                                shrink tree? ─n─┐
> > + *	                                      │         │
> > + *	                                      y         │
> > + *	                                      │         │
> > + *	                                      v         │
> > + *	                              XFS_DAS_RM_SHRINK │
> > + *	                                      │         │
> > + *	                                      v         │
> > + *	                                     done <─────┘
> > + *
> > + */
> > +
> > +/*
> > + * Enum values for xfs_delattr_context.da_state
> > + *
> > + * These values are used by delayed attribute operations to keep track  of where
> > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > + * calling function to roll the transaction, and then recall the subroutine to
> > + * finish the operation.  The enum is then used by the subroutine to jump back
> > + * to where it was and resume executing where it left off.
> > + */
> > +enum xfs_delattr_state {
> > +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
> > +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
> > +};
> > +
> > +/*
> > + * Defines for xfs_delattr_context.flags
> > + */
> > +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> > +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> > +
> > +/*
> > + * Context used for keeping track of delayed attribute operations
> > + */
> > +struct xfs_delattr_context {
> > +	struct xfs_da_args      *da_args;
> > +
> > +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> > +	struct xfs_da_state     *da_state;
> > +
> > +	/* Used to keep track of current state of delayed operation */
> > +	unsigned int            flags;
> > +	enum xfs_delattr_state  dela_state;
> > +};
> > +
> >  /*========================================================================
> >   * Function prototypes for the kernel.
> >   *========================================================================*/
> > @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
> >  int xfs_attr_set_args(struct xfs_da_args *args);
> >  int xfs_has_attr(struct xfs_da_args *args);
> >  int xfs_attr_remove_args(struct xfs_da_args *args);
> > +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> > +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
> >  bool xfs_attr_namecheck(const void *name, size_t length);
> > +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> > +			      struct xfs_da_args *args);
> >  
> >  #endif	/* __XFS_ATTR_H__ */
> > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> > index bb128db..338377e 100644
> > --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> > @@ -19,8 +19,8 @@
> >  #include "xfs_bmap_btree.h"
> >  #include "xfs_bmap.h"
> >  #include "xfs_attr_sf.h"
> > -#include "xfs_attr_remote.h"
> >  #include "xfs_attr.h"
> > +#include "xfs_attr_remote.h"
> >  #include "xfs_attr_leaf.h"
> >  #include "xfs_error.h"
> >  #include "xfs_trace.h"
> > diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> > index 48d8e9c..1426c15 100644
> > --- a/fs/xfs/libxfs/xfs_attr_remote.c
> > +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> > @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
> >   */
> >  int
> >  xfs_attr_rmtval_remove(
> > -	struct xfs_da_args      *args)
> > +	struct xfs_da_args		*args)
> >  {
> > -	int			error;
> > -	int			retval;
> > +	int				error;
> > +	struct xfs_delattr_context	dac  = {
> > +		.da_args	= args,
> > +	};
> >  
> >  	trace_xfs_attr_rmtval_remove(args);
> >  
> > @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
> >  	 * Keep de-allocating extents until the remote-value region is gone.
> >  	 */
> >  	do {
> > -		retval = __xfs_attr_rmtval_remove(args);
> > -		if (retval && retval != -EAGAIN)
> > -			return retval;
> > +		error = __xfs_attr_rmtval_remove(&dac);
> > +		if (error != -EAGAIN)
> > +			break;
> >  
> > -		/*
> > -		 * Close out trans and start the next one in the chain.
> > -		 */
> > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > +		error = xfs_attr_trans_roll(&dac);
> >  		if (error)
> >  			return error;
> > -	} while (retval == -EAGAIN);
> >  
> > -	return 0;
> > +	} while (true);
> > +
> > +	return error;
> >  }
> >  
> >  /*
> > @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
> >   */
> >  int
> >  __xfs_attr_rmtval_remove(
> > -	struct xfs_da_args	*args)
> > +	struct xfs_delattr_context	*dac)
> >  {
> > -	int			error, done;
> > +	struct xfs_da_args		*args = dac->da_args;
> > +	int				error, done;
> >  
> >  	/*
> >  	 * Unmap value blocks for this attr.
> > @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
> >  	if (error)
> >  		return error;
> >  
> > -	error = xfs_defer_finish(&args->trans);
> > -	if (error)
> > -		return error;
> > -
> > -	if (!done)
> > +	if (!done) {
> > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> >  		return -EAGAIN;
> > +	}
> >  
> >  	return error;
> >  }
> > diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> > index 9eee615..002fd30 100644
> > --- a/fs/xfs/libxfs/xfs_attr_remote.h
> > +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> > @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
> >  int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
> >  		xfs_buf_flags_t incore_flags);
> >  int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> > -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> >  #endif /* __XFS_ATTR_REMOTE_H__ */
> > diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> > index bfad669..aaa7e66 100644
> > --- a/fs/xfs/xfs_attr_inactive.c
> > +++ b/fs/xfs/xfs_attr_inactive.c
> > @@ -15,10 +15,10 @@
> >  #include "xfs_da_format.h"
> >  #include "xfs_da_btree.h"
> >  #include "xfs_inode.h"
> > +#include "xfs_attr.h"
> >  #include "xfs_attr_remote.h"
> >  #include "xfs_trans.h"
> >  #include "xfs_bmap.h"
> > -#include "xfs_attr.h"
> >  #include "xfs_attr_leaf.h"
> >  #include "xfs_quota.h"
> >  #include "xfs_dir2.h"
> > -- 
> > 2.7.4
> > 
> 
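
Brian's suggestion is to lift the "does the tree need to shrink?" decision out of xfs_attr_node_remove_step(), so that every state transition is visible in xfs_attr_node_removename_iter(). A small userspace model can illustrate the idea. This is only a sketch: struct demo_ctx, demo_remove_step(), demo_removename_iter() and the simulated join/shrink are hypothetical stand-ins, not XFS code. The helper only reports whether a join happened. The iterator alone picks the next state and decides when to return -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum xfs_delattr_state and the context. */
enum demo_state {
	DEMO_UNINIT = 0,	/* nothing done yet */
	DEMO_RM_SHRINK,		/* name removed, tree may need shrinking */
};

#define DEMO_DEFER_FINISH	0x01	/* caller should finish deferred ops */

struct demo_ctx {
	enum demo_state	state;
	unsigned int	flags;
	bool		tree_joined;	/* simulated result of the join */
	int		shrinks;	/* times the shrink stage ran */
};

/*
 * Helper: does the removal work and reports whether blocks were joined.
 * It never touches ctx->state; that is the iterator's job.
 */
static int demo_remove_step(struct demo_ctx *ctx, bool *joined)
{
	*joined = ctx->tree_joined;
	return 0;
}

/* All state transitions are visible in this one function. */
static int demo_removename_iter(struct demo_ctx *ctx)
{
	bool	joined = false;
	int	error;

	switch (ctx->state) {
	case DEMO_UNINIT:
		error = demo_remove_step(ctx, &joined);
		if (error)
			return error;
		if (joined) {
			/* Commit the join in its own transaction first. */
			ctx->flags |= DEMO_DEFER_FINISH;
			ctx->state = DEMO_RM_SHRINK;
			return -EAGAIN;
		}
		/* fall through */
	case DEMO_RM_SHRINK:
		/* Stands in for the one-block check plus node shrink. */
		ctx->shrinks++;
		return 0;
	default:
		assert(0);
		return -EINVAL;
	}
}
```

A caller would loop on -EAGAIN, finishing deferred work and rolling the transaction between calls. The second call then resumes directly in DEMO_RM_SHRINK.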


* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-10-23  6:34 ` [PATCH v13 02/10] xfs: Add delay ready attr remove routines Allison Henderson
  2020-10-27  9:59   ` Chandan Babu R
  2020-10-27 12:16   ` Brian Foster
@ 2020-11-10 23:43   ` Darrick J. Wong
  2020-11-11  0:28     ` Dave Chinner
  2020-11-13  3:43     ` Allison Henderson
  2 siblings, 2 replies; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-10 23:43 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Oct 22, 2020 at 11:34:27PM -0700, Allison Henderson wrote:
> This patch modifies the attr remove routines to be delay ready. This
> means they no longer roll or commit transactions, but instead return
> -EAGAIN to have the calling routine roll and refresh the transaction. In
> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> uses a sort of state machine like switch to keep track of where it was
> when EAGAIN was returned. xfs_attr_node_removename has also been
> modified to use the switch, and a new version of xfs_attr_remove_args
> consists of a simple loop to refresh the transaction until the operation
> is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> transaction wherever the existing code used to.
> 
> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> version __xfs_attr_rmtval_remove. We will rename
> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> done.
> 
> xfs_attr_rmtval_remove itself is still in use by the set routines (used
> during a rename).  For reasons of preserving existing function, we
> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> set, similar to how xfs_attr_remove_args does here.  Once we transition
> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> used and will be removed.
> 
> This patch also adds a new struct xfs_delattr_context, which we will use
> to keep track of the current state of an attribute operation. The new
> xfs_delattr_state enum is used to track various operations that are in
> progress so that we know not to repeat them, and resume where we left
> off before EAGAIN was returned to cycle out the transaction. Other
> members take the place of local variables that need to retain their
> values across multiple function recalls.  See xfs_attr.h for a more
> detailed diagram of the states.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
>  fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
>  fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>  fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
>  fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>  fs/xfs/xfs_attr_inactive.c      |   2 +-
>  6 files changed, 241 insertions(+), 74 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index f4d39bf..6ca94cb 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>   */
>  STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>  STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
> +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>  STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>  				 struct xfs_da_state **state);
>  STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
>  }
>  
>  /*
> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> + * also checks for a defer finish.  Transaction is finished and rolled as
> + * needed, and returns true of false if the delayed operation should continue.
> + */
> +int
> +xfs_attr_trans_roll(
> +	struct xfs_delattr_context	*dac)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error = 0;
> +
> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> +		/*
> +		 * The caller wants us to finish all the deferred ops so that we
> +		 * avoid pinning the log tail with a large number of deferred
> +		 * ops.
> +		 */
> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> +		error = xfs_defer_finish(&args->trans);
> +		if (error)
> +			return error;
> +	}
> +
> +	return xfs_trans_roll_inode(&args->trans, args->dp);
> +}

(Mostly ignoring these functions since they all go away by the end of
the patchset...)
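
The -EAGAIN protocol used by xfs_attr_remove_args() and xfs_attr_trans_roll() can be modelled in userspace. The iterator does one bounded chunk of work, may set a "finish deferred ops" flag, and returns -EAGAIN. The caller honours the flag, rolls, and calls the iterator again. Everything below (struct demo_dac, demo_iter(), demo_trans_roll(), demo_remove_args() and the counters) is hypothetical. The counters stand in for xfs_defer_finish() and xfs_trans_roll_inode().

```c
#include <assert.h>
#include <errno.h>

#define DEMO_DAC_DEFER_FINISH	0x01

struct demo_dac {
	unsigned int	flags;
	int		extents_left;	/* remote value extents still mapped */
	int		defer_finishes;	/* times deferred ops were finished */
	int		rolls;		/* times the transaction was rolled */
};

/* One step: unmap one extent; request finish + roll if more remain. */
static int demo_iter(struct demo_dac *dac)
{
	assert(dac->extents_left >= 0);
	if (dac->extents_left > 0)
		dac->extents_left--;
	if (dac->extents_left > 0) {
		dac->flags |= DEMO_DAC_DEFER_FINISH;
		return -EAGAIN;
	}
	return 0;
}

/* Same shape as xfs_attr_trans_roll(): finish if asked, then roll. */
static int demo_trans_roll(struct demo_dac *dac)
{
	if (dac->flags & DEMO_DAC_DEFER_FINISH) {
		dac->flags &= ~DEMO_DAC_DEFER_FINISH;
		dac->defer_finishes++;
	}
	dac->rolls++;
	return 0;
}

/* Same shape as xfs_attr_remove_args(): loop until not -EAGAIN. */
static int demo_remove_args(struct demo_dac *dac)
{
	int error;

	do {
		error = demo_iter(dac);
		if (error != -EAGAIN)
			break;
		error = demo_trans_roll(dac);
		if (error)
			return error;
	} while (1);

	return error;
}
```

With three extents, the first two calls return -EAGAIN and the third completes, so the caller rolls twice and finishes deferred work twice.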

> +
> +/*
>   * Set the attribute specified in @args.
>   */
>  int
> @@ -364,23 +391,54 @@ xfs_has_attr(
>   */
>  int
>  xfs_attr_remove_args(
> -	struct xfs_da_args      *args)
> +	struct xfs_da_args	*args)
>  {
> -	struct xfs_inode	*dp = args->dp;
> -	int			error;
> +	int				error = 0;
> +	struct xfs_delattr_context	dac = {
> +		.da_args	= args,
> +	};
> +
> +	do {
> +		error = xfs_attr_remove_iter(&dac);
> +		if (error != -EAGAIN)
> +			break;
> +
> +		error = xfs_attr_trans_roll(&dac);
> +		if (error)
> +			return error;
> +
> +	} while (true);
> +
> +	return error;
> +}
> +
> +/*
> + * Remove the attribute specified in @args.
> + *
> + * This function may return -EAGAIN to signal that the transaction needs to be
> + * rolled.  Callers should continue calling this function until they receive a
> + * return value other than -EAGAIN.
> + */
> +int
> +xfs_attr_remove_iter(
> +	struct xfs_delattr_context	*dac)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_inode		*dp = args->dp;
> +
> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> +		goto node;
>  

Might as well just make this part of the if statement dispatch:

	if (dac->dela_state == XFS_DAS_RM_SHRINK)
		return xfs_attr_node_removename_iter(dac);
	else if (!xfs_inode_hasattr(dp))
		return -ENOATTR;

>  	if (!xfs_inode_hasattr(dp)) {
> -		error = -ENOATTR;
> +		return -ENOATTR;
>  	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
>  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> -		error = xfs_attr_shortform_remove(args);
> +		return xfs_attr_shortform_remove(args);
>  	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> -		error = xfs_attr_leaf_removename(args);
> -	} else {
> -		error = xfs_attr_node_removename(args);
> +		return xfs_attr_leaf_removename(args);
>  	}
> -
> -	return error;
> +node:
> +	return  xfs_attr_node_removename_iter(dac);
>  }
>  
>  /*
> @@ -1178,10 +1236,11 @@ xfs_attr_leaf_mark_incomplete(
>   */
>  STATIC
>  int xfs_attr_node_removename_setup(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	**state)
> +	struct xfs_delattr_context	*dac,
> +	struct xfs_da_state		**state)

AFAICT state == &dac->da_state by the end of the series; can you
remove this argument too?

>  {
> -	int			error;
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error;
>  
>  	error = xfs_attr_node_hasname(args, state);
>  	if (error != -EEXIST)
> @@ -1191,6 +1250,12 @@ int xfs_attr_node_removename_setup(
>  	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
>  		XFS_ATTR_LEAF_MAGIC);
>  
> +	/*
> +	 * Store state in the context incase we need to cycle out the
> +	 * transaction
> +	 */
> +	dac->da_state = *state;
> +
>  	if (args->rmtblkno > 0) {
>  		error = xfs_attr_leaf_mark_incomplete(args, *state);

It doesn't make a lot of logical sense to me that "we marked the attr
incomplete to hide it" is the same state (UNINIT) as "we haven't done
anything yet".

>  		if (error)
> @@ -1203,13 +1268,16 @@ int xfs_attr_node_removename_setup(
>  }
>  
>  STATIC int
> -xfs_attr_node_remove_rmt(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	*state)
> +xfs_attr_node_remove_rmt (
> +	struct xfs_delattr_context	*dac,
> +	struct xfs_da_state		*state)
>  {
> -	int			error = 0;
> +	int				error = 0;
>  
> -	error = xfs_attr_rmtval_remove(args);
> +	/*
> +	 * May return -EAGAIN to request that the caller recall this function
> +	 */
> +	error = __xfs_attr_rmtval_remove(dac);
>  	if (error)
>  		return error;
>  
> @@ -1221,21 +1289,27 @@ xfs_attr_node_remove_rmt(
>  }
>  
>  /*
> - * Remove a name from a B-tree attribute list.
> + * Step through removeing a name from a B-tree attribute list.
>   *
>   * This will involve walking down the Btree, and may involve joining
>   * leaf nodes and even joining intermediate nodes up to and including
>   * the root node (a special case of an intermediate node).
> + *
> + * This routine is meant to function as either an inline or delayed operation,
> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> + * functions will need to handle this, and recall the function until a
> + * successful error code is returned.
>   */
>  STATIC int
>  xfs_attr_node_remove_step(
> -	struct xfs_da_args	*args,
> -	struct xfs_da_state	*state)
> +	struct xfs_delattr_context	*dac)
>  {
> -	struct xfs_da_state_blk	*blk;
> -	int			retval, error;
> -	struct xfs_inode	*dp = args->dp;
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_da_state		*state;
> +	struct xfs_da_state_blk		*blk;
> +	int				retval, error = 0;
>  
> +	state = dac->da_state;

Might as well initialize this when you declare state above.

>  
>  	/*
>  	 * If there is an out-of-line value, de-allocate the blocks.
> @@ -1243,7 +1317,10 @@ xfs_attr_node_remove_step(
>  	 * overflow the maximum size of a transaction and/or hit a deadlock.
>  	 */
>  	if (args->rmtblkno > 0) {
> -		error = xfs_attr_node_remove_rmt(args, state);
> +		/*
> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> +		 */
> +		error = xfs_attr_node_remove_rmt(dac, state);
>  		if (error)
>  			return error;
>  	}
> @@ -1257,21 +1334,18 @@ xfs_attr_node_remove_step(
>  	xfs_da3_fixhashpath(state, &state->path);
>  
>  	/*
> -	 * Check to see if the tree needs to be collapsed.
> +	 * Check to see if the tree needs to be collapsed.  Set the flag to
> +	 * indicate that the calling function needs to move the to shrink
> +	 * operation
>  	 */
>  	if (retval && (state->path.active > 1)) {
>  		error = xfs_da3_join(state);
>  		if (error)
>  			return error;
> -		error = xfs_defer_finish(&args->trans);
> -		if (error)
> -			return error;
> -		/*
> -		 * Commit the Btree join operation and start a new trans.
> -		 */
> -		error = xfs_trans_roll_inode(&args->trans, dp);
> -		if (error)
> -			return error;
> +
> +		dac->flags |= XFS_DAC_DEFER_FINISH;
> +		dac->dela_state = XFS_DAS_RM_SHRINK;
> +		return -EAGAIN;
>  	}
>  
>  	return error;
> @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
>   *
>   * This routine will find the blocks of the name to remove, remove them and
>   * shirnk the tree if needed.

"...and shrink the tree..."

> + *
> + * This routine is meant to function as either an inline or delayed operation,
> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> + * functions will need to handle this, and recall the function until a
> + * successful error code is returned.
>   */
>  STATIC int
> -xfs_attr_node_removename(
> -	struct xfs_da_args	*args)
> +xfs_attr_node_removename_iter(
> +	struct xfs_delattr_context	*dac)
>  {
> -	struct xfs_da_state	*state;
> -	int			error;
> -	struct xfs_inode	*dp = args->dp;
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_da_state		*state;
> +	int				error;
> +	struct xfs_inode		*dp = args->dp;
>  
>  	trace_xfs_attr_node_removename(args);
> +	state = dac->da_state;
>  
> -	error = xfs_attr_node_removename_setup(args, &state);
> -	if (error)
> -		goto out;
> +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;

Can we determine if it's necessary to call _removename_setup by checking
dac->da_state directly instead of having a flag?
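
Darrick makes two suggestions here. The first is to drop XFS_DAC_NODE_RMVNAME_INIT and test dac->da_state instead. The second is to have the setup helper store into the context rather than go through a separate **state argument. The sketch below shows how both might look, but it is hypothetical userspace code. struct demo_state stands in for struct xfs_da_state, and calloc()/free() stand in for the XFS state allocation and xfs_da_state_free(). It assumes da_state is NULL whenever setup has not yet run, which holds if the context is zero-initialized and the pointer is cleared after freeing.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_state { int path_active; };	/* stand-in for xfs_da_state */

struct demo_dac {
	struct demo_state	*da_state;	/* NULL until setup has run */
	int			steps_left;	/* simulated rolls still needed */
	int			setups;		/* times setup ran */
};

/* Setup writes straight into the context; no separate **state argument. */
static int demo_removename_setup(struct demo_dac *dac)
{
	dac->da_state = calloc(1, sizeof(*dac->da_state));
	if (!dac->da_state)
		return -ENOMEM;
	dac->setups++;
	return 0;
}

/* The NULL check on da_state replaces the INIT flag. */
static int demo_removename_iter(struct demo_dac *dac)
{
	int error;

	if (!dac->da_state) {
		error = demo_removename_setup(dac);
		if (error)
			return error;
	}
	assert(dac->da_state != NULL);

	if (dac->steps_left > 0) {
		dac->steps_left--;
		return -EAGAIN;		/* caller rolls and calls again */
	}

	/* Done: free the state and clear the pointer so it isn't reused. */
	free(dac->da_state);
	dac->da_state = NULL;
	return 0;
}
```

One caveat: if setup can fail after storing a partially initialized state, the pointer alone no longer means "setup finished". In that case the error path would need to free the state and clear the pointer.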

> +		error = xfs_attr_node_removename_setup(dac, &state);
> +		if (error)
> +			goto out;
> +	}
>  
> -	error = xfs_attr_node_remove_step(args, state);
> -	if (error)
> -		goto out;
> +	switch (dac->dela_state) {
> +	case XFS_DAS_UNINIT:
> +		error = xfs_attr_node_remove_step(dac);
> +		if (error)
> +			break;
>  
> -	/*
> -	 * If the result is small enough, push it all into the inode.
> -	 */
> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> -		error = xfs_attr_node_shrink(args, state);
> +		/* do not break, proceed to shrink if needed */

/* fall through */

...because otherwise the static checkers will get mad.

(Well clang will anyway because gcc, llvm, and the C18 body all have
different incompatible ideas of what should be the magic tag that
signals an intentional fall through, but this should at least be
consistent with the rest of xfs.)
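
On the fall-through annotation: GCC's -Wimplicit-fallthrough accepts certain comments such as /* fall through */, but clang only honours an attribute. One common way to satisfy both is to expand a macro to __attribute__((__fallthrough__)) when __has_attribute reports support, and to an empty statement otherwise. The sketch below is loosely modelled on the fallthrough macro in the kernel's include/linux/compiler_attributes.h but is not a copy of it, and the demo_ names are hypothetical.

```c
#include <assert.h>

/* Use the attribute only where the compiler says it supports it. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define demo_fallthrough	__attribute__((__fallthrough__))
# endif
#endif
#ifndef demo_fallthrough
# define demo_fallthrough	do {} while (0)	/* fallthrough */
#endif

enum demo_state { DEMO_UNINIT = 0, DEMO_RM_SHRINK };

/* Returns a bitmask of the stages that ran when resuming at @state. */
static unsigned int demo_resume(enum demo_state state)
{
	unsigned int ran = 0;

	switch (state) {
	case DEMO_UNINIT:
		ran |= 0x1;		/* remove the name */
		demo_fallthrough;
	case DEMO_RM_SHRINK:
		ran |= 0x2;		/* shrink the tree if needed */
		break;
	default:
		assert(0);
	}
	return ran;
}
```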

> +	case XFS_DAS_RM_SHRINK:
> +		/*
> +		 * If the result is small enough, push it all into the inode.
> +		 */
> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> +			error = xfs_attr_node_shrink(args, state);
>  
> +		break;
> +	default:
> +		ASSERT(0);
> +		return -EINVAL;
> +	}
> +
> +	if (error == -EAGAIN)
> +		return error;
>  out:
>  	if (state)
>  		xfs_da_state_free(state);
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 3e97a93..64dcf0f 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
>  };
>  
>  
> +/*
> + * ========================================================================
> + * Structure used to pass context around among the delayed routines.
> + * ========================================================================
> + */
> +
> +/*
> + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> + * states indicate places where the function would return -EAGAIN, and then
> + * immediately resume from after being recalled by the calling function. States
> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> + * so the calling function needs to pass them back to that subroutine to allow
> + * it to finish where it left off. But they otherwise do not have a role in the
> + * calling function other than just passing through.
> + *
> + * xfs_attr_remove_iter()
> + *	  XFS_DAS_RM_SHRINK ─┐
> + *	  (subroutine state) │
> + *	                     └─>xfs_attr_node_removename()
> + *	                                      │
> + *	                                      v
> + *	                                   need to
> + *	                                shrink tree? ─n─┐
> + *	                                      │         │
> + *	                                      y         │
> + *	                                      │         │
> + *	                                      v         │
> + *	                              XFS_DAS_RM_SHRINK │
> + *	                                      │         │
> + *	                                      v         │
> + *	                                     done <─────┘
> + *
> + */
> +
> +/*
> + * Enum values for xfs_delattr_context.da_state
> + *
> + * These values are used by delayed attribute operations to keep track  of where
> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> + * calling function to roll the transaction, and then recall the subroutine to
> + * finish the operation.  The enum is then used by the subroutine to jump back
> + * to where it was and resume executing where it left off.
> + */
> +enum xfs_delattr_state {
> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
> +};
> +
> +/*
> + * Defines for xfs_delattr_context.flags
> + */
> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> +
> +/*
> + * Context used for keeping track of delayed attribute operations
> + */
> +struct xfs_delattr_context {
> +	struct xfs_da_args      *da_args;
> +
> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> +	struct xfs_da_state     *da_state;
> +
> +	/* Used to keep track of current state of delayed operation */
> +	unsigned int            flags;
> +	enum xfs_delattr_state  dela_state;
> +};
> +
>  /*========================================================================
>   * Function prototypes for the kernel.
>   *========================================================================*/
> @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>  int xfs_attr_set_args(struct xfs_da_args *args);
>  int xfs_has_attr(struct xfs_da_args *args);
>  int xfs_attr_remove_args(struct xfs_da_args *args);
> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>  bool xfs_attr_namecheck(const void *name, size_t length);
> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> +			      struct xfs_da_args *args);
>  
>  #endif	/* __XFS_ATTR_H__ */
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index bb128db..338377e 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -19,8 +19,8 @@
>  #include "xfs_bmap_btree.h"
>  #include "xfs_bmap.h"
>  #include "xfs_attr_sf.h"
> -#include "xfs_attr_remote.h"
>  #include "xfs_attr.h"
> +#include "xfs_attr_remote.h"
>  #include "xfs_attr_leaf.h"
>  #include "xfs_error.h"
>  #include "xfs_trace.h"
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> index 48d8e9c..1426c15 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
>   */
>  int
>  xfs_attr_rmtval_remove(
> -	struct xfs_da_args      *args)
> +	struct xfs_da_args		*args)
>  {
> -	int			error;
> -	int			retval;
> +	int				error;
> +	struct xfs_delattr_context	dac  = {
> +		.da_args	= args,
> +	};
>  
>  	trace_xfs_attr_rmtval_remove(args);
>  
> @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
>  	 * Keep de-allocating extents until the remote-value region is gone.
>  	 */
>  	do {
> -		retval = __xfs_attr_rmtval_remove(args);
> -		if (retval && retval != -EAGAIN)
> -			return retval;
> +		error = __xfs_attr_rmtval_remove(&dac);
> +		if (error != -EAGAIN)
> +			break;
>  
> -		/*
> -		 * Close out trans and start the next one in the chain.
> -		 */
> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> +		error = xfs_attr_trans_roll(&dac);
>  		if (error)
>  			return error;
> -	} while (retval == -EAGAIN);
>  
> -	return 0;
> +	} while (true);
> +
> +	return error;
>  }
>  
>  /*
> @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
>   */
>  int
>  __xfs_attr_rmtval_remove(
> -	struct xfs_da_args	*args)
> +	struct xfs_delattr_context	*dac)
>  {
> -	int			error, done;
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error, done;
>  
>  	/*
>  	 * Unmap value blocks for this attr.
> @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
>  	if (error)
>  		return error;
>  
> -	error = xfs_defer_finish(&args->trans);
> -	if (error)
> -		return error;
> -
> -	if (!done)
> +	if (!done) {
> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>  		return -EAGAIN;

What state are we in when we return -EAGAIN here?

[jumps back to his whole-branch diff]

Hm, oh, I see, the next state could be a number of things--

RM_LBLK if we're removing an old remote value from a leaf block as part
of an attr set operation; or

RM_NBLK if we're removing an old remote value from a node block as part
of an attr set operation; and

UNINIT if we're removing a remote value as part of an attr remove
operation.

Oh!  For the first two, it looks to me as though either we're already in
the state we're setting (RM_[LN]BLK) or we were in one of the
FLIP_[LN]FLAG states.

I think it would make more sense if you set the state before calling the
rmtval_remove function, and leave a comment here saying that the caller
is responsible for figuring out the next state.

For removals, I wonder if we should have advanced beyond UNINIT by the
time we get here?  I think you've added the minimum states that are
necessary to resume work after a transaction roll, but from this and the
next patch I feel like we do a lot of work while dela_state == UNINIT.

FWIW I will be taking a close look at all the new 'return -EAGAIN'
statements to see if I can tell what state we're in when we trigger a
transaction roll.

--D

> +	}
>  
>  	return error;
>  }
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> index 9eee615..002fd30 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.h
> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>  int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>  		xfs_buf_flags_t incore_flags);
>  int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>  #endif /* __XFS_ATTR_REMOTE_H__ */
> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> index bfad669..aaa7e66 100644
> --- a/fs/xfs/xfs_attr_inactive.c
> +++ b/fs/xfs/xfs_attr_inactive.c
> @@ -15,10 +15,10 @@
>  #include "xfs_da_format.h"
>  #include "xfs_da_btree.h"
>  #include "xfs_inode.h"
> +#include "xfs_attr.h"
>  #include "xfs_attr_remote.h"
>  #include "xfs_trans.h"
>  #include "xfs_bmap.h"
> -#include "xfs_attr.h"
>  #include "xfs_attr_leaf.h"
>  #include "xfs_quota.h"
>  #include "xfs_dir2.h"
> -- 
> 2.7.4
> 
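
Darrick's suggestion for __xfs_attr_rmtval_remove() is that the caller records the state it will resume in before calling the helper. The helper then only requests a roll and never picks the next state. A possible sketch follows. The state names echo the RM_LBLK/RM_NBLK states he mentions from later in the branch, but the demo_ identifiers, the extent counter and the control flow are hypothetical. A node-format caller would look the same, using DEMO_RM_NBLK.

```c
#include <assert.h>
#include <errno.h>

enum demo_state {
	DEMO_UNINIT = 0,
	DEMO_RM_LBLK,	/* removing an old remote value from a leaf block */
	DEMO_RM_NBLK,	/* removing an old remote value from a node block */
};

#define DEMO_DEFER_FINISH	0x01

struct demo_dac {
	enum demo_state	state;
	unsigned int	flags;
	int		rmt_extents;	/* remote value extents still mapped */
};

/*
 * Unmaps one extent and requests a defer finish + roll if more remain.
 * The caller is responsible for having set the state to resume in.
 */
static int demo_rmtval_remove(struct demo_dac *dac)
{
	if (dac->rmt_extents > 0)
		dac->rmt_extents--;
	if (dac->rmt_extents > 0) {
		dac->flags |= DEMO_DEFER_FINISH;
		return -EAGAIN;
	}
	return 0;
}

/* Leaf-format caller: sets its resume state, then calls the helper. */
static int demo_leaf_set_iter(struct demo_dac *dac)
{
	switch (dac->state) {
	case DEMO_UNINIT:
		/* ... earlier set-operation work would happen here ... */
		dac->state = DEMO_RM_LBLK;
		/* fall through */
	case DEMO_RM_LBLK:
		/* -EAGAIN here resumes in DEMO_RM_LBLK on the next call. */
		return demo_rmtval_remove(dac);
	default:
		assert(0);
		return -EINVAL;
	}
}
```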


* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-11-10 23:43   ` Darrick J. Wong
@ 2020-11-11  0:28     ` Dave Chinner
  2020-11-13  4:00       ` Allison Henderson
  2020-11-13  3:43     ` Allison Henderson
  1 sibling, 1 reply; 58+ messages in thread
From: Dave Chinner @ 2020-11-11  0:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, linux-xfs

On Tue, Nov 10, 2020 at 03:43:31PM -0800, Darrick J. Wong wrote:
> On Thu, Oct 22, 2020 at 11:34:27PM -0700, Allison Henderson wrote:
> > +/*
> > + * Remove the attribute specified in @args.
> > + *
> > + * This function may return -EAGAIN to signal that the transaction needs to be
> > + * rolled.  Callers should continue calling this function until they receive a
> > + * return value other than -EAGAIN.
> > + */
> > +int
> > +xfs_attr_remove_iter(
> > +	struct xfs_delattr_context	*dac)
> > +{
> > +	struct xfs_da_args		*args = dac->da_args;
> > +	struct xfs_inode		*dp = args->dp;
> > +
> > +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> > +		goto node;
> >  
> 
> Might as well just make this part of the if statement dispatch:
> 
> 	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> 		return xfs_attr_node_removename_iter(dac);
> 	else if (!xfs_inode_hasattr(dp))
> 		return -ENOATTR;
> 
> >  	if (!xfs_inode_hasattr(dp)) {
> > -		error = -ENOATTR;
> > +		return -ENOATTR;
> >  	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
> >  		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> > -		error = xfs_attr_shortform_remove(args);
> > +		return xfs_attr_shortform_remove(args);
> >  	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> > -		error = xfs_attr_leaf_removename(args);
> > -	} else {
> > -		error = xfs_attr_node_removename(args);
> > +		return xfs_attr_leaf_removename(args);
> >  	}
> > -
> > -	return error;
> > +node:
> > +	return  xfs_attr_node_removename_iter(dac);

Just a nitpick on this anti-pattern: else is not necessary
when the branch returns.

	if (!xfs_inode_hasattr(dp))
		return -ENOATTR;

	if (dac->dela_state == XFS_DAS_RM_SHRINK)
		return xfs_attr_node_removename_iter(dac);

	if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
		return xfs_attr_shortform_remove(args);
	}

	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
		return xfs_attr_leaf_removename(args);

	return xfs_attr_node_removename_iter(dac);

-Dave.
-- 
Dave Chinner
david@fromorbit.com
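An illustrative sketch, not code from the series. It pairs the -EAGAIN contract from the xfs_attr_remove_iter() comment with the early-return dispatch style discussed above. Everything named toy_* is hypothetical. The toy context models only the fork format and the RM_SHRINK resume state, and the loop marks where a real caller would roll the transaction. The resume check comes first here, as in Darrick's version. In this toy the order does not matter, because the fork format never changes mid-operation.

```c
#include <assert.h>
#include <errno.h>

#ifndef ENOATTR
#define ENOATTR ENODATA		/* XFS defines ENOATTR as ENODATA on Linux */
#endif

/* Hypothetical stand-ins for the attr fork format and the resume state. */
enum toy_fork_fmt { TOY_FMT_NONE, TOY_FMT_LOCAL, TOY_FMT_LEAF, TOY_FMT_NODE };
enum toy_dela_state { TOY_DAS_UNINIT, TOY_DAS_RM_SHRINK };

struct toy_dac {
	enum toy_fork_fmt	fmt;
	enum toy_dela_state	dela_state;
	int			node_steps;	/* steps left in a node removal */
};

/* Pretend a node-format removal needs several transaction rolls. */
static int toy_node_removename_iter(struct toy_dac *dac)
{
	dac->dela_state = TOY_DAS_RM_SHRINK;
	if (--dac->node_steps > 0)
		return -EAGAIN;
	return 0;
}

/* Early-return dispatch: no else after a branch that returns. */
static int toy_remove_iter(struct toy_dac *dac)
{
	if (dac->dela_state == TOY_DAS_RM_SHRINK)
		return toy_node_removename_iter(dac);

	if (dac->fmt == TOY_FMT_NONE)
		return -ENOATTR;

	if (dac->fmt == TOY_FMT_LOCAL)
		return 0;	/* shortform removal finishes in one call */

	if (dac->fmt == TOY_FMT_LEAF)
		return 0;	/* single leaf block removal */

	return toy_node_removename_iter(dac);
}

/* Caller contract: keep calling until the result is not -EAGAIN. */
static int toy_remove(struct toy_dac *dac, int *rolls)
{
	int error;

	*rolls = 0;
	do {
		error = toy_remove_iter(dac);
		if (error == -EAGAIN)
			(*rolls)++;	/* a real caller rolls the transaction here */
	} while (error == -EAGAIN);
	return error;
}
```

A node-format removal that needs three steps returns -EAGAIN twice before succeeding, so the caller rolls twice.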


* Re: [PATCH v13 05/10] xfs: Set up infrastructure for deferred attribute operations
  2020-11-10 21:51   ` Darrick J. Wong
@ 2020-11-11  3:44     ` Darrick J. Wong
  2020-11-13 17:06       ` Allison Henderson
  2020-11-13  1:32     ` Allison Henderson
  1 sibling, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-11  3:44 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Tue, Nov 10, 2020 at 01:51:49PM -0800, Darrick J. Wong wrote:
> On Thu, Oct 22, 2020 at 11:34:30PM -0700, Allison Henderson wrote:
> > Currently attributes are modified directly across one or more
> > transactions. But they are not logged or replayed in the event of an
> > error. The goal of delayed attributes is to enable logging and replaying
> > of attribute operations using the existing delayed operations
> > infrastructure.  This will later enable the attributes to become part of
> > larger multi part operations that also must first be recorded to the
> > log.  This is mostly of interest in the scheme of parent pointers which
> > would need to maintain an attribute containing parent inode information
> > any time an inode is moved, created, or removed.  Parent pointers would
> > then be of interest to any feature that would need to quickly derive an
> > inode path from the mount point. Online scrub, nfs lookups and fs grow
> > or shrink operations are all features that could take advantage of this.
> > 
> > This patch adds two new log item types for setting or removing
> > attributes as deferred operations.  The xfs_attri_log_item logs an
> > intent to set or remove an attribute.  The corresponding
> > xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
> > freed once the transaction is done.  Both log items use a generic
> > xfs_attr_log_format structure that contains the attribute name, value,
> > flags, inode, and an op_flag that indicates if the operation is a set
> > or remove.
> > 
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/Makefile                 |   1 +
> >  fs/xfs/libxfs/xfs_attr.c        |   7 +-
> >  fs/xfs/libxfs/xfs_attr.h        |  19 +
> >  fs/xfs/libxfs/xfs_defer.c       |   1 +
> >  fs/xfs/libxfs/xfs_defer.h       |   3 +
> >  fs/xfs/libxfs/xfs_format.h      |   5 +
> >  fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
> >  fs/xfs/libxfs/xfs_log_recover.h |   2 +
> >  fs/xfs/libxfs/xfs_types.h       |   1 +
> >  fs/xfs/scrub/common.c           |   2 +
> >  fs/xfs/xfs_acl.c                |   2 +
> >  fs/xfs/xfs_attr_item.c          | 750 ++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_attr_item.h          |  76 ++++
> >  fs/xfs/xfs_attr_list.c          |   1 +
> >  fs/xfs/xfs_ioctl.c              |   2 +
> >  fs/xfs/xfs_ioctl32.c            |   2 +
> >  fs/xfs/xfs_iops.c               |   2 +
> >  fs/xfs/xfs_log.c                |   4 +
> >  fs/xfs/xfs_log_recover.c        |   2 +
> >  fs/xfs/xfs_ondisk.h             |   2 +
> >  fs/xfs/xfs_xattr.c              |   1 +
> >  21 files changed, 923 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index 04611a1..b056cfc 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
> >  				   xfs_buf_item_recover.o \
> >  				   xfs_dquot_item_recover.o \
> >  				   xfs_extfree_item.o \
> > +				   xfs_attr_item.o \
> >  				   xfs_icreate_item.o \
> >  				   xfs_inode_item.o \
> >  				   xfs_inode_item_recover.o \
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 6453178..760383c 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -24,6 +24,7 @@
> >  #include "xfs_quota.h"
> >  #include "xfs_trans_space.h"
> >  #include "xfs_trace.h"
> > +#include "xfs_attr_item.h"
> >  
> >  /*
> >   * xfs_attr.c
> > @@ -59,8 +60,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
> >  STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> >  STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> >  STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
> > -STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> > -			     struct xfs_buf **leaf_bp);
> >  
> >  int
> >  xfs_inode_hasattr(
> > @@ -142,7 +141,7 @@ xfs_attr_get(
> >  /*
> >   * Calculate how many blocks we need for the new attribute,
> >   */
> > -STATIC int
> > +int
> >  xfs_attr_calc_size(
> >  	struct xfs_da_args	*args,
> >  	int			*local)
> > @@ -327,7 +326,7 @@ xfs_attr_set_args(
> >   * to handle this, and recall the function until a successful error code is
> >   * returned.
> >   */
> > -STATIC int
> > +int
> >  xfs_attr_set_iter(
> >  	struct xfs_delattr_context	*dac,
> >  	struct xfs_buf			**leaf_bp)
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index 501f9df..5b4a1ca 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -247,6 +247,7 @@ enum xfs_delattr_state {
> >  #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> >  #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> >  #define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
> > +#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
> >  
> >  /*
> >   * Context used for keeping track of delayed attribute operations
> > @@ -254,6 +255,9 @@ enum xfs_delattr_state {
> >  struct xfs_delattr_context {
> >  	struct xfs_da_args      *da_args;
> >  
> > +	/* Used by delayed attributes to hold leaf across transactions */
> 
> "Used by xfs_attr_set to hold a leaf buffer across a transaction roll" ?
> 
> > +	struct xfs_buf		*leaf_bp;
> > +
> >  	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
> >  	struct xfs_bmbt_irec	map;
> >  	xfs_dablk_t		lblkno;
> > @@ -267,6 +271,18 @@ struct xfs_delattr_context {
> >  	enum xfs_delattr_state  dela_state;
> >  };
> >  
> > +/*
> > + * List of attrs to commit later.
> > + */
> > +struct xfs_attr_item {
> > +	struct xfs_delattr_context	xattri_dac;
> > +	uint32_t			xattri_op_flags;/* attr op set or rm */
> 
> The comment for xattri_op_flags should be more direct in mentioning that
> it takes XFS_ATTR_OP_FLAGS_{SET,REMOVE}.
> 
> (Alternately you could define an enum for the incore state tracker that
> causes the appropriate XFS_ATTR_OP_FLAG* to be set on the log item in
> xfs_attr_create_intent to avoid mixing of the flag namespaces, but that
> is a lot of paper-pushing...)
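This is a sketch of the alternative described above. It assumes a hypothetical incore enum: xfs_attr_incore_op and both helpers are invented for illustration. The enum is translated into the XFS_ATTR_OP_FLAGS_* log values only when the intent is created, and translated back during recovery. The log-side values are copied from the patch. Masking with XFS_ATTR_OP_FLAGS_TYPE_MASK before the switch follows the "lower byte is type code" comment, so that upper-bit flags added later would not be mistaken for an unknown type.

```c
#include <assert.h>
#include <stdint.h>

/* Log-side values, copied from the patch's xfs_log_format.h hunk. */
#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF	/* Flags type mask */

/* Hypothetical incore op type, kept out of the log flag namespace. */
enum xfs_attr_incore_op {
	XFS_ATTR_INCORE_SET,
	XFS_ATTR_INCORE_REMOVE,
};

/* Hypothetical: translate the incore op when the intent is created. */
static uint32_t xfs_attr_op_to_log(enum xfs_attr_incore_op op)
{
	switch (op) {
	case XFS_ATTR_INCORE_SET:
		return XFS_ATTR_OP_FLAGS_SET;
	case XFS_ATTR_INCORE_REMOVE:
		return XFS_ATTR_OP_FLAGS_REMOVE;
	}
	return 0;	/* not a valid log op code */
}

/* Hypothetical recovery side: only the low byte is the type code. */
static int xfs_attr_log_to_op(uint32_t op_flags, enum xfs_attr_incore_op *op)
{
	switch (op_flags & XFS_ATTR_OP_FLAGS_TYPE_MASK) {
	case XFS_ATTR_OP_FLAGS_SET:
		*op = XFS_ATTR_INCORE_SET;
		return 0;
	case XFS_ATTR_OP_FLAGS_REMOVE:
		*op = XFS_ATTR_INCORE_REMOVE;
		return 0;
	default:
		return -1;	/* corrupt; real code would return -EFSCORRUPTED */
	}
}
```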
> 
> > +
> > +	/* used to log this item to an intent */
> > +	struct list_head		xattri_list;
> > +};
> 
> Ok, so going back to a confusing comment I had from the last series,
> I'm glad that you've moved all the attr code to be deferred operations.
> 
> Can you move all the xfs_delattr_context fields into xfs_attr_item?
> AFAICT (from git diff'ing the entire branch :P) we never allocate an
> xfs_delattr_context on its own; we only ever access the one that's
> embedded in xfs_attr_item, right?
> 
> > +
> > +
> >  /*========================================================================
> >   * Function prototypes for the kernel.
> >   *========================================================================*/
> > @@ -282,11 +298,14 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
> >  int xfs_attr_get(struct xfs_da_args *args);
> >  int xfs_attr_set(struct xfs_da_args *args);
> >  int xfs_attr_set_args(struct xfs_da_args *args);
> > +int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> > +		      struct xfs_buf **leaf_bp);
> >  int xfs_has_attr(struct xfs_da_args *args);
> >  int xfs_attr_remove_args(struct xfs_da_args *args);
> >  int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> >  bool xfs_attr_namecheck(const void *name, size_t length);
> >  void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> >  			      struct xfs_da_args *args);
> > +int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
> >  
> >  #endif	/* __XFS_ATTR_H__ */
> > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > index eff4a12..e9caff7 100644
> > --- a/fs/xfs/libxfs/xfs_defer.c
> > +++ b/fs/xfs/libxfs/xfs_defer.c
> > @@ -178,6 +178,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
> >  	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
> >  	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
> >  	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
> > +	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
> >  };
> >  
> >  static void
> > diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> > index 05472f7..72a5789 100644
> > --- a/fs/xfs/libxfs/xfs_defer.h
> > +++ b/fs/xfs/libxfs/xfs_defer.h
> > @@ -19,6 +19,7 @@ enum xfs_defer_ops_type {
> >  	XFS_DEFER_OPS_TYPE_RMAP,
> >  	XFS_DEFER_OPS_TYPE_FREE,
> >  	XFS_DEFER_OPS_TYPE_AGFL_FREE,
> > +	XFS_DEFER_OPS_TYPE_ATTR,
> >  	XFS_DEFER_OPS_TYPE_MAX,
> >  };
> >  
> > @@ -63,6 +64,8 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
> >  extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
> >  extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
> >  extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
> > +extern const struct xfs_defer_op_type xfs_attr_defer_type;
> > +
> >  
> >  /*
> >   * This structure enables a dfops user to detach the chain of deferred
> > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > index dd764da..d419c34 100644
> > --- a/fs/xfs/libxfs/xfs_format.h
> > +++ b/fs/xfs/libxfs/xfs_format.h
> > @@ -584,6 +584,11 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
> >  		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT);
> >  }
> >  
> > +static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)
> > +{
> > +	return false;
> > +}
> > +
> >  /*
> >   * end of superblock version macros
> >   */
> > diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> > index 8bd00da..de6309d 100644
> > --- a/fs/xfs/libxfs/xfs_log_format.h
> > +++ b/fs/xfs/libxfs/xfs_log_format.h
> > @@ -117,7 +117,12 @@ struct xfs_unmount_log_format {
> >  #define XLOG_REG_TYPE_CUD_FORMAT	24
> >  #define XLOG_REG_TYPE_BUI_FORMAT	25
> >  #define XLOG_REG_TYPE_BUD_FORMAT	26
> > -#define XLOG_REG_TYPE_MAX		26
> > +#define XLOG_REG_TYPE_ATTRI_FORMAT	27
> > +#define XLOG_REG_TYPE_ATTRD_FORMAT	28
> > +#define XLOG_REG_TYPE_ATTR_NAME	29
> > +#define XLOG_REG_TYPE_ATTR_VALUE	30
> > +#define XLOG_REG_TYPE_MAX		30
> > +
> >  
> >  /*
> >   * Flags to log operation header
> > @@ -240,6 +245,8 @@ typedef struct xfs_trans_header {
> >  #define	XFS_LI_CUD		0x1243
> >  #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
> >  #define	XFS_LI_BUD		0x1245
> > +#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
> > +#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
> >  
> >  #define XFS_LI_TYPE_DESC \
> >  	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
> > @@ -255,7 +262,9 @@ typedef struct xfs_trans_header {
> >  	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
> >  	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
> >  	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
> > -	{ XFS_LI_BUD,		"XFS_LI_BUD" }
> > +	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
> > +	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
> > +	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
> >  
> >  /*
> >   * Inode Log Item Format definitions.
> > @@ -863,4 +872,35 @@ struct xfs_icreate_log {
> >  	__be32		icl_gen;	/* inode generation number to use */
> >  };
> >  
> > +/*
> > + * Flags for deferred attribute operations.
> > + * Upper bits are flags, lower byte is type code
> > + */
> > +#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
> > +#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
> > +#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF	/* Flags type mask */
> > +
> > +/*
> > + * This is the structure used to lay out an attr log item in the
> > + * log.
> > + */
> > +struct xfs_attri_log_format {
> > +	uint16_t	alfi_type;	/* attri log item type */
> > +	uint16_t	alfi_size;	/* size of this item */
> > +	uint32_t	__pad;		/* pad to 64 bit aligned */
> > +	uint64_t	alfi_id;	/* attri identifier */
> > +	xfs_ino_t	alfi_ino;	/* the inode for this attr operation */
> 
> This is an ondisk structure; please use only explicitly sized data
> types like uint64_t.
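Below is a userspace sketch of the two log format structures with alfi_ino spelled as uint64_t. In the kernel, xfs_ino_t is already a 64-bit typedef, so the layout should not change. The point is to make the on-disk width visible at the declaration. The static_assert lines follow the spirit of the compile-time size checks in xfs_ondisk.h. The expected sizes and offsets (40 and 16 bytes) are what this field order produces with natural alignment on common ABIs. They were computed for this sketch, not taken from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xfs_attri_log_format {
	uint16_t	alfi_type;	/* attri log item type */
	uint16_t	alfi_size;	/* size of this item */
	uint32_t	__pad;		/* pad to 64 bit aligned */
	uint64_t	alfi_id;	/* attri identifier */
	uint64_t	alfi_ino;	/* the inode for this attr operation */
	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
	uint32_t	alfi_name_len;	/* attr name length */
	uint32_t	alfi_value_len;	/* attr value length */
	uint32_t	alfi_attr_flags;/* attr flags */
};

struct xfs_attrd_log_format {
	uint16_t	alfd_type;	/* attrd log item type */
	uint16_t	alfd_size;	/* size of this item */
	uint32_t	__pad;		/* pad to 64 bit aligned */
	uint64_t	alfd_alf_id;	/* id of corresponding attri */
};

/* If a field's type or order changes, the build fails here. */
static_assert(sizeof(struct xfs_attri_log_format) == 40, "attri size");
static_assert(offsetof(struct xfs_attri_log_format, alfi_ino) == 16, "ino");
static_assert(offsetof(struct xfs_attri_log_format, alfi_attr_flags) == 36,
	      "attr_flags");
static_assert(sizeof(struct xfs_attrd_log_format) == 16, "attrd size");
```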
> 
> > +	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
> > +	uint32_t	alfi_name_len;	/* attr name length */
> > +	uint32_t	alfi_value_len;	/* attr value length */
> > +	uint32_t	alfi_attr_flags;/* attr flags */
> > +};
> > +
> > +struct xfs_attrd_log_format {
> > +	uint16_t	alfd_type;	/* attrd log item type */
> > +	uint16_t	alfd_size;	/* size of this item */
> > +	uint32_t	__pad;		/* pad to 64 bit aligned */
> > +	uint64_t	alfd_alf_id;	/* id of corresponding attrd */
> 
> "..of corresponding attri"
> 
> > +};
> > +
> >  #endif /* __XFS_LOG_FORMAT_H__ */
> > diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
> > index 3cca2bf..b6e5514 100644
> > --- a/fs/xfs/libxfs/xfs_log_recover.h
> > +++ b/fs/xfs/libxfs/xfs_log_recover.h
> > @@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
> >  extern const struct xlog_recover_item_ops xlog_rud_item_ops;
> >  extern const struct xlog_recover_item_ops xlog_cui_item_ops;
> >  extern const struct xlog_recover_item_ops xlog_cud_item_ops;
> > +extern const struct xlog_recover_item_ops xlog_attri_item_ops;
> > +extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
> >  
> >  /*
> >   * Macros, structures, prototypes for internal log manager use.
> > diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
> > index 397d947..860cdd2 100644
> > --- a/fs/xfs/libxfs/xfs_types.h
> > +++ b/fs/xfs/libxfs/xfs_types.h
> > @@ -11,6 +11,7 @@ typedef uint32_t	prid_t;		/* project ID */
> >  typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
> >  typedef uint32_t	xfs_agino_t;	/* inode # within allocation grp */
> >  typedef uint32_t	xfs_extlen_t;	/* extent length in blocks */
> > +typedef uint32_t	xfs_attrlen_t;	/* attr length */
> 
> This doesn't get used anywhere.
> 
> >  typedef uint32_t	xfs_agnumber_t;	/* allocation group number */
> >  typedef int32_t		xfs_extnum_t;	/* # of extents in a file */
> >  typedef int16_t		xfs_aextnum_t;	/* # extents in an attribute fork */
> > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > index 1887605..9a649d1 100644
> > --- a/fs/xfs/scrub/common.c
> > +++ b/fs/xfs/scrub/common.c
> > @@ -24,6 +24,8 @@
> >  #include "xfs_rmap_btree.h"
> >  #include "xfs_log.h"
> >  #include "xfs_trans_priv.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_reflink.h"
> >  #include "scrub/scrub.h"
> > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> > index c544951..cad1db4 100644
> > --- a/fs/xfs/xfs_acl.c
> > +++ b/fs/xfs/xfs_acl.c
> > @@ -10,6 +10,8 @@
> >  #include "xfs_trans_resv.h"
> >  #include "xfs_mount.h"
> >  #include "xfs_inode.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_trace.h"
> >  #include "xfs_error.h"
> > diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> > new file mode 100644
> > index 0000000..3980066
> > --- /dev/null
> > +++ b/fs/xfs/xfs_attr_item.c
> > @@ -0,0 +1,750 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> 
> 2019 -> 2020.
> 
> > + * Author: Allison Collins <allison.henderson@oracle.com>
> > + */
> > +
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_format.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_bit.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_defer.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_trans_priv.h"
> > +#include "xfs_buf_item.h"
> > +#include "xfs_attr_item.h"
> > +#include "xfs_log.h"
> > +#include "xfs_btree.h"
> > +#include "xfs_rmap.h"
> > +#include "xfs_inode.h"
> > +#include "xfs_icache.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> > +#include "xfs_attr.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_attr_item.h"
> > +#include "xfs_alloc.h"
> > +#include "xfs_bmap.h"
> > +#include "xfs_trace.h"
> > +#include "libxfs/xfs_da_format.h"
> > +#include "xfs_inode.h"
> > +#include "xfs_quota.h"
> > +#include "xfs_log_priv.h"
> > +#include "xfs_log_recover.h"
> > +
> > +static const struct xfs_item_ops xfs_attri_item_ops;
> > +static const struct xfs_item_ops xfs_attrd_item_ops;
> > +
> > +static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
> > +{
> > +	return container_of(lip, struct xfs_attri_log_item, attri_item);
> > +}
> > +
> > +STATIC void
> > +xfs_attri_item_free(
> > +	struct xfs_attri_log_item	*attrip)
> > +{
> > +	kmem_free(attrip->attri_item.li_lv_shadow);
> > +	kmem_free(attrip);
> > +}
> > +
> > +/*
> > + * Freeing the attrip requires that we remove it from the AIL if it has already
> > + * been placed there. However, the ATTRI may not yet have been placed in the
> > + * AIL when called by xfs_attri_release() from ATTRD processing due to the
> > + * ordering of committed vs unpin operations in bulk insert operations. Hence
> > + * the reference count to ensure only the last caller frees the ATTRI.
> > + */
> > +STATIC void
> > +xfs_attri_release(
> > +	struct xfs_attri_log_item	*attrip)
> > +{
> > +	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
> > +	if (atomic_dec_and_test(&attrip->attri_refcount)) {
> > +		xfs_trans_ail_delete(&attrip->attri_item,
> > +				     SHUTDOWN_LOG_IO_ERROR);
> > +		xfs_attri_item_free(attrip);
> > +	}
> > +}
> > +
> > +/*
> > + * This returns the number of iovecs needed to log the given attri item. We
> > + * only need 1 iovec for an attri item.  It just logs the attr_log_format
> > + * structure.
> > + */
> > +static inline int
> > +xfs_attri_item_sizeof(
> > +	struct xfs_attri_log_item *attrip)
> > +{
> > +	return sizeof(struct xfs_attri_log_format);
> > +}
> 
> Please get rid of this trivial oneliner.
> 
> > +
> > +STATIC void
> > +xfs_attri_item_size(
> > +	struct xfs_log_item	*lip,
> > +	int			*nvecs,
> > +	int			*nbytes)
> > +{
> > +	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
> > +
> > +	*nvecs += 1;
> > +	*nbytes += xfs_attri_item_sizeof(attrip);
> > +
> > +	/* Attr set and remove operations require a name */
> > +	ASSERT(attrip->attri_name_len > 0);
> > +
> > +	*nvecs += 1;
> > +	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
> > +
> > +	/*
> > +	 * Set ops can accept a value of 0 len to clear an attr value.  Remove
> > +	 * ops do not need a value at all.  So only account for the value
> > +	 * when it is needed.
> > +	 */
> > +	if (attrip->attri_value_len > 0) {
> > +		*nvecs += 1;
> > +		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
> > +	}
> > +}
> > +
> > +/*
> > + * This is called to fill in the log iovecs for the given attri log
> > + * item. We use  1 iovec for the attri_format_item, 1 for the name, and
> > + * another for the value if it is present
> > + */
> > +STATIC void
> > +xfs_attri_item_format(
> > +	struct xfs_log_item	*lip,
> > +	struct xfs_log_vec	*lv)
> > +{
> > +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> > +	struct xfs_log_iovec		*vecp = NULL;
> > +
> > +	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
> > +	attrip->attri_format.alfi_size = 1;
> > +
> > +	/*
> > +	 * This size accounting must be done before copying the attrip into the
> > +	 * iovec.  If we do it after, the wrong size will be recorded to the log
> > +	 * and we trip across assertion checks for bad region sizes later during
> > +	 * the log recovery.
> > +	 */
> > +
> > +	ASSERT(attrip->attri_name_len > 0);
> > +	attrip->attri_format.alfi_size++;
> > +
> > +	if (attrip->attri_value_len > 0)
> > +		attrip->attri_format.alfi_size++;
> > +
> > +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
> > +			&attrip->attri_format,
> > +			xfs_attri_item_sizeof(attrip));
> > +	if (attrip->attri_name_len > 0)
> 
> I thought we required attri_name_len > 0 always?
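The following toy model shows why the name-length test in ->iop_format is redundant. It uses hypothetical toy_* names to mirror the two accounting paths. It assumes that ATTR_NVEC_SIZE rounds a region up to a 4-byte multiple; TOY_NVEC_SIZE below encodes that assumption and is not the real macro. It also assumes a 40-byte format structure. Because the name is required, both paths count it unconditionally. The region count from ->iop_format, which ends up in alfi_size, therefore always matches the nvecs from ->iop_size. Per the earlier comment, sizeof is used inline instead of a one-line helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for ATTR_NVEC_SIZE, rounding up to 4 bytes. */
#define TOY_NVEC_SIZE(len)	(((len) + 3u) & ~3u)
/* Assumption: sizeof(struct xfs_attri_log_format). */
#define TOY_FORMAT_SIZE		40u

struct toy_attri_sizes {
	uint32_t	name_len;	/* always > 0 for set and remove */
	uint32_t	value_len;	/* 0 for remove or an empty set value */
};

/* ->iop_size equivalent. */
static void toy_attri_size(const struct toy_attri_sizes *a, int *nvecs,
			   int *nbytes)
{
	*nvecs += 1;
	*nbytes += TOY_FORMAT_SIZE;

	assert(a->name_len > 0);
	*nvecs += 1;
	*nbytes += TOY_NVEC_SIZE(a->name_len);

	if (a->value_len > 0) {
		*nvecs += 1;
		*nbytes += TOY_NVEC_SIZE(a->value_len);
	}
}

/* ->iop_format equivalent: count the regions that would be copied. */
static int toy_attri_format_regions(const struct toy_attri_sizes *a)
{
	int	regions = 1;		/* the log format structure */

	assert(a->name_len > 0);
	regions++;			/* the name, always present */
	if (a->value_len > 0)
		regions++;		/* the value, only when non-empty */
	return regions;			/* what alfi_size would record */
}
```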
> 
> > +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
> > +				attrip->attri_name,
> > +				ATTR_NVEC_SIZE(attrip->attri_name_len));
> > +
> > +	if (attrip->attri_value_len > 0)
> > +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
> > +				attrip->attri_value,
> > +				ATTR_NVEC_SIZE(attrip->attri_value_len));
> > +}
> > +
> > +/*
> > + * The unpin operation is the last place an ATTRI is manipulated in the log. It
> > + * is either inserted in the AIL or aborted in the event of a log I/O error. In
> > + * either case, the ATTRI transaction has been successfully committed to make
> > + * it this far. Therefore, we expect whoever committed the ATTRI to either
> > + * construct and commit the ATTRD or drop the ATTRD's reference in the event of
> > + * error. Simply drop the log's ATTRI reference now that the log is done with
> > + * it.
> > + */
> > +STATIC void
> > +xfs_attri_item_unpin(
> > +	struct xfs_log_item	*lip,
> > +	int			remove)
> > +{
> > +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> > +
> > +	xfs_attri_release(attrip);
> 
> Nit: this could be shortened to xfs_attri_release(ATTRI_ITEM(lip)).
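This is a toy model of the ATTRI reference counting described in the xfs_attri_release() and unpin comments. It uses C11 <stdatomic.h> in place of the kernel's atomic_t, and all names are hypothetical. The item starts with two references. One is dropped when the log unpins the ATTRI, and the other is dropped when the ATTRD is committed or released. Only the caller that drops the last reference frees the item, so either order is safe. atomic_fetch_sub() returning 1 is the equivalent of atomic_dec_and_test() returning true.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy ATTRI: one reference for the log (unpin) and one for the ATTRD. */
struct toy_attri_ref {
	atomic_int	refcount;
	bool		*freed;		/* test hook, set when freed */
};

static struct toy_attri_ref *toy_attri_init(bool *freed)
{
	struct toy_attri_ref	*a = calloc(1, sizeof(*a));

	if (!a)
		return NULL;
	atomic_init(&a->refcount, 2);	/* as in xfs_attri_init() */
	a->freed = freed;
	*freed = false;
	return a;
}

/* Like xfs_attri_release(): only the last reference frees the item. */
static void toy_attri_release(struct toy_attri_ref *a)
{
	assert(atomic_load(&a->refcount) > 0);
	if (atomic_fetch_sub(&a->refcount, 1) == 1) {
		*a->freed = true;	/* real code also deletes it from the AIL */
		free(a);
	}
}
```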
> 
> > +}
> > +
> > +
> > +STATIC void
> > +xfs_attri_item_release(
> > +	struct xfs_log_item	*lip)
> > +{
> > +	xfs_attri_release(ATTRI_ITEM(lip));
> > +}
> > +
> > +/*
> > + * Allocate and initialize an attri item
> > + */
> > +STATIC struct xfs_attri_log_item *
> > +xfs_attri_init(
> > +	struct xfs_mount	*mp)
> > +
> > +{
> > +	struct xfs_attri_log_item	*attrip;
> > +	uint				size;
> 
> Can you line up the *mp in the parameter list with the *attrip in the
> local variables?
> 
> > +
> > +	size = (uint)(sizeof(struct xfs_attri_log_item));
> 
> kmem_zalloc takes a size_t parameter (which is the return type of sizeof);
> no need to do all this casting.
> 
> > +	attrip = kmem_zalloc(size, 0);
> > +
> > +	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
> > +			  &xfs_attri_item_ops);
> > +	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
> > +	atomic_set(&attrip->attri_refcount, 2);
> > +
> > +	return attrip;
> > +}
> > +
> > +/*
> > + * Copy an attr format buffer from the given buf, and into the destination attr
> > + * format structure.
> > + */
> > +STATIC int
> > +xfs_attri_copy_format(struct xfs_log_iovec *buf,
> > +		      struct xfs_attri_log_format *dst_attr_fmt)
> > +{
> > +	struct xfs_attri_log_format *src_attr_fmt = buf->i_addr;
> > +	uint len = sizeof(struct xfs_attri_log_format);
> 
> Indentation and whatnot with the parameter names.
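Here is the same function reindented in the usual XFS style, with one parameter per line and names aligned with the locals, as a standalone userspace sketch. The struct definitions are minimal stand-ins. EFSCORRUPTED is mapped to EUCLEAN, as XFS does on Linux. The (char *) casts and the uint length are dropped, because memcpy() takes void pointers and a size_t.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#ifndef EUCLEAN
#define EUCLEAN		117		/* Linux value, for non-Linux builds */
#endif
#define EFSCORRUPTED	EUCLEAN		/* as XFS defines it on Linux */

/* Minimal stand-ins for the kernel types used by the function. */
struct xfs_log_iovec {
	void		*i_addr;	/* beginning address of region */
	int		i_len;		/* length in bytes of region */
	unsigned int	i_type;		/* type of region */
};

struct xfs_attri_log_format {
	uint16_t	alfi_type;
	uint16_t	alfi_size;
	uint32_t	__pad;
	uint64_t	alfi_id;
	uint64_t	alfi_ino;
	uint32_t	alfi_op_flags;
	uint32_t	alfi_name_len;
	uint32_t	alfi_value_len;
	uint32_t	alfi_attr_flags;
};

/* Copy an ATTRI format region out of a recovered log iovec. */
static int
xfs_attri_copy_format(
	struct xfs_log_iovec		*buf,
	struct xfs_attri_log_format	*dst_attr_fmt)
{
	struct xfs_attri_log_format	*src_attr_fmt = buf->i_addr;
	size_t				len = sizeof(*src_attr_fmt);

	if ((size_t)buf->i_len != len)
		return -EFSCORRUPTED;

	memcpy(dst_attr_fmt, src_attr_fmt, len);
	return 0;
}
```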
> 
> > +
> > +	if (buf->i_len != len)
> > +		return -EFSCORRUPTED;
> > +
> > +	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
> > +	return 0;
> > +}
> > +
> > +static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
> > +{
> > +	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
> > +}
> > +
> > +STATIC void
> > +xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
> > +{
> > +	kmem_free(attrdp->attrd_item.li_lv_shadow);
> > +	kmem_free(attrdp);
> > +}
> > +
> > +/*
> > + * This returns the number of iovecs needed to log the given attrd item.
> > + * We only need 1 iovec for an attrd item.  It just logs the attr_log_format
> > + * structure.
> > + */
> > +static inline int
> > +xfs_attrd_item_sizeof(
> > +	struct xfs_attrd_log_item *attrdp)
> > +{
> > +	return sizeof(struct xfs_attrd_log_format);
> > +}
> > +
> > +STATIC void
> > +xfs_attrd_item_size(
> > +	struct xfs_log_item	*lip,
> > +	int			*nvecs,
> > +	int			*nbytes)
> > +{
> > +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> 
> Variable name alignment between the parameter list and the local vars.
> 
> > +	*nvecs += 1;
> 
> Space between local variable declaration and the first line of code.
> 
> > +	*nbytes += xfs_attrd_item_sizeof(attrdp);
> 
> No need for a oneliner function for sizeof.
> 
> > +}
> > +
> > +/*
> > + * This is called to fill in the log iovecs for the given attrd log item. We use
> > + * only 1 iovec for the attrd_format, and we point that at the attr_log_format
> > + * structure embedded in the attrd item.
> > + */
> > +STATIC void
> > +xfs_attrd_item_format(
> > +	struct xfs_log_item	*lip,
> > +	struct xfs_log_vec	*lv)
> > +{
> > +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> > +	struct xfs_log_iovec		*vecp = NULL;
> > +
> > +	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
> > +	attrdp->attrd_format.alfd_size = 1;
> > +
> > +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
> > +			&attrdp->attrd_format, xfs_attrd_item_sizeof(attrdp));
> > +}
> > +
> > +/*
> > + * The ATTRD is either committed or aborted if the transaction is cancelled. If
> > + * the transaction is cancelled, drop our reference to the ATTRI and free the
> > + * ATTRD.
> > + */
> > +STATIC void
> > +xfs_attrd_item_release(
> > +	struct xfs_log_item     *lip)
> > +{
> > +	struct xfs_attrd_log_item *attrdp = ATTRD_ITEM(lip);
> > +	xfs_attri_release(attrdp->attrd_attrip);
> 
> Space between the variable declaration and the first line of code.
> 
> > +	xfs_attrd_item_free(attrdp);
> > +}
> > +
> > +/*
> > + * Log an ATTRI it to the ATTRD when the attr op is done.  An attr operation
> 
> I don't know what "Log an ATTRI it to the ATTRD" means.  I think this is
> the function that performs one step of an attribute update intent and
> then tags the attrd item dirty, right?
> 
> > + * may be a set or a remove.  Note that the transaction is marked dirty
> > + * regardless of whether the operation succeeds or fails to support the
> > + * ATTRI/ATTRD lifecycle rules.
> > + */
> > +int
> > +xfs_trans_attr(
> > +	struct xfs_delattr_context	*dac,
> > +	struct xfs_attrd_log_item	*attrdp,
> > +	struct xfs_buf			**leaf_bp,
> > +	uint32_t			op_flags)
> > +{
> > +	struct xfs_da_args		*args = dac->da_args;
> > +	int				error;
> > +
> > +	error = xfs_qm_dqattach_locked(args->dp, 0);
> > +	if (error)
> > +		return error;
> > +
> > +	switch (op_flags) {
> > +	case XFS_ATTR_OP_FLAGS_SET:
> > +		args->op_flags |= XFS_DA_OP_ADDNAME;
> > +		error = xfs_attr_set_iter(dac, leaf_bp);
> > +		break;
> > +	case XFS_ATTR_OP_FLAGS_REMOVE:
> > +		ASSERT(XFS_IFORK_Q((args->dp)));
> 
> No need for the double parentheses around args->dp.
> 
> > +		error = xfs_attr_remove_iter(dac);
> > +		break;
> > +	default:
> > +		error = -EFSCORRUPTED;
> > +		break;
> > +	}
> > +
> > +	/*
> > +	 * Mark the transaction dirty, even on error. This ensures the
> > +	 * transaction is aborted, which:
> > +	 *
> > +	 * 1.) releases the ATTRI and frees the ATTRD
> > +	 * 2.) shuts down the filesystem
> > +	 */
> > +	args->trans->t_flags |= XFS_TRANS_DIRTY;
> > +	if (xfs_sb_version_hasdelattr(&args->dp->i_mount->m_sb))
> > +		set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
> 
> This could probably be:
> 
> 	if (attrdp)
> 		set_bit(...);
> 
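This is a minimal sketch of the dirtying rule in xfs_trans_attr() with the suggested simplification applied, using hypothetical toy types. It assumes the done item is NULL whenever no ATTRI was logged; xfs_attr_create_intent() returns NULL when the delattr feature is absent. Whether the deferred-ops code guarantees a NULL done item in that case should be checked in xfs_defer.c before relying on it. The transaction is dirtied even on error, as the patch comment requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a transaction and an ATTRD log item. */
struct toy_trans { bool dirty; };
struct toy_attrd { bool dirty; };

/*
 * Mark state dirty after one step of an attr op, even on error.  Testing
 * the ATTRD pointer replaces re-reading the superblock feature bit.
 */
static int toy_trans_attr(struct toy_trans *tp, struct toy_attrd *attrdp,
			  int step_error)
{
	tp->dirty = true;
	if (attrdp)
		attrdp->dirty = true;
	return step_error;
}
```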
> > +
> > +	return error;
> > +}
> > +
> > +/* Log an attr to the intent item. */
> > +STATIC void
> > +xfs_attr_log_item(
> > +	struct xfs_trans		*tp,
> > +	struct xfs_attri_log_item	*attrip,
> > +	struct xfs_attr_item		*attr)
> > +{
> > +	struct xfs_attri_log_format	*attrp;
> > +
> > +	tp->t_flags |= XFS_TRANS_DIRTY;
> > +	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
> > +
> > +	/*
> > +	 * At this point the xfs_attr_item has been constructed, and we've
> > +	 * created the log intent. Fill in the attri log item and log format
> > +	 * structure with fields from this xfs_attr_item
> > +	 */
> > +	attrp = &attrip->attri_format;
> > +	attrp->alfi_ino = attr->xattri_dac.da_args->dp->i_ino;
> > +	attrp->alfi_op_flags = attr->xattri_op_flags;
> > +	attrp->alfi_value_len = attr->xattri_dac.da_args->valuelen;
> > +	attrp->alfi_name_len = attr->xattri_dac.da_args->namelen;
> > +	attrp->alfi_attr_flags = attr->xattri_dac.da_args->attr_filter;
> > +
> > +	attrip->attri_name = (void *)attr->xattri_dac.da_args->name;
> > +	attrip->attri_value = attr->xattri_dac.da_args->value;
> > +	attrip->attri_name_len = attr->xattri_dac.da_args->namelen;
> > +	attrip->attri_value_len = attr->xattri_dac.da_args->valuelen;
> > +}
> > +
> > +/* Get an ATTRI. */
> > +static struct xfs_log_item *
> > +xfs_attr_create_intent(
> > +	struct xfs_trans		*tp,
> > +	struct list_head		*items,
> > +	unsigned int			count,
> > +	bool				sort)
> > +{
> > +	struct xfs_mount		*mp = tp->t_mountp;
> > +	struct xfs_attri_log_item	*attrip;
> > +	struct xfs_attr_item		*attr;
> > +
> > +	ASSERT(count == 1);
> > +
> > +	if (!xfs_sb_version_hasdelattr(&mp->m_sb))
> > +		return NULL;
> > +
> > +	attrip = xfs_attri_init(mp);
> > +	xfs_trans_add_item(tp, &attrip->attri_item);
> > +	list_for_each_entry(attr, items, xattri_list)
> > +		xfs_attr_log_item(tp, attrip, attr);
> > +	return &attrip->attri_item;
> > +}
> > +
> > +/* Process an attr. */
> > +STATIC int
> > +xfs_attr_finish_item(
> > +	struct xfs_trans		*tp,
> > +	struct xfs_log_item		*done,
> > +	struct list_head		*item,
> > +	struct xfs_btree_cur		**state)
> > +{
> > +	struct xfs_attr_item		*attr;
> > +	int				error;
> > +	struct xfs_delattr_context	*dac;
> > +	struct xfs_attrd_log_item	*attrdp;
> > +	struct xfs_attri_log_item	*attrip;
> > +
> > +	attr = container_of(item, struct xfs_attr_item, xattri_list);
> > +	dac = &attr->xattri_dac;
> > +
> > +	/*
> > +	 * Always reset trans after EAGAIN cycle
> > +	 * since the transaction is new
> > +	 */
> > +	dac->da_args->trans = tp;
> > +
> > +	error = xfs_trans_attr(dac, ATTRD_ITEM(done), &dac->leaf_bp,
> > +			       attr->xattri_op_flags);
> > +	/*
> > +	 * The attrip refers to xfs_attr_item memory to log the name and value
> > +	 * with the intent item. This already occurred when the intent was
> > +	 * committed so these fields are no longer accessed.
> 
> Can you clear the attri_{name,value} pointers after you've logged the
> intent item so that we don't have to do them here?
> 
> > Clear them out of
> > +	 * caution since we're about to free the xfs_attr_item.
> > +	 */
> > +	if (xfs_sb_version_hasdelattr(&dac->da_args->dp->i_mount->m_sb)) {
> > +		attrdp = (struct xfs_attrd_log_item *)done;
> 
> attrdp = ATTRD_ITEM(done)?
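The sketch below shows why ATTRD_ITEM(done) is preferable to a plain cast, using a userspace container_of() and two hypothetical layouts. A cast from the embedded struct xfs_log_item pointer yields the right address only when the log item is the first member. container_of() subtracts the member offset, so it stays correct even if fields are reordered.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the kernel's container_of() (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_log_item { int li_type; };

/* Hypothetical layouts: embedded log item first, and not first. */
struct toy_attrd_first {
	struct toy_log_item	attrd_item;
	int			payload;
};

struct toy_attrd_second {
	int			payload;
	struct toy_log_item	attrd_item;
};

static struct toy_attrd_first *ATTRD_FIRST(struct toy_log_item *lip)
{
	return container_of(lip, struct toy_attrd_first, attrd_item);
}

static struct toy_attrd_second *ATTRD_SECOND(struct toy_log_item *lip)
{
	return container_of(lip, struct toy_attrd_second, attrd_item);
}
```

With the second layout, `(struct toy_attrd_second *)lip` would point into the middle of the structure, while ATTRD_SECOND() still finds its start.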
> 
> > +		attrip = attrdp->attrd_attrip;
> > +		attrip->attri_name = NULL;
> > +		attrip->attri_value = NULL;
> > +	}
> > +
> > +	if (error != -EAGAIN)
> > +		kmem_free(attr);
> > +
> > +	return error;
> > +}
> > +
> > +/* Abort all pending ATTRs. */
> > +STATIC void
> > +xfs_attr_abort_intent(
> > +	struct xfs_log_item		*intent)
> > +{
> > +	xfs_attri_release(ATTRI_ITEM(intent));
> > +}
> > +
> > +/* Cancel an attr */
> > +STATIC void
> > +xfs_attr_cancel_item(
> > +	struct list_head		*item)
> > +{
> > +	struct xfs_attr_item		*attr;
> > +
> > +	attr = container_of(item, struct xfs_attr_item, xattri_list);
> > +	kmem_free(attr);
> > +}
> > +
> > +/*
> > + * The ATTRI is logged only once and cannot be moved in the log, so simply
> > + * return the lsn at which it's been logged.
> > + */
> > +STATIC xfs_lsn_t
> > +xfs_attri_item_committed(
> > +	struct xfs_log_item	*lip,
> > +	xfs_lsn_t		lsn)
> > +{
> > +	return lsn;
> > +}
> 
> You can omit this function because the default is "return lsn;" if you
> don't provide one.  See xfs_trans_committed_bulk.
> 
> > +
> > +STATIC void
> > +xfs_attri_item_committing(
> > +	struct xfs_log_item	*lip,
> > +	xfs_lsn_t		lsn)
> > +{
> > +}
> 
> This function isn't required if it doesn't do anything.  See
> xfs_log_commit_cil.
> 
> > +
> > +STATIC bool
> > +xfs_attri_item_match(
> > +	struct xfs_log_item	*lip,
> > +	uint64_t		intent_id)
> > +{
> > +	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
> > +}
> > +
> > +/*
> > + * When the attrd item is committed to disk, all we need to do is delete our
> > + * reference to our partner attri item and then free ourselves. Since we're
> > + * freeing ourselves we must return -1 to keep the transaction code from
> > + * further referencing this item.
> > + */
> > +STATIC xfs_lsn_t
> > +xfs_attrd_item_committed(
> > +	struct xfs_log_item	*lip,
> > +	xfs_lsn_t		lsn)
> > +{
> > +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> > +
> > +	/*
> > +	 * Drop the ATTRI reference regardless of whether the ATTRD has been
> > +	 * aborted. Once the ATTRD transaction is constructed, it is the sole
> > +	 * responsibility of the ATTRD to release the ATTRI (even if the ATTRI
> > +	 * is aborted due to log I/O error).
> > +	 */
> > +	xfs_attri_release(attrdp->attrd_attrip);
> > +	xfs_attrd_item_free(attrdp);
> > +
> > +	return NULLCOMMITLSN;
> > +}
> 
> If you set XFS_ITEM_RELEASE_WHEN_COMMITTED in the attrd item ops,
> xfs_trans_committed_bulk will call ->iop_release instead of
> ->iop_committed and you therefore don't need this function.
> 
> > +
> > +STATIC void
> > +xfs_attrd_item_committing(
> > +	struct xfs_log_item	*lip,
> > +	xfs_lsn_t		lsn)
> > +{
> > +}
> 
> Same comment as xfs_attri_item_committing.
> 
> > +
> > +
> > +/*
> > + * Allocate and initialize an attrd item
> > + */
> > +struct xfs_attrd_log_item *
> > +xfs_attrd_init(
> > +	struct xfs_mount		*mp,
> > +	struct xfs_attri_log_item	*attrip)
> > +
> > +{
> > +	struct xfs_attrd_log_item	*attrdp;
> > +	uint				size;
> > +
> > +	size = (uint)(sizeof(struct xfs_attrd_log_item));
> 
> Same comment about sizeof and size_t as in xfs_attri_init.
> 
> > +	attrdp = kmem_zalloc(size, 0);
> > +	memset(attrdp, 0, size);
> 
> No need to memset-zero something you just zalloc'd.
> 
> > +
> > +	xfs_log_item_init(mp, &attrdp->attrd_item, XFS_LI_ATTRD,
> > +			  &xfs_attrd_item_ops);
> > +	attrdp->attrd_attrip = attrip;
> > +	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
> > +
> > +	return attrdp;
> > +}
> > +
> > +/*
> > + * This routine is called to allocate an "attr free done" log item.
> > + */
> > +struct xfs_attrd_log_item *
> > +xfs_trans_get_attrd(struct xfs_trans		*tp,
> > +		  struct xfs_attri_log_item	*attrip)
> > +{
> > +	struct xfs_attrd_log_item		*attrdp;
> > +
> > +	ASSERT(tp != NULL);
> > +
> > +	attrdp = xfs_attrd_init(tp->t_mountp, attrip);
> > +	ASSERT(attrdp != NULL);
> 
> You could fold xfs_attrd_init into this function since there's only one
> caller.
> 
> > +
> > +	xfs_trans_add_item(tp, &attrdp->attrd_item);
> > +	return attrdp;
> > +}
> > +
> > +static const struct xfs_item_ops xfs_attrd_item_ops = {
> > +	.iop_size	= xfs_attrd_item_size,
> > +	.iop_format	= xfs_attrd_item_format,
> > +	.iop_release    = xfs_attrd_item_release,
> > +	.iop_committing	= xfs_attrd_item_committing,
> > +	.iop_committed	= xfs_attrd_item_committed,
> > +};
> > +
> > +
> > +/* Get an ATTRD so we can process all the attrs. */
> > +static struct xfs_log_item *
> > +xfs_attr_create_done(
> > +	struct xfs_trans		*tp,
> > +	struct xfs_log_item		*intent,
> > +	unsigned int			count)
> > +{
> > +	if (!xfs_sb_version_hasdelattr(&tp->t_mountp->m_sb))
> > +		return NULL;
> 
> This is probably better expressed as:
> 
> 	if (!intent)
> 		return NULL;
> 
> Since we don't need a log intent done item if there's no log intent
> item.
> 
> > +
> > +	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
> > +}
> > +
> > +const struct xfs_defer_op_type xfs_attr_defer_type = {
> > +	.max_items	= 1,
> > +	.create_intent	= xfs_attr_create_intent,
> > +	.abort_intent	= xfs_attr_abort_intent,
> > +	.create_done	= xfs_attr_create_done,
> > +	.finish_item	= xfs_attr_finish_item,
> > +	.cancel_item	= xfs_attr_cancel_item,
> > +};
> > +
> > +/*
> > + * Process an attr intent item that was recovered from the log.  We need to
> > + * delete the attr that it describes.
> > + */
> > +STATIC int
> > +xfs_attri_item_recover(
> > +	struct xfs_log_item		*lip,
> > +	struct list_head		*capture_list)
> > +{
> > +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> > +	struct xfs_mount		*mp = lip->li_mountp;
> > +	struct xfs_inode		*ip;
> > +	struct xfs_da_args		args;
> > +	struct xfs_attri_log_format	*attrp;
> > +	int				error;
> > +
> > +	/*
> > +	 * First check the validity of the attr described by the ATTRI.  If any
> > +	 * are bad, then assume that all are bad and just toss the ATTRI.
> > +	 */
> > +	attrp = &attrip->attri_format;
> > +	if (!(attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET ||
> > +	      attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE) ||
> > +	    (attrp->alfi_value_len > XATTR_SIZE_MAX) ||
> > +	    (attrp->alfi_name_len > XATTR_NAME_MAX) ||
> > +	    (attrp->alfi_name_len == 0)) {
> 
> This needs to call xfs_verify_ino() on attrp->alfi_ino.
> 
> This also needs to check for xfs_sb_version_hasdelattr().
> 
> I would refactor this into a separate validation predicate to eliminate
> the multi-line if statement.  I will post a series cleaning up the other
> log items' recover functions shortly.
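Below is a sketch of the suggested validation predicate, written as standalone C so it compiles outside the kernel. The struct, the op-code values and verify_ino() are hypothetical stand-ins. The real checks would use struct xfs_attri_log_format, XFS_ATTR_OP_FLAGS_* and xfs_verify_ino(). XATTR_NAME_MAX and XATTR_SIZE_MAX use the values from <linux/limits.h>:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Linux xattr limits, as defined in <linux/limits.h>. */
#define XATTR_NAME_MAX	255
#define XATTR_SIZE_MAX	65536

/* Hypothetical op codes standing in for XFS_ATTR_OP_FLAGS_SET/REMOVE. */
enum { ATTR_OP_SET = 1, ATTR_OP_REMOVE = 2 };

/* Hypothetical subset of the ATTRI log format fields checked at recovery. */
struct attri_fmt {
	uint64_t ino;
	uint32_t op_flags;
	uint32_t name_len;
	uint32_t value_len;
};

/*
 * Stand-in for xfs_verify_ino(): the real helper checks the inode number
 * against the filesystem geometry; this one only rejects 0 and anything
 * above a caller-supplied maximum.
 */
static bool verify_ino(uint64_t ino, uint64_t max_ino)
{
	return ino != 0 && ino <= max_ino;
}

/* One predicate instead of a multi-line if in the recover function. */
static bool attri_validate(bool has_delattr, uint64_t max_ino,
			   const struct attri_fmt *f)
{
	if (!has_delattr)
		return false;
	if (f->op_flags != ATTR_OP_SET && f->op_flags != ATTR_OP_REMOVE)
		return false;
	if (f->value_len > XATTR_SIZE_MAX)
		return false;
	if (f->name_len == 0 || f->name_len > XATTR_NAME_MAX)
		return false;
	return verify_ino(f->ino, max_ino);
}
```

The recover function could then start with a single `if (!attri_validate(...)) return -EFSCORRUPTED;`, and the same predicate is a natural home for any later check, such as the value_len-on-remove question raised further down.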
> 
> > +		/*
> > +		 * This will pull the ATTRI from the AIL and free the memory
> > +		 * associated with it.
> > +		 */
> > +		xfs_attri_release(attrip);
> 
> No need to call xfs_attri_release; one of the 5.10 cleanups was to
> recognize that the log recovery code does this for you automatically.
> 
> > +		return -EFSCORRUPTED;
> > +	}
> > +
> > +	error = xfs_iget(mp, 0, attrp->alfi_ino, 0, 0, &ip);
> > +	if (error)
> > +		return error;
> 
> I /think/ this needs to call xfs_qm_dqattach here, for reasons I'll get
> into shortly.
> 
> In the meantime, this /definitely/ needs to do:
> 
> 	if (VFS_I(ip)->i_nlink == 0)
> 		xfs_iflags_set(ip, XFS_IRECOVERY);
> 
> Because the IRECOVERY flag prevents inode inactivation from triggering
> on an unlinked inode while we're still performing log recovery.
> 
> If you want to steal the xlog_recover_iget helper from the atomic
> swapext series[0] please feel free. :)
> 
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?id=51e23b9c9d9674a78dc97c5848c9efb4461e074d
> 
> > +	memset(&args, 0, sizeof(args));
> > +	args.dp = ip;
> > +	args.name = attrip->attri_name;
> > +	args.namelen = attrp->alfi_name_len;
> > +	args.attr_filter = attrp->alfi_attr_flags;
> > +	if (attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET) {
> > +		args.value = attrip->attri_value;
> > +		args.valuelen = attrp->alfi_value_len;
> > +	}
> > +
> > +	error = xfs_attr_set(&args);
> 
> Er...

Err... silly /me started a comment and then forgot to come back to it.

Log intent item recovery functions are "special".  The intent items that
are recovered from the log represent all the committed state of the log
at the point that the system went down.  For each recovered intent, we
have to finish exactly that one step of work before we can move on to
any work that would have happened after a transaction roll.

Maybe an example would help here: Let's say that two threads (a) and (b)
each create a transaction, each log an intent item (we'll call them A
and B respectively) and commit.  Let's say that the system goes down
immediately after both commits are persisted but before anything else
can happen.

Let us further presume that A is a multi-step transaction, and that the
next step of A (call it A1) requires a resource that B currently has
locked for update.  Normally, thread (a) will be blocked from making
update A1 until B commits and thread (b) unlocks that resource, which
means that the commit order will be A -> B -> A1.

Now let's look at log recovery.  We recover A and B from the log.  The
data dependency between B and A1 still exists, but the log does not
capture enough information to know about that dependency.  In order to
ensure that log replay occurs in exactly the same order that it would
have had the system not gone down, XFS single-steps through the
recovered items and captures the "next steps" for later replay.

Going back to our example, log recovery will replay A and must notice
that recover(A) queued the unfinished work A1.  It saves A1 for later in
the xfs_defer_capture machinery.  Then it recovers B, and only then can
it go back to A1 and finish it.

Concretely, this means that you can't call xfs_attr_set here, because it
creates a transaction and commits it, which potentially completes a
bunch of work items that might have had dependencies on the other things
that were recovered from the log.  I don't think xattrs actually /have/
any such dependencies, but it's easier to reason about log recovery if
all the recovery functions behave the same way.

This means that this recovery function has to behave in this manner:

	xfs_iget(..., &ip);
	xfs_trans_alloc(&tp);
	xfs_trans_get_attrd(tp, attrip);
	xfs_ilock(ip...);
	xfs_trans_attr(...);
	if (there's more work) {
		create a new defer item from the onstack &args
		link it to the transaction
	}

	xfs_defer_ops_capture_and_commit(tp, ip, capture_list);
	<unlock and release inodes>

Or put another way, if xfs_trans_attr returns -EAGAIN to tell us that
there's more work to do, we have to create an incore defer ops item,
attach it to the transaction, and let the defer capture mechanism save
it for later.
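The ordering rule above can be modelled in a few lines of userspace C. This is a hypothetical simulation, not XFS code: struct work, finish_one_step() and recover_all() are made up, and the capture array stands in for the xfs_defer_capture list. Each recovered intent gets exactly one step. A step that returns -EAGAIN hands back its follow-up work, and that work only runs after every recovered intent has had its turn:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical unit of recovered work; not an XFS structure. */
struct work {
	const char *name;
	const char *next;	/* follow-up step, or NULL when finished */
};

#define MAX_CAPTURE 8

/* Append a step name to the trace so the test can check ordering. */
static void trace_step(char *trace, size_t tracesz, const char *name)
{
	size_t used = strlen(trace);

	snprintf(trace + used, tracesz - used, "%s ", name);
}

/*
 * Finish exactly one step.  If more work remains, return the follow-up in
 * *cont and -EAGAIN, the way xfs_trans_attr() signals unfinished work.
 */
static int finish_one_step(const struct work *w, struct work *cont,
			   char *trace, size_t tracesz)
{
	trace_step(trace, tracesz, w->name);
	if (w->next) {
		cont->name = w->next;
		cont->next = NULL;
		return -EAGAIN;
	}
	return 0;
}

static void recover_all(const struct work *intents, int n,
			char *trace, size_t tracesz)
{
	struct work capture[MAX_CAPTURE];
	int ncap = 0;
	int i;

	trace[0] = '\0';

	/* Pass 1: one step per recovered intent; capture, don't finish. */
	for (i = 0; i < n; i++) {
		struct work cont;

		if (finish_one_step(&intents[i], &cont, trace, tracesz) == -EAGAIN) {
			assert(ncap < MAX_CAPTURE);
			capture[ncap++] = cont;
		}
	}

	/* Pass 2: only now run the captured follow-up work, in order. */
	for (i = 0; i < ncap; i++) {
		struct work cont;

		if (finish_one_step(&capture[i], &cont, trace, tracesz) == -EAGAIN) {
			assert(ncap < MAX_CAPTURE);
			capture[ncap++] = cont;
		}
	}
}
```

With intents A (follow-up A1) and B, the trace reads "A B A1 ", which matches the A -> B -> A1 order from the example. Finishing A1 inline inside recover(A), which is effectively what calling xfs_attr_set() does, would produce A -> A1 -> B instead.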

Some day we'll figure out how to encode those data dependencies in the
ondisk log (Dave speculated a while back that it might be as simple as
encoding the transaction LSN in the intent ids instead of raw pointers
so that we can reconstruct which intents came from where) but for now
this is the (less) clunky way we do it.

Oh, and also it's necessary to attach dquots to any inode involved in
log recovery, unless xfs_trans_attr already does that for us(?)

--D

> 
> > +
> > +	xfs_attri_release(attrip);
> 
> The transaction commit will take care of releasing attrip.
> 
> > +	xfs_irele(ip);
> > +	return error;
> > +}
> > +
> > +static const struct xfs_item_ops xfs_attri_item_ops = {
> > +	.iop_size	= xfs_attri_item_size,
> > +	.iop_format	= xfs_attri_item_format,
> > +	.iop_unpin	= xfs_attri_item_unpin,
> > +	.iop_committed	= xfs_attri_item_committed,
> > +	.iop_committing = xfs_attri_item_committing,
> > +	.iop_release    = xfs_attri_item_release,
> > +	.iop_recover	= xfs_attri_item_recover,
> > +	.iop_match	= xfs_attri_item_match,
> 
> This needs an ->iop_relog method so that we can relog the attri log item
> if the log starts to fill up.
> 
> > +};
> > +
> > +
> > +
> > +STATIC int
> > +xlog_recover_attri_commit_pass2(
> > +	struct xlog                     *log,
> > +	struct list_head		*buffer_list,
> > +	struct xlog_recover_item        *item,
> > +	xfs_lsn_t                       lsn)
> > +{
> > +	int                             error;
> > +	struct xfs_mount                *mp = log->l_mp;
> > +	struct xfs_attri_log_item       *attrip;
> > +	struct xfs_attri_log_format     *attri_formatp;
> > +	char				*name = NULL;
> > +	char				*value = NULL;
> > +	int				region = 0;
> > +
> > +	attri_formatp = item->ri_buf[region].i_addr;
> 
> Please check the __pad field for zeroes here.
> 
> > +	attrip = xfs_attri_init(mp);
> > +	error = xfs_attri_copy_format(&item->ri_buf[region],
> > +				      &attrip->attri_format);
> > +	if (error) {
> > +		xfs_attri_item_free(attrip);
> > +		return error;
> > +	}
> > +
> > +	attrip->attri_name_len = attri_formatp->alfi_name_len;
> > +	attrip->attri_value_len = attri_formatp->alfi_value_len;
> > +	attrip = krealloc(attrip, sizeof(struct xfs_attri_log_item) +
> > +			  attrip->attri_name_len + attrip->attri_value_len,
> > +			  GFP_NOFS | __GFP_NOFAIL);
> > +
> > +	ASSERT(attrip->attri_name_len > 0);
> 
> If attri_name_len is zero, reject the whole thing with EFSCORRUPTED.
> 
> > +	region++;
> > +	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
> > +	memcpy(name, item->ri_buf[region].i_addr,
> > +	       attrip->attri_name_len);
> > +	attrip->attri_name = name;
> > +
> > +	if (attrip->attri_value_len > 0) {
> > +		region++;
> > +		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
> > +			attrip->attri_name_len;
> > +		memcpy(value, item->ri_buf[region].i_addr,
> > +			attrip->attri_value_len);
> > +		attrip->attri_value = value;
> > +	}
> 
> Question: is it valid for an attri item to have value_len > 0 for an
> XFS_ATTR_OP_FLAGS_REMOVE operation?
> 
> Granted, that level of validation might be better left to the _recover
> function.
> 
> > +
> > +	/*
> > +	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
> > +	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
> > +	 * directly and drop the ATTRI reference. Note that
> > +	 * xfs_trans_ail_update() drops the AIL lock.
> > +	 */
> > +	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
> > +	xfs_attri_release(attrip);
> > +	return 0;
> > +}
> > +
> > +const struct xlog_recover_item_ops xlog_attri_item_ops = {
> > +	.item_type	= XFS_LI_ATTRI,
> > +	.commit_pass2	= xlog_recover_attri_commit_pass2,
> > +};
> > +
> > +/*
> > + * This routine is called when an ATTRD format structure is found in a committed
> > + * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
> > + * it was still in the log. To do this it searches the AIL for the ATTRI with
> > + * an id equal to that in the ATTRD format structure. If we find it we drop
> > + * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
> > + */
> > +STATIC int
> > +xlog_recover_attrd_commit_pass2(
> > +	struct xlog			*log,
> > +	struct list_head		*buffer_list,
> > +	struct xlog_recover_item	*item,
> > +	xfs_lsn_t			lsn)
> > +{
> > +	struct xfs_attrd_log_format	*attrd_formatp;
> > +
> > +	attrd_formatp = item->ri_buf[0].i_addr;
> > +	ASSERT((item->ri_buf[0].i_len ==
> > +				(sizeof(struct xfs_attrd_log_format))));
> > +
> > +	xlog_recover_release_intent(log, XFS_LI_ATTRI,
> > +				    attrd_formatp->alfd_alf_id);
> > +	return 0;
> > +}
> > +
> > +const struct xlog_recover_item_ops xlog_attrd_item_ops = {
> > +	.item_type	= XFS_LI_ATTRD,
> > +	.commit_pass2	= xlog_recover_attrd_commit_pass2,
> > +};
> > diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
> > new file mode 100644
> > index 0000000..7dd2572
> > --- /dev/null
> > +++ b/fs/xfs/xfs_attr_item.h
> > @@ -0,0 +1,76 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later
> > + *
> > + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> > + * Author: Allison Collins <allison.henderson@oracle.com>
> > + */
> > +#ifndef	__XFS_ATTR_ITEM_H__
> > +#define	__XFS_ATTR_ITEM_H__
> > +
> > +/* kernel only ATTRI/ATTRD definitions */
> > +
> > +struct xfs_mount;
> > +struct kmem_zone;
> > +
> > +/*
> > + * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
> > + */
> > +#define	XFS_ATTRI_RECOVERED	1
> > +
> > +
> > +/* iovec length must be 32-bit aligned */
> > +#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
> > +				size + sizeof(int32_t) - \
> > +				(size % sizeof(int32_t)))
> 
> Can you turn this into a static inline helper?
> 
> And use one of the roundup() variants to ensure the proper alignment
> instead of this open-coded stuff? :)
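Here is one way to write the static inline helper. The name xfs_attr_iovec_size() is hypothetical. In the kernel, the body would simply be round_up(size, sizeof(int32_t)); the arithmetic is spelled out here so the sketch builds in userspace. The original macro is copied alongside for comparison. It does not parenthesize its argument. It also only special-cases a size of exactly 4, so it returns 12 rather than 8 for size == 8, and 4 for size == 0. A helper that really rounds up would therefore also change the result for lengths that are already multiples of 4, other than 4 itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the patch, for comparison only. */
#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
				size + sizeof(int32_t) - \
				(size % sizeof(int32_t)))

/* iovec length must be 32-bit aligned: round up to a multiple of 4. */
static inline size_t xfs_attr_iovec_size(size_t size)
{
	const size_t align = sizeof(int32_t);

	return (size + align - 1) / align * align;
}
```

Being a function, the helper also evaluates its argument once and cannot be broken by operator precedence at the call site.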
> 
> > +
> > +/*
> > + * This is the "attr intention" log item.  It is used to log the fact that some
> > + * attribute operations need to be processed.  An operation is currently either
> > + * a set or remove.  Set or remove operations are described by the xfs_attr_item
> > + * which may be logged to this intent.  Intents are used in conjunction with the
> > + * "attr done" log item described below.
> > + *
> > + * The ATTRI is reference counted so that it is not freed prior to both the
> > + * ATTRI and ATTRD being committed and unpinned. This ensures the ATTRI is
> > + * inserted into the AIL even in the event of out of order ATTRI/ATTRD
> > + * processing. In other words, an ATTRI is born with two references:
> > + *
> > + *      1.) an ATTRI held reference to track ATTRI AIL insertion
> > + *      2.) an ATTRD held reference to track ATTRD commit
> > + *
> > + * On allocation, both references are the responsibility of the caller. Once the
> > + * ATTRI is added to and dirtied in a transaction, ownership of reference one
> > + * transfers to the transaction. The reference is dropped once the ATTRI is
> > + * inserted to the AIL or in the event of failure along the way (e.g., commit
> > + * failure, log I/O error, etc.). Note that the caller remains responsible for
> > + * the ATTRD reference under all circumstances to this point. The caller has no
> > + * means to detect failure once the transaction is committed, however.
> > + * Therefore, an ATTRD is required after this point, even in the event of
> > + * unrelated failure.
> > + *
> > + * Once an ATTRD is allocated and dirtied in a transaction, reference two
> > + * transfers to the transaction. The ATTRD reference is dropped once it reaches
> > + * the unpin handler. Similar to the ATTRI, the reference also drops in the
> > + * event of commit failure or log I/O errors. Note that the ATTRD is not
> > + * inserted in the AIL, so at this point both the ATTRI and ATTRD are freed.
> 
> I don't think it's necessary to document the entire log intent/log done
> refcount state machine here; it'll do to record just the bits that are
> specific to delayed xattr operations.
> 
> > + */
> > +struct xfs_attri_log_item {
> > +	struct xfs_log_item		attri_item;
> > +	atomic_t			attri_refcount;
> > +	int				attri_name_len;
> > +	void				*attri_name;
> > +	int				attri_value_len;
> > +	void				*attri_value;
> 
> Please compress this structure a bit by moving the two pointers to be
> adjacent instead of interspersed with ints.
> 
> Ok, now on to digesting the new state machine...
> 
> --D
> 
> > +	struct xfs_attri_log_format	attri_format;
> > +};
> > +
> > +/*
> > + * This is the "attr done" log item.  It is used to log the fact that some attrs
> > + * earlier mentioned in an attri item have been freed.
> > + */
> > +struct xfs_attrd_log_item {
> > +	struct xfs_attri_log_item	*attrd_attrip;
> > +	struct xfs_log_item		attrd_item;
> > +	struct xfs_attrd_log_format	attrd_format;
> > +};
> > +
> > +#endif	/* __XFS_ATTR_ITEM_H__ */
> > diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> > index 8f8837f..d7787a5 100644
> > --- a/fs/xfs/xfs_attr_list.c
> > +++ b/fs/xfs/xfs_attr_list.c
> > @@ -15,6 +15,7 @@
> >  #include "xfs_inode.h"
> >  #include "xfs_trans.h"
> >  #include "xfs_bmap.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_attr_sf.h"
> >  #include "xfs_attr_leaf.h"
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 3fbd98f..d5d1959 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -15,6 +15,8 @@
> >  #include "xfs_iwalk.h"
> >  #include "xfs_itable.h"
> >  #include "xfs_error.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_bmap.h"
> >  #include "xfs_bmap_util.h"
> > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > index c1771e7..62e1534 100644
> > --- a/fs/xfs/xfs_ioctl32.c
> > +++ b/fs/xfs/xfs_ioctl32.c
> > @@ -17,6 +17,8 @@
> >  #include "xfs_itable.h"
> >  #include "xfs_fsops.h"
> >  #include "xfs_rtalloc.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_ioctl.h"
> >  #include "xfs_ioctl32.h"
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index 5e16545..5ecc76c 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -13,6 +13,8 @@
> >  #include "xfs_inode.h"
> >  #include "xfs_acl.h"
> >  #include "xfs_quota.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_trans.h"
> >  #include "xfs_trace.h"
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index fa2d05e..3457f22 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -1993,6 +1993,10 @@ xlog_print_tic_res(
> >  	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
> >  	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
> >  	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
> > +	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
> > +	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
> > +	    REG_TYPE_STR(ATTR_NAME, "attr_name"),
> > +	    REG_TYPE_STR(ATTR_VALUE, "attr_value"),
> >  	};
> >  	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
> >  #undef REG_TYPE_STR
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index a8289ad..cb951cd 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -1775,6 +1775,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
> >  	&xlog_cud_item_ops,
> >  	&xlog_bui_item_ops,
> >  	&xlog_bud_item_ops,
> > +	&xlog_attri_item_ops,
> > +	&xlog_attrd_item_ops,
> >  };
> >  
> >  static const struct xlog_recover_item_ops *
> > diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> > index 0aa87c2..bc9c25e 100644
> > --- a/fs/xfs/xfs_ondisk.h
> > +++ b/fs/xfs/xfs_ondisk.h
> > @@ -132,6 +132,8 @@ xfs_check_ondisk_structs(void)
> >  	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
> >  	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
> >  	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
> > +	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
> > +	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
> >  
> >  	/*
> >  	 * The v5 superblock format extended several v4 header structures with
> > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > index bca48b3..9b0c790 100644
> > --- a/fs/xfs/xfs_xattr.c
> > +++ b/fs/xfs/xfs_xattr.c
> > @@ -10,6 +10,7 @@
> >  #include "xfs_log_format.h"
> >  #include "xfs_da_format.h"
> >  #include "xfs_inode.h"
> > +#include "xfs_da_btree.h"
> >  #include "xfs_attr.h"
> >  #include "xfs_acl.h"
> >  #include "xfs_da_btree.h"
> > -- 
> > 2.7.4
> > 

* Re: [PATCH v13 09/10] xfs: Remove unused xfs_attr_*_args
  2020-11-10 20:07   ` Darrick J. Wong
@ 2020-11-13  1:27     ` Allison Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-11-13  1:27 UTC
  To: Darrick J. Wong; +Cc: linux-xfs



On 11/10/20 1:07 PM, Darrick J. Wong wrote:
> On Thu, Oct 22, 2020 at 11:34:34PM -0700, Allison Henderson wrote:
>> Remove xfs_attr_set_args, xfs_attr_remove_args, and xfs_attr_trans_roll.
>> These high level loops are now driven by the delayed operations code,
>> and can be removed.
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c        | 97 +----------------------------------------
>>   fs/xfs/libxfs/xfs_attr.h        |  9 ++--
>>   fs/xfs/libxfs/xfs_attr_remote.c |  4 +-
>>   3 files changed, 5 insertions(+), 105 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index edd5d10..b5e1e84 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -262,65 +262,6 @@ xfs_attr_set_shortform(
>>   }
>>   
>>   /*
>> - * Checks to see if a delayed attribute transaction should be rolled.  If so,
>> - * also checks for a defer finish.  Transaction is finished and rolled as
>> - * needed, and returns true of false if the delayed operation should continue.
>> - */
>> -STATIC int
>> -xfs_attr_trans_roll(
>> -	struct xfs_delattr_context	*dac)
>> -{
>> -	struct xfs_da_args		*args = dac->da_args;
>> -	int				error = 0;
>> -
>> -	if (dac->flags & XFS_DAC_DEFER_FINISH) {
>> -		/*
>> -		 * The caller wants us to finish all the deferred ops so that we
>> -		 * avoid pinning the log tail with a large number of deferred
>> -		 * ops.
>> -		 */
>> -		dac->flags &= ~XFS_DAC_DEFER_FINISH;
>> -		error = xfs_defer_finish(&args->trans);
>> -		if (error)
>> -			return error;
>> -	}
>> -
>> -	return xfs_trans_roll_inode(&args->trans, args->dp);
>> -}
>> -
>> -/*
>> - * Set the attribute specified in @args.
>> - */
>> -int
>> -xfs_attr_set_args(
>> -	struct xfs_da_args	*args)
>> -{
>> -	struct xfs_buf			*leaf_bp = NULL;
>> -	int				error = 0;
>> -	struct xfs_delattr_context	dac = {
>> -		.da_args	= args,
>> -	};
>> -
>> -	do {
>> -		error = xfs_attr_set_iter(&dac, &leaf_bp);
> 
> Now that there's only one caller of xfs_attr_set_iter and it passes
> &dac->leaf_bp, I think you can get rid of this second parameter, right?
> 
> It's nice to see so much code disappear now that we track attr
> operations with deferred ops.  Everything else looks ok here. :)
> 
> --D

Sure, I will collapse that parameter in then.

Allison
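For context on why these loops can go: the removed xfs_attr_set_args() and xfs_attr_remove_args() called the _iter function, rolled the transaction on -EAGAIN, and tried again. Below is a rough userspace model of that pattern, which the deferred-ops code now provides generically. The names delattr_ctx, attr_iter(), trans_roll() and defer_finish_one() are hypothetical. The real xfs_defer code also logs a fresh intent item before each roll, which this sketch leaves out:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for xfs_delattr_context. */
struct delattr_ctx {
	int state;	/* steps completed so far */
	int nsteps;	/* total steps this operation needs */
	int rolls;	/* how many times the "transaction" was rolled */
};

/* One step of work; -EAGAIN means "roll the transaction and call me again". */
static int attr_iter(struct delattr_ctx *dac)
{
	dac->state++;
	return dac->state < dac->nsteps ? -EAGAIN : 0;
}

/* Stand-in for xfs_trans_roll_inode(); always succeeds in this model. */
static int trans_roll(struct delattr_ctx *dac)
{
	dac->rolls++;
	return 0;
}

/*
 * Generic driver: keep calling the finish function while it asks for a
 * fresh transaction, rolling in between.  The attr code itself no longer
 * needs to know how the roll happens.
 */
static int defer_finish_one(struct delattr_ctx *dac,
			    int (*finish)(struct delattr_ctx *))
{
	int error;

	while ((error = finish(dac)) == -EAGAIN) {
		error = trans_roll(dac);
		if (error)
			return error;
	}
	return error;
}
```

A three-step operation is rolled twice: once after each -EAGAIN, and not after the final step.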

> 
>> -		if (error != -EAGAIN)
>> -			break;
>> -
>> -		error = xfs_attr_trans_roll(&dac);
>> -		if (error)
>> -			return error;
>> -
>> -		if (leaf_bp) {
>> -			xfs_trans_bjoin(args->trans, leaf_bp);
>> -			xfs_trans_bhold(args->trans, leaf_bp);
>> -		}
>> -
>> -	} while (true);
>> -
>> -	return error;
>> -}
>> -
>> -/*
>>    * Set the attribute specified in @args.
>>    * This routine is meant to function as a delayed operation, and may return
>>    * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
>> @@ -363,11 +304,7 @@ xfs_attr_set_iter(
>>   		 * continue.  Otherwise, is it converted from shortform to leaf
>>   		 * and -EAGAIN is returned.
>>   		 */
>> -		error = xfs_attr_set_shortform(args, leaf_bp);
>> -		if (error == -EAGAIN)
>> -			dac->flags |= XFS_DAC_DEFER_FINISH;
>> -
>> -		return error;
>> +		return xfs_attr_set_shortform(args, leaf_bp);
>>   	}
>>   
>>   	/*
>> @@ -398,7 +335,6 @@ xfs_attr_set_iter(
>>   			 * same state (inode locked and joined, transaction
>>   			 * clean) no matter how we got to this step.
>>   			 */
>> -			dac->flags |= XFS_DAC_DEFER_FINISH;
>>   			return -EAGAIN;
>>   		case 0:
>>   			dac->dela_state = XFS_DAS_FOUND_LBLK;
>> @@ -455,32 +391,6 @@ xfs_has_attr(
>>   
>>   /*
>>    * Remove the attribute specified in @args.
>> - */
>> -int
>> -xfs_attr_remove_args(
>> -	struct xfs_da_args	*args)
>> -{
>> -	int				error = 0;
>> -	struct xfs_delattr_context	dac = {
>> -		.da_args	= args,
>> -	};
>> -
>> -	do {
>> -		error = xfs_attr_remove_iter(&dac);
>> -		if (error != -EAGAIN)
>> -			break;
>> -
>> -		error = xfs_attr_trans_roll(&dac);
>> -		if (error)
>> -			return error;
>> -
>> -	} while (true);
>> -
>> -	return error;
>> -}
>> -
>> -/*
>> - * Remove the attribute specified in @args.
>>    *
>>    * This function may return -EAGAIN to signal that the transaction needs to be
>>    * rolled.  Callers should continue calling this function until they receive a
>> @@ -895,7 +805,6 @@ xfs_attr_leaf_addname(
>>   		if (error)
>>   			return error;
>>   
>> -		dac->flags |= XFS_DAC_DEFER_FINISH;
>>   		return -EAGAIN;
>>   	}
>>   
>> @@ -1192,7 +1101,6 @@ xfs_attr_node_addname(
>>   			 * Restart routine from the top.  No need to set  the
>>   			 * state
>>   			 */
>> -			dac->flags |= XFS_DAC_DEFER_FINISH;
>>   			return -EAGAIN;
>>   		}
>>   
>> @@ -1205,7 +1113,6 @@ xfs_attr_node_addname(
>>   		error = xfs_da3_split(state);
>>   		if (error)
>>   			goto out;
>> -		dac->flags |= XFS_DAC_DEFER_FINISH;
>>   	} else {
>>   		/*
>>   		 * Addition succeeded, update Btree hashvals.
>> @@ -1246,7 +1153,6 @@ xfs_attr_node_addname(
>>   			if (error)
>>   				return error;
>>   
>> -			dac->flags |= XFS_DAC_DEFER_FINISH;
>>   			dac->dela_state = XFS_DAS_ALLOC_NODE;
>>   			return -EAGAIN;
>>   		}
>> @@ -1516,7 +1422,6 @@ xfs_attr_node_remove_step(
>>   		if (error)
>>   			return error;
>>   
>> -		dac->flags |= XFS_DAC_DEFER_FINISH;
>>   		dac->dela_state = XFS_DAS_RM_SHRINK;
>>   		return -EAGAIN;
>>   	}
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index 8a08411..6d90301 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -244,10 +244,9 @@ enum xfs_delattr_state {
>>   /*
>>    * Defines for xfs_delattr_context.flags
>>    */
>> -#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>> -#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>> -#define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
>> -#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
>> +#define XFS_DAC_NODE_RMVNAME_INIT	0x01 /* xfs_attr_node_removename init */
>> +#define XFS_DAC_LEAF_ADDNAME_INIT	0x02 /* xfs_attr_leaf_addname init*/
>> +#define XFS_DAC_DELAYED_OP_INIT		0x04 /* delayed operations init*/
>>   
>>   /*
>>    * Context used for keeping track of delayed attribute operations
>> @@ -297,11 +296,9 @@ int xfs_inode_hasattr(struct xfs_inode *ip);
>>   int xfs_attr_get_ilocked(struct xfs_da_args *args);
>>   int xfs_attr_get(struct xfs_da_args *args);
>>   int xfs_attr_set(struct xfs_da_args *args);
>> -int xfs_attr_set_args(struct xfs_da_args *args);
>>   int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>>   		      struct xfs_buf **leaf_bp);
>>   int xfs_has_attr(struct xfs_da_args *args);
>> -int xfs_attr_remove_args(struct xfs_da_args *args);
>>   int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>>   bool xfs_attr_namecheck(const void *name, size_t length);
>>   void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
>> index 45c4bc5..262d1870 100644
>> --- a/fs/xfs/libxfs/xfs_attr_remote.c
>> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
>> @@ -751,10 +751,8 @@ xfs_attr_rmtval_remove(
>>   	if (error)
>>   		return error;
>>   
>> -	if (!done) {
>> -		dac->flags |= XFS_DAC_DEFER_FINISH;
>> +	if (!done)
>>   		return -EAGAIN;
>> -	}
>>   
>>   	return error;
>>   }
>> -- 
>> 2.7.4
>>

* Re: [PATCH v13 07/10] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR
  2020-11-10 20:10   ` Darrick J. Wong
@ 2020-11-13  1:27     ` Allison Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-11-13  1:27 UTC
  To: Darrick J. Wong; +Cc: linux-xfs



On 11/10/20 1:10 PM, Darrick J. Wong wrote:
> On Thu, Oct 22, 2020 at 11:34:32PM -0700, Allison Henderson wrote:
>> This patch adds a new feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR which
>> can be used to control turning on/off delayed attributes
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_format.h | 8 ++++++--
>>   fs/xfs/libxfs/xfs_fs.h     | 1 +
>>   fs/xfs/libxfs/xfs_sb.c     | 2 ++
>>   fs/xfs/xfs_super.c         | 3 +++
>>   4 files changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index d419c34..18b41a7 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -483,7 +483,9 @@ xfs_sb_has_incompat_feature(
>>   	return (sbp->sb_features_incompat & feature) != 0;
>>   }
>>   
>> -#define XFS_SB_FEAT_INCOMPAT_LOG_ALL 0
>> +#define XFS_SB_FEAT_INCOMPAT_LOG_DELATTR   (1 << 0)	/* Delayed Attributes */
>> +#define XFS_SB_FEAT_INCOMPAT_LOG_ALL \
>> +	(XFS_SB_FEAT_INCOMPAT_LOG_DELATTR)
>>   #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_LOG_ALL
>>   static inline bool
>>   xfs_sb_has_incompat_log_feature(
>> @@ -586,7 +588,9 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
>>   
>>   static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)
>>   {
>> -	return false;
>> +	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
>> +		(sbp->sb_features_log_incompat &
>> +		XFS_SB_FEAT_INCOMPAT_LOG_DELATTR));
> 
> This change and the EXPERIMENTAL warning should go in whichever patch
> defines xfs_sb_version_hasdelattr.

Sure, will move this to patch 5

> 
>>   }
>>   
>>   /*
>> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
>> index 2a2e3cf..f703d95 100644
>> --- a/fs/xfs/libxfs/xfs_fs.h
>> +++ b/fs/xfs/libxfs/xfs_fs.h
>> @@ -250,6 +250,7 @@ typedef struct xfs_fsop_resblks {
>>   #define XFS_FSOP_GEOM_FLAGS_RMAPBT	(1 << 19) /* reverse mapping btree */
>>   #define XFS_FSOP_GEOM_FLAGS_REFLINK	(1 << 20) /* files can share blocks */
>>   #define XFS_FSOP_GEOM_FLAGS_BIGTIME	(1 << 21) /* 64-bit nsec timestamps */
>> +#define XFS_FSOP_GEOM_FLAGS_DELATTR	(1 << 22) /* delayed attributes	    */
>>   
>>   /*
>>    * Minimum and maximum sizes need for growth checks.
>> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
>> index 5aeafa5..a0ec327 100644
>> --- a/fs/xfs/libxfs/xfs_sb.c
>> +++ b/fs/xfs/libxfs/xfs_sb.c
>> @@ -1168,6 +1168,8 @@ xfs_fs_geometry(
>>   		geo->flags |= XFS_FSOP_GEOM_FLAGS_REFLINK;
>>   	if (xfs_sb_version_hasbigtime(sbp))
>>   		geo->flags |= XFS_FSOP_GEOM_FLAGS_BIGTIME;
>> +	if (xfs_sb_version_hasdelattr(sbp))
>> +		geo->flags |= XFS_FSOP_GEOM_FLAGS_DELATTR;
> 
> These changes to the geometry ioctl should be a separate patch.
> 
> IOWs, the only change in this patch should be adding
> XFS_SB_FEAT_INCOMPAT_LOG_DELATTR to the _ALL #define.
Ok, will break out.

Allison

> 
> --D
> 
>>   	if (xfs_sb_version_hassector(sbp))
>>   		geo->logsectsize = sbp->sb_logsectsize;
>>   	else
>> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
>> index d1b5f2d..bb85884 100644
>> --- a/fs/xfs/xfs_super.c
>> +++ b/fs/xfs/xfs_super.c
>> @@ -1580,6 +1580,9 @@ xfs_fc_fill_super(
>>   	if (xfs_sb_version_hasinobtcounts(&mp->m_sb))
>>   		xfs_warn(mp,
>>    "EXPERIMENTAL inode btree counters feature in use. Use at your own risk!");
>> +	if (xfs_sb_version_hasdelattr(&mp->m_sb))
>> +		xfs_alert(mp,
>> +	"EXPERIMENTAL delayed attrs feature enabled. Use at your own risk!");
>>   
>>   	error = xfs_mountfs(mp);
>>   	if (error)
>> -- 
>> 2.7.4
>>
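Adding XFS_SB_FEAT_INCOMPAT_LOG_DELATTR to the _ALL mask is what stops this kernel from treating the bit as unknown. A kernel built without it would still see the bit under _UNKNOWN and would be expected to refuse to work with such a log. Below is a cut-down sketch of the two checks. struct sb_sketch and both helpers are hypothetical userspace stand-ins for struct xfs_sb and the xfs_sb_* helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Macro values mirror the patch; the struct is a stand-in for xfs_sb. */
#define XFS_SB_VERSION_5			5
#define XFS_SB_FEAT_INCOMPAT_LOG_DELATTR	(1U << 0)
#define XFS_SB_FEAT_INCOMPAT_LOG_ALL	\
	(XFS_SB_FEAT_INCOMPAT_LOG_DELATTR)
#define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_LOG_ALL

struct sb_sketch {
	unsigned int	version;		/* XFS_SB_VERSION_NUM(sbp) */
	uint32_t	features_log_incompat;
};

/* Any bit outside _ALL is one this code does not understand. */
static bool sb_has_unknown_log_incompat(const struct sb_sketch *sb)
{
	return (sb->features_log_incompat &
		XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN) != 0;
}

/* Same shape as the proposed xfs_sb_version_hasdelattr(). */
static bool sb_has_delattr(const struct sb_sketch *sb)
{
	return sb->version == XFS_SB_VERSION_5 &&
	       (sb->features_log_incompat & XFS_SB_FEAT_INCOMPAT_LOG_DELATTR);
}
```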

* Re: [PATCH v13 06/10] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  2020-11-10 20:15   ` Darrick J. Wong
@ 2020-11-13  1:27     ` Allison Henderson
  2020-11-14  2:03       ` Darrick J. Wong
  0 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2020-11-13  1:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 11/10/20 1:15 PM, Darrick J. Wong wrote:
> On Thu, Oct 22, 2020 at 11:34:31PM -0700, Allison Henderson wrote:
>> From: Allison Collins <allison.henderson@oracle.com>
>>
>> These routines set up and start new deferred attribute operations.
>> These functions are meant to be called by any routine needing to
>> initiate a deferred attribute operation as opposed to the existing
>> inline operations. New helper function xfs_attr_item_init also added.
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/xfs/libxfs/xfs_attr.h |  2 ++
>>   2 files changed, 56 insertions(+)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 760383c..7fe5554 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -25,6 +25,7 @@
>>   #include "xfs_trans_space.h"
>>   #include "xfs_trace.h"
>>   #include "xfs_attr_item.h"
>> +#include "xfs_attr.h"
>>   
>>   /*
>>    * xfs_attr.c
>> @@ -643,6 +644,59 @@ xfs_attr_set(
>>   	goto out_unlock;
>>   }
>>   
>> +STATIC int
>> +xfs_attr_item_init(
>> +	struct xfs_da_args	*args,
>> +	unsigned int		op_flags,	/* op flag (set or remove) */
>> +	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
>> +{
>> +
>> +	struct xfs_attr_item	*new;
>> +
>> +	new = kmem_alloc_large(sizeof(struct xfs_attr_item), KM_NOFS);
> 
> I don't think we need _large allocations for struct xfs_attr_item, right?
I will try it and see.  I think it should be ok; one of the new test
cases I'm using does try to progressively add larger and larger attrs.
If it doesn't work, I'll make a note of it though.

> 
>> +	memset(new, 0, sizeof(struct xfs_attr_item));
> 
> Use kmem_zalloc and you won't have to memset.  Better yet, zalloc will
> get you memory that's been pre-zeroed in the background.
> 
>> +	new->xattri_op_flags = op_flags;
>> +	new->xattri_dac.da_args = args;
>> +
>> +	*attr = new;
>> +	return 0;
>> +}
>> +
>> +/* Sets an attribute for an inode as a deferred operation */
>> +int
>> +xfs_attr_set_deferred(
>> +	struct xfs_da_args	*args)
>> +{
>> +	struct xfs_attr_item	*new;
>> +	int			error = 0;
>> +
>> +	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
>> +	if (error)
>> +		return error;
>> +
>> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
>> +
>> +	return 0;
>> +}
> 
> The changes in "xfs: enable delayed attributes" should be moved to this
> patch so that these new functions immediately have callers.
Sure, will merge those patches together then

> 
> (Also see the reply I sent to the next patch, which will avoid weird
> regressions if someone's bisect lands in the middle of this series...)
> 
> --D
> 
>> +
>> +/* Removes an attribute for an inode as a deferred operation */
>> +int
>> +xfs_attr_remove_deferred(
>> +	struct xfs_da_args	*args)
>> +{
>> +
>> +	struct xfs_attr_item	*new;
>> +	int			error;
>> +
>> +	error  = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
>> +	if (error)
>> +		return error;
>> +
>> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
>> +
>> +	return 0;
>> +}
>> +
>>   /*========================================================================
>>    * External routines when attribute list is inside the inode
>>    *========================================================================*/
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index 5b4a1ca..8a08411 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -307,5 +307,7 @@ bool xfs_attr_namecheck(const void *name, size_t length);
>>   void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>   			      struct xfs_da_args *args);
>>   int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>> +int xfs_attr_set_deferred(struct xfs_da_args *args);
>> +int xfs_attr_remove_deferred(struct xfs_da_args *args);
>>   
>>   #endif	/* __XFS_ATTR_H__ */
>> -- 
>> 2.7.4
>>
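As the review suggests, a zeroing allocator makes the memset redundant, and struct xfs_attr_item is small enough that a _large allocation isn't needed. Below is a userspace sketch of the item init using calloc() in place of kmem_zalloc(). The *_sketch structs are cut-down stand-ins, not the real definitions. calloc() can fail, so the sketch returns -ENOMEM. Whether the kernel allocation can fail depends on the flags passed, which this sketch does not model.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Cut-down stand-ins for the structures in the patch. */
struct da_args_sketch { int placeholder; };
struct delattr_ctx_sketch { struct da_args_sketch *da_args; int dela_state; };
struct attr_item_sketch {
	struct delattr_ctx_sketch	xattri_dac;
	uint32_t			xattri_op_flags;
};

#define XFS_ATTR_OP_FLAGS_SET		1
#define XFS_ATTR_OP_FLAGS_REMOVE	2

static int attr_item_init(struct da_args_sketch *args, uint32_t op_flags,
			  struct attr_item_sketch **attr)
{
	/* calloc() returns zeroed memory, so no separate memset is needed;
	 * kmem_zalloc() plays this role in the kernel. */
	struct attr_item_sketch *new = calloc(1, sizeof(*new));

	if (!new)
		return -ENOMEM;
	new->xattri_op_flags = op_flags;
	new->xattri_dac.da_args = args;
	*attr = new;
	return 0;
}
```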

* Re: [PATCH v13 05/10] xfs: Set up infrastructure for deferred attribute operations
  2020-11-10 21:51   ` Darrick J. Wong
  2020-11-11  3:44     ` Darrick J. Wong
@ 2020-11-13  1:32     ` Allison Henderson
  2020-11-14  2:00       ` Darrick J. Wong
  1 sibling, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2020-11-13  1:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 11/10/20 2:51 PM, Darrick J. Wong wrote:
> On Thu, Oct 22, 2020 at 11:34:30PM -0700, Allison Henderson wrote:
>> Currently attributes are modified directly across one or more
>> transactions. But they are not logged or replayed in the event of an
>> error. The goal of delayed attributes is to enable logging and replaying
>> of attribute operations using the existing delayed operations
>> infrastructure.  This will later enable the attributes to become part of
>> larger multi part operations that also must first be recorded to the
>> log.  This is mostly of interest in the scheme of parent pointers which
>> would need to maintain an attribute containing parent inode information
>> any time an inode is moved, created, or removed.  Parent pointers would
>> then be of interest to any feature that would need to quickly derive an
>> inode path from the mount point. Online scrub, nfs lookups and fs grow
>> or shrink operations are all features that could take advantage of this.
>>
>> This patch adds two new log item types for setting or removing
>> attributes as deferred operations.  The xfs_attri_log_item logs an
>> intent to set or remove an attribute.  The corresponding
>> xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
>> freed once the transaction is done.  Both log items use a generic
>> xfs_attr_log_format structure that contains the attribute name, value,
>> flags, inode, and an op_flag that indicates if the operation is a set
>> or remove.
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/Makefile                 |   1 +
>>   fs/xfs/libxfs/xfs_attr.c        |   7 +-
>>   fs/xfs/libxfs/xfs_attr.h        |  19 +
>>   fs/xfs/libxfs/xfs_defer.c       |   1 +
>>   fs/xfs/libxfs/xfs_defer.h       |   3 +
>>   fs/xfs/libxfs/xfs_format.h      |   5 +
>>   fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
>>   fs/xfs/libxfs/xfs_log_recover.h |   2 +
>>   fs/xfs/libxfs/xfs_types.h       |   1 +
>>   fs/xfs/scrub/common.c           |   2 +
>>   fs/xfs/xfs_acl.c                |   2 +
>>   fs/xfs/xfs_attr_item.c          | 750 ++++++++++++++++++++++++++++++++++++++++
>>   fs/xfs/xfs_attr_item.h          |  76 ++++
>>   fs/xfs/xfs_attr_list.c          |   1 +
>>   fs/xfs/xfs_ioctl.c              |   2 +
>>   fs/xfs/xfs_ioctl32.c            |   2 +
>>   fs/xfs/xfs_iops.c               |   2 +
>>   fs/xfs/xfs_log.c                |   4 +
>>   fs/xfs/xfs_log_recover.c        |   2 +
>>   fs/xfs/xfs_ondisk.h             |   2 +
>>   fs/xfs/xfs_xattr.c              |   1 +
>>   21 files changed, 923 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
>> index 04611a1..b056cfc 100644
>> --- a/fs/xfs/Makefile
>> +++ b/fs/xfs/Makefile
>> @@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
>>   				   xfs_buf_item_recover.o \
>>   				   xfs_dquot_item_recover.o \
>>   				   xfs_extfree_item.o \
>> +				   xfs_attr_item.o \
>>   				   xfs_icreate_item.o \
>>   				   xfs_inode_item.o \
>>   				   xfs_inode_item_recover.o \
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 6453178..760383c 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -24,6 +24,7 @@
>>   #include "xfs_quota.h"
>>   #include "xfs_trans_space.h"
>>   #include "xfs_trace.h"
>> +#include "xfs_attr_item.h"
>>   
>>   /*
>>    * xfs_attr.c
>> @@ -59,8 +60,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>   STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>>   STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
>> -STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>> -			     struct xfs_buf **leaf_bp);
>>   
>>   int
>>   xfs_inode_hasattr(
>> @@ -142,7 +141,7 @@ xfs_attr_get(
>>   /*
>>    * Calculate how many blocks we need for the new attribute,
>>    */
>> -STATIC int
>> +int
>>   xfs_attr_calc_size(
>>   	struct xfs_da_args	*args,
>>   	int			*local)
>> @@ -327,7 +326,7 @@ xfs_attr_set_args(
>>    * to handle this, and recall the function until a successful error code is
>>    * returned.
>>    */
>> -STATIC int
>> +int
>>   xfs_attr_set_iter(
>>   	struct xfs_delattr_context	*dac,
>>   	struct xfs_buf			**leaf_bp)
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index 501f9df..5b4a1ca 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -247,6 +247,7 @@ enum xfs_delattr_state {
>>   #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>>   #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>>   #define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
>> +#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
>>   
>>   /*
>>    * Context used for keeping track of delayed attribute operations
>> @@ -254,6 +255,9 @@ enum xfs_delattr_state {
>>   struct xfs_delattr_context {
>>   	struct xfs_da_args      *da_args;
>>   
>> +	/* Used by delayed attributes to hold leaf across transactions */
> 
> "Used by xfs_attr_set to hold a leaf buffer across a transaction roll" ?
Sure, will update

> 
>> +	struct xfs_buf		*leaf_bp;
>> +
>>   	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
>>   	struct xfs_bmbt_irec	map;
>>   	xfs_dablk_t		lblkno;
>> @@ -267,6 +271,18 @@ struct xfs_delattr_context {
>>   	enum xfs_delattr_state  dela_state;
>>   };
>>   
>> +/*
>> + * List of attrs to commit later.
>> + */
>> +struct xfs_attr_item {
>> +	struct xfs_delattr_context	xattri_dac;
>> +	uint32_t			xattri_op_flags;/* attr op set or rm */
> 
> The comment for xattri_op_flags should be more direct in mentioning that
> it takes XFS_ATTR_OP_FLAGS_{SET,REMOVE}.
Alrighty, will do

> 
> (Alternately you could define an enum for the incore state tracker that
> causes the appropriate XFS_ATTR_OP_FLAG* to be set on the log item in
> xfs_attr_create_intent to avoid mixing of the flag namespaces, but that
> is a lot of paper-pushing...)
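The enum alternative keeps the incore op type separate from the ondisk flag namespace and translates once, when the intent is logged. Recovery does the reverse and looks only at the low type byte, as the "upper bits are flags, lower byte is type code" comment describes. Below is a minimal sketch. enum attr_op_sketch and both helpers are hypothetical. Only the XFS_ATTR_OP_FLAGS_* values come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define XFS_ATTR_OP_FLAGS_SET		1
#define XFS_ATTR_OP_FLAGS_REMOVE	2
#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF

/* Hypothetical incore op type, separate from the ondisk flags. */
enum attr_op_sketch {
	ATTR_OP_SET,
	ATTR_OP_REMOVE,
};

/* Translate once, when the intent is logged. */
static uint32_t attr_op_to_log_flags(enum attr_op_sketch op)
{
	switch (op) {
	case ATTR_OP_SET:
		return XFS_ATTR_OP_FLAGS_SET;
	case ATTR_OP_REMOVE:
		return XFS_ATTR_OP_FLAGS_REMOVE;
	}
	assert(0 && "unknown attr op");
	return 0;
}

/* Recovery: decode only the type byte; upper bits are flags. */
static int log_flags_to_attr_op(uint32_t op_flags, enum attr_op_sketch *op)
{
	switch (op_flags & XFS_ATTR_OP_FLAGS_TYPE_MASK) {
	case XFS_ATTR_OP_FLAGS_SET:
		*op = ATTR_OP_SET;
		return 0;
	case XFS_ATTR_OP_FLAGS_REMOVE:
		*op = ATTR_OP_REMOVE;
		return 0;
	default:
		return -1;	/* corrupt or unknown intent */
	}
}
```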
> 
>> +
>> +	/* used to log this item to an intent */
>> +	struct list_head		xattri_list;
>> +};
> 
> Ok, so going back to a confusing comment I had from the last series,
> I'm glad that you've moved all the attr code to be deferred operations.
> 
> Can you move all the xfs_delattr_context fields into xfs_attr_item?
> AFAICT (from git diff'ing the entire branch :P) we never allocate an
> xfs_delattr_context on its own; we only ever access the one that's
> embedded in xfs_attr_item, right?
Well, xfs_delattr_context is used earlier in the set by the top level
routines xfs_attr_set/remove_args.  If we did this, it would pull the
attr_item into the lower part of the "delay ready" subseries, and I
think people really wanted that part to be "refactor only" to make
reviewing easier.

How about an extra patch at the end that merges these structs once those
high level functions are backed out?  That way we're not trying to
introduce the log items before this patch.  That seems like a reasonable
way to phase in the end result.

Also, such a change would mean that a lot of the lower level attr
routines that are sensitive to the state machine mechanics no longer pass
around an xfs_delattr_context; instead they would take an xfs_attr_item.
I'm not entirely sure how people would feel about that, but again, I
figure if we save it for the end, it's easy to take it or leave it
without causing too much surgery below.
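One reason the merge is easier once no standalone context exists: while the context is embedded in the item, a helper that is handed only the context can reach the enclosing item with container_of(). That only holds if every context really lives inside an item, which is exactly what the standalone xfs_attr_set/remove_args uses break. Below is a userspace sketch of the pattern. The container_of() macro is a local reimplementation of the kernel helper, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace version of the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct delattr_ctx_sketch {
	void	*da_args;
	int	dela_state;
};

struct attr_item_sketch {
	struct delattr_ctx_sketch	xattri_dac;	/* embedded, not a pointer */
	uint32_t			xattri_op_flags;
};

/* Valid only if dac is embedded in an attr_item_sketch; a standalone
 * context passed here would make this read garbage. */
static uint32_t op_flags_from_dac(struct delattr_ctx_sketch *dac)
{
	struct attr_item_sketch *item =
		container_of(dac, struct attr_item_sketch, xattri_dac);

	return item->xattri_op_flags;
}
```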

> 
>> +
>> +
>>   /*========================================================================
>>    * Function prototypes for the kernel.
>>    *========================================================================*/
>> @@ -282,11 +298,14 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
>>   int xfs_attr_get(struct xfs_da_args *args);
>>   int xfs_attr_set(struct xfs_da_args *args);
>>   int xfs_attr_set_args(struct xfs_da_args *args);
>> +int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>> +		      struct xfs_buf **leaf_bp);
>>   int xfs_has_attr(struct xfs_da_args *args);
>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>>   int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>>   bool xfs_attr_namecheck(const void *name, size_t length);
>>   void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>   			      struct xfs_da_args *args);
>> +int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>>   
>>   #endif	/* __XFS_ATTR_H__ */
>> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
>> index eff4a12..e9caff7 100644
>> --- a/fs/xfs/libxfs/xfs_defer.c
>> +++ b/fs/xfs/libxfs/xfs_defer.c
>> @@ -178,6 +178,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
>>   	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
>>   	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
>>   	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
>> +	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
>>   };
>>   
>>   static void
>> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
>> index 05472f7..72a5789 100644
>> --- a/fs/xfs/libxfs/xfs_defer.h
>> +++ b/fs/xfs/libxfs/xfs_defer.h
>> @@ -19,6 +19,7 @@ enum xfs_defer_ops_type {
>>   	XFS_DEFER_OPS_TYPE_RMAP,
>>   	XFS_DEFER_OPS_TYPE_FREE,
>>   	XFS_DEFER_OPS_TYPE_AGFL_FREE,
>> +	XFS_DEFER_OPS_TYPE_ATTR,
>>   	XFS_DEFER_OPS_TYPE_MAX,
>>   };
>>   
>> @@ -63,6 +64,8 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
>>   extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
>>   extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
>>   extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
>> +extern const struct xfs_defer_op_type xfs_attr_defer_type;
>> +
>>   
>>   /*
>>    * This structure enables a dfops user to detach the chain of deferred
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index dd764da..d419c34 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -584,6 +584,11 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
>>   		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT);
>>   }
>>   
>> +static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)
>> +{
>> +	return false;
>> +}
>> +
>>   /*
>>    * end of superblock version macros
>>    */
>> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
>> index 8bd00da..de6309d 100644
>> --- a/fs/xfs/libxfs/xfs_log_format.h
>> +++ b/fs/xfs/libxfs/xfs_log_format.h
>> @@ -117,7 +117,12 @@ struct xfs_unmount_log_format {
>>   #define XLOG_REG_TYPE_CUD_FORMAT	24
>>   #define XLOG_REG_TYPE_BUI_FORMAT	25
>>   #define XLOG_REG_TYPE_BUD_FORMAT	26
>> -#define XLOG_REG_TYPE_MAX		26
>> +#define XLOG_REG_TYPE_ATTRI_FORMAT	27
>> +#define XLOG_REG_TYPE_ATTRD_FORMAT	28
>> +#define XLOG_REG_TYPE_ATTR_NAME	29
>> +#define XLOG_REG_TYPE_ATTR_VALUE	30
>> +#define XLOG_REG_TYPE_MAX		30
>> +
>>   
>>   /*
>>    * Flags to log operation header
>> @@ -240,6 +245,8 @@ typedef struct xfs_trans_header {
>>   #define	XFS_LI_CUD		0x1243
>>   #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
>>   #define	XFS_LI_BUD		0x1245
>> +#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
>> +#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
>>   
>>   #define XFS_LI_TYPE_DESC \
>>   	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
>> @@ -255,7 +262,9 @@ typedef struct xfs_trans_header {
>>   	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
>>   	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
>>   	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
>> -	{ XFS_LI_BUD,		"XFS_LI_BUD" }
>> +	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
>> +	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
>> +	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
>>   
>>   /*
>>    * Inode Log Item Format definitions.
>> @@ -863,4 +872,35 @@ struct xfs_icreate_log {
>>   	__be32		icl_gen;	/* inode generation number to use */
>>   };
>>   
>> +/*
>> + * Flags for deferred attribute operations.
>> + * Upper bits are flags, lower byte is type code
>> + */
>> +#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
>> +#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
>> +#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF	/* Flags type mask */
>> +
>> +/*
>> + * This is the structure used to lay out an attr log item in the
>> + * log.
>> + */
>> +struct xfs_attri_log_format {
>> +	uint16_t	alfi_type;	/* attri log item type */
>> +	uint16_t	alfi_size;	/* size of this item */
>> +	uint32_t	__pad;		/* pad to 64 bit aligned */
>> +	uint64_t	alfi_id;	/* attri identifier */
>> +	xfs_ino_t	alfi_ino;	/* the inode for this attr operation */
> 
> This is an ondisk structure; please use only explicitly sized data
> types like uint64_t.
Ok, will update
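xfs_ino_t is a 64-bit typedef, so switching alfi_ino to uint64_t should not change the layout. The point is to make the ondisk width explicit. Below is a userspace sketch of the two log formats with compile-time size checks, in the spirit of XFS_CHECK_STRUCT_SIZE in xfs_ondisk.h. The 40 and 16 byte figures are computed here from the field sizes. They are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the proposed formats, alfi_ino as uint64_t. */
struct xfs_attri_log_format {
	uint16_t	alfi_type;
	uint16_t	alfi_size;
	uint32_t	__pad;
	uint64_t	alfi_id;
	uint64_t	alfi_ino;
	uint32_t	alfi_op_flags;
	uint32_t	alfi_name_len;
	uint32_t	alfi_value_len;
	uint32_t	alfi_attr_flags;
};

struct xfs_attrd_log_format {
	uint16_t	alfd_type;
	uint16_t	alfd_size;
	uint32_t	__pad;
	uint64_t	alfd_alf_id;	/* id of corresponding attri */
};

/* 2+2+4+8+8+4*4 = 40 and 2+2+4+8 = 16, with no implicit padding. */
_Static_assert(sizeof(struct xfs_attri_log_format) == 40, "attri size");
_Static_assert(offsetof(struct xfs_attri_log_format, alfi_ino) == 16,
	       "alfi_ino offset");
_Static_assert(sizeof(struct xfs_attrd_log_format) == 16, "attrd size");
```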

> 
>> +	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
>> +	uint32_t	alfi_name_len;	/* attr name length */
>> +	uint32_t	alfi_value_len;	/* attr value length */
>> +	uint32_t	alfi_attr_flags;/* attr flags */
>> +};
>> +
>> +struct xfs_attrd_log_format {
>> +	uint16_t	alfd_type;	/* attrd log item type */
>> +	uint16_t	alfd_size;	/* size of this item */
>> +	uint32_t	__pad;		/* pad to 64 bit aligned */
>> +	uint64_t	alfd_alf_id;	/* id of corresponding attrd */
> 
> "..of corresponding attri"
Yes, corresponding attri :-)

> 
>> +};
>> +
>>   #endif /* __XFS_LOG_FORMAT_H__ */
>> diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
>> index 3cca2bf..b6e5514 100644
>> --- a/fs/xfs/libxfs/xfs_log_recover.h
>> +++ b/fs/xfs/libxfs/xfs_log_recover.h
>> @@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
>>   extern const struct xlog_recover_item_ops xlog_rud_item_ops;
>>   extern const struct xlog_recover_item_ops xlog_cui_item_ops;
>>   extern const struct xlog_recover_item_ops xlog_cud_item_ops;
>> +extern const struct xlog_recover_item_ops xlog_attri_item_ops;
>> +extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
>>   
>>   /*
>>    * Macros, structures, prototypes for internal log manager use.
>> diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
>> index 397d947..860cdd2 100644
>> --- a/fs/xfs/libxfs/xfs_types.h
>> +++ b/fs/xfs/libxfs/xfs_types.h
>> @@ -11,6 +11,7 @@ typedef uint32_t	prid_t;		/* project ID */
>>   typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
>>   typedef uint32_t	xfs_agino_t;	/* inode # within allocation grp */
>>   typedef uint32_t	xfs_extlen_t;	/* extent length in blocks */
>> +typedef uint32_t	xfs_attrlen_t;	/* attr length */
> 
> This doesn't get used anywhere.
Ok, will clean out.

> 
>>   typedef uint32_t	xfs_agnumber_t;	/* allocation group number */
>>   typedef int32_t		xfs_extnum_t;	/* # of extents in a file */
>>   typedef int16_t		xfs_aextnum_t;	/* # extents in an attribute fork */
>> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
>> index 1887605..9a649d1 100644
>> --- a/fs/xfs/scrub/common.c
>> +++ b/fs/xfs/scrub/common.c
>> @@ -24,6 +24,8 @@
>>   #include "xfs_rmap_btree.h"
>>   #include "xfs_log.h"
>>   #include "xfs_trans_priv.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_reflink.h"
>>   #include "scrub/scrub.h"
>> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
>> index c544951..cad1db4 100644
>> --- a/fs/xfs/xfs_acl.c
>> +++ b/fs/xfs/xfs_acl.c
>> @@ -10,6 +10,8 @@
>>   #include "xfs_trans_resv.h"
>>   #include "xfs_mount.h"
>>   #include "xfs_inode.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_trace.h"
>>   #include "xfs_error.h"
>> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
>> new file mode 100644
>> index 0000000..3980066
>> --- /dev/null
>> +++ b/fs/xfs/xfs_attr_item.c
>> @@ -0,0 +1,750 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> 
> 2019 -> 2020.
Will update.  :-)

> 
>> + * Author: Allison Collins <allison.henderson@oracle.com>
>> + */
>> +
>> +#include "xfs.h"
>> +#include "xfs_fs.h"
>> +#include "xfs_format.h"
>> +#include "xfs_log_format.h"
>> +#include "xfs_trans_resv.h"
>> +#include "xfs_bit.h"
>> +#include "xfs_shared.h"
>> +#include "xfs_mount.h"
>> +#include "xfs_defer.h"
>> +#include "xfs_trans.h"
>> +#include "xfs_trans_priv.h"
>> +#include "xfs_buf_item.h"
>> +#include "xfs_attr_item.h"
>> +#include "xfs_log.h"
>> +#include "xfs_btree.h"
>> +#include "xfs_rmap.h"
>> +#include "xfs_inode.h"
>> +#include "xfs_icache.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>> +#include "xfs_attr.h"
>> +#include "xfs_shared.h"
>> +#include "xfs_attr_item.h"
>> +#include "xfs_alloc.h"
>> +#include "xfs_bmap.h"
>> +#include "xfs_trace.h"
>> +#include "libxfs/xfs_da_format.h"
>> +#include "xfs_inode.h"
>> +#include "xfs_quota.h"
>> +#include "xfs_log_priv.h"
>> +#include "xfs_log_recover.h"
>> +
>> +static const struct xfs_item_ops xfs_attri_item_ops;
>> +static const struct xfs_item_ops xfs_attrd_item_ops;
>> +
>> +static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
>> +{
>> +	return container_of(lip, struct xfs_attri_log_item, attri_item);
>> +}
>> +
>> +STATIC void
>> +xfs_attri_item_free(
>> +	struct xfs_attri_log_item	*attrip)
>> +{
>> +	kmem_free(attrip->attri_item.li_lv_shadow);
>> +	kmem_free(attrip);
>> +}
>> +
>> +/*
>> + * Freeing the attrip requires that we remove it from the AIL if it has already
>> + * been placed there. However, the ATTRI may not yet have been placed in the
>> + * AIL when called by xfs_attri_release() from ATTRD processing due to the
>> + * ordering of committed vs unpin operations in bulk insert operations. Hence
>> + * the reference count to ensure only the last caller frees the ATTRI.
>> + */
>> +STATIC void
>> +xfs_attri_release(
>> +	struct xfs_attri_log_item	*attrip)
>> +{
>> +	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
>> +	if (atomic_dec_and_test(&attrip->attri_refcount)) {
>> +		xfs_trans_ail_delete(&attrip->attri_item,
>> +				     SHUTDOWN_LOG_IO_ERROR);
>> +		xfs_attri_item_free(attrip);
>> +	}
>> +}
>> +
>> +/*
>> + * This returns the number of iovecs needed to log the given attri item. We
>> + * only need 1 iovec for an attri item.  It just logs the attr_log_format
>> + * structure.
>> + */
>> +static inline int
>> +xfs_attri_item_sizeof(
>> +	struct xfs_attri_log_item *attrip)
>> +{
>> +	return sizeof(struct xfs_attri_log_format);
>> +}
> 
> Please get rid of this trivial oneliner.
Sure.  I think I added some of this just to be consistent with how the
other delayed ops are implemented.

> 
>> +
>> +STATIC void
>> +xfs_attri_item_size(
>> +	struct xfs_log_item	*lip,
>> +	int			*nvecs,
>> +	int			*nbytes)
>> +{
>> +	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
>> +
>> +	*nvecs += 1;
>> +	*nbytes += xfs_attri_item_sizeof(attrip);
>> +
>> +	/* Attr set and remove operations require a name */
>> +	ASSERT(attrip->attri_name_len > 0);
>> +
>> +	*nvecs += 1;
>> +	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
>> +
>> +	/*
>> +	 * Set ops can accept a value of 0 len to clear an attr value.  Remove
>> +	 * ops do not need a value at all.  So only account for the value
>> +	 * when it is needed.
>> +	 */
>> +	if (attrip->attri_value_len > 0) {
>> +		*nvecs += 1;
>> +		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
>> +	}
>> +}
>> +
>> +/*
>> + * This is called to fill in the log iovecs for the given attri log
>> + * item. We use  1 iovec for the attri_format_item, 1 for the name, and
>> + * another for the value if it is present
>> + */
>> +STATIC void
>> +xfs_attri_item_format(
>> +	struct xfs_log_item	*lip,
>> +	struct xfs_log_vec	*lv)
>> +{
>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>> +	struct xfs_log_iovec		*vecp = NULL;
>> +
>> +	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
>> +	attrip->attri_format.alfi_size = 1;
>> +
>> +	/*
>> +	 * This size accounting must be done before copying the attrip into the
>> +	 * iovec.  If we do it after, the wrong size will be recorded to the log
>> +	 * and we trip across assertion checks for bad region sizes later during
>> +	 * the log recovery.
>> +	 */
>> +
>> +	ASSERT(attrip->attri_name_len > 0);
>> +	attrip->attri_format.alfi_size++;
>> +
>> +	if (attrip->attri_value_len > 0)
>> +		attrip->attri_format.alfi_size++;
>> +
>> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
>> +			&attrip->attri_format,
>> +			xfs_attri_item_sizeof(attrip));
>> +	if (attrip->attri_name_len > 0)
> 
> I thought we required attri_name_len > 0 always?
I think so.  This check may have come up in one of the earlier
reviews.  I'll add a comment here; we even have the assert a few lines up.
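Because the name is always present, the name region is counted unconditionally in both places. The region count from the size callback then has to match alfi_size from the format callback. Below is a sketch of that accounting. It assumes ATTR_NVEC_SIZE rounds a length up to a 4-byte boundary; that macro's definition is not shown in this hunk, so NVEC_SIZE_SKETCH is a hypothetical stand-in. It also hardcodes 40 bytes for the format structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: ATTR_NVEC_SIZE rounds a length up to a 4-byte boundary. */
#define NVEC_SIZE_SKETCH(len) \
	(((len) + sizeof(int32_t) - 1) & ~(sizeof(int32_t) - 1))

#define ATTRI_FORMAT_SIZE	40	/* sizeof(struct xfs_attri_log_format) */

/* Mirrors xfs_attri_item_size(): format + name, plus value if non-empty. */
static void attri_size_sketch(uint32_t name_len, uint32_t value_len,
			      int *nvecs, int *nbytes)
{
	assert(name_len > 0);	/* set and remove both need a name */

	*nvecs += 1;
	*nbytes += ATTRI_FORMAT_SIZE;
	*nvecs += 1;
	*nbytes += NVEC_SIZE_SKETCH(name_len);
	if (value_len > 0) {
		*nvecs += 1;
		*nbytes += NVEC_SIZE_SKETCH(value_len);
	}
}

/* Mirrors the alfi_size computation in xfs_attri_item_format(). */
static uint16_t attri_alfi_size_sketch(uint32_t name_len, uint32_t value_len)
{
	uint16_t size = 1;	/* the format structure itself */

	assert(name_len > 0);
	size++;			/* name region */
	if (value_len > 0)
		size++;		/* value region */
	return size;
}
```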

> 
>> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
>> +				attrip->attri_name,
>> +				ATTR_NVEC_SIZE(attrip->attri_name_len));
>> +
>> +	if (attrip->attri_value_len > 0)
>> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
>> +				attrip->attri_value,
>> +				ATTR_NVEC_SIZE(attrip->attri_value_len));
>> +}
>> +
>> +/*
>> + * The unpin operation is the last place an ATTRI is manipulated in the log. It
>> + * is either inserted in the AIL or aborted in the event of a log I/O error. In
>> + * either case, the ATTRI transaction has been successfully committed to make
>> + * it this far. Therefore, we expect whoever committed the ATTRI to either
>> + * construct and commit the ATTRD or drop the ATTRD's reference in the event of
>> + * error. Simply drop the log's ATTRI reference now that the log is done with
>> + * it.
>> + */
>> +STATIC void
>> +xfs_attri_item_unpin(
>> +	struct xfs_log_item	*lip,
>> +	int			remove)
>> +{
>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>> +
>> +	xfs_attri_release(attrip);
> 
> Nit: this could be shortened to xfs_attri_release(ATTRI_ITEM(lip)).
Ok, will shorten
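For reference, here is the lifetime that unpin and the ATTRD release share. xfs_attri_init() starts the refcount at 2: one reference is dropped at unpin, and one when the ATTRD is released. Whichever drop runs last frees the item. Below is a minimal C11 atomics sketch. It omits the AIL removal that xfs_attri_release() also does, and the names and the freed test hook are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct attri_sketch {
	atomic_int	refcount;
	bool		*freed;		/* test hook: set when the item is freed */
};

/* Like xfs_attri_init(): one reference for the log, one for the ATTRD. */
static struct attri_sketch *attri_init_sketch(bool *freed)
{
	struct attri_sketch *ip = calloc(1, sizeof(*ip));

	if (!ip)
		return NULL;
	atomic_init(&ip->refcount, 2);
	ip->freed = freed;
	return ip;
}

/* Like xfs_attri_release(): the last caller frees, in either order. */
static void attri_release_sketch(struct attri_sketch *ip)
{
	assert(atomic_load(&ip->refcount) > 0);
	/* atomic_fetch_sub() returns the old value; 1 means we hit zero. */
	if (atomic_fetch_sub(&ip->refcount, 1) == 1) {
		*ip->freed = true;
		free(ip);
	}
}
```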

> 
>> +}
>> +
>> +
>> +STATIC void
>> +xfs_attri_item_release(
>> +	struct xfs_log_item	*lip)
>> +{
>> +	xfs_attri_release(ATTRI_ITEM(lip));
>> +}
>> +
>> +/*
>> + * Allocate and initialize an attri item
>> + */
>> +STATIC struct xfs_attri_log_item *
>> +xfs_attri_init(
>> +	struct xfs_mount	*mp)
>> +
>> +{
>> +	struct xfs_attri_log_item	*attrip;
>> +	uint				size;
> 
> Can you line up the *mp in the parameter list with the *attrip in the
> local variables?
Sure

> 
>> +
>> +	size = (uint)(sizeof(struct xfs_attri_log_item));
> 
> kmem_zalloc takes a size_t parameter (which is the return type of sizeof);
> no need to do all this casting.
Ok.  I'm thinking of adding an extra buffer_size param here, so that one
of the callers doesn't have to realloc this for the trailing buffer
needed during the commit.  One of the new test cases is showing an
intermittent warning about allocating more than a page, so I'm trying to
clean that up and figure that out.
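Below is a sketch of that buffer_size idea: size the allocation for the name and value up front so they land in a trailing buffer of the same zeroed allocation. attri_init_buf_sketch() and its struct are hypothetical, not the patch's code. Attr values can be up to 64KiB, so a single combined allocation for a large value will itself span several pages. This alone may therefore not silence the warning.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct attri_buf_sketch {
	uint32_t	name_len;
	uint32_t	value_len;
	void		*name;		/* points into buf[] */
	void		*value;		/* points into buf[], or NULL */
	unsigned char	buf[];		/* trailing name + value storage */
};

/* Hypothetical init that takes the trailing size up front, so no
 * second allocation is needed at commit time. */
static struct attri_buf_sketch *attri_init_buf_sketch(const void *name,
		uint32_t name_len, const void *value, uint32_t value_len)
{
	struct attri_buf_sketch *ip;

	assert(name_len > 0);	/* set and remove both need a name */
	ip = calloc(1, sizeof(*ip) + (size_t)name_len + value_len);
	if (!ip)
		return NULL;
	ip->name_len = name_len;
	ip->value_len = value_len;
	ip->name = ip->buf;
	memcpy(ip->name, name, name_len);
	if (value_len > 0) {
		ip->value = ip->buf + name_len;
		memcpy(ip->value, value, value_len);
	}
	return ip;
}
```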

> 
>> +	attrip = kmem_zalloc(size, 0);
>> +
>> +	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
>> +			  &xfs_attri_item_ops);
>> +	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
>> +	atomic_set(&attrip->attri_refcount, 2);
>> +
>> +	return attrip;
>> +}
>> +
>> +/*
>> + * Copy an attr format buffer from the given buf, and into the destination attr
>> + * format structure.
>> + */
>> +STATIC int
>> +xfs_attri_copy_format(struct xfs_log_iovec *buf,
>> +		      struct xfs_attri_log_format *dst_attr_fmt)
>> +{
>> +	struct xfs_attri_log_format *src_attr_fmt = buf->i_addr;
>> +	uint len = sizeof(struct xfs_attri_log_format);
> 
> Indentation and whatnot with the parameter names.
Ok will fix
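With the casts dropped, the recovery copy reduces to a length check followed by a plain memcpy(). Below is a userspace sketch. It defines EFSCORRUPTED as EUCLEAN, as XFS does, with a fallback value for libcs that lack EUCLEAN. The iovec and format structs are cut-down stand-ins for struct xfs_log_iovec and struct xfs_attri_log_format.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef EUCLEAN
#define EUCLEAN 117		/* Linux value; fallback for other libcs */
#endif
#define EFSCORRUPTED EUCLEAN	/* as XFS defines it in the kernel */

struct iovec_sketch {		/* stand-in for struct xfs_log_iovec */
	void	*i_addr;
	int	i_len;
};

struct attri_fmt_sketch {	/* cut-down xfs_attri_log_format */
	uint16_t	alfi_type;
	uint16_t	alfi_size;
	uint32_t	__pad;
	uint64_t	alfi_id;
};

/* Reject a recovered region whose length doesn't match, then copy. */
static int attri_copy_format_sketch(const struct iovec_sketch *buf,
				    struct attri_fmt_sketch *dst)
{
	size_t len = sizeof(*dst);

	if (buf->i_len < 0 || (size_t)buf->i_len != len)
		return -EFSCORRUPTED;
	memcpy(dst, buf->i_addr, len);	/* no (char *) casts needed */
	return 0;
}
```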
> 
>> +
>> +	if (buf->i_len != len)
>> +		return -EFSCORRUPTED;
>> +
>> +	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
>> +	return 0;
>> +}
>> +
>> +static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
>> +{
>> +	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
>> +}
>> +
>> +STATIC void
>> +xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
>> +{
>> +	kmem_free(attrdp->attrd_item.li_lv_shadow);
>> +	kmem_free(attrdp);
>> +}
>> +
>> +/*
>> + * This returns the number of iovecs needed to log the given attrd item.
>> + * We only need 1 iovec for an attrd item.  It just logs the attr_log_format
>> + * structure.
>> + */
>> +static inline int
>> +xfs_attrd_item_sizeof(
>> +	struct xfs_attrd_log_item *attrdp)
>> +{
>> +	return sizeof(struct xfs_attrd_log_format);
>> +}
>> +
>> +STATIC void
>> +xfs_attrd_item_size(
>> +	struct xfs_log_item	*lip,
>> +	int			*nvecs,
>> +	int			*nbytes)
>> +{
>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> 
> Variable name alignment between the parameter list and the local vars.
> 
>> +	*nvecs += 1;
> 
> Space between local variable declaration and the first line of code.
> 
>> +	*nbytes += xfs_attrd_item_sizeof(attrdp);
> 
> No need for a oneliner function for sizeof.

Ok, will fix
> 
>> +}
>> +
>> +/*
>> + * This is called to fill in the log iovecs for the given attrd log item. We use
>> + * only 1 iovec for the attrd_format, and we point that at the attr_log_format
>> + * structure embedded in the attrd item.
>> + */
>> +STATIC void
>> +xfs_attrd_item_format(
>> +	struct xfs_log_item	*lip,
>> +	struct xfs_log_vec	*lv)
>> +{
>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>> +	struct xfs_log_iovec		*vecp = NULL;
>> +
>> +	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
>> +	attrdp->attrd_format.alfd_size = 1;
>> +
>> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
>> +			&attrdp->attrd_format, xfs_attrd_item_sizeof(attrdp));
>> +}
>> +
>> +/*
>> + * The ATTRD is either committed or aborted if the transaction is cancelled. If
>> + * the transaction is cancelled, drop our reference to the ATTRI and free the
>> + * ATTRD.
>> + */
>> +STATIC void
>> +xfs_attrd_item_release(
>> +	struct xfs_log_item     *lip)
>> +{
>> +	struct xfs_attrd_log_item *attrdp = ATTRD_ITEM(lip);
>> +	xfs_attri_release(attrdp->attrd_attrip);
> 
> Space between the variable declaration and the first line of code.
Sure, will add.

> 
>> +	xfs_attrd_item_free(attrdp);
>> +}
>> +
>> +/*
>> + * Log an ATTRI it to the ATTRD when the attr op is done.  An attr operation
> 
> I don't know what "Log an ATTRI it to the ATTRD" means.  I think this is
> the function that performs one step of an attribute update intent and
> then tags the attrd item dirty, right?
Yes, I had modeled this function loosely on the free extent code at the
time.  It has similar commentary, and that's roughly how I interpreted
it.  Back then we were still trying to conceptualize how the looping
behavior with the state machine was going to work.

Maybe the comment should just say it that way, if that's clearer?

"Performs one step of an attribute update intent and marks the attrd
item dirty."

?

> 
>> + * may be a set or a remove.  Note that the transaction is marked dirty
>> + * regardless of whether the operation succeeds or fails to support the
>> + * ATTRI/ATTRD lifecycle rules.
>> + */
>> +int
>> +xfs_trans_attr(
>> +	struct xfs_delattr_context	*dac,
>> +	struct xfs_attrd_log_item	*attrdp,
>> +	struct xfs_buf			**leaf_bp,
>> +	uint32_t			op_flags)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error;
>> +
>> +	error = xfs_qm_dqattach_locked(args->dp, 0);
>> +	if (error)
>> +		return error;
>> +
>> +	switch (op_flags) {
>> +	case XFS_ATTR_OP_FLAGS_SET:
>> +		args->op_flags |= XFS_DA_OP_ADDNAME;
>> +		error = xfs_attr_set_iter(dac, leaf_bp);
>> +		break;
>> +	case XFS_ATTR_OP_FLAGS_REMOVE:
>> +		ASSERT(XFS_IFORK_Q((args->dp)));
> 
> No need for the double parentheses around args->dp.
Ok, will clean out

> 
>> +		error = xfs_attr_remove_iter(dac);
>> +		break;
>> +	default:
>> +		error = -EFSCORRUPTED;
>> +		break;
>> +	}
>> +
>> +	/*
>> +	 * Mark the transaction dirty, even on error. This ensures the
>> +	 * transaction is aborted, which:
>> +	 *
>> +	 * 1.) releases the ATTRI and frees the ATTRD
>> +	 * 2.) shuts down the filesystem
>> +	 */
>> +	args->trans->t_flags |= XFS_TRANS_DIRTY;
>> +	if (xfs_sb_version_hasdelattr(&args->dp->i_mount->m_sb))
>> +		set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
> 
> This could probably be:
> 
> 	if (attrdp)
> 		set_bit(...);

Sure, that should work too.  Should we add a comment, though?  This
loses the subtle implication that attrdp is expected to be NULL when
the feature bit is off, and without a comment it may raise future
questions about why or how it could be NULL.  Maybe just something like:

/*
 * attr intent/done items are null when delayed attributes are disabled
 */

?

> 
>> +
>> +	return error;
>> +}
>> +
>> +/* Log an attr to the intent item. */
>> +STATIC void
>> +xfs_attr_log_item(
>> +	struct xfs_trans		*tp,
>> +	struct xfs_attri_log_item	*attrip,
>> +	struct xfs_attr_item		*attr)
>> +{
>> +	struct xfs_attri_log_format	*attrp;
>> +
>> +	tp->t_flags |= XFS_TRANS_DIRTY;
>> +	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
>> +
>> +	/*
>> +	 * At this point the xfs_attr_item has been constructed, and we've
>> +	 * created the log intent. Fill in the attri log item and log format
>> +	 * structure with fields from this xfs_attr_item
>> +	 */
>> +	attrp = &attrip->attri_format;
>> +	attrp->alfi_ino = attr->xattri_dac.da_args->dp->i_ino;
>> +	attrp->alfi_op_flags = attr->xattri_op_flags;
>> +	attrp->alfi_value_len = attr->xattri_dac.da_args->valuelen;
>> +	attrp->alfi_name_len = attr->xattri_dac.da_args->namelen;
>> +	attrp->alfi_attr_flags = attr->xattri_dac.da_args->attr_filter;
>> +
>> +	attrip->attri_name = (void *)attr->xattri_dac.da_args->name;
>> +	attrip->attri_value = attr->xattri_dac.da_args->value;
>> +	attrip->attri_name_len = attr->xattri_dac.da_args->namelen;
>> +	attrip->attri_value_len = attr->xattri_dac.da_args->valuelen;
>> +}
>> +
>> +/* Get an ATTRI. */
>> +static struct xfs_log_item *
>> +xfs_attr_create_intent(
>> +	struct xfs_trans		*tp,
>> +	struct list_head		*items,
>> +	unsigned int			count,
>> +	bool				sort)
>> +{
>> +	struct xfs_mount		*mp = tp->t_mountp;
>> +	struct xfs_attri_log_item	*attrip;
>> +	struct xfs_attr_item		*attr;
>> +
>> +	ASSERT(count == 1);
>> +
>> +	if (!xfs_sb_version_hasdelattr(&mp->m_sb))
>> +		return NULL;
>> +
>> +	attrip = xfs_attri_init(mp);
>> +	xfs_trans_add_item(tp, &attrip->attri_item);
>> +	list_for_each_entry(attr, items, xattri_list)
>> +		xfs_attr_log_item(tp, attrip, attr);
>> +	return &attrip->attri_item;
>> +}
>> +
>> +/* Process an attr. */
>> +STATIC int
>> +xfs_attr_finish_item(
>> +	struct xfs_trans		*tp,
>> +	struct xfs_log_item		*done,
>> +	struct list_head		*item,
>> +	struct xfs_btree_cur		**state)
>> +{
>> +	struct xfs_attr_item		*attr;
>> +	int				error;
>> +	struct xfs_delattr_context	*dac;
>> +	struct xfs_attrd_log_item	*attrdp;
>> +	struct xfs_attri_log_item	*attrip;
>> +
>> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
>> +	dac = &attr->xattri_dac;
>> +
>> +	/*
>> +	 * Always reset trans after EAGAIN cycle
>> +	 * since the transaction is new
>> +	 */
>> +	dac->da_args->trans = tp;
>> +
>> +	error = xfs_trans_attr(dac, ATTRD_ITEM(done), &dac->leaf_bp,
>> +			       attr->xattri_op_flags);
>> +	/*
>> +	 * The attrip refers to xfs_attr_item memory to log the name and value
>> +	 * with the intent item. This already occurred when the intent was
>> +	 * committed so these fields are no longer accessed.
> 
> Can you clear the attri_{name,value} pointers after you've logged the
> intent item so that we don't have to do them here?
> 
Ok, maybe I can put this in xfs_attri_item_committed?

>> Clear them out of
>> +	 * caution since we're about to free the xfs_attr_item.
>> +	 */
>> +	if (xfs_sb_version_hasdelattr(&dac->da_args->dp->i_mount->m_sb)) {
>> +		attrdp = (struct xfs_attrd_log_item *)done;
> 
> attrdp = ATTRD_ITEM(done)?
Sure, will shorten
> 
>> +		attrip = attrdp->attrd_attrip;
>> +		attrip->attri_name = NULL;
>> +		attrip->attri_value = NULL;
>> +	}
>> +
>> +	if (error != -EAGAIN)
>> +		kmem_free(attr);
>> +
>> +	return error;
>> +}
>> +
>> +/* Abort all pending ATTRs. */
>> +STATIC void
>> +xfs_attr_abort_intent(
>> +	struct xfs_log_item		*intent)
>> +{
>> +	xfs_attri_release(ATTRI_ITEM(intent));
>> +}
>> +
>> +/* Cancel an attr */
>> +STATIC void
>> +xfs_attr_cancel_item(
>> +	struct list_head		*item)
>> +{
>> +	struct xfs_attr_item		*attr;
>> +
>> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
>> +	kmem_free(attr);
>> +}
>> +
>> +/*
>> + * The ATTRI is logged only once and cannot be moved in the log, so simply
>> + * return the lsn at which it's been logged.
>> + */
>> +STATIC xfs_lsn_t
>> +xfs_attri_item_committed(
>> +	struct xfs_log_item	*lip,
>> +	xfs_lsn_t		lsn)
>> +{
>> +	return lsn;
>> +}
> 
> You can omit this function because the default is "return lsn;" if you
> don't provide one.  See xfs_trans_committed_bulk.
Oh, ok.  I was thinking of moving some of the finish item clean up here 
though.
> 
>> +
>> +STATIC void
>> +xfs_attri_item_committing(
>> +	struct xfs_log_item	*lip,
>> +	xfs_lsn_t		lsn)
>> +{
>> +}
> 
> This function isn't required if it doesn't do anything.  See
> xfs_log_commit_cil.
Ok, will remove

> 
>> +
>> +STATIC bool
>> +xfs_attri_item_match(
>> +	struct xfs_log_item	*lip,
>> +	uint64_t		intent_id)
>> +{
>> +	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
>> +}
>> +
>> +/*
>> + * When the attrd item is committed to disk, all we need to do is delete our
>> + * reference to our partner attri item and then free ourselves. Since we're
>> + * freeing ourselves we must return -1 to keep the transaction code from
>> + * further referencing this item.
>> + */
>> +STATIC xfs_lsn_t
>> +xfs_attrd_item_committed(
>> +	struct xfs_log_item	*lip,
>> +	xfs_lsn_t		lsn)
>> +{
>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>> +
>> +	/*
>> +	 * Drop the ATTRI reference regardless of whether the ATTRD has been
>> +	 * aborted. Once the ATTRD transaction is constructed, it is the sole
>> +	 * responsibility of the ATTRD to release the ATTRI (even if the ATTRI
>> +	 * is aborted due to log I/O error).
>> +	 */
>> +	xfs_attri_release(attrdp->attrd_attrip);
>> +	xfs_attrd_item_free(attrdp);
>> +
>> +	return NULLCOMMITLSN;
>> +}
> 
> If you set XFS_ITEM_RELEASE_WHEN_COMMITTED in the attrd item ops,
> xfs_trans_committed_bulk will call ->iop_release instead of
> ->iop_committed and you therefore don't need this function.
Oh, I see.  Will do that then.
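To make the two ops-table suggestions above concrete, here is a small userspace toy model of the per-item dispatch the review describes for xfs_trans_committed_bulk: an item whose ops carry a release-when-committed flag goes straight to its release hook, and an item with no ->iop_committed keeps the commit lsn unchanged. All names here (toy_item, TOY_RELEASE_WHEN_COMMITTED, TOY_NULL_LSN) are hypothetical stand-ins, not kernel definitions, and AIL insertion and abort handling are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t lsn_t;

/* Hypothetical stand-ins for the kernel flag and "no further work" lsn. */
#define TOY_RELEASE_WHEN_COMMITTED	(1U << 0)
#define TOY_NULL_LSN			((lsn_t)-1)

struct toy_item;

struct toy_item_ops {
	unsigned int	flags;
	void		(*iop_release)(struct toy_item *lip);
	lsn_t		(*iop_committed)(struct toy_item *lip, lsn_t lsn);
};

struct toy_item {
	const struct toy_item_ops	*ops;
	int				released;
};

/*
 * One item's worth of a committed-bulk style loop: release-when-committed
 * items go straight to ->iop_release, and a missing ->iop_committed means
 * the commit lsn is used as-is.
 */
static lsn_t
toy_item_committed(struct toy_item *lip, lsn_t commit_lsn)
{
	if (lip->ops->flags & TOY_RELEASE_WHEN_COMMITTED) {
		assert(lip->ops->iop_release != NULL);
		lip->ops->iop_release(lip);
		return TOY_NULL_LSN;	/* caller must not touch the item again */
	}
	if (lip->ops->iop_committed)
		return lip->ops->iop_committed(lip, commit_lsn);
	return commit_lsn;
}

static void
toy_release(struct toy_item *lip)
{
	lip->released = 1;
}

/* ATTRD-like ops: no committing/committed hooks, released on commit. */
static const struct toy_item_ops toy_attrd_ops = {
	.flags		= TOY_RELEASE_WHEN_COMMITTED,
	.iop_release	= toy_release,
};

/* ATTRI-like ops: no pass-through committed hook, default lsn applies. */
static const struct toy_item_ops toy_attri_ops = {
	.iop_release	= toy_release,
};
```

With this dispatch, the ATTRD ops table could drop both .iop_committing and .iop_committed and set the flag instead, and the ATTRI table could drop its pass-through committed hook unless it later grows real work.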

> 
>> +
>> +STATIC void
>> +xfs_attrd_item_committing(
>> +	struct xfs_log_item	*lip,
>> +	xfs_lsn_t		lsn)
>> +{
>> +}
> 
> Same comment as xfs_attri_item_committing.
ok, will remove this one

> 
>> +
>> +
>> +/*
>> + * Allocate and initialize an attrd item
>> + */
>> +struct xfs_attrd_log_item *
>> +xfs_attrd_init(
>> +	struct xfs_mount		*mp,
>> +	struct xfs_attri_log_item	*attrip)
>> +
>> +{
>> +	struct xfs_attrd_log_item	*attrdp;
>> +	uint				size;
>> +
>> +	size = (uint)(sizeof(struct xfs_attrd_log_item));
> 
> Same comment about sizeof and size_t as in xfs_attri_init.
> 
>> +	attrdp = kmem_zalloc(size, 0);
>> +	memset(attrdp, 0, size);
> 
> No need to memset-zero something you just zalloc'd.
ok, will clean these up

> 
>> +
>> +	xfs_log_item_init(mp, &attrdp->attrd_item, XFS_LI_ATTRD,
>> +			  &xfs_attrd_item_ops);
>> +	attrdp->attrd_attrip = attrip;
>> +	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
>> +
>> +	return attrdp;
>> +}
>> +
>> +/*
>> + * This routine is called to allocate an "attr free done" log item.
>> + */
>> +struct xfs_attrd_log_item *
>> +xfs_trans_get_attrd(struct xfs_trans		*tp,
>> +		  struct xfs_attri_log_item	*attrip)
>> +{
>> +	struct xfs_attrd_log_item		*attrdp;
>> +
>> +	ASSERT(tp != NULL);
>> +
>> +	attrdp = xfs_attrd_init(tp->t_mountp, attrip);
>> +	ASSERT(attrdp != NULL);
> 
> You could fold xfs_attrd_init into this function since there's only one
> caller.
Sure, there's not a lot in the init

> 
>> +
>> +	xfs_trans_add_item(tp, &attrdp->attrd_item);
>> +	return attrdp;
>> +}
>> +
>> +static const struct xfs_item_ops xfs_attrd_item_ops = {
>> +	.iop_size	= xfs_attrd_item_size,
>> +	.iop_format	= xfs_attrd_item_format,
>> +	.iop_release    = xfs_attrd_item_release,
>> +	.iop_committing	= xfs_attrd_item_committing,
>> +	.iop_committed	= xfs_attrd_item_committed,
>> +};
>> +
>> +
>> +/* Get an ATTRD so we can process all the attrs. */
>> +static struct xfs_log_item *
>> +xfs_attr_create_done(
>> +	struct xfs_trans		*tp,
>> +	struct xfs_log_item		*intent,
>> +	unsigned int			count)
>> +{
>> +	if (!xfs_sb_version_hasdelattr(&tp->t_mountp->m_sb))
>> +		return NULL;
> 
> This is probably better expressed as:
> 
> 	if (!intent)
> 		return NULL;
> 
> Since we don't need a log intent done item if there's no log intent
> item.
Ok, that makes sense

> 
>> +
>> +	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
>> +}
>> +
>> +const struct xfs_defer_op_type xfs_attr_defer_type = {
>> +	.max_items	= 1,
>> +	.create_intent	= xfs_attr_create_intent,
>> +	.abort_intent	= xfs_attr_abort_intent,
>> +	.create_done	= xfs_attr_create_done,
>> +	.finish_item	= xfs_attr_finish_item,
>> +	.cancel_item	= xfs_attr_cancel_item,
>> +};
>> +
>> +/*
>> + * Process an attr intent item that was recovered from the log.  We need to
>> + * delete the attr that it describes.
>> + */
>> +STATIC int
>> +xfs_attri_item_recover(
>> +	struct xfs_log_item		*lip,
>> +	struct list_head		*capture_list)
>> +{
>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>> +	struct xfs_mount		*mp = lip->li_mountp;
>> +	struct xfs_inode		*ip;
>> +	struct xfs_da_args		args;
>> +	struct xfs_attri_log_format	*attrp;
>> +	int				error;
>> +
>> +	/*
>> +	 * First check the validity of the attr described by the ATTRI.  If any
>> +	 * are bad, then assume that all are bad and just toss the ATTRI.
>> +	 */
>> +	attrp = &attrip->attri_format;
>> +	if (!(attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET ||
>> +	      attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE) ||
>> +	    (attrp->alfi_value_len > XATTR_SIZE_MAX) ||
>> +	    (attrp->alfi_name_len > XATTR_NAME_MAX) ||
>> +	    (attrp->alfi_name_len == 0)) {
> 
> This needs to call xfs_verify_ino() on attrp->alfi_ino.
Ok, will add

> 
> This also needs to check for xfs_sb_version_hasdelayedattr().
Well, ideally this would not be executing if the feature bit were not
on.  Maybe we should add an ASSERT at the top?

> 
> I would refactor this into a separate validation predicate to eliminate
> the multi-line if statement.  I will post a series cleaning up the other
> log items' recover functions shortly.
Alrighty, I will keep an eye out
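A possible shape for the validation predicate suggested above, as a userspace sketch. It assumes values for XFS_ATTR_OP_FLAGS_SET and XFS_ATTR_OP_FLAGS_REMOVE, and uses a cut-down, hypothetical stand-in for struct xfs_attri_log_format holding only the fields the quoted if statement inspects. XATTR_NAME_MAX and XATTR_SIZE_MAX use the Linux values (255 and 65536). The xfs_verify_ino() and feature-bit checks requested above need a struct xfs_mount and are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only; the patch defines the real ones. */
#define XFS_ATTR_OP_FLAGS_SET		1
#define XFS_ATTR_OP_FLAGS_REMOVE	2

/* Linux xattr limits, as in <linux/limits.h>. */
#define XATTR_NAME_MAX	255
#define XATTR_SIZE_MAX	65536

/* Hypothetical stand-in with only the fields the check looks at. */
struct attri_fmt {
	uint32_t	alfi_op_flags;
	uint32_t	alfi_name_len;
	uint32_t	alfi_value_len;
};

/* Hypothetical predicate: true if a recovered ATTRI looks sane. */
static bool
xfs_attri_validate(const struct attri_fmt *attrp)
{
	/* Only set and remove operations are known. */
	switch (attrp->alfi_op_flags) {
	case XFS_ATTR_OP_FLAGS_SET:
	case XFS_ATTR_OP_FLAGS_REMOVE:
		break;
	default:
		return false;
	}

	if (attrp->alfi_value_len > XATTR_SIZE_MAX)
		return false;

	/* A name is mandatory and bounded. */
	if (attrp->alfi_name_len == 0 || attrp->alfi_name_len > XATTR_NAME_MAX)
		return false;

	return true;
}
```

The recover function would then shrink to a single "if (!valid) return -EFSCORRUPTED;" test; a kernel version would presumably also take the mount so it can fold in the inode number and feature-bit checks.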

> 
>> +		/*
>> +		 * This will pull the ATTRI from the AIL and free the memory
>> +		 * associated with it.
>> +		 */
>> +		xfs_attri_release(attrip);
> 
> No need to call xfs_attri_release; one of the 5.10 cleanups was to
> recognize that the log recovery code does this for you automatically.
> 
Ok, will remove

>> +		return -EFSCORRUPTED;
>> +	}
>> +
>> +	error = xfs_iget(mp, 0, attrp->alfi_ino, 0, 0, &ip);
>> +	if (error)
>> +		return error;
> 
> I /think/ this needs to call xfs_qm_dqattach here, for reasons I'll get
> into shortly.
> 
> In the meantime, this /definitely/ needs to do:
> 
> 	if (VFS_I(ip)->i_nlink == 0)
> 		xfs_iflags_set(ip, XFS_IRECOVERY);
> 
> Because the IRECOVERY flag prevents inode inactivation from triggering
> on an unlinked inode while we're still performing log recovery.
> 
> If you want to steal the xlog_recover_iget helper from the atomic
> swapext series[0] please feel free. :)
> 
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?id=51e23b9c9d9674a78dc97c5848c9efb4461e074d
Oh I see.  Ok, I will take a look at that

> 
>> +	memset(&args, 0, sizeof(args));
>> +	args.dp = ip;
>> +	args.name = attrip->attri_name;
>> +	args.namelen = attrp->alfi_name_len;
>> +	args.attr_filter = attrp->alfi_attr_flags;
>> +	if (attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET) {
>> +		args.value = attrip->attri_value;
>> +		args.valuelen = attrp->alfi_value_len;
>> +	}
>> +
>> +	error = xfs_attr_set(&args);
> 
> Er...
> 
>> +
>> +	xfs_attri_release(attrip);
> 
> The transaction commit will take care of releasing attrip.
Mmmm, the new test case for attr replay hangs without this line.  I
suspect that's because we end up with an item in the AIL that never
goes away.

[Nov12 13:26] INFO: task mount:15718 blocked for more than 120 seconds.
[  +0.000009]       Tainted: G        W   E     5.9.0-rc4 #1
[  +0.000002] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000004] task:mount           state:D stack:    0 pid:15718 ppid: 15491 flags:0x00004000
[  +0.000005] Call Trace:
[  +0.000079]  __schedule+0x2d9/0x780
[  +0.000020]  schedule+0x4a/0xb0
[  +0.000120]  xfs_ail_push_all_sync+0xb8/0x100 [xfs]

...etc...


I'm a little confused on this one... I didn't think transaction commits
released log items?
> 
>> +	xfs_irele(ip);
>> +	return error;
>> +}
>> +
>> +static const struct xfs_item_ops xfs_attri_item_ops = {
>> +	.iop_size	= xfs_attri_item_size,
>> +	.iop_format	= xfs_attri_item_format,
>> +	.iop_unpin	= xfs_attri_item_unpin,
>> +	.iop_committed	= xfs_attri_item_committed,
>> +	.iop_committing = xfs_attri_item_committing,
>> +	.iop_release    = xfs_attri_item_release,
>> +	.iop_recover	= xfs_attri_item_recover,
>> +	.iop_match	= xfs_attri_item_match,
> 
> This needs an ->iop_relog method so that we can relog the attri log item
> if the log starts to fill up.
Ok, will add

> 
>> +};
>> +
>> +
>> +
>> +STATIC int
>> +xlog_recover_attri_commit_pass2(
>> +	struct xlog                     *log,
>> +	struct list_head		*buffer_list,
>> +	struct xlog_recover_item        *item,
>> +	xfs_lsn_t                       lsn)
>> +{
>> +	int                             error;
>> +	struct xfs_mount                *mp = log->l_mp;
>> +	struct xfs_attri_log_item       *attrip;
>> +	struct xfs_attri_log_format     *attri_formatp;
>> +	char				*name = NULL;
>> +	char				*value = NULL;
>> +	int				region = 0;
>> +
>> +	attri_formatp = item->ri_buf[region].i_addr;
> 
> Please check the __pad field for zeroes here.
Ok, will do

> 
>> +	attrip = xfs_attri_init(mp);
>> +	error = xfs_attri_copy_format(&item->ri_buf[region],
>> +				      &attrip->attri_format);
>> +	if (error) {
>> +		xfs_attri_item_free(attrip);
>> +		return error;
>> +	}
>> +
>> +	attrip->attri_name_len = attri_formatp->alfi_name_len;
>> +	attrip->attri_value_len = attri_formatp->alfi_value_len;
>> +	attrip = krealloc(attrip, sizeof(struct xfs_attri_log_item) +
>> +			  attrip->attri_name_len + attrip->attri_value_len,
>> +			  GFP_NOFS | __GFP_NOFAIL);
>> +
>> +	ASSERT(attrip->attri_name_len > 0);
> 
> If attri_name_len is zero, reject the whole thing with EFSCORRUPTED.
Ok, makes sense

> 
>> +	region++;
>> +	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
>> +	memcpy(name, item->ri_buf[region].i_addr,
>> +	       attrip->attri_name_len);
>> +	attrip->attri_name = name;
>> +
>> +	if (attrip->attri_value_len > 0) {
>> +		region++;
>> +		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
>> +			attrip->attri_name_len;
>> +		memcpy(value, item->ri_buf[region].i_addr,
>> +			attrip->attri_value_len);
>> +		attrip->attri_value = value;
>> +	}
> 
> Question: is it valid for an attri item to have value_len > 0 for an
> XFS_ATTRI_OP_FLAGS_REMOVE operation?
Well, it shouldn't happen, since the new attr_set routines assume that
the absence of a value implies a remove operation.  It doesn't
invalidate the item, I suppose, though it would mean that it's carrying
around a useless payload that it shouldn't.

> 
> Granted, that level of validation might be better left to the _recover
> function.
Maybe we should add an ASSERT there
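For readers following the pass2 hunk, the single-allocation layout it builds can be sketched in userspace: the item is grown once to hold the header plus name_len + value_len bytes, with the name placed right after the header and the value right after the name. struct toy_attri and toy_attri_attach_payload() are hypothetical, and realloc() stands in for krealloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header standing in for struct xfs_attri_log_item. */
struct toy_attri {
	unsigned int	name_len;
	unsigned int	value_len;
	char		*name;
	char		*value;
};

/*
 * Grow the item so the name and value live in the same allocation,
 * immediately after the header, in the order the quoted code uses.
 * Returns NULL on failure, in which case the caller still owns attrip.
 */
static struct toy_attri *
toy_attri_attach_payload(struct toy_attri *attrip,
			 const char *name, unsigned int name_len,
			 const char *value, unsigned int value_len)
{
	struct toy_attri *grown;

	grown = realloc(attrip, sizeof(*grown) + name_len + value_len);
	if (!grown)
		return NULL;

	/* Pointers are computed only after the block may have moved. */
	grown->name_len = name_len;
	grown->value_len = value_len;
	grown->name = (char *)(grown + 1);
	memcpy(grown->name, name, name_len);

	grown->value = NULL;
	if (value_len > 0) {
		grown->value = grown->name + name_len;
		memcpy(grown->value, value, value_len);
	}
	return grown;
}
```

Because realloc() may move the block, every pointer into the item has to be computed after the resize, which is why the sketch sets name and value only once the allocation has grown.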

> 
>> +
>> +	/*
>> +	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
>> +	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
>> +	 * directly and drop the ATTRI reference. Note that
>> +	 * xfs_trans_ail_update() drops the AIL lock.
>> +	 */
>> +	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
>> +	xfs_attri_release(attrip);
>> +	return 0;
>> +}
>> +
>> +const struct xlog_recover_item_ops xlog_attri_item_ops = {
>> +	.item_type	= XFS_LI_ATTRI,
>> +	.commit_pass2	= xlog_recover_attri_commit_pass2,
>> +};
>> +
>> +/*
>> + * This routine is called when an ATTRD format structure is found in a committed
>> + * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
>> + * it was still in the log. To do this it searches the AIL for the ATTRI with
>> + * an id equal to that in the ATTRD format structure. If we find it we drop
>> + * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
>> + */
>> +STATIC int
>> +xlog_recover_attrd_commit_pass2(
>> +	struct xlog			*log,
>> +	struct list_head		*buffer_list,
>> +	struct xlog_recover_item	*item,
>> +	xfs_lsn_t			lsn)
>> +{
>> +	struct xfs_attrd_log_format	*attrd_formatp;
>> +
>> +	attrd_formatp = item->ri_buf[0].i_addr;
>> +	ASSERT((item->ri_buf[0].i_len ==
>> +				(sizeof(struct xfs_attrd_log_format))));
>> +
>> +	xlog_recover_release_intent(log, XFS_LI_ATTRI,
>> +				    attrd_formatp->alfd_alf_id);
>> +	return 0;
>> +}
>> +
>> +const struct xlog_recover_item_ops xlog_attrd_item_ops = {
>> +	.item_type	= XFS_LI_ATTRD,
>> +	.commit_pass2	= xlog_recover_attrd_commit_pass2,
>> +};
>> diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
>> new file mode 100644
>> index 0000000..7dd2572
>> --- /dev/null
>> +++ b/fs/xfs/xfs_attr_item.h
>> @@ -0,0 +1,76 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later
>> + *
>> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
>> + * Author: Allison Collins <allison.henderson@oracle.com>
>> + */
>> +#ifndef	__XFS_ATTR_ITEM_H__
>> +#define	__XFS_ATTR_ITEM_H__
>> +
>> +/* kernel only ATTRI/ATTRD definitions */
>> +
>> +struct xfs_mount;
>> +struct kmem_zone;
>> +
>> +/*
>> + * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
>> + */
>> +#define	XFS_ATTRI_RECOVERED	1
>> +
>> +
>> +/* iovec length must be 32-bit aligned */
>> +#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
>> +				size + sizeof(int32_t) - \
>> +				(size % sizeof(int32_t)))
> 
> Can you turn this into a static inline helper?
> 
> And use one of the roundup() variants to ensure the proper alignment
> instead of this open-coded stuff? :)
Sure, will do
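One way to do the static inline conversion, sketched in userspace. ROUNDUP() here is a local stand-in for the kernel's roundup(x, y), and xfs_attr_nvec_size() is a hypothetical name. The old macro is copied in only for comparison: for a size that is already a multiple of four other than four itself, it pads by an extra four bytes (8 becomes 12), whereas rounding up leaves it at 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The open-coded macro from the patch, kept only for comparison. */
#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
				size + sizeof(int32_t) - \
				(size % sizeof(int32_t)))

/* Userspace stand-in for roundup(x, y): next multiple of y at or above x. */
#define ROUNDUP(x, y)	((((x) + ((y) - 1)) / (y)) * (y))

/* Hypothetical helper: iovec lengths must be 32-bit aligned. */
static inline size_t
xfs_attr_nvec_size(size_t size)
{
	return ROUNDUP(size, sizeof(int32_t));
}
```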

> 
>> +
>> +/*
>> + * This is the "attr intention" log item.  It is used to log the fact that some
>> + * attribute operations need to be processed.  An operation is currently either
>> + * a set or remove.  Set or remove operations are described by the xfs_attr_item
>> + * which may be logged to this intent.  Intents are used in conjunction with the
>> + * "attr done" log item described below.
>> + *
>> + * The ATTRI is reference counted so that it is not freed prior to both the
>> + * ATTRI and ATTRD being committed and unpinned. This ensures the ATTRI is
>> + * inserted into the AIL even in the event of out of order ATTRI/ATTRD
>> + * processing. In other words, an ATTRI is born with two references:
>> + *
>> + *      1.) an ATTRI held reference to track ATTRI AIL insertion
>> + *      2.) an ATTRD held reference to track ATTRD commit
>> + *
>> + * On allocation, both references are the responsibility of the caller. Once the
>> + * ATTRI is added to and dirtied in a transaction, ownership of reference one
>> + * transfers to the transaction. The reference is dropped once the ATTRI is
>> + * inserted to the AIL or in the event of failure along the way (e.g., commit
>> + * failure, log I/O error, etc.). Note that the caller remains responsible for
>> + * the ATTRD reference under all circumstances to this point. The caller has no
>> + * means to detect failure once the transaction is committed, however.
>> + * Therefore, an ATTRD is required after this point, even in the event of
>> + * unrelated failure.
>> + *
>> + * Once an ATTRD is allocated and dirtied in a transaction, reference two
>> + * transfers to the transaction. The ATTRD reference is dropped once it reaches
>> + * the unpin handler. Similar to the ATTRI, the reference also drops in the
>> + * event of commit failure or log I/O errors. Note that the ATTRD is not
>> + * inserted in the AIL, so at this point both the ATTRI and ATTRD are freed.
> 
> I don't think it's necessary to document the entire log intent/log done
> refcount state machine here; it'll do to record just the bits that are
> specific to delayed xattr operations.
Ok, maybe just the first 3 lines are enough then? I think that's all 
that really stands out from the other delayed ops
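The two-reference lifetime described in the quoted comment (and set up by the atomic_set(&attrip->attri_refcount, 2) in xfs_attri_init) can be modeled in a few lines of C11. struct toy_attri and its helpers are hypothetical; the real release path also pulls the item from the AIL, which is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical ATTRI reduced to its refcount. */
struct toy_attri {
	atomic_int	refcount;
};

static int toy_frees;	/* counts frees so the lifecycle can be checked */

static struct toy_attri *
toy_attri_init(void)
{
	struct toy_attri *attrip = calloc(1, sizeof(*attrip));

	if (!attrip)
		return NULL;
	/* Born with two references: AIL insertion and the ATTRD. */
	atomic_init(&attrip->refcount, 2);
	return attrip;
}

/* Drop one reference and free on the last, like atomic_dec_and_test(). */
static bool
toy_attri_release(struct toy_attri *attrip)
{
	int old = atomic_fetch_sub(&attrip->refcount, 1);

	assert(old > 0);
	if (old == 1) {
		free(attrip);
		toy_frees++;
		return true;
	}
	return false;
}
```

Whichever side drops its reference last frees the item, so the order in which the ATTRI reaches the AIL and the ATTRD commits does not matter.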

> 
>> + */
>> +struct xfs_attri_log_item {
>> +	struct xfs_log_item		attri_item;
>> +	atomic_t			attri_refcount;
>> +	int				attri_name_len;
>> +	void				*attri_name;
>> +	int				attri_value_len;
>> +	void				*attri_value;
> 
> Please compress this structure a bit by moving the two pointers to be
> adjacent instead of interspersed with ints.
Alrighty, will do.

> 
> Ok, now on to digesting the new state machine...
> 
> --D
Ok then, thanks for the thorough review!!

Allison
> 
>> +	struct xfs_attri_log_format	attri_format;
>> +};
>> +
>> +/*
>> + * This is the "attr done" log item.  It is used to log the fact that some attrs
>> + * earlier mentioned in an attri item have been freed.
>> + */
>> +struct xfs_attrd_log_item {
>> +	struct xfs_attri_log_item	*attrd_attrip;
>> +	struct xfs_log_item		attrd_item;
>> +	struct xfs_attrd_log_format	attrd_format;
>> +};
>> +
>> +#endif	/* __XFS_ATTR_ITEM_H__ */
>> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
>> index 8f8837f..d7787a5 100644
>> --- a/fs/xfs/xfs_attr_list.c
>> +++ b/fs/xfs/xfs_attr_list.c
>> @@ -15,6 +15,7 @@
>>   #include "xfs_inode.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_bmap.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_attr_sf.h"
>>   #include "xfs_attr_leaf.h"
>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>> index 3fbd98f..d5d1959 100644
>> --- a/fs/xfs/xfs_ioctl.c
>> +++ b/fs/xfs/xfs_ioctl.c
>> @@ -15,6 +15,8 @@
>>   #include "xfs_iwalk.h"
>>   #include "xfs_itable.h"
>>   #include "xfs_error.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_bmap.h"
>>   #include "xfs_bmap_util.h"
>> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
>> index c1771e7..62e1534 100644
>> --- a/fs/xfs/xfs_ioctl32.c
>> +++ b/fs/xfs/xfs_ioctl32.c
>> @@ -17,6 +17,8 @@
>>   #include "xfs_itable.h"
>>   #include "xfs_fsops.h"
>>   #include "xfs_rtalloc.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_ioctl.h"
>>   #include "xfs_ioctl32.h"
>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>> index 5e16545..5ecc76c 100644
>> --- a/fs/xfs/xfs_iops.c
>> +++ b/fs/xfs/xfs_iops.c
>> @@ -13,6 +13,8 @@
>>   #include "xfs_inode.h"
>>   #include "xfs_acl.h"
>>   #include "xfs_quota.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_trace.h"
>> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
>> index fa2d05e..3457f22 100644
>> --- a/fs/xfs/xfs_log.c
>> +++ b/fs/xfs/xfs_log.c
>> @@ -1993,6 +1993,10 @@ xlog_print_tic_res(
>>   	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
>>   	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
>>   	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
>> +	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
>> +	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
>> +	    REG_TYPE_STR(ATTR_NAME, "attr_name"),
>> +	    REG_TYPE_STR(ATTR_VALUE, "attr_value"),
>>   	};
>>   	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
>>   #undef REG_TYPE_STR
>> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
>> index a8289ad..cb951cd 100644
>> --- a/fs/xfs/xfs_log_recover.c
>> +++ b/fs/xfs/xfs_log_recover.c
>> @@ -1775,6 +1775,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
>>   	&xlog_cud_item_ops,
>>   	&xlog_bui_item_ops,
>>   	&xlog_bud_item_ops,
>> +	&xlog_attri_item_ops,
>> +	&xlog_attrd_item_ops,
>>   };
>>   
>>   static const struct xlog_recover_item_ops *
>> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
>> index 0aa87c2..bc9c25e 100644
>> --- a/fs/xfs/xfs_ondisk.h
>> +++ b/fs/xfs/xfs_ondisk.h
>> @@ -132,6 +132,8 @@ xfs_check_ondisk_structs(void)
>>   	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
>>   	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
>>   	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
>> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
>> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
>>   
>>   	/*
>>   	 * The v5 superblock format extended several v4 header structures with
>> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
>> index bca48b3..9b0c790 100644
>> --- a/fs/xfs/xfs_xattr.c
>> +++ b/fs/xfs/xfs_xattr.c
>> @@ -10,6 +10,7 @@
>>   #include "xfs_log_format.h"
>>   #include "xfs_da_format.h"
>>   #include "xfs_inode.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_acl.h"
>>   #include "xfs_da_btree.h"
>> -- 
>> 2.7.4
>>


* Re: [PATCH v13 03/10] xfs: Add delay ready attr set routines
  2020-11-10 21:57     ` Darrick J. Wong
@ 2020-11-13  1:33       ` Allison Henderson
  2020-11-13  9:16         ` Chandan Babu R
  0 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2020-11-13  1:33 UTC (permalink / raw)
  To: Darrick J. Wong, Chandan Babu R; +Cc: linux-xfs



On 11/10/20 2:57 PM, Darrick J. Wong wrote:
> On Tue, Oct 27, 2020 at 07:02:55PM +0530, Chandan Babu R wrote:
>> On Friday 23 October 2020 12:04:28 PM IST Allison Henderson wrote:
>>> This patch modifies the attr set routines to be delay ready. This means
>>> they no longer roll or commit transactions, but instead return -EAGAIN
>>> to have the calling routine roll and refresh the transaction.  In this
>>> series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
>>> state machine like switch to keep track of where it was when EAGAIN was
>>> returned. See xfs_attr.h for a more detailed diagram of the states.
>>>
>>> Two new helper functions have been added: xfs_attr_rmtval_set_init and
>>> xfs_attr_rmtval_set_blk.  They provide a subset of logic similar to
>>> xfs_attr_rmtval_set, but they store the current block in the delay attr
>>> context to allow the caller to roll the transaction between allocations.
>>> This helps to simplify and consolidate code used by
>>> xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
>>> now become a simple loop to refresh the transaction until the operation
>>> is completed.  Lastly, xfs_attr_rmtval_remove is no longer used, and is
>>> removed.
>>
>> One nit. xfs_attr_rmtval_remove()'s prototype declaration needs to be removed
>> from xfs_attr_remote.h.
Alrighty, will pull out
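The -EAGAIN convention in the commit message can be illustrated with a toy state machine: the step function never rolls anything itself and returns -EAGAIN whenever the caller should roll, and the caller loops until a different result comes back. The states and names below are hypothetical; the real states are the ones documented in xfs_attr.h, and the counter stands in for an actual transaction roll.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states, not the ones defined in xfs_attr.h. */
enum toy_state {
	TOY_INIT = 0,
	TOY_ALLOC_BLK,
	TOY_DONE,
};

struct toy_delattr_ctx {
	enum toy_state	state;
	int		blks_left;	/* remote value blocks still to allocate */
};

/* One step: records where it stopped and returns -EAGAIN to ask for a roll. */
static int
toy_set_iter(struct toy_delattr_ctx *dac)
{
	switch (dac->state) {
	case TOY_INIT:
		dac->state = TOY_ALLOC_BLK;
		return -EAGAIN;		/* e.g. after a shortform-to-leaf conversion */
	case TOY_ALLOC_BLK:
		if (dac->blks_left > 0) {
			dac->blks_left--;	/* "allocate" one block */
			return -EAGAIN;
		}
		dac->state = TOY_DONE;
		return 0;
	case TOY_DONE:
		return 0;
	}
	return -EINVAL;
}

/* Caller loop: roll between steps until the operation completes or fails. */
static int
toy_set_args(struct toy_delattr_ctx *dac, int *rolls)
{
	int error;

	*rolls = 0;
	do {
		error = toy_set_iter(dac);
		if (error == -EAGAIN)
			(*rolls)++;	/* stand-in for rolling the transaction */
	} while (error == -EAGAIN);
	return error;
}
```

With three remote blocks to allocate, the loop rolls four times: once after the initial step and once after each block.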

>>
>>>
>>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>>> ---
>>>   fs/xfs/libxfs/xfs_attr.c        | 370 ++++++++++++++++++++++++++--------------
>>>   fs/xfs/libxfs/xfs_attr.h        | 126 +++++++++++++-
>>>   fs/xfs/libxfs/xfs_attr_remote.c |  99 +++++++----
>>>   fs/xfs/libxfs/xfs_attr_remote.h |   4 +
>>>   fs/xfs/xfs_trace.h              |   1 -
>>>   5 files changed, 439 insertions(+), 161 deletions(-)
>>>
>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>> index 6ca94cb..95c98d7 100644
>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>> +++ b/fs/xfs/libxfs/xfs_attr.c
>>> @@ -44,7 +44,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
>>>    * Internal routines when attribute list is one block.
>>>    */
>>>   STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
>>> -STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
>>> +STATIC int xfs_attr_leaf_addname(struct xfs_delattr_context *dac);
>>>   STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
>>>   STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>>   
>>> @@ -52,12 +52,15 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>>    * Internal routines when attribute list is more than one block.
>>>    */
>>>   STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>>> -STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
>>> +STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
>>>   STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>>>   STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>>   				 struct xfs_da_state **state);
>>>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>>   STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>>> +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
>>> +STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>>> +			     struct xfs_buf **leaf_bp);
>>>   
>>>   int
>>>   xfs_inode_hasattr(
>>> @@ -218,8 +221,11 @@ xfs_attr_is_shortform(
>>>   
>>>   /*
>>>    * Attempts to set an attr in shortform, or converts short form to leaf form if
>>> - * there is not enough room.  If the attr is set, the transaction is committed
>>> - * and set to NULL.
>>> + * there is not enough room.  This function is meant to operate as a helper
>>> + * routine to the delayed attribute functions.  It returns -EAGAIN to indicate
>>> + * that the calling function should roll the transaction, and then proceed to
>>> + * add the attr in leaf form.  This subroutine does not expect to be recalled
>>> + * again like the other delayed attr routines do.
>>>    */
>>>   STATIC int
>>>   xfs_attr_set_shortform(
>>> @@ -227,16 +233,16 @@ xfs_attr_set_shortform(
>>>   	struct xfs_buf		**leaf_bp)
>>>   {
>>>   	struct xfs_inode	*dp = args->dp;
>>> -	int			error, error2 = 0;
>>> +	int			error = 0;
>>>   
>>>   	/*
>>>   	 * Try to add the attr to the attribute list in the inode.
>>>   	 */
>>>   	error = xfs_attr_try_sf_addname(dp, args);
>>> +
>>> +	/* Should only be 0, -EEXIST or ENOSPC */
>>>   	if (error != -ENOSPC) {
>>> -		error2 = xfs_trans_commit(args->trans);
>>> -		args->trans = NULL;
>>> -		return error ? error : error2;
>>> +		return error;
>>>   	}
>>>   	/*
>>>   	 * It won't fit in the shortform, transform to a leaf block.  GROT:
>>> @@ -249,18 +255,10 @@ xfs_attr_set_shortform(
>>>   	/*
>>>   	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
>>>   	 * push cannot grab the half-baked leaf buffer and run into problems
>>> -	 * with the write verifier. Once we're done rolling the transaction we
>>> -	 * can release the hold and add the attr to the leaf.
>>> +	 * with the write verifier.
>>>   	 */
>>>   	xfs_trans_bhold(args->trans, *leaf_bp);
>>> -	error = xfs_defer_finish(&args->trans);
>>> -	xfs_trans_bhold_release(args->trans, *leaf_bp);
>>> -	if (error) {
>>> -		xfs_trans_brelse(args->trans, *leaf_bp);
>>> -		return error;
>>> -	}
>>> -
>>> -	return 0;
>>> +	return -EAGAIN;
>>>   }
>>>   
>>>   /*
>>> @@ -268,7 +266,7 @@ xfs_attr_set_shortform(
>>>    * also checks for a defer finish.  Transaction is finished and rolled as
>>>    * needed, and returns true of false if the delayed operation should continue.
>>>    */
>>> -int
>>> +STATIC int
>>>   xfs_attr_trans_roll(
>>>   	struct xfs_delattr_context	*dac)
>>>   {
>>> @@ -297,61 +295,130 @@ int
>>>   xfs_attr_set_args(
>>>   	struct xfs_da_args	*args)
>>>   {
>>> -	struct xfs_inode	*dp = args->dp;
>>> -	struct xfs_buf          *leaf_bp = NULL;
>>> -	int			error = 0;
>>> +	struct xfs_buf			*leaf_bp = NULL;
>>> +	int				error = 0;
>>> +	struct xfs_delattr_context	dac = {
>>> +		.da_args	= args,
>>> +	};
>>> +
>>> +	do {
>>> +		error = xfs_attr_set_iter(&dac, &leaf_bp);
>>> +		if (error != -EAGAIN)
>>> +			break;
>>> +
>>> +		error = xfs_attr_trans_roll(&dac);
>>> +		if (error)
>>> +			return error;
>>> +
>>> +		if (leaf_bp) {
>>> +			xfs_trans_bjoin(args->trans, leaf_bp);
>>> +			xfs_trans_bhold(args->trans, leaf_bp);
>>> +		}
>>
>> When xfs_attr_set_iter() causes a "short form" attribute list to be converted
>> to "leaf form", leaf_bp would point to an xfs_buf which has been added to the
>> transaction and also XFS_BLI_HOLD flag is set on the buffer (last statement in
>> xfs_attr_set_shortform()). XFS_BLI_HOLD flag makes sure that the new
>> transaction allocated by xfs_attr_trans_roll() would continue to have leaf_bp
>> in the transaction's item list. Hence I think the above calls to
>> xfs_trans_bjoin() and xfs_trans_bhold() are not required.
Sorry, I just noticed Chandan's commentary for this patch. Apologies. I
think we can get away without this now, but yes, this routine disappears
by the end of the set. Will clean it out anyway for bisecting reasons
though. :-)

> 
> I /think/ the defer ops will rejoin the buffer each time it rolls, which
> means that xfs_attr_trans_roll returns with the buffer already joined to
> the transaction?  And I think you're right that the bhold isn't needed,
> because holding is dictated by the lower levels (i.e. _set_iter).
> 
>> Please let me know if I am missing something obvious here.
> 
> The entire function goes away by the end of the series. :)
> 
> --D
> 
>>
>> -- 
>> chandan
>>
>>
>>


* Re: [PATCH v13 03/10] xfs: Add delay ready attr set routines
  2020-11-10 23:10   ` Darrick J. Wong
@ 2020-11-13  1:38     ` Allison Henderson
  2020-11-14  1:35       ` Darrick J. Wong
  0 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2020-11-13  1:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 11/10/20 4:10 PM, Darrick J. Wong wrote:
> On Thu, Oct 22, 2020 at 11:34:28PM -0700, Allison Henderson wrote:
>> This patch modifies the attr set routines to be delay ready. This means
>> they no longer roll or commit transactions, but instead return -EAGAIN
>> to have the calling routine roll and refresh the transaction.  In this
>> series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
>> state machine like switch to keep track of where it was when EAGAIN was
>> returned. See xfs_attr.h for a more detailed diagram of the states.
>>
>> Two new helper functions have been added: xfs_attr_rmtval_set_init and
>> xfs_attr_rmtval_set_blk.  They provide a subset of logic similar to
>> xfs_attr_rmtval_set, but they store the current block in the delay attr
>> context to allow the caller to roll the transaction between allocations.
>> This helps to simplify and consolidate code used by
>> xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
>> now become a simple loop to refresh the transaction until the operation
>> is completed.  Lastly, xfs_attr_rmtval_remove is no longer used, and is
>> removed.
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c        | 370 ++++++++++++++++++++++++++--------------
>>   fs/xfs/libxfs/xfs_attr.h        | 126 +++++++++++++-
>>   fs/xfs/libxfs/xfs_attr_remote.c |  99 +++++++----
>>   fs/xfs/libxfs/xfs_attr_remote.h |   4 +
>>   fs/xfs/xfs_trace.h              |   1 -
>>   5 files changed, 439 insertions(+), 161 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 6ca94cb..95c98d7 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -44,7 +44,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
>>    * Internal routines when attribute list is one block.
>>    */
>>   STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
>> -STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
>> +STATIC int xfs_attr_leaf_addname(struct xfs_delattr_context *dac);
>>   STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
>>   STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>   
>> @@ -52,12 +52,15 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>    * Internal routines when attribute list is more than one block.
>>    */
>>   STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>> -STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
>> +STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
>>   STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>>   STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>   				 struct xfs_da_state **state);
>>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>   STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>> +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
>> +STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>> +			     struct xfs_buf **leaf_bp);
>>   
>>   int
>>   xfs_inode_hasattr(
>> @@ -218,8 +221,11 @@ xfs_attr_is_shortform(
>>   
>>   /*
>>    * Attempts to set an attr in shortform, or converts short form to leaf form if
>> - * there is not enough room.  If the attr is set, the transaction is committed
>> - * and set to NULL.
>> + * there is not enough room.  This function is meant to operate as a helper
>> + * routine to the delayed attribute functions.  It returns -EAGAIN to indicate
>> + * that the calling function should roll the transaction, and then proceed to
>> + * add the attr in leaf form.  This subroutine does not expect to be recalled
>> + * again like the other delayed attr routines do.
>>    */
>>   STATIC int
>>   xfs_attr_set_shortform(
>> @@ -227,16 +233,16 @@ xfs_attr_set_shortform(
>>   	struct xfs_buf		**leaf_bp)
>>   {
>>   	struct xfs_inode	*dp = args->dp;
>> -	int			error, error2 = 0;
>> +	int			error = 0;
>>   
>>   	/*
>>   	 * Try to add the attr to the attribute list in the inode.
>>   	 */
>>   	error = xfs_attr_try_sf_addname(dp, args);
>> +
>> +	/* Should only be 0, -EEXIST or ENOSPC */
> 
> Nit: "...or -ENOSPC"
> 
> Also, this comment could go a couple of lines up:
Sure
> 
> 	/*
> 	 * Try to add the attr to the attribute list in the inode.
> 	 * This should only return 0, -EEXIST, or -ENOSPC.
> 	 */
> 	error = xfs_attr_try_sf_addname(dp, args);
> 	if (error != -ENOSPC)
> 		return error;
> 
> 
>>   	if (error != -ENOSPC) {
>> -		error2 = xfs_trans_commit(args->trans);
>> -		args->trans = NULL;
>> -		return error ? error : error2;
>> +		return error;
>>   	}
>>   	/*
>>   	 * It won't fit in the shortform, transform to a leaf block.  GROT:
>> @@ -249,18 +255,10 @@ xfs_attr_set_shortform(
>>   	/*
>>   	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
>>   	 * push cannot grab the half-baked leaf buffer and run into problems
>> -	 * with the write verifier. Once we're done rolling the transaction we
>> -	 * can release the hold and add the attr to the leaf.
>> +	 * with the write verifier.
>>   	 */
>>   	xfs_trans_bhold(args->trans, *leaf_bp);
>> -	error = xfs_defer_finish(&args->trans);
>> -	xfs_trans_bhold_release(args->trans, *leaf_bp);
>> -	if (error) {
>> -		xfs_trans_brelse(args->trans, *leaf_bp);
>> -		return error;
>> -	}
>> -
>> -	return 0;
>> +	return -EAGAIN;
> 
> What state are we in when return -EAGAIN here?  Are we still in
> XFS_DAS_UNINIT, but with an attr fork that is no longer in local format,
> which means that we skip the xfs_attr_is_shortform branch next time
> around?
Yes, that's correct. I think I used to have an explicit state for it,
but it's really not needed for this reason. Though I think explicit
states do add some degree of readability. Maybe we could add a comment?

/* Restart attr operation in leaf format */

?
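To make the -EAGAIN contract concrete, here is a toy model of the return values of xfs_attr_set_shortform(). It assumes the "0, -EEXIST or -ENOSPC" contract stated in the patch comment. model_sf, model_try_sf_addname() and model_set_shortform() are hypothetical, and the fixed four-slot array is not the on-disk shortform layout. On -ENOSPC the model flips the fork to leaf form and returns -EAGAIN without touching any state variable. The next pass is therefore steered by the fork format alone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical shortform area; not the XFS on-disk format. */
struct model_sf {
	int	space;		/* free bytes left in the inode literal area */
	bool	converted;	/* has the fork been turned into a leaf? */
	int	names[4];	/* stored name ids, 0 = empty slot */
};

/* Returns 0, -EEXIST or -ENOSPC, mirroring the comment in the patch. */
static int model_try_sf_addname(struct model_sf *sf, int name, int len)
{
	assert(name != 0);	/* 0 marks an empty slot in this model */

	for (int i = 0; i < 4; i++)
		if (sf->names[i] == name)
			return -EEXIST;
	if (len > sf->space)
		return -ENOSPC;
	for (int i = 0; i < 4; i++) {
		if (sf->names[i] == 0) {
			sf->names[i] = name;
			sf->space -= len;
			return 0;
		}
	}
	return -ENOSPC;
}

static int model_set_shortform(struct model_sf *sf, int name, int len)
{
	/*
	 * Try to add the attr to the shortform area.
	 * This should only return 0, -EEXIST, or -ENOSPC.
	 */
	int error = model_try_sf_addname(sf, name, len);

	if (error != -ENOSPC)
		return error;

	/* Convert to leaf form; no state variable is updated here. */
	sf->converted = true;
	/* Restart attr operation in leaf format */
	return -EAGAIN;
}
```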

> 
>>   }
>>   
>>   /*
>> @@ -268,7 +266,7 @@ xfs_attr_set_shortform(
>>    * also checks for a defer finish.  Transaction is finished and rolled as
>>    * needed, and returns true of false if the delayed operation should continue.
>>    */
>> -int
>> +STATIC int
>>   xfs_attr_trans_roll(
>>   	struct xfs_delattr_context	*dac)
>>   {
>> @@ -297,61 +295,130 @@ int
>>   xfs_attr_set_args(
>>   	struct xfs_da_args	*args)
>>   {
>> -	struct xfs_inode	*dp = args->dp;
>> -	struct xfs_buf          *leaf_bp = NULL;
>> -	int			error = 0;
>> +	struct xfs_buf			*leaf_bp = NULL;
>> +	int				error = 0;
>> +	struct xfs_delattr_context	dac = {
>> +		.da_args	= args,
>> +	};
>> +
>> +	do {
>> +		error = xfs_attr_set_iter(&dac, &leaf_bp);
>> +		if (error != -EAGAIN)
>> +			break;
>> +
>> +		error = xfs_attr_trans_roll(&dac);
>> +		if (error)
>> +			return error;
>> +
>> +		if (leaf_bp) {
>> +			xfs_trans_bjoin(args->trans, leaf_bp);
>> +			xfs_trans_bhold(args->trans, leaf_bp);
>> +		}
>> +
>> +	} while (true);
>> +
>> +	return error;
>> +}
>> +
>> +/*
>> + * Set the attribute specified in @args.
>> + * This routine is meant to function as a delayed operation, and may return
>> + * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
>> + * to handle this, and recall the function until a successful error code is
>> + * returned.
>> + */
>> +STATIC int
>> +xfs_attr_set_iter(
>> +	struct xfs_delattr_context	*dac,
>> +	struct xfs_buf			**leaf_bp)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_inode		*dp = args->dp;
>> +	int				error = 0;
>> +
>> +	/* State machine switch */
>> +	switch (dac->dela_state) {
>> +	case XFS_DAS_FLIP_LFLAG:
>> +	case XFS_DAS_FOUND_LBLK:
> 
> Do we need to catch XFS_DAS_RM_LBLK here?

I think we fall into the correct code path without it, but I think it's 
better to have it here for consistency.  Will add.

> 
>> +		goto das_leaf;
>> +	case XFS_DAS_FOUND_NBLK:
>> +	case XFS_DAS_FLIP_NFLAG:
>> +	case XFS_DAS_ALLOC_NODE:
>> +		goto das_node;
>> +	default:
>> +		break;
>> +	}
>>   
>>   	/*
>>   	 * If the attribute list is already in leaf format, jump straight to
>>   	 * leaf handling.  Otherwise, try to add the attribute to the shortform
>>   	 * list; if there's no room then convert the list to leaf format and try
>> -	 * again.
>> +	 * again. No need to set state as we will be in leaf form when we come
>> +	 * back
>>   	 */
>>   	if (xfs_attr_is_shortform(dp)) {
>>   
>>   		/*
>> -		 * If the attr was successfully set in shortform, the
>> -		 * transaction is committed and set to NULL.  Otherwise, is it
>> -		 * converted from shortform to leaf, and the transaction is
>> -		 * retained.
>> +		 * If the attr was successfully set in shortform, no need to
>> +		 * continue.  Otherwise, is it converted from shortform to leaf
>> +		 * and -EAGAIN is returned.
>>   		 */
>> -		error = xfs_attr_set_shortform(args, &leaf_bp);
>> -		if (error || !args->trans)
>> -			return error;
>> +		error = xfs_attr_set_shortform(args, leaf_bp);
>> +		if (error == -EAGAIN)
>> +			dac->flags |= XFS_DAC_DEFER_FINISH;
>> +
>> +		return error;
>>   	}
>>   
>> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>> -		error = xfs_attr_leaf_addname(args);
>> -		if (error != -ENOSPC)
>> -			return error;
>> +	/*
>> +	 * After a shortform to leaf conversion, we need to hold the leaf and
>> +	 * cycle out the transaction.  When we get back, we need to release
>> +	 * the leaf.
> 
> "...to release the hold on the leaf buffer."
Sure, will expand

> 
>> +	 */
>> +	if (*leaf_bp != NULL) {
>> +		xfs_trans_bhold_release(args->trans, *leaf_bp);
>> +		*leaf_bp = NULL;
>> +	}
>>   
>> -		/*
>> -		 * Promote the attribute list to the Btree format.
>> -		 */
>> -		error = xfs_attr3_leaf_to_node(args);
>> -		if (error)
>> -			return error;
>> +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>> +		error = xfs_attr_leaf_try_add(args, *leaf_bp);
>> +		switch (error) {
>> +		case -ENOSPC:
>> +			/*
>> +			 * Promote the attribute list to the Btree format.
>> +			 */
>> +			error = xfs_attr3_leaf_to_node(args);
>> +			if (error)
>> +				return error;
>>   
>> -		/*
>> -		 * Finish any deferred work items and roll the transaction once
>> -		 * more.  The goal here is to call node_addname with the inode
>> -		 * and transaction in the same state (inode locked and joined,
>> -		 * transaction clean) no matter how we got to this step.
>> -		 */
>> -		error = xfs_defer_finish(&args->trans);
>> -		if (error)
>> +			/*
>> +			 * Finish any deferred work items and roll the
>> +			 * transaction once more.  The goal here is to call
>> +			 * node_addname with the inode and transaction in the
>> +			 * same state (inode locked and joined, transaction
>> +			 * clean) no matter how we got to this step.
>> +			 */
>> +			dac->flags |= XFS_DAC_DEFER_FINISH;
>> +			return -EAGAIN;
> 
> What state should we be in at this -EAGAIN return?  Is it
> XFS_DAS_UNINIT, but with more than one extent in the attr fork?
It could be UNINIT, if the attr was already a leaf at the time we
started. If we had to promote from a block to a leaf, and STILL
couldn't fit in leaf form, then we're probably in some state left over
from the leaf routines. But because xfs_attr3_leaf_to_node just turned us
into a node, we fall into the node path upon return.

I know that's confusing... which leads to your next question of.....
> 
> /me is wishing these would get turned into explicit states, since afaict
> we don't unlock the inode and so we should find it in /exactly/ the
> state that the delattr_context says it should be in.
IIRC it used to have an explicit XFS_DC_LEAF_TO_NODE state, but I think
we simplified it away at some point during review in an effort to
simplify the state machine as much as possible (v8, I think). But yes, I
do think there is a trade-off: removing states where we can also reduces
the readability of where we are in the attr process, because the current
state is no longer represented by dela_state alone; it's the combination
of dela_state and the state of the tree.

I've been over this code so much by now that I can follow it either way,
but if it's confusing to others, maybe we should put it back? Or maybe
just add a comment, if that helps?


> 
>> +		case 0:
>> +			dac->dela_state = XFS_DAS_FOUND_LBLK;
>> +			return -EAGAIN;
>> +		default:
>>   			return error;
>> +		}
>> +das_leaf:
> 
> The only way to get to this block of code is by jumping to das_leaf,
> from the switch statement above, right?  If so, then shouldn't it be up
> there in the switch statement?
We could, though I think we were just trying to be consistent in that 
the switch is sort of a dispatcher for gotos?  Otherwise we end up with 
a switch with giant cases.  It's the same difference I suppose.

> 
>> +		error = xfs_attr_leaf_addname(dac);
>> +		if (error == -ENOSPC)
>> +			/*
>> +			 * No need to set state.  We will be in node form when
>> +			 * we are recalled
>> +			 */
>> +			return -EAGAIN;
> 
> How do we get to node form?
Hmm, I thought xfs_attr_leaf_addname did promote to node if there's not
enough space, but now that you point it out, I'm not seeing it. We may
have to put the LEAF_TO_NODE state back anyway.

Maybe I can add a test case too; it doesn't look like any of the existing
cases run across it.
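One way an explicit promotion state could look, sketched as a toy model rather than a patch. MODEL_DAS_LEAF_TO_NODE and every other model_* name are hypothetical, and a "roll" is just a counter. The model records why another pass is needed, so the next call can actually perform the leaf-to-node promotion. By contrast, returning -EAGAIN on -ENOSPC without promoting would simply re-enter the same full leaf.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model; the names echo the thread but this is not XFS code. */
enum model_das {
	MODEL_DAS_UNINIT,
	MODEL_DAS_LEAF_TO_NODE,	/* explicit state discussed above */
	MODEL_DAS_DONE,
};

enum model_fmt { MODEL_FMT_LEAF, MODEL_FMT_NODE };

struct model_ctx {
	enum model_das	state;
	enum model_fmt	fmt;
	int		leaf_free;	/* free slots in the single leaf block */
	int		node_adds;
	int		promotions;
};

/* Leaf add: fails with -ENOSPC when the one leaf block is full. */
static int model_leaf_addname(struct model_ctx *c)
{
	if (c->leaf_free == 0)
		return -ENOSPC;
	c->leaf_free--;
	return 0;
}

static int model_set_iter(struct model_ctx *c)
{
	int error;

	switch (c->state) {
	case MODEL_DAS_LEAF_TO_NODE:
		/* Only a leaf can be promoted in this model. */
		assert(c->fmt == MODEL_FMT_LEAF);
		c->fmt = MODEL_FMT_NODE;
		c->promotions++;
		c->state = MODEL_DAS_UNINIT;
		return -EAGAIN;		/* roll so the promotion is committed */
	case MODEL_DAS_DONE:
		return 0;
	default:
		break;
	}

	if (c->fmt == MODEL_FMT_NODE) {
		c->node_adds++;
		c->state = MODEL_DAS_DONE;
		return 0;
	}

	error = model_leaf_addname(c);
	if (error == -ENOSPC) {
		/* Remember that the next pass must promote to node form. */
		c->state = MODEL_DAS_LEAF_TO_NODE;
		return -EAGAIN;
	}
	if (error == 0)
		c->state = MODEL_DAS_DONE;
	return error;
}

/* Driver: keeps calling the iterator, counting each -EAGAIN as a roll. */
static int model_set_args(struct model_ctx *c, int *rolls)
{
	int error;

	*rolls = 0;
	while ((error = model_set_iter(c)) == -EAGAIN)
		(*rolls)++;
	return error;
}
```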

> 
>> -		/*
>> -		 * Commit the current trans (including the inode) and
>> -		 * start a new one.
>> -		 */
>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>> -		if (error)
>> -			return error;
>> +		return error;
>>   	}
>> -
>> -	error = xfs_attr_node_addname(args);
>> +das_node:
>> +	error = xfs_attr_node_addname(dac);
>>   	return error;
> 
> Similarly, I think the only way get to this block of code is if we're in
> the initial state (XFS_DAS_UNINIT?) and the inode wasn't in short
> format; or if we jumped here via DAS_{FOUND_NBLK,FLIP_NFLAG,ALLOC_NODE},
> right?
> 
> I think you could straighten this out a bit further (I left out the
> comments):
> 
> 	switch (dac->dela_state) {
> 	case XFS_DAS_FLIP_LFLAG:
> 	case XFS_DAS_FOUND_LBLK:
> 		error = xfs_attr_leaf_addname(dac);
> 		if (error == -ENOSPC)
> 			return -EAGAIN;
> 		return error;
> 	case XFS_DAS_FOUND_NBLK:
> 	case XFS_DAS_FLIP_NFLAG:
> 	case XFS_DAS_ALLOC_NODE:
> 		return xfs_attr_node_addname(dac);
> 	case XFS_DAS_UNINIT:
> 		break;
> 	default:
> 		...assert on the XFS_DAS_RM_* flags...
> 	}
> 
> 	if (xfs_attr_is_shortform(dp))
> 		return xfs_attr_set_shortform(args, leaf_bp);
> 
> 	if (*leaf_bp != NULL) {
> 		...release bhold...
> 	}
> 
> 	if (!xfs_bmap_one_block(...))
> 		return xfs_attr_node_addname(dac);
> 
> 	error = xfs_attr_leaf_try_add(args, *leaf_bp);
> 	switch (error) {
> 	...handle -ENOSPC and 0...
> 	}
> 	return error;
> 
Ok, I'll see if I can get something like that through the test cases.
If it doesn't work out, I'll make a note of it.
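The flattened shape suggested above can be modelled in a compilable form, under stated assumptions. model_set_iter(), the model_* enums and the stub leaf/node routines are all hypothetical. The shortform add and the leaf try_add steps are reduced to simple format and state changes. Resume states return directly from the switch instead of jumping to labels further down. The MODEL_DAS_RM_LBLK case, standing in for XFS_DAS_RM_LBLK, sits with the other leaf states. The -ENOSPC handling in the leaf case is left out because the node promotion path is still an open question in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the states named in the review. */
enum model_das {
	MODEL_DAS_UNINIT,
	MODEL_DAS_FOUND_LBLK,
	MODEL_DAS_FLIP_LFLAG,
	MODEL_DAS_RM_LBLK,
	MODEL_DAS_FOUND_NBLK,
	MODEL_DAS_FLIP_NFLAG,
	MODEL_DAS_ALLOC_NODE,
};

enum model_fmt { MODEL_FMT_LOCAL, MODEL_FMT_LEAF, MODEL_FMT_NODE };

struct model_ctx {
	enum model_das	dela_state;
	enum model_fmt	fmt;
	int		leaf_calls;
	int		node_calls;
};

/* Stubs standing in for the real leaf/node addname routines. */
static int model_leaf_addname(struct model_ctx *c) { c->leaf_calls++; return 0; }
static int model_node_addname(struct model_ctx *c) { c->node_calls++; return 0; }

static int model_set_iter(struct model_ctx *c)
{
	/* Resume points return directly: no goto labels further down. */
	switch (c->dela_state) {
	case MODEL_DAS_FLIP_LFLAG:
	case MODEL_DAS_FOUND_LBLK:
	case MODEL_DAS_RM_LBLK:
		return model_leaf_addname(c);
	case MODEL_DAS_FOUND_NBLK:
	case MODEL_DAS_FLIP_NFLAG:
	case MODEL_DAS_ALLOC_NODE:
		return model_node_addname(c);
	case MODEL_DAS_UNINIT:
		break;
	default:
		assert(0 && "unexpected state");
		return -EINVAL;
	}

	/* Fresh start: dispatch on the current shape of the attr fork. */
	if (c->fmt == MODEL_FMT_LOCAL) {
		c->fmt = MODEL_FMT_LEAF;	/* pretend the sf add did not fit */
		return -EAGAIN;
	}
	if (c->fmt == MODEL_FMT_NODE)
		return model_node_addname(c);

	/* Leaf try_add succeeded: record the resume point and roll. */
	c->dela_state = MODEL_DAS_FOUND_LBLK;
	return -EAGAIN;
}
```

Starting from shortform, this model takes two rolls: one after the conversion and one after the leaf try_add. It then finishes in the leaf routine.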

>>   }
>>   
>> @@ -723,28 +790,30 @@ xfs_attr_leaf_try_add(
>>    *
>>    * This leaf block cannot have a "remote" value, we only call this routine
>>    * if bmap_one_block() says there is only one block (ie: no remote blks).
>> + *
>> + * This routine is meant to function as a delayed operation, and may return
>> + * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
>> + * to handle this, and recall the function until a successful error code is
>> + * returned.
>>    */
>>   STATIC int
>>   xfs_attr_leaf_addname(
>> -	struct xfs_da_args	*args)
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	int			error, forkoff;
>> -	struct xfs_buf		*bp = NULL;
>> -	struct xfs_inode	*dp = args->dp;
>> -
>> -	trace_xfs_attr_leaf_addname(args);
>> -
>> -	error = xfs_attr_leaf_try_add(args, bp);
>> -	if (error)
>> -		return error;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_buf			*bp = NULL;
>> +	int				error, forkoff;
>> +	struct xfs_inode		*dp = args->dp;
>>   
>> -	/*
>> -	 * Commit the transaction that added the attr name so that
>> -	 * later routines can manage their own transactions.
>> -	 */
>> -	error = xfs_trans_roll_inode(&args->trans, dp);
>> -	if (error)
>> -		return error;
>> +	/* State machine switch */
>> +	switch (dac->dela_state) {
>> +	case XFS_DAS_FLIP_LFLAG:
>> +		goto das_flip_flag;
>> +	case XFS_DAS_RM_LBLK:
>> +		goto das_rm_lblk;
>> +	default:
>> +		break;
>> +	}
>>   
>>   	/*
>>   	 * If there was an out-of-line value, allocate the blocks we
>> @@ -752,12 +821,34 @@ xfs_attr_leaf_addname(
>>   	 * after we create the attribute so that we don't overflow the
>>   	 * maximum size of a transaction and/or hit a deadlock.
>>   	 */
>> -	if (args->rmtblkno > 0) {
>> -		error = xfs_attr_rmtval_set(args);
>> +
>> +	/* Open coded xfs_attr_rmtval_set without trans handling */
>> +	if ((dac->flags & XFS_DAC_LEAF_ADDNAME_INIT) == 0) {
>> +		dac->flags |= XFS_DAC_LEAF_ADDNAME_INIT;
>> +		if (args->rmtblkno > 0) {
>> +			error = xfs_attr_rmtval_find_space(dac);
>> +			if (error)
>> +				return error;
>> +		}
>> +	}
>> +
>> +	/*
>> +	 * Roll through the "value", allocating blocks on disk as
>> +	 * required.
>> +	 */
>> +	if (dac->blkcnt > 0) {
>> +		error = xfs_attr_rmtval_set_blk(dac);
>>   		if (error)
>>   			return error;
>> +
>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>> +		return -EAGAIN;
> 
> What state are we in here?  FOUND_LBLK, with blkcnt slowly decreasing?
> 
I used to have an ALLOC_LEAF state for this one.  Used to look something 
like this:
+alloc_leaf:
+        while (args->dac.blkcnt > 0) {
+            error = xfs_attr_rmtval_set_blk(args);
+            if (error)
+                return error;
+
+            args->dac.flags |= XFS_DAC_FINISH_TRANS;
+            args->dac.dela_state = XFS_DAS_ALLOC_LEAF;
+            return -EAGAIN;
+        }

Again, it's not really needed, as we will fall into this logic with or
without the state. And the while loop doesn't really loop, though I
guess it does help the reader understand that this is supposed to
function like a loop. I think it's easy to see something like that and
want to simplify away the extra semantics, but then on a second look,
it's not quite as obvious why it's there without remembering what it
once was. Maybe a comment is in order?

/* Repeat this until we have set all rmt blks */

?


To directly answer your question though, I think the state is still
UNINIT at this point, since any of the other states would have branched
off before this. It's important to note, though, that the functions that
have states are meant to take ownership of the state machine. IOW,
if the state coming in does not apply to the scope of this function, or
any of the subroutines therein, then the state is simply overwritten as
this function sees fit. It doesn't throw an error if it is passed a
state that used to belong to its parent. Calling functions should
understand that they have "surrendered" the state machine to this
subfunction until it returns something other than -EAGAIN. At least
that's the idea. Honestly, the only reason I have UNINIT at all is
because we get warnings about setting the state to 0 when the enum
needs to start at something other than 0.

Hope that helps?
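The resumable remote-value loop described above can be shown as a toy model, assuming one block is allocated per transaction. model_dac, model_rmtval_set_blk() and model_leaf_addname() are hypothetical. MODEL_DAS_ALLOC_LEAF is set only for readability, since the blkcnt check alone decides whether another pass is needed. The one-time setup is guarded by a flag, mirroring XFS_DAC_LEAF_ADDNAME_INIT in the patch, so re-entry skips it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the delattr context; not the XFS structs. */
enum model_das { MODEL_DAS_UNINIT, MODEL_DAS_ALLOC_LEAF };

#define MODEL_DAC_LEAF_ADDNAME_INIT	(1u << 0)
#define MODEL_DAC_DEFER_FINISH		(1u << 1)

struct model_dac {
	enum model_das	dela_state;
	unsigned int	flags;
	int		blkcnt;		/* remote blocks still to allocate */
	int		allocated;
	int		value_written;
};

/* Allocate one block's worth of the remote value. */
static void model_rmtval_set_blk(struct model_dac *dac)
{
	assert(dac->blkcnt > 0);
	dac->blkcnt--;
	dac->allocated++;
}

static int model_leaf_addname(struct model_dac *dac, int rmtblks)
{
	/* One-time setup, guarded by a flag so re-entry skips it. */
	if (!(dac->flags & MODEL_DAC_LEAF_ADDNAME_INIT)) {
		dac->flags |= MODEL_DAC_LEAF_ADDNAME_INIT;
		dac->blkcnt = rmtblks;
	}

	/* Repeat this until we have set all rmt blks */
	if (dac->blkcnt > 0) {
		model_rmtval_set_blk(dac);
		dac->flags |= MODEL_DAC_DEFER_FINISH;
		dac->dela_state = MODEL_DAS_ALLOC_LEAF;	/* readability only */
		return -EAGAIN;
	}

	/* All blocks allocated: write the value out and finish. */
	dac->value_written = 1;
	return 0;
}
```

A caller that loops on -EAGAIN sees one return per remote block, followed by a final 0 once the value has been written.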



>>   	}
>>   
>> +	error = xfs_attr_rmtval_set_value(args);
>> +	if (error)
>> +		return error;
>> +
>>   	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
>>   		/*
>>   		 * Added a "remote" value, just clear the incomplete flag.
>> @@ -777,29 +868,29 @@ xfs_attr_leaf_addname(
>>   	 * In a separate transaction, set the incomplete flag on the "old" attr
>>   	 * and clear the incomplete flag on the "new" attr.
>>   	 */
>> -
>>   	error = xfs_attr3_leaf_flipflags(args);
>>   	if (error)
>>   		return error;
>>   	/*
>>   	 * Commit the flag value change and start the next trans in series.
>>   	 */
>> -	error = xfs_trans_roll_inode(&args->trans, args->dp);
>> -	if (error)
>> -		return error;
>> -
>> +	dac->dela_state = XFS_DAS_FLIP_LFLAG;
>> +	return -EAGAIN;
>> +das_flip_flag:
>>   	/*
>>   	 * Dismantle the "old" attribute/value pair by removing a "remote" value
>>   	 * (if it exists).
>>   	 */
>>   	xfs_attr_restore_rmt_blk(args);
>>   
>> +	error = xfs_attr_rmtval_invalidate(args);
>> +	if (error)
>> +		return error;
>> +das_rm_lblk:
>>   	if (args->rmtblkno) {
>> -		error = xfs_attr_rmtval_invalidate(args);
>> -		if (error)
>> -			return error;
>> -
>> -		error = xfs_attr_rmtval_remove(args);
>> +		error = __xfs_attr_rmtval_remove(dac);
>> +		if (error == -EAGAIN)
>> +			dac->dela_state = XFS_DAS_RM_LBLK;
>>   		if (error)
>>   			return error;
>>   	}
>> @@ -965,23 +1056,38 @@ xfs_attr_node_hasname(
>>    *
>>    * "Remote" attribute values confuse the issue and atomic rename operations
>>    * add a whole extra layer of confusion on top of that.
>> + *
>> + * This routine is meant to function as a delayed operation, and may return
>> + * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
>> + * to handle this, and recall the function until a successful error code is
>> + *returned.
>>    */
>>   STATIC int
>>   xfs_attr_node_addname(
>> -	struct xfs_da_args	*args)
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	struct xfs_da_state	*state;
>> -	struct xfs_da_state_blk	*blk;
>> -	struct xfs_inode	*dp;
>> -	int			retval, error;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_da_state		*state = NULL;
>> +	struct xfs_da_state_blk		*blk;
>> +	int				retval = 0;
>> +	int				error = 0;
>>   
>>   	trace_xfs_attr_node_addname(args);
>>   
>> -	/*
>> -	 * Fill in bucket of arguments/results/context to carry around.
>> -	 */
>> -	dp = args->dp;
>> -restart:
>> +	/* State machine switch */
>> +	switch (dac->dela_state) {
>> +	case XFS_DAS_FLIP_NFLAG:
>> +		goto das_flip_flag;
>> +	case XFS_DAS_FOUND_NBLK:
>> +		goto das_found_nblk;
>> +	case XFS_DAS_ALLOC_NODE:
>> +		goto das_alloc_node;
>> +	case XFS_DAS_RM_NBLK:
>> +		goto das_rm_nblk;
>> +	default:
>> +		break;
>> +	}
>> +
>>   	/*
>>   	 * Search to see if name already exists, and get back a pointer
>>   	 * to where it should go.
>> @@ -1027,19 +1133,13 @@ xfs_attr_node_addname(
>>   			error = xfs_attr3_leaf_to_node(args);
>>   			if (error)
>>   				goto out;
>> -			error = xfs_defer_finish(&args->trans);
>> -			if (error)
>> -				goto out;
>>   
>>   			/*
>> -			 * Commit the node conversion and start the next
>> -			 * trans in the chain.
>> +			 * Restart routine from the top.  No need to set  the
>> +			 * state
>>   			 */
>> -			error = xfs_trans_roll_inode(&args->trans, dp);
>> -			if (error)
>> -				goto out;
>> -
>> -			goto restart;
>> +			dac->flags |= XFS_DAC_DEFER_FINISH;
>> +			return -EAGAIN;
> 
> What state are we in here?  Are we still in the same state that we were
> at the start of the function, but ready to try xfs_attr3_leaf_add again?
To directly answer the question: we may be in UNINIT if we were already
a node when we started the attr op. If we had to promote from leaf to
node, it may be some state left over from the leaf routines.

Again though, insofar as this routine is concerned, the idea is that
the state is either one of the cases in the switch up top, or it's not.

> 
>>   		}
>>   
>>   		/*
>> @@ -1051,9 +1151,7 @@ xfs_attr_node_addname(
>>   		error = xfs_da3_split(state);
>>   		if (error)
>>   			goto out;
>> -		error = xfs_defer_finish(&args->trans);
>> -		if (error)
>> -			goto out;
>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>   	} else {
>>   		/*
>>   		 * Addition succeeded, update Btree hashvals.
>> @@ -1068,13 +1166,9 @@ xfs_attr_node_addname(
>>   	xfs_da_state_free(state);
>>   	state = NULL;
>>   
>> -	/*
>> -	 * Commit the leaf addition or btree split and start the next
>> -	 * trans in the chain.
>> -	 */
>> -	error = xfs_trans_roll_inode(&args->trans, dp);
>> -	if (error)
>> -		goto out;
>> +	dac->dela_state = XFS_DAS_FOUND_NBLK;
>> +	return -EAGAIN;
>> +das_found_nblk:
>>   
>>   	/*
>>   	 * If there was an out-of-line value, allocate the blocks we
>> @@ -1083,7 +1177,27 @@ xfs_attr_node_addname(
>>   	 * maximum size of a transaction and/or hit a deadlock.
>>   	 */
>>   	if (args->rmtblkno > 0) {
>> -		error = xfs_attr_rmtval_set(args);
>> +		/* Open coded xfs_attr_rmtval_set without trans handling */
>> +		error = xfs_attr_rmtval_find_space(dac);
>> +		if (error)
>> +			return error;
>> +
>> +		/*
>> +		 * Roll through the "value", allocating blocks on disk as
>> +		 * required.
>> +		 */
>> +das_alloc_node:
>> +		if (dac->blkcnt > 0) {
>> +			error = xfs_attr_rmtval_set_blk(dac);
>> +			if (error)
>> +				return error;
>> +
>> +			dac->flags |= XFS_DAC_DEFER_FINISH;
>> +			dac->dela_state = XFS_DAS_ALLOC_NODE;
>> +			return -EAGAIN;
>> +		}
>> +
>> +		error = xfs_attr_rmtval_set_value(args);
>>   		if (error)
>>   			return error;
>>   	}
>> @@ -1113,22 +1227,28 @@ xfs_attr_node_addname(
>>   	/*
>>   	 * Commit the flag value change and start the next trans in series
>>   	 */
>> -	error = xfs_trans_roll_inode(&args->trans, args->dp);
>> -	if (error)
>> -		goto out;
>> -
>> +	dac->dela_state = XFS_DAS_FLIP_NFLAG;
>> +	return -EAGAIN;
>> +das_flip_flag:
>>   	/*
>>   	 * Dismantle the "old" attribute/value pair by removing a "remote" value
>>   	 * (if it exists).
>>   	 */
>>   	xfs_attr_restore_rmt_blk(args);
>>   
>> +	error = xfs_attr_rmtval_invalidate(args);
>> +	if (error)
>> +		return error;
>> +
>> +das_rm_nblk:
>>   	if (args->rmtblkno) {
>> -		error = xfs_attr_rmtval_invalidate(args);
>> -		if (error)
>> -			return error;
>> +		error = __xfs_attr_rmtval_remove(dac);
>> +
>> +		if (error == -EAGAIN) {
>> +			dac->dela_state = XFS_DAS_RM_NBLK;
>> +			return -EAGAIN;
>> +		}
>>   
>> -		error = xfs_attr_rmtval_remove(args);
>>   		if (error)
>>   			return error;
>>   	}
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index 64dcf0f..501f9df 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -106,6 +106,118 @@ struct xfs_attr_list_context {
>>    *	                                      v         │
>>    *	                                     done <─────┘
>>    *
>> + *
>> + * Below is a state machine diagram for attr set operations.
>> + *
>> + *  xfs_attr_set_iter()
>> + *             │
>> + *             v
> 
> I think this diagram is missing the part where we attempt to add a
> shortform attr?
I left it out because the shortform path doesn't make use of states. I
can doodle that in though, if you prefer:

       ┌───n── is shortform?
       │            │
       │            y
       │            │
       │            v
       │   xfs_attr_set_shortform()
       │            │
       │            v
       ├───n─── had enough
       │          space?
       │            │
       │            y
       │            │
       │            v
       │           done
       └────────────┐
                    │
                    v

> 
> --D

Thx for the thorough reviews!

Allison

> 
>> + *   ┌───n── fork has
>> + *   │	    only 1 blk?
>> + *   │		│
>> + *   │		y
>> + *   │		│
>> + *   │		v
>> + *   │	xfs_attr_leaf_try_add()
>> + *   │		│
>> + *   │		v
>> + *   │	     had enough
>> + *   ├───n────space?
>> + *   │		│
>> + *   │		y
>> + *   │		│
>> + *   │		v
>> + *   │	XFS_DAS_FOUND_LBLK ──┐
>> + *   │	                     │
>> + *   │	XFS_DAS_FLIP_LFLAG ──┤
>> + *   │	(subroutine state)   │
>> + *   │		             │
>> + *   │		             └─>xfs_attr_leaf_addname()
>> + *   │		                      │
>> + *   │		                      v
>> + *   │		                   was this
>> + *   │		                   a rename? ──n─┐
>> + *   │		                      │          │
>> + *   │		                      y          │
>> + *   │		                      │          │
>> + *   │		                      v          │
>> + *   │		                flip incomplete  │
>> + *   │		                    flag         │
>> + *   │		                      │          │
>> + *   │		                      v          │
>> + *   │		              XFS_DAS_FLIP_LFLAG │
>> + *   │		                      │          │
>> + *   │		                      v          │
>> + *   │		                    remove       │
>> + *   │		XFS_DAS_RM_LBLK ─> old name      │
>> + *   │		         ^            │          │
>> + *   │		         │            v          │
>> + *   │		         └──────y── more to      │
>> + *   │		                    remove       │
>> + *   │		                      │          │
>> + *   │		                      n          │
>> + *   │		                      │          │
>> + *   │		                      v          │
>> + *   │		                     done <──────┘
>> + *   └──> XFS_DAS_FOUND_NBLK ──┐
>> + *	  (subroutine state)   │
>> + *	                       │
>> + *	  XFS_DAS_ALLOC_NODE ──┤
>> + *	  (subroutine state)   │
>> + *	                       │
>> + *	  XFS_DAS_FLIP_NFLAG ──┤
>> + *	  (subroutine state)   │
>> + *	                       │
>> + *	                       └─>xfs_attr_node_addname()
>> + *	                               │
>> + *	                               v
>> + *	                       find space to store
>> + *	                      attr. Split if needed
>> + *	                               │
>> + *	                               v
>> + *	                       XFS_DAS_FOUND_NBLK
>> + *	                               │
>> + *	                               v
>> + *	                 ┌─────n──  need to
>> + *	                 │        alloc blks?
>> + *	                 │             │
>> + *	                 │             y
>> + *	                 │             │
>> + *	                 │             v
>> + *	                 │  ┌─>XFS_DAS_ALLOC_NODE
>> + *	                 │  │          │
>> + *	                 │  │          v
>> + *	                 │  └──y── need to alloc
>> + *	                 │         more blocks?
>> + *	                 │             │
>> + *	                 │             n
>> + *	                 │             │
>> + *	                 │             v
>> + *	                 │          was this
>> + *	                 └────────> a rename? ──n─┐
>> + *	                               │          │
>> + *	                               y          │
>> + *	                               │          │
>> + *	                               v          │
>> + *	                         flip incomplete  │
>> + *	                             flag         │
>> + *	                               │          │
>> + *	                               v          │
>> + *	                       XFS_DAS_FLIP_NFLAG │
>> + *	                               │          │
>> + *	                               v          │
>> + *	                             remove       │
>> + *	         XFS_DAS_RM_NBLK ─> old name      │
>> + *	                  ^            │          │
>> + *	                  │            v          │
>> + *	                  └──────y── more to      │
>> + *	                             remove       │
>> + *	                               │          │
>> + *	                               n          │
>> + *	                               │          │
>> + *	                               v          │
>> + *	                              done <──────┘
>> + *
>>    */
>>   
>>   /*
>> @@ -120,6 +232,13 @@ struct xfs_attr_list_context {
>>   enum xfs_delattr_state {
>>   	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
>>   	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
>> +	XFS_DAS_FOUND_LBLK,	      /* We found leaf blk for attr */
>> +	XFS_DAS_FOUND_NBLK,	      /* We found node blk for attr */
>> +	XFS_DAS_FLIP_LFLAG,	      /* Flipped leaf INCOMPLETE attr flag */
>> +	XFS_DAS_RM_LBLK,	      /* A rename is removing leaf blocks */
>> +	XFS_DAS_ALLOC_NODE,	      /* We are allocating node blocks */
>> +	XFS_DAS_FLIP_NFLAG,	      /* Flipped node INCOMPLETE attr flag */
>> +	XFS_DAS_RM_NBLK,	      /* A rename is removing node blocks */
>>   };
>>   
>>   /*
>> @@ -127,6 +246,7 @@ enum xfs_delattr_state {
>>    */
>>   #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>>   #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>> +#define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init */
>>   
>>   /*
>>    * Context used for keeping track of delayed attribute operations
>> @@ -134,6 +254,11 @@ enum xfs_delattr_state {
>>   struct xfs_delattr_context {
>>   	struct xfs_da_args      *da_args;
>>   
>> +	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
>> +	struct xfs_bmbt_irec	map;
>> +	xfs_dablk_t		lblkno;
>> +	int			blkcnt;
>> +
>>   	/* Used in xfs_attr_node_removename to roll through removing blocks */
>>   	struct xfs_da_state     *da_state;
>>   
>> @@ -160,7 +285,6 @@ int xfs_attr_set_args(struct xfs_da_args *args);
>>   int xfs_has_attr(struct xfs_da_args *args);
>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>>   int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>> -int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>>   bool xfs_attr_namecheck(const void *name, size_t length);
>>   void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>   			      struct xfs_da_args *args);
>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
>> index 1426c15..5b445e7 100644
>> --- a/fs/xfs/libxfs/xfs_attr_remote.c
>> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
>> @@ -441,7 +441,7 @@ xfs_attr_rmtval_get(
>>    * Find a "hole" in the attribute address space large enough for us to drop the
>>    * new attribute's value into
>>    */
>> -STATIC int
>> +int
>>   xfs_attr_rmt_find_hole(
>>   	struct xfs_da_args	*args)
>>   {
>> @@ -468,7 +468,7 @@ xfs_attr_rmt_find_hole(
>>   	return 0;
>>   }
>>   
>> -STATIC int
>> +int
>>   xfs_attr_rmtval_set_value(
>>   	struct xfs_da_args	*args)
>>   {
>> @@ -628,6 +628,69 @@ xfs_attr_rmtval_set(
>>   }
>>   
>>   /*
>> + * Find a hole for the attr and store it in the delayed attr context.  This
>> + * initializes the context to roll through allocating an attr extent for a
>> + * delayed attr operation
>> + */
>> +int
>> +xfs_attr_rmtval_find_space(
>> +	struct xfs_delattr_context	*dac)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_bmbt_irec		*map = &dac->map;
>> +	int				error;
>> +
>> +	dac->lblkno = 0;
>> +	dac->blkcnt = 0;
>> +	args->rmtblkcnt = 0;
>> +	args->rmtblkno = 0;
>> +	memset(map, 0, sizeof(struct xfs_bmbt_irec));
>> +
>> +	error = xfs_attr_rmt_find_hole(args);
>> +	if (error)
>> +		return error;
>> +
>> +	dac->blkcnt = args->rmtblkcnt;
>> +	dac->lblkno = args->rmtblkno;
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Write one block of the value associated with an attribute into the
>> + * out-of-line buffer that we have defined for it. This is similar to a subset
>> + * of xfs_attr_rmtval_set, but records the current block to the delayed attr
>> + * context, and leaves transaction handling to the caller.
>> + */
>> +int
>> +xfs_attr_rmtval_set_blk(
>> +	struct xfs_delattr_context	*dac)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_inode		*dp = args->dp;
>> +	struct xfs_bmbt_irec		*map = &dac->map;
>> +	int nmap;
>> +	int error;
>> +
>> +	nmap = 1;
>> +	error = xfs_bmapi_write(args->trans, dp, (xfs_fileoff_t)dac->lblkno,
>> +				dac->blkcnt, XFS_BMAPI_ATTRFORK, args->total,
>> +				map, &nmap);
>> +	if (error)
>> +		return error;
>> +
>> +	ASSERT(nmap == 1);
>> +	ASSERT((map->br_startblock != DELAYSTARTBLOCK) &&
>> +	       (map->br_startblock != HOLESTARTBLOCK));
>> +
>> +	/* roll attribute extent map forwards */
>> +	dac->lblkno += map->br_blockcount;
>> +	dac->blkcnt -= map->br_blockcount;
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>>    * Remove the value associated with an attribute by deleting the
>>    * out-of-line buffer that it is stored on.
>>    */
>> @@ -669,38 +732,6 @@ xfs_attr_rmtval_invalidate(
>>   }
>>   
>>   /*
>> - * Remove the value associated with an attribute by deleting the
>> - * out-of-line buffer that it is stored on.
>> - */
>> -int
>> -xfs_attr_rmtval_remove(
>> -	struct xfs_da_args		*args)
>> -{
>> -	int				error;
>> -	struct xfs_delattr_context	dac  = {
>> -		.da_args	= args,
>> -	};
>> -
>> -	trace_xfs_attr_rmtval_remove(args);
>> -
>> -	/*
>> -	 * Keep de-allocating extents until the remote-value region is gone.
>> -	 */
>> -	do {
>> -		error = __xfs_attr_rmtval_remove(&dac);
>> -		if (error != -EAGAIN)
>> -			break;
>> -
>> -		error = xfs_attr_trans_roll(&dac);
>> -		if (error)
>> -			return error;
>> -
>> -	} while (true);
>> -
>> -	return error;
>> -}
>> -
>> -/*
>>    * Remove the value associated with an attribute by deleting the out-of-line
>>    * buffer that it is stored on. Returns EAGAIN for the caller to refresh the
>>    * transaction and re-call the function
>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
>> index 002fd30..84e2700 100644
>> --- a/fs/xfs/libxfs/xfs_attr_remote.h
>> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
>> @@ -15,4 +15,8 @@ int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>>   		xfs_buf_flags_t incore_flags);
>>   int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
>>   int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>> +int xfs_attr_rmt_find_hole(struct xfs_da_args *args);
>> +int xfs_attr_rmtval_set_value(struct xfs_da_args *args);
>> +int xfs_attr_rmtval_set_blk(struct xfs_delattr_context *dac);
>> +int xfs_attr_rmtval_find_space(struct xfs_delattr_context *dac);
>>   #endif /* __XFS_ATTR_REMOTE_H__ */
>> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
>> index 8695165..e9dde4e 100644
>> --- a/fs/xfs/xfs_trace.h
>> +++ b/fs/xfs/xfs_trace.h
>> @@ -1925,7 +1925,6 @@ DEFINE_ATTR_EVENT(xfs_attr_refillstate);
>>   
>>   DEFINE_ATTR_EVENT(xfs_attr_rmtval_get);
>>   DEFINE_ATTR_EVENT(xfs_attr_rmtval_set);
>> -DEFINE_ATTR_EVENT(xfs_attr_rmtval_remove);
>>   
>>   #define DEFINE_DA_EVENT(name) \
>>   DEFINE_EVENT(xfs_da_class, name, \
>> -- 
>> 2.7.4
>>


* Re: [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step
  2020-11-10 23:12   ` Darrick J. Wong
@ 2020-11-13  1:38     ` Allison Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-11-13  1:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 11/10/20 4:12 PM, Darrick J. Wong wrote:
> On Thu, Oct 22, 2020 at 11:34:26PM -0700, Allison Henderson wrote:
>> From: Allison Collins <allison.henderson@oracle.com>
>>
>> This patch adds a new helper function xfs_attr_node_remove_step.  This
>> will help simplify and modularize the calling function
>> xfs_attr_node_removename.
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> 
> Looks fine to me, modulo Brian and Chandan's suggestions;
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Ok, thank you!

Allison
> 
> --D
> 
>> ---
>>   fs/xfs/libxfs/xfs_attr.c | 46 ++++++++++++++++++++++++++++++++++------------
>>   1 file changed, 34 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index fd8e641..f4d39bf 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -1228,19 +1228,14 @@ xfs_attr_node_remove_rmt(
>>    * the root node (a special case of an intermediate node).
>>    */
>>   STATIC int
>> -xfs_attr_node_removename(
>> -	struct xfs_da_args	*args)
>> +xfs_attr_node_remove_step(
>> +	struct xfs_da_args	*args,
>> +	struct xfs_da_state	*state)
>>   {
>> -	struct xfs_da_state	*state;
>>   	struct xfs_da_state_blk	*blk;
>>   	int			retval, error;
>>   	struct xfs_inode	*dp = args->dp;
>>   
>> -	trace_xfs_attr_node_removename(args);
>> -
>> -	error = xfs_attr_node_removename_setup(args, &state);
>> -	if (error)
>> -		goto out;
>>   
>>   	/*
>>   	 * If there is an out-of-line value, de-allocate the blocks.
>> @@ -1250,7 +1245,7 @@ xfs_attr_node_removename(
>>   	if (args->rmtblkno > 0) {
>>   		error = xfs_attr_node_remove_rmt(args, state);
>>   		if (error)
>> -			goto out;
>> +			return error;
>>   	}
>>   
>>   	/*
>> @@ -1267,18 +1262,45 @@ xfs_attr_node_removename(
>>   	if (retval && (state->path.active > 1)) {
>>   		error = xfs_da3_join(state);
>>   		if (error)
>> -			goto out;
>> +			return error;
>>   		error = xfs_defer_finish(&args->trans);
>>   		if (error)
>> -			goto out;
>> +			return error;
>>   		/*
>>   		 * Commit the Btree join operation and start a new trans.
>>   		 */
>>   		error = xfs_trans_roll_inode(&args->trans, dp);
>>   		if (error)
>> -			goto out;
>> +			return error;
>>   	}
>>   
>> +	return error;
>> +}
>> +
>> +/*
>> + * Remove a name from a B-tree attribute list.
>> + *
>> + * This routine will find the blocks of the name to remove, remove them and
>> + * shirnk the tree if needed.
>> + */
>> +STATIC int
>> +xfs_attr_node_removename(
>> +	struct xfs_da_args	*args)
>> +{
>> +	struct xfs_da_state	*state;
>> +	int			error;
>> +	struct xfs_inode	*dp = args->dp;
>> +
>> +	trace_xfs_attr_node_removename(args);
>> +
>> +	error = xfs_attr_node_removename_setup(args, &state);
>> +	if (error)
>> +		goto out;
>> +
>> +	error = xfs_attr_node_remove_step(args, state);
>> +	if (error)
>> +		goto out;
>> +
>>   	/*
>>   	 * If the result is small enough, push it all into the inode.
>>   	 */
>> -- 
>> 2.7.4
>>


* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-11-10 23:43   ` Darrick J. Wong
  2020-11-11  0:28     ` Dave Chinner
@ 2020-11-13  3:43     ` Allison Henderson
  2020-11-14  1:18       ` Darrick J. Wong
  1 sibling, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2020-11-13  3:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 11/10/20 4:43 PM, Darrick J. Wong wrote:
> On Thu, Oct 22, 2020 at 11:34:27PM -0700, Allison Henderson wrote:
>> This patch modifies the attr remove routines to be delay ready. This
>> means they no longer roll or commit transactions, but instead return
>> -EAGAIN to have the calling routine roll and refresh the transaction. In
>> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
>> uses a state-machine-like switch to keep track of where it was
>> when EAGAIN was returned. xfs_attr_node_removename has also been
>> modified to use the switch, and a new version of xfs_attr_remove_args
>> consists of a simple loop to refresh the transaction until the operation
>> is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
>> transaction wherever the existing code used to.
>>
>> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
>> version __xfs_attr_rmtval_remove. We will rename
>> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
>> done.
>>
>> xfs_attr_rmtval_remove itself is still in use by the set routines (used
>> during a rename).  To preserve existing behavior, we modify
>> xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is set,
>> similar to what xfs_attr_remove_args does here.  Once we transition
>> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
>> used and will be removed.
>>
>> This patch also adds a new struct xfs_delattr_context, which we will use
>> to keep track of the current state of an attribute operation. The new
>> xfs_delattr_state enum is used to track various operations that are in
>> progress so that we know not to repeat them, and resume where we left
>> off before EAGAIN was returned to cycle out the transaction. Other
>> members take the place of local variables that need to retain their
>> values across multiple function recalls.  See xfs_attr.h for a more
>> detailed diagram of the states.
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
>>   fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
>>   fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>>   fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
>>   fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>>   fs/xfs/xfs_attr_inactive.c      |   2 +-
>>   6 files changed, 241 insertions(+), 74 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index f4d39bf..6ca94cb 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>    */
>>   STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>>   STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
>> -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
>> +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>>   STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>   				 struct xfs_da_state **state);
>>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>> @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
>>   }
>>   
>>   /*
>> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
>> + * also checks for a defer finish.  Transaction is finished and rolled as
>> + * needed, and returns 0 or a negative error code.
>> + */
>> +int
>> +xfs_attr_trans_roll(
>> +	struct xfs_delattr_context	*dac)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error = 0;
>> +
>> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
>> +		/*
>> +		 * The caller wants us to finish all the deferred ops so that we
>> +		 * avoid pinning the log tail with a large number of deferred
>> +		 * ops.
>> +		 */
>> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
>> +		error = xfs_defer_finish(&args->trans);
>> +		if (error)
>> +			return error;
>> +	}
>> +
>> +	return xfs_trans_roll_inode(&args->trans, args->dp);
>> +}
> 
> (Mostly ignoring these functions since they all go away by the end of
> the patchset...)
> 
>> +
>> +/*
>>    * Set the attribute specified in @args.
>>    */
>>   int
>> @@ -364,23 +391,54 @@ xfs_has_attr(
>>    */
>>   int
>>   xfs_attr_remove_args(
>> -	struct xfs_da_args      *args)
>> +	struct xfs_da_args	*args)
>>   {
>> -	struct xfs_inode	*dp = args->dp;
>> -	int			error;
>> +	int				error = 0;
>> +	struct xfs_delattr_context	dac = {
>> +		.da_args	= args,
>> +	};
>> +
>> +	do {
>> +		error = xfs_attr_remove_iter(&dac);
>> +		if (error != -EAGAIN)
>> +			break;
>> +
>> +		error = xfs_attr_trans_roll(&dac);
>> +		if (error)
>> +			return error;
>> +
>> +	} while (true);
>> +
>> +	return error;
>> +}
>> +
>> +/*
>> + * Remove the attribute specified in @args.
>> + *
>> + * This function may return -EAGAIN to signal that the transaction needs to be
>> + * rolled.  Callers should continue calling this function until they receive a
>> + * return value other than -EAGAIN.
>> + */
>> +int
>> +xfs_attr_remove_iter(
>> +	struct xfs_delattr_context	*dac)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_inode		*dp = args->dp;
>> +
>> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
>> +		goto node;
>>   
> 
> Might as well just make this part of the if statement dispatch:
> 
> 	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> 		return xfs_attr_node_removename_iter(dac);
> 	else if (!xfs_inode_hasattr(dp))
> 		return -ENOATTR;
I think we did this once, but then people disliked having the same call
in two places.  We call the node function if XFS_DAS_RM_SHRINK is set, OR
if the other two cases fail, which is actually the initial point of entry.

I think we probably need a comment somewhere.  I've realized that every
time a question gets re-raised, it means we need a comment so we don't
forget why :-)

Maybe for the goto we can have:
/* If we are shrinking a node, resume shrink */

and.....


> 
>>   	if (!xfs_inode_hasattr(dp)) {
>> -		error = -ENOATTR;
>> +		return -ENOATTR;
>>   	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
>>   		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
>> -		error = xfs_attr_shortform_remove(args);
>> +		return xfs_attr_shortform_remove(args);
>>   	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>> -		error = xfs_attr_leaf_removename(args);
>> -	} else {
>> -		error = xfs_attr_node_removename(args);
>> +		return xfs_attr_leaf_removename(args);
>>   	}
>> -
>> -	return error;
>> +node:
	/* If we are not short form or leaf, then remove node */
?
>> +	return xfs_attr_node_removename_iter(dac);
>>   }
>>   
>>   /*
>> @@ -1178,10 +1236,11 @@ xfs_attr_leaf_mark_incomplete(
>>    */
>>   STATIC
>>   int xfs_attr_node_removename_setup(
>> -	struct xfs_da_args	*args,
>> -	struct xfs_da_state	**state)
>> +	struct xfs_delattr_context	*dac,
>> +	struct xfs_da_state		**state)
> 
> AFAICT state == &dac->da_state by the end of the series; can you
> remove this argument too?
> 
Sure, I will see if I can collapse it down

>>   {
>> -	int			error;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error;
>>   
>>   	error = xfs_attr_node_hasname(args, state);
>>   	if (error != -EEXIST)
>> @@ -1191,6 +1250,12 @@ int xfs_attr_node_removename_setup(
>>   	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
>>   		XFS_ATTR_LEAF_MAGIC);
>>   
>> +	/*
>> +	 * Store state in the context in case we need to cycle out the
>> +	 * transaction.
>> +	 */
>> +	dac->da_state = *state;
>> +
>>   	if (args->rmtblkno > 0) {
>>   		error = xfs_attr_leaf_mark_incomplete(args, *state);
> 
> It doesn't make a lot of logical sense to me "we marked the attr
> incomplete to hide it" is the same state (UNINIT) as "we haven't done
> anything yet".
Not sure I quite follow what you mean here.  This little function is
just a setup helper.  It doesn't jump in and out like the other functions
do with the state machine; we separated it out for that reason.  This
routine executes once to stash the state: the da_state, not the
dela_state.  Different states :-)

So after we have that stored away, the calling function moves on to
xfs_attr_node_remove_step, which does get re-called quite a bit until
there are no more remote blocks to remove.
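A rough userspace model of that split follows, with hypothetical rm_* names standing in for the XFS functions. The setup helper runs once behind an init flag. The step is re-called while remote extents remain, then once more to enter the shrink state after a join. The caller loop, written in the style of xfs_attr_remove_args, rolls between calls and finishes deferred ops when the flag asks for it. Rolls and defer finishes are simulated with counters only; this is a sketch, not the patch's code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the delay-ready remove path (not XFS code). */
enum rm_state { RM_UNINIT = 0, RM_SHRINK };

#define RM_DEFER_FINISH		0x01	/* caller must finish deferred ops */
#define RM_RMVNAME_INIT		0x02	/* one-shot setup has run */

struct rm_ctx {
	unsigned int	flags;
	enum rm_state	state;
	int		rmtblks;	/* remote value extents still to free */
	int		need_join;	/* tree must be collapsed afterwards */
	int		setups;		/* times the setup helper ran */
	int		rolls;		/* simulated transaction rolls */
	int		defer_finishes;	/* simulated defer finishes */
};

/* Runs once: look up the name and stash the lookup state in the context. */
static void rm_setup(struct rm_ctx *ctx)
{
	ctx->setups++;
}

/* One step; returns -EAGAIN when the caller must roll and call again. */
static int rm_removename_iter(struct rm_ctx *ctx)
{
	if (!(ctx->flags & RM_RMVNAME_INIT)) {
		ctx->flags |= RM_RMVNAME_INIT;
		rm_setup(ctx);
	}

	switch (ctx->state) {
	case RM_UNINIT:
		if (ctx->rmtblks > 0) {
			ctx->rmtblks--;		/* free one extent per transaction */
			return -EAGAIN;
		}
		if (ctx->need_join) {
			ctx->need_join = 0;
			ctx->flags |= RM_DEFER_FINISH;
			ctx->state = RM_SHRINK;	/* resume here after the roll */
			return -EAGAIN;
		}
		/* fall through */
	case RM_SHRINK:
		return 0;			/* shrink into the inode if small */
	default:
		assert(0);
		return -EINVAL;
	}
}

/* Driver loop in the style of xfs_attr_remove_args(). */
static int rm_remove_args(struct rm_ctx *ctx)
{
	int error;

	for (;;) {
		error = rm_removename_iter(ctx);
		if (error != -EAGAIN)
			return error;
		if (ctx->flags & RM_DEFER_FINISH) {
			ctx->flags &= ~RM_DEFER_FINISH;
			ctx->defer_finishes++;	/* stands in for xfs_defer_finish() */
		}
		ctx->rolls++;			/* stands in for xfs_trans_roll_inode() */
	}
}
```

With three remote extents and a join, this model rolls four times: one roll per freed extent and one for the join. Setup still runs exactly once, because the iterator only records where to resume and leaves the rolling to its caller.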

> 
>>   		if (error)
>> @@ -1203,13 +1268,16 @@ int xfs_attr_node_removename_setup(
>>   }
>>   
>>   STATIC int
>> -xfs_attr_node_remove_rmt(
>> -	struct xfs_da_args	*args,
>> -	struct xfs_da_state	*state)
>> +xfs_attr_node_remove_rmt(
>> +	struct xfs_delattr_context	*dac,
>> +	struct xfs_da_state		*state)
>>   {
>> -	int			error = 0;
>> +	int				error = 0;
>>   
>> -	error = xfs_attr_rmtval_remove(args);
>> +	/*
>> +	 * May return -EAGAIN to request that the caller recall this function
>> +	 */
>> +	error = __xfs_attr_rmtval_remove(dac);
>>   	if (error)
>>   		return error;
>>   
>> @@ -1221,21 +1289,27 @@ xfs_attr_node_remove_rmt(
>>   }
>>   
>>   /*
>> - * Remove a name from a B-tree attribute list.
>> + * Step through removing a name from a B-tree attribute list.
>>    *
>>    * This will involve walking down the Btree, and may involve joining
>>    * leaf nodes and even joining intermediate nodes up to and including
>>    * the root node (a special case of an intermediate node).
>> + *
>> + * This routine is meant to function as either an inline or delayed operation,
>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>> + * functions will need to handle this, and re-call the function until a
>> + * return value other than -EAGAIN is received.
>>    */
>>   STATIC int
>>   xfs_attr_node_remove_step(
>> -	struct xfs_da_args	*args,
>> -	struct xfs_da_state	*state)
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	struct xfs_da_state_blk	*blk;
>> -	int			retval, error;
>> -	struct xfs_inode	*dp = args->dp;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_da_state		*state;
>> +	struct xfs_da_state_blk		*blk;
>> +	int				retval, error = 0;
>>   
>> +	state = dac->da_state;
> 
> Might as well initialize this when you declare state above.
Sure

> 
>>   
>>   	/*
>>   	 * If there is an out-of-line value, de-allocate the blocks.
>> @@ -1243,7 +1317,10 @@ xfs_attr_node_remove_step(
>>   	 * overflow the maximum size of a transaction and/or hit a deadlock.
>>   	 */
>>   	if (args->rmtblkno > 0) {
>> -		error = xfs_attr_node_remove_rmt(args, state);
>> +		/*
>> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
>> +		 */
>> +		error = xfs_attr_node_remove_rmt(dac, state);
>>   		if (error)
>>   			return error;
>>   	}
>> @@ -1257,21 +1334,18 @@ xfs_attr_node_remove_step(
>>   	xfs_da3_fixhashpath(state, &state->path);
>>   
>>   	/*
>> -	 * Check to see if the tree needs to be collapsed.
>> +	 * Check to see if the tree needs to be collapsed.  Set the flag to
>> +	 * indicate that the calling function needs to move on to the shrink
>> +	 * operation.
>>   	 */
>>   	if (retval && (state->path.active > 1)) {
>>   		error = xfs_da3_join(state);
>>   		if (error)
>>   			return error;
>> -		error = xfs_defer_finish(&args->trans);
>> -		if (error)
>> -			return error;
>> -		/*
>> -		 * Commit the Btree join operation and start a new trans.
>> -		 */
>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>> -		if (error)
>> -			return error;
>> +
>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>> +		dac->dela_state = XFS_DAS_RM_SHRINK;
>> +		return -EAGAIN;
>>   	}
>>   
>>   	return error;
>> @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
>>    *
>>    * This routine will find the blocks of the name to remove, remove them and
>>    * shirnk the tree if needed.
> 
> "...and shrink the tree..."
> 
Will fix the shirnk :-)

>> + *
>> + * This routine is meant to function as either an inline or delayed operation,
>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>> + * functions will need to handle this, and re-call the function until a
>> + * return value other than -EAGAIN is received.
>>    */
>>   STATIC int
>> -xfs_attr_node_removename(
>> -	struct xfs_da_args	*args)
>> +xfs_attr_node_removename_iter(
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	struct xfs_da_state	*state;
>> -	int			error;
>> -	struct xfs_inode	*dp = args->dp;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	struct xfs_da_state		*state;
>> +	int				error;
>> +	struct xfs_inode		*dp = args->dp;
>>   
>>   	trace_xfs_attr_node_removename(args);
>> +	state = dac->da_state;
>>   
>> -	error = xfs_attr_node_removename_setup(args, &state);
>> -	if (error)
>> -		goto out;
>> +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
>> +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> 
> Can we determine if it's necessary to call _removename_setup by checking
> dac->da_state directly instead of having a flag?

Initially I think I had another XFS_DAS_RMTVAL_REMOVE state for this.
Alternatively, we also discussed using the inverse, like this:

if (dac->dela_state != XFS_DAS_RMTVAL_REMOVE)
	do setup....

Though I think people liked having the init flag, since init routines were
a sort of recurring pattern.  So that's why we're using the flag now.
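Here is a sketch comparing the two ways of deciding whether setup still needs to run. One is the one-shot flag used in the patch; the other checks the stashed da_state pointer, as suggested in review. The init_* names, the flag value and the struct layout are hypothetical and only illustrate the trade-off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the XFS definitions. */
struct init_da_state { int active; };

#define INIT_NODE_RMVNAME_INIT	0x02

struct init_ctx {
	unsigned int		flags;
	struct init_da_state	*da_state;	/* stashed by the setup helper */
};

/*
 * Option A (the patch): a one-shot flag.  Returns true exactly once and
 * records that setup has been started.
 */
static bool init_need_setup_flag(struct init_ctx *ctx)
{
	if (ctx->flags & INIT_NODE_RMVNAME_INIT)
		return false;
	ctx->flags |= INIT_NODE_RMVNAME_INIT;
	return true;
}

/*
 * Option B (the review suggestion): infer it from the stashed pointer.
 * Only correct if setup always stores a non-NULL state before the first
 * -EAGAIN can be returned.
 */
static bool init_need_setup_ptr(const struct init_ctx *ctx)
{
	return ctx->da_state == NULL;
}
```

The flag costs one bit in flags but keeps "setup has run" explicit and independent of whatever the setup helper happens to store. The pointer check saves the bit but ties the two together.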

> 
>> +		error = xfs_attr_node_removename_setup(dac, &state);
>> +		if (error)
>> +			goto out;
>> +	}
>>   
>> -	error = xfs_attr_node_remove_step(args, state);
>> -	if (error)
>> -		goto out;
>> +	switch (dac->dela_state) {
>> +	case XFS_DAS_UNINIT:
>> +		error = xfs_attr_node_remove_step(dac);
>> +		if (error)
>> +			break;
>>   
>> -	/*
>> -	 * If the result is small enough, push it all into the inode.
>> -	 */
>> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>> -		error = xfs_attr_node_shrink(args, state);
>> +		/* do not break, proceed to shrink if needed */
> 
> /* fall through */
> 
> ...because otherwise the static checkers will get mad.
> 
> (Well clang will anyway because gcc, llvm, and the C18 body all have
> different incompatible ideas of what should be the magic tag that
> signals an intentional fall through, but this should at least be
> consistent with the rest of xfs.)
Oh ok then, I did not know.  Will update the comment
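Below is a sketch of one way to mark an intentional fall-through so that both GCC and clang can see it. The ft_fallthrough macro name is hypothetical. It is loosely modelled on the "fallthrough;" pseudo-keyword that later kernels provide. At the time of this series, xfs used the /* fall through */ comment form, which GCC's -Wimplicit-fallthrough recognizes but clang does not.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical fall-through marker: use the statement attribute when the
 * compiler supports it, otherwise expand to an empty statement.
 */
#if defined(__has_attribute)
#  if __has_attribute(__fallthrough__)
#    define ft_fallthrough	__attribute__((__fallthrough__))
#  endif
#endif
#ifndef ft_fallthrough
#  define ft_fallthrough	do {} while (0)	/* fall through */
#endif

/* Hypothetical states; not the XFS enum. */
enum ft_state { FT_UNINIT = 0, FT_RM_SHRINK };

/* Counts how many stages ran, so the fall-through is observable. */
static int ft_removename(enum ft_state state, int *stages)
{
	switch (state) {
	case FT_UNINIT:
		(*stages)++;		/* remove the name */
		ft_fallthrough;
	case FT_RM_SHRINK:
		(*stages)++;		/* shrink if small enough */
		break;
	default:
		assert(0);
		return -EINVAL;
	}
	return 0;
}
```

Starting from FT_UNINIT runs both stages. Resuming at FT_RM_SHRINK runs only the shrink, which matches the resume behaviour of the switch in the patch.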

> 
>> +	case XFS_DAS_RM_SHRINK:
>> +		/*
>> +		 * If the result is small enough, push it all into the inode.
>> +		 */
>> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>> +			error = xfs_attr_node_shrink(args, state);
>>   
>> +		break;
>> +	default:
>> +		ASSERT(0);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (error == -EAGAIN)
>> +		return error;
>>   out:
>>   	if (state)
>>   		xfs_da_state_free(state);
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index 3e97a93..64dcf0f 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
>>   };
>>   
>>   
>> +/*
>> + * ========================================================================
>> + * Structure used to pass context around among the delayed routines.
>> + * ========================================================================
>> + */
>> +
>> +/*
>> + * Below is a state machine diagram for attr remove operations. The XFS_DAS_*
>> + * states indicate places where the function would return -EAGAIN, and then
>> + * immediately resume from after being recalled by the calling function. States
>> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
>> + * so the calling function needs to pass them back to that subroutine to allow
>> + * it to finish where it left off. But they otherwise do not have a role in the
>> + * calling function other than just passing through.
>> + *
>> + * xfs_attr_remove_iter()
>> + *	  XFS_DAS_RM_SHRINK ─┐
>> + *	  (subroutine state) │
>> + *	                     └─>xfs_attr_node_removename()
>> + *	                                      │
>> + *	                                      v
>> + *	                                   need to
>> + *	                                shrink tree? ─n─┐
>> + *	                                      │         │
>> + *	                                      y         │
>> + *	                                      │         │
>> + *	                                      v         │
>> + *	                              XFS_DAS_RM_SHRINK │
>> + *	                                      │         │
>> + *	                                      v         │
>> + *	                                     done <─────┘
>> + *
>> + */
>> +
>> +/*
>> + * Enum values for xfs_delattr_context.dela_state
>> + *
>> + * These values are used by delayed attribute operations to keep track of where
>> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
>> + * calling function to roll the transaction, and then recall the subroutine to
>> + * finish the operation.  The enum is then used by the subroutine to jump back
>> + * to where it was and resume executing where it left off.
>> + */
>> +enum xfs_delattr_state {
>> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
>> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
>> +};
>> +
>> +/*
>> + * Defines for xfs_delattr_context.flags
>> + */
>> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>> +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>> +
>> +/*
>> + * Context used for keeping track of delayed attribute operations
>> + */
>> +struct xfs_delattr_context {
>> +	struct xfs_da_args      *da_args;
>> +
>> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
>> +	struct xfs_da_state     *da_state;
>> +
>> +	/* Used to keep track of current state of delayed operation */
>> +	unsigned int            flags;
>> +	enum xfs_delattr_state  dela_state;
>> +};
>> +
>>   /*========================================================================
>>    * Function prototypes for the kernel.
>>    *========================================================================*/
>> @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>>   int xfs_attr_set_args(struct xfs_da_args *args);
>>   int xfs_has_attr(struct xfs_da_args *args);
>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>>   bool xfs_attr_namecheck(const void *name, size_t length);
>> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>> +			      struct xfs_da_args *args);
>>   
>>   #endif	/* __XFS_ATTR_H__ */
>> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
>> index bb128db..338377e 100644
>> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
>> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
>> @@ -19,8 +19,8 @@
>>   #include "xfs_bmap_btree.h"
>>   #include "xfs_bmap.h"
>>   #include "xfs_attr_sf.h"
>> -#include "xfs_attr_remote.h"
>>   #include "xfs_attr.h"
>> +#include "xfs_attr_remote.h"
>>   #include "xfs_attr_leaf.h"
>>   #include "xfs_error.h"
>>   #include "xfs_trace.h"
>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
>> index 48d8e9c..1426c15 100644
>> --- a/fs/xfs/libxfs/xfs_attr_remote.c
>> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
>> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
>>    */
>>   int
>>   xfs_attr_rmtval_remove(
>> -	struct xfs_da_args      *args)
>> +	struct xfs_da_args		*args)
>>   {
>> -	int			error;
>> -	int			retval;
>> +	int				error;
>> +	struct xfs_delattr_context	dac  = {
>> +		.da_args	= args,
>> +	};
>>   
>>   	trace_xfs_attr_rmtval_remove(args);
>>   
>> @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
>>   	 * Keep de-allocating extents until the remote-value region is gone.
>>   	 */
>>   	do {
>> -		retval = __xfs_attr_rmtval_remove(args);
>> -		if (retval && retval != -EAGAIN)
>> -			return retval;
>> +		error = __xfs_attr_rmtval_remove(&dac);
>> +		if (error != -EAGAIN)
>> +			break;
>>   
>> -		/*
>> -		 * Close out trans and start the next one in the chain.
>> -		 */
>> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
>> +		error = xfs_attr_trans_roll(&dac);
>>   		if (error)
>>   			return error;
>> -	} while (retval == -EAGAIN);
>>   
>> -	return 0;
>> +	} while (true);
>> +
>> +	return error;
>>   }
>>   
>>   /*
>> @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
>>    */
>>   int
>>   __xfs_attr_rmtval_remove(
>> -	struct xfs_da_args	*args)
>> +	struct xfs_delattr_context	*dac)
>>   {
>> -	int			error, done;
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error, done;
>>   
>>   	/*
>>   	 * Unmap value blocks for this attr.
>> @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
>>   	if (error)
>>   		return error;
>>   
>> -	error = xfs_defer_finish(&args->trans);
>> -	if (error)
>> -		return error;
>> -
>> -	if (!done)
>> +	if (!done) {
>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>   		return -EAGAIN;
> 
> What state are we in when we return -EAGAIN here?
> 
> [jumps back to his whole-branch diff]
> 
> Hm, oh, I see, the next state could be a number of things--
> 
> RM_LBLK if we're removing an old remote value from a leaf block as part
> of an attr set operation; or
> 
> RM_NBLK if we're removing an old remote value from a node block as part
> of an attr set operation; and
> 
> UNINIT if we're removing a remote value as part of an attr remove
> operation.
> 
> Oh!  For the first two, it looks to me as though either we're already in
> the state we're setting (RM_[LN]BLK) or we were in one of the
> FLIP_[LN]FLAG states.
> 
> I think it would make more sense if you set the state before calling the
> rmtval_remove function, and leave a comment here saying that the caller
> is responsible for figuring out the next state.
Sure, it should be ok
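A minimal sketch of the suggestion above: the caller records the resume state before calling the remote-value removal helper, and the helper itself never touches the state. Everything here is hypothetical. toy_ctx, TOY_RM_LBLK, toy_rmtval_remove() and toy_set_iter() are stand-ins, not XFS code. Only the shape of the control flow is meant to match the discussion.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified states loosely modeled on the names in this thread. */
enum toy_state {
	TOY_UNINIT = 0,
	TOY_RM_LBLK,		/* removing an old remote value from a leaf block */
};

struct toy_ctx {
	enum toy_state	state;
	int		blocks_left;	/* remote value blocks still mapped */
};

/*
 * Unmap one block per call.  The helper never sets ->state; the caller
 * chose the resume state before calling it.
 */
static int toy_rmtval_remove(struct toy_ctx *ctx)
{
	if (ctx->blocks_left > 0)
		ctx->blocks_left--;
	if (ctx->blocks_left > 0)
		return -EAGAIN;	/* caller rolls, then re-enters in ctx->state */
	return 0;
}

/* One pass of a toy set-path state machine; -EAGAIN requests a roll. */
static int toy_set_iter(struct toy_ctx *ctx)
{
	int error;

	switch (ctx->state) {
	case TOY_UNINIT:
		/* ... new value added; now drop the old remote value ... */
		ctx->state = TOY_RM_LBLK;	/* set the state *before* the call */
		/* fallthrough */
	case TOY_RM_LBLK:
		error = toy_rmtval_remove(ctx);
		if (error)
			return error;
		ctx->state = TOY_UNINIT;	/* finished; reset for reuse */
		return 0;
	}
	assert(0 && "unknown state");
	return -EINVAL;
}
```

Because the state is recorded before the first -EAGAIN, later calls jump straight back into the removal instead of redoing the earlier work.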

> 
> For removals, I wonder if we should have advanced beyond UNINIT by the
> time we get here?  I think you've added the minimum states that are
> necessary to resume work after a transaction roll, but from this and the
> next patch I feel like we do a lot of work while dela_state == UNINIT.
Yes, I think I went over that a little in my replies to your earlier
reviews.  Many times we can get away without setting a state and still
get the same behavior, though it may make it a little harder to
visualize where the function resumes.

I dunno, this one seems like a matter of preference insofar as what
people want to see for simplification.  I think having the explicit
state setting makes the code easier for a reader to follow, though I
will concede the states don't actually have to be there to make it work.

> 
> FWIW I will be taking a close look at all the new 'return -EAGAIN'
> statements to see if I can tell what state we're in when we trigger a
> transaction roll.
Well, ok, a lot of them are UNINIT.  If we continue in the direction of
removing all unnecessary states, it's really the combination of the tree
and the state that lands us back where we need to be when the function
is recalled.

If, for debugging or readability purposes, we wanted an explicit state
for each EAGAIN, we would reintroduce a lot of states we've simplified
away over the reviews.

Maybe give it a day or two to sleep on, and let me know what you think :-)

Thanks for the reviews, I know it's really complicated.
Allison
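To illustrate the point that the on-disk form of the attr fork, together with dela_state, is what lands a recalled function in the right place, here is a toy model in which the state never leaves UNINIT. The shortform-to-leaf conversion returns -EAGAIN, and the next call finds the fork already in leaf form. The types and functions (toy_inode, toy_set_iter, toy_attr_set) are hypothetical simplifications, not the XFS routines.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical attr fork formats; a stand-in for the real fork format checks. */
enum toy_fmt { TOY_FMT_SHORTFORM, TOY_FMT_LEAF };

struct toy_inode {
	enum toy_fmt	fmt;
	int		sf_free;	/* bytes free in the inline (shortform) area */
	int		nattrs;
};

/*
 * Add one attr of @len bytes.  There is no explicit state: after the
 * shortform-to-leaf conversion returns -EAGAIN, the next call dispatches
 * on the (now leaf) format and resumes in the right place.
 */
static int toy_set_iter(struct toy_inode *ip, int len)
{
	assert(len > 0);

	if (ip->fmt == TOY_FMT_SHORTFORM) {
		if (len <= ip->sf_free) {
			ip->sf_free -= len;
			ip->nattrs++;
			return 0;
		}
		ip->fmt = TOY_FMT_LEAF;	/* convert, then ask for a roll */
		return -EAGAIN;
	}

	ip->nattrs++;		/* toy leaf add: assume it always fits */
	return 0;
}

/* Caller loop: "roll" between calls until the operation stops asking. */
static int toy_attr_set(struct toy_inode *ip, int len, int *rolls)
{
	int error;

	*rolls = 0;
	while ((error = toy_set_iter(ip, len)) == -EAGAIN)
		(*rolls)++;	/* stand-in for the transaction roll */
	return error;
}
```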

> 
> --D
> 
>> +	}
>>   
>>   	return error;
>>   }
>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
>> index 9eee615..002fd30 100644
>> --- a/fs/xfs/libxfs/xfs_attr_remote.h
>> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
>> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>>   int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>>   		xfs_buf_flags_t incore_flags);
>>   int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
>> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
>> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>>   #endif /* __XFS_ATTR_REMOTE_H__ */
>> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
>> index bfad669..aaa7e66 100644
>> --- a/fs/xfs/xfs_attr_inactive.c
>> +++ b/fs/xfs/xfs_attr_inactive.c
>> @@ -15,10 +15,10 @@
>>   #include "xfs_da_format.h"
>>   #include "xfs_da_btree.h"
>>   #include "xfs_inode.h"
>> +#include "xfs_attr.h"
>>   #include "xfs_attr_remote.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_bmap.h"
>> -#include "xfs_attr.h"
>>   #include "xfs_attr_leaf.h"
>>   #include "xfs_quota.h"
>>   #include "xfs_dir2.h"
>> -- 
>> 2.7.4
>>


* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-11-11  0:28     ` Dave Chinner
@ 2020-11-13  4:00       ` Allison Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-11-13  4:00 UTC (permalink / raw)
  To: Dave Chinner, Darrick J. Wong; +Cc: linux-xfs



On 11/10/20 5:28 PM, Dave Chinner wrote:
> On Tue, Nov 10, 2020 at 03:43:31PM -0800, Darrick J. Wong wrote:
>> On Thu, Oct 22, 2020 at 11:34:27PM -0700, Allison Henderson wrote:
>>> +/*
>>> + * Remove the attribute specified in @args.
>>> + *
>>> + * This function may return -EAGAIN to signal that the transaction needs to be
>>> + * rolled.  Callers should continue calling this function until they receive a
>>> + * return value other than -EAGAIN.
>>> + */
>>> +int
>>> +xfs_attr_remove_iter(
>>> +	struct xfs_delattr_context	*dac)
>>> +{
>>> +	struct xfs_da_args		*args = dac->da_args;
>>> +	struct xfs_inode		*dp = args->dp;
>>> +
>>> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
>>> +		goto node;
>>>   
>>
>> Might as well just make this part of the if statement dispatch:
>>
>> 	if (dac->dela_state == XFS_DAS_RM_SHRINK)
>> 		return xfs_attr_node_removename_iter(dac);
>> 	else if (!xfs_inode_hasattr(dp))
>> 		return -ENOATTR;
>>
>>>   	if (!xfs_inode_hasattr(dp)) {
>>> -		error = -ENOATTR;
>>> +		return -ENOATTR;
>>>   	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
>>>   		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
>>> -		error = xfs_attr_shortform_remove(args);
>>> +		return xfs_attr_shortform_remove(args);
>>>   	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>>> -		error = xfs_attr_leaf_removename(args);
>>> -	} else {
>>> -		error = xfs_attr_node_removename(args);
>>> +		return xfs_attr_leaf_removename(args);
>>>   	}
>>> -
>>> -	return error;
>>> +node:
>>> +	return  xfs_attr_node_removename_iter(dac);
> 
> Just a nitpick on this anti-pattern: else is not necessary
> when the branch returns.
> 
> 	if (!xfs_inode_hasattr(dp))
> 		return -ENOATTR;
> 
> 	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> 		return xfs_attr_node_removename_iter(dac);
> 
> 	if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
> 		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> 		return xfs_attr_shortform_remove(args);
> 	}
> 
> 	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> 		return xfs_attr_leaf_removename(args);
> 
> 	return xfs_attr_node_removename_iter(dac);
> 
> -Dave.
> 
Sure, I think it's OK to clean out the elses since they all return.  Thanks!

Allison
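A toy version of the early-return dispatch suggested above, written as a flat sequence of if/return checks with no else branches. The format enum and the toy_* removal routines are hypothetical placeholders for the attr fork format checks and the shortform, leaf and node removal paths. ENOATTR falls back to ENODATA, which is how XFS defines it on Linux.

```c
#include <assert.h>
#include <errno.h>

#ifndef ENOATTR
#define ENOATTR ENODATA		/* XFS on Linux defines ENOATTR this way */
#endif

/* Hypothetical stand-ins for the attr fork layouts checked in the dispatch. */
enum toy_fmt {
	TOY_FMT_NONE,		/* inode has no attrs */
	TOY_FMT_LOCAL,		/* shortform, stored inline */
	TOY_FMT_ONE_BLOCK,	/* single leaf block */
	TOY_FMT_NODE,		/* multi-block tree */
};

enum toy_state { TOY_DAS_UNINIT = 0, TOY_DAS_RM_SHRINK };
enum toy_path { TOY_PATH_NONE, TOY_PATH_SHORTFORM, TOY_PATH_LEAF, TOY_PATH_NODE };

struct toy_dac {
	enum toy_fmt	fmt;
	int		inline_data;	/* toy analogue of XFS_IFINLINE */
	enum toy_state	state;
	enum toy_path	taken;		/* records which routine ran */
};

static int toy_shortform_remove(struct toy_dac *d)
{
	d->taken = TOY_PATH_SHORTFORM;
	return 0;
}

static int toy_leaf_remove(struct toy_dac *d)
{
	d->taken = TOY_PATH_LEAF;
	return 0;
}

static int toy_node_remove_iter(struct toy_dac *d)
{
	d->taken = TOY_PATH_NODE;
	return 0;
}

static int toy_remove_iter(struct toy_dac *d)
{
	if (d->fmt == TOY_FMT_NONE)
		return -ENOATTR;

	/* Resuming a node removal after a transaction roll. */
	if (d->state == TOY_DAS_RM_SHRINK)
		return toy_node_remove_iter(d);

	if (d->fmt == TOY_FMT_LOCAL) {
		assert(d->inline_data);
		return toy_shortform_remove(d);
	}

	if (d->fmt == TOY_FMT_ONE_BLOCK)
		return toy_leaf_remove(d);

	return toy_node_remove_iter(d);
}
```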



* Re: [PATCH v13 03/10] xfs: Add delay ready attr set routines
  2020-11-13  1:33       ` Allison Henderson
@ 2020-11-13  9:16         ` Chandan Babu R
  2020-11-13 17:12           ` Allison Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Chandan Babu R @ 2020-11-13  9:16 UTC (permalink / raw)
  To: Allison Henderson; +Cc: Darrick J. Wong, linux-xfs

On Friday 13 November 2020 7:03:13 AM IST Allison Henderson wrote:
> 
> On 11/10/20 2:57 PM, Darrick J. Wong wrote:
> > On Tue, Oct 27, 2020 at 07:02:55PM +0530, Chandan Babu R wrote:
> >> On Friday 23 October 2020 12:04:28 PM IST Allison Henderson wrote:
> >>> This patch modifies the attr set routines to be delay ready. This means
> >>> they no longer roll or commit transactions, but instead return -EAGAIN
> >>> to have the calling routine roll and refresh the transaction.  In this
> >>> series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
> >>> state machine like switch to keep track of where it was when EAGAIN was
> >>> returned. See xfs_attr.h for a more detailed diagram of the states.
> >>>
> >>> Two new helper functions have been added: xfs_attr_rmtval_set_init and
> >>> xfs_attr_rmtval_set_blk.  They provide a subset of logic similar to
> >>> xfs_attr_rmtval_set, but they store the current block in the delay attr
> >>> context to allow the caller to roll the transaction between allocations.
> >>> This helps to simplify and consolidate code used by
> >>> xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
> >>> now become a simple loop to refresh the transaction until the operation
> >>> is completed.  Lastly, xfs_attr_rmtval_remove is no longer used, and is
> >>> removed.
> >>
> >> One nit. xfs_attr_rmtval_remove()'s prototype declaration needs to be removed
> >> from xfs_attr_remote.h.
> Alrighty, will pull out
> 
> >>
> >>>
> >>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> >>> ---
> >>>   fs/xfs/libxfs/xfs_attr.c        | 370 ++++++++++++++++++++++++++--------------
> >>>   fs/xfs/libxfs/xfs_attr.h        | 126 +++++++++++++-
> >>>   fs/xfs/libxfs/xfs_attr_remote.c |  99 +++++++----
> >>>   fs/xfs/libxfs/xfs_attr_remote.h |   4 +
> >>>   fs/xfs/xfs_trace.h              |   1 -
> >>>   5 files changed, 439 insertions(+), 161 deletions(-)
> >>>
> >>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> >>> index 6ca94cb..95c98d7 100644
> >>> --- a/fs/xfs/libxfs/xfs_attr.c
> >>> +++ b/fs/xfs/libxfs/xfs_attr.c
> >>> @@ -44,7 +44,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
> >>>    * Internal routines when attribute list is one block.
> >>>    */
> >>>   STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
> >>> -STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
> >>> +STATIC int xfs_attr_leaf_addname(struct xfs_delattr_context *dac);
> >>>   STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
> >>>   STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
> >>>   
> >>> @@ -52,12 +52,15 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
> >>>    * Internal routines when attribute list is more than one block.
> >>>    */
> >>>   STATIC int xfs_attr_node_get(xfs_da_args_t *args);
> >>> -STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> >>> +STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
> >>>   STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
> >>>   STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
> >>>   				 struct xfs_da_state **state);
> >>>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> >>>   STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> >>> +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
> >>> +STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> >>> +			     struct xfs_buf **leaf_bp);
> >>>   
> >>>   int
> >>>   xfs_inode_hasattr(
> >>> @@ -218,8 +221,11 @@ xfs_attr_is_shortform(
> >>>   
> >>>   /*
> >>>    * Attempts to set an attr in shortform, or converts short form to leaf form if
> >>> - * there is not enough room.  If the attr is set, the transaction is committed
> >>> - * and set to NULL.
> >>> + * there is not enough room.  This function is meant to operate as a helper
> >>> + * routine to the delayed attribute functions.  It returns -EAGAIN to indicate
> >>> + * that the calling function should roll the transaction, and then proceed to
> >>> + * add the attr in leaf form.  This subroutine does not expect to be recalled
> >>> + * again like the other delayed attr routines do.
> >>>    */
> >>>   STATIC int
> >>>   xfs_attr_set_shortform(
> >>> @@ -227,16 +233,16 @@ xfs_attr_set_shortform(
> >>>   	struct xfs_buf		**leaf_bp)
> >>>   {
> >>>   	struct xfs_inode	*dp = args->dp;
> >>> -	int			error, error2 = 0;
> >>> +	int			error = 0;
> >>>   
> >>>   	/*
> >>>   	 * Try to add the attr to the attribute list in the inode.
> >>>   	 */
> >>>   	error = xfs_attr_try_sf_addname(dp, args);
> >>> +
> >>> +	/* Should only be 0, -EEXIST or ENOSPC */
> >>>   	if (error != -ENOSPC) {
> >>> -		error2 = xfs_trans_commit(args->trans);
> >>> -		args->trans = NULL;
> >>> -		return error ? error : error2;
> >>> +		return error;
> >>>   	}
> >>>   	/*
> >>>   	 * It won't fit in the shortform, transform to a leaf block.  GROT:
> >>> @@ -249,18 +255,10 @@ xfs_attr_set_shortform(
> >>>   	/*
> >>>   	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
> >>>   	 * push cannot grab the half-baked leaf buffer and run into problems
> >>> -	 * with the write verifier. Once we're done rolling the transaction we
> >>> -	 * can release the hold and add the attr to the leaf.
> >>> +	 * with the write verifier.
> >>>   	 */
> >>>   	xfs_trans_bhold(args->trans, *leaf_bp);
> >>> -	error = xfs_defer_finish(&args->trans);
> >>> -	xfs_trans_bhold_release(args->trans, *leaf_bp);
> >>> -	if (error) {
> >>> -		xfs_trans_brelse(args->trans, *leaf_bp);
> >>> -		return error;
> >>> -	}
> >>> -
> >>> -	return 0;
> >>> +	return -EAGAIN;
> >>>   }
> >>>   
> >>>   /*
> >>> @@ -268,7 +266,7 @@ xfs_attr_set_shortform(
> >>>    * also checks for a defer finish.  Transaction is finished and rolled as
> >>>    * needed, and returns true of false if the delayed operation should continue.
> >>>    */
> >>> -int
> >>> +STATIC int
> >>>   xfs_attr_trans_roll(
> >>>   	struct xfs_delattr_context	*dac)
> >>>   {
> >>> @@ -297,61 +295,130 @@ int
> >>>   xfs_attr_set_args(
> >>>   	struct xfs_da_args	*args)
> >>>   {
> >>> -	struct xfs_inode	*dp = args->dp;
> >>> -	struct xfs_buf          *leaf_bp = NULL;
> >>> -	int			error = 0;
> >>> +	struct xfs_buf			*leaf_bp = NULL;
> >>> +	int				error = 0;
> >>> +	struct xfs_delattr_context	dac = {
> >>> +		.da_args	= args,
> >>> +	};
> >>> +
> >>> +	do {
> >>> +		error = xfs_attr_set_iter(&dac, &leaf_bp);
> >>> +		if (error != -EAGAIN)
> >>> +			break;
> >>> +
> >>> +		error = xfs_attr_trans_roll(&dac);
> >>> +		if (error)
> >>> +			return error;
> >>> +
> >>> +		if (leaf_bp) {
> >>> +			xfs_trans_bjoin(args->trans, leaf_bp);
> >>> +			xfs_trans_bhold(args->trans, leaf_bp);
> >>> +		}
> >>
> >> When xfs_attr_set_iter() causes a "short form" attribute list to be converted
> >> to "leaf form", leaf_bp would point to an xfs_buf which has been added to the
> >> transaction and also XFS_BLI_HOLD flag is set on the buffer (last statement in
> >> xfs_attr_set_shortform()). XFS_BLI_HOLD flag makes sure that the new
> >> transaction allocated by xfs_attr_trans_roll() would continue to have leaf_bp
> >> in the transaction's item list. Hence I think the above calls to
> >> xfs_trans_bjoin() and xfs_trans_bhold() are not required.
> Sorry, I just noticed Chandan's commentary for this patch.  Apologies.
> I think we can get away without this now, but yes, this routine
> disappears at the end of the set now.  Will clean it out anyway for
> bisecting reasons though. :-)

No problem. As an aside, I stopped reviewing the patchset after I noticed
Brian's review comments for "[PATCH v13 02/10] xfs: Add delay ready attr
remove routines" suggesting some more code refactoring work.

> 
> > 
> > I /think/ the defer ops will rejoin the buffer each time it rolls, which
> > means that xfs_attr_trans_roll returns with the buffer already joined to
> > the transaction?  And I think you're right that the bhold isn't needed,
> > because holding is dictated by the lower levels (i.e. _set_iter).
> > 
> >> Please let me know if I am missing something obvious here.
> > 
> > The entire function goes away by the end of the series. :)
> > 
> > --D
> > 
> >>
> >>
> >>
> >>
> 


-- 
chandan





* Re: [PATCH v13 05/10] xfs: Set up infastructure for deferred attribute operations
  2020-11-11  3:44     ` Darrick J. Wong
@ 2020-11-13 17:06       ` Allison Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-11-13 17:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 11/10/20 8:44 PM, Darrick J. Wong wrote:
> On Tue, Nov 10, 2020 at 01:51:49PM -0800, Darrick J. Wong wrote:
>> On Thu, Oct 22, 2020 at 11:34:30PM -0700, Allison Henderson wrote:
>>> Currently attributes are modified directly across one or more
>>> transactions. But they are not logged or replayed in the event of an
>>> error. The goal of delayed attributes is to enable logging and replaying
>>> of attribute operations using the existing delayed operations
>>> infrastructure.  This will later enable the attributes to become part of
>>> larger multi part operations that also must first be recorded to the
>>> log.  This is mostly of interest in the scheme of parent pointers which
>>> would need to maintain an attribute containing parent inode information
>>> any time an inode is moved, created, or removed.  Parent pointers would
>>> then be of interest to any feature that would need to quickly derive an
>>> inode path from the mount point. Online scrub, nfs lookups and fs grow
>>> or shrink operations are all features that could take advantage of this.
>>>
>>> This patch adds two new log item types for setting or removing
>>> attributes as deferred operations.  The xfs_attri_log_item logs an
>>> intent to set or remove an attribute.  The corresponding
>>> xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
>>> freed once the transaction is done.  Both log items use a generic
>>> xfs_attr_log_format structure that contains the attribute name, value,
>>> flags, inode, and an op_flag that indicates if the operation is a set
>>> or remove.
>>>
>>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>>> ---
>>>   fs/xfs/Makefile                 |   1 +
>>>   fs/xfs/libxfs/xfs_attr.c        |   7 +-
>>>   fs/xfs/libxfs/xfs_attr.h        |  19 +
>>>   fs/xfs/libxfs/xfs_defer.c       |   1 +
>>>   fs/xfs/libxfs/xfs_defer.h       |   3 +
>>>   fs/xfs/libxfs/xfs_format.h      |   5 +
>>>   fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
>>>   fs/xfs/libxfs/xfs_log_recover.h |   2 +
>>>   fs/xfs/libxfs/xfs_types.h       |   1 +
>>>   fs/xfs/scrub/common.c           |   2 +
>>>   fs/xfs/xfs_acl.c                |   2 +
>>>   fs/xfs/xfs_attr_item.c          | 750 ++++++++++++++++++++++++++++++++++++++++
>>>   fs/xfs/xfs_attr_item.h          |  76 ++++
>>>   fs/xfs/xfs_attr_list.c          |   1 +
>>>   fs/xfs/xfs_ioctl.c              |   2 +
>>>   fs/xfs/xfs_ioctl32.c            |   2 +
>>>   fs/xfs/xfs_iops.c               |   2 +
>>>   fs/xfs/xfs_log.c                |   4 +
>>>   fs/xfs/xfs_log_recover.c        |   2 +
>>>   fs/xfs/xfs_ondisk.h             |   2 +
>>>   fs/xfs/xfs_xattr.c              |   1 +
>>>   21 files changed, 923 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
>>> index 04611a1..b056cfc 100644
>>> --- a/fs/xfs/Makefile
>>> +++ b/fs/xfs/Makefile
>>> @@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
>>>   				   xfs_buf_item_recover.o \
>>>   				   xfs_dquot_item_recover.o \
>>>   				   xfs_extfree_item.o \
>>> +				   xfs_attr_item.o \
>>>   				   xfs_icreate_item.o \
>>>   				   xfs_inode_item.o \
>>>   				   xfs_inode_item_recover.o \
>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>> index 6453178..760383c 100644
>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>> +++ b/fs/xfs/libxfs/xfs_attr.c
>>> @@ -24,6 +24,7 @@
>>>   #include "xfs_quota.h"
>>>   #include "xfs_trans_space.h"
>>>   #include "xfs_trace.h"
>>> +#include "xfs_attr_item.h"
>>>   
>>>   /*
>>>    * xfs_attr.c
>>> @@ -59,8 +60,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>>   STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>>>   STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
>>> -STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>>> -			     struct xfs_buf **leaf_bp);
>>>   
>>>   int
>>>   xfs_inode_hasattr(
>>> @@ -142,7 +141,7 @@ xfs_attr_get(
>>>   /*
>>>    * Calculate how many blocks we need for the new attribute,
>>>    */
>>> -STATIC int
>>> +int
>>>   xfs_attr_calc_size(
>>>   	struct xfs_da_args	*args,
>>>   	int			*local)
>>> @@ -327,7 +326,7 @@ xfs_attr_set_args(
>>>    * to handle this, and recall the function until a successful error code is
>>>    * returned.
>>>    */
>>> -STATIC int
>>> +int
>>>   xfs_attr_set_iter(
>>>   	struct xfs_delattr_context	*dac,
>>>   	struct xfs_buf			**leaf_bp)
>>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>>> index 501f9df..5b4a1ca 100644
>>> --- a/fs/xfs/libxfs/xfs_attr.h
>>> +++ b/fs/xfs/libxfs/xfs_attr.h
>>> @@ -247,6 +247,7 @@ enum xfs_delattr_state {
>>>   #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>>>   #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>>>   #define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
>>> +#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
>>>   
>>>   /*
>>>    * Context used for keeping track of delayed attribute operations
>>> @@ -254,6 +255,9 @@ enum xfs_delattr_state {
>>>   struct xfs_delattr_context {
>>>   	struct xfs_da_args      *da_args;
>>>   
>>> +	/* Used by delayed attributes to hold leaf across transactions */
>>
>> "Used by xfs_attr_set to hold a leaf buffer across a transaction roll" ?
>>
>>> +	struct xfs_buf		*leaf_bp;
>>> +
>>>   	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
>>>   	struct xfs_bmbt_irec	map;
>>>   	xfs_dablk_t		lblkno;
>>> @@ -267,6 +271,18 @@ struct xfs_delattr_context {
>>>   	enum xfs_delattr_state  dela_state;
>>>   };
>>>   
>>> +/*
>>> + * List of attrs to commit later.
>>> + */
>>> +struct xfs_attr_item {
>>> +	struct xfs_delattr_context	xattri_dac;
>>> +	uint32_t			xattri_op_flags;/* attr op set or rm */
>>
>> The comment for xattri_op_flags should be more direct in mentioning that
>> it takes XFS_ATTR_OP_FLAGS_{SET,REMOVE}.
>>
>> (Alternately you could define an enum for the incore state tracker that
>> causes the appropriate XFS_ATTR_OP_FLAG* to be set on the log item in
>> xfs_attr_create_intent to avoid mixing of the flag namespaces, but that
>> is a lot of paper-pushing...)
>>
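A sketch of the alternative mentioned above: keep a separate in-core enum for the operation and translate it to XFS_ATTR_OP_FLAGS_{SET,REMOVE} only where the log intent is built. The translation back on recovery masks with XFS_ATTR_OP_FLAGS_TYPE_MASK, since the low byte holds the type code. The flag values are copied from the xfs_log_format.h hunk in this patch. The enum and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* On-disk values, as defined in the xfs_log_format.h hunk of this patch. */
#define XFS_ATTR_OP_FLAGS_SET		1
#define XFS_ATTR_OP_FLAGS_REMOVE	2
#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF

/* Hypothetical in-core operation type, kept out of the log flag namespace. */
enum toy_attr_op { TOY_ATTR_OP_SET, TOY_ATTR_OP_REMOVE };

/* Translate at the in-core/on-disk boundary, where the intent is created. */
static uint32_t toy_attr_op_to_log(enum toy_attr_op op)
{
	switch (op) {
	case TOY_ATTR_OP_SET:
		return XFS_ATTR_OP_FLAGS_SET;
	case TOY_ATTR_OP_REMOVE:
		return XFS_ATTR_OP_FLAGS_REMOVE;
	}
	assert(0 && "bad attr op");
	return 0;
}

/* Reverse mapping for recovery: only the low byte carries the type code. */
static int toy_attr_op_from_log(uint32_t op_flags, enum toy_attr_op *op)
{
	switch (op_flags & XFS_ATTR_OP_FLAGS_TYPE_MASK) {
	case XFS_ATTR_OP_FLAGS_SET:
		*op = TOY_ATTR_OP_SET;
		return 0;
	case XFS_ATTR_OP_FLAGS_REMOVE:
		*op = TOY_ATTR_OP_REMOVE;
		return 0;
	default:
		return -1;	/* corrupt or unknown type code */
	}
}
```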
>>> +
>>> +	/* used to log this item to an intent */
>>> +	struct list_head		xattri_list;
>>> +};
>>
>> Ok, so going back to a confusing comment I had from the last series,
>> I'm glad that you've moved all the attr code to be deferred operations.
>>
>> Can you move all the xfs_delattr_context fields into xfs_attr_item?
>> AFAICT (from git diff'ing the entire branch :P) we never allocate an
>> xfs_delattr_context on its own; we only ever access the one that's
>> embedded in xfs_attr_item, right?
>>
>>> +
>>> +
>>>   /*========================================================================
>>>    * Function prototypes for the kernel.
>>>    *========================================================================*/
>>> @@ -282,11 +298,14 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
>>>   int xfs_attr_get(struct xfs_da_args *args);
>>>   int xfs_attr_set(struct xfs_da_args *args);
>>>   int xfs_attr_set_args(struct xfs_da_args *args);
>>> +int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>>> +		      struct xfs_buf **leaf_bp);
>>>   int xfs_has_attr(struct xfs_da_args *args);
>>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>>>   int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>>>   bool xfs_attr_namecheck(const void *name, size_t length);
>>>   void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>>   			      struct xfs_da_args *args);
>>> +int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>>>   
>>>   #endif	/* __XFS_ATTR_H__ */
>>> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
>>> index eff4a12..e9caff7 100644
>>> --- a/fs/xfs/libxfs/xfs_defer.c
>>> +++ b/fs/xfs/libxfs/xfs_defer.c
>>> @@ -178,6 +178,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
>>>   	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
>>>   	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
>>>   	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
>>> +	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
>>>   };
>>>   
>>>   static void
>>> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
>>> index 05472f7..72a5789 100644
>>> --- a/fs/xfs/libxfs/xfs_defer.h
>>> +++ b/fs/xfs/libxfs/xfs_defer.h
>>> @@ -19,6 +19,7 @@ enum xfs_defer_ops_type {
>>>   	XFS_DEFER_OPS_TYPE_RMAP,
>>>   	XFS_DEFER_OPS_TYPE_FREE,
>>>   	XFS_DEFER_OPS_TYPE_AGFL_FREE,
>>> +	XFS_DEFER_OPS_TYPE_ATTR,
>>>   	XFS_DEFER_OPS_TYPE_MAX,
>>>   };
>>>   
>>> @@ -63,6 +64,8 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
>>>   extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
>>>   extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
>>>   extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
>>> +extern const struct xfs_defer_op_type xfs_attr_defer_type;
>>> +
>>>   
>>>   /*
>>>    * This structure enables a dfops user to detach the chain of deferred
>>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>>> index dd764da..d419c34 100644
>>> --- a/fs/xfs/libxfs/xfs_format.h
>>> +++ b/fs/xfs/libxfs/xfs_format.h
>>> @@ -584,6 +584,11 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
>>>   		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT);
>>>   }
>>>   
>>> +static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)
>>> +{
>>> +	return false;
>>> +}
>>> +
>>>   /*
>>>    * end of superblock version macros
>>>    */
>>> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
>>> index 8bd00da..de6309d 100644
>>> --- a/fs/xfs/libxfs/xfs_log_format.h
>>> +++ b/fs/xfs/libxfs/xfs_log_format.h
>>> @@ -117,7 +117,12 @@ struct xfs_unmount_log_format {
>>>   #define XLOG_REG_TYPE_CUD_FORMAT	24
>>>   #define XLOG_REG_TYPE_BUI_FORMAT	25
>>>   #define XLOG_REG_TYPE_BUD_FORMAT	26
>>> -#define XLOG_REG_TYPE_MAX		26
>>> +#define XLOG_REG_TYPE_ATTRI_FORMAT	27
>>> +#define XLOG_REG_TYPE_ATTRD_FORMAT	28
>>> +#define XLOG_REG_TYPE_ATTR_NAME	29
>>> +#define XLOG_REG_TYPE_ATTR_VALUE	30
>>> +#define XLOG_REG_TYPE_MAX		30
>>> +
>>>   
>>>   /*
>>>    * Flags to log operation header
>>> @@ -240,6 +245,8 @@ typedef struct xfs_trans_header {
>>>   #define	XFS_LI_CUD		0x1243
>>>   #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
>>>   #define	XFS_LI_BUD		0x1245
>>> +#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
>>> +#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
>>>   
>>>   #define XFS_LI_TYPE_DESC \
>>>   	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
>>> @@ -255,7 +262,9 @@ typedef struct xfs_trans_header {
>>>   	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
>>>   	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
>>>   	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
>>> -	{ XFS_LI_BUD,		"XFS_LI_BUD" }
>>> +	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
>>> +	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
>>> +	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
>>>   
>>>   /*
>>>    * Inode Log Item Format definitions.
>>> @@ -863,4 +872,35 @@ struct xfs_icreate_log {
>>>   	__be32		icl_gen;	/* inode generation number to use */
>>>   };
>>>   
>>> +/*
>>> + * Flags for deferred attribute operations.
>>> + * Upper bits are flags, lower byte is type code
>>> + */
>>> +#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
>>> +#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
>>> +#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF	/* Flags type mask */
>>> +
>>> +/*
>>> + * This is the structure used to lay out an attr log item in the
>>> + * log.
>>> + */
>>> +struct xfs_attri_log_format {
>>> +	uint16_t	alfi_type;	/* attri log item type */
>>> +	uint16_t	alfi_size;	/* size of this item */
>>> +	uint32_t	__pad;		/* pad to 64 bit aligned */
>>> +	uint64_t	alfi_id;	/* attri identifier */
>>> +	xfs_ino_t	alfi_ino;	/* the inode for this attr operation */
>>
>> This is an ondisk structure; please use only explicitly sized data
>> types like uint64_t.
>>
>>> +	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
>>> +	uint32_t	alfi_name_len;	/* attr name length */
>>> +	uint32_t	alfi_value_len;	/* attr value length */
>>> +	uint32_t	alfi_attr_flags;/* attr flags */
>>> +};
>>> +
>>> +struct xfs_attrd_log_format {
>>> +	uint16_t	alfd_type;	/* attrd log item type */
>>> +	uint16_t	alfd_size;	/* size of this item */
>>> +	uint32_t	__pad;		/* pad to 64 bit aligned */
>>> +	uint64_t	alfd_alf_id;	/* id of corresponding attrd */
>>
>> "..of corresponding attri"
>>
>>> +};
>>> +
>>>   #endif /* __XFS_LOG_FORMAT_H__ */
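Following the comment about explicitly sized types, here is a userspace sketch of the two log formats with alfi_ino spelled as uint64_t. It adds compile-time size and offset checks in the spirit of the XFS_CHECK_STRUCT_SIZE() checks in xfs_ondisk.h. The toy_* struct names are hypothetical. The expected sizes (40 and 16 bytes) follow from the field list. The explicit pad field keeps the 64-bit members at 8-byte offsets even on ABIs where uint64_t is only 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* attri log format with every field explicitly sized (alfi_ino as uint64_t). */
struct toy_attri_log_format {
	uint16_t	alfi_type;	/* attri log item type */
	uint16_t	alfi_size;	/* size of this item */
	uint32_t	pad;		/* pad to 64 bit aligned */
	uint64_t	alfi_id;	/* attri identifier */
	uint64_t	alfi_ino;	/* the inode for this attr operation */
	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
	uint32_t	alfi_name_len;	/* attr name length */
	uint32_t	alfi_value_len;	/* attr value length */
	uint32_t	alfi_attr_flags;/* attr flags */
};

struct toy_attrd_log_format {
	uint16_t	alfd_type;	/* attrd log item type */
	uint16_t	alfd_size;	/* size of this item */
	uint32_t	pad;		/* pad to 64 bit aligned */
	uint64_t	alfd_alf_id;	/* id of the corresponding attri */
};

/* Compile-time layout checks: a size change breaks the build, not recovery. */
static_assert(sizeof(struct toy_attri_log_format) == 40, "attri size");
static_assert(offsetof(struct toy_attri_log_format, alfi_ino) == 16,
	      "alfi_ino offset");
static_assert(sizeof(struct toy_attrd_log_format) == 16, "attrd size");
```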
>>> diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
>>> index 3cca2bf..b6e5514 100644
>>> --- a/fs/xfs/libxfs/xfs_log_recover.h
>>> +++ b/fs/xfs/libxfs/xfs_log_recover.h
>>> @@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
>>>   extern const struct xlog_recover_item_ops xlog_rud_item_ops;
>>>   extern const struct xlog_recover_item_ops xlog_cui_item_ops;
>>>   extern const struct xlog_recover_item_ops xlog_cud_item_ops;
>>> +extern const struct xlog_recover_item_ops xlog_attri_item_ops;
>>> +extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
>>>   
>>>   /*
>>>    * Macros, structures, prototypes for internal log manager use.
>>> diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
>>> index 397d947..860cdd2 100644
>>> --- a/fs/xfs/libxfs/xfs_types.h
>>> +++ b/fs/xfs/libxfs/xfs_types.h
>>> @@ -11,6 +11,7 @@ typedef uint32_t	prid_t;		/* project ID */
>>>   typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
>>>   typedef uint32_t	xfs_agino_t;	/* inode # within allocation grp */
>>>   typedef uint32_t	xfs_extlen_t;	/* extent length in blocks */
>>> +typedef uint32_t	xfs_attrlen_t;	/* attr length */
>>
>> This doesn't get used anywhere.
>>
>>>   typedef uint32_t	xfs_agnumber_t;	/* allocation group number */
>>>   typedef int32_t		xfs_extnum_t;	/* # of extents in a file */
>>>   typedef int16_t		xfs_aextnum_t;	/* # extents in an attribute fork */
>>> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
>>> index 1887605..9a649d1 100644
>>> --- a/fs/xfs/scrub/common.c
>>> +++ b/fs/xfs/scrub/common.c
>>> @@ -24,6 +24,8 @@
>>>   #include "xfs_rmap_btree.h"
>>>   #include "xfs_log.h"
>>>   #include "xfs_trans_priv.h"
>>> +#include "xfs_da_format.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_reflink.h"
>>>   #include "scrub/scrub.h"
>>> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
>>> index c544951..cad1db4 100644
>>> --- a/fs/xfs/xfs_acl.c
>>> +++ b/fs/xfs/xfs_acl.c
>>> @@ -10,6 +10,8 @@
>>>   #include "xfs_trans_resv.h"
>>>   #include "xfs_mount.h"
>>>   #include "xfs_inode.h"
>>> +#include "xfs_da_format.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_trace.h"
>>>   #include "xfs_error.h"
>>> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
>>> new file mode 100644
>>> index 0000000..3980066
>>> --- /dev/null
>>> +++ b/fs/xfs/xfs_attr_item.c
>>> @@ -0,0 +1,750 @@
>>> +// SPDX-License-Identifier: GPL-2.0-or-later
>>> +/*
>>> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
>>
>> 2019 -> 2020.
>>
>>> + * Author: Allison Collins <allison.henderson@oracle.com>
>>> + */
>>> +
>>> +#include "xfs.h"
>>> +#include "xfs_fs.h"
>>> +#include "xfs_format.h"
>>> +#include "xfs_log_format.h"
>>> +#include "xfs_trans_resv.h"
>>> +#include "xfs_bit.h"
>>> +#include "xfs_shared.h"
>>> +#include "xfs_mount.h"
>>> +#include "xfs_defer.h"
>>> +#include "xfs_trans.h"
>>> +#include "xfs_trans_priv.h"
>>> +#include "xfs_buf_item.h"
>>> +#include "xfs_attr_item.h"
>>> +#include "xfs_log.h"
>>> +#include "xfs_btree.h"
>>> +#include "xfs_rmap.h"
>>> +#include "xfs_inode.h"
>>> +#include "xfs_icache.h"
>>> +#include "xfs_da_format.h"
>>> +#include "xfs_da_btree.h"
>>> +#include "xfs_attr.h"
>>> +#include "xfs_shared.h"
>>> +#include "xfs_attr_item.h"
>>> +#include "xfs_alloc.h"
>>> +#include "xfs_bmap.h"
>>> +#include "xfs_trace.h"
>>> +#include "libxfs/xfs_da_format.h"
>>> +#include "xfs_inode.h"
>>> +#include "xfs_quota.h"
>>> +#include "xfs_log_priv.h"
>>> +#include "xfs_log_recover.h"
>>> +
>>> +static const struct xfs_item_ops xfs_attri_item_ops;
>>> +static const struct xfs_item_ops xfs_attrd_item_ops;
>>> +
>>> +static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
>>> +{
>>> +	return container_of(lip, struct xfs_attri_log_item, attri_item);
>>> +}
>>> +
>>> +STATIC void
>>> +xfs_attri_item_free(
>>> +	struct xfs_attri_log_item	*attrip)
>>> +{
>>> +	kmem_free(attrip->attri_item.li_lv_shadow);
>>> +	kmem_free(attrip);
>>> +}
>>> +
>>> +/*
>>> + * Freeing the attrip requires that we remove it from the AIL if it has already
>>> + * been placed there. However, the ATTRI may not yet have been placed in the
>>> + * AIL when called by xfs_attri_release() from ATTRD processing due to the
>>> + * ordering of committed vs unpin operations in bulk insert operations. Hence
>>> + * the reference count to ensure only the last caller frees the ATTRI.
>>> + */
>>> +STATIC void
>>> +xfs_attri_release(
>>> +	struct xfs_attri_log_item	*attrip)
>>> +{
>>> +	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
>>> +	if (atomic_dec_and_test(&attrip->attri_refcount)) {
>>> +		xfs_trans_ail_delete(&attrip->attri_item,
>>> +				     SHUTDOWN_LOG_IO_ERROR);
>>> +		xfs_attri_item_free(attrip);
>>> +	}
>>> +}
>>> +
>>> +/*
>>> + * This returns the number of iovecs needed to log the given attri item. We
>>> + * only need 1 iovec for an attri item.  It just logs the attr_log_format
>>> + * structure.
>>> + */
>>> +static inline int
>>> +xfs_attri_item_sizeof(
>>> +	struct xfs_attri_log_item *attrip)
>>> +{
>>> +	return sizeof(struct xfs_attri_log_format);
>>> +}
>>
>> Please get rid of this trivial oneliner.
>>
>>> +
>>> +STATIC void
>>> +xfs_attri_item_size(
>>> +	struct xfs_log_item	*lip,
>>> +	int			*nvecs,
>>> +	int			*nbytes)
>>> +{
>>> +	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
>>> +
>>> +	*nvecs += 1;
>>> +	*nbytes += xfs_attri_item_sizeof(attrip);
>>> +
>>> +	/* Attr set and remove operations require a name */
>>> +	ASSERT(attrip->attri_name_len > 0);
>>> +
>>> +	*nvecs += 1;
>>> +	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
>>> +
>>> +	/*
>>> +	 * Set ops can accept a value of 0 len to clear an attr value.  Remove
>>> +	 * ops do not need a value at all.  So only account for the value
>>> +	 * when it is needed.
>>> +	 */
>>> +	if (attrip->attri_value_len > 0) {
>>> +		*nvecs += 1;
>>> +		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
>>> +	}
>>> +}
>>> +
>>> +/*
>>> + * This is called to fill in the log iovecs for the given attri log
>>> + * item. We use  1 iovec for the attri_format_item, 1 for the name, and
>>> + * another for the value if it is present
>>> + */
>>> +STATIC void
>>> +xfs_attri_item_format(
>>> +	struct xfs_log_item	*lip,
>>> +	struct xfs_log_vec	*lv)
>>> +{
>>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>>> +	struct xfs_log_iovec		*vecp = NULL;
>>> +
>>> +	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
>>> +	attrip->attri_format.alfi_size = 1;
>>> +
>>> +	/*
>>> +	 * This size accounting must be done before copying the attrip into the
>>> +	 * iovec.  If we do it after, the wrong size will be recorded to the log
>>> +	 * and we trip across assertion checks for bad region sizes later during
>>> +	 * the log recovery.
>>> +	 */
>>> +
>>> +	ASSERT(attrip->attri_name_len > 0);
>>> +	attrip->attri_format.alfi_size++;
>>> +
>>> +	if (attrip->attri_value_len > 0)
>>> +		attrip->attri_format.alfi_size++;
>>> +
>>> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
>>> +			&attrip->attri_format,
>>> +			xfs_attri_item_sizeof(attrip));
>>> +	if (attrip->attri_name_len > 0)
>>
>> I thought we required attri_name_len > 0 always?
>>
>>> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
>>> +				attrip->attri_name,
>>> +				ATTR_NVEC_SIZE(attrip->attri_name_len));
>>> +
>>> +	if (attrip->attri_value_len > 0)
>>> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
>>> +				attrip->attri_value,
>>> +				ATTR_NVEC_SIZE(attrip->attri_value_len));
>>> +}
>>> +
>>> +/*
>>> + * The unpin operation is the last place an ATTRI is manipulated in the log. It
>>> + * is either inserted in the AIL or aborted in the event of a log I/O error. In
>>> + * either case, the ATTRI transaction has been successfully committed to make
>>> + * it this far. Therefore, we expect whoever committed the ATTRI to either
>>> + * construct and commit the ATTRD or drop the ATTRD's reference in the event of
>>> + * error. Simply drop the log's ATTRI reference now that the log is done with
>>> + * it.
>>> + */
>>> +STATIC void
>>> +xfs_attri_item_unpin(
>>> +	struct xfs_log_item	*lip,
>>> +	int			remove)
>>> +{
>>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>>> +
>>> +	xfs_attri_release(attrip);
>>
>> Nit: this could be shortened to xfs_attri_release(ATTRI_ITEM(lip)).
>>
>>> +}
>>> +
>>> +
>>> +STATIC void
>>> +xfs_attri_item_release(
>>> +	struct xfs_log_item	*lip)
>>> +{
>>> +	xfs_attri_release(ATTRI_ITEM(lip));
>>> +}
>>> +
>>> +/*
>>> + * Allocate and initialize an attri item
>>> + */
>>> +STATIC struct xfs_attri_log_item *
>>> +xfs_attri_init(
>>> +	struct xfs_mount	*mp)
>>> +
>>> +{
>>> +	struct xfs_attri_log_item	*attrip;
>>> +	uint				size;
>>
>> Can you line up the *mp in the parameter list with the *attrip in the
>> local variables?
>>
>>> +
>>> +	size = (uint)(sizeof(struct xfs_attri_log_item));
>>
>> kmem_zalloc takes a size_t parameter (which is the return type of sizeof);
>> no need to do all this casting.
>>
>>> +	attrip = kmem_zalloc(size, 0);
>>> +
>>> +	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
>>> +			  &xfs_attri_item_ops);
>>> +	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
>>> +	atomic_set(&attrip->attri_refcount, 2);
>>> +
>>> +	return attrip;
>>> +}
>>> +
>>> +/*
>>> + * Copy an attr format buffer from the given buf, and into the destination attr
>>> + * format structure.
>>> + */
>>> +STATIC int
>>> +xfs_attri_copy_format(struct xfs_log_iovec *buf,
>>> +		      struct xfs_attri_log_format *dst_attr_fmt)
>>> +{
>>> +	struct xfs_attri_log_format *src_attr_fmt = buf->i_addr;
>>> +	uint len = sizeof(struct xfs_attri_log_format);
>>
>> Indentation and whatnot with the parameter names.
>>
>>> +
>>> +	if (buf->i_len != len)
>>> +		return -EFSCORRUPTED;
>>> +
>>> +	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
>>> +	return 0;
>>> +}
>>> +
>>> +static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
>>> +{
>>> +	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
>>> +}
>>> +
>>> +STATIC void
>>> +xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
>>> +{
>>> +	kmem_free(attrdp->attrd_item.li_lv_shadow);
>>> +	kmem_free(attrdp);
>>> +}
>>> +
>>> +/*
>>> + * This returns the number of iovecs needed to log the given attrd item.
>>> + * We only need 1 iovec for an attrd item.  It just logs the attr_log_format
>>> + * structure.
>>> + */
>>> +static inline int
>>> +xfs_attrd_item_sizeof(
>>> +	struct xfs_attrd_log_item *attrdp)
>>> +{
>>> +	return sizeof(struct xfs_attrd_log_format);
>>> +}
>>> +
>>> +STATIC void
>>> +xfs_attrd_item_size(
>>> +	struct xfs_log_item	*lip,
>>> +	int			*nvecs,
>>> +	int			*nbytes)
>>> +{
>>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>>
>> Variable name alignment between the parameter list and the local vars.
>>
>>> +	*nvecs += 1;
>>
>> Space between local variable declaration and the first line of code.
>>
>>> +	*nbytes += xfs_attrd_item_sizeof(attrdp);
>>
>> No need for a oneliner function for sizeof.
>>
>>> +}
>>> +
>>> +/*
>>> + * This is called to fill in the log iovecs for the given attrd log item. We use
>>> + * only 1 iovec for the attrd_format, and we point that at the attr_log_format
>>> + * structure embedded in the attrd item.
>>> + */
>>> +STATIC void
>>> +xfs_attrd_item_format(
>>> +	struct xfs_log_item	*lip,
>>> +	struct xfs_log_vec	*lv)
>>> +{
>>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>>> +	struct xfs_log_iovec		*vecp = NULL;
>>> +
>>> +	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
>>> +	attrdp->attrd_format.alfd_size = 1;
>>> +
>>> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
>>> +			&attrdp->attrd_format, xfs_attrd_item_sizeof(attrdp));
>>> +}
>>> +
>>> +/*
>>> + * The ATTRD is either committed or aborted if the transaction is cancelled. If
>>> + * the transaction is cancelled, drop our reference to the ATTRI and free the
>>> + * ATTRD.
>>> + */
>>> +STATIC void
>>> +xfs_attrd_item_release(
>>> +	struct xfs_log_item     *lip)
>>> +{
>>> +	struct xfs_attrd_log_item *attrdp = ATTRD_ITEM(lip);
>>> +	xfs_attri_release(attrdp->attrd_attrip);
>>
>> Space between the variable declaration and the first line of code.
>>
>>> +	xfs_attrd_item_free(attrdp);
>>> +}
>>> +
>>> +/*
>>> + * Log an ATTRI it to the ATTRD when the attr op is done.  An attr operation
>>
>> I don't know what "Log an ATTRI it to the ATTRD" means.  I think this is
>> the function that performs one step of an attribute update intent and
>> then tags the attrd item dirty, right?
>>
>>> + * may be a set or a remove.  Note that the transaction is marked dirty
>>> + * regardless of whether the operation succeeds or fails to support the
>>> + * ATTRI/ATTRD lifecycle rules.
>>> + */
>>> +int
>>> +xfs_trans_attr(
>>> +	struct xfs_delattr_context	*dac,
>>> +	struct xfs_attrd_log_item	*attrdp,
>>> +	struct xfs_buf			**leaf_bp,
>>> +	uint32_t			op_flags)
>>> +{
>>> +	struct xfs_da_args		*args = dac->da_args;
>>> +	int				error;
>>> +
>>> +	error = xfs_qm_dqattach_locked(args->dp, 0);
>>> +	if (error)
>>> +		return error;
>>> +
>>> +	switch (op_flags) {
>>> +	case XFS_ATTR_OP_FLAGS_SET:
>>> +		args->op_flags |= XFS_DA_OP_ADDNAME;
>>> +		error = xfs_attr_set_iter(dac, leaf_bp);
>>> +		break;
>>> +	case XFS_ATTR_OP_FLAGS_REMOVE:
>>> +		ASSERT(XFS_IFORK_Q((args->dp)));
>>
>> No need for the double parentheses around args->dp.
>>
>>> +		error = xfs_attr_remove_iter(dac);
>>> +		break;
>>> +	default:
>>> +		error = -EFSCORRUPTED;
>>> +		break;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Mark the transaction dirty, even on error. This ensures the
>>> +	 * transaction is aborted, which:
>>> +	 *
>>> +	 * 1.) releases the ATTRI and frees the ATTRD
>>> +	 * 2.) shuts down the filesystem
>>> +	 */
>>> +	args->trans->t_flags |= XFS_TRANS_DIRTY;
>>> +	if (xfs_sb_version_hasdelattr(&args->dp->i_mount->m_sb))
>>> +		set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
>>
>> This could probably be:
>>
>> 	if (attrdp)
>> 		set_bit(...);
>>
>>> +
>>> +	return error;
>>> +}
>>> +
>>> +/* Log an attr to the intent item. */
>>> +STATIC void
>>> +xfs_attr_log_item(
>>> +	struct xfs_trans		*tp,
>>> +	struct xfs_attri_log_item	*attrip,
>>> +	struct xfs_attr_item		*attr)
>>> +{
>>> +	struct xfs_attri_log_format	*attrp;
>>> +
>>> +	tp->t_flags |= XFS_TRANS_DIRTY;
>>> +	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
>>> +
>>> +	/*
>>> +	 * At this point the xfs_attr_item has been constructed, and we've
>>> +	 * created the log intent. Fill in the attri log item and log format
>>> +	 * structure with fields from this xfs_attr_item
>>> +	 */
>>> +	attrp = &attrip->attri_format;
>>> +	attrp->alfi_ino = attr->xattri_dac.da_args->dp->i_ino;
>>> +	attrp->alfi_op_flags = attr->xattri_op_flags;
>>> +	attrp->alfi_value_len = attr->xattri_dac.da_args->valuelen;
>>> +	attrp->alfi_name_len = attr->xattri_dac.da_args->namelen;
>>> +	attrp->alfi_attr_flags = attr->xattri_dac.da_args->attr_filter;
>>> +
>>> +	attrip->attri_name = (void *)attr->xattri_dac.da_args->name;
>>> +	attrip->attri_value = attr->xattri_dac.da_args->value;
>>> +	attrip->attri_name_len = attr->xattri_dac.da_args->namelen;
>>> +	attrip->attri_value_len = attr->xattri_dac.da_args->valuelen;
>>> +}
>>> +
>>> +/* Get an ATTRI. */
>>> +static struct xfs_log_item *
>>> +xfs_attr_create_intent(
>>> +	struct xfs_trans		*tp,
>>> +	struct list_head		*items,
>>> +	unsigned int			count,
>>> +	bool				sort)
>>> +{
>>> +	struct xfs_mount		*mp = tp->t_mountp;
>>> +	struct xfs_attri_log_item	*attrip;
>>> +	struct xfs_attr_item		*attr;
>>> +
>>> +	ASSERT(count == 1);
>>> +
>>> +	if (!xfs_sb_version_hasdelattr(&mp->m_sb))
>>> +		return NULL;
>>> +
>>> +	attrip = xfs_attri_init(mp);
>>> +	xfs_trans_add_item(tp, &attrip->attri_item);
>>> +	list_for_each_entry(attr, items, xattri_list)
>>> +		xfs_attr_log_item(tp, attrip, attr);
>>> +	return &attrip->attri_item;
>>> +}
>>> +
>>> +/* Process an attr. */
>>> +STATIC int
>>> +xfs_attr_finish_item(
>>> +	struct xfs_trans		*tp,
>>> +	struct xfs_log_item		*done,
>>> +	struct list_head		*item,
>>> +	struct xfs_btree_cur		**state)
>>> +{
>>> +	struct xfs_attr_item		*attr;
>>> +	int				error;
>>> +	struct xfs_delattr_context	*dac;
>>> +	struct xfs_attrd_log_item	*attrdp;
>>> +	struct xfs_attri_log_item	*attrip;
>>> +
>>> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
>>> +	dac = &attr->xattri_dac;
>>> +
>>> +	/*
>>> +	 * Always reset trans after EAGAIN cycle
>>> +	 * since the transaction is new
>>> +	 */
>>> +	dac->da_args->trans = tp;
>>> +
>>> +	error = xfs_trans_attr(dac, ATTRD_ITEM(done), &dac->leaf_bp,
>>> +			       attr->xattri_op_flags);
>>> +	/*
>>> +	 * The attrip refers to xfs_attr_item memory to log the name and value
>>> +	 * with the intent item. This already occurred when the intent was
>>> +	 * committed so these fields are no longer accessed.
>>
>> Can you clear the attri_{name,value} pointers after you've logged the
>> intent item so that we don't have to do them here?
>>
>>> Clear them out of
>>> +	 * caution since we're about to free the xfs_attr_item.
>>> +	 */
>>> +	if (xfs_sb_version_hasdelattr(&dac->da_args->dp->i_mount->m_sb)) {
>>> +		attrdp = (struct xfs_attrd_log_item *)done;
>>
>> attrdp = ATTRD_ITEM(done)?
>>
>>> +		attrip = attrdp->attrd_attrip;
>>> +		attrip->attri_name = NULL;
>>> +		attrip->attri_value = NULL;
>>> +	}
>>> +
>>> +	if (error != -EAGAIN)
>>> +		kmem_free(attr);
>>> +
>>> +	return error;
>>> +}
>>> +
>>> +/* Abort all pending ATTRs. */
>>> +STATIC void
>>> +xfs_attr_abort_intent(
>>> +	struct xfs_log_item		*intent)
>>> +{
>>> +	xfs_attri_release(ATTRI_ITEM(intent));
>>> +}
>>> +
>>> +/* Cancel an attr */
>>> +STATIC void
>>> +xfs_attr_cancel_item(
>>> +	struct list_head		*item)
>>> +{
>>> +	struct xfs_attr_item		*attr;
>>> +
>>> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
>>> +	kmem_free(attr);
>>> +}
>>> +
>>> +/*
>>> + * The ATTRI is logged only once and cannot be moved in the log, so simply
>>> + * return the lsn at which it's been logged.
>>> + */
>>> +STATIC xfs_lsn_t
>>> +xfs_attri_item_committed(
>>> +	struct xfs_log_item	*lip,
>>> +	xfs_lsn_t		lsn)
>>> +{
>>> +	return lsn;
>>> +}
>>
>> You can omit this function because the default is "return lsn;" if you
>> don't provide one.  See xfs_trans_committed_bulk.
>>
>>> +
>>> +STATIC void
>>> +xfs_attri_item_committing(
>>> +	struct xfs_log_item	*lip,
>>> +	xfs_lsn_t		lsn)
>>> +{
>>> +}
>>
>> This function isn't required if it doesn't do anything.  See
>> xfs_log_commit_cil.
>>
>>> +
>>> +STATIC bool
>>> +xfs_attri_item_match(
>>> +	struct xfs_log_item	*lip,
>>> +	uint64_t		intent_id)
>>> +{
>>> +	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
>>> +}
>>> +
>>> +/*
>>> + * When the attrd item is committed to disk, all we need to do is delete our
>>> + * reference to our partner attri item and then free ourselves. Since we're
>>> + * freeing ourselves we must return -1 to keep the transaction code from
>>> + * further referencing this item.
>>> + */
>>> +STATIC xfs_lsn_t
>>> +xfs_attrd_item_committed(
>>> +	struct xfs_log_item	*lip,
>>> +	xfs_lsn_t		lsn)
>>> +{
>>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>>> +
>>> +	/*
>>> +	 * Drop the ATTRI reference regardless of whether the ATTRD has been
>>> +	 * aborted. Once the ATTRD transaction is constructed, it is the sole
>>> +	 * responsibility of the ATTRD to release the ATTRI (even if the ATTRI
>>> +	 * is aborted due to log I/O error).
>>> +	 */
>>> +	xfs_attri_release(attrdp->attrd_attrip);
>>> +	xfs_attrd_item_free(attrdp);
>>> +
>>> +	return NULLCOMMITLSN;
>>> +}
>>
>> If you set XFS_ITEM_RELEASE_WHEN_COMMITTED in the attrd item ops,
>> xfs_trans_committed_bulk will call ->iop_release instead of
>> ->iop_committed and you therefore don't need this function.
>>
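To illustrate the two suggestions above (omit `->iop_committed` when it would only return the LSN, and set a release-when-committed flag on the done item), here is a toy userspace model of the per-item dispatch. It is loosely modeled on the logic described for xfs_trans_committed_bulk(). Every name in it is hypothetical, and this is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t lsn_t;

#define ITEM_RELEASE_WHEN_COMMITTED	(1U << 0)	/* hypothetical flag */
#define NULLCOMMITLSN_SKETCH		((lsn_t)-1)	/* "no further processing" */

struct log_item;

struct item_ops {
	unsigned int	flags;
	void		(*iop_release)(struct log_item *lip);
	lsn_t		(*iop_committed)(struct log_item *lip, lsn_t lsn);
};

struct log_item {
	const struct item_ops	*ops;
	int			released;	/* set by the release hook */
};

/* Return the LSN to insert the item at, or -1 if it needs nothing more. */
static lsn_t item_committed(struct log_item *lip, lsn_t commit_lsn)
{
	if (lip->ops->flags & ITEM_RELEASE_WHEN_COMMITTED) {
		lip->ops->iop_release(lip);
		return NULLCOMMITLSN_SKETCH;
	}
	if (lip->ops->iop_committed)
		return lip->ops->iop_committed(lip, commit_lsn);
	return commit_lsn;	/* default when no ->iop_committed is provided */
}

/* A real done item would drop its intent reference and free itself here. */
static void done_item_release(struct log_item *lip)
{
	lip->released = 1;
}

/* Done-item style ops: released on commit, no ->iop_committed. */
static const struct item_ops done_ops = {
	.flags		= ITEM_RELEASE_WHEN_COMMITTED,
	.iop_release	= done_item_release,
};

/* Intent-item style ops: neither ->iop_committed nor ->iop_committing. */
static const struct item_ops intent_ops = {
	.flags		= 0,
};
```

With this dispatch, neither empty `->iop_committing` stubs nor a trivial `->iop_committed` are needed.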
>>> +
>>> +STATIC void
>>> +xfs_attrd_item_committing(
>>> +	struct xfs_log_item	*lip,
>>> +	xfs_lsn_t		lsn)
>>> +{
>>> +}
>>
>> Same comment as xfs_attri_item_committing.
>>
>>> +
>>> +
>>> +/*
>>> + * Allocate and initialize an attrd item
>>> + */
>>> +struct xfs_attrd_log_item *
>>> +xfs_attrd_init(
>>> +	struct xfs_mount		*mp,
>>> +	struct xfs_attri_log_item	*attrip)
>>> +
>>> +{
>>> +	struct xfs_attrd_log_item	*attrdp;
>>> +	uint				size;
>>> +
>>> +	size = (uint)(sizeof(struct xfs_attrd_log_item));
>>
>> Same comment about sizeof and size_t as in xfs_attri_init.
>>
>>> +	attrdp = kmem_zalloc(size, 0);
>>> +	memset(attrdp, 0, size);
>>
>> No need to memset-zero something you just zalloc'd.
>>
>>> +
>>> +	xfs_log_item_init(mp, &attrdp->attrd_item, XFS_LI_ATTRD,
>>> +			  &xfs_attrd_item_ops);
>>> +	attrdp->attrd_attrip = attrip;
>>> +	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
>>> +
>>> +	return attrdp;
>>> +}
>>> +
>>> +/*
>>> + * This routine is called to allocate an "attr free done" log item.
>>> + */
>>> +struct xfs_attrd_log_item *
>>> +xfs_trans_get_attrd(struct xfs_trans		*tp,
>>> +		  struct xfs_attri_log_item	*attrip)
>>> +{
>>> +	struct xfs_attrd_log_item		*attrdp;
>>> +
>>> +	ASSERT(tp != NULL);
>>> +
>>> +	attrdp = xfs_attrd_init(tp->t_mountp, attrip);
>>> +	ASSERT(attrdp != NULL);
>>
>> You could fold xfs_attrd_init into this function since there's only one
>> caller.
>>
>>> +
>>> +	xfs_trans_add_item(tp, &attrdp->attrd_item);
>>> +	return attrdp;
>>> +}
>>> +
>>> +static const struct xfs_item_ops xfs_attrd_item_ops = {
>>> +	.iop_size	= xfs_attrd_item_size,
>>> +	.iop_format	= xfs_attrd_item_format,
>>> +	.iop_release    = xfs_attrd_item_release,
>>> +	.iop_committing	= xfs_attrd_item_committing,
>>> +	.iop_committed	= xfs_attrd_item_committed,
>>> +};
>>> +
>>> +
>>> +/* Get an ATTRD so we can process all the attrs. */
>>> +static struct xfs_log_item *
>>> +xfs_attr_create_done(
>>> +	struct xfs_trans		*tp,
>>> +	struct xfs_log_item		*intent,
>>> +	unsigned int			count)
>>> +{
>>> +	if (!xfs_sb_version_hasdelattr(&tp->t_mountp->m_sb))
>>> +		return NULL;
>>
>> This is probably better expressed as:
>>
>> 	if (!intent)
>> 		return NULL;
>>
>> Since we don't need a log intent done item if there's no log intent
>> item.
>>
>>> +
>>> +	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
>>> +}
>>> +
>>> +const struct xfs_defer_op_type xfs_attr_defer_type = {
>>> +	.max_items	= 1,
>>> +	.create_intent	= xfs_attr_create_intent,
>>> +	.abort_intent	= xfs_attr_abort_intent,
>>> +	.create_done	= xfs_attr_create_done,
>>> +	.finish_item	= xfs_attr_finish_item,
>>> +	.cancel_item	= xfs_attr_cancel_item,
>>> +};
>>> +
>>> +/*
>>> + * Process an attr intent item that was recovered from the log.  We need to
>>> + * delete the attr that it describes.
>>> + */
>>> +STATIC int
>>> +xfs_attri_item_recover(
>>> +	struct xfs_log_item		*lip,
>>> +	struct list_head		*capture_list)
>>> +{
>>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>>> +	struct xfs_mount		*mp = lip->li_mountp;
>>> +	struct xfs_inode		*ip;
>>> +	struct xfs_da_args		args;
>>> +	struct xfs_attri_log_format	*attrp;
>>> +	int				error;
>>> +
>>> +	/*
>>> +	 * First check the validity of the attr described by the ATTRI.  If any
>>> +	 * are bad, then assume that all are bad and just toss the ATTRI.
>>> +	 */
>>> +	attrp = &attrip->attri_format;
>>> +	if (!(attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET ||
>>> +	      attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE) ||
>>> +	    (attrp->alfi_value_len > XATTR_SIZE_MAX) ||
>>> +	    (attrp->alfi_name_len > XATTR_NAME_MAX) ||
>>> +	    (attrp->alfi_name_len == 0)) {
>>
>> This needs to call xfs_verify_ino() on attrp->alfi_ino.
>>
>> This also needs to check for xfs_sb_version_hasdelattr().
>>
>> I would refactor this into a separate validation predicate to eliminate
>> the multi-line if statement.  I will post a series cleaning up the other
>> log items' recover functions shortly.
>>
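A sketch of the suggested validation predicate, assuming a trimmed-down copy of the format fields. It also folds in the `__pad` and zero-length-name checks requested further down. `ino_is_plausible()` is a hypothetical stand-in for xfs_verify_ino(), which checks the inode number against the filesystem geometry. The limits are the Linux XATTR_NAME_MAX (255) and XATTR_SIZE_MAX (65536) values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ATTR_OP_SET		1	/* XFS_ATTR_OP_FLAGS_SET in the patch */
#define ATTR_OP_REMOVE		2	/* XFS_ATTR_OP_FLAGS_REMOVE in the patch */
#define NAME_MAX_SKETCH		255	/* Linux XATTR_NAME_MAX */
#define SIZE_MAX_SKETCH		65536	/* Linux XATTR_SIZE_MAX */

/* Hypothetical subset of the recovered ATTRI format fields. */
struct attri_fmt {
	uint32_t	pad;
	uint64_t	ino;
	uint32_t	op_flags;
	uint32_t	name_len;
	uint32_t	value_len;
};

/* Hypothetical stand-in for xfs_verify_ino(). */
static bool ino_is_plausible(uint64_t ino)
{
	return ino != 0;
}

/* Return true only if every field of a recovered ATTRI looks sane. */
static bool attri_validate(const struct attri_fmt *f, bool has_delattr)
{
	if (!has_delattr)		/* feature bit must be set */
		return false;
	if (f->pad != 0)
		return false;
	if (!ino_is_plausible(f->ino))
		return false;
	switch (f->op_flags) {
	case ATTR_OP_SET:
	case ATTR_OP_REMOVE:
		break;
	default:
		return false;
	}
	if (f->name_len == 0 || f->name_len > NAME_MAX_SKETCH)
		return false;
	return f->value_len <= SIZE_MAX_SKETCH;
}
```

The recover function then reduces to `if (!attri_validate(...)) return -EFSCORRUPTED;`.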
>>> +		/*
>>> +		 * This will pull the ATTRI from the AIL and free the memory
>>> +		 * associated with it.
>>> +		 */
>>> +		xfs_attri_release(attrip);
>>
>> No need to call xfs_attri_release; one of the 5.10 cleanups was to
>> recognize that the log recovery code does this for you automatically.
>>
>>> +		return -EFSCORRUPTED;
>>> +	}
>>> +
>>> +	error = xfs_iget(mp, 0, attrp->alfi_ino, 0, 0, &ip);
>>> +	if (error)
>>> +		return error;
>>
>> I /think/ this needs to call xfs_qm_dqattach here, for reasons I'll get
>> into shortly.
>>
>> In the meantime, this /definitely/ needs to do:
>>
>> 	if (VFS_I(ip)->i_nlink == 0)
>> 		xfs_iflags_set(ip, XFS_IRECOVERY);
>>
>> Because the IRECOVERY flag prevents inode inactivation from triggering
>> on an unlinked inode while we're still performing log recovery.
>>
>> If you want to steal the xlog_recover_iget helper from the atomic
>> swapext series[0] please feel free. :)
>>
>> [0] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?id=51e23b9c9d9674a78dc97c5848c9efb4461e074d
>>
>>> +	memset(&args, 0, sizeof(args));
>>> +	args.dp = ip;
>>> +	args.name = attrip->attri_name;
>>> +	args.namelen = attrp->alfi_name_len;
>>> +	args.attr_filter = attrp->alfi_attr_flags;
>>> +	if (attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET) {
>>> +		args.value = attrip->attri_value;
>>> +		args.valuelen = attrp->alfi_value_len;
>>> +	}
>>> +
>>> +	error = xfs_attr_set(&args);
>>
>> Er...
> 
> Err... silly /me started a comment and then forgot to come back to it.
> 
> Log intent item recovery functions are "special".  The intent items that
> are recovered from the log represent all the committed state of the log
> at the point that the system went down.  For each recovered intent, we
> have to finish exactly that one step of work before we can move on to
> any work that would have happened after a transaction roll.
> 
> Maybe an example would help here: Let's say that two threads (a) and (b)
> each create a transaction, each log an intent item (we'll call them A
> and B respectively) and commit.  Let's say that the system goes down
> immediately after both commits are persisted but before anything else
> can happen.
> 
> Let us further presume that A is a multi-step transaction, and that the
> next step of A (call it A1) requires a resource that B currently has
> locked for update.  Normally, thread (a) will be blocked from making
> update A1 until B commits and thread (b) unlocks that resource, which
> means that the commit order will be A -> B -> A1.
> 
> Now let's look at log recovery.  We recover A and B from the log.  The
> data dependency between B and A1 still exists, but the log does not
> capture enough information to know about that dependency.  In order to
> ensure that log replay occurs in exactly the same order that it would
> have had the system not gone down, XFS single-steps through the
> recovered items and captures the "next steps" for later replay.
> 
> Going back to our example, log recovery replays A and needs to notice
> that recover(A) queued the unfinished work A1.  It saves A1 for later in
> the xfs_defer_capture machinery.  Then it recovers B, and only then can
> it go back to A1 and finish that.
> 
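A toy userspace model of the ordering argument above. This is not XFS code, and all names are hypothetical. Each recovered intent performs exactly one step. Any follow-up step is captured rather than finished immediately, and is replayed only after every recovered intent has had its turn. In the kernel, the xfs_defer_capture machinery plays the role of the capture array here.

```c
#include <assert.h>
#include <string.h>

#define MAX_CAPTURED 8

/* A recovered intent: its own step, plus an optional follow-up step. */
struct intent {
	const char	*step;		/* work this intent describes */
	const char	*next_step;	/* work queued after a roll, or NULL */
};

static char replay_log[64];	/* records the order steps were performed */

static void do_step(const char *step)
{
	if (replay_log[0] != '\0')
		strcat(replay_log, " ");
	strcat(replay_log, step);
}

/* Single-step every recovered intent, then replay the captured work. */
static void recover_all(const struct intent *intents, int n)
{
	const char *captured[MAX_CAPTURED];
	int ncaptured = 0;

	for (int i = 0; i < n; i++) {
		do_step(intents[i].step);
		if (intents[i].next_step) {
			assert(ncaptured < MAX_CAPTURED);
			captured[ncaptured++] = intents[i].next_step;
		}
	}
	for (int i = 0; i < ncaptured; i++)
		do_step(captured[i]);
}
```

Recovering `{A, A1}` and `{B}` yields the order "A B A1", which matches the pre-crash commit order. Finishing A1 eagerly inside recover(A) would instead produce "A A1 B".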
> Concretely, this means that you can't call xfs_attr_set here, because it
> creates a transaction and commits it, which potentially completes a
> bunch of work items that might have had dependencies on the other things
> that were recovered from the log.  I don't think xattrs actually /have/
> any such dependencies, but it's easier to reason about log recovery if
> all the recovery functions behave the same way.
> 
> This means that this recovery function has to behave in this manner:
> 
> 	xfs_iget(..., &ip);
> 	xfs_trans_alloc(&tp)
> 	xfs_trans_get_attrd(tp, attrip);
> 	xfs_ilock(ip...);
> 	xfs_trans_attr(...);
> 	if (there's more work) {
> 		create a new defer item from the onstack &args
> 		link it to the transaction
> 	}
> 
> 	xfs_defer_ops_capture_and_commit(tp, ip, capture_list);
> 	<unlock and release inodes>
> 
> Or put another way, if xfs_trans_attr returns -EAGAIN to tell us that
> there's more work to do, we have to create an incore defer ops item,
> attach it to the transaction, and let the defer capture mechanism save
> it for later.
> 
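A minimal sketch of the -EAGAIN handling described above, with hypothetical types standing in for the on-stack `args` and the incore defer item. Recovery does one step. If the step reports more work, the on-stack arguments are copied into a heap-allocated item, because the stack copy dies when the recover function returns, and that item is queued for later.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the on-stack da_args used during recovery. */
struct args_sketch {
	int	steps_left;	/* how many more transaction rolls are needed */
};

/* Hypothetical incore deferred work item, allocated on the heap. */
struct defer_item_sketch {
	struct args_sketch		args;
	struct defer_item_sketch	*next;
};

/* One step of work: returns -EAGAIN while more steps remain. */
static int attr_step(struct args_sketch *args)
{
	if (--args->steps_left > 0)
		return -EAGAIN;
	return 0;
}

/* Recover one intent: do exactly one step, capture any remainder. */
static int recover_one(int steps, struct defer_item_sketch **capture_list)
{
	struct args_sketch args = { .steps_left = steps };
	int error = attr_step(&args);

	if (error == -EAGAIN) {
		struct defer_item_sketch *item = malloc(sizeof(*item));

		if (!item)
			return -ENOMEM;
		item->args = args;	/* copy: &args is invalid after return */
		item->next = *capture_list;
		*capture_list = item;
		return 0;
	}
	return error;
}
```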
> Some day we'll figure out how to encode those data dependencies in the
> ondisk log (Dave speculated a while back that it might be as simple as
> encoding the transaction LSN in the intent ids instead of raw pointers
> so that we can reconstruct which intents came from where) but for now
> this is the (less) clunky way we do it.
> 
> Oh, and also it's necessary to attach dquots to any inode involved in
> log recovery, unless xfs_trans_attr already does that for us(?)
> 
> --D
Oh, OK then. I will rework this area to be consistent with that approach.
Thank you for the reviews!

Allison

> 
>>
>>> +
>>> +	xfs_attri_release(attrip);
>>
>> The transaction commit will take care of releasing attrip.
>>
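For context on why the commit path can do this release, the patch gives each ATTRI two references, as its xfs_attri_release() comment explains. One reference is dropped by the log at unpin, and the other by the ATTRD (or on abort). Whichever caller drops the last reference frees the item. Below is a userspace sketch of that pattern with C11 atomics, assuming hypothetical names. The `freed` flag stands in for removing the item from the AIL and freeing it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical intent item with the two-reference lifetime. */
struct intent_sketch {
	atomic_int	refcount;
	bool		freed;	/* stands in for AIL removal + free */
};

static void intent_init(struct intent_sketch *ip)
{
	/* One reference for the log (unpin), one for the done item. */
	atomic_init(&ip->refcount, 2);
	ip->freed = false;
}

/* Drop one reference; only the caller dropping the last one frees. */
static void intent_release(struct intent_sketch *ip)
{
	assert(atomic_load(&ip->refcount) > 0);
	if (atomic_fetch_sub(&ip->refcount, 1) == 1)	/* like atomic_dec_and_test() */
		ip->freed = true;
}
```

An extra explicit release in the recover function would therefore drop a reference that the transaction commit still expects to own.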
>>> +	xfs_irele(ip);
>>> +	return error;
>>> +}
>>> +
>>> +static const struct xfs_item_ops xfs_attri_item_ops = {
>>> +	.iop_size	= xfs_attri_item_size,
>>> +	.iop_format	= xfs_attri_item_format,
>>> +	.iop_unpin	= xfs_attri_item_unpin,
>>> +	.iop_committed	= xfs_attri_item_committed,
>>> +	.iop_committing = xfs_attri_item_committing,
>>> +	.iop_release    = xfs_attri_item_release,
>>> +	.iop_recover	= xfs_attri_item_recover,
>>> +	.iop_match	= xfs_attri_item_match,
>>
>> This needs an ->iop_relog method so that we can relog the attri log item
>> if the log starts to fill up.
>>
>>> +};
>>> +
>>> +
>>> +
>>> +STATIC int
>>> +xlog_recover_attri_commit_pass2(
>>> +	struct xlog                     *log,
>>> +	struct list_head		*buffer_list,
>>> +	struct xlog_recover_item        *item,
>>> +	xfs_lsn_t                       lsn)
>>> +{
>>> +	int                             error;
>>> +	struct xfs_mount                *mp = log->l_mp;
>>> +	struct xfs_attri_log_item       *attrip;
>>> +	struct xfs_attri_log_format     *attri_formatp;
>>> +	char				*name = NULL;
>>> +	char				*value = NULL;
>>> +	int				region = 0;
>>> +
>>> +	attri_formatp = item->ri_buf[region].i_addr;
>>
>> Please check the __pad field for zeroes here.
>>
>>> +	attrip = xfs_attri_init(mp);
>>> +	error = xfs_attri_copy_format(&item->ri_buf[region],
>>> +				      &attrip->attri_format);
>>> +	if (error) {
>>> +		xfs_attri_item_free(attrip);
>>> +		return error;
>>> +	}
>>> +
>>> +	attrip->attri_name_len = attri_formatp->alfi_name_len;
>>> +	attrip->attri_value_len = attri_formatp->alfi_value_len;
>>> +	attrip = krealloc(attrip, sizeof(struct xfs_attri_log_item) +
>>> +			  attrip->attri_name_len + attrip->attri_value_len,
>>> +			  GFP_NOFS | __GFP_NOFAIL);
>>> +
>>> +	ASSERT(attrip->attri_name_len > 0);
>>
>> If attri_name_len is zero, reject the whole thing with EFSCORRUPTED.
>>
>>> +	region++;
>>> +	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
>>> +	memcpy(name, item->ri_buf[region].i_addr,
>>> +	       attrip->attri_name_len);
>>> +	attrip->attri_name = name;
>>> +
>>> +	if (attrip->attri_value_len > 0) {
>>> +		region++;
>>> +		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
>>> +			attrip->attri_name_len;
>>> +		memcpy(value, item->ri_buf[region].i_addr,
>>> +			attrip->attri_value_len);
>>> +		attrip->attri_value = value;
>>> +	}
>>
>> Question: is it valid for an attri item to have value_len > 0 for an
>> XFS_ATTR_OP_FLAGS_REMOVE operation?
>>
>> Granted, that level of validation might be better left to the _recover
>> function.
>>
>>> +
>>> +	/*
>>> +	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
>>> +	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
>>> +	 * directly and drop the ATTRI reference. Note that
>>> +	 * xfs_trans_ail_update() drops the AIL lock.
>>> +	 */
>>> +	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
>>> +	xfs_attri_release(attrip);
>>> +	return 0;
>>> +}
>>> +
>>> +const struct xlog_recover_item_ops xlog_attri_item_ops = {
>>> +	.item_type	= XFS_LI_ATTRI,
>>> +	.commit_pass2	= xlog_recover_attri_commit_pass2,
>>> +};
>>> +
>>> +/*
>>> + * This routine is called when an ATTRD format structure is found in a committed
>>> + * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
>>> + * it was still in the log. To do this it searches the AIL for the ATTRI with
>>> + * an id equal to that in the ATTRD format structure. If we find it we drop
>>> + * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
>>> + */
>>> +STATIC int
>>> +xlog_recover_attrd_commit_pass2(
>>> +	struct xlog			*log,
>>> +	struct list_head		*buffer_list,
>>> +	struct xlog_recover_item	*item,
>>> +	xfs_lsn_t			lsn)
>>> +{
>>> +	struct xfs_attrd_log_format	*attrd_formatp;
>>> +
>>> +	attrd_formatp = item->ri_buf[0].i_addr;
>>> +	ASSERT((item->ri_buf[0].i_len ==
>>> +				(sizeof(struct xfs_attrd_log_format))));
>>> +
>>> +	xlog_recover_release_intent(log, XFS_LI_ATTRI,
>>> +				    attrd_formatp->alfd_alf_id);
>>> +	return 0;
>>> +}
>>> +
>>> +const struct xlog_recover_item_ops xlog_attrd_item_ops = {
>>> +	.item_type	= XFS_LI_ATTRD,
>>> +	.commit_pass2	= xlog_recover_attrd_commit_pass2,
>>> +};
>>> diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
>>> new file mode 100644
>>> index 0000000..7dd2572
>>> --- /dev/null
>>> +++ b/fs/xfs/xfs_attr_item.h
>>> @@ -0,0 +1,76 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-or-later
>>> + *
>>> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
>>> + * Author: Allison Collins <allison.henderson@oracle.com>
>>> + */
>>> +#ifndef	__XFS_ATTR_ITEM_H__
>>> +#define	__XFS_ATTR_ITEM_H__
>>> +
>>> +/* kernel only ATTRI/ATTRD definitions */
>>> +
>>> +struct xfs_mount;
>>> +struct kmem_zone;
>>> +
>>> +/*
>>> + * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
>>> + */
>>> +#define	XFS_ATTRI_RECOVERED	1
>>> +
>>> +
>>> +/* iovec length must be 32-bit aligned */
>>> +#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
>>> +				size + sizeof(int32_t) - \
>>> +				(size % sizeof(int32_t)))
>>
>> Can you turn this into a static inline helper?
>>
>> And use one of the roundup() variants to ensure the proper alignment
>> instead of this open-coded stuff? :)
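One way to do what Darrick asks, sketched in userspace C. In the kernel the body would presumably just be round_up(size, sizeof(int32_t)). ROUND_UP_POW2 below is a local stand-in for that, and xfs_attr_nvec_size() is a hypothetical name.

Besides giving type checking and avoiding the unparenthesized macro argument, the helper differs from the posted macro in one case. When size is already a multiple of four other than four itself, the macro pads 8 to 12, while the round-up keeps 8. Whether that extra padding matters depends on how the iovec lengths are consumed, so it is worth double checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The open-coded macro from the patch, kept only for comparison. */
#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
				size + sizeof(int32_t) - \
				(size % sizeof(int32_t)))

/* Stand-in for the kernel's round_up(); 'a' must be a power of two. */
#define ROUND_UP_POW2(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical static inline replacement: pad an iovec length to 4 bytes. */
static inline size_t xfs_attr_nvec_size(size_t size)
{
	return ROUND_UP_POW2(size, sizeof(int32_t));
}
```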
>>
>>> +
>>> +/*
>>> + * This is the "attr intention" log item.  It is used to log the fact that some
>>> + * attribute operations need to be processed.  An operation is currently either
>>> + * a set or remove.  Set or remove operations are described by the xfs_attr_item
>>> + * which may be logged to this intent.  Intents are used in conjunction with the
>>> + * "attr done" log item described below.
>>> + *
>>> + * The ATTRI is reference counted so that it is not freed prior to both the
>>> + * ATTRI and ATTRD being committed and unpinned. This ensures the ATTRI is
>>> + * inserted into the AIL even in the event of out of order ATTRI/ATTRD
>>> + * processing. In other words, an ATTRI is born with two references:
>>> + *
>>> + *      1.) an ATTRI held reference to track ATTRI AIL insertion
>>> + *      2.) an ATTRD held reference to track ATTRD commit
>>> + *
>>> + * On allocation, both references are the responsibility of the caller. Once the
>>> + * ATTRI is added to and dirtied in a transaction, ownership of reference one
>>> + * transfers to the transaction. The reference is dropped once the ATTRI is
>>> + * inserted to the AIL or in the event of failure along the way (e.g., commit
>>> + * failure, log I/O error, etc.). Note that the caller remains responsible for
>>> + * the ATTRD reference under all circumstances to this point. The caller has no
>>> + * means to detect failure once the transaction is committed, however.
>>> + * Therefore, an ATTRD is required after this point, even in the event of
>>> + * unrelated failure.
>>> + *
>>> + * Once an ATTRD is allocated and dirtied in a transaction, reference two
>>> + * transfers to the transaction. The ATTRD reference is dropped once it reaches
>>> + * the unpin handler. Similar to the ATTRI, the reference also drops in the
>>> + * event of commit failure or log I/O errors. Note that the ATTRD is not
>>> + * inserted in the AIL, so at this point both the ATTRI and ATTRD are freed.
>>
>> I don't think it's necessary to document the entire log intent/log done
>> refcount state machine here; it'll do to record just the bits that are
>> specific to delayed xattr operations.
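For the xattr-specific part, the two-reference rule reduces to this: the item is freed when the last of its two references drops, in either order. Below is a minimal model of just that rule. It uses a hypothetical struct intent in place of struct xfs_attri_log_item, and C11 atomics in place of atomic_t. The freed pointer exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical model of an intent item born with two references. */
struct intent {
	atomic_int	refcount;
	int		*freed;	/* test hook: set to 1 on release */
};

static struct intent *intent_alloc(int *freed_flag)
{
	struct intent *ip = malloc(sizeof(*ip));

	if (!ip)
		return NULL;
	/* Ref 1: the intent's own AIL insertion. Ref 2: held for the done item. */
	atomic_init(&ip->refcount, 2);
	ip->freed = freed_flag;
	*freed_flag = 0;
	return ip;
}

/* Drop one reference; whichever drop comes last frees the item. */
static void intent_release(struct intent *ip)
{
	if (atomic_fetch_sub(&ip->refcount, 1) == 1) {
		*ip->freed = 1;
		free(ip);
	}
}
```

Calling intent_release() once for the done item and once for the AIL insertion frees the item after the second call, regardless of which happens first.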
>>
>>> + */
>>> +struct xfs_attri_log_item {
>>> +	struct xfs_log_item		attri_item;
>>> +	atomic_t			attri_refcount;
>>> +	int				attri_name_len;
>>> +	void				*attri_name;
>>> +	int				attri_value_len;
>>> +	void				*attri_value;
>>
>> Please compress this structure a bit by moving the two pointers to be
>> adjacent instead of interspersed with ints.
>>
>> Ok, now on to digesting the new state machine...
>>
>> --D
>>
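A small illustration of why the field order matters. It uses hypothetical toy structs that keep only the four fields in question. On a typical LP64 build, the interleaved order leaves two 4-byte holes (32 bytes total) and the grouped order has none (24 bytes). In the real xfs_attri_log_item, the neighbouring members (the embedded log item, attri_refcount and attri_format) also affect padding. Running pahole on the built object is the reliable way to see what the reorder actually saves.

```c
#include <assert.h>
#include <stddef.h>

/* Field order as posted: pointers interleaved with ints. */
struct interleaved {
	int	name_len;
	void	*name;
	int	value_len;
	void	*value;
};

/* Suggested order: the two pointers adjacent, then the two ints. */
struct grouped {
	void	*name;
	void	*value;
	int	name_len;
	int	value_len;
};
```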
>>> +	struct xfs_attri_log_format	attri_format;
>>> +};
>>> +
>>> +/*
>>> + * This is the "attr done" log item.  It is used to log the fact that some attrs
>>> + * earlier mentioned in an attri item have been freed.
>>> + */
>>> +struct xfs_attrd_log_item {
>>> +	struct xfs_attri_log_item	*attrd_attrip;
>>> +	struct xfs_log_item		attrd_item;
>>> +	struct xfs_attrd_log_format	attrd_format;
>>> +};
>>> +
>>> +#endif	/* __XFS_ATTR_ITEM_H__ */
>>> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
>>> index 8f8837f..d7787a5 100644
>>> --- a/fs/xfs/xfs_attr_list.c
>>> +++ b/fs/xfs/xfs_attr_list.c
>>> @@ -15,6 +15,7 @@
>>>   #include "xfs_inode.h"
>>>   #include "xfs_trans.h"
>>>   #include "xfs_bmap.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_attr_sf.h"
>>>   #include "xfs_attr_leaf.h"
>>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>>> index 3fbd98f..d5d1959 100644
>>> --- a/fs/xfs/xfs_ioctl.c
>>> +++ b/fs/xfs/xfs_ioctl.c
>>> @@ -15,6 +15,8 @@
>>>   #include "xfs_iwalk.h"
>>>   #include "xfs_itable.h"
>>>   #include "xfs_error.h"
>>> +#include "xfs_da_format.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_bmap.h"
>>>   #include "xfs_bmap_util.h"
>>> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
>>> index c1771e7..62e1534 100644
>>> --- a/fs/xfs/xfs_ioctl32.c
>>> +++ b/fs/xfs/xfs_ioctl32.c
>>> @@ -17,6 +17,8 @@
>>>   #include "xfs_itable.h"
>>>   #include "xfs_fsops.h"
>>>   #include "xfs_rtalloc.h"
>>> +#include "xfs_da_format.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_ioctl.h"
>>>   #include "xfs_ioctl32.h"
>>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>>> index 5e16545..5ecc76c 100644
>>> --- a/fs/xfs/xfs_iops.c
>>> +++ b/fs/xfs/xfs_iops.c
>>> @@ -13,6 +13,8 @@
>>>   #include "xfs_inode.h"
>>>   #include "xfs_acl.h"
>>>   #include "xfs_quota.h"
>>> +#include "xfs_da_format.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_trans.h"
>>>   #include "xfs_trace.h"
>>> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
>>> index fa2d05e..3457f22 100644
>>> --- a/fs/xfs/xfs_log.c
>>> +++ b/fs/xfs/xfs_log.c
>>> @@ -1993,6 +1993,10 @@ xlog_print_tic_res(
>>>   	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
>>>   	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
>>>   	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
>>> +	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
>>> +	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
>>> +	    REG_TYPE_STR(ATTR_NAME, "attr_name"),
>>> +	    REG_TYPE_STR(ATTR_VALUE, "attr_value"),
>>>   	};
>>>   	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
>>>   #undef REG_TYPE_STR
>>> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
>>> index a8289ad..cb951cd 100644
>>> --- a/fs/xfs/xfs_log_recover.c
>>> +++ b/fs/xfs/xfs_log_recover.c
>>> @@ -1775,6 +1775,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
>>>   	&xlog_cud_item_ops,
>>>   	&xlog_bui_item_ops,
>>>   	&xlog_bud_item_ops,
>>> +	&xlog_attri_item_ops,
>>> +	&xlog_attrd_item_ops,
>>>   };
>>>   
>>>   static const struct xlog_recover_item_ops *
>>> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
>>> index 0aa87c2..bc9c25e 100644
>>> --- a/fs/xfs/xfs_ondisk.h
>>> +++ b/fs/xfs/xfs_ondisk.h
>>> @@ -132,6 +132,8 @@ xfs_check_ondisk_structs(void)
>>>   	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
>>>   	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
>>>   	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
>>> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
>>> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
>>>   
>>>   	/*
>>>   	 * The v5 superblock format extended several v4 header structures with
>>> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
>>> index bca48b3..9b0c790 100644
>>> --- a/fs/xfs/xfs_xattr.c
>>> +++ b/fs/xfs/xfs_xattr.c
>>> @@ -10,6 +10,7 @@
>>>   #include "xfs_log_format.h"
>>>   #include "xfs_da_format.h"
>>>   #include "xfs_inode.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_acl.h"
>>>   #include "xfs_da_btree.h"
>>> -- 
>>> 2.7.4
>>>

* Re: [PATCH v13 03/10] xfs: Add delay ready attr set routines
  2020-11-13  9:16         ` Chandan Babu R
@ 2020-11-13 17:12           ` Allison Henderson
  2020-11-14  1:20             ` Darrick J. Wong
  0 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2020-11-13 17:12 UTC
  To: Chandan Babu R; +Cc: Darrick J. Wong, linux-xfs



On 11/13/20 2:16 AM, Chandan Babu R wrote:
> On Friday 13 November 2020 7:03:13 AM IST Allison Henderson wrote:
>>
>> On 11/10/20 2:57 PM, Darrick J. Wong wrote:
>>> On Tue, Oct 27, 2020 at 07:02:55PM +0530, Chandan Babu R wrote:
>>>> On Friday 23 October 2020 12:04:28 PM IST Allison Henderson wrote:
>>>>> This patch modifies the attr set routines to be delay ready. This means
>>>>> they no longer roll or commit transactions, but instead return -EAGAIN
>>>>> to have the calling routine roll and refresh the transaction.  In this
>>>>> series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
>>>>> state machine like switch to keep track of where it was when EAGAIN was
>>>>> returned. See xfs_attr.h for a more detailed diagram of the states.
>>>>>
>>>>> Two new helper functions have been added: xfs_attr_rmtval_set_init and
>>>>> xfs_attr_rmtval_set_blk.  They provide a subset of logic similar to
>>>>> xfs_attr_rmtval_set, but they store the current block in the delay attr
>>>>> context to allow the caller to roll the transaction between allocations.
>>>>> This helps to simplify and consolidate code used by
>>>>> xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
>>>>> now become a simple loop to refresh the transaction until the operation
>>>>> is completed.  Lastly, xfs_attr_rmtval_remove is no longer used, and is
>>>>> removed.
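The commit message describes two pieces: a resumable step function, and a caller loop that rolls on -EAGAIN. They can be sketched in plain C as below. Everything here is a hypothetical toy:

- toy_iter() stands in for xfs_attr_set_iter().
- toy_roll() stands in for xfs_attr_trans_roll().
- The context keeps the saved state the way xfs_delattr_context does.

The point is that the state lives in the context, not on the stack. After the roll, the next call resumes where it left off instead of starting over.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states; UNINIT is zero so a zeroed context starts there. */
enum toy_state {
	TOY_UNINIT = 0,
	TOY_STEP2,
	TOY_DONE,
};

/* Hypothetical context: whatever must survive a transaction roll. */
struct toy_ctx {
	enum toy_state	state;
	int		work_done;	/* stands in for real progress */
	int		rolls;		/* how often the caller rolled */
};

/* One step; saves its place and returns -EAGAIN when a roll is needed. */
static int toy_iter(struct toy_ctx *c)
{
	switch (c->state) {
	case TOY_UNINIT:
		c->work_done++;		/* first chunk of work */
		c->state = TOY_STEP2;
		return -EAGAIN;
	case TOY_STEP2:
		c->work_done++;		/* second chunk, after the roll */
		c->state = TOY_DONE;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Stand-in for the transaction roll; here it only counts rolls. */
static int toy_roll(struct toy_ctx *c)
{
	c->rolls++;
	return 0;
}

/* Caller loop: roll and call again until the step stops asking. */
static int toy_run(struct toy_ctx *c)
{
	int error;

	do {
		error = toy_iter(c);
		if (error != -EAGAIN)
			break;
		error = toy_roll(c);
		if (error)
			return error;
	} while (1);

	return error;
}
```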
>>>>
>>>> One nit. xfs_attr_rmtval_remove()'s prototype declaration needs to be removed
>>>> from xfs_attr_remote.h.
>> Alrighty, will pull out
>>
>>>>
>>>>>
>>>>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>>>>> ---
>>>>>    fs/xfs/libxfs/xfs_attr.c        | 370 ++++++++++++++++++++++++++--------------
>>>>>    fs/xfs/libxfs/xfs_attr.h        | 126 +++++++++++++-
>>>>>    fs/xfs/libxfs/xfs_attr_remote.c |  99 +++++++----
>>>>>    fs/xfs/libxfs/xfs_attr_remote.h |   4 +
>>>>>    fs/xfs/xfs_trace.h              |   1 -
>>>>>    5 files changed, 439 insertions(+), 161 deletions(-)
>>>>>
>>>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>>>> index 6ca94cb..95c98d7 100644
>>>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>>>> +++ b/fs/xfs/libxfs/xfs_attr.c
>>>>> @@ -44,7 +44,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
>>>>>     * Internal routines when attribute list is one block.
>>>>>     */
>>>>>    STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
>>>>> -STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
>>>>> +STATIC int xfs_attr_leaf_addname(struct xfs_delattr_context *dac);
>>>>>    STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
>>>>>    STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>>>>    
>>>>> @@ -52,12 +52,15 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>>>>     * Internal routines when attribute list is more than one block.
>>>>>     */
>>>>>    STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>>>>> -STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
>>>>> +STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
>>>>>    STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>>>>>    STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>>>>    				 struct xfs_da_state **state);
>>>>>    STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>>>>    STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>>>>> +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
>>>>> +STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>>>>> +			     struct xfs_buf **leaf_bp);
>>>>>    
>>>>>    int
>>>>>    xfs_inode_hasattr(
>>>>> @@ -218,8 +221,11 @@ xfs_attr_is_shortform(
>>>>>    
>>>>>    /*
>>>>>     * Attempts to set an attr in shortform, or converts short form to leaf form if
>>>>> - * there is not enough room.  If the attr is set, the transaction is committed
>>>>> - * and set to NULL.
>>>>> + * there is not enough room.  This function is meant to operate as a helper
>>>>> + * routine to the delayed attribute functions.  It returns -EAGAIN to indicate
>>>>> + * that the calling function should roll the transaction, and then proceed to
>>>>> + * add the attr in leaf form.  This subroutine does not expect to be recalled
>>>>> + * again like the other delayed attr routines do.
>>>>>     */
>>>>>    STATIC int
>>>>>    xfs_attr_set_shortform(
>>>>> @@ -227,16 +233,16 @@ xfs_attr_set_shortform(
>>>>>    	struct xfs_buf		**leaf_bp)
>>>>>    {
>>>>>    	struct xfs_inode	*dp = args->dp;
>>>>> -	int			error, error2 = 0;
>>>>> +	int			error = 0;
>>>>>    
>>>>>    	/*
>>>>>    	 * Try to add the attr to the attribute list in the inode.
>>>>>    	 */
>>>>>    	error = xfs_attr_try_sf_addname(dp, args);
>>>>> +
>>>>> +	/* Should only be 0, -EEXIST or ENOSPC */
>>>>>    	if (error != -ENOSPC) {
>>>>> -		error2 = xfs_trans_commit(args->trans);
>>>>> -		args->trans = NULL;
>>>>> -		return error ? error : error2;
>>>>> +		return error;
>>>>>    	}
>>>>>    	/*
>>>>>    	 * It won't fit in the shortform, transform to a leaf block.  GROT:
>>>>> @@ -249,18 +255,10 @@ xfs_attr_set_shortform(
>>>>>    	/*
>>>>>    	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
>>>>>    	 * push cannot grab the half-baked leaf buffer and run into problems
>>>>> -	 * with the write verifier. Once we're done rolling the transaction we
>>>>> -	 * can release the hold and add the attr to the leaf.
>>>>> +	 * with the write verifier.
>>>>>    	 */
>>>>>    	xfs_trans_bhold(args->trans, *leaf_bp);
>>>>> -	error = xfs_defer_finish(&args->trans);
>>>>> -	xfs_trans_bhold_release(args->trans, *leaf_bp);
>>>>> -	if (error) {
>>>>> -		xfs_trans_brelse(args->trans, *leaf_bp);
>>>>> -		return error;
>>>>> -	}
>>>>> -
>>>>> -	return 0;
>>>>> +	return -EAGAIN;
>>>>>    }
>>>>>    
>>>>>    /*
>>>>> @@ -268,7 +266,7 @@ xfs_attr_set_shortform(
>>>>>     * also checks for a defer finish.  Transaction is finished and rolled as
>>>>>     * needed, and returns true of false if the delayed operation should continue.
>>>>>     */
>>>>> -int
>>>>> +STATIC int
>>>>>    xfs_attr_trans_roll(
>>>>>    	struct xfs_delattr_context	*dac)
>>>>>    {
>>>>> @@ -297,61 +295,130 @@ int
>>>>>    xfs_attr_set_args(
>>>>>    	struct xfs_da_args	*args)
>>>>>    {
>>>>> -	struct xfs_inode	*dp = args->dp;
>>>>> -	struct xfs_buf          *leaf_bp = NULL;
>>>>> -	int			error = 0;
>>>>> +	struct xfs_buf			*leaf_bp = NULL;
>>>>> +	int				error = 0;
>>>>> +	struct xfs_delattr_context	dac = {
>>>>> +		.da_args	= args,
>>>>> +	};
>>>>> +
>>>>> +	do {
>>>>> +		error = xfs_attr_set_iter(&dac, &leaf_bp);
>>>>> +		if (error != -EAGAIN)
>>>>> +			break;
>>>>> +
>>>>> +		error = xfs_attr_trans_roll(&dac);
>>>>> +		if (error)
>>>>> +			return error;
>>>>> +
>>>>> +		if (leaf_bp) {
>>>>> +			xfs_trans_bjoin(args->trans, leaf_bp);
>>>>> +			xfs_trans_bhold(args->trans, leaf_bp);
>>>>> +		}
>>>>
>>>> When xfs_attr_set_iter() causes a "short form" attribute list to be converted
>>>> to "leaf form", leaf_bp would point to an xfs_buf which has been added to the
>>>> transaction and also XFS_BLI_HOLD flag is set on the buffer (last statement in
>>>> xfs_attr_set_shortform()). XFS_BLI_HOLD flag makes sure that the new
>>>> transaction allocated by xfs_attr_trans_roll() would continue to have leaf_bp
>>>> in the transaction's item list. Hence I think the above calls to
>>>> xfs_trans_bjoin() and xfs_trans_bhold() are not required.
>> Sorry, I just noticed Chandan's commentary for this patch.  Apologies. I
>> think we can get away without this now, but yes this routine disappears
>> at the end of the set now.  Will clean out anyway for bisecting reasons
>> though. :-)
> 
> No problem. As an aside, I stopped reviewing the patchset after I noticed
> Brian's review comments for "[PATCH v13 02/10] xfs: Add delay ready attr
> remove routines" suggesting some more code refactoring work.
> 
No worries, that's reasonable.  It's why I only send this out in subsets
to try and keep people sort of focused on a smaller area because stuff 
at the end of the set changes more often as a result of things moving 
around at the bottom of the set.  It doesn't make sense to channel too 
much effort into something that's still moving around so much :-)

Allison
>>
>>>
>>> I /think/ the defer ops will rejoin the buffer each time it rolls, which
>>> means that xfs_attr_trans_roll returns with the buffer already joined to
>>> the transaction?  And I think you're right that the bhold isn't needed,
>>> because holding is dictated by the lower levels (i.e. _set_iter).
>>>
>>>> Please let me know if I am missing something obvious here.
>>>
>>> The entire function goes away by the end of the series. :)
>>>
>>> --D
>>>
>>>>
>>>>
>>>>
>>>>
>>
> 
> 

* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-10-29  1:29         ` Allison Henderson
@ 2020-11-14  0:53           ` Darrick J. Wong
  0 siblings, 0 replies; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-14  0:53 UTC
  To: Allison Henderson; +Cc: Chandan Babu R, linux-xfs

On Wed, Oct 28, 2020 at 06:29:51PM -0700, Allison Henderson wrote:
> 
> 
> On 10/28/20 5:04 AM, Chandan Babu R wrote:
> > On Tuesday 27 October 2020 9:02:05 PM IST Allison Henderson wrote:
> > > 
> > > On 10/27/20 2:59 AM, Chandan Babu R wrote:
> > > > On Friday 23 October 2020 12:04:27 PM IST Allison Henderson wrote:
> > > > > This patch modifies the attr remove routines to be delay ready. This
> > > > > means they no longer roll or commit transactions, but instead return
> > > > > -EAGAIN to have the calling routine roll and refresh the transaction. In
> > > > > this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> > > > > uses a sort of state machine like switch to keep track of where it was
> > > > > when EAGAIN was returned. xfs_attr_node_removename has also been
> > > > > modified to use the switch, and a new version of xfs_attr_remove_args
> > > > > consists of a simple loop to refresh the transaction until the operation
> > > > > is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > > > > transaction where ever the existing code used to.
> > > > > 
> > > > > Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> > > > > version __xfs_attr_rmtval_remove. We will rename
> > > > > __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> > > > > done.
> > > > > 
> > > > > xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > > > > during a rename).  For reasons of preserving existing function, we
> > > > > modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> > > > > set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> > > > > the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> > > > > used and will be removed.
> > > > > 
> > > > > This patch also adds a new struct xfs_delattr_context, which we will use
> > > > > to keep track of the current state of an attribute operation. The new
> > > > > xfs_delattr_state enum is used to track various operations that are in
> > > > > progress so that we know not to repeat them, and resume where we left
> > > > > off before EAGAIN was returned to cycle out the transaction. Other
> > > > > members take the place of local variables that need to retain their
> > > > > values across multiple function recalls.  See xfs_attr.h for a more
> > > > > detailed diagram of the states.
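The XFS_DAC_DEFER_FINISH handling in xfs_attr_trans_roll() (further down in this patch) is a one-shot request. A step sets the flag, and the next roll finishes the deferred ops, clears the flag and then rolls.

Below is a toy model of only that behaviour. The struct, the flag value and the stand-in functions are hypothetical. The real code also keeps the flag in the delattr context and the transaction in args->trans, rather than both in one struct.

```c
#include <assert.h>

#define TOY_DAC_DEFER_FINISH	(1u << 0)	/* hypothetical flag bit */

struct toy_tx {
	unsigned int	flags;
	int		deferred_pending;	/* queued deferred ops */
	int		finishes;		/* times deferred ops were finished */
	int		rolls;
};

/* Stand-in for xfs_defer_finish(): drain the queued work. */
static int toy_defer_finish(struct toy_tx *t)
{
	t->deferred_pending = 0;
	t->finishes++;
	return 0;
}

/* Stand-in for xfs_trans_roll_inode(). */
static int toy_trans_roll(struct toy_tx *t)
{
	t->rolls++;
	return 0;
}

/* Roll, first finishing deferred work if a step asked for it. */
static int toy_attr_trans_roll(struct toy_tx *t)
{
	int error;

	if (t->flags & TOY_DAC_DEFER_FINISH) {
		/* Clear first so the request is honoured only once. */
		t->flags &= ~TOY_DAC_DEFER_FINISH;
		error = toy_defer_finish(t);
		if (error)
			return error;
	}
	return toy_trans_roll(t);
}
```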
> > > > > 
> > > > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > > > ---
> > > > >    fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
> > > > >    fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
> > > > >    fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
> > > > >    fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
> > > > >    fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
> > > > >    fs/xfs/xfs_attr_inactive.c      |   2 +-
> > > > >    6 files changed, 241 insertions(+), 74 deletions(-)
> > > > > 
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > > index f4d39bf..6ca94cb 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > > > @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
> > > > >     */
> > > > >    STATIC int xfs_attr_node_get(xfs_da_args_t *args);
> > > > >    STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> > > > > -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
> > > > > +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
> > > > >    STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
> > > > >    				 struct xfs_da_state **state);
> > > > >    STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> > > > > @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
> > > > >    }
> > > > >    /*
> > > > > + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> > > > > + * also checks for a defer finish.  Transaction is finished and rolled as
> > > > > + * needed, and returns true of false if the delayed operation should continue.
> > > > > + */
> > > > > +int
> > > > > +xfs_attr_trans_roll(
> > > > > +	struct xfs_delattr_context	*dac)
> > > > > +{
> > > > > +	struct xfs_da_args		*args = dac->da_args;
> > > > > +	int				error = 0;
> > > > > +
> > > > > +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> > > > > +		/*
> > > > > +		 * The caller wants us to finish all the deferred ops so that we
> > > > > +		 * avoid pinning the log tail with a large number of deferred
> > > > > +		 * ops.
> > > > > +		 */
> > > > > +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> > > > > +		error = xfs_defer_finish(&args->trans);
> > > > > +		if (error)
> > > > > +			return error;
> > > > > +	}
> > > > > +
> > > > > +	return xfs_trans_roll_inode(&args->trans, args->dp);
> > > > > +}
> > > > > +
> > > > > +/*
> > > > >     * Set the attribute specified in @args.
> > > > >     */
> > > > >    int
> > > > > @@ -364,23 +391,54 @@ xfs_has_attr(
> > > > >     */
> > > > >    int
> > > > >    xfs_attr_remove_args(
> > > > > -	struct xfs_da_args      *args)
> > > > > +	struct xfs_da_args	*args)
> > > > >    {
> > > > > -	struct xfs_inode	*dp = args->dp;
> > > > > -	int			error;
> > > > > +	int				error = 0;
> > > > 
> > > > I guess the explicit initialization of "error" can be removed since the
> > > > value returned by the call to xfs_attr_remove_iter() will overwrite it.
> > > Sure, will fix
> > > > 
> > > > > +	struct xfs_delattr_context	dac = {
> > > > > +		.da_args	= args,
> > > > > +	};
> > > > > +
> > > > > +	do {
> > > > > +		error = xfs_attr_remove_iter(&dac);
> > > > > +		if (error != -EAGAIN)
> > > > > +			break;
> > > > > +
> > > > > +		error = xfs_attr_trans_roll(&dac);
> > > > > +		if (error)
> > > > > +			return error;
> > > > > +
> > > > > +	} while (true);
> > > > > +
> > > > > +	return error;
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * Remove the attribute specified in @args.
> > > > > + *
> > > > > + * This function may return -EAGAIN to signal that the transaction needs to be
> > > > > + * rolled.  Callers should continue calling this function until they receive a
> > > > > + * return value other than -EAGAIN.
> > > > > + */
> > > > > +int
> > > > > +xfs_attr_remove_iter(
> > > > > +	struct xfs_delattr_context	*dac)
> > > > > +{
> > > > > +	struct xfs_da_args		*args = dac->da_args;
> > > > > +	struct xfs_inode		*dp = args->dp;
> > > > > +
> > > > > +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> > > > > +		goto node;
> > > > >    	if (!xfs_inode_hasattr(dp)) {
> > > > > -		error = -ENOATTR;
> > > > > +		return -ENOATTR;
> > > > >    	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
> > > > >    		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> > > > > -		error = xfs_attr_shortform_remove(args);
> > > > > +		return xfs_attr_shortform_remove(args);
> > > > >    	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> > > > > -		error = xfs_attr_leaf_removename(args);
> > > > > -	} else {
> > > > > -		error = xfs_attr_node_removename(args);
> > > > > +		return xfs_attr_leaf_removename(args);
> > > > >    	}
> > > > > -
> > > > > -	return error;
> > > > > +node:
> > > > > +	return  xfs_attr_node_removename_iter(dac);
> > > > >    }
> > > > >    /*
> > > > > @@ -1178,10 +1236,11 @@ xfs_attr_leaf_mark_incomplete(
> > > > >     */
> > > > >    STATIC
> > > > >    int xfs_attr_node_removename_setup(
> > > > > -	struct xfs_da_args	*args,
> > > > > -	struct xfs_da_state	**state)
> > > > > +	struct xfs_delattr_context	*dac,
> > > > > +	struct xfs_da_state		**state)
> > > > >    {
> > > > > -	int			error;
> > > > > +	struct xfs_da_args		*args = dac->da_args;
> > > > > +	int				error;
> > > > >    	error = xfs_attr_node_hasname(args, state);
> > > > >    	if (error != -EEXIST)
> > > > > @@ -1191,6 +1250,12 @@ int xfs_attr_node_removename_setup(
> > > > >    	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
> > > > >    		XFS_ATTR_LEAF_MAGIC);
> > > > > +	/*
> > > > > +	 * Store state in the context incase we need to cycle out the
> > > > > +	 * transaction
> > > > > +	 */
> > > > > +	dac->da_state = *state;
> > > > > +
> > > > >    	if (args->rmtblkno > 0) {
> > > > >    		error = xfs_attr_leaf_mark_incomplete(args, *state);
> > > > >    		if (error)
> > > > > @@ -1203,13 +1268,16 @@ int xfs_attr_node_removename_setup(
> > > > >    }
> > > > >    STATIC int
> > > > > -xfs_attr_node_remove_rmt(
> > > > > -	struct xfs_da_args	*args,
> > > > > -	struct xfs_da_state	*state)
> > > > > +xfs_attr_node_remove_rmt (
> > > > > +	struct xfs_delattr_context	*dac,
> > > > > +	struct xfs_da_state		*state)
> > > > >    {
> > > > > -	int			error = 0;
> > > > > +	int				error = 0;
> > > > > -	error = xfs_attr_rmtval_remove(args);
> > > > > +	/*
> > > > > +	 * May return -EAGAIN to request that the caller recall this function
> > > > > +	 */
> > > > > +	error = __xfs_attr_rmtval_remove(dac);
> > > > >    	if (error)
> > > > >    		return error;
> > > > > @@ -1221,21 +1289,27 @@ xfs_attr_node_remove_rmt(
> > > > >    }
> > > > >    /*
> > > > > - * Remove a name from a B-tree attribute list.
> > > > > + * Step through removeing a name from a B-tree attribute list.
> > > > >     *
> > > > >     * This will involve walking down the Btree, and may involve joining
> > > > >     * leaf nodes and even joining intermediate nodes up to and including
> > > > >     * the root node (a special case of an intermediate node).
> > > > > + *
> > > > > + * This routine is meant to function as either an inline or delayed operation,
> > > > > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > > > > + * functions will need to handle this, and recall the function until a
> > > > > + * successful error code is returned.
> > > > >     */
> > > > >    STATIC int
> > > > >    xfs_attr_node_remove_step(
> > > > > -	struct xfs_da_args	*args,
> > > > > -	struct xfs_da_state	*state)
> > > > > +	struct xfs_delattr_context	*dac)
> > > > >    {
> > > > > -	struct xfs_da_state_blk	*blk;
> > > > > -	int			retval, error;
> > > > > -	struct xfs_inode	*dp = args->dp;
> > > > > +	struct xfs_da_args		*args = dac->da_args;
> > > > > +	struct xfs_da_state		*state;
> > > > > +	struct xfs_da_state_blk		*blk;
> > > > > +	int				retval, error = 0;
> > > > > +	state = dac->da_state;
> > > > >    	/*
> > > > >    	 * If there is an out-of-line value, de-allocate the blocks.
> > > > > @@ -1243,7 +1317,10 @@ xfs_attr_node_remove_step(
> > > > >    	 * overflow the maximum size of a transaction and/or hit a deadlock.
> > > > >    	 */
> > > > >    	if (args->rmtblkno > 0) {
> > > > > -		error = xfs_attr_node_remove_rmt(args, state);
> > > > > +		/*
> > > > > +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> > > > > +		 */
> > > > > +		error = xfs_attr_node_remove_rmt(dac, state);
> > > > >    		if (error)
> > > > >    			return error;
> > > > >    	}
> > > > > @@ -1257,21 +1334,18 @@ xfs_attr_node_remove_step(
> > > > >    	xfs_da3_fixhashpath(state, &state->path);
> > > > >    	/*
> > > > > -	 * Check to see if the tree needs to be collapsed.
> > > > > +	 * Check to see if the tree needs to be collapsed.  Set the flag to
> > > > > +	 * indicate that the calling function needs to move the to shrink
> > > > > +	 * operation
> > > > >    	 */
> > > > >    	if (retval && (state->path.active > 1)) {
> > > > >    		error = xfs_da3_join(state);
> > > > >    		if (error)
> > > > >    			return error;
> > > > > -		error = xfs_defer_finish(&args->trans);
> > > > > -		if (error)
> > > > > -			return error;
> > > > > -		/*
> > > > > -		 * Commit the Btree join operation and start a new trans.
> > > > > -		 */
> > > > > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > > > > -		if (error)
> > > > > -			return error;
> > > > > +
> > > > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > > > > +		dac->dela_state = XFS_DAS_RM_SHRINK;
> > > > > +		return -EAGAIN;
> > > > >    	}
> > > > >    	return error;
> > > > > @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
> > > > >     *
> > > > >     * This routine will find the blocks of the name to remove, remove them and
> > > > >     * shirnk the tree if needed.
> > > > > + *
> > > > > + * This routine is meant to function as either an inline or delayed operation,
> > > > > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > > > > + * functions will need to handle this, and recall the function until a
> > > > > + * successful error code is returned.
> > > > >     */
> > > > >    STATIC int
> > > > > -xfs_attr_node_removename(
> > > > > -	struct xfs_da_args	*args)
> > > > > +xfs_attr_node_removename_iter(
> > > > > +	struct xfs_delattr_context	*dac)
> > > > >    {
> > > > > -	struct xfs_da_state	*state;
> > > > > -	int			error;
> > > > > -	struct xfs_inode	*dp = args->dp;
> > > > > +	struct xfs_da_args		*args = dac->da_args;
> > > > > +	struct xfs_da_state		*state;
> > > > > +	int				error;
> > > > > +	struct xfs_inode		*dp = args->dp;
> > > > >    	trace_xfs_attr_node_removename(args);
> > > > > +	state = dac->da_state;
> > > > > -	error = xfs_attr_node_removename_setup(args, &state);
> > > > > -	if (error)
> > > > > -		goto out;
> > > > > +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > > > > +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> > > > > +		error = xfs_attr_node_removename_setup(dac, &state);
> > > > > +		if (error)
> > > > > +			goto out;
> > > > > +	}
> > > > > -	error = xfs_attr_node_remove_step(args, state);
> > > > > -	if (error)
> > > > > -		goto out;
> > > > > +	switch (dac->dela_state) {
> > > > > +	case XFS_DAS_UNINIT:
> > > > > +		error = xfs_attr_node_remove_step(dac);
> > > > > +		if (error)
> > > > > +			break;
> > > > > -	/*
> > > > > -	 * If the result is small enough, push it all into the inode.
> > > > > -	 */
> > > > > -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> > > > > -		error = xfs_attr_node_shrink(args, state);
> > > > > +		/* do not break, proceed to shrink if needed */
> > > > > +	case XFS_DAS_RM_SHRINK:
> > > > > +		/*
> > > > > +		 * If the result is small enough, push it all into the inode.
> > > > > +		 */
> > > > > +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> > > > > +			error = xfs_attr_node_shrink(args, state);
> > > > > +		break;
> > > > > +	default:
> > > > > +		ASSERT(0);
> > > > > +		return -EINVAL;
> > > > 
> > > > I don't think it is possible in a real world scenario, but if "state" were
> > > > pointing to allocated memory then the above return value might leak the
> > > > corresponding memory.
> > > Hmm, trying to follow you here.... I'm assuming you meant dela_state
> > > instead of state since that's what controls the switch.  The dac
> > > structure is zeroed when allocated to avoid this.  Most of the time when
> > > this switch executes, dela_state is zero.  I did have to add the
> > > XFS_DAS_UNINIT from the previous suggestion in the last revision though
> > > or it generates warnings.
> > > > 
> > 
> > Sorry, I should have clarified that I was referring to the allocated
> > memory pointed to by dac->da_state. If dac->da_state was pointing to a valid
> > memory location and dac->dela_state's value is not equal to either
> > XFS_DAS_UNINIT nor XFS_DAS_RM_SHRINK then the code under the "default" clause
> > will execute causing -EINVAL to be returned. This could leak the memory
> > pointed to by dac->da_state.
> 
> Oooh, ok I see it.  We should set error to -EINVAL and goto out. Ideally it
> should never happen, but that should be the proper error handling if it did.
> Thanks for the catch  :-)

kmemleak and kasan are your friend! :)

Also, if you combine dela_context and xfs_attr_item then it'll be nice
to combine all that into the attr item destructor.

--D

> Allison
> > 
> > 
> > > > Apart from the above nit, the remaining changes look good to me.
> > > Ok, thanks for the review!
> > > Allison
> > > 
> > > > 
> > > > Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
> > > > 
> > > > > +	}
> > > > > +
> > > > > +	if (error == -EAGAIN)
> > > > > +		return error;
> > > > >    out:
> > > > >    	if (state)
> > > > >    		xfs_da_state_free(state);
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > > > index 3e97a93..64dcf0f 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > > > @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
> > > > >    };
> > > > > +/*
> > > > > + * ========================================================================
> > > > > + * Structure used to pass context around among the delayed routines.
> > > > > + * ========================================================================
> > > > > + */
> > > > > +
> > > > > +/*
> > > > > + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> > > > > + * states indicate places where the function would return -EAGAIN, and then
> > > > > + * immediately resume from after being recalled by the calling function. States
> > > > > + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> > > > > + * so the calling function needs to pass them back to that subroutine to allow
> > > > > + * it to finish where it left off. But they otherwise do not have a role in the
> > > > > + * calling function other than just passing through.
> > > > > + *
> > > > > + * xfs_attr_remove_iter()
> > > > > + *	  XFS_DAS_RM_SHRINK ─┐
> > > > > + *	  (subroutine state) │
> > > > > + *	                     └─>xfs_attr_node_removename()
> > > > > + *	                                      │
> > > > > + *	                                      v
> > > > > + *	                                   need to
> > > > > + *	                                shrink tree? ─n─┐
> > > > > + *	                                      │         │
> > > > > + *	                                      y         │
> > > > > + *	                                      │         │
> > > > > + *	                                      v         │
> > > > > + *	                              XFS_DAS_RM_SHRINK │
> > > > > + *	                                      │         │
> > > > > + *	                                      v         │
> > > > > + *	                                     done <─────┘
> > > > > + *
> > > > > + */
> > > > > +
> > > > > +/*
> > > > > + * Enum values for xfs_delattr_context.da_state
> > > > > + *
> > > > > + * These values are used by delayed attribute operations to keep track  of where
> > > > > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > > > > + * calling function to roll the transaction, and then recall the subroutine to
> > > > > + * finish the operation.  The enum is then used by the subroutine to jump back
> > > > > + * to where it was and resume executing where it left off.
> > > > > + */
> > > > > +enum xfs_delattr_state {
> > > > > +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
> > > > > +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
> > > > > +};
> > > > > +
> > > > > +/*
> > > > > + * Defines for xfs_delattr_context.flags
> > > > > + */
> > > > > +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> > > > > +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> > > > > +
> > > > > +/*
> > > > > + * Context used for keeping track of delayed attribute operations
> > > > > + */
> > > > > +struct xfs_delattr_context {
> > > > > +	struct xfs_da_args      *da_args;
> > > > > +
> > > > > +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> > > > > +	struct xfs_da_state     *da_state;
> > > > > +
> > > > > +	/* Used to keep track of current state of delayed operation */
> > > > > +	unsigned int            flags;
> > > > > +	enum xfs_delattr_state  dela_state;
> > > > > +};
> > > > > +
> > > > >    /*========================================================================
> > > > >     * Function prototypes for the kernel.
> > > > >     *========================================================================*/
> > > > > @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
> > > > >    int xfs_attr_set_args(struct xfs_da_args *args);
> > > > >    int xfs_has_attr(struct xfs_da_args *args);
> > > > >    int xfs_attr_remove_args(struct xfs_da_args *args);
> > > > > +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> > > > > +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
> > > > >    bool xfs_attr_namecheck(const void *name, size_t length);
> > > > > +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> > > > > +			      struct xfs_da_args *args);
> > > > >    #endif	/* __XFS_ATTR_H__ */
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > > index bb128db..338377e 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > > @@ -19,8 +19,8 @@
> > > > >    #include "xfs_bmap_btree.h"
> > > > >    #include "xfs_bmap.h"
> > > > >    #include "xfs_attr_sf.h"
> > > > > -#include "xfs_attr_remote.h"
> > > > >    #include "xfs_attr.h"
> > > > > +#include "xfs_attr_remote.h"
> > > > >    #include "xfs_attr_leaf.h"
> > > > >    #include "xfs_error.h"
> > > > >    #include "xfs_trace.h"
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> > > > > index 48d8e9c..1426c15 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr_remote.c
> > > > > +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> > > > > @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
> > > > >     */
> > > > >    int
> > > > >    xfs_attr_rmtval_remove(
> > > > > -	struct xfs_da_args      *args)
> > > > > +	struct xfs_da_args		*args)
> > > > >    {
> > > > > -	int			error;
> > > > > -	int			retval;
> > > > > +	int				error;
> > > > > +	struct xfs_delattr_context	dac  = {
> > > > > +		.da_args	= args,
> > > > > +	};
> > > > >    	trace_xfs_attr_rmtval_remove(args);
> > > > > @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
> > > > >    	 * Keep de-allocating extents until the remote-value region is gone.
> > > > >    	 */
> > > > >    	do {
> > > > > -		retval = __xfs_attr_rmtval_remove(args);
> > > > > -		if (retval && retval != -EAGAIN)
> > > > > -			return retval;
> > > > > +		error = __xfs_attr_rmtval_remove(&dac);
> > > > > +		if (error != -EAGAIN)
> > > > > +			break;
> > > > > -		/*
> > > > > -		 * Close out trans and start the next one in the chain.
> > > > > -		 */
> > > > > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > > > +		error = xfs_attr_trans_roll(&dac);
> > > > >    		if (error)
> > > > >    			return error;
> > > > > -	} while (retval == -EAGAIN);
> > > > > -	return 0;
> > > > > +	} while (true);
> > > > > +
> > > > > +	return error;
> > > > >    }
> > > > >    /*
> > > > > @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
> > > > >     */
> > > > >    int
> > > > >    __xfs_attr_rmtval_remove(
> > > > > -	struct xfs_da_args	*args)
> > > > > +	struct xfs_delattr_context	*dac)
> > > > >    {
> > > > > -	int			error, done;
> > > > > +	struct xfs_da_args		*args = dac->da_args;
> > > > > +	int				error, done;
> > > > >    	/*
> > > > >    	 * Unmap value blocks for this attr.
> > > > > @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
> > > > >    	if (error)
> > > > >    		return error;
> > > > > -	error = xfs_defer_finish(&args->trans);
> > > > > -	if (error)
> > > > > -		return error;
> > > > > -
> > > > > -	if (!done)
> > > > > +	if (!done) {
> > > > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > > > >    		return -EAGAIN;
> > > > > +	}
> > > > >    	return error;
> > > > >    }
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> > > > > index 9eee615..002fd30 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr_remote.h
> > > > > +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> > > > > @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > > > >    int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
> > > > >    		xfs_buf_flags_t incore_flags);
> > > > >    int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> > > > > -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > > > > +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> > > > >    #endif /* __XFS_ATTR_REMOTE_H__ */
> > > > > diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> > > > > index bfad669..aaa7e66 100644
> > > > > --- a/fs/xfs/xfs_attr_inactive.c
> > > > > +++ b/fs/xfs/xfs_attr_inactive.c
> > > > > @@ -15,10 +15,10 @@
> > > > >    #include "xfs_da_format.h"
> > > > >    #include "xfs_da_btree.h"
> > > > >    #include "xfs_inode.h"
> > > > > +#include "xfs_attr.h"
> > > > >    #include "xfs_attr_remote.h"
> > > > >    #include "xfs_trans.h"
> > > > >    #include "xfs_bmap.h"
> > > > > -#include "xfs_attr.h"
> > > > >    #include "xfs_attr_leaf.h"
> > > > >    #include "xfs_quota.h"
> > > > >    #include "xfs_dir2.h"
> > > > > 
> > > > 
> > > > 
> > > 
> > 
> > 


* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-11-13  3:43     ` Allison Henderson
@ 2020-11-14  1:18       ` Darrick J. Wong
  2020-11-16  5:12         ` Allison Henderson
From: Darrick J. Wong @ 2020-11-14  1:18 UTC
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Nov 12, 2020 at 08:43:25PM -0700, Allison Henderson wrote:
> 
> 
> On 11/10/20 4:43 PM, Darrick J. Wong wrote:
> > On Thu, Oct 22, 2020 at 11:34:27PM -0700, Allison Henderson wrote:
> > > This patch modifies the attr remove routines to be delay ready. This
> > > means they no longer roll or commit transactions, but instead return
> > > -EAGAIN to have the calling routine roll and refresh the transaction. In
> > > this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> > > uses a sort of state machine like switch to keep track of where it was
> > > when EAGAIN was returned. xfs_attr_node_removename has also been
> > > modified to use the switch, and a new version of xfs_attr_remove_args
> > > consists of a simple loop to refresh the transaction until the operation
> > > is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > > transaction wherever the existing code used to.
> > > 
> > > Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> > > version __xfs_attr_rmtval_remove. We will rename
> > > __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> > > done.
> > > 
> > > xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > > during a rename).  For reasons of preserving existing function, we
> > > modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> > > set, similar to how xfs_attr_remove_args does here.  Once we transition
> > > the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> > > used and will be removed.
> > > 
> > > This patch also adds a new struct xfs_delattr_context, which we will use
> > > to keep track of the current state of an attribute operation. The new
> > > xfs_delattr_state enum is used to track various operations that are in
> > > progress so that we know not to repeat them, and resume where we left
> > > off before EAGAIN was returned to cycle out the transaction. Other
> > > members take the place of local variables that need to retain their
> > > values across multiple function recalls.  See xfs_attr.h for a more
> > > detailed diagram of the states.
> > > 
> > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > ---
> > >   fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
> > >   fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
> > >   fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
> > >   fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
> > >   fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
> > >   fs/xfs/xfs_attr_inactive.c      |   2 +-
> > >   6 files changed, 241 insertions(+), 74 deletions(-)
> > > 
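The caller-side pattern described in the commit message above can be sketched in user space. The iterator is called repeatedly, the transaction is rolled on -EAGAIN, and deferred ops are finished only when the XFS_DAC_DEFER_FINISH flag was set. Every name here is hypothetical. The counters stand in for `xfs_defer_finish()` and `xfs_trans_roll_inode()`, which in the real code operate on transactions.

```c
#include <assert.h>
#include <errno.h>

#define DAC_DEFER_FINISH	0x01	/* modelled on XFS_DAC_DEFER_FINISH */

/* Hypothetical context with counters in place of real transactions. */
struct delattr_ctx {
	unsigned int	flags;
	int		extents_left;	/* remote value extents to unmap */
	int		rolls;		/* transaction rolls performed */
	int		defer_finishes;	/* deferred-op flushes performed */
};

/* Unmaps one extent per call and asks for a roll while work remains. */
static int remove_iter(struct delattr_ctx *dac)
{
	if (dac->extents_left > 0)
		dac->extents_left--;
	if (dac->extents_left > 0) {
		dac->flags |= DAC_DEFER_FINISH;
		return -EAGAIN;
	}
	return 0;
}

/* Like xfs_attr_trans_roll(): finish deferred ops only when asked to. */
static int trans_roll(struct delattr_ctx *dac)
{
	if (dac->flags & DAC_DEFER_FINISH) {
		dac->flags &= ~DAC_DEFER_FINISH;
		dac->defer_finishes++;	/* xfs_defer_finish() would run here */
	}
	dac->rolls++;			/* xfs_trans_roll_inode() would run here */
	return 0;
}

/* Same shape as the new xfs_attr_remove_args() loop. */
static int remove_args(struct delattr_ctx *dac)
{
	int error;

	do {
		error = remove_iter(dac);
		if (error != -EAGAIN)
			break;
		error = trans_roll(dac);
		if (error)
			return error;
	} while (1);

	return error;
}
```

With three extents to unmap, the loop rolls twice and flushes deferred work before each roll, and it leaves the flag clear when it finishes.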
> > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > index f4d39bf..6ca94cb 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
> > >    */
> > >   STATIC int xfs_attr_node_get(xfs_da_args_t *args);
> > >   STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> > > -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
> > > +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
> > >   STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
> > >   				 struct xfs_da_state **state);
> > >   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> > > @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
> > >   }
> > >   /*
> > > + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> > > + * also checks for a defer finish.  Transaction is finished and rolled as
> > > + * needed, and returns true of false if the delayed operation should continue.
> > > + */
> > > +int
> > > +xfs_attr_trans_roll(
> > > +	struct xfs_delattr_context	*dac)
> > > +{
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	int				error = 0;
> > > +
> > > +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> > > +		/*
> > > +		 * The caller wants us to finish all the deferred ops so that we
> > > +		 * avoid pinning the log tail with a large number of deferred
> > > +		 * ops.
> > > +		 */
> > > +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> > > +		error = xfs_defer_finish(&args->trans);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > > +
> > > +	return xfs_trans_roll_inode(&args->trans, args->dp);
> > > +}
> > 
> > (Mostly ignoring these functions since they all go away by the end of
> > the patchset...)
> > 
> > > +
> > > +/*
> > >    * Set the attribute specified in @args.
> > >    */
> > >   int
> > > @@ -364,23 +391,54 @@ xfs_has_attr(
> > >    */
> > >   int
> > >   xfs_attr_remove_args(
> > > -	struct xfs_da_args      *args)
> > > +	struct xfs_da_args	*args)
> > >   {
> > > -	struct xfs_inode	*dp = args->dp;
> > > -	int			error;
> > > +	int				error = 0;
> > > +	struct xfs_delattr_context	dac = {
> > > +		.da_args	= args,
> > > +	};
> > > +
> > > +	do {
> > > +		error = xfs_attr_remove_iter(&dac);
> > > +		if (error != -EAGAIN)
> > > +			break;
> > > +
> > > +		error = xfs_attr_trans_roll(&dac);
> > > +		if (error)
> > > +			return error;
> > > +
> > > +	} while (true);
> > > +
> > > +	return error;
> > > +}
> > > +
> > > +/*
> > > + * Remove the attribute specified in @args.
> > > + *
> > > + * This function may return -EAGAIN to signal that the transaction needs to be
> > > + * rolled.  Callers should continue calling this function until they receive a
> > > + * return value other than -EAGAIN.
> > > + */
> > > +int
> > > +xfs_attr_remove_iter(
> > > +	struct xfs_delattr_context	*dac)
> > > +{
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	struct xfs_inode		*dp = args->dp;
> > > +
> > > +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> > > +		goto node;
> > 
> > Might as well just make this part of the if statement dispatch:
> > 
> > 	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> > 		return xfs_attr_node_removename_iter(dac);
> > 	else if (!xfs_inode_hasattr(dp))
> > 		return -ENOATTR;
> I think we did this once, but then people disliked having the same call in
> two places.  We call the node function if XFS_DAS_RM_SHRINK is set OR if the
> other two cases fail which is actually the initial point of entry.
> 
> I think probably we need a comment somewhere.  I've realized every time a
> question gets re-raised, it means we need a comment so we don't forget why
> :-)
> 
> Maybe for the goto we can have:
> /* If we are shrinking a node, resume shrink */
> 
> and.....

<shrug> This was a pretty minor point in my review, so if there's a
better way of doing it, please feel free. :)

Admittedly I assume that a modern day compiler will slice and dice and
rearrange to its heart's content, so for the most part I'm looking for
higher level design errors and more or less don't care about the nitty
gritty of what kind of machine code this all turns into.

(I'm probably doing that at everyone's peril, sadly...)
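Below is a sketch of the goto-based dispatch with the two comments proposed above. Made-up enums replace the real attr fork checks (`xfs_inode_hasattr()`, the `XFS_DINODE_FMT_LOCAL` test and `xfs_bmap_one_block()`), and the return codes only name the handler that would run. Either layout has to test the resume state first. After the join has run, `xfs_bmap_one_block()` may already report a single block, so re-dispatching on the fork format would presumably send a resumed shrink down the leaf path.

```c
#include <assert.h>
#include <errno.h>

#ifndef ENOATTR
#define ENOATTR	ENODATA		/* XFS maps ENOATTR to ENODATA */
#endif

enum delattr_state { DAS_UNINIT = 0, DAS_RM_SHRINK };
/* Hypothetical summary of the attr fork layout. */
enum attr_fmt { ATTR_NONE, ATTR_SHORTFORM, ATTR_LEAF, ATTR_NODE };
/* Hypothetical return values naming the handler that would run. */
enum { DO_SF_REMOVE = 1, DO_LEAF_REMOVE, DO_NODE_REMOVE };

static int remove_iter(enum delattr_state dela_state, enum attr_fmt fmt)
{
	/* If we are shrinking a node, resume the shrink. */
	if (dela_state == DAS_RM_SHRINK)
		goto node;

	if (fmt == ATTR_NONE)
		return -ENOATTR;
	else if (fmt == ATTR_SHORTFORM)
		return DO_SF_REMOVE;
	else if (fmt == ATTR_LEAF)
		return DO_LEAF_REMOVE;
node:
	/* If we are not short form or leaf, then remove node. */
	return DO_NODE_REMOVE;
}
```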

> 
> > 
> > >   	if (!xfs_inode_hasattr(dp)) {
> > > -		error = -ENOATTR;
> > > +		return -ENOATTR;
> > >   	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
> > >   		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
> > > -		error = xfs_attr_shortform_remove(args);
> > > +		return xfs_attr_shortform_remove(args);
> > >   	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> > > -		error = xfs_attr_leaf_removename(args);
> > > -	} else {
> > > -		error = xfs_attr_node_removename(args);
> > > +		return xfs_attr_leaf_removename(args);
> > >   	}
> > > -
> > > -	return error;
> > > +node:
> 	/* If we are not short form or leaf, then remove node */
> ?
> > > +	return  xfs_attr_node_removename_iter(dac);
> > >   }
> > >   /*
> > > @@ -1178,10 +1236,11 @@ xfs_attr_leaf_mark_incomplete(
> > >    */
> > >   STATIC
> > >   int xfs_attr_node_removename_setup(
> > > -	struct xfs_da_args	*args,
> > > -	struct xfs_da_state	**state)
> > > +	struct xfs_delattr_context	*dac,
> > > +	struct xfs_da_state		**state)
> > 
> > AFAICT *state == &dac->da_state by the end of the series; can you
> > remove this argument too?
> > 
> Sure, I will see if I can collapse it down
> 
> > >   {
> > > -	int			error;
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	int				error;
> > >   	error = xfs_attr_node_hasname(args, state);
> > >   	if (error != -EEXIST)
> > > @@ -1191,6 +1250,12 @@ int xfs_attr_node_removename_setup(
> > >   	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
> > >   		XFS_ATTR_LEAF_MAGIC);
> > > +	/*
> > > +	 * Store state in the context incase we need to cycle out the
> > > +	 * transaction
> > > +	 */
> > > +	dac->da_state = *state;
> > > +
> > >   	if (args->rmtblkno > 0) {
> > >   		error = xfs_attr_leaf_mark_incomplete(args, *state);
> > 
> > It doesn't make a lot of logical sense to me "we marked the attr
> > incomplete to hide it" is the same state (UNINIT) as "we haven't done
> > anything yet".
> Not sure I quite follow what you mean here.  This little function is just a
> setup helper.  It doesn't jump in and out like the other functions do with
> the state machine.  We separated it out for that reason.  This routine
> executes once to stash the state: the da_state, not the dela_state.
> Different states :-)
> 
> So after we have that stored away, the calling function moves onto
> xfs_attr_node_remove_step, which does get recalled quite a bit until there
> are no more remote blocks to remove.

<nod> I got that; I think my confusion here is that I was expecting each
and every step to get its own state (which I think you said was how this
used to be some ~5 revisions ago) even if it doesn't result in a
transaction roll, whereas now the delattr code only introduces a new
state when it needs to roll the transaction.

Hm.  I've been reviewing this patchset by puzzling out each of the steps
of the old attr setting and removing code, and then figuring out how the
old code got from one step to another.  Then I look at the end product
of this whole patchset and try to figure out how the new state machine
maps onto the old sequences, to determine if there are any serious
discrepancies that also break things.

So I think in the first round of this review I was treading awfully
close to suggesting that every little step of the old system had to
become an explicit state in the new system's state machine, so that I
could do a 1:1 comparison.  That isn't the code that's before me now,
and reworking all that sounds like (a) a big pain and (b) probably not
where you and Brian were heading.

Perhaps an easier way to bridge the gap between the old way and the new
way would be to make the ASCII art diagram call out each of these little
steps (marking the attr incomplete, removing the value blocks, erasing
the attr key, shrinking the attr tree, etc.) and then show where each of
the XFS_DAS_* steps fall into that?

That way, the ASCII art would show that we start in XFS_DAS_UNINIT, mark
the attr "incomplete", move on to XFS_DAS_RM_SHRINK, start removing attr
blocks, etc.  The machinery can omit the unnecessary pieces, so long as
we have a map of the overall process.

How does that sound?
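Darrick asks for a finer-grained map. Here is a hypothetical version written as code instead of ASCII art. The step letters and context fields are invented for illustration. The order follows his list (mark incomplete, remove value blocks, erase the key, shrink the tree), plus the `xfs_da3_join()` step from the patch. Each call models the work done inside one transaction. The caller logs a `|` for every roll, so the trace shows which steps fall under XFS_DAS_UNINIT and which under XFS_DAS_RM_SHRINK.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

enum delattr_state { DAS_UNINIT = 0, DAS_RM_SHRINK };

/* Hypothetical context; fields stand in for on-disk and in-core state. */
struct rm_ctx {
	enum delattr_state	dela_state;
	int			marked_incomplete;
	int			rmt_extents;	/* remote value extents left */
	int			need_join;	/* removal leaves blocks to join */
	char			log[32];	/* one letter per step taken */
};

/* Appends one step letter to the trace. */
static void note(struct rm_ctx *c, char step)
{
	size_t n = strlen(c->log);

	if (n + 1 < sizeof(c->log)) {
		c->log[n] = step;
		c->log[n + 1] = '\0';
	}
}

/*
 * One call is the work done inside one transaction:
 *   I = mark the attr incomplete, V = unmap one remote value extent,
 *   N = remove the name, J = join blocks, S = shrink to leaf if possible.
 */
static int node_remove_iter(struct rm_ctx *c)
{
	switch (c->dela_state) {
	case DAS_UNINIT:
		if (!c->marked_incomplete) {
			c->marked_incomplete = 1;
			note(c, 'I');
		}
		if (c->rmt_extents > 0) {
			c->rmt_extents--;
			note(c, 'V');
			if (c->rmt_extents > 0)
				return -EAGAIN;	/* roll, still UNINIT */
		}
		note(c, 'N');
		if (c->need_join) {
			note(c, 'J');
			c->dela_state = DAS_RM_SHRINK;
			return -EAGAIN;		/* roll, resume at shrink */
		}
		/* fall through */
	case DAS_RM_SHRINK:
		note(c, 'S');
		return 0;
	}
	return -EINVAL;
}

/* The caller's loop; '|' marks each transaction roll. */
static int run(struct rm_ctx *c)
{
	int error;

	while ((error = node_remove_iter(c)) == -EAGAIN)
		note(c, '|');
	return error;
}
```

Take an attr with two remote value extents whose removal needs a join. Its trace is `IV|VNJ|S`: everything up to the second roll happens in XFS_DAS_UNINIT, and only the shrink runs in XFS_DAS_RM_SHRINK.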

> > 
> > >   		if (error)
> > > @@ -1203,13 +1268,16 @@ int xfs_attr_node_removename_setup(
> > >   }
> > >   STATIC int
> > > -xfs_attr_node_remove_rmt(
> > > -	struct xfs_da_args	*args,
> > > -	struct xfs_da_state	*state)
> > > +xfs_attr_node_remove_rmt (
> > > +	struct xfs_delattr_context	*dac,
> > > +	struct xfs_da_state		*state)
> > >   {
> > > -	int			error = 0;
> > > +	int				error = 0;
> > > -	error = xfs_attr_rmtval_remove(args);
> > > +	/*
> > > +	 * May return -EAGAIN to request that the caller recall this function
> > > +	 */
> > > +	error = __xfs_attr_rmtval_remove(dac);
> > >   	if (error)
> > >   		return error;
> > > @@ -1221,21 +1289,27 @@ xfs_attr_node_remove_rmt(
> > >   }
> > >   /*
> > > - * Remove a name from a B-tree attribute list.
> > > + * Step through removeing a name from a B-tree attribute list.
> > >    *
> > >    * This will involve walking down the Btree, and may involve joining
> > >    * leaf nodes and even joining intermediate nodes up to and including
> > >    * the root node (a special case of an intermediate node).
> > > + *
> > > + * This routine is meant to function as either an inline or delayed operation,
> > > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > > + * functions will need to handle this, and recall the function until a
> > > + * successful error code is returned.
> > >    */
> > >   STATIC int
> > >   xfs_attr_node_remove_step(
> > > -	struct xfs_da_args	*args,
> > > -	struct xfs_da_state	*state)
> > > +	struct xfs_delattr_context	*dac)
> > >   {
> > > -	struct xfs_da_state_blk	*blk;
> > > -	int			retval, error;
> > > -	struct xfs_inode	*dp = args->dp;
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	struct xfs_da_state		*state;
> > > +	struct xfs_da_state_blk		*blk;
> > > +	int				retval, error = 0;
> > > +	state = dac->da_state;
> > 
> > Might as well initialize this when you declare state above.
> Sure
> 
> > 
> > >   	/*
> > >   	 * If there is an out-of-line value, de-allocate the blocks.
> > > @@ -1243,7 +1317,10 @@ xfs_attr_node_remove_step(
> > >   	 * overflow the maximum size of a transaction and/or hit a deadlock.
> > >   	 */
> > >   	if (args->rmtblkno > 0) {
> > > -		error = xfs_attr_node_remove_rmt(args, state);
> > > +		/*
> > > +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> > > +		 */
> > > +		error = xfs_attr_node_remove_rmt(dac, state);
> > >   		if (error)
> > >   			return error;
> > >   	}
> > > @@ -1257,21 +1334,18 @@ xfs_attr_node_remove_step(
> > >   	xfs_da3_fixhashpath(state, &state->path);
> > >   	/*
> > > -	 * Check to see if the tree needs to be collapsed.
> > > +	 * Check to see if the tree needs to be collapsed.  Set the flag to
> > > +	 * indicate that the calling function needs to move the to shrink
> > > +	 * operation
> > >   	 */
> > >   	if (retval && (state->path.active > 1)) {
> > >   		error = xfs_da3_join(state);
> > >   		if (error)
> > >   			return error;
> > > -		error = xfs_defer_finish(&args->trans);
> > > -		if (error)
> > > -			return error;
> > > -		/*
> > > -		 * Commit the Btree join operation and start a new trans.
> > > -		 */
> > > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > > -		if (error)
> > > -			return error;
> > > +
> > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > > +		dac->dela_state = XFS_DAS_RM_SHRINK;
> > > +		return -EAGAIN;
> > >   	}
> > >   	return error;
> > > @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
> > >    *
> > >    * This routine will find the blocks of the name to remove, remove them and
> > >    * shirnk the tree if needed.
> > 
> > "...and shrink the tree..."
> > 
> Will fix the shirnk :-)
> 
> > > + *
> > > + * This routine is meant to function as either an inline or delayed operation,
> > > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > > + * functions will need to handle this, and recall the function until a
> > > + * successful error code is returned.
> > >    */
> > >   STATIC int
> > > -xfs_attr_node_removename(
> > > -	struct xfs_da_args	*args)
> > > +xfs_attr_node_removename_iter(
> > > +	struct xfs_delattr_context	*dac)
> > >   {
> > > -	struct xfs_da_state	*state;
> > > -	int			error;
> > > -	struct xfs_inode	*dp = args->dp;
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	struct xfs_da_state		*state;
> > > +	int				error;
> > > +	struct xfs_inode		*dp = args->dp;
> > >   	trace_xfs_attr_node_removename(args);
> > > +	state = dac->da_state;
> > > -	error = xfs_attr_node_removename_setup(args, &state);
> > > -	if (error)
> > > -		goto out;
> > > +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > > +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> > 
> > Can we determine if it's necessary to call _removename_setup by checking
> > dac->da_state directly instead of having a flag?
> 
> Initially I think I had another XFS_DAS_RMTVAL_REMOVE state for this.
> Alternatively, we also discussed using the inverse like this:
> 
> if (dac->dela_state != XFS_DAS_RMTVAL_REMOVE)
> 	do setup....
> 
> Though I think people liked having the init flag, since init routines were a
> sort of recurring pattern.  So that's why we're using the flag now.

Oh, so (da_state != NULL) and (flags & XFS_DAC_NODE_RMVNAME_INIT) aren't
a 1:1 correlation?
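Here is one way to drop both the `**state` argument and the init flag. It rests on the assumption Darrick is probing: a successful setup always leaves `da_state` non-NULL, and every terminal exit resets it to NULL. If some path frees the state without clearing the pointer, or leaves it NULL after setup, the NULL test is no longer a substitute for XFS_DAC_NODE_RMVNAME_INIT. The types and the `rolls_left`/`setup_calls` fields are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct da_state { int unused; };

/* Hypothetical context without an XFS_DAC_NODE_RMVNAME_INIT flag. */
struct delattr_ctx {
	struct da_state	*da_state;
	int		rolls_left;	/* simulated -EAGAIN count */
	int		setup_calls;	/* instrumentation for the example */
};

/* Setup stores into the context directly; no separate **state argument. */
static int node_removename_setup(struct delattr_ctx *dac)
{
	dac->setup_calls++;
	dac->da_state = calloc(1, sizeof(*dac->da_state));
	return dac->da_state ? 0 : -ENOMEM;
}

static int node_removename_iter(struct delattr_ctx *dac)
{
	int error;

	/* NULL means setup has not run yet for this operation. */
	if (!dac->da_state) {
		error = node_removename_setup(dac);
		if (error)
			return error;
	}

	if (dac->rolls_left > 0) {
		dac->rolls_left--;
		return -EAGAIN;		/* keep da_state for the next call */
	}

	/* Every final exit must reset the pointer for the NULL test to hold. */
	free(dac->da_state);
	dac->da_state = NULL;
	return 0;
}
```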

> > 
> > > +		error = xfs_attr_node_removename_setup(dac, &state);
> > > +		if (error)
> > > +			goto out;
> > > +	}
> > > -	error = xfs_attr_node_remove_step(args, state);
> > > -	if (error)
> > > -		goto out;
> > > +	switch (dac->dela_state) {
> > > +	case XFS_DAS_UNINIT:
> > > +		error = xfs_attr_node_remove_step(dac);
> > > +		if (error)
> > > +			break;
> > > -	/*
> > > -	 * If the result is small enough, push it all into the inode.
> > > -	 */
> > > -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> > > -		error = xfs_attr_node_shrink(args, state);
> > > +		/* do not break, proceed to shrink if needed */
> > 
> > /* fall through */
> > 
> > ...because otherwise the static checkers will get mad.
> > 
> > (Well clang will anyway because gcc, llvm, and the C18 body all have
> > different incompatible ideas of what should be the magic tag that
> > signals an intentional fall through, but this should at least be
> > consistent with the rest of xfs.)
> Oh ok then, I did not know.  Will update the comment
> 
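A small self-contained sketch of the annotation question. GCC's `-Wimplicit-fallthrough` accepts a `/* fall through */` comment, but clang's version of the warning does not look at comments. That is why newer kernel code spells this with a `fallthrough` pseudo-keyword that expands to the GNU attribute. The macro below is modelled on that idea rather than copied from `include/linux/compiler_attributes.h`, and `stages_run()` is a made-up function.

```c
#include <assert.h>

/*
 * Modelled on the kernel's "fallthrough" pseudo-keyword: use the GNU
 * attribute when the compiler reports support for it, otherwise expand
 * to an empty statement.
 */
#if defined __has_attribute
# if __has_attribute(__fallthrough__)
#  define fallthrough	__attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough	do {} while (0)	/* fallthrough */
#endif

/* Returns how many stages run when entering the switch at @start. */
static int stages_run(int start)
{
	int n = 0;

	switch (start) {
	case 0:
		n++;		/* e.g. the remove step */
		fallthrough;
	case 1:
		n++;		/* e.g. the shrink step */
		break;
	default:
		n = -1;		/* unknown state */
		break;
	}
	return n;
}
```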
> > 
> > > +	case XFS_DAS_RM_SHRINK:
> > > +		/*
> > > +		 * If the result is small enough, push it all into the inode.
> > > +		 */
> > > +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
> > > +			error = xfs_attr_node_shrink(args, state);
> > > +		break;
> > > +	default:
> > > +		ASSERT(0);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (error == -EAGAIN)
> > > +		return error;
> > >   out:
> > >   	if (state)
> > >   		xfs_da_state_free(state);
> > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > index 3e97a93..64dcf0f 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
> > >   };
> > > +/*
> > > + * ========================================================================
> > > + * Structure used to pass context around among the delayed routines.
> > > + * ========================================================================
> > > + */
> > > +
> > > +/*
> > > + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> > > + * states indicate places where the function would return -EAGAIN, and then
> > > + * immediately resume from after being recalled by the calling function. States
> > > + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> > > + * so the calling function needs to pass them back to that subroutine to allow
> > > + * it to finish where it left off. But they otherwise do not have a role in the
> > > + * calling function other than just passing through.
> > > + *
> > > + * xfs_attr_remove_iter()
> > > + *	  XFS_DAS_RM_SHRINK ─┐
> > > + *	  (subroutine state) │
> > > + *	                     └─>xfs_attr_node_removename()
> > > + *	                                      │
> > > + *	                                      v
> > > + *	                                   need to
> > > + *	                                shrink tree? ─n─┐
> > > + *	                                      │         │
> > > + *	                                      y         │
> > > + *	                                      │         │
> > > + *	                                      v         │
> > > + *	                              XFS_DAS_RM_SHRINK │
> > > + *	                                      │         │
> > > + *	                                      v         │
> > > + *	                                     done <─────┘
> > > + *
> > > + */
> > > +
> > > +/*
> > > + * Enum values for xfs_delattr_context.da_state
> > > + *
> > > + * These values are used by delayed attribute operations to keep track  of where
> > > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > > + * calling function to roll the transaction, and then recall the subroutine to
> > > + * finish the operation.  The enum is then used by the subroutine to jump back
> > > + * to where it was and resume executing where it left off.
> > > + */
> > > +enum xfs_delattr_state {
> > > +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
> > > +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
> > > +};
> > > +
> > > +/*
> > > + * Defines for xfs_delattr_context.flags
> > > + */
> > > +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> > > +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> > > +
> > > +/*
> > > + * Context used for keeping track of delayed attribute operations
> > > + */
> > > +struct xfs_delattr_context {
> > > +	struct xfs_da_args      *da_args;
> > > +
> > > +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> > > +	struct xfs_da_state     *da_state;
> > > +
> > > +	/* Used to keep track of current state of delayed operation */
> > > +	unsigned int            flags;
> > > +	enum xfs_delattr_state  dela_state;
> > > +};
> > > +
> > >   /*========================================================================
> > >    * Function prototypes for the kernel.
> > >    *========================================================================*/
> > > @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
> > >   int xfs_attr_set_args(struct xfs_da_args *args);
> > >   int xfs_has_attr(struct xfs_da_args *args);
> > >   int xfs_attr_remove_args(struct xfs_da_args *args);
> > > +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> > > +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
> > >   bool xfs_attr_namecheck(const void *name, size_t length);
> > > +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> > > +			      struct xfs_da_args *args);
> > >   #endif	/* __XFS_ATTR_H__ */
> > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > index bb128db..338377e 100644
> > > --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> > > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > @@ -19,8 +19,8 @@
> > >   #include "xfs_bmap_btree.h"
> > >   #include "xfs_bmap.h"
> > >   #include "xfs_attr_sf.h"
> > > -#include "xfs_attr_remote.h"
> > >   #include "xfs_attr.h"
> > > +#include "xfs_attr_remote.h"
> > >   #include "xfs_attr_leaf.h"
> > >   #include "xfs_error.h"
> > >   #include "xfs_trace.h"
> > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> > > index 48d8e9c..1426c15 100644
> > > --- a/fs/xfs/libxfs/xfs_attr_remote.c
> > > +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> > > @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
> > >    */
> > >   int
> > >   xfs_attr_rmtval_remove(
> > > -	struct xfs_da_args      *args)
> > > +	struct xfs_da_args		*args)
> > >   {
> > > -	int			error;
> > > -	int			retval;
> > > +	int				error;
> > > +	struct xfs_delattr_context	dac  = {
> > > +		.da_args	= args,
> > > +	};
> > >   	trace_xfs_attr_rmtval_remove(args);
> > > @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
> > >   	 * Keep de-allocating extents until the remote-value region is gone.
> > >   	 */
> > >   	do {
> > > -		retval = __xfs_attr_rmtval_remove(args);
> > > -		if (retval && retval != -EAGAIN)
> > > -			return retval;
> > > +		error = __xfs_attr_rmtval_remove(&dac);
> > > +		if (error != -EAGAIN)
> > > +			break;
> > > -		/*
> > > -		 * Close out trans and start the next one in the chain.
> > > -		 */
> > > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > +		error = xfs_attr_trans_roll(&dac);
> > >   		if (error)
> > >   			return error;
> > > -	} while (retval == -EAGAIN);
> > > -	return 0;
> > > +	} while (true);
> > > +
> > > +	return error;
> > >   }
> > >   /*
> > > @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
> > >    */
> > >   int
> > >   __xfs_attr_rmtval_remove(
> > > -	struct xfs_da_args	*args)
> > > +	struct xfs_delattr_context	*dac)
> > >   {
> > > -	int			error, done;
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	int				error, done;
> > >   	/*
> > >   	 * Unmap value blocks for this attr.
> > > @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
> > >   	if (error)
> > >   		return error;
> > > -	error = xfs_defer_finish(&args->trans);
> > > -	if (error)
> > > -		return error;
> > > -
> > > -	if (!done)
> > > +	if (!done) {
> > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > >   		return -EAGAIN;
> > 
> > What state are we in when we return -EAGAIN here?
> > 
> > [jumps back to his whole-branch diff]
> > 
> > Hm, oh, I see, the next state could be a number of things--
> > 
> > RM_LBLK if we're removing an old remote value from a leaf block as part
> > of an attr set operation; or
> > 
> > RM_NBLK if we're removing an old remote value from a node block as part
> > of an attr set operation; and
> > 
> > UNINIT if we're removing a remote value as part of an attr remove
> > operation.
> > 
> > Oh!  For the first two, it looks to me as though either we're already in
> > the state we're setting (RM_[LN]BLK) or we were in one of the
> > FLIP_[LN]FLAG states.
> > 
> > I think it would make more sense if you set the state before calling the
> > rmtval_remove function, and leave a comment here saying that the caller
> > is responsible for figuring out the next state.
> Sure, that should be OK.
> 
> > 
> > For removals, I wonder if we should have advanced beyond UNINIT by the
> > time we get here?  I think you've added the minimum states that are
> > necessary to resume work after a transaction roll, but from this and the
> > next patch I feel like we do a lot of work while dela_state == UNINIT.
> Yes, I think I went over that a little in my replies to your earlier
> reviews.  Many times we can get away without setting a state and still get
> the same behavior, though it may make it a little harder to visualize where
> execution resumes.
> 
> I dunno, this one seems like a matter of preference as far as how much
> simplification people want to see.  I think having the explicit state
> settings makes the code easier for a reader to follow, though I will concede
> they don't actually have to be there to make it work.

<nod> Maybe (as I said earlier in this reply) we can get by with having
the ASCII art diagram point out all the things that happen while we're
in the "UNINIT" state before the first transaction roll.

I suspect that showing the steps and how the DAC state machine relates
to those steps is the best we're going to be able to do w.r.t.
restructuring a general key-value store implemented inside the kernel.
:)
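
As a rough illustration of the point about work done while still in UNINIT, here is a minimal userspace model of the -EAGAIN contract. Everything in it (model_dac, model_remove_iter(), model_remove(), and the two states) is hypothetical and only mirrors the shape of xfs_attr_remove_iter() and the xfs_attr_rmtval_remove() loop in the patch; the "roll" is just a counter standing in for xfs_attr_trans_roll().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum xfs_delattr_state; not the XFS values. */
enum model_das {
	MODEL_DAS_UNINIT = 0,	/* everything before the first explicit state */
	MODEL_DAS_RM_SHRINK,	/* entry is gone; the da tree must shrink */
};

struct model_dac {
	enum model_das	dela_state;
	int		rmt_blocks;	/* remote value blocks still mapped */
	bool		need_shrink;	/* does the tree need shrinking? */
	int		rolls;		/* transaction rolls done by the caller */
};

/* One step of a removal: -EAGAIN means "roll and call me again". */
static int model_remove_iter(struct model_dac *dac)
{
	assert(dac->rmt_blocks >= 0);

	switch (dac->dela_state) {
	case MODEL_DAS_UNINIT:
		/* Unmap one remote block per transaction; state stays UNINIT. */
		if (dac->rmt_blocks > 0) {
			dac->rmt_blocks--;
			return -EAGAIN;
		}
		if (!dac->need_shrink)
			return 0;
		dac->dela_state = MODEL_DAS_RM_SHRINK;
		return -EAGAIN;
	case MODEL_DAS_RM_SHRINK:
		dac->need_shrink = false;	/* shrink finished */
		return 0;
	}
	return -EINVAL;
}

/* Caller loop shaped like xfs_attr_rmtval_remove(): roll on -EAGAIN. */
static int model_remove(struct model_dac *dac)
{
	int error;

	do {
		error = model_remove_iter(dac);
		if (error != -EAGAIN)
			break;
		dac->rolls++;	/* stand-in for xfs_attr_trans_roll() */
	} while (true);

	return error;
}
```

With three remote blocks and a tree that needs shrinking, the caller rolls four times, and the first three of those rolls happen while dela_state is still UNINIT. Those are the steps a diagram would need to call out explicitly.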

> > 
> > FWIW I will be taking a close look at all the new 'return -EAGAIN'
> > statements to see if I can tell what state we're in when we trigger a
> > transaction roll.
> Well, OK, a lot of them are UNINIT.  If we continue in the direction of
> removing all unnecessary states, really it's the combination of the tree and
> the state that actually lands us back to where we need to be when the
> function is recalled.
> 
> If, for debugging or readability purposes, we wanted an explicit state for
> each EAGAIN, we would reintroduce a lot of states we've simplified away over
> the reviews.
> 
> Maybe give it a day or two to sleep on, and let me know what you think :-)

<nod> OK.

--D

> Thanks for the reviews, I know it's really complicated.
> Allison
> 
> > 
> > --D
> > 
> > > +	}
> > >   	return error;
> > >   }
> > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> > > index 9eee615..002fd30 100644
> > > --- a/fs/xfs/libxfs/xfs_attr_remote.h
> > > +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> > > @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > >   int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
> > >   		xfs_buf_flags_t incore_flags);
> > >   int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> > > -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > > +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> > >   #endif /* __XFS_ATTR_REMOTE_H__ */
> > > diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> > > index bfad669..aaa7e66 100644
> > > --- a/fs/xfs/xfs_attr_inactive.c
> > > +++ b/fs/xfs/xfs_attr_inactive.c
> > > @@ -15,10 +15,10 @@
> > >   #include "xfs_da_format.h"
> > >   #include "xfs_da_btree.h"
> > >   #include "xfs_inode.h"
> > > +#include "xfs_attr.h"
> > >   #include "xfs_attr_remote.h"
> > >   #include "xfs_trans.h"
> > >   #include "xfs_bmap.h"
> > > -#include "xfs_attr.h"
> > >   #include "xfs_attr_leaf.h"
> > >   #include "xfs_quota.h"
> > >   #include "xfs_dir2.h"
> > > -- 
> > > 2.7.4
> > > 


* Re: [PATCH v13 03/10] xfs: Add delay ready attr set routines
  2020-11-13 17:12           ` Allison Henderson
@ 2020-11-14  1:20             ` Darrick J. Wong
  0 siblings, 0 replies; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-14  1:20 UTC (permalink / raw)
  To: Allison Henderson; +Cc: Chandan Babu R, linux-xfs

On Fri, Nov 13, 2020 at 10:12:09AM -0700, Allison Henderson wrote:
> 
> 
> On 11/13/20 2:16 AM, Chandan Babu R wrote:
> > On Friday 13 November 2020 7:03:13 AM IST Allison Henderson wrote:
> > > 
> > > On 11/10/20 2:57 PM, Darrick J. Wong wrote:
> > > > On Tue, Oct 27, 2020 at 07:02:55PM +0530, Chandan Babu R wrote:
> > > > > On Friday 23 October 2020 12:04:28 PM IST Allison Henderson wrote:
> > > > > > This patch modifies the attr set routines to be delay ready. This means
> > > > > > they no longer roll or commit transactions, but instead return -EAGAIN
> > > > > > to have the calling routine roll and refresh the transaction.  In this
> > > > > > series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
> > > > > > state machine like switch to keep track of where it was when EAGAIN was
> > > > > > returned. See xfs_attr.h for a more detailed diagram of the states.
> > > > > > 
> > > > > > Two new helper functions have been added: xfs_attr_rmtval_set_init and
> > > > > > xfs_attr_rmtval_set_blk.  They provide a subset of logic similar to
> > > > > > xfs_attr_rmtval_set, but they store the current block in the delay attr
> > > > > > context to allow the caller to roll the transaction between allocations.
> > > > > > This helps to simplify and consolidate code used by
> > > > > > xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
> > > > > > now become a simple loop to refresh the transaction until the operation
> > > > > > is completed.  Lastly, xfs_attr_rmtval_remove is no longer used, and is
> > > > > > removed.
> > > > > 
> > > > > One nit. xfs_attr_rmtval_remove()'s prototype declaration needs to be removed
> > > > > from xfs_attr_remote.h.
> > > Alrighty, will pull out
> > > 
> > > > > 
> > > > > > 
> > > > > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > > > > ---
> > > > > >    fs/xfs/libxfs/xfs_attr.c        | 370 ++++++++++++++++++++++++++--------------
> > > > > >    fs/xfs/libxfs/xfs_attr.h        | 126 +++++++++++++-
> > > > > >    fs/xfs/libxfs/xfs_attr_remote.c |  99 +++++++----
> > > > > >    fs/xfs/libxfs/xfs_attr_remote.h |   4 +
> > > > > >    fs/xfs/xfs_trace.h              |   1 -
> > > > > >    5 files changed, 439 insertions(+), 161 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > > > index 6ca94cb..95c98d7 100644
> > > > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > > > > @@ -44,7 +44,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
> > > > > >     * Internal routines when attribute list is one block.
> > > > > >     */
> > > > > >    STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
> > > > > > -STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
> > > > > > +STATIC int xfs_attr_leaf_addname(struct xfs_delattr_context *dac);
> > > > > >    STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
> > > > > >    STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
> > > > > > @@ -52,12 +52,15 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
> > > > > >     * Internal routines when attribute list is more than one block.
> > > > > >     */
> > > > > >    STATIC int xfs_attr_node_get(xfs_da_args_t *args);
> > > > > > -STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> > > > > > +STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
> > > > > >    STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
> > > > > >    STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
> > > > > >    				 struct xfs_da_state **state);
> > > > > >    STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> > > > > >    STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> > > > > > +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
> > > > > > +STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> > > > > > +			     struct xfs_buf **leaf_bp);
> > > > > >    int
> > > > > >    xfs_inode_hasattr(
> > > > > > @@ -218,8 +221,11 @@ xfs_attr_is_shortform(
> > > > > >    /*
> > > > > >     * Attempts to set an attr in shortform, or converts short form to leaf form if
> > > > > > - * there is not enough room.  If the attr is set, the transaction is committed
> > > > > > - * and set to NULL.
> > > > > > + * there is not enough room.  This function is meant to operate as a helper
> > > > > > + * routine to the delayed attribute functions.  It returns -EAGAIN to indicate
> > > > > > + * that the calling function should roll the transaction, and then proceed to
> > > > > > + * add the attr in leaf form.  This subroutine does not expect to be recalled
> > > > > > + * again like the other delayed attr routines do.
> > > > > >     */
> > > > > >    STATIC int
> > > > > >    xfs_attr_set_shortform(
> > > > > > @@ -227,16 +233,16 @@ xfs_attr_set_shortform(
> > > > > >    	struct xfs_buf		**leaf_bp)
> > > > > >    {
> > > > > >    	struct xfs_inode	*dp = args->dp;
> > > > > > -	int			error, error2 = 0;
> > > > > > +	int			error = 0;
> > > > > >    	/*
> > > > > >    	 * Try to add the attr to the attribute list in the inode.
> > > > > >    	 */
> > > > > >    	error = xfs_attr_try_sf_addname(dp, args);
> > > > > > +
> > > > > > +	/* Should only be 0, -EEXIST or ENOSPC */
> > > > > >    	if (error != -ENOSPC) {
> > > > > > -		error2 = xfs_trans_commit(args->trans);
> > > > > > -		args->trans = NULL;
> > > > > > -		return error ? error : error2;
> > > > > > +		return error;
> > > > > >    	}
> > > > > >    	/*
> > > > > >    	 * It won't fit in the shortform, transform to a leaf block.  GROT:
> > > > > > @@ -249,18 +255,10 @@ xfs_attr_set_shortform(
> > > > > >    	/*
> > > > > >    	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
> > > > > >    	 * push cannot grab the half-baked leaf buffer and run into problems
> > > > > > -	 * with the write verifier. Once we're done rolling the transaction we
> > > > > > -	 * can release the hold and add the attr to the leaf.
> > > > > > +	 * with the write verifier.
> > > > > >    	 */
> > > > > >    	xfs_trans_bhold(args->trans, *leaf_bp);
> > > > > > -	error = xfs_defer_finish(&args->trans);
> > > > > > -	xfs_trans_bhold_release(args->trans, *leaf_bp);
> > > > > > -	if (error) {
> > > > > > -		xfs_trans_brelse(args->trans, *leaf_bp);
> > > > > > -		return error;
> > > > > > -	}
> > > > > > -
> > > > > > -	return 0;
> > > > > > +	return -EAGAIN;
> > > > > >    }
> > > > > >    /*
> > > > > > @@ -268,7 +266,7 @@ xfs_attr_set_shortform(
> > > > > >     * also checks for a defer finish.  Transaction is finished and rolled as
> > > > > >     * needed, and returns true of false if the delayed operation should continue.
> > > > > >     */
> > > > > > -int
> > > > > > +STATIC int
> > > > > >    xfs_attr_trans_roll(
> > > > > >    	struct xfs_delattr_context	*dac)
> > > > > >    {
> > > > > > @@ -297,61 +295,130 @@ int
> > > > > >    xfs_attr_set_args(
> > > > > >    	struct xfs_da_args	*args)
> > > > > >    {
> > > > > > -	struct xfs_inode	*dp = args->dp;
> > > > > > -	struct xfs_buf          *leaf_bp = NULL;
> > > > > > -	int			error = 0;
> > > > > > +	struct xfs_buf			*leaf_bp = NULL;
> > > > > > +	int				error = 0;
> > > > > > +	struct xfs_delattr_context	dac = {
> > > > > > +		.da_args	= args,
> > > > > > +	};
> > > > > > +
> > > > > > +	do {
> > > > > > +		error = xfs_attr_set_iter(&dac, &leaf_bp);
> > > > > > +		if (error != -EAGAIN)
> > > > > > +			break;
> > > > > > +
> > > > > > +		error = xfs_attr_trans_roll(&dac);
> > > > > > +		if (error)
> > > > > > +			return error;
> > > > > > +
> > > > > > +		if (leaf_bp) {
> > > > > > +			xfs_trans_bjoin(args->trans, leaf_bp);
> > > > > > +			xfs_trans_bhold(args->trans, leaf_bp);
> > > > > > +		}
> > > > > 
> > > > > When xfs_attr_set_iter() causes a "short form" attribute list to be converted
> > > > > to "leaf form", leaf_bp would point to an xfs_buf which has been added to the
> > > > > transaction and also XFS_BLI_HOLD flag is set on the buffer (last statement in
> > > > > xfs_attr_set_shortform()). XFS_BLI_HOLD flag makes sure that the new
> > > > > transaction allocated by xfs_attr_trans_roll() would continue to have leaf_bp
> > > > > in the transaction's item list. Hence I think the above calls to
> > > > > xfs_trans_bjoin() and xfs_trans_bhold() are not required.
> > > Sorry, I just noticed Chandan's commentary for this patch.  Apologies. I
> > > think we can get away without this now, and yes, this routine disappears
> > > by the end of the set.  Will clean it out anyway for bisectability's
> > > sake, though. :-)
> > 
> > No problem. As an aside, I stopped reviewing the patchset after I noticed
> > Brian's review comments for "[PATCH v13 02/10] xfs: Add delay ready attr
> > remove routines" suggesting some more code refactoring work.
> > 
> No worries, that's reasonable.  It's why I only send this out in subsets:
> to try and keep people focused on a smaller area, because stuff at the
> end of the set changes more often as things move around at the
> bottom of the set.  It doesn't make sense to channel too much effort into
> something that's still moving around so much :-)

<nod> TBH either I seem to make time to review the entire series or I
just fail to find time to start it at all. :(

That said I also usually start at the end and work my way backwards,
assuming that most people don't do that, and the author would probably
like it if /someone/ covered the end parts.

--D

> Allison
> > > 
> > > > 
> > > > I /think/ the defer ops will rejoin the buffer each time it rolls, which
> > > > means that xfs_attr_trans_roll returns with the buffer already joined to
> > > > the transaction?  And I think you're right that the bhold isn't needed,
> > > > because holding is dictated by the lower levels (i.e. _set_iter).
> > > > 
> > > > > Please let me know if I am missing something obvious here.
> > > > 
> > > > The entire function goes away by the end of the series. :)
> > > > 
> > > > --D
> > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > 
> > 
> > 


* Re: [PATCH v13 03/10] xfs: Add delay ready attr set routines
  2020-11-13  1:38     ` Allison Henderson
@ 2020-11-14  1:35       ` Darrick J. Wong
  2020-11-16  5:25         ` Allison Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-14  1:35 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Nov 12, 2020 at 06:38:10PM -0700, Allison Henderson wrote:
> 
> 
> On 11/10/20 4:10 PM, Darrick J. Wong wrote:
> > On Thu, Oct 22, 2020 at 11:34:28PM -0700, Allison Henderson wrote:
> > > This patch modifies the attr set routines to be delay ready. This means
> > > they no longer roll or commit transactions, but instead return -EAGAIN
> > > to have the calling routine roll and refresh the transaction.  In this
> > > series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
> > > state machine like switch to keep track of where it was when EAGAIN was
> > > returned. See xfs_attr.h for a more detailed diagram of the states.
> > > 
> > > Two new helper functions have been added: xfs_attr_rmtval_set_init and
> > > xfs_attr_rmtval_set_blk.  They provide a subset of logic similar to
> > > xfs_attr_rmtval_set, but they store the current block in the delay attr
> > > context to allow the caller to roll the transaction between allocations.
> > > This helps to simplify and consolidate code used by
> > > xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
> > > now become a simple loop to refresh the transaction until the operation
> > > is completed.  Lastly, xfs_attr_rmtval_remove is no longer used, and is
> > > removed.
> > > 
> > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > ---
> > >   fs/xfs/libxfs/xfs_attr.c        | 370 ++++++++++++++++++++++++++--------------
> > >   fs/xfs/libxfs/xfs_attr.h        | 126 +++++++++++++-
> > >   fs/xfs/libxfs/xfs_attr_remote.c |  99 +++++++----
> > >   fs/xfs/libxfs/xfs_attr_remote.h |   4 +
> > >   fs/xfs/xfs_trace.h              |   1 -
> > >   5 files changed, 439 insertions(+), 161 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > index 6ca94cb..95c98d7 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > @@ -44,7 +44,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
> > >    * Internal routines when attribute list is one block.
> > >    */
> > >   STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
> > > -STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
> > > +STATIC int xfs_attr_leaf_addname(struct xfs_delattr_context *dac);
> > >   STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
> > >   STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
> > > @@ -52,12 +52,15 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
> > >    * Internal routines when attribute list is more than one block.
> > >    */
> > >   STATIC int xfs_attr_node_get(xfs_da_args_t *args);
> > > -STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
> > > +STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
> > >   STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
> > >   STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
> > >   				 struct xfs_da_state **state);
> > >   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> > >   STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> > > +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
> > > +STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> > > +			     struct xfs_buf **leaf_bp);
> > >   int
> > >   xfs_inode_hasattr(
> > > @@ -218,8 +221,11 @@ xfs_attr_is_shortform(
> > >   /*
> > >    * Attempts to set an attr in shortform, or converts short form to leaf form if
> > > - * there is not enough room.  If the attr is set, the transaction is committed
> > > - * and set to NULL.
> > > + * there is not enough room.  This function is meant to operate as a helper
> > > + * routine to the delayed attribute functions.  It returns -EAGAIN to indicate
> > > + * that the calling function should roll the transaction, and then proceed to
> > > + * add the attr in leaf form.  This subroutine does not expect to be recalled
> > > + * again like the other delayed attr routines do.
> > >    */
> > >   STATIC int
> > >   xfs_attr_set_shortform(
> > > @@ -227,16 +233,16 @@ xfs_attr_set_shortform(
> > >   	struct xfs_buf		**leaf_bp)
> > >   {
> > >   	struct xfs_inode	*dp = args->dp;
> > > -	int			error, error2 = 0;
> > > +	int			error = 0;
> > >   	/*
> > >   	 * Try to add the attr to the attribute list in the inode.
> > >   	 */
> > >   	error = xfs_attr_try_sf_addname(dp, args);
> > > +
> > > +	/* Should only be 0, -EEXIST or ENOSPC */
> > 
> > Nit: "...or -ENOSPC"
> > 
> > Also, this comment could go a couple of lines up:
> Sure
> > 
> > 	/*
> > 	 * Try to add the attr to the attribute list in the inode.
> > 	 * This should only return 0, -EEXIST, or -ENOSPC.
> > 	 */
> > 	error = xfs_attr_try_sf_addname(dp, args);
> > 	if (error != -ENOSPC)
> > 		return error;
> > 
> > 
> > >   	if (error != -ENOSPC) {
> > > -		error2 = xfs_trans_commit(args->trans);
> > > -		args->trans = NULL;
> > > -		return error ? error : error2;
> > > +		return error;
> > >   	}
> > >   	/*
> > >   	 * It won't fit in the shortform, transform to a leaf block.  GROT:
> > > @@ -249,18 +255,10 @@ xfs_attr_set_shortform(
> > >   	/*
> > >   	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
> > >   	 * push cannot grab the half-baked leaf buffer and run into problems
> > > -	 * with the write verifier. Once we're done rolling the transaction we
> > > -	 * can release the hold and add the attr to the leaf.
> > > +	 * with the write verifier.
> > >   	 */
> > >   	xfs_trans_bhold(args->trans, *leaf_bp);
> > > -	error = xfs_defer_finish(&args->trans);
> > > -	xfs_trans_bhold_release(args->trans, *leaf_bp);
> > > -	if (error) {
> > > -		xfs_trans_brelse(args->trans, *leaf_bp);
> > > -		return error;
> > > -	}
> > > -
> > > -	return 0;
> > > +	return -EAGAIN;
> > 
> > What state are we in when we return -EAGAIN here?  Are we still in
> > XFS_DAS_UNINIT, but with an attr fork that is no longer in local format,
> > which means that we skip the xfs_attr_is_shortform branch next time
> > around?
> Yes, that's correct.  I think I used to have an explicit state for it, but
> it's really not needed for this reason, though I think explicit states do add
> some degree of readability.  Maybe we could add a comment?
> 
> /* Restart attr operation in leaf format */
> 
> ?

Or even mention the DAS state explicitly, e.g.

/*
 * We're still in XFS_DAS_UNINIT state here.  We've converted the attr
 * fork to leaf format and will restart with the leaf add.
 */

Hmm, second question: Could you add some tracepoints that would fire
every time we either change the DAS state or return -EAGAIN to trigger a
roll?  I bet that will make debugging the attr code easier in the future.
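
One way to make such tracing hard to forget is to funnel every state change and every -EAGAIN return through a pair of helpers that each record an event. The sketch below models this in userspace with a small event log instead of real tracepoints (which in XFS would be defined in fs/xfs/xfs_trace.h); all of the names here are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_TRACE_MAX	64

/* Hypothetical event log standing in for kernel tracepoints. */
enum model_evt { MODEL_EVT_STATE, MODEL_EVT_EAGAIN };
enum model_das { MODEL_DAS_UNINIT = 0, MODEL_DAS_FOUND_LBLK };

struct model_trace {
	enum model_evt	evt[MODEL_TRACE_MAX];
	int		state[MODEL_TRACE_MAX];	/* new state, or state at the roll */
	int		n;
};

struct model_ctx {
	enum model_das		dela_state;
	bool			fork_is_leaf;
	struct model_trace	trace;
};

static void model_trace_add(struct model_trace *t, enum model_evt e, int state)
{
	if (t->n == MODEL_TRACE_MAX)
		return;			/* drop events once the log is full */
	t->evt[t->n] = e;
	t->state[t->n] = state;
	t->n++;
}

/* All state changes funnel through here, so each one is traced. */
static void model_set_state(struct model_ctx *c, enum model_das s)
{
	c->dela_state = s;
	model_trace_add(&c->trace, MODEL_EVT_STATE, s);
}

/* All "please roll" returns funnel through here as well. */
static int model_eagain(struct model_ctx *c)
{
	model_trace_add(&c->trace, MODEL_EVT_EAGAIN, c->dela_state);
	return -EAGAIN;
}

/* Shortform-to-leaf conversion, then a leaf add, as in the set path. */
static int model_set_iter(struct model_ctx *c)
{
	if (c->dela_state == MODEL_DAS_FOUND_LBLK)
		return 0;
	if (!c->fork_is_leaf) {
		c->fork_is_leaf = true;		/* converted; still UNINIT */
		return model_eagain(c);
	}
	model_set_state(c, MODEL_DAS_FOUND_LBLK);
	return model_eagain(c);
}
```

The first logged event is a roll while still in UNINIT (the shortform-to-leaf conversion), which is the case the suggested comment describes; a trace like this would make that visible without reading the code.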

> 
> > 
> > >   }
> > >   /*
> > > @@ -268,7 +266,7 @@ xfs_attr_set_shortform(
> > >    * also checks for a defer finish.  Transaction is finished and rolled as
> > >    * needed, and returns true of false if the delayed operation should continue.
> > >    */
> > > -int
> > > +STATIC int
> > >   xfs_attr_trans_roll(
> > >   	struct xfs_delattr_context	*dac)
> > >   {
> > > @@ -297,61 +295,130 @@ int
> > >   xfs_attr_set_args(
> > >   	struct xfs_da_args	*args)
> > >   {
> > > -	struct xfs_inode	*dp = args->dp;
> > > -	struct xfs_buf          *leaf_bp = NULL;
> > > -	int			error = 0;
> > > +	struct xfs_buf			*leaf_bp = NULL;
> > > +	int				error = 0;
> > > +	struct xfs_delattr_context	dac = {
> > > +		.da_args	= args,
> > > +	};
> > > +
> > > +	do {
> > > +		error = xfs_attr_set_iter(&dac, &leaf_bp);
> > > +		if (error != -EAGAIN)
> > > +			break;
> > > +
> > > +		error = xfs_attr_trans_roll(&dac);
> > > +		if (error)
> > > +			return error;
> > > +
> > > +		if (leaf_bp) {
> > > +			xfs_trans_bjoin(args->trans, leaf_bp);
> > > +			xfs_trans_bhold(args->trans, leaf_bp);
> > > +		}
> > > +
> > > +	} while (true);
> > > +
> > > +	return error;
> > > +}
> > > +
> > > +/*
> > > + * Set the attribute specified in @args.
> > > + * This routine is meant to function as a delayed operation, and may return
> > > + * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
> > > + * to handle this, and recall the function until a successful error code is
> > > + * returned.
> > > + */
> > > +STATIC int
> > > +xfs_attr_set_iter(
> > > +	struct xfs_delattr_context	*dac,
> > > +	struct xfs_buf			**leaf_bp)
> > > +{
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	struct xfs_inode		*dp = args->dp;
> > > +	int				error = 0;
> > > +
> > > +	/* State machine switch */
> > > +	switch (dac->dela_state) {
> > > +	case XFS_DAS_FLIP_LFLAG:
> > > +	case XFS_DAS_FOUND_LBLK:
> > 
> > Do we need to catch XFS_DAS_RM_LBLK here?
> 
> I think we fall into the correct code path without it, but I think it's
> better to have it here for consistency.  Will add.
> 
> > 
> > > +		goto das_leaf;
> > > +	case XFS_DAS_FOUND_NBLK:
> > > +	case XFS_DAS_FLIP_NFLAG:
> > > +	case XFS_DAS_ALLOC_NODE:
> > > +		goto das_node;
> > > +	default:
> > > +		break;
> > > +	}
> > >   	/*
> > >   	 * If the attribute list is already in leaf format, jump straight to
> > >   	 * leaf handling.  Otherwise, try to add the attribute to the shortform
> > >   	 * list; if there's no room then convert the list to leaf format and try
> > > -	 * again.
> > > +	 * again. No need to set state as we will be in leaf form when we come
> > > +	 * back
> > >   	 */
> > >   	if (xfs_attr_is_shortform(dp)) {
> > >   		/*
> > > -		 * If the attr was successfully set in shortform, the
> > > -		 * transaction is committed and set to NULL.  Otherwise, is it
> > > -		 * converted from shortform to leaf, and the transaction is
> > > -		 * retained.
> > > +		 * If the attr was successfully set in shortform, no need to
> > > +		 * continue.  Otherwise, is it converted from shortform to leaf
> > > +		 * and -EAGAIN is returned.
> > >   		 */
> > > -		error = xfs_attr_set_shortform(args, &leaf_bp);
> > > -		if (error || !args->trans)
> > > -			return error;
> > > +		error = xfs_attr_set_shortform(args, leaf_bp);
> > > +		if (error == -EAGAIN)
> > > +			dac->flags |= XFS_DAC_DEFER_FINISH;
> > > +
> > > +		return error;
> > >   	}
> > > -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> > > -		error = xfs_attr_leaf_addname(args);
> > > -		if (error != -ENOSPC)
> > > -			return error;
> > > +	/*
> > > +	 * After a shortform to leaf conversion, we need to hold the leaf and
> > > +	 * cycle out the transaction.  When we get back, we need to release
> > > +	 * the leaf.
> > 
> > "...to release the hold on the leaf buffer."
> Sure, will expand
> 
> > 
> > > +	 */
> > > +	if (*leaf_bp != NULL) {
> > > +		xfs_trans_bhold_release(args->trans, *leaf_bp);
> > > +		*leaf_bp = NULL;
> > > +	}
> > > -		/*
> > > -		 * Promote the attribute list to the Btree format.
> > > -		 */
> > > -		error = xfs_attr3_leaf_to_node(args);
> > > -		if (error)
> > > -			return error;
> > > +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
> > > +		error = xfs_attr_leaf_try_add(args, *leaf_bp);
> > > +		switch (error) {
> > > +		case -ENOSPC:
> > > +			/*
> > > +			 * Promote the attribute list to the Btree format.
> > > +			 */
> > > +			error = xfs_attr3_leaf_to_node(args);
> > > +			if (error)
> > > +				return error;
> > > -		/*
> > > -		 * Finish any deferred work items and roll the transaction once
> > > -		 * more.  The goal here is to call node_addname with the inode
> > > -		 * and transaction in the same state (inode locked and joined,
> > > -		 * transaction clean) no matter how we got to this step.
> > > -		 */
> > > -		error = xfs_defer_finish(&args->trans);
> > > -		if (error)
> > > +			/*
> > > +			 * Finish any deferred work items and roll the
> > > +			 * transaction once more.  The goal here is to call
> > > +			 * node_addname with the inode and transaction in the
> > > +			 * same state (inode locked and joined, transaction
> > > +			 * clean) no matter how we got to this step.
> > > +			 */
> > > +			dac->flags |= XFS_DAC_DEFER_FINISH;
> > > +			return -EAGAIN;
> > 
> > What state should we be in at this -EAGAIN return?  Is it
> > XFS_DAS_UNINIT, but with more than one extent in the attr fork?
> It could be UNINIT, if the attr was already a leaf at the time we started.
> If we had to promote from a block to a leaf, and STILL couldn't fit in leaf
> form, then we're probably in some state reminiscent of the leaf routines.
> But because xfs_attr3_leaf_to_node just turned us into a node, we fall into
> the node path upon return.
> 
> I know that's confusing... which leads to your next question of.....
> > 
> > /me is wishing these would get turned into explicit states, since afaict
> > we don't unlock the inode and so we should find it in /exactly/ the
> > state that the delattr_context says it should be in.
> IIRC it used to have an explicit XFS_DC_LEAF_TO_NODE state, but I think we
> simplified it away at some point during review in an effort to simplify
> the state machine as much as possible.  v8 I think.  But yes, I do think
> there is a trade-off: removing states where we can also makes it harder
> to see where we are in the attr process.  Now your state isn't exactly
> represented by dela_state anymore; it's the combination of dela_state and
> the state of the tree.
> 
> I think I've been over this code so much by now, I can follow it either way,
> but if it's confusing to others, maybe we should put it back?  Or maybe just
> a comment if that helps?

A comment laying out which states we could be in and how we got there. :)

> 
> 
> > 
> > > +		case 0:
> > > +			dac->dela_state = XFS_DAS_FOUND_LBLK;
> > > +			return -EAGAIN;
> > > +		default:
> > >   			return error;
> > > +		}
> > > +das_leaf:
> > 
> > The only way to get to this block of code is by jumping to das_leaf,
> > from the switch statement above, right?  If so, then shouldn't it be up
> > there in the switch statement?
> We could, though I think we were just trying to be consistent in that the
> switch is sort of a dispatcher for gotos?  Otherwise we end up with a switch
> with giant cases.  It's the same difference I suppose.

With this comment in particular, I had to dig through the switch
statement in the previous code block to figure out that it's not
possible to fall into das_leaf from above.
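
To make the dispatcher pattern concrete, here is a minimal userspace sketch. It is not the kernel code: model_ctx, model_leaf_addname() and the M_* states are hypothetical stand-ins that only mirror the shape of the XFS_DAS_* machinery. The switch at the top only jumps, so a label placed after a "return -EAGAIN" can be reached only through that switch, which is the property that was hard to see with das_leaf.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states; they only mirror the shape of XFS_DAS_*. */
enum model_state { M_UNINIT = 0, M_FOUND_LBLK, M_FLIP_LFLAG };

struct model_ctx {
	enum model_state	state;
	int			steps_run;	/* steps executed so far */
};

/*
 * Each call runs one step and returns -EAGAIN so the caller can roll the
 * transaction.  The switch is purely a dispatcher: code below a
 * "return -EAGAIN" is reachable only by jumping to its label.
 */
static int
model_leaf_addname(struct model_ctx *ctx)
{
	switch (ctx->state) {
	case M_FOUND_LBLK:
		goto found_lblk;
	case M_FLIP_LFLAG:
		goto flip_lflag;
	default:
		break;
	}

	ctx->steps_run++;		/* step 1: add the name */
	ctx->state = M_FOUND_LBLK;
	return -EAGAIN;
found_lblk:
	ctx->steps_run++;		/* step 2: write the value */
	ctx->state = M_FLIP_LFLAG;
	return -EAGAIN;
flip_lflag:
	ctx->steps_run++;		/* step 3: flip flags and finish */
	return 0;
}
```

Starting from M_UNINIT, three calls return -EAGAIN, -EAGAIN and then 0.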

> > 
> > > +		error = xfs_attr_leaf_addname(dac);
> > > +		if (error == -ENOSPC)
> > > +			/*
> > > +			 * No need to set state.  We will be in node form when
> > > +			 * we are recalled
> > > +			 */
> > > +			return -EAGAIN;
> > 
> > How do we get to node form?
> Hmm, I thought xfs_attr_leaf_addname did promote to node if there's not
> enough space, but now that you point it out, I'm not seeing it.  We may have
> to put the LEAF_TO_NODE state back anyway.
> 
> Maybe I can add a test case too; it doesn't look like any of the existing
> cases run across it.

Hm.  At least in the old code, I thought it was xfs_attr_set_args that
would call xfs_attr_leaf_addname and if it returned ENOSPC, it would
then call xfs_attr3_leaf_to_node...

(Gosh, I can't even tell where we are anymore. :()
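
A minimal sketch of the "no state needed" promotion, under stated assumptions: this is a userspace model, not kernel code, and model_fork, model_leaf_try_add(), model_set_iter() and MODEL_DEFER_FINISH are hypothetical. It follows the shape of the -ENOSPC case after xfs_attr_leaf_try_add() in the patch: the dispatcher converts the fork to node form, flags deferred work, and returns -EAGAIN without recording a state, so the next call takes the node path purely because the fork format changed.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_LEAF_MAX		2	/* hypothetical leaf capacity */
#define MODEL_DEFER_FINISH	0x01	/* mirrors XFS_DAC_DEFER_FINISH */

enum model_fmt { FMT_LEAF, FMT_NODE };

struct model_fork {
	enum model_fmt	fmt;
	int		nattrs;
};

/* Stand-in for xfs_attr_leaf_try_add(): fails when the leaf is full. */
static int
model_leaf_try_add(struct model_fork *f)
{
	if (f->nattrs >= MODEL_LEAF_MAX)
		return -ENOSPC;
	f->nattrs++;
	return 0;
}

static int
model_set_iter(struct model_fork *f, unsigned int *flags)
{
	int	error;

	if (f->fmt == FMT_NODE) {
		f->nattrs++;		/* node form always has room here */
		return 0;
	}

	error = model_leaf_try_add(f);
	if (error != -ENOSPC)
		return error;

	/*
	 * Leaf is full: promote (like xfs_attr3_leaf_to_node()), ask the
	 * caller to finish deferred work, and come back.  No state is
	 * recorded; the new fork format routes the next call.
	 */
	f->fmt = FMT_NODE;
	*flags |= MODEL_DEFER_FINISH;
	return -EAGAIN;
}
```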

> > 
> > > -		/*
> > > -		 * Commit the current trans (including the inode) and
> > > -		 * start a new one.
> > > -		 */
> > > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > > -		if (error)
> > > -			return error;
> > > +		return error;
> > >   	}
> > > -
> > > -	error = xfs_attr_node_addname(args);
> > > +das_node:
> > > +	error = xfs_attr_node_addname(dac);
> > >   	return error;
> > 
> > Similarly, I think the only way get to this block of code is if we're in
> > the initial state (XFS_DAS_UNINIT?) and the inode wasn't in short
> > format; or if we jumped here via DAS_{FOUND_NBLK,FLIP_NFLAG,ALLOC_NODE},
> > right?
> > 
> > I think you could straighten this out a bit further (I left out the
> > comments):
> > 
> > 	switch (dac->dela_state) {
> > 	case XFS_DAS_FLIP_LFLAG:
> > 	case XFS_DAS_FOUND_LBLK:
> > 		error = xfs_attr_leaf_addname(dac);
> > 		if (error == -ENOSPC)
> > 			return -EAGAIN;
> > 		return error;
> > 	case XFS_DAS_FOUND_NBLK:
> > 	case XFS_DAS_FLIP_NFLAG:
> > 	case XFS_DAS_ALLOC_NODE:
> > 		return xfs_attr_node_addname(dac);
> > 	case XFS_DAS_UNINIT:
> > 		break;
> > 	default:
> > 		...assert on the XFS_DAS_RM_* flags...
> > 	}
> > 
> > 	if (xfs_attr_is_shortform(dp))
> > 		return xfs_attr_set_shortform(args, leaf_bp);
> > 
> > 	if (*leaf_bp != NULL) {
> > 		...release bhold...
> > 	}
> > 
> > 	if (!xfs_bmap_one_block(...))
> > 		return xfs_attr_node_addname(dac);
> > 
> > 	error = xfs_attr_leaf_try_add(args, *leaf_bp);
> > 	switch (error) {
> > 	...handle -ENOSPC and 0...
> > 	}
> > 	return error;
> > 
> OK, I'll see if I can get something like that through the test cases. If it
> doesn't work out, I'll make a note of it.

<nod>
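
A runnable model of the restructured dispatcher outlined above, assuming userspace stand-ins for the kernel types (model_das, model_fmt, model_target and model_dispatch() are hypothetical). One difference from the outline: because xfs_attr_leaf_addname() and xfs_attr_node_addname() in this patch also resume at XFS_DAS_RM_LBLK and XFS_DAS_RM_NBLK, the model routes those two states to the leaf and node handlers instead of asserting on every RM_* state. Only XFS_DAS_RM_SHRINK, treated here as belonging to the remove path, is rejected.

```c
#include <assert.h>

/* Hypothetical mirror of enum xfs_delattr_state as of this patch. */
enum model_das {
	DAS_UNINIT = 0, DAS_RM_SHRINK, DAS_FOUND_LBLK, DAS_FOUND_NBLK,
	DAS_FLIP_LFLAG, DAS_RM_LBLK, DAS_ALLOC_NODE, DAS_FLIP_NFLAG,
	DAS_RM_NBLK,
};

/* Shape of the attr fork; FMT_LEAF means "fork has only one block". */
enum model_fmt { FMT_SHORT, FMT_LEAF, FMT_NODE };

/* Which routine the set path should run next. */
enum model_target {
	T_SET_SHORTFORM, T_LEAF_TRY_ADD, T_LEAF_ADDNAME, T_NODE_ADDNAME,
	T_BAD_STATE,
};

static enum model_target
model_dispatch(enum model_das state, enum model_fmt fmt)
{
	/* A resumed operation is routed by its saved state alone. */
	switch (state) {
	case DAS_FOUND_LBLK:
	case DAS_FLIP_LFLAG:
	case DAS_RM_LBLK:
		return T_LEAF_ADDNAME;
	case DAS_FOUND_NBLK:
	case DAS_FLIP_NFLAG:
	case DAS_ALLOC_NODE:
	case DAS_RM_NBLK:
		return T_NODE_ADDNAME;
	case DAS_UNINIT:
		break;
	default:
		return T_BAD_STATE;	/* e.g. DAS_RM_SHRINK */
	}

	/* A fresh operation is routed by the shape of the attr fork. */
	switch (fmt) {
	case FMT_SHORT:
		return T_SET_SHORTFORM;
	case FMT_LEAF:
		return T_LEAF_TRY_ADD;
	default:
		return T_NODE_ADDNAME;
	}
}
```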

> > >   }
> > > @@ -723,28 +790,30 @@ xfs_attr_leaf_try_add(
> > >    *
> > >    * This leaf block cannot have a "remote" value, we only call this routine
> > >    * if bmap_one_block() says there is only one block (ie: no remote blks).
> > > + *
> > > + * This routine is meant to function as a delayed operation, and may return
> > > + * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
> > > + * to handle this, and recall the function until a successful error code is
> > > + * returned.
> > >    */
> > >   STATIC int
> > >   xfs_attr_leaf_addname(
> > > -	struct xfs_da_args	*args)
> > > +	struct xfs_delattr_context	*dac)
> > >   {
> > > -	int			error, forkoff;
> > > -	struct xfs_buf		*bp = NULL;
> > > -	struct xfs_inode	*dp = args->dp;
> > > -
> > > -	trace_xfs_attr_leaf_addname(args);
> > > -
> > > -	error = xfs_attr_leaf_try_add(args, bp);
> > > -	if (error)
> > > -		return error;
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	struct xfs_buf			*bp = NULL;
> > > +	int				error, forkoff;
> > > +	struct xfs_inode		*dp = args->dp;
> > > -	/*
> > > -	 * Commit the transaction that added the attr name so that
> > > -	 * later routines can manage their own transactions.
> > > -	 */
> > > -	error = xfs_trans_roll_inode(&args->trans, dp);
> > > -	if (error)
> > > -		return error;
> > > +	/* State machine switch */
> > > +	switch (dac->dela_state) {
> > > +	case XFS_DAS_FLIP_LFLAG:
> > > +		goto das_flip_flag;
> > > +	case XFS_DAS_RM_LBLK:
> > > +		goto das_rm_lblk;
> > > +	default:
> > > +		break;
> > > +	}
> > >   	/*
> > >   	 * If there was an out-of-line value, allocate the blocks we
> > > @@ -752,12 +821,34 @@ xfs_attr_leaf_addname(
> > >   	 * after we create the attribute so that we don't overflow the
> > >   	 * maximum size of a transaction and/or hit a deadlock.
> > >   	 */
> > > -	if (args->rmtblkno > 0) {
> > > -		error = xfs_attr_rmtval_set(args);
> > > +
> > > +	/* Open coded xfs_attr_rmtval_set without trans handling */
> > > +	if ((dac->flags & XFS_DAC_LEAF_ADDNAME_INIT) == 0) {
> > > +		dac->flags |= XFS_DAC_LEAF_ADDNAME_INIT;
> > > +		if (args->rmtblkno > 0) {
> > > +			error = xfs_attr_rmtval_find_space(dac);
> > > +			if (error)
> > > +				return error;
> > > +		}
> > > +	}
> > > +
> > > +	/*
> > > +	 * Roll through the "value", allocating blocks on disk as
> > > +	 * required.
> > > +	 */
> > > +	if (dac->blkcnt > 0) {
> > > +		error = xfs_attr_rmtval_set_blk(dac);
> > >   		if (error)
> > >   			return error;
> > > +
> > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > > +		return -EAGAIN;
> > 
> > What state are we in here?  FOUND_LBLK, with blkcnt slowly decreasing?
> > 
> I used to have an ALLOC_LEAF state for this one.  It used to look something
> like this:
> +alloc_leaf:

Aha, that's where ALLOC_LEAF went.

> +        while (args->dac.blkcnt > 0) {
> +            error = xfs_attr_rmtval_set_blk(args);
> +            if (error)
> +                return error;
> +
> +            args->dac.flags |= XFS_DAC_FINISH_TRANS;
> +            args->dac.dela_state = XFS_DAS_ALLOC_LEAF;
> +            return -EAGAIN;
> +        }
> 
> Again, it's not really needed, as we will fall into this logic with or
> without the state.  And the while loop doesn't really loop, though I guess
> it does help the reader understand that this is supposed to function like a
> loop.  I think it's easy to see something like that and want to simplify
> away the extra semantics, but on a second look, it's not quite as obvious
> why without the recollection of what it once was.  Maybe a comment is in
> order?
> 
> /* Repeat this until we have set all rmt blks */
> 
> ?

Well there already is a comment that we're repeating until we've set all
the remote blocks, but it should capture which DAS state(s) we could be
in, because I quickly get lost, especially in the attr set code.
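
As a sketch of why that branch only needs to run once per call, here is a userspace model of the remote-value allocation step. model_rmt_ctx, model_set_blk(), model_alloc_step() and MODEL_MAX_EXTENT are hypothetical; the per-call cap stands in for the single mapping xfs_bmapi_write() returns with nmap = 1. Each call maps one extent, advances lblkno and shrinks blkcnt the way xfs_attr_rmtval_set_blk() does in the patch, and returns -EAGAIN with the state unchanged, so the "loop" is really the caller calling back in until blkcnt reaches zero.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MAX_EXTENT	3	/* hypothetical blocks mapped per call */

struct model_rmt_ctx {
	unsigned int	lblkno;		/* next logical block to map */
	int		blkcnt;		/* blocks still to allocate */
	int		calls;		/* extents mapped so far */
};

/* Map one extent and roll the extent map forwards. */
static void
model_set_blk(struct model_rmt_ctx *c)
{
	int	len = c->blkcnt < MODEL_MAX_EXTENT ? c->blkcnt : MODEL_MAX_EXTENT;

	c->lblkno += len;
	c->blkcnt -= len;
	c->calls++;
}

static int
model_alloc_step(struct model_rmt_ctx *c)
{
	if (c->blkcnt > 0) {
		model_set_blk(c);
		return -EAGAIN;		/* caller finishes deferred work, rolls */
	}
	return 0;			/* all mapped; go write the value */
}
```

With 7 blocks starting at logical block 10, the caller sees three -EAGAIN rounds (3 + 3 + 1 blocks) before the step returns 0.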

> 
> To directly answer your question though, I think the state is still UNINIT
> at this point, since any of the other states would have branched off before
> this.  It's important to note, though, that the functions that have states
> are meant to take ownership of the state machine.  IOW, if the incoming
> state does not apply to the scope of this function or any of its
> subroutines, then this function simply overwrites the state as it sees
> fit.  It doesn't throw an error if it is passed a state that used to belong
> to its parent.  Calling functions should understand that they have
> "surrendered" the state machine to this subfunction until it returns
> something other than -EAGAIN.

That will become very obvious once we've arrived at the end of the
series and everyone must use defer ops. :)

> At least that's the idea.  Honestly, the only
> reason I have UNINIT at all is because we get warnings about setting the
> state to 0 when the enum needs to start at something other than 0.
> 
> Hope that helps?

Yeah.
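
The caller's half of that contract can be sketched as a userspace model loosely based on the xfs_attr_trans_roll() helper that this patch removes from xfs_attr.h. model_dac, model_iter(), model_run() and MODEL_DEFER_FINISH are hypothetical, and the two counters stand in for xfs_defer_finish() and xfs_trans_roll_inode(). The caller never inspects the state; it only reacts to -EAGAIN and to the DEFER_FINISH flag.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_DEFER_FINISH	0x01	/* mirrors XFS_DAC_DEFER_FINISH */

struct model_dac {
	unsigned int	flags;
	int		remaining;	/* steps left in the hypothetical op */
	int		defer_finishes;	/* stands in for xfs_defer_finish() */
	int		rolls;		/* stands in for xfs_trans_roll_inode() */
};

/* Hypothetical callee: steps taken at an even count queue deferred work. */
static int
model_iter(struct model_dac *dac)
{
	if (dac->remaining == 0)
		return 0;
	if (dac->remaining % 2 == 0)
		dac->flags |= MODEL_DEFER_FINISH;
	dac->remaining--;
	return dac->remaining ? -EAGAIN : 0;
}

/* The caller owns the transaction; the callee owns the state. */
static int
model_run(struct model_dac *dac)
{
	int	error;

	for (;;) {
		error = model_iter(dac);
		if (error != -EAGAIN)
			return error;
		if (dac->flags & MODEL_DEFER_FINISH) {
			dac->flags &= ~MODEL_DEFER_FINISH;
			dac->defer_finishes++;
		} else {
			dac->rolls++;
		}
	}
}
```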

> 
> 
> 
> > >   	}
> > > +	error = xfs_attr_rmtval_set_value(args);
> > > +	if (error)
> > > +		return error;
> > > +
> > >   	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
> > >   		/*
> > >   		 * Added a "remote" value, just clear the incomplete flag.
> > > @@ -777,29 +868,29 @@ xfs_attr_leaf_addname(
> > >   	 * In a separate transaction, set the incomplete flag on the "old" attr
> > >   	 * and clear the incomplete flag on the "new" attr.
> > >   	 */
> > > -
> > >   	error = xfs_attr3_leaf_flipflags(args);
> > >   	if (error)
> > >   		return error;
> > >   	/*
> > >   	 * Commit the flag value change and start the next trans in series.
> > >   	 */
> > > -	error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > -	if (error)
> > > -		return error;
> > > -
> > > +	dac->dela_state = XFS_DAS_FLIP_LFLAG;
> > > +	return -EAGAIN;
> > > +das_flip_flag:
> > >   	/*
> > >   	 * Dismantle the "old" attribute/value pair by removing a "remote" value
> > >   	 * (if it exists).
> > >   	 */
> > >   	xfs_attr_restore_rmt_blk(args);
> > > +	error = xfs_attr_rmtval_invalidate(args);
> > > +	if (error)
> > > +		return error;
> > > +das_rm_lblk:
> > >   	if (args->rmtblkno) {
> > > -		error = xfs_attr_rmtval_invalidate(args);
> > > -		if (error)
> > > -			return error;
> > > -
> > > -		error = xfs_attr_rmtval_remove(args);
> > > +		error = __xfs_attr_rmtval_remove(dac);
> > > +		if (error == -EAGAIN)
> > > +			dac->dela_state = XFS_DAS_RM_LBLK;
> > >   		if (error)
> > >   			return error;
> > >   	}
> > > @@ -965,23 +1056,38 @@ xfs_attr_node_hasname(
> > >    *
> > >    * "Remote" attribute values confuse the issue and atomic rename operations
> > >    * add a whole extra layer of confusion on top of that.
> > > + *
> > > + * This routine is meant to function as a delayed operation, and may return
> > > + * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
> > > + * to handle this, and recall the function until a successful error code is
> > > + *returned.
> > >    */
> > >   STATIC int
> > >   xfs_attr_node_addname(
> > > -	struct xfs_da_args	*args)
> > > +	struct xfs_delattr_context	*dac)
> > >   {
> > > -	struct xfs_da_state	*state;
> > > -	struct xfs_da_state_blk	*blk;
> > > -	struct xfs_inode	*dp;
> > > -	int			retval, error;
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	struct xfs_da_state		*state = NULL;
> > > +	struct xfs_da_state_blk		*blk;
> > > +	int				retval = 0;
> > > +	int				error = 0;
> > >   	trace_xfs_attr_node_addname(args);
> > > -	/*
> > > -	 * Fill in bucket of arguments/results/context to carry around.
> > > -	 */
> > > -	dp = args->dp;
> > > -restart:
> > > +	/* State machine switch */
> > > +	switch (dac->dela_state) {
> > > +	case XFS_DAS_FLIP_NFLAG:
> > > +		goto das_flip_flag;
> > > +	case XFS_DAS_FOUND_NBLK:
> > > +		goto das_found_nblk;
> > > +	case XFS_DAS_ALLOC_NODE:
> > > +		goto das_alloc_node;
> > > +	case XFS_DAS_RM_NBLK:
> > > +		goto das_rm_nblk;
> > > +	default:
> > > +		break;
> > > +	}
> > > +
> > >   	/*
> > >   	 * Search to see if name already exists, and get back a pointer
> > >   	 * to where it should go.
> > > @@ -1027,19 +1133,13 @@ xfs_attr_node_addname(
> > >   			error = xfs_attr3_leaf_to_node(args);
> > >   			if (error)
> > >   				goto out;
> > > -			error = xfs_defer_finish(&args->trans);
> > > -			if (error)
> > > -				goto out;
> > >   			/*
> > > -			 * Commit the node conversion and start the next
> > > -			 * trans in the chain.
> > > +			 * Restart routine from the top.  No need to set  the
> > > +			 * state
> > >   			 */
> > > -			error = xfs_trans_roll_inode(&args->trans, dp);
> > > -			if (error)
> > > -				goto out;
> > > -
> > > -			goto restart;
> > > +			dac->flags |= XFS_DAC_DEFER_FINISH;
> > > +			return -EAGAIN;
> > 
> > What state are we in here?  Are we still in the same state that we were
> > at the start of the function, but ready to try xfs_attr3_leaf_add again?
> To directly answer the question: we may be in UNINIT if we were already a
> node when we started the attr op.  If we had to promote from leaf to node,
> it may be some state left over from the leaf routines.
> 
> Again though, as far as this routine is concerned, the idea is that the
> state is either one of the cases in the switch up top, or it's not.

<nod>  Comment please. :)

> > 
> > >   		}
> > >   		/*
> > > @@ -1051,9 +1151,7 @@ xfs_attr_node_addname(
> > >   		error = xfs_da3_split(state);
> > >   		if (error)
> > >   			goto out;
> > > -		error = xfs_defer_finish(&args->trans);
> > > -		if (error)
> > > -			goto out;
> > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > >   	} else {
> > >   		/*
> > >   		 * Addition succeeded, update Btree hashvals.
> > > @@ -1068,13 +1166,9 @@ xfs_attr_node_addname(
> > >   	xfs_da_state_free(state);
> > >   	state = NULL;
> > > -	/*
> > > -	 * Commit the leaf addition or btree split and start the next
> > > -	 * trans in the chain.
> > > -	 */
> > > -	error = xfs_trans_roll_inode(&args->trans, dp);
> > > -	if (error)
> > > -		goto out;
> > > +	dac->dela_state = XFS_DAS_FOUND_NBLK;
> > > +	return -EAGAIN;
> > > +das_found_nblk:
> > >   	/*
> > >   	 * If there was an out-of-line value, allocate the blocks we
> > > @@ -1083,7 +1177,27 @@ xfs_attr_node_addname(
> > >   	 * maximum size of a transaction and/or hit a deadlock.
> > >   	 */
> > >   	if (args->rmtblkno > 0) {
> > > -		error = xfs_attr_rmtval_set(args);
> > > +		/* Open coded xfs_attr_rmtval_set without trans handling */
> > > +		error = xfs_attr_rmtval_find_space(dac);
> > > +		if (error)
> > > +			return error;
> > > +
> > > +		/*
> > > +		 * Roll through the "value", allocating blocks on disk as
> > > +		 * required.
> > > +		 */
> > > +das_alloc_node:
> > > +		if (dac->blkcnt > 0) {
> > > +			error = xfs_attr_rmtval_set_blk(dac);
> > > +			if (error)
> > > +				return error;
> > > +
> > > +			dac->flags |= XFS_DAC_DEFER_FINISH;
> > > +			dac->dela_state = XFS_DAS_ALLOC_NODE;
> > > +			return -EAGAIN;
> > > +		}
> > > +
> > > +		error = xfs_attr_rmtval_set_value(args);
> > >   		if (error)
> > >   			return error;
> > >   	}
> > > @@ -1113,22 +1227,28 @@ xfs_attr_node_addname(
> > >   	/*
> > >   	 * Commit the flag value change and start the next trans in series
> > >   	 */
> > > -	error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > -	if (error)
> > > -		goto out;
> > > -
> > > +	dac->dela_state = XFS_DAS_FLIP_NFLAG;
> > > +	return -EAGAIN;
> > > +das_flip_flag:
> > >   	/*
> > >   	 * Dismantle the "old" attribute/value pair by removing a "remote" value
> > >   	 * (if it exists).
> > >   	 */
> > >   	xfs_attr_restore_rmt_blk(args);
> > > +	error = xfs_attr_rmtval_invalidate(args);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +das_rm_nblk:
> > >   	if (args->rmtblkno) {
> > > -		error = xfs_attr_rmtval_invalidate(args);
> > > -		if (error)
> > > -			return error;
> > > +		error = __xfs_attr_rmtval_remove(dac);
> > > +
> > > +		if (error == -EAGAIN) {
> > > +			dac->dela_state = XFS_DAS_RM_NBLK;
> > > +			return -EAGAIN;
> > > +		}
> > > -		error = xfs_attr_rmtval_remove(args);
> > >   		if (error)
> > >   			return error;
> > >   	}
> > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > index 64dcf0f..501f9df 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > @@ -106,6 +106,118 @@ struct xfs_attr_list_context {
> > >    *	                                      v         │
> > >    *	                                     done <─────┘
> > >    *
> > > + *
> > > + * Below is a state machine diagram for attr set operations.
> > > + *
> > > + *  xfs_attr_set_iter()
> > > + *             │
> > > + *             v
> > 
> > I think this diagram is missing the part where we attempt to add a
> > shortform attr?
> I left it out because the short form doesn't make use of states.  I can
> doodle that in though if you prefer:
> 
>       ┌───n── is shortform?
>       │            |
>       │            y
>       │            |
>       │            V
>       │   xfs_attr_set_shortform
>       │            |
>       │            V
>       ├───n─── had enough
>       │          space?
>       │            │
>       │            y
>       │            │
>       │            V
>       │           done
>       └────────────┐
>                    │
>                    V

Yes, please do capture the entire mechanism so that 2025 us aren't
sitting here muttering about why we didn't do that when everything was
still warm in our L1 brain cache. ;)
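
For the shortform branch of that diagram, a minimal userspace sketch (model_attr_fork, model_set_iter(), MODEL_SF_MAX and MODEL_LEAF_MAX are hypothetical). Following the outline earlier in the thread, where the shortform call comes first and a held leaf buffer is released on a later pass, the shortform attempt in this model either succeeds outright or converts the fork to a leaf, keeps the new leaf buffer held across the roll, and returns -EAGAIN; the next call drops the hold and continues down the leaf path. Promotion from leaf to node is left out.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_SF_MAX	1	/* hypothetical shortform capacity */
#define MODEL_LEAF_MAX	4	/* hypothetical leaf capacity */

enum model_fmt { FMT_SHORT, FMT_LEAF };

struct model_attr_fork {
	enum model_fmt	fmt;
	int		nattrs;
	int		leaf_held;	/* new leaf buffer held across a roll */
};

static int
model_set_iter(struct model_attr_fork *f)
{
	if (f->fmt == FMT_SHORT) {
		if (f->nattrs < MODEL_SF_MAX) {
			f->nattrs++;		/* had enough space: done */
			return 0;
		}
		/* Convert to a leaf, keep the buffer held, roll. */
		f->fmt = FMT_LEAF;
		f->leaf_held = 1;
		return -EAGAIN;
	}

	/* Leaf path: first drop any hold left by the conversion. */
	f->leaf_held = 0;
	if (f->nattrs >= MODEL_LEAF_MAX)
		return -ENOSPC;			/* node promotion not modeled */
	f->nattrs++;
	return 0;
}
```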

--D

> 
> > 
> > --D
> 
> Thx for the thorough reviews!
> 
> Allison
> 
> > 
> > > + *   ┌───n── fork has
> > > + *   │	    only 1 blk?
> > > + *   │		│
> > > + *   │		y
> > > + *   │		│
> > > + *   │		v
> > > + *   │	xfs_attr_leaf_try_add()
> > > + *   │		│
> > > + *   │		v
> > > + *   │	     had enough
> > > + *   ├───n────space?
> > > + *   │		│
> > > + *   │		y
> > > + *   │		│
> > > + *   │		v
> > > + *   │	XFS_DAS_FOUND_LBLK ──┐
> > > + *   │	                     │
> > > + *   │	XFS_DAS_FLIP_LFLAG ──┤
> > > + *   │	(subroutine state)   │
> > > + *   │		             │
> > > + *   │		             └─>xfs_attr_leaf_addname()
> > > + *   │		                      │
> > > + *   │		                      v
> > > + *   │		                   was this
> > > + *   │		                   a rename? ──n─┐
> > > + *   │		                      │          │
> > > + *   │		                      y          │
> > > + *   │		                      │          │
> > > + *   │		                      v          │
> > > + *   │		                flip incomplete  │
> > > + *   │		                    flag         │
> > > + *   │		                      │          │
> > > + *   │		                      v          │
> > > + *   │		              XFS_DAS_FLIP_LFLAG │
> > > + *   │		                      │          │
> > > + *   │		                      v          │
> > > + *   │		                    remove       │
> > > + *   │		XFS_DAS_RM_LBLK ─> old name      │
> > > + *   │		         ^            │          │
> > > + *   │		         │            v          │
> > > + *   │		         └──────y── more to      │
> > > + *   │		                    remove       │
> > > + *   │		                      │          │
> > > + *   │		                      n          │
> > > + *   │		                      │          │
> > > + *   │		                      v          │
> > > + *   │		                     done <──────┘
> > > + *   └──> XFS_DAS_FOUND_NBLK ──┐
> > > + *	  (subroutine state)   │
> > > + *	                       │
> > > + *	  XFS_DAS_ALLOC_NODE ──┤
> > > + *	  (subroutine state)   │
> > > + *	                       │
> > > + *	  XFS_DAS_FLIP_NFLAG ──┤
> > > + *	  (subroutine state)   │
> > > + *	                       │
> > > + *	                       └─>xfs_attr_node_addname()
> > > + *	                               │
> > > + *	                               v
> > > + *	                       find space to store
> > > + *	                      attr. Split if needed
> > > + *	                               │
> > > + *	                               v
> > > + *	                       XFS_DAS_FOUND_NBLK
> > > + *	                               │
> > > + *	                               v
> > > + *	                 ┌─────n──  need to
> > > + *	                 │        alloc blks?
> > > + *	                 │             │
> > > + *	                 │             y
> > > + *	                 │             │
> > > + *	                 │             v
> > > + *	                 │  ┌─>XFS_DAS_ALLOC_NODE
> > > + *	                 │  │          │
> > > + *	                 │  │          v
> > > + *	                 │  └──y── need to alloc
> > > + *	                 │         more blocks?
> > > + *	                 │             │
> > > + *	                 │             n
> > > + *	                 │             │
> > > + *	                 │             v
> > > + *	                 │          was this
> > > + *	                 └────────> a rename? ──n─┐
> > > + *	                               │          │
> > > + *	                               y          │
> > > + *	                               │          │
> > > + *	                               v          │
> > > + *	                         flip incomplete  │
> > > + *	                             flag         │
> > > + *	                               │          │
> > > + *	                               v          │
> > > + *	                       XFS_DAS_FLIP_NFLAG │
> > > + *	                               │          │
> > > + *	                               v          │
> > > + *	                             remove       │
> > > + *	         XFS_DAS_RM_NBLK ─> old name      │
> > > + *	                  ^            │          │
> > > + *	                  │            v          │
> > > + *	                  └──────y── more to      │
> > > + *	                             remove       │
> > > + *	                               │          │
> > > + *	                               n          │
> > > + *	                               │          │
> > > + *	                               v          │
> > > + *	                              done <──────┘
> > > + *
> > >    */
> > >   /*
> > > @@ -120,6 +232,13 @@ struct xfs_attr_list_context {
> > >   enum xfs_delattr_state {
> > >   	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
> > >   	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
> > > +	XFS_DAS_FOUND_LBLK,	      /* We found leaf blk for attr */
> > > +	XFS_DAS_FOUND_NBLK,	      /* We found node blk for attr */
> > > +	XFS_DAS_FLIP_LFLAG,	      /* Flipped leaf INCOMPLETE attr flag */
> > > +	XFS_DAS_RM_LBLK,	      /* A rename is removing leaf blocks */
> > > +	XFS_DAS_ALLOC_NODE,	      /* We are allocating node blocks */
> > > +	XFS_DAS_FLIP_NFLAG,	      /* Flipped node INCOMPLETE attr flag */
> > > +	XFS_DAS_RM_NBLK,	      /* A rename is removing node blocks */
> > >   };
> > >   /*
> > > @@ -127,6 +246,7 @@ enum xfs_delattr_state {
> > >    */
> > >   #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> > >   #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> > > +#define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
> > >   /*
> > >    * Context used for keeping track of delayed attribute operations
> > > @@ -134,6 +254,11 @@ enum xfs_delattr_state {
> > >   struct xfs_delattr_context {
> > >   	struct xfs_da_args      *da_args;
> > > +	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
> > > +	struct xfs_bmbt_irec	map;
> > > +	xfs_dablk_t		lblkno;
> > > +	int			blkcnt;
> > > +
> > >   	/* Used in xfs_attr_node_removename to roll through removing blocks */
> > >   	struct xfs_da_state     *da_state;
> > > @@ -160,7 +285,6 @@ int xfs_attr_set_args(struct xfs_da_args *args);
> > >   int xfs_has_attr(struct xfs_da_args *args);
> > >   int xfs_attr_remove_args(struct xfs_da_args *args);
> > >   int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> > > -int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
> > >   bool xfs_attr_namecheck(const void *name, size_t length);
> > >   void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> > >   			      struct xfs_da_args *args);
> > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> > > index 1426c15..5b445e7 100644
> > > --- a/fs/xfs/libxfs/xfs_attr_remote.c
> > > +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> > > @@ -441,7 +441,7 @@ xfs_attr_rmtval_get(
> > >    * Find a "hole" in the attribute address space large enough for us to drop the
> > >    * new attribute's value into
> > >    */
> > > -STATIC int
> > > +int
> > >   xfs_attr_rmt_find_hole(
> > >   	struct xfs_da_args	*args)
> > >   {
> > > @@ -468,7 +468,7 @@ xfs_attr_rmt_find_hole(
> > >   	return 0;
> > >   }
> > > -STATIC int
> > > +int
> > >   xfs_attr_rmtval_set_value(
> > >   	struct xfs_da_args	*args)
> > >   {
> > > @@ -628,6 +628,69 @@ xfs_attr_rmtval_set(
> > >   }
> > >   /*
> > > + * Find a hole for the attr and store it in the delayed attr context.  This
> > > + * initializes the context to roll through allocating an attr extent for a
> > > + * delayed attr operation
> > > + */
> > > +int
> > > +xfs_attr_rmtval_find_space(
> > > +	struct xfs_delattr_context	*dac)
> > > +{
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	struct xfs_bmbt_irec		*map = &dac->map;
> > > +	int				error;
> > > +
> > > +	dac->lblkno = 0;
> > > +	dac->blkcnt = 0;
> > > +	args->rmtblkcnt = 0;
> > > +	args->rmtblkno = 0;
> > > +	memset(map, 0, sizeof(struct xfs_bmbt_irec));
> > > +
> > > +	error = xfs_attr_rmt_find_hole(args);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	dac->blkcnt = args->rmtblkcnt;
> > > +	dac->lblkno = args->rmtblkno;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/*
> > > + * Write one block of the value associated with an attribute into the
> > > + * out-of-line buffer that we have defined for it. This is similar to a subset
> > > + * of xfs_attr_rmtval_set, but records the current block to the delayed attr
> > > + * context, and leaves transaction handling to the caller.
> > > + */
> > > +int
> > > +xfs_attr_rmtval_set_blk(
> > > +	struct xfs_delattr_context	*dac)
> > > +{
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	struct xfs_inode		*dp = args->dp;
> > > +	struct xfs_bmbt_irec		*map = &dac->map;
> > > +	int nmap;
> > > +	int error;
> > > +
> > > +	nmap = 1;
> > > +	error = xfs_bmapi_write(args->trans, dp, (xfs_fileoff_t)dac->lblkno,
> > > +				dac->blkcnt, XFS_BMAPI_ATTRFORK, args->total,
> > > +				map, &nmap);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	ASSERT(nmap == 1);
> > > +	ASSERT((map->br_startblock != DELAYSTARTBLOCK) &&
> > > +	       (map->br_startblock != HOLESTARTBLOCK));
> > > +
> > > +	/* roll attribute extent map forwards */
> > > +	dac->lblkno += map->br_blockcount;
> > > +	dac->blkcnt -= map->br_blockcount;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/*
> > >    * Remove the value associated with an attribute by deleting the
> > >    * out-of-line buffer that it is stored on.
> > >    */
> > > @@ -669,38 +732,6 @@ xfs_attr_rmtval_invalidate(
> > >   }
> > >   /*
> > > - * Remove the value associated with an attribute by deleting the
> > > - * out-of-line buffer that it is stored on.
> > > - */
> > > -int
> > > -xfs_attr_rmtval_remove(
> > > -	struct xfs_da_args		*args)
> > > -{
> > > -	int				error;
> > > -	struct xfs_delattr_context	dac  = {
> > > -		.da_args	= args,
> > > -	};
> > > -
> > > -	trace_xfs_attr_rmtval_remove(args);
> > > -
> > > -	/*
> > > -	 * Keep de-allocating extents until the remote-value region is gone.
> > > -	 */
> > > -	do {
> > > -		error = __xfs_attr_rmtval_remove(&dac);
> > > -		if (error != -EAGAIN)
> > > -			break;
> > > -
> > > -		error = xfs_attr_trans_roll(&dac);
> > > -		if (error)
> > > -			return error;
> > > -
> > > -	} while (true);
> > > -
> > > -	return error;
> > > -}
> > > -
> > > -/*
> > >    * Remove the value associated with an attribute by deleting the out-of-line
> > >    * buffer that it is stored on. Returns EAGAIN for the caller to refresh the
> > >    * transaction and re-call the function
> > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> > > index 002fd30..84e2700 100644
> > > --- a/fs/xfs/libxfs/xfs_attr_remote.h
> > > +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> > > @@ -15,4 +15,8 @@ int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
> > >   		xfs_buf_flags_t incore_flags);
> > >   int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> > >   int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> > > +int xfs_attr_rmt_find_hole(struct xfs_da_args *args);
> > > +int xfs_attr_rmtval_set_value(struct xfs_da_args *args);
> > > +int xfs_attr_rmtval_set_blk(struct xfs_delattr_context *dac);
> > > +int xfs_attr_rmtval_find_space(struct xfs_delattr_context *dac);
> > >   #endif /* __XFS_ATTR_REMOTE_H__ */
> > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > index 8695165..e9dde4e 100644
> > > --- a/fs/xfs/xfs_trace.h
> > > +++ b/fs/xfs/xfs_trace.h
> > > @@ -1925,7 +1925,6 @@ DEFINE_ATTR_EVENT(xfs_attr_refillstate);
> > >   DEFINE_ATTR_EVENT(xfs_attr_rmtval_get);
> > >   DEFINE_ATTR_EVENT(xfs_attr_rmtval_set);
> > > -DEFINE_ATTR_EVENT(xfs_attr_rmtval_remove);
> > >   #define DEFINE_DA_EVENT(name) \
> > >   DEFINE_EVENT(xfs_da_class, name, \
> > > -- 
> > > 2.7.4
> > > 
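
The rename branch of the leaf half of the set diagram in this patch (XFS_DAS_FLIP_LFLAG, then XFS_DAS_RM_LBLK repeated while old remote blocks remain) can be modeled in userspace as follows. model_op, model_leaf_set() and the M_* states are hypothetical, and the old_rmt_blocks counter stands in for the work __xfs_attr_rmtval_remove() does per -EAGAIN round.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states mirroring the leaf half of the set diagram. */
enum model_state { M_FOUND_LBLK, M_FLIP_LFLAG, M_RM_LBLK, M_DONE };

struct model_op {
	enum model_state	state;
	int			is_rename;	/* XFS_DA_OP_RENAME analogue */
	int			old_rmt_blocks;	/* old value's remote blocks */
};

static int
model_leaf_set(struct model_op *op)
{
	switch (op->state) {
	case M_FLIP_LFLAG:
		goto flip_flag;
	case M_RM_LBLK:
		goto rm_lblk;
	default:
		break;
	}

	/* The new name and value are in place at this point. */
	if (!op->is_rename) {
		op->state = M_DONE;
		return 0;
	}

	/* Flip the INCOMPLETE flags on the old and new attrs, then roll. */
	op->state = M_FLIP_LFLAG;
	return -EAGAIN;
flip_flag:
	/* In the patch, xfs_attr_rmtval_invalidate() runs here. */
rm_lblk:
	if (op->old_rmt_blocks > 0) {
		op->old_rmt_blocks--;	/* one unit of removal per roll */
		op->state = M_RM_LBLK;
		return -EAGAIN;
	}

	/* Finally remove the old name. */
	op->state = M_DONE;
	return 0;
}
```

A rename with two old remote blocks takes three -EAGAIN rounds (one flag flip, two removals); a plain set finishes on the first call.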


* Re: [PATCH v13 05/10] xfs: Set up infastructure for deferred attribute operations
  2020-11-13  1:32     ` Allison Henderson
@ 2020-11-14  2:00       ` Darrick J. Wong
  2020-11-16  7:41         ` Allison Henderson
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-14  2:00 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Nov 12, 2020 at 06:32:13PM -0700, Allison Henderson wrote:
> 
> 
> On 11/10/20 2:51 PM, Darrick J. Wong wrote:
> > On Thu, Oct 22, 2020 at 11:34:30PM -0700, Allison Henderson wrote:
> > > Currently attributes are modified directly across one or more
> > > transactions. But they are not logged or replayed in the event of an
> > > error. The goal of delayed attributes is to enable logging and replaying
> > > of attribute operations using the existing delayed operations
> > > infrastructure.  This will later enable the attributes to become part of
> > > larger multi part operations that also must first be recorded to the
> > > log.  This is mostly of interest in the scheme of parent pointers which
> > > would need to maintain an attribute containing parent inode information
> > > any time an inode is moved, created, or removed.  Parent pointers would
> > > then be of interest to any feature that would need to quickly derive an
> > > inode path from the mount point. Online scrub, nfs lookups and fs grow
> > > or shrink operations are all features that could take advantage of this.
> > > 
> > > This patch adds two new log item types for setting or removing
> > > attributes as deferred operations.  The xfs_attri_log_item logs an
> > > intent to set or remove an attribute.  The corresponding
> > > xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
> > > freed once the transaction is done.  Both log items use a generic
> > > xfs_attr_log_format structure that contains the attribute name, value,
> > > flags, inode, and an op_flag that indicates if the operation is a set
> > > or remove.
> > > 
> > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > ---
> > >   fs/xfs/Makefile                 |   1 +
> > >   fs/xfs/libxfs/xfs_attr.c        |   7 +-
> > >   fs/xfs/libxfs/xfs_attr.h        |  19 +
> > >   fs/xfs/libxfs/xfs_defer.c       |   1 +
> > >   fs/xfs/libxfs/xfs_defer.h       |   3 +
> > >   fs/xfs/libxfs/xfs_format.h      |   5 +
> > >   fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
> > >   fs/xfs/libxfs/xfs_log_recover.h |   2 +
> > >   fs/xfs/libxfs/xfs_types.h       |   1 +
> > >   fs/xfs/scrub/common.c           |   2 +
> > >   fs/xfs/xfs_acl.c                |   2 +
> > >   fs/xfs/xfs_attr_item.c          | 750 ++++++++++++++++++++++++++++++++++++++++
> > >   fs/xfs/xfs_attr_item.h          |  76 ++++
> > >   fs/xfs/xfs_attr_list.c          |   1 +
> > >   fs/xfs/xfs_ioctl.c              |   2 +
> > >   fs/xfs/xfs_ioctl32.c            |   2 +
> > >   fs/xfs/xfs_iops.c               |   2 +
> > >   fs/xfs/xfs_log.c                |   4 +
> > >   fs/xfs/xfs_log_recover.c        |   2 +
> > >   fs/xfs/xfs_ondisk.h             |   2 +
> > >   fs/xfs/xfs_xattr.c              |   1 +
> > >   21 files changed, 923 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > > index 04611a1..b056cfc 100644
> > > --- a/fs/xfs/Makefile
> > > +++ b/fs/xfs/Makefile
> > > @@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
> > >   				   xfs_buf_item_recover.o \
> > >   				   xfs_dquot_item_recover.o \
> > >   				   xfs_extfree_item.o \
> > > +				   xfs_attr_item.o \
> > >   				   xfs_icreate_item.o \
> > >   				   xfs_inode_item.o \
> > >   				   xfs_inode_item_recover.o \
> > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > index 6453178..760383c 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > @@ -24,6 +24,7 @@
> > >   #include "xfs_quota.h"
> > >   #include "xfs_trans_space.h"
> > >   #include "xfs_trace.h"
> > > +#include "xfs_attr_item.h"
> > >   /*
> > >    * xfs_attr.c
> > > @@ -59,8 +60,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
> > >   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
> > >   STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> > >   STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
> > > -STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> > > -			     struct xfs_buf **leaf_bp);
> > >   int
> > >   xfs_inode_hasattr(
> > > @@ -142,7 +141,7 @@ xfs_attr_get(
> > >   /*
> > >    * Calculate how many blocks we need for the new attribute,
> > >    */
> > > -STATIC int
> > > +int
> > >   xfs_attr_calc_size(
> > >   	struct xfs_da_args	*args,
> > >   	int			*local)
> > > @@ -327,7 +326,7 @@ xfs_attr_set_args(
> > >    * to handle this, and recall the function until a successful error code is
> > >    * returned.
> > >    */
> > > -STATIC int
> > > +int
> > >   xfs_attr_set_iter(
> > >   	struct xfs_delattr_context	*dac,
> > >   	struct xfs_buf			**leaf_bp)
> > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > index 501f9df..5b4a1ca 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > @@ -247,6 +247,7 @@ enum xfs_delattr_state {
> > >   #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> > >   #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> > >   #define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
> > > +#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
> > >   /*
> > >    * Context used for keeping track of delayed attribute operations
> > > @@ -254,6 +255,9 @@ enum xfs_delattr_state {
> > >   struct xfs_delattr_context {
> > >   	struct xfs_da_args      *da_args;
> > > +	/* Used by delayed attributes to hold leaf across transactions */
> > 
> > "Used by xfs_attr_set to hold a leaf buffer across a transaction roll" ?
> Sure, will update
> 
> > 
> > > +	struct xfs_buf		*leaf_bp;
> > > +
> > >   	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
> > >   	struct xfs_bmbt_irec	map;
> > >   	xfs_dablk_t		lblkno;
> > > @@ -267,6 +271,18 @@ struct xfs_delattr_context {
> > >   	enum xfs_delattr_state  dela_state;
> > >   };
> > > +/*
> > > + * List of attrs to commit later.
> > > + */
> > > +struct xfs_attr_item {
> > > +	struct xfs_delattr_context	xattri_dac;
> > > +	uint32_t			xattri_op_flags;/* attr op set or rm */
> > 
> > The comment for xattri_op_flags should be more direct in mentioning that
> > it takes XFS_ATTR_OP_FLAGS_{SET,REMOVE}.
> Alrighty, will do
> 
> > 
> > (Alternately you could define an enum for the incore state tracker that
> > causes the appropriate XFS_ATTR_OP_FLAG* to be set on the log item in
> > xfs_attr_create_intent to avoid mixing of the flag namespaces, but that
> > is a lot of paper-pushing...)
> > 
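A minimal sketch of the alternative Darrick describes: keep a separate incore enum and translate it to the log-format type code only when the intent is created, so the two flag namespaces never mix. The enum and helper names are hypothetical and not part of the posted patch. The `XFS_ATTR_OP_FLAGS_*` values are copied from the patch's xfs_log_format.h hunk, and -1 stands in for -EFSCORRUPTED.

```c
#include <stdint.h>

/* Log-format values as defined in the patch's xfs_log_format.h hunk. */
#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF	/* Flags type mask */

/* Hypothetical incore op enum; not part of the posted patch. */
enum xfs_attr_incore_op {
	XFS_ATTR_INCORE_SET,
	XFS_ATTR_INCORE_REMOVE,
};

/* Translate the incore op into the ondisk type code (lower byte). */
static uint32_t
xfs_attr_op_to_log_flags(enum xfs_attr_incore_op op)
{
	switch (op) {
	case XFS_ATTR_INCORE_SET:
		return XFS_ATTR_OP_FLAGS_SET;
	case XFS_ATTR_INCORE_REMOVE:
		return XFS_ATTR_OP_FLAGS_REMOVE;
	}
	return 0;	/* unreachable for valid input */
}

/* Recovery side: extract the type code, ignoring any upper flag bits. */
static int
xfs_attr_log_flags_to_op(uint32_t op_flags, enum xfs_attr_incore_op *op)
{
	switch (op_flags & XFS_ATTR_OP_FLAGS_TYPE_MASK) {
	case XFS_ATTR_OP_FLAGS_SET:
		*op = XFS_ATTR_INCORE_SET;
		return 0;
	case XFS_ATTR_OP_FLAGS_REMOVE:
		*op = XFS_ATTR_INCORE_REMOVE;
		return 0;
	default:
		return -1;	/* would be -EFSCORRUPTED in the kernel */
	}
}
```

Masking with the type mask on the recovery side leaves the upper bits free for future flags, which matches the "upper bits are flags, lower byte is type code" comment in the patch.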
> > > +
> > > +	/* used to log this item to an intent */
> > > +	struct list_head		xattri_list;
> > > +};
> > 
> > Ok, so going back to a confusing comment I had from the last series,
> > I'm glad that you've moved all the attr code to be deferred operations.
> > 
> > Can you move all the xfs_delattr_context fields into xfs_attr_item?
> > AFAICT (from git diff'ing the entire branch :P) we never allocate an
> > xfs_delattr_context on its own; we only ever access the one that's
> > embedded in xfs_attr_item, right?
> Well, xfs_delattr_context is used earlier in the set by the top level
> routines xfs_attr_set/remove_args.  If we did this, it would pull the
> attr_item into the lower part of the "delay ready" subseries, and I think
> people really just wanted that part to be "refactor only" to make the
> reviewing easier.
> 
> How about an extra patch at the end that merges these structs once those
> high level functions are backed out?  That way we're not trying to
> introduce the log items before this patch.  That seems like a reasonable
> way to phase in the end result.

Yes.

> Also, such a change would imply that a lot of these lower level attr
> routines that are sensitive to the state machine mechanics would no longer
> pass around an xfs_delattr_context; they would take an xfs_attr_item
> instead.  Not entirely sure how people would feel about that, but again, I
> figure if we save it for the end, it's easy to take it or leave it without
> causing too much surgery below.

Yes.  The major transformation of this patchset is to establish that
high level xfs functionality is supposed to use defer ops to stage
complex metadata updates instead of open-coding transaction rolling and
state management like it has done historically.

And, as you've undoubtedly noticed from implementing the attr item, that
also means that we can make those complex operations restartable in the
event of a system failure.

Also: When the log item is enabled, we hold the inode locked across an
entire xattr update /and/ can restart interrupted operations.  I think
this means that you can skip all the INCOMPLETE flag handling bs, since
that flag only exists to ensure that we only ever present exactly one
(key, value) tuple to userspace.
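As a rough illustration of the proposed merge (folding the xfs_delattr_context fields into xfs_attr_item), here is a minimal userspace sketch. The struct, field, and helper names are hypothetical stand-ins, and container_of is reimplemented with offsetof because the kernel macro is not available outside the kernel. It shows that a ->finish_item style callback can recover the whole item, state included, from the list pointer it is handed.

```c
#include <stddef.h>

/* Minimal stand-ins for kernel types; names are hypothetical. */
struct list_head { struct list_head *next, *prev; };
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct da_args_stub { int namelen; };

/*
 * Hypothetical merged item: the former xfs_delattr_context fields live
 * directly in the attr item, so state routines take this one struct.
 */
struct attr_item_sketch {
	struct da_args_stub	*da_args;	/* was dac->da_args */
	void			*leaf_bp;	/* was dac->leaf_bp */
	int			dela_state;	/* was dac->dela_state */
	unsigned int		op_flags;	/* XFS_ATTR_OP_FLAGS_{SET,REMOVE} */
	struct list_head	list;		/* links the item on the defer list */
};

/* What a ->finish_item callback would do with the list pointer it is given. */
static struct attr_item_sketch *
attr_item_from_list(struct list_head *item)
{
	return container_of(item, struct attr_item_sketch, list);
}
```

Because the item is only ever reached through the defer list, one allocation carries both the queued work and its state machine position.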

> > 
> > > +
> > > +
> > >   /*========================================================================
> > >    * Function prototypes for the kernel.
> > >    *========================================================================*/
> > > @@ -282,11 +298,14 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
> > >   int xfs_attr_get(struct xfs_da_args *args);
> > >   int xfs_attr_set(struct xfs_da_args *args);
> > >   int xfs_attr_set_args(struct xfs_da_args *args);
> > > +int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> > > +		      struct xfs_buf **leaf_bp);
> > >   int xfs_has_attr(struct xfs_da_args *args);
> > >   int xfs_attr_remove_args(struct xfs_da_args *args);
> > >   int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> > >   bool xfs_attr_namecheck(const void *name, size_t length);
> > >   void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> > >   			      struct xfs_da_args *args);
> > > +int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
> > >   #endif	/* __XFS_ATTR_H__ */
> > > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > > index eff4a12..e9caff7 100644
> > > --- a/fs/xfs/libxfs/xfs_defer.c
> > > +++ b/fs/xfs/libxfs/xfs_defer.c
> > > @@ -178,6 +178,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
> > >   	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
> > >   	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
> > >   	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
> > > +	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
> > >   };
> > >   static void
> > > diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> > > index 05472f7..72a5789 100644
> > > --- a/fs/xfs/libxfs/xfs_defer.h
> > > +++ b/fs/xfs/libxfs/xfs_defer.h
> > > @@ -19,6 +19,7 @@ enum xfs_defer_ops_type {
> > >   	XFS_DEFER_OPS_TYPE_RMAP,
> > >   	XFS_DEFER_OPS_TYPE_FREE,
> > >   	XFS_DEFER_OPS_TYPE_AGFL_FREE,
> > > +	XFS_DEFER_OPS_TYPE_ATTR,
> > >   	XFS_DEFER_OPS_TYPE_MAX,
> > >   };
> > > @@ -63,6 +64,8 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
> > >   extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
> > >   extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
> > >   extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
> > > +extern const struct xfs_defer_op_type xfs_attr_defer_type;
> > > +
> > >   /*
> > >    * This structure enables a dfops user to detach the chain of deferred
> > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > > index dd764da..d419c34 100644
> > > --- a/fs/xfs/libxfs/xfs_format.h
> > > +++ b/fs/xfs/libxfs/xfs_format.h
> > > @@ -584,6 +584,11 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
> > >   		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT);
> > >   }
> > > +static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)
> > > +{
> > > +	return false;
> > > +}
> > > +
> > >   /*
> > >    * end of superblock version macros
> > >    */
> > > diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> > > index 8bd00da..de6309d 100644
> > > --- a/fs/xfs/libxfs/xfs_log_format.h
> > > +++ b/fs/xfs/libxfs/xfs_log_format.h
> > > @@ -117,7 +117,12 @@ struct xfs_unmount_log_format {
> > >   #define XLOG_REG_TYPE_CUD_FORMAT	24
> > >   #define XLOG_REG_TYPE_BUI_FORMAT	25
> > >   #define XLOG_REG_TYPE_BUD_FORMAT	26
> > > -#define XLOG_REG_TYPE_MAX		26
> > > +#define XLOG_REG_TYPE_ATTRI_FORMAT	27
> > > +#define XLOG_REG_TYPE_ATTRD_FORMAT	28
> > > +#define XLOG_REG_TYPE_ATTR_NAME	29
> > > +#define XLOG_REG_TYPE_ATTR_VALUE	30
> > > +#define XLOG_REG_TYPE_MAX		30
> > > +
> > >   /*
> > >    * Flags to log operation header
> > > @@ -240,6 +245,8 @@ typedef struct xfs_trans_header {
> > >   #define	XFS_LI_CUD		0x1243
> > >   #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
> > >   #define	XFS_LI_BUD		0x1245
> > > +#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
> > > +#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
> > >   #define XFS_LI_TYPE_DESC \
> > >   	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
> > > @@ -255,7 +262,9 @@ typedef struct xfs_trans_header {
> > >   	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
> > >   	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
> > >   	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
> > > -	{ XFS_LI_BUD,		"XFS_LI_BUD" }
> > > +	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
> > > +	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
> > > +	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
> > >   /*
> > >    * Inode Log Item Format definitions.
> > > @@ -863,4 +872,35 @@ struct xfs_icreate_log {
> > >   	__be32		icl_gen;	/* inode generation number to use */
> > >   };
> > > +/*
> > > + * Flags for deferred attribute operations.
> > > + * Upper bits are flags, lower byte is type code
> > > + */
> > > +#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
> > > +#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
> > > +#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF	/* Flags type mask */
> > > +
> > > +/*
> > > + * This is the structure used to lay out an attr log item in the
> > > + * log.
> > > + */
> > > +struct xfs_attri_log_format {
> > > +	uint16_t	alfi_type;	/* attri log item type */
> > > +	uint16_t	alfi_size;	/* size of this item */
> > > +	uint32_t	__pad;		/* pad to 64 bit aligned */
> > > +	uint64_t	alfi_id;	/* attri identifier */
> > > +	xfs_ino_t	alfi_ino;	/* the inode for this attr operation */
> > 
> > This is an ondisk structure; please use only explicitly sized data
> > types like uint64_t.
> Ok, will update
> 
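A sketch of the ondisk structures with only explicitly sized types, plus compile-time size checks in the spirit of XFS_CHECK_STRUCT_SIZE in xfs_ondisk.h. The expected sizes (40 and 16 bytes) are computed here from the field lists shown in the patch, with `xfs_ino_t` replaced by `uint64_t`; they are not quoted from the patch itself.

```c
#include <stdint.h>

/* Explicitly sized ondisk layout; xfs_ino_t replaced by uint64_t. */
struct xfs_attri_log_format {
	uint16_t	alfi_type;	/* attri log item type */
	uint16_t	alfi_size;	/* size of this item */
	uint32_t	__pad;		/* pad to 64 bit aligned */
	uint64_t	alfi_id;	/* attri identifier */
	uint64_t	alfi_ino;	/* the inode for this attr operation */
	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
	uint32_t	alfi_name_len;	/* attr name length */
	uint32_t	alfi_value_len;	/* attr value length */
	uint32_t	alfi_attr_flags;/* attr flags */
};

struct xfs_attrd_log_format {
	uint16_t	alfd_type;	/* attrd log item type */
	uint16_t	alfd_size;	/* size of this item */
	uint32_t	__pad;		/* pad to 64 bit aligned */
	uint64_t	alfd_alf_id;	/* id of corresponding attri */
};

/* Compile-time layout checks: 8 + 8 + 8 + 4 * 4 = 40, and 8 + 8 = 16. */
_Static_assert(sizeof(struct xfs_attri_log_format) == 40, "attri size");
_Static_assert(sizeof(struct xfs_attrd_log_format) == 16, "attrd size");
```

Pinning the sizes at build time catches an accidental layout change before it reaches the log.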
> > 
> > > +	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
> > > +	uint32_t	alfi_name_len;	/* attr name length */
> > > +	uint32_t	alfi_value_len;	/* attr value length */
> > > +	uint32_t	alfi_attr_flags;/* attr flags */
> > > +};
> > > +
> > > +struct xfs_attrd_log_format {
> > > +	uint16_t	alfd_type;	/* attrd log item type */
> > > +	uint16_t	alfd_size;	/* size of this item */
> > > +	uint32_t	__pad;		/* pad to 64 bit aligned */
> > > +	uint64_t	alfd_alf_id;	/* id of corresponding attrd */
> > 
> > "..of corresponding attri"
> Yes, corresponding attri :-)
> 
> > 
> > > +};
> > > +
> > >   #endif /* __XFS_LOG_FORMAT_H__ */
> > > diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
> > > index 3cca2bf..b6e5514 100644
> > > --- a/fs/xfs/libxfs/xfs_log_recover.h
> > > +++ b/fs/xfs/libxfs/xfs_log_recover.h
> > > @@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
> > >   extern const struct xlog_recover_item_ops xlog_rud_item_ops;
> > >   extern const struct xlog_recover_item_ops xlog_cui_item_ops;
> > >   extern const struct xlog_recover_item_ops xlog_cud_item_ops;
> > > +extern const struct xlog_recover_item_ops xlog_attri_item_ops;
> > > +extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
> > >   /*
> > >    * Macros, structures, prototypes for internal log manager use.
> > > diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
> > > index 397d947..860cdd2 100644
> > > --- a/fs/xfs/libxfs/xfs_types.h
> > > +++ b/fs/xfs/libxfs/xfs_types.h
> > > @@ -11,6 +11,7 @@ typedef uint32_t	prid_t;		/* project ID */
> > >   typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
> > >   typedef uint32_t	xfs_agino_t;	/* inode # within allocation grp */
> > >   typedef uint32_t	xfs_extlen_t;	/* extent length in blocks */
> > > +typedef uint32_t	xfs_attrlen_t;	/* attr length */
> > 
> > This doesn't get used anywhere.
> Ok, will clean out.
> 
> > 
> > >   typedef uint32_t	xfs_agnumber_t;	/* allocation group number */
> > >   typedef int32_t		xfs_extnum_t;	/* # of extents in a file */
> > >   typedef int16_t		xfs_aextnum_t;	/* # extents in an attribute fork */
> > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > > index 1887605..9a649d1 100644
> > > --- a/fs/xfs/scrub/common.c
> > > +++ b/fs/xfs/scrub/common.c
> > > @@ -24,6 +24,8 @@
> > >   #include "xfs_rmap_btree.h"
> > >   #include "xfs_log.h"
> > >   #include "xfs_trans_priv.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_reflink.h"
> > >   #include "scrub/scrub.h"
> > > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> > > index c544951..cad1db4 100644
> > > --- a/fs/xfs/xfs_acl.c
> > > +++ b/fs/xfs/xfs_acl.c
> > > @@ -10,6 +10,8 @@
> > >   #include "xfs_trans_resv.h"
> > >   #include "xfs_mount.h"
> > >   #include "xfs_inode.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_trace.h"
> > >   #include "xfs_error.h"
> > > diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> > > new file mode 100644
> > > index 0000000..3980066
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_attr_item.c
> > > @@ -0,0 +1,750 @@
> > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > +/*
> > > + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> > 
> > 2019 -> 2020.
> Will update.  :-)
> 
> > 
> > > + * Author: Allison Collins <allison.henderson@oracle.com>
> > > + */
> > > +
> > > +#include "xfs.h"
> > > +#include "xfs_fs.h"
> > > +#include "xfs_format.h"
> > > +#include "xfs_log_format.h"
> > > +#include "xfs_trans_resv.h"
> > > +#include "xfs_bit.h"
> > > +#include "xfs_shared.h"
> > > +#include "xfs_mount.h"
> > > +#include "xfs_defer.h"
> > > +#include "xfs_trans.h"
> > > +#include "xfs_trans_priv.h"
> > > +#include "xfs_buf_item.h"
> > > +#include "xfs_attr_item.h"
> > > +#include "xfs_log.h"
> > > +#include "xfs_btree.h"
> > > +#include "xfs_rmap.h"
> > > +#include "xfs_inode.h"
> > > +#include "xfs_icache.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > > +#include "xfs_attr.h"
> > > +#include "xfs_shared.h"
> > > +#include "xfs_attr_item.h"
> > > +#include "xfs_alloc.h"
> > > +#include "xfs_bmap.h"
> > > +#include "xfs_trace.h"
> > > +#include "libxfs/xfs_da_format.h"
> > > +#include "xfs_inode.h"
> > > +#include "xfs_quota.h"
> > > +#include "xfs_log_priv.h"
> > > +#include "xfs_log_recover.h"
> > > +
> > > +static const struct xfs_item_ops xfs_attri_item_ops;
> > > +static const struct xfs_item_ops xfs_attrd_item_ops;
> > > +
> > > +static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
> > > +{
> > > +	return container_of(lip, struct xfs_attri_log_item, attri_item);
> > > +}
> > > +
> > > +STATIC void
> > > +xfs_attri_item_free(
> > > +	struct xfs_attri_log_item	*attrip)
> > > +{
> > > +	kmem_free(attrip->attri_item.li_lv_shadow);
> > > +	kmem_free(attrip);
> > > +}
> > > +
> > > +/*
> > > + * Freeing the attrip requires that we remove it from the AIL if it has already
> > > + * been placed there. However, the ATTRI may not yet have been placed in the
> > > + * AIL when called by xfs_attri_release() from ATTRD processing due to the
> > > + * ordering of committed vs unpin operations in bulk insert operations. Hence
> > > + * the reference count to ensure only the last caller frees the ATTRI.
> > > + */
> > > +STATIC void
> > > +xfs_attri_release(
> > > +	struct xfs_attri_log_item	*attrip)
> > > +{
> > > +	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
> > > +	if (atomic_dec_and_test(&attrip->attri_refcount)) {
> > > +		xfs_trans_ail_delete(&attrip->attri_item,
> > > +				     SHUTDOWN_LOG_IO_ERROR);
> > > +		xfs_attri_item_free(attrip);
> > > +	}
> > > +}
> > > +
> > > +/*
> > > + * This returns the number of iovecs needed to log the given attri item. We
> > > + * only need 1 iovec for an attri item.  It just logs the attr_log_format
> > > + * structure.
> > > + */
> > > +static inline int
> > > +xfs_attri_item_sizeof(
> > > +	struct xfs_attri_log_item *attrip)
> > > +{
> > > +	return sizeof(struct xfs_attri_log_format);
> > > +}
> > 
> > Please get rid of this trivial oneliner.
> Sure, I think some of this I added just for reasons of being consistent with
> how the other delayed ops are implemented.
> 
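With the one-liner gone, the size accounting can add the format size directly. This is a minimal sketch under two assumptions: log regions are padded to a 4-byte multiple (the sketch's `attr_nvec_size` is a hypothetical stand-in for ATTR_NVEC_SIZE), and the constant 40 stands in for `sizeof(struct xfs_attri_log_format)`, which the kernel code would use directly.

```c
/* Assumed: log regions are padded to a 4-byte multiple (hypothetical helper). */
static int
attr_nvec_size(int len)
{
	return (len + 3) & ~3;
}

/* Stand-in: only the fields the sizing code reads. */
struct attri_sketch {
	int	name_len;
	int	value_len;
};

/* In the kernel this would be sizeof(struct xfs_attri_log_format). */
#define ATTRI_FORMAT_SIZE	40

static void
attri_item_size(const struct attri_sketch *attrip, int *nvecs, int *nbytes)
{
	/* The format structure itself: use the size directly, no wrapper. */
	*nvecs += 1;
	*nbytes += ATTRI_FORMAT_SIZE;

	/* The name is mandatory for both set and remove. */
	*nvecs += 1;
	*nbytes += attr_nvec_size(attrip->name_len);

	/* A value region exists only when there are value bytes to log. */
	if (attrip->value_len > 0) {
		*nvecs += 1;
		*nbytes += attr_nvec_size(attrip->value_len);
	}
}
```

For example, a 5-byte name with no value needs 2 regions and 40 + 8 = 48 bytes; adding a 10-byte value brings it to 3 regions and 60 bytes.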
> > 
> > > +
> > > +STATIC void
> > > +xfs_attri_item_size(
> > > +	struct xfs_log_item	*lip,
> > > +	int			*nvecs,
> > > +	int			*nbytes)
> > > +{
> > > +	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
> > > +
> > > +	*nvecs += 1;
> > > +	*nbytes += xfs_attri_item_sizeof(attrip);
> > > +
> > > +	/* Attr set and remove operations require a name */
> > > +	ASSERT(attrip->attri_name_len > 0);
> > > +
> > > +	*nvecs += 1;
> > > +	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
> > > +
> > > +	/*
> > > +	 * Set ops can accept a value of 0 len to clear an attr value.  Remove
> > > +	 * ops do not need a value at all.  So only account for the value
> > > +	 * when it is needed.
> > > +	 */
> > > +	if (attrip->attri_value_len > 0) {
> > > +		*nvecs += 1;
> > > +		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
> > > +	}
> > > +}
> > > +
> > > +/*
> > > + * This is called to fill in the log iovecs for the given attri log
> > > + * item. We use  1 iovec for the attri_format_item, 1 for the name, and
> > > + * another for the value if it is present
> > > + */
> > > +STATIC void
> > > +xfs_attri_item_format(
> > > +	struct xfs_log_item	*lip,
> > > +	struct xfs_log_vec	*lv)
> > > +{
> > > +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> > > +	struct xfs_log_iovec		*vecp = NULL;
> > > +
> > > +	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
> > > +	attrip->attri_format.alfi_size = 1;
> > > +
> > > +	/*
> > > +	 * This size accounting must be done before copying the attrip into the
> > > +	 * iovec.  If we do it after, the wrong size will be recorded to the log
> > > +	 * and we trip across assertion checks for bad region sizes later during
> > > +	 * the log recovery.
> > > +	 */
> > > +
> > > +	ASSERT(attrip->attri_name_len > 0);
> > > +	attrip->attri_format.alfi_size++;
> > > +
> > > +	if (attrip->attri_value_len > 0)
> > > +		attrip->attri_format.alfi_size++;
> > > +
> > > +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
> > > +			&attrip->attri_format,
> > > +			xfs_attri_item_sizeof(attrip));
> > > +	if (attrip->attri_name_len > 0)
> > 
> > I thought we required attri_name_len > 0 always?
> I think so.  I think this check may have come up in one of the earlier
> reviews.  I'll add a comment here, we even have the assert a few lines up.

<nod>
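Since the name is always required, the format step can assert it and copy the name region unconditionally, leaving only the value region conditional. A minimal sketch, with a hypothetical `struct region` standing in for the log iovec and `alfi_size` counted before any copying, as the patch's comment requires:

```c
#include <assert.h>

/* Hypothetical region descriptor standing in for struct xfs_log_iovec. */
struct region { const void *addr; int len; };

struct attri_fmt_sketch {
	const char	*name;
	int		name_len;
	const void	*value;
	int		value_len;
	unsigned short	alfi_size;	/* number of regions, format included */
};

/*
 * Fill up to three regions. The name is mandatory (asserted, not tested),
 * so only the value region is conditional. Returns the number of regions.
 */
static int
attri_format_regions(struct attri_fmt_sketch *a, struct region out[3])
{
	int n = 0;

	assert(a->name_len > 0);

	/* Count regions before copying so the logged size is correct. */
	a->alfi_size = 2 + (a->value_len > 0);

	out[n++] = (struct region){ a, (int)sizeof(*a) };	/* format */
	out[n++] = (struct region){ a->name, a->name_len };	/* name */
	if (a->value_len > 0)
		out[n++] = (struct region){ a->value, a->value_len };
	return n;
}
```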

> > 
> > > +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
> > > +				attrip->attri_name,
> > > +				ATTR_NVEC_SIZE(attrip->attri_name_len));
> > > +
> > > +	if (attrip->attri_value_len > 0)
> > > +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
> > > +				attrip->attri_value,
> > > +				ATTR_NVEC_SIZE(attrip->attri_value_len));
> > > +}
> > > +
> > > +/*
> > > + * The unpin operation is the last place an ATTRI is manipulated in the log. It
> > > + * is either inserted in the AIL or aborted in the event of a log I/O error. In
> > > + * either case, the ATTRI transaction has been successfully committed to make
> > > + * it this far. Therefore, we expect whoever committed the ATTRI to either
> > > + * construct and commit the ATTRD or drop the ATTRD's reference in the event of
> > > + * error. Simply drop the log's ATTRI reference now that the log is done with
> > > + * it.
> > > + */
> > > +STATIC void
> > > +xfs_attri_item_unpin(
> > > +	struct xfs_log_item	*lip,
> > > +	int			remove)
> > > +{
> > > +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> > > +
> > > +	xfs_attri_release(attrip);
> > 
> > Nit: this could be shortened to xfs_attri_release(ATTRI_ITEM(lip)).
> Ok, will shorten
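The unpin path and the ATTRD release path each drop one of the two references taken in xfs_attri_init, and whichever runs last frees the item. A minimal userspace sketch of that lifecycle using C11 atomics; the struct and function names are hypothetical, and the `freed` flag exists only so the behavior can be observed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for xfs_attri_log_item: only the refcount matters here. */
struct attri_ref_sketch {
	atomic_int	refcount;
	bool		*freed;		/* test hook: set when the item is freed */
};

static struct attri_ref_sketch *
attri_init_sketch(bool *freed)
{
	struct attri_ref_sketch *a = calloc(1, sizeof(*a));

	if (!a)
		return NULL;
	a->freed = freed;
	/* One reference for the log (unpin), one for the ATTRD (release). */
	atomic_init(&a->refcount, 2);
	return a;
}

/* Whichever of unpin or ATTRD release drops the last reference frees it. */
static void
attri_release_sketch(struct attri_ref_sketch *a)
{
	/* atomic_fetch_sub returns the old value, like atomic_dec_and_test. */
	if (atomic_fetch_sub(&a->refcount, 1) == 1) {
		*a->freed = true;
		free(a);
	}
}
```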
> 
> > 
> > > +}
> > > +
> > > +
> > > +STATIC void
> > > +xfs_attri_item_release(
> > > +	struct xfs_log_item	*lip)
> > > +{
> > > +	xfs_attri_release(ATTRI_ITEM(lip));
> > > +}
> > > +
> > > +/*
> > > + * Allocate and initialize an attri item
> > > + */
> > > +STATIC struct xfs_attri_log_item *
> > > +xfs_attri_init(
> > > +	struct xfs_mount	*mp)
> > > +
> > > +{
> > > +	struct xfs_attri_log_item	*attrip;
> > > +	uint				size;
> > 
> > Can you line up the *mp in the parameter list with the *attrip in the
> > local variables?
> Sure
> 
> > 
> > > +
> > > +	size = (uint)(sizeof(struct xfs_attri_log_item));
> > 
> > kmem_zalloc takes a size_t parameter (which is the return type of sizeof);
> > no need to do all this casting.
> Ok, I'm thinking of adding an extra buffer_size param here, so that one of
> the callers doesn't have to realloc this for the trailing buffer needed
> during the commit.  One of the new test cases is showing an intermittent
> warning about allocating more than a page, so I'm trying to clean that up
> and figure that out.

Urrk, oh right, I forgot that you can end up needing to allocate a 64k +
256b + ~80b buffer to hold all this state.

So uh yeah, you /do/ have to use kmem_zalloc_large and know the size
ahead of time.
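A minimal sketch of sizing the allocation up front so that the name and value live in one trailing buffer. Here `calloc` stands in for kmem_zalloc_large(), and the 80-byte placeholder only echoes the "~80b" estimate above; the struct layout and helper names are hypothetical. With a 255-byte name and a 64 KiB value, the total is well over a 4 KiB page, which is why a large-allocation helper is needed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical item; the fixed part is a placeholder for the real fields. */
struct attri_alloc_sketch {
	char	fixed[80];	/* placeholder for log item + format */
	void	*name;		/* points into the trailing buffer */
	void	*value;		/* points into the trailing buffer */
	size_t	name_len;
	size_t	value_len;
	/* name bytes, then value bytes, follow the struct */
};

/* Compute the full size once, so a single allocation suffices. */
static size_t
attri_alloc_size(size_t name_len, size_t value_len)
{
	return sizeof(struct attri_alloc_sketch) + name_len + value_len;
}

static struct attri_alloc_sketch *
attri_alloc(const void *name, size_t name_len, const void *value,
	    size_t value_len)
{
	/* calloc stands in for kmem_zalloc_large() here. */
	struct attri_alloc_sketch *a =
		calloc(1, attri_alloc_size(name_len, value_len));

	if (!a)
		return NULL;
	a->name = a + 1;			/* right after the struct */
	a->value = (char *)a->name + name_len;	/* right after the name */
	a->name_len = name_len;
	a->value_len = value_len;
	memcpy(a->name, name, name_len);
	if (value_len)
		memcpy(a->value, value, value_len);
	return a;
}
```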

> > > +	attrip = kmem_zalloc(size, 0);
> > > +
> > > +	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
> > > +			  &xfs_attri_item_ops);
> > > +	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
> > > +	atomic_set(&attrip->attri_refcount, 2);
> > > +
> > > +	return attrip;
> > > +}
> > > +
> > > +/*
> > > + * Copy an attr format buffer from the given buf, and into the destination attr
> > > + * format structure.
> > > + */
> > > +STATIC int
> > > +xfs_attri_copy_format(struct xfs_log_iovec *buf,
> > > +		      struct xfs_attri_log_format *dst_attr_fmt)
> > > +{
> > > +	struct xfs_attri_log_format *src_attr_fmt = buf->i_addr;
> > > +	uint len = sizeof(struct xfs_attri_log_format);
> > 
> > Indentation and whatnot with the parameter names.
> Ok will fix
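A sketch of the copy helper with conventional parameter layout and without the casts, since memcpy already takes void pointers. The structs are stand-ins for xfs_log_iovec and the attri format, and EFSCORRUPTED is mapped to EUCLEAN as XFS does (with a fallback value for platforms that lack EUCLEAN).

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#ifndef EUCLEAN
#define EUCLEAN 117
#endif
#define EFSCORRUPTED EUCLEAN	/* XFS maps EFSCORRUPTED to EUCLEAN */

/* Stand-ins for struct xfs_log_iovec and the attri format. */
struct iovec_sketch {
	void	*i_addr;
	int	i_len;
};

struct attri_log_format_sketch {
	uint16_t	alfi_type;
	uint16_t	alfi_size;
	uint32_t	__pad;
	uint64_t	alfi_id;
	uint64_t	alfi_ino;
	uint32_t	alfi_op_flags;
	uint32_t	alfi_name_len;
	uint32_t	alfi_value_len;
	uint32_t	alfi_attr_flags;
};

/* Reject a region whose length doesn't match the format, then copy it. */
static int
attri_copy_format(
	const struct iovec_sketch		*buf,
	struct attri_log_format_sketch		*dst_attr_fmt)
{
	size_t	len = sizeof(*dst_attr_fmt);

	if (buf->i_len < 0 || (size_t)buf->i_len != len)
		return -EFSCORRUPTED;

	/* memcpy takes void pointers; no casts are needed. */
	memcpy(dst_attr_fmt, buf->i_addr, len);
	return 0;
}
```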
> > 
> > > +
> > > +	if (buf->i_len != len)
> > > +		return -EFSCORRUPTED;
> > > +
> > > +	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
> > > +	return 0;
> > > +}
> > > +
> > > +static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
> > > +{
> > > +	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
> > > +}
> > > +
> > > +STATIC void
> > > +xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
> > > +{
> > > +	kmem_free(attrdp->attrd_item.li_lv_shadow);
> > > +	kmem_free(attrdp);
> > > +}
> > > +
> > > +/*
> > > + * This returns the number of iovecs needed to log the given attrd item.
> > > + * We only need 1 iovec for an attrd item.  It just logs the attr_log_format
> > > + * structure.
> > > + */
> > > +static inline int
> > > +xfs_attrd_item_sizeof(
> > > +	struct xfs_attrd_log_item *attrdp)
> > > +{
> > > +	return sizeof(struct xfs_attrd_log_format);
> > > +}
> > > +
> > > +STATIC void
> > > +xfs_attrd_item_size(
> > > +	struct xfs_log_item	*lip,
> > > +	int			*nvecs,
> > > +	int			*nbytes)
> > > +{
> > > +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> > 
> > Variable name alignment between the parameter list and the local vars.
> > 
> > > +	*nvecs += 1;
> > 
> > Space between local variable declaration and the first line of code.
> > 
> > > +	*nbytes += xfs_attrd_item_sizeof(attrdp);
> > 
> > No need for a oneliner function for sizeof.
> 
> Ok, will fix
> > 
> > > +}
> > > +
> > > +/*
> > > + * This is called to fill in the log iovecs for the given attrd log item. We use
> > > + * only 1 iovec for the attrd_format, and we point that at the attr_log_format
> > > + * structure embedded in the attrd item.
> > > + */
> > > +STATIC void
> > > +xfs_attrd_item_format(
> > > +	struct xfs_log_item	*lip,
> > > +	struct xfs_log_vec	*lv)
> > > +{
> > > +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> > > +	struct xfs_log_iovec		*vecp = NULL;
> > > +
> > > +	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
> > > +	attrdp->attrd_format.alfd_size = 1;
> > > +
> > > +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
> > > +			&attrdp->attrd_format, xfs_attrd_item_sizeof(attrdp));
> > > +}
> > > +
> > > +/*
> > > + * The ATTRD is either committed or aborted if the transaction is cancelled. If
> > > + * the transaction is cancelled, drop our reference to the ATTRI and free the
> > > + * ATTRD.
> > > + */
> > > +STATIC void
> > > +xfs_attrd_item_release(
> > > +	struct xfs_log_item     *lip)
> > > +{
> > > +	struct xfs_attrd_log_item *attrdp = ATTRD_ITEM(lip);
> > > +	xfs_attri_release(attrdp->attrd_attrip);
> > 
> > Space between the variable declaration and the first line of code.
> Sure, will add.
> 
> > 
> > > +	xfs_attrd_item_free(attrdp);
> > > +}
> > > +
> > > +/*
> > > + * Log an ATTRI it to the ATTRD when the attr op is done.  An attr operation
> > 
> > I don't know what "Log an ATTRI it to the ATTRD" means.  I think this is
> > the function that performs one step of an attribute update intent and
> > then tags the attrd item dirty, right?
> Yes, I had modeled this function loosely around the free extent code at the time.
> It has similar commentary, though that's about what I interpreted it to
> mean.  Back then we were still trying to conceptualize how this looping
> behavior with the state machine was going to work though.
> 
> Maybe the comment should just state it like that if that's more clear?
> 
> "Performs one step of an attribute update intent and marks the attrd item
> dirty."

Ok.  I was confused by the garbled sentence.
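The "one step" contract behind xfs_trans_attr can be sketched in userspace: the step function never rolls the transaction itself but returns -EAGAIN, and the caller (the defer machinery in the kernel) rolls and calls again. The struct and functions below are hypothetical; `steps_left` merely simulates remaining work such as remote value blocks.

```c
#include <errno.h>

/* Hypothetical stand-in for the delayed-attr state a step advances. */
struct attr_step_sketch {
	int	steps_left;	/* pretend work remaining, e.g. remote blocks */
	int	rolls;		/* how many times the caller rolled */
};

/*
 * Perform one step. Return -EAGAIN when the caller must roll the
 * transaction before calling again, 0 when the operation is complete.
 */
static int
attr_do_one_step(struct attr_step_sketch *s)
{
	if (s->steps_left > 0) {
		s->steps_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Caller owns the transaction: it rolls on -EAGAIN and re-invokes the step. */
static int
attr_run_to_completion(struct attr_step_sketch *s)
{
	int error;

	while ((error = attr_do_one_step(s)) == -EAGAIN)
		s->rolls++;	/* a transaction roll would happen here */
	return error;
}
```

Keeping all transaction handling in the caller is what lets the same step function serve both the logged (delayed) path and the inline path for older filesystems.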

> 
> ?
> 
> > 
> > > + * may be a set or a remove.  Note that the transaction is marked dirty
> > > + * regardless of whether the operation succeeds or fails to support the
> > > + * ATTRI/ATTRD lifecycle rules.
> > > + */
> > > +int
> > > +xfs_trans_attr(
> > > +	struct xfs_delattr_context	*dac,
> > > +	struct xfs_attrd_log_item	*attrdp,
> > > +	struct xfs_buf			**leaf_bp,
> > > +	uint32_t			op_flags)
> > > +{
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	int				error;
> > > +
> > > +	error = xfs_qm_dqattach_locked(args->dp, 0);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	switch (op_flags) {
> > > +	case XFS_ATTR_OP_FLAGS_SET:
> > > +		args->op_flags |= XFS_DA_OP_ADDNAME;
> > > +		error = xfs_attr_set_iter(dac, leaf_bp);
> > > +		break;
> > > +	case XFS_ATTR_OP_FLAGS_REMOVE:
> > > +		ASSERT(XFS_IFORK_Q((args->dp)));
> > 
> > No need for the double parentheses around args->dp.
> Ok, will clean out
> 
> > 
> > > +		error = xfs_attr_remove_iter(dac);
> > > +		break;
> > > +	default:
> > > +		error = -EFSCORRUPTED;
> > > +		break;
> > > +	}
> > > +
> > > +	/*
> > > +	 * Mark the transaction dirty, even on error. This ensures the
> > > +	 * transaction is aborted, which:
> > > +	 *
> > > +	 * 1.) releases the ATTRI and frees the ATTRD
> > > +	 * 2.) shuts down the filesystem
> > > +	 */
> > > +	args->trans->t_flags |= XFS_TRANS_DIRTY;
> > > +	if (xfs_sb_version_hasdelattr(&args->dp->i_mount->m_sb))
> > > +		set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
> > 
> > This could probably be:
> > 
> > 	if (attrdp)
> > 		set_bit(...);
> 
> Sure, that should work too.  I'm thinking a comment though?  Because this
> loses the subtle implication that attrdp is expected to be null when the
> feature bit is off.  Otherwise it may stir up future questions of why/how
> this would be null.  Maybe just something like:
> 
> /*
>  * attr intent/done items are null when delayed attributes are disabled
>  */
> 
> ?

Ok.
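A minimal sketch of the agreed shape: always dirty the transaction so that an error aborts it, and test the done item pointer rather than the feature bit, with the proposed comment. Plain booleans stand in for `t_flags |= XFS_TRANS_DIRTY` and `set_bit(XFS_LI_DIRTY, ...)`; the types and function name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the transaction and ATTRD dirty state. */
struct trans_sketch { bool dirty; };
struct attrd_sketch { bool dirty; };

static int
attr_finish_step_sketch(struct trans_sketch *tp, struct attrd_sketch *attrdp,
			int step_error)
{
	/* Dirty the transaction even on error so that it gets aborted. */
	tp->dirty = true;

	/*
	 * attr intent/done items are null when delayed attributes are
	 * disabled
	 */
	if (attrdp)
		attrdp->dirty = true;

	return step_error;
}
```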

> > 
> > > +
> > > +	return error;
> > > +}
> > > +
> > > +/* Log an attr to the intent item. */
> > > +STATIC void
> > > +xfs_attr_log_item(
> > > +	struct xfs_trans		*tp,
> > > +	struct xfs_attri_log_item	*attrip,
> > > +	struct xfs_attr_item		*attr)
> > > +{
> > > +	struct xfs_attri_log_format	*attrp;
> > > +
> > > +	tp->t_flags |= XFS_TRANS_DIRTY;
> > > +	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
> > > +
> > > +	/*
> > > +	 * At this point the xfs_attr_item has been constructed, and we've
> > > +	 * created the log intent. Fill in the attri log item and log format
> > > +	 * structure with fields from this xfs_attr_item
> > > +	 */
> > > +	attrp = &attrip->attri_format;
> > > +	attrp->alfi_ino = attr->xattri_dac.da_args->dp->i_ino;
> > > +	attrp->alfi_op_flags = attr->xattri_op_flags;
> > > +	attrp->alfi_value_len = attr->xattri_dac.da_args->valuelen;
> > > +	attrp->alfi_name_len = attr->xattri_dac.da_args->namelen;
> > > +	attrp->alfi_attr_flags = attr->xattri_dac.da_args->attr_filter;
> > > +
> > > +	attrip->attri_name = (void *)attr->xattri_dac.da_args->name;
> > > +	attrip->attri_value = attr->xattri_dac.da_args->value;
> > > +	attrip->attri_name_len = attr->xattri_dac.da_args->namelen;
> > > +	attrip->attri_value_len = attr->xattri_dac.da_args->valuelen;
> > > +}
> > > +
> > > +/* Get an ATTRI. */
> > > +static struct xfs_log_item *
> > > +xfs_attr_create_intent(
> > > +	struct xfs_trans		*tp,
> > > +	struct list_head		*items,
> > > +	unsigned int			count,
> > > +	bool				sort)
> > > +{
> > > +	struct xfs_mount		*mp = tp->t_mountp;
> > > +	struct xfs_attri_log_item	*attrip;
> > > +	struct xfs_attr_item		*attr;
> > > +
> > > +	ASSERT(count == 1);
> > > +
> > > +	if (!xfs_sb_version_hasdelattr(&mp->m_sb))
> > > +		return NULL;
> > > +
> > > +	attrip = xfs_attri_init(mp);
> > > +	xfs_trans_add_item(tp, &attrip->attri_item);
> > > +	list_for_each_entry(attr, items, xattri_list)
> > > +		xfs_attr_log_item(tp, attrip, attr);
> > > +	return &attrip->attri_item;
> > > +}
> > > +
> > > +/* Process an attr. */
> > > +STATIC int
> > > +xfs_attr_finish_item(
> > > +	struct xfs_trans		*tp,
> > > +	struct xfs_log_item		*done,
> > > +	struct list_head		*item,
> > > +	struct xfs_btree_cur		**state)
> > > +{
> > > +	struct xfs_attr_item		*attr;
> > > +	int				error;
> > > +	struct xfs_delattr_context	*dac;
> > > +	struct xfs_attrd_log_item	*attrdp;
> > > +	struct xfs_attri_log_item	*attrip;
> > > +
> > > +	attr = container_of(item, struct xfs_attr_item, xattri_list);
> > > +	dac = &attr->xattri_dac;
> > > +
> > > +	/*
> > > +	 * Always reset trans after EAGAIN cycle
> > > +	 * since the transaction is new
> > > +	 */
> > > +	dac->da_args->trans = tp;
> > > +
> > > +	error = xfs_trans_attr(dac, ATTRD_ITEM(done), &dac->leaf_bp,
> > > +			       attr->xattri_op_flags);
> > > +	/*
> > > +	 * The attrip refers to xfs_attr_item memory to log the name and value
> > > +	 * with the intent item. This already occurred when the intent was
> > > +	 * committed so these fields are no longer accessed.
> > 
> > Can you clear the attri_{name,value} pointers after you've logged the
> > intent item so that we don't have to do them here?
> > 
> Ok, maybe I can put this in xfs_attri_item_committed?

Yeah.

> > > Clear them out of
> > > +	 * caution since we're about to free the xfs_attr_item.
> > > +	 */
> > > +	if (xfs_sb_version_hasdelattr(&dac->da_args->dp->i_mount->m_sb)) {
> > > +		attrdp = (struct xfs_attrd_log_item *)done;
> > 
> > attrdp = ATTRD_ITEM(done)?
> Sure, will shorten
> > 
> > > +		attrip = attrdp->attrd_attrip;
> > > +		attrip->attri_name = NULL;
> > > +		attrip->attri_value = NULL;
> > > +	}
> > > +
> > > +	if (error != -EAGAIN)
> > > +		kmem_free(attr);
> > > +
> > > +	return error;
> > > +}
> > > +
> > > +/* Abort all pending ATTRs. */
> > > +STATIC void
> > > +xfs_attr_abort_intent(
> > > +	struct xfs_log_item		*intent)
> > > +{
> > > +	xfs_attri_release(ATTRI_ITEM(intent));
> > > +}
> > > +
> > > +/* Cancel an attr */
> > > +STATIC void
> > > +xfs_attr_cancel_item(
> > > +	struct list_head		*item)
> > > +{
> > > +	struct xfs_attr_item		*attr;
> > > +
> > > +	attr = container_of(item, struct xfs_attr_item, xattri_list);
> > > +	kmem_free(attr);
> > > +}
> > > +
> > > +/*
> > > + * The ATTRI is logged only once and cannot be moved in the log, so simply
> > > + * return the lsn at which it's been logged.
> > > + */
> > > +STATIC xfs_lsn_t
> > > +xfs_attri_item_committed(
> > > +	struct xfs_log_item	*lip,
> > > +	xfs_lsn_t		lsn)
> > > +{
> > > +	return lsn;
> > > +}
> > 
> > You can omit this function because the default is "return lsn;" if you
> > don't provide one.  See xfs_trans_committed_bulk.
> Oh, ok.  I was thinking of moving some of the finish item clean up here
> though.

<nod> Nowadays we're trying to reduce the number of indirect calls since
they're expensive post-Spectre.

Also there are some helpers to detect intent and intentdone items that
check the supplied li_ops; see xlog_item_is_intent and
xlog_item_is_intent_done.  I think you're fine here, but it's something
to keep in the back of your head.
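The two review comments above describe the dispatch rules: a missing ->iop_committed means "return lsn", and XFS_ITEM_RELEASE_WHEN_COMMITTED routes an item to ->iop_release instead. A small userspace model of those rules (not xfs_trans_committed_bulk itself) shows why the empty ATTRI/ATTRD callbacks can simply go away. All type, flag and function names below are hypothetical stand-ins. NULL_COMMIT_LSN plays the role of the NULLCOMMITLSN the patch's xfs_attrd_item_committed returned.

```c
#include <assert.h>
#include <stddef.h>

typedef long long lsn_t;	/* stand-in for xfs_lsn_t */

/* Hypothetical flag modeled on XFS_ITEM_RELEASE_WHEN_COMMITTED. */
#define ITEM_RELEASE_WHEN_COMMITTED	(1U << 0)
#define NULL_COMMIT_LSN			((lsn_t)-1)	/* "item is gone" */

struct log_item;

/* Toy ops table: every callback is optional. */
struct item_ops {
	unsigned int	flags;
	lsn_t		(*committed)(struct log_item *lip, lsn_t lsn);
	void		(*release)(struct log_item *lip);
};

struct log_item {
	const struct item_ops	*ops;
	int			released;
};

/*
 * Committed-item dispatch: release-when-committed items go straight to
 * ->release, a missing ->committed defaults to "return lsn", and only
 * items that need custom work pay for an indirect call.
 */
static lsn_t item_committed(struct log_item *lip, lsn_t lsn)
{
	if (lip->ops->flags & ITEM_RELEASE_WHEN_COMMITTED) {
		lip->ops->release(lip);
		return NULL_COMMIT_LSN;
	}
	if (lip->ops->committed)
		return lip->ops->committed(lip, lsn);
	return lsn;
}

static void done_item_release(struct log_item *lip)
{
	/* The real ATTRD would drop its ATTRI reference and free itself here. */
	lip->released = 1;
}

/* An intent with no ->committed, and a done item that releases on commit. */
static const struct item_ops intent_ops = { 0, NULL, NULL };
static const struct item_ops done_ops = {
	ITEM_RELEASE_WHEN_COMMITTED, NULL, done_item_release
};
```

With this arrangement, neither xfs_attri_item_committed nor xfs_attrd_item_committed needs to exist.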

> > > +
> > > +STATIC void
> > > +xfs_attri_item_committing(
> > > +	struct xfs_log_item	*lip,
> > > +	xfs_lsn_t		lsn)
> > > +{
> > > +}
> > 
> > This function isn't required if it doesn't do anything.  See
> > xfs_log_commit_cil.
> Ok, will remove
> 
> > 
> > > +
> > > +STATIC bool
> > > +xfs_attri_item_match(
> > > +	struct xfs_log_item	*lip,
> > > +	uint64_t		intent_id)
> > > +{
> > > +	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
> > > +}
> > > +
> > > +/*
> > > + * When the attrd item is committed to disk, all we need to do is delete our
> > > + * reference to our partner attri item and then free ourselves. Since we're
> > > + * freeing ourselves we must return -1 to keep the transaction code from
> > > + * further referencing this item.
> > > + */
> > > +STATIC xfs_lsn_t
> > > +xfs_attrd_item_committed(
> > > +	struct xfs_log_item	*lip,
> > > +	xfs_lsn_t		lsn)
> > > +{
> > > +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> > > +
> > > +	/*
> > > +	 * Drop the ATTRI reference regardless of whether the ATTRD has been
> > > +	 * aborted. Once the ATTRD transaction is constructed, it is the sole
> > > +	 * responsibility of the ATTRD to release the ATTRI (even if the ATTRI
> > > +	 * is aborted due to log I/O error).
> > > +	 */
> > > +	xfs_attri_release(attrdp->attrd_attrip);
> > > +	xfs_attrd_item_free(attrdp);
> > > +
> > > +	return NULLCOMMITLSN;
> > > +}
> > 
> > If you set XFS_ITEM_RELEASE_WHEN_COMMITTED in the attrd item ops,
> > xfs_trans_committed_bulk will call ->iop_release instead of
> > ->iop_committed and you therefore don't need this function.
> Oh I see, will do that then
> 
> > 
> > > +
> > > +STATIC void
> > > +xfs_attrd_item_committing(
> > > +	struct xfs_log_item	*lip,
> > > +	xfs_lsn_t		lsn)
> > > +{
> > > +}
> > 
> > Same comment as xfs_attri_item_committing.
> ok, will remove this one
> 
> > 
> > > +
> > > +
> > > +/*
> > > + * Allocate and initialize an attrd item
> > > + */
> > > +struct xfs_attrd_log_item *
> > > +xfs_attrd_init(
> > > +	struct xfs_mount		*mp,
> > > +	struct xfs_attri_log_item	*attrip)
> > > +
> > > +{
> > > +	struct xfs_attrd_log_item	*attrdp;
> > > +	uint				size;
> > > +
> > > +	size = (uint)(sizeof(struct xfs_attrd_log_item));
> > 
> > Same comment about sizeof and size_t as in xfs_attri_init.
> > 
> > > +	attrdp = kmem_zalloc(size, 0);
> > > +	memset(attrdp, 0, size);
> > 
> > No need to memset-zero something you just zalloc'd.
> ok, will clean these up
> 
> > 
> > > +
> > > +	xfs_log_item_init(mp, &attrdp->attrd_item, XFS_LI_ATTRD,
> > > +			  &xfs_attrd_item_ops);
> > > +	attrdp->attrd_attrip = attrip;
> > > +	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
> > > +
> > > +	return attrdp;
> > > +}
> > > +
> > > +/*
> > > + * This routine is called to allocate an "attr free done" log item.
> > > + */
> > > +struct xfs_attrd_log_item *
> > > +xfs_trans_get_attrd(struct xfs_trans		*tp,
> > > +		  struct xfs_attri_log_item	*attrip)
> > > +{
> > > +	struct xfs_attrd_log_item		*attrdp;
> > > +
> > > +	ASSERT(tp != NULL);
> > > +
> > > +	attrdp = xfs_attrd_init(tp->t_mountp, attrip);
> > > +	ASSERT(attrdp != NULL);
> > 
> > You could fold xfs_attrd_init into this function since there's only one
> > caller.
> Sure, there's not a lot in the init
> 
> > 
> > > +
> > > +	xfs_trans_add_item(tp, &attrdp->attrd_item);
> > > +	return attrdp;
> > > +}
> > > +
> > > +static const struct xfs_item_ops xfs_attrd_item_ops = {
> > > +	.iop_size	= xfs_attrd_item_size,
> > > +	.iop_format	= xfs_attrd_item_format,
> > > +	.iop_release    = xfs_attrd_item_release,
> > > +	.iop_committing	= xfs_attrd_item_committing,
> > > +	.iop_committed	= xfs_attrd_item_committed,
> > > +};
> > > +
> > > +
> > > +/* Get an ATTRD so we can process all the attrs. */
> > > +static struct xfs_log_item *
> > > +xfs_attr_create_done(
> > > +	struct xfs_trans		*tp,
> > > +	struct xfs_log_item		*intent,
> > > +	unsigned int			count)
> > > +{
> > > +	if (!xfs_sb_version_hasdelattr(&tp->t_mountp->m_sb))
> > > +		return NULL;
> > 
> > This is probably better expressed as:
> > 
> > 	if (!intent)
> > 		return NULL;
> > 
> > Since we don't need a log intent done item if there's no log intent
> > item.
> Ok, that makes sense
> 
> > 
> > > +
> > > +	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
> > > +}
> > > +
> > > +const struct xfs_defer_op_type xfs_attr_defer_type = {
> > > +	.max_items	= 1,
> > > +	.create_intent	= xfs_attr_create_intent,
> > > +	.abort_intent	= xfs_attr_abort_intent,
> > > +	.create_done	= xfs_attr_create_done,
> > > +	.finish_item	= xfs_attr_finish_item,
> > > +	.cancel_item	= xfs_attr_cancel_item,
> > > +};
> > > +
> > > +/*
> > > + * Process an attr intent item that was recovered from the log.  We need to
> > > + * delete the attr that it describes.
> > > + */
> > > +STATIC int
> > > +xfs_attri_item_recover(
> > > +	struct xfs_log_item		*lip,
> > > +	struct list_head		*capture_list)
> > > +{
> > > +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> > > +	struct xfs_mount		*mp = lip->li_mountp;
> > > +	struct xfs_inode		*ip;
> > > +	struct xfs_da_args		args;
> > > +	struct xfs_attri_log_format	*attrp;
> > > +	int				error;
> > > +
> > > +	/*
> > > +	 * First check the validity of the attr described by the ATTRI.  If any
> > > +	 * are bad, then assume that all are bad and just toss the ATTRI.
> > > +	 */
> > > +	attrp = &attrip->attri_format;
> > > +	if (!(attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET ||
> > > +	      attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE) ||
> > > +	    (attrp->alfi_value_len > XATTR_SIZE_MAX) ||
> > > +	    (attrp->alfi_name_len > XATTR_NAME_MAX) ||
> > > +	    (attrp->alfi_name_len == 0)) {
> > 
> > This needs to call xfs_verify_ino() on attrp->alfi_ino.
> Ok, will add
> 
> > 
> > This also needs to check for xfs_sb_version_hasdelattr().
> Well, ideally this would not be executing if the feature bit were not on.
> Maybe we should add an ASSERT at the top?

The trouble is, we could be fed a filesystem where the delattr feature
bit is cleared but the log has been specially crafted/corrupted to have
a log item with type XFS_LI_ATTRI.  In that case we cannot recover the
log item because the log item type is inconsistent with the superblock
feature set.

(And yes, the current recovery functions are missing that...)
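The checks discussed here can be folded into the single validation predicate suggested above, including the superblock feature-bit check that guards against crafted logs. The following is a userspace sketch only. The struct is a simplified mirror of the xfs_attri_log_format fields that xfs_attri_item_recover inspects, the op codes are made up, and ino_ok() is a hypothetical placeholder for xfs_verify_ino(), which validates the inode number against the real filesystem geometry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Linux xattr limits, as in <linux/limits.h>. */
#define XATTR_NAME_MAX	255
#define XATTR_SIZE_MAX	65536

/* Hypothetical op codes; the real ones live in the patch's log format header. */
enum { OP_SET = 1, OP_REMOVE = 2 };

/* Simplified mirror of the recovered ATTRI fields. */
struct attri_fields {
	uint64_t	ino;
	uint32_t	op_flags;
	uint32_t	name_len;
	uint32_t	value_len;
};

/* Placeholder for xfs_verify_ino(); the real check is geometry-aware. */
static bool ino_ok(uint64_t ino, uint64_t max_ino)
{
	return ino != 0 && ino <= max_ino;
}

/*
 * Reject the item if the superblock lacks the delattr feature (a crafted
 * log may still contain ATTRI items), the op is unknown, a length is out
 * of range, or the inode number is bogus.
 */
static bool attri_validate(bool sb_has_delattr,
			   const struct attri_fields *f, uint64_t max_ino)
{
	if (!sb_has_delattr)
		return false;
	if (f->op_flags != OP_SET && f->op_flags != OP_REMOVE)
		return false;
	if (f->name_len == 0 || f->name_len > XATTR_NAME_MAX)
		return false;
	if (f->value_len > XATTR_SIZE_MAX)
		return false;
	return ino_ok(f->ino, max_ino);
}
```

A recover function built this way would return -EFSCORRUPTED whenever the predicate returns false.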

> 
> > 
> > I would refactor this into a separate validation predicate to eliminate
> > the multi-line if statement.  I will post a series cleaning up the other
> > log items' recover functions shortly.
> Alrighty, I will keep an eye out
> 
> > 
> > > +		/*
> > > +		 * This will pull the ATTRI from the AIL and free the memory
> > > +		 * associated with it.
> > > +		 */
> > > +		xfs_attri_release(attrip);
> > 
> > No need to call xfs_attri_release; one of the 5.10 cleanups was to
> > recognize that the log recovery code does this for you automatically.
> > 
> Ok, will remove
> 
> > > +		return -EFSCORRUPTED;
> > > +	}
> > > +
> > > +	error = xfs_iget(mp, 0, attrp->alfi_ino, 0, 0, &ip);
> > > +	if (error)
> > > +		return error;
> > 
> > I /think/ this needs to call xfs_qm_dqattach here, for reasons I'll get
> > into shortly.
> > 
> > In the meantime, this /definitely/ needs to do:
> > 
> > 	if (VFS_I(ip)->i_nlink == 0)
> > 		xfs_iflags_set(ip, XFS_IRECOVERY);
> > 
> > Because the IRECOVERY flag prevents inode inactivation from triggering
> > on an unlinked inode while we're still performing log recovery.
> > 
> > If you want to steal the xlog_recover_iget helper from the atomic
> > swapext series[0] please feel free. :)
> > 
> > [0] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?id=51e23b9c9d9674a78dc97c5848c9efb4461e074d
> Oh I see.  Ok, I will take a look at that
> 
> > 
> > > +	memset(&args, 0, sizeof(args));
> > > +	args.dp = ip;
> > > +	args.name = attrip->attri_name;
> > > +	args.namelen = attrp->alfi_name_len;
> > > +	args.attr_filter = attrp->alfi_attr_flags;
> > > +	if (attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET) {
> > > +		args.value = attrip->attri_value;
> > > +		args.valuelen = attrp->alfi_value_len;
> > > +	}
> > > +
> > > +	error = xfs_attr_set(&args);
> > 
> > Er...
> > 
> > > +
> > > +	xfs_attri_release(attrip);
> > 
> > The transaction commit will take care of releasing attrip.
> Mmmm, the new test case for attr replay hangs without this line.  I suspect
> it's because we end up with an item in the AIL that never goes away.
> 
> [Nov12 13:26] INFO: task mount:15718 blocked for more than 120 seconds.
> [  +0.000009]       Tainted: G        W   E     5.9.0-rc4 #1
> [  +0.000002] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  +0.000004] task:mount           state:D stack:    0 pid:15718 ppid: 15491 flags:0x00004000
> [  +0.000005] Call Trace:
> [  +0.000079]  __schedule+0x2d9/0x780
> [  +0.000020]  schedule+0x4a/0xb0
> [  +0.000120]  xfs_ail_push_all_sync+0xb8/0x100 [xfs]
> 
> ...etc....
> 
> 
> A little confused on this one... I didn't think transaction commits released
> log items?

The ATTRI gets created with two references: one is dropped by the
transaction when it commits, and the second one is dropped by the ATTRD
when the ATTRD commits (per that huge comment below that I told you to
delete ;)).

Note that you're missing an xfs_trans_get_attrd call in the recover
function, which is another reason why you can't call xfs_attr_set()
directly here.  That might be why recovery locks up, but you'd have to
go check the trace data for that log item to confirm.
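A toy model of that two-reference lifecycle illustrates why skipping the ATTRD could leave the ATTRI stranded in the AIL. That would be consistent with the xfs_ail_push_all_sync hang above, although only the log item trace data can confirm it. The names are hypothetical, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy intent item; not the real struct xfs_attri_log_item. */
struct intent {
	atomic_int	refcount;
	bool		in_ail;
	bool		freed;
};

static void intent_init(struct intent *ip)
{
	/* One reference for the transaction, one for the done item. */
	atomic_init(&ip->refcount, 2);
	ip->in_ail = false;
	ip->freed = false;
}

/* Drop a reference; the last one pulls the item from the AIL and frees it. */
static void intent_release(struct intent *ip)
{
	if (atomic_fetch_sub(&ip->refcount, 1) == 1) {
		ip->in_ail = false;
		ip->freed = true;
	}
}

/* Transaction commit: the intent reaches the AIL, then the tx ref drops. */
static void trans_commit_intent(struct intent *ip)
{
	ip->in_ail = true;
	intent_release(ip);
}

/* Done-item commit: drops the second reference. */
static void done_committed(struct intent *ip)
{
	intent_release(ip);
}
```

If recovery never logs a done item, done_committed() never runs, and the intent keeps one reference while it sits in the AIL indefinitely.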

> > > +	xfs_irele(ip);
> > > +	return error;
> > > +}
> > > +
> > > +static const struct xfs_item_ops xfs_attri_item_ops = {
> > > +	.iop_size	= xfs_attri_item_size,
> > > +	.iop_format	= xfs_attri_item_format,
> > > +	.iop_unpin	= xfs_attri_item_unpin,
> > > +	.iop_committed	= xfs_attri_item_committed,
> > > +	.iop_committing = xfs_attri_item_committing,
> > > +	.iop_release    = xfs_attri_item_release,
> > > +	.iop_recover	= xfs_attri_item_recover,
> > > +	.iop_match	= xfs_attri_item_match,
> > 
> > This needs an ->iop_relog method so that we can relog the attri log item
> > if the log starts to fill up.
> Ok, will add
> 
> > 
> > > +};
> > > +
> > > +
> > > +
> > > +STATIC int
> > > +xlog_recover_attri_commit_pass2(
> > > +	struct xlog                     *log,
> > > +	struct list_head		*buffer_list,
> > > +	struct xlog_recover_item        *item,
> > > +	xfs_lsn_t                       lsn)
> > > +{
> > > +	int                             error;
> > > +	struct xfs_mount                *mp = log->l_mp;
> > > +	struct xfs_attri_log_item       *attrip;
> > > +	struct xfs_attri_log_format     *attri_formatp;
> > > +	char				*name = NULL;
> > > +	char				*value = NULL;
> > > +	int				region = 0;
> > > +
> > > +	attri_formatp = item->ri_buf[region].i_addr;
> > 
> > Please check the __pad field for zeroes here.
> Ok, will do
> 
> > 
> > > +	attrip = xfs_attri_init(mp);
> > > +	error = xfs_attri_copy_format(&item->ri_buf[region],
> > > +				      &attrip->attri_format);
> > > +	if (error) {
> > > +		xfs_attri_item_free(attrip);
> > > +		return error;
> > > +	}
> > > +
> > > +	attrip->attri_name_len = attri_formatp->alfi_name_len;
> > > +	attrip->attri_value_len = attri_formatp->alfi_value_len;
> > > +	attrip = krealloc(attrip, sizeof(struct xfs_attri_log_item) +
> > > +			  attrip->attri_name_len + attrip->attri_value_len,
> > > +			  GFP_NOFS | __GFP_NOFAIL);
> > > +
> > > +	ASSERT(attrip->attri_name_len > 0);
> > 
> > If attri_name_len is zero, reject the whole thing with EFSCORRUPTED.
> Ok, makes sense
> 
> > 
> > > +	region++;
> > > +	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
> > > +	memcpy(name, item->ri_buf[region].i_addr,
> > > +	       attrip->attri_name_len);
> > > +	attrip->attri_name = name;
> > > +
> > > +	if (attrip->attri_value_len > 0) {
> > > +		region++;
> > > +		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
> > > +			attrip->attri_name_len;
> > > +		memcpy(value, item->ri_buf[region].i_addr,
> > > +			attrip->attri_value_len);
> > > +		attrip->attri_value = value;
> > > +	}
> > 
> > Question: is it valid for an attri item to have value_len > 0 for an
> > XFS_ATTRI_OP_FLAGS_REMOVE operation?
> Well, it shouldn't happen since the new attr_set routines assume that the
> absence of a value implies a remove operation.  It doesn't invalidate the
> item, I suppose, though it would mean that it's carrying around a useless
> payload that it shouldn't.

_commit_pass2 is called as part of recovering unfinished items from the
ondisk log.  If you find something that doesn't smell right, you should
bail out with an error code so that mounting fails.
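Following that advice, a pass2 consistency check could fail the mount on anything that doesn't smell right. This includes a nonzero pad field, a zero name length, a remove op that carries a value payload, and a region count that doesn't match the lengths. The sketch below rests on several assumptions. The struct layout and op codes are hypothetical, the pad member is named `pad` because identifiers starting with `__` are reserved in userspace C, and it assumes regions are laid out as format, name, then an optional value, the way the posted pass2 code reads them.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN 117
#endif
#define EFSCORRUPTED EUCLEAN	/* XFS defines EFSCORRUPTED as EUCLEAN */

enum { OP_SET = 1, OP_REMOVE = 2 };	/* hypothetical op codes */

/* Hypothetical on-disk layout; `pad` stands in for the real __pad field. */
struct attri_fmt {
	uint32_t	op_flags;
	uint32_t	name_len;
	uint32_t	value_len;
	uint32_t	pad;
};

/* nregions: recovered log regions (format + name [+ value]). */
static int attri_check_format(const struct attri_fmt *f, int nregions)
{
	int expected;

	if (f->pad != 0)
		return -EFSCORRUPTED;
	if (f->name_len == 0)
		return -EFSCORRUPTED;
	/* A remove must not carry a value payload. */
	if (f->op_flags == OP_REMOVE && f->value_len != 0)
		return -EFSCORRUPTED;
	expected = 2 + (f->value_len > 0);
	if (nregions != expected)
		return -EFSCORRUPTED;
	return 0;
}
```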

> > 
> > Granted, that level of validation might be better left to the _recover
> > function.
> Maybe we should add an ASSERT there
> 
> > 
> > > +
> > > +	/*
> > > +	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
> > > +	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
> > > +	 * directly and drop the ATTRI reference. Note that
> > > +	 * xfs_trans_ail_update() drops the AIL lock.
> > > +	 */
> > > +	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
> > > +	xfs_attri_release(attrip);
> > > +	return 0;
> > > +}
> > > +
> > > +const struct xlog_recover_item_ops xlog_attri_item_ops = {
> > > +	.item_type	= XFS_LI_ATTRI,
> > > +	.commit_pass2	= xlog_recover_attri_commit_pass2,
> > > +};
> > > +
> > > +/*
> > > + * This routine is called when an ATTRD format structure is found in a committed
> > > + * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
> > > + * it was still in the log. To do this it searches the AIL for the ATTRI with
> > > + * an id equal to that in the ATTRD format structure. If we find it we drop
> > > + * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
> > > + */
> > > +STATIC int
> > > +xlog_recover_attrd_commit_pass2(
> > > +	struct xlog			*log,
> > > +	struct list_head		*buffer_list,
> > > +	struct xlog_recover_item	*item,
> > > +	xfs_lsn_t			lsn)
> > > +{
> > > +	struct xfs_attrd_log_format	*attrd_formatp;
> > > +
> > > +	attrd_formatp = item->ri_buf[0].i_addr;
> > > +	ASSERT((item->ri_buf[0].i_len ==
> > > +				(sizeof(struct xfs_attrd_log_format))));
> > > +
> > > +	xlog_recover_release_intent(log, XFS_LI_ATTRI,
> > > +				    attrd_formatp->alfd_alf_id);
> > > +	return 0;
> > > +}
> > > +
> > > +const struct xlog_recover_item_ops xlog_attrd_item_ops = {
> > > +	.item_type	= XFS_LI_ATTRD,
> > > +	.commit_pass2	= xlog_recover_attrd_commit_pass2,
> > > +};
> > > diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
> > > new file mode 100644
> > > index 0000000..7dd2572
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_attr_item.h
> > > @@ -0,0 +1,76 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-or-later
> > > + *
> > > + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> > > + * Author: Allison Collins <allison.henderson@oracle.com>
> > > + */
> > > +#ifndef	__XFS_ATTR_ITEM_H__
> > > +#define	__XFS_ATTR_ITEM_H__
> > > +
> > > +/* kernel only ATTRI/ATTRD definitions */
> > > +
> > > +struct xfs_mount;
> > > +struct kmem_zone;
> > > +
> > > +/*
> > > + * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
> > > + */
> > > +#define	XFS_ATTRI_RECOVERED	1
> > > +
> > > +
> > > +/* iovec length must be 32-bit aligned */
> > > +#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
> > > +				size + sizeof(int32_t) - \
> > > +				(size % sizeof(int32_t)))
> > 
> > Can you turn this into a static inline helper?
> > 
> > And use one of the roundup() variants to ensure the proper alignment
> > instead of this open-coded stuff? :)
> Sure, will do
> 
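A static inline helper built on a roundup() variant, as requested, might look like the following. ROUNDUP here is a local userspace copy of the kernel's roundup() semantics, and attr_nvec_size is a hypothetical name. Reproducing the original macro next to it shows a side benefit: the open-coded version adds an extra 4 bytes to already-aligned sizes other than 4 (for example, 8 becomes 12).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the kernel's roundup(): round x up to a multiple of y. */
#define ROUNDUP(x, y)	((((x) + (y) - 1) / (y)) * (y))

/* Hypothetical replacement for ATTR_NVEC_SIZE: iovecs must be 32-bit aligned. */
static inline size_t attr_nvec_size(size_t size)
{
	return ROUNDUP(size, sizeof(int32_t));
}

/* The original macro, kept only for comparison. */
#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
				size + sizeof(int32_t) - \
				(size % sizeof(int32_t)))
```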
> > 
> > > +
> > > +/*
> > > + * This is the "attr intention" log item.  It is used to log the fact that some
> > > + * attribute operations need to be processed.  An operation is currently either
> > > + * a set or remove.  Set or remove operations are described by the xfs_attr_item
> > > + * which may be logged to this intent.  Intents are used in conjunction with the
> > > + * "attr done" log item described below.
> > > + *
> > > + * The ATTRI is reference counted so that it is not freed prior to both the
> > > + * ATTRI and ATTRD being committed and unpinned. This ensures the ATTRI is
> > > + * inserted into the AIL even in the event of out of order ATTRI/ATTRD
> > > + * processing. In other words, an ATTRI is born with two references:
> > > + *
> > > + *      1.) an ATTRI held reference to track ATTRI AIL insertion
> > > + *      2.) an ATTRD held reference to track ATTRD commit
> > > + *
> > > + * On allocation, both references are the responsibility of the caller. Once the
> > > + * ATTRI is added to and dirtied in a transaction, ownership of reference one
> > > + * transfers to the transaction. The reference is dropped once the ATTRI is
> > > + * inserted to the AIL or in the event of failure along the way (e.g., commit
> > > + * failure, log I/O error, etc.). Note that the caller remains responsible for
> > > + * the ATTRD reference under all circumstances to this point. The caller has no
> > > + * means to detect failure once the transaction is committed, however.
> > > + * Therefore, an ATTRD is required after this point, even in the event of
> > > + * unrelated failure.
> > > + *
> > > + * Once an ATTRD is allocated and dirtied in a transaction, reference two
> > > + * transfers to the transaction. The ATTRD reference is dropped once it reaches
> > > + * the unpin handler. Similar to the ATTRI, the reference also drops in the
> > > + * event of commit failure or log I/O errors. Note that the ATTRD is not
> > > + * inserted in the AIL, so at this point both the ATTRI and ATTRD are freed.
> > 
> > I don't think it's necessary to document the entire log intent/log done
> > refcount state machine here; it'll do to record just the bits that are
> > specific to delayed xattr operations.
> Ok, maybe just the first 3 lines are enough then? I think that's all that
> really stands out from the other delayed ops

Yes.  You might also want to touch on the lifespan of the name and value
buffers that are attached to the xfs_attr_item -- they're copies of what
the caller passed in from userspace, right?  And they're attached to the
log intent item long enough for the item to commit, right?  And they're
freed when the xfs_attr_item itself is freed when the work is done,
right?
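If the answer to those questions is "yes", one possible arrangement would put private copies of the name and value in the same allocation as the work item. That way they live long enough for the intent to be logged and are freed in one shot when the deferred work finishes. This is a hypothetical userspace sketch, not what the posted patch does, since xfs_attr_log_item currently points attri_name straight at da_args->name. It resembles the trailing-buffer layout that xlog_recover_attri_commit_pass2 builds with krealloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified work item; not the patch's struct xfs_attr_item. */
struct attr_item {
	size_t	name_len;
	size_t	value_len;
	char	*name;		/* points into the trailing buffer */
	char	*value;		/* NULL for a remove */
};

/* Copy name and value into the item's own allocation. */
static struct attr_item *attr_item_alloc(const char *name, size_t name_len,
					 const void *value, size_t value_len)
{
	struct attr_item *item = malloc(sizeof(*item) + name_len + value_len);

	if (!item)
		return NULL;
	item->name_len = name_len;
	item->value_len = value_len;
	item->name = (char *)(item + 1);
	memcpy(item->name, name, name_len);
	item->value = NULL;
	if (value_len > 0) {
		item->value = item->name + name_len;
		memcpy(item->value, value, value_len);
	}
	return item;
}

/* Name and value are freed together with the item. */
static void attr_item_free(struct attr_item *item)
{
	free(item);
}
```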

--D

> > 
> > > + */
> > > +struct xfs_attri_log_item {
> > > +	struct xfs_log_item		attri_item;
> > > +	atomic_t			attri_refcount;
> > > +	int				attri_name_len;
> > > +	void				*attri_name;
> > > +	int				attri_value_len;
> > > +	void				*attri_value;
> > 
> > Please compress this structure a bit by moving the two pointers to be
> > adjacent instead of interspersed with ints.
> Alrighty, will do.
> 
> > 
> > Ok, now on to digesting the new state machine...
> > 
> > --D
> Ok then, thanks for the thorough review!!
> 
> Allison
> > 
> > > +	struct xfs_attri_log_format	attri_format;
> > > +};
> > > +
> > > +/*
> > > + * This is the "attr done" log item.  It is used to log the fact that some attrs
> > > + * earlier mentioned in an attri item have been freed.
> > > + */
> > > +struct xfs_attrd_log_item {
> > > +	struct xfs_attri_log_item	*attrd_attrip;
> > > +	struct xfs_log_item		attrd_item;
> > > +	struct xfs_attrd_log_format	attrd_format;
> > > +};
> > > +
> > > +#endif	/* __XFS_ATTR_ITEM_H__ */
> > > diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> > > index 8f8837f..d7787a5 100644
> > > --- a/fs/xfs/xfs_attr_list.c
> > > +++ b/fs/xfs/xfs_attr_list.c
> > > @@ -15,6 +15,7 @@
> > >   #include "xfs_inode.h"
> > >   #include "xfs_trans.h"
> > >   #include "xfs_bmap.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_attr_sf.h"
> > >   #include "xfs_attr_leaf.h"
> > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > index 3fbd98f..d5d1959 100644
> > > --- a/fs/xfs/xfs_ioctl.c
> > > +++ b/fs/xfs/xfs_ioctl.c
> > > @@ -15,6 +15,8 @@
> > >   #include "xfs_iwalk.h"
> > >   #include "xfs_itable.h"
> > >   #include "xfs_error.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_bmap.h"
> > >   #include "xfs_bmap_util.h"
> > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > > index c1771e7..62e1534 100644
> > > --- a/fs/xfs/xfs_ioctl32.c
> > > +++ b/fs/xfs/xfs_ioctl32.c
> > > @@ -17,6 +17,8 @@
> > >   #include "xfs_itable.h"
> > >   #include "xfs_fsops.h"
> > >   #include "xfs_rtalloc.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_ioctl.h"
> > >   #include "xfs_ioctl32.h"
> > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > index 5e16545..5ecc76c 100644
> > > --- a/fs/xfs/xfs_iops.c
> > > +++ b/fs/xfs/xfs_iops.c
> > > @@ -13,6 +13,8 @@
> > >   #include "xfs_inode.h"
> > >   #include "xfs_acl.h"
> > >   #include "xfs_quota.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_trans.h"
> > >   #include "xfs_trace.h"
> > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > index fa2d05e..3457f22 100644
> > > --- a/fs/xfs/xfs_log.c
> > > +++ b/fs/xfs/xfs_log.c
> > > @@ -1993,6 +1993,10 @@ xlog_print_tic_res(
> > >   	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
> > >   	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
> > >   	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
> > > +	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
> > > +	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
> > > +	    REG_TYPE_STR(ATTR_NAME, "attr_name"),
> > > +	    REG_TYPE_STR(ATTR_VALUE, "attr_value"),
> > >   	};
> > >   	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
> > >   #undef REG_TYPE_STR
> > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > > index a8289ad..cb951cd 100644
> > > --- a/fs/xfs/xfs_log_recover.c
> > > +++ b/fs/xfs/xfs_log_recover.c
> > > @@ -1775,6 +1775,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
> > >   	&xlog_cud_item_ops,
> > >   	&xlog_bui_item_ops,
> > >   	&xlog_bud_item_ops,
> > > +	&xlog_attri_item_ops,
> > > +	&xlog_attrd_item_ops,
> > >   };
> > >   static const struct xlog_recover_item_ops *
> > > diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> > > index 0aa87c2..bc9c25e 100644
> > > --- a/fs/xfs/xfs_ondisk.h
> > > +++ b/fs/xfs/xfs_ondisk.h
> > > @@ -132,6 +132,8 @@ xfs_check_ondisk_structs(void)
> > >   	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
> > >   	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
> > >   	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
> > > +	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
> > > +	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
> > >   	/*
> > >   	 * The v5 superblock format extended several v4 header structures with
> > > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > > index bca48b3..9b0c790 100644
> > > --- a/fs/xfs/xfs_xattr.c
> > > +++ b/fs/xfs/xfs_xattr.c
> > > @@ -10,6 +10,7 @@
> > >   #include "xfs_log_format.h"
> > >   #include "xfs_da_format.h"
> > >   #include "xfs_inode.h"
> > > +#include "xfs_da_btree.h"
> > >   #include "xfs_attr.h"
> > >   #include "xfs_acl.h"
> > >   #include "xfs_da_btree.h"
> > > -- 
> > > 2.7.4
> > > 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v13 06/10] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  2020-11-13  1:27     ` Allison Henderson
@ 2020-11-14  2:03       ` Darrick J. Wong
  0 siblings, 0 replies; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-14  2:03 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Nov 12, 2020 at 06:27:38PM -0700, Allison Henderson wrote:
> 
> 
> On 11/10/20 1:15 PM, Darrick J. Wong wrote:
> > On Thu, Oct 22, 2020 at 11:34:31PM -0700, Allison Henderson wrote:
> > > From: Allison Collins <allison.henderson@oracle.com>
> > > 
> > > These routines set up and start new deferred attribute operations.
> > > These functions are meant to be called by any routine needing to
> > > initiate a deferred attribute operation as opposed to the existing
> > > inline operations. New helper function xfs_attr_item_init also added.
> > > 
> > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > ---
> > >   fs/xfs/libxfs/xfs_attr.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >   fs/xfs/libxfs/xfs_attr.h |  2 ++
> > >   2 files changed, 56 insertions(+)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > index 760383c..7fe5554 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > @@ -25,6 +25,7 @@
> > >   #include "xfs_trans_space.h"
> > >   #include "xfs_trace.h"
> > >   #include "xfs_attr_item.h"
> > > +#include "xfs_attr.h"
> > >   /*
> > >    * xfs_attr.c
> > > @@ -643,6 +644,59 @@ xfs_attr_set(
> > >   	goto out_unlock;
> > >   }
> > > +STATIC int
> > > +xfs_attr_item_init(
> > > +	struct xfs_da_args	*args,
> > > +	unsigned int		op_flags,	/* op flag (set or remove) */
> > > +	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
> > > +{
> > > +
> > > +	struct xfs_attr_item	*new;
> > > +
> > > +	new = kmem_alloc_large(sizeof(struct xfs_attr_item), KM_NOFS);
> > 
> > I don't think we need _large allocations for struct xfs_attr_item, right?
> I will try it and see; I think it should be ok.  One of the new test cases
> I'm using does try to progressively add larger and larger attrs.  If it
> doesn't work, I'll make a note of it though.

Ok.  Note that kmem_alloc will only return heap objects from memory
that's directly addressable by the kernel (and can't be larger than a
page), whereas kmem_alloc_large can fall back to virtually mapped
memory, which is scarce on 32-bit systems.

--D

> > 
> > > +	memset(new, 0, sizeof(struct xfs_attr_item));
> > 
> > Use kmem_zalloc and you won't have to memset.  Better yet, zalloc will
> > get you memory that's been pre-zeroed in the background.
> > 
> > > +	new->xattri_op_flags = op_flags;
> > > +	new->xattri_dac.da_args = args;
> > > +
> > > +	*attr = new;
> > > +	return 0;
> > > +}
> > > +
> > > +/* Sets an attribute for an inode as a deferred operation */
> > > +int
> > > +xfs_attr_set_deferred(
> > > +	struct xfs_da_args	*args)
> > > +{
> > > +	struct xfs_attr_item	*new;
> > > +	int			error = 0;
> > > +
> > > +	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> > > +
> > > +	return 0;
> > > +}
> > 
> > The changes in "xfs: enable delayed attributes" should be moved to this
> > patch so that these new functions immediately have callers.
> Sure, will merge those patches together then
> 
> > 
> > (Also see the reply I sent to the next patch, which will avoid weird
> > regressions if someone's bisect lands in the middle of this series...)
> > 
> > --D
> > 
> > > +
> > > +/* Removes an attribute for an inode as a deferred operation */
> > > +int
> > > +xfs_attr_remove_deferred(
> > > +	struct xfs_da_args	*args)
> > > +{
> > > +
> > > +	struct xfs_attr_item	*new;
> > > +	int			error;
> > > +
> > > +	error  = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >   /*========================================================================
> > >    * External routines when attribute list is inside the inode
> > >    *========================================================================*/
> > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > index 5b4a1ca..8a08411 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > @@ -307,5 +307,7 @@ bool xfs_attr_namecheck(const void *name, size_t length);
> > >   void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> > >   			      struct xfs_da_args *args);
> > >   int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
> > > +int xfs_attr_set_deferred(struct xfs_da_args *args);
> > > +int xfs_attr_remove_deferred(struct xfs_da_args *args);
> > >   #endif	/* __XFS_ATTR_H__ */
> > > -- 
> > > 2.7.4
> > > 


* Re: [PATCH v13 02/10] xfs: Add delay ready attr remove routines
  2020-11-14  1:18       ` Darrick J. Wong
@ 2020-11-16  5:12         ` Allison Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-11-16  5:12 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 11/13/20 6:18 PM, Darrick J. Wong wrote:
> On Thu, Nov 12, 2020 at 08:43:25PM -0700, Allison Henderson wrote:
>>
>>
>> On 11/10/20 4:43 PM, Darrick J. Wong wrote:
>>> On Thu, Oct 22, 2020 at 11:34:27PM -0700, Allison Henderson wrote:
>>>> This patch modifies the attr remove routines to be delay ready. This
>>>> means they no longer roll or commit transactions, but instead return
>>>> -EAGAIN to have the calling routine roll and refresh the transaction. In
>>>> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
>>>> uses a state-machine-like switch to keep track of where it was
>>>> when EAGAIN was returned. xfs_attr_node_removename has also been
>>>> modified to use the switch, and a new version of xfs_attr_remove_args
>>>> consists of a simple loop to refresh the transaction until the operation
>>>> is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
>>>> transaction wherever the existing code used to.
>>>>
>>>> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
>>>> version __xfs_attr_rmtval_remove. We will rename
>>>> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
>>>> done.
>>>>
>>>> xfs_attr_rmtval_remove itself is still in use by the set routines (used
>>>> during a rename).  For reasons of preserving existing function, we
>>>> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
>>>> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
>>>> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
>>>> used and will be removed.
>>>>
>>>> This patch also adds a new struct xfs_delattr_context, which we will use
>>>> to keep track of the current state of an attribute operation. The new
>>>> xfs_delattr_state enum is used to track various operations that are in
>>>> progress so that we know not to repeat them, and resume where we left
>>>> off before EAGAIN was returned to cycle out the transaction. Other
>>>> members take the place of local variables that need to retain their
>>>> values across multiple function recalls.  See xfs_attr.h for a more
>>>> detailed diagram of the states.
>>>>
>>>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>>>> ---
>>>>    fs/xfs/libxfs/xfs_attr.c        | 200 +++++++++++++++++++++++++++++-----------
>>>>    fs/xfs/libxfs/xfs_attr.h        |  72 +++++++++++++++
>>>>    fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>>>>    fs/xfs/libxfs/xfs_attr_remote.c |  37 ++++----
>>>>    fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>>>>    fs/xfs/xfs_attr_inactive.c      |   2 +-
>>>>    6 files changed, 241 insertions(+), 74 deletions(-)
>>>>
>>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>>> index f4d39bf..6ca94cb 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr.c
>>>> @@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>>>     */
>>>>    STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>>>>    STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
>>>> -STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
>>>> +STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>>>>    STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>>>    				 struct xfs_da_state **state);
>>>>    STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>>> @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
>>>>    }
>>>>    /*
>>>> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
>>>> + * also checks for a defer finish.  Transaction is finished and rolled as
>>>> + * needed; returns 0 on success or a negative error code.
>>>> + */
>>>> +int
>>>> +xfs_attr_trans_roll(
>>>> +	struct xfs_delattr_context	*dac)
>>>> +{
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	int				error = 0;
>>>> +
>>>> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
>>>> +		/*
>>>> +		 * The caller wants us to finish all the deferred ops so that we
>>>> +		 * avoid pinning the log tail with a large number of deferred
>>>> +		 * ops.
>>>> +		 */
>>>> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
>>>> +		error = xfs_defer_finish(&args->trans);
>>>> +		if (error)
>>>> +			return error;
>>>> +	}
>>>> +
>>>> +	return xfs_trans_roll_inode(&args->trans, args->dp);
>>>> +}
>>>
>>> (Mostly ignoring these functions since they all go away by the end of
>>> the patchset...)
>>>
>>>> +
>>>> +/*
>>>>     * Set the attribute specified in @args.
>>>>     */
>>>>    int
>>>> @@ -364,23 +391,54 @@ xfs_has_attr(
>>>>     */
>>>>    int
>>>>    xfs_attr_remove_args(
>>>> -	struct xfs_da_args      *args)
>>>> +	struct xfs_da_args	*args)
>>>>    {
>>>> -	struct xfs_inode	*dp = args->dp;
>>>> -	int			error;
>>>> +	int				error = 0;
>>>> +	struct xfs_delattr_context	dac = {
>>>> +		.da_args	= args,
>>>> +	};
>>>> +
>>>> +	do {
>>>> +		error = xfs_attr_remove_iter(&dac);
>>>> +		if (error != -EAGAIN)
>>>> +			break;
>>>> +
>>>> +		error = xfs_attr_trans_roll(&dac);
>>>> +		if (error)
>>>> +			return error;
>>>> +
>>>> +	} while (true);
>>>> +
>>>> +	return error;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Remove the attribute specified in @args.
>>>> + *
>>>> + * This function may return -EAGAIN to signal that the transaction needs to be
>>>> + * rolled.  Callers should continue calling this function until they receive a
>>>> + * return value other than -EAGAIN.
>>>> + */
>>>> +int
>>>> +xfs_attr_remove_iter(
>>>> +	struct xfs_delattr_context	*dac)
>>>> +{
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	struct xfs_inode		*dp = args->dp;
>>>> +
>>>> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
>>>> +		goto node;
>>>
>>> Might as well just make this part of the if statement dispatch:
>>>
>>> 	if (dac->dela_state == XFS_DAS_RM_SHRINK)
>>> 		return xfs_attr_node_removename_iter(dac);
>>> 	else if (!xfs_inode_hasattr(dp))
>>> 		return -ENOATTR;
>> I think we did this once, but then people disliked having the same call in
>> two places.  We call the node function if XFS_DAS_RM_SHRINK is set OR if the
>> other two cases fail, which is actually the initial point of entry.
>>
>> I think probably we need a comment somewhere.  I've realized every time a
>> question gets re-raised, it means we need a comment so we don't forget why
>> :-)
>>
>> Maybe for the goto we can have:
>> /* If we are shrinking a node, resume shrink */
>>
>> and.....
> 
> <shrug> This was a pretty minor point in my review, so if there's a
> better way of doing it, please feel free. :)
> 
> Admittedly I assume that a modern day compiler will slice and dice and
> rearrange to its heart's content, so for the most part I'm looking for
> higher level design errors and more or less don't care about the nitty
> gritty of what kind of machine code this all turns into.
> 
> (I'm probably doing that at everyone's peril, sadly...)
No worries, I just try my best to keep everyone's commentary from prior
reviews considered, just to make sure things keep moving in a
progressive direction.  At least as much as possible :-)

> 
>>
>>>
>>>>    	if (!xfs_inode_hasattr(dp)) {
>>>> -		error = -ENOATTR;
>>>> +		return -ENOATTR;
>>>>    	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
>>>>    		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
>>>> -		error = xfs_attr_shortform_remove(args);
>>>> +		return xfs_attr_shortform_remove(args);
>>>>    	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>>>> -		error = xfs_attr_leaf_removename(args);
>>>> -	} else {
>>>> -		error = xfs_attr_node_removename(args);
>>>> +		return xfs_attr_leaf_removename(args);
>>>>    	}
>>>> -
>>>> -	return error;
>>>> +node:
>> 	/* If we are not short form or leaf, then remove node */
>> ?
>>>> +	return  xfs_attr_node_removename_iter(dac);
>>>>    }
>>>>    /*
>>>> @@ -1178,10 +1236,11 @@ xfs_attr_leaf_mark_incomplete(
>>>>     */
>>>>    STATIC
>>>>    int xfs_attr_node_removename_setup(
>>>> -	struct xfs_da_args	*args,
>>>> -	struct xfs_da_state	**state)
>>>> +	struct xfs_delattr_context	*dac,
>>>> +	struct xfs_da_state		**state)
>>>
>>> AFAICT *state == dac->da_state by the end of the series; could you
>>> remove this argument too?
>>>
>> Sure, I will see if I can collapse it down
>>
>>>>    {
>>>> -	int			error;
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	int				error;
>>>>    	error = xfs_attr_node_hasname(args, state);
>>>>    	if (error != -EEXIST)
>>>> @@ -1191,6 +1250,12 @@ int xfs_attr_node_removename_setup(
>>>>    	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
>>>>    		XFS_ATTR_LEAF_MAGIC);
>>>> +	/*
>>>> +	 * Store state in the context in case we need to cycle out the
>>>> +	 * transaction
>>>> +	 */
>>>> +	dac->da_state = *state;
>>>> +
>>>>    	if (args->rmtblkno > 0) {
>>>>    		error = xfs_attr_leaf_mark_incomplete(args, *state);
>>>
>>> It doesn't make a lot of logical sense to me "we marked the attr
>>> incomplete to hide it" is the same state (UNINIT) as "we haven't done
>>> anything yet".
>> Not sure I quite follow what you mean here.  This little function is just a
>> setup helper.  It doesn't jump in and out like the other functions do with
>> the state machine.  We separated it out for that reason.  This routine
>> executes once to stash the state: the da_state, not the dela_state.
>> Different states :-)
>>
>> So after we have that stored away, the calling function moves onto
>> xfs_attr_node_remove_step, which does get recalled quite a bit until there
>> are no more remote blocks to remove.
> 
> <nod> I got that; I think my confusion here is that I was expecting each
> and every step to get its own state (which I think you said was how this
> used to be some ~5 revisions ago) even if it doesn't result in a
> transaction roll, whereas now the delattr code only introduces a new
> state when it needs to roll the transaction.
Well, I think the correct way to phrase it would be: all states are
associated with an EAGAIN return (and thus a transaction roll), but not
all such returns require a state.  If there is something else that can
be used in place of a state, we use that instead, such as the state of
the tree, for example.
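To make that concrete, here is a small user-space sketch (all names are hypothetical; this is not the XFS code, and the remove steps are reduced to counters). It loosely follows xfs_attr_node_removename_iter(): the remote-block loop returns -EAGAIN while the state is still the UNINIT equivalent, because the remaining-block count is enough to resume, and only the tree join records a state. The UNINIT case also carries the /* fall through */ annotation discussed further down.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states; loosely modelled on enum xfs_delattr_state. */
enum rm_state {
	RM_UNINIT = 0,		/* no explicit state recorded yet */
	RM_SHRINK,		/* tree join committed; shrink next */
};

/* Hypothetical context; the fields stand in for on-disk/in-core state. */
struct rm_ctx {
	enum rm_state	state;
	int		rmt_blocks;	/* like args->rmtblkno: extents left */
	int		needs_join;	/* removing the name requires a join */
	int		name_removed;
	int		shrunk;
};

/*
 * While remote blocks remain, return -EAGAIN without recording a state:
 * the remaining-block count alone tells the next call where to resume.
 * Only the join records RM_SHRINK; without it, a recall would try to
 * remove the name a second time.
 */
static int rm_step(struct rm_ctx *ctx)
{
	if (ctx->rmt_blocks > 0) {
		ctx->rmt_blocks--;		/* unmap one extent */
		return -EAGAIN;
	}

	ctx->name_removed = 1;
	if (ctx->needs_join) {
		ctx->state = RM_SHRINK;
		return -EAGAIN;
	}
	return 0;
}

static int rm_iter(struct rm_ctx *ctx)
{
	int error = 0;

	switch (ctx->state) {
	case RM_UNINIT:
		error = rm_step(ctx);
		if (error)
			break;
		/* fall through */
	case RM_SHRINK:
		ctx->shrunk = 1;	/* shrink step; unconditional here */
		break;
	default:
		assert(0);
		return -EINVAL;
	}
	return error;
}
```

A caller would keep calling rm_iter() until it returns something other than -EAGAIN, rolling the transaction in between; with two remote extents and a join that takes three rolls, and only the last of them has a recorded state.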

> 
> Hm.  I've been reviewing this patchset by puzzling out each of the steps
> of the old attr setting and removing code, and then figuring out how the
> old code got from one step to another.  Then I look at the end product
> of this whole patchset and try to figure out how the new state machine
> maps onto the old sequences, to determine if there are any serious
> discrepancies that also break things.
Oh gosh, that's a lot.  Thank you for the thorough review though!

> 
> So I think in the first round of this review I was treading awfully
> close to suggesting that every little step of the old system had to
> become an explicit state in the new system's state machine, so that I
> could do a 1:1 comparison.  That isn't the code that's before me now,
> and reworking all that sounds like (a) a big pain and (b) probably not
> where you and Brian were heading.
mmm, it wouldn't be tooooo bad.  But I do think it's important for people
to be on the same page about it so that everybody is happy and 
comfortable with the result.  And to make sure development is moving in 
a progressive direction.

To be clear, putting back the explicit states would mean
reintroducing the pattern:

		dac->dela_state = XFS_DAS_NEW_STATE;
		return -EAGAIN;
das_new_state:


For every "return -EAGAIN" that doesn't have an explicit state.

For the most part, though, I got the impression that people find this
an unpleasant pattern to look at, and would like to see it culled
wherever possible.
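For comparison, a sketch of what the code looks like when every -EAGAIN gets its own state and resume label (hypothetical names, user-space only, not the series' code; es_run() just counts rolls instead of rolling a real transaction):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states: one per roll point. */
enum es_state {
	ES_UNINIT = 0,		/* nothing done yet */
	ES_STEP_TWO,		/* resume at step two */
	ES_STEP_THREE,		/* resume at step three */
};

struct es_ctx {
	enum es_state	state;
	int		steps_done;
};

/*
 * Every "return -EAGAIN" is paired with an explicit state and a resume
 * label, so each recall jumps straight back to where the last call stopped.
 */
static int es_iter(struct es_ctx *ctx)
{
	switch (ctx->state) {
	case ES_UNINIT:
		break;
	case ES_STEP_TWO:
		goto es_step_two;
	case ES_STEP_THREE:
		goto es_step_three;
	default:
		assert(0);
		return -EINVAL;
	}

	ctx->steps_done++;			/* step one */
	ctx->state = ES_STEP_TWO;
	return -EAGAIN;
es_step_two:
	ctx->steps_done++;			/* step two */
	ctx->state = ES_STEP_THREE;
	return -EAGAIN;
es_step_three:
	ctx->steps_done++;			/* step three */
	return 0;
}

/* Caller loop: count one "transaction roll" per -EAGAIN. */
static int es_run(struct es_ctx *ctx, int *rolls)
{
	int error;

	*rolls = 0;
	while ((error = es_iter(ctx)) == -EAGAIN)
		(*rolls)++;
	return error;
}
```

Each roll point costs an enum value, a case, and a label, which is the bookkeeping the earlier revisions have been trimming back.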

> 
> Perhaps an easier way to bridge the gap between the old way and the new
> way would be to make the ASCII art diagram call out each of these little
> steps (marking the attr incomplete, removing the value blocks, erasing
> the attr key, shrinking the attr tree, etc.) and then show where each of
> the XFS_DAS_* steps fall into that?
Sure, I can flesh out the diagram a bit more

> 
> That way, the ASCII art would show that we start in XFS_DAS_UNINIT, mark
> the attr "incomplete", move on to XFS_DAS_RM_SHRINK, start removing attr
> blocks, etc.  The machinery can omit the unnecessary pieces, so long as
> we have a map of the overall process.
> 
> How does that sound?
That sounds reasonable :-)

> 
>>>
>>>>    		if (error)
>>>> @@ -1203,13 +1268,16 @@ int xfs_attr_node_removename_setup(
>>>>    }
>>>>    STATIC int
>>>> -xfs_attr_node_remove_rmt(
>>>> -	struct xfs_da_args	*args,
>>>> -	struct xfs_da_state	*state)
>>>> +xfs_attr_node_remove_rmt (
>>>> +	struct xfs_delattr_context	*dac,
>>>> +	struct xfs_da_state		*state)
>>>>    {
>>>> -	int			error = 0;
>>>> +	int				error = 0;
>>>> -	error = xfs_attr_rmtval_remove(args);
>>>> +	/*
>>>> +	 * May return -EAGAIN to request that the caller recall this function
>>>> +	 */
>>>> +	error = __xfs_attr_rmtval_remove(dac);
>>>>    	if (error)
>>>>    		return error;
>>>> @@ -1221,21 +1289,27 @@ xfs_attr_node_remove_rmt(
>>>>    }
>>>>    /*
>>>> - * Remove a name from a B-tree attribute list.
>>>> + * Step through removing a name from a B-tree attribute list.
>>>>     *
>>>>     * This will involve walking down the Btree, and may involve joining
>>>>     * leaf nodes and even joining intermediate nodes up to and including
>>>>     * the root node (a special case of an intermediate node).
>>>> + *
>>>> + * This routine is meant to function as either an inline or delayed operation,
>>>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>>>> + * functions will need to handle this, and recall the function until a
>>>> + * successful error code is returned.
>>>>     */
>>>>    STATIC int
>>>>    xfs_attr_node_remove_step(
>>>> -	struct xfs_da_args	*args,
>>>> -	struct xfs_da_state	*state)
>>>> +	struct xfs_delattr_context	*dac)
>>>>    {
>>>> -	struct xfs_da_state_blk	*blk;
>>>> -	int			retval, error;
>>>> -	struct xfs_inode	*dp = args->dp;
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	struct xfs_da_state		*state;
>>>> +	struct xfs_da_state_blk		*blk;
>>>> +	int				retval, error = 0;
>>>> +	state = dac->da_state;
>>>
>>> Might as well initialize this when you declare state above.
>> Sure
>>
>>>
>>>>    	/*
>>>>    	 * If there is an out-of-line value, de-allocate the blocks.
>>>> @@ -1243,7 +1317,10 @@ xfs_attr_node_remove_step(
>>>>    	 * overflow the maximum size of a transaction and/or hit a deadlock.
>>>>    	 */
>>>>    	if (args->rmtblkno > 0) {
>>>> -		error = xfs_attr_node_remove_rmt(args, state);
>>>> +		/*
>>>> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
>>>> +		 */
>>>> +		error = xfs_attr_node_remove_rmt(dac, state);
>>>>    		if (error)
>>>>    			return error;
>>>>    	}
>>>> @@ -1257,21 +1334,18 @@ xfs_attr_node_remove_step(
>>>>    	xfs_da3_fixhashpath(state, &state->path);
>>>>    	/*
>>>> -	 * Check to see if the tree needs to be collapsed.
>>>> +	 * Check to see if the tree needs to be collapsed.  Set the flag to
>>>> +	 * indicate that the calling function needs to move on to the shrink
>>>> +	 * operation.
>>>>    	 */
>>>>    	if (retval && (state->path.active > 1)) {
>>>>    		error = xfs_da3_join(state);
>>>>    		if (error)
>>>>    			return error;
>>>> -		error = xfs_defer_finish(&args->trans);
>>>> -		if (error)
>>>> -			return error;
>>>> -		/*
>>>> -		 * Commit the Btree join operation and start a new trans.
>>>> -		 */
>>>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>>>> -		if (error)
>>>> -			return error;
>>>> +
>>>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>>> +		dac->dela_state = XFS_DAS_RM_SHRINK;
>>>> +		return -EAGAIN;
>>>>    	}
>>>>    	return error;
>>>> @@ -1282,31 +1356,53 @@ xfs_attr_node_remove_step(
>>>>     *
>>>>     * This routine will find the blocks of the name to remove, remove them and
>>>>     * shirnk the tree if needed.
>>>
>>> "...and shrink the tree..."
>>>
>> Will fix the shirnk :-)
>>
>>>> + *
>>>> + * This routine is meant to function as either an inline or delayed operation,
>>>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>>>> + * functions will need to handle this, and recall the function until a
>>>> + * successful error code is returned.
>>>>     */
>>>>    STATIC int
>>>> -xfs_attr_node_removename(
>>>> -	struct xfs_da_args	*args)
>>>> +xfs_attr_node_removename_iter(
>>>> +	struct xfs_delattr_context	*dac)
>>>>    {
>>>> -	struct xfs_da_state	*state;
>>>> -	int			error;
>>>> -	struct xfs_inode	*dp = args->dp;
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	struct xfs_da_state		*state;
>>>> +	int				error;
>>>> +	struct xfs_inode		*dp = args->dp;
>>>>    	trace_xfs_attr_node_removename(args);
>>>> +	state = dac->da_state;
>>>> -	error = xfs_attr_node_removename_setup(args, &state);
>>>> -	if (error)
>>>> -		goto out;
>>>> +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
>>>> +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
>>>
>>> Can we determine if it's necessary to call _removename_setup by checking
>>> dac->da_state directly instead of having a flag?
>>
>> Initially I think I had another XFS_DAS_RMTVAL_REMOVE state for this.
>> Alternatively, we also discussed using the inverse like this:
>>
>> if (dac->dela_state != XFS_DAS_RMTVAL_REMOVE)
>> 	do setup....
>>
>> Though I think people liked having the init flag, since init routines were a
>> sort of recurring pattern.  So that's why we're using the flag now.
> 
> Oh, so (da_state != NULL) and (flags & XFS_DAC_NODE_RMVNAME_INIT) aren't
> a 1:1 correlation?
Oooh, sorry... I was referring to the wrong state variable.  Yes,
da_state should never be NULL after it is set, so that should work.
Will update :-)
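A minimal sketch of keying the one-time setup off da_state itself instead of XFS_DAC_NODE_RMVNAME_INIT, assuming (as above) that da_state is never NULL once setup has run. The dd_* names are hypothetical stand-ins, and calloc()/free() replace the real xfs_da_state allocation helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xfs_da_state. */
struct dd_da_state {
	int	path_active;
};

/* Hypothetical stand-in for struct xfs_delattr_context. */
struct dd_ctx {
	struct dd_da_state	*da_state;	/* NULL until setup has run */
	int			setup_calls;
	int			passes;
};

/* One-time setup, like xfs_attr_node_removename_setup() stashing da_state. */
static int dd_setup(struct dd_ctx *ctx)
{
	assert(ctx->da_state == NULL);
	ctx->da_state = calloc(1, sizeof(*ctx->da_state));
	if (!ctx->da_state)
		return -ENOMEM;
	ctx->setup_calls++;
	return 0;
}

/*
 * A non-NULL da_state doubles as the "setup already done" marker, so no
 * separate init flag is needed.
 */
static int dd_removename_iter(struct dd_ctx *ctx)
{
	int error;

	if (!ctx->da_state) {
		error = dd_setup(ctx);
		if (error)
			return error;
	}

	/* Pretend the first two passes each need a transaction roll. */
	if (++ctx->passes < 3)
		return -EAGAIN;

	/* Done: free the state, like the "out:" path in the patch. */
	free(ctx->da_state);
	ctx->da_state = NULL;
	return 0;
}
```

The one thing this relies on is that whoever frees the state also clears the pointer; otherwise a later call would skip setup and use freed memory.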


> 
>>>
>>>> +		error = xfs_attr_node_removename_setup(dac, &state);
>>>> +		if (error)
>>>> +			goto out;
>>>> +	}
>>>> -	error = xfs_attr_node_remove_step(args, state);
>>>> -	if (error)
>>>> -		goto out;
>>>> +	switch (dac->dela_state) {
>>>> +	case XFS_DAS_UNINIT:
>>>> +		error = xfs_attr_node_remove_step(dac);
>>>> +		if (error)
>>>> +			break;
>>>> -	/*
>>>> -	 * If the result is small enough, push it all into the inode.
>>>> -	 */
>>>> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>>>> -		error = xfs_attr_node_shrink(args, state);
>>>> +		/* do not break, proceed to shrink if needed */
>>>
>>> /* fall through */
>>>
>>> ...because otherwise the static checkers will get mad.
>>>
>>> (Well clang will anyway because gcc, llvm, and the C18 body all have
>>> different incompatible ideas of what should be the magic tag that
>>> signals an intentional fall through, but this should at least be
>>> consistent with the rest of xfs.)
>> Oh ok then, I did not know.  Will update the comment
>>
>>>
>>>> +	case XFS_DAS_RM_SHRINK:
>>>> +		/*
>>>> +		 * If the result is small enough, push it all into the inode.
>>>> +		 */
>>>> +		if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>>>> +			error = xfs_attr_node_shrink(args, state);
>>>> +		break;
>>>> +	default:
>>>> +		ASSERT(0);
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	if (error == -EAGAIN)
>>>> +		return error;
>>>>    out:
>>>>    	if (state)
>>>>    		xfs_da_state_free(state);
>>>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>>>> index 3e97a93..64dcf0f 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr.h
>>>> +++ b/fs/xfs/libxfs/xfs_attr.h
>>>> @@ -74,6 +74,74 @@ struct xfs_attr_list_context {
>>>>    };
>>>> +/*
>>>> + * ========================================================================
>>>> + * Structure used to pass context around among the delayed routines.
>>>> + * ========================================================================
>>>> + */
>>>> +
>>>> +/*
>>>> + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
>>>> + * states indicate places where the function would return -EAGAIN, and then
>>>> + * immediately resume from after being recalled by the calling function. States
>>>> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
>>>> + * so the calling function needs to pass them back to that subroutine to allow
>>>> + * it to finish where it left off. But they otherwise do not have a role in the
>>>> + * calling function other than just passing through.
>>>> + *
>>>> + * xfs_attr_remove_iter()
>>>> + *	  XFS_DAS_RM_SHRINK ─┐
>>>> + *	  (subroutine state) │
>>>> + *	                     └─>xfs_attr_node_removename()
>>>> + *	                                      │
>>>> + *	                                      v
>>>> + *	                                   need to
>>>> + *	                                shrink tree? ─n─┐
>>>> + *	                                      │         │
>>>> + *	                                      y         │
>>>> + *	                                      │         │
>>>> + *	                                      v         │
>>>> + *	                              XFS_DAS_RM_SHRINK │
>>>> + *	                                      │         │
>>>> + *	                                      v         │
>>>> + *	                                     done <─────┘
>>>> + *
>>>> + */
>>>> +
>>>> +/*
>>>> + * Enum values for xfs_delattr_context.dela_state
>>>> + *
>>>> + * These values are used by delayed attribute operations to keep track  of where
>>>> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
>>>> + * calling function to roll the transaction, and then recall the subroutine to
>>>> + * finish the operation.  The enum is then used by the subroutine to jump back
>>>> + * to where it was and resume executing where it left off.
>>>> + */
>>>> +enum xfs_delattr_state {
>>>> +	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
>>>> +	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
>>>> +};
>>>> +
>>>> +/*
>>>> + * Defines for xfs_delattr_context.flags
>>>> + */
>>>> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>>>> +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>>>> +
>>>> +/*
>>>> + * Context used for keeping track of delayed attribute operations
>>>> + */
>>>> +struct xfs_delattr_context {
>>>> +	struct xfs_da_args      *da_args;
>>>> +
>>>> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
>>>> +	struct xfs_da_state     *da_state;
>>>> +
>>>> +	/* Used to keep track of current state of delayed operation */
>>>> +	unsigned int            flags;
>>>> +	enum xfs_delattr_state  dela_state;
>>>> +};
>>>> +
>>>>    /*========================================================================
>>>>     * Function prototypes for the kernel.
>>>>     *========================================================================*/
>>>> @@ -91,6 +159,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>>>>    int xfs_attr_set_args(struct xfs_da_args *args);
>>>>    int xfs_has_attr(struct xfs_da_args *args);
>>>>    int xfs_attr_remove_args(struct xfs_da_args *args);
>>>> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>>>> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>>>>    bool xfs_attr_namecheck(const void *name, size_t length);
>>>> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>>> +			      struct xfs_da_args *args);
>>>>    #endif	/* __XFS_ATTR_H__ */
>>>> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
>>>> index bb128db..338377e 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
>>>> @@ -19,8 +19,8 @@
>>>>    #include "xfs_bmap_btree.h"
>>>>    #include "xfs_bmap.h"
>>>>    #include "xfs_attr_sf.h"
>>>> -#include "xfs_attr_remote.h"
>>>>    #include "xfs_attr.h"
>>>> +#include "xfs_attr_remote.h"
>>>>    #include "xfs_attr_leaf.h"
>>>>    #include "xfs_error.h"
>>>>    #include "xfs_trace.h"
>>>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
>>>> index 48d8e9c..1426c15 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr_remote.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
>>>> @@ -674,10 +674,12 @@ xfs_attr_rmtval_invalidate(
>>>>     */
>>>>    int
>>>>    xfs_attr_rmtval_remove(
>>>> -	struct xfs_da_args      *args)
>>>> +	struct xfs_da_args		*args)
>>>>    {
>>>> -	int			error;
>>>> -	int			retval;
>>>> +	int				error;
>>>> +	struct xfs_delattr_context	dac  = {
>>>> +		.da_args	= args,
>>>> +	};
>>>>    	trace_xfs_attr_rmtval_remove(args);
>>>> @@ -685,19 +687,17 @@ xfs_attr_rmtval_remove(
>>>>    	 * Keep de-allocating extents until the remote-value region is gone.
>>>>    	 */
>>>>    	do {
>>>> -		retval = __xfs_attr_rmtval_remove(args);
>>>> -		if (retval && retval != -EAGAIN)
>>>> -			return retval;
>>>> +		error = __xfs_attr_rmtval_remove(&dac);
>>>> +		if (error != -EAGAIN)
>>>> +			break;
>>>> -		/*
>>>> -		 * Close out trans and start the next one in the chain.
>>>> -		 */
>>>> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
>>>> +		error = xfs_attr_trans_roll(&dac);
>>>>    		if (error)
>>>>    			return error;
>>>> -	} while (retval == -EAGAIN);
>>>> -	return 0;
>>>> +	} while (true);
>>>> +
>>>> +	return error;
>>>>    }
>>>>    /*
>>>> @@ -707,9 +707,10 @@ xfs_attr_rmtval_remove(
>>>>     */
>>>>    int
>>>>    __xfs_attr_rmtval_remove(
>>>> -	struct xfs_da_args	*args)
>>>> +	struct xfs_delattr_context	*dac)
>>>>    {
>>>> -	int			error, done;
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	int				error, done;
>>>>    	/*
>>>>    	 * Unmap value blocks for this attr.
>>>> @@ -719,12 +720,10 @@ __xfs_attr_rmtval_remove(
>>>>    	if (error)
>>>>    		return error;
>>>> -	error = xfs_defer_finish(&args->trans);
>>>> -	if (error)
>>>> -		return error;
>>>> -
>>>> -	if (!done)
>>>> +	if (!done) {
>>>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>>>    		return -EAGAIN;
>>>
>>> What state are we in when we return -EAGAIN here?
>>>
>>> [jumps back to his whole-branch diff]
>>>
>>> Hm, oh, I see, the next state could be a number of things--
>>>
>>> RM_LBLK if we're removing an old remote value from a leaf block as part
>>> of an attr set operation; or
>>>
>>> RM_NBLK if we're removing an old remote value from a node block as part
>>> of an attr set operation; and
>>>
>>> UNINIT if we're removing a remote value as part of an attr remove
>>> operation.
>>>
>>> Oh!  For the first two, it looks to me as though either we're already in
>>> the state we're setting (RM_[LN]BLK) or we were in either of the
>>> FLIP_[LN]FLAG states.
>>>
>>> I think it would make more sense if you set the state before calling the
>>> rmtval_remove function, and leave a comment here saying that the caller
>>> is responsible for figuring out the next state.
>> Sure, it should be ok
>>
>>>
>>> For removals, I wonder if we should have advanced beyond UNINIT by the
>>> time we get here?  I think you've added the minimum states that are
>>> necessary to resume work after a transaction roll, but from this and the
>>> next patch I feel like we do a lot of work while dela_state == UNINIT.
>> Yes, I think I went over that a little in my replies to your earlier
>> reviews.  Many times we can get away without setting a state to accomplish
>> the same behavior, though it may make it a little harder to visualize where
>> it comes back.
>>
>> I dunno, this one seems like a preference insofar as what people want to
>> see for simplification.  I think having the explicit state setting makes
>> the code easier for a reader to follow, though I will concede they don't
>> actually have to be there to make it work.
> 
> <nod> Maybe (as I said earlier in this reply) we can get by with having
> the ascii art diagram point out all the things that happen while we're
> in "UNINIT" state before the first transaction roll.
> 
> I suspect that showing the steps and how the DAC state machine relates
> to those steps is the best we're going to be able to do w.r.t.
> restructuring a general key-value store implemented inside the kernel.
> :)
> 
Alrighty, that sounds reasonable

Thank you for all the feedback!!
Allison
>>>
>>> FWIW I will be taking a close look at all the new 'return -EAGAIN'
>>> statements to see if I can tell what state we're in when we trigger a
>>> transaction roll.
>> Well, ok, a lot of them are UNINIT.  If we continue in the direction of
>> removing all unnecessary states, really it's the combination of the tree and
>> the state that actually lands us back to where we need to be when the
>> function is recalled.
>>
>> If, for debugging or readability purposes, we wanted an explicit state for
>> each EAGAIN, we would reintroduce a lot of states we've simplified away over
>> the reviews.
>>
>> Maybe give it a day or two to sleep on, and let me know what you think :-)
> 
> <nod> OK.
> 
> --D
> 
>> Thanks for the reviews, I know it's really complicated.
>> Allison
>>
>>>
>>> --D
>>>
>>>> +	}
>>>>    	return error;
>>>>    }
>>>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
>>>> index 9eee615..002fd30 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr_remote.h
>>>> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
>>>> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>>>>    int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>>>>    		xfs_buf_flags_t incore_flags);
>>>>    int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
>>>> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
>>>> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>>>>    #endif /* __XFS_ATTR_REMOTE_H__ */
>>>> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
>>>> index bfad669..aaa7e66 100644
>>>> --- a/fs/xfs/xfs_attr_inactive.c
>>>> +++ b/fs/xfs/xfs_attr_inactive.c
>>>> @@ -15,10 +15,10 @@
>>>>    #include "xfs_da_format.h"
>>>>    #include "xfs_da_btree.h"
>>>>    #include "xfs_inode.h"
>>>> +#include "xfs_attr.h"
>>>>    #include "xfs_attr_remote.h"
>>>>    #include "xfs_trans.h"
>>>>    #include "xfs_bmap.h"
>>>> -#include "xfs_attr.h"
>>>>    #include "xfs_attr_leaf.h"
>>>>    #include "xfs_quota.h"
>>>>    #include "xfs_dir2.h"
>>>> -- 
>>>> 2.7.4
>>>>
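
For readers following the control flow, here is a user-space model of the loop in xfs_attr_remove_args() and the flag handling in xfs_attr_trans_roll() from the patch above. Everything named ds_* is hypothetical, and the two stubs only count calls in place of xfs_defer_finish() and xfs_trans_roll_inode().

```c
#include <assert.h>
#include <errno.h>

#define DS_DEFER_FINISH	0x01	/* mirrors XFS_DAC_DEFER_FINISH */

/* Hypothetical context; the counters record what the stubs were asked to do. */
struct ds_ctx {
	unsigned int	flags;
	int		remaining;	/* units of work still to do */
	int		defer_finishes;
	int		rolls;
};

/* Stubs standing in for xfs_defer_finish() and xfs_trans_roll_inode(). */
static int ds_defer_finish(struct ds_ctx *ctx)
{
	ctx->defer_finishes++;
	return 0;
}

static int ds_trans_roll(struct ds_ctx *ctx)
{
	ctx->rolls++;
	return 0;
}

/* One unit per call; requests a defer finish on even remainders (arbitrary). */
static int ds_iter(struct ds_ctx *ctx)
{
	assert(ctx->remaining >= 0);
	if (ctx->remaining == 0)
		return 0;
	ctx->remaining--;
	if (ctx->remaining % 2 == 0)
		ctx->flags |= DS_DEFER_FINISH;
	return -EAGAIN;
}

/* Same shape as xfs_attr_trans_roll(): finish deferred ops only when asked. */
static int ds_attr_trans_roll(struct ds_ctx *ctx)
{
	int error;

	if (ctx->flags & DS_DEFER_FINISH) {
		ctx->flags &= ~DS_DEFER_FINISH;	/* one-shot request */
		error = ds_defer_finish(ctx);
		if (error)
			return error;
	}
	return ds_trans_roll(ctx);
}

/* Same shape as xfs_attr_remove_args(): recall until no more -EAGAIN. */
static int ds_remove_args(struct ds_ctx *ctx)
{
	int error;

	do {
		error = ds_iter(ctx);
		if (error != -EAGAIN)
			break;
		error = ds_attr_trans_roll(ctx);
		if (error)
			return error;
	} while (1);

	return error;
}
```

With remaining = 3 the loop rolls three times but finishes deferred ops only twice, since the flag is a one-shot request that is cleared before each finish.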


* Re: [PATCH v13 03/10] xfs: Add delay ready attr set routines
  2020-11-14  1:35       ` Darrick J. Wong
@ 2020-11-16  5:25         ` Allison Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-11-16  5:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 11/13/20 6:35 PM, Darrick J. Wong wrote:
> On Thu, Nov 12, 2020 at 06:38:10PM -0700, Allison Henderson wrote:
>>
>>
>> On 11/10/20 4:10 PM, Darrick J. Wong wrote:
>>> On Thu, Oct 22, 2020 at 11:34:28PM -0700, Allison Henderson wrote:
>>>> This patch modifies the attr set routines to be delay ready. This means
>>>> they no longer roll or commit transactions, but instead return -EAGAIN
>>>> to have the calling routine roll and refresh the transaction.  In this
>>>> series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
>>>> state machine like switch to keep track of where it was when EAGAIN was
>>>> returned. See xfs_attr.h for a more detailed diagram of the states.
>>>>
>>>> Two new helper functions have been added: xfs_attr_rmtval_set_init and
>>>> xfs_attr_rmtval_set_blk.  They provide a subset of logic similar to
>>>> xfs_attr_rmtval_set, but they store the current block in the delay attr
>>>> context to allow the caller to roll the transaction between allocations.
>>>> This helps to simplify and consolidate code used by
>>>> xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
>>>> now become a simple loop to refresh the transaction until the operation
>>>> is completed.  Lastly, xfs_attr_rmtval_remove is no longer used, and is
>>>> removed.
>>>>
>>>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>>>> ---
>>>>    fs/xfs/libxfs/xfs_attr.c        | 370 ++++++++++++++++++++++++++--------------
>>>>    fs/xfs/libxfs/xfs_attr.h        | 126 +++++++++++++-
>>>>    fs/xfs/libxfs/xfs_attr_remote.c |  99 +++++++----
>>>>    fs/xfs/libxfs/xfs_attr_remote.h |   4 +
>>>>    fs/xfs/xfs_trace.h              |   1 -
>>>>    5 files changed, 439 insertions(+), 161 deletions(-)
>>>>
>>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>>> index 6ca94cb..95c98d7 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr.c
>>>> @@ -44,7 +44,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
>>>>     * Internal routines when attribute list is one block.
>>>>     */
>>>>    STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
>>>> -STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
>>>> +STATIC int xfs_attr_leaf_addname(struct xfs_delattr_context *dac);
>>>>    STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
>>>>    STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>>> @@ -52,12 +52,15 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
>>>>     * Internal routines when attribute list is more than one block.
>>>>     */
>>>>    STATIC int xfs_attr_node_get(xfs_da_args_t *args);
>>>> -STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
>>>> +STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
>>>>    STATIC int xfs_attr_node_removename_iter(struct xfs_delattr_context *dac);
>>>>    STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>>>    				 struct xfs_da_state **state);
>>>>    STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>>>    STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>>>> +STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
>>>> +STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>>>> +			     struct xfs_buf **leaf_bp);
>>>>    int
>>>>    xfs_inode_hasattr(
>>>> @@ -218,8 +221,11 @@ xfs_attr_is_shortform(
>>>>    /*
>>>>     * Attempts to set an attr in shortform, or converts short form to leaf form if
>>>> - * there is not enough room.  If the attr is set, the transaction is committed
>>>> - * and set to NULL.
>>>> + * there is not enough room.  This function is meant to operate as a helper
>>>> + * routine to the delayed attribute functions.  It returns -EAGAIN to indicate
>>>> + * that the calling function should roll the transaction, and then proceed to
>>>> + * add the attr in leaf form.  This subroutine does not expect to be recalled
>>>> + * again like the other delayed attr routines do.
>>>>     */
>>>>    STATIC int
>>>>    xfs_attr_set_shortform(
>>>> @@ -227,16 +233,16 @@ xfs_attr_set_shortform(
>>>>    	struct xfs_buf		**leaf_bp)
>>>>    {
>>>>    	struct xfs_inode	*dp = args->dp;
>>>> -	int			error, error2 = 0;
>>>> +	int			error = 0;
>>>>    	/*
>>>>    	 * Try to add the attr to the attribute list in the inode.
>>>>    	 */
>>>>    	error = xfs_attr_try_sf_addname(dp, args);
>>>> +
>>>> +	/* Should only be 0, -EEXIST or ENOSPC */
>>>
>>> Nit: "...or -ENOSPC"
>>>
>>> Also, this comment could go a couple of lines up:
>> Sure
>>>
>>> 	/*
>>> 	 * Try to add the attr to the attribute list in the inode.
>>> 	 * This should only return 0, -EEXIST, or -ENOSPC.
>>> 	 */
>>> 	error = xfs_attr_try_sf_addname(dp, args);
>>> 	if (error != -ENOSPC)
>>> 		return error;
>>>
>>>
>>>>    	if (error != -ENOSPC) {
>>>> -		error2 = xfs_trans_commit(args->trans);
>>>> -		args->trans = NULL;
>>>> -		return error ? error : error2;
>>>> +		return error;
>>>>    	}
>>>>    	/*
>>>>    	 * It won't fit in the shortform, transform to a leaf block.  GROT:
>>>> @@ -249,18 +255,10 @@ xfs_attr_set_shortform(
>>>>    	/*
>>>>    	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
>>>>    	 * push cannot grab the half-baked leaf buffer and run into problems
>>>> -	 * with the write verifier. Once we're done rolling the transaction we
>>>> -	 * can release the hold and add the attr to the leaf.
>>>> +	 * with the write verifier.
>>>>    	 */
>>>>    	xfs_trans_bhold(args->trans, *leaf_bp);
>>>> -	error = xfs_defer_finish(&args->trans);
>>>> -	xfs_trans_bhold_release(args->trans, *leaf_bp);
>>>> -	if (error) {
>>>> -		xfs_trans_brelse(args->trans, *leaf_bp);
>>>> -		return error;
>>>> -	}
>>>> -
>>>> -	return 0;
>>>> +	return -EAGAIN;
>>>
>>> What state are we in when return -EAGAIN here?  Are we still in
>>> XFS_DAS_UNINIT, but with an attr fork that is no longer in local format,
>>> which means that we skip the xfs_attr_is_shortform branch next time
>>> around?
>> Yes, that's correct.  I think I used to have an explicit state for it, but
>> it's really not needed for this reason.  Though I think they do add some
>> degree of readability.  Maybe we could add a comment?
>>
>> /* Restart attr operation in leaf format */
>>
>> ?
> 
> Or even mention the DAS state explicitly, e.g.
> 
> /*
>   * We're still in XFS_DAS_UNINIT state here.  We've converted the attr
>   * fork to leaf format and will restart with the leaf add.
>   */
> 
> Hmm, second question: Could you add some tracepoints that would fire
> every time we either change the DAS state or return -EAGAIN to trigger a
> roll?  I bet that will make debugging the attr code easier in the future.
Sure, I will look into adding some tracing here.
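A rough user-space sketch of the tracing idea: if every state change and every -EAGAIN return goes through one helper, a single hook sees all of them. The names below (demo_set_state(), demo_return_eagain(), the fixed-size log) are hypothetical; real XFS tracepoints would be declared with the kernel's TRACE_EVENT machinery in xfs_trace.h, which this does not try to reproduce.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical subset of the delayed-attr states, for illustration only. */
enum demo_das_state {
	DEMO_DAS_UNINIT = 0,
	DEMO_DAS_FOUND_LBLK,
	DEMO_DAS_FLIP_LFLAG,
	DEMO_DAS_RM_LBLK,
};

/* One logged event: either a state change or a request to roll. */
struct demo_trace_ent {
	int			is_roll;	/* 1 = -EAGAIN was returned */
	enum demo_das_state	state;		/* state at the time of the event */
};

#define DEMO_TRACE_MAX	64

struct demo_dac {
	enum demo_das_state	dela_state;
	struct demo_trace_ent	log[DEMO_TRACE_MAX];
	int			nlog;
};

/* Record one event in the in-memory log and echo it to stderr. */
static void demo_trace(struct demo_dac *dac, int is_roll)
{
	if (dac->nlog < DEMO_TRACE_MAX) {
		dac->log[dac->nlog].is_roll = is_roll;
		dac->log[dac->nlog].state = dac->dela_state;
		dac->nlog++;
	}
	fprintf(stderr, "das: %s state=%d\n", is_roll ? "roll" : "set",
		(int)dac->dela_state);
}

/* Every state change funnels through here, so it is always traced. */
static void demo_set_state(struct demo_dac *dac, enum demo_das_state s)
{
	dac->dela_state = s;
	demo_trace(dac, 0);
}

/* Every "please roll the transaction" return funnels through here too. */
static int demo_return_eagain(struct demo_dac *dac)
{
	demo_trace(dac, 1);
	return -EAGAIN;
}
```

In the attr code this would mean replacing each bare `dac->dela_state = ...; return -EAGAIN;` pair with calls to the two helpers, so no transition can slip past the trace.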
> 
>>
>>>
>>>>    }
>>>>    /*
>>>> @@ -268,7 +266,7 @@ xfs_attr_set_shortform(
>>>>     * also checks for a defer finish.  Transaction is finished and rolled as
>>>>     * needed, and returns true of false if the delayed operation should continue.
>>>>     */
>>>> -int
>>>> +STATIC int
>>>>    xfs_attr_trans_roll(
>>>>    	struct xfs_delattr_context	*dac)
>>>>    {
>>>> @@ -297,61 +295,130 @@ int
>>>>    xfs_attr_set_args(
>>>>    	struct xfs_da_args	*args)
>>>>    {
>>>> -	struct xfs_inode	*dp = args->dp;
>>>> -	struct xfs_buf          *leaf_bp = NULL;
>>>> -	int			error = 0;
>>>> +	struct xfs_buf			*leaf_bp = NULL;
>>>> +	int				error = 0;
>>>> +	struct xfs_delattr_context	dac = {
>>>> +		.da_args	= args,
>>>> +	};
>>>> +
>>>> +	do {
>>>> +		error = xfs_attr_set_iter(&dac, &leaf_bp);
>>>> +		if (error != -EAGAIN)
>>>> +			break;
>>>> +
>>>> +		error = xfs_attr_trans_roll(&dac);
>>>> +		if (error)
>>>> +			return error;
>>>> +
>>>> +		if (leaf_bp) {
>>>> +			xfs_trans_bjoin(args->trans, leaf_bp);
>>>> +			xfs_trans_bhold(args->trans, leaf_bp);
>>>> +		}
>>>> +
>>>> +	} while (true);
>>>> +
>>>> +	return error;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Set the attribute specified in @args.
>>>> + * This routine is meant to function as a delayed operation, and may return
>>>> + * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
>>>> + * to handle this, and recall the function until a successful error code is
>>>> + * returned.
>>>> + */
>>>> +STATIC int
>>>> +xfs_attr_set_iter(
>>>> +	struct xfs_delattr_context	*dac,
>>>> +	struct xfs_buf			**leaf_bp)
>>>> +{
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	struct xfs_inode		*dp = args->dp;
>>>> +	int				error = 0;
>>>> +
>>>> +	/* State machine switch */
>>>> +	switch (dac->dela_state) {
>>>> +	case XFS_DAS_FLIP_LFLAG:
>>>> +	case XFS_DAS_FOUND_LBLK:
>>>
>>> Do we need to catch XFS_DAS_RM_LBLK here?
>>
>> I think we fall into the correct code path without it, but I think it's
>> better to have it here for consistency.  Will add.
>>
>>>
>>>> +		goto das_leaf;
>>>> +	case XFS_DAS_FOUND_NBLK:
>>>> +	case XFS_DAS_FLIP_NFLAG:
>>>> +	case XFS_DAS_ALLOC_NODE:
>>>> +		goto das_node;
>>>> +	default:
>>>> +		break;
>>>> +	}
>>>>    	/*
>>>>    	 * If the attribute list is already in leaf format, jump straight to
>>>>    	 * leaf handling.  Otherwise, try to add the attribute to the shortform
>>>>    	 * list; if there's no room then convert the list to leaf format and try
>>>> -	 * again.
>>>> +	 * again. No need to set state as we will be in leaf form when we come
>>>> +	 * back
>>>>    	 */
>>>>    	if (xfs_attr_is_shortform(dp)) {
>>>>    		/*
>>>> -		 * If the attr was successfully set in shortform, the
>>>> -		 * transaction is committed and set to NULL.  Otherwise, is it
>>>> -		 * converted from shortform to leaf, and the transaction is
>>>> -		 * retained.
>>>> +		 * If the attr was successfully set in shortform, no need to
>>>> +		 * continue.  Otherwise, is it converted from shortform to leaf
>>>> +		 * and -EAGAIN is returned.
>>>>    		 */
>>>> -		error = xfs_attr_set_shortform(args, &leaf_bp);
>>>> -		if (error || !args->trans)
>>>> -			return error;
>>>> +		error = xfs_attr_set_shortform(args, leaf_bp);
>>>> +		if (error == -EAGAIN)
>>>> +			dac->flags |= XFS_DAC_DEFER_FINISH;
>>>> +
>>>> +		return error;
>>>>    	}
>>>> -	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>>>> -		error = xfs_attr_leaf_addname(args);
>>>> -		if (error != -ENOSPC)
>>>> -			return error;
>>>> +	/*
>>>> +	 * After a shortform to leaf conversion, we need to hold the leaf and
>>>> +	 * cycle out the transaction.  When we get back, we need to release
>>>> +	 * the leaf.
>>>
>>> "...to release the hold on the leaf buffer."
>> Sure, will expand
>>
>>>
>>>> +	 */
>>>> +	if (*leaf_bp != NULL) {
>>>> +		xfs_trans_bhold_release(args->trans, *leaf_bp);
>>>> +		*leaf_bp = NULL;
>>>> +	}
>>>> -		/*
>>>> -		 * Promote the attribute list to the Btree format.
>>>> -		 */
>>>> -		error = xfs_attr3_leaf_to_node(args);
>>>> -		if (error)
>>>> -			return error;
>>>> +	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
>>>> +		error = xfs_attr_leaf_try_add(args, *leaf_bp);
>>>> +		switch (error) {
>>>> +		case -ENOSPC:
>>>> +			/*
>>>> +			 * Promote the attribute list to the Btree format.
>>>> +			 */
>>>> +			error = xfs_attr3_leaf_to_node(args);
>>>> +			if (error)
>>>> +				return error;
>>>> -		/*
>>>> -		 * Finish any deferred work items and roll the transaction once
>>>> -		 * more.  The goal here is to call node_addname with the inode
>>>> -		 * and transaction in the same state (inode locked and joined,
>>>> -		 * transaction clean) no matter how we got to this step.
>>>> -		 */
>>>> -		error = xfs_defer_finish(&args->trans);
>>>> -		if (error)
>>>> +			/*
>>>> +			 * Finish any deferred work items and roll the
>>>> +			 * transaction once more.  The goal here is to call
>>>> +			 * node_addname with the inode and transaction in the
>>>> +			 * same state (inode locked and joined, transaction
>>>> +			 * clean) no matter how we got to this step.
>>>> +			 */
>>>> +			dac->flags |= XFS_DAC_DEFER_FINISH;
>>>> +			return -EAGAIN;
>>>
>>> What state should we be in at this -EAGAIN return?  Is it
>>> XFS_DAS_UNINIT, but with more than one extent in the attr fork?
>> It could be UNINIT, if the attr was already a leaf at the time we started.
>> If we had to promote from a block to a leaf, and STILL couldn't fit in leaf
>> form, then we're probably in some state left over from the leaf routines.
>> But because xfs_attr3_leaf_to_node just turned us into a node, we fall into
>> the node path upon return.
>>
>> I know that's confusing... which leads to your next question of.....
>>>
>>> /me is wishing these would get turned into explicit states, since afaict
>>> we don't unlock the inode and so we should find it in /exactly/ the
>>> state that the delattr_context says it should be in.
>> IIRC it used to have an explicit XFS_DC_LEAF_TO_NODE state, but I think we
>> dropped it at some point during review in an effort to simplify the
>> state machine as much as possible (v8, I think).  But yes, I do think
>> there is a trade-off between removing states where we can and keeping it
>> clear where we are in the attr process, because now your state isn't
>> exactly represented by dela_state anymore; it's the combination of
>> dela_state and the state of the tree.
>>
>> I think I've been over this code so much by now, I can follow it either way,
>> but if it's confusing to others, maybe we should put it back?  Or maybe just
>> a comment if that helps?
> 
> A comment laying out which states we could be in and how we got there. :)
Sure, will add comments and tracing too.  I'm thinking the comment
from below should probably come up here.  Jump to my next comment below...

> 
>>
>>
>>>
>>>> +		case 0:
>>>> +			dac->dela_state = XFS_DAS_FOUND_LBLK;
>>>> +			return -EAGAIN;
>>>> +		default:
>>>>    			return error;
>>>> +		}
>>>> +das_leaf:
>>>
>>> The only way to get to this block of code is by jumping to das_leaf,
>>> from the switch statement above, right?  If so, then shouldn't it be up
>>> there in the switch statement?
>> We could, though I think we were just trying to be consistent in that the
>> switch is sort of a dispatcher for gotos?  Otherwise we end up with a switch
>> with giant cases.  It's the same difference I suppose.
> 
> With this comment in particular, I had to dig through the switch
> statement in the previous code block to figure out that it's not
> possible to fall into das_leaf from above.
> 
>>>
>>>> +		error = xfs_attr_leaf_addname(dac);
>>>> +		if (error == -ENOSPC)
>>>> +			/*
>>>> +			 * No need to set state.  We will be in node form when
>>>> +			 * we are recalled
>>>> +			 */
>>>> +			return -EAGAIN;
>>>
>>> How do we get to node form?
>> Hmm, I thought xfs_attr_leaf_addname did promote to node if there's not
>> enough space, but now that you point it out, I'm not seeing it.  We may have
>> to put the LEAF_TO_NODE state back anyway.
>>
>> Maybe I can add a test case too; it doesn't look like any of the existing
>> cases exercise this path.
> 
> Hm.  At least in the old code, I thought it was xfs_attr_set_args that
> would call xfs_attr_leaf_addname and if it returned ENOSPC, it would
> then call xfs_attr3_leaf_to_node...
> 
> (Gosh, I can't even tell where we are anymore. :()
Ok, sorry, so what happens is:
We pulled xfs_attr_leaf_try_add out of xfs_attr_leaf_addname, and the
-ENOSPC handler with it.  But I think that was the only place in
xfs_attr_leaf_addname where -ENOSPC used to come from.  So I think this
old -ENOSPC handler here can go away: I don't see any other way an
-ENOSPC comes back, and if it does, it's not expected.  That's likely
why this -ENOSPC handler isn't getting triggered in the test cases.  The
-ENOSPC condition is already handled in the case above.

So I think this line can just turn into
"return xfs_attr_leaf_addname(dac);"

Sorry for the confusion!!  :-(


Going back to your request for comments above, how about we expand the
existing comment up there and add:

/*
 * We do not need to set a state here, because when we come back, we
 * will be in node form.  dela_state is still XFS_DAS_UNINIT at this time.
 */


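To make the "no state needed" argument concrete, here is a simplified user-space model in which dela_state is never touched and only the attr fork format advances between calls. Each format conversion returns -EAGAIN, the caller rolls and calls back in, and the fork format alone picks the next path. demo_fork, demo_set_iter(), demo_set_args() and the capacities are hypothetical stand-ins; remote values and the FOUND_LBLK step are deliberately left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the attr fork format. */
enum demo_fmt { DEMO_SHORTFORM, DEMO_LEAF, DEMO_NODE };

struct demo_fork {
	enum demo_fmt	fmt;
	int		sf_room;	/* bytes free in the inode literal area */
	int		leaf_room;	/* bytes free in the single leaf block */
	int		added;		/* 1 once the attr has been stored */
};

/*
 * One pass of a set operation.  The state is implicitly UNINIT on every
 * call; the fork format decides where to go.  Returns 0 when done, or
 * -EAGAIN when the caller must roll the transaction and call again.
 */
static int demo_set_iter(struct demo_fork *f, int attr_size)
{
	switch (f->fmt) {
	case DEMO_SHORTFORM:
		if (attr_size <= f->sf_room) {
			f->added = 1;
			return 0;
		}
		f->fmt = DEMO_LEAF;	/* shortform -> leaf conversion */
		return -EAGAIN;
	case DEMO_LEAF:
		if (attr_size <= f->leaf_room) {
			f->added = 1;
			return 0;
		}
		f->fmt = DEMO_NODE;	/* leaf -> node promotion */
		return -EAGAIN;
	case DEMO_NODE:
	default:
		f->added = 1;		/* the model assumes node form always fits */
		return 0;
	}
}

/* Caller loop in the spirit of xfs_attr_set_args(): roll until done. */
static int demo_set_args(struct demo_fork *f, int attr_size, int *rolls)
{
	int error;

	*rolls = 0;
	while ((error = demo_set_iter(f, attr_size)) == -EAGAIN)
		(*rolls)++;		/* a real caller rolls the transaction here */
	return error;
}
```

Starting from shortform with an attr too large for both shortform and the single leaf, the loop rolls twice (shortform to leaf, then leaf to node) before the add succeeds, without any explicit state being recorded.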
> 
>>>
>>>> -		/*
>>>> -		 * Commit the current trans (including the inode) and
>>>> -		 * start a new one.
>>>> -		 */
>>>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>>>> -		if (error)
>>>> -			return error;
>>>> +		return error;
>>>>    	}
>>>> -
>>>> -	error = xfs_attr_node_addname(args);
>>>> +das_node:
>>>> +	error = xfs_attr_node_addname(dac);
>>>>    	return error;
>>>
>>> Similarly, I think the only way get to this block of code is if we're in
>>> the initial state (XFS_DAS_UNINIT?) and the inode wasn't in short
>>> format; or if we jumped here via DAS_{FOUND_NBLK,FLIP_NFLAG,ALLOC_NODE},
>>> right?
>>>
>>> I think you could straighten this out a bit further (I left out the
>>> comments):
>>>
>>> 	switch (dac->dela_state) {
>>> 	case XFS_DAS_FLIP_LFLAG:
>>> 	case XFS_DAS_FOUND_LBLK:
>>> 		error = xfs_attr_leaf_addname(dac);
>>> 		if (error == -ENOSPC)
>>> 			return -EAGAIN;
>>> 		return error;
>>> 	case XFS_DAS_FOUND_NBLK:
>>> 	case XFS_DAS_FLIP_NFLAG:
>>> 	case XFS_DAS_ALLOC_NODE:
>>> 		return xfs_attr_node_addname(dac);
>>> 	case XFS_DAS_UNINIT:
>>> 		break;
>>> 	default:
>>> 		...assert on the XFS_DAS_RM_* flags...
>>> 	}
>>>
>>> 	if (xfs_attr_is_shortform(dp))
>>> 		return xfs_attr_set_shortform(args, leaf_bp);
>>>
>>> 	if (*leaf_bp != NULL) {
>>> 		...release bhold...
>>> 	}
>>>
>>> 	if (!xfs_bmap_one_block(...))
>>> 		return xfs_attr_node_addname(dac);
>>>
>>> 	error = xfs_attr_leaf_try_add(args, *leaf_bp);
>>> 	switch (error) {
>>> 	...handle -ENOSPC and 0...
>>> 	}
>>> 	return error;
>>>
>> Ok, I'll see if I can get something like that through the test cases.  If it
>> doesn't work out, I'll make a note of it.
> 
> <nod>
> 
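The reviewer's restructured dispatch can be checked in isolation. This hedged user-space model covers only the top-of-function switch, with XFS_DAS_RM_LBLK routed to the leaf path as agreed earlier in the thread, and an explicit invalid result where kernel code would ASSERT(). The demo_* names, and DEMO_DAS_RM_SHRINK as a placeholder for a remove-only state, are hypothetical.

```c
#include <assert.h>

/* Hypothetical mirror of the set-path states discussed in this thread. */
enum demo_das_state {
	DEMO_DAS_UNINIT = 0,
	DEMO_DAS_FOUND_LBLK,
	DEMO_DAS_FLIP_LFLAG,
	DEMO_DAS_RM_LBLK,
	DEMO_DAS_FOUND_NBLK,
	DEMO_DAS_FLIP_NFLAG,
	DEMO_DAS_ALLOC_NODE,
	DEMO_DAS_RM_NBLK,
	DEMO_DAS_RM_SHRINK,	/* placeholder for a remove-only state */
};

enum demo_path {
	DEMO_PATH_FRESH,	/* UNINIT: look at the fork format next */
	DEMO_PATH_LEAF,		/* resume inside the leaf addname code */
	DEMO_PATH_NODE,		/* resume inside the node addname code */
	DEMO_PATH_INVALID,	/* a state the set path should never see */
};

/* Top-of-function dispatch, with each resumable state named explicitly. */
static enum demo_path demo_set_dispatch(enum demo_das_state s)
{
	switch (s) {
	case DEMO_DAS_FOUND_LBLK:
	case DEMO_DAS_FLIP_LFLAG:
	case DEMO_DAS_RM_LBLK:
		return DEMO_PATH_LEAF;
	case DEMO_DAS_FOUND_NBLK:
	case DEMO_DAS_FLIP_NFLAG:
	case DEMO_DAS_ALLOC_NODE:
	case DEMO_DAS_RM_NBLK:
		return DEMO_PATH_NODE;
	case DEMO_DAS_UNINIT:
		return DEMO_PATH_FRESH;
	default:
		/* Kernel code would ASSERT() here; the model reports it. */
		return DEMO_PATH_INVALID;
	}
}
```

Listing UNINIT as its own case, rather than letting it hide in default, is what lets the default branch flag genuinely unexpected states.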
>>>>    }
>>>> @@ -723,28 +790,30 @@ xfs_attr_leaf_try_add(
>>>>     *
>>>>     * This leaf block cannot have a "remote" value, we only call this routine
>>>>     * if bmap_one_block() says there is only one block (ie: no remote blks).
>>>> + *
>>>> + * This routine is meant to function as a delayed operation, and may return
>>>> + * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
>>>> + * to handle this, and recall the function until a successful error code is
>>>> + * returned.
>>>>     */
>>>>    STATIC int
>>>>    xfs_attr_leaf_addname(
>>>> -	struct xfs_da_args	*args)
>>>> +	struct xfs_delattr_context	*dac)
>>>>    {
>>>> -	int			error, forkoff;
>>>> -	struct xfs_buf		*bp = NULL;
>>>> -	struct xfs_inode	*dp = args->dp;
>>>> -
>>>> -	trace_xfs_attr_leaf_addname(args);
>>>> -
>>>> -	error = xfs_attr_leaf_try_add(args, bp);
>>>> -	if (error)
>>>> -		return error;
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	struct xfs_buf			*bp = NULL;
>>>> +	int				error, forkoff;
>>>> +	struct xfs_inode		*dp = args->dp;
>>>> -	/*
>>>> -	 * Commit the transaction that added the attr name so that
>>>> -	 * later routines can manage their own transactions.
>>>> -	 */
>>>> -	error = xfs_trans_roll_inode(&args->trans, dp);
>>>> -	if (error)
>>>> -		return error;
>>>> +	/* State machine switch */
>>>> +	switch (dac->dela_state) {
>>>> +	case XFS_DAS_FLIP_LFLAG:
>>>> +		goto das_flip_flag;
>>>> +	case XFS_DAS_RM_LBLK:
>>>> +		goto das_rm_lblk;
>>>> +	default:
>>>> +		break;
>>>> +	}
>>>>    	/*
>>>>    	 * If there was an out-of-line value, allocate the blocks we
>>>> @@ -752,12 +821,34 @@ xfs_attr_leaf_addname(
>>>>    	 * after we create the attribute so that we don't overflow the
>>>>    	 * maximum size of a transaction and/or hit a deadlock.
>>>>    	 */
>>>> -	if (args->rmtblkno > 0) {
>>>> -		error = xfs_attr_rmtval_set(args);
>>>> +
>>>> +	/* Open coded xfs_attr_rmtval_set without trans handling */
>>>> +	if ((dac->flags & XFS_DAC_LEAF_ADDNAME_INIT) == 0) {
>>>> +		dac->flags |= XFS_DAC_LEAF_ADDNAME_INIT;
>>>> +		if (args->rmtblkno > 0) {
>>>> +			error = xfs_attr_rmtval_find_space(dac);
>>>> +			if (error)
>>>> +				return error;
>>>> +		}
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * Roll through the "value", allocating blocks on disk as
>>>> +	 * required.
>>>> +	 */
>>>> +	if (dac->blkcnt > 0) {
>>>> +		error = xfs_attr_rmtval_set_blk(dac);
>>>>    		if (error)
>>>>    			return error;
>>>> +
>>>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>>> +		return -EAGAIN;
>>>
>>> What state are we in here?  FOUND_LBLK, with blkcnt slowly decreasing?
>>>
>> I used to have an ALLOC_LEAF state for this one.  Used to look something
>> like this:
>> +alloc_leaf:
> 
> Aha, that's where ALLOC_LEAF went.
> 
>> +        while (args->dac.blkcnt > 0) {
>> +            error = xfs_attr_rmtval_set_blk(args);
>> +            if (error)
>> +                return error;
>> +
>> +            args->dac.flags |= XFS_DAC_FINISH_TRANS;
>> +            args->dac.dela_state = XFS_DAS_ALLOC_LEAF;
>> +            return -EAGAIN;
>> +        }
>>
>> Again, it's not really needed, as we will fall into this logic with or
>> without the state.  And the while loop doesn't really loop, though I guess it
>> does sort of help the reader to understand that this is supposed to function
>> like a loop.  I think it's easy to see something like that and want to
>> simplify away the extra semantics, but on a second look, it's not quite as
>> obvious why without the recollection of what it once was.  Maybe a
>> comment is in order?
>>
>> /* Repeat this until we have set all rmt blks */
>>
>> ?
> 
> Well there already is a comment that we're repeating until we've set all
> the remote blocks, but it should capture which DAS state(s) we could be
> in, because I quickly get lost, especially in the attr set code.
OK, how about:

/*
 * We don't set a state because we can use dac->blkcnt to know if we need
 * to continue setting blocks.  dac->dela_state is still XFS_DAS_UNINIT.
 */

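A minimal model of the blkcnt-driven re-entry described in that comment: each call allocates one chunk, returns -EAGAIN while blocks remain, and only writes the value once blkcnt reaches zero, so a non-zero blkcnt on entry is all the "state" needed. demo_dac, demo_rmt_step() and the fixed DEMO_CHUNK size (a stand-in for one extent mapping) are hypothetical simplifications of xfs_attr_rmtval_set_blk() and xfs_attr_rmtval_set_value().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, trimmed-down delayed-attr context. */
struct demo_dac {
	int	blkcnt;		/* remote blocks still to allocate */
	int	allocated;	/* blocks allocated so far */
	int	value_set;	/* 1 once the remote value has been written */
	int	defer_finish;	/* caller should finish deferred work */
};

#define DEMO_CHUNK	4	/* blocks handed out per call ("one extent") */

/*
 * Re-entrant remote-value step.  No dela_state is recorded: a non-zero
 * blkcnt on entry means we are still in the middle of allocating.
 */
static int demo_rmt_step(struct demo_dac *dac)
{
	if (dac->blkcnt > 0) {
		int n = dac->blkcnt < DEMO_CHUNK ? dac->blkcnt : DEMO_CHUNK;

		dac->allocated += n;
		dac->blkcnt -= n;
		dac->defer_finish = 1;
		return -EAGAIN;		/* roll, then call again */
	}
	dac->value_set = 1;		/* all blocks mapped: write the value */
	return 0;
}
```

With ten blocks to allocate, the caller sees -EAGAIN three times (10 to 6, 6 to 2, 2 to 0) and the fourth call writes the value and returns 0.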
> 
>>
>> To directly answer your question though, I think the state is still UNINIT
>> at this point, since any of the other states would have branched off before
>> this.  It's important to note, though, that the functions that have states
>> are meant to take ownership of the state machine.  IOW, if the state coming
>> in does not apply to the scope of this function or any of its subroutines,
>> then the state is simply overwritten as this function sees fit.  It
>> doesn't throw an error if it is passed a state that used to belong to its
>> parent.  Calling functions should understand that they have "surrendered"
>> the state machine to this subfunction until it returns something other
>> than -EAGAIN.
> 
> That will become very obvious once we've arrived at the end of the
> series and everyone must use defer ops. :)
> 
>> At least that's the idea.  Honestly, the only
>> reason I have UNINIT at all is because we get warnings about setting the
>> state to 0 when the enum needs to start at something other than 0.
>>
>> Hope that helps?
> 
> Yeah.
> 
>>
>>
>>
>>>>    	}
>>>> +	error = xfs_attr_rmtval_set_value(args);
>>>> +	if (error)
>>>> +		return error;
>>>> +
>>>>    	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
>>>>    		/*
>>>>    		 * Added a "remote" value, just clear the incomplete flag.
>>>> @@ -777,29 +868,29 @@ xfs_attr_leaf_addname(
>>>>    	 * In a separate transaction, set the incomplete flag on the "old" attr
>>>>    	 * and clear the incomplete flag on the "new" attr.
>>>>    	 */
>>>> -
>>>>    	error = xfs_attr3_leaf_flipflags(args);
>>>>    	if (error)
>>>>    		return error;
>>>>    	/*
>>>>    	 * Commit the flag value change and start the next trans in series.
>>>>    	 */
>>>> -	error = xfs_trans_roll_inode(&args->trans, args->dp);
>>>> -	if (error)
>>>> -		return error;
>>>> -
>>>> +	dac->dela_state = XFS_DAS_FLIP_LFLAG;
>>>> +	return -EAGAIN;
>>>> +das_flip_flag:
>>>>    	/*
>>>>    	 * Dismantle the "old" attribute/value pair by removing a "remote" value
>>>>    	 * (if it exists).
>>>>    	 */
>>>>    	xfs_attr_restore_rmt_blk(args);
>>>> +	error = xfs_attr_rmtval_invalidate(args);
>>>> +	if (error)
>>>> +		return error;
>>>> +das_rm_lblk:
>>>>    	if (args->rmtblkno) {
>>>> -		error = xfs_attr_rmtval_invalidate(args);
>>>> -		if (error)
>>>> -			return error;
>>>> -
>>>> -		error = xfs_attr_rmtval_remove(args);
>>>> +		error = __xfs_attr_rmtval_remove(dac);
>>>> +		if (error == -EAGAIN)
>>>> +			dac->dela_state = XFS_DAS_RM_LBLK;
>>>>    		if (error)
>>>>    			return error;
>>>>    	}
>>>> @@ -965,23 +1056,38 @@ xfs_attr_node_hasname(
>>>>     *
>>>>     * "Remote" attribute values confuse the issue and atomic rename operations
>>>>     * add a whole extra layer of confusion on top of that.
>>>> + *
>>>> + * This routine is meant to function as a delayed operation, and may return
>>>> + * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
>>>> + * to handle this, and recall the function until a successful error code is
>>>> + *returned.
>>>>     */
>>>>    STATIC int
>>>>    xfs_attr_node_addname(
>>>> -	struct xfs_da_args	*args)
>>>> +	struct xfs_delattr_context	*dac)
>>>>    {
>>>> -	struct xfs_da_state	*state;
>>>> -	struct xfs_da_state_blk	*blk;
>>>> -	struct xfs_inode	*dp;
>>>> -	int			retval, error;
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	struct xfs_da_state		*state = NULL;
>>>> +	struct xfs_da_state_blk		*blk;
>>>> +	int				retval = 0;
>>>> +	int				error = 0;
>>>>    	trace_xfs_attr_node_addname(args);
>>>> -	/*
>>>> -	 * Fill in bucket of arguments/results/context to carry around.
>>>> -	 */
>>>> -	dp = args->dp;
>>>> -restart:
>>>> +	/* State machine switch */
>>>> +	switch (dac->dela_state) {
>>>> +	case XFS_DAS_FLIP_NFLAG:
>>>> +		goto das_flip_flag;
>>>> +	case XFS_DAS_FOUND_NBLK:
>>>> +		goto das_found_nblk;
>>>> +	case XFS_DAS_ALLOC_NODE:
>>>> +		goto das_alloc_node;
>>>> +	case XFS_DAS_RM_NBLK:
>>>> +		goto das_rm_nblk;
>>>> +	default:
>>>> +		break;
>>>> +	}
>>>> +
>>>>    	/*
>>>>    	 * Search to see if name already exists, and get back a pointer
>>>>    	 * to where it should go.
>>>> @@ -1027,19 +1133,13 @@ xfs_attr_node_addname(
>>>>    			error = xfs_attr3_leaf_to_node(args);
>>>>    			if (error)
>>>>    				goto out;
>>>> -			error = xfs_defer_finish(&args->trans);
>>>> -			if (error)
>>>> -				goto out;
>>>>    			/*
>>>> -			 * Commit the node conversion and start the next
>>>> -			 * trans in the chain.
>>>> +			 * Restart routine from the top.  No need to set  the
>>>> +			 * state
>>>>    			 */
>>>> -			error = xfs_trans_roll_inode(&args->trans, dp);
>>>> -			if (error)
>>>> -				goto out;
>>>> -
>>>> -			goto restart;
>>>> +			dac->flags |= XFS_DAC_DEFER_FINISH;
>>>> +			return -EAGAIN;
>>>
>>> What state are we in here?  Are we still in the same state that we were
>>> at the start of the function, but ready to try xfs_attr3_leaf_add again?
>> To directly answer the question: we may be in UNINIT if we were already a
>> node when we started the attr op.  If we had to promote from leaf to node,
>> it may be some state left over from the leaf routines.
>>
>> Again though, insofar as this routine is concerned, the idea is that the
>> state is either one of the cases in the switch up top, or it's not.
> 
> <nod>  Comment please. :)
Sure, how about:
/*
 * Now that we have converted the leaf to a node, we can roll the
 * transaction, and try xfs_attr3_leaf_add again on re-entry.  No need
 * to set dela_state to do this.  dela_state is still unset by this
 * function at this point.
 */


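The "ownership" rule described in this thread, where a subroutine treats any state outside its own switch as a fresh start, can be shown with a small model of the node path. A leftover leaf-path state (for example after a leaf-to-node promotion) simply falls through to the top, the add is retried, and only then does the routine record a state of its own. All demo_* names are hypothetical, and the routine is heavily simplified.

```c
#include <assert.h>
#include <errno.h>

enum demo_das_state {
	DEMO_DAS_UNINIT = 0,
	DEMO_DAS_FOUND_LBLK,	/* leaf-path state the node path does not own */
	DEMO_DAS_FOUND_NBLK,
	DEMO_DAS_FLIP_NFLAG,
};

struct demo_dac {
	enum demo_das_state	dela_state;
	int			is_node;	/* fork already in node form */
	int			adds;		/* times the name add was attempted */
	int			rename;		/* replacing an existing attr */
};

/* Model of the node addname routine's resume logic. */
static int demo_node_addname(struct demo_dac *dac)
{
	switch (dac->dela_state) {
	case DEMO_DAS_FOUND_NBLK:
		goto found_nblk;
	case DEMO_DAS_FLIP_NFLAG:
		goto flip_flag;
	default:
		break;			/* not ours: start from the top */
	}

	dac->adds++;
	if (!dac->is_node) {
		dac->is_node = 1;	/* leaf -> node conversion */
		return -EAGAIN;		/* state deliberately left untouched */
	}
	dac->dela_state = DEMO_DAS_FOUND_NBLK;
	return -EAGAIN;
found_nblk:
	if (!dac->rename)
		return 0;
	dac->dela_state = DEMO_DAS_FLIP_NFLAG;
	return -EAGAIN;
flip_flag:
	return 0;			/* old-name removal would go here */
}
```

Entered with a stale DEMO_DAS_FOUND_LBLK on a rename, the routine rolls three times: once after the conversion (state untouched), once after the retried add (now FOUND_NBLK), and once after flipping the flags (now FLIP_NFLAG).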
> 
>>>
>>>>    		}
>>>>    		/*
>>>> @@ -1051,9 +1151,7 @@ xfs_attr_node_addname(
>>>>    		error = xfs_da3_split(state);
>>>>    		if (error)
>>>>    			goto out;
>>>> -		error = xfs_defer_finish(&args->trans);
>>>> -		if (error)
>>>> -			goto out;
>>>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>>>    	} else {
>>>>    		/*
>>>>    		 * Addition succeeded, update Btree hashvals.
>>>> @@ -1068,13 +1166,9 @@ xfs_attr_node_addname(
>>>>    	xfs_da_state_free(state);
>>>>    	state = NULL;
>>>> -	/*
>>>> -	 * Commit the leaf addition or btree split and start the next
>>>> -	 * trans in the chain.
>>>> -	 */
>>>> -	error = xfs_trans_roll_inode(&args->trans, dp);
>>>> -	if (error)
>>>> -		goto out;
>>>> +	dac->dela_state = XFS_DAS_FOUND_NBLK;
>>>> +	return -EAGAIN;
>>>> +das_found_nblk:
>>>>    	/*
>>>>    	 * If there was an out-of-line value, allocate the blocks we
>>>> @@ -1083,7 +1177,27 @@ xfs_attr_node_addname(
>>>>    	 * maximum size of a transaction and/or hit a deadlock.
>>>>    	 */
>>>>    	if (args->rmtblkno > 0) {
>>>> -		error = xfs_attr_rmtval_set(args);
>>>> +		/* Open coded xfs_attr_rmtval_set without trans handling */
>>>> +		error = xfs_attr_rmtval_find_space(dac);
>>>> +		if (error)
>>>> +			return error;
>>>> +
>>>> +		/*
>>>> +		 * Roll through the "value", allocating blocks on disk as
>>>> +		 * required.
>>>> +		 */
>>>> +das_alloc_node:
>>>> +		if (dac->blkcnt > 0) {
>>>> +			error = xfs_attr_rmtval_set_blk(dac);
>>>> +			if (error)
>>>> +				return error;
>>>> +
>>>> +			dac->flags |= XFS_DAC_DEFER_FINISH;
>>>> +			dac->dela_state = XFS_DAS_ALLOC_NODE;
>>>> +			return -EAGAIN;
>>>> +		}
>>>> +
>>>> +		error = xfs_attr_rmtval_set_value(args);
>>>>    		if (error)
>>>>    			return error;
>>>>    	}
>>>> @@ -1113,22 +1227,28 @@ xfs_attr_node_addname(
>>>>    	/*
>>>>    	 * Commit the flag value change and start the next trans in series
>>>>    	 */
>>>> -	error = xfs_trans_roll_inode(&args->trans, args->dp);
>>>> -	if (error)
>>>> -		goto out;
>>>> -
>>>> +	dac->dela_state = XFS_DAS_FLIP_NFLAG;
>>>> +	return -EAGAIN;
>>>> +das_flip_flag:
>>>>    	/*
>>>>    	 * Dismantle the "old" attribute/value pair by removing a "remote" value
>>>>    	 * (if it exists).
>>>>    	 */
>>>>    	xfs_attr_restore_rmt_blk(args);
>>>> +	error = xfs_attr_rmtval_invalidate(args);
>>>> +	if (error)
>>>> +		return error;
>>>> +
>>>> +das_rm_nblk:
>>>>    	if (args->rmtblkno) {
>>>> -		error = xfs_attr_rmtval_invalidate(args);
>>>> -		if (error)
>>>> -			return error;
>>>> +		error = __xfs_attr_rmtval_remove(dac);
>>>> +
>>>> +		if (error == -EAGAIN) {
>>>> +			dac->dela_state = XFS_DAS_RM_NBLK;
>>>> +			return -EAGAIN;
>>>> +		}
>>>> -		error = xfs_attr_rmtval_remove(args);
>>>>    		if (error)
>>>>    			return error;
>>>>    	}
>>>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>>>> index 64dcf0f..501f9df 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr.h
>>>> +++ b/fs/xfs/libxfs/xfs_attr.h
>>>> @@ -106,6 +106,118 @@ struct xfs_attr_list_context {
>>>>     *	                                      v         │
>>>>     *	                                     done <─────┘
>>>>     *
>>>> + *
>>>> + * Below is a state machine diagram for attr set operations.
>>>> + *
>>>> + *  xfs_attr_set_iter()
>>>> + *             │
>>>> + *             v
>>>
>>> I think this diagram is missing the part where we attempt to add a
>>> shortform attr?
>> I left it out because the shortform path doesn't make use of states.  I can
>> doodle that in though if you prefer:
>>
>>        ┌───n── is shortform?
>>        │            │
>>        │            y
>>        │            │
>>        │            v
>>        │   xfs_attr_set_shortform
>>        │            │
>>        │            v
>>        ├───n─── had enough
>>        │          space?
>>        │            │
>>        │            y
>>        │            │
>>        │            v
>>        │           done
>>        └────────────┐
>>                     │
>>                     v
> 
> Yes, please do capture the entire mechanism so that 2025 us aren't
> sitting here muttering about why we didn't do that when everything was
> still warm in our L1 brain cache. ;)
> 
> --D

Alrighty, will do :-)

Thank you!!!
Allison

> 
>>
>>>
>>> --D
>>
>> Thx for the thorough reviews!
>>
>> Allison
>>
>>>
>>>> + *   ┌───n── fork has
>>>> + *   │	    only 1 blk?
>>>> + *   │		│
>>>> + *   │		y
>>>> + *   │		│
>>>> + *   │		v
>>>> + *   │	xfs_attr_leaf_try_add()
>>>> + *   │		│
>>>> + *   │		v
>>>> + *   │	     had enough
>>>> + *   ├───n────space?
>>>> + *   │		│
>>>> + *   │		y
>>>> + *   │		│
>>>> + *   │		v
>>>> + *   │	XFS_DAS_FOUND_LBLK ──┐
>>>> + *   │	                     │
>>>> + *   │	XFS_DAS_FLIP_LFLAG ──┤
>>>> + *   │	(subroutine state)   │
>>>> + *   │		             │
>>>> + *   │		             └─>xfs_attr_leaf_addname()
>>>> + *   │		                      │
>>>> + *   │		                      v
>>>> + *   │		                   was this
>>>> + *   │		                   a rename? ──n─┐
>>>> + *   │		                      │          │
>>>> + *   │		                      y          │
>>>> + *   │		                      │          │
>>>> + *   │		                      v          │
>>>> + *   │		                flip incomplete  │
>>>> + *   │		                    flag         │
>>>> + *   │		                      │          │
>>>> + *   │		                      v          │
>>>> + *   │		              XFS_DAS_FLIP_LFLAG │
>>>> + *   │		                      │          │
>>>> + *   │		                      v          │
>>>> + *   │		                    remove       │
>>>> + *   │		XFS_DAS_RM_LBLK ─> old name      │
>>>> + *   │		         ^            │          │
>>>> + *   │		         │            v          │
>>>> + *   │		         └──────y── more to      │
>>>> + *   │		                    remove       │
>>>> + *   │		                      │          │
>>>> + *   │		                      n          │
>>>> + *   │		                      │          │
>>>> + *   │		                      v          │
>>>> + *   │		                     done <──────┘
>>>> + *   └──> XFS_DAS_FOUND_NBLK ──┐
>>>> + *	  (subroutine state)   │
>>>> + *	                       │
>>>> + *	  XFS_DAS_ALLOC_NODE ──┤
>>>> + *	  (subroutine state)   │
>>>> + *	                       │
>>>> + *	  XFS_DAS_FLIP_NFLAG ──┤
>>>> + *	  (subroutine state)   │
>>>> + *	                       │
>>>> + *	                       └─>xfs_attr_node_addname()
>>>> + *	                               │
>>>> + *	                               v
>>>> + *	                       find space to store
>>>> + *	                      attr. Split if needed
>>>> + *	                               │
>>>> + *	                               v
>>>> + *	                       XFS_DAS_FOUND_NBLK
>>>> + *	                               │
>>>> + *	                               v
>>>> + *	                 ┌─────n──  need to
>>>> + *	                 │        alloc blks?
>>>> + *	                 │             │
>>>> + *	                 │             y
>>>> + *	                 │             │
>>>> + *	                 │             v
>>>> + *	                 │  ┌─>XFS_DAS_ALLOC_NODE
>>>> + *	                 │  │          │
>>>> + *	                 │  │          v
>>>> + *	                 │  └──y── need to alloc
>>>> + *	                 │         more blocks?
>>>> + *	                 │             │
>>>> + *	                 │             n
>>>> + *	                 │             │
>>>> + *	                 │             v
>>>> + *	                 │          was this
>>>> + *	                 └────────> a rename? ──n─┐
>>>> + *	                               │          │
>>>> + *	                               y          │
>>>> + *	                               │          │
>>>> + *	                               v          │
>>>> + *	                         flip incomplete  │
>>>> + *	                             flag         │
>>>> + *	                               │          │
>>>> + *	                               v          │
>>>> + *	                       XFS_DAS_FLIP_NFLAG │
>>>> + *	                               │          │
>>>> + *	                               v          │
>>>> + *	                             remove       │
>>>> + *	         XFS_DAS_RM_NBLK ─> old name      │
>>>> + *	                  ^            │          │
>>>> + *	                  │            v          │
>>>> + *	                  └──────y── more to      │
>>>> + *	                             remove       │
>>>> + *	                               │          │
>>>> + *	                               n          │
>>>> + *	                               │          │
>>>> + *	                               v          │
>>>> + *	                              done <──────┘
>>>> + *
>>>>     */
>>>>    /*
>>>> @@ -120,6 +232,13 @@ struct xfs_attr_list_context {
>>>>    enum xfs_delattr_state {
>>>>    	XFS_DAS_UNINIT		= 0,  /* No state has been set yet */
>>>>    	XFS_DAS_RM_SHRINK,	      /* We are shrinking the tree */
>>>> +	XFS_DAS_FOUND_LBLK,	      /* We found leaf blk for attr */
>>>> +	XFS_DAS_FOUND_NBLK,	      /* We found node blk for attr */
>>>> +	XFS_DAS_FLIP_LFLAG,	      /* Flipped leaf INCOMPLETE attr flag */
>>>> +	XFS_DAS_RM_LBLK,	      /* A rename is removing leaf blocks */
>>>> +	XFS_DAS_ALLOC_NODE,	      /* We are allocating node blocks */
>>>> +	XFS_DAS_FLIP_NFLAG,	      /* Flipped node INCOMPLETE attr flag */
>>>> +	XFS_DAS_RM_NBLK,	      /* A rename is removing node blocks */
>>>>    };
>>>>    /*
>>>> @@ -127,6 +246,7 @@ enum xfs_delattr_state {
>>>>     */
>>>>    #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>>>>    #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>>>> +#define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
>>>>    /*
>>>>     * Context used for keeping track of delayed attribute operations
>>>> @@ -134,6 +254,11 @@ enum xfs_delattr_state {
>>>>    struct xfs_delattr_context {
>>>>    	struct xfs_da_args      *da_args;
>>>> +	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
>>>> +	struct xfs_bmbt_irec	map;
>>>> +	xfs_dablk_t		lblkno;
>>>> +	int			blkcnt;
>>>> +
>>>>    	/* Used in xfs_attr_node_removename to roll through removing blocks */
>>>>    	struct xfs_da_state     *da_state;
>>>> @@ -160,7 +285,6 @@ int xfs_attr_set_args(struct xfs_da_args *args);
>>>>    int xfs_has_attr(struct xfs_da_args *args);
>>>>    int xfs_attr_remove_args(struct xfs_da_args *args);
>>>>    int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>>>> -int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>>>>    bool xfs_attr_namecheck(const void *name, size_t length);
>>>>    void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>>>    			      struct xfs_da_args *args);
>>>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
>>>> index 1426c15..5b445e7 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr_remote.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
>>>> @@ -441,7 +441,7 @@ xfs_attr_rmtval_get(
>>>>     * Find a "hole" in the attribute address space large enough for us to drop the
>>>>     * new attribute's value into
>>>>     */
>>>> -STATIC int
>>>> +int
>>>>    xfs_attr_rmt_find_hole(
>>>>    	struct xfs_da_args	*args)
>>>>    {
>>>> @@ -468,7 +468,7 @@ xfs_attr_rmt_find_hole(
>>>>    	return 0;
>>>>    }
>>>> -STATIC int
>>>> +int
>>>>    xfs_attr_rmtval_set_value(
>>>>    	struct xfs_da_args	*args)
>>>>    {
>>>> @@ -628,6 +628,69 @@ xfs_attr_rmtval_set(
>>>>    }
>>>>    /*
>>>> + * Find a hole for the attr and store it in the delayed attr context.  This
>>>> + * initializes the context to roll through allocating an attr extent for a
>>>> + * delayed attr operation
>>>> + */
>>>> +int
>>>> +xfs_attr_rmtval_find_space(
>>>> +	struct xfs_delattr_context	*dac)
>>>> +{
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	struct xfs_bmbt_irec		*map = &dac->map;
>>>> +	int				error;
>>>> +
>>>> +	dac->lblkno = 0;
>>>> +	dac->blkcnt = 0;
>>>> +	args->rmtblkcnt = 0;
>>>> +	args->rmtblkno = 0;
>>>> +	memset(map, 0, sizeof(struct xfs_bmbt_irec));
>>>> +
>>>> +	error = xfs_attr_rmt_find_hole(args);
>>>> +	if (error)
>>>> +		return error;
>>>> +
>>>> +	dac->blkcnt = args->rmtblkcnt;
>>>> +	dac->lblkno = args->rmtblkno;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Write one block of the value associated with an attribute into the
>>>> + * out-of-line buffer that we have defined for it. This is similar to a subset
>>>> + * of xfs_attr_rmtval_set, but records the current block to the delayed attr
>>>> + * context, and leaves transaction handling to the caller.
>>>> + */
>>>> +int
>>>> +xfs_attr_rmtval_set_blk(
>>>> +	struct xfs_delattr_context	*dac)
>>>> +{
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	struct xfs_inode		*dp = args->dp;
>>>> +	struct xfs_bmbt_irec		*map = &dac->map;
>>>> +	int nmap;
>>>> +	int error;
>>>> +
>>>> +	nmap = 1;
>>>> +	error = xfs_bmapi_write(args->trans, dp, (xfs_fileoff_t)dac->lblkno,
>>>> +				dac->blkcnt, XFS_BMAPI_ATTRFORK, args->total,
>>>> +				map, &nmap);
>>>> +	if (error)
>>>> +		return error;
>>>> +
>>>> +	ASSERT(nmap == 1);
>>>> +	ASSERT((map->br_startblock != DELAYSTARTBLOCK) &&
>>>> +	       (map->br_startblock != HOLESTARTBLOCK));
>>>> +
>>>> +	/* roll attribute extent map forwards */
>>>> +	dac->lblkno += map->br_blockcount;
>>>> +	dac->blkcnt -= map->br_blockcount;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +/*
>>>>     * Remove the value associated with an attribute by deleting the
>>>>     * out-of-line buffer that it is stored on.
>>>>     */
>>>> @@ -669,38 +732,6 @@ xfs_attr_rmtval_invalidate(
>>>>    }
>>>>    /*
>>>> - * Remove the value associated with an attribute by deleting the
>>>> - * out-of-line buffer that it is stored on.
>>>> - */
>>>> -int
>>>> -xfs_attr_rmtval_remove(
>>>> -	struct xfs_da_args		*args)
>>>> -{
>>>> -	int				error;
>>>> -	struct xfs_delattr_context	dac  = {
>>>> -		.da_args	= args,
>>>> -	};
>>>> -
>>>> -	trace_xfs_attr_rmtval_remove(args);
>>>> -
>>>> -	/*
>>>> -	 * Keep de-allocating extents until the remote-value region is gone.
>>>> -	 */
>>>> -	do {
>>>> -		error = __xfs_attr_rmtval_remove(&dac);
>>>> -		if (error != -EAGAIN)
>>>> -			break;
>>>> -
>>>> -		error = xfs_attr_trans_roll(&dac);
>>>> -		if (error)
>>>> -			return error;
>>>> -
>>>> -	} while (true);
>>>> -
>>>> -	return error;
>>>> -}
>>>> -
>>>> -/*
>>>>     * Remove the value associated with an attribute by deleting the out-of-line
>>>>     * buffer that it is stored on. Returns EAGAIN for the caller to refresh the
>>>>     * transaction and re-call the function
>>>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
>>>> index 002fd30..84e2700 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr_remote.h
>>>> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
>>>> @@ -15,4 +15,8 @@ int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>>>>    		xfs_buf_flags_t incore_flags);
>>>>    int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
>>>>    int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>>>> +int xfs_attr_rmt_find_hole(struct xfs_da_args *args);
>>>> +int xfs_attr_rmtval_set_value(struct xfs_da_args *args);
>>>> +int xfs_attr_rmtval_set_blk(struct xfs_delattr_context *dac);
>>>> +int xfs_attr_rmtval_find_space(struct xfs_delattr_context *dac);
>>>>    #endif /* __XFS_ATTR_REMOTE_H__ */
>>>> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
>>>> index 8695165..e9dde4e 100644
>>>> --- a/fs/xfs/xfs_trace.h
>>>> +++ b/fs/xfs/xfs_trace.h
>>>> @@ -1925,7 +1925,6 @@ DEFINE_ATTR_EVENT(xfs_attr_refillstate);
>>>>    DEFINE_ATTR_EVENT(xfs_attr_rmtval_get);
>>>>    DEFINE_ATTR_EVENT(xfs_attr_rmtval_set);
>>>> -DEFINE_ATTR_EVENT(xfs_attr_rmtval_remove);
>>>>    #define DEFINE_DA_EVENT(name) \
>>>>    DEFINE_EVENT(xfs_da_class, name, \
>>>> -- 
>>>> 2.7.4
>>>>


* Re: [PATCH v13 05/10] xfs: Set up infastructure for deferred attribute operations
  2020-11-14  2:00       ` Darrick J. Wong
@ 2020-11-16  7:41         ` Allison Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-11-16  7:41 UTC
  To: Darrick J. Wong; +Cc: linux-xfs



On 11/13/20 7:00 PM, Darrick J. Wong wrote:
> On Thu, Nov 12, 2020 at 06:32:13PM -0700, Allison Henderson wrote:
>>
>>
>> On 11/10/20 2:51 PM, Darrick J. Wong wrote:
>>> On Thu, Oct 22, 2020 at 11:34:30PM -0700, Allison Henderson wrote:
>>>> Currently attributes are modified directly across one or more
>>>> transactions. But they are not logged or replayed in the event of an
>>>> error. The goal of delayed attributes is to enable logging and replaying
>>>> of attribute operations using the existing delayed operations
>>>> infrastructure.  This will later enable the attributes to become part of
>>>> larger multi part operations that also must first be recorded to the
>>>> log.  This is mostly of interest in the scheme of parent pointers which
>>>> would need to maintain an attribute containing parent inode information
>>>> any time an inode is moved, created, or removed.  Parent pointers would
>>>> then be of interest to any feature that would need to quickly derive an
>>>> inode path from the mount point. Online scrub, nfs lookups and fs grow
>>>> or shrink operations are all features that could take advantage of this.
>>>>
>>>> This patch adds two new log item types for setting or removing
>>>> attributes as deferred operations.  The xfs_attri_log_item logs an
>>>> intent to set or remove an attribute.  The corresponding
>>>> xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
>>>> freed once the transaction is done.  Both log items use a generic
>>>> xfs_attr_log_format structure that contains the attribute name, value,
>>>> flags, inode, and an op_flag that indicates whether the operation is a set
>>>> or a remove.
>>>>
>>>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>>>> ---
>>>>    fs/xfs/Makefile                 |   1 +
>>>>    fs/xfs/libxfs/xfs_attr.c        |   7 +-
>>>>    fs/xfs/libxfs/xfs_attr.h        |  19 +
>>>>    fs/xfs/libxfs/xfs_defer.c       |   1 +
>>>>    fs/xfs/libxfs/xfs_defer.h       |   3 +
>>>>    fs/xfs/libxfs/xfs_format.h      |   5 +
>>>>    fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
>>>>    fs/xfs/libxfs/xfs_log_recover.h |   2 +
>>>>    fs/xfs/libxfs/xfs_types.h       |   1 +
>>>>    fs/xfs/scrub/common.c           |   2 +
>>>>    fs/xfs/xfs_acl.c                |   2 +
>>>>    fs/xfs/xfs_attr_item.c          | 750 ++++++++++++++++++++++++++++++++++++++++
>>>>    fs/xfs/xfs_attr_item.h          |  76 ++++
>>>>    fs/xfs/xfs_attr_list.c          |   1 +
>>>>    fs/xfs/xfs_ioctl.c              |   2 +
>>>>    fs/xfs/xfs_ioctl32.c            |   2 +
>>>>    fs/xfs/xfs_iops.c               |   2 +
>>>>    fs/xfs/xfs_log.c                |   4 +
>>>>    fs/xfs/xfs_log_recover.c        |   2 +
>>>>    fs/xfs/xfs_ondisk.h             |   2 +
>>>>    fs/xfs/xfs_xattr.c              |   1 +
>>>>    21 files changed, 923 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
>>>> index 04611a1..b056cfc 100644
>>>> --- a/fs/xfs/Makefile
>>>> +++ b/fs/xfs/Makefile
>>>> @@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
>>>>    				   xfs_buf_item_recover.o \
>>>>    				   xfs_dquot_item_recover.o \
>>>>    				   xfs_extfree_item.o \
>>>> +				   xfs_attr_item.o \
>>>>    				   xfs_icreate_item.o \
>>>>    				   xfs_inode_item.o \
>>>>    				   xfs_inode_item_recover.o \
>>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>>> index 6453178..760383c 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>>> +++ b/fs/xfs/libxfs/xfs_attr.c
>>>> @@ -24,6 +24,7 @@
>>>>    #include "xfs_quota.h"
>>>>    #include "xfs_trans_space.h"
>>>>    #include "xfs_trace.h"
>>>> +#include "xfs_attr_item.h"
>>>>    /*
>>>>     * xfs_attr.c
>>>> @@ -59,8 +60,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>>>    STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>>>    STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>>>>    STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
>>>> -STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>>>> -			     struct xfs_buf **leaf_bp);
>>>>    int
>>>>    xfs_inode_hasattr(
>>>> @@ -142,7 +141,7 @@ xfs_attr_get(
>>>>    /*
>>>>     * Calculate how many blocks we need for the new attribute,
>>>>     */
>>>> -STATIC int
>>>> +int
>>>>    xfs_attr_calc_size(
>>>>    	struct xfs_da_args	*args,
>>>>    	int			*local)
>>>> @@ -327,7 +326,7 @@ xfs_attr_set_args(
>>>>     * to handle this, and recall the function until a successful error code is
>>>>     * returned.
>>>>     */
>>>> -STATIC int
>>>> +int
>>>>    xfs_attr_set_iter(
>>>>    	struct xfs_delattr_context	*dac,
>>>>    	struct xfs_buf			**leaf_bp)
>>>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>>>> index 501f9df..5b4a1ca 100644
>>>> --- a/fs/xfs/libxfs/xfs_attr.h
>>>> +++ b/fs/xfs/libxfs/xfs_attr.h
>>>> @@ -247,6 +247,7 @@ enum xfs_delattr_state {
>>>>    #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>>>>    #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>>>>    #define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
>>>> +#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
>>>>    /*
>>>>     * Context used for keeping track of delayed attribute operations
>>>> @@ -254,6 +255,9 @@ enum xfs_delattr_state {
>>>>    struct xfs_delattr_context {
>>>>    	struct xfs_da_args      *da_args;
>>>> +	/* Used by delayed attributes to hold leaf across transactions */
>>>
>>> "Used by xfs_attr_set to hold a leaf buffer across a transaction roll" ?
>> Sure, will update
>>
>>>
>>>> +	struct xfs_buf		*leaf_bp;
>>>> +
>>>>    	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
>>>>    	struct xfs_bmbt_irec	map;
>>>>    	xfs_dablk_t		lblkno;
>>>> @@ -267,6 +271,18 @@ struct xfs_delattr_context {
>>>>    	enum xfs_delattr_state  dela_state;
>>>>    };
>>>> +/*
>>>> + * List of attrs to commit later.
>>>> + */
>>>> +struct xfs_attr_item {
>>>> +	struct xfs_delattr_context	xattri_dac;
>>>> +	uint32_t			xattri_op_flags;/* attr op set or rm */
>>>
>>> The comment for xattri_op_flags should be more direct in mentioning that
>>> it takes XFS_ATTR_OP_FLAGS_{SET,REMOVE}.
>> Alrighty, will do
>>
>>>
>>> (Alternately you could define an enum for the incore state tracker that
>>> causes the appropriate XFS_ATTR_OP_FLAG* to be set on the log item in
>>> xfs_attr_create_intent to avoid mixing of the flag namespaces, but that
>>> is a lot of paper-pushing...)
>>>
>>>> +
>>>> +	/* used to log this item to an intent */
>>>> +	struct list_head		xattri_list;
>>>> +};
>>>
>>> Ok, so going back to a confusing comment I had from the last series,
>>> I'm glad that you've moved all the attr code to be deferred operations.
>>>
>>> Can you move all the xfs_delattr_context fields into xfs_attr_item?
>>> AFAICT (from git diff'ing the entire branch :P) we never allocate an
>>> xfs_delattr_context on its own; we only ever access the one that's
>>> embedded in xfs_attr_item, right?
>> Well, xfs_delattr_context is used earlier in the set by the top level
>> routines xfs_attr_set/remove_args.  If we did this, it would pull the
>> attr_item into the lower part of the "delay ready" subseries, and I think
>> people really wanted that part to be "refactor only" to make the reviewing
>> easier.
>>
>> How about an extra patch at the end that merges these structs once those
>> high level functions have been backed out?  That way we're not trying to
>> introduce the log items before this patch.  That seems like a reasonable way
>> to phase in the end result.
> 
> Yes.
> 
>> Also, such a change would imply that a lot of these lower level attr
>> routines that are sensitive to the state machine mechanics would no longer
>> pass around an xfs_delattr_context; they would take an xfs_attr_item
>> instead.  Not entirely sure how people would feel about that, but again, I
>> figure if we save it for the end, it's easy to take it or leave it without
>> causing too much surgery below.
> 
> Yes.  The major transformation of this patchset is to establish that
> high level xfs functionality is supposed to use defer ops to stage
> complex metadata updates instead of open-coding transaction rolling and
> state management like it has done historically.
> 
> And, as you've undoubtedly noticed from implementing the attr item, that
> also means that we can make those complex operations restartable in the
> event of a system failure.
> 
> Also: When the log item is enabled, we hold the inode locked across an
> entire xattr update /and/ can restart interrupted operations.  I think
> this means that you can skip all the INCOMPLETE flag handling bs, since
> that flag only exists to ensure that we only ever present exactly one
> (key, value) tuple to userspace.
> 
Yeah, IIRC, we tried to pull it out once before, and then ended up
having to put it back because we realized we needed it for older
filesystems that can't use delayed attrs.  I'll see if I can put in a
switch to skip it when delayed attrs are on.

>>>
>>>> +
>>>> +
>>>>    /*========================================================================
>>>>     * Function prototypes for the kernel.
>>>>     *========================================================================*/
>>>> @@ -282,11 +298,14 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
>>>>    int xfs_attr_get(struct xfs_da_args *args);
>>>>    int xfs_attr_set(struct xfs_da_args *args);
>>>>    int xfs_attr_set_args(struct xfs_da_args *args);
>>>> +int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>>>> +		      struct xfs_buf **leaf_bp);
>>>>    int xfs_has_attr(struct xfs_da_args *args);
>>>>    int xfs_attr_remove_args(struct xfs_da_args *args);
>>>>    int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>>>>    bool xfs_attr_namecheck(const void *name, size_t length);
>>>>    void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>>>    			      struct xfs_da_args *args);
>>>> +int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>>>>    #endif	/* __XFS_ATTR_H__ */
>>>> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
>>>> index eff4a12..e9caff7 100644
>>>> --- a/fs/xfs/libxfs/xfs_defer.c
>>>> +++ b/fs/xfs/libxfs/xfs_defer.c
>>>> @@ -178,6 +178,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
>>>>    	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
>>>>    	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
>>>>    	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
>>>> +	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
>>>>    };
>>>>    static void
>>>> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
>>>> index 05472f7..72a5789 100644
>>>> --- a/fs/xfs/libxfs/xfs_defer.h
>>>> +++ b/fs/xfs/libxfs/xfs_defer.h
>>>> @@ -19,6 +19,7 @@ enum xfs_defer_ops_type {
>>>>    	XFS_DEFER_OPS_TYPE_RMAP,
>>>>    	XFS_DEFER_OPS_TYPE_FREE,
>>>>    	XFS_DEFER_OPS_TYPE_AGFL_FREE,
>>>> +	XFS_DEFER_OPS_TYPE_ATTR,
>>>>    	XFS_DEFER_OPS_TYPE_MAX,
>>>>    };
>>>> @@ -63,6 +64,8 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
>>>>    extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
>>>>    extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
>>>>    extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
>>>> +extern const struct xfs_defer_op_type xfs_attr_defer_type;
>>>> +
>>>>    /*
>>>>     * This structure enables a dfops user to detach the chain of deferred
>>>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>>>> index dd764da..d419c34 100644
>>>> --- a/fs/xfs/libxfs/xfs_format.h
>>>> +++ b/fs/xfs/libxfs/xfs_format.h
>>>> @@ -584,6 +584,11 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
>>>>    		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_INOBTCNT);
>>>>    }
>>>> +static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)
>>>> +{
>>>> +	return false;
>>>> +}
>>>> +
>>>>    /*
>>>>     * end of superblock version macros
>>>>     */
>>>> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
>>>> index 8bd00da..de6309d 100644
>>>> --- a/fs/xfs/libxfs/xfs_log_format.h
>>>> +++ b/fs/xfs/libxfs/xfs_log_format.h
>>>> @@ -117,7 +117,12 @@ struct xfs_unmount_log_format {
>>>>    #define XLOG_REG_TYPE_CUD_FORMAT	24
>>>>    #define XLOG_REG_TYPE_BUI_FORMAT	25
>>>>    #define XLOG_REG_TYPE_BUD_FORMAT	26
>>>> -#define XLOG_REG_TYPE_MAX		26
>>>> +#define XLOG_REG_TYPE_ATTRI_FORMAT	27
>>>> +#define XLOG_REG_TYPE_ATTRD_FORMAT	28
>>>> +#define XLOG_REG_TYPE_ATTR_NAME	29
>>>> +#define XLOG_REG_TYPE_ATTR_VALUE	30
>>>> +#define XLOG_REG_TYPE_MAX		30
>>>> +
>>>>    /*
>>>>     * Flags to log operation header
>>>> @@ -240,6 +245,8 @@ typedef struct xfs_trans_header {
>>>>    #define	XFS_LI_CUD		0x1243
>>>>    #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
>>>>    #define	XFS_LI_BUD		0x1245
>>>> +#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
>>>> +#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
>>>>    #define XFS_LI_TYPE_DESC \
>>>>    	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
>>>> @@ -255,7 +262,9 @@ typedef struct xfs_trans_header {
>>>>    	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
>>>>    	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
>>>>    	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
>>>> -	{ XFS_LI_BUD,		"XFS_LI_BUD" }
>>>> +	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
>>>> +	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
>>>> +	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
>>>>    /*
>>>>     * Inode Log Item Format definitions.
>>>> @@ -863,4 +872,35 @@ struct xfs_icreate_log {
>>>>    	__be32		icl_gen;	/* inode generation number to use */
>>>>    };
>>>> +/*
>>>> + * Flags for deferred attribute operations.
>>>> + * Upper bits are flags, lower byte is type code
>>>> + */
>>>> +#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
>>>> +#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
>>>> +#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF	/* Flags type mask */
>>>> +
>>>> +/*
>>>> + * This is the structure used to lay out an attr log item in the
>>>> + * log.
>>>> + */
>>>> +struct xfs_attri_log_format {
>>>> +	uint16_t	alfi_type;	/* attri log item type */
>>>> +	uint16_t	alfi_size;	/* size of this item */
>>>> +	uint32_t	__pad;		/* pad to 64 bit aligned */
>>>> +	uint64_t	alfi_id;	/* attri identifier */
>>>> +	xfs_ino_t	alfi_ino;	/* the inode for this attr operation */
>>>
>>> This is an ondisk structure; please use only explicitly sized data
>>> types like uint64_t.
>> Ok, will update
>>
>>>
>>>> +	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
>>>> +	uint32_t	alfi_name_len;	/* attr name length */
>>>> +	uint32_t	alfi_value_len;	/* attr value length */
>>>> +	uint32_t	alfi_attr_flags;/* attr flags */
>>>> +};
>>>> +
>>>> +struct xfs_attrd_log_format {
>>>> +	uint16_t	alfd_type;	/* attrd log item type */
>>>> +	uint16_t	alfd_size;	/* size of this item */
>>>> +	uint32_t	__pad;		/* pad to 64 bit aligned */
>>>> +	uint64_t	alfd_alf_id;	/* id of corresponding attrd */
>>>
>>> "..of corresponding attri"
>> Yes, corresponding attri :-)
>>
>>>
>>>> +};
>>>> +
>>>>    #endif /* __XFS_LOG_FORMAT_H__ */
>>>> diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
>>>> index 3cca2bf..b6e5514 100644
>>>> --- a/fs/xfs/libxfs/xfs_log_recover.h
>>>> +++ b/fs/xfs/libxfs/xfs_log_recover.h
>>>> @@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
>>>>    extern const struct xlog_recover_item_ops xlog_rud_item_ops;
>>>>    extern const struct xlog_recover_item_ops xlog_cui_item_ops;
>>>>    extern const struct xlog_recover_item_ops xlog_cud_item_ops;
>>>> +extern const struct xlog_recover_item_ops xlog_attri_item_ops;
>>>> +extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
>>>>    /*
>>>>     * Macros, structures, prototypes for internal log manager use.
>>>> diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
>>>> index 397d947..860cdd2 100644
>>>> --- a/fs/xfs/libxfs/xfs_types.h
>>>> +++ b/fs/xfs/libxfs/xfs_types.h
>>>> @@ -11,6 +11,7 @@ typedef uint32_t	prid_t;		/* project ID */
>>>>    typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
>>>>    typedef uint32_t	xfs_agino_t;	/* inode # within allocation grp */
>>>>    typedef uint32_t	xfs_extlen_t;	/* extent length in blocks */
>>>> +typedef uint32_t	xfs_attrlen_t;	/* attr length */
>>>
>>> This doesn't get used anywhere.
>> Ok, will clean out.
>>
>>>
>>>>    typedef uint32_t	xfs_agnumber_t;	/* allocation group number */
>>>>    typedef int32_t		xfs_extnum_t;	/* # of extents in a file */
>>>>    typedef int16_t		xfs_aextnum_t;	/* # extents in an attribute fork */
>>>> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
>>>> index 1887605..9a649d1 100644
>>>> --- a/fs/xfs/scrub/common.c
>>>> +++ b/fs/xfs/scrub/common.c
>>>> @@ -24,6 +24,8 @@
>>>>    #include "xfs_rmap_btree.h"
>>>>    #include "xfs_log.h"
>>>>    #include "xfs_trans_priv.h"
>>>> +#include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_reflink.h"
>>>>    #include "scrub/scrub.h"
>>>> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
>>>> index c544951..cad1db4 100644
>>>> --- a/fs/xfs/xfs_acl.c
>>>> +++ b/fs/xfs/xfs_acl.c
>>>> @@ -10,6 +10,8 @@
>>>>    #include "xfs_trans_resv.h"
>>>>    #include "xfs_mount.h"
>>>>    #include "xfs_inode.h"
>>>> +#include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_trace.h"
>>>>    #include "xfs_error.h"
>>>> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
>>>> new file mode 100644
>>>> index 0000000..3980066
>>>> --- /dev/null
>>>> +++ b/fs/xfs/xfs_attr_item.c
>>>> @@ -0,0 +1,750 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-or-later
>>>> +/*
>>>> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
>>>
>>> 2019 -> 2020.
>> Will update.  :-)
>>
>>>
>>>> + * Author: Allison Collins <allison.henderson@oracle.com>
>>>> + */
>>>> +
>>>> +#include "xfs.h"
>>>> +#include "xfs_fs.h"
>>>> +#include "xfs_format.h"
>>>> +#include "xfs_log_format.h"
>>>> +#include "xfs_trans_resv.h"
>>>> +#include "xfs_bit.h"
>>>> +#include "xfs_shared.h"
>>>> +#include "xfs_mount.h"
>>>> +#include "xfs_defer.h"
>>>> +#include "xfs_trans.h"
>>>> +#include "xfs_trans_priv.h"
>>>> +#include "xfs_buf_item.h"
>>>> +#include "xfs_attr_item.h"
>>>> +#include "xfs_log.h"
>>>> +#include "xfs_btree.h"
>>>> +#include "xfs_rmap.h"
>>>> +#include "xfs_inode.h"
>>>> +#include "xfs_icache.h"
>>>> +#include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>> +#include "xfs_attr.h"
>>>> +#include "xfs_shared.h"
>>>> +#include "xfs_attr_item.h"
>>>> +#include "xfs_alloc.h"
>>>> +#include "xfs_bmap.h"
>>>> +#include "xfs_trace.h"
>>>> +#include "libxfs/xfs_da_format.h"
>>>> +#include "xfs_inode.h"
>>>> +#include "xfs_quota.h"
>>>> +#include "xfs_log_priv.h"
>>>> +#include "xfs_log_recover.h"
>>>> +
>>>> +static const struct xfs_item_ops xfs_attri_item_ops;
>>>> +static const struct xfs_item_ops xfs_attrd_item_ops;
>>>> +
>>>> +static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
>>>> +{
>>>> +	return container_of(lip, struct xfs_attri_log_item, attri_item);
>>>> +}
>>>> +
>>>> +STATIC void
>>>> +xfs_attri_item_free(
>>>> +	struct xfs_attri_log_item	*attrip)
>>>> +{
>>>> +	kmem_free(attrip->attri_item.li_lv_shadow);
>>>> +	kmem_free(attrip);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Freeing the attrip requires that we remove it from the AIL if it has already
>>>> + * been placed there. However, the ATTRI may not yet have been placed in the
>>>> + * AIL when called by xfs_attri_release() from ATTRD processing due to the
>>>> + * ordering of committed vs unpin operations in bulk insert operations. Hence
>>>> + * the reference count to ensure only the last caller frees the ATTRI.
>>>> + */
>>>> +STATIC void
>>>> +xfs_attri_release(
>>>> +	struct xfs_attri_log_item	*attrip)
>>>> +{
>>>> +	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
>>>> +	if (atomic_dec_and_test(&attrip->attri_refcount)) {
>>>> +		xfs_trans_ail_delete(&attrip->attri_item,
>>>> +				     SHUTDOWN_LOG_IO_ERROR);
>>>> +		xfs_attri_item_free(attrip);
>>>> +	}
>>>> +}
>>>> +
>>>> +/*
>>>> + * This returns the number of iovecs needed to log the given attri item. We
>>>> + * only need 1 iovec for an attri item.  It just logs the attr_log_format
>>>> + * structure.
>>>> + */
>>>> +static inline int
>>>> +xfs_attri_item_sizeof(
>>>> +	struct xfs_attri_log_item *attrip)
>>>> +{
>>>> +	return sizeof(struct xfs_attri_log_format);
>>>> +}
>>>
>>> Please get rid of this trivial oneliner.
>> Sure, I think I added some of this just to be consistent with how the other
>> delayed ops are implemented.
>>
>>>
>>>> +
>>>> +STATIC void
>>>> +xfs_attri_item_size(
>>>> +	struct xfs_log_item	*lip,
>>>> +	int			*nvecs,
>>>> +	int			*nbytes)
>>>> +{
>>>> +	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
>>>> +
>>>> +	*nvecs += 1;
>>>> +	*nbytes += xfs_attri_item_sizeof(attrip);
>>>> +
>>>> +	/* Attr set and remove operations require a name */
>>>> +	ASSERT(attrip->attri_name_len > 0);
>>>> +
>>>> +	*nvecs += 1;
>>>> +	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
>>>> +
>>>> +	/*
>>>> +	 * Set ops can accept a value of 0 len to clear an attr value.  Remove
>>>> +	 * ops do not need a value at all.  So only account for the value
>>>> +	 * when it is needed.
>>>> +	 */
>>>> +	if (attrip->attri_value_len > 0) {
>>>> +		*nvecs += 1;
>>>> +		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
>>>> +	}
>>>> +}
>>>> +
>>>> +/*
>>>> + * This is called to fill in the log iovecs for the given attri log
>>>> + * item. We use  1 iovec for the attri_format_item, 1 for the name, and
>>>> + * another for the value if it is present
>>>> + */
>>>> +STATIC void
>>>> +xfs_attri_item_format(
>>>> +	struct xfs_log_item	*lip,
>>>> +	struct xfs_log_vec	*lv)
>>>> +{
>>>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>>>> +	struct xfs_log_iovec		*vecp = NULL;
>>>> +
>>>> +	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
>>>> +	attrip->attri_format.alfi_size = 1;
>>>> +
>>>> +	/*
>>>> +	 * This size accounting must be done before copying the attrip into the
>>>> +	 * iovec.  If we do it after, the wrong size will be recorded to the log
>>>> +	 * and we trip across assertion checks for bad region sizes later during
>>>> +	 * the log recovery.
>>>> +	 */
>>>> +
>>>> +	ASSERT(attrip->attri_name_len > 0);
>>>> +	attrip->attri_format.alfi_size++;
>>>> +
>>>> +	if (attrip->attri_value_len > 0)
>>>> +		attrip->attri_format.alfi_size++;
>>>> +
>>>> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
>>>> +			&attrip->attri_format,
>>>> +			xfs_attri_item_sizeof(attrip));
>>>> +	if (attrip->attri_name_len > 0)
>>>
>>> I thought we required attri_name_len > 0 always?
>> I think so.  I think this check may have come up in one of the earlier
>> reviews.  I'll add a comment here; we even have the assert a few lines up.
> 
> <nod>
> 
>>>
>>>> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
>>>> +				attrip->attri_name,
>>>> +				ATTR_NVEC_SIZE(attrip->attri_name_len));
>>>> +
>>>> +	if (attrip->attri_value_len > 0)
>>>> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
>>>> +				attrip->attri_value,
>>>> +				ATTR_NVEC_SIZE(attrip->attri_value_len));
>>>> +}
>>>> +
>>>> +/*
>>>> + * The unpin operation is the last place an ATTRI is manipulated in the log. It
>>>> + * is either inserted in the AIL or aborted in the event of a log I/O error. In
>>>> + * either case, the ATTRI transaction has been successfully committed to make
>>>> + * it this far. Therefore, we expect whoever committed the ATTRI to either
>>>> + * construct and commit the ATTRD or drop the ATTRD's reference in the event of
>>>> + * error. Simply drop the log's ATTRI reference now that the log is done with
>>>> + * it.
>>>> + */
>>>> +STATIC void
>>>> +xfs_attri_item_unpin(
>>>> +	struct xfs_log_item	*lip,
>>>> +	int			remove)
>>>> +{
>>>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>>>> +
>>>> +	xfs_attri_release(attrip);
>>>
>>> Nit: this could be shortened to xfs_attri_release(ATTRI_ITEM(lip)).
>> Ok, will shorten
>>
>>>
>>>> +}
>>>> +
>>>> +
>>>> +STATIC void
>>>> +xfs_attri_item_release(
>>>> +	struct xfs_log_item	*lip)
>>>> +{
>>>> +	xfs_attri_release(ATTRI_ITEM(lip));
>>>> +}
>>>> +
>>>> +/*
>>>> + * Allocate and initialize an attri item
>>>> + */
>>>> +STATIC struct xfs_attri_log_item *
>>>> +xfs_attri_init(
>>>> +	struct xfs_mount	*mp)
>>>> +
>>>> +{
>>>> +	struct xfs_attri_log_item	*attrip;
>>>> +	uint				size;
>>>
>>> Can you line up the *mp in the parameter list with the *attrip in the
>>> local variables?
>> Sure
>>
>>>
>>>> +
>>>> +	size = (uint)(sizeof(struct xfs_attri_log_item));
>>>
>>> kmem_zalloc takes a size_t parameter (which is the return type of sizeof);
>>> no need to do all this casting.
>> Ok, I'm thinking of adding an extra buffer_size param here, so that one of
>> the callers doesn't have to realloc this for the trailing buffer needed
>> during the commit.  One of the new test cases is showing an intermittent
>> warning about allocating more than a page, so I'm trying to figure that out
>> and clean it up.
> 
> Urrk, oh right, I forgot that you can end up needing to allocate a 64k +
> 256b + ~80b buffer to hold all this state.
> 
> So uh yeah, you /do/ have to use kmem_zalloc_large and know the size
> ahead of time.
> 
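A minimal userspace sketch of the up-front sizing discussed above, assuming a simplified stand-in struct (attri_item_model, attri_alloc_size() and attri_alloc() are hypothetical names, and calloc() stands in for kmem_zalloc_large()). With XATTR_NAME_MAX at 255 and XATTR_SIZE_MAX at 65536, the worst case is well over a 4 KiB page. Allocating once at the final size also appears to sidestep the krealloc() in xlog_recover_attri_commit_pass2 below: xfs_log_item_init() points the item's embedded list heads at themselves, and krealloc() is free to move the object.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct xfs_attri_log_item (fields trimmed). */
struct attri_item_model {
	int	name_len;
	int	value_len;
	void	*name;
	void	*value;
	/* name bytes, then value bytes, follow the struct in one allocation */
};

/* Bytes needed for the item plus its trailing name and value buffers. */
static size_t attri_alloc_size(size_t name_len, size_t value_len)
{
	return sizeof(struct attri_item_model) + name_len + value_len;
}

/*
 * Allocate once with the final size, then point name/value into the tail.
 * calloc() stands in for kmem_zalloc_large(); nothing is ever realloc'd,
 * so fields set up at init time keep pointing at the right object.
 */
static struct attri_item_model *
attri_alloc(const void *name, size_t name_len, const void *value,
	    size_t value_len)
{
	struct attri_item_model	*ip;
	char			*tail;

	ip = calloc(1, attri_alloc_size(name_len, value_len));
	if (!ip)
		return NULL;

	tail = (char *)(ip + 1);
	ip->name_len = (int)name_len;
	ip->name = tail;
	memcpy(ip->name, name, name_len);

	/* A remove carries no value, so leave value NULL in that case. */
	ip->value_len = (int)value_len;
	if (value_len > 0) {
		ip->value = tail + name_len;
		memcpy(ip->value, value, value_len);
	}
	return ip;
}
```

Because the name and value live in the same allocation as the item, they are freed together with it and cannot outlive or predecease the intent.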
>>>> +	attrip = kmem_zalloc(size, 0);
>>>> +
>>>> +	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
>>>> +			  &xfs_attri_item_ops);
>>>> +	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
>>>> +	atomic_set(&attrip->attri_refcount, 2);
>>>> +
>>>> +	return attrip;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Copy an attr format buffer from the given buf, and into the destination attr
>>>> + * format structure.
>>>> + */
>>>> +STATIC int
>>>> +xfs_attri_copy_format(struct xfs_log_iovec *buf,
>>>> +		      struct xfs_attri_log_format *dst_attr_fmt)
>>>> +{
>>>> +	struct xfs_attri_log_format *src_attr_fmt = buf->i_addr;
>>>> +	uint len = sizeof(struct xfs_attri_log_format);
>>>
>>> Indentation and whatnot with the parameter names.
>> Ok will fix
>>>
>>>> +
>>>> +	if (buf->i_len != len)
>>>> +		return -EFSCORRUPTED;
>>>> +
>>>> +	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
>>>> +{
>>>> +	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
>>>> +}
>>>> +
>>>> +STATIC void
>>>> +xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
>>>> +{
>>>> +	kmem_free(attrdp->attrd_item.li_lv_shadow);
>>>> +	kmem_free(attrdp);
>>>> +}
>>>> +
>>>> +/*
>>>> + * This returns the number of iovecs needed to log the given attrd item.
>>>> + * We only need 1 iovec for an attrd item.  It just logs the attr_log_format
>>>> + * structure.
>>>> + */
>>>> +static inline int
>>>> +xfs_attrd_item_sizeof(
>>>> +	struct xfs_attrd_log_item *attrdp)
>>>> +{
>>>> +	return sizeof(struct xfs_attrd_log_format);
>>>> +}
>>>> +
>>>> +STATIC void
>>>> +xfs_attrd_item_size(
>>>> +	struct xfs_log_item	*lip,
>>>> +	int			*nvecs,
>>>> +	int			*nbytes)
>>>> +{
>>>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>>>
>>> Variable name alignment between the parameter list and the local vars.
>>>
>>>> +	*nvecs += 1;
>>>
>>> Space between local variable declaration and the first line of code.
>>>
>>>> +	*nbytes += xfs_attrd_item_sizeof(attrdp);
>>>
>>> No need for a oneliner function for sizeof.
>>
>> Ok, will fix
>>>
>>>> +}
>>>> +
>>>> +/*
>>>> + * This is called to fill in the log iovecs for the given attrd log item. We use
>>>> + * only 1 iovec for the attrd_format, and we point that at the attr_log_format
>>>> + * structure embedded in the attrd item.
>>>> + */
>>>> +STATIC void
>>>> +xfs_attrd_item_format(
>>>> +	struct xfs_log_item	*lip,
>>>> +	struct xfs_log_vec	*lv)
>>>> +{
>>>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>>>> +	struct xfs_log_iovec		*vecp = NULL;
>>>> +
>>>> +	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
>>>> +	attrdp->attrd_format.alfd_size = 1;
>>>> +
>>>> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
>>>> +			&attrdp->attrd_format, xfs_attrd_item_sizeof(attrdp));
>>>> +}
>>>> +
>>>> +/*
>>>> + * The ATTRD is either committed or aborted if the transaction is cancelled. If
>>>> + * the transaction is cancelled, drop our reference to the ATTRI and free the
>>>> + * ATTRD.
>>>> + */
>>>> +STATIC void
>>>> +xfs_attrd_item_release(
>>>> +	struct xfs_log_item     *lip)
>>>> +{
>>>> +	struct xfs_attrd_log_item *attrdp = ATTRD_ITEM(lip);
>>>> +	xfs_attri_release(attrdp->attrd_attrip);
>>>
>>> Space between the variable declaration and the first line of code.
>> Sure, will add.
>>
>>>
>>>> +	xfs_attrd_item_free(attrdp);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Log an ATTRI it to the ATTRD when the attr op is done.  An attr operation
>>>
>>> I don't know what "Log an ATTRI it to the ATTRD" means.  I think this is
>>> the function that performs one step of an attribute update intent and
>>> then tags the attrd item dirty, right?
>> Yes, I had modeled this function loosely on the free extent code at the time.
>> It has similar commentary, though that's about what I interpreted it to
>> mean.  Back then we were still trying to conceptualize how this looping
>> behavior with the state machine was going to work.
>>
>> Maybe the comment should just state it like that if that's more clear?
>>
>> "Performs one step of an attribute update intent and marks the attrd item
>> dirty."
> 
> Ok.  I was confused by the garbled sentence.
> 
>>
>> ?
>>
>>>
>>>> + * may be a set or a remove.  Note that the transaction is marked dirty
>>>> + * regardless of whether the operation succeeds or fails to support the
>>>> + * ATTRI/ATTRD lifecycle rules.
>>>> + */
>>>> +int
>>>> +xfs_trans_attr(
>>>> +	struct xfs_delattr_context	*dac,
>>>> +	struct xfs_attrd_log_item	*attrdp,
>>>> +	struct xfs_buf			**leaf_bp,
>>>> +	uint32_t			op_flags)
>>>> +{
>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>> +	int				error;
>>>> +
>>>> +	error = xfs_qm_dqattach_locked(args->dp, 0);
>>>> +	if (error)
>>>> +		return error;
>>>> +
>>>> +	switch (op_flags) {
>>>> +	case XFS_ATTR_OP_FLAGS_SET:
>>>> +		args->op_flags |= XFS_DA_OP_ADDNAME;
>>>> +		error = xfs_attr_set_iter(dac, leaf_bp);
>>>> +		break;
>>>> +	case XFS_ATTR_OP_FLAGS_REMOVE:
>>>> +		ASSERT(XFS_IFORK_Q((args->dp)));
>>>
>>> No need for the double parentheses around args->dp.
>> Ok, will clean out
>>
>>>
>>>> +		error = xfs_attr_remove_iter(dac);
>>>> +		break;
>>>> +	default:
>>>> +		error = -EFSCORRUPTED;
>>>> +		break;
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * Mark the transaction dirty, even on error. This ensures the
>>>> +	 * transaction is aborted, which:
>>>> +	 *
>>>> +	 * 1.) releases the ATTRI and frees the ATTRD
>>>> +	 * 2.) shuts down the filesystem
>>>> +	 */
>>>> +	args->trans->t_flags |= XFS_TRANS_DIRTY;
>>>> +	if (xfs_sb_version_hasdelattr(&args->dp->i_mount->m_sb))
>>>> +		set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
>>>
>>> This could probably be:
>>>
>>> 	if (attrdp)
>>> 		set_bit(...);
>>
>> Sure, that should work too, though maybe with a comment?  This version
>> loses the subtle implication that attrdp is expected to be null when the
>> feature bit is off.  Otherwise it may stir up future questions of why or how
>> this could be null.  Maybe just something like:
>>
>> /*
>>   * attr intent/done items are null when delayed attributes are disabled
>>   */
>>
>> ?
> 
> Ok.
> 
>>>
>>>> +
>>>> +	return error;
>>>> +}
>>>> +
>>>> +/* Log an attr to the intent item. */
>>>> +STATIC void
>>>> +xfs_attr_log_item(
>>>> +	struct xfs_trans		*tp,
>>>> +	struct xfs_attri_log_item	*attrip,
>>>> +	struct xfs_attr_item		*attr)
>>>> +{
>>>> +	struct xfs_attri_log_format	*attrp;
>>>> +
>>>> +	tp->t_flags |= XFS_TRANS_DIRTY;
>>>> +	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
>>>> +
>>>> +	/*
>>>> +	 * At this point the xfs_attr_item has been constructed, and we've
>>>> +	 * created the log intent. Fill in the attri log item and log format
>>>> +	 * structure with fields from this xfs_attr_item
>>>> +	 */
>>>> +	attrp = &attrip->attri_format;
>>>> +	attrp->alfi_ino = attr->xattri_dac.da_args->dp->i_ino;
>>>> +	attrp->alfi_op_flags = attr->xattri_op_flags;
>>>> +	attrp->alfi_value_len = attr->xattri_dac.da_args->valuelen;
>>>> +	attrp->alfi_name_len = attr->xattri_dac.da_args->namelen;
>>>> +	attrp->alfi_attr_flags = attr->xattri_dac.da_args->attr_filter;
>>>> +
>>>> +	attrip->attri_name = (void *)attr->xattri_dac.da_args->name;
>>>> +	attrip->attri_value = attr->xattri_dac.da_args->value;
>>>> +	attrip->attri_name_len = attr->xattri_dac.da_args->namelen;
>>>> +	attrip->attri_value_len = attr->xattri_dac.da_args->valuelen;
>>>> +}
>>>> +
>>>> +/* Get an ATTRI. */
>>>> +static struct xfs_log_item *
>>>> +xfs_attr_create_intent(
>>>> +	struct xfs_trans		*tp,
>>>> +	struct list_head		*items,
>>>> +	unsigned int			count,
>>>> +	bool				sort)
>>>> +{
>>>> +	struct xfs_mount		*mp = tp->t_mountp;
>>>> +	struct xfs_attri_log_item	*attrip;
>>>> +	struct xfs_attr_item		*attr;
>>>> +
>>>> +	ASSERT(count == 1);
>>>> +
>>>> +	if (!xfs_sb_version_hasdelattr(&mp->m_sb))
>>>> +		return NULL;
>>>> +
>>>> +	attrip = xfs_attri_init(mp);
>>>> +	xfs_trans_add_item(tp, &attrip->attri_item);
>>>> +	list_for_each_entry(attr, items, xattri_list)
>>>> +		xfs_attr_log_item(tp, attrip, attr);
>>>> +	return &attrip->attri_item;
>>>> +}
>>>> +
>>>> +/* Process an attr. */
>>>> +STATIC int
>>>> +xfs_attr_finish_item(
>>>> +	struct xfs_trans		*tp,
>>>> +	struct xfs_log_item		*done,
>>>> +	struct list_head		*item,
>>>> +	struct xfs_btree_cur		**state)
>>>> +{
>>>> +	struct xfs_attr_item		*attr;
>>>> +	int				error;
>>>> +	struct xfs_delattr_context	*dac;
>>>> +	struct xfs_attrd_log_item	*attrdp;
>>>> +	struct xfs_attri_log_item	*attrip;
>>>> +
>>>> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
>>>> +	dac = &attr->xattri_dac;
>>>> +
>>>> +	/*
>>>> +	 * Always reset trans after EAGAIN cycle
>>>> +	 * since the transaction is new
>>>> +	 */
>>>> +	dac->da_args->trans = tp;
>>>> +
>>>> +	error = xfs_trans_attr(dac, ATTRD_ITEM(done), &dac->leaf_bp,
>>>> +			       attr->xattri_op_flags);
>>>> +	/*
>>>> +	 * The attrip refers to xfs_attr_item memory to log the name and value
>>>> +	 * with the intent item. This already occurred when the intent was
>>>> +	 * committed so these fields are no longer accessed.
>>>
>>> Can you clear the attri_{name,value} pointers after you've logged the
>>> intent item so that we don't have to do them here?
>>>
>> Ok, maybe I can put this in xfs_attri_item_committed?
> 
> Yeah.
> 
>>>> Clear them out of
>>>> +	 * caution since we're about to free the xfs_attr_item.
>>>> +	 */
>>>> +	if (xfs_sb_version_hasdelattr(&dac->da_args->dp->i_mount->m_sb)) {
>>>> +		attrdp = (struct xfs_attrd_log_item *)done;
>>>
>>> attrdp = ATTRD_ITEM(done)?
>> Sure, will shorten
>>>
>>>> +		attrip = attrdp->attrd_attrip;
>>>> +		attrip->attri_name = NULL;
>>>> +		attrip->attri_value = NULL;
>>>> +	}
>>>> +
>>>> +	if (error != -EAGAIN)
>>>> +		kmem_free(attr);
>>>> +
>>>> +	return error;
>>>> +}
>>>> +
>>>> +/* Abort all pending ATTRs. */
>>>> +STATIC void
>>>> +xfs_attr_abort_intent(
>>>> +	struct xfs_log_item		*intent)
>>>> +{
>>>> +	xfs_attri_release(ATTRI_ITEM(intent));
>>>> +}
>>>> +
>>>> +/* Cancel an attr */
>>>> +STATIC void
>>>> +xfs_attr_cancel_item(
>>>> +	struct list_head		*item)
>>>> +{
>>>> +	struct xfs_attr_item		*attr;
>>>> +
>>>> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
>>>> +	kmem_free(attr);
>>>> +}
>>>> +
>>>> +/*
>>>> + * The ATTRI is logged only once and cannot be moved in the log, so simply
>>>> + * return the lsn at which it's been logged.
>>>> + */
>>>> +STATIC xfs_lsn_t
>>>> +xfs_attri_item_committed(
>>>> +	struct xfs_log_item	*lip,
>>>> +	xfs_lsn_t		lsn)
>>>> +{
>>>> +	return lsn;
>>>> +}
>>>
>>> You can omit this function because the default is "return lsn;" if you
>>> don't provide one.  See xfs_trans_committed_bulk.
>> Oh, ok.  I was thinking of moving some of the finish item clean up here
>> though.
> 
> <nod> Nowadays we're trying to reduce the number of indirect calls since
> they're expensive post-Spectre.
> 
> Also there are some helpers to detect intent and intentdone items that
> check the supplied li_ops; see xlog_item_is_intent and
> xlog_item_is_intent_done.  I think you're fine here, but it's something
> to keep in the back of your head.
> 
Ok, will take a look

>>>> +
>>>> +STATIC void
>>>> +xfs_attri_item_committing(
>>>> +	struct xfs_log_item	*lip,
>>>> +	xfs_lsn_t		lsn)
>>>> +{
>>>> +}
>>>
>>> This function isn't required if it doesn't do anything.  See
>>> xfs_log_commit_cil.
>> Ok, will remove
>>
>>>
>>>> +
>>>> +STATIC bool
>>>> +xfs_attri_item_match(
>>>> +	struct xfs_log_item	*lip,
>>>> +	uint64_t		intent_id)
>>>> +{
>>>> +	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
>>>> +}
>>>> +
>>>> +/*
>>>> + * When the attrd item is committed to disk, all we need to do is delete our
>>>> + * reference to our partner attri item and then free ourselves. Since we're
>>>> + * freeing ourselves we must return -1 to keep the transaction code from
>>>> + * further referencing this item.
>>>> + */
>>>> +STATIC xfs_lsn_t
>>>> +xfs_attrd_item_committed(
>>>> +	struct xfs_log_item	*lip,
>>>> +	xfs_lsn_t		lsn)
>>>> +{
>>>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>>>> +
>>>> +	/*
>>>> +	 * Drop the ATTRI reference regardless of whether the ATTRD has been
>>>> +	 * aborted. Once the ATTRD transaction is constructed, it is the sole
>>>> +	 * responsibility of the ATTRD to release the ATTRI (even if the ATTRI
>>>> +	 * is aborted due to log I/O error).
>>>> +	 */
>>>> +	xfs_attri_release(attrdp->attrd_attrip);
>>>> +	xfs_attrd_item_free(attrdp);
>>>> +
>>>> +	return NULLCOMMITLSN;
>>>> +}
>>>
>>> If you set XFS_ITEM_RELEASE_WHEN_COMMITTED in the attrd item ops,
>>> xfs_trans_committed_bulk will call ->iop_release instead of
>>> ->iop_committed and you therefore don't need this function.
>> Oh I see, will do that then
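A simplified userspace model of the commit-time dispatch described above, for illustration only: item_committed() mimics the shape of the logic in xfs_trans_committed_bulk() but omits abort handling and AIL insertion, and every name here (including MODEL_RELEASE_WHEN_COMMITTED) is hypothetical. It shows why the ATTRI can drop ->iop_committed (the default is to return the commit LSN) and why the ATTRD can drop it too once it sets the release-when-committed flag.

```c
#include <assert.h>
#include <stddef.h>

typedef long long lsn_model_t;	/* stand-in for xfs_lsn_t */

#define MODEL_RELEASE_WHEN_COMMITTED	(1U << 0)	/* hypothetical flag */

struct item_model;

struct item_ops_model {
	unsigned int	flags;
	void		(*release)(struct item_model *ip);
	lsn_model_t	(*committed)(struct item_model *ip, lsn_model_t lsn);
};

struct item_model {
	const struct item_ops_model	*ops;
	int				released;
};

/*
 * Return the LSN at which the item should be tracked, or -1 when the item
 * needs no further processing because it was released instead.
 */
static lsn_model_t item_committed(struct item_model *ip, lsn_model_t commit_lsn)
{
	if (ip->ops->flags & MODEL_RELEASE_WHEN_COMMITTED) {
		ip->ops->release(ip);
		return -1;
	}
	if (ip->ops->committed)
		return ip->ops->committed(ip, commit_lsn);
	return commit_lsn;	/* default when no ->committed hook is set */
}

static void model_release(struct item_model *ip)
{
	ip->released = 1;
}

/* ATTRI-like ops: no ->committed, so the default "return lsn" applies. */
static const struct item_ops_model attri_ops_model = { .flags = 0 };

/* ATTRD-like ops: released at commit time, so no ->committed is needed. */
static const struct item_ops_model attrd_ops_model = {
	.flags		= MODEL_RELEASE_WHEN_COMMITTED,
	.release	= model_release,
};
```

The same reasoning covers the empty ->iop_committing hooks: per the review comment above, xfs_log_commit_cil() only calls that method when it is set.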
>>
>>>
>>>> +
>>>> +STATIC void
>>>> +xfs_attrd_item_committing(
>>>> +	struct xfs_log_item	*lip,
>>>> +	xfs_lsn_t		lsn)
>>>> +{
>>>> +}
>>>
>>> Same comment as xfs_attri_item_committing.
>> ok, will remove this one
>>
>>>
>>>> +
>>>> +
>>>> +/*
>>>> + * Allocate and initialize an attrd item
>>>> + */
>>>> +struct xfs_attrd_log_item *
>>>> +xfs_attrd_init(
>>>> +	struct xfs_mount		*mp,
>>>> +	struct xfs_attri_log_item	*attrip)
>>>> +
>>>> +{
>>>> +	struct xfs_attrd_log_item	*attrdp;
>>>> +	uint				size;
>>>> +
>>>> +	size = (uint)(sizeof(struct xfs_attrd_log_item));
>>>
>>> Same comment about sizeof and size_t as in xfs_attri_init.
>>>
>>>> +	attrdp = kmem_zalloc(size, 0);
>>>> +	memset(attrdp, 0, size);
>>>
>>> No need to memset-zero something you just zalloc'd.
>> ok, will clean these up
>>
>>>
>>>> +
>>>> +	xfs_log_item_init(mp, &attrdp->attrd_item, XFS_LI_ATTRD,
>>>> +			  &xfs_attrd_item_ops);
>>>> +	attrdp->attrd_attrip = attrip;
>>>> +	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
>>>> +
>>>> +	return attrdp;
>>>> +}
>>>> +
>>>> +/*
>>>> + * This routine is called to allocate an "attr free done" log item.
>>>> + */
>>>> +struct xfs_attrd_log_item *
>>>> +xfs_trans_get_attrd(struct xfs_trans		*tp,
>>>> +		  struct xfs_attri_log_item	*attrip)
>>>> +{
>>>> +	struct xfs_attrd_log_item		*attrdp;
>>>> +
>>>> +	ASSERT(tp != NULL);
>>>> +
>>>> +	attrdp = xfs_attrd_init(tp->t_mountp, attrip);
>>>> +	ASSERT(attrdp != NULL);
>>>
>>> You could fold xfs_attrd_init into this function since there's only one
>>> caller.
>> Sure, there's not a lot in the init
>>
>>>
>>>> +
>>>> +	xfs_trans_add_item(tp, &attrdp->attrd_item);
>>>> +	return attrdp;
>>>> +}
>>>> +
>>>> +static const struct xfs_item_ops xfs_attrd_item_ops = {
>>>> +	.iop_size	= xfs_attrd_item_size,
>>>> +	.iop_format	= xfs_attrd_item_format,
>>>> +	.iop_release    = xfs_attrd_item_release,
>>>> +	.iop_committing	= xfs_attrd_item_committing,
>>>> +	.iop_committed	= xfs_attrd_item_committed,
>>>> +};
>>>> +
>>>> +
>>>> +/* Get an ATTRD so we can process all the attrs. */
>>>> +static struct xfs_log_item *
>>>> +xfs_attr_create_done(
>>>> +	struct xfs_trans		*tp,
>>>> +	struct xfs_log_item		*intent,
>>>> +	unsigned int			count)
>>>> +{
>>>> +	if (!xfs_sb_version_hasdelattr(&tp->t_mountp->m_sb))
>>>> +		return NULL;
>>>
>>> This is probably better expressed as:
>>>
>>> 	if (!intent)
>>> 		return NULL;
>>>
>>> Since we don't need a log intent done item if there's no log intent
>>> item.
>> Ok, that makes sense
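A small userspace model of the NULL-item convention discussed above (all names are hypothetical): only the intent constructor consults the feature bit, the done constructor returns NULL when there is no intent, and the per-step code dirties the done item only when one exists. Keeping the superblock check in one place is the design point; the comment proposed earlier in the thread documents why the pointer may be NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ATTRI/ATTRD log items. */
struct intent_model { int id; };
struct done_model { struct intent_model *intent; bool dirty; };

static struct intent_model the_intent = { .id = 1 };
static struct done_model the_done;

/* Only the intent constructor looks at the feature bit. */
static struct intent_model *create_intent(bool has_delattr)
{
	return has_delattr ? &the_intent : NULL;
}

/* No intent logged means no done item is needed either. */
static struct done_model *create_done(struct intent_model *intent)
{
	if (!intent)
		return NULL;
	the_done.intent = intent;
	the_done.dirty = false;
	return &the_done;
}

/* One step of the operation: dirty the done item only if it exists. */
static void finish_step(struct done_model *done)
{
	/* attr intent/done items are NULL when delayed attributes are disabled */
	if (done)
		done->dirty = true;
}
```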
>>
>>>
>>>> +
>>>> +	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
>>>> +}
>>>> +
>>>> +const struct xfs_defer_op_type xfs_attr_defer_type = {
>>>> +	.max_items	= 1,
>>>> +	.create_intent	= xfs_attr_create_intent,
>>>> +	.abort_intent	= xfs_attr_abort_intent,
>>>> +	.create_done	= xfs_attr_create_done,
>>>> +	.finish_item	= xfs_attr_finish_item,
>>>> +	.cancel_item	= xfs_attr_cancel_item,
>>>> +};
>>>> +
>>>> +/*
>>>> + * Process an attr intent item that was recovered from the log.  We need to
>>>> + * delete the attr that it describes.
>>>> + */
>>>> +STATIC int
>>>> +xfs_attri_item_recover(
>>>> +	struct xfs_log_item		*lip,
>>>> +	struct list_head		*capture_list)
>>>> +{
>>>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>>>> +	struct xfs_mount		*mp = lip->li_mountp;
>>>> +	struct xfs_inode		*ip;
>>>> +	struct xfs_da_args		args;
>>>> +	struct xfs_attri_log_format	*attrp;
>>>> +	int				error;
>>>> +
>>>> +	/*
>>>> +	 * First check the validity of the attr described by the ATTRI.  If any
>>>> +	 * are bad, then assume that all are bad and just toss the ATTRI.
>>>> +	 */
>>>> +	attrp = &attrip->attri_format;
>>>> +	if (!(attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET ||
>>>> +	      attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE) ||
>>>> +	    (attrp->alfi_value_len > XATTR_SIZE_MAX) ||
>>>> +	    (attrp->alfi_name_len > XATTR_NAME_MAX) ||
>>>> +	    (attrp->alfi_name_len == 0)) {
>>>
>>> This needs to call xfs_verify_ino() on attrp->alfi_ino.
>> Ok, will add
>>
>>>
>>> This also needs to check for xfs_sb_version_hasdelayedattr().
>> Well, ideally this would not be executing if the feature bit were not on.
>> Maybe we should add an ASSERT at the top?
> 
> The trouble is, we could be fed a filesystem where the delattr feature
> bit is cleared but the log has been specially crafted/corrupted to have
> a log item with type XFS_LI_ATTRI.  In that case we cannot recover the
> log item because the log item type is inconsistent with the superblock
> feature set.
> 
> (And yes, the current recovery functions are missing that...)
I suppose that's possible, though it seems someone would have
to get pretty crafty to arrive at such a configuration. :-)

Will add it though.

> 
>>
>>>
>>> I would refactor this into a separate validation predicate to eliminate
>>> the multi-line if statement.  I will post a series cleaning up the other
>>> log items' recover functions shortly.
>> Alrighty, I will keep an eye out
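One possible shape for that predicate, as a self-contained userspace sketch that gathers the checks raised in this thread: a known op type, a bounded and non-zero name length, a bounded value length, a zero __pad, a valid inode number, the delattr feature bit, and no value payload on a remove. The struct layout, the op flag values and the ino_ok/has_delattr parameters (standing in for xfs_verify_ino() and xfs_sb_version_hasdelattr()) are assumptions, not the real on-disk format or the cleanup series mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Linux limits from <linux/limits.h>, copied so the sketch stands alone. */
#define MODEL_XATTR_NAME_MAX	255
#define MODEL_XATTR_SIZE_MAX	65536

/* Hypothetical op values; the real XFS_ATTR_OP_FLAGS_* may differ. */
#define MODEL_OP_SET		1
#define MODEL_OP_REMOVE		2

/* Trimmed, hypothetical copy of the ATTRI log format fields. */
struct attri_fmt_model {
	uint32_t	pad;		/* must be zero */
	uint64_t	ino;
	uint32_t	op_flags;
	uint32_t	name_len;
	uint32_t	value_len;
};

/*
 * Return true if a recovered ATTRI looks sane.  has_delattr mirrors the
 * superblock feature check; ino_ok stands in for xfs_verify_ino().
 */
static bool attri_validate(const struct attri_fmt_model *f, bool has_delattr,
			   bool ino_ok)
{
	if (!has_delattr)	/* ATTRI in the log but feature bit clear */
		return false;
	if (f->pad != 0 || !ino_ok)
		return false;
	if (f->name_len == 0 || f->name_len > MODEL_XATTR_NAME_MAX)
		return false;

	switch (f->op_flags) {
	case MODEL_OP_SET:
		/* a zero-length value is allowed: it clears the attr value */
		return f->value_len <= MODEL_XATTR_SIZE_MAX;
	case MODEL_OP_REMOVE:
		return f->value_len == 0;	/* a remove carries no value */
	default:
		return false;
	}
}
```

The recover function would then reduce to a single `if (!predicate) return -EFSCORRUPTED;` style check instead of the multi-line condition.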
>>
>>>
>>>> +		/*
>>>> +		 * This will pull the ATTRI from the AIL and free the memory
>>>> +		 * associated with it.
>>>> +		 */
>>>> +		xfs_attri_release(attrip);
>>>
>>> No need to call xfs_attri_release; one of the 5.10 cleanups was to
>>> recognize that the log recovery code does this for you automatically.
>>>
>> Ok, will remove
>>
>>>> +		return -EFSCORRUPTED;
>>>> +	}
>>>> +
>>>> +	error = xfs_iget(mp, 0, attrp->alfi_ino, 0, 0, &ip);
>>>> +	if (error)
>>>> +		return error;
>>>
>>> I /think/ this needs to call xfs_qm_dqattach here, for reasons I'll get
>>> into shortly.
>>>
>>> In the meantime, this /definitely/ needs to do:
>>>
>>> 	if (VFS_I(ip)->i_nlink == 0)
>>> 		xfs_iflags_set(ip, XFS_IRECOVERY);
>>>
>>> Because the IRECOVERY flag prevents inode inactivation from triggering
>>> on an unlinked inode while we're still performing log recovery.
>>>
>>> If you want to steal the xlog_recover_iget helper from the atomic
>>> swapext series[0] please feel free. :)
>>>
>>> [0] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?id=51e23b9c9d9674a78dc97c5848c9efb4461e074d
>> Oh I see.  Ok, I will take a look at that
>>
>>>
>>>> +	memset(&args, 0, sizeof(args));
>>>> +	args.dp = ip;
>>>> +	args.name = attrip->attri_name;
>>>> +	args.namelen = attrp->alfi_name_len;
>>>> +	args.attr_filter = attrp->alfi_attr_flags;
>>>> +	if (attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET) {
>>>> +		args.value = attrip->attri_value;
>>>> +		args.valuelen = attrp->alfi_value_len;
>>>> +	}
>>>> +
>>>> +	error = xfs_attr_set(&args);
>>>
>>> Er...
>>>
>>>> +
>>>> +	xfs_attri_release(attrip);
>>>
>>> The transaction commit will take care of releasing attrip.
>> Mmmm, the new test case for attr replay hangs without this line.  I suspect
>> that's because we end up with an item in the AIL that never goes away.
>>
>> [Nov12 13:26] INFO: task mount:15718 blocked for more than 120 seconds.
>> [  +0.000009]       Tainted: G        W   E     5.9.0-rc4 #1
>> [  +0.000002] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [  +0.000004] task:mount           state:D stack:    0 pid:15718 ppid: 15491 flags:0x00004000
>> [  +0.000005] Call Trace:
>> [  +0.000079]  __schedule+0x2d9/0x780
>> [  +0.000020]  schedule+0x4a/0xb0
>> [  +0.000120]  xfs_ail_push_all_sync+0xb8/0x100 [xfs]
>>
>> ...etc....
>>
>>
>> I'm a little confused on this one... I didn't think transaction commits
>> released log items?
> 
> The ATTRI gets created with two references: one is dropped by the
> transaction when it commits, and the second one is dropped by the ATTRD
> when the ATTRD commits (per that huge comment below that I told you to
> delete ;)).
> 
> Note that you're missing an xfs_trans_get_attrd call in the recover
> function, which is another reason why you can't call xfs_attr_set()
> directly here.  That might be why recovery locks up, but you'd have to
> go check the trace data for that log item to confirm.
> 
Ok, will re-work this area then
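A userspace model of the two-reference lifecycle described above, assuming simplified hypothetical names (attri_ref_model and friends) and ignoring locking and the AIL itself. If recovery never creates an ATTRD, nothing drops the second reference, so the item stays in the AIL; that is consistent with the xfs_ail_push_all_sync() hang in the trace above, though as noted, the trace data would be needed to confirm it. The extra xfs_attri_release() in the recover function likely just drops that second reference by hand.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an ATTRI's two references. */
struct attri_ref_model {
	atomic_int	refcount;
	bool		in_ail;
	bool		freed;
};

static void attri_ref_init(struct attri_ref_model *ip)
{
	atomic_init(&ip->refcount, 2);	/* one for the log, one for the ATTRD */
	ip->in_ail = false;
	ip->freed = false;
}

/* Drop one reference; the last one pulls the item from the AIL and frees it. */
static void attri_ref_release(struct attri_ref_model *ip)
{
	if (atomic_fetch_sub(&ip->refcount, 1) == 1) {
		ip->in_ail = false;
		ip->freed = true;
	}
}

/* Log side: the committed item is unpinned into the AIL and drops its ref. */
static void attri_ref_unpin(struct attri_ref_model *ip)
{
	ip->in_ail = true;
	attri_ref_release(ip);
}
```

In the normal path the ATTRD's commit supplies the second attri_ref_release(); in the recovery path that means getting an ATTRD via xfs_trans_get_attrd() rather than releasing the ATTRI directly.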

>>>> +	xfs_irele(ip);
>>>> +	return error;
>>>> +}
>>>> +
>>>> +static const struct xfs_item_ops xfs_attri_item_ops = {
>>>> +	.iop_size	= xfs_attri_item_size,
>>>> +	.iop_format	= xfs_attri_item_format,
>>>> +	.iop_unpin	= xfs_attri_item_unpin,
>>>> +	.iop_committed	= xfs_attri_item_committed,
>>>> +	.iop_committing = xfs_attri_item_committing,
>>>> +	.iop_release    = xfs_attri_item_release,
>>>> +	.iop_recover	= xfs_attri_item_recover,
>>>> +	.iop_match	= xfs_attri_item_match,
>>>
>>> This needs an ->iop_relog method so that we can relog the attri log item
>>> if the log starts to fill up.
>> Ok, will add
>>
>>>
>>>> +};
>>>> +
>>>> +
>>>> +
>>>> +STATIC int
>>>> +xlog_recover_attri_commit_pass2(
>>>> +	struct xlog                     *log,
>>>> +	struct list_head		*buffer_list,
>>>> +	struct xlog_recover_item        *item,
>>>> +	xfs_lsn_t                       lsn)
>>>> +{
>>>> +	int                             error;
>>>> +	struct xfs_mount                *mp = log->l_mp;
>>>> +	struct xfs_attri_log_item       *attrip;
>>>> +	struct xfs_attri_log_format     *attri_formatp;
>>>> +	char				*name = NULL;
>>>> +	char				*value = NULL;
>>>> +	int				region = 0;
>>>> +
>>>> +	attri_formatp = item->ri_buf[region].i_addr;
>>>
>>> Please check the __pad field for zeroes here.
>> Ok, will do
>>
>>>
>>>> +	attrip = xfs_attri_init(mp);
>>>> +	error = xfs_attri_copy_format(&item->ri_buf[region],
>>>> +				      &attrip->attri_format);
>>>> +	if (error) {
>>>> +		xfs_attri_item_free(attrip);
>>>> +		return error;
>>>> +	}
>>>> +
>>>> +	attrip->attri_name_len = attri_formatp->alfi_name_len;
>>>> +	attrip->attri_value_len = attri_formatp->alfi_value_len;
>>>> +	attrip = krealloc(attrip, sizeof(struct xfs_attri_log_item) +
>>>> +			  attrip->attri_name_len + attrip->attri_value_len,
>>>> +			  GFP_NOFS | __GFP_NOFAIL);
>>>> +
>>>> +	ASSERT(attrip->attri_name_len > 0);
>>>
>>> If attri_name_len is zero, reject the whole thing with EFSCORRUPTED.
>> Ok, makes sense
>>
>>>
>>>> +	region++;
>>>> +	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
>>>> +	memcpy(name, item->ri_buf[region].i_addr,
>>>> +	       attrip->attri_name_len);
>>>> +	attrip->attri_name = name;
>>>> +
>>>> +	if (attrip->attri_value_len > 0) {
>>>> +		region++;
>>>> +		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
>>>> +			attrip->attri_name_len;
>>>> +		memcpy(value, item->ri_buf[region].i_addr,
>>>> +			attrip->attri_value_len);
>>>> +		attrip->attri_value = value;
>>>> +	}
>>>
>>> Question: is it valid for an attri item to have value_len > 0 for an
>>> XFS_ATTRI_OP_FLAGS_REMOVE operation?
>> Well, it shouldn't happen, since the new attr_set routines assume that the
>> absence of a value implies a remove operation.  It doesn't invalidate the
>> item I suppose, though it would mean that it's carrying around a useless
>> payload that it shouldn't.
> 
> _commit_pass2 is called as part of recovering unfinished items from the
> ondisk log.  If you find something that doesn't smell right, you should
> bail out with an error code so that mounting fails.
> 
Ok, will do that then

>>>
>>> Granted, that level of validation might be better left to the _recover
>>> function.
>> Maybe we should add an ASSERT there.
>>
>>>
>>>> +
>>>> +	/*
>>>> +	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
>>>> +	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
>>>> +	 * directly and drop the ATTRI reference. Note that
>>>> +	 * xfs_trans_ail_update() drops the AIL lock.
>>>> +	 */
>>>> +	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
>>>> +	xfs_attri_release(attrip);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +const struct xlog_recover_item_ops xlog_attri_item_ops = {
>>>> +	.item_type	= XFS_LI_ATTRI,
>>>> +	.commit_pass2	= xlog_recover_attri_commit_pass2,
>>>> +};
>>>> +
>>>> +/*
>>>> + * This routine is called when an ATTRD format structure is found in a committed
>>>> + * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
>>>> + * it was still in the log. To do this it searches the AIL for the ATTRI with
>>>> + * an id equal to that in the ATTRD format structure. If we find it we drop
>>>> + * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
>>>> + */
>>>> +STATIC int
>>>> +xlog_recover_attrd_commit_pass2(
>>>> +	struct xlog			*log,
>>>> +	struct list_head		*buffer_list,
>>>> +	struct xlog_recover_item	*item,
>>>> +	xfs_lsn_t			lsn)
>>>> +{
>>>> +	struct xfs_attrd_log_format	*attrd_formatp;
>>>> +
>>>> +	attrd_formatp = item->ri_buf[0].i_addr;
>>>> +	ASSERT((item->ri_buf[0].i_len ==
>>>> +				(sizeof(struct xfs_attrd_log_format))));
>>>> +
>>>> +	xlog_recover_release_intent(log, XFS_LI_ATTRI,
>>>> +				    attrd_formatp->alfd_alf_id);
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +const struct xlog_recover_item_ops xlog_attrd_item_ops = {
>>>> +	.item_type	= XFS_LI_ATTRD,
>>>> +	.commit_pass2	= xlog_recover_attrd_commit_pass2,
>>>> +};
>>>> diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
>>>> new file mode 100644
>>>> index 0000000..7dd2572
>>>> --- /dev/null
>>>> +++ b/fs/xfs/xfs_attr_item.h
>>>> @@ -0,0 +1,76 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0-or-later
>>>> + *
>>>> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
>>>> + * Author: Allison Collins <allison.henderson@oracle.com>
>>>> + */
>>>> +#ifndef	__XFS_ATTR_ITEM_H__
>>>> +#define	__XFS_ATTR_ITEM_H__
>>>> +
>>>> +/* kernel only ATTRI/ATTRD definitions */
>>>> +
>>>> +struct xfs_mount;
>>>> +struct kmem_zone;
>>>> +
>>>> +/*
>>>> + * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
>>>> + */
>>>> +#define	XFS_ATTRI_RECOVERED	1
>>>> +
>>>> +
>>>> +/* iovec length must be 32-bit aligned */
>>>> +#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
>>>> +				size + sizeof(int32_t) - \
>>>> +				(size % sizeof(int32_t)))
>>>
>>> Can you turn this into a static inline helper?
>>>
>>> And use one of the roundup() variants to ensure the proper alignment
>>> instead of this open-coded stuff? :)
>> Sure, will do
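A sketch of the static inline helper, written as plain userspace C so it can run on its own (attr_nvec_size() is a hypothetical name; in the kernel the body would just be roundup(size, sizeof(int32_t))). Comparing it against the macro as posted shows one concrete reason to switch: the open-coded version adds four bytes of padding to lengths that are already a multiple of four, other than four itself (8 becomes 12), and it does not parenthesize its argument.

```c
#include <assert.h>
#include <stdint.h>

/* The macro from the patch, copied verbatim for comparison. */
#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
				size + sizeof(int32_t) - \
				(size % sizeof(int32_t)))

/* Round a length up to the next multiple of 4 for 32-bit iovec alignment. */
static inline int attr_nvec_size(int size)
{
	const int align = (int)sizeof(int32_t);

	return ((size + align - 1) / align) * align;
}
```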
>>
>>>
>>>> +
>>>> +/*
>>>> + * This is the "attr intention" log item.  It is used to log the fact that some
>>>> + * attribute operations need to be processed.  An operation is currently either
>>>> + * a set or remove.  Set or remove operations are described by the xfs_attr_item
>>>> + * which may be logged to this intent.  Intents are used in conjunction with the
>>>> + * "attr done" log item described below.
>>>> + *
>>>> + * The ATTRI is reference counted so that it is not freed prior to both the
>>>> + * ATTRI and ATTRD being committed and unpinned. This ensures the ATTRI is
>>>> + * inserted into the AIL even in the event of out of order ATTRI/ATTRD
>>>> + * processing. In other words, an ATTRI is born with two references:
>>>> + *
>>>> + *      1.) an ATTRI held reference to track ATTRI AIL insertion
>>>> + *      2.) an ATTRD held reference to track ATTRD commit
>>>> + *
>>>> + * On allocation, both references are the responsibility of the caller. Once the
>>>> + * ATTRI is added to and dirtied in a transaction, ownership of reference one
>>>> + * transfers to the transaction. The reference is dropped once the ATTRI is
>>>> + * inserted to the AIL or in the event of failure along the way (e.g., commit
>>>> + * failure, log I/O error, etc.). Note that the caller remains responsible for
>>>> + * the ATTRD reference under all circumstances to this point. The caller has no
>>>> + * means to detect failure once the transaction is committed, however.
>>>> + * Therefore, an ATTRD is required after this point, even in the event of
>>>> + * unrelated failure.
>>>> + *
>>>> + * Once an ATTRD is allocated and dirtied in a transaction, reference two
>>>> + * transfers to the transaction. The ATTRD reference is dropped once it reaches
>>>> + * the unpin handler. Similar to the ATTRI, the reference also drops in the
>>>> + * event of commit failure or log I/O errors. Note that the ATTRD is not
>>>> + * inserted in the AIL, so at this point both the ATTRI and ATTRD are freed.
>>>
>>> I don't think it's necessary to document the entire log intent/log done
>>> refcount state machine here; it'll do to record just the bits that are
>>> specific to delayed xattr operations.
>> Ok, maybe just the first 3 lines are enough then? I think that's all that
>> really stands out from the other delayed ops
> 
> Yes.  You might also want to touch on the lifespan of the name and value
> buffers that are attached to the xfs_attr_item -- they're copies of what
> the caller passed in from userspace, right?  And they're attached to the
> log intent item long enough for the item to commit, right?  And they're
> freed when the xfs_attr_item itself is freed when the work is done,
> right?
> 
That sounds about right, I will add in a blurb about those then.
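For illustration, here is a toy C model of the lifetime described above. It assumes two things: the intent item keeps private copies of the caller's name and value buffers, and it is born with two references (the intent's own reference and the one held for the done item). All names here (toy_attri, toy_attri_alloc, toy_attri_release) are hypothetical and this is not the XFS implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Toy attr intent item: owns copies of the caller's name/value buffers. */
struct toy_attri {
	atomic_int	refcount;	/* starts at 2: intent ref + done-item ref */
	void		*name;
	void		*value;
	int		name_len;
	int		value_len;
};

/* Copy a caller buffer so the item no longer depends on it. */
static void *dup_buf(const void *src, int len)
{
	void *p;

	if (len == 0)
		return NULL;
	p = malloc(len);
	if (p)
		memcpy(p, src, len);
	return p;
}

/* Returns NULL on allocation failure; the caller keeps its own buffers. */
struct toy_attri *toy_attri_alloc(const void *name, int name_len,
				  const void *value, int value_len)
{
	struct toy_attri *ip = calloc(1, sizeof(*ip));

	if (!ip)
		return NULL;
	ip->name = dup_buf(name, name_len);
	ip->value = dup_buf(value, value_len);
	if ((name_len && !ip->name) || (value_len && !ip->value)) {
		free(ip->name);
		free(ip->value);
		free(ip);
		return NULL;
	}
	ip->name_len = name_len;
	ip->value_len = value_len;
	atomic_init(&ip->refcount, 2);
	return ip;
}

/* Drop one reference; returns 1 if this call freed the item and buffers. */
int toy_attri_release(struct toy_attri *ip)
{
	assert(atomic_load(&ip->refcount) > 0);
	if (atomic_fetch_sub(&ip->refcount, 1) != 1)
		return 0;
	free(ip->name);
	free(ip->value);
	free(ip);
	return 1;
}
```

Because the buffers are copied at allocation time, the caller can reuse its own buffers immediately. The copies live exactly as long as the item and are freed together with it when the last reference drops.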

Thanks for the reviews!
Allison

> --D
> 
>>>
>>>> + */
>>>> +struct xfs_attri_log_item {
>>>> +	struct xfs_log_item		attri_item;
>>>> +	atomic_t			attri_refcount;
>>>> +	int				attri_name_len;
>>>> +	void				*attri_name;
>>>> +	int				attri_value_len;
>>>> +	void				*attri_value;
>>>
>>> Please compress this structure a bit by moving the two pointers to be
>>> adjacent instead of interspersed with ints.
>> Alrighty, will do.
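The reason for grouping the pointers is alignment padding. The toy structs below model only the four name/value fields, not the full xfs_attri_log_item, so the savings in the real struct may differ. On a typical LP64 ABI (4-byte int, 8-byte aligned pointers), each int that is followed by a pointer gets 4 bytes of padding.

```c
#include <assert.h>

/* Field order as posted: ints and pointers interleaved. */
struct layout_interleaved {
	int	name_len;	/* LP64: 4 bytes of padding follow */
	void	*name;
	int	value_len;	/* LP64: 4 bytes of padding follow */
	void	*value;
};

/* Suggested order: pointers adjacent, then the two ints packed together. */
struct layout_grouped {
	void	*name;
	void	*value;
	int	name_len;
	int	value_len;
};

/* Regrouping can only remove padding, never add it. */
static_assert(sizeof(struct layout_grouped) <= sizeof(struct layout_interleaved),
	      "grouping pointers should not grow the struct");
```

On x86_64 this works out to 32 bytes for the interleaved layout and 24 bytes for the grouped one.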
>>
>>>
>>> Ok, now on to digesting the new state machine...
>>>
>>> --D
>> Ok then, thanks for the thorough review!!
>>
>> Allison
>>>
>>>> +	struct xfs_attri_log_format	attri_format;
>>>> +};
>>>> +
>>>> +/*
>>>> + * This is the "attr done" log item.  It is used to log the fact that some attrs
>>>> + * earlier mentioned in an attri item have been freed.
>>>> + */
>>>> +struct xfs_attrd_log_item {
>>>> +	struct xfs_attri_log_item	*attrd_attrip;
>>>> +	struct xfs_log_item		attrd_item;
>>>> +	struct xfs_attrd_log_format	attrd_format;
>>>> +};
>>>> +
>>>> +#endif	/* __XFS_ATTR_ITEM_H__ */
>>>> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
>>>> index 8f8837f..d7787a5 100644
>>>> --- a/fs/xfs/xfs_attr_list.c
>>>> +++ b/fs/xfs/xfs_attr_list.c
>>>> @@ -15,6 +15,7 @@
>>>>    #include "xfs_inode.h"
>>>>    #include "xfs_trans.h"
>>>>    #include "xfs_bmap.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_attr_sf.h"
>>>>    #include "xfs_attr_leaf.h"
>>>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>>>> index 3fbd98f..d5d1959 100644
>>>> --- a/fs/xfs/xfs_ioctl.c
>>>> +++ b/fs/xfs/xfs_ioctl.c
>>>> @@ -15,6 +15,8 @@
>>>>    #include "xfs_iwalk.h"
>>>>    #include "xfs_itable.h"
>>>>    #include "xfs_error.h"
>>>> +#include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_bmap.h"
>>>>    #include "xfs_bmap_util.h"
>>>> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
>>>> index c1771e7..62e1534 100644
>>>> --- a/fs/xfs/xfs_ioctl32.c
>>>> +++ b/fs/xfs/xfs_ioctl32.c
>>>> @@ -17,6 +17,8 @@
>>>>    #include "xfs_itable.h"
>>>>    #include "xfs_fsops.h"
>>>>    #include "xfs_rtalloc.h"
>>>> +#include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_ioctl.h"
>>>>    #include "xfs_ioctl32.h"
>>>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>>>> index 5e16545..5ecc76c 100644
>>>> --- a/fs/xfs/xfs_iops.c
>>>> +++ b/fs/xfs/xfs_iops.c
>>>> @@ -13,6 +13,8 @@
>>>>    #include "xfs_inode.h"
>>>>    #include "xfs_acl.h"
>>>>    #include "xfs_quota.h"
>>>> +#include "xfs_da_format.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_trans.h"
>>>>    #include "xfs_trace.h"
>>>> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
>>>> index fa2d05e..3457f22 100644
>>>> --- a/fs/xfs/xfs_log.c
>>>> +++ b/fs/xfs/xfs_log.c
>>>> @@ -1993,6 +1993,10 @@ xlog_print_tic_res(
>>>>    	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
>>>>    	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
>>>>    	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
>>>> +	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
>>>> +	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
>>>> +	    REG_TYPE_STR(ATTR_NAME, "attr_name"),
>>>> +	    REG_TYPE_STR(ATTR_VALUE, "attr_value"),
>>>>    	};
>>>>    	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
>>>>    #undef REG_TYPE_STR
>>>> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
>>>> index a8289ad..cb951cd 100644
>>>> --- a/fs/xfs/xfs_log_recover.c
>>>> +++ b/fs/xfs/xfs_log_recover.c
>>>> @@ -1775,6 +1775,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
>>>>    	&xlog_cud_item_ops,
>>>>    	&xlog_bui_item_ops,
>>>>    	&xlog_bud_item_ops,
>>>> +	&xlog_attri_item_ops,
>>>> +	&xlog_attrd_item_ops,
>>>>    };
>>>>    static const struct xlog_recover_item_ops *
>>>> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
>>>> index 0aa87c2..bc9c25e 100644
>>>> --- a/fs/xfs/xfs_ondisk.h
>>>> +++ b/fs/xfs/xfs_ondisk.h
>>>> @@ -132,6 +132,8 @@ xfs_check_ondisk_structs(void)
>>>>    	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
>>>>    	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
>>>>    	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
>>>> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
>>>> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
>>>>    	/*
>>>>    	 * The v5 superblock format extended several v4 header structures with
>>>> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
>>>> index bca48b3..9b0c790 100644
>>>> --- a/fs/xfs/xfs_xattr.c
>>>> +++ b/fs/xfs/xfs_xattr.c
>>>> @@ -10,6 +10,7 @@
>>>>    #include "xfs_log_format.h"
>>>>    #include "xfs_da_format.h"
>>>>    #include "xfs_inode.h"
>>>> +#include "xfs_da_btree.h"
>>>>    #include "xfs_attr.h"
>>>>    #include "xfs_acl.h"
>>>>    #include "xfs_da_btree.h"
>>>> -- 
>>>> 2.7.4
>>>>


* Re: [PATCH v13 07/10] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR
  2020-10-23  6:34 ` [PATCH v13 07/10] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR Allison Henderson
  2020-11-10 20:10   ` Darrick J. Wong
@ 2020-11-19  2:36   ` Darrick J. Wong
  2020-11-19  4:01     ` Allison Henderson
  1 sibling, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2020-11-19  2:36 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Oct 22, 2020 at 11:34:32PM -0700, Allison Henderson wrote:
> This patch adds a new feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR which
> can be used to control turning on/off delayed attributes
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_format.h | 8 ++++++--
>  fs/xfs/libxfs/xfs_fs.h     | 1 +
>  fs/xfs/libxfs/xfs_sb.c     | 2 ++
>  fs/xfs/xfs_super.c         | 3 +++
>  4 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index d419c34..18b41a7 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -483,7 +483,9 @@ xfs_sb_has_incompat_feature(
>  	return (sbp->sb_features_incompat & feature) != 0;
>  }
>  
> -#define XFS_SB_FEAT_INCOMPAT_LOG_ALL 0
> +#define XFS_SB_FEAT_INCOMPAT_LOG_DELATTR   (1 << 0)	/* Delayed Attributes */
> +#define XFS_SB_FEAT_INCOMPAT_LOG_ALL \
> +	(XFS_SB_FEAT_INCOMPAT_LOG_DELATTR)
>  #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_LOG_ALL
>  static inline bool
>  xfs_sb_has_incompat_log_feature(
> @@ -586,7 +588,9 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
>  
>  static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)

Soooo, something Dave pointed out on IRC this evening --

Log incompat flags exist /only/ to protect the contents of a dirty
journal.  They're supposed to get set when you save something to the log
(like the delayed xattr log items) and cleared when the log is replayed
or unmounted cleanly or goes idle.  If a new feature changes the disk
format then you'll have to protect that with a new rocompat / compat /
incompat flag, but never a log incompat flag.

Therefore, you can't use log incompat flags to gate higher level
functionality.  I don't think delayed attrs themselves have any user
visible effect on the ondisk format outside of the log, right?  So I
guess the good news is that's one less hurdle to getting people to use
this feature.
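A toy C model of the lifecycle described above may help. The bit is set before a new-style item can reach the journal and cleared once the log is clean (after replay, clean unmount, or idle), so only a dirty log is ever unreadable to an older kernel. All names are hypothetical, and the bit value merely mirrors the one proposed in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical log incompat bit, mirroring the value proposed in the patch. */
#define TOY_LOG_INCOMPAT_DELATTR	(1u << 0)

struct toy_sb {
	uint32_t	features_log_incompat;
};

/* Set the bit before the first delayed-attr log item is written. */
void toy_log_delattr_item(struct toy_sb *sb)
{
	sb->features_log_incompat |= TOY_LOG_INCOMPAT_DELATTR;
}

/* The log is clean (recovered, cleanly unmounted, or idle): clear the bit. */
void toy_log_clean(struct toy_sb *sb)
{
	sb->features_log_incompat &= ~TOY_LOG_INCOMPAT_DELATTR;
}

/* A kernel may only replay the journal if it knows every log incompat bit set. */
bool toy_can_replay_log(const struct toy_sb *sb, uint32_t known_mask)
{
	return (sb->features_log_incompat & ~known_mask) == 0;
}
```

The on-disk format outside the log never records the feature. Once the log is clean, an old kernel sees nothing it does not understand.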

(Aside: in another part of this patchset review I asked if this means we
could drop the INCOMPLETE flag from attr keys.  I think you could do
that without needing to add a rocompat / compat / incompat flag, since
an old kernel works fine if it never sees an incomplete flag; and
presumably the new kernel will continue to know how to delete those
things.)

The downside is that no code exists to support log incompat flags.  I
guess every time you want to use them you'd potentially have to check
the superblock and log a new superblock to disk with the feature turned
on.  I'll have to think about that more later.
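One plausible shape for the "check the superblock, then log a new one" step is sketched below, assuming the cost to avoid is a superblock write on every use. Only the first caller sets the bit and pays for the write. The names are hypothetical, and real code would need a lock around the check-and-set, which this single-threaded toy omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_LOG_INCOMPAT_DELATTR	(1u << 0)	/* hypothetical bit */

struct toy_fs {
	uint32_t	sb_log_incompat;	/* in-core copy of the superblock field */
	unsigned int	sb_writes;		/* count of superblock updates issued */
};

/* Stand-in for logging and writing the updated superblock to disk. */
static void toy_write_sb(struct toy_fs *fs)
{
	fs->sb_writes++;
}

/*
 * Make sure @feature is set on disk before logging an item that needs it.
 * Returns true if this call had to write the superblock.
 */
bool toy_log_incompat_enable(struct toy_fs *fs, uint32_t feature)
{
	if (fs->sb_log_incompat & feature)
		return false;		/* already set: nothing to write */
	fs->sb_log_incompat |= feature;
	toy_write_sb(fs);
	return true;
}
```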

I guess for now we'd want to retain the predicate function so that we
could enable it via a seeecret mount option while we stabilize the
feature.  Later if we add a new ondisk feature flag that uses the log
item we can change the predicate to return true if that feature flag is
set (e.g. xfs_sb_version_hasdelattr always returns true if parent
pointers are enabled).
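The predicate described above could reduce to something like the following sketch. Both the hidden mount option and the parent-pointer flag are hypothetical stand-ins here, not existing XFS fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mount state; none of these names come from the patch. */
struct toy_mount {
	bool	opt_delattr;		/* hidden mount option while stabilizing */
	bool	has_parent_ptrs;	/* future on-disk feature relying on attr intents */
};

/*
 * Delayed attrs are on if explicitly requested, or implied by any on-disk
 * feature that cannot work without logging attr intents.
 */
static inline bool toy_has_delattr(const struct toy_mount *mp)
{
	return mp->opt_delattr || mp->has_parent_ptrs;
}
```

Callers keep testing a single predicate, so moving from the mount option to an on-disk feature flag later changes only this function.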

Atomic file range swapping falls into the same category, so I guess we
both have things to rework.  On the plus side it means that both of our
new features aren't going to require people to upgrade or reformat. :)

--D

>  {
> -	return false;
> +	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
> +		(sbp->sb_features_log_incompat &
> +		XFS_SB_FEAT_INCOMPAT_LOG_DELATTR));
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 2a2e3cf..f703d95 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -250,6 +250,7 @@ typedef struct xfs_fsop_resblks {
>  #define XFS_FSOP_GEOM_FLAGS_RMAPBT	(1 << 19) /* reverse mapping btree */
>  #define XFS_FSOP_GEOM_FLAGS_REFLINK	(1 << 20) /* files can share blocks */
>  #define XFS_FSOP_GEOM_FLAGS_BIGTIME	(1 << 21) /* 64-bit nsec timestamps */
> +#define XFS_FSOP_GEOM_FLAGS_DELATTR	(1 << 22) /* delayed attributes	    */
>  
>  /*
>   * Minimum and maximum sizes need for growth checks.
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 5aeafa5..a0ec327 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -1168,6 +1168,8 @@ xfs_fs_geometry(
>  		geo->flags |= XFS_FSOP_GEOM_FLAGS_REFLINK;
>  	if (xfs_sb_version_hasbigtime(sbp))
>  		geo->flags |= XFS_FSOP_GEOM_FLAGS_BIGTIME;
> +	if (xfs_sb_version_hasdelattr(sbp))
> +		geo->flags |= XFS_FSOP_GEOM_FLAGS_DELATTR;
>  	if (xfs_sb_version_hassector(sbp))
>  		geo->logsectsize = sbp->sb_logsectsize;
>  	else
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index d1b5f2d..bb85884 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1580,6 +1580,9 @@ xfs_fc_fill_super(
>  	if (xfs_sb_version_hasinobtcounts(&mp->m_sb))
>  		xfs_warn(mp,
>   "EXPERIMENTAL inode btree counters feature in use. Use at your own risk!");
> +	if (xfs_sb_version_hasdelattr(&mp->m_sb))
> +		xfs_alert(mp,
> +	"EXPERIMENTAL delayed attrs feature enabled. Use at your own risk!");
>  
>  	error = xfs_mountfs(mp);
>  	if (error)
> -- 
> 2.7.4
> 


* Re: [PATCH v13 07/10] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR
  2020-11-19  2:36   ` Darrick J. Wong
@ 2020-11-19  4:01     ` Allison Henderson
  0 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2020-11-19  4:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 11/18/20 7:36 PM, Darrick J. Wong wrote:
> On Thu, Oct 22, 2020 at 11:34:32PM -0700, Allison Henderson wrote:
>> This patch adds a new feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR which
>> can be used to control turning on/off delayed attributes
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_format.h | 8 ++++++--
>>   fs/xfs/libxfs/xfs_fs.h     | 1 +
>>   fs/xfs/libxfs/xfs_sb.c     | 2 ++
>>   fs/xfs/xfs_super.c         | 3 +++
>>   4 files changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index d419c34..18b41a7 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -483,7 +483,9 @@ xfs_sb_has_incompat_feature(
>>   	return (sbp->sb_features_incompat & feature) != 0;
>>   }
>>   
>> -#define XFS_SB_FEAT_INCOMPAT_LOG_ALL 0
>> +#define XFS_SB_FEAT_INCOMPAT_LOG_DELATTR   (1 << 0)	/* Delayed Attributes */
>> +#define XFS_SB_FEAT_INCOMPAT_LOG_ALL \
>> +	(XFS_SB_FEAT_INCOMPAT_LOG_DELATTR)
>>   #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_LOG_ALL
>>   static inline bool
>>   xfs_sb_has_incompat_log_feature(
>> @@ -586,7 +588,9 @@ static inline bool xfs_sb_version_hasinobtcounts(struct xfs_sb *sbp)
>>   
>>   static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)
> 
> Soooo, something Dave pointed out on IRC this evening --
> 
> Log incompat flags exist /only/ to protect the contents of a dirty
> journal.  They're supposed to get set when you save something to the log
> (like the delayed xattr log items) and cleared when the log is replayed
> or unmounted cleanly or goes idle.  If a new feature changes the disk
> format then you'll have to protect that with a new rocompat / compat /
> incompat flag, but never a log incompat flag.
> 
> Therefore, you can't use log incompat flags to gate higher level
> functionality.  I don't think delayed attrs themselves have any user
> visible effect on the ondisk format outside of the log, right?  So I
> guess the good news is that's one less hurdle to getting people to use
> this feature.
> 
Yeah, I saw the scrollback history later, but it looked like folks
might have retired for the evening.  So maybe I can set this bit in the
item create/committed callbacks then.

> (Aside: in another part of this patchset review I asked if this means we
> could drop the INCOMPLETE flag from attr keys.  I think you could do
> that without needing to add a rocompat / compat / incompat flag, since
> an old kernel works fine if it never sees an incomplete flag; and
> presumably the new kernel will continue to know how to delete those
> things.)
Yes, I saw it. I figure I could just sort of add a check to see if 
delattrs are on/off, and skip over it if delayed attrs are on.
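Here is a hypothetical sketch of that check. It assumes that with delayed attrs on, recovery replays the logged intent, so new entries need not carry the INCOMPLETE marker. It also assumes lookups keep ignoring INCOMPLETE entries, so entries left by older kernels are still handled. The flag value and all names are toy stand-ins, not the on-disk definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy flag value for this sketch; not taken from the on-disk format. */
#define TOY_ATTR_INCOMPLETE	(1u << 7)

struct toy_attr_entry {
	uint8_t	flags;
};

/* Flags for a newly written entry: skip INCOMPLETE when delayed attrs are on. */
uint8_t toy_new_entry_flags(bool delattr_enabled)
{
	return delattr_enabled ? 0 : TOY_ATTR_INCOMPLETE;
}

/* Lookups still hide INCOMPLETE entries, whoever wrote them. */
bool toy_entry_visible(const struct toy_attr_entry *e)
{
	return (e->flags & TOY_ATTR_INCOMPLETE) == 0;
}
```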

> 
> The downside is that no code exists to support log incompat flags.  I
> guess every time you want to use them you'd potentially have to check
> the superblock and log a new superblock to disk with the feature turned
> on.  I'll have to think about that more later.
Oh, ok then.  Maybe we should make a standalone patch, since we'll both
need it.

> 
> I guess for now we'd want to retain the predicate function so that we
> could enable it via a seeecret mount option while we stabilize the
> feature.  Later if we add a new ondisk feature flag that uses the log
> item we can change the predicate to return true if that feature flag is
> set (e.g. xfs_sb_version_hasdelattr always returns true if parent
> pointers are enabled).
Ok, I will look into a secret mount option then :-)

> 
> Atomic file range swapping falls into the same category, so I guess we
> both have things to rework.  On the plus side it means that both of our
> new features aren't going to require people to upgrade or reformat. :)
Yeah, at least we got it figured out before we did that tho :-)

Allison
> 
> --D
> 
>>   {
>> -	return false;
>> +	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
>> +		(sbp->sb_features_log_incompat &
>> +		XFS_SB_FEAT_INCOMPAT_LOG_DELATTR));
>>   }
>>   
>>   /*
>> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
>> index 2a2e3cf..f703d95 100644
>> --- a/fs/xfs/libxfs/xfs_fs.h
>> +++ b/fs/xfs/libxfs/xfs_fs.h
>> @@ -250,6 +250,7 @@ typedef struct xfs_fsop_resblks {
>>   #define XFS_FSOP_GEOM_FLAGS_RMAPBT	(1 << 19) /* reverse mapping btree */
>>   #define XFS_FSOP_GEOM_FLAGS_REFLINK	(1 << 20) /* files can share blocks */
>>   #define XFS_FSOP_GEOM_FLAGS_BIGTIME	(1 << 21) /* 64-bit nsec timestamps */
>> +#define XFS_FSOP_GEOM_FLAGS_DELATTR	(1 << 22) /* delayed attributes	    */
>>   
>>   /*
>>    * Minimum and maximum sizes need for growth checks.
>> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
>> index 5aeafa5..a0ec327 100644
>> --- a/fs/xfs/libxfs/xfs_sb.c
>> +++ b/fs/xfs/libxfs/xfs_sb.c
>> @@ -1168,6 +1168,8 @@ xfs_fs_geometry(
>>   		geo->flags |= XFS_FSOP_GEOM_FLAGS_REFLINK;
>>   	if (xfs_sb_version_hasbigtime(sbp))
>>   		geo->flags |= XFS_FSOP_GEOM_FLAGS_BIGTIME;
>> +	if (xfs_sb_version_hasdelattr(sbp))
>> +		geo->flags |= XFS_FSOP_GEOM_FLAGS_DELATTR;
>>   	if (xfs_sb_version_hassector(sbp))
>>   		geo->logsectsize = sbp->sb_logsectsize;
>>   	else
>> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
>> index d1b5f2d..bb85884 100644
>> --- a/fs/xfs/xfs_super.c
>> +++ b/fs/xfs/xfs_super.c
>> @@ -1580,6 +1580,9 @@ xfs_fc_fill_super(
>>   	if (xfs_sb_version_hasinobtcounts(&mp->m_sb))
>>   		xfs_warn(mp,
>>    "EXPERIMENTAL inode btree counters feature in use. Use at your own risk!");
>> +	if (xfs_sb_version_hasdelattr(&mp->m_sb))
>> +		xfs_alert(mp,
>> +	"EXPERIMENTAL delayed attrs feature enabled. Use at your own risk!");
>>   
>>   	error = xfs_mountfs(mp);
>>   	if (error)
>> -- 
>> 2.7.4
>>


end of thread, other threads:[~2020-11-19  4:02 UTC | newest]

Thread overview: 58+ messages
2020-10-23  6:34 [PATCH v13 00/10] xfs: Delayed Attributes Allison Henderson
2020-10-23  6:34 ` [PATCH v13 01/10] xfs: Add helper xfs_attr_node_remove_step Allison Henderson
2020-10-27  7:03   ` Chandan Babu R
2020-10-27 22:23     ` Allison Henderson
2020-10-27 12:15   ` Brian Foster
2020-10-27 15:33     ` Allison Henderson
2020-11-10 23:12   ` Darrick J. Wong
2020-11-13  1:38     ` Allison Henderson
2020-10-23  6:34 ` [PATCH v13 02/10] xfs: Add delay ready attr remove routines Allison Henderson
2020-10-27  9:59   ` Chandan Babu R
2020-10-27 15:32     ` Allison Henderson
2020-10-28 12:04       ` Chandan Babu R
2020-10-29  1:29         ` Allison Henderson
2020-11-14  0:53           ` Darrick J. Wong
2020-10-27 12:16   ` Brian Foster
2020-10-27 22:27     ` Allison Henderson
2020-10-28 12:28       ` Brian Foster
2020-10-29  1:03         ` Allison Henderson
2020-11-10 23:15     ` Darrick J. Wong
2020-11-10 23:43   ` Darrick J. Wong
2020-11-11  0:28     ` Dave Chinner
2020-11-13  4:00       ` Allison Henderson
2020-11-13  3:43     ` Allison Henderson
2020-11-14  1:18       ` Darrick J. Wong
2020-11-16  5:12         ` Allison Henderson
2020-10-23  6:34 ` [PATCH v13 03/10] xfs: Add delay ready attr set routines Allison Henderson
2020-10-27 13:32   ` Chandan Babu R
2020-11-10 21:57     ` Darrick J. Wong
2020-11-13  1:33       ` Allison Henderson
2020-11-13  9:16         ` Chandan Babu R
2020-11-13 17:12           ` Allison Henderson
2020-11-14  1:20             ` Darrick J. Wong
2020-11-10 23:10   ` Darrick J. Wong
2020-11-13  1:38     ` Allison Henderson
2020-11-14  1:35       ` Darrick J. Wong
2020-11-16  5:25         ` Allison Henderson
2020-10-23  6:34 ` [PATCH v13 04/10] xfs: Rename __xfs_attr_rmtval_remove Allison Henderson
2020-10-23  6:34 ` [PATCH v13 05/10] xfs: Set up infastructure for deferred attribute operations Allison Henderson
2020-11-10 21:51   ` Darrick J. Wong
2020-11-11  3:44     ` Darrick J. Wong
2020-11-13 17:06       ` Allison Henderson
2020-11-13  1:32     ` Allison Henderson
2020-11-14  2:00       ` Darrick J. Wong
2020-11-16  7:41         ` Allison Henderson
2020-10-23  6:34 ` [PATCH v13 06/10] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred Allison Henderson
2020-11-10 20:15   ` Darrick J. Wong
2020-11-13  1:27     ` Allison Henderson
2020-11-14  2:03       ` Darrick J. Wong
2020-10-23  6:34 ` [PATCH v13 07/10] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR Allison Henderson
2020-11-10 20:10   ` Darrick J. Wong
2020-11-13  1:27     ` Allison Henderson
2020-11-19  2:36   ` Darrick J. Wong
2020-11-19  4:01     ` Allison Henderson
2020-10-23  6:34 ` [PATCH v13 08/10] xfs: Enable delayed attributes Allison Henderson
2020-10-23  6:34 ` [PATCH v13 09/10] xfs: Remove unused xfs_attr_*_args Allison Henderson
2020-11-10 20:07   ` Darrick J. Wong
2020-11-13  1:27     ` Allison Henderson
2020-10-23  6:34 ` [PATCH v13 10/10] xfs: Add delayed attributes error tag Allison Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.