linux-xfs.vger.kernel.org archive mirror
* [PATCH v12 0/8] xfs: Delayed Attributes
@ 2020-08-27  0:35 Allison Collins
  2020-08-27  0:35 ` [PATCH v12 1/8] xfs: Add delay ready attr remove routines Allison Collins
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Allison Collins @ 2020-08-27  0:35 UTC (permalink / raw)
  To: linux-xfs

Hi all,

This set is a subset of a larger series for parent pointers. Delayed attributes
allow attribute operations (set and remove) to be logged and committed in the
same way that other delayed operations are. This allows more complex operations
(like parent pointers) to be broken up into multiple smaller transactions. To do
this, the existing attr operations must be modified to operate as either a
delayed operation or an inline operation, since older filesystems will not be
able to use the new log entries.  This means that they cannot roll, commit, or
finish transactions.  Instead, they return -EAGAIN to allow the calling
function to handle the transaction. In this series, we focus on only the clean
up and refactoring needed to accomplish this. We will introduce delayed attrs
and parent pointers in a later set.
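
As a rough illustration of the -EAGAIN convention described above, here is a
minimal userspace sketch. It is not code from the series: every demo_* name is
hypothetical, and "rolling" is reduced to bumping a counter. The only point is
that the operation never touches the transaction itself; the caller does.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a transaction; it only counts rolls. */
struct demo_trans {
	int	rolls;
};

/* Hypothetical multi-step operation; each call does one unit of work. */
struct demo_op {
	int	steps_left;
};

/* Does one step and returns -EAGAIN while more steps remain. */
int demo_op_iter(struct demo_op *op)
{
	assert(op->steps_left > 0);
	op->steps_left--;
	return op->steps_left ? -EAGAIN : 0;
}

/* The caller owns the transaction and rolls it between steps. */
int demo_op_run(struct demo_trans *tp, struct demo_op *op)
{
	int	error;

	do {
		error = demo_op_iter(op);
		if (error != -EAGAIN)
			break;
		tp->rolls++;	/* stands in for rolling the transaction */
	} while (1);

	return error;
}

/* Helper for experimenting: run an nsteps operation, return the roll count. */
int demo_count_rolls(int nsteps)
{
	struct demo_trans	tp = { .rolls = 0 };
	struct demo_op		op = { .steps_left = nsteps };
	int			error = demo_op_run(&tp, &op);

	assert(error == 0);
	return tp.rolls;
}
```

In this sketch an operation of N steps is rolled N - 1 times, because the final
step returns 0 rather than -EAGAIN.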

At the moment, I would like people to focus their review efforts on just this
"delayed attribute" sub series, as I think that is a more conservative use of
people's review time.  I also think the set is a bit much to manage all at once,
and we need to get the infrastructure ironed out before we focus too much on
anything that depends on it. But I do have the extended series for folks that
want to see the bigger picture of where this is going.

To help organize the set, I've arranged the patches into mini sets, which I
hope will help reviewers break the review into smaller pieces. For reviewing
purposes, the set can be broken up into two phases:


Delay Ready Attributes: (patches 1-3)
These are the remaining patches belonging to the "Delay Ready" series that
we've been working with.  In these patches, transaction handling is removed
from the attr routines, and replaced with a state machine that allows a high
level function to roll the transaction and repeatedly recall the attr routines
until they are finished.  The behavior of the attr set/remove routines
is now also compatible with use as a .finish_item callback.
  xfs: Add delay ready attr remove routines
  xfs: Add delay ready attr set routines
  xfs: Rename __xfs_attr_rmtval_remove

Delayed Attributes: (patches 4 - 8)
These patches go on to fully implement delayed attributes.  New attr intent and
done items are introduced for use in the existing logging infrastructure.  A
feature bit is added to toggle the feature on and off, and an error tag is added
to test the log replay.
  xfs: Set up infastructure for deferred attribute operations
  xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR
  xfs: Enable delayed attributes
  xfs_io: Add delayed attributes error tag

This series can be viewed on github here:
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_v12

As well as the extended delayed attribute and parent pointer series:
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_v12_extended

And the test cases:
https://github.com/allisonhenderson/xfs_work/tree/pptr_xfstests

In order to run the test cases, you will also need to have the corresponding
xfsprogs changes, which can be found here:
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_xfsprogs_v12
https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_xfsprogs_v12_extended

To run the xfs attributes tests run:
check -g attr

To run as delayed attributes run:
export MKFS_OPTIONS="-n delattr"
check -g attr

To run parent pointer tests:
check -g parent

I've also made the corresponding updates to the user space side, and ported
anything they need to seat correctly.

Questions, comments and feedback are appreciated!

Thanks all!
Allison 


Allison Collins (8):
  xfs: Add delay ready attr remove routines
  xfs: Add delay ready attr set routines
  xfs: Rename __xfs_attr_rmtval_remove
  xfs: Set up infastructure for deferred attribute operations
  xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR
  xfs: Enable delayed attributes
  xfs_io: Add delayed attributes error tag

 fs/xfs/Makefile                 |   1 +
 fs/xfs/libxfs/xfs_attr.c        | 638 ++++++++++++++++++++++--------
 fs/xfs/libxfs/xfs_attr.h        | 243 ++++++++++++
 fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
 fs/xfs/libxfs/xfs_attr_remote.c | 114 ++++--
 fs/xfs/libxfs/xfs_attr_remote.h |   7 +-
 fs/xfs/libxfs/xfs_defer.c       |   1 +
 fs/xfs/libxfs/xfs_defer.h       |   3 +
 fs/xfs/libxfs/xfs_errortag.h    |   4 +-
 fs/xfs/libxfs/xfs_format.h      |  11 +-
 fs/xfs/libxfs/xfs_fs.h          |   1 +
 fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
 fs/xfs/libxfs/xfs_log_recover.h |   2 +
 fs/xfs/libxfs/xfs_sb.c          |   2 +
 fs/xfs/libxfs/xfs_types.h       |   1 +
 fs/xfs/scrub/common.c           |   2 +
 fs/xfs/xfs_acl.c                |   2 +
 fs/xfs/xfs_attr_inactive.c      |   2 +-
 fs/xfs/xfs_attr_item.c          | 837 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_attr_item.h          |  76 ++++
 fs/xfs/xfs_attr_list.c          |   1 +
 fs/xfs/xfs_error.c              |   3 +
 fs/xfs/xfs_ioctl.c              |   2 +
 fs/xfs/xfs_ioctl32.c            |   2 +
 fs/xfs/xfs_iops.c               |   2 +
 fs/xfs/xfs_log.c                |   4 +
 fs/xfs/xfs_log_recover.c        |   2 +
 fs/xfs/xfs_ondisk.h             |   2 +
 fs/xfs/xfs_super.c              |   4 +
 fs/xfs/xfs_trace.h              |   1 -
 fs/xfs/xfs_xattr.c              |   1 +
 31 files changed, 1806 insertions(+), 211 deletions(-)
 create mode 100644 fs/xfs/xfs_attr_item.c
 create mode 100644 fs/xfs/xfs_attr_item.h

-- 
2.7.4


* [PATCH v12 1/8] xfs: Add delay ready attr remove routines
  2020-08-27  0:35 [PATCH v12 0/8] xfs: Delayed Attributes Allison Collins
@ 2020-08-27  0:35 ` Allison Collins
  2020-09-01 17:00   ` Brian Foster
  2020-08-27  0:35 ` [PATCH v12 2/8] xfs: Add delay ready attr set routines Allison Collins
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Allison Collins @ 2020-08-27  0:35 UTC (permalink / raw)
  To: linux-xfs

This patch modifies the attr remove routines to be delay ready. This
means they no longer roll or commit transactions, but instead return
-EAGAIN to have the calling routine roll and refresh the transaction. In
this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
uses a state-machine-like switch to keep track of where it was when
-EAGAIN was returned. xfs_attr_node_removename has also been modified to
use the switch, and the new version of xfs_attr_remove_args consists of
a simple loop that refreshes the transaction until the operation is
completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
transaction wherever the existing code used to.

Calls to xfs_attr_rmtval_remove are replaced with the delay ready
version __xfs_attr_rmtval_remove. We will rename
__xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
done.

xfs_attr_rmtval_remove itself is still in use by the set routines (used
during a rename).  To preserve existing functionality, we modify
xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is set,
similar to what xfs_attr_remove_args does here.  Once we transition the
set routines to be delay ready, xfs_attr_rmtval_remove will no longer be
used and will be removed.

This patch also adds a new struct xfs_delattr_context, which we will use
to keep track of the current state of an attribute operation. The new
xfs_delattr_state enum is used to track various operations that are in
progress so that we know not to repeat them, and resume where we left
off before EAGAIN was returned to cycle out the transaction. Other
members take the place of local variables that need to retain their
values across multiple function recalls.  See xfs_attr.h for a more
detailed diagram of the states.
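
The resume-by-state idea described above can be sketched in a few lines of
standalone C. This is only an illustration under assumed names (every demo_*
and DEMO_* identifier is hypothetical), not the patch's code: the context
records which step comes next and carries what would otherwise be local
variables, so that re-entry after -EAGAIN skips the work already done.

```c
#include <assert.h>
#include <errno.h>

enum demo_state {
	DEMO_UNINIT	= 0,	/* zero means "not started yet" */
	DEMO_SHRINK	= 1,	/* resume at the shrink step */
};

#define DEMO_DEFER_FINISH	0x01	/* caller should finish deferred work */

struct demo_context {
	enum demo_state	state;
	unsigned int	flags;
	int		removed;	/* would be locals, but they must */
	int		shrunk;		/* survive across re-entry */
};

/* One pass of the operation; returns -EAGAIN when it wants to be recalled. */
int demo_remove_iter(struct demo_context *ctx)
{
	if (ctx->state == DEMO_SHRINK)
		goto shrink;

	ctx->removed++;			/* step 1: remove the entry */
	ctx->flags |= DEMO_DEFER_FINISH;
	ctx->state = DEMO_SHRINK;	/* remember where to resume */
	return -EAGAIN;

shrink:
	ctx->shrunk++;			/* step 2: shrink, done only once */
	return 0;
}

/* Driver loop: recall the iterator until it stops returning -EAGAIN. */
int demo_run_remove(void)
{
	struct demo_context	ctx = { .state = DEMO_UNINIT };
	int			rolls = 0;

	while (demo_remove_iter(&ctx) == -EAGAIN) {
		/* stands in for finishing deferred ops, then rolling */
		ctx.flags &= ~DEMO_DEFER_FINISH;
		rolls++;
	}

	/* each step ran exactly once despite the re-entry */
	assert(ctx.removed == 1 && ctx.shrunk == 1);
	return rolls;
}
```

Here the second call jumps straight to the shrink step, so the removal is not
repeated.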

Signed-off-by: Allison Collins <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c        | 162 ++++++++++++++++++++++++++++++----------
 fs/xfs/libxfs/xfs_attr.h        |  73 ++++++++++++++++++
 fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
 fs/xfs/libxfs/xfs_attr_remote.c |  39 +++++-----
 fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
 fs/xfs/xfs_attr_inactive.c      |   2 +-
 6 files changed, 220 insertions(+), 60 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 2e055c0..ea50fc3 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -53,7 +53,7 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
  */
 STATIC int xfs_attr_node_get(xfs_da_args_t *args);
 STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
-STATIC int xfs_attr_node_removename(xfs_da_args_t *args);
+STATIC int xfs_attr_node_removename(struct xfs_delattr_context *dac);
 STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 				 struct xfs_da_state **state);
 STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
@@ -264,6 +264,33 @@ xfs_attr_set_shortform(
 }
 
 /*
+ * Checks to see if a delayed attribute transaction should be rolled.  If so,
+ * also checks for a defer finish.  Transaction is finished and rolled as
+ * needed, and returns true of false if the delayed operation should continue.
+ */
+int
+xfs_attr_trans_roll(
+	struct xfs_delattr_context	*dac)
+{
+	struct xfs_da_args              *args = dac->da_args;
+	int				error = 0;
+
+	if (dac->flags & XFS_DAC_DEFER_FINISH) {
+		/*
+		 * The caller wants us to finish all the deferred ops so that we
+		 * avoid pinning the log tail with a large number of deferred
+		 * ops.
+		 */
+		dac->flags &= ~XFS_DAC_DEFER_FINISH;
+		error = xfs_defer_finish(&args->trans);
+		if (error)
+			return error;
+	}
+
+	return xfs_trans_roll_inode(&args->trans, args->dp);
+}
+
+/*
  * Set the attribute specified in @args.
  */
 int
@@ -364,23 +391,54 @@ xfs_has_attr(
  */
 int
 xfs_attr_remove_args(
-	struct xfs_da_args      *args)
+	struct xfs_da_args	*args)
 {
-	struct xfs_inode	*dp = args->dp;
-	int			error;
+	int				error = 0;
+	struct xfs_delattr_context	dac = {
+		.da_args	= args,
+	};
+
+	do {
+		error = xfs_attr_remove_iter(&dac);
+		if (error != -EAGAIN)
+			break;
+
+		error = xfs_attr_trans_roll(&dac);
+		if (error)
+			return error;
+
+	} while (true);
+
+	return error;
+}
+
+/*
+ * Remove the attribute specified in @args.
+ *
+ * This function may return -EAGAIN to signal that the transaction needs to be
+ * rolled.  Callers should continue calling this function until they receive a
+ * return value other than -EAGAIN.
+ */
+int
+xfs_attr_remove_iter(
+	struct xfs_delattr_context	*dac)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_inode		*dp = args->dp;
+
+	if (dac->dela_state == XFS_DAS_RM_SHRINK)
+		goto node;
 
 	if (!xfs_inode_hasattr(dp)) {
-		error = -ENOATTR;
+		return -ENOATTR;
 	} else if (dp->i_afp->if_format == XFS_DINODE_FMT_LOCAL) {
 		ASSERT(dp->i_afp->if_flags & XFS_IFINLINE);
-		error = xfs_attr_shortform_remove(args);
+		return xfs_attr_shortform_remove(args);
 	} else if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
-		error = xfs_attr_leaf_removename(args);
-	} else {
-		error = xfs_attr_node_removename(args);
+		return xfs_attr_leaf_removename(args);
 	}
-
-	return error;
+node:
+	return  xfs_attr_node_removename(dac);
 }
 
 /*
@@ -1170,10 +1228,12 @@ xfs_attr_leaf_mark_incomplete(
  */
 STATIC
 int xfs_attr_node_removename_setup(
-	struct xfs_da_args	*args,
-	struct xfs_da_state	**state)
+	struct xfs_delattr_context	*dac,
+	struct xfs_da_state		**state)
 {
-	int			error;
+	struct xfs_da_args		*args = dac->da_args;
+	int				error;
+	struct xfs_da_state_blk		*blk;
 
 	error = xfs_attr_node_hasname(args, state);
 	if (error != -EEXIST)
@@ -1183,6 +1243,13 @@ int xfs_attr_node_removename_setup(
 	ASSERT((*state)->path.blk[(*state)->path.active - 1].magic ==
 		XFS_ATTR_LEAF_MAGIC);
 
+	/*
+	 * Store blk and state in the context incase we need to cycle out the
+	 * transaction
+	 */
+	dac->blk = blk;
+	dac->da_state = *state;
+
 	if (args->rmtblkno > 0) {
 		error = xfs_attr_leaf_mark_incomplete(args, *state);
 		if (error)
@@ -1195,13 +1262,16 @@ int xfs_attr_node_removename_setup(
 }
 
 STATIC int
-xfs_attr_node_remove_rmt(
-	struct xfs_da_args	*args,
-	struct xfs_da_state	*state)
+xfs_attr_node_remove_rmt (
+	struct xfs_delattr_context	*dac,
+	struct xfs_da_state		*state)
 {
-	int			error = 0;
+	int				error = 0;
 
-	error = xfs_attr_rmtval_remove(args);
+	/*
+	 * May return -EAGAIN to request that the caller recall this function
+	 */
+	error = __xfs_attr_rmtval_remove(dac);
 	if (error)
 		return error;
 
@@ -1218,21 +1288,35 @@ xfs_attr_node_remove_rmt(
  * This will involve walking down the Btree, and may involve joining
  * leaf nodes and even joining intermediate nodes up to and including
  * the root node (a special case of an intermediate node).
+ *
+ * This routine is meant to function as either an inline or delayed operation,
+ * and may return -EAGAIN when the transaction needs to be rolled.  Calling
+ * functions will need to handle this, and recall the function until a
+ * successful error code is returned.
  */
 STATIC int
 xfs_attr_node_removename(
-	struct xfs_da_args	*args)
+	struct xfs_delattr_context	*dac)
 {
-	struct xfs_da_state	*state;
-	struct xfs_da_state_blk	*blk;
-	int			retval, error;
-	struct xfs_inode	*dp = args->dp;
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_state		*state;
+	struct xfs_da_state_blk		*blk;
+	int				retval, error;
+	struct xfs_inode		*dp = args->dp;
 
 	trace_xfs_attr_node_removename(args);
+	state = dac->da_state;
+	blk = dac->blk;
 
-	error = xfs_attr_node_removename_setup(args, &state);
-	if (error)
-		goto out;
+	if (dac->dela_state == XFS_DAS_RM_SHRINK)
+		goto das_rm_shrink;
+
+	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
+		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
+		error = xfs_attr_node_removename_setup(dac, &state);
+		if (error)
+			goto out;
+	}
 
 	/*
 	 * If there is an out-of-line value, de-allocate the blocks.
@@ -1240,8 +1324,13 @@ xfs_attr_node_removename(
 	 * overflow the maximum size of a transaction and/or hit a deadlock.
 	 */
 	if (args->rmtblkno > 0) {
-		error = xfs_attr_node_remove_rmt(args, state);
-		if (error)
+		/*
+		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
+		 */
+		error = xfs_attr_node_remove_rmt(dac, state);
+		if (error == -EAGAIN)
+			return error;
+		else if (error)
 			goto out;
 	}
 
@@ -1260,17 +1349,14 @@ xfs_attr_node_removename(
 		error = xfs_da3_join(state);
 		if (error)
 			goto out;
-		error = xfs_defer_finish(&args->trans);
-		if (error)
-			goto out;
-		/*
-		 * Commit the Btree join operation and start a new trans.
-		 */
-		error = xfs_trans_roll_inode(&args->trans, dp);
-		if (error)
-			goto out;
+
+		dac->flags |= XFS_DAC_DEFER_FINISH;
+		dac->dela_state = XFS_DAS_RM_SHRINK;
+		return -EAGAIN;
 	}
 
+das_rm_shrink:
+
 	/*
 	 * If the result is small enough, push it all into the inode.
 	 */
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 3e97a93..9573949 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -74,6 +74,75 @@ struct xfs_attr_list_context {
 };
 
 
+/*
+ * ========================================================================
+ * Structure used to pass context around among the delayed routines.
+ * ========================================================================
+ */
+
+/*
+ * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
+ * states indicate places where the function would return -EAGAIN, and then
+ * immediately resume from after being recalled by the calling function. States
+ * marked as a "subroutine state" indicate that they belong to a subroutine, and
+ * so the calling function needs to pass them back to that subroutine to allow
+ * it to finish where it left off. But they otherwise do not have a role in the
+ * calling function other than just passing through.
+ *
+ * xfs_attr_remove_iter()
+ *	  XFS_DAS_RM_SHRINK ─┐
+ *	  (subroutine state) │
+ *	                     └─>xfs_attr_node_removename()
+ *	                                      │
+ *	                                      v
+ *	                                   need to
+ *	                                shrink tree? ─n─┐
+ *	                                      │         │
+ *	                                      y         │
+ *	                                      │         │
+ *	                                      v         │
+ *	                              XFS_DAS_RM_SHRINK │
+ *	                                      │         │
+ *	                                      v         │
+ *	                                     done <─────┘
+ *
+ */
+
+/*
+ * Enum values for xfs_delattr_context.da_state
+ *
+ * These values are used by delayed attribute operations to keep track  of where
+ * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
+ * calling function to roll the transaction, and then recall the subroutine to
+ * finish the operation.  The enum is then used by the subroutine to jump back
+ * to where it was and resume executing where it left off.
+ */
+enum xfs_delattr_state {
+				      /* Zero is uninitalized */
+	XFS_DAS_RM_SHRINK	= 1,  /* We are shrinking the tree */
+};
+
+/*
+ * Defines for xfs_delattr_context.flags
+ */
+#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
+#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
+
+/*
+ * Context used for keeping track of delayed attribute operations
+ */
+struct xfs_delattr_context {
+	struct xfs_da_args      *da_args;
+
+	/* Used in xfs_attr_node_removename to roll through removing blocks */
+	struct xfs_da_state     *da_state;
+	struct xfs_da_state_blk *blk;
+
+	/* Used to keep track of current state of delayed operation */
+	unsigned int            flags;
+	enum xfs_delattr_state  dela_state;
+};
+
 /*========================================================================
  * Function prototypes for the kernel.
  *========================================================================*/
@@ -91,6 +160,10 @@ int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_args(struct xfs_da_args *args);
 int xfs_has_attr(struct xfs_da_args *args);
 int xfs_attr_remove_args(struct xfs_da_args *args);
+int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
+int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
 bool xfs_attr_namecheck(const void *name, size_t length);
+void xfs_delattr_context_init(struct xfs_delattr_context *dac,
+			      struct xfs_da_args *args);
 
 #endif	/* __XFS_ATTR_H__ */
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 8623c81..4ed7b31 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -19,8 +19,8 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_bmap.h"
 #include "xfs_attr_sf.h"
-#include "xfs_attr_remote.h"
 #include "xfs_attr.h"
+#include "xfs_attr_remote.h"
 #include "xfs_attr_leaf.h"
 #include "xfs_error.h"
 #include "xfs_trace.h"
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 3f80ced..7f81b48 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -676,10 +676,14 @@ xfs_attr_rmtval_invalidate(
  */
 int
 xfs_attr_rmtval_remove(
-	struct xfs_da_args      *args)
+	struct xfs_da_args		*args)
 {
-	int			error;
-	int			retval;
+	xfs_dablk_t			lblkno;
+	int				blkcnt;
+	int				error;
+	struct xfs_delattr_context	dac  = {
+		.da_args	= args,
+	};
 
 	trace_xfs_attr_rmtval_remove(args);
 
@@ -687,19 +691,17 @@ xfs_attr_rmtval_remove(
 	 * Keep de-allocating extents until the remote-value region is gone.
 	 */
 	do {
-		retval = __xfs_attr_rmtval_remove(args);
-		if (retval && retval != -EAGAIN)
-			return retval;
+		error = __xfs_attr_rmtval_remove(&dac);
+		if (error != -EAGAIN)
+			break;
 
-		/*
-		 * Close out trans and start the next one in the chain.
-		 */
-		error = xfs_trans_roll_inode(&args->trans, args->dp);
+		error = xfs_attr_trans_roll(&dac);
 		if (error)
 			return error;
-	} while (retval == -EAGAIN);
 
-	return 0;
+	} while (true);
+
+	return error;
 }
 
 /*
@@ -709,9 +711,10 @@ xfs_attr_rmtval_remove(
  */
 int
 __xfs_attr_rmtval_remove(
-	struct xfs_da_args	*args)
+	struct xfs_delattr_context	*dac)
 {
-	int			error, done;
+	struct xfs_da_args		*args = dac->da_args;
+	int				error, done;
 
 	/*
 	 * Unmap value blocks for this attr.
@@ -721,12 +724,10 @@ __xfs_attr_rmtval_remove(
 	if (error)
 		return error;
 
-	error = xfs_defer_finish(&args->trans);
-	if (error)
-		return error;
-
-	if (!done)
+	if (!done) {
+		dac->flags |= XFS_DAC_DEFER_FINISH;
 		return -EAGAIN;
+	}
 
 	return error;
 }
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index 9eee615..002fd30 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
 int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
 		xfs_buf_flags_t incore_flags);
 int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
-int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
+int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
 #endif /* __XFS_ATTR_REMOTE_H__ */
diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
index bfad669..aaa7e66 100644
--- a/fs/xfs/xfs_attr_inactive.c
+++ b/fs/xfs/xfs_attr_inactive.c
@@ -15,10 +15,10 @@
 #include "xfs_da_format.h"
 #include "xfs_da_btree.h"
 #include "xfs_inode.h"
+#include "xfs_attr.h"
 #include "xfs_attr_remote.h"
 #include "xfs_trans.h"
 #include "xfs_bmap.h"
-#include "xfs_attr.h"
 #include "xfs_attr_leaf.h"
 #include "xfs_quota.h"
 #include "xfs_dir2.h"
-- 
2.7.4


* [PATCH v12 2/8] xfs: Add delay ready attr set routines
  2020-08-27  0:35 [PATCH v12 0/8] xfs: Delayed Attributes Allison Collins
  2020-08-27  0:35 ` [PATCH v12 1/8] xfs: Add delay ready attr remove routines Allison Collins
@ 2020-08-27  0:35 ` Allison Collins
  2020-08-27  0:35 ` [PATCH v12 3/8] xfs: Rename __xfs_attr_rmtval_remove Allison Collins
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Allison Collins @ 2020-08-27  0:35 UTC (permalink / raw)
  To: linux-xfs

This patch modifies the attr set routines to be delay ready. This means
they no longer roll or commit transactions, but instead return -EAGAIN
to have the calling routine roll and refresh the transaction.  In this
series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
state-machine-like switch to keep track of where it was when -EAGAIN was
returned. See xfs_attr.h for a more detailed diagram of the states.

Two new helper functions have been added: xfs_attr_rmtval_set_init and
xfs_attr_rmtval_set_blk.  They provide a subset of logic similar to
xfs_attr_rmtval_set, but they store the current block in the delay attr
context to allow the caller to roll the transaction between allocations.
This helps to simplify and consolidate code used by
xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
now become a simple loop to refresh the transaction until the operation
is completed.  Lastly, xfs_attr_rmtval_remove is no longer used, and is
removed.
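
To illustrate why keeping the current block in the context lets the caller roll
between allocations, here is a small standalone sketch. It is an assumption
about the general shape only, not the real xfs_attr_rmtval_set_init or
xfs_attr_rmtval_set_blk: all demo_* names, fields and the max_extent parameter
are hypothetical.

```c
#include <assert.h>

/* Hypothetical context: progress lives here, not in local variables. */
struct demo_rmt_context {
	int	lblkno;		/* next logical block to allocate */
	int	blkcnt;		/* blocks still to allocate */
};

/* Record the range that needs allocating. */
void demo_rmt_set_init(struct demo_rmt_context *ctx, int first_blk, int nblks)
{
	ctx->lblkno = first_blk;
	ctx->blkcnt = nblks;
}

/* Allocate one extent of at most max_extent blocks and advance the context. */
int demo_rmt_set_blk(struct demo_rmt_context *ctx, int max_extent)
{
	int	got;

	assert(ctx->blkcnt > 0 && max_extent > 0);
	got = ctx->blkcnt < max_extent ? ctx->blkcnt : max_extent;
	ctx->lblkno += got;
	ctx->blkcnt -= got;
	return got;
}

/* Caller loop: one allocation per pass, with a roll after each one. */
int demo_rmt_count_rolls(int nblks, int max_extent)
{
	struct demo_rmt_context	ctx;
	int			rolls = 0;

	demo_rmt_set_init(&ctx, 0, nblks);
	while (ctx.blkcnt > 0) {
		demo_rmt_set_blk(&ctx, max_extent);
		rolls++;	/* stands in for rolling the transaction */
	}

	assert(ctx.lblkno == nblks);	/* the whole range was covered */
	return rolls;
}
```

Because the position is stored in the context, each pass can return to the
caller and pick up at the next unallocated block on the following call.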

Signed-off-by: Allison Collins <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c        | 370 +++++++++++++++++++++++++++-------------
 fs/xfs/libxfs/xfs_attr.h        | 126 +++++++++++++-
 fs/xfs/libxfs/xfs_attr_remote.c | 101 +++++++----
 fs/xfs/libxfs/xfs_attr_remote.h |   4 +
 fs/xfs/xfs_trace.h              |   1 -
 5 files changed, 443 insertions(+), 159 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index ea50fc3..53ae343 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -44,7 +44,7 @@ STATIC int xfs_attr_shortform_addname(xfs_da_args_t *args);
  * Internal routines when attribute list is one block.
  */
 STATIC int xfs_attr_leaf_get(xfs_da_args_t *args);
-STATIC int xfs_attr_leaf_addname(xfs_da_args_t *args);
+STATIC int xfs_attr_leaf_addname(struct xfs_delattr_context *dac);
 STATIC int xfs_attr_leaf_removename(xfs_da_args_t *args);
 STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
 
@@ -52,12 +52,15 @@ STATIC int xfs_attr_leaf_hasname(struct xfs_da_args *args, struct xfs_buf **bp);
  * Internal routines when attribute list is more than one block.
  */
 STATIC int xfs_attr_node_get(xfs_da_args_t *args);
-STATIC int xfs_attr_node_addname(xfs_da_args_t *args);
+STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
 STATIC int xfs_attr_node_removename(struct xfs_delattr_context *dac);
 STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 				 struct xfs_da_state **state);
 STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
 STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
+STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
+STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
+			     struct xfs_buf **leaf_bp);
 
 int
 xfs_inode_hasattr(
@@ -218,8 +221,11 @@ xfs_attr_is_shortform(
 
 /*
  * Attempts to set an attr in shortform, or converts short form to leaf form if
- * there is not enough room.  If the attr is set, the transaction is committed
- * and set to NULL.
+ * there is not enough room.  This function is meant to operate as a helper
+ * routine to the delayed attribute functions.  It returns -EAGAIN to indicate
+ * that the calling function should roll the transaction, and then proceed to
+ * add the attr in leaf form.  This subroutine does not expect to be recalled
+ * again like the other delayed attr routines do.
  */
 STATIC int
 xfs_attr_set_shortform(
@@ -227,16 +233,16 @@ xfs_attr_set_shortform(
 	struct xfs_buf		**leaf_bp)
 {
 	struct xfs_inode	*dp = args->dp;
-	int			error, error2 = 0;
+	int			error = 0;
 
 	/*
 	 * Try to add the attr to the attribute list in the inode.
 	 */
 	error = xfs_attr_try_sf_addname(dp, args);
+
+	/* Should only be 0, -EEXIST or ENOSPC */
 	if (error != -ENOSPC) {
-		error2 = xfs_trans_commit(args->trans);
-		args->trans = NULL;
-		return error ? error : error2;
+		return error;
 	}
 	/*
 	 * It won't fit in the shortform, transform to a leaf block.  GROT:
@@ -249,18 +255,10 @@ xfs_attr_set_shortform(
 	/*
 	 * Prevent the leaf buffer from being unlocked so that a concurrent AIL
 	 * push cannot grab the half-baked leaf buffer and run into problems
-	 * with the write verifier. Once we're done rolling the transaction we
-	 * can release the hold and add the attr to the leaf.
+	 * with the write verifier.
 	 */
 	xfs_trans_bhold(args->trans, *leaf_bp);
-	error = xfs_defer_finish(&args->trans);
-	xfs_trans_bhold_release(args->trans, *leaf_bp);
-	if (error) {
-		xfs_trans_brelse(args->trans, *leaf_bp);
-		return error;
-	}
-
-	return 0;
+	return -EAGAIN;
 }
 
 /*
@@ -268,7 +266,7 @@ xfs_attr_set_shortform(
  * also checks for a defer finish.  Transaction is finished and rolled as
  * needed, and returns true of false if the delayed operation should continue.
  */
-int
+STATIC int
 xfs_attr_trans_roll(
 	struct xfs_delattr_context	*dac)
 {
@@ -297,61 +295,130 @@ int
 xfs_attr_set_args(
 	struct xfs_da_args	*args)
 {
-	struct xfs_inode	*dp = args->dp;
-	struct xfs_buf          *leaf_bp = NULL;
-	int			error = 0;
+	struct xfs_buf			*leaf_bp = NULL;
+	int				error = 0;
+	struct xfs_delattr_context	dac = {
+		.da_args	= args,
+	};
+
+	do {
+		error = xfs_attr_set_iter(&dac, &leaf_bp);
+		if (error != -EAGAIN)
+			break;
+
+		error = xfs_attr_trans_roll(&dac);
+		if (error)
+			return error;
+
+		if (leaf_bp) {
+			xfs_trans_bjoin(args->trans, leaf_bp);
+			xfs_trans_bhold(args->trans, leaf_bp);
+		}
+
+	} while (true);
+
+	return error;
+}
+
+/*
+ * Set the attribute specified in @args.
+ * This routine is meant to function as a delayed operation, and may return
+ * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
+ * to handle this, and recall the function until a successful error code is
+ * returned.
+ */
+STATIC int
+xfs_attr_set_iter(
+	struct xfs_delattr_context	*dac,
+	struct xfs_buf			**leaf_bp)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_inode		*dp = args->dp;
+	int				error = 0;
+
+	/* State machine switch */
+	switch (dac->dela_state) {
+	case XFS_DAS_FLIP_LFLAG:
+	case XFS_DAS_FOUND_LBLK:
+		goto das_leaf;
+	case XFS_DAS_FOUND_NBLK:
+	case XFS_DAS_FLIP_NFLAG:
+	case XFS_DAS_ALLOC_NODE:
+		goto das_node;
+	default:
+		break;
+	}
 
 	/*
 	 * If the attribute list is already in leaf format, jump straight to
 	 * leaf handling.  Otherwise, try to add the attribute to the shortform
 	 * list; if there's no room then convert the list to leaf format and try
-	 * again.
+	 * again. No need to set state as we will be in leaf form when we come
+	 * back
 	 */
 	if (xfs_attr_is_shortform(dp)) {
 
 		/*
-		 * If the attr was successfully set in shortform, the
-		 * transaction is committed and set to NULL.  Otherwise, is it
-		 * converted from shortform to leaf, and the transaction is
-		 * retained.
+		 * If the attr was successfully set in shortform, no need to
+		 * continue.  Otherwise, is it converted from shortform to leaf
+		 * and -EAGAIN is returned.
 		 */
-		error = xfs_attr_set_shortform(args, &leaf_bp);
-		if (error || !args->trans)
-			return error;
+		error = xfs_attr_set_shortform(args, leaf_bp);
+		if (error == -EAGAIN)
+			dac->flags |= XFS_DAC_DEFER_FINISH;
+
+		return error;
 	}
 
-	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
-		error = xfs_attr_leaf_addname(args);
-		if (error != -ENOSPC)
-			return error;
+	/*
+	 * After a shortform to leaf conversion, we need to hold the leaf and
+	 * cycle out the transaction.  When we get back, we need to release
+	 * the leaf.
+	 */
+	if (*leaf_bp != NULL) {
+		xfs_trans_bhold_release(args->trans, *leaf_bp);
+		*leaf_bp = NULL;
+	}
 
-		/*
-		 * Promote the attribute list to the Btree format.
-		 */
-		error = xfs_attr3_leaf_to_node(args);
-		if (error)
-			return error;
+	if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) {
+		error = xfs_attr_leaf_try_add(args, *leaf_bp);
+		switch (error) {
+		case -ENOSPC:
+			/*
+			 * Promote the attribute list to the Btree format.
+			 */
+			error = xfs_attr3_leaf_to_node(args);
+			if (error)
+				return error;
 
-		/*
-		 * Finish any deferred work items and roll the transaction once
-		 * more.  The goal here is to call node_addname with the inode
-		 * and transaction in the same state (inode locked and joined,
-		 * transaction clean) no matter how we got to this step.
-		 */
-		error = xfs_defer_finish(&args->trans);
-		if (error)
+			/*
+			 * Finish any deferred work items and roll the
+			 * transaction once more.  The goal here is to call
+			 * node_addname with the inode and transaction in the
+			 * same state (inode locked and joined, transaction
+			 * clean) no matter how we got to this step.
+			 */
+			dac->flags |= XFS_DAC_DEFER_FINISH;
+			return -EAGAIN;
+		case 0:
+			dac->dela_state = XFS_DAS_FOUND_LBLK;
+			return -EAGAIN;
+		default:
 			return error;
+		}
+das_leaf:
+		error = xfs_attr_leaf_addname(dac);
+		if (error == -ENOSPC)
+			/*
+			 * No need to set state.  We will be in node form when
+			 * we are recalled
+			 */
+			return -EAGAIN;
 
-		/*
-		 * Commit the current trans (including the inode) and
-		 * start a new one.
-		 */
-		error = xfs_trans_roll_inode(&args->trans, dp);
-		if (error)
-			return error;
+		return error;
 	}
-
-	error = xfs_attr_node_addname(args);
+das_node:
+	error = xfs_attr_node_addname(dac);
 	return error;
 }
 
@@ -715,28 +782,30 @@ xfs_attr_leaf_try_add(
  *
  * This leaf block cannot have a "remote" value, we only call this routine
  * if bmap_one_block() says there is only one block (ie: no remote blks).
+ *
+ * This routine is meant to function as a delayed operation, and may return
+ * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
+ * to handle this, and recall the function until a successful error code is
+ * returned.
  */
 STATIC int
 xfs_attr_leaf_addname(
-	struct xfs_da_args	*args)
+	struct xfs_delattr_context	*dac)
 {
-	int			error, forkoff;
-	struct xfs_buf		*bp = NULL;
-	struct xfs_inode	*dp = args->dp;
-
-	trace_xfs_attr_leaf_addname(args);
-
-	error = xfs_attr_leaf_try_add(args, bp);
-	if (error)
-		return error;
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_buf			*bp = NULL;
+	int				error, forkoff;
+	struct xfs_inode		*dp = args->dp;
 
-	/*
-	 * Commit the transaction that added the attr name so that
-	 * later routines can manage their own transactions.
-	 */
-	error = xfs_trans_roll_inode(&args->trans, dp);
-	if (error)
-		return error;
+	/* State machine switch */
+	switch (dac->dela_state) {
+	case XFS_DAS_FLIP_LFLAG:
+		goto das_flip_flag;
+	case XFS_DAS_RM_LBLK:
+		goto das_rm_lblk;
+	default:
+		break;
+	}
 
 	/*
 	 * If there was an out-of-line value, allocate the blocks we
@@ -744,12 +813,34 @@ xfs_attr_leaf_addname(
 	 * after we create the attribute so that we don't overflow the
 	 * maximum size of a transaction and/or hit a deadlock.
 	 */
-	if (args->rmtblkno > 0) {
-		error = xfs_attr_rmtval_set(args);
+
+	/* Open coded xfs_attr_rmtval_set without trans handling */
+	if ((dac->flags & XFS_DAC_LEAF_ADDNAME_INIT) == 0) {
+		dac->flags |= XFS_DAC_LEAF_ADDNAME_INIT;
+		if (args->rmtblkno > 0) {
+			error = xfs_attr_rmtval_find_space(dac);
+			if (error)
+				return error;
+		}
+	}
+
+	/*
+	 * Roll through the "value", allocating blocks on disk as
+	 * required.
+	 */
+	if (dac->blkcnt > 0) {
+		error = xfs_attr_rmtval_set_blk(dac);
 		if (error)
 			return error;
+
+		dac->flags |= XFS_DAC_DEFER_FINISH;
+		return -EAGAIN;
 	}
 
+	error = xfs_attr_rmtval_set_value(args);
+	if (error)
+		return error;
+
 	if (!(args->op_flags & XFS_DA_OP_RENAME)) {
 		/*
 		 * Added a "remote" value, just clear the incomplete flag.
@@ -769,29 +860,29 @@ xfs_attr_leaf_addname(
 	 * In a separate transaction, set the incomplete flag on the "old" attr
 	 * and clear the incomplete flag on the "new" attr.
 	 */
-
 	error = xfs_attr3_leaf_flipflags(args);
 	if (error)
 		return error;
 	/*
 	 * Commit the flag value change and start the next trans in series.
 	 */
-	error = xfs_trans_roll_inode(&args->trans, args->dp);
-	if (error)
-		return error;
-
+	dac->dela_state = XFS_DAS_FLIP_LFLAG;
+	return -EAGAIN;
+das_flip_flag:
 	/*
 	 * Dismantle the "old" attribute/value pair by removing a "remote" value
 	 * (if it exists).
 	 */
 	xfs_attr_restore_rmt_blk(args);
 
+	error = xfs_attr_rmtval_invalidate(args);
+	if (error)
+		return error;
+das_rm_lblk:
 	if (args->rmtblkno) {
-		error = xfs_attr_rmtval_invalidate(args);
-		if (error)
-			return error;
-
-		error = xfs_attr_rmtval_remove(args);
+		error = __xfs_attr_rmtval_remove(dac);
+		if (error == -EAGAIN)
+			dac->dela_state = XFS_DAS_RM_LBLK;
 		if (error)
 			return error;
 	}
@@ -957,15 +1048,23 @@ xfs_attr_node_hasname(
  *
  * "Remote" attribute values confuse the issue and atomic rename operations
  * add a whole extra layer of confusion on top of that.
+ *
+ * This routine is meant to function as a delayed operation, and may return
+ * -EAGAIN when the transaction needs to be rolled.  Calling functions will need
+ * to handle this, and recall the function until a successful error code is
+ * returned.
  */
 STATIC int
 xfs_attr_node_addname(
-	struct xfs_da_args	*args)
+	struct xfs_delattr_context	*dac)
 {
-	struct xfs_da_state	*state;
-	struct xfs_da_state_blk	*blk;
-	struct xfs_inode	*dp;
-	int			retval, error;
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_state		*state = NULL;
+	struct xfs_da_state_blk		*blk;
+	struct xfs_inode		*dp;
+	struct xfs_mount		*mp;
+	int				retval = 0;
+	int				error = 0;
 
 	trace_xfs_attr_node_addname(args);
 
@@ -973,7 +1072,22 @@ xfs_attr_node_addname(
 	 * Fill in bucket of arguments/results/context to carry around.
 	 */
 	dp = args->dp;
-restart:
+	mp = dp->i_mount;
+
+	/* State machine switch */
+	switch (dac->dela_state) {
+	case XFS_DAS_FLIP_NFLAG:
+		goto das_flip_flag;
+	case XFS_DAS_FOUND_NBLK:
+		goto das_found_nblk;
+	case XFS_DAS_ALLOC_NODE:
+		goto das_alloc_node;
+	case XFS_DAS_RM_NBLK:
+		goto das_rm_nblk;
+	default:
+		break;
+	}
+
 	/*
 	 * Search to see if name already exists, and get back a pointer
 	 * to where it should go.
@@ -1019,19 +1133,13 @@ xfs_attr_node_addname(
 			error = xfs_attr3_leaf_to_node(args);
 			if (error)
 				goto out;
-			error = xfs_defer_finish(&args->trans);
-			if (error)
-				goto out;
 
 			/*
-			 * Commit the node conversion and start the next
-			 * trans in the chain.
+			 * Restart routine from the top.  No need to set the
+			 * state
 			 */
-			error = xfs_trans_roll_inode(&args->trans, dp);
-			if (error)
-				goto out;
-
-			goto restart;
+			dac->flags |= XFS_DAC_DEFER_FINISH;
+			return -EAGAIN;
 		}
 
 		/*
@@ -1043,9 +1151,7 @@ xfs_attr_node_addname(
 		error = xfs_da3_split(state);
 		if (error)
 			goto out;
-		error = xfs_defer_finish(&args->trans);
-		if (error)
-			goto out;
+		dac->flags |= XFS_DAC_DEFER_FINISH;
 	} else {
 		/*
 		 * Addition succeeded, update Btree hashvals.
@@ -1060,13 +1166,9 @@ xfs_attr_node_addname(
 	xfs_da_state_free(state);
 	state = NULL;
 
-	/*
-	 * Commit the leaf addition or btree split and start the next
-	 * trans in the chain.
-	 */
-	error = xfs_trans_roll_inode(&args->trans, dp);
-	if (error)
-		goto out;
+	dac->dela_state = XFS_DAS_FOUND_NBLK;
+	return -EAGAIN;
+das_found_nblk:
 
 	/*
 	 * If there was an out-of-line value, allocate the blocks we
@@ -1075,7 +1177,27 @@ xfs_attr_node_addname(
 	 * maximum size of a transaction and/or hit a deadlock.
 	 */
 	if (args->rmtblkno > 0) {
-		error = xfs_attr_rmtval_set(args);
+		/* Open coded xfs_attr_rmtval_set without trans handling */
+		error = xfs_attr_rmtval_find_space(dac);
+		if (error)
+			return error;
+
+		/*
+		 * Roll through the "value", allocating blocks on disk as
+		 * required.
+		 */
+das_alloc_node:
+		if (dac->blkcnt > 0) {
+			error = xfs_attr_rmtval_set_blk(dac);
+			if (error)
+				return error;
+
+			dac->flags |= XFS_DAC_DEFER_FINISH;
+			dac->dela_state = XFS_DAS_ALLOC_NODE;
+			return -EAGAIN;
+		}
+
+		error = xfs_attr_rmtval_set_value(args);
 		if (error)
 			return error;
 	}
@@ -1105,22 +1227,28 @@ xfs_attr_node_addname(
 	/*
 	 * Commit the flag value change and start the next trans in series
 	 */
-	error = xfs_trans_roll_inode(&args->trans, args->dp);
-	if (error)
-		goto out;
-
+	dac->dela_state = XFS_DAS_FLIP_NFLAG;
+	return -EAGAIN;
+das_flip_flag:
 	/*
 	 * Dismantle the "old" attribute/value pair by removing a "remote" value
 	 * (if it exists).
 	 */
 	xfs_attr_restore_rmt_blk(args);
 
+	error = xfs_attr_rmtval_invalidate(args);
+	if (error)
+		return error;
+
+das_rm_nblk:
 	if (args->rmtblkno) {
-		error = xfs_attr_rmtval_invalidate(args);
-		if (error)
-			return error;
+		error = __xfs_attr_rmtval_remove(dac);
+
+		if (error == -EAGAIN) {
+			dac->dela_state = XFS_DAS_RM_NBLK;
+			return -EAGAIN;
+		}
 
-		error = xfs_attr_rmtval_remove(args);
 		if (error)
 			return error;
 	}
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 9573949..4f6bba8 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -106,6 +106,118 @@ struct xfs_attr_list_context {
  *	                                      v         │
  *	                                     done <─────┘
  *
+ *
+ * Below is a state machine diagram for attr set operations.
+ *
+ *  xfs_attr_set_iter()
+ *             │
+ *             v
+ *   ┌───n── fork has
+ *   │	    only 1 blk?
+ *   │		│
+ *   │		y
+ *   │		│
+ *   │		v
+ *   │	xfs_attr_leaf_try_add()
+ *   │		│
+ *   │		v
+ *   │	     had enough
+ *   ├───n────space?
+ *   │		│
+ *   │		y
+ *   │		│
+ *   │		v
+ *   │	XFS_DAS_FOUND_LBLK ──┐
+ *   │	                     │
+ *   │	XFS_DAS_FLIP_LFLAG ──┤
+ *   │	(subroutine state)   │
+ *   │		             │
+ *   │		             └─>xfs_attr_leaf_addname()
+ *   │		                      │
+ *   │		                      v
+ *   │		                   was this
+ *   │		                   a rename? ──n─┐
+ *   │		                      │          │
+ *   │		                      y          │
+ *   │		                      │          │
+ *   │		                      v          │
+ *   │		                flip incomplete  │
+ *   │		                    flag         │
+ *   │		                      │          │
+ *   │		                      v          │
+ *   │		              XFS_DAS_FLIP_LFLAG │
+ *   │		                      │          │
+ *   │		                      v          │
+ *   │		                    remove       │
+ *   │		XFS_DAS_RM_LBLK ─> old name      │
+ *   │		         ^            │          │
+ *   │		         │            v          │
+ *   │		         └──────y── more to      │
+ *   │		                    remove       │
+ *   │		                      │          │
+ *   │		                      n          │
+ *   │		                      │          │
+ *   │		                      v          │
+ *   │		                     done <──────┘
+ *   └──> XFS_DAS_FOUND_NBLK ──┐
+ *	  (subroutine state)   │
+ *	                       │
+ *	  XFS_DAS_ALLOC_NODE ──┤
+ *	  (subroutine state)   │
+ *	                       │
+ *	  XFS_DAS_FLIP_NFLAG ──┤
+ *	  (subroutine state)   │
+ *	                       │
+ *	                       └─>xfs_attr_node_addname()
+ *	                               │
+ *	                               v
+ *	                       find space to store
+ *	                      attr. Split if needed
+ *	                               │
+ *	                               v
+ *	                       XFS_DAS_FOUND_NBLK
+ *	                               │
+ *	                               v
+ *	                 ┌─────n──  need to
+ *	                 │        alloc blks?
+ *	                 │             │
+ *	                 │             y
+ *	                 │             │
+ *	                 │             v
+ *	                 │  ┌─>XFS_DAS_ALLOC_NODE
+ *	                 │  │          │
+ *	                 │  │          v
+ *	                 │  └──y── need to alloc
+ *	                 │         more blocks?
+ *	                 │             │
+ *	                 │             n
+ *	                 │             │
+ *	                 │             v
+ *	                 │          was this
+ *	                 └────────> a rename? ──n─┐
+ *	                               │          │
+ *	                               y          │
+ *	                               │          │
+ *	                               v          │
+ *	                         flip incomplete  │
+ *	                             flag         │
+ *	                               │          │
+ *	                               v          │
+ *	                       XFS_DAS_FLIP_NFLAG │
+ *	                               │          │
+ *	                               v          │
+ *	                             remove       │
+ *	         XFS_DAS_RM_NBLK ─> old name      │
+ *	                  ^            │          │
+ *	                  │            v          │
+ *	                  └──────y── more to      │
+ *	                             remove       │
+ *	                               │          │
+ *	                               n          │
+ *	                               │          │
+ *	                               v          │
+ *	                              done <──────┘
+ *
  */
 
 /*
@@ -120,6 +232,13 @@ struct xfs_attr_list_context {
 enum xfs_delattr_state {
 				      /* Zero is uninitalized */
 	XFS_DAS_RM_SHRINK	= 1,  /* We are shrinking the tree */
+	XFS_DAS_FOUND_LBLK,	      /* We found leaf blk for attr */
+	XFS_DAS_FOUND_NBLK,	      /* We found node blk for attr */
+	XFS_DAS_FLIP_LFLAG,	      /* Flipped leaf INCOMPLETE attr flag */
+	XFS_DAS_RM_LBLK,	      /* A rename is removing leaf blocks */
+	XFS_DAS_ALLOC_NODE,	      /* We are allocating node blocks */
+	XFS_DAS_FLIP_NFLAG,	      /* Flipped node INCOMPLETE attr flag */
+	XFS_DAS_RM_NBLK,	      /* A rename is removing node blocks */
 };
 
 /*
@@ -127,6 +246,7 @@ enum xfs_delattr_state {
  */
 #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
 #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
+#define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
 
 /*
  * Context used for keeping track of delayed attribute operations
@@ -134,6 +254,11 @@ enum xfs_delattr_state {
 struct xfs_delattr_context {
 	struct xfs_da_args      *da_args;
 
+	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
+	struct xfs_bmbt_irec	map;
+	xfs_dablk_t		lblkno;
+	int			blkcnt;
+
 	/* Used in xfs_attr_node_removename to roll through removing blocks */
 	struct xfs_da_state     *da_state;
 	struct xfs_da_state_blk *blk;
@@ -161,7 +286,6 @@ int xfs_attr_set_args(struct xfs_da_args *args);
 int xfs_has_attr(struct xfs_da_args *args);
 int xfs_attr_remove_args(struct xfs_da_args *args);
 int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
-int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
 bool xfs_attr_namecheck(const void *name, size_t length);
 void xfs_delattr_context_init(struct xfs_delattr_context *dac,
 			      struct xfs_da_args *args);
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 7f81b48..ceaefb3 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -443,7 +443,7 @@ xfs_attr_rmtval_get(
  * Find a "hole" in the attribute address space large enough for us to drop the
  * new attribute's value into
  */
-STATIC int
+int
 xfs_attr_rmt_find_hole(
 	struct xfs_da_args	*args)
 {
@@ -470,7 +470,7 @@ xfs_attr_rmt_find_hole(
 	return 0;
 }
 
-STATIC int
+int
 xfs_attr_rmtval_set_value(
 	struct xfs_da_args	*args)
 {
@@ -630,6 +630,69 @@ xfs_attr_rmtval_set(
 }
 
 /*
+ * Find a hole for the attr and store it in the delayed attr context.  This
+ * initializes the context to roll through allocating an attr extent for a
+ * delayed attr operation
+ */
+int
+xfs_attr_rmtval_find_space(
+	struct xfs_delattr_context	*dac)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_bmbt_irec		*map = &dac->map;
+	int				error;
+
+	dac->lblkno = 0;
+	dac->blkcnt = 0;
+	args->rmtblkcnt = 0;
+	args->rmtblkno = 0;
+	memset(map, 0, sizeof(struct xfs_bmbt_irec));
+
+	error = xfs_attr_rmt_find_hole(args);
+	if (error)
+		return error;
+
+	dac->blkcnt = args->rmtblkcnt;
+	dac->lblkno = args->rmtblkno;
+
+	return 0;
+}
+
+/*
+ * Write one block of the value associated with an attribute into the
+ * out-of-line buffer that we have defined for it. This is similar to a subset
+ * of xfs_attr_rmtval_set, but records the current block to the delayed attr
+ * context, and leaves transaction handling to the caller.
+ */
+int
+xfs_attr_rmtval_set_blk(
+	struct xfs_delattr_context	*dac)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_inode		*dp = args->dp;
+	struct xfs_bmbt_irec		*map = &dac->map;
+	int nmap;
+	int error;
+
+	nmap = 1;
+	error = xfs_bmapi_write(args->trans, dp, (xfs_fileoff_t)dac->lblkno,
+				dac->blkcnt, XFS_BMAPI_ATTRFORK, args->total,
+				map, &nmap);
+	if (error)
+		return error;
+
+	ASSERT(nmap == 1);
+	ASSERT((map->br_startblock != DELAYSTARTBLOCK) &&
+	       (map->br_startblock != HOLESTARTBLOCK));
+
+	/* roll attribute extent map forwards */
+	dac->lblkno += map->br_blockcount;
+	dac->blkcnt -= map->br_blockcount;
+
+	return 0;
+}
+
+/*
  * Remove the value associated with an attribute by deleting the
  * out-of-line buffer that it is stored on.
  */
@@ -671,40 +734,6 @@ xfs_attr_rmtval_invalidate(
 }
 
 /*
- * Remove the value associated with an attribute by deleting the
- * out-of-line buffer that it is stored on.
- */
-int
-xfs_attr_rmtval_remove(
-	struct xfs_da_args		*args)
-{
-	xfs_dablk_t			lblkno;
-	int				blkcnt;
-	int				error;
-	struct xfs_delattr_context	dac  = {
-		.da_args	= args,
-	};
-
-	trace_xfs_attr_rmtval_remove(args);
-
-	/*
-	 * Keep de-allocating extents until the remote-value region is gone.
-	 */
-	do {
-		error = __xfs_attr_rmtval_remove(&dac);
-		if (error != -EAGAIN)
-			break;
-
-		error = xfs_attr_trans_roll(&dac);
-		if (error)
-			return error;
-
-	} while (true);
-
-	return error;
-}
-
-/*
  * Remove the value associated with an attribute by deleting the out-of-line
  * buffer that it is stored on. Returns EAGAIN for the caller to refresh the
  * transaction and re-call the function
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index 002fd30..84e2700 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -15,4 +15,8 @@ int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
 		xfs_buf_flags_t incore_flags);
 int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
 int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
+int xfs_attr_rmt_find_hole(struct xfs_da_args *args);
+int xfs_attr_rmtval_set_value(struct xfs_da_args *args);
+int xfs_attr_rmtval_set_blk(struct xfs_delattr_context *dac);
+int xfs_attr_rmtval_find_space(struct xfs_delattr_context *dac);
 #endif /* __XFS_ATTR_REMOTE_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index abb1d85..427a091 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1925,7 +1925,6 @@ DEFINE_ATTR_EVENT(xfs_attr_refillstate);
 
 DEFINE_ATTR_EVENT(xfs_attr_rmtval_get);
 DEFINE_ATTR_EVENT(xfs_attr_rmtval_set);
-DEFINE_ATTR_EVENT(xfs_attr_rmtval_remove);
 
 #define DEFINE_DA_EVENT(name) \
 DEFINE_EVENT(xfs_da_class, name, \
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v12 3/8] xfs: Rename __xfs_attr_rmtval_remove
  2020-08-27  0:35 [PATCH v12 0/8] xfs: Delayed Attributes Allison Collins
  2020-08-27  0:35 ` [PATCH v12 1/8] xfs: Add delay ready attr remove routines Allison Collins
  2020-08-27  0:35 ` [PATCH v12 2/8] xfs: Add delay ready attr set routines Allison Collins
@ 2020-08-27  0:35 ` Allison Collins
  2020-08-27  0:35 ` [PATCH v12 4/8] xfs: Set up infastructure for deferred attribute operations Allison Collins
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Allison Collins @ 2020-08-27  0:35 UTC (permalink / raw)
  To: linux-xfs

Now that xfs_attr_rmtval_remove is gone, rename __xfs_attr_rmtval_remove
to xfs_attr_rmtval_remove.

Signed-off-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c        | 7 +++----
 fs/xfs/libxfs/xfs_attr_remote.c | 2 +-
 fs/xfs/libxfs/xfs_attr_remote.h | 3 +--
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 53ae343..a8cfe62 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -880,7 +880,7 @@ xfs_attr_leaf_addname(
 		return error;
 das_rm_lblk:
 	if (args->rmtblkno) {
-		error = __xfs_attr_rmtval_remove(dac);
+		error = xfs_attr_rmtval_remove(dac);
 		if (error == -EAGAIN)
 			dac->dela_state = XFS_DAS_RM_LBLK;
 		if (error)
@@ -1242,8 +1242,7 @@ xfs_attr_node_addname(
 
 das_rm_nblk:
 	if (args->rmtblkno) {
-		error = __xfs_attr_rmtval_remove(dac);
-
+		error = xfs_attr_rmtval_remove(dac);
 		if (error == -EAGAIN) {
 			dac->dela_state = XFS_DAS_RM_NBLK;
 			return -EAGAIN;
@@ -1399,7 +1398,7 @@ xfs_attr_node_remove_rmt (
 	/*
 	 * May return -EAGAIN to request that the caller recall this function
 	 */
-	error = __xfs_attr_rmtval_remove(dac);
+	error = xfs_attr_rmtval_remove(dac);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index ceaefb3..6c48d2e 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -739,7 +739,7 @@ xfs_attr_rmtval_invalidate(
  * transaction and re-call the function
  */
 int
-__xfs_attr_rmtval_remove(
+xfs_attr_rmtval_remove(
 	struct xfs_delattr_context	*dac)
 {
 	struct xfs_da_args		*args = dac->da_args;
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index 84e2700..6ae91af 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -10,11 +10,10 @@ int xfs_attr3_rmt_blocks(struct xfs_mount *mp, int attrlen);
 
 int xfs_attr_rmtval_get(struct xfs_da_args *args);
 int xfs_attr_rmtval_set(struct xfs_da_args *args);
-int xfs_attr_rmtval_remove(struct xfs_da_args *args);
 int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
 		xfs_buf_flags_t incore_flags);
 int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
-int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
+int xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
 int xfs_attr_rmt_find_hole(struct xfs_da_args *args);
 int xfs_attr_rmtval_set_value(struct xfs_da_args *args);
 int xfs_attr_rmtval_set_blk(struct xfs_delattr_context *dac);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v12 4/8] xfs: Set up infastructure for deferred attribute operations
  2020-08-27  0:35 [PATCH v12 0/8] xfs: Delayed Attributes Allison Collins
                   ` (2 preceding siblings ...)
  2020-08-27  0:35 ` [PATCH v12 3/8] xfs: Rename __xfs_attr_rmtval_remove Allison Collins
@ 2020-08-27  0:35 ` Allison Collins
  2020-08-28 21:27   ` Darrick J. Wong
  2020-08-27  0:35 ` [PATCH v12 5/8] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred Allison Collins
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Allison Collins @ 2020-08-27  0:35 UTC (permalink / raw)
  To: linux-xfs

Currently, attributes are modified directly across one or more
transactions, but these operations are not logged or replayed in the
event of an error. The goal of delayed attributes is to enable logging
and replaying of attribute operations using the existing delayed
operations infrastructure.  This will later enable attributes to become
part of larger multi-part operations that must also first be recorded
to the log.  This is mostly of interest for parent pointers, which
need to maintain an attribute containing parent inode information
any time an inode is moved, created, or removed.  Parent pointers would
then be of interest to any feature that needs to quickly derive an
inode path from the mount point. Online scrub, NFS lookups, and
filesystem grow or shrink operations are all features that could take
advantage of this.

This patch adds two new log item types for setting or removing
attributes as deferred operations.  The xfs_attri_log_item logs an
intent to set or remove an attribute.  The corresponding
xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
freed once the transaction is done.  The intent is laid out in the log
by xfs_attri_log_format, which records the inode, the name and value
lengths, the attr flags, and an op_flags field that indicates whether
the operation is a set or a remove.  The done item uses the smaller
xfs_attrd_log_format, which carries the id of its intent.

Signed-off-by: Allison Collins <allison.henderson@oracle.com>
---
 fs/xfs/Makefile                 |   1 +
 fs/xfs/libxfs/xfs_attr.c        |   7 +-
 fs/xfs/libxfs/xfs_attr.h        |  39 ++
 fs/xfs/libxfs/xfs_defer.c       |   1 +
 fs/xfs/libxfs/xfs_defer.h       |   3 +
 fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
 fs/xfs/libxfs/xfs_log_recover.h |   2 +
 fs/xfs/libxfs/xfs_types.h       |   1 +
 fs/xfs/scrub/common.c           |   2 +
 fs/xfs/xfs_acl.c                |   2 +
 fs/xfs/xfs_attr_item.c          | 829 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_attr_item.h          |  76 ++++
 fs/xfs/xfs_attr_list.c          |   1 +
 fs/xfs/xfs_ioctl.c              |   2 +
 fs/xfs/xfs_ioctl32.c            |   2 +
 fs/xfs/xfs_iops.c               |   2 +
 fs/xfs/xfs_log.c                |   4 +
 fs/xfs/xfs_log_recover.c        |   2 +
 fs/xfs/xfs_ondisk.h             |   2 +
 fs/xfs/xfs_xattr.c              |   1 +
 20 files changed, 1017 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 04611a1..b056cfc 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
 				   xfs_buf_item_recover.o \
 				   xfs_dquot_item_recover.o \
 				   xfs_extfree_item.o \
+				   xfs_attr_item.o \
 				   xfs_icreate_item.o \
 				   xfs_inode_item.o \
 				   xfs_inode_item_recover.o \
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index a8cfe62..cf75742 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -24,6 +24,7 @@
 #include "xfs_quota.h"
 #include "xfs_trans_space.h"
 #include "xfs_trace.h"
+#include "xfs_attr_item.h"
 
 /*
  * xfs_attr.c
@@ -59,8 +60,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
 STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
 STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
-STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
-			     struct xfs_buf **leaf_bp);
 
 int
 xfs_inode_hasattr(
@@ -142,7 +141,7 @@ xfs_attr_get(
 /*
  * Calculate how many blocks we need for the new attribute,
  */
-STATIC int
+int
 xfs_attr_calc_size(
 	struct xfs_da_args	*args,
 	int			*local)
@@ -327,7 +326,7 @@ xfs_attr_set_args(
  * to handle this, and recall the function until a successful error code is
  * returned.
  */
-STATIC int
+int
 xfs_attr_set_iter(
 	struct xfs_delattr_context	*dac,
 	struct xfs_buf			**leaf_bp)
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 4f6bba8..23b8308 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -247,6 +247,7 @@ enum xfs_delattr_state {
 #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
 #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
 #define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
+#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
 
 /*
  * Context used for keeping track of delayed attribute operations
@@ -254,6 +255,9 @@ enum xfs_delattr_state {
 struct xfs_delattr_context {
 	struct xfs_da_args      *da_args;
 
+	/* Used by delayed attributes to hold leaf across transactions */
+	struct xfs_buf		*leaf_bp;
+
 	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
 	struct xfs_bmbt_irec	map;
 	xfs_dablk_t		lblkno;
@@ -268,6 +272,38 @@ struct xfs_delattr_context {
 	enum xfs_delattr_state  dela_state;
 };
 
+/*
+ * List of attrs to commit later.
+ */
+struct xfs_attr_item {
+	struct xfs_inode	*xattri_ip;
+	void			*xattri_value;		/* attr value */
+	void			*xattri_name;		/* attr name */
+	uint32_t		xattri_op_flags;	/* attr op set or rm */
+	uint32_t		xattri_value_len;	/* length of value */
+	uint32_t		xattri_name_len;	/* length of name */
+	uint32_t		xattri_flags;		/* attr flags */
+
+	/* used to log this item to an intent */
+	struct list_head	xattri_list;
+
+	/*
+	 * xfs_delattr_context and xfs_da_args need to remain instantiated
+	 * across transaction rolls during the defer finish, so store them here
+	 */
+	struct xfs_da_args		xattri_args;
+	struct xfs_delattr_context	xattri_dac;
+
+	/*
+	 * A byte array follows the header containing the file name and
+	 * attribute value.
+	 */
+};
+
+#define XFS_ATTR_ITEM_SIZEOF(namelen, valuelen)	\
+	(sizeof(struct xfs_attr_item) + (namelen) + (valuelen))
+
+
 /*========================================================================
  * Function prototypes for the kernel.
  *========================================================================*/
@@ -283,11 +319,14 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_args(struct xfs_da_args *args);
+int xfs_attr_set_iter(struct xfs_delattr_context *dac,
+		      struct xfs_buf **leaf_bp);
 int xfs_has_attr(struct xfs_da_args *args);
 int xfs_attr_remove_args(struct xfs_da_args *args);
 int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
 bool xfs_attr_namecheck(const void *name, size_t length);
 void xfs_delattr_context_init(struct xfs_delattr_context *dac,
 			      struct xfs_da_args *args);
+int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 
 #endif	/* __XFS_ATTR_H__ */
diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index d8f5862..4392279 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -176,6 +176,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
 	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
 	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
 	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
+	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
 };
 
 static void
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 6b2ca58..193d3bb 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -18,6 +18,7 @@ enum xfs_defer_ops_type {
 	XFS_DEFER_OPS_TYPE_RMAP,
 	XFS_DEFER_OPS_TYPE_FREE,
 	XFS_DEFER_OPS_TYPE_AGFL_FREE,
+	XFS_DEFER_OPS_TYPE_ATTR,
 	XFS_DEFER_OPS_TYPE_MAX,
 };
 
@@ -62,5 +63,7 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
 extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
 extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
+extern const struct xfs_defer_op_type xfs_attr_defer_type;
+
 
 #endif /* __XFS_DEFER_H__ */
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index e3400c9..33b26b6 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -117,7 +117,12 @@ struct xfs_unmount_log_format {
 #define XLOG_REG_TYPE_CUD_FORMAT	24
 #define XLOG_REG_TYPE_BUI_FORMAT	25
 #define XLOG_REG_TYPE_BUD_FORMAT	26
-#define XLOG_REG_TYPE_MAX		26
+#define XLOG_REG_TYPE_ATTRI_FORMAT	27
+#define XLOG_REG_TYPE_ATTRD_FORMAT	28
+#define XLOG_REG_TYPE_ATTR_NAME	29
+#define XLOG_REG_TYPE_ATTR_VALUE	30
+#define XLOG_REG_TYPE_MAX		30
+
 
 /*
  * Flags to log operation header
@@ -240,6 +245,8 @@ typedef struct xfs_trans_header {
 #define	XFS_LI_CUD		0x1243
 #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
 #define	XFS_LI_BUD		0x1245
+#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
+#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
 
 #define XFS_LI_TYPE_DESC \
 	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
@@ -255,7 +262,9 @@ typedef struct xfs_trans_header {
 	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
 	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
 	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
-	{ XFS_LI_BUD,		"XFS_LI_BUD" }
+	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
+	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
+	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
 
 /*
  * Inode Log Item Format definitions.
@@ -860,4 +869,35 @@ struct xfs_icreate_log {
 	__be32		icl_gen;	/* inode generation number to use */
 };
 
+/*
+ * Flags for deferred attribute operations.
+ * Upper bits are flags, lower byte is type code
+ */
+#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
+#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
+#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF	/* Flags type mask */
+
+/*
+ * This is the structure used to lay out an attr log item in the
+ * log.
+ */
+struct xfs_attri_log_format {
+	uint16_t	alfi_type;	/* attri log item type */
+	uint16_t	alfi_size;	/* size of this item */
+	uint32_t	__pad;		/* pad to 64 bit aligned */
+	uint64_t	alfi_id;	/* attri identifier */
+	xfs_ino_t       alfi_ino;	/* the inode for this attr operation */
+	uint32_t        alfi_op_flags;	/* marks the op as a set or remove */
+	uint32_t        alfi_name_len;	/* attr name length */
+	uint32_t        alfi_value_len;	/* attr value length */
+	uint32_t        alfi_attr_flags;/* attr flags */
+};
+
+struct xfs_attrd_log_format {
+	uint16_t	alfd_type;	/* attrd log item type */
+	uint16_t	alfd_size;	/* size of this item */
+	uint32_t	__pad;		/* pad to 64 bit aligned */
+	uint64_t	alfd_alf_id;	/* id of corresponding attri */
+};
+
 #endif /* __XFS_LOG_FORMAT_H__ */
diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 641132d..b0b8e94 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
 extern const struct xlog_recover_item_ops xlog_rud_item_ops;
 extern const struct xlog_recover_item_ops xlog_cui_item_ops;
 extern const struct xlog_recover_item_ops xlog_cud_item_ops;
+extern const struct xlog_recover_item_ops xlog_attri_item_ops;
+extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
 
 /*
  * Macros, structures, prototypes for internal log manager use.
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index 397d947..860cdd2 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -11,6 +11,7 @@ typedef uint32_t	prid_t;		/* project ID */
 typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
 typedef uint32_t	xfs_agino_t;	/* inode # within allocation grp */
 typedef uint32_t	xfs_extlen_t;	/* extent length in blocks */
+typedef uint32_t	xfs_attrlen_t;	/* attr length */
 typedef uint32_t	xfs_agnumber_t;	/* allocation group number */
 typedef int32_t		xfs_extnum_t;	/* # of extents in a file */
 typedef int16_t		xfs_aextnum_t;	/* # extents in an attribute fork */
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 1887605..9a649d1 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -24,6 +24,8 @@
 #include "xfs_rmap_btree.h"
 #include "xfs_log.h"
 #include "xfs_trans_priv.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_reflink.h"
 #include "scrub/scrub.h"
diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index d4c687b5c..2fa173a 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -10,6 +10,8 @@
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
 #include "xfs_inode.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_trace.h"
 #include "xfs_error.h"
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
new file mode 100644
index 0000000..923c288
--- /dev/null
+++ b/fs/xfs/xfs_attr_item.c
@@ -0,0 +1,829 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Allison Collins <allison.henderson@oracle.com>
+ */
+
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_shared.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_trans.h"
+#include "xfs_trans_priv.h"
+#include "xfs_buf_item.h"
+#include "xfs_attr_item.h"
+#include "xfs_log.h"
+#include "xfs_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr.h"
+#include "xfs_shared.h"
+#include "xfs_attr_item.h"
+#include "xfs_alloc.h"
+#include "xfs_bmap.h"
+#include "xfs_trace.h"
+#include "libxfs/xfs_da_format.h"
+#include "xfs_inode.h"
+#include "xfs_quota.h"
+#include "xfs_log_priv.h"
+#include "xfs_log_recover.h"
+
+static const struct xfs_item_ops xfs_attri_item_ops;
+static const struct xfs_item_ops xfs_attrd_item_ops;
+
+static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
+{
+	return container_of(lip, struct xfs_attri_log_item, attri_item);
+}
+
+STATIC void
+xfs_attri_item_free(
+	struct xfs_attri_log_item	*attrip)
+{
+	kmem_free(attrip->attri_item.li_lv_shadow);
+	kmem_free(attrip);
+}
+
+/*
+ * Freeing the attrip requires that we remove it from the AIL if it has already
+ * been placed there. However, the ATTRI may not yet have been placed in the
+ * AIL when called by xfs_attri_release() from ATTRD processing due to the
+ * ordering of committed vs unpin operations in bulk insert operations. Hence
+ * the reference count to ensure only the last caller frees the ATTRI.
+ */
+STATIC void
+xfs_attri_release(
+	struct xfs_attri_log_item	*attrip)
+{
+	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
+	if (atomic_dec_and_test(&attrip->attri_refcount)) {
+		xfs_trans_ail_delete(&attrip->attri_item,
+				     SHUTDOWN_LOG_IO_ERROR);
+		xfs_attri_item_free(attrip);
+	}
+}
+
+/*
+ * This returns the size in bytes of the attri log format structure.  The name
+ * and value are logged in their own iovecs and are accounted for separately
+ * in xfs_attri_item_size().
+ */
+static inline int
+xfs_attri_item_sizeof(
+	struct xfs_attri_log_item *attrip)
+{
+	return sizeof(struct xfs_attri_log_format);
+}
+
+STATIC void
+xfs_attri_item_size(
+	struct xfs_log_item	*lip,
+	int			*nvecs,
+	int			*nbytes)
+{
+	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
+
+	*nvecs += 1;
+	*nbytes += xfs_attri_item_sizeof(attrip);
+
+	/* Attr set and remove operations require a name */
+	ASSERT(attrip->attri_name_len > 0);
+
+	*nvecs += 1;
+	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
+
+	/*
+	 * Set ops can accept a value of 0 len to clear an attr value.  Remove
+	 * ops do not need a value at all.  So only account for the value
+	 * when it is needed.
+	 */
+	if (attrip->attri_value_len > 0) {
+		*nvecs += 1;
+		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
+	}
+}
+
+/*
+ * This is called to fill in the log iovecs for the given attri log
+ * item. We use 1 iovec for the attri_format_item, 1 for the name, and
+ * another for the value if it is present.
+ */
+STATIC void
+xfs_attri_item_format(
+	struct xfs_log_item	*lip,
+	struct xfs_log_vec	*lv)
+{
+	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
+	struct xfs_log_iovec		*vecp = NULL;
+
+	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
+	attrip->attri_format.alfi_size = 1;
+
+	/*
+	 * This size accounting must be done before copying the attrip into the
+	 * iovec.  If we do it after, the wrong size will be recorded to the log
+	 * and we trip across assertion checks for bad region sizes later during
+	 * the log recovery.
+	 */
+
+	ASSERT(attrip->attri_name_len > 0);
+	attrip->attri_format.alfi_size++;
+
+	if (attrip->attri_value_len > 0)
+		attrip->attri_format.alfi_size++;
+
+	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
+			&attrip->attri_format,
+			xfs_attri_item_sizeof(attrip));
+	if (attrip->attri_name_len > 0)
+		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
+				attrip->attri_name,
+				ATTR_NVEC_SIZE(attrip->attri_name_len));
+
+	if (attrip->attri_value_len > 0)
+		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
+				attrip->attri_value,
+				ATTR_NVEC_SIZE(attrip->attri_value_len));
+}
+
+/*
+ * The unpin operation is the last place an ATTRI is manipulated in the log. It
+ * is either inserted in the AIL or aborted in the event of a log I/O error. In
+ * either case, the ATTRI transaction has been successfully committed to make
+ * it this far. Therefore, we expect whoever committed the ATTRI to either
+ * construct and commit the ATTRD or drop the ATTRD's reference in the event of
+ * error. Simply drop the log's ATTRI reference now that the log is done with
+ * it.
+ */
+STATIC void
+xfs_attri_item_unpin(
+	struct xfs_log_item	*lip,
+	int			remove)
+{
+	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
+
+	xfs_attri_release(attrip);
+}
+
+
+STATIC void
+xfs_attri_item_release(
+	struct xfs_log_item	*lip)
+{
+	xfs_attri_release(ATTRI_ITEM(lip));
+}
+
+/*
+ * Allocate and initialize an attri item
+ */
+STATIC struct xfs_attri_log_item *
+xfs_attri_init(
+	struct xfs_mount	*mp)
+
+{
+	struct xfs_attri_log_item	*attrip;
+	uint				size;
+
+	size = (uint)(sizeof(struct xfs_attri_log_item));
+	attrip = kmem_zalloc(size, 0);
+
+	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
+			  &xfs_attri_item_ops);
+	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
+	atomic_set(&attrip->attri_refcount, 2);
+
+	return attrip;
+}
+
+/*
+ * Copy an attr format buffer from the given buf, and into the destination attr
+ * format structure.
+ */
+STATIC int
+xfs_attri_copy_format(struct xfs_log_iovec *buf,
+		      struct xfs_attri_log_format *dst_attr_fmt)
+{
+	struct xfs_attri_log_format *src_attr_fmt = buf->i_addr;
+	uint len = sizeof(struct xfs_attri_log_format);
+
+	if (buf->i_len != len)
+		return -EFSCORRUPTED;
+
+	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
+	return 0;
+}
+
+static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
+{
+	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
+}
+
+STATIC void
+xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
+{
+	kmem_free(attrdp->attrd_item.li_lv_shadow);
+	kmem_free(attrdp);
+}
+
+/*
+ * This returns the size in bytes of the attrd log format structure.  An attrd
+ * item needs only 1 iovec, which logs the xfs_attrd_log_format
+ * structure.
+ */
+static inline int
+xfs_attrd_item_sizeof(
+	struct xfs_attrd_log_item *attrdp)
+{
+	return sizeof(struct xfs_attrd_log_format);
+}
+
+STATIC void
+xfs_attrd_item_size(
+	struct xfs_log_item	*lip,
+	int			*nvecs,
+	int			*nbytes)
+{
+	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
+	*nvecs += 1;
+	*nbytes += xfs_attrd_item_sizeof(attrdp);
+}
+
+/*
+ * This is called to fill in the log iovecs for the given attrd log item. We use
+ * only 1 iovec for the attrd_format, and we point that at the attr_log_format
+ * structure embedded in the attrd item.
+ */
+STATIC void
+xfs_attrd_item_format(
+	struct xfs_log_item	*lip,
+	struct xfs_log_vec	*lv)
+{
+	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
+	struct xfs_log_iovec		*vecp = NULL;
+
+	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
+	attrdp->attrd_format.alfd_size = 1;
+
+	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
+			&attrdp->attrd_format, xfs_attrd_item_sizeof(attrdp));
+}
+
+/*
+ * The ATTRD is either committed or aborted if the transaction is cancelled. If
+ * the transaction is cancelled, drop our reference to the ATTRI and free the
+ * ATTRD.
+ */
+STATIC void
+xfs_attrd_item_release(
+	struct xfs_log_item     *lip)
+{
+	struct xfs_attrd_log_item *attrdp = ATTRD_ITEM(lip);
+	xfs_attri_release(attrdp->attrd_attrip);
+	xfs_attrd_item_free(attrdp);
+}
+
+/*
+ * Perform the attr operation tracked by an ATTRI and mark the ATTRD dirty.
+ * An attr operation may be a set or a remove.  Note that the transaction is
+ * marked dirty regardless of whether the operation succeeds or fails to
+ * support the ATTRI/ATTRD lifecycle rules.
+ */
+int
+xfs_trans_attr(
+	struct xfs_delattr_context	*dac,
+	struct xfs_attrd_log_item	*attrdp,
+	struct xfs_buf			**leaf_bp,
+	uint32_t			op_flags)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	int				error;
+
+	error = xfs_qm_dqattach_locked(args->dp, 0);
+	if (error)
+		return error;
+
+	switch (op_flags) {
+	case XFS_ATTR_OP_FLAGS_SET:
+		args->op_flags |= XFS_DA_OP_ADDNAME;
+		error = xfs_attr_set_iter(dac, leaf_bp);
+		break;
+	case XFS_ATTR_OP_FLAGS_REMOVE:
+		ASSERT(XFS_IFORK_Q((args->dp)));
+		error = xfs_attr_remove_iter(dac);
+		break;
+	default:
+		error = -EFSCORRUPTED;
+		break;
+	}
+
+	/*
+	 * Mark the transaction dirty, even on error. This ensures the
+	 * transaction is aborted, which:
+	 *
+	 * 1.) releases the ATTRI and frees the ATTRD
+	 * 2.) shuts down the filesystem
+	 */
+	args->trans->t_flags |= XFS_TRANS_DIRTY;
+	set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
+
+	return error;
+}
+
+/* Log an attr to the intent item. */
+STATIC void
+xfs_attr_log_item(
+	struct xfs_trans		*tp,
+	struct xfs_attri_log_item	*attrip,
+	struct xfs_attr_item		*attr)
+{
+	struct xfs_attri_log_format	*attrp;
+	char				*name_value;
+
+	name_value = ((char *)attr) + sizeof(struct xfs_attr_item);
+
+	tp->t_flags |= XFS_TRANS_DIRTY;
+	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
+
+	/*
+	 * At this point the xfs_attr_item has been constructed, and we've
+	 * created the log intent. Fill in the attri log item and log format
+	 * structure with fields from this xfs_attr_item
+	 */
+	attrp = &attrip->attri_format;
+	attrp->alfi_ino = attr->xattri_ip->i_ino;
+	attrp->alfi_op_flags = attr->xattri_op_flags;
+	attrp->alfi_value_len = attr->xattri_value_len;
+	attrp->alfi_name_len = attr->xattri_name_len;
+	attrp->alfi_attr_flags = attr->xattri_flags;
+
+	attrip->attri_name = name_value;
+	attrip->attri_value = &name_value[attr->xattri_name_len];
+	attrip->attri_name_len = attr->xattri_name_len;
+	attrip->attri_value_len = attr->xattri_value_len;
+}
+
+/* Get an ATTRI. */
+static struct xfs_log_item *
+xfs_attr_create_intent(
+	struct xfs_trans		*tp,
+	struct list_head		*items,
+	unsigned int			count,
+	bool				sort)
+{
+	struct xfs_mount		*mp = tp->t_mountp;
+	struct xfs_attri_log_item	*attrip = xfs_attri_init(mp);
+	struct xfs_attr_item		*attr;
+
+	ASSERT(count == 1);
+
+	xfs_trans_add_item(tp, &attrip->attri_item);
+	list_for_each_entry(attr, items, xattri_list)
+		xfs_attr_log_item(tp, attrip, attr);
+	return &attrip->attri_item;
+}
+
+/* Process an attr. */
+STATIC int
+xfs_attr_finish_item(
+	struct xfs_trans		*tp,
+	struct xfs_log_item		*done,
+	struct list_head		*item,
+	struct xfs_btree_cur		**state)
+{
+	struct xfs_attr_item		*attr;
+	int				error;
+	int				local;
+	struct xfs_delattr_context	*dac;
+	struct xfs_da_args		*args;
+	struct xfs_attrd_log_item	*attrdp;
+	struct xfs_attri_log_item	*attrip;
+
+	attr = container_of(item, struct xfs_attr_item, xattri_list);
+	dac = &attr->xattri_dac;
+	args = &attr->xattri_args;
+
+	if (!(dac->flags & XFS_DAC_DELAYED_OP_INIT)) {
+		/* Only need to initialize args context once */
+		memset(args, 0, sizeof(*args));
+		args->geo = attr->xattri_ip->i_mount->m_attr_geo;
+		args->whichfork = XFS_ATTR_FORK;
+		args->dp = attr->xattri_ip;
+		args->name = ((const unsigned char *)attr) +
+			      sizeof(struct xfs_attr_item);
+		args->namelen = attr->xattri_name_len;
+		args->attr_filter = attr->xattri_flags;
+		args->hashval = xfs_da_hashname(args->name, args->namelen);
+		args->value = (void *)&args->name[attr->xattri_name_len];
+		args->valuelen = attr->xattri_value_len;
+		args->op_flags = XFS_DA_OP_OKNOENT;
+
+		/* must match existing transaction block res */
+		args->total = xfs_attr_calc_size(args, &local);
+
+		memset(dac, 0, sizeof(struct xfs_delattr_context));
+		dac->flags |= XFS_DAC_DELAYED_OP_INIT;
+		dac->da_args = args;
+	}
+
+	/*
+	 * Always reset trans after EAGAIN cycle
+	 * since the transaction is new
+	 */
+	args->trans = tp;
+
+	error = xfs_trans_attr(dac, ATTRD_ITEM(done), &dac->leaf_bp,
+			       attr->xattri_op_flags);
+	/*
+	 * The attrip refers to xfs_attr_item memory to log the name and value
+	 * with the intent item. This already occurred when the intent was
+	 * committed so these fields are no longer accessed. Clear them out of
+	 * caution since we're about to free the xfs_attr_item.
+	 */
+	attrdp = (struct xfs_attrd_log_item *)done;
+	attrip = attrdp->attrd_attrip;
+	attrip->attri_name = NULL;
+	attrip->attri_value = NULL;
+
+	if (error != -EAGAIN)
+		kmem_free(attr);
+
+	return error;
+}
+
+/* Abort all pending ATTRs. */
+STATIC void
+xfs_attr_abort_intent(
+	struct xfs_log_item		*intent)
+{
+	xfs_attri_release(ATTRI_ITEM(intent));
+}
+
+/* Cancel an attr */
+STATIC void
+xfs_attr_cancel_item(
+	struct list_head		*item)
+{
+	struct xfs_attr_item		*attr;
+
+	attr = container_of(item, struct xfs_attr_item, xattri_list);
+	kmem_free(attr);
+}
+
+/*
+ * The ATTRI is logged only once and cannot be moved in the log, so simply
+ * return the lsn at which it's been logged.
+ */
+STATIC xfs_lsn_t
+xfs_attri_item_committed(
+	struct xfs_log_item	*lip,
+	xfs_lsn_t		lsn)
+{
+	return lsn;
+}
+
+STATIC void
+xfs_attri_item_committing(
+	struct xfs_log_item	*lip,
+	xfs_lsn_t		lsn)
+{
+}
+
+STATIC bool
+xfs_attri_item_match(
+	struct xfs_log_item	*lip,
+	uint64_t		intent_id)
+{
+	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
+}
+
+/*
+ * When the attrd item is committed to disk, all we need to do is delete our
+ * reference to our partner attri item and then free ourselves. Since we're
+ * freeing ourselves we must return -1 to keep the transaction code from
+ * further referencing this item.
+ */
+STATIC xfs_lsn_t
+xfs_attrd_item_committed(
+	struct xfs_log_item	*lip,
+	xfs_lsn_t		lsn)
+{
+	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
+
+	/*
+	 * Drop the ATTRI reference regardless of whether the ATTRD has been
+	 * aborted. Once the ATTRD transaction is constructed, it is the sole
+	 * responsibility of the ATTRD to release the ATTRI (even if the ATTRI
+	 * is aborted due to log I/O error).
+	 */
+	xfs_attri_release(attrdp->attrd_attrip);
+	xfs_attrd_item_free(attrdp);
+
+	return NULLCOMMITLSN;
+}
+
+STATIC void
+xfs_attrd_item_committing(
+	struct xfs_log_item	*lip,
+	xfs_lsn_t		lsn)
+{
+}
+
+
+/*
+ * Allocate and initialize an attrd item
+ */
+struct xfs_attrd_log_item *
+xfs_attrd_init(
+	struct xfs_mount		*mp,
+	struct xfs_attri_log_item	*attrip)
+
+{
+	struct xfs_attrd_log_item	*attrdp;
+	uint				size;
+
+	size = (uint)(sizeof(struct xfs_attrd_log_item));
+	attrdp = kmem_zalloc(size, 0);
+	memset(attrdp, 0, size);
+
+	xfs_log_item_init(mp, &attrdp->attrd_item, XFS_LI_ATTRD,
+			  &xfs_attrd_item_ops);
+	attrdp->attrd_attrip = attrip;
+	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
+
+	return attrdp;
+}
+
+/*
+ * This routine is called to allocate an "attr done" log item.
+ */
+struct xfs_attrd_log_item *
+xfs_trans_get_attrd(struct xfs_trans		*tp,
+		  struct xfs_attri_log_item	*attrip)
+{
+	struct xfs_attrd_log_item		*attrdp;
+
+	ASSERT(tp != NULL);
+
+	attrdp = xfs_attrd_init(tp->t_mountp, attrip);
+	ASSERT(attrdp != NULL);
+
+	xfs_trans_add_item(tp, &attrdp->attrd_item);
+	return attrdp;
+}
+
+static const struct xfs_item_ops xfs_attrd_item_ops = {
+	.iop_size	= xfs_attrd_item_size,
+	.iop_format	= xfs_attrd_item_format,
+	.iop_release    = xfs_attrd_item_release,
+	.iop_committing	= xfs_attrd_item_committing,
+	.iop_committed	= xfs_attrd_item_committed,
+};
+
+
+/* Get an ATTRD so we can process all the attrs. */
+static struct xfs_log_item *
+xfs_attr_create_done(
+	struct xfs_trans		*tp,
+	struct xfs_log_item		*intent,
+	unsigned int			count)
+{
+	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
+}
+
+const struct xfs_defer_op_type xfs_attr_defer_type = {
+	.max_items	= 1,
+	.create_intent	= xfs_attr_create_intent,
+	.abort_intent	= xfs_attr_abort_intent,
+	.create_done	= xfs_attr_create_done,
+	.finish_item	= xfs_attr_finish_item,
+	.cancel_item	= xfs_attr_cancel_item,
+};
+
+/*
+ * Process an attr intent item that was recovered from the log.  We need to
+ * replay the set or remove operation that it describes.
+ */
+STATIC int
+xfs_attri_item_recover(
+	struct xfs_log_item		*lip,
+	struct xfs_trans		*parent_tp)
+{
+	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
+	struct xfs_mount		*mp = parent_tp->t_mountp;
+	struct xfs_inode		*ip;
+	struct xfs_attrd_log_item	*attrdp;
+	struct xfs_da_args		args;
+	struct xfs_attri_log_format	*attrp;
+	struct xfs_trans_res		tres;
+	int				local;
+	int				error, err2 = 0;
+	int				rsvd = 0;
+	struct xfs_buf			*leaf_bp = NULL;
+	struct xfs_delattr_context	dac = {
+		.da_args	= &args,
+	};
+
+	/*
+	 * First check the validity of the attr described by the ATTRI.  If any
+	 * field is bad, treat the whole item as corrupt and toss the ATTRI.
+	 */
+	attrp = &attrip->attri_format;
+	if (!(attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET ||
+	      attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE) ||
+	    (attrp->alfi_value_len > XATTR_SIZE_MAX) ||
+	    (attrp->alfi_name_len > XATTR_NAME_MAX) ||
+	    (attrp->alfi_name_len == 0)) {
+		/*
+		 * This will pull the ATTRI from the AIL and free the memory
+		 * associated with it.
+		 */
+		xfs_attri_release(attrip);
+		return -EFSCORRUPTED;
+	}
+
+	error = xfs_iget(mp, 0, attrp->alfi_ino, 0, 0, &ip);
+	if (error)
+		return error;
+
+	memset(&args, 0, sizeof(args));
+	args.geo = ip->i_mount->m_attr_geo;
+	args.whichfork = XFS_ATTR_FORK;
+	args.dp = ip;
+	args.name = attrip->attri_name;
+	args.namelen = attrp->alfi_name_len;
+	args.attr_filter = attrp->alfi_attr_flags;
+	args.hashval = xfs_da_hashname(attrip->attri_name,
+					attrp->alfi_name_len);
+	args.value = attrip->attri_value;
+	args.valuelen = attrp->alfi_value_len;
+	args.op_flags = XFS_DA_OP_OKNOENT;
+	args.total = xfs_attr_calc_size(&args, &local);
+
+	tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres +
+			M_RES(mp)->tr_attrsetrt.tr_logres * args.total;
+	tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
+	tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
+
+	error = xfs_trans_alloc(mp, &tres, args.total,  0,
+				rsvd ? XFS_TRANS_RESERVE : 0, &args.trans);
+	if (error)
+		goto out_rele;
+	attrdp = xfs_trans_get_attrd(args.trans, attrip);
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+	xfs_trans_ijoin(args.trans, ip, 0);
+
+	do {
+		error = xfs_trans_attr(&dac, attrdp, &leaf_bp,
+				       attrp->alfi_op_flags);
+		if (error && error != -EAGAIN)
+			goto abort_error;
+
+		xfs_trans_log_inode(args.trans, ip,
+				XFS_ILOG_CORE | XFS_ILOG_ADATA);
+
+		err2 = xfs_trans_roll(&args.trans);
+		if (err2) {
+			error = err2;
+			goto abort_error;
+		}
+
+		/* Rejoin inode and leaf if needed */
+		xfs_trans_ijoin(args.trans, ip, 0);
+		if (leaf_bp) {
+			xfs_trans_bjoin(args.trans, leaf_bp);
+			xfs_trans_bhold(args.trans, leaf_bp);
+		}
+
+	} while (error == -EAGAIN);
+
+	error = xfs_trans_commit(args.trans);
+	if (error)
+		goto abort_error;
+
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_irele(ip);
+	return error;
+
+abort_error:
+	xfs_trans_cancel(args.trans);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+out_rele:
+	xfs_irele(ip);
+	return error;
+}
+
+static const struct xfs_item_ops xfs_attri_item_ops = {
+	.iop_size	= xfs_attri_item_size,
+	.iop_format	= xfs_attri_item_format,
+	.iop_unpin	= xfs_attri_item_unpin,
+	.iop_committed	= xfs_attri_item_committed,
+	.iop_committing = xfs_attri_item_committing,
+	.iop_release    = xfs_attri_item_release,
+	.iop_recover	= xfs_attri_item_recover,
+	.iop_match	= xfs_attri_item_match,
+};
+
+
+
+STATIC int
+xlog_recover_attri_commit_pass2(
+	struct xlog                     *log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item        *item,
+	xfs_lsn_t                       lsn)
+{
+	int                             error;
+	struct xfs_mount                *mp = log->l_mp;
+	struct xfs_attri_log_item       *attrip;
+	struct xfs_attri_log_format     *attri_formatp;
+	char				*name = NULL;
+	char				*value = NULL;
+	int				region = 0;
+
+	attri_formatp = item->ri_buf[region].i_addr;
+
+	attrip = xfs_attri_init(mp);
+	error = xfs_attri_copy_format(&item->ri_buf[region],
+				      &attrip->attri_format);
+	if (error) {
+		xfs_attri_item_free(attrip);
+		return error;
+	}
+
+	attrip->attri_name_len = attri_formatp->alfi_name_len;
+	attrip->attri_value_len = attri_formatp->alfi_value_len;
+	attrip = kmem_realloc(attrip, sizeof(struct xfs_attri_log_item) +
+			      attrip->attri_name_len + attrip->attri_value_len,
+			      0);
+
+	ASSERT(attrip->attri_name_len > 0);
+	region++;
+	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
+	memcpy(name, item->ri_buf[region].i_addr,
+	       attrip->attri_name_len);
+	attrip->attri_name = name;
+
+	if (attrip->attri_value_len > 0) {
+		region++;
+		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
+			attrip->attri_name_len;
+		memcpy(value, item->ri_buf[region].i_addr,
+			attrip->attri_value_len);
+		attrip->attri_value = value;
+	}
+
+	/*
+	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
+	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
+	 * directly and drop the ATTRI reference. Note that
+	 * xfs_trans_ail_update() drops the AIL lock.
+	 */
+	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
+	xfs_attri_release(attrip);
+	return 0;
+}
+
+const struct xlog_recover_item_ops xlog_attri_item_ops = {
+	.item_type	= XFS_LI_ATTRI,
+	.commit_pass2	= xlog_recover_attri_commit_pass2,
+};
+
+/*
+ * This routine is called when an ATTRD format structure is found in a committed
+ * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
+ * it was still in the log. To do this it searches the AIL for the ATTRI with
+ * an id equal to that in the ATTRD format structure. If we find it we drop
+ * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
+ */
+STATIC int
+xlog_recover_attrd_commit_pass2(
+	struct xlog			*log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			lsn)
+{
+	struct xfs_attrd_log_format	*attrd_formatp;
+
+	attrd_formatp = item->ri_buf[0].i_addr;
+	ASSERT((item->ri_buf[0].i_len ==
+				(sizeof(struct xfs_attrd_log_format))));
+
+	xlog_recover_release_intent(log, XFS_LI_ATTRI,
+				    attrd_formatp->alfd_alf_id);
+	return 0;
+}
+
+const struct xlog_recover_item_ops xlog_attrd_item_ops = {
+	.item_type	= XFS_LI_ATTRD,
+	.commit_pass2	= xlog_recover_attrd_commit_pass2,
+};
diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
new file mode 100644
index 0000000..7dd2572
--- /dev/null
+++ b/fs/xfs/xfs_attr_item.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (C) 2019 Oracle.  All Rights Reserved.
+ * Author: Allison Collins <allison.henderson@oracle.com>
+ */
+#ifndef	__XFS_ATTR_ITEM_H__
+#define	__XFS_ATTR_ITEM_H__
+
+/* kernel only ATTRI/ATTRD definitions */
+
+struct xfs_mount;
+struct kmem_zone;
+
+/*
+ * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
+ */
+#define	XFS_ATTRI_RECOVERED	1
+
+
+/* iovec length must be 32-bit aligned */
+#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
+				size + sizeof(int32_t) - \
+				(size % sizeof(int32_t)))
+
+/*
+ * This is the "attr intention" log item.  It is used to log the fact that some
+ * attribute operations need to be processed.  An operation is currently either
+ * a set or remove.  Set or remove operations are described by the xfs_attr_item
+ * which may be logged to this intent.  Intents are used in conjunction with the
+ * "attr done" log item described below.
+ *
+ * The ATTRI is reference counted so that it is not freed prior to both the
+ * ATTRI and ATTRD being committed and unpinned. This ensures the ATTRI is
+ * inserted into the AIL even in the event of out of order ATTRI/ATTRD
+ * processing. In other words, an ATTRI is born with two references:
+ *
+ *      1.) an ATTRI held reference to track ATTRI AIL insertion
+ *      2.) an ATTRD held reference to track ATTRD commit
+ *
+ * On allocation, both references are the responsibility of the caller. Once the
+ * ATTRI is added to and dirtied in a transaction, ownership of reference one
+ * transfers to the transaction. The reference is dropped once the ATTRI is
+ * inserted to the AIL or in the event of failure along the way (e.g., commit
+ * failure, log I/O error, etc.). Note that the caller remains responsible for
+ * the ATTRD reference under all circumstances to this point. The caller has no
+ * means to detect failure once the transaction is committed, however.
+ * Therefore, an ATTRD is required after this point, even in the event of
+ * unrelated failure.
+ *
+ * Once an ATTRD is allocated and dirtied in a transaction, reference two
+ * transfers to the transaction. The ATTRD reference is dropped once it reaches
+ * the unpin handler. Similar to the ATTRI, the reference also drops in the
+ * event of commit failure or log I/O errors. Note that the ATTRD is not
+ * inserted in the AIL, so at this point both the ATTRI and ATTRD are freed.
+ */
+struct xfs_attri_log_item {
+	struct xfs_log_item		attri_item;
+	atomic_t			attri_refcount;
+	int				attri_name_len;
+	void				*attri_name;
+	int				attri_value_len;
+	void				*attri_value;
+	struct xfs_attri_log_format	attri_format;
+};
+
+/*
+ * This is the "attr done" log item.  It is used to log the fact that the attr
+ * operations mentioned earlier in an attri item have been completed.
+ */
+struct xfs_attrd_log_item {
+	struct xfs_attri_log_item	*attrd_attrip;
+	struct xfs_log_item		attrd_item;
+	struct xfs_attrd_log_format	attrd_format;
+};
+
+#endif	/* __XFS_ATTR_ITEM_H__ */
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 50f922c..166b680 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -15,6 +15,7 @@
 #include "xfs_inode.h"
 #include "xfs_trans.h"
 #include "xfs_bmap.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_attr_sf.h"
 #include "xfs_attr_leaf.h"
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 6f22a66..edc05af 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -15,6 +15,8 @@
 #include "xfs_iwalk.h"
 #include "xfs_itable.h"
 #include "xfs_error.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_bmap.h"
 #include "xfs_bmap_util.h"
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index c1771e7..62e1534 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -17,6 +17,8 @@
 #include "xfs_itable.h"
 #include "xfs_fsops.h"
 #include "xfs_rtalloc.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_ioctl.h"
 #include "xfs_ioctl32.h"
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 80a13c8..fe60da1 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -13,6 +13,8 @@
 #include "xfs_inode.h"
 #include "xfs_acl.h"
 #include "xfs_quota.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_trans.h"
 #include "xfs_trace.h"
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index ad0c69ee..6405ce33 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1975,6 +1975,10 @@ xlog_print_tic_res(
 	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
 	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
 	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
+	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
+	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
+	    REG_TYPE_STR(ATTR_NAME, "attr_name"),
+	    REG_TYPE_STR(ATTR_VALUE, "attr_value"),
 	};
 	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
 #undef REG_TYPE_STR
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index e2ec91b..ec31db0 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1811,6 +1811,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
 	&xlog_cud_item_ops,
 	&xlog_bui_item_ops,
 	&xlog_bud_item_ops,
+	&xlog_attri_item_ops,
+	&xlog_attrd_item_ops,
 };
 
 static const struct xlog_recover_item_ops *
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 5f04d8a..0597a04 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -126,6 +126,8 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
 
 	/*
 	 * The v5 superblock format extended several v4 header structures with
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index bca48b3..9b0c790 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -10,6 +10,7 @@
 #include "xfs_log_format.h"
 #include "xfs_da_format.h"
 #include "xfs_inode.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_acl.h"
 #include "xfs_da_btree.h"
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v12 5/8] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  2020-08-27  0:35 [PATCH v12 0/8] xfs: Delayed Attributes Allison Collins
                   ` (3 preceding siblings ...)
  2020-08-27  0:35 ` [PATCH v12 4/8] xfs: Set up infastructure for deferred attribute operations Allison Collins
@ 2020-08-27  0:35 ` Allison Collins
  2020-08-27  0:35 ` [PATCH v12 6/8] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR Allison Collins
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Allison Collins @ 2020-08-27  0:35 UTC (permalink / raw)
  To: linux-xfs

These routines set up and start a new deferred attribute operation.
These functions are meant to be called by any routine needing to
initiate a deferred attribute operation as opposed to the existing
inline operations. A new helper function, xfs_attr_item_init, is also
added.
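
As a rough illustration of the single-allocation layout that
xfs_attr_item_init builds (a header followed by the name bytes and then
the value bytes), here is a userspace sketch. The names struct attr_item
and attr_item_init, and the use of calloc in place of kmem_alloc_large,
are assumptions made for the sketch and are not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct xfs_attr_item. */
struct attr_item {
	uint32_t	op_flags;	/* set or remove */
	uint32_t	name_len;	/* length of name */
	uint32_t	value_len;	/* length of value */
	char		*name;		/* points into the trailing bytes */
	char		*value;		/* points just past the name */
	/* name bytes, then value bytes, follow the header */
};

/* Allocate header + name + value as one zeroed buffer. */
static int
attr_item_init(const char *name, uint32_t namelen, const char *value,
	       uint32_t valuelen, uint32_t op_flags, struct attr_item **out)
{
	struct attr_item	*new;
	char			*name_value;

	assert(out != NULL);

	/* Every operation needs a name; a value is optional. */
	if (!namelen)
		return -EINVAL;

	new = calloc(1, sizeof(*new) + namelen + valuelen);
	if (!new)
		return -ENOMEM;

	/* The packed name/value area starts right after the header. */
	name_value = (char *)new + sizeof(*new);
	new->op_flags = op_flags;
	new->name_len = namelen;
	new->value_len = valuelen;
	new->name = name_value;
	new->value = name_value + namelen;

	memcpy(new->name, name, namelen);
	if (valuelen > 0)
		memcpy(new->value, value, valuelen);

	*out = new;
	return 0;
}
```

Because the name and value live in the same buffer as the header, a
single free() of the item releases everything in this sketch.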

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Allison Collins <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_attr.h |  7 ++++
 2 files changed, 98 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index cf75742..7b79868 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -25,6 +25,7 @@
 #include "xfs_trans_space.h"
 #include "xfs_trace.h"
 #include "xfs_attr_item.h"
+#include "xfs_attr.h"
 
 /*
  * xfs_attr.c
@@ -643,6 +644,96 @@ xfs_attr_set(
 	goto out_unlock;
 }
 
+STATIC int
+xfs_attr_item_init(
+	struct xfs_inode	*dp,		/* inode for attr operation */
+	struct xfs_trans	*tp,		/* transaction for attr op */
+	const unsigned char	*name,		/* attr name */
+	unsigned int		namelen,	/* attr namelen */
+	unsigned int		flags,		/* attr flags */
+	const unsigned char	*value,		/* attr value */
+	unsigned int		valuelen,	/* attr value len */
+	unsigned int		op_flags,	/* op flag (set or remove) */
+	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
+{
+
+	struct xfs_attr_item	*new;
+	char			*name_value;
+
+	/*
+	 * All set operations must have a name but not necessarily a value.
+	 */
+	if (!namelen) {
+		ASSERT(0);
+		return -EINVAL;
+	}
+
+	new = kmem_alloc_large(XFS_ATTR_ITEM_SIZEOF(namelen, valuelen),
+			 KM_NOFS);
+	name_value = ((char *)new) + sizeof(struct xfs_attr_item);
+	memset(new, 0, XFS_ATTR_ITEM_SIZEOF(namelen, valuelen));
+	new->xattri_ip = dp;
+	new->xattri_op_flags = op_flags;
+	new->xattri_name_len = namelen;
+	new->xattri_value_len = valuelen;
+	new->xattri_flags = flags;
+	memcpy(&name_value[0], name, namelen);
+	new->xattri_name = name_value;
+	new->xattri_value = name_value + namelen;
+
+	if (valuelen > 0)
+		memcpy(&name_value[namelen], value, valuelen);
+
+	*attr = new;
+	return 0;
+}
+
+/* Sets an attribute for an inode as a deferred operation */
+int
+xfs_attr_set_deferred(
+	struct xfs_inode	*dp,
+	struct xfs_trans	*tp,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	unsigned int		flags,
+	const unsigned char	*value,
+	unsigned int		valuelen)
+{
+	struct xfs_attr_item	*new;
+	int			error = 0;
+
+	error = xfs_attr_item_init(dp, tp, name, namelen, flags, value,
+				   valuelen, XFS_ATTR_OP_FLAGS_SET, &new);
+	if (error)
+		return error;
+
+	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
+
+	return 0;
+}
+
+/* Removes an attribute for an inode as a deferred operation */
+int
+xfs_attr_remove_deferred(
+	struct xfs_inode        *dp,
+	struct xfs_trans	*tp,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	unsigned int		flags)
+{
+
+	struct xfs_attr_item *new;
+
+	int error  = xfs_attr_item_init(dp, tp, name, namelen, flags, NULL, 0,
+				  XFS_ATTR_OP_FLAGS_REMOVE, &new);
+	if (error)
+		return error;
+
+	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
+
+	return 0;
+}
+
 /*========================================================================
  * External routines when attribute list is inside the inode
  *========================================================================*/
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 23b8308..4643b3f 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -328,5 +328,12 @@ bool xfs_attr_namecheck(const void *name, size_t length);
 void xfs_delattr_context_init(struct xfs_delattr_context *dac,
 			      struct xfs_da_args *args);
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
+int xfs_attr_set_deferred(struct xfs_inode *dp, struct xfs_trans *tp,
+			  const unsigned char *name, unsigned int namelen,
+			  unsigned int flags, const unsigned char *value,
+			  unsigned int valuelen);
+int xfs_attr_remove_deferred(struct xfs_inode *dp, struct xfs_trans *tp,
+			    const unsigned char *name, unsigned int namelen,
+			    unsigned int flags);
 
 #endif	/* __XFS_ATTR_H__ */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v12 6/8] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR
  2020-08-27  0:35 [PATCH v12 0/8] xfs: Delayed Attributes Allison Collins
                   ` (4 preceding siblings ...)
  2020-08-27  0:35 ` [PATCH v12 5/8] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred Allison Collins
@ 2020-08-27  0:35 ` Allison Collins
  2020-08-27  0:35 ` [PATCH v12 7/8] xfs: Enable delayed attributes Allison Collins
  2020-08-27  0:35 ` [PATCH v12 8/8] xfs_io: Add delayed attributes error tag Allison Collins
  7 siblings, 0 replies; 21+ messages in thread
From: Allison Collins @ 2020-08-27  0:35 UTC (permalink / raw)
  To: linux-xfs

This patch adds a new feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR which
can be used to turn delayed attributes on or off.

Signed-off-by: Allison Collins <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h | 11 ++++++++++-
 fs/xfs/libxfs/xfs_fs.h     |  1 +
 fs/xfs/libxfs/xfs_sb.c     |  2 ++
 fs/xfs/xfs_super.c         |  4 ++++
 4 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 31b7ece..cc417ef 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -479,7 +479,9 @@ xfs_sb_has_incompat_feature(
 	return (sbp->sb_features_incompat & feature) != 0;
 }
 
-#define XFS_SB_FEAT_INCOMPAT_LOG_ALL 0
+#define XFS_SB_FEAT_INCOMPAT_LOG_DELATTR   (1 << 0)	/* Delayed Attributes */
+#define XFS_SB_FEAT_INCOMPAT_LOG_ALL \
+	(XFS_SB_FEAT_INCOMPAT_LOG_DELATTR)
 #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_LOG_ALL
 static inline bool
 xfs_sb_has_incompat_log_feature(
@@ -563,6 +565,13 @@ static inline bool xfs_sb_version_hasreflink(struct xfs_sb *sbp)
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_REFLINK);
 }
 
+static inline bool xfs_sb_version_hasdelattr(struct xfs_sb *sbp)
+{
+	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
+		(sbp->sb_features_log_incompat &
+		XFS_SB_FEAT_INCOMPAT_LOG_DELATTR));
+}
+
 /*
  * end of superblock version macros
  */
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 84bcffa..67b1f97 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -249,6 +249,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_SPINODES	(1 << 18) /* sparse inode chunks   */
 #define XFS_FSOP_GEOM_FLAGS_RMAPBT	(1 << 19) /* reverse mapping btree */
 #define XFS_FSOP_GEOM_FLAGS_REFLINK	(1 << 20) /* files can share blocks */
+#define XFS_FSOP_GEOM_FLAGS_DELATTR	(1 << 21) /* delayed attributes	    */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index ae9aaf1..0d2e793 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1166,6 +1166,8 @@ xfs_fs_geometry(
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_RMAPBT;
 	if (xfs_sb_version_hasreflink(sbp))
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_REFLINK;
+	if (xfs_sb_version_hasdelattr(sbp))
+		geo->flags |= XFS_FSOP_GEOM_FLAGS_DELATTR;
 	if (xfs_sb_version_hassector(sbp))
 		geo->logsectsize = sbp->sb_logsectsize;
 	else
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 71ac6c1..7698cf5 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1549,6 +1549,10 @@ xfs_fc_fill_super(
 		goto out_filestream_unmount;
 	}
 
+	if (xfs_sb_version_hasdelattr(&mp->m_sb))
+		xfs_alert(mp,
+	"EXPERIMENTAL delayed attrs feature enabled. Use at your own risk!");
+
 	error = xfs_mountfs(mp);
 	if (error)
 		goto out_filestream_unmount;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v12 7/8] xfs: Enable delayed attributes
  2020-08-27  0:35 [PATCH v12 0/8] xfs: Delayed Attributes Allison Collins
                   ` (5 preceding siblings ...)
  2020-08-27  0:35 ` [PATCH v12 6/8] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR Allison Collins
@ 2020-08-27  0:35 ` Allison Collins
  2020-08-27  0:35 ` [PATCH v12 8/8] xfs_io: Add delayed attributes error tag Allison Collins
  7 siblings, 0 replies; 21+ messages in thread
From: Allison Collins @ 2020-08-27  0:35 UTC (permalink / raw)
  To: linux-xfs

Finally enable delayed attributes in xfs_attr_set and xfs_attr_remove.
We only do this for new filesystems that have the feature bit enabled
because we can't add new log entries to older filesystems.

Signed-off-by: Allison Collins <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 7b79868..e5fbcbc 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -518,6 +518,7 @@ xfs_attr_set(
 {
 	struct xfs_inode	*dp = args->dp;
 	struct xfs_mount	*mp = dp->i_mount;
+	struct xfs_sb		*sbp = &mp->m_sb;
 	struct xfs_trans_res	tres;
 	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
 	int			error, local;
@@ -603,9 +604,17 @@ xfs_attr_set(
 		if (error != -ENOATTR && error != -EEXIST)
 			goto out_trans_cancel;
 
-		error = xfs_attr_set_args(args);
+		if (xfs_sb_version_hasdelattr(sbp))
+			error = xfs_attr_set_deferred(dp, args->trans,
+					      args->name, args->namelen,
+					      args->attr_filter, args->value,
+					      args->valuelen);
+		else
+			error = xfs_attr_set_args(args);
+
 		if (error)
 			goto out_trans_cancel;
+
 		/* shortform attribute has already been committed */
 		if (!args->trans)
 			goto out_unlock;
@@ -614,7 +623,13 @@ xfs_attr_set(
 		if (error != -EEXIST)
 			goto out_trans_cancel;
 
-		error = xfs_attr_remove_args(args);
+		if (xfs_sb_version_hasdelattr(sbp))
+			error = xfs_attr_remove_deferred(dp, args->trans,
+							 args->name,
+							 args->namelen,
+							 args->attr_filter);
+		else
+			error = xfs_attr_remove_args(args);
 		if (error)
 			goto out_trans_cancel;
 	}
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v12 8/8] xfs_io: Add delayed attributes error tag
  2020-08-27  0:35 [PATCH v12 0/8] xfs: Delayed Attributes Allison Collins
                   ` (6 preceding siblings ...)
  2020-08-27  0:35 ` [PATCH v12 7/8] xfs: Enable delayed attributes Allison Collins
@ 2020-08-27  0:35 ` Allison Collins
  2020-08-28 16:02   ` Darrick J. Wong
  7 siblings, 1 reply; 21+ messages in thread
From: Allison Collins @ 2020-08-27  0:35 UTC (permalink / raw)
  To: linux-xfs

This patch adds an error tag that we can use to test delayed attribute
recovery and replay
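
For readers unfamiliar with error tags, the "random factor" convention
(1 means always, 2 means half the time, and so on) can be sketched in
userspace as below. The two-entry table, the unprefixed names, the
default of 100, and passing the random number in as a parameter are all
assumptions made to keep the sketch deterministic; this is not the
kernel helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-entry tag table for the sketch. */
#define ERRTAG_BUF_IOERROR	0
#define ERRTAG_DELAYED_ATTR	1
#define ERRTAG_MAX		2

#define RANDOM_DEFAULT		100	/* assumed default factor */

static const unsigned int errortag_random_default[ERRTAG_MAX] = {
	[ERRTAG_BUF_IOERROR]	= RANDOM_DEFAULT,
	[ERRTAG_DELAYED_ATTR]	= 1,	/* 1 means fire every time */
};

/*
 * Decide whether to inject an error for this tag.  A factor of 0 means
 * the tag is disabled; otherwise the error fires when the random number
 * is a multiple of the factor, i.e. roughly once in every "factor" calls.
 */
static bool
errortag_test(const unsigned int *factors, unsigned int tag, unsigned int rnd)
{
	unsigned int	randfactor;

	assert(tag < ERRTAG_MAX);

	randfactor = factors[tag];
	if (!randfactor || rnd % randfactor)
		return false;
	return true;
}
```

With a factor of 1 the tag fires on every call once enabled, which suits
a test that wants to interrupt the operation at a known point.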

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Allison Collins <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_errortag.h | 4 +++-
 fs/xfs/xfs_attr_item.c       | 8 ++++++++
 fs/xfs/xfs_error.c           | 3 +++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
index 53b305d..cb38cbf 100644
--- a/fs/xfs/libxfs/xfs_errortag.h
+++ b/fs/xfs/libxfs/xfs_errortag.h
@@ -56,7 +56,8 @@
 #define XFS_ERRTAG_FORCE_SUMMARY_RECALC			33
 #define XFS_ERRTAG_IUNLINK_FALLBACK			34
 #define XFS_ERRTAG_BUF_IOERROR				35
-#define XFS_ERRTAG_MAX					36
+#define XFS_ERRTAG_DELAYED_ATTR				36
+#define XFS_ERRTAG_MAX					37
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -97,5 +98,6 @@
 #define XFS_RANDOM_FORCE_SUMMARY_RECALC			1
 #define XFS_RANDOM_IUNLINK_FALLBACK			(XFS_RANDOM_DEFAULT/10)
 #define XFS_RANDOM_BUF_IOERROR				XFS_RANDOM_DEFAULT
+#define XFS_RANDOM_DELAYED_ATTR				1
 
 #endif /* __XFS_ERRORTAG_H_ */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 923c288..ed71003 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -35,6 +35,8 @@
 #include "xfs_quota.h"
 #include "xfs_log_priv.h"
 #include "xfs_log_recover.h"
+#include "xfs_error.h"
+#include "xfs_errortag.h"
 
 static const struct xfs_item_ops xfs_attri_item_ops;
 static const struct xfs_item_ops xfs_attrd_item_ops;
@@ -310,6 +312,11 @@ xfs_trans_attr(
 	if (error)
 		return error;
 
+	if (XFS_TEST_ERROR(false, args->dp->i_mount, XFS_ERRTAG_DELAYED_ATTR)) {
+		error = -EIO;
+		goto out;
+	}
+
 	switch (op_flags) {
 	case XFS_ATTR_OP_FLAGS_SET:
 		args->op_flags |= XFS_DA_OP_ADDNAME;
@@ -324,6 +331,7 @@ xfs_trans_attr(
 		break;
 	}
 
+out:
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
 	 * transaction is aborted, which:
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index 7f6e208..fc551cb 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -54,6 +54,7 @@ static unsigned int xfs_errortag_random_default[] = {
 	XFS_RANDOM_FORCE_SUMMARY_RECALC,
 	XFS_RANDOM_IUNLINK_FALLBACK,
 	XFS_RANDOM_BUF_IOERROR,
+	XFS_RANDOM_DELAYED_ATTR,
 };
 
 struct xfs_errortag_attr {
@@ -164,6 +165,7 @@ XFS_ERRORTAG_ATTR_RW(force_repair,	XFS_ERRTAG_FORCE_SCRUB_REPAIR);
 XFS_ERRORTAG_ATTR_RW(bad_summary,	XFS_ERRTAG_FORCE_SUMMARY_RECALC);
 XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
 XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
+XFS_ERRORTAG_ATTR_RW(delayed_attr,	XFS_ERRTAG_DELAYED_ATTR);
 
 static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(noerror),
@@ -202,6 +204,7 @@ static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(bad_summary),
 	XFS_ERRORTAG_ATTR_LIST(iunlink_fallback),
 	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
+	XFS_ERRORTAG_ATTR_LIST(delayed_attr),
 	NULL,
 };
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v12 8/8] xfs_io: Add delayed attributes error tag
  2020-08-27  0:35 ` [PATCH v12 8/8] xfs_io: Add delayed attributes error tag Allison Collins
@ 2020-08-28 16:02   ` Darrick J. Wong
  2020-08-28 18:00     ` Allison Collins
  0 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2020-08-28 16:02 UTC (permalink / raw)
  To: Allison Collins; +Cc: linux-xfs

On Wed, Aug 26, 2020 at 05:35:18PM -0700, Allison Collins wrote:
> This patch adds an error tag that we can use to test delayed attribute
> recovery and replay
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Allison Collins <allison.henderson@oracle.com>

FWIW the subject line for this patch ought to start with 'xfs:', not
'xfs_io:'.

--D

> ---
>  fs/xfs/libxfs/xfs_errortag.h | 4 +++-
>  fs/xfs/xfs_attr_item.c       | 8 ++++++++
>  fs/xfs/xfs_error.c           | 3 +++
>  3 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
> index 53b305d..cb38cbf 100644
> --- a/fs/xfs/libxfs/xfs_errortag.h
> +++ b/fs/xfs/libxfs/xfs_errortag.h
> @@ -56,7 +56,8 @@
>  #define XFS_ERRTAG_FORCE_SUMMARY_RECALC			33
>  #define XFS_ERRTAG_IUNLINK_FALLBACK			34
>  #define XFS_ERRTAG_BUF_IOERROR				35
> -#define XFS_ERRTAG_MAX					36
> +#define XFS_ERRTAG_DELAYED_ATTR				36
> +#define XFS_ERRTAG_MAX					37
>  
>  /*
>   * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
> @@ -97,5 +98,6 @@
>  #define XFS_RANDOM_FORCE_SUMMARY_RECALC			1
>  #define XFS_RANDOM_IUNLINK_FALLBACK			(XFS_RANDOM_DEFAULT/10)
>  #define XFS_RANDOM_BUF_IOERROR				XFS_RANDOM_DEFAULT
> +#define XFS_RANDOM_DELAYED_ATTR				1
>  
>  #endif /* __XFS_ERRORTAG_H_ */
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 923c288..ed71003 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -35,6 +35,8 @@
>  #include "xfs_quota.h"
>  #include "xfs_log_priv.h"
>  #include "xfs_log_recover.h"
> +#include "xfs_error.h"
> +#include "xfs_errortag.h"
>  
>  static const struct xfs_item_ops xfs_attri_item_ops;
>  static const struct xfs_item_ops xfs_attrd_item_ops;
> @@ -310,6 +312,11 @@ xfs_trans_attr(
>  	if (error)
>  		return error;
>  
> +	if (XFS_TEST_ERROR(false, args->dp->i_mount, XFS_ERRTAG_DELAYED_ATTR)) {
> +		error = -EIO;
> +		goto out;
> +	}
> +
>  	switch (op_flags) {
>  	case XFS_ATTR_OP_FLAGS_SET:
>  		args->op_flags |= XFS_DA_OP_ADDNAME;
> @@ -324,6 +331,7 @@ xfs_trans_attr(
>  		break;
>  	}
>  
> +out:
>  	/*
>  	 * Mark the transaction dirty, even on error. This ensures the
>  	 * transaction is aborted, which:
> diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
> index 7f6e208..fc551cb 100644
> --- a/fs/xfs/xfs_error.c
> +++ b/fs/xfs/xfs_error.c
> @@ -54,6 +54,7 @@ static unsigned int xfs_errortag_random_default[] = {
>  	XFS_RANDOM_FORCE_SUMMARY_RECALC,
>  	XFS_RANDOM_IUNLINK_FALLBACK,
>  	XFS_RANDOM_BUF_IOERROR,
> +	XFS_RANDOM_DELAYED_ATTR,
>  };
>  
>  struct xfs_errortag_attr {
> @@ -164,6 +165,7 @@ XFS_ERRORTAG_ATTR_RW(force_repair,	XFS_ERRTAG_FORCE_SCRUB_REPAIR);
>  XFS_ERRORTAG_ATTR_RW(bad_summary,	XFS_ERRTAG_FORCE_SUMMARY_RECALC);
>  XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
>  XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
> +XFS_ERRORTAG_ATTR_RW(delayed_attr,	XFS_ERRTAG_DELAYED_ATTR);
>  
>  static struct attribute *xfs_errortag_attrs[] = {
>  	XFS_ERRORTAG_ATTR_LIST(noerror),
> @@ -202,6 +204,7 @@ static struct attribute *xfs_errortag_attrs[] = {
>  	XFS_ERRORTAG_ATTR_LIST(bad_summary),
>  	XFS_ERRORTAG_ATTR_LIST(iunlink_fallback),
>  	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
> +	XFS_ERRORTAG_ATTR_LIST(delayed_attr),
>  	NULL,
>  };
>  
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v12 8/8] xfs_io: Add delayed attributes error tag
  2020-08-28 16:02   ` Darrick J. Wong
@ 2020-08-28 18:00     ` Allison Collins
  0 siblings, 0 replies; 21+ messages in thread
From: Allison Collins @ 2020-08-28 18:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 8/28/20 9:02 AM, Darrick J. Wong wrote:
> On Wed, Aug 26, 2020 at 05:35:18PM -0700, Allison Collins wrote:
>> This patch adds an error tag that we can use to test delayed attribute
>> recovery and replay
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
>> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> 
> FWIW the subject line for this patch ought to start with 'xfs:', not
> 'xfs_io:'.
> 
> --D
> 
Sure, I think a long time ago someone had commented that it
was supposed to be xfs_io, but I don't see that used very much.  Will put
it back to xfs.  Thanks!

Allison
>> ---
>>   fs/xfs/libxfs/xfs_errortag.h | 4 +++-
>>   fs/xfs/xfs_attr_item.c       | 8 ++++++++
>>   fs/xfs/xfs_error.c           | 3 +++
>>   3 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
>> index 53b305d..cb38cbf 100644
>> --- a/fs/xfs/libxfs/xfs_errortag.h
>> +++ b/fs/xfs/libxfs/xfs_errortag.h
>> @@ -56,7 +56,8 @@
>>   #define XFS_ERRTAG_FORCE_SUMMARY_RECALC			33
>>   #define XFS_ERRTAG_IUNLINK_FALLBACK			34
>>   #define XFS_ERRTAG_BUF_IOERROR				35
>> -#define XFS_ERRTAG_MAX					36
>> +#define XFS_ERRTAG_DELAYED_ATTR				36
>> +#define XFS_ERRTAG_MAX					37
>>   
>>   /*
>>    * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
>> @@ -97,5 +98,6 @@
>>   #define XFS_RANDOM_FORCE_SUMMARY_RECALC			1
>>   #define XFS_RANDOM_IUNLINK_FALLBACK			(XFS_RANDOM_DEFAULT/10)
>>   #define XFS_RANDOM_BUF_IOERROR				XFS_RANDOM_DEFAULT
>> +#define XFS_RANDOM_DELAYED_ATTR				1
>>   
>>   #endif /* __XFS_ERRORTAG_H_ */
>> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
>> index 923c288..ed71003 100644
>> --- a/fs/xfs/xfs_attr_item.c
>> +++ b/fs/xfs/xfs_attr_item.c
>> @@ -35,6 +35,8 @@
>>   #include "xfs_quota.h"
>>   #include "xfs_log_priv.h"
>>   #include "xfs_log_recover.h"
>> +#include "xfs_error.h"
>> +#include "xfs_errortag.h"
>>   
>>   static const struct xfs_item_ops xfs_attri_item_ops;
>>   static const struct xfs_item_ops xfs_attrd_item_ops;
>> @@ -310,6 +312,11 @@ xfs_trans_attr(
>>   	if (error)
>>   		return error;
>>   
>> +	if (XFS_TEST_ERROR(false, args->dp->i_mount, XFS_ERRTAG_DELAYED_ATTR)) {
>> +		error = -EIO;
>> +		goto out;
>> +	}
>> +
>>   	switch (op_flags) {
>>   	case XFS_ATTR_OP_FLAGS_SET:
>>   		args->op_flags |= XFS_DA_OP_ADDNAME;
>> @@ -324,6 +331,7 @@ xfs_trans_attr(
>>   		break;
>>   	}
>>   
>> +out:
>>   	/*
>>   	 * Mark the transaction dirty, even on error. This ensures the
>>   	 * transaction is aborted, which:
>> diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
>> index 7f6e208..fc551cb 100644
>> --- a/fs/xfs/xfs_error.c
>> +++ b/fs/xfs/xfs_error.c
>> @@ -54,6 +54,7 @@ static unsigned int xfs_errortag_random_default[] = {
>>   	XFS_RANDOM_FORCE_SUMMARY_RECALC,
>>   	XFS_RANDOM_IUNLINK_FALLBACK,
>>   	XFS_RANDOM_BUF_IOERROR,
>> +	XFS_RANDOM_DELAYED_ATTR,
>>   };
>>   
>>   struct xfs_errortag_attr {
>> @@ -164,6 +165,7 @@ XFS_ERRORTAG_ATTR_RW(force_repair,	XFS_ERRTAG_FORCE_SCRUB_REPAIR);
>>   XFS_ERRORTAG_ATTR_RW(bad_summary,	XFS_ERRTAG_FORCE_SUMMARY_RECALC);
>>   XFS_ERRORTAG_ATTR_RW(iunlink_fallback,	XFS_ERRTAG_IUNLINK_FALLBACK);
>>   XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
>> +XFS_ERRORTAG_ATTR_RW(delayed_attr,	XFS_ERRTAG_DELAYED_ATTR);
>>   
>>   static struct attribute *xfs_errortag_attrs[] = {
>>   	XFS_ERRORTAG_ATTR_LIST(noerror),
>> @@ -202,6 +204,7 @@ static struct attribute *xfs_errortag_attrs[] = {
>>   	XFS_ERRORTAG_ATTR_LIST(bad_summary),
>>   	XFS_ERRORTAG_ATTR_LIST(iunlink_fallback),
>>   	XFS_ERRORTAG_ATTR_LIST(buf_ioerror),
>> +	XFS_ERRORTAG_ATTR_LIST(delayed_attr),
>>   	NULL,
>>   };
>>   
>> -- 
>> 2.7.4
>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v12 4/8] xfs: Set up infastructure for deferred attribute operations
  2020-08-27  0:35 ` [PATCH v12 4/8] xfs: Set up infastructure for deferred attribute operations Allison Collins
@ 2020-08-28 21:27   ` Darrick J. Wong
  2020-09-02  0:46     ` Allison Collins
  0 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2020-08-28 21:27 UTC (permalink / raw)
  To: Allison Collins; +Cc: linux-xfs

On Wed, Aug 26, 2020 at 05:35:14PM -0700, Allison Collins wrote:
> Currently attributes are modified directly across one or more
> transactions. But they are not logged or replayed in the event of an
> error. The goal of delayed attributes is to enable logging and replaying
> of attribute operations using the existing delayed operations
> infrastructure.  This will later enable the attributes to become part of
> larger multi part operations that also must first be recorded to the
> log.  This is mostly of interest in the scheme of parent pointers which
> would need to maintain an attribute containing parent inode information
> any time an inode is moved, created, or removed.  Parent pointers would
> then be of interest to any feature that would need to quickly derive an
> inode path from the mount point. Online scrub, nfs lookups and fs grow
> or shrink operations are all features that could take advantage of this.
> 
> This patch adds two new log item types for setting or removing
> attributes as deferred operations.  The xfs_attri_log_item logs an
> intent to set or remove an attribute.  The corresponding
> xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
> freed once the transaction is done.  Both log items use a generic
> xfs_attr_log_format structure that contains the attribute name, value,
> flags, inode, and an op_flag that indicates if the operations is a set
> or remove.
> 
> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> ---
>  fs/xfs/Makefile                 |   1 +
>  fs/xfs/libxfs/xfs_attr.c        |   7 +-
>  fs/xfs/libxfs/xfs_attr.h        |  39 ++
>  fs/xfs/libxfs/xfs_defer.c       |   1 +
>  fs/xfs/libxfs/xfs_defer.h       |   3 +
>  fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
>  fs/xfs/libxfs/xfs_log_recover.h |   2 +
>  fs/xfs/libxfs/xfs_types.h       |   1 +
>  fs/xfs/scrub/common.c           |   2 +
>  fs/xfs/xfs_acl.c                |   2 +
>  fs/xfs/xfs_attr_item.c          | 829 ++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_attr_item.h          |  76 ++++
>  fs/xfs/xfs_attr_list.c          |   1 +
>  fs/xfs/xfs_ioctl.c              |   2 +
>  fs/xfs/xfs_ioctl32.c            |   2 +
>  fs/xfs/xfs_iops.c               |   2 +
>  fs/xfs/xfs_log.c                |   4 +
>  fs/xfs/xfs_log_recover.c        |   2 +
>  fs/xfs/xfs_ondisk.h             |   2 +
>  fs/xfs/xfs_xattr.c              |   1 +
>  20 files changed, 1017 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 04611a1..b056cfc 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
>  				   xfs_buf_item_recover.o \
>  				   xfs_dquot_item_recover.o \
>  				   xfs_extfree_item.o \
> +				   xfs_attr_item.o \
>  				   xfs_icreate_item.o \
>  				   xfs_inode_item.o \
>  				   xfs_inode_item_recover.o \
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index a8cfe62..cf75742 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -24,6 +24,7 @@
>  #include "xfs_quota.h"
>  #include "xfs_trans_space.h"
>  #include "xfs_trace.h"
> +#include "xfs_attr_item.h"
>  
>  /*
>   * xfs_attr.c
> @@ -59,8 +60,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>  STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>  STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>  STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
> -STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> -			     struct xfs_buf **leaf_bp);
>  
>  int
>  xfs_inode_hasattr(
> @@ -142,7 +141,7 @@ xfs_attr_get(
>  /*
>   * Calculate how many blocks we need for the new attribute,
>   */
> -STATIC int
> +int
>  xfs_attr_calc_size(
>  	struct xfs_da_args	*args,
>  	int			*local)
> @@ -327,7 +326,7 @@ xfs_attr_set_args(
>   * to handle this, and recall the function until a successful error code is
>   * returned.
>   */
> -STATIC int
> +int
>  xfs_attr_set_iter(
>  	struct xfs_delattr_context	*dac,
>  	struct xfs_buf			**leaf_bp)
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 4f6bba8..23b8308 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -247,6 +247,7 @@ enum xfs_delattr_state {
>  #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>  #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>  #define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
> +#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
>  
>  /*
>   * Context used for keeping track of delayed attribute operations
> @@ -254,6 +255,9 @@ enum xfs_delattr_state {
>  struct xfs_delattr_context {
>  	struct xfs_da_args      *da_args;
>  
> +	/* Used by delayed attributes to hold leaf across transactions */
> +	struct xfs_buf		*leaf_bp;
> +
>  	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
>  	struct xfs_bmbt_irec	map;
>  	xfs_dablk_t		lblkno;
> @@ -268,6 +272,38 @@ struct xfs_delattr_context {
>  	enum xfs_delattr_state  dela_state;
>  };

I'll start by pasting in the full xfs_delattr_context definition for
easier reading:

/*
 * Context used for keeping track of delayed attribute operations
 */
struct xfs_delattr_context {
	struct xfs_da_args      *da_args;

	/* Used by delayed attributes to hold leaf across transactions */
	struct xfs_buf		*leaf_bp;

	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
	struct xfs_bmbt_irec	map;
	xfs_dablk_t		lblkno;
	int			blkcnt;

	/* Used in xfs_attr_node_removename to roll through removing blocks */
	struct xfs_da_state     *da_state;
	struct xfs_da_state_blk *blk;

	/* Used to keep track of current state of delayed operation */
	unsigned int            flags;
	enum xfs_delattr_state  dela_state;
};

Admittedly, I /am/ conducting a backwards review and zeroing in on the
data structures first.

> +/*
> + * List of attrs to commit later.
> + */
> +struct xfs_attr_item {
> +	struct xfs_inode	*xattri_ip;
> +	void			*xattri_value;		/* attr value */
> +	void			*xattri_name;		/* attr name */
> +	uint32_t		xattri_op_flags;	/* attr op set or rm */
> +	uint32_t		xattri_value_len;	/* length of value */
> +	uint32_t		xattri_name_len;	/* length of name */
> +	uint32_t		xattri_flags;		/* attr flags */
> +
> +	/* used to log this item to an intent */
> +	struct list_head	xattri_list;
> +
> +	/*
> +	 * xfs_delattr_context and xfs_da_args need to remain instantiated
> +	 * across transaction rolls during the defer finish, so store them here
> +	 */
> +	struct xfs_da_args		xattri_args;
> +	struct xfs_delattr_context	xattri_dac;
> +
> +	/*
> +	 * A byte array follows the header containing the file name and
> +	 * attribute value.
> +	 */
> +};

These two structures (xfs_delattr_context and xfs_attr_item) duplicate a
lot of information considering that they both track incore state during
an xattr set/remove operation.  There's also a lot of duplication
between the do-while loop in xfs_attr_set_args and the inner loop of the
defer attr set code.

To make sure I'm understanding this correctly, let me start by repeating
back to you what I think is the code flow through the hasdelattr path
and then the !hasdelattr path.  Let's call the hasdelattr path (A).

First, the caller allocates an xfs_da_args structure and partially
initializes it with dp, attr_filter, attr_flags, name, namelen, value,
and valuelen set appropriately for the operation it wants.  The rest of
the struct should be zeroed, because the uninitialized parts are
internal state.

Second, the *args are passed to xfs_attr_set, which after setting up a
transaction calls xfs_attr_set_deferred.  This calls xfs_attr_item_init
to allocate and initialize a struct xfs_attr_item with dp, name,
namelen, attr_filter, value, and valuelen, and passes this incore state
tracking structure to the defer ops machinery.

Third, the defer ops machinery calls xfs_attr_finish_item to deal with
the attr request.  If the xfs_delattr_context within the xfs_attr_item
is uninitialized it will set the xfs_da_args state that's within the
xfs_attr_item to the values already stored in the xfs_attr_item.

Fourth, xfs_attr_finish_item calls xfs_trans_attr to dispatch based on
op_flags.  For setting, this means we call xfs_attr_set_iter.

Fifth, xfs_attr_set_iter dispatches functions based on whatever
dela_state in the delattr_context is set to.  The functions it calls can
set DAC_DEFER_FINISH and/or return -EAGAIN to signal the defer ops
machinery that it needs to roll the transaction so that we can repeat
steps 3-5 until we're done.  The defer ops machinery ought to honor
DEFER_FINISH and complete whatever work items we've put on the queue,
but... it's buggy and doesn't.  I'll come back to this later.

Sixth, once we're done, we return out to xfs_attr_set to commit the
transaction and exit.

Did I understand that correctly?  If so, I'll move on to the !hasdelattr
case, which we'll call (B).

First, the caller allocates an xfs_da_args structure and partially
initializes it with dp, attr_filter, attr_flags, name, namelen, value,
and valuelen set appropriately for the operation it wants.  The rest of
the struct should be zeroed, because the uninitialized parts are
internal state.  This is the same as step A1 above.

Second, the *args are passed to xfs_attr_set, which after setting up a
transaction calls xfs_attr_set_args.  This calls xfs_attr_set_iter,
which is the dela_state function dispatcher mentioned in step A5 above.
The functions it calls can set DAC_DEFER_FINISH to signal to
xfs_attr_set_args that it needs to complete whatever work items we've
attached to the transaction.  They can also return -EAGAIN to signal
to xfs_attr_set_args that it's time to roll the transaction.

Third, once we're done, we return out of xfs_attr_set, same as step A6
above.
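
As a rough illustration of path (B), here is a minimal userspace sketch of a
caller-side loop driven by -EAGAIN and a "defer finish" flag.  Every demo_*
name is hypothetical and only stands in for the real XFS types; the counters
merely simulate finishing deferred work and rolling the transaction.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins; none of these are the real XFS definitions. */
enum demo_state { DEMO_START, DEMO_LEAF, DEMO_NODE, DEMO_DONE };

#define DEMO_DEFER_FINISH	0x1	/* step queued work to finish first */

struct demo_ctx {
	enum demo_state	state;		/* plays the role of dela_state */
	unsigned int	flags;		/* plays the role of the dac flags */
	int		rolls;		/* simulated transaction rolls */
	int		finishes;	/* simulated defer finishes */
};

/* Do one step of work per call; return -EAGAIN while steps remain. */
static int demo_set_iter(struct demo_ctx *ctx)
{
	switch (ctx->state) {
	case DEMO_START:
		ctx->state = DEMO_LEAF;
		return -EAGAIN;
	case DEMO_LEAF:
		/* Pretend this step attached work items to the transaction. */
		ctx->flags |= DEMO_DEFER_FINISH;
		ctx->state = DEMO_NODE;
		return -EAGAIN;
	case DEMO_NODE:
		ctx->state = DEMO_DONE;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Caller-side loop: finish queued work if asked, then roll and retry. */
static int demo_set_args(struct demo_ctx *ctx)
{
	int error;

	do {
		error = demo_set_iter(ctx);
		if (error && error != -EAGAIN)
			return error;

		if (ctx->flags & DEMO_DEFER_FINISH) {
			ctx->flags &= ~DEMO_DEFER_FINISH;
			ctx->finishes++;
		}
		if (error == -EAGAIN)
			ctx->rolls++;
	} while (error == -EAGAIN);

	return 0;
}
```

In path (A) the same demo_set_iter would be called from the defer ops
machinery instead, which is why the loop body looks redundant.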

Assuming I understood those two code paths correctly, I'll move on to
the attr item recovery case.  Call this (C).

First, xfs_attri_item_recover is called with a recovered incore log
item.  It allocates an xfs_da_args and fills out most of the same
fields that xfs_attr_set does in A1-A2 and B1-B2 above; and then it
allocates a transaction.

Second, _recover has its own while loop(!) to call xfs_trans_attr, which
calls xfs_attr_set_iter, sort of like what A4 does.  I'll come back to
this later as well.

Third, xfs_attr_set_iter uses dela_state to dispatch functions, similar
to what A5 does above.  If those functions set DAC_DEFER_FINISH or
return -EAGAIN, we'll pass that out to xfs_attr_set_iter to get the
transaction rolled so we can move on to the next state.

Fourth, when the loop is done we commit the transaction and move on with
whatever is next in log recovery.

Does that sound right?  If so, let's move on to the issues I noted
above.

I think the first problem is that this patchset adds two more xattr
operation state structures.  Currently, xfs_da_args stores both the
operation arguments (inode, name, value, other flags) and most of the
state of the operation (whichfork, hashval, geo, block indices, rmt
block indices).  The series then adds a xfs_delattr_context that holds
more state that needs to survive a transaction roll (leafbp, rmt
mappings, da btree state, and dela_state).  Then, it adds yet another
xfs_attr_item that contains its own xfs_da_args and xfs_delattr_context,
and has a bunch more fields xattri_(ip, value, name, opflags)
that duplicate the fields that already exist in xfs_da_args.

This is hard to follow.  I don't know what's the difference between
xfs_attr_item.xattri_name and xfs_attr_item.xattri_args.name, and I
suspect this makes xfs_attr_item much larger than it needs to be.

Question 1: Can we break up struct xfs_da_args?  Right now its field
definition is the union set of everything needed to track both a
directory operation and an xattr operation.  What do you think of
creating separate xfs_dirop_state and xfs_attrop_state structures that
each embed an xfs_da_args, and then move the dir and attr-specific
pieces out of xfs_da_args and into xfs_{dir,attr}op_state as
appropriate?  I think Christoph has suggested this elsewhere on the list
in the past.

(Note that xfs_da_state is its own separate thing for dealing with
dabtree operations; that doesn't change.)
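
To make Q1 a little more concrete, here is a minimal sketch of the kind of
split I mean.  The demo_* structures and the choice of which fields land
where are assumptions for illustration only, not a proposal for the exact
layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fields shared by directory and xattr operations (illustrative subset). */
struct demo_da_args {
	const unsigned char	*name;
	int			namelen;
	unsigned int		hashval;
};

/* Attr-only state wraps the common part instead of living inside it. */
struct demo_attrop_state {
	struct demo_da_args	args;
	void			*value;
	int			valuelen;
	int			dela_state;	/* must survive a roll */
};

/* Dir-only state gets its own wrapper in the same way. */
struct demo_dirop_state {
	struct demo_da_args	args;
	unsigned long long	inumber;
};

/* Recover the attr wrapper from a pointer to the embedded common args. */
static struct demo_attrop_state *
demo_attrop_from_args(struct demo_da_args *args)
{
	return (struct demo_attrop_state *)
		((char *)args - offsetof(struct demo_attrop_state, args));
}

/* Zero all internal state, then fill in only what the caller supplies. */
static void
demo_attrop_init(struct demo_attrop_state *st, const unsigned char *name,
		 int namelen, void *value, int valuelen)
{
	memset(st, 0, sizeof(*st));
	st->args.name = name;
	st->args.namelen = namelen;
	st->value = value;
	st->valuelen = valuelen;
}
```

Shared dabtree code would keep taking the common args pointer, and attr code
that needs its own state would use the wrapper.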

Question 2: Should we revise the arguments to xfs_attr_[gs]et?  Right
now the callers of these functions have to initialize the entire
xfs_da_args structure even though they only care about 7 of the 26
fields.  What do you think of changing the xfs_attr_[gs]et function
declarations to pass in the 7 arguments directly?  Or you could create a
new arguments struct?  If you did that, then xfs_attr_[gs]et would be
responsible for allocating and initializing their internal state.  This
is cleaner interface-wise, and leads me into...

Question 3: Instead of creating separate xfs_delattr_context
and xfs_attr_item structs, can you put all the stuff those structures
track into xfs_attrop_state?  I sense that the duplication and pointer
indirection in _delattr_context and _attr_item might be a result of it
not being all that clear where the xfs_da_args is actually allocated,
and therefore the scoping rules.  Would all that be clearer if all the
new state was thrown into the same xfs_attrop_state that we dynamically
allocate at the start of xfs_attr_[gs]et()?  (Yes, this question's
existence depends on your answer to Q2.)

Question 4: Does xfs_attr_item_init need to allocate space to hold the
name and value buffers when it is called from xfs_attr_set?
xfs_attr_set does not return until we're completely finished with the
deferred xattr processing, which means that the buffers passed into
xfs_attr_set cannot go out of scope, right?

(I think you /do/ need to allocate separate buffers for log recovery.)

My second set of questions revolves around the duplication of attr
operation loops between xfs_attr_set_args() and the defer ops code.
AFAICT there's no reason to have xfs_attr_set_args, since there is no
requirement in the deferred ops machinery to create log intent or log
done items.

Question 5: Instead of open-coding a do {attrset roll hold} loop in
xfs_attr_set_args, what do you think about setting up the deferred op
code (xfs_attr_defer_type and the functions assigned to it in patch 4)
to do that from the start?  By adding the defer op code early, patch 2
would create xfs_attr_set_iter as it does now, and xfs_attr_finish_item
would call it directly.  Since there's no log item defined yet, the
other defer ops functions (create_intent, abort_intent, create_done) can
return NULL log item pointers.

Once you get to the point where you have defined the log items, you can
add in all the other log item handling (i.e. xfs_attr[id]_item_ops).  As
an example of a defer op that optionally records changes to its incore
operation state with log items, see xfs_swapext_defer_type[1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=atomic-file-updates&id=53c7233842969347174e8d68c8486dbf3efb734c
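
Very roughly, the shape I have in mind for Q5 looks like the sketch below.
This is a userspace model under stated assumptions: the demo_* callback table
is hypothetical and much simpler than struct xfs_defer_op_type, and the
"roll" is just a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_log_item { int id; };

/* Hypothetical, cut-down callback table for one deferred op type. */
struct demo_defer_op_type {
	struct demo_log_item *(*create_intent)(void *work);
	struct demo_log_item *(*create_done)(struct demo_log_item *intent);
	int (*finish_item)(struct demo_log_item *done, void *work);
};

struct demo_work {
	int	steps_left;	/* steps the operation still has to run */
	int	rolls;		/* simulated transaction rolls */
};

/* No log items are defined yet, so intent and done are simply NULL. */
static struct demo_log_item *demo_no_intent(void *work)
{
	(void)work;
	return NULL;
}

static struct demo_log_item *demo_no_done(struct demo_log_item *intent)
{
	(void)intent;
	return NULL;
}

/* One step per call; -EAGAIN asks the machinery to roll and call again. */
static int demo_finish_item(struct demo_log_item *done, void *work)
{
	struct demo_work *w = work;

	(void)done;
	if (--w->steps_left > 0)
		return -EAGAIN;
	return 0;
}

static const struct demo_defer_op_type demo_attr_defer_type = {
	.create_intent	= demo_no_intent,
	.create_done	= demo_no_done,
	.finish_item	= demo_finish_item,
};

/* Generic driver: the roll loop lives here, not in the attr code. */
static int demo_defer_finish(const struct demo_defer_op_type *ops,
			     struct demo_work *w)
{
	struct demo_log_item *intent = ops->create_intent(w);
	int error;

	for (;;) {
		error = ops->finish_item(ops->create_done(intent), w);
		if (error != -EAGAIN)
			return error;
		w->rolls++;	/* stand-in for rolling the transaction */
	}
}
```

Once real log items exist, only the two NULL-returning callbacks in the
sketch would need to change.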

Moving along to the DEFER_FINISH question that I said I'd get back to
later -- there's a subtle difference to the order in which deferred log
items that are created while trying to make progress on an xattr op are
finished.  This is due to a design wart of the original defer ops
machinery, and Brian and I have discussed this previously.

In a nutshell, let's pretend that step 1 of an xattr operation creates
new deferred ops ABCD and step 2 creates new deferred ops EFGH.  Let's
also pretend that step 1 and step 2 both set DEFER_FINISH.  In the
!delattr case, xfs_attr_set_args -> xfs_attr_trans_roll will run step 1,
process A->B->C->D, roll, run step 2, and then process E->F->G->H and
commit.

In the delattr case, however, the defer ops machinery shoves all the new
defer ops to the end of the queue, which means that we run step 1, roll,
run step 2, and then run A->B->C->D->E->F->G->H and commit.  I would
like to fix that, since it seems more logical to me that you'd finish
A-D before moving on to the second phase; and the atomic swapext code is
going to require that.
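
To spell out the two orderings, here is a toy model that records the order in
which things run.  It assumes two steps that each queue four child ops, as in
the example above; the demo_* functions are hypothetical and do not touch any
real defer ops code.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX	32

/* Child ops queued by step 1 and step 2 of the parent operation. */
static const char *demo_step_ops[] = { "ABCD", "EFGH" };

/* !delattr-like order: finish each step's children before the next step. */
static void demo_order_inline(char *out)
{
	size_t n = 0;

	for (int step = 0; step < 2; step++) {
		out[n++] = (char)('1' + step);		/* run the step */
		memcpy(out + n, demo_step_ops[step], 4);	/* finish children */
		n += 4;
	}
	out[n] = '\0';
}

/* delattr-like order: children go to the tail and run after all steps. */
static void demo_order_deferred(char *out)
{
	char pending[DEMO_MAX];
	size_t n = 0, p = 0;

	for (int step = 0; step < 2; step++) {
		out[n++] = (char)('1' + step);		/* run the step */
		memcpy(pending + p, demo_step_ops[step], 4);	/* queue at tail */
		p += 4;
	}
	memcpy(out + n, pending, p);			/* drain the queue */
	n += p;
	out[n] = '\0';
}
```

Here '1' and '2' mark the parent steps, so the inline order comes out as
1ABCD2EFGH and the deferred order as 12ABCDEFGH.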

Question 6: So, uh, can you go have a look at the latest patches[2]?
I'll post them soon if I can get past the bigtime review.  I don't think
this wart of the defer ops mechanism affects your patchset, but you know
how deferred attrs work better than I. :)

[2] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=defer-ops-stalls

I also had a couple questions (observations?) about how log recovery
works for attr items, because I noticed that xfs_attri_item_recover also
has a do {attrset, roll} loop.

HAH, I just realized (while writing Q7) that xfs_defer_move needs to log
intent items for each newly scheduled work item because if log recovery
crashes after finishing the existing intent items but before it gets to
the new intent items, the next attempt at log recovery will not see the
missing intents and will /never/ even be aware that it should have
finished a chain.  That leads to fs corruption!  So that series has more
work to do, and you can set Q6 aside for now.

Question 7: Why is there a do {attrset, roll} loop in the recovery
routine?  Log intent item recovery functions are only supposed to
complete a single transaction's worth of work.  If there's more work to
do, the recovery function should attach a new defer ops item to the
transaction to schedule the rest of the work, and use xfs_defer_move
to attach the list of new defer ops to *parent_tp.

The reason for this is that log recovery has to finish every unfinished
intent item that was in the log before it can move on to new log items
that were created as a result of recovering log items.
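
A small model of that ordering rule, assuming each recovered intent needs some
number of transactions to complete.  The demo_* names are hypothetical and the
"later" array only imitates handing continuation work to the parent
transaction.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_ITEMS	8

struct demo_intent {
	char	tag;		/* label written to the trace */
	int	steps_left;	/* transactions still needed */
};

/* Do one transaction's worth of work; return nonzero if more remains. */
static int demo_recover_one(struct demo_intent *it, char *log, size_t *n)
{
	log[(*n)++] = it->tag;
	it->steps_left--;
	return it->steps_left > 0;
}

static void demo_recover_all(struct demo_intent *items, int count, char *log)
{
	struct demo_intent *later[DEMO_MAX_ITEMS];
	int nlater = 0;
	size_t n = 0;

	assert(count <= DEMO_MAX_ITEMS);

	/* Phase 1: exactly one step for every intent found in the log. */
	for (int i = 0; i < count; i++)
		if (demo_recover_one(&items[i], log, &n))
			later[nlater++] = &items[i];

	/* Phase 2: only now finish the continuation work queued above. */
	for (int i = 0; i < nlater; i++)
		while (demo_recover_one(later[i], log, &n))
			;
	log[n] = '\0';
}
```

A recovery function with its own roll loop would instead run all of the first
item's steps before the second recovered intent is even looked at.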

Ok, that's probably enough questions for now.

--D

> +
> +#define XFS_ATTR_ITEM_SIZEOF(namelen, valuelen)	\
> +	(sizeof(struct xfs_attr_item) + (namelen) + (valuelen))

> +
> +
>  /*========================================================================
>   * Function prototypes for the kernel.
>   *========================================================================*/
> @@ -283,11 +319,14 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
>  int xfs_attr_get(struct xfs_da_args *args);
>  int xfs_attr_set(struct xfs_da_args *args);
>  int xfs_attr_set_args(struct xfs_da_args *args);
> +int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> +		      struct xfs_buf **leaf_bp);
>  int xfs_has_attr(struct xfs_da_args *args);
>  int xfs_attr_remove_args(struct xfs_da_args *args);
>  int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>  bool xfs_attr_namecheck(const void *name, size_t length);
>  void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>  			      struct xfs_da_args *args);
> +int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>  
>  #endif	/* __XFS_ATTR_H__ */
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index d8f5862..4392279 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -176,6 +176,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
>  	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
>  	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
>  	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
> +	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
>  };
>  
>  static void
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 6b2ca58..193d3bb 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -18,6 +18,7 @@ enum xfs_defer_ops_type {
>  	XFS_DEFER_OPS_TYPE_RMAP,
>  	XFS_DEFER_OPS_TYPE_FREE,
>  	XFS_DEFER_OPS_TYPE_AGFL_FREE,
> +	XFS_DEFER_OPS_TYPE_ATTR,
>  	XFS_DEFER_OPS_TYPE_MAX,
>  };
>  
> @@ -62,5 +63,7 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
>  extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
>  extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
>  extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
> +extern const struct xfs_defer_op_type xfs_attr_defer_type;
> +
>  
>  #endif /* __XFS_DEFER_H__ */
> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> index e3400c9..33b26b6 100644
> --- a/fs/xfs/libxfs/xfs_log_format.h
> +++ b/fs/xfs/libxfs/xfs_log_format.h
> @@ -117,7 +117,12 @@ struct xfs_unmount_log_format {
>  #define XLOG_REG_TYPE_CUD_FORMAT	24
>  #define XLOG_REG_TYPE_BUI_FORMAT	25
>  #define XLOG_REG_TYPE_BUD_FORMAT	26
> -#define XLOG_REG_TYPE_MAX		26
> +#define XLOG_REG_TYPE_ATTRI_FORMAT	27
> +#define XLOG_REG_TYPE_ATTRD_FORMAT	28
> +#define XLOG_REG_TYPE_ATTR_NAME	29
> +#define XLOG_REG_TYPE_ATTR_VALUE	30
> +#define XLOG_REG_TYPE_MAX		30
> +
>  
>  /*
>   * Flags to log operation header
> @@ -240,6 +245,8 @@ typedef struct xfs_trans_header {
>  #define	XFS_LI_CUD		0x1243
>  #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
>  #define	XFS_LI_BUD		0x1245
> +#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
> +#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
>  
>  #define XFS_LI_TYPE_DESC \
>  	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
> @@ -255,7 +262,9 @@ typedef struct xfs_trans_header {
>  	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
>  	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
>  	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
> -	{ XFS_LI_BUD,		"XFS_LI_BUD" }
> +	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
> +	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
> +	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
>  
>  /*
>   * Inode Log Item Format definitions.
> @@ -860,4 +869,35 @@ struct xfs_icreate_log {
>  	__be32		icl_gen;	/* inode generation number to use */
>  };
>  
> +/*
> + * Flags for deferred attribute operations.
> + * Upper bits are flags, lower byte is type code
> + */
> +#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
> +#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
> +#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF	/* Flags type mask */
> +
> +/*
> + * This is the structure used to lay out an attr log item in the
> + * log.
> + */
> +struct xfs_attri_log_format {
> +	uint16_t	alfi_type;	/* attri log item type */
> +	uint16_t	alfi_size;	/* size of this item */
> +	uint32_t	__pad;		/* pad to 64 bit aligned */
> +	uint64_t	alfi_id;	/* attri identifier */
> +	xfs_ino_t       alfi_ino;	/* the inode for this attr operation */
> +	uint32_t        alfi_op_flags;	/* marks the op as a set or remove */
> +	uint32_t        alfi_name_len;	/* attr name length */
> +	uint32_t        alfi_value_len;	/* attr value length */
> +	uint32_t        alfi_attr_flags;/* attr flags */
> +};
> +
> +struct xfs_attrd_log_format {
> +	uint16_t	alfd_type;	/* attrd log item type */
> +	uint16_t	alfd_size;	/* size of this item */
> +	uint32_t	__pad;		/* pad to 64 bit aligned */
> +	uint64_t	alfd_alf_id;	/* id of corresponding attrd */
> +};
> +
>  #endif /* __XFS_LOG_FORMAT_H__ */
> diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
> index 641132d..b0b8e94 100644
> --- a/fs/xfs/libxfs/xfs_log_recover.h
> +++ b/fs/xfs/libxfs/xfs_log_recover.h
> @@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
>  extern const struct xlog_recover_item_ops xlog_rud_item_ops;
>  extern const struct xlog_recover_item_ops xlog_cui_item_ops;
>  extern const struct xlog_recover_item_ops xlog_cud_item_ops;
> +extern const struct xlog_recover_item_ops xlog_attri_item_ops;
> +extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
>  
>  /*
>   * Macros, structures, prototypes for internal log manager use.
> diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
> index 397d947..860cdd2 100644
> --- a/fs/xfs/libxfs/xfs_types.h
> +++ b/fs/xfs/libxfs/xfs_types.h
> @@ -11,6 +11,7 @@ typedef uint32_t	prid_t;		/* project ID */
>  typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
>  typedef uint32_t	xfs_agino_t;	/* inode # within allocation grp */
>  typedef uint32_t	xfs_extlen_t;	/* extent length in blocks */
> +typedef uint32_t	xfs_attrlen_t;	/* attr length */
>  typedef uint32_t	xfs_agnumber_t;	/* allocation group number */
>  typedef int32_t		xfs_extnum_t;	/* # of extents in a file */
>  typedef int16_t		xfs_aextnum_t;	/* # extents in an attribute fork */
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 1887605..9a649d1 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -24,6 +24,8 @@
>  #include "xfs_rmap_btree.h"
>  #include "xfs_log.h"
>  #include "xfs_trans_priv.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_reflink.h"
>  #include "scrub/scrub.h"
> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> index d4c687b5c..2fa173a 100644
> --- a/fs/xfs/xfs_acl.c
> +++ b/fs/xfs/xfs_acl.c
> @@ -10,6 +10,8 @@
>  #include "xfs_trans_resv.h"
>  #include "xfs_mount.h"
>  #include "xfs_inode.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_trace.h"
>  #include "xfs_error.h"
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> new file mode 100644
> index 0000000..923c288
> --- /dev/null
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -0,0 +1,829 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> + * Author: Allison Collins <allison.henderson@oracle.com>
> + */
> +
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_format.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_bit.h"
> +#include "xfs_shared.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_trans.h"
> +#include "xfs_trans_priv.h"
> +#include "xfs_buf_item.h"
> +#include "xfs_attr_item.h"
> +#include "xfs_log.h"
> +#include "xfs_btree.h"
> +#include "xfs_rmap.h"
> +#include "xfs_inode.h"
> +#include "xfs_icache.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_attr.h"
> +#include "xfs_shared.h"
> +#include "xfs_attr_item.h"
> +#include "xfs_alloc.h"
> +#include "xfs_bmap.h"
> +#include "xfs_trace.h"
> +#include "libxfs/xfs_da_format.h"
> +#include "xfs_inode.h"
> +#include "xfs_quota.h"
> +#include "xfs_log_priv.h"
> +#include "xfs_log_recover.h"
> +
> +static const struct xfs_item_ops xfs_attri_item_ops;
> +static const struct xfs_item_ops xfs_attrd_item_ops;
> +
> +static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
> +{
> +	return container_of(lip, struct xfs_attri_log_item, attri_item);
> +}
> +
> +STATIC void
> +xfs_attri_item_free(
> +	struct xfs_attri_log_item	*attrip)
> +{
> +	kmem_free(attrip->attri_item.li_lv_shadow);
> +	kmem_free(attrip);
> +}
> +
> +/*
> + * Freeing the attrip requires that we remove it from the AIL if it has already
> + * been placed there. However, the ATTRI may not yet have been placed in the
> + * AIL when called by xfs_attri_release() from ATTRD processing due to the
> + * ordering of committed vs unpin operations in bulk insert operations. Hence
> + * the reference count to ensure only the last caller frees the ATTRI.
> + */
> +STATIC void
> +xfs_attri_release(
> +	struct xfs_attri_log_item	*attrip)
> +{
> +	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
> +	if (atomic_dec_and_test(&attrip->attri_refcount)) {
> +		xfs_trans_ail_delete(&attrip->attri_item,
> +				     SHUTDOWN_LOG_IO_ERROR);
> +		xfs_attri_item_free(attrip);
> +	}
> +}
> +
> +/*
> + * This returns the number of iovecs needed to log the given attri item. We
> + * only need 1 iovec for an attri item.  It just logs the attr_log_format
> + * structure.
> + */
> +static inline int
> +xfs_attri_item_sizeof(
> +	struct xfs_attri_log_item *attrip)
> +{
> +	return sizeof(struct xfs_attri_log_format);
> +}
> +
> +STATIC void
> +xfs_attri_item_size(
> +	struct xfs_log_item	*lip,
> +	int			*nvecs,
> +	int			*nbytes)
> +{
> +	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
> +
> +	*nvecs += 1;
> +	*nbytes += xfs_attri_item_sizeof(attrip);
> +
> +	/* Attr set and remove operations require a name */
> +	ASSERT(attrip->attri_name_len > 0);
> +
> +	*nvecs += 1;
> +	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
> +
> +	/*
> +	 * Set ops can accept a value of 0 len to clear an attr value.  Remove
> +	 * ops do not need a value at all.  So only account for the value
> +	 * when it is needed.
> +	 */
> +	if (attrip->attri_value_len > 0) {
> +		*nvecs += 1;
> +		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
> +	}
> +}
> +
> +/*
> + * This is called to fill in the log iovecs for the given attri log
> + * item. We use  1 iovec for the attri_format_item, 1 for the name, and
> + * another for the value if it is present
> + */
> +STATIC void
> +xfs_attri_item_format(
> +	struct xfs_log_item	*lip,
> +	struct xfs_log_vec	*lv)
> +{
> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> +	struct xfs_log_iovec		*vecp = NULL;
> +
> +	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
> +	attrip->attri_format.alfi_size = 1;
> +
> +	/*
> +	 * This size accounting must be done before copying the attrip into the
> +	 * iovec.  If we do it after, the wrong size will be recorded to the log
> +	 * and we trip across assertion checks for bad region sizes later during
> +	 * the log recovery.
> +	 */
> +
> +	ASSERT(attrip->attri_name_len > 0);
> +	attrip->attri_format.alfi_size++;
> +
> +	if (attrip->attri_value_len > 0)
> +		attrip->attri_format.alfi_size++;
> +
> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
> +			&attrip->attri_format,
> +			xfs_attri_item_sizeof(attrip));
> +	if (attrip->attri_name_len > 0)
> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
> +				attrip->attri_name,
> +				ATTR_NVEC_SIZE(attrip->attri_name_len));
> +
> +	if (attrip->attri_value_len > 0)
> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
> +				attrip->attri_value,
> +				ATTR_NVEC_SIZE(attrip->attri_value_len));
> +}
> +
> +/*
> + * The unpin operation is the last place an ATTRI is manipulated in the log. It
> + * is either inserted in the AIL or aborted in the event of a log I/O error. In
> + * either case, the ATTRI transaction has been successfully committed to make
> + * it this far. Therefore, we expect whoever committed the ATTRI to either
> + * construct and commit the ATTRD or drop the ATTRD's reference in the event of
> + * error. Simply drop the log's ATTRI reference now that the log is done with
> + * it.
> + */
> +STATIC void
> +xfs_attri_item_unpin(
> +	struct xfs_log_item	*lip,
> +	int			remove)
> +{
> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> +
> +	xfs_attri_release(attrip);
> +}
> +
> +
> +STATIC void
> +xfs_attri_item_release(
> +	struct xfs_log_item	*lip)
> +{
> +	xfs_attri_release(ATTRI_ITEM(lip));
> +}
> +
> +/*
> + * Allocate and initialize an attri item
> + */
> +STATIC struct xfs_attri_log_item *
> +xfs_attri_init(
> +	struct xfs_mount	*mp)
> +
> +{
> +	struct xfs_attri_log_item	*attrip;
> +	uint				size;
> +
> +	size = (uint)(sizeof(struct xfs_attri_log_item));
> +	attrip = kmem_zalloc(size, 0);
> +
> +	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
> +			  &xfs_attri_item_ops);
> +	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
> +	atomic_set(&attrip->attri_refcount, 2);
> +
> +	return attrip;
> +}
> +
> +/*
> + * Copy an attr format buffer from the given buf, and into the destination attr
> + * format structure.
> + */
> +STATIC int
> +xfs_attri_copy_format(struct xfs_log_iovec *buf,
> +		      struct xfs_attri_log_format *dst_attr_fmt)
> +{
> +	struct xfs_attri_log_format *src_attr_fmt = buf->i_addr;
> +	uint len = sizeof(struct xfs_attri_log_format);
> +
> +	if (buf->i_len != len)
> +		return -EFSCORRUPTED;
> +
> +	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
> +	return 0;
> +}
> +
> +static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
> +{
> +	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
> +}
> +
> +STATIC void
> +xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
> +{
> +	kmem_free(attrdp->attrd_item.li_lv_shadow);
> +	kmem_free(attrdp);
> +}
> +
> +/*
> + * This returns the number of iovecs needed to log the given attrd item.
> + * We only need 1 iovec for an attrd item.  It just logs the attr_log_format
> + * structure.
> + */
> +static inline int
> +xfs_attrd_item_sizeof(
> +	struct xfs_attrd_log_item *attrdp)
> +{
> +	return sizeof(struct xfs_attrd_log_format);
> +}
> +
> +STATIC void
> +xfs_attrd_item_size(
> +	struct xfs_log_item	*lip,
> +	int			*nvecs,
> +	int			*nbytes)
> +{
> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> +	*nvecs += 1;
> +	*nbytes += xfs_attrd_item_sizeof(attrdp);
> +}
> +
> +/*
> + * This is called to fill in the log iovecs for the given attrd log item. We use
> + * only 1 iovec for the attrd_format, and we point that at the attr_log_format
> + * structure embedded in the attrd item.
> + */
> +STATIC void
> +xfs_attrd_item_format(
> +	struct xfs_log_item	*lip,
> +	struct xfs_log_vec	*lv)
> +{
> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> +	struct xfs_log_iovec		*vecp = NULL;
> +
> +	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
> +	attrdp->attrd_format.alfd_size = 1;
> +
> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
> +			&attrdp->attrd_format, xfs_attrd_item_sizeof(attrdp));
> +}
> +
> +/*
> + * The ATTRD is either committed or aborted if the transaction is cancelled. If
> + * the transaction is cancelled, drop our reference to the ATTRI and free the
> + * ATTRD.
> + */
> +STATIC void
> +xfs_attrd_item_release(
> +	struct xfs_log_item     *lip)
> +{
> +	struct xfs_attrd_log_item *attrdp = ATTRD_ITEM(lip);
> +	xfs_attri_release(attrdp->attrd_attrip);
> +	xfs_attrd_item_free(attrdp);
> +}
> +
> +/*
> + * Log an ATTRI it to the ATTRD when the attr op is done.  An attr operation
> + * may be a set or a remove.  Note that the transaction is marked dirty
> + * regardless of whether the operation succeeds or fails to support the
> + * ATTRI/ATTRD lifecycle rules.
> + */
> +int
> +xfs_trans_attr(
> +	struct xfs_delattr_context	*dac,
> +	struct xfs_attrd_log_item	*attrdp,
> +	struct xfs_buf			**leaf_bp,
> +	uint32_t			op_flags)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error;
> +
> +	error = xfs_qm_dqattach_locked(args->dp, 0);
> +	if (error)
> +		return error;
> +
> +	switch (op_flags) {
> +	case XFS_ATTR_OP_FLAGS_SET:
> +		args->op_flags |= XFS_DA_OP_ADDNAME;
> +		error = xfs_attr_set_iter(dac, leaf_bp);
> +		break;
> +	case XFS_ATTR_OP_FLAGS_REMOVE:
> +		ASSERT(XFS_IFORK_Q((args->dp)));
> +		error = xfs_attr_remove_iter(dac);
> +		break;
> +	default:
> +		error = -EFSCORRUPTED;
> +		break;
> +	}
> +
> +	/*
> +	 * Mark the transaction dirty, even on error. This ensures the
> +	 * transaction is aborted, which:
> +	 *
> +	 * 1.) releases the ATTRI and frees the ATTRD
> +	 * 2.) shuts down the filesystem
> +	 */
> +	args->trans->t_flags |= XFS_TRANS_DIRTY;
> +	set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
> +
> +	return error;
> +}
> +
> +/* Log an attr to the intent item. */
> +STATIC void
> +xfs_attr_log_item(
> +	struct xfs_trans		*tp,
> +	struct xfs_attri_log_item	*attrip,
> +	struct xfs_attr_item		*attr)
> +{
> +	struct xfs_attri_log_format	*attrp;
> +	char				*name_value;
> +
> +	name_value = ((char *)attr) + sizeof(struct xfs_attr_item);
> +
> +	tp->t_flags |= XFS_TRANS_DIRTY;
> +	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
> +
> +	/*
> +	 * At this point the xfs_attr_item has been constructed, and we've
> +	 * created the log intent. Fill in the attri log item and log format
> +	 * structure with fields from this xfs_attr_item
> +	 */
> +	attrp = &attrip->attri_format;
> +	attrp->alfi_ino = attr->xattri_ip->i_ino;
> +	attrp->alfi_op_flags = attr->xattri_op_flags;
> +	attrp->alfi_value_len = attr->xattri_value_len;
> +	attrp->alfi_name_len = attr->xattri_name_len;
> +	attrp->alfi_attr_flags = attr->xattri_flags;
> +
> +	attrip->attri_name = name_value;
> +	attrip->attri_value = &name_value[attr->xattri_name_len];
> +	attrip->attri_name_len = attr->xattri_name_len;
> +	attrip->attri_value_len = attr->xattri_value_len;
> +}
> +
> +/* Get an ATTRI. */
> +static struct xfs_log_item *
> +xfs_attr_create_intent(
> +	struct xfs_trans		*tp,
> +	struct list_head		*items,
> +	unsigned int			count,
> +	bool				sort)
> +{
> +	struct xfs_mount		*mp = tp->t_mountp;
> +	struct xfs_attri_log_item	*attrip = xfs_attri_init(mp);
> +	struct xfs_attr_item		*attr;
> +
> +	ASSERT(count == 1);
> +
> +	xfs_trans_add_item(tp, &attrip->attri_item);
> +	list_for_each_entry(attr, items, xattri_list)
> +		xfs_attr_log_item(tp, attrip, attr);
> +	return &attrip->attri_item;
> +}
> +
> +/* Process an attr. */
> +STATIC int
> +xfs_attr_finish_item(
> +	struct xfs_trans		*tp,
> +	struct xfs_log_item		*done,
> +	struct list_head		*item,
> +	struct xfs_btree_cur		**state)
> +{
> +	struct xfs_attr_item		*attr;
> +	int				error;
> +	int				local;
> +	struct xfs_delattr_context	*dac;
> +	struct xfs_da_args		*args;
> +	struct xfs_attrd_log_item	*attrdp;
> +	struct xfs_attri_log_item	*attrip;
> +
> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
> +	dac = &attr->xattri_dac;
> +	args = &attr->xattri_args;
> +
> +	if (!(dac->flags & XFS_DAC_DELAYED_OP_INIT)) {
> +		/* Only need to initialize args context once */
> +		memset(args, 0, sizeof(*args));
> +		args->geo = attr->xattri_ip->i_mount->m_attr_geo;
> +		args->whichfork = XFS_ATTR_FORK;
> +		args->dp = attr->xattri_ip;
> +		args->name = ((const unsigned char *)attr) +
> +			      sizeof(struct xfs_attr_item);
> +		args->namelen = attr->xattri_name_len;
> +		args->attr_filter = attr->xattri_flags;
> +		args->hashval = xfs_da_hashname(args->name, args->namelen);
> +		args->value = (void *)&args->name[attr->xattri_name_len];
> +		args->valuelen = attr->xattri_value_len;
> +		args->op_flags = XFS_DA_OP_OKNOENT;
> +
> +		/* must match existing transaction block res */
> +		args->total = xfs_attr_calc_size(args, &local);
> +
> +		memset(dac, 0, sizeof(struct xfs_delattr_context));
> +		dac->flags |= XFS_DAC_DELAYED_OP_INIT;
> +		dac->da_args = args;
> +	}
> +
> +	/*
> +	 * Always reset trans after EAGAIN cycle
> +	 * since the transaction is new
> +	 */
> +	args->trans = tp;
> +
> +	error = xfs_trans_attr(dac, ATTRD_ITEM(done), &dac->leaf_bp,
> +			       attr->xattri_op_flags);
> +	/*
> +	 * The attrip refers to xfs_attr_item memory to log the name and value
> +	 * with the intent item. This already occurred when the intent was
> +	 * committed so these fields are no longer accessed. Clear them out of
> +	 * caution since we're about to free the xfs_attr_item.
> +	 */
> +	attrdp = (struct xfs_attrd_log_item *)done;
> +	attrip = attrdp->attrd_attrip;
> +	attrip->attri_name = NULL;
> +	attrip->attri_value = NULL;
> +
> +	if (error != -EAGAIN)
> +		kmem_free(attr);
> +
> +	return error;
> +}
> +
> +/* Abort all pending ATTRs. */
> +STATIC void
> +xfs_attr_abort_intent(
> +	struct xfs_log_item		*intent)
> +{
> +	xfs_attri_release(ATTRI_ITEM(intent));
> +}
> +
> +/* Cancel an attr */
> +STATIC void
> +xfs_attr_cancel_item(
> +	struct list_head		*item)
> +{
> +	struct xfs_attr_item		*attr;
> +
> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
> +	kmem_free(attr);
> +}
> +
> +/*
> + * The ATTRI is logged only once and cannot be moved in the log, so simply
> + * return the lsn at which it's been logged.
> + */
> +STATIC xfs_lsn_t
> +xfs_attri_item_committed(
> +	struct xfs_log_item	*lip,
> +	xfs_lsn_t		lsn)
> +{
> +	return lsn;
> +}
> +
> +STATIC void
> +xfs_attri_item_committing(
> +	struct xfs_log_item	*lip,
> +	xfs_lsn_t		lsn)
> +{
> +}
> +
> +STATIC bool
> +xfs_attri_item_match(
> +	struct xfs_log_item	*lip,
> +	uint64_t		intent_id)
> +{
> +	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
> +}
> +
> +/*
> + * When the attrd item is committed to disk, all we need to do is delete our
> + * reference to our partner attri item and then free ourselves. Since we're
> + * freeing ourselves we must return -1 to keep the transaction code from
> + * further referencing this item.
> + */
> +STATIC xfs_lsn_t
> +xfs_attrd_item_committed(
> +	struct xfs_log_item	*lip,
> +	xfs_lsn_t		lsn)
> +{
> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> +
> +	/*
> +	 * Drop the ATTRI reference regardless of whether the ATTRD has been
> +	 * aborted. Once the ATTRD transaction is constructed, it is the sole
> +	 * responsibility of the ATTRD to release the ATTRI (even if the ATTRI
> +	 * is aborted due to log I/O error).
> +	 */
> +	xfs_attri_release(attrdp->attrd_attrip);
> +	xfs_attrd_item_free(attrdp);
> +
> +	return NULLCOMMITLSN;
> +}
> +
> +STATIC void
> +xfs_attrd_item_committing(
> +	struct xfs_log_item	*lip,
> +	xfs_lsn_t		lsn)
> +{
> +}
> +
> +
> +/*
> + * Allocate and initialize an attrd item
> + */
> +struct xfs_attrd_log_item *
> +xfs_attrd_init(
> +	struct xfs_mount		*mp,
> +	struct xfs_attri_log_item	*attrip)
> +
> +{
> +	struct xfs_attrd_log_item	*attrdp;
> +	uint				size;
> +
> +	size = (uint)(sizeof(struct xfs_attrd_log_item));
> +	attrdp = kmem_zalloc(size, 0);
> +	memset(attrdp, 0, size);
> +
> +	xfs_log_item_init(mp, &attrdp->attrd_item, XFS_LI_ATTRD,
> +			  &xfs_attrd_item_ops);
> +	attrdp->attrd_attrip = attrip;
> +	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
> +
> +	return attrdp;
> +}
> +
> +/*
> + * This routine is called to allocate an "attr free done" log item.
> + */
> +struct xfs_attrd_log_item *
> +xfs_trans_get_attrd(struct xfs_trans		*tp,
> +		  struct xfs_attri_log_item	*attrip)
> +{
> +	struct xfs_attrd_log_item		*attrdp;
> +
> +	ASSERT(tp != NULL);
> +
> +	attrdp = xfs_attrd_init(tp->t_mountp, attrip);
> +	ASSERT(attrdp != NULL);
> +
> +	xfs_trans_add_item(tp, &attrdp->attrd_item);
> +	return attrdp;
> +}
> +
> +static const struct xfs_item_ops xfs_attrd_item_ops = {
> +	.iop_size	= xfs_attrd_item_size,
> +	.iop_format	= xfs_attrd_item_format,
> +	.iop_release    = xfs_attrd_item_release,
> +	.iop_committing	= xfs_attrd_item_committing,
> +	.iop_committed	= xfs_attrd_item_committed,
> +};
> +
> +
> +/* Get an ATTRD so we can process all the attrs. */
> +static struct xfs_log_item *
> +xfs_attr_create_done(
> +	struct xfs_trans		*tp,
> +	struct xfs_log_item		*intent,
> +	unsigned int			count)
> +{
> +	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
> +}
> +
> +const struct xfs_defer_op_type xfs_attr_defer_type = {
> +	.max_items	= 1,
> +	.create_intent	= xfs_attr_create_intent,
> +	.abort_intent	= xfs_attr_abort_intent,
> +	.create_done	= xfs_attr_create_done,
> +	.finish_item	= xfs_attr_finish_item,
> +	.cancel_item	= xfs_attr_cancel_item,
> +};
> +
> +/*
> + * Process an attr intent item that was recovered from the log.  We need to
> + * delete the attr that it describes.
> + */
> +STATIC int
> +xfs_attri_item_recover(
> +	struct xfs_log_item		*lip,
> +	struct xfs_trans		*parent_tp)
> +{
> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> +	struct xfs_mount		*mp = parent_tp->t_mountp;
> +	struct xfs_inode		*ip;
> +	struct xfs_attrd_log_item	*attrdp;
> +	struct xfs_da_args		args;
> +	struct xfs_attri_log_format	*attrp;
> +	struct xfs_trans_res		tres;
> +	int				local;
> +	int				error, err2 = 0;
> +	int				rsvd = 0;
> +	struct xfs_buf			*leaf_bp = NULL;
> +	struct xfs_delattr_context	dac = {
> +		.da_args	= &args,
> +	};
> +
> +	/*
> +	 * First check the validity of the attr described by the ATTRI.  If any
> +	 * are bad, then assume that all are bad and just toss the ATTRI.
> +	 */
> +	attrp = &attrip->attri_format;
> +	if (!(attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET ||
> +	      attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE) ||
> +	    (attrp->alfi_value_len > XATTR_SIZE_MAX) ||
> +	    (attrp->alfi_name_len > XATTR_NAME_MAX) ||
> +	    (attrp->alfi_name_len == 0)) {
> +		/*
> +		 * This will pull the ATTRI from the AIL and free the memory
> +		 * associated with it.
> +		 */
> +		xfs_attri_release(attrip);
> +		return -EFSCORRUPTED;
> +	}
> +
> +	error = xfs_iget(mp, 0, attrp->alfi_ino, 0, 0, &ip);
> +	if (error)
> +		return error;
> +
> +	memset(&args, 0, sizeof(args));
> +	args.geo = ip->i_mount->m_attr_geo;
> +	args.whichfork = XFS_ATTR_FORK;
> +	args.dp = ip;
> +	args.name = attrip->attri_name;
> +	args.namelen = attrp->alfi_name_len;
> +	args.attr_filter = attrp->alfi_attr_flags;
> +	args.hashval = xfs_da_hashname(attrip->attri_name,
> +					attrp->alfi_name_len);
> +	args.value = attrip->attri_value;
> +	args.valuelen = attrp->alfi_value_len;
> +	args.op_flags = XFS_DA_OP_OKNOENT;
> +	args.total = xfs_attr_calc_size(&args, &local);
> +
> +	tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres +
> +			M_RES(mp)->tr_attrsetrt.tr_logres * args.total;
> +	tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
> +	tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
> +
> +	error = xfs_trans_alloc(mp, &tres, args.total,  0,
> +				rsvd ? XFS_TRANS_RESERVE : 0, &args.trans);
> +	if (error)
> +		goto out_rele;
> +	attrdp = xfs_trans_get_attrd(args.trans, attrip);
> +
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> +
> +	xfs_trans_ijoin(args.trans, ip, 0);
> +
> +	do {
> +		error = xfs_trans_attr(&dac, attrdp, &leaf_bp,
> +				       attrp->alfi_op_flags);
> +		if (error && error != -EAGAIN)
> +			goto abort_error;
> +
> +		xfs_trans_log_inode(args.trans, ip,
> +				XFS_ILOG_CORE | XFS_ILOG_ADATA);
> +
> +		err2 = xfs_trans_roll(&args.trans);
> +		if (err2) {
> +			error = err2;
> +			goto abort_error;
> +		}
> +
> +		/* Rejoin inode and leaf if needed */
> +		xfs_trans_ijoin(args.trans, ip, 0);
> +		if (leaf_bp) {
> +			xfs_trans_bjoin(args.trans, leaf_bp);
> +			xfs_trans_bhold(args.trans, leaf_bp);
> +		}
> +
> +	} while (error == -EAGAIN);
> +
> +	error = xfs_trans_commit(args.trans);
> +	if (error)
> +		goto abort_error;
> +
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	xfs_irele(ip);
> +	return error;
> +
> +abort_error:
> +	xfs_trans_cancel(args.trans);
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +out_rele:
> +	xfs_irele(ip);
> +	return error;
> +}
> +
> +static const struct xfs_item_ops xfs_attri_item_ops = {
> +	.iop_size	= xfs_attri_item_size,
> +	.iop_format	= xfs_attri_item_format,
> +	.iop_unpin	= xfs_attri_item_unpin,
> +	.iop_committed	= xfs_attri_item_committed,
> +	.iop_committing = xfs_attri_item_committing,
> +	.iop_release    = xfs_attri_item_release,
> +	.iop_recover	= xfs_attri_item_recover,
> +	.iop_match	= xfs_attri_item_match,
> +};
> +
> +
> +
> +STATIC int
> +xlog_recover_attri_commit_pass2(
> +	struct xlog                     *log,
> +	struct list_head		*buffer_list,
> +	struct xlog_recover_item        *item,
> +	xfs_lsn_t                       lsn)
> +{
> +	int                             error;
> +	struct xfs_mount                *mp = log->l_mp;
> +	struct xfs_attri_log_item       *attrip;
> +	struct xfs_attri_log_format     *attri_formatp;
> +	char				*name = NULL;
> +	char				*value = NULL;
> +	int				region = 0;
> +
> +	attri_formatp = item->ri_buf[region].i_addr;
> +
> +	attrip = xfs_attri_init(mp);
> +	error = xfs_attri_copy_format(&item->ri_buf[region],
> +				      &attrip->attri_format);
> +	if (error) {
> +		xfs_attri_item_free(attrip);
> +		return error;
> +	}
> +
> +	attrip->attri_name_len = attri_formatp->alfi_name_len;
> +	attrip->attri_value_len = attri_formatp->alfi_value_len;
> +	attrip = kmem_realloc(attrip, sizeof(struct xfs_attri_log_item) +
> +			      attrip->attri_name_len + attrip->attri_value_len,
> +			      0);
> +
> +	ASSERT(attrip->attri_name_len > 0);
> +	region++;
> +	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
> +	memcpy(name, item->ri_buf[region].i_addr,
> +	       attrip->attri_name_len);
> +	attrip->attri_name = name;
> +
> +	if (attrip->attri_value_len > 0) {
> +		region++;
> +		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
> +			attrip->attri_name_len;
> +		memcpy(value, item->ri_buf[region].i_addr,
> +			attrip->attri_value_len);
> +		attrip->attri_value = value;
> +	}
> +
> +	/*
> +	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
> +	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
> +	 * directly and drop the ATTRI reference. Note that
> +	 * xfs_trans_ail_update() drops the AIL lock.
> +	 */
> +	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
> +	xfs_attri_release(attrip);
> +	return 0;
> +}
> +
> +const struct xlog_recover_item_ops xlog_attri_item_ops = {
> +	.item_type	= XFS_LI_ATTRI,
> +	.commit_pass2	= xlog_recover_attri_commit_pass2,
> +};
> +
> +/*
> + * This routine is called when an ATTRD format structure is found in a committed
> + * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
> + * it was still in the log. To do this it searches the AIL for the ATTRI with
> + * an id equal to that in the ATTRD format structure. If we find it we drop
> + * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
> + */
> +STATIC int
> +xlog_recover_attrd_commit_pass2(
> +	struct xlog			*log,
> +	struct list_head		*buffer_list,
> +	struct xlog_recover_item	*item,
> +	xfs_lsn_t			lsn)
> +{
> +	struct xfs_attrd_log_format	*attrd_formatp;
> +
> +	attrd_formatp = item->ri_buf[0].i_addr;
> +	ASSERT((item->ri_buf[0].i_len ==
> +				(sizeof(struct xfs_attrd_log_format))));
> +
> +	xlog_recover_release_intent(log, XFS_LI_ATTRI,
> +				    attrd_formatp->alfd_alf_id);
> +	return 0;
> +}
> +
> +const struct xlog_recover_item_ops xlog_attrd_item_ops = {
> +	.item_type	= XFS_LI_ATTRD,
> +	.commit_pass2	= xlog_recover_attrd_commit_pass2,
> +};
> diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
> new file mode 100644
> index 0000000..7dd2572
> --- /dev/null
> +++ b/fs/xfs/xfs_attr_item.h
> @@ -0,0 +1,76 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
> + * Author: Allison Collins <allison.henderson@oracle.com>
> + */
> +#ifndef	__XFS_ATTR_ITEM_H__
> +#define	__XFS_ATTR_ITEM_H__
> +
> +/* kernel only ATTRI/ATTRD definitions */
> +
> +struct xfs_mount;
> +struct kmem_zone;
> +
> +/*
> + * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
> + */
> +#define	XFS_ATTRI_RECOVERED	1
> +
> +
> +/* iovec length must be 32-bit aligned */
> +#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
> +				size + sizeof(int32_t) - \
> +				(size % sizeof(int32_t)))
> +
> +/*
> + * This is the "attr intention" log item.  It is used to log the fact that some
> + * attribute operations need to be processed.  An operation is currently either
> + * a set or remove.  Set or remove operations are described by the xfs_attr_item
> + * which may be logged to this intent.  Intents are used in conjunction with the
> + * "attr done" log item described below.
> + *
> + * The ATTRI is reference counted so that it is not freed prior to both the
> + * ATTRI and ATTRD being committed and unpinned. This ensures the ATTRI is
> + * inserted into the AIL even in the event of out of order ATTRI/ATTRD
> + * processing. In other words, an ATTRI is born with two references:
> + *
> + *      1.) an ATTRI held reference to track ATTRI AIL insertion
> + *      2.) an ATTRD held reference to track ATTRD commit
> + *
> + * On allocation, both references are the responsibility of the caller. Once the
> + * ATTRI is added to and dirtied in a transaction, ownership of reference one
> + * transfers to the transaction. The reference is dropped once the ATTRI is
> + * inserted to the AIL or in the event of failure along the way (e.g., commit
> + * failure, log I/O error, etc.). Note that the caller remains responsible for
> + * the ATTRD reference under all circumstances to this point. The caller has no
> + * means to detect failure once the transaction is committed, however.
> + * Therefore, an ATTRD is required after this point, even in the event of
> + * unrelated failure.
> + *
> + * Once an ATTRD is allocated and dirtied in a transaction, reference two
> + * transfers to the transaction. The ATTRD reference is dropped once it reaches
> + * the unpin handler. Similar to the ATTRI, the reference also drops in the
> + * event of commit failure or log I/O errors. Note that the ATTRD is not
> + * inserted in the AIL, so at this point both the ATTRI and ATTRD are freed.
> + */
> +struct xfs_attri_log_item {
> +	struct xfs_log_item		attri_item;
> +	atomic_t			attri_refcount;
> +	int				attri_name_len;
> +	void				*attri_name;
> +	int				attri_value_len;
> +	void				*attri_value;
> +	struct xfs_attri_log_format	attri_format;
> +};
> +
> +/*
> + * This is the "attr done" log item.  It is used to log the fact that some attrs
> + * earlier mentioned in an attri item have been freed.
> + */
> +struct xfs_attrd_log_item {
> +	struct xfs_attri_log_item	*attrd_attrip;
> +	struct xfs_log_item		attrd_item;
> +	struct xfs_attrd_log_format	attrd_format;
> +};
> +
> +#endif	/* __XFS_ATTR_ITEM_H__ */
> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> index 50f922c..166b680 100644
> --- a/fs/xfs/xfs_attr_list.c
> +++ b/fs/xfs/xfs_attr_list.c
> @@ -15,6 +15,7 @@
>  #include "xfs_inode.h"
>  #include "xfs_trans.h"
>  #include "xfs_bmap.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_attr_sf.h"
>  #include "xfs_attr_leaf.h"
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 6f22a66..edc05af 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -15,6 +15,8 @@
>  #include "xfs_iwalk.h"
>  #include "xfs_itable.h"
>  #include "xfs_error.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_bmap.h"
>  #include "xfs_bmap_util.h"
> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> index c1771e7..62e1534 100644
> --- a/fs/xfs/xfs_ioctl32.c
> +++ b/fs/xfs/xfs_ioctl32.c
> @@ -17,6 +17,8 @@
>  #include "xfs_itable.h"
>  #include "xfs_fsops.h"
>  #include "xfs_rtalloc.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_ioctl.h"
>  #include "xfs_ioctl32.h"
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 80a13c8..fe60da1 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -13,6 +13,8 @@
>  #include "xfs_inode.h"
>  #include "xfs_acl.h"
>  #include "xfs_quota.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_trans.h"
>  #include "xfs_trace.h"
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index ad0c69ee..6405ce33 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1975,6 +1975,10 @@ xlog_print_tic_res(
>  	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
>  	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
>  	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
> +	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
> +	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
> +	    REG_TYPE_STR(ATTR_NAME, "attr_name"),
> +	    REG_TYPE_STR(ATTR_VALUE, "attr_value"),
>  	};
>  	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
>  #undef REG_TYPE_STR
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index e2ec91b..ec31db0 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -1811,6 +1811,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
>  	&xlog_cud_item_ops,
>  	&xlog_bui_item_ops,
>  	&xlog_bud_item_ops,
> +	&xlog_attri_item_ops,
> +	&xlog_attrd_item_ops,
>  };
>  
>  static const struct xlog_recover_item_ops *
> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> index 5f04d8a..0597a04 100644
> --- a/fs/xfs/xfs_ondisk.h
> +++ b/fs/xfs/xfs_ondisk.h
> @@ -126,6 +126,8 @@ xfs_check_ondisk_structs(void)
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
>  
>  	/*
>  	 * The v5 superblock format extended several v4 header structures with
> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> index bca48b3..9b0c790 100644
> --- a/fs/xfs/xfs_xattr.c
> +++ b/fs/xfs/xfs_xattr.c
> @@ -10,6 +10,7 @@
>  #include "xfs_log_format.h"
>  #include "xfs_da_format.h"
>  #include "xfs_inode.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_acl.h"
>  #include "xfs_da_btree.h"
> -- 
> 2.7.4
> 


* Re: [PATCH v12 1/8] xfs: Add delay ready attr remove routines
  2020-08-27  0:35 ` [PATCH v12 1/8] xfs: Add delay ready attr remove routines Allison Collins
@ 2020-09-01 17:00   ` Brian Foster
  2020-09-01 17:20     ` Darrick J. Wong
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Foster @ 2020-09-01 17:00 UTC (permalink / raw)
  To: Allison Collins; +Cc: linux-xfs

On Wed, Aug 26, 2020 at 05:35:11PM -0700, Allison Collins wrote:
> This patch modifies the attr remove routines to be delay ready. This
> means they no longer roll or commit transactions, but instead return
> -EAGAIN to have the calling routine roll and refresh the transaction. In
> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> uses a sort of state machine like switch to keep track of where it was
> when EAGAIN was returned. xfs_attr_node_removename has also been
> modified to use the switch, and a new version of xfs_attr_remove_args
> consists of a simple loop to refresh the transaction until the operation
> is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> transaction wherever the existing code used to.
> 
> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> version __xfs_attr_rmtval_remove. We will rename
> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> done.
> 
> xfs_attr_rmtval_remove itself is still in use by the set routines (used
> during a rename).  For reasons of perserving existing function, we

Nit:				preserving

> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> used and will be removed.
> 
> This patch also adds a new struct xfs_delattr_context, which we will use
> to keep track of the current state of an attribute operation. The new
> xfs_delattr_state enum is used to track various operations that are in
> progress so that we know not to repeat them, and resume where we left
> off before EAGAIN was returned to cycle out the transaction. Other
> members take the place of local variables that need to retain their
> values across multiple function recalls.  See xfs_attr.h for a more
> detailed diagram of the states.
> 
> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c        | 162 ++++++++++++++++++++++++++++++----------
>  fs/xfs/libxfs/xfs_attr.h        |  73 ++++++++++++++++++
>  fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>  fs/xfs/libxfs/xfs_attr_remote.c |  39 +++++-----
>  fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>  fs/xfs/xfs_attr_inactive.c      |   2 +-
>  6 files changed, 220 insertions(+), 60 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 2e055c0..ea50fc3 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
...
> @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
>  }
>  
>  /*
> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> + * also checks for a defer finish.  Transaction is finished and rolled as
> + * needed, and returns true or false if the delayed operation should continue.
> + */
> +int
> +xfs_attr_trans_roll(
> +	struct xfs_delattr_context	*dac)
> +{
> +	struct xfs_da_args              *args = dac->da_args;
> +	int				error = 0;
> +
> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> +		/*
> +		 * The caller wants us to finish all the deferred ops so that we
> +		 * avoid pinning the log tail with a large number of deferred
> +		 * ops.
> +		 */
> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> +		error = xfs_defer_finish(&args->trans);
> +		if (error)
> +			return error;
> +	}
> +
> +	return xfs_trans_roll_inode(&args->trans, args->dp);

I'm not sure there's a need to roll the transaction again if the
defer path above executes. xfs_defer_finish() completes the dfops and
always returns a clean transaction.
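
To make the point concrete, here is a minimal, self-contained C sketch of
that control flow. It assumes that finishing the deferred ops really does
leave a clean transaction, so the extra roll can be skipped on that path.
Everything prefixed demo_ is a hypothetical stand-in that bumps counters
instead of touching real transactions; none of it is kernel code.

```c
#include <assert.h>

/* Hypothetical flag, mirroring XFS_DAC_DEFER_FINISH from the patch. */
#define DEMO_DAC_DEFER_FINISH	0x01

/* Hypothetical context: counters replace the real transaction. */
struct demo_dac {
	unsigned int	flags;
	int		finishes;	/* times deferred ops were finished */
	int		rolls;		/* times the transaction was rolled */
};

/* Stand-in for xfs_defer_finish(): assumed to leave a clean transaction. */
static int demo_defer_finish(struct demo_dac *dac)
{
	dac->finishes++;
	return 0;
}

/* Stand-in for xfs_trans_roll_inode(). */
static int demo_trans_roll(struct demo_dac *dac)
{
	dac->rolls++;
	return 0;
}

/* Finish deferred ops if requested, otherwise just roll. Never both. */
static int demo_attr_trans_roll(struct demo_dac *dac)
{
	if (dac->flags & DEMO_DAC_DEFER_FINISH) {
		dac->flags &= ~DEMO_DAC_DEFER_FINISH;
		/* Assumption: no further roll is needed after this. */
		return demo_defer_finish(dac);
	}

	return demo_trans_roll(dac);
}
```

With the flag set, one call finishes the deferred ops and skips the roll; a
second call, with the flag now clear, rolls once.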

> +}
> +
> +/*
>   * Set the attribute specified in @args.
>   */
>  int
...
> @@ -1218,21 +1288,35 @@ xfs_attr_node_remove_rmt(
>   * This will involve walking down the Btree, and may involve joining
>   * leaf nodes and even joining intermediate nodes up to and including
>   * the root node (a special case of an intermediate node).
> + *
> + * This routine is meant to function as either an inline or delayed operation,
> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> + * functions will need to handle this, and recall the function until a
> + * successful error code is returned.
>   */
>  STATIC int
>  xfs_attr_node_removename(
> -	struct xfs_da_args	*args)
> +	struct xfs_delattr_context	*dac)
>  {
> -	struct xfs_da_state	*state;
> -	struct xfs_da_state_blk	*blk;
> -	int			retval, error;
> -	struct xfs_inode	*dp = args->dp;
> +	struct xfs_da_args		*args = dac->da_args;
> +	struct xfs_da_state		*state;
> +	struct xfs_da_state_blk		*blk;
> +	int				retval, error;
> +	struct xfs_inode		*dp = args->dp;
>  
>  	trace_xfs_attr_node_removename(args);
> +	state = dac->da_state;
> +	blk = dac->blk;
>  
> -	error = xfs_attr_node_removename_setup(args, &state);
> -	if (error)
> -		goto out;
> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> +		goto das_rm_shrink;
> +
> +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> +		error = xfs_attr_node_removename_setup(dac, &state);
> +		if (error)
> +			goto out;
> +	}
>  
>  	/*
>  	 * If there is an out-of-line value, de-allocate the blocks.
> @@ -1240,8 +1324,13 @@ xfs_attr_node_removename(
>  	 * overflow the maximum size of a transaction and/or hit a deadlock.
>  	 */
>  	if (args->rmtblkno > 0) {
> -		error = xfs_attr_node_remove_rmt(args, state);
> -		if (error)
> +		/*
> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> +		 */
> +		error = xfs_attr_node_remove_rmt(dac, state);
> +		if (error == -EAGAIN)
> +			return error;
> +		else if (error)
>  			goto out;
>  	}
>  
> @@ -1260,17 +1349,14 @@ xfs_attr_node_removename(
>  		error = xfs_da3_join(state);
>  		if (error)
>  			goto out;
> -		error = xfs_defer_finish(&args->trans);
> -		if (error)
> -			goto out;
> -		/*
> -		 * Commit the Btree join operation and start a new trans.
> -		 */
> -		error = xfs_trans_roll_inode(&args->trans, dp);
> -		if (error)
> -			goto out;
> +
> +		dac->flags |= XFS_DAC_DEFER_FINISH;
> +		dac->dela_state = XFS_DAS_RM_SHRINK;
> +		return -EAGAIN;
>  	}
>  
> +das_rm_shrink:
> +
>  	/*
>  	 * If the result is small enough, push it all into the inode.
>  	 */

ISTR that Dave or Darrick previously suggested that we should try to
isolate the state transition code as much as possible to a single
location. That basically means we should look at any place a particular
state check travels through multiple functions and see if we can
refactor things to flatten the state processing code. I tend to agree
that is the ideal approach given how difficult it can be to track state
changes through multiple functions.

In light of that (and as an example), I think the whole
xfs_attr_node_removename() path should be refactored so it looks
something like the following (with obvious error
handling/comment/aesthetic cleanups etc.):

xfs_attr_node_removename_iter()
{
	...

	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
		<do init stuff>
	}

	switch (dac->dela_state) {
	case 0:
		/*
		 * repeatedly remove remote blocks, remove the entry and
		 * join. returns -EAGAIN or 0 for completion of the step.
		 */
		error = xfs_attr_node_remove_step(dac, state);
		if (error)
			break;

		/* check whether to shrink or return success */
		if (xfs_bmap_one_block(...)) {
			dac->dela_state = XFS_DAS_RM_SHRINK;
			error = -EAGAIN;
		}
		break;
	case XFS_DAS_RM_SHRINK:
		/* shrink the fork, no reentry, no next step */
		error = xfs_attr_node_shrink_step(args, state);
		break;
	default:
		ASSERT(0);
		return -EINVAL;
	}

	if (error == -EAGAIN)
		return error;

	<do cleanup stuff>
	...
	return error;
}

The idea here is that we have one _iter() function that does all the
state management for a particular operation and has minimal other logic.
That way we can see the states that repeat, transition, etc. all in one
place. The _step() functions implement the functional components of each
state and do no state management whatsoever beyond returning -EAGAIN to
request reentry or returning 0 for completion. In the case of the latter,
the _iter() function decides whether to transition to another state
(returning -EAGAIN itself) or complete the operation. If a _step()
function ever needs to set or check ->dela_state, then that is a clear
indication it must be broken up into multiple _step() functions.
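
As a toy illustration of that split, the self-contained C sketch below
models one _iter() function that owns every state transition and two
_step() functions that only report -EAGAIN or 0. All demo_ names, the
context fields and the counters are hypothetical; the one_block field
merely stands in for the xfs_bmap_one_block() check, and the rolls counter
stands in for the caller rolling the transaction.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical states, loosely modelled on enum xfs_delattr_state. */
enum demo_state {
	DEMO_DAS_UNINIT = 0,	/* remove remote blocks, entry, join */
	DEMO_DAS_RM_SHRINK,	/* shrink the fork */
};

/* Hypothetical context carried across reentries. */
struct demo_ctx {
	enum demo_state	state;
	int		blocks_left;	/* remote blocks still to unmap */
	bool		one_block;	/* stand-in for xfs_bmap_one_block() */
	bool		shrunk;
	int		rolls;		/* transaction rolls done by caller */
};

/* Step: unmap one block per call. No state management in here. */
static int demo_remove_step(struct demo_ctx *ctx)
{
	if (ctx->blocks_left > 0) {
		ctx->blocks_left--;
		return -EAGAIN;		/* request reentry */
	}
	return 0;			/* step complete */
}

/* Step: shrink the fork. No reentry, no next step. */
static int demo_shrink_step(struct demo_ctx *ctx)
{
	ctx->shrunk = true;
	return 0;
}

/* Iter: the only place that reads or writes ctx->state. */
static int demo_remove_iter(struct demo_ctx *ctx)
{
	int error;

	switch (ctx->state) {
	case DEMO_DAS_UNINIT:
		error = demo_remove_step(ctx);
		if (error)
			break;
		/* step finished: decide whether to shrink or complete */
		if (ctx->one_block) {
			ctx->state = DEMO_DAS_RM_SHRINK;
			error = -EAGAIN;
		}
		break;
	case DEMO_DAS_RM_SHRINK:
		error = demo_shrink_step(ctx);
		break;
	default:
		assert(0);
		return -EINVAL;
	}

	return error;
}

/* Caller: reenter until done; a real caller would roll the transaction. */
static int demo_remove(struct demo_ctx *ctx)
{
	int error;

	while ((error = demo_remove_iter(ctx)) == -EAGAIN)
		ctx->rolls++;

	return error;
}
```

Neither step function touches ctx->state, so all transitions can be read
off the switch in demo_remove_iter() alone.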

I think this implements the separation of state and functionality model
we're after without introduction of crazy state processing frameworks,
etc., but I admit I've so far only thought about it wrt the remove case
(which is simpler than the set case). Also note that as usual, any
associated refactoring of the functional components should come as
preliminary patches such that this patch only introduces state bits.
Thoughts?

Brian

> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 3e97a93..9573949 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -74,6 +74,75 @@ struct xfs_attr_list_context {
>  };
>  
>  
> +/*
> + * ========================================================================
> + * Structure used to pass context around among the delayed routines.
> + * ========================================================================
> + */
> +
> +/*
> + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> + * states indicate places where the function would return -EAGAIN, and then
> + * immediately resume from after being recalled by the calling function. States
> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> + * so the calling function needs to pass them back to that subroutine to allow
> + * it to finish where it left off. But they otherwise do not have a role in the
> + * calling function other than just passing through.
> + *
> + * xfs_attr_remove_iter()
> + *	  XFS_DAS_RM_SHRINK ─┐
> + *	  (subroutine state) │
> + *	                     └─>xfs_attr_node_removename()
> + *	                                      │
> + *	                                      v
> + *	                                   need to
> + *	                                shrink tree? ─n─┐
> + *	                                      │         │
> + *	                                      y         │
> + *	                                      │         │
> + *	                                      v         │
> + *	                              XFS_DAS_RM_SHRINK │
> + *	                                      │         │
> + *	                                      v         │
> + *	                                     done <─────┘
> + *
> + */
> +
> +/*
> + * Enum values for xfs_delattr_context.dela_state
> + *
> + * These values are used by delayed attribute operations to keep track of where
> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> + * calling function to roll the transaction, and then recall the subroutine to
> + * finish the operation.  The enum is then used by the subroutine to jump back
> + * to where it was and resume executing where it left off.
> + */
> +enum xfs_delattr_state {
> +				      /* Zero is uninitialized */
> +	XFS_DAS_RM_SHRINK	= 1,  /* We are shrinking the tree */
> +};
> +
> +/*
> + * Defines for xfs_delattr_context.flags
> + */
> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> +
> +/*
> + * Context used for keeping track of delayed attribute operations
> + */
> +struct xfs_delattr_context {
> +	struct xfs_da_args      *da_args;
> +
> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> +	struct xfs_da_state     *da_state;
> +	struct xfs_da_state_blk *blk;
> +
> +	/* Used to keep track of current state of delayed operation */
> +	unsigned int            flags;
> +	enum xfs_delattr_state  dela_state;
> +};
> +
>  /*========================================================================
>   * Function prototypes for the kernel.
>   *========================================================================*/
> @@ -91,6 +160,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>  int xfs_attr_set_args(struct xfs_da_args *args);
>  int xfs_has_attr(struct xfs_da_args *args);
>  int xfs_attr_remove_args(struct xfs_da_args *args);
> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>  bool xfs_attr_namecheck(const void *name, size_t length);
> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> +			      struct xfs_da_args *args);
>  
>  #endif	/* __XFS_ATTR_H__ */
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index 8623c81..4ed7b31 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -19,8 +19,8 @@
>  #include "xfs_bmap_btree.h"
>  #include "xfs_bmap.h"
>  #include "xfs_attr_sf.h"
> -#include "xfs_attr_remote.h"
>  #include "xfs_attr.h"
> +#include "xfs_attr_remote.h"
>  #include "xfs_attr_leaf.h"
>  #include "xfs_error.h"
>  #include "xfs_trace.h"
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> index 3f80ced..7f81b48 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.c
> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> @@ -676,10 +676,14 @@ xfs_attr_rmtval_invalidate(
>   */
>  int
>  xfs_attr_rmtval_remove(
> -	struct xfs_da_args      *args)
> +	struct xfs_da_args		*args)
>  {
> -	int			error;
> -	int			retval;
> +	xfs_dablk_t			lblkno;
> +	int				blkcnt;
> +	int				error;
> +	struct xfs_delattr_context	dac  = {
> +		.da_args	= args,
> +	};
>  
>  	trace_xfs_attr_rmtval_remove(args);
>  
> @@ -687,19 +691,17 @@ xfs_attr_rmtval_remove(
>  	 * Keep de-allocating extents until the remote-value region is gone.
>  	 */
>  	do {
> -		retval = __xfs_attr_rmtval_remove(args);
> -		if (retval && retval != -EAGAIN)
> -			return retval;
> +		error = __xfs_attr_rmtval_remove(&dac);
> +		if (error != -EAGAIN)
> +			break;
>  
> -		/*
> -		 * Close out trans and start the next one in the chain.
> -		 */
> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> +		error = xfs_attr_trans_roll(&dac);
>  		if (error)
>  			return error;
> -	} while (retval == -EAGAIN);
>  
> -	return 0;
> +	} while (true);
> +
> +	return error;
>  }
>  
>  /*
> @@ -709,9 +711,10 @@ xfs_attr_rmtval_remove(
>   */
>  int
>  __xfs_attr_rmtval_remove(
> -	struct xfs_da_args	*args)
> +	struct xfs_delattr_context	*dac)
>  {
> -	int			error, done;
> +	struct xfs_da_args		*args = dac->da_args;
> +	int				error, done;
>  
>  	/*
>  	 * Unmap value blocks for this attr.
> @@ -721,12 +724,10 @@ __xfs_attr_rmtval_remove(
>  	if (error)
>  		return error;
>  
> -	error = xfs_defer_finish(&args->trans);
> -	if (error)
> -		return error;
> -
> -	if (!done)
> +	if (!done) {
> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>  		return -EAGAIN;
> +	}
>  
>  	return error;
>  }
> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> index 9eee615..002fd30 100644
> --- a/fs/xfs/libxfs/xfs_attr_remote.h
> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>  int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>  		xfs_buf_flags_t incore_flags);
>  int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>  #endif /* __XFS_ATTR_REMOTE_H__ */
> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> index bfad669..aaa7e66 100644
> --- a/fs/xfs/xfs_attr_inactive.c
> +++ b/fs/xfs/xfs_attr_inactive.c
> @@ -15,10 +15,10 @@
>  #include "xfs_da_format.h"
>  #include "xfs_da_btree.h"
>  #include "xfs_inode.h"
> +#include "xfs_attr.h"
>  #include "xfs_attr_remote.h"
>  #include "xfs_trans.h"
>  #include "xfs_bmap.h"
> -#include "xfs_attr.h"
>  #include "xfs_attr_leaf.h"
>  #include "xfs_quota.h"
>  #include "xfs_dir2.h"
> -- 
> 2.7.4
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v12 1/8] xfs: Add delay ready attr remove routines
  2020-09-01 17:00   ` Brian Foster
@ 2020-09-01 17:20     ` Darrick J. Wong
  2020-09-01 18:07       ` Brian Foster
  0 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2020-09-01 17:20 UTC (permalink / raw)
  To: Brian Foster; +Cc: Allison Collins, linux-xfs

On Tue, Sep 01, 2020 at 01:00:20PM -0400, Brian Foster wrote:
> On Wed, Aug 26, 2020 at 05:35:11PM -0700, Allison Collins wrote:
> > This patch modifies the attr remove routines to be delay ready. This
> > means they no longer roll or commit transactions, but instead return
> > -EAGAIN to have the calling routine roll and refresh the transaction. In
> > this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> > uses a sort of state machine like switch to keep track of where it was
> > when EAGAIN was returned. xfs_attr_node_removename has also been
> > modified to use the switch, and a new version of xfs_attr_remove_args
> > consists of a simple loop to refresh the transaction until the operation
> > is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > transaction where ever the existing code used to.
> > 
> > Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> > version __xfs_attr_rmtval_remove. We will rename
> > __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> > done.
> > 
> > xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > during a rename).  For reasons of perserving existing function, we
> 
> Nit:				preserving
> 
> > modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> > set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> > the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> > used and will be removed.
> > 
> > This patch also adds a new struct xfs_delattr_context, which we will use
> > to keep track of the current state of an attribute operation. The new
> > xfs_delattr_state enum is used to track various operations that are in
> > progress so that we know not to repeat them, and resume where we left
> > off before EAGAIN was returned to cycle out the transaction. Other
> > members take the place of local variables that need to retain their
> > values across multiple function recalls.  See xfs_attr.h for a more
> > detailed diagram of the states.
> > 
> > Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_attr.c        | 162 ++++++++++++++++++++++++++++++----------
> >  fs/xfs/libxfs/xfs_attr.h        |  73 ++++++++++++++++++
> >  fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
> >  fs/xfs/libxfs/xfs_attr_remote.c |  39 +++++-----
> >  fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
> >  fs/xfs/xfs_attr_inactive.c      |   2 +-
> >  6 files changed, 220 insertions(+), 60 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 2e055c0..ea50fc3 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> ...
> > @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
> >  }
> >  
> >  /*
> > + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> > + * also checks for a defer finish.  Transaction is finished and rolled as
> > + * needed, and returns true of false if the delayed operation should continue.
> > + */
> > +int
> > +xfs_attr_trans_roll(
> > +	struct xfs_delattr_context	*dac)
> > +{
> > +	struct xfs_da_args              *args = dac->da_args;
> > +	int				error = 0;
> > +
> > +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> > +		/*
> > +		 * The caller wants us to finish all the deferred ops so that we
> > +		 * avoid pinning the log tail with a large number of deferred
> > +		 * ops.
> > +		 */
> > +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> > +		error = xfs_defer_finish(&args->trans);
> > +		if (error)
> > +			return error;
> > +	}
> > +
> > +	return xfs_trans_roll_inode(&args->trans, args->dp);
> 
> I'm not sure there's a need to roll the transaction again if the
> defer path above executes. xfs_defer_finish() completes the dfops and
> always returns a clean transaction.

I'm not sure we even really need a DEFER_FINISH flag if (a) xfs_defer.c
gets patched to finish all the other defer items before coming back to
the next step of the delattr state machine and (b) Allison removes the
_iter functions in favor of using the defer op mechanism even when we're
not pushing the state changes through the log.

(I'm working on (a) still, will have something in a few days...)

> > +}
> > +
> > +/*
> >   * Set the attribute specified in @args.
> >   */
> >  int
> ...
> > @@ -1218,21 +1288,35 @@ xfs_attr_node_remove_rmt(
> >   * This will involve walking down the Btree, and may involve joining
> >   * leaf nodes and even joining intermediate nodes up to and including
> >   * the root node (a special case of an intermediate node).
> > + *
> > + * This routine is meant to function as either an inline or delayed operation,
> > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > + * functions will need to handle this, and recall the function until a
> > + * successful error code is returned.
> >   */
> >  STATIC int
> >  xfs_attr_node_removename(
> > -	struct xfs_da_args	*args)
> > +	struct xfs_delattr_context	*dac)
> >  {
> > -	struct xfs_da_state	*state;
> > -	struct xfs_da_state_blk	*blk;
> > -	int			retval, error;
> > -	struct xfs_inode	*dp = args->dp;
> > +	struct xfs_da_args		*args = dac->da_args;
> > +	struct xfs_da_state		*state;
> > +	struct xfs_da_state_blk		*blk;
> > +	int				retval, error;
> > +	struct xfs_inode		*dp = args->dp;
> >  
> >  	trace_xfs_attr_node_removename(args);
> > +	state = dac->da_state;
> > +	blk = dac->blk;
> >  
> > -	error = xfs_attr_node_removename_setup(args, &state);
> > -	if (error)
> > -		goto out;
> > +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> > +		goto das_rm_shrink;
> > +
> > +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> > +		error = xfs_attr_node_removename_setup(dac, &state);
> > +		if (error)
> > +			goto out;
> > +	}
> >  
> >  	/*
> >  	 * If there is an out-of-line value, de-allocate the blocks.
> > @@ -1240,8 +1324,13 @@ xfs_attr_node_removename(
> >  	 * overflow the maximum size of a transaction and/or hit a deadlock.
> >  	 */
> >  	if (args->rmtblkno > 0) {
> > -		error = xfs_attr_node_remove_rmt(args, state);
> > -		if (error)
> > +		/*
> > +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> > +		 */
> > +		error = xfs_attr_node_remove_rmt(dac, state);
> > +		if (error == -EAGAIN)
> > +			return error;
> > +		else if (error)
> >  			goto out;
> >  	}
> >  
> > @@ -1260,17 +1349,14 @@ xfs_attr_node_removename(
> >  		error = xfs_da3_join(state);
> >  		if (error)
> >  			goto out;
> > -		error = xfs_defer_finish(&args->trans);
> > -		if (error)
> > -			goto out;
> > -		/*
> > -		 * Commit the Btree join operation and start a new trans.
> > -		 */
> > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > -		if (error)
> > -			goto out;
> > +
> > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > +		dac->dela_state = XFS_DAS_RM_SHRINK;
> > +		return -EAGAIN;
> >  	}
> >  
> > +das_rm_shrink:
> > +
> >  	/*
> >  	 * If the result is small enough, push it all into the inode.
> >  	 */
> 
> ISTR that Dave or Darrick previously suggested that we should try to
> isolate the state transition code as much as possible to a single
> location. That basically means we should look at any place a particular
> state check travels through multiple functions and see if we can
> refactor things to flatten the state processing code. I tend to agree
> that is the ideal approach given how difficult it can be to track state
> changes through multiple functions.

Yes. :)

> In light of that (and as an example), I think the whole
> xfs_attr_node_removename() path should be refactored so it looks
> something like the following (with obvious error
> handling/comment/aesthetic cleanups etc.):
> 
> xfs_attr_node_removename_iter()
> {
> 	...
> 
> 	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> 		<do init stuff>
> 	}
> 
> 	switch (dac->dela_state) {
> 	case 0:

I kinda wish "0" had its own name, but I also don't want to start
another round of naming bikeshed. :)

> 		/* 
> 		 * repeatedly remove remote blocks, remove the entry and
> 		 * join. returns -EAGAIN or 0 for completion of the step.
> 		 */
> 		error = xfs_attr_node_remove_step(dac, state);
> 		if (error)
> 			break;
> 
> 		/* check whether to shrink or return success */
> 		if (!error && xfs_bmap_one_block(...)) {
> 			dac->dela_state = XFS_DAS_RM_SHRINK;
> 			error = -EAGAIN;
> 		}
> 		break;
> 	case XFS_DAS_RM_SHRINK:
> 		/* shrink the fork, no reentry, no next step */
> 		error = xfs_attr_node_shrink_step(args, state);	
> 		break;

<nod> The ASCII art diagrams help assuage my nerves about the fact that
we branch based on dela_state but not all the branches actually show us
moving to the next state.

I've gotten the distinct sense, though, that throwing the new state all
the way back up to _iter() to set it is probably a lot more fuss than
it's worth for the attr set case, though...

> 	default:
> 		ASSERT(0);
> 		return -EINVAL;
> 	}
> 
> 	if (error == -EAGAIN)
> 		return error;
> 
> 	<do cleanup stuff>
> 	...
> 	return error;
> }
> 
> The idea here is that we have one _iter() function that does all the
> state management for a particular operation and has minimal other logic.
> That way we can see the states that repeat, transition, etc. all in one
> place. The _step() functions implement the functional components of each
> state and do no state management whatsoever beyond return -EAGAIN to
> request reentry or return 0 for completion. In the case of the latter,
> the _iter() function decides whether to transition to another state
> (returning -EAGAIN itself) or complete the operation. If a _step()
> function ever needs to set or check ->dela_state, then that is clear
> indication it must be broken up into multiple _step() functions.

...because I've frequently had the same thought that the state machine
handling ought to be in the same place.  But then I start reading
through the xattr code to figure out how that would be done, and get
trapped by the fact that some of the decisions about the next state have
to happen pretty deep in the xattr code-- stuff like allocating an
extent for a remote value, where depending on whether or not we got enough
blocks to satisfy the space requirements, either we can move on to the
next state and return EAGAIN, or we have to save the current state and
EAGAIN to try to get more blocks.
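
To make that concrete, here is a rough userspace sketch of the shape I
mean. It is not the real xattr code: every demo_* name, the 4-block
allocation limit, and the three states are made up for illustration. The
point is only that the "stay in this state or advance" decision falls out
of what the allocator returned, so it ends up next to the allocation call
rather than in a top-level _iter().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states for illustration; not the XFS_DAS_* values. */
enum demo_state {
	DEMO_DAS_ALLOC_RMT = 0,	/* still allocating remote value blocks */
	DEMO_DAS_SET_VALUE,	/* blocks allocated, write the value */
	DEMO_DAS_DONE,
};

struct demo_ctx {
	enum demo_state	state;
	int		blks_needed;	/* blocks still to allocate */
	int		rolls;		/* times the caller "rolled" */
};

/* Stand-in allocator: hands out at most 'max' blocks per call. */
static int demo_alloc(int want, int max)
{
	return want < max ? want : max;
}

/* One pass of the state machine; -EAGAIN asks the caller to roll. */
static int demo_set_iter(struct demo_ctx *ctx)
{
	assert(ctx->blks_needed >= 0);

	switch (ctx->state) {
	case DEMO_DAS_ALLOC_RMT:
		ctx->blks_needed -= demo_alloc(ctx->blks_needed, 4);
		/*
		 * Short allocation: keep the current state and come back
		 * for more.  Full allocation: advance to the next state.
		 * Either way the caller has to roll first.
		 */
		if (ctx->blks_needed == 0)
			ctx->state = DEMO_DAS_SET_VALUE;
		return -EAGAIN;
	case DEMO_DAS_SET_VALUE:
		ctx->state = DEMO_DAS_DONE;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Caller loop: a real caller would roll the transaction on -EAGAIN. */
static int demo_set_args(struct demo_ctx *ctx)
{
	int error;

	while ((error = demo_set_iter(ctx)) == -EAGAIN)
		ctx->rolls++;
	return error;
}
```

With blks_needed = 10, the loop re-enters the ALLOC_RMT state twice on
short allocations, advances on the third pass, and finishes on the
fourth.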

Maybe it would help a little if the setting of DEFER_FINISH and changing
of dela_state could be put into a little helper with a tracepoint so
that future us can ftrace the state machine to make sure it's working
correctly?
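
Something along these lines, as a userspace mock only. The demo_* names
are hypothetical, and the fprintf() is a stand-in for a real tracepoint,
which would have to be defined in xfs_trace.h and which I have not
written:

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_DAC_DEFER_FINISH	0x01	/* mirrors XFS_DAC_DEFER_FINISH */

/* Hypothetical mirror of enum xfs_delattr_state from the patch. */
enum demo_state {
	DEMO_DAS_UNINIT		= 0,
	DEMO_DAS_RM_SHRINK	= 1,
};

/* Hypothetical mirror of the state-tracking fields in the context. */
struct demo_dac {
	unsigned int	flags;
	enum demo_state	dela_state;
};

/* Stand-in for a tracepoint: log the old state, new state and flags. */
static void demo_trace_set_state(const struct demo_dac *dac,
				 enum demo_state next, unsigned int flags)
{
	fprintf(stderr, "delattr: state %d -> %d flags 0x%x\n",
		(int)dac->dela_state, (int)next, dac->flags | flags);
}

/*
 * Single place where the state and the defer-finish request change, so
 * every transition shows up in the trace output.
 */
static void demo_dac_set_state(struct demo_dac *dac, enum demo_state next,
			       unsigned int flags)
{
	assert(dac != NULL);

	demo_trace_set_state(dac, next, flags);
	dac->flags |= flags;
	dac->dela_state = next;
}
```

The join path in xfs_attr_node_removename() would then make one call,
demo_dac_set_state(dac, DEMO_DAS_RM_SHRINK, DEMO_DAC_DEFER_FINISH),
before returning -EAGAIN, instead of two open-coded assignments.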

> I think this implements the separation of state and functionality model
> we're after without introduction of crazy state processing frameworks,

"crazy state processing frameworks"... like xfs_defer.c? :)

> etc., but I admit I've so far only thought about it wrt the remove case
> (which is more simple than the set case). Also note that as usual, any
> associated refactoring of the functional components should come as
> preliminary patches such that this patch only introduces state bits.
> Thoughts?

(I thought/hoped we'd done all the refactoring in the 23-patch megalith
that I tossed into 5.9... :))

--D

> Brian
> 
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index 3e97a93..9573949 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -74,6 +74,75 @@ struct xfs_attr_list_context {
> >  };
> >  
> >  
> > +/*
> > + * ========================================================================
> > + * Structure used to pass context around among the delayed routines.
> > + * ========================================================================
> > + */
> > +
> > +/*
> > + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> > + * states indicate places where the function would return -EAGAIN, and then
> > + * immediately resume from after being recalled by the calling function. States
> > + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> > + * so the calling function needs to pass them back to that subroutine to allow
> > + * it to finish where it left off. But they otherwise do not have a role in the
> > + * calling function other than just passing through.
> > + *
> > + * xfs_attr_remove_iter()
> > + *	  XFS_DAS_RM_SHRINK ─�
> > + *	  (subroutine state) │
> > + *	                     └─>xfs_attr_node_removename()
> > + *	                                      │
> > + *	                                      v
> > + *	                                   need to
> > + *	                                shrink tree? ─n─�
> > + *	                                      │         │
> > + *	                                      y         │
> > + *	                                      │         │
> > + *	                                      v         │
> > + *	                              XFS_DAS_RM_SHRINK │
> > + *	                                      │         │
> > + *	                                      v         │
> > + *	                                     done <─────┘
> > + *
> > + */
> > +
> > +/*
> > + * Enum values for xfs_delattr_context.da_state
> > + *
> > + * These values are used by delayed attribute operations to keep track  of where
> > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > + * calling function to roll the transaction, and then recall the subroutine to
> > + * finish the operation.  The enum is then used by the subroutine to jump back
> > + * to where it was and resume executing where it left off.
> > + */
> > +enum xfs_delattr_state {
> > +				      /* Zero is uninitalized */
> > +	XFS_DAS_RM_SHRINK	= 1,  /* We are shrinking the tree */
> > +};
> > +
> > +/*
> > + * Defines for xfs_delattr_context.flags
> > + */
> > +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> > +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> > +
> > +/*
> > + * Context used for keeping track of delayed attribute operations
> > + */
> > +struct xfs_delattr_context {
> > +	struct xfs_da_args      *da_args;
> > +
> > +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> > +	struct xfs_da_state     *da_state;
> > +	struct xfs_da_state_blk *blk;
> > +
> > +	/* Used to keep track of current state of delayed operation */
> > +	unsigned int            flags;
> > +	enum xfs_delattr_state  dela_state;
> > +};
> > +
> >  /*========================================================================
> >   * Function prototypes for the kernel.
> >   *========================================================================*/
> > @@ -91,6 +160,10 @@ int xfs_attr_set(struct xfs_da_args *args);
> >  int xfs_attr_set_args(struct xfs_da_args *args);
> >  int xfs_has_attr(struct xfs_da_args *args);
> >  int xfs_attr_remove_args(struct xfs_da_args *args);
> > +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> > +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
> >  bool xfs_attr_namecheck(const void *name, size_t length);
> > +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> > +			      struct xfs_da_args *args);
> >  
> >  #endif	/* __XFS_ATTR_H__ */
> > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> > index 8623c81..4ed7b31 100644
> > --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> > @@ -19,8 +19,8 @@
> >  #include "xfs_bmap_btree.h"
> >  #include "xfs_bmap.h"
> >  #include "xfs_attr_sf.h"
> > -#include "xfs_attr_remote.h"
> >  #include "xfs_attr.h"
> > +#include "xfs_attr_remote.h"
> >  #include "xfs_attr_leaf.h"
> >  #include "xfs_error.h"
> >  #include "xfs_trace.h"
> > diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> > index 3f80ced..7f81b48 100644
> > --- a/fs/xfs/libxfs/xfs_attr_remote.c
> > +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> > @@ -676,10 +676,14 @@ xfs_attr_rmtval_invalidate(
> >   */
> >  int
> >  xfs_attr_rmtval_remove(
> > -	struct xfs_da_args      *args)
> > +	struct xfs_da_args		*args)
> >  {
> > -	int			error;
> > -	int			retval;
> > +	xfs_dablk_t			lblkno;
> > +	int				blkcnt;
> > +	int				error;
> > +	struct xfs_delattr_context	dac  = {
> > +		.da_args	= args,
> > +	};
> >  
> >  	trace_xfs_attr_rmtval_remove(args);
> >  
> > @@ -687,19 +691,17 @@ xfs_attr_rmtval_remove(
> >  	 * Keep de-allocating extents until the remote-value region is gone.
> >  	 */
> >  	do {
> > -		retval = __xfs_attr_rmtval_remove(args);
> > -		if (retval && retval != -EAGAIN)
> > -			return retval;
> > +		error = __xfs_attr_rmtval_remove(&dac);
> > +		if (error != -EAGAIN)
> > +			break;
> >  
> > -		/*
> > -		 * Close out trans and start the next one in the chain.
> > -		 */
> > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > +		error = xfs_attr_trans_roll(&dac);
> >  		if (error)
> >  			return error;
> > -	} while (retval == -EAGAIN);
> >  
> > -	return 0;
> > +	} while (true);
> > +
> > +	return error;
> >  }
> >  
> >  /*
> > @@ -709,9 +711,10 @@ xfs_attr_rmtval_remove(
> >   */
> >  int
> >  __xfs_attr_rmtval_remove(
> > -	struct xfs_da_args	*args)
> > +	struct xfs_delattr_context	*dac)
> >  {
> > -	int			error, done;
> > +	struct xfs_da_args		*args = dac->da_args;
> > +	int				error, done;
> >  
> >  	/*
> >  	 * Unmap value blocks for this attr.
> > @@ -721,12 +724,10 @@ __xfs_attr_rmtval_remove(
> >  	if (error)
> >  		return error;
> >  
> > -	error = xfs_defer_finish(&args->trans);
> > -	if (error)
> > -		return error;
> > -
> > -	if (!done)
> > +	if (!done) {
> > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> >  		return -EAGAIN;
> > +	}
> >  
> >  	return error;
> >  }
> > diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> > index 9eee615..002fd30 100644
> > --- a/fs/xfs/libxfs/xfs_attr_remote.h
> > +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> > @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
> >  int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
> >  		xfs_buf_flags_t incore_flags);
> >  int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> > -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> >  #endif /* __XFS_ATTR_REMOTE_H__ */
> > diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> > index bfad669..aaa7e66 100644
> > --- a/fs/xfs/xfs_attr_inactive.c
> > +++ b/fs/xfs/xfs_attr_inactive.c
> > @@ -15,10 +15,10 @@
> >  #include "xfs_da_format.h"
> >  #include "xfs_da_btree.h"
> >  #include "xfs_inode.h"
> > +#include "xfs_attr.h"
> >  #include "xfs_attr_remote.h"
> >  #include "xfs_trans.h"
> >  #include "xfs_bmap.h"
> > -#include "xfs_attr.h"
> >  #include "xfs_attr_leaf.h"
> >  #include "xfs_quota.h"
> >  #include "xfs_dir2.h"
> > -- 
> > 2.7.4
> > 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v12 1/8] xfs: Add delay ready attr remove routines
  2020-09-01 17:20     ` Darrick J. Wong
@ 2020-09-01 18:07       ` Brian Foster
  2020-09-01 18:31         ` Darrick J. Wong
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Foster @ 2020-09-01 18:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Collins, linux-xfs

On Tue, Sep 01, 2020 at 10:20:21AM -0700, Darrick J. Wong wrote:
> On Tue, Sep 01, 2020 at 01:00:20PM -0400, Brian Foster wrote:
> > On Wed, Aug 26, 2020 at 05:35:11PM -0700, Allison Collins wrote:
> > > This patch modifies the attr remove routines to be delay ready. This
> > > means they no longer roll or commit transactions, but instead return
> > > -EAGAIN to have the calling routine roll and refresh the transaction. In
> > > this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> > > uses a sort of state machine like switch to keep track of where it was
> > > when EAGAIN was returned. xfs_attr_node_removename has also been
> > > modified to use the switch, and a new version of xfs_attr_remove_args
> > > consists of a simple loop to refresh the transaction until the operation
> > > is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > > transaction where ever the existing code used to.
> > > 
> > > Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> > > version __xfs_attr_rmtval_remove. We will rename
> > > __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> > > done.
> > > 
> > > xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > > during a rename).  For reasons of perserving existing function, we
> > 
> > Nit:				preserving
> > 
> > > modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> > > set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> > > the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> > > used and will be removed.
> > > 
> > > This patch also adds a new struct xfs_delattr_context, which we will use
> > > to keep track of the current state of an attribute operation. The new
> > > xfs_delattr_state enum is used to track various operations that are in
> > > progress so that we know not to repeat them, and resume where we left
> > > off before EAGAIN was returned to cycle out the transaction. Other
> > > members take the place of local variables that need to retain their
> > > values across multiple function recalls.  See xfs_attr.h for a more
> > > detailed diagram of the states.
> > > 
> > > Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_attr.c        | 162 ++++++++++++++++++++++++++++++----------
> > >  fs/xfs/libxfs/xfs_attr.h        |  73 ++++++++++++++++++
> > >  fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
> > >  fs/xfs/libxfs/xfs_attr_remote.c |  39 +++++-----
> > >  fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
> > >  fs/xfs/xfs_attr_inactive.c      |   2 +-
> > >  6 files changed, 220 insertions(+), 60 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > index 2e055c0..ea50fc3 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > ...
> > > @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
> > >  }
> > >  
> > >  /*
> > > + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> > > + * also checks for a defer finish.  Transaction is finished and rolled as
> > > + * needed, and returns true of false if the delayed operation should continue.
> > > + */
> > > +int
> > > +xfs_attr_trans_roll(
> > > +	struct xfs_delattr_context	*dac)
> > > +{
> > > +	struct xfs_da_args              *args = dac->da_args;
> > > +	int				error = 0;
> > > +
> > > +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> > > +		/*
> > > +		 * The caller wants us to finish all the deferred ops so that we
> > > +		 * avoid pinning the log tail with a large number of deferred
> > > +		 * ops.
> > > +		 */
> > > +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> > > +		error = xfs_defer_finish(&args->trans);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > > +
> > > +	return xfs_trans_roll_inode(&args->trans, args->dp);
> > 
> > I'm not sure there's a need to roll the transaction again if the
> > defer path above executes. xfs_defer_finish() completes the dfops and
> > always returns a clean transaction.
> 
> I'm not sure we even really need a DEFER_FINISH flag if (a) xfs_defer.c
> gets patched to finish all the other defer items before coming back to
> the next step of the delattr state machine and (b) Allison removes the
> _iter functions in favor of using the defer op mechanism even when we're
> not pushing the state changes through the log.
> 

What do you mean by using the dfops mechanism without pushing state
changes through the log? My understanding was that dfops would be
involved with the new intent based attr ops and the state management
handles the original ops until we no longer have to support them.

> (I'm working on (a) still, will have something in a few days...)
> 
> > > +}
> > > +
> > > +/*
> > >   * Set the attribute specified in @args.
> > >   */
> > >  int
> > ...
> > > @@ -1218,21 +1288,35 @@ xfs_attr_node_remove_rmt(
> > >   * This will involve walking down the Btree, and may involve joining
> > >   * leaf nodes and even joining intermediate nodes up to and including
> > >   * the root node (a special case of an intermediate node).
> > > + *
> > > + * This routine is meant to function as either an inline or delayed operation,
> > > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > > + * functions will need to handle this, and recall the function until a
> > > + * successful error code is returned.
> > >   */
> > >  STATIC int
> > >  xfs_attr_node_removename(
> > > -	struct xfs_da_args	*args)
> > > +	struct xfs_delattr_context	*dac)
> > >  {
> > > -	struct xfs_da_state	*state;
> > > -	struct xfs_da_state_blk	*blk;
> > > -	int			retval, error;
> > > -	struct xfs_inode	*dp = args->dp;
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	struct xfs_da_state		*state;
> > > +	struct xfs_da_state_blk		*blk;
> > > +	int				retval, error;
> > > +	struct xfs_inode		*dp = args->dp;
> > >  
> > >  	trace_xfs_attr_node_removename(args);
> > > +	state = dac->da_state;
> > > +	blk = dac->blk;
> > >  
> > > -	error = xfs_attr_node_removename_setup(args, &state);
> > > -	if (error)
> > > -		goto out;
> > > +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> > > +		goto das_rm_shrink;
> > > +
> > > +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > > +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> > > +		error = xfs_attr_node_removename_setup(dac, &state);
> > > +		if (error)
> > > +			goto out;
> > > +	}
> > >  
> > >  	/*
> > >  	 * If there is an out-of-line value, de-allocate the blocks.
> > > @@ -1240,8 +1324,13 @@ xfs_attr_node_removename(
> > >  	 * overflow the maximum size of a transaction and/or hit a deadlock.
> > >  	 */
> > >  	if (args->rmtblkno > 0) {
> > > -		error = xfs_attr_node_remove_rmt(args, state);
> > > -		if (error)
> > > +		/*
> > > +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> > > +		 */
> > > +		error = xfs_attr_node_remove_rmt(dac, state);
> > > +		if (error == -EAGAIN)
> > > +			return error;
> > > +		else if (error)
> > >  			goto out;
> > >  	}
> > >  
> > > @@ -1260,17 +1349,14 @@ xfs_attr_node_removename(
> > >  		error = xfs_da3_join(state);
> > >  		if (error)
> > >  			goto out;
> > > -		error = xfs_defer_finish(&args->trans);
> > > -		if (error)
> > > -			goto out;
> > > -		/*
> > > -		 * Commit the Btree join operation and start a new trans.
> > > -		 */
> > > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > > -		if (error)
> > > -			goto out;
> > > +
> > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > > +		dac->dela_state = XFS_DAS_RM_SHRINK;
> > > +		return -EAGAIN;
> > >  	}
> > >  
> > > +das_rm_shrink:
> > > +
> > >  	/*
> > >  	 * If the result is small enough, push it all into the inode.
> > >  	 */
> > 
> > ISTR that Dave or Darrick previously suggested that we should try to
> > isolate the state transition code as much as possible to a single
> > location. That basically means we should look at any place a particular
> > state check travels through multiple functions and see if we can
> > refactor things to flatten the state processing code. I tend to agree
> > that is the ideal approach given how difficult it can be to track state
> > changes through multiple functions.
> 
> Yes. :)
> 
> > In light of that (and as an example), I think the whole
> > xfs_attr_node_removename() path should be refactored so it looks
> > something like the following (with obvious error
> > handling/comment/aesthetic cleanups etc.):
> > 
> > xfs_attr_node_removename_iter()
> > {
> > 	...
> > 
> > 	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > 		<do init stuff>
> > 	}
> > 
> > 	switch (dac->dela_state) {
> > 	case 0:
> 
> I kinda wish "0" had its own name, but I don't also want to start
> another round of naming bikeshed. :)
> 
> > 		/* 
> > 		 * repeatedly remove remote blocks, remove the entry and
> > 		 * join. returns -EAGAIN or 0 for completion of the step.
> > 		 */
> > 		error = xfs_attr_node_remove_step(dac, state);
> > 		if (error)
> > 			break;
> > 
> > 		/* check whether to shrink or return success */
> > 		if (!error && xfs_bmap_one_block(...)) {
> > 			dac->dela_state = XFS_DAS_RM_SHRINK;
> > 			error = -EAGAIN;
> > 		}
> > 		break;
> > 	case XFS_DAS_RM_SHRINK:
> > 		/* shrink the fork, no reentry, no next step */
> > 		error = xfs_attr_node_shrink_step(args, state);	
> > 		break;
> 
> <nod> The ASCII art diagrams help assuage my nerves about the fact that
> we branch based on dela_state but not all the branches actually show us
> moving to the next state.
> 
> I've gotten the distinct sense, though, that throwing the new state all
> the way back up to _iter() to set it is probably a lot more fuss than
> it's worth for the attr set case, though...
> 

That's quite possible. :P

> > 	default:
> > 		ASSERT(0);
> > 		return -EINVAL;
> > 	}
> > 
> > 	if (error == -EAGAIN)
> > 		return error;
> > 
> > 	<do cleanup stuff>
> > 	...
> > 	return error;
> > }
> > 
> > The idea here is that we have one _iter() function that does all the
> > state management for a particular operation and has minimal other logic.
> > That way we can see the states that repeat, transition, etc. all in one
> > place. The _step() functions implement the functional components of each
> > state and do no state management whatsoever beyond return -EAGAIN to
> > request reentry or return 0 for completion. In the case of the latter,
> > the _iter() function decides whether to transition to another state
> > (returning -EAGAIN itself) or complete the operation. If a _step()
> > function ever needs to set or check ->dela_state, then that is clear
> > indication it must be broken up into multiple _step() functions.
> 
> ...because I've frequently had the same thought that the state machine
> handling ought to be in the same place.  But then I start reading
> through the xattr code to figure out how that would be done, and get
> trapped by the fact that some of the decisions about the next state have
> to happen pretty deep in the xattr code-- stuff like allocating an
> extent for a remote value, where depending on whether or not we got enough
> blocks to satisfy the space requirements, either we can move on to the
> next state and return EAGAIN, or we have to save the current state and
> EAGAIN to try to get more blocks.
> 

I haven't walked through the set code in a while, but this sort of
sounds like more of the same (heavy refactoring followed by insertion of
state management).

> Maybe it would help a little if the setting of DEFER_FINISH and changing
> of dela_state could be put into a little helper with a tracepoint so
> that future us can ftrace the state machine to make sure it's working
> correctly?
> 

I like the idea, but not sure it helps with following the code as much
as runtime analysis.

> > I think this implements the separation of state and functionality model
> > we're after without introduction of crazy state processing frameworks,
> 
> "crazy state processing frameworks"... like xfs_defer.c? :)
> 

Re: my question above, I'm curious about reusing dfops as a mechanism
for both modes if somebody can elaborate on the idea or point me at a
reference where it was previously discussed..? I could have lost track
or missed a discussion while I was out...

> > etc., but I admit I've so far only thought about it wrt the remove case
> > (which is more simple than the set case). Also note that as usual, any
> > associated refactoring of the functional components should come as
> > preliminary patches such that this patch only introduces state bits.
> > Thoughts?
> 
> (I thought/hoped we'd done all the refactoring in the 23-patch megalith
> that I tossed into 5.9... :))
> 

Heh. I'm glad to see that snowball got tossed. ;)

Brian

> --D
> 
> > Brian
> > 
> > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > index 3e97a93..9573949 100644
> > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > @@ -74,6 +74,75 @@ struct xfs_attr_list_context {
> > >  };
> > >  
> > >  
> > > +/*
> > > + * ========================================================================
> > > + * Structure used to pass context around among the delayed routines.
> > > + * ========================================================================
> > > + */
> > > +
> > > +/*
> > > + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> > > + * states indicate places where the function would return -EAGAIN, and then
> > > + * immediately resume from after being recalled by the calling function. States
> > > + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> > > + * so the calling function needs to pass them back to that subroutine to allow
> > > + * it to finish where it left off. But they otherwise do not have a role in the
> > > + * calling function other than just passing through.
> > > + *
> > > + * xfs_attr_remove_iter()
> > > + *	  XFS_DAS_RM_SHRINK ─┐
> > > + *	  (subroutine state) │
> > > + *	                     └─>xfs_attr_node_removename()
> > > + *	                                      │
> > > + *	                                      v
> > > + *	                                   need to
> > > + *	                                shrink tree? ─n─┐
> > > + *	                                      │         │
> > > + *	                                      y         │
> > > + *	                                      │         │
> > > + *	                                      v         │
> > > + *	                              XFS_DAS_RM_SHRINK │
> > > + *	                                      │         │
> > > + *	                                      v         │
> > > + *	                                     done <─────┘
> > > + *
> > > + */
> > > +
> > > +/*
> > > + * Enum values for xfs_delattr_context.da_state
> > > + *
> > > + * These values are used by delayed attribute operations to keep track  of where
> > > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > > + * calling function to roll the transaction, and then recall the subroutine to
> > > + * finish the operation.  The enum is then used by the subroutine to jump back
> > > + * to where it was and resume executing where it left off.
> > > + */
> > > +enum xfs_delattr_state {
> > > +				      /* Zero is uninitialized */
> > > +	XFS_DAS_RM_SHRINK	= 1,  /* We are shrinking the tree */
> > > +};
> > > +
> > > +/*
> > > + * Defines for xfs_delattr_context.flags
> > > + */
> > > +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> > > +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> > > +
> > > +/*
> > > + * Context used for keeping track of delayed attribute operations
> > > + */
> > > +struct xfs_delattr_context {
> > > +	struct xfs_da_args      *da_args;
> > > +
> > > +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> > > +	struct xfs_da_state     *da_state;
> > > +	struct xfs_da_state_blk *blk;
> > > +
> > > +	/* Used to keep track of current state of delayed operation */
> > > +	unsigned int            flags;
> > > +	enum xfs_delattr_state  dela_state;
> > > +};
> > > +
> > >  /*========================================================================
> > >   * Function prototypes for the kernel.
> > >   *========================================================================*/
> > > @@ -91,6 +160,10 @@ int xfs_attr_set(struct xfs_da_args *args);
> > >  int xfs_attr_set_args(struct xfs_da_args *args);
> > >  int xfs_has_attr(struct xfs_da_args *args);
> > >  int xfs_attr_remove_args(struct xfs_da_args *args);
> > > +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> > > +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
> > >  bool xfs_attr_namecheck(const void *name, size_t length);
> > > +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> > > +			      struct xfs_da_args *args);
> > >  
> > >  #endif	/* __XFS_ATTR_H__ */
> > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > index 8623c81..4ed7b31 100644
> > > --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> > > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > @@ -19,8 +19,8 @@
> > >  #include "xfs_bmap_btree.h"
> > >  #include "xfs_bmap.h"
> > >  #include "xfs_attr_sf.h"
> > > -#include "xfs_attr_remote.h"
> > >  #include "xfs_attr.h"
> > > +#include "xfs_attr_remote.h"
> > >  #include "xfs_attr_leaf.h"
> > >  #include "xfs_error.h"
> > >  #include "xfs_trace.h"
> > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> > > index 3f80ced..7f81b48 100644
> > > --- a/fs/xfs/libxfs/xfs_attr_remote.c
> > > +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> > > @@ -676,10 +676,14 @@ xfs_attr_rmtval_invalidate(
> > >   */
> > >  int
> > >  xfs_attr_rmtval_remove(
> > > -	struct xfs_da_args      *args)
> > > +	struct xfs_da_args		*args)
> > >  {
> > > -	int			error;
> > > -	int			retval;
> > > +	xfs_dablk_t			lblkno;
> > > +	int				blkcnt;
> > > +	int				error;
> > > +	struct xfs_delattr_context	dac  = {
> > > +		.da_args	= args,
> > > +	};
> > >  
> > >  	trace_xfs_attr_rmtval_remove(args);
> > >  
> > > @@ -687,19 +691,17 @@ xfs_attr_rmtval_remove(
> > >  	 * Keep de-allocating extents until the remote-value region is gone.
> > >  	 */
> > >  	do {
> > > -		retval = __xfs_attr_rmtval_remove(args);
> > > -		if (retval && retval != -EAGAIN)
> > > -			return retval;
> > > +		error = __xfs_attr_rmtval_remove(&dac);
> > > +		if (error != -EAGAIN)
> > > +			break;
> > >  
> > > -		/*
> > > -		 * Close out trans and start the next one in the chain.
> > > -		 */
> > > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > +		error = xfs_attr_trans_roll(&dac);
> > >  		if (error)
> > >  			return error;
> > > -	} while (retval == -EAGAIN);
> > >  
> > > -	return 0;
> > > +	} while (true);
> > > +
> > > +	return error;
> > >  }
> > >  
> > >  /*
> > > @@ -709,9 +711,10 @@ xfs_attr_rmtval_remove(
> > >   */
> > >  int
> > >  __xfs_attr_rmtval_remove(
> > > -	struct xfs_da_args	*args)
> > > +	struct xfs_delattr_context	*dac)
> > >  {
> > > -	int			error, done;
> > > +	struct xfs_da_args		*args = dac->da_args;
> > > +	int				error, done;
> > >  
> > >  	/*
> > >  	 * Unmap value blocks for this attr.
> > > @@ -721,12 +724,10 @@ __xfs_attr_rmtval_remove(
> > >  	if (error)
> > >  		return error;
> > >  
> > > -	error = xfs_defer_finish(&args->trans);
> > > -	if (error)
> > > -		return error;
> > > -
> > > -	if (!done)
> > > +	if (!done) {
> > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > >  		return -EAGAIN;
> > > +	}
> > >  
> > >  	return error;
> > >  }
> > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> > > index 9eee615..002fd30 100644
> > > --- a/fs/xfs/libxfs/xfs_attr_remote.h
> > > +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> > > @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > >  int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
> > >  		xfs_buf_flags_t incore_flags);
> > >  int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> > > -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > > +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> > >  #endif /* __XFS_ATTR_REMOTE_H__ */
> > > diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> > > index bfad669..aaa7e66 100644
> > > --- a/fs/xfs/xfs_attr_inactive.c
> > > +++ b/fs/xfs/xfs_attr_inactive.c
> > > @@ -15,10 +15,10 @@
> > >  #include "xfs_da_format.h"
> > >  #include "xfs_da_btree.h"
> > >  #include "xfs_inode.h"
> > > +#include "xfs_attr.h"
> > >  #include "xfs_attr_remote.h"
> > >  #include "xfs_trans.h"
> > >  #include "xfs_bmap.h"
> > > -#include "xfs_attr.h"
> > >  #include "xfs_attr_leaf.h"
> > >  #include "xfs_quota.h"
> > >  #include "xfs_dir2.h"
> > > -- 
> > > 2.7.4
> > > 
> > 
> 



* Re: [PATCH v12 1/8] xfs: Add delay ready attr remove routines
  2020-09-01 18:07       ` Brian Foster
@ 2020-09-01 18:31         ` Darrick J. Wong
  2020-09-02 12:22           ` Brian Foster
  0 siblings, 1 reply; 21+ messages in thread
From: Darrick J. Wong @ 2020-09-01 18:31 UTC (permalink / raw)
  To: Brian Foster; +Cc: Allison Collins, linux-xfs

On Tue, Sep 01, 2020 at 02:07:41PM -0400, Brian Foster wrote:
> On Tue, Sep 01, 2020 at 10:20:21AM -0700, Darrick J. Wong wrote:
> > On Tue, Sep 01, 2020 at 01:00:20PM -0400, Brian Foster wrote:
> > > On Wed, Aug 26, 2020 at 05:35:11PM -0700, Allison Collins wrote:
> > > > This patch modifies the attr remove routines to be delay ready. This
> > > > means they no longer roll or commit transactions, but instead return
> > > > -EAGAIN to have the calling routine roll and refresh the transaction. In
> > > > this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> > > > uses a sort of state machine like switch to keep track of where it was
> > > > when EAGAIN was returned. xfs_attr_node_removename has also been
> > > > modified to use the switch, and a new version of xfs_attr_remove_args
> > > > consists of a simple loop to refresh the transaction until the operation
> > > > is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > > > > transaction wherever the existing code used to.
> > > > 
> > > > Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> > > > version __xfs_attr_rmtval_remove. We will rename
> > > > __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> > > > done.
> > > > 
> > > > xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > > > during a rename).  For reasons of perserving existing function, we
> > > 
> > > Nit:				preserving
> > > 
> > > > modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> > > > set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> > > > the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> > > > used and will be removed.
> > > > 
> > > > This patch also adds a new struct xfs_delattr_context, which we will use
> > > > to keep track of the current state of an attribute operation. The new
> > > > xfs_delattr_state enum is used to track various operations that are in
> > > > progress so that we know not to repeat them, and resume where we left
> > > > off before EAGAIN was returned to cycle out the transaction. Other
> > > > members take the place of local variables that need to retain their
> > > > values across multiple function recalls.  See xfs_attr.h for a more
> > > > detailed diagram of the states.
> > > > 
> > > > Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_attr.c        | 162 ++++++++++++++++++++++++++++++----------
> > > >  fs/xfs/libxfs/xfs_attr.h        |  73 ++++++++++++++++++
> > > >  fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
> > > >  fs/xfs/libxfs/xfs_attr_remote.c |  39 +++++-----
> > > >  fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
> > > >  fs/xfs/xfs_attr_inactive.c      |   2 +-
> > > >  6 files changed, 220 insertions(+), 60 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > index 2e055c0..ea50fc3 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > ...
> > > > @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
> > > >  }
> > > >  
> > > >  /*
> > > > + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> > > > + * also checks for a defer finish.  Transaction is finished and rolled as
> > > > > + * needed, and returns true or false if the delayed operation should continue.
> > > > + */
> > > > +int
> > > > +xfs_attr_trans_roll(
> > > > +	struct xfs_delattr_context	*dac)
> > > > +{
> > > > +	struct xfs_da_args              *args = dac->da_args;
> > > > +	int				error = 0;
> > > > +
> > > > +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> > > > +		/*
> > > > +		 * The caller wants us to finish all the deferred ops so that we
> > > > +		 * avoid pinning the log tail with a large number of deferred
> > > > +		 * ops.
> > > > +		 */
> > > > +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> > > > +		error = xfs_defer_finish(&args->trans);
> > > > +		if (error)
> > > > +			return error;
> > > > +	}
> > > > +
> > > > +	return xfs_trans_roll_inode(&args->trans, args->dp);
> > > 
> > > I'm not sure there's a need to roll the transaction again if the
> > > defer path above executes. xfs_defer_finish() completes the dfops and
> > > always returns a clean transaction.
> > 
> > I'm not sure we even really need a DEFER_FINISH flag if (a) xfs_defer.c
> > gets patched to finish all the other defer items before coming back to
> > the next step of the delattr state machine and (b) Allison removes the
> > _iter functions in favor of using the defer op mechanism even when we're
> > not pushing the state changes through the log.
> > 
> 
> What do you mean by using the dfops mechanism without pushing state
> changes through the log? My understanding was that dfops would be
> involved with the new intent based attr ops and the state management
> handles the original ops until we no longer have to support them..

I think you were probably still out when Dave and Allison and I had the
brain fart^Wstorm that nothing in the defer ops code actually requires
you to log anything, which means that you can use it to manage a long
running operation that spans multiple transaction rolls! :)

->create_intent and ->create_done are supposed to create log items and
attach them to the transaction, but the defer finish loop will still
call ->finish_item even if they return NULL pointers.  If the
finish_item call steps around the null pointers and calls whatever upper
level functions are needed to make progress, that works fine.  There's
no log recovery, obviously.

In other words, we can (ab)use defer ops for attr set/remove even in the
non-logged case, which eliminates the need for the separate control
loop.
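
To make that concrete, here is a toy userspace model of the idea. It is
a sketch under heavy assumptions and not how xfs_defer.c is actually
written: all demo_ names are invented, the "roll" is just a counter, and
the real code also relogs and requeues items, which is left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, much simplified model of a deferred op type. */
struct demo_defer_ops {
	/* May return NULL when nothing is to be logged. */
	void	*(*create_intent)(void *item);
	/* Returns 0 when done, -EAGAIN to be called again after a roll. */
	int	(*finish_item)(void *intent, void *item);
};

/* Stand-in for a long-running attr operation. */
struct demo_attr_work {
	int	steps_left;
};

/* Unlogged mode: no intent item is created at all. */
static void *
demo_no_intent(void *item)
{
	(void)item;
	return NULL;
}

/* One unit of progress; steps around the NULL intent. */
static int
demo_attr_finish(void *intent, void *item)
{
	struct demo_attr_work	*work = item;

	(void)intent;		/* NULL here, so nothing to mark done */
	if (--work->steps_left > 0)
		return -EAGAIN;
	return 0;
}

/*
 * Toy finish loop: keeps calling ->finish_item until it stops asking
 * for reentry.  *rolls counts what would be transaction rolls.
 */
static int
demo_defer_finish(const struct demo_defer_ops *ops, void *item, int *rolls)
{
	void	*intent = ops->create_intent(item);
	int	error;

	for (;;) {
		error = ops->finish_item(intent, item);
		if (error != -EAGAIN)
			return error;
		(*rolls)++;
	}
}
```

The point of the model is only that the loop never looks at the intent
pointer itself, so the same driver serves the logged and unlogged cases
and the attr code needs no control loop of its own.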

FWIW, I've implemented that strategy as a proof of concept for extent
swapping:

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=atomic-file-updates&id=a85883c36e2f3eff50db50fcf58a71d4f13d1f64

Wherein you get atomic swapext if you have the log items enabled, and
if not, you get the old "rmap swapext" that doesn't have log tracking.

> > (I'm working on (a) still, will have something in a few days...)
> > 
> > > > +}
> > > > +
> > > > +/*
> > > >   * Set the attribute specified in @args.
> > > >   */
> > > >  int
> > > ...
> > > > @@ -1218,21 +1288,35 @@ xfs_attr_node_remove_rmt(
> > > >   * This will involve walking down the Btree, and may involve joining
> > > >   * leaf nodes and even joining intermediate nodes up to and including
> > > >   * the root node (a special case of an intermediate node).
> > > > + *
> > > > + * This routine is meant to function as either an inline or delayed operation,
> > > > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > > > + * functions will need to handle this, and recall the function until a
> > > > + * successful error code is returned.
> > > >   */
> > > >  STATIC int
> > > >  xfs_attr_node_removename(
> > > > -	struct xfs_da_args	*args)
> > > > +	struct xfs_delattr_context	*dac)
> > > >  {
> > > > -	struct xfs_da_state	*state;
> > > > -	struct xfs_da_state_blk	*blk;
> > > > -	int			retval, error;
> > > > -	struct xfs_inode	*dp = args->dp;
> > > > +	struct xfs_da_args		*args = dac->da_args;
> > > > +	struct xfs_da_state		*state;
> > > > +	struct xfs_da_state_blk		*blk;
> > > > +	int				retval, error;
> > > > +	struct xfs_inode		*dp = args->dp;
> > > >  
> > > >  	trace_xfs_attr_node_removename(args);
> > > > +	state = dac->da_state;
> > > > +	blk = dac->blk;
> > > >  
> > > > -	error = xfs_attr_node_removename_setup(args, &state);
> > > > -	if (error)
> > > > -		goto out;
> > > > +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> > > > +		goto das_rm_shrink;
> > > > +
> > > > +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > > > +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> > > > +		error = xfs_attr_node_removename_setup(dac, &state);
> > > > +		if (error)
> > > > +			goto out;
> > > > +	}
> > > >  
> > > >  	/*
> > > >  	 * If there is an out-of-line value, de-allocate the blocks.
> > > > @@ -1240,8 +1324,13 @@ xfs_attr_node_removename(
> > > >  	 * overflow the maximum size of a transaction and/or hit a deadlock.
> > > >  	 */
> > > >  	if (args->rmtblkno > 0) {
> > > > -		error = xfs_attr_node_remove_rmt(args, state);
> > > > -		if (error)
> > > > +		/*
> > > > +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> > > > +		 */
> > > > +		error = xfs_attr_node_remove_rmt(dac, state);
> > > > +		if (error == -EAGAIN)
> > > > +			return error;
> > > > +		else if (error)
> > > >  			goto out;
> > > >  	}
> > > >  
> > > > @@ -1260,17 +1349,14 @@ xfs_attr_node_removename(
> > > >  		error = xfs_da3_join(state);
> > > >  		if (error)
> > > >  			goto out;
> > > > -		error = xfs_defer_finish(&args->trans);
> > > > -		if (error)
> > > > -			goto out;
> > > > -		/*
> > > > -		 * Commit the Btree join operation and start a new trans.
> > > > -		 */
> > > > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > > > -		if (error)
> > > > -			goto out;
> > > > +
> > > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > > > +		dac->dela_state = XFS_DAS_RM_SHRINK;
> > > > +		return -EAGAIN;
> > > >  	}
> > > >  
> > > > +das_rm_shrink:
> > > > +
> > > >  	/*
> > > >  	 * If the result is small enough, push it all into the inode.
> > > >  	 */
> > > 
> > > ISTR that Dave or Darrick previously suggested that we should try to
> > > isolate the state transition code as much as possible to a single
> > > location. That basically means we should look at any place a particular
> > > state check travels through multiple functions and see if we can
> > > refactor things to flatten the state processing code. I tend to agree
> > > that is the ideal approach given how difficult it can be to track state
> > > changes through multiple functions.
> > 
> > Yes. :)
> > 
> > > In light of that (and as an example), I think the whole
> > > xfs_attr_node_removename() path should be refactored so it looks
> > > something like the following (with obvious error
> > > handling/comment/aesthetic cleanups etc.):
> > > 
> > > xfs_attr_node_removename_iter()
> > > {
> > > 	...
> > > 
> > > 	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > > 		<do init stuff>
> > > 	}
> > > 
> > > 	switch (dac->dela_state) {
> > > 	case 0:
> > 
> > I kinda wish "0" had its own name, but I don't also want to start
> > another round of naming bikeshed. :)
> > 
> > > 		/* 
> > > 		 * repeatedly remove remote blocks, remove the entry and
> > > 		 * join. returns -EAGAIN or 0 for completion of the step.
> > > 		 */
> > > 		error = xfs_attr_node_remove_step(dac, state);
> > > 		if (error)
> > > 			break;
> > > 
> > > 		/* check whether to shrink or return success */
> > > 		if (!error && xfs_bmap_one_block(...)) {
> > > 			dac->dela_state = XFS_DAS_RM_SHRINK;
> > > 			error = -EAGAIN;
> > > 		}
> > > 		break;
> > > 	case XFS_DAS_RM_SHRINK:
> > > 		/* shrink the fork, no reentry, no next step */
> > > 		error = xfs_attr_node_shrink_step(args, state);	
> > > 		break;
> > 
> > <nod> The ASCII art diagrams help assuage my nerves about the fact that
> > we branch based on dela_state but not all the branches actually show us
> > moving to the next state.
> > 
> > I've gotten the distinct sense, though, that throwing the new state all
> > the way back up to _iter() to set it is probably a lot more fuss than
> > it's worth for the attr set case, though...
> > 
> 
> That's quite possible. :P
> 
> > > 	default:
> > > 		ASSERT(0);
> > > 		return -EINVAL;
> > > 	}
> > > 
> > > 	if (error == -EAGAIN)
> > > 		return error;
> > > 
> > > 	<do cleanup stuff>
> > > 	...
> > > 	return error;
> > > }
> > > 
> > > The idea here is that we have one _iter() function that does all the
> > > state management for a particular operation and has minimal other logic.
> > > That way we can see the states that repeat, transition, etc. all in one
> > > place. The _step() functions implement the functional components of each
> > > state and do no state management whatsoever beyond return -EAGAIN to
> > > request reentry or return 0 for completion. In the case of the latter,
> > > the _iter() function decides whether to transition to another state
> > > (returning -EAGAIN itself) or complete the operation. If a _step()
> > > function ever needs to set or check ->dela_state, then that is clear
> > > indication it must be broken up into multiple _step() functions.
> > 
> > ...because I've frequently had the same thought that the state machine
> > handling ought to be in the same place.  But then I start reading
> > through the xattr code to figure out how that would be done, and get
> > trapped by the fact that some of the decisions about the next state have
> > to happen pretty deep in the xattr code-- stuff like allocating an
> > extent for a remote value, where depending on whether or not we got enough
> > blocks to satisfy the space requirements, either we can move on to the
> > next state and return EAGAIN, or we have to save the current state and
> > EAGAIN to try to get more blocks.
> > 
> 
> I haven't walked through the set code in a while, but this sort of
> sounds like more of the same (heavy refactoring followed by insertion of
> state management).
> 
> > Maybe it would help a little if the setting of DEFER_FINISH and changing
> > of dela_state could be put into a little helper with a tracepoint so
> > that future us can ftrace the state machine to make sure it's working
> > correctly?
> > 
> 
> I like the idea, but not sure it helps with following the code as much
> as runtime analysis.

<nod>

> > > I think this implements the separation of state and functionality model
> > > we're after without introduction of crazy state processing frameworks,
> > 
> > "crazy state processing frameworks"... like xfs_defer.c? :)
> > 
> 
> Re: my question above, I'm curious about reusing dfops as a mechanism
> for both modes if somebody can elaborate on the idea or point me at a
> reference where it was previously discussed..? I could have lost track
> or missed a discussion while I was out...

(See above...)

> > > etc., but I admit I've so far only thought about it wrt the remove case
> > > (which is more simple than the set case). Also note that as usual, any
> > > associated refactoring of the functional components should come as
> > > preliminary patches such that this patch only introduces state bits.
> > > Thoughts?
> > 
> > (I thought/hoped we'd done all the refactoring in the 23-patch megalith
> > that I tossed into 5.9... :))
> > 
> 
> Heh. I'm glad to see that snowball got tossed. ;)

:)

--D

> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > > index 3e97a93..9573949 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > > @@ -74,6 +74,75 @@ struct xfs_attr_list_context {
> > > >  };
> > > >  
> > > >  
> > > > +/*
> > > > + * ========================================================================
> > > > + * Structure used to pass context around among the delayed routines.
> > > > + * ========================================================================
> > > > + */
> > > > +
> > > > +/*
> > > > + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> > > > + * states indicate places where the function would return -EAGAIN, and then
> > > > + * immediately resume from after being recalled by the calling function. States
> > > > + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> > > > + * so the calling function needs to pass them back to that subroutine to allow
> > > > + * it to finish where it left off. But they otherwise do not have a role in the
> > > > + * calling function other than just passing through.
> > > > + *
> > > > + * xfs_attr_remove_iter()
> > > > + *	  XFS_DAS_RM_SHRINK ─┐
> > > > + *	  (subroutine state) │
> > > > + *	                     └─>xfs_attr_node_removename()
> > > > + *	                                      │
> > > > + *	                                      v
> > > > + *	                                   need to
> > > > + *	                                shrink tree? ─n─┐
> > > > + *	                                      │         │
> > > > + *	                                      y         │
> > > > + *	                                      │         │
> > > > + *	                                      v         │
> > > > + *	                              XFS_DAS_RM_SHRINK │
> > > > + *	                                      │         │
> > > > + *	                                      v         │
> > > > + *	                                     done <─────┘
> > > > + *
> > > > + */
> > > > +
> > > > +/*
> > > > + * Enum values for xfs_delattr_context.da_state
> > > > + *
> > > > + * These values are used by delayed attribute operations to keep track  of where
> > > > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > > > + * calling function to roll the transaction, and then recall the subroutine to
> > > > + * finish the operation.  The enum is then used by the subroutine to jump back
> > > > + * to where it was and resume executing where it left off.
> > > > + */
> > > > +enum xfs_delattr_state {
> > > > +				      /* Zero is uninitialized */
> > > > +	XFS_DAS_RM_SHRINK	= 1,  /* We are shrinking the tree */
> > > > +};
> > > > +
> > > > +/*
> > > > + * Defines for xfs_delattr_context.flags
> > > > + */
> > > > +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> > > > +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> > > > +
> > > > +/*
> > > > + * Context used for keeping track of delayed attribute operations
> > > > + */
> > > > +struct xfs_delattr_context {
> > > > +	struct xfs_da_args      *da_args;
> > > > +
> > > > +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> > > > +	struct xfs_da_state     *da_state;
> > > > +	struct xfs_da_state_blk *blk;
> > > > +
> > > > +	/* Used to keep track of current state of delayed operation */
> > > > +	unsigned int            flags;
> > > > +	enum xfs_delattr_state  dela_state;
> > > > +};
> > > > +
> > > >  /*========================================================================
> > > >   * Function prototypes for the kernel.
> > > >   *========================================================================*/
> > > > @@ -91,6 +160,10 @@ int xfs_attr_set(struct xfs_da_args *args);
> > > >  int xfs_attr_set_args(struct xfs_da_args *args);
> > > >  int xfs_has_attr(struct xfs_da_args *args);
> > > >  int xfs_attr_remove_args(struct xfs_da_args *args);
> > > > +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> > > > +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
> > > >  bool xfs_attr_namecheck(const void *name, size_t length);
> > > > +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> > > > +			      struct xfs_da_args *args);
> > > >  
> > > >  #endif	/* __XFS_ATTR_H__ */
> > > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > index 8623c81..4ed7b31 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > @@ -19,8 +19,8 @@
> > > >  #include "xfs_bmap_btree.h"
> > > >  #include "xfs_bmap.h"
> > > >  #include "xfs_attr_sf.h"
> > > > -#include "xfs_attr_remote.h"
> > > >  #include "xfs_attr.h"
> > > > +#include "xfs_attr_remote.h"
> > > >  #include "xfs_attr_leaf.h"
> > > >  #include "xfs_error.h"
> > > >  #include "xfs_trace.h"
> > > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> > > > index 3f80ced..7f81b48 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr_remote.c
> > > > +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> > > > @@ -676,10 +676,14 @@ xfs_attr_rmtval_invalidate(
> > > >   */
> > > >  int
> > > >  xfs_attr_rmtval_remove(
> > > > -	struct xfs_da_args      *args)
> > > > +	struct xfs_da_args		*args)
> > > >  {
> > > > -	int			error;
> > > > -	int			retval;
> > > > +	xfs_dablk_t			lblkno;
> > > > +	int				blkcnt;
> > > > +	int				error;
> > > > +	struct xfs_delattr_context	dac  = {
> > > > +		.da_args	= args,
> > > > +	};
> > > >  
> > > >  	trace_xfs_attr_rmtval_remove(args);
> > > >  
> > > > @@ -687,19 +691,17 @@ xfs_attr_rmtval_remove(
> > > >  	 * Keep de-allocating extents until the remote-value region is gone.
> > > >  	 */
> > > >  	do {
> > > > -		retval = __xfs_attr_rmtval_remove(args);
> > > > -		if (retval && retval != -EAGAIN)
> > > > -			return retval;
> > > > +		error = __xfs_attr_rmtval_remove(&dac);
> > > > +		if (error != -EAGAIN)
> > > > +			break;
> > > >  
> > > > -		/*
> > > > -		 * Close out trans and start the next one in the chain.
> > > > -		 */
> > > > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > > +		error = xfs_attr_trans_roll(&dac);
> > > >  		if (error)
> > > >  			return error;
> > > > -	} while (retval == -EAGAIN);
> > > >  
> > > > -	return 0;
> > > > +	} while (true);
> > > > +
> > > > +	return error;
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -709,9 +711,10 @@ xfs_attr_rmtval_remove(
> > > >   */
> > > >  int
> > > >  __xfs_attr_rmtval_remove(
> > > > -	struct xfs_da_args	*args)
> > > > +	struct xfs_delattr_context	*dac)
> > > >  {
> > > > -	int			error, done;
> > > > +	struct xfs_da_args		*args = dac->da_args;
> > > > +	int				error, done;
> > > >  
> > > >  	/*
> > > >  	 * Unmap value blocks for this attr.
> > > > @@ -721,12 +724,10 @@ __xfs_attr_rmtval_remove(
> > > >  	if (error)
> > > >  		return error;
> > > >  
> > > > -	error = xfs_defer_finish(&args->trans);
> > > > -	if (error)
> > > > -		return error;
> > > > -
> > > > -	if (!done)
> > > > +	if (!done) {
> > > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > > >  		return -EAGAIN;
> > > > +	}
> > > >  
> > > >  	return error;
> > > >  }
> > > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> > > > index 9eee615..002fd30 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr_remote.h
> > > > +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> > > > @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > > >  int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
> > > >  		xfs_buf_flags_t incore_flags);
> > > >  int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> > > > -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > > > +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> > > >  #endif /* __XFS_ATTR_REMOTE_H__ */
> > > > diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> > > > index bfad669..aaa7e66 100644
> > > > --- a/fs/xfs/xfs_attr_inactive.c
> > > > +++ b/fs/xfs/xfs_attr_inactive.c
> > > > @@ -15,10 +15,10 @@
> > > >  #include "xfs_da_format.h"
> > > >  #include "xfs_da_btree.h"
> > > >  #include "xfs_inode.h"
> > > > +#include "xfs_attr.h"
> > > >  #include "xfs_attr_remote.h"
> > > >  #include "xfs_trans.h"
> > > >  #include "xfs_bmap.h"
> > > > -#include "xfs_attr.h"
> > > >  #include "xfs_attr_leaf.h"
> > > >  #include "xfs_quota.h"
> > > >  #include "xfs_dir2.h"
> > > > -- 
> > > > 2.7.4
> > > > 
> > > 
> > 
> 


* Re: [PATCH v12 4/8] xfs: Set up infastructure for deferred attribute operations
  2020-08-28 21:27   ` Darrick J. Wong
@ 2020-09-02  0:46     ` Allison Collins
  2020-09-02  2:33       ` Allison Collins
  0 siblings, 1 reply; 21+ messages in thread
From: Allison Collins @ 2020-09-02  0:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 8/28/20 2:27 PM, Darrick J. Wong wrote:
> On Wed, Aug 26, 2020 at 05:35:14PM -0700, Allison Collins wrote:
>> Currently attributes are modified directly across one or more
>> transactions. But they are not logged or replayed in the event of an
>> error. The goal of delayed attributes is to enable logging and replaying
>> of attribute operations using the existing delayed operations
>> infrastructure.  This will later enable the attributes to become part of
>> larger multi part operations that also must first be recorded to the
>> log.  This is mostly of interest in the scheme of parent pointers which
>> would need to maintain an attribute containing parent inode information
>> any time an inode is moved, created, or removed.  Parent pointers would
>> then be of interest to any feature that would need to quickly derive an
>> inode path from the mount point. Online scrub, nfs lookups and fs grow
>> or shrink operations are all features that could take advantage of this.
>>
>> This patch adds two new log item types for setting or removing
>> attributes as deferred operations.  The xfs_attri_log_item logs an
>> intent to set or remove an attribute.  The corresponding
>> xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
>> freed once the transaction is done.  Both log items use a generic
>> xfs_attr_log_format structure that contains the attribute name, value,
>> flags, inode, and an op_flag that indicates if the operations is a set
>> or remove.
>>
>> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/Makefile                 |   1 +
>>   fs/xfs/libxfs/xfs_attr.c        |   7 +-
>>   fs/xfs/libxfs/xfs_attr.h        |  39 ++
>>   fs/xfs/libxfs/xfs_defer.c       |   1 +
>>   fs/xfs/libxfs/xfs_defer.h       |   3 +
>>   fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
>>   fs/xfs/libxfs/xfs_log_recover.h |   2 +
>>   fs/xfs/libxfs/xfs_types.h       |   1 +
>>   fs/xfs/scrub/common.c           |   2 +
>>   fs/xfs/xfs_acl.c                |   2 +
>>   fs/xfs/xfs_attr_item.c          | 829 ++++++++++++++++++++++++++++++++++++++++
>>   fs/xfs/xfs_attr_item.h          |  76 ++++
>>   fs/xfs/xfs_attr_list.c          |   1 +
>>   fs/xfs/xfs_ioctl.c              |   2 +
>>   fs/xfs/xfs_ioctl32.c            |   2 +
>>   fs/xfs/xfs_iops.c               |   2 +
>>   fs/xfs/xfs_log.c                |   4 +
>>   fs/xfs/xfs_log_recover.c        |   2 +
>>   fs/xfs/xfs_ondisk.h             |   2 +
>>   fs/xfs/xfs_xattr.c              |   1 +
>>   20 files changed, 1017 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
>> index 04611a1..b056cfc 100644
>> --- a/fs/xfs/Makefile
>> +++ b/fs/xfs/Makefile
>> @@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
>>   				   xfs_buf_item_recover.o \
>>   				   xfs_dquot_item_recover.o \
>>   				   xfs_extfree_item.o \
>> +				   xfs_attr_item.o \
>>   				   xfs_icreate_item.o \
>>   				   xfs_inode_item.o \
>>   				   xfs_inode_item_recover.o \
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index a8cfe62..cf75742 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -24,6 +24,7 @@
>>   #include "xfs_quota.h"
>>   #include "xfs_trans_space.h"
>>   #include "xfs_trace.h"
>> +#include "xfs_attr_item.h"
>>   
>>   /*
>>    * xfs_attr.c
>> @@ -59,8 +60,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>   STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>>   STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
>> -STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>> -			     struct xfs_buf **leaf_bp);
>>   
>>   int
>>   xfs_inode_hasattr(
>> @@ -142,7 +141,7 @@ xfs_attr_get(
>>   /*
>>    * Calculate how many blocks we need for the new attribute,
>>    */
>> -STATIC int
>> +int
>>   xfs_attr_calc_size(
>>   	struct xfs_da_args	*args,
>>   	int			*local)
>> @@ -327,7 +326,7 @@ xfs_attr_set_args(
>>    * to handle this, and recall the function until a successful error code is
>>    * returned.
>>    */
>> -STATIC int
>> +int
>>   xfs_attr_set_iter(
>>   	struct xfs_delattr_context	*dac,
>>   	struct xfs_buf			**leaf_bp)
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index 4f6bba8..23b8308 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -247,6 +247,7 @@ enum xfs_delattr_state {
>>   #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>>   #define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>>   #define XFS_DAC_LEAF_ADDNAME_INIT	0x04 /* xfs_attr_leaf_addname init*/
>> +#define XFS_DAC_DELAYED_OP_INIT		0x08 /* delayed operations init*/
>>   
>>   /*
>>    * Context used for keeping track of delayed attribute operations
>> @@ -254,6 +255,9 @@ enum xfs_delattr_state {
>>   struct xfs_delattr_context {
>>   	struct xfs_da_args      *da_args;
>>   
>> +	/* Used by delayed attributes to hold leaf across transactions */
>> +	struct xfs_buf		*leaf_bp;
>> +
>>   	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
>>   	struct xfs_bmbt_irec	map;
>>   	xfs_dablk_t		lblkno;
>> @@ -268,6 +272,38 @@ struct xfs_delattr_context {
>>   	enum xfs_delattr_state  dela_state;
>>   };
> 
> I'll start by pasting in the full xfs_delattr_context definition for
> easier reading:
> 
> /*
>   * Context used for keeping track of delayed attribute operations
>   */
> struct xfs_delattr_context {
> 	struct xfs_da_args      *da_args;
> 
> 	/* Used by delayed attributes to hold leaf across transactions */
> 	struct xfs_buf		*leaf_bp;
> 
> 	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
> 	struct xfs_bmbt_irec	map;
> 	xfs_dablk_t		lblkno;
> 	int			blkcnt;
> 
> 	/* Used in xfs_attr_node_removename to roll through removing blocks */
> 	struct xfs_da_state     *da_state;
> 	struct xfs_da_state_blk *blk;
> 
> 	/* Used to keep track of current state of delayed operation */
> 	unsigned int            flags;
> 	enum xfs_delattr_state  dela_state;
> };
> 
> Admittedly, I /am/ conducting a backwards review and zeroing in on the
> data structures first.
> 
>> +/*
>> + * List of attrs to commit later.
>> + */
>> +struct xfs_attr_item {
>> +	struct xfs_inode	*xattri_ip;
>> +	void			*xattri_value;		/* attr value */
>> +	void			*xattri_name;		/* attr name */
>> +	uint32_t		xattri_op_flags;	/* attr op set or rm */
>> +	uint32_t		xattri_value_len;	/* length of value */
>> +	uint32_t		xattri_name_len;	/* length of name */
>> +	uint32_t		xattri_flags;		/* attr flags */
>> +
>> +	/* used to log this item to an intent */
>> +	struct list_head	xattri_list;
>> +
>> +	/*
>> +	 * xfs_delattr_context and xfs_da_args need to remain instantiated
>> +	 * across transaction rolls during the defer finish, so store them here
>> +	 */
>> +	struct xfs_da_args		xattri_args;
>> +	struct xfs_delattr_context	xattri_dac;
>> +
>> +	/*
>> +	 * A byte array follows the header containing the file name and
>> +	 * attribute value.
>> +	 */
>> +};
> 
> These two structures (xfs_delattr_context and xfs_attr_item) duplicate a
> lot of information considering that they both track incore state during
> an xattr set/remove operation.  There's also a lot of duplication
> between the do-while loop in xfs_attr_set_args and the inner loop of the
> defer attr set code.
Yes, to clarify a bit of the history: most of this was adopted from the
efi/efd code as a sort of model.  Most of the fields in the
xfs_attr_item are like the parameters needed to kick off a
delayed operation.

The xfs_da_args and the xfs_delattr_context are an exception to that 
model.  Most of the time, they're not even populated.  The reason they 
are here is because they need to be instantiated somewhere not inside 
the call stack of the delayed attr operations.  Otherwise we'd lose
them every time we come back with -EAGAIN.  In the non delayed attr 
code, they are kept in the xfs_attr_*_args functions.  In the delayed 
attr code path, they are kept here.

IOW: I had to plop them somewhere, and discovered that the *_items 
remain instantiated across their corresponding delayed operations.  So 
it seemed like a reasonable place?  I suspect attrs are the first 
delayed operation to require a context of sorts as I did not see any of 
the other delayed operations needing to deal with this issue.  So attr
operations are a little unique in this way.
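
To make the lifetime point concrete, here is a minimal userspace sketch, not
the patch's code.  All demo_* names are hypothetical stand-ins: demo_item
plays the role of xfs_attr_item, demo_ctx the role of xfs_delattr_context,
and DEMO_CTX_INIT is assumed to behave like XFS_DAC_DELAYED_OP_INIT.  The
context is embedded in the item, so it outlives every call that returns
-EAGAIN, and it is set up only on the first call.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins; none of these are the kernel definitions. */
enum demo_state { DEMO_START = 0, DEMO_MIDDLE, DEMO_DONE };

#define DEMO_CTX_INIT	0x01	/* context has been set up */

struct demo_ctx {
	unsigned int	flags;
	enum demo_state	state;	/* bookmark: where to resume */
};

struct demo_item {
	const char	*name;	/* parameters that start the operation */
	struct demo_ctx	ctx;	/* lives as long as the item does */
};

/*
 * One step of the operation.  Returns -EAGAIN when the caller should
 * roll the transaction and call again, 0 when finished.
 */
static int demo_finish_item(struct demo_item *item)
{
	struct demo_ctx *ctx = &item->ctx;

	if (!(ctx->flags & DEMO_CTX_INIT)) {
		/* First call only: later calls must not reset the state. */
		ctx->state = DEMO_START;
		ctx->flags |= DEMO_CTX_INIT;
	}

	switch (ctx->state) {
	case DEMO_START:
		ctx->state = DEMO_MIDDLE;
		return -EAGAIN;
	case DEMO_MIDDLE:
		ctx->state = DEMO_DONE;
		return -EAGAIN;
	case DEMO_DONE:
		return 0;
	}
	return -EINVAL;
}

/* Drive the item to completion; returns calls made or a negative errno. */
static int demo_run(struct demo_item *item)
{
	int calls = 0, error;

	do {
		error = demo_finish_item(item);
		calls++;
		assert(calls < 100);	/* the sketch must terminate */
	} while (error == -EAGAIN);

	return error ? error : calls;
}
```

If demo_ctx were a local variable inside demo_finish_item instead, the
bookmark would be lost on every -EAGAIN return and the operation would
restart from DEMO_START each time.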


> 
> To make sure I'm understanding this correctly, let me start by repeating
> back to you what I think is the code flow through the hasdelattr path
> and then the !hasdelattr path.  Let's call the hasdelattr path (A).
> 
> First, the caller allocates an xfs_da_args structure and partially
> initializes it with dp, attr_filter, attr_flags, name, namelen, value,
> and valuelen set appropriately for the operation it wants.  The rest of
> the struct should be zeroed, because the uninitialized parts are
> internal state.
> 
> Second, the *args are passed to xfs_attr_set, which after setting up a
> transaction calls xfs_attr_set_deferred.  This calls xfs_attr_item_init
> to allocate and initialize a struct xfs_attr_item with dp, name,
> namelen, attr_filter, value, and valuelen, and passes this incore state
> tracking structure to the defer ops machinery.
> 
> Third, the defer ops machinery calls xfs_attr_finish_item to deal with
> the attr request.  If the xfs_delattr_context within the xfs_attr_item
> is uninitialized it will set the xfs_da_args state that's within the
> xfs_attr_item to the values already stored in the xfs_attr_item.
> 
> Fourth, xfs_attr_finish_item calls xfs_trans_attr to dispatch based on
> op_flags.  For setting, this means we call xfs_attr_set_iter.
> 
> Fifth, xfs_attr_set_iter dispatches functions based on whatever
> dela_state in the delattr_context is set to.  The functions it calls can
> set DAC_DEFER_FINISH and/or return -EAGAIN to signal the defer ops
> machinery that it needs to roll the transaction so that we can repeat
> steps 3-5 until we're done.  The defer ops machinery ought to honor
> DEFER_FINISH and complete whatever work items we've put on the queue,
> but... it's buggy and doesn't.  I'll come back to this later.
Oh you are right... this should be updated.  So the history was: at some 
time during the review of the delayed ready series, it was proposed that 
we have a top level loop that rolls the transactions, rather than trying 
to plumb in an "off switch" for the transactions.  This looping concept 
was already modeled here, so I had adopted it for use in the delay ready 
series.

I did however, forget to go back and update it with the DEFER_FINISH 
flags that we added later.  Or consolidate them, which I suspect is 
where you are going with this... :-)

> 
> Sixth, once we're done, we return out to xfs_attr_set to commit the
> transaction and exit.
> 
> Did I understand that correctly?  
That sounds about right

> If so, I'll move on to the !hasdelattr
> case, which we'll call (B).
> 
> First, the caller allocates an xfs_da_args structure and partially
> initializes it with dp, attr_filter, attr_flags, name, namelen, value,
> and valuelen set appropriately for the operation it wants.  The rest of
> the struct should be zeroed, because the uninitialized parts are
> internal state.  This is the same as step A1 above.
> 
> Second, the *args are passed to xfs_attr_set, which after setting up a
> transaction calls xfs_attr_set_args.  This calls xfs_attr_set_iter,
> which is the dela_state function dispatcher mentioned in step A5 above.
> The functions it calls can set DAC_DEFER_FINISH to signal to
> xfs_attr_set_args that it needs to complete whatever work items we've
> attached to the transaction.  They can also return -EAGAIN to signal
> to xfs_attr_set_args that it's time to roll the transaction.
> 
> Third, once we're done, we return out of xfs_attr_set, same as step A6
> above.
> 
> Assuming I understood those two code paths correctly, I'll move on to
> the attr item recovery case.  Call this (C).
Sounds about right
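
For reference, the (B) flow can be sketched in userspace as below.  This is
an illustration only: the demo_* names are hypothetical, and it assumes the
roll helper is the place where the DEFER_FINISH flag is consumed before the
transaction is rolled.  The counters exist only so the behaviour can be
observed.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_DEFER_FINISH	0x01	/* finish queued work before rolling */

struct demo_dac {
	unsigned int	flags;
	int		steps_left;	/* work remaining in the operation */
	int		finishes;	/* times deferred work was finished */
	int		rolls;		/* times the transaction was rolled */
};

/* One step; requests a defer-finish when the remaining count is odd. */
static int demo_set_iter(struct demo_dac *dac)
{
	assert(dac->steps_left >= 0);
	if (dac->steps_left == 0)
		return 0;
	if (dac->steps_left & 1)
		dac->flags |= DEMO_DEFER_FINISH;
	dac->steps_left--;
	return -EAGAIN;
}

/* Finish deferred work if it was requested, then roll. */
static int demo_trans_roll(struct demo_dac *dac)
{
	if (dac->flags & DEMO_DEFER_FINISH) {
		dac->flags &= ~DEMO_DEFER_FINISH;
		dac->finishes++;
	}
	dac->rolls++;
	return 0;
}

/* Top-level loop: the only place that rolls the transaction. */
static int demo_set_args(struct demo_dac *dac)
{
	int error;

	do {
		error = demo_set_iter(dac);
		if (error != -EAGAIN)
			break;

		error = demo_trans_roll(dac);
		if (error)
			return error;
	} while (1);

	return error;
}
```

The step function never touches the transaction itself; it only reports
what it needs through the flag and the return value.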

> 
> First, xfs_attri_item_recover is called with a recovered incore log
> item.  It allocates an xfs_da_args and fills out most of the same
> fields that xfs_attr_set does in A1-A2 and B1-B2 above; and then it
> allocates a transaction.
> 
> Second, _recover has its own while loop(!) to call xfs_trans_attr, which
> calls xfs_attr_set_iter, sort of like what A4 does.  I'll come back to
> this later as well.
> 
> Third, xfs_attr_set_iter uses dela_state to dispatch functions, similar
> to what A5 does above.  If those functions set DAC_DEFER_FINISH or
> return -EAGAIN, we'll pass that out to xfs_attr_set_iter to get the
> transaction rolled so we can move on to the next state.
Mmm, xfs_attr_set_iter doesn't roll transactions. xfs_attr_*_args does. 
Perhaps the *_recover should follow suit, or be consolidated.

> 
> Fourth, when the loop is done we commit the transaction and move on with
> whatever is next in log recovery.
> 
> Does that sound right?  If so, let's move on to the issues I noted
> above.
> 
> I think the first problem is that this patchset adds two more xattr
> operation state structures.  Current xfs_da_args store both the
> operation arguments (inode, name, value, other flags) and most of the
> state of the operation (whichfork, hashval, geo, block indices, rmt
> block indices).  The series then adds a xfs_delattr_context that holds
> more state that needs to survive a transaction roll (leafbp, rmt
> mappings, da btree state, and dela_state).  Then, it adds yet another
> xfs_attr_item that contains its own xfs_da_args and xfs_delattr_context,
> and has a bunch more fields xattri_(ip, value, name, opflags, value)
> that duplicate the fields that already exist in xfs_da_args.
> 
> This is hard to follow.  I don't know what's the difference between
> xfs_attr_item.xattri_name and xfs_attr_item.xattri_args.name, and I
> suspect this makes xfs_attr_item much larger than it needs to be.
Hmm, ok.  Let me see if I could get away with having just having args 
and dac.  That might eliminate some of the overlap.

> 
> Question 1: Can we break up struct xfs_da_args?  Right now its field
> definition is the union set of everything needed to track both a
> directory operation and an xattr operation.  What do you think of
> creating separate xfs_dirop_state and xfs_attrop_state structures that
> each embed an xfs_da_args, and then move the dir and attr-specific
> pieces out of xfs_da_args and into xfs_{dir,attr}op_state as
> appropriate?  I think Christoph has suggested this elsewhere on the list
> in the past.
> 
> (Note that xfs_da_state is its own separate thing for dealing with
> dabtree operations; that doesn't change.)
Sure, let me dig around and see if I can better modularize args so that 
we're not carrying around all the dir op stuff through all the attr op 
stuff.

> 
> Question 2: Should we revise the arguments to xfs_attr_[gs]et?  Right
> now the callers of these functions have to initialize the entire
> xfs_da_args structure even though they only care about 7 of the 26
> fields.  What do you think of changing the xfs_attr_[gs]et function
> declarations to pass in the 7 arguments directly?  Or you could create a
> new arguments struct?  If you did that, then xfs_attr_[gs]et would be
> responsible for allocating and initializing their internal state.  This
> is cleaner interface-wise, 
I can dig around and see if I can get something like that to work.  I
like the mini struct idea.  I suspect we'll end up with a few routines
with a similar set of 7 params, so the struct makes sense
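
A rough sketch of what such a mini struct could look like, using the seven
fields listed in the walkthrough (dp, attr_filter, attr_flags, name, namelen,
value, valuelen).  The struct and function names here are hypothetical, the
types are userspace stand-ins, and hashval/whichfork are just two examples of
state that would become internal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_inode { unsigned long long ino; };	/* stand-in for xfs_inode */

/* Hypothetical caller-facing arguments: only the seven fields. */
struct demo_attr_params {
	struct demo_inode	*dp;
	unsigned int		attr_filter;
	unsigned int		attr_flags;
	const unsigned char	*name;
	size_t			namelen;
	void			*value;
	size_t			valuelen;
};

/* Hypothetical internal state, owned by the set/get implementation. */
struct demo_attrop_state {
	struct demo_attr_params	params;
	unsigned int		hashval;	/* internal-only example */
	int			whichfork;	/* internal-only example */
};

/* Copy the caller's parameters and zero everything internal. */
static void demo_attrop_init(struct demo_attrop_state *st,
			     const struct demo_attr_params *p)
{
	assert(p->dp != NULL && p->name != NULL);
	memset(st, 0, sizeof(*st));
	st->params = *p;
}
```

With this shape the caller can no longer leave internal fields in an
undefined state, because it never sees them.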

> and leads me into...
> 
> Question 3: Instead of creating separate xfs_delattr_context
> and xfs_attr_item structs, can you put all the stuff those structures
> track into xfs_attrop_state?  
Where xfs_attrop_state is the combination of xfs_delattr_context, 
xfs_attr_item, and a subset of xfs_da_args?

> I sense that the duplication and pointer
> indirection in _delattr_context and _attr_item might be a result of it
> not being all that clear where the xfs_da_args is actually allocated,
> and therefore the scoping rules.  Would all that be clearer if all the
> new state was thrown into the same xfs_attrop_state that we dynamically
> allocate at the start of xfs_attr_[gs]et()?  (Yes, this question's
> existence depends on your answer to Q2.)
I suppose it could be made to work?  I think we're starting to glob
together members that belong to slightly similar scopes, so the real
question would be: Are people going to be amenable to seeing it that way?

The xfs_attr_item was sort of modeled from xfs_extent_free_item.  In
general these structs are meant to function as items in a list
of log items.  That's why they all have the list_head field at a minimum.
(xfs_bmap_intent, xfs_extent_free_item, xfs_refcount_intent, and
xfs_rmap_intent are similar in this way.  They all have their own
corresponding *_finish_item routines whose purpose is to unpack the item
and hand it off to its corresponding *_trans routines).  It seemed to
be a sort of established pattern, so I figured I should fall in line
with that.

xfs_delattr_context is a bit of an odd ball.  It doesn't really
represent a set of associated properties like names and values and such.
It's really more about bookmarking a position in a sequence of events.
So its contents really have no meaning outside this context.
For example, blkcnt is used during attr grow and shrink operations, but
an attr doesn't otherwise normally have a block count associated with it.

I personally think it's a bit messy to try and lump all of that 
together, but it's really an aesthetic thing. Ultimately that is going 
to come down to the cross section of opinions that people are most 
comfortable seeing. :-)

> 
> Question 4: Does xfs_attr_item_init need to allocate space to hold the
> name and value buffers when it is called from xfs_attr_set?
It would if it were ever used in a routine that passed name and value as
local params, and then exited before finishing the transaction, or if it
in any way manipulated their values in a way not meant to be reflected in
the delayed operation.

I'm not sure if there's an instance where we do this, but I'd really 
have to try it and see. Will verify.

> xfs_attr_set does not return until we're completely finished with the
> deferred xattr processing, which means that the buffers passed into
> xfs_attr_set cannot go out of scope, right?
> 
> (I think you /do/ need to allocate separate buffers for log recovery.)
Yes, the on disk structs are xfs_attri_log_format and xfs_attrd_log_format

> 
> My second set of questions revolve around the duplication of attr
> operation loops between xfs_attr_set_args() and the defer ops code.
> AFAICT there's no reason to have xfs_attr_set_args, since there is no
> requirement in the deferred ops machinery to create log intent or log
> done items.
Yes, let me see if I can consolidate that, I forgot that I had nabbed 
the looping code from later in the set and pulled it downwards some time 
ago.   I'm thinking maybe xfs_attri_item_recover should just unpack the 
log item and pipe it through to xfs_attr_set_args

> 
> Question 5: Instead of open-coding a do {attrset roll hold} loop in
> xfs_attr_set_args, what do you think about setting up the deferred op
> code (xfs_attr_defer_type and the functions assigned to it in patch 4)
> to do that from the start?  By adding the defer op code early, patch 2
> would create xfs_attr_set_iter as it does now, and xfs_attr_finish_item
> would call it directly.  Since there's no log item defined yet, the
> other defer ops functions (create_intent, abort_intent, create_done) can
> return NULL log item pointers.
>
K, this one I'm having a hard time following.... the purpose of
xfs_attr_finish_item is to unpack the xfs_attr_item, and it must also
adhere to the xfs_defer_op_type.finish_item signature.  It doesn't loop
like xfs_attr_set_args does because the calling delayed operation
infrastructure already does that (see the loop in xfs_defer_finish_one).
Without the delayed operation machinery in the picture, a non delayed
attribute is going to need an equivalent loop somewhere.  We can
consolidate it with the loop we see in the recovery path, but we do need
to keep at least one loop somewhere, and it can't be in xfs_attr_finish_item.

> Once you get to the point where you have defined the log items, you can
> add in all the other log item handling (i.e. xfs_attr[id]_item_ops).  As
> an example of a defer op that optionally records changes to its incore
> operation state with log items, see xfs_swapext_defer_type[1].
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=atomic-file-updates&id=53c7233842969347174e8d68c8486dbf3efb734c
Ok, let me go through that, maybe I'm missing something here

> 
> Moving along to the DEFER_FINISH question that I said I'd get back to
> later -- there's a subtle difference to the order in which deferred log
> items that are created while trying to make progress on an xattr op are
> finished.  This is due to a design wart of the original defer ops
> machinery, and Brian and I have discussed this previously.
> 
> In a nutshell, let's pretend that step 1 of an xattr operation creates
> new deferred ops ABCD and step 2 creates new deferred ops EFGH.  Let's
> also pretend that step 1 and step 2 both set DEFER_FINISH.  In the
> !delattr case, xfs_attr_set_args -> xfs_attr_trans_roll will run step 1,
> process A->B->C->D, roll, run step 2, and then process E->F->G->H and
> commit.
> 
> In the delattr case, however, the defer ops machinery shoves all the new
> defer ops to the end of the queue, which means that we run step 1, roll,
> run step 2, and then run A->B->C->D->E->F->G->H and commit.  I would
> like to fix that, since it seems more logical to me that you'd finish
> A-D before moving on to the second phase; and the atomic swapext code is
> going to require that.
> 
> Question 6: So, uh, can you go have a look at the latest patches[2]?
> I'll post them soon if I can get past the bigtime review.  I don't think
> this wart of the defer ops mechanism affects your patchset, but you know
> how deferred attrs work better than I. :)
> 
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=defer-ops-stalls
Oh ok.  Sure, will take a look
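
To check my reading of the ordering difference, here is a small userspace
simulation.  It is only a model of the description above, not of the real
defer ops code: demo_run and its types are hypothetical, steps are traced as
digits and deferred ops as letters.

```c
#include <assert.h>
#include <string.h>

#define DEMO_TRACE_MAX	32

struct demo_trace {
	char	buf[DEMO_TRACE_MAX];
	size_t	len;
};

struct demo_step {
	const char	*name;		/* the step itself, e.g. "1" */
	const char	*new_ops;	/* deferred ops it creates, e.g. "ABCD" */
};

/* Append s to the trace. */
static void demo_emit(struct demo_trace *t, const char *s)
{
	size_t n = strlen(s);

	assert(t->len + n < DEMO_TRACE_MAX);
	memcpy(t->buf + t->len, s, n + 1);	/* copies the terminator too */
	t->len += n;
}

/*
 * finish_each_step != 0: finish a step's new ops before the next step.
 * finish_each_step == 0: push new ops to the end of the queue and run
 * them only after every step has run.
 */
static void demo_run(struct demo_trace *t, const struct demo_step *steps,
		     size_t nsteps, int finish_each_step)
{
	char pending[DEMO_TRACE_MAX] = "";
	size_t i;

	for (i = 0; i < nsteps; i++) {
		demo_emit(t, steps[i].name);
		if (finish_each_step) {
			demo_emit(t, steps[i].new_ops);
		} else {
			assert(strlen(pending) + strlen(steps[i].new_ops) <
			       DEMO_TRACE_MAX);
			strcat(pending, steps[i].new_ops);
		}
	}
	demo_emit(t, pending);
}
```

With the two steps from the example, the first mode corresponds to the
!delattr order and the second to the delattr order described above.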

> 
> I also had a couple questions (observations?) about how log recovery
> works for attr items, because I noticed that xfs_attri_item_recover also
> has a do {attrset, roll} loop.
> 
> HAH, I just realized (while writing Q7) that xfs_defer_move needs to log
> intent items for each newly scheduled work item because if log recovery
> crashes after finishing the existing intent items but before it gets to
> the new intent items, the next attempt at log recovery will not see the
> missing intents and will /never/ even be aware that it should have
> finished a chain.  That leads to fs corruption!  So that series has more
> work to do, and you can set Q6 aside for now.
:-)  I may take a peek anyway

> 
> Question 7: Why is there a do {attrset, roll} loop in the recovery
> routine?  Log intent item recovery functions are only supposed to
> complete a single transaction's worth of work.  If there's more work to
> do, the recovery function should attach a new defer ops item to the
> transaction to schedule the rest of the work, and use xfs_defer_move
> to attach the list of new defer ops to *parent_tp.
Oh ok.  I wasn't aware of how that was supposed to work.  Will update.

> 
> The reason for this is that log recovery has to finish every unfinished
> intent item that was in the log before it can move on to new log items
> that were created as a result of recovering log items.
Ok, so maybe we don't consolidate the loops since this one will need to
go away.  Thanks for the catch though!
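
Here is how I understand the recovery rule, as a userspace sketch under
stated assumptions.  The demo_* names are hypothetical, and the "later" queue
is only a stand-in for the list of new defer ops that would be handed to
*parent_tp.  Each recovered intent does one transaction's worth of work and
schedules the remainder instead of looping.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_ITEMS	8

struct demo_work {
	int	id;		/* which attr operation */
	int	steps_left;	/* transactions still needed */
};

struct demo_queue {
	struct demo_work	items[DEMO_MAX_ITEMS];
	size_t			count;
};

static void demo_queue_add(struct demo_queue *q, struct demo_work w)
{
	assert(q->count < DEMO_MAX_ITEMS);
	q->items[q->count++] = w;
}

/*
 * Recover one intent: do a single step and, if work remains, schedule
 * the rest on the caller's queue instead of looping here.
 */
static void demo_recover_one(struct demo_work w, struct demo_queue *later,
			     int *log, size_t *nlog)
{
	log[(*nlog)++] = w.id;
	w.steps_left--;
	if (w.steps_left > 0)
		demo_queue_add(later, w);
}

/*
 * Process every recovered intent first, then the work they scheduled.
 * Records the id of each step in log and returns the number of entries.
 */
static size_t demo_recover_all(const struct demo_queue *recovered,
			       int *log, size_t logcap)
{
	struct demo_queue later = { .count = 0 };
	size_t nlog = 0, i;

	for (i = 0; i < recovered->count; i++) {
		assert(nlog < logcap);
		demo_recover_one(recovered->items[i], &later, log, &nlog);
	}

	/* Continuations run only after all recovered intents were seen. */
	for (i = 0; i < later.count; i++) {
		struct demo_work w = later.items[i];

		while (w.steps_left-- > 0) {
			assert(nlog < logcap);
			log[nlog++] = w.id;
		}
	}
	return nlog;
}
```

A loop inside the recover routine would instead run all of item 1's steps
before item 2 was even looked at, which is the ordering the review asks to
avoid.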

> 
> Ok, that's probably enough questions for now.
> 
> --D

Thanks!  I know it's a lot!!
Allison

> 
>> +
>> +#define XFS_ATTR_ITEM_SIZEOF(namelen, valuelen)	\
>> +	(sizeof(struct xfs_attr_item) + (namelen) + (valuelen))
> 
>> +
>> +
>>   /*========================================================================
>>    * Function prototypes for the kernel.
>>    *========================================================================*/
>> @@ -283,11 +319,14 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
>>   int xfs_attr_get(struct xfs_da_args *args);
>>   int xfs_attr_set(struct xfs_da_args *args);
>>   int xfs_attr_set_args(struct xfs_da_args *args);
>> +int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>> +		      struct xfs_buf **leaf_bp);
>>   int xfs_has_attr(struct xfs_da_args *args);
>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>>   int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>>   bool xfs_attr_namecheck(const void *name, size_t length);
>>   void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>   			      struct xfs_da_args *args);
>> +int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>>   
>>   #endif	/* __XFS_ATTR_H__ */
>> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
>> index d8f5862..4392279 100644
>> --- a/fs/xfs/libxfs/xfs_defer.c
>> +++ b/fs/xfs/libxfs/xfs_defer.c
>> @@ -176,6 +176,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
>>   	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
>>   	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
>>   	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
>> +	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
>>   };
>>   
>>   static void
>> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
>> index 6b2ca58..193d3bb 100644
>> --- a/fs/xfs/libxfs/xfs_defer.h
>> +++ b/fs/xfs/libxfs/xfs_defer.h
>> @@ -18,6 +18,7 @@ enum xfs_defer_ops_type {
>>   	XFS_DEFER_OPS_TYPE_RMAP,
>>   	XFS_DEFER_OPS_TYPE_FREE,
>>   	XFS_DEFER_OPS_TYPE_AGFL_FREE,
>> +	XFS_DEFER_OPS_TYPE_ATTR,
>>   	XFS_DEFER_OPS_TYPE_MAX,
>>   };
>>   
>> @@ -62,5 +63,7 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
>>   extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
>>   extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
>>   extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
>> +extern const struct xfs_defer_op_type xfs_attr_defer_type;
>> +
>>   
>>   #endif /* __XFS_DEFER_H__ */
>> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
>> index e3400c9..33b26b6 100644
>> --- a/fs/xfs/libxfs/xfs_log_format.h
>> +++ b/fs/xfs/libxfs/xfs_log_format.h
>> @@ -117,7 +117,12 @@ struct xfs_unmount_log_format {
>>   #define XLOG_REG_TYPE_CUD_FORMAT	24
>>   #define XLOG_REG_TYPE_BUI_FORMAT	25
>>   #define XLOG_REG_TYPE_BUD_FORMAT	26
>> -#define XLOG_REG_TYPE_MAX		26
>> +#define XLOG_REG_TYPE_ATTRI_FORMAT	27
>> +#define XLOG_REG_TYPE_ATTRD_FORMAT	28
>> +#define XLOG_REG_TYPE_ATTR_NAME	29
>> +#define XLOG_REG_TYPE_ATTR_VALUE	30
>> +#define XLOG_REG_TYPE_MAX		30
>> +
>>   
>>   /*
>>    * Flags to log operation header
>> @@ -240,6 +245,8 @@ typedef struct xfs_trans_header {
>>   #define	XFS_LI_CUD		0x1243
>>   #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
>>   #define	XFS_LI_BUD		0x1245
>> +#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
>> +#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
>>   
>>   #define XFS_LI_TYPE_DESC \
>>   	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
>> @@ -255,7 +262,9 @@ typedef struct xfs_trans_header {
>>   	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
>>   	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
>>   	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
>> -	{ XFS_LI_BUD,		"XFS_LI_BUD" }
>> +	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
>> +	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
>> +	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
>>   
>>   /*
>>    * Inode Log Item Format definitions.
>> @@ -860,4 +869,35 @@ struct xfs_icreate_log {
>>   	__be32		icl_gen;	/* inode generation number to use */
>>   };
>>   
>> +/*
>> + * Flags for deferred attribute operations.
>> + * Upper bits are flags, lower byte is type code
>> + */
>> +#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
>> +#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
>> +#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0x0FF	/* Flags type mask */
>> +
>> +/*
>> + * This is the structure used to lay out an attr log item in the
>> + * log.
>> + */
>> +struct xfs_attri_log_format {
>> +	uint16_t	alfi_type;	/* attri log item type */
>> +	uint16_t	alfi_size;	/* size of this item */
>> +	uint32_t	__pad;		/* pad to 64 bit aligned */
>> +	uint64_t	alfi_id;	/* attri identifier */
>> +	xfs_ino_t       alfi_ino;	/* the inode for this attr operation */
>> +	uint32_t        alfi_op_flags;	/* marks the op as a set or remove */
>> +	uint32_t        alfi_name_len;	/* attr name length */
>> +	uint32_t        alfi_value_len;	/* attr value length */
>> +	uint32_t        alfi_attr_flags;/* attr flags */
>> +};
>> +
>> +struct xfs_attrd_log_format {
>> +	uint16_t	alfd_type;	/* attrd log item type */
>> +	uint16_t	alfd_size;	/* size of this item */
>> +	uint32_t	__pad;		/* pad to 64 bit aligned */
>> +	uint64_t	alfd_alf_id;	/* id of corresponding attrd */
>> +};
>> +
>>   #endif /* __XFS_LOG_FORMAT_H__ */
>> diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
>> index 641132d..b0b8e94 100644
>> --- a/fs/xfs/libxfs/xfs_log_recover.h
>> +++ b/fs/xfs/libxfs/xfs_log_recover.h
>> @@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
>>   extern const struct xlog_recover_item_ops xlog_rud_item_ops;
>>   extern const struct xlog_recover_item_ops xlog_cui_item_ops;
>>   extern const struct xlog_recover_item_ops xlog_cud_item_ops;
>> +extern const struct xlog_recover_item_ops xlog_attri_item_ops;
>> +extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
>>   
>>   /*
>>    * Macros, structures, prototypes for internal log manager use.
>> diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
>> index 397d947..860cdd2 100644
>> --- a/fs/xfs/libxfs/xfs_types.h
>> +++ b/fs/xfs/libxfs/xfs_types.h
>> @@ -11,6 +11,7 @@ typedef uint32_t	prid_t;		/* project ID */
>>   typedef uint32_t	xfs_agblock_t;	/* blockno in alloc. group */
>>   typedef uint32_t	xfs_agino_t;	/* inode # within allocation grp */
>>   typedef uint32_t	xfs_extlen_t;	/* extent length in blocks */
>> +typedef uint32_t	xfs_attrlen_t;	/* attr length */
>>   typedef uint32_t	xfs_agnumber_t;	/* allocation group number */
>>   typedef int32_t		xfs_extnum_t;	/* # of extents in a file */
>>   typedef int16_t		xfs_aextnum_t;	/* # extents in an attribute fork */
>> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
>> index 1887605..9a649d1 100644
>> --- a/fs/xfs/scrub/common.c
>> +++ b/fs/xfs/scrub/common.c
>> @@ -24,6 +24,8 @@
>>   #include "xfs_rmap_btree.h"
>>   #include "xfs_log.h"
>>   #include "xfs_trans_priv.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_reflink.h"
>>   #include "scrub/scrub.h"
>> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
>> index d4c687b5c..2fa173a 100644
>> --- a/fs/xfs/xfs_acl.c
>> +++ b/fs/xfs/xfs_acl.c
>> @@ -10,6 +10,8 @@
>>   #include "xfs_trans_resv.h"
>>   #include "xfs_mount.h"
>>   #include "xfs_inode.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_trace.h"
>>   #include "xfs_error.h"
>> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
>> new file mode 100644
>> index 0000000..923c288
>> --- /dev/null
>> +++ b/fs/xfs/xfs_attr_item.c
>> @@ -0,0 +1,829 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
>> + * Author: Allison Collins <allison.henderson@oracle.com>
>> + */
>> +
>> +#include "xfs.h"
>> +#include "xfs_fs.h"
>> +#include "xfs_format.h"
>> +#include "xfs_log_format.h"
>> +#include "xfs_trans_resv.h"
>> +#include "xfs_bit.h"
>> +#include "xfs_shared.h"
>> +#include "xfs_mount.h"
>> +#include "xfs_defer.h"
>> +#include "xfs_trans.h"
>> +#include "xfs_trans_priv.h"
>> +#include "xfs_buf_item.h"
>> +#include "xfs_attr_item.h"
>> +#include "xfs_log.h"
>> +#include "xfs_btree.h"
>> +#include "xfs_rmap.h"
>> +#include "xfs_inode.h"
>> +#include "xfs_icache.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>> +#include "xfs_attr.h"
>> +#include "xfs_shared.h"
>> +#include "xfs_attr_item.h"
>> +#include "xfs_alloc.h"
>> +#include "xfs_bmap.h"
>> +#include "xfs_trace.h"
>> +#include "libxfs/xfs_da_format.h"
>> +#include "xfs_inode.h"
>> +#include "xfs_quota.h"
>> +#include "xfs_log_priv.h"
>> +#include "xfs_log_recover.h"
>> +
>> +static const struct xfs_item_ops xfs_attri_item_ops;
>> +static const struct xfs_item_ops xfs_attrd_item_ops;
>> +
>> +static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
>> +{
>> +	return container_of(lip, struct xfs_attri_log_item, attri_item);
>> +}
>> +
>> +STATIC void
>> +xfs_attri_item_free(
>> +	struct xfs_attri_log_item	*attrip)
>> +{
>> +	kmem_free(attrip->attri_item.li_lv_shadow);
>> +	kmem_free(attrip);
>> +}
>> +
>> +/*
>> + * Freeing the attrip requires that we remove it from the AIL if it has already
>> + * been placed there. However, the ATTRI may not yet have been placed in the
>> + * AIL when called by xfs_attri_release() from ATTRD processing due to the
>> + * ordering of committed vs unpin operations in bulk insert operations. Hence
>> + * the reference count to ensure only the last caller frees the ATTRI.
>> + */
>> +STATIC void
>> +xfs_attri_release(
>> +	struct xfs_attri_log_item	*attrip)
>> +{
>> +	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
>> +	if (atomic_dec_and_test(&attrip->attri_refcount)) {
>> +		xfs_trans_ail_delete(&attrip->attri_item,
>> +				     SHUTDOWN_LOG_IO_ERROR);
>> +		xfs_attri_item_free(attrip);
>> +	}
>> +}
>> +
>> +/*
>> + * This returns the size in bytes of the attri log format structure, which is
>> + * logged in a single iovec.  The name and value, if present, are accounted
>> + * for separately in xfs_attri_item_size().
>> + */
>> +static inline int
>> +xfs_attri_item_sizeof(
>> +	struct xfs_attri_log_item *attrip)
>> +{
>> +	return sizeof(struct xfs_attri_log_format);
>> +}
>> +
>> +STATIC void
>> +xfs_attri_item_size(
>> +	struct xfs_log_item	*lip,
>> +	int			*nvecs,
>> +	int			*nbytes)
>> +{
>> +	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
>> +
>> +	*nvecs += 1;
>> +	*nbytes += xfs_attri_item_sizeof(attrip);
>> +
>> +	/* Attr set and remove operations require a name */
>> +	ASSERT(attrip->attri_name_len > 0);
>> +
>> +	*nvecs += 1;
>> +	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
>> +
>> +	/*
>> +	 * Set ops can accept a value of 0 len to clear an attr value.  Remove
>> +	 * ops do not need a value at all.  So only account for the value
>> +	 * when it is needed.
>> +	 */
>> +	if (attrip->attri_value_len > 0) {
>> +		*nvecs += 1;
>> +		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
>> +	}
>> +}
>> +
>> +/*
>> + * This is called to fill in the log iovecs for the given attri log
>> + * item. We use 1 iovec for the attri_format_item, 1 for the name, and
>> + * another for the value if it is present.
>> + */
>> +STATIC void
>> +xfs_attri_item_format(
>> +	struct xfs_log_item	*lip,
>> +	struct xfs_log_vec	*lv)
>> +{
>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>> +	struct xfs_log_iovec		*vecp = NULL;
>> +
>> +	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
>> +	attrip->attri_format.alfi_size = 1;
>> +
>> +	/*
>> +	 * This size accounting must be done before copying the attrip into the
>> +	 * iovec.  If we do it after, the wrong size will be recorded to the log
>> +	 * and we trip across assertion checks for bad region sizes later during
>> +	 * the log recovery.
>> +	 */
>> +
>> +	ASSERT(attrip->attri_name_len > 0);
>> +	attrip->attri_format.alfi_size++;
>> +
>> +	if (attrip->attri_value_len > 0)
>> +		attrip->attri_format.alfi_size++;
>> +
>> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
>> +			&attrip->attri_format,
>> +			xfs_attri_item_sizeof(attrip));
>> +	if (attrip->attri_name_len > 0)
>> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
>> +				attrip->attri_name,
>> +				ATTR_NVEC_SIZE(attrip->attri_name_len));
>> +
>> +	if (attrip->attri_value_len > 0)
>> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
>> +				attrip->attri_value,
>> +				ATTR_NVEC_SIZE(attrip->attri_value_len));
>> +}
>> +
>> +/*
>> + * The unpin operation is the last place an ATTRI is manipulated in the log. It
>> + * is either inserted in the AIL or aborted in the event of a log I/O error. In
>> + * either case, the ATTRI transaction has been successfully committed to make
>> + * it this far. Therefore, we expect whoever committed the ATTRI to either
>> + * construct and commit the ATTRD or drop the ATTRD's reference in the event of
>> + * error. Simply drop the log's ATTRI reference now that the log is done with
>> + * it.
>> + */
>> +STATIC void
>> +xfs_attri_item_unpin(
>> +	struct xfs_log_item	*lip,
>> +	int			remove)
>> +{
>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>> +
>> +	xfs_attri_release(attrip);
>> +}
>> +
>> +
>> +STATIC void
>> +xfs_attri_item_release(
>> +	struct xfs_log_item	*lip)
>> +{
>> +	xfs_attri_release(ATTRI_ITEM(lip));
>> +}
>> +
>> +/*
>> + * Allocate and initialize an attri item
>> + */
>> +STATIC struct xfs_attri_log_item *
>> +xfs_attri_init(
>> +	struct xfs_mount	*mp)
>> +
>> +{
>> +	struct xfs_attri_log_item	*attrip;
>> +	uint				size;
>> +
>> +	size = (uint)(sizeof(struct xfs_attri_log_item));
>> +	attrip = kmem_zalloc(size, 0);
>> +
>> +	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
>> +			  &xfs_attri_item_ops);
>> +	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
>> +	atomic_set(&attrip->attri_refcount, 2);
>> +
>> +	return attrip;
>> +}
>> +
>> +/*
>> + * Copy an attr format buffer from the given buf, and into the destination attr
>> + * format structure.
>> + */
>> +STATIC int
>> +xfs_attri_copy_format(struct xfs_log_iovec *buf,
>> +		      struct xfs_attri_log_format *dst_attr_fmt)
>> +{
>> +	struct xfs_attri_log_format *src_attr_fmt = buf->i_addr;
>> +	uint len = sizeof(struct xfs_attri_log_format);
>> +
>> +	if (buf->i_len != len)
>> +		return -EFSCORRUPTED;
>> +
>> +	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
>> +	return 0;
>> +}
>> +
>> +static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
>> +{
>> +	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
>> +}
>> +
>> +STATIC void
>> +xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
>> +{
>> +	kmem_free(attrdp->attrd_item.li_lv_shadow);
>> +	kmem_free(attrdp);
>> +}
>> +
>> +/*
>> + * This returns the size in bytes of the attrd log format structure.  An attrd
>> + * item needs only 1 iovec, which logs just the xfs_attrd_log_format
>> + * structure.
>> + */
>> +static inline int
>> +xfs_attrd_item_sizeof(
>> +	struct xfs_attrd_log_item *attrdp)
>> +{
>> +	return sizeof(struct xfs_attrd_log_format);
>> +}
>> +
>> +STATIC void
>> +xfs_attrd_item_size(
>> +	struct xfs_log_item	*lip,
>> +	int			*nvecs,
>> +	int			*nbytes)
>> +{
>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>> +	*nvecs += 1;
>> +	*nbytes += xfs_attrd_item_sizeof(attrdp);
>> +}
>> +
>> +/*
>> + * This is called to fill in the log iovecs for the given attrd log item. We use
>> + * only 1 iovec for the attrd_format, and we point that at the attr_log_format
>> + * structure embedded in the attrd item.
>> + */
>> +STATIC void
>> +xfs_attrd_item_format(
>> +	struct xfs_log_item	*lip,
>> +	struct xfs_log_vec	*lv)
>> +{
>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>> +	struct xfs_log_iovec		*vecp = NULL;
>> +
>> +	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
>> +	attrdp->attrd_format.alfd_size = 1;
>> +
>> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
>> +			&attrdp->attrd_format, xfs_attrd_item_sizeof(attrdp));
>> +}
>> +
>> +/*
>> + * The ATTRD is either committed or aborted if the transaction is cancelled. If
>> + * the transaction is cancelled, drop our reference to the ATTRI and free the
>> + * ATTRD.
>> + */
>> +STATIC void
>> +xfs_attrd_item_release(
>> +	struct xfs_log_item     *lip)
>> +{
>> +	struct xfs_attrd_log_item *attrdp = ATTRD_ITEM(lip);
>> +	xfs_attri_release(attrdp->attrd_attrip);
>> +	xfs_attrd_item_free(attrdp);
>> +}
>> +
>> +/*
>> + * Perform the attr operation described by the ATTRI and dirty the ATTRD.  An
>> + * attr operation may be a set or a remove.  Note that the transaction is
>> + * marked dirty regardless of whether the operation succeeds or fails to
>> + * support the ATTRI/ATTRD lifecycle rules.
>> + */
>> +int
>> +xfs_trans_attr(
>> +	struct xfs_delattr_context	*dac,
>> +	struct xfs_attrd_log_item	*attrdp,
>> +	struct xfs_buf			**leaf_bp,
>> +	uint32_t			op_flags)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	int				error;
>> +
>> +	error = xfs_qm_dqattach_locked(args->dp, 0);
>> +	if (error)
>> +		return error;
>> +
>> +	switch (op_flags) {
>> +	case XFS_ATTR_OP_FLAGS_SET:
>> +		args->op_flags |= XFS_DA_OP_ADDNAME;
>> +		error = xfs_attr_set_iter(dac, leaf_bp);
>> +		break;
>> +	case XFS_ATTR_OP_FLAGS_REMOVE:
>> +		ASSERT(XFS_IFORK_Q((args->dp)));
>> +		error = xfs_attr_remove_iter(dac);
>> +		break;
>> +	default:
>> +		error = -EFSCORRUPTED;
>> +		break;
>> +	}
>> +
>> +	/*
>> +	 * Mark the transaction dirty, even on error. This ensures the
>> +	 * transaction is aborted, which:
>> +	 *
>> +	 * 1.) releases the ATTRI and frees the ATTRD
>> +	 * 2.) shuts down the filesystem
>> +	 */
>> +	args->trans->t_flags |= XFS_TRANS_DIRTY;
>> +	set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
>> +
>> +	return error;
>> +}
>> +
>> +/* Log an attr to the intent item. */
>> +STATIC void
>> +xfs_attr_log_item(
>> +	struct xfs_trans		*tp,
>> +	struct xfs_attri_log_item	*attrip,
>> +	struct xfs_attr_item		*attr)
>> +{
>> +	struct xfs_attri_log_format	*attrp;
>> +	char				*name_value;
>> +
>> +	name_value = ((char *)attr) + sizeof(struct xfs_attr_item);
>> +
>> +	tp->t_flags |= XFS_TRANS_DIRTY;
>> +	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
>> +
>> +	/*
>> +	 * At this point the xfs_attr_item has been constructed, and we've
>> +	 * created the log intent. Fill in the attri log item and log format
>> +	 * structure with fields from this xfs_attr_item.
>> +	 */
>> +	attrp = &attrip->attri_format;
>> +	attrp->alfi_ino = attr->xattri_ip->i_ino;
>> +	attrp->alfi_op_flags = attr->xattri_op_flags;
>> +	attrp->alfi_value_len = attr->xattri_value_len;
>> +	attrp->alfi_name_len = attr->xattri_name_len;
>> +	attrp->alfi_attr_flags = attr->xattri_flags;
>> +
>> +	attrip->attri_name = name_value;
>> +	attrip->attri_value = &name_value[attr->xattri_name_len];
>> +	attrip->attri_name_len = attr->xattri_name_len;
>> +	attrip->attri_value_len = attr->xattri_value_len;
>> +}
>> +
>> +/* Get an ATTRI. */
>> +static struct xfs_log_item *
>> +xfs_attr_create_intent(
>> +	struct xfs_trans		*tp,
>> +	struct list_head		*items,
>> +	unsigned int			count,
>> +	bool				sort)
>> +{
>> +	struct xfs_mount		*mp = tp->t_mountp;
>> +	struct xfs_attri_log_item	*attrip = xfs_attri_init(mp);
>> +	struct xfs_attr_item		*attr;
>> +
>> +	ASSERT(count == 1);
>> +
>> +	xfs_trans_add_item(tp, &attrip->attri_item);
>> +	list_for_each_entry(attr, items, xattri_list)
>> +		xfs_attr_log_item(tp, attrip, attr);
>> +	return &attrip->attri_item;
>> +}
>> +
>> +/* Process an attr. */
>> +STATIC int
>> +xfs_attr_finish_item(
>> +	struct xfs_trans		*tp,
>> +	struct xfs_log_item		*done,
>> +	struct list_head		*item,
>> +	struct xfs_btree_cur		**state)
>> +{
>> +	struct xfs_attr_item		*attr;
>> +	int				error;
>> +	int				local;
>> +	struct xfs_delattr_context	*dac;
>> +	struct xfs_da_args		*args;
>> +	struct xfs_attrd_log_item	*attrdp;
>> +	struct xfs_attri_log_item	*attrip;
>> +
>> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
>> +	dac = &attr->xattri_dac;
>> +	args = &attr->xattri_args;
>> +
>> +	if (!(dac->flags & XFS_DAC_DELAYED_OP_INIT)) {
>> +		/* Only need to initialize args context once */
>> +		memset(args, 0, sizeof(*args));
>> +		args->geo = attr->xattri_ip->i_mount->m_attr_geo;
>> +		args->whichfork = XFS_ATTR_FORK;
>> +		args->dp = attr->xattri_ip;
>> +		args->name = ((const unsigned char *)attr) +
>> +			      sizeof(struct xfs_attr_item);
>> +		args->namelen = attr->xattri_name_len;
>> +		args->attr_filter = attr->xattri_flags;
>> +		args->hashval = xfs_da_hashname(args->name, args->namelen);
>> +		args->value = (void *)&args->name[attr->xattri_name_len];
>> +		args->valuelen = attr->xattri_value_len;
>> +		args->op_flags = XFS_DA_OP_OKNOENT;
>> +
>> +		/* must match existing transaction block res */
>> +		args->total = xfs_attr_calc_size(args, &local);
>> +
>> +		memset(dac, 0, sizeof(struct xfs_delattr_context));
>> +		dac->flags |= XFS_DAC_DELAYED_OP_INIT;
>> +		dac->da_args = args;
>> +	}
>> +
>> +	/*
>> +	 * Always reset trans after EAGAIN cycle
>> +	 * since the transaction is new
>> +	 */
>> +	args->trans = tp;
>> +
>> +	error = xfs_trans_attr(dac, ATTRD_ITEM(done), &dac->leaf_bp,
>> +			       attr->xattri_op_flags);
>> +	/*
>> +	 * The attrip refers to xfs_attr_item memory to log the name and value
>> +	 * with the intent item. This already occurred when the intent was
>> +	 * committed so these fields are no longer accessed. Clear them out of
>> +	 * caution since we're about to free the xfs_attr_item.
>> +	 */
>> +	attrdp = (struct xfs_attrd_log_item *)done;
>> +	attrip = attrdp->attrd_attrip;
>> +	attrip->attri_name = NULL;
>> +	attrip->attri_value = NULL;
>> +
>> +	if (error != -EAGAIN)
>> +		kmem_free(attr);
>> +
>> +	return error;
>> +}
>> +
>> +/* Abort all pending ATTRs. */
>> +STATIC void
>> +xfs_attr_abort_intent(
>> +	struct xfs_log_item		*intent)
>> +{
>> +	xfs_attri_release(ATTRI_ITEM(intent));
>> +}
>> +
>> +/* Cancel an attr */
>> +STATIC void
>> +xfs_attr_cancel_item(
>> +	struct list_head		*item)
>> +{
>> +	struct xfs_attr_item		*attr;
>> +
>> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
>> +	kmem_free(attr);
>> +}
>> +
>> +/*
>> + * The ATTRI is logged only once and cannot be moved in the log, so simply
>> + * return the lsn at which it's been logged.
>> + */
>> +STATIC xfs_lsn_t
>> +xfs_attri_item_committed(
>> +	struct xfs_log_item	*lip,
>> +	xfs_lsn_t		lsn)
>> +{
>> +	return lsn;
>> +}
>> +
>> +STATIC void
>> +xfs_attri_item_committing(
>> +	struct xfs_log_item	*lip,
>> +	xfs_lsn_t		lsn)
>> +{
>> +}
>> +
>> +STATIC bool
>> +xfs_attri_item_match(
>> +	struct xfs_log_item	*lip,
>> +	uint64_t		intent_id)
>> +{
>> +	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
>> +}
>> +
>> +/*
>> + * When the attrd item is committed to disk, all we need to do is delete our
>> + * reference to our partner attri item and then free ourselves. Since we're
>> + * freeing ourselves we must return -1 to keep the transaction code from
>> + * further referencing this item.
>> + */
>> +STATIC xfs_lsn_t
>> +xfs_attrd_item_committed(
>> +	struct xfs_log_item	*lip,
>> +	xfs_lsn_t		lsn)
>> +{
>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>> +
>> +	/*
>> +	 * Drop the ATTRI reference regardless of whether the ATTRD has been
>> +	 * aborted. Once the ATTRD transaction is constructed, it is the sole
>> +	 * responsibility of the ATTRD to release the ATTRI (even if the ATTRI
>> +	 * is aborted due to log I/O error).
>> +	 */
>> +	xfs_attri_release(attrdp->attrd_attrip);
>> +	xfs_attrd_item_free(attrdp);
>> +
>> +	return NULLCOMMITLSN;
>> +}
>> +
>> +STATIC void
>> +xfs_attrd_item_committing(
>> +	struct xfs_log_item	*lip,
>> +	xfs_lsn_t		lsn)
>> +{
>> +}
>> +
>> +
>> +/*
>> + * Allocate and initialize an attrd item
>> + */
>> +struct xfs_attrd_log_item *
>> +xfs_attrd_init(
>> +	struct xfs_mount		*mp,
>> +	struct xfs_attri_log_item	*attrip)
>> +
>> +{
>> +	struct xfs_attrd_log_item	*attrdp;
>> +	uint				size;
>> +
>> +	size = (uint)(sizeof(struct xfs_attrd_log_item));
>> +	attrdp = kmem_zalloc(size, 0);
>> +	memset(attrdp, 0, size);
>> +
>> +	xfs_log_item_init(mp, &attrdp->attrd_item, XFS_LI_ATTRD,
>> +			  &xfs_attrd_item_ops);
>> +	attrdp->attrd_attrip = attrip;
>> +	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
>> +
>> +	return attrdp;
>> +}
>> +
>> +/*
>> + * This routine is called to allocate an "attr done" log item.
>> + */
>> +struct xfs_attrd_log_item *
>> +xfs_trans_get_attrd(struct xfs_trans		*tp,
>> +		  struct xfs_attri_log_item	*attrip)
>> +{
>> +	struct xfs_attrd_log_item		*attrdp;
>> +
>> +	ASSERT(tp != NULL);
>> +
>> +	attrdp = xfs_attrd_init(tp->t_mountp, attrip);
>> +	ASSERT(attrdp != NULL);
>> +
>> +	xfs_trans_add_item(tp, &attrdp->attrd_item);
>> +	return attrdp;
>> +}
>> +
>> +static const struct xfs_item_ops xfs_attrd_item_ops = {
>> +	.iop_size	= xfs_attrd_item_size,
>> +	.iop_format	= xfs_attrd_item_format,
>> +	.iop_release    = xfs_attrd_item_release,
>> +	.iop_committing	= xfs_attrd_item_committing,
>> +	.iop_committed	= xfs_attrd_item_committed,
>> +};
>> +
>> +
>> +/* Get an ATTRD so we can process all the attrs. */
>> +static struct xfs_log_item *
>> +xfs_attr_create_done(
>> +	struct xfs_trans		*tp,
>> +	struct xfs_log_item		*intent,
>> +	unsigned int			count)
>> +{
>> +	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
>> +}
>> +
>> +const struct xfs_defer_op_type xfs_attr_defer_type = {
>> +	.max_items	= 1,
>> +	.create_intent	= xfs_attr_create_intent,
>> +	.abort_intent	= xfs_attr_abort_intent,
>> +	.create_done	= xfs_attr_create_done,
>> +	.finish_item	= xfs_attr_finish_item,
>> +	.cancel_item	= xfs_attr_cancel_item,
>> +};
>> +
>> +/*
>> + * Process an attr intent item that was recovered from the log.  We need to
>> + * set or remove the attr that it describes.
>> + */
>> +STATIC int
>> +xfs_attri_item_recover(
>> +	struct xfs_log_item		*lip,
>> +	struct xfs_trans		*parent_tp)
>> +{
>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>> +	struct xfs_mount		*mp = parent_tp->t_mountp;
>> +	struct xfs_inode		*ip;
>> +	struct xfs_attrd_log_item	*attrdp;
>> +	struct xfs_da_args		args;
>> +	struct xfs_attri_log_format	*attrp;
>> +	struct xfs_trans_res		tres;
>> +	int				local;
>> +	int				error, err2 = 0;
>> +	int				rsvd = 0;
>> +	struct xfs_buf			*leaf_bp = NULL;
>> +	struct xfs_delattr_context	dac = {
>> +		.da_args	= &args,
>> +	};
>> +
>> +	/*
>> +	 * First check the validity of the attr described by the ATTRI.  If any
>> +	 * are bad, then assume that all are bad and just toss the ATTRI.
>> +	 */
>> +	attrp = &attrip->attri_format;
>> +	if (!(attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET ||
>> +	      attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE) ||
>> +	    (attrp->alfi_value_len > XATTR_SIZE_MAX) ||
>> +	    (attrp->alfi_name_len > XATTR_NAME_MAX) ||
>> +	    (attrp->alfi_name_len == 0)) {
>> +		/*
>> +		 * This will pull the ATTRI from the AIL and free the memory
>> +		 * associated with it.
>> +		 */
>> +		xfs_attri_release(attrip);
>> +		return -EFSCORRUPTED;
>> +	}
>> +
>> +	error = xfs_iget(mp, 0, attrp->alfi_ino, 0, 0, &ip);
>> +	if (error)
>> +		return error;
>> +
>> +	memset(&args, 0, sizeof(args));
>> +	args.geo = ip->i_mount->m_attr_geo;
>> +	args.whichfork = XFS_ATTR_FORK;
>> +	args.dp = ip;
>> +	args.name = attrip->attri_name;
>> +	args.namelen = attrp->alfi_name_len;
>> +	args.attr_filter = attrp->alfi_attr_flags;
>> +	args.hashval = xfs_da_hashname(attrip->attri_name,
>> +					attrp->alfi_name_len);
>> +	args.value = attrip->attri_value;
>> +	args.valuelen = attrp->alfi_value_len;
>> +	args.op_flags = XFS_DA_OP_OKNOENT;
>> +	args.total = xfs_attr_calc_size(&args, &local);
>> +
>> +	tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres +
>> +			M_RES(mp)->tr_attrsetrt.tr_logres * args.total;
>> +	tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
>> +	tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
>> +
>> +	error = xfs_trans_alloc(mp, &tres, args.total,  0,
>> +				rsvd ? XFS_TRANS_RESERVE : 0, &args.trans);
>> +	if (error)
>> +		goto out_rele;
>> +	attrdp = xfs_trans_get_attrd(args.trans, attrip);
>> +
>> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
>> +
>> +	xfs_trans_ijoin(args.trans, ip, 0);
>> +
>> +	do {
>> +		error = xfs_trans_attr(&dac, attrdp, &leaf_bp,
>> +				       attrp->alfi_op_flags);
>> +		if (error && error != -EAGAIN)
>> +			goto abort_error;
>> +
>> +		xfs_trans_log_inode(args.trans, ip,
>> +				XFS_ILOG_CORE | XFS_ILOG_ADATA);
>> +
>> +		err2 = xfs_trans_roll(&args.trans);
>> +		if (err2) {
>> +			error = err2;
>> +			goto abort_error;
>> +		}
>> +
>> +		/* Rejoin inode and leaf if needed */
>> +		xfs_trans_ijoin(args.trans, ip, 0);
>> +		if (leaf_bp) {
>> +			xfs_trans_bjoin(args.trans, leaf_bp);
>> +			xfs_trans_bhold(args.trans, leaf_bp);
>> +		}
>> +
>> +	} while (error == -EAGAIN);
>> +
>> +	error = xfs_trans_commit(args.trans);
>> +	if (error)
>> +		goto abort_error;
>> +
>> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>> +	xfs_irele(ip);
>> +	return error;
>> +
>> +abort_error:
>> +	xfs_trans_cancel(args.trans);
>> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>> +out_rele:
>> +	xfs_irele(ip);
>> +	return error;
>> +}
>> +
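
To make the -EAGAIN convention in the recovery loop above easier to follow, here is a minimal userspace sketch, assuming a toy operation that finishes after a fixed number of steps. The names toy_op, toy_op_iter() and toy_op_run() are hypothetical and are not part of the patch; the "roll" is only a counter standing in for xfs_trans_roll().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical multi-step operation; not a kernel structure. */
struct toy_op {
	int	steps_left;	/* work remaining */
	int	rolls;		/* times the caller "rolled" the transaction */
};

/* Do one step.  Returns -EAGAIN while more work remains, 0 when done. */
static inline int toy_op_iter(struct toy_op *op)
{
	if (op->steps_left <= 0)
		return 0;
	op->steps_left--;
	return op->steps_left > 0 ? -EAGAIN : 0;
}

/*
 * The caller owns the transaction: it rolls after every step, including
 * the last one, and loops only while the step asks to be called again.
 */
static inline int toy_op_run(struct toy_op *op)
{
	int	error;

	do {
		error = toy_op_iter(op);
		if (error && error != -EAGAIN)
			return error;	/* real failure: abort */
		op->rolls++;		/* stands in for xfs_trans_roll() */
	} while (error == -EAGAIN);

	return 0;
}
```

As in the loop above, the step function never touches the transaction itself; it only reports whether it needs another pass.
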
>> +static const struct xfs_item_ops xfs_attri_item_ops = {
>> +	.iop_size	= xfs_attri_item_size,
>> +	.iop_format	= xfs_attri_item_format,
>> +	.iop_unpin	= xfs_attri_item_unpin,
>> +	.iop_committed	= xfs_attri_item_committed,
>> +	.iop_committing = xfs_attri_item_committing,
>> +	.iop_release    = xfs_attri_item_release,
>> +	.iop_recover	= xfs_attri_item_recover,
>> +	.iop_match	= xfs_attri_item_match,
>> +};
>> +
>> +
>> +
>> +STATIC int
>> +xlog_recover_attri_commit_pass2(
>> +	struct xlog                     *log,
>> +	struct list_head		*buffer_list,
>> +	struct xlog_recover_item        *item,
>> +	xfs_lsn_t                       lsn)
>> +{
>> +	int                             error;
>> +	struct xfs_mount                *mp = log->l_mp;
>> +	struct xfs_attri_log_item       *attrip;
>> +	struct xfs_attri_log_format     *attri_formatp;
>> +	char				*name = NULL;
>> +	char				*value = NULL;
>> +	int				region = 0;
>> +
>> +	attri_formatp = item->ri_buf[region].i_addr;
>> +
>> +	attrip = xfs_attri_init(mp);
>> +	error = xfs_attri_copy_format(&item->ri_buf[region],
>> +				      &attrip->attri_format);
>> +	if (error) {
>> +		xfs_attri_item_free(attrip);
>> +		return error;
>> +	}
>> +
>> +	attrip->attri_name_len = attri_formatp->alfi_name_len;
>> +	attrip->attri_value_len = attri_formatp->alfi_value_len;
>> +	attrip = kmem_realloc(attrip, sizeof(struct xfs_attri_log_item) +
>> +			      attrip->attri_name_len + attrip->attri_value_len,
>> +			      0);
>> +
>> +	ASSERT(attrip->attri_name_len > 0);
>> +	region++;
>> +	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
>> +	memcpy(name, item->ri_buf[region].i_addr,
>> +	       attrip->attri_name_len);
>> +	attrip->attri_name = name;
>> +
>> +	if (attrip->attri_value_len > 0) {
>> +		region++;
>> +		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
>> +			attrip->attri_name_len;
>> +		memcpy(value, item->ri_buf[region].i_addr,
>> +			attrip->attri_value_len);
>> +		attrip->attri_value = value;
>> +	}
>> +
>> +	/*
>> +	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
>> +	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
>> +	 * directly and drop the ATTRI reference. Note that
>> +	 * xfs_trans_ail_update() drops the AIL lock.
>> +	 */
>> +	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
>> +	xfs_attri_release(attrip);
>> +	return 0;
>> +}
>> +
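
The recovered item above keeps the name and value packed directly behind the item structure. A rough userspace sketch of that layout follows, assuming a toy structure with a flexible array member and a single up-front allocation; toy_item and toy_item_alloc() are hypothetical names and this is not how the patch allocates (it uses kmem_realloc and pointer arithmetic past sizeof the item).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the recovered item; not a kernel structure. */
struct toy_item {
	size_t	name_len;
	size_t	value_len;
	char	*name;		/* points into payload[] */
	char	*value;		/* points into payload[], or NULL */
	char	payload[];	/* name bytes followed by value bytes */
};

/* Allocate the item and copy the name, then the optional value, behind it. */
static inline struct toy_item *
toy_item_alloc(const void *name, size_t name_len,
	       const void *value, size_t value_len)
{
	struct toy_item	*ip;

	ip = calloc(1, sizeof(*ip) + name_len + value_len);
	if (!ip)
		return NULL;

	ip->name_len = name_len;
	ip->value_len = value_len;

	ip->name = ip->payload;
	memcpy(ip->name, name, name_len);

	/* Remove ops carry no value, so only set it up when present. */
	if (value_len > 0) {
		ip->value = ip->payload + name_len;
		memcpy(ip->value, value, value_len);
	}
	return ip;
}
```

The value always starts name_len bytes after the name, which is the same offset arithmetic the recovery code uses.
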
>> +const struct xlog_recover_item_ops xlog_attri_item_ops = {
>> +	.item_type	= XFS_LI_ATTRI,
>> +	.commit_pass2	= xlog_recover_attri_commit_pass2,
>> +};
>> +
>> +/*
>> + * This routine is called when an ATTRD format structure is found in a committed
>> + * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
>> + * it was still in the log. To do this it searches the AIL for the ATTRI with
>> + * an id equal to that in the ATTRD format structure. If we find it we drop
>> + * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
>> + */
>> +STATIC int
>> +xlog_recover_attrd_commit_pass2(
>> +	struct xlog			*log,
>> +	struct list_head		*buffer_list,
>> +	struct xlog_recover_item	*item,
>> +	xfs_lsn_t			lsn)
>> +{
>> +	struct xfs_attrd_log_format	*attrd_formatp;
>> +
>> +	attrd_formatp = item->ri_buf[0].i_addr;
>> +	ASSERT((item->ri_buf[0].i_len ==
>> +				(sizeof(struct xfs_attrd_log_format))));
>> +
>> +	xlog_recover_release_intent(log, XFS_LI_ATTRI,
>> +				    attrd_formatp->alfd_alf_id);
>> +	return 0;
>> +}
>> +
>> +const struct xlog_recover_item_ops xlog_attrd_item_ops = {
>> +	.item_type	= XFS_LI_ATTRD,
>> +	.commit_pass2	= xlog_recover_attrd_commit_pass2,
>> +};
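
Going by the comment on xlog_recover_attrd_commit_pass2() above, the ATTRD cancels its ATTRI by matching on the id stored in the format structure. A toy sketch of that lookup, assuming a flat array in place of the AIL; toy_intent and toy_release_intent() are made-up names for illustration, not the real xlog_recover_release_intent() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an intent sitting in the AIL. */
struct toy_intent {
	uint64_t	id;	/* plays the role of alfi_id */
	int		live;	/* nonzero while still in the toy AIL */
};

/*
 * Cancel the first live intent whose id matches the done item's id.
 * Returns 1 if an intent was found and released, 0 otherwise.
 */
static inline int
toy_release_intent(struct toy_intent *ail, size_t n, uint64_t done_id)
{
	for (size_t i = 0; i < n; i++) {
		if (ail[i].live && ail[i].id == done_id) {
			ail[i].live = 0;
			return 1;
		}
	}
	return 0;
}
```

An intent that never finds a matching done item stays live, which is what leaves it around for the recover handler to replay.
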
>> diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
>> new file mode 100644
>> index 0000000..7dd2572
>> --- /dev/null
>> +++ b/fs/xfs/xfs_attr_item.h
>> @@ -0,0 +1,76 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later
>> + *
>> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
>> + * Author: Allison Collins <allison.henderson@oracle.com>
>> + */
>> +#ifndef	__XFS_ATTR_ITEM_H__
>> +#define	__XFS_ATTR_ITEM_H__
>> +
>> +/* kernel only ATTRI/ATTRD definitions */
>> +
>> +struct xfs_mount;
>> +struct kmem_zone;
>> +
>> +/*
>> + * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
>> + */
>> +#define	XFS_ATTRI_RECOVERED	1
>> +
>> +
>> +/* iovec length must be 32-bit aligned */
>> +#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
>> +				size + sizeof(int32_t) - \
>> +				(size % sizeof(int32_t)))
>> +
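
For reference, a small userspace sketch of what ATTR_NVEC_SIZE evaluates to. The macro body is copied from the hunk above, with the argument parenthesized so it can stand alone; attr_nvec_roundup() is a hypothetical helper, not something in this patch, showing a conventional round-up to a 4-byte multiple for comparison. As written, the macro special-cases only size == 4, so other exact multiples of 4 come out 4 bytes larger than a plain round-up would give.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Body copied from the patch; the argument is parenthesized here. */
#define ATTR_NVEC_SIZE(size) ((size) == sizeof(int32_t) ? sizeof(int32_t) : \
				(size) + sizeof(int32_t) - \
				((size) % sizeof(int32_t)))

/* Hypothetical alternative: plain round-up to a multiple of 4 bytes. */
static inline size_t attr_nvec_roundup(size_t size)
{
	return (size + sizeof(int32_t) - 1) & ~(sizeof(int32_t) - 1);
}

/* Self-check showing where the two agree and where they differ. */
static inline void attr_nvec_selfcheck(void)
{
	/* Unaligned lengths: both round up to the next multiple of 4. */
	assert(ATTR_NVEC_SIZE((size_t)1) == 4);
	assert(ATTR_NVEC_SIZE((size_t)5) == 8);
	assert(attr_nvec_roundup(1) == 4);
	assert(attr_nvec_roundup(5) == 8);

	/* Exactly 4 is special-cased by the macro, so both give 4. */
	assert(ATTR_NVEC_SIZE((size_t)4) == 4);
	assert(attr_nvec_roundup(4) == 4);

	/* Other multiples of 4: the macro adds a further 4 bytes. */
	assert(ATTR_NVEC_SIZE((size_t)8) == 12);
	assert(attr_nvec_roundup(8) == 8);
}
```

Either result satisfies the 32-bit alignment requirement in the comment; the difference only matters for how many bytes are accounted for and copied.
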
>> +/*
>> + * This is the "attr intention" log item.  It is used to log the fact that some
>> + * attribute operations need to be processed.  An operation is currently either
>> + * a set or remove.  Set or remove operations are described by the xfs_attr_item
>> + * which may be logged to this intent.  Intents are used in conjunction with the
>> + * "attr done" log item described below.
>> + *
>> + * The ATTRI is reference counted so that it is not freed prior to both the
>> + * ATTRI and ATTRD being committed and unpinned. This ensures the ATTRI is
>> + * inserted into the AIL even in the event of out of order ATTRI/ATTRD
>> + * processing. In other words, an ATTRI is born with two references:
>> + *
>> + *      1.) an ATTRI held reference to track ATTRI AIL insertion
>> + *      2.) an ATTRD held reference to track ATTRD commit
>> + *
>> + * On allocation, both references are the responsibility of the caller. Once the
>> + * ATTRI is added to and dirtied in a transaction, ownership of reference one
>> + * transfers to the transaction. The reference is dropped once the ATTRI is
>> + * inserted to the AIL or in the event of failure along the way (e.g., commit
>> + * failure, log I/O error, etc.). Note that the caller remains responsible for
>> + * the ATTRD reference under all circumstances to this point. The caller has no
>> + * means to detect failure once the transaction is committed, however.
>> + * Therefore, an ATTRD is required after this point, even in the event of
>> + * unrelated failure.
>> + *
>> + * Once an ATTRD is allocated and dirtied in a transaction, reference two
>> + * transfers to the transaction. The ATTRD reference is dropped once it reaches
>> + * the unpin handler. Similar to the ATTRI, the reference also drops in the
>> + * event of commit failure or log I/O errors. Note that the ATTRD is not
>> + * inserted in the AIL, so at this point both the ATTRI and ATTRD are freed.
>> + */
>> +struct xfs_attri_log_item {
>> +	struct xfs_log_item		attri_item;
>> +	atomic_t			attri_refcount;
>> +	int				attri_name_len;
>> +	void				*attri_name;
>> +	int				attri_value_len;
>> +	void				*attri_value;
>> +	struct xfs_attri_log_format	attri_format;
>> +};
>> +
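
The two-reference rule described in the comment above can be modelled in a few lines of userspace C. This is only a sketch under the assumption that "in the AIL" and "freed" can be reduced to flags; toy_attri, toy_attri_init() and toy_attri_release() are hypothetical names, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for xfs_attri_log_item; the fields are illustrative. */
struct toy_attri {
	atomic_int	refcount;
	bool		in_ail;
	bool		freed;
};

/* Born with two references: one for the ATTRI, one for the ATTRD. */
static inline void toy_attri_init(struct toy_attri *ip)
{
	atomic_init(&ip->refcount, 2);
	ip->in_ail = false;
	ip->freed = false;
}

/* Like xfs_attri_release(): only the last caller tears the item down. */
static inline void toy_attri_release(struct toy_attri *ip)
{
	assert(atomic_load(&ip->refcount) > 0);
	if (atomic_fetch_sub(&ip->refcount, 1) == 1) {
		ip->in_ail = false;	/* models xfs_trans_ail_delete() */
		ip->freed = true;	/* models xfs_attri_item_free() */
	}
}
```

Whichever of the unpin path or the ATTRD path runs second is the one that removes the item from the AIL and frees it, so the order of the two does not matter.
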
>> +/*
>> + * This is the "attr done" log item.  It is used to log the fact that the attr
>> + * operations earlier mentioned in an attri item have been completed.
>> + */
>> +struct xfs_attrd_log_item {
>> +	struct xfs_attri_log_item	*attrd_attrip;
>> +	struct xfs_log_item		attrd_item;
>> +	struct xfs_attrd_log_format	attrd_format;
>> +};
>> +
>> +#endif	/* __XFS_ATTR_ITEM_H__ */
>> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
>> index 50f922c..166b680 100644
>> --- a/fs/xfs/xfs_attr_list.c
>> +++ b/fs/xfs/xfs_attr_list.c
>> @@ -15,6 +15,7 @@
>>   #include "xfs_inode.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_bmap.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_attr_sf.h"
>>   #include "xfs_attr_leaf.h"
>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>> index 6f22a66..edc05af 100644
>> --- a/fs/xfs/xfs_ioctl.c
>> +++ b/fs/xfs/xfs_ioctl.c
>> @@ -15,6 +15,8 @@
>>   #include "xfs_iwalk.h"
>>   #include "xfs_itable.h"
>>   #include "xfs_error.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_bmap.h"
>>   #include "xfs_bmap_util.h"
>> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
>> index c1771e7..62e1534 100644
>> --- a/fs/xfs/xfs_ioctl32.c
>> +++ b/fs/xfs/xfs_ioctl32.c
>> @@ -17,6 +17,8 @@
>>   #include "xfs_itable.h"
>>   #include "xfs_fsops.h"
>>   #include "xfs_rtalloc.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_ioctl.h"
>>   #include "xfs_ioctl32.h"
>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>> index 80a13c8..fe60da1 100644
>> --- a/fs/xfs/xfs_iops.c
>> +++ b/fs/xfs/xfs_iops.c
>> @@ -13,6 +13,8 @@
>>   #include "xfs_inode.h"
>>   #include "xfs_acl.h"
>>   #include "xfs_quota.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_trace.h"
>> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
>> index ad0c69ee..6405ce33 100644
>> --- a/fs/xfs/xfs_log.c
>> +++ b/fs/xfs/xfs_log.c
>> @@ -1975,6 +1975,10 @@ xlog_print_tic_res(
>>   	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
>>   	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
>>   	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
>> +	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
>> +	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
>> +	    REG_TYPE_STR(ATTR_NAME, "attr_name"),
>> +	    REG_TYPE_STR(ATTR_VALUE, "attr_value"),
>>   	};
>>   	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
>>   #undef REG_TYPE_STR
>> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
>> index e2ec91b..ec31db0 100644
>> --- a/fs/xfs/xfs_log_recover.c
>> +++ b/fs/xfs/xfs_log_recover.c
>> @@ -1811,6 +1811,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
>>   	&xlog_cud_item_ops,
>>   	&xlog_bui_item_ops,
>>   	&xlog_bud_item_ops,
>> +	&xlog_attri_item_ops,
>> +	&xlog_attrd_item_ops,
>>   };
>>   
>>   static const struct xlog_recover_item_ops *
>> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
>> index 5f04d8a..0597a04 100644
>> --- a/fs/xfs/xfs_ondisk.h
>> +++ b/fs/xfs/xfs_ondisk.h
>> @@ -126,6 +126,8 @@ xfs_check_ondisk_structs(void)
>>   	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
>>   	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
>>   	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
>> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
>> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
>>   
>>   	/*
>>   	 * The v5 superblock format extended several v4 header structures with
>> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
>> index bca48b3..9b0c790 100644
>> --- a/fs/xfs/xfs_xattr.c
>> +++ b/fs/xfs/xfs_xattr.c
>> @@ -10,6 +10,7 @@
>>   #include "xfs_log_format.h"
>>   #include "xfs_da_format.h"
>>   #include "xfs_inode.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_acl.h"
>>   #include "xfs_da_btree.h"
>> -- 
>> 2.7.4
>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v12 4/8] xfs: Set up infastructure for deferred attribute operations
  2020-09-02  0:46     ` Allison Collins
@ 2020-09-02  2:33       ` Allison Collins
  0 siblings, 0 replies; 21+ messages in thread
From: Allison Collins @ 2020-09-02  2:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 9/1/20 5:46 PM, Allison Collins wrote:
> 
> 
> On 8/28/20 2:27 PM, Darrick J. Wong wrote:
>> On Wed, Aug 26, 2020 at 05:35:14PM -0700, Allison Collins wrote:
>>> Currently attributes are modified directly across one or more
>>> transactions. But they are not logged or replayed in the event of an
>>> error. The goal of delayed attributes is to enable logging and replaying
>>> of attribute operations using the existing delayed operations
>>> infrastructure.  This will later enable the attributes to become part of
>>> larger multi part operations that also must first be recorded to the
>>> log.  This is mostly of interest in the scheme of parent pointers which
>>> would need to maintain an attribute containing parent inode information
>>> any time an inode is moved, created, or removed.  Parent pointers would
>>> then be of interest to any feature that would need to quickly derive an
>>> inode path from the mount point. Online scrub, nfs lookups and fs grow
>>> or shrink operations are all features that could take advantage of this.
>>>
>>> This patch adds two new log item types for setting or removing
>>> attributes as deferred operations.  The xfs_attri_log_item logs an
>>> intent to set or remove an attribute.  The corresponding
>>> xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
>>> freed once the transaction is done.  Both log items use a generic
>>> xfs_attr_log_format structure that contains the attribute name, value,
>>> flags, inode, and an op_flag that indicates if the operation is a set
>>> or remove.
>>>
>>> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
>>> ---
>>>   fs/xfs/Makefile                 |   1 +
>>>   fs/xfs/libxfs/xfs_attr.c        |   7 +-
>>>   fs/xfs/libxfs/xfs_attr.h        |  39 ++
>>>   fs/xfs/libxfs/xfs_defer.c       |   1 +
>>>   fs/xfs/libxfs/xfs_defer.h       |   3 +
>>>   fs/xfs/libxfs/xfs_log_format.h  |  44 ++-
>>>   fs/xfs/libxfs/xfs_log_recover.h |   2 +
>>>   fs/xfs/libxfs/xfs_types.h       |   1 +
>>>   fs/xfs/scrub/common.c           |   2 +
>>>   fs/xfs/xfs_acl.c                |   2 +
>>>   fs/xfs/xfs_attr_item.c          | 829 ++++++++++++++++++++++++++++++++++++++++
>>>   fs/xfs/xfs_attr_item.h          |  76 ++++
>>>   fs/xfs/xfs_attr_list.c          |   1 +
>>>   fs/xfs/xfs_ioctl.c              |   2 +
>>>   fs/xfs/xfs_ioctl32.c            |   2 +
>>>   fs/xfs/xfs_iops.c               |   2 +
>>>   fs/xfs/xfs_log.c                |   4 +
>>>   fs/xfs/xfs_log_recover.c        |   2 +
>>>   fs/xfs/xfs_ondisk.h             |   2 +
>>>   fs/xfs/xfs_xattr.c              |   1 +
>>>   20 files changed, 1017 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
>>> index 04611a1..b056cfc 100644
>>> --- a/fs/xfs/Makefile
>>> +++ b/fs/xfs/Makefile
>>> @@ -102,6 +102,7 @@ xfs-y                += xfs_log.o \
>>>                      xfs_buf_item_recover.o \
>>>                      xfs_dquot_item_recover.o \
>>>                      xfs_extfree_item.o \
>>> +                   xfs_attr_item.o \
>>>                      xfs_icreate_item.o \
>>>                      xfs_inode_item.o \
>>>                      xfs_inode_item_recover.o \
>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>> index a8cfe62..cf75742 100644
>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>> +++ b/fs/xfs/libxfs/xfs_attr.c
>>> @@ -24,6 +24,7 @@
>>>   #include "xfs_quota.h"
>>>   #include "xfs_trans_space.h"
>>>   #include "xfs_trace.h"
>>> +#include "xfs_attr_item.h"
>>>   /*
>>>    * xfs_attr.c
>>> @@ -59,8 +60,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>>   STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>>>   STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct 
>>> xfs_buf *bp);
>>> -STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>>> -                 struct xfs_buf **leaf_bp);
>>>   int
>>>   xfs_inode_hasattr(
>>> @@ -142,7 +141,7 @@ xfs_attr_get(
>>>   /*
>>>    * Calculate how many blocks we need for the new attribute,
>>>    */
>>> -STATIC int
>>> +int
>>>   xfs_attr_calc_size(
>>>       struct xfs_da_args    *args,
>>>       int            *local)
>>> @@ -327,7 +326,7 @@ xfs_attr_set_args(
>>>    * to handle this, and recall the function until a successful error 
>>> code is
>>>    * returned.
>>>    */
>>> -STATIC int
>>> +int
>>>   xfs_attr_set_iter(
>>>       struct xfs_delattr_context    *dac,
>>>       struct xfs_buf            **leaf_bp)
>>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>>> index 4f6bba8..23b8308 100644
>>> --- a/fs/xfs/libxfs/xfs_attr.h
>>> +++ b/fs/xfs/libxfs/xfs_attr.h
>>> @@ -247,6 +247,7 @@ enum xfs_delattr_state {
>>>   #define XFS_DAC_DEFER_FINISH        0x01 /* finish the transaction */
>>>   #define XFS_DAC_NODE_RMVNAME_INIT    0x02 /* xfs_attr_node_removename init */
>>>   #define XFS_DAC_LEAF_ADDNAME_INIT    0x04 /* xfs_attr_leaf_addname init */
>>> +#define XFS_DAC_DELAYED_OP_INIT        0x08 /* delayed operations init */
>>>   /*
>>>    * Context used for keeping track of delayed attribute operations
>>> @@ -254,6 +255,9 @@ enum xfs_delattr_state {
>>>   struct xfs_delattr_context {
>>>       struct xfs_da_args      *da_args;
>>> +    /* Used by delayed attributes to hold leaf across transactions */
>>> +    struct xfs_buf        *leaf_bp;
>>> +
>>>       /* Used in xfs_attr_rmtval_set_blk to roll through allocating 
>>> blocks */
>>>       struct xfs_bmbt_irec    map;
>>>       xfs_dablk_t        lblkno;
>>> @@ -268,6 +272,38 @@ struct xfs_delattr_context {
>>>       enum xfs_delattr_state  dela_state;
>>>   };
>>
>> I'll start by pasting in the full xfs_delattr_context definition for
>> easier reading:
>>
>> /*
>>   * Context used for keeping track of delayed attribute operations
>>   */
>> struct xfs_delattr_context {
>>     struct xfs_da_args      *da_args;
>>
>>     /* Used by delayed attributes to hold leaf across transactions */
>>     struct xfs_buf        *leaf_bp;
>>
>>     /* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
>>     struct xfs_bmbt_irec    map;
>>     xfs_dablk_t        lblkno;
>>     int            blkcnt;
>>
>>     /* Used in xfs_attr_node_removename to roll through removing blocks */
>>     struct xfs_da_state     *da_state;
>>     struct xfs_da_state_blk *blk;
>>
>>     /* Used to keep track of current state of delayed operation */
>>     unsigned int            flags;
>>     enum xfs_delattr_state  dela_state;
>> };
>>
>> Admittedly, I /am/ conducting a backwards review and zeroing in on the
>> data structures first.
>>
>>> +/*
>>> + * List of attrs to commit later.
>>> + */
>>> +struct xfs_attr_item {
>>> +    struct xfs_inode    *xattri_ip;
>>> +    void            *xattri_value;        /* attr value */
>>> +    void            *xattri_name;        /* attr name */
>>> +    uint32_t        xattri_op_flags;    /* attr op set or rm */
>>> +    uint32_t        xattri_value_len;    /* length of value */
>>> +    uint32_t        xattri_name_len;    /* length of name */
>>> +    uint32_t        xattri_flags;        /* attr flags */
>>> +
>>> +    /* used to log this item to an intent */
>>> +    struct list_head    xattri_list;
>>> +
>>> +    /*
>>> +     * xfs_delattr_context and xfs_da_args need to remain instantiated
>>> +     * across transaction rolls during the defer finish, so store 
>>> them here
>>> +     */
>>> +    struct xfs_da_args        xattri_args;
>>> +    struct xfs_delattr_context    xattri_dac;
>>> +
>>> +    /*
>>> +     * A byte array follows the header containing the file name and
>>> +     * attribute value.
>>> +     */
>>> +};
>>
>> These two structures (xfs_delattr_context and xfs_attr_item) duplicate a
>> lot of information considering that they both track incore state during
>> an xattr set/remove operation.  There's also a lot of duplication
>> between the do-while loop in xfs_attr_set_args and the inner loop of the
>> defer attr set code.
> Yes, to clarify a bit of the history: most of this was sort of adopted 
> from the efi/efd code as a sort of model.  Most of the fields in the 
> xfs_attr_item are sort of like the parameters needed to kick off a 
> delayed operation.
> 
> The xfs_da_args and the xfs_delattr_context are an exception to that 
> model.  Most of the time, they're not even populated.  The reason they 
> are here is because they need to be instantiated somewhere not inside 
> the call stack of the delayed attr operations.  Otherwise we'd lose 
> them every time we come back with -EAGAIN.  In the non delayed attr 
> code, they are kept in the xfs_attr_*_args functions.  In the delayed 
> attr code path, they are kept here.
> 
> IOW: I had to plop them somewhere, and discovered that the *_items 
> remain instantiated across their corresponding delayed operations.  So 
> it seemed like a reasonable place?  I suspect attrs are the first 
> delayed operation to require a context of sorts as I did not see any of 
> the other delayed operations needing to deal with this issue.  So attr 
> operations are a little unique in this way.
> 
> 
>>
>> To make sure I'm understanding this correctly, let me start by repeating
>> back to you what I think is the code flow through the hasdelattr path
>> and then the !hasdelattr path.  Let's call the hasdelattr path (A).
>>
>> First, the caller allocates an xfs_da_args structure and partially
>> initializes it with dp, attr_filter, attr_flags, name, namelen, value,
>> and valuelen set appropriately for the operation it wants.  The rest of
>> the struct should be zeroed, because the uninitialized parts are
>> internal state.
>>
>> Second, the *args are passed to xfs_attr_set, which after setting up a
>> transaction calls xfs_attr_set_deferred.  This calls xfs_attr_item_init
>> to allocate and initialize a struct xfs_attr_item with dp, name,
>> namelen, attr_filter, value, and valuelen, and passes this incore state
>> tracking structure to the defer ops machinery.
>>
>> Third, the defer ops machinery calls xfs_attr_finish_item to deal with
>> the attr request.  If the xfs_delattr_context within the xfs_attr_item
>> is uninitialized it will set the xfs_da_args state that's within the
>> xfs_attr_item to the values already stored in the xfs_attr_item.
>>
>> Fourth, xfs_attr_finish_item calls xfs_trans_attr to dispatch based on
>> op_flags.  For setting, this means we call xfs_attr_set_iter.
>>
>> Fifth, xfs_attr_set_iter dispatches functions based on whatever
>> dela_state in the delattr_context is set to.  The functions it calls can
>> set DAC_DEFER_FINISH and/or return -EAGAIN to signal the defer ops
>> machinery that it needs to roll the transaction so that we can repeat
>> steps 3-5 until we're done.  The defer ops machinery ought to honor
>> DEFER_FINISH and complete whatever work items we've put on the queue,
>> but... it's buggy and doesn't.  I'll come back to this later.
> Oh you are right... this should be updated.  So the history was: at some 
> time during the review of the delayed ready series, it was proposed that 
> we have a top level loop that rolls the transactions, rather than trying 
> to plumb in an "off switch" for the transactions.  This looping concept 
> was already modeled here, so I had adopted it for use in the delay ready 
> series.
> 
> I did however, forget to go back and update it with the DEFER_FINISH 
> flags that we added later.  Or consolidate them, which I suspect is 
> where you are going with this... :-)
> 
>>
>> Sixth, once we're done, we return out to xfs_attr_set to commit the
>> transaction and exit.
>>
>> Did I understand that correctly? 
> That sounds about right
> 
>> If so, I'll move on to the !hasdelattr
>> case, which we'll call (B).
>>
>> First, the caller allocates an xfs_da_args structure and partially
>> initializes it with dp, attr_filter, attr_flags, name, namelen, value,
>> and valuelen set appropriately for the operation it wants.  The rest of
>> the struct should be zeroed, because the uninitialized parts are
>> internal state.  This is the same as step A1 above.
>>
>> Second, the *args are passed to xfs_attr_set, which after setting up a
>> transaction calls xfs_attr_set_args.  This calls xfs_attr_set_iter,
>> which is the dela_state function dispatcher mentioned in step A5 above.
>> The functions it calls can set DAC_DEFER_FINISH to signal to
>> xfs_attr_set_args that it needs to complete whatever work items we've
>> attached to the transaction.  They can also return -EAGAIN to signal
>> to xfs_attr_set_args that it's time to roll the transaction.
>>
>> Third, once we're done, we return out of xfs_attr_set, same as step A6
>> above.
>>
>> Assuming I understood those two code paths correctly, I'll move on to
>> the attr item recovery case.  Call this (C).
> Sounds about right
> 
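The iterate-and-retry pattern described in paths (A) and (B) can be sketched in plain C. This is only an illustration under stated assumptions: the `demo_*` names, the states, and the `rolls` counter are hypothetical and are not the real `xfs_delattr_context` or `xfs_attr_set_iter`. It shows an iterator that does one step of work per call and returns -EAGAIN, with a caller loop that would roll the transaction between calls. The context lives outside the iterator's stack frame, which is why it survives each return.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states; the real enum xfs_delattr_state differs. */
enum demo_state { DEMO_START, DEMO_ALLOC_BLOCKS, DEMO_WRITE_VALUE, DEMO_DONE };

/* Stand-in for the operation context: it must outlive each iterator call. */
struct demo_context {
	enum demo_state	state;
	int		blocks_left;	/* work remaining in DEMO_ALLOC_BLOCKS */
	int		rolls;		/* how many times the caller "rolled" */
};

/* Do one step of work; return -EAGAIN if the caller must call again. */
static int demo_set_iter(struct demo_context *ctx)
{
	switch (ctx->state) {
	case DEMO_START:
		ctx->state = DEMO_ALLOC_BLOCKS;
		return -EAGAIN;
	case DEMO_ALLOC_BLOCKS:
		if (--ctx->blocks_left > 0)
			return -EAGAIN;		/* same state, next block */
		ctx->state = DEMO_WRITE_VALUE;
		return -EAGAIN;
	case DEMO_WRITE_VALUE:
		ctx->state = DEMO_DONE;
		return 0;
	case DEMO_DONE:
		return 0;
	}
	return -EINVAL;
}

/* Caller loop, playing the role the top-level loop plays in path (B). */
static int demo_set_args(struct demo_context *ctx)
{
	int error;

	assert(ctx);
	do {
		error = demo_set_iter(ctx);
		if (error == -EAGAIN)
			ctx->rolls++;	/* a real caller would roll the transaction here */
	} while (error == -EAGAIN);

	return error;
}
```

Starting from `DEMO_START` with three blocks to allocate, the loop calls the iterator five times and "rolls" four times before it returns 0. In the deferred path, the same iterator would be driven by the defer ops machinery instead of this loop.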
>>
>> First, xfs_attri_item_recover is called with a recovered incore log
>> item.  It allocates an xfs_da_args and fills out most of the same
>> fields that xfs_attr_set does in A1-A2 and B1-B2 above; and then it
>> allocates a transaction.
>>
>> Second, _recover has its own while loop(!) to call xfs_trans_attr, which
>> calls xfs_attr_set_iter, sort of like what A4 does.  I'll come back to
>> this later as well.
>>
>> Third, xfs_attr_set_iter uses dela_state to dispatch functions, similar
>> to what A5 does above.  If those functions set DAC_DEFER_FINISH or
>> return -EAGAIN, we'll pass that out to xfs_attr_set_iter to get the
>> transaction rolled so we can move on to the next state.
> Mmm, xfs_attr_set_iter doesn't roll transactions. xfs_attr_*_args does. 
> Perhaps the *_recover should follow suit, or be consolidated.
> 
>>
>> Fourth, when the loop is done we commit the transaction and move on with
>> whatever is next in log recovery.
>>
>> Does that sound right?  If so, let's move on to the issues I noted
>> above.
>>
>> I think the first problem is that this patchset adds two more xattr
>> operation state structures.  Current xfs_da_args store both the
>> operation arguments (inode, name, value, other flags) and most of the
>> state of the operation (whichfork, hashval, geo, block indices, rmt
>> block indices).  The series then adds a xfs_delattr_context that holds
>> more state that needs to survive a transaction roll (leafbp, rmt
>> mappings, da btree state, and dela_state).  Then, it adds yet another
>> xfs_attr_item that contains its own xfs_da_args and xfs_delattr_context,
>> and has a bunch more fields xattri_(ip, value, name, opflags, value)
>> that duplicate the fields that already exist in xfs_da_args.
>>
>> This is hard to follow.  I don't know what's the difference between
>> xfs_attr_item.xattri_name and xfs_attr_item.xattri_args.name, and I
>> suspect this makes xfs_attr_item much larger than it needs to be.
> Hmm, ok.  Let me see if I could get away with just having args 
> and dac.  That might eliminate some of the overlap.
> 
>>
>> Question 1: Can we break up struct xfs_da_args?  Right now its field
>> definition is the union set of everything needed to track both a
>> directory operation and an xattr operation.  What do you think of
>> creating separate xfs_dirop_state and xfs_attrop_state structures that
>> each embed an xfs_da_args, and then move the dir and attr-specific
>> pieces out of xfs_da_args and into xfs_{dir,attr}op_state as
>> appropriate?  I think Christoph has suggested this elsewhere on the list
>> in the past.
>>
>> (Note that xfs_da_state is its own separate thing for dealing with
>> dabtree operations; that doesn't change.)
> Sure, let me dig around and see if I can better modularize args so that 
> we're not carrying around all the dir op stuff through all the attr op 
> stuff.
> 
>>
>> Question 2: Should we revise the arguments to xfs_attr_[gs]et?  Right
>> now the callers of these functions have to initialize the entire
>> xfs_da_args structure even though they only care about 7 of the 26
>> fields.  What do you think of changing the xfs_attr_[gs]et function
>> declarations to pass in the 7 arguments directly?  Or you could create a
>> new arguments struct?  If you did that, then xfs_attr_[gs]et would be
>> responsible for allocating and initializing their internal state.  This
>> is cleaner interface-wise, 
> I can dig around and see if I can get something like that to work.  I 
> like the mini struct idea.  I suspect we'll end up with a few routines 
> with similar set of 7 params, so the struct makes sense
> 
>> and leads me into...
>>
>> Question 3: Instead of creating separate xfs_delattr_context
>> and xfs_attr_item structs, can you put all the stuff those structures
>> track into xfs_attrop_state? 
> Where xfs_attrop_state is the combination of xfs_delattr_context, 
> xfs_attr_item, and a subset of xfs_da_args?
> 
>> I sense that the duplication and pointer
>> indirection in _delattr_context and _attr_item might be a result of it
>> not being all that clear where the xfs_da_args is actually allocated,
>> and therefore the scoping rules.  Would all that be clearer if all the
>> new state was thrown into the same xfs_attrop_state that we dynamically
>> allocate at the start of xfs_attr_[gs]et()?  (Yes, this question's
>> existence depends on your answer to Q2.)
> I suppose it could be made to work?  I think we're starting to glob 
> together members that belong to slightly similar scopes, so the real 
> question would be: Are people going to be amenable to seeing it that way?
> 
> The xfs_attr_item was sort of modeled from xfs_extent_free_item.  In 
> general these structs are meant to function as sort of items in a list 
> of log items.  That's why they all have the list_head field at a minimum. 
> (xfs_bmap_intent, xfs_extent_free_item, xfs_refcount_intent, and 
> xfs_rmap_intent are similar in this way.  They all have their own 
> corresponding *_finish_item routines whose purpose is to unpack the item 
> and hand it off to its corresponding *_trans routines).  It seemed to 
> be a sort of established pattern, so I figured I should fall in line 
> with that.
> 
> xfs_delattr_context is a bit of an odd ball.  It doesn't really 
> represent a set of associated properties like names and values and such. 
>   It's really more about bookmarking a position in a sequence of events. 
> So its contents really have no meaning outside this context. For 
> example, blkcnt is used during attr grow and shrink operations, but an 
> attr doesn't otherwise normally have a block count associated with it.
> 
> I personally think it's a bit messy to try and lump all of that 
> together, but it's really an aesthetic thing. Ultimately that is going 
> to come down to the cross section of opinions that people are most 
> comfortable seeing. :-)
> 
>>
>> Question 4: Does xfs_attr_item_init need to allocate space to hold the
>> name and value buffers when it is called from xfs_attr_set?
> It would if it were ever used in a routine that passed name and value as 
> local params, and then exited before finishing the transaction, or if it 
> in any way manipulated their values in a way not meant to be reflected in 
> the delayed operation.
> 
> I'm not sure if there's an instance where we do this, but I'd really 
> have to try it and see. Will verify.
> 
>> xfs_attr_set does not return until we're completely finished with the
>> deferred xattr processing, which means that the buffers passed into
>> xfs_attr_set cannot go out of scope, right?
>>
>> (I think you /do/ need to allocate separate buffers for log recovery.)
> Yes, the on disk structs are xfs_attri_log_format and xfs_attrd_log_format
> 
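Question 4 turns on whether the work item owns copies of the name and value or just borrows the caller's buffers. A minimal sketch of the "own a copy" layout, in the spirit of the XFS_ATTR_ITEM_SIZEOF macro from the patch, is below. Everything named `demo_*` is hypothetical, userspace `calloc` stands in for a kernel allocator, and the real `xfs_attr_item` has more fields.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct xfs_attr_item. */
struct demo_attr_item {
	void		*name;		/* points into buf */
	void		*value;		/* points into buf, or NULL */
	uint32_t	name_len;
	uint32_t	value_len;
	unsigned char	buf[];		/* name bytes, then value bytes */
};

/* Header plus trailing byte array, as in the patch's sizing macro. */
#define DEMO_ATTR_ITEM_SIZEOF(namelen, valuelen) \
	(sizeof(struct demo_attr_item) + (namelen) + (valuelen))

/* Copy name and value so the item no longer depends on caller buffers. */
static struct demo_attr_item *
demo_attr_item_init(const void *name, uint32_t name_len,
		    const void *value, uint32_t value_len)
{
	struct demo_attr_item *item;

	assert(name && name_len > 0);
	assert(value || value_len == 0);

	item = calloc(1, DEMO_ATTR_ITEM_SIZEOF((size_t)name_len, value_len));
	if (!item)
		return NULL;

	item->name = item->buf;
	item->name_len = name_len;
	memcpy(item->name, name, name_len);

	if (value_len) {
		item->value = item->buf + name_len;
		item->value_len = value_len;
		memcpy(item->value, value, value_len);
	}
	return item;	/* caller releases with free() */
}
```

If the caller's buffers are guaranteed to stay in scope until the deferred work completes, as suggested above for xfs_attr_set, the copy is unnecessary and the item could store the caller's pointers directly. Log recovery has no such caller, so a copy (or separately allocated buffers) is still needed there.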
>>
>> My second set of questions revolve around the duplication of attr
>> operation loops between xfs_attr_set_args() and the defer ops code.
>> AFAICT there's no reason to have xfs_attr_set_args, since there is no
>> requirement in the deferred ops machinery to create log intent or log
>> done items.
> Yes, let me see if I can consolidate that, I forgot that I had nabbed 
> the looping code from later in the set and pulled it downwards some time 
> ago.   I'm thinking maybe xfs_attri_item_recover should just unpack the 
> log item and pipe it through to xfs_attr_set_args
> 
>>
>> Question 5: Instead of open-coding a do {attrset roll hold} loop in
>> xfs_attr_set_args, what do you think about setting up the deferred op
>> code (xfs_attr_defer_type and the functions assigned to it in patch 4)
>> to do that from the start?  By adding the defer op code early, patch 2
>> would create xfs_attr_set_iter as it does now, and xfs_attr_finish_item
>> would call it directly.  Since there's no log item defined yet, the
>> other defer ops functions (create_intent, abort_intent, create_done) can
>> return NULL log item pointers.
>>
> K, this one I'm having a hard time following.... the purpose of 
> xis_tart_finish_item is to unpack the xis_tart_item, and must also 
> adhere to the xfs_defer_op_type.finish_item signature.  It doesn't loop 
> like xfs_attr_set_args does because the calling delayed operation 
> infrastructure already does that (see the loop in xfs_defer_finish_one). 
>   Without the delayed operation machinery in the picture, a non delayed 
> attribute is going to need an equivalent loop somewhere.  We can 
> consolidate it with the loop we see in the recovery path, but we do need 
> to keep at least one loop somewhere, and it can't be in 
> xfs_attr_finish_item.
Sorry xis_tart_finish_item should read xfs_attr_finish_item and 
xis_tart_item should be xfs_attr_item.

need to keep closer eye on spell checker...

> 
>> Once you get to the point where you have defined the log items, you can
>> add in all the other log item handling (i.e. xfs_attr[id]_item_ops).  As
>> an example of a defer op that optionally records changes to its incore
>> operation state with log items, see xfs_swapext_defer_type[1].
>>
>> [1] 
>> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=atomic-file-updates&id=53c7233842969347174e8d68c8486dbf3efb734c
>>
> Ok, let me go through that, maybe I'm missing something here
> 
>>
>> Moving along to the DEFER_FINISH question that I said I'd get back to
>> later -- there's a subtle difference to the order in which deferred log
>> items that are created while trying to make progress on an xattr op are
>> finished.  This is due to a design wart of the original defer ops
>> machinery, and Brian and I have discussed this previously.
>>
>> In a nutshell, let's pretend that step 1 of an xattr operation creates
>> new deferred ops ABCD and step 2 creates new deferred ops EFGH.  Let's
>> also pretend that step 1 and step 2 both set DEFER_FINISH.  In the
>> !delattr case, xfs_attr_set_args -> xfs_attr_trans_roll will run step 1,
>> process A->B->C->D, roll, run step 2, and then process E->F->G->H and
>> commit.
>>
>> In the delattr case, however, the defer ops machinery shoves all the new
>> defer ops to the end of the queue, which means that we run step 1, roll,
>> run step 2, and then run A->B->C->D->E->F->G->H and commit.  I would
>> like to fix that, since it seems more logical to me that you'd finish
>> A-D before moving on to the second phase; and the atomic swapext code is
>> going to require that.
>>
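The ordering difference in the ABCD/EFGH example can be reproduced with a toy trace. This is a sketch only: `demo_run` and its flag are hypothetical and do not model the real defer ops queue. Digits mark the two xattr steps and letters mark the deferred items each step creates.

```c
#include <assert.h>
#include <string.h>

#define DEMO_TRACE_MAX 32

/*
 * Write the trace of a two-step operation into out.
 * finish_per_step != 0: drain each step's new items before the next step.
 * finish_per_step == 0: leave them queued and drain everything at the end.
 */
static void demo_run(int finish_per_step, char out[DEMO_TRACE_MAX])
{
	static const char *const new_items[2] = { "ABCD", "EFGH" };
	char pending[DEMO_TRACE_MAX] = "";
	size_t n = 0;

	for (int step = 0; step < 2; step++) {
		out[n++] = (char)('1' + step);		/* run the step */
		strcat(pending, new_items[step]);	/* it queues new work */
		if (finish_per_step) {
			memcpy(out + n, pending, strlen(pending));
			n += strlen(pending);
			pending[0] = '\0';
		}
	}

	/* Whatever is still queued runs before the commit. */
	memcpy(out + n, pending, strlen(pending));
	n += strlen(pending);
	out[n] = '\0';
}
```

With `finish_per_step` set, the trace is `1ABCD2EFGH`, matching the !delattr description. With it clear, the trace is `12ABCDEFGH`, matching the delattr case where new items are pushed to the end of the queue.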
>> Question 6: So, uh, can you go have a look at the latest patches[2]?
>> I'll post them soon if I can get past the bigtime review.  I don't think
>> this wart of the defer ops mechanism affects your patchset, but you know
>> how deferred attrs work better than I. :)
>>
>> [2] 
>> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=defer-ops-stalls
>>
> Oh ok.  Sure, will take a look
> 
>>
>> I also had a couple questions (observations?) about how log recovery
>> works for attr items, because I noticed that xfs_attri_item_recover also
>> has a do {attrset, roll} loop.
>>
>> HAH, I just realized (while writing Q7) that xfs_defer_move needs to log
>> intent items for each newly scheduled work item because if log recovery
>> crashes after finishing the existing intent items but before it gets to
>> the new intent items, the next attempt at log recovery will not see the
>> missing intents and will /never/ even be aware that it should have
>> finished a chain.  That leads to fs corruption!  So that series has more
>> work to do, and you can set Q6 aside for now.
> :-)  I may take a peek anyway
> 
>>
>> Question 7: Why is there a do {attrset, roll} loop in the recovery
>> routine?  Log intent item recovery functions are only supposed to
>> complete a single transaction's worth of work.  If there's more work to
>> do, the recovery function should attach a new defer ops item to the
>> transaction to schedule the rest of the work, and use xfs_defer_move
>> to attach the list of new defer ops to *parent_tp.
> Oh ok.  I wasn't aware of how that was supposed to work.  Will update.
> 
>>
>> The reason for this is that log recovery has to finish every unfinished
>> intent item that was in the log before it can move on to new log items
>> that were created as a result of recovering log items.
> Ok, so maybe we don't consolidate the loops since this one will need to 
> go away.  Thanks for the catch though!
> 
>>
>> Ok, that's probably enough questions for now.
>>
>> --D
> 
> Thanks!  I know it's a lot!!
> Allison
> 
>>
>>> +
>>> +#define XFS_ATTR_ITEM_SIZEOF(namelen, valuelen)    \
>>> +    (sizeof(struct xfs_attr_item) + (namelen) + (valuelen))
>>
>>> +
>>> +
>>>   /*========================================================================
>>>    * Function prototypes for the kernel.
>>>    *========================================================================*/
>>> @@ -283,11 +319,14 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
>>>   int xfs_attr_get(struct xfs_da_args *args);
>>>   int xfs_attr_set(struct xfs_da_args *args);
>>>   int xfs_attr_set_args(struct xfs_da_args *args);
>>> +int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>>> +              struct xfs_buf **leaf_bp);
>>>   int xfs_has_attr(struct xfs_da_args *args);
>>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>>>   int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>>>   bool xfs_attr_namecheck(const void *name, size_t length);
>>>   void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>>                     struct xfs_da_args *args);
>>> +int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>>>   #endif    /* __XFS_ATTR_H__ */
>>> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
>>> index d8f5862..4392279 100644
>>> --- a/fs/xfs/libxfs/xfs_defer.c
>>> +++ b/fs/xfs/libxfs/xfs_defer.c
>>> @@ -176,6 +176,7 @@ static const struct xfs_defer_op_type 
>>> *defer_op_types[] = {
>>>       [XFS_DEFER_OPS_TYPE_RMAP]    = &xfs_rmap_update_defer_type,
>>>       [XFS_DEFER_OPS_TYPE_FREE]    = &xfs_extent_free_defer_type,
>>>       [XFS_DEFER_OPS_TYPE_AGFL_FREE]    = &xfs_agfl_free_defer_type,
>>> +    [XFS_DEFER_OPS_TYPE_ATTR]    = &xfs_attr_defer_type,
>>>   };
>>>   static void
>>> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
>>> index 6b2ca58..193d3bb 100644
>>> --- a/fs/xfs/libxfs/xfs_defer.h
>>> +++ b/fs/xfs/libxfs/xfs_defer.h
>>> @@ -18,6 +18,7 @@ enum xfs_defer_ops_type {
>>>       XFS_DEFER_OPS_TYPE_RMAP,
>>>       XFS_DEFER_OPS_TYPE_FREE,
>>>       XFS_DEFER_OPS_TYPE_AGFL_FREE,
>>> +    XFS_DEFER_OPS_TYPE_ATTR,
>>>       XFS_DEFER_OPS_TYPE_MAX,
>>>   };
>>> @@ -62,5 +63,7 @@ extern const struct xfs_defer_op_type 
>>> xfs_refcount_update_defer_type;
>>>   extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
>>>   extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
>>>   extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
>>> +extern const struct xfs_defer_op_type xfs_attr_defer_type;
>>> +
>>>   #endif /* __XFS_DEFER_H__ */
>>> diff --git a/fs/xfs/libxfs/xfs_log_format.h 
>>> b/fs/xfs/libxfs/xfs_log_format.h
>>> index e3400c9..33b26b6 100644
>>> --- a/fs/xfs/libxfs/xfs_log_format.h
>>> +++ b/fs/xfs/libxfs/xfs_log_format.h
>>> @@ -117,7 +117,12 @@ struct xfs_unmount_log_format {
>>>   #define XLOG_REG_TYPE_CUD_FORMAT    24
>>>   #define XLOG_REG_TYPE_BUI_FORMAT    25
>>>   #define XLOG_REG_TYPE_BUD_FORMAT    26
>>> -#define XLOG_REG_TYPE_MAX        26
>>> +#define XLOG_REG_TYPE_ATTRI_FORMAT    27
>>> +#define XLOG_REG_TYPE_ATTRD_FORMAT    28
>>> +#define XLOG_REG_TYPE_ATTR_NAME    29
>>> +#define XLOG_REG_TYPE_ATTR_VALUE    30
>>> +#define XLOG_REG_TYPE_MAX        30
>>> +
>>>   /*
>>>    * Flags to log operation header
>>> @@ -240,6 +245,8 @@ typedef struct xfs_trans_header {
>>>   #define    XFS_LI_CUD        0x1243
>>>   #define    XFS_LI_BUI        0x1244    /* bmbt update intent */
>>>   #define    XFS_LI_BUD        0x1245
>>> +#define    XFS_LI_ATTRI        0x1246  /* attr set/remove intent*/
>>> +#define    XFS_LI_ATTRD        0x1247  /* attr set/remove done */
>>>   #define XFS_LI_TYPE_DESC \
>>>       { XFS_LI_EFI,        "XFS_LI_EFI" }, \
>>> @@ -255,7 +262,9 @@ typedef struct xfs_trans_header {
>>>       { XFS_LI_CUI,        "XFS_LI_CUI" }, \
>>>       { XFS_LI_CUD,        "XFS_LI_CUD" }, \
>>>       { XFS_LI_BUI,        "XFS_LI_BUI" }, \
>>> -    { XFS_LI_BUD,        "XFS_LI_BUD" }
>>> +    { XFS_LI_BUD,        "XFS_LI_BUD" }, \
>>> +    { XFS_LI_ATTRI,        "XFS_LI_ATTRI" }, \
>>> +    { XFS_LI_ATTRD,        "XFS_LI_ATTRD" }
>>>   /*
>>>    * Inode Log Item Format definitions.
>>> @@ -860,4 +869,35 @@ struct xfs_icreate_log {
>>>       __be32        icl_gen;    /* inode generation number to use */
>>>   };
>>> +/*
>>> + * Flags for deferred attribute operations.
>>> + * Upper bits are flags, lower byte is type code
>>> + */
>>> +#define XFS_ATTR_OP_FLAGS_SET        1    /* Set the attribute */
>>> +#define XFS_ATTR_OP_FLAGS_REMOVE    2    /* Remove the attribute */
>>> +#define XFS_ATTR_OP_FLAGS_TYPE_MASK    0x0FF    /* Flags type mask */
>>> +
>>> +/*
>>> + * This is the structure used to lay out an attr log item in the
>>> + * log.
>>> + */
>>> +struct xfs_attri_log_format {
>>> +    uint16_t    alfi_type;    /* attri log item type */
>>> +    uint16_t    alfi_size;    /* size of this item */
>>> +    uint32_t    __pad;        /* pad to 64 bit aligned */
>>> +    uint64_t    alfi_id;    /* attri identifier */
>>> +    xfs_ino_t       alfi_ino;    /* the inode for this attr operation */
>>> +    uint32_t        alfi_op_flags;    /* marks the op as a set or remove */
>>> +    uint32_t        alfi_name_len;    /* attr name length */
>>> +    uint32_t        alfi_value_len;    /* attr value length */
>>> +    uint32_t        alfi_attr_flags;/* attr flags */
>>> +};
>>> +
>>> +struct xfs_attrd_log_format {
>>> +    uint16_t    alfd_type;    /* attrd log item type */
>>> +    uint16_t    alfd_size;    /* size of this item */
>>> +    uint32_t    __pad;        /* pad to 64 bit aligned */
>>> +    uint64_t    alfd_alf_id;    /* id of corresponding attri */
>>> +};
>>> +
>>>   #endif /* __XFS_LOG_FORMAT_H__ */
>>> diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
>>> index 641132d..b0b8e94 100644
>>> --- a/fs/xfs/libxfs/xfs_log_recover.h
>>> +++ b/fs/xfs/libxfs/xfs_log_recover.h
>>> @@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
>>>   extern const struct xlog_recover_item_ops xlog_rud_item_ops;
>>>   extern const struct xlog_recover_item_ops xlog_cui_item_ops;
>>>   extern const struct xlog_recover_item_ops xlog_cud_item_ops;
>>> +extern const struct xlog_recover_item_ops xlog_attri_item_ops;
>>> +extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
>>>   /*
>>>    * Macros, structures, prototypes for internal log manager use.
>>> diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
>>> index 397d947..860cdd2 100644
>>> --- a/fs/xfs/libxfs/xfs_types.h
>>> +++ b/fs/xfs/libxfs/xfs_types.h
>>> @@ -11,6 +11,7 @@ typedef uint32_t    prid_t;        /* project ID */
>>>   typedef uint32_t    xfs_agblock_t;    /* blockno in alloc. group */
>>>   typedef uint32_t    xfs_agino_t;    /* inode # within allocation grp */
>>>   typedef uint32_t    xfs_extlen_t;    /* extent length in blocks */
>>> +typedef uint32_t    xfs_attrlen_t;    /* attr length */
>>>   typedef uint32_t    xfs_agnumber_t;    /* allocation group number */
>>>   typedef int32_t        xfs_extnum_t;    /* # of extents in a file */
>>>   typedef int16_t        xfs_aextnum_t;    /* # extents in an attribute fork */
>>> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
>>> index 1887605..9a649d1 100644
>>> --- a/fs/xfs/scrub/common.c
>>> +++ b/fs/xfs/scrub/common.c
>>> @@ -24,6 +24,8 @@
>>>   #include "xfs_rmap_btree.h"
>>>   #include "xfs_log.h"
>>>   #include "xfs_trans_priv.h"
>>> +#include "xfs_da_format.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_reflink.h"
>>>   #include "scrub/scrub.h"
>>> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
>>> index d4c687b5c..2fa173a 100644
>>> --- a/fs/xfs/xfs_acl.c
>>> +++ b/fs/xfs/xfs_acl.c
>>> @@ -10,6 +10,8 @@
>>>   #include "xfs_trans_resv.h"
>>>   #include "xfs_mount.h"
>>>   #include "xfs_inode.h"
>>> +#include "xfs_da_format.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_trace.h"
>>>   #include "xfs_error.h"
>>> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
>>> new file mode 100644
>>> index 0000000..923c288
>>> --- /dev/null
>>> +++ b/fs/xfs/xfs_attr_item.c
>>> @@ -0,0 +1,829 @@
>>> +// SPDX-License-Identifier: GPL-2.0-or-later
>>> +/*
>>> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
>>> + * Author: Allison Collins <allison.henderson@oracle.com>
>>> + */
>>> +
>>> +#include "xfs.h"
>>> +#include "xfs_fs.h"
>>> +#include "xfs_format.h"
>>> +#include "xfs_log_format.h"
>>> +#include "xfs_trans_resv.h"
>>> +#include "xfs_bit.h"
>>> +#include "xfs_shared.h"
>>> +#include "xfs_mount.h"
>>> +#include "xfs_defer.h"
>>> +#include "xfs_trans.h"
>>> +#include "xfs_trans_priv.h"
>>> +#include "xfs_buf_item.h"
>>> +#include "xfs_attr_item.h"
>>> +#include "xfs_log.h"
>>> +#include "xfs_btree.h"
>>> +#include "xfs_rmap.h"
>>> +#include "xfs_inode.h"
>>> +#include "xfs_icache.h"
>>> +#include "xfs_da_format.h"
>>> +#include "xfs_da_btree.h"
>>> +#include "xfs_attr.h"
>>> +#include "xfs_shared.h"
>>> +#include "xfs_attr_item.h"
>>> +#include "xfs_alloc.h"
>>> +#include "xfs_bmap.h"
>>> +#include "xfs_trace.h"
>>> +#include "libxfs/xfs_da_format.h"
>>> +#include "xfs_inode.h"
>>> +#include "xfs_quota.h"
>>> +#include "xfs_log_priv.h"
>>> +#include "xfs_log_recover.h"
>>> +
>>> +static const struct xfs_item_ops xfs_attri_item_ops;
>>> +static const struct xfs_item_ops xfs_attrd_item_ops;
>>> +
>>> +static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
>>> +{
>>> +    return container_of(lip, struct xfs_attri_log_item, attri_item);
>>> +}
>>> +
>>> +STATIC void
>>> +xfs_attri_item_free(
>>> +    struct xfs_attri_log_item    *attrip)
>>> +{
>>> +    kmem_free(attrip->attri_item.li_lv_shadow);
>>> +    kmem_free(attrip);
>>> +}
>>> +
>>> +/*
>>> + * Freeing the attrip requires that we remove it from the AIL if it has already
>>> + * been placed there. However, the ATTRI may not yet have been placed in the
>>> + * AIL when called by xfs_attri_release() from ATTRD processing due to the
>>> + * ordering of committed vs unpin operations in bulk insert operations. Hence
>>> + * the reference count to ensure only the last caller frees the ATTRI.
>>> + */
>>> +STATIC void
>>> +xfs_attri_release(
>>> +    struct xfs_attri_log_item    *attrip)
>>> +{
>>> +    ASSERT(atomic_read(&attrip->attri_refcount) > 0);
>>> +    if (atomic_dec_and_test(&attrip->attri_refcount)) {
>>> +        xfs_trans_ail_delete(&attrip->attri_item,
>>> +                     SHUTDOWN_LOG_IO_ERROR);
>>> +        xfs_attri_item_free(attrip);
>>> +    }
>>> +}
>>> +
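
The two-reference lifecycle used by xfs_attri_init() and xfs_attri_release() can be illustrated outside the kernel. The following is only a minimal userspace sketch, assuming C11 atomics in place of the kernel's atomic_t; struct intent_item, intent_alloc() and intent_release() are hypothetical names and are not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xfs_attri_log_item. */
struct intent_item {
	atomic_int	refcount;
};

/* Mirrors xfs_attri_init(): the item is born with two references. */
static struct intent_item *intent_alloc(void)
{
	struct intent_item *ip = calloc(1, sizeof(*ip));

	if (ip)
		atomic_init(&ip->refcount, 2);
	return ip;
}

/*
 * Mirrors xfs_attri_release(): drop one reference and free the item
 * when the last one goes away.  Returns true if the item was freed.
 */
static bool intent_release(struct intent_item *ip)
{
	assert(atomic_load(&ip->refcount) > 0);

	/* atomic_fetch_sub() returns the value held before the decrement. */
	if (atomic_fetch_sub(&ip->refcount, 1) == 1) {
		free(ip);
		return true;
	}
	return false;
}
```

Whichever of the two holders (the log via unpin, or the done item) calls intent_release() second is the one that frees the item, so the order of the two calls does not matter.
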
>>> +/*
>>> + * This returns the size in bytes of the attri log format structure, which is
>>> + * logged in the first iovec of an attri item.  The name and value are logged
>>> + * in separate iovecs and are sized in xfs_attri_item_size().
>>> + */
>>> +static inline int
>>> +xfs_attri_item_sizeof(
>>> +    struct xfs_attri_log_item *attrip)
>>> +{
>>> +    return sizeof(struct xfs_attri_log_format);
>>> +}
>>> +
>>> +STATIC void
>>> +xfs_attri_item_size(
>>> +    struct xfs_log_item    *lip,
>>> +    int            *nvecs,
>>> +    int            *nbytes)
>>> +{
>>> +    struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
>>> +
>>> +    *nvecs += 1;
>>> +    *nbytes += xfs_attri_item_sizeof(attrip);
>>> +
>>> +    /* Attr set and remove operations require a name */
>>> +    ASSERT(attrip->attri_name_len > 0);
>>> +
>>> +    *nvecs += 1;
>>> +    *nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
>>> +
>>> +    /*
>>> +     * Set ops can accept a value of 0 len to clear an attr value.  Remove
>>> +     * ops do not need a value at all.  So only account for the value
>>> +     * when it is needed.
>>> +     */
>>> +    if (attrip->attri_value_len > 0) {
>>> +        *nvecs += 1;
>>> +        *nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
>>> +    }
>>> +}
>>> +
>>> +/*
>>> + * This is called to fill in the log iovecs for the given attri log
>>> + * item. We use 1 iovec for the attri_format_item, 1 for the name, and
>>> + * another for the value if it is present.
>>> + */
>>> +STATIC void
>>> +xfs_attri_item_format(
>>> +    struct xfs_log_item    *lip,
>>> +    struct xfs_log_vec    *lv)
>>> +{
>>> +    struct xfs_attri_log_item    *attrip = ATTRI_ITEM(lip);
>>> +    struct xfs_log_iovec        *vecp = NULL;
>>> +
>>> +    attrip->attri_format.alfi_type = XFS_LI_ATTRI;
>>> +    attrip->attri_format.alfi_size = 1;
>>> +
>>> +    /*
>>> +     * This size accounting must be done before copying the attrip into the
>>> +     * iovec.  If we do it after, the wrong size will be recorded to the log
>>> +     * and we trip across assertion checks for bad region sizes later during
>>> +     * the log recovery.
>>> +     */
>>> +
>>> +    ASSERT(attrip->attri_name_len > 0);
>>> +    attrip->attri_format.alfi_size++;
>>> +
>>> +    if (attrip->attri_value_len > 0)
>>> +        attrip->attri_format.alfi_size++;
>>> +
>>> +    xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
>>> +            &attrip->attri_format,
>>> +            xfs_attri_item_sizeof(attrip));
>>> +    if (attrip->attri_name_len > 0)
>>> +        xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
>>> +                attrip->attri_name,
>>> +                ATTR_NVEC_SIZE(attrip->attri_name_len));
>>> +
>>> +    if (attrip->attri_value_len > 0)
>>> +        xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
>>> +                attrip->attri_value,
>>> +                ATTR_NVEC_SIZE(attrip->attri_value_len));
>>> +}
>>> +
>>> +/*
>>> + * The unpin operation is the last place an ATTRI is manipulated in the log. It
>>> + * is either inserted in the AIL or aborted in the event of a log I/O error. In
>>> + * either case, the ATTRI transaction has been successfully committed to make
>>> + * it this far. Therefore, we expect whoever committed the ATTRI to either
>>> + * construct and commit the ATTRD or drop the ATTRD's reference in the event of
>>> + * error. Simply drop the log's ATTRI reference now that the log is done with
>>> + * it.
>>> + */
>>> +STATIC void
>>> +xfs_attri_item_unpin(
>>> +    struct xfs_log_item    *lip,
>>> +    int            remove)
>>> +{
>>> +    struct xfs_attri_log_item    *attrip = ATTRI_ITEM(lip);
>>> +
>>> +    xfs_attri_release(attrip);
>>> +}
>>> +
>>> +
>>> +STATIC void
>>> +xfs_attri_item_release(
>>> +    struct xfs_log_item    *lip)
>>> +{
>>> +    xfs_attri_release(ATTRI_ITEM(lip));
>>> +}
>>> +
>>> +/*
>>> + * Allocate and initialize an attri item
>>> + */
>>> +STATIC struct xfs_attri_log_item *
>>> +xfs_attri_init(
>>> +    struct xfs_mount    *mp)
>>> +
>>> +{
>>> +    struct xfs_attri_log_item    *attrip;
>>> +    uint                size;
>>> +
>>> +    size = (uint)(sizeof(struct xfs_attri_log_item));
>>> +    attrip = kmem_zalloc(size, 0);
>>> +
>>> +    xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
>>> +              &xfs_attri_item_ops);
>>> +    attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
>>> +    atomic_set(&attrip->attri_refcount, 2);
>>> +
>>> +    return attrip;
>>> +}
>>> +
>>> +/*
>>> + * Copy an attr format buffer from the given buf, and into the destination attr
>>> + * format structure.
>>> + */
>>> +STATIC int
>>> +xfs_attri_copy_format(struct xfs_log_iovec *buf,
>>> +              struct xfs_attri_log_format *dst_attr_fmt)
>>> +{
>>> +    struct xfs_attri_log_format *src_attr_fmt = buf->i_addr;
>>> +    uint len = sizeof(struct xfs_attri_log_format);
>>> +
>>> +    if (buf->i_len != len)
>>> +        return -EFSCORRUPTED;
>>> +
>>> +    memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
>>> +    return 0;
>>> +}
>>> +
>>> +static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
>>> +{
>>> +    return container_of(lip, struct xfs_attrd_log_item, attrd_item);
>>> +}
>>> +
>>> +STATIC void
>>> +xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
>>> +{
>>> +    kmem_free(attrdp->attrd_item.li_lv_shadow);
>>> +    kmem_free(attrdp);
>>> +}
>>> +
>>> +/*
>>> + * This returns the size in bytes needed to log the given attrd item.
>>> + * We only need 1 iovec for an attrd item.  It just logs the attrd_log_format
>>> + * structure.
>>> + */
>>> +static inline int
>>> +xfs_attrd_item_sizeof(
>>> +    struct xfs_attrd_log_item *attrdp)
>>> +{
>>> +    return sizeof(struct xfs_attrd_log_format);
>>> +}
>>> +
>>> +STATIC void
>>> +xfs_attrd_item_size(
>>> +    struct xfs_log_item    *lip,
>>> +    int            *nvecs,
>>> +    int            *nbytes)
>>> +{
>>> +    struct xfs_attrd_log_item    *attrdp = ATTRD_ITEM(lip);
>>> +    *nvecs += 1;
>>> +    *nbytes += xfs_attrd_item_sizeof(attrdp);
>>> +}
>>> +
>>> +/*
>>> + * This is called to fill in the log iovecs for the given attrd log item. We use
>>> + * only 1 iovec for the attrd_format, and we point that at the attr_log_format
>>> + * structure embedded in the attrd item.
>>> + */
>>> +STATIC void
>>> +xfs_attrd_item_format(
>>> +    struct xfs_log_item    *lip,
>>> +    struct xfs_log_vec    *lv)
>>> +{
>>> +    struct xfs_attrd_log_item    *attrdp = ATTRD_ITEM(lip);
>>> +    struct xfs_log_iovec        *vecp = NULL;
>>> +
>>> +    attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
>>> +    attrdp->attrd_format.alfd_size = 1;
>>> +
>>> +    xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
>>> +            &attrdp->attrd_format, xfs_attrd_item_sizeof(attrdp));
>>> +}
>>> +
>>> +/*
>>> + * The ATTRD is either committed or aborted if the transaction is cancelled. If
>>> + * the transaction is cancelled, drop our reference to the ATTRI and free the
>>> + * ATTRD.
>>> + */
>>> +STATIC void
>>> +xfs_attrd_item_release(
>>> +    struct xfs_log_item     *lip)
>>> +{
>>> +    struct xfs_attrd_log_item *attrdp = ATTRD_ITEM(lip);
>>> +    xfs_attri_release(attrdp->attrd_attrip);
>>> +    xfs_attrd_item_free(attrdp);
>>> +}
>>> +
>>> +/*
>>> + * Perform an attr operation and log it to the ATTRD.  An attr operation
>>> + * may be a set or a remove.  Note that the transaction is marked dirty
>>> + * regardless of whether the operation succeeds or fails to support the
>>> + * ATTRI/ATTRD lifecycle rules.
>>> + */
>>> +int
>>> +xfs_trans_attr(
>>> +    struct xfs_delattr_context    *dac,
>>> +    struct xfs_attrd_log_item    *attrdp,
>>> +    struct xfs_buf            **leaf_bp,
>>> +    uint32_t            op_flags)
>>> +{
>>> +    struct xfs_da_args        *args = dac->da_args;
>>> +    int                error;
>>> +
>>> +    error = xfs_qm_dqattach_locked(args->dp, 0);
>>> +    if (error)
>>> +        return error;
>>> +
>>> +    switch (op_flags) {
>>> +    case XFS_ATTR_OP_FLAGS_SET:
>>> +        args->op_flags |= XFS_DA_OP_ADDNAME;
>>> +        error = xfs_attr_set_iter(dac, leaf_bp);
>>> +        break;
>>> +    case XFS_ATTR_OP_FLAGS_REMOVE:
>>> +        ASSERT(XFS_IFORK_Q((args->dp)));
>>> +        error = xfs_attr_remove_iter(dac);
>>> +        break;
>>> +    default:
>>> +        error = -EFSCORRUPTED;
>>> +        break;
>>> +    }
>>> +
>>> +    /*
>>> +     * Mark the transaction dirty, even on error. This ensures the
>>> +     * transaction is aborted, which:
>>> +     *
>>> +     * 1.) releases the ATTRI and frees the ATTRD
>>> +     * 2.) shuts down the filesystem
>>> +     */
>>> +    args->trans->t_flags |= XFS_TRANS_DIRTY;
>>> +    set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
>>> +
>>> +    return error;
>>> +}
>>> +
>>> +/* Log an attr to the intent item. */
>>> +STATIC void
>>> +xfs_attr_log_item(
>>> +    struct xfs_trans        *tp,
>>> +    struct xfs_attri_log_item    *attrip,
>>> +    struct xfs_attr_item        *attr)
>>> +{
>>> +    struct xfs_attri_log_format    *attrp;
>>> +    char                *name_value;
>>> +
>>> +    name_value = ((char *)attr) + sizeof(struct xfs_attr_item);
>>> +
>>> +    tp->t_flags |= XFS_TRANS_DIRTY;
>>> +    set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
>>> +
>>> +    /*
>>> +     * At this point the xfs_attr_item has been constructed, and we've
>>> +     * created the log intent. Fill in the attri log item and log format
>>> +     * structure with fields from this xfs_attr_item
>>> +     */
>>> +    attrp = &attrip->attri_format;
>>> +    attrp->alfi_ino = attr->xattri_ip->i_ino;
>>> +    attrp->alfi_op_flags = attr->xattri_op_flags;
>>> +    attrp->alfi_value_len = attr->xattri_value_len;
>>> +    attrp->alfi_name_len = attr->xattri_name_len;
>>> +    attrp->alfi_attr_flags = attr->xattri_flags;
>>> +
>>> +    attrip->attri_name = name_value;
>>> +    attrip->attri_value = &name_value[attr->xattri_name_len];
>>> +    attrip->attri_name_len = attr->xattri_name_len;
>>> +    attrip->attri_value_len = attr->xattri_value_len;
>>> +}
>>> +
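
xfs_attr_log_item() above, and xfs_attr_finish_item() below, both find the name immediately after the xfs_attr_item and the value immediately after the name. As an illustration of that packing only, here is a minimal userspace sketch; struct attr_item and the three helpers are hypothetical, and the allocation shown is an assumption since the real allocator is not part of this hunk.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct xfs_attr_item. */
struct attr_item {
	size_t	name_len;
	size_t	value_len;
	/* name bytes, then value bytes, follow the struct in memory */
};

/* Allocate header, name and value as one contiguous buffer. */
static struct attr_item *attr_item_alloc(const void *name, size_t name_len,
					 const void *value, size_t value_len)
{
	struct attr_item *attr = malloc(sizeof(*attr) + name_len + value_len);
	char *name_value;

	if (!attr)
		return NULL;

	attr->name_len = name_len;
	attr->value_len = value_len;

	/* Same pointer arithmetic as the patch: name starts after the struct. */
	name_value = (char *)attr + sizeof(*attr);
	memcpy(name_value, name, name_len);
	if (value_len > 0)
		memcpy(name_value + name_len, value, value_len);
	return attr;
}

static const char *attr_item_name(const struct attr_item *attr)
{
	return (const char *)attr + sizeof(*attr);
}

/* The value starts right after the name, with no padding in between. */
static const char *attr_item_value(const struct attr_item *attr)
{
	return attr_item_name(attr) + attr->name_len;
}
```

Because the name and value live inside the same allocation as the item, freeing the item also invalidates any pointers that were derived from it, which is why the intent's attri_name and attri_value are cleared before the item is freed.
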
>>> +/* Get an ATTRI. */
>>> +static struct xfs_log_item *
>>> +xfs_attr_create_intent(
>>> +    struct xfs_trans        *tp,
>>> +    struct list_head        *items,
>>> +    unsigned int            count,
>>> +    bool                sort)
>>> +{
>>> +    struct xfs_mount        *mp = tp->t_mountp;
>>> +    struct xfs_attri_log_item    *attrip = xfs_attri_init(mp);
>>> +    struct xfs_attr_item        *attr;
>>> +
>>> +    ASSERT(count == 1);
>>> +
>>> +    xfs_trans_add_item(tp, &attrip->attri_item);
>>> +    list_for_each_entry(attr, items, xattri_list)
>>> +        xfs_attr_log_item(tp, attrip, attr);
>>> +    return &attrip->attri_item;
>>> +}
>>> +
>>> +/* Process an attr. */
>>> +STATIC int
>>> +xfs_attr_finish_item(
>>> +    struct xfs_trans        *tp,
>>> +    struct xfs_log_item        *done,
>>> +    struct list_head        *item,
>>> +    struct xfs_btree_cur        **state)
>>> +{
>>> +    struct xfs_attr_item        *attr;
>>> +    int                error;
>>> +    int                local;
>>> +    struct xfs_delattr_context    *dac;
>>> +    struct xfs_da_args        *args;
>>> +    struct xfs_attrd_log_item    *attrdp;
>>> +    struct xfs_attri_log_item    *attrip;
>>> +
>>> +    attr = container_of(item, struct xfs_attr_item, xattri_list);
>>> +    dac = &attr->xattri_dac;
>>> +    args = &attr->xattri_args;
>>> +
>>> +    if (!(dac->flags & XFS_DAC_DELAYED_OP_INIT)) {
>>> +        /* Only need to initialize args context once */
>>> +        memset(args, 0, sizeof(*args));
>>> +        args->geo = attr->xattri_ip->i_mount->m_attr_geo;
>>> +        args->whichfork = XFS_ATTR_FORK;
>>> +        args->dp = attr->xattri_ip;
>>> +        args->name = ((const unsigned char *)attr) +
>>> +                  sizeof(struct xfs_attr_item);
>>> +        args->namelen = attr->xattri_name_len;
>>> +        args->attr_filter = attr->xattri_flags;
>>> +        args->hashval = xfs_da_hashname(args->name, args->namelen);
>>> +        args->value = (void *)&args->name[attr->xattri_name_len];
>>> +        args->valuelen = attr->xattri_value_len;
>>> +        args->op_flags = XFS_DA_OP_OKNOENT;
>>> +
>>> +        /* must match existing transaction block res */
>>> +        args->total = xfs_attr_calc_size(args, &local);
>>> +
>>> +        memset(dac, 0, sizeof(struct xfs_delattr_context));
>>> +        dac->flags |= XFS_DAC_DELAYED_OP_INIT;
>>> +        dac->da_args = args;
>>> +    }
>>> +
>>> +    /*
>>> +     * Always reset trans after EAGAIN cycle
>>> +     * since the transaction is new
>>> +     */
>>> +    args->trans = tp;
>>> +
>>> +    error = xfs_trans_attr(dac, ATTRD_ITEM(done), &dac->leaf_bp,
>>> +                   attr->xattri_op_flags);
>>> +    /*
>>> +     * The attrip refers to xfs_attr_item memory to log the name and value
>>> +     * with the intent item. This already occurred when the intent was
>>> +     * committed so these fields are no longer accessed. Clear them out of
>>> +     * caution since we're about to free the xfs_attr_item.
>>> +     */
>>> +    attrdp = (struct xfs_attrd_log_item *)done;
>>> +    attrip = attrdp->attrd_attrip;
>>> +    attrip->attri_name = NULL;
>>> +    attrip->attri_value = NULL;
>>> +
>>> +    if (error != -EAGAIN)
>>> +        kmem_free(attr);
>>> +
>>> +    return error;
>>> +}
>>> +
>>> +/* Abort all pending ATTRs. */
>>> +STATIC void
>>> +xfs_attr_abort_intent(
>>> +    struct xfs_log_item        *intent)
>>> +{
>>> +    xfs_attri_release(ATTRI_ITEM(intent));
>>> +}
>>> +
>>> +/* Cancel an attr */
>>> +STATIC void
>>> +xfs_attr_cancel_item(
>>> +    struct list_head        *item)
>>> +{
>>> +    struct xfs_attr_item        *attr;
>>> +
>>> +    attr = container_of(item, struct xfs_attr_item, xattri_list);
>>> +    kmem_free(attr);
>>> +}
>>> +
>>> +/*
>>> + * The ATTRI is logged only once and cannot be moved in the log, so simply
>>> + * return the lsn at which it's been logged.
>>> + */
>>> +STATIC xfs_lsn_t
>>> +xfs_attri_item_committed(
>>> +    struct xfs_log_item    *lip,
>>> +    xfs_lsn_t        lsn)
>>> +{
>>> +    return lsn;
>>> +}
>>> +
>>> +STATIC void
>>> +xfs_attri_item_committing(
>>> +    struct xfs_log_item    *lip,
>>> +    xfs_lsn_t        lsn)
>>> +{
>>> +}
>>> +
>>> +STATIC bool
>>> +xfs_attri_item_match(
>>> +    struct xfs_log_item    *lip,
>>> +    uint64_t        intent_id)
>>> +{
>>> +    return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
>>> +}
>>> +
>>> +/*
>>> + * When the attrd item is committed to disk, all we need to do is delete our
>>> + * reference to our partner attri item and then free ourselves. Since we're
>>> + * freeing ourselves we must return -1 to keep the transaction code from
>>> + * further referencing this item.
>>> + */
>>> +STATIC xfs_lsn_t
>>> +xfs_attrd_item_committed(
>>> +    struct xfs_log_item    *lip,
>>> +    xfs_lsn_t        lsn)
>>> +{
>>> +    struct xfs_attrd_log_item    *attrdp = ATTRD_ITEM(lip);
>>> +
>>> +    /*
>>> +     * Drop the ATTRI reference regardless of whether the ATTRD has been
>>> +     * aborted. Once the ATTRD transaction is constructed, it is the sole
>>> +     * responsibility of the ATTRD to release the ATTRI (even if the ATTRI
>>> +     * is aborted due to log I/O error).
>>> +     */
>>> +    xfs_attri_release(attrdp->attrd_attrip);
>>> +    xfs_attrd_item_free(attrdp);
>>> +
>>> +    return NULLCOMMITLSN;
>>> +}
>>> +
>>> +STATIC void
>>> +xfs_attrd_item_committing(
>>> +    struct xfs_log_item    *lip,
>>> +    xfs_lsn_t        lsn)
>>> +{
>>> +}
>>> +
>>> +
>>> +/*
>>> + * Allocate and initialize an attrd item
>>> + */
>>> +struct xfs_attrd_log_item *
>>> +xfs_attrd_init(
>>> +    struct xfs_mount        *mp,
>>> +    struct xfs_attri_log_item    *attrip)
>>> +
>>> +{
>>> +    struct xfs_attrd_log_item    *attrdp;
>>> +    uint                size;
>>> +
>>> +    size = (uint)(sizeof(struct xfs_attrd_log_item));
>>> +    attrdp = kmem_zalloc(size, 0);
>>> +    memset(attrdp, 0, size);
>>> +
>>> +    xfs_log_item_init(mp, &attrdp->attrd_item, XFS_LI_ATTRD,
>>> +              &xfs_attrd_item_ops);
>>> +    attrdp->attrd_attrip = attrip;
>>> +    attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
>>> +
>>> +    return attrdp;
>>> +}
>>> +
>>> +/*
>>> + * This routine is called to allocate an "attr done" log item.
>>> + */
>>> +struct xfs_attrd_log_item *
>>> +xfs_trans_get_attrd(struct xfs_trans        *tp,
>>> +          struct xfs_attri_log_item    *attrip)
>>> +{
>>> +    struct xfs_attrd_log_item        *attrdp;
>>> +
>>> +    ASSERT(tp != NULL);
>>> +
>>> +    attrdp = xfs_attrd_init(tp->t_mountp, attrip);
>>> +    ASSERT(attrdp != NULL);
>>> +
>>> +    xfs_trans_add_item(tp, &attrdp->attrd_item);
>>> +    return attrdp;
>>> +}
>>> +
>>> +static const struct xfs_item_ops xfs_attrd_item_ops = {
>>> +    .iop_size    = xfs_attrd_item_size,
>>> +    .iop_format    = xfs_attrd_item_format,
>>> +    .iop_release    = xfs_attrd_item_release,
>>> +    .iop_committing    = xfs_attrd_item_committing,
>>> +    .iop_committed    = xfs_attrd_item_committed,
>>> +};
>>> +
>>> +
>>> +/* Get an ATTRD so we can process all the attrs. */
>>> +static struct xfs_log_item *
>>> +xfs_attr_create_done(
>>> +    struct xfs_trans        *tp,
>>> +    struct xfs_log_item        *intent,
>>> +    unsigned int            count)
>>> +{
>>> +    return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
>>> +}
>>> +
>>> +const struct xfs_defer_op_type xfs_attr_defer_type = {
>>> +    .max_items    = 1,
>>> +    .create_intent    = xfs_attr_create_intent,
>>> +    .abort_intent    = xfs_attr_abort_intent,
>>> +    .create_done    = xfs_attr_create_done,
>>> +    .finish_item    = xfs_attr_finish_item,
>>> +    .cancel_item    = xfs_attr_cancel_item,
>>> +};
>>> +
>>> +/*
>>> + * Process an attr intent item that was recovered from the log.  We need to
>>> + * replay the set or remove operation that it describes.
>>> + */
>>> +STATIC int
>>> +xfs_attri_item_recover(
>>> +    struct xfs_log_item        *lip,
>>> +    struct xfs_trans        *parent_tp)
>>> +{
>>> +    struct xfs_attri_log_item    *attrip = ATTRI_ITEM(lip);
>>> +    struct xfs_mount        *mp = parent_tp->t_mountp;
>>> +    struct xfs_inode        *ip;
>>> +    struct xfs_attrd_log_item    *attrdp;
>>> +    struct xfs_da_args        args;
>>> +    struct xfs_attri_log_format    *attrp;
>>> +    struct xfs_trans_res        tres;
>>> +    int                local;
>>> +    int                error, err2 = 0;
>>> +    int                rsvd = 0;
>>> +    struct xfs_buf            *leaf_bp = NULL;
>>> +    struct xfs_delattr_context    dac = {
>>> +        .da_args    = &args,
>>> +    };
>>> +
>>> +    /*
>>> +     * First check the validity of the attr described by the ATTRI.  If any
>>> +     * are bad, then assume that all are bad and just toss the ATTRI.
>>> +     */
>>> +    attrp = &attrip->attri_format;
>>> +    if (!(attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET ||
>>> +          attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE) ||
>>> +        (attrp->alfi_value_len > XATTR_SIZE_MAX) ||
>>> +        (attrp->alfi_name_len > XATTR_NAME_MAX) ||
>>> +        (attrp->alfi_name_len == 0)) {
>>> +        /*
>>> +         * This will pull the ATTRI from the AIL and free the memory
>>> +         * associated with it.
>>> +         */
>>> +        xfs_attri_release(attrip);
>>> +        return -EFSCORRUPTED;
>>> +    }
>>> +
>>> +    error = xfs_iget(mp, 0, attrp->alfi_ino, 0, 0, &ip);
>>> +    if (error)
>>> +        return error;
>>> +
>>> +    memset(&args, 0, sizeof(args));
>>> +    args.geo = ip->i_mount->m_attr_geo;
>>> +    args.whichfork = XFS_ATTR_FORK;
>>> +    args.dp = ip;
>>> +    args.name = attrip->attri_name;
>>> +    args.namelen = attrp->alfi_name_len;
>>> +    args.attr_filter = attrp->alfi_attr_flags;
>>> +    args.hashval = xfs_da_hashname(attrip->attri_name,
>>> +                    attrp->alfi_name_len);
>>> +    args.value = attrip->attri_value;
>>> +    args.valuelen = attrp->alfi_value_len;
>>> +    args.op_flags = XFS_DA_OP_OKNOENT;
>>> +    args.total = xfs_attr_calc_size(&args, &local);
>>> +
>>> +    tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres +
>>> +            M_RES(mp)->tr_attrsetrt.tr_logres * args.total;
>>> +    tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
>>> +    tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
>>> +
>>> +    error = xfs_trans_alloc(mp, &tres, args.total,  0,
>>> +                rsvd ? XFS_TRANS_RESERVE : 0, &args.trans);
>>> +    if (error)
>>> +        goto out_rele;
>>> +    attrdp = xfs_trans_get_attrd(args.trans, attrip);
>>> +
>>> +    xfs_ilock(ip, XFS_ILOCK_EXCL);
>>> +
>>> +    xfs_trans_ijoin(args.trans, ip, 0);
>>> +
>>> +    do {
>>> +        error = xfs_trans_attr(&dac, attrdp, &leaf_bp,
>>> +                       attrp->alfi_op_flags);
>>> +        if (error && error != -EAGAIN)
>>> +            goto abort_error;
>>> +
>>> +        xfs_trans_log_inode(args.trans, ip,
>>> +                XFS_ILOG_CORE | XFS_ILOG_ADATA);
>>> +
>>> +        err2 = xfs_trans_roll(&args.trans);
>>> +        if (err2) {
>>> +            error = err2;
>>> +            goto abort_error;
>>> +        }
>>> +
>>> +        /* Rejoin inode and leaf if needed */
>>> +        xfs_trans_ijoin(args.trans, ip, 0);
>>> +        if (leaf_bp) {
>>> +            xfs_trans_bjoin(args.trans, leaf_bp);
>>> +            xfs_trans_bhold(args.trans, leaf_bp);
>>> +        }
>>> +
>>> +    } while (error == -EAGAIN);
>>> +
>>> +    error = xfs_trans_commit(args.trans);
>>> +    if (error)
>>> +        goto abort_error;
>>> +
>>> +    xfs_iunlock(ip, XFS_ILOCK_EXCL);
>>> +    xfs_irele(ip);
>>> +    return error;
>>> +
>>> +abort_error:
>>> +    xfs_trans_cancel(args.trans);
>>> +    xfs_iunlock(ip, XFS_ILOCK_EXCL);
>>> +out_rele:
>>> +    xfs_irele(ip);
>>> +    return error;
>>> +}
>>> +
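
The do/while loop in xfs_attri_item_recover() follows the pattern described in the cover letter: the attr routines never roll the transaction themselves, they return -EAGAIN and leave that to the caller. A minimal userspace sketch of that control flow follows; struct op_ctx, op_step(), op_roll() and op_run() are hypothetical stand-ins for the delayed attr context, xfs_trans_attr(), xfs_trans_roll() and the recovery loop, and no real transaction handling is modelled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical multi-step operation state. */
struct op_ctx {
	int	steps_left;	/* work remaining */
	int	rolls;		/* number of transaction rolls performed */
};

/* Do one step; return -EAGAIN while more steps remain, 0 when done. */
static int op_step(struct op_ctx *ctx)
{
	assert(ctx->steps_left >= 0);

	if (ctx->steps_left > 0)
		ctx->steps_left--;
	return ctx->steps_left > 0 ? -EAGAIN : 0;
}

/* Stands in for rolling the transaction; always succeeds in this sketch. */
static int op_roll(struct op_ctx *ctx)
{
	ctx->rolls++;
	return 0;
}

/* Caller-driven loop: the callee only asks to be called again. */
static int op_run(struct op_ctx *ctx)
{
	int error, err2;

	do {
		error = op_step(ctx);
		if (error && error != -EAGAIN)
			return error;

		err2 = op_roll(ctx);
		if (err2)
			return err2;
	} while (error == -EAGAIN);

	return 0;
}
```

As in the loop above, the roll also happens after the final successful step, so an operation that needs three steps performs three rolls before the caller commits.
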
>>> +static const struct xfs_item_ops xfs_attri_item_ops = {
>>> +    .iop_size    = xfs_attri_item_size,
>>> +    .iop_format    = xfs_attri_item_format,
>>> +    .iop_unpin    = xfs_attri_item_unpin,
>>> +    .iop_committed    = xfs_attri_item_committed,
>>> +    .iop_committing = xfs_attri_item_committing,
>>> +    .iop_release    = xfs_attri_item_release,
>>> +    .iop_recover    = xfs_attri_item_recover,
>>> +    .iop_match    = xfs_attri_item_match,
>>> +};
>>> +
>>> +
>>> +
>>> +STATIC int
>>> +xlog_recover_attri_commit_pass2(
>>> +    struct xlog                     *log,
>>> +    struct list_head        *buffer_list,
>>> +    struct xlog_recover_item        *item,
>>> +    xfs_lsn_t                       lsn)
>>> +{
>>> +    int                             error;
>>> +    struct xfs_mount                *mp = log->l_mp;
>>> +    struct xfs_attri_log_item       *attrip;
>>> +    struct xfs_attri_log_format     *attri_formatp;
>>> +    char                *name = NULL;
>>> +    char                *value = NULL;
>>> +    int                region = 0;
>>> +
>>> +    attri_formatp = item->ri_buf[region].i_addr;
>>> +
>>> +    attrip = xfs_attri_init(mp);
>>> +    error = xfs_attri_copy_format(&item->ri_buf[region],
>>> +                      &attrip->attri_format);
>>> +    if (error) {
>>> +        xfs_attri_item_free(attrip);
>>> +        return error;
>>> +    }
>>> +
>>> +    attrip->attri_name_len = attri_formatp->alfi_name_len;
>>> +    attrip->attri_value_len = attri_formatp->alfi_value_len;
>>> +    attrip = kmem_realloc(attrip, sizeof(struct xfs_attri_log_item) +
>>> +                  attrip->attri_name_len + attrip->attri_value_len,
>>> +                  0);
>>> +
>>> +    ASSERT(attrip->attri_name_len > 0);
>>> +    region++;
>>> +    name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
>>> +    memcpy(name, item->ri_buf[region].i_addr,
>>> +           attrip->attri_name_len);
>>> +    attrip->attri_name = name;
>>> +
>>> +    if (attrip->attri_value_len > 0) {
>>> +        region++;
>>> +        value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
>>> +            attrip->attri_name_len;
>>> +        memcpy(value, item->ri_buf[region].i_addr,
>>> +            attrip->attri_value_len);
>>> +        attrip->attri_value = value;
>>> +    }
>>> +
>>> +    /*
>>> +     * The ATTRI has two references. One for the ATTRD and one for ATTRI to
>>> +     * ensure it makes it into the AIL. Insert the ATTRI into the AIL
>>> +     * directly and drop the ATTRI reference. Note that
>>> +     * xfs_trans_ail_update() drops the AIL lock.
>>> +     */
>>> +    xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
>>> +    xfs_attri_release(attrip);
>>> +    return 0;
>>> +}
>>> +
>>> +const struct xlog_recover_item_ops xlog_attri_item_ops = {
>>> +    .item_type    = XFS_LI_ATTRI,
>>> +    .commit_pass2    = xlog_recover_attri_commit_pass2,
>>> +};
>>> +
>>> +/*
>>> + * This routine is called when an ATTRD format structure is found in a committed
>>> + * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
>>> + * it was still in the log. To do this it searches the AIL for the ATTRI with
>>> + * an id equal to that in the ATTRD format structure. If we find it we drop
>>> + * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
>>> + */
>>> +STATIC int
>>> +xlog_recover_attrd_commit_pass2(
>>> +    struct xlog            *log,
>>> +    struct list_head        *buffer_list,
>>> +    struct xlog_recover_item    *item,
>>> +    xfs_lsn_t            lsn)
>>> +{
>>> +    struct xfs_attrd_log_format    *attrd_formatp;
>>> +
>>> +    attrd_formatp = item->ri_buf[0].i_addr;
>>> +    ASSERT((item->ri_buf[0].i_len ==
>>> +                (sizeof(struct xfs_attrd_log_format))));
>>> +
>>> +    xlog_recover_release_intent(log, XFS_LI_ATTRI,
>>> +                    attrd_formatp->alfd_alf_id);
>>> +    return 0;
>>> +}
>>> +
>>> +const struct xlog_recover_item_ops xlog_attrd_item_ops = {
>>> +    .item_type    = XFS_LI_ATTRD,
>>> +    .commit_pass2    = xlog_recover_attrd_commit_pass2,
>>> +};
>>> diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
>>> new file mode 100644
>>> index 0000000..7dd2572
>>> --- /dev/null
>>> +++ b/fs/xfs/xfs_attr_item.h
>>> @@ -0,0 +1,76 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-or-later
>>> + *
>>> + * Copyright (C) 2019 Oracle.  All Rights Reserved.
>>> + * Author: Allison Collins <allison.henderson@oracle.com>
>>> + */
>>> +#ifndef    __XFS_ATTR_ITEM_H__
>>> +#define    __XFS_ATTR_ITEM_H__
>>> +
>>> +/* kernel only ATTRI/ATTRD definitions */
>>> +
>>> +struct xfs_mount;
>>> +struct kmem_zone;
>>> +
>>> +/*
>>> + * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
>>> + */
>>> +#define    XFS_ATTRI_RECOVERED    1
>>> +
>>> +
>>> +/* iovec length must be 32-bit aligned */
>>> +#define ATTR_NVEC_SIZE(size) (size == sizeof(int32_t) ? sizeof(int32_t) : \
>>> +                size + sizeof(int32_t) - \
>>> +                (size % sizeof(int32_t)))
>>> +
>>> +/*
>>> + * This is the "attr intention" log item.  It is used to log the fact that some
>>> + * attribute operations need to be processed.  An operation is currently either
>>> + * a set or remove.  Set or remove operations are described by the xfs_attr_item
>>> + * which may be logged to this intent.  Intents are used in conjunction with the
>>> + * "attr done" log item described below.
>>> + *
>>> + * The ATTRI is reference counted so that it is not freed prior to both the
>>> + * ATTRI and ATTRD being committed and unpinned. This ensures the ATTRI is
>>> + * inserted into the AIL even in the event of out of order ATTRI/ATTRD
>>> + * processing. In other words, an ATTRI is born with two references:
>>> + *
>>> + *      1.) an ATTRI held reference to track ATTRI AIL insertion
>>> + *      2.) an ATTRD held reference to track ATTRD commit
>>> + *
>>> + * On allocation, both references are the responsibility of the caller. Once the
>>> + * ATTRI is added to and dirtied in a transaction, ownership of reference one
>>> + * transfers to the transaction. The reference is dropped once the ATTRI is
>>> + * inserted to the AIL or in the event of failure along the way (e.g., commit
>>> + * failure, log I/O error, etc.). Note that the caller remains responsible for
>>> + * the ATTRD reference under all circumstances to this point. The caller has no
>>> + * means to detect failure once the transaction is committed, however.
>>> + * Therefore, an ATTRD is required after this point, even in the event of
>>> + * unrelated failure.
>>> + *
>>> + * Once an ATTRD is allocated and dirtied in a transaction, reference two
>>> + * transfers to the transaction. The ATTRD reference is dropped once it reaches
>>> + * the unpin handler. Similar to the ATTRI, the reference also drops in the
>>> + * event of commit failure or log I/O errors. Note that the ATTRD is not
>>> + * inserted in the AIL, so at this point both the ATTRI and ATTRD are freed.
>>> + */
>>> +struct xfs_attri_log_item {
>>> +    struct xfs_log_item        attri_item;
>>> +    atomic_t            attri_refcount;
>>> +    int                attri_name_len;
>>> +    void                *attri_name;
>>> +    int                attri_value_len;
>>> +    void                *attri_value;
>>> +    struct xfs_attri_log_format    attri_format;
>>> +};
>>> +
>>> +/*
>>> + * This is the "attr done" log item.  It is used to log the fact that some attrs
>>> + * earlier mentioned in an attri item have been freed.
>>> + */
>>> +struct xfs_attrd_log_item {
>>> +    struct xfs_attri_log_item    *attrd_attrip;
>>> +    struct xfs_log_item        attrd_item;
>>> +    struct xfs_attrd_log_format    attrd_format;
>>> +};
>>> +
>>> +#endif    /* __XFS_ATTR_ITEM_H__ */
>>> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
>>> index 50f922c..166b680 100644
>>> --- a/fs/xfs/xfs_attr_list.c
>>> +++ b/fs/xfs/xfs_attr_list.c
>>> @@ -15,6 +15,7 @@
>>>   #include "xfs_inode.h"
>>>   #include "xfs_trans.h"
>>>   #include "xfs_bmap.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_attr_sf.h"
>>>   #include "xfs_attr_leaf.h"
>>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>>> index 6f22a66..edc05af 100644
>>> --- a/fs/xfs/xfs_ioctl.c
>>> +++ b/fs/xfs/xfs_ioctl.c
>>> @@ -15,6 +15,8 @@
>>>   #include "xfs_iwalk.h"
>>>   #include "xfs_itable.h"
>>>   #include "xfs_error.h"
>>> +#include "xfs_da_format.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_bmap.h"
>>>   #include "xfs_bmap_util.h"
>>> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
>>> index c1771e7..62e1534 100644
>>> --- a/fs/xfs/xfs_ioctl32.c
>>> +++ b/fs/xfs/xfs_ioctl32.c
>>> @@ -17,6 +17,8 @@
>>>   #include "xfs_itable.h"
>>>   #include "xfs_fsops.h"
>>>   #include "xfs_rtalloc.h"
>>> +#include "xfs_da_format.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_ioctl.h"
>>>   #include "xfs_ioctl32.h"
>>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>>> index 80a13c8..fe60da1 100644
>>> --- a/fs/xfs/xfs_iops.c
>>> +++ b/fs/xfs/xfs_iops.c
>>> @@ -13,6 +13,8 @@
>>>   #include "xfs_inode.h"
>>>   #include "xfs_acl.h"
>>>   #include "xfs_quota.h"
>>> +#include "xfs_da_format.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_trans.h"
>>>   #include "xfs_trace.h"
>>> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
>>> index ad0c69ee..6405ce33 100644
>>> --- a/fs/xfs/xfs_log.c
>>> +++ b/fs/xfs/xfs_log.c
>>> @@ -1975,6 +1975,10 @@ xlog_print_tic_res(
>>>           REG_TYPE_STR(CUD_FORMAT, "cud_format"),
>>>           REG_TYPE_STR(BUI_FORMAT, "bui_format"),
>>>           REG_TYPE_STR(BUD_FORMAT, "bud_format"),
>>> +        REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
>>> +        REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
>>> +        REG_TYPE_STR(ATTR_NAME, "attr_name"),
>>> +        REG_TYPE_STR(ATTR_VALUE, "attr_value"),
>>>       };
>>>       BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
>>>   #undef REG_TYPE_STR
>>> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
>>> index e2ec91b..ec31db0 100644
>>> --- a/fs/xfs/xfs_log_recover.c
>>> +++ b/fs/xfs/xfs_log_recover.c
>>> @@ -1811,6 +1811,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
>>>       &xlog_cud_item_ops,
>>>       &xlog_bui_item_ops,
>>>       &xlog_bud_item_ops,
>>> +    &xlog_attri_item_ops,
>>> +    &xlog_attrd_item_ops,
>>>   };
>>>   static const struct xlog_recover_item_ops *
>>> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
>>> index 5f04d8a..0597a04 100644
>>> --- a/fs/xfs/xfs_ondisk.h
>>> +++ b/fs/xfs/xfs_ondisk.h
>>> @@ -126,6 +126,8 @@ xfs_check_ondisk_structs(void)
>>>       XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,    56);
>>>       XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,    20);
>>>       XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,        16);
>>> +    XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,    40);
>>> +    XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,    16);
>>>       /*
>>>        * The v5 superblock format extended several v4 header structures with
>>> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
>>> index bca48b3..9b0c790 100644
>>> --- a/fs/xfs/xfs_xattr.c
>>> +++ b/fs/xfs/xfs_xattr.c
>>> @@ -10,6 +10,7 @@
>>>   #include "xfs_log_format.h"
>>>   #include "xfs_da_format.h"
>>>   #include "xfs_inode.h"
>>> +#include "xfs_da_btree.h"
>>>   #include "xfs_attr.h"
>>>   #include "xfs_acl.h"
>>>   #include "xfs_da_btree.h"
>>> -- 
>>> 2.7.4
>>>


* Re: [PATCH v12 1/8] xfs: Add delay ready attr remove routines
  2020-09-01 18:31         ` Darrick J. Wong
@ 2020-09-02 12:22           ` Brian Foster
  2020-09-04 23:03             ` Allison Collins
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Foster @ 2020-09-02 12:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Collins, linux-xfs

On Tue, Sep 01, 2020 at 11:31:34AM -0700, Darrick J. Wong wrote:
> On Tue, Sep 01, 2020 at 02:07:41PM -0400, Brian Foster wrote:
> > On Tue, Sep 01, 2020 at 10:20:21AM -0700, Darrick J. Wong wrote:
> > > On Tue, Sep 01, 2020 at 01:00:20PM -0400, Brian Foster wrote:
> > > > On Wed, Aug 26, 2020 at 05:35:11PM -0700, Allison Collins wrote:
> > > > > This patch modifies the attr remove routines to be delay ready. This
> > > > > means they no longer roll or commit transactions, but instead return
> > > > > -EAGAIN to have the calling routine roll and refresh the transaction. In
> > > > > this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> > > > > uses a sort of state machine like switch to keep track of where it was
> > > > > when EAGAIN was returned. xfs_attr_node_removename has also been
> > > > > modified to use the switch, and a new version of xfs_attr_remove_args
> > > > > consists of a simple loop to refresh the transaction until the operation
> > > > > is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > > > > transaction wherever the existing code used to.
> > > > > 
> > > > > Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> > > > > version __xfs_attr_rmtval_remove. We will rename
> > > > > __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> > > > > done.
> > > > > 
> > > > > xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > > > > during a rename).  For reasons of perserving existing function, we
> > > > 
> > > > Nit:				preserving
> > > > 
> > > > > modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> > > > > set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> > > > > the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> > > > > used and will be removed.
> > > > > 
> > > > > This patch also adds a new struct xfs_delattr_context, which we will use
> > > > > to keep track of the current state of an attribute operation. The new
> > > > > xfs_delattr_state enum is used to track various operations that are in
> > > > > progress so that we know not to repeat them, and resume where we left
> > > > > off before EAGAIN was returned to cycle out the transaction. Other
> > > > > members take the place of local variables that need to retain their
> > > > > values across multiple function recalls.  See xfs_attr.h for a more
> > > > > detailed diagram of the states.
> > > > > 
> > > > > Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> > > > > ---
> > > > >  fs/xfs/libxfs/xfs_attr.c        | 162 ++++++++++++++++++++++++++++++----------
> > > > >  fs/xfs/libxfs/xfs_attr.h        |  73 ++++++++++++++++++
> > > > >  fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
> > > > >  fs/xfs/libxfs/xfs_attr_remote.c |  39 +++++-----
> > > > >  fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
> > > > >  fs/xfs/xfs_attr_inactive.c      |   2 +-
> > > > >  6 files changed, 220 insertions(+), 60 deletions(-)
> > > > > 
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > > index 2e055c0..ea50fc3 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > > ...
> > > > > @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
> > > > >  }
> > > > >  
> > > > >  /*
> > > > > + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> > > > > + * also checks for a defer finish.  Transaction is finished and rolled as
> > > > > + * needed, and returns true or false if the delayed operation should continue.
> > > > > + */
> > > > > +int
> > > > > +xfs_attr_trans_roll(
> > > > > +	struct xfs_delattr_context	*dac)
> > > > > +{
> > > > > +	struct xfs_da_args              *args = dac->da_args;
> > > > > +	int				error = 0;
> > > > > +
> > > > > +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> > > > > +		/*
> > > > > +		 * The caller wants us to finish all the deferred ops so that we
> > > > > +		 * avoid pinning the log tail with a large number of deferred
> > > > > +		 * ops.
> > > > > +		 */
> > > > > +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> > > > > +		error = xfs_defer_finish(&args->trans);
> > > > > +		if (error)
> > > > > +			return error;
> > > > > +	}
> > > > > +
> > > > > +	return xfs_trans_roll_inode(&args->trans, args->dp);
> > > > 
> > > > I'm not sure there's a need to roll the transaction again if the
> > > > defer path above executes. xfs_defer_finish() completes the dfops and
> > > > always returns a clean transaction.
> > > 
> > > I'm not sure we even really need a DEFER_FINISH flag if (a) xfs_defer.c
> > > gets patched to finish all the other defer items before coming back to
> > > the next step of the delattr state machine and (b) Allison removes the
> > > _iter functions in favor of using the defer op mechanism even when we're
> > > not pushing the state changes through the log.
> > > 
> > 
> > What do you mean by using the dfops mechanism without pushing state
> > changes through the log? My understanding was that dfops would be
> > involved with the new intent based attr ops and the state management
> > handles the original ops until we no longer have to support them..
> 
> I think you were probably still out when Dave and Allison and I had the
> brain fart^Wstorm that nothing in the defer ops code actually requires
> you to log anything, which means that you can use it to manage a long
> running operation that spans multiple transaction rolls! :)
> 

Ok..

> ->create_intent and ->create_done are supposed to create log items and
> attach them to the transaction, but the defer finish loop will still
> call ->finish_item even if they return NULL pointers.  If the
> finish_item call steps around the null pointers and calls whatever upper
> level functions are needed to make progress, that works fine.  There's
> no log recovery, obviously.
> 
> In other words, we can (ab)use defer ops for attr set/remove even in the
> non-logged case, which eliminates the need for the separate control
> loop.
> 

Right, that all makes sense. I'm still missing how this impacts the
lower level functional code driven by the control loop...
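
A minimal userspace sketch of the unlogged case described above, assuming a
heavily simplified finish loop. Nothing here is the real xfs_defer.c
interface: struct mock_defer_ops, mock_defer_finish() and the mock_* callbacks
are hypothetical stand-ins. It only illustrates that the loop can keep calling
->finish_item when ->create_intent and ->create_done hand back NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item; not a kernel structure. */
struct mock_work {
	bool	logged;		/* are the new log items available? */
	int	steps_left;	/* sub-steps still to run */
	int	intents_created;/* how many intents were "logged" */
};

/* Hypothetical, cut-down analogue of a defer op type. */
struct mock_defer_ops {
	void	*(*create_intent)(struct mock_work *w);
	void	*(*create_done)(struct mock_work *w, void *intent);
	int	(*finish_item)(struct mock_work *w, void *done);
};

/* Returns NULL when the filesystem cannot use the log items. */
static void *mock_create_intent(struct mock_work *w)
{
	if (!w->logged)
		return NULL;
	w->intents_created++;
	return w;	/* any non-NULL token will do for the sketch */
}

/* No intent means no done item either. */
static void *mock_create_done(struct mock_work *w, void *intent)
{
	(void)w;
	return intent;
}

/* Steps around a NULL done item and makes progress regardless. */
static int mock_finish_item(struct mock_work *w, void *done)
{
	(void)done;
	assert(w->steps_left > 0);
	w->steps_left--;
	return w->steps_left > 0 ? -EAGAIN : 0;
}

static const struct mock_defer_ops mock_attr_defer_ops = {
	.create_intent	= mock_create_intent,
	.create_done	= mock_create_done,
	.finish_item	= mock_finish_item,
};

/* Each -EAGAIN stands for one transaction roll before the next pass. */
static int mock_defer_finish(const struct mock_defer_ops *ops,
			     struct mock_work *w, int *rolls)
{
	int	error;

	do {
		void	*intent = ops->create_intent(w);
		void	*done = ops->create_done(w, intent);

		error = ops->finish_item(w, done);
		if (error == -EAGAIN)
			(*rolls)++;
	} while (error == -EAGAIN);

	return error;
}
```

In this model the logged and unlogged runs take the same path and roll the
same number of times; the only difference is whether any intents were created,
which is also why the unlogged run has nothing for recovery to replay.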

> FWIW, I've implemented that strategy as a proof of concept for extent
> swapping:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=atomic-file-updates&id=a85883c36e2f3eff50db50fcf58a71d4f13d1f64
> 
> Wherein you get atomic swapext if you have the log items enabled, and
> if not, you get the old "rmap swapext" that doesn't have log tracking.
> 

Interesting, thanks. The whole dfops reuse idea sounds neat to me in
that we can presumably condense the new/old implementations even further
than originally expected, but I think this sidesteps the concern
related to my initial comment around refactoring. AFAICT this model
doesn't necessarily dictate what the underlying code looks like. In the
example above, it looks like the swapext code reenters into a
xfs_swapext_finish_one() function that trivially understands how to pick
up where it left off. This is a fortunate implementation detail of the
swapext operation (along with the whole notion of the
xfs_op_has_more_work() pattern, which as we've already touched on can be
difficult for things like xattr set, etc.).

By contrast, the xattr code is currently a ball of wire that rolls
transactions at various points up and down its implementation (generally
speaking). The primary intent of all this refactoring work is to isolate
the transaction rolling to a single mechanism so we have the ability to
use something like dfops in the first place. I don't see how the
insertion of unlogged dfops in the design really changes much in that
regard. Is there more to the previous discussion that I'm missing?
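
As a rough illustration of "a single mechanism", here is a userspace sketch in
which only the outer loop ever rolls the transaction. It is loosely modelled on
the xfs_attr_rmtval_remove() loop in this patch, but every name (struct
mock_trans, struct mock_dac, MOCK_DAC_DEFER_FINISH, the mock_* functions) is a
hypothetical stand-in and the counters replace real transaction handling.

```c
#include <assert.h>
#include <errno.h>

#define MOCK_DAC_DEFER_FINISH	0x01	/* hypothetical flag value */

/* Hypothetical transaction: just counts what the control loop did. */
struct mock_trans {
	int	rolls;
	int	defer_finishes;
};

/* Hypothetical operation context carried across re-entries. */
struct mock_dac {
	unsigned int	flags;
	int		extents_left;	/* remote extents still mapped */
};

/*
 * Functional code: unmap one extent per call.  It never touches the
 * transaction; it only requests a roll by returning -EAGAIN.
 */
static int mock_rmtval_remove_iter(struct mock_dac *dac)
{
	assert(dac->extents_left >= 0);
	if (dac->extents_left == 0)
		return 0;

	dac->extents_left--;
	if (dac->extents_left > 0) {
		dac->flags |= MOCK_DAC_DEFER_FINISH;
		return -EAGAIN;
	}
	return 0;
}

/* The one place that finishes deferred work and rolls. */
static int mock_trans_roll(struct mock_trans *tp, struct mock_dac *dac)
{
	if (dac->flags & MOCK_DAC_DEFER_FINISH) {
		dac->flags &= ~MOCK_DAC_DEFER_FINISH;
		tp->defer_finishes++;
	}
	tp->rolls++;
	return 0;
}

/* Control loop: re-enter the functional code until it stops asking. */
static int mock_attr_remove_args(struct mock_trans *tp, struct mock_dac *dac)
{
	int	error;

	do {
		error = mock_rmtval_remove_iter(dac);
		if (error != -EAGAIN)
			break;

		error = mock_trans_roll(tp, dac);
		if (error)
			return error;
	} while (1);

	return error;
}
```

Because the functional code never rolls on its own, the control loop could in
principle be swapped for some other driver without touching
mock_rmtval_remove_iter(), which is the property the refactoring is after.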

ISTM that we're potentially talking about different aspects of the
implementation. If so, either we continue to refactor the xattr code to
untangle the existing mess so it can be driven by a single entry point
(just like the swapext example), or retrofitting the existing
implementation into the dfops mechanism means something more involved,
like creating new dfops op types per sub-component of a particular xattr
op and queueing/running those individually. Though TBH, the latter
sounds like it is getting a bit into crazy infrastructure territory. ;P
Thoughts?

Brian

> > > (I'm working on (a) still, will have something in a few days...)
> > > 
> > > > > +}
> > > > > +
> > > > > +/*
> > > > >   * Set the attribute specified in @args.
> > > > >   */
> > > > >  int
> > > > ...
> > > > > @@ -1218,21 +1288,35 @@ xfs_attr_node_remove_rmt(
> > > > >   * This will involve walking down the Btree, and may involve joining
> > > > >   * leaf nodes and even joining intermediate nodes up to and including
> > > > >   * the root node (a special case of an intermediate node).
> > > > > + *
> > > > > + * This routine is meant to function as either an inline or delayed operation,
> > > > > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > > > > + * functions will need to handle this, and recall the function until a
> > > > > + * successful error code is returned.
> > > > >   */
> > > > >  STATIC int
> > > > >  xfs_attr_node_removename(
> > > > > -	struct xfs_da_args	*args)
> > > > > +	struct xfs_delattr_context	*dac)
> > > > >  {
> > > > > -	struct xfs_da_state	*state;
> > > > > -	struct xfs_da_state_blk	*blk;
> > > > > -	int			retval, error;
> > > > > -	struct xfs_inode	*dp = args->dp;
> > > > > +	struct xfs_da_args		*args = dac->da_args;
> > > > > +	struct xfs_da_state		*state;
> > > > > +	struct xfs_da_state_blk		*blk;
> > > > > +	int				retval, error;
> > > > > +	struct xfs_inode		*dp = args->dp;
> > > > >  
> > > > >  	trace_xfs_attr_node_removename(args);
> > > > > +	state = dac->da_state;
> > > > > +	blk = dac->blk;
> > > > >  
> > > > > -	error = xfs_attr_node_removename_setup(args, &state);
> > > > > -	if (error)
> > > > > -		goto out;
> > > > > +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> > > > > +		goto das_rm_shrink;
> > > > > +
> > > > > +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > > > > +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> > > > > +		error = xfs_attr_node_removename_setup(dac, &state);
> > > > > +		if (error)
> > > > > +			goto out;
> > > > > +	}
> > > > >  
> > > > >  	/*
> > > > >  	 * If there is an out-of-line value, de-allocate the blocks.
> > > > > @@ -1240,8 +1324,13 @@ xfs_attr_node_removename(
> > > > >  	 * overflow the maximum size of a transaction and/or hit a deadlock.
> > > > >  	 */
> > > > >  	if (args->rmtblkno > 0) {
> > > > > -		error = xfs_attr_node_remove_rmt(args, state);
> > > > > -		if (error)
> > > > > +		/*
> > > > > +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> > > > > +		 */
> > > > > +		error = xfs_attr_node_remove_rmt(dac, state);
> > > > > +		if (error == -EAGAIN)
> > > > > +			return error;
> > > > > +		else if (error)
> > > > >  			goto out;
> > > > >  	}
> > > > >  
> > > > > @@ -1260,17 +1349,14 @@ xfs_attr_node_removename(
> > > > >  		error = xfs_da3_join(state);
> > > > >  		if (error)
> > > > >  			goto out;
> > > > > -		error = xfs_defer_finish(&args->trans);
> > > > > -		if (error)
> > > > > -			goto out;
> > > > > -		/*
> > > > > -		 * Commit the Btree join operation and start a new trans.
> > > > > -		 */
> > > > > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > > > > -		if (error)
> > > > > -			goto out;
> > > > > +
> > > > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > > > > +		dac->dela_state = XFS_DAS_RM_SHRINK;
> > > > > +		return -EAGAIN;
> > > > >  	}
> > > > >  
> > > > > +das_rm_shrink:
> > > > > +
> > > > >  	/*
> > > > >  	 * If the result is small enough, push it all into the inode.
> > > > >  	 */
> > > > 
> > > > ISTR that Dave or Darrick previously suggested that we should try to
> > > > isolate the state transition code as much as possible to a single
> > > > location. That basically means we should look at any place a particular
> > > > state check travels through multiple functions and see if we can
> > > > refactor things to flatten the state processing code. I tend to agree
> > > > that is the ideal approach given how difficult it can be to track state
> > > > changes through multiple functions.
> > > 
> > > Yes. :)
> > > 
> > > > In light of that (and as an example), I think the whole
> > > > xfs_attr_node_removename() path should be refactored so it looks
> > > > something like the following (with obvious error
> > > > handling/comment/aesthetic cleanups etc.):
> > > > 
> > > > xfs_attr_node_removename_iter()
> > > > {
> > > > 	...
> > > > 
> > > > 	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > > > 		<do init stuff>
> > > > 	}
> > > > 
> > > > 	switch (dac->dela_state) {
> > > > 	case 0:
> > > 
> > > I kinda wish "0" had its own name, but I don't also want to start
> > > another round of naming bikeshed. :)
> > > 
> > > > 		/* 
> > > > 		 * repeatedly remove remote blocks, remove the entry and
> > > > 		 * join. returns -EAGAIN or 0 for completion of the step.
> > > > 		 */
> > > > 		error = xfs_attr_node_remove_step(dac, state);
> > > > 		if (error)
> > > > 			break;
> > > > 
> > > > 		/* check whether to shrink or return success */
> > > > 		if (!error && xfs_bmap_one_block(...)) {
> > > > 			dac->dela_state = XFS_DAS_RM_SHRINK;
> > > > 			error = -EAGAIN;
> > > > 		}
> > > > 		break;
> > > > 	case XFS_DAS_RM_SHRINK:
> > > > 		/* shrink the fork, no reentry, no next step */
> > > > 		error = xfs_attr_node_shrink_step(args, state);	
> > > > 		break;
> > > 
> > > <nod> The ASCII art diagrams help assuage my nerves about the fact that
> > > we branch based on dela_state but not all the branches actually show us
> > > moving to the next state.
> > > 
> > > I've gotten the distinct sense, though, that throwing the new state all
> > > the way back up to _iter() to set it is probably a lot more fuss than
> > > it's worth for the attr set case, though...
> > > 
> > 
> > That's quite possible. :P
> > 
> > > > 	default:
> > > > 		ASSERT(0);
> > > > 		return -EINVAL;
> > > > 	}
> > > > 
> > > > 	if (error == -EAGAIN)
> > > > 		return error;
> > > > 
> > > > 	<do cleanup stuff>
> > > > 	...
> > > > 	return error;
> > > > }
> > > > 
> > > > The idea here is that we have one _iter() function that does all the
> > > > state management for a particular operation and has minimal other logic.
> > > > That way we can see the states that repeat, transition, etc. all in one
> > > > place. The _step() functions implement the functional components of each
> > > > state and do no state management whatsoever beyond returning -EAGAIN to
> > > > request reentry or return 0 for completion. In the case of the latter,
> > > > the _iter() function decides whether to transition to another state
> > > > (returning -EAGAIN itself) or complete the operation. If a _step()
> > > > function ever needs to set or check ->dela_state, then that is clear
> > > > indication it must be broken up into multiple _step() functions.
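
The _iter()/_step() split outlined above can be sketched in userspace roughly
as follows. This is an assumption-laden toy, not the proposed kernel code:
enum mock_state, struct mock_ctx and the mock_* functions are hypothetical,
and the "need to shrink?" check is reduced to a boolean. The point is only
that mock_remove_iter() is the sole writer of the state field.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical states; MOCK_DAS_REMOVE plays the role of "case 0". */
enum mock_state {
	MOCK_DAS_REMOVE = 0,
	MOCK_DAS_SHRINK = 1,
};

/* Hypothetical context; not struct xfs_delattr_context. */
struct mock_ctx {
	enum mock_state	state;		/* written only by mock_remove_iter() */
	int		blocks_left;	/* remote blocks still to free */
	bool		fits_inline;	/* stand-in for the one-block check */
	bool		shrunk;		/* set once the shrink step has run */
};

/* Step: free one block per call; 0 means this step is complete. */
static int mock_remove_step(struct mock_ctx *c)
{
	if (c->blocks_left > 0) {
		c->blocks_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Step: shrink the fork; no reentry and no next step. */
static int mock_shrink_step(struct mock_ctx *c)
{
	c->shrunk = true;
	return 0;
}

/* All state checks and transitions live here and nowhere else. */
static int mock_remove_iter(struct mock_ctx *c)
{
	int	error;

	switch (c->state) {
	case MOCK_DAS_REMOVE:
		error = mock_remove_step(c);
		if (error)
			break;

		/* step done: decide whether to shrink or finish */
		if (c->fits_inline) {
			c->state = MOCK_DAS_SHRINK;
			error = -EAGAIN;
		}
		break;
	case MOCK_DAS_SHRINK:
		error = mock_shrink_step(c);
		break;
	default:
		assert(0);
		return -EINVAL;
	}

	return error;
}
```

A caller would keep invoking mock_remove_iter() while it returns -EAGAIN,
rolling the transaction between calls. Neither step function reads or writes
c->state, so every repeat and transition is visible in the switch.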
> > > 
> > > ...because I've frequently had the same thought that the state machine
> > > handling ought to be in the same place.  But then I start reading
> > > through the xattr code to figure out how that would be done, and get
> > > trapped by the fact that some of the decisions about the next state have
> > > to happen pretty deep in the xattr code-- stuff like allocating an
> > > extent for a remote value, where depending on whether or not we got enough
> > > blocks to satisfy the space requirements, either we can move on to the
> > > next state and return EAGAIN, or we have to save the current state and
> > > EAGAIN to try to get more blocks.
> > > 
> > 
> > I haven't walked through the set code in a while, but this sort of
> > sounds like more of the same (heavy refactoring followed by insertion of
> > state management).
> > 
> > > Maybe it would help a little if the setting of DEFER_FINISH and changing
> > > of dela_state could be put into a little helper with a tracepoint so
> > > that future us can ftrace the state machine to make sure it's working
> > > correctly?
> > > 
> > 
> > I like the idea, but not sure it helps with following the code as much
> > as runtime analysis.
> 
> <nod>
> 
> > > > I think this implements the separation of state and functionality model
> > > > we're after without introduction of crazy state processing frameworks,
> > > 
> > > "crazy state processing frameworks"... like xfs_defer.c? :)
> > > 
> > 
> > Re: my question above, I'm curious about reusing dfops as a mechanism
> > for both modes if somebody can elaborate on the idea or point me at a
> > reference where it was previously discussed..? I could have lost track
> > or missed a discussion while I was out...
> 
> (See above...)
> 
> > > > etc., but I admit I've so far only thought about it wrt the remove case
> > > > (which is more simple than the set case). Also note that as usual, any
> > > > associated refactoring of the functional components should come as
> > > > preliminary patches such that this patch only introduces state bits.
> > > > Thoughts?
> > > 
> > > (I thought/hoped we'd done all the refactoring in the 23-patch megalith
> > > that I tossed into 5.9... :))
> > > 
> > 
> > Heh. I'm glad to see that snowball got tossed. ;)
> 
> :)
> 
> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > > > index 3e97a93..9573949 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > > > @@ -74,6 +74,75 @@ struct xfs_attr_list_context {
> > > > >  };
> > > > >  
> > > > >  
> > > > > +/*
> > > > > + * ========================================================================
> > > > > + * Structure used to pass context around among the delayed routines.
> > > > > + * ========================================================================
> > > > > + */
> > > > > +
> > > > > +/*
> > > > > + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> > > > > + * states indicate places where the function would return -EAGAIN, and then
> > > > > + * immediately resume from after being recalled by the calling function. States
> > > > > + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> > > > > + * so the calling function needs to pass them back to that subroutine to allow
> > > > > + * it to finish where it left off. But they otherwise do not have a role in the
> > > > > + * calling function other than just passing through.
> > > > > + *
> > > > > + * xfs_attr_remove_iter()
> > > > > + *	  XFS_DAS_RM_SHRINK ─┐
> > > > > + *	  (subroutine state) │
> > > > > + *	                     └─>xfs_attr_node_removename()
> > > > > + *	                                      │
> > > > > + *	                                      v
> > > > > + *	                                   need to
> > > > > + *	                                shrink tree? ─n─┐
> > > > > + *	                                      │         │
> > > > > + *	                                      y         │
> > > > > + *	                                      │         │
> > > > > + *	                                      v         │
> > > > > + *	                              XFS_DAS_RM_SHRINK │
> > > > > + *	                                      │         │
> > > > > + *	                                      v         │
> > > > > + *	                                     done <─────┘
> > > > > + *
> > > > > + */
> > > > > +
> > > > > +/*
> > > > > + * Enum values for xfs_delattr_context.dela_state
> > > > > + *
> > > > > + * These values are used by delayed attribute operations to keep track  of where
> > > > > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > > > > + * calling function to roll the transaction, and then recall the subroutine to
> > > > > + * finish the operation.  The enum is then used by the subroutine to jump back
> > > > > + * to where it was and resume executing where it left off.
> > > > > + */
> > > > > +enum xfs_delattr_state {
> > > > > +				      /* Zero is uninitialized */
> > > > > +	XFS_DAS_RM_SHRINK	= 1,  /* We are shrinking the tree */
> > > > > +};
> > > > > +
> > > > > +/*
> > > > > + * Defines for xfs_delattr_context.flags
> > > > > + */
> > > > > +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> > > > > +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> > > > > +
> > > > > +/*
> > > > > + * Context used for keeping track of delayed attribute operations
> > > > > + */
> > > > > +struct xfs_delattr_context {
> > > > > +	struct xfs_da_args      *da_args;
> > > > > +
> > > > > +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> > > > > +	struct xfs_da_state     *da_state;
> > > > > +	struct xfs_da_state_blk *blk;
> > > > > +
> > > > > +	/* Used to keep track of current state of delayed operation */
> > > > > +	unsigned int            flags;
> > > > > +	enum xfs_delattr_state  dela_state;
> > > > > +};
> > > > > +
> > > > >  /*========================================================================
> > > > >   * Function prototypes for the kernel.
> > > > >   *========================================================================*/
> > > > > @@ -91,6 +160,10 @@ int xfs_attr_set(struct xfs_da_args *args);
> > > > >  int xfs_attr_set_args(struct xfs_da_args *args);
> > > > >  int xfs_has_attr(struct xfs_da_args *args);
> > > > >  int xfs_attr_remove_args(struct xfs_da_args *args);
> > > > > +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> > > > > +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
> > > > >  bool xfs_attr_namecheck(const void *name, size_t length);
> > > > > +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> > > > > +			      struct xfs_da_args *args);
> > > > >  
> > > > >  #endif	/* __XFS_ATTR_H__ */
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > > index 8623c81..4ed7b31 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > > @@ -19,8 +19,8 @@
> > > > >  #include "xfs_bmap_btree.h"
> > > > >  #include "xfs_bmap.h"
> > > > >  #include "xfs_attr_sf.h"
> > > > > -#include "xfs_attr_remote.h"
> > > > >  #include "xfs_attr.h"
> > > > > +#include "xfs_attr_remote.h"
> > > > >  #include "xfs_attr_leaf.h"
> > > > >  #include "xfs_error.h"
> > > > >  #include "xfs_trace.h"
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> > > > > index 3f80ced..7f81b48 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr_remote.c
> > > > > +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> > > > > @@ -676,10 +676,14 @@ xfs_attr_rmtval_invalidate(
> > > > >   */
> > > > >  int
> > > > >  xfs_attr_rmtval_remove(
> > > > > -	struct xfs_da_args      *args)
> > > > > +	struct xfs_da_args		*args)
> > > > >  {
> > > > > -	int			error;
> > > > > -	int			retval;
> > > > > +	xfs_dablk_t			lblkno;
> > > > > +	int				blkcnt;
> > > > > +	int				error;
> > > > > +	struct xfs_delattr_context	dac  = {
> > > > > +		.da_args	= args,
> > > > > +	};
> > > > >  
> > > > >  	trace_xfs_attr_rmtval_remove(args);
> > > > >  
> > > > > @@ -687,19 +691,17 @@ xfs_attr_rmtval_remove(
> > > > >  	 * Keep de-allocating extents until the remote-value region is gone.
> > > > >  	 */
> > > > >  	do {
> > > > > -		retval = __xfs_attr_rmtval_remove(args);
> > > > > -		if (retval && retval != -EAGAIN)
> > > > > -			return retval;
> > > > > +		error = __xfs_attr_rmtval_remove(&dac);
> > > > > +		if (error != -EAGAIN)
> > > > > +			break;
> > > > >  
> > > > > -		/*
> > > > > -		 * Close out trans and start the next one in the chain.
> > > > > -		 */
> > > > > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > > > +		error = xfs_attr_trans_roll(&dac);
> > > > >  		if (error)
> > > > >  			return error;
> > > > > -	} while (retval == -EAGAIN);
> > > > >  
> > > > > -	return 0;
> > > > > +	} while (true);
> > > > > +
> > > > > +	return error;
> > > > >  }
> > > > >  
> > > > >  /*
> > > > > @@ -709,9 +711,10 @@ xfs_attr_rmtval_remove(
> > > > >   */
> > > > >  int
> > > > >  __xfs_attr_rmtval_remove(
> > > > > -	struct xfs_da_args	*args)
> > > > > +	struct xfs_delattr_context	*dac)
> > > > >  {
> > > > > -	int			error, done;
> > > > > +	struct xfs_da_args		*args = dac->da_args;
> > > > > +	int				error, done;
> > > > >  
> > > > >  	/*
> > > > >  	 * Unmap value blocks for this attr.
> > > > > @@ -721,12 +724,10 @@ __xfs_attr_rmtval_remove(
> > > > >  	if (error)
> > > > >  		return error;
> > > > >  
> > > > > -	error = xfs_defer_finish(&args->trans);
> > > > > -	if (error)
> > > > > -		return error;
> > > > > -
> > > > > -	if (!done)
> > > > > +	if (!done) {
> > > > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > > > >  		return -EAGAIN;
> > > > > +	}
> > > > >  
> > > > >  	return error;
> > > > >  }
> > > > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> > > > > index 9eee615..002fd30 100644
> > > > > --- a/fs/xfs/libxfs/xfs_attr_remote.h
> > > > > +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> > > > > @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > > > >  int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
> > > > >  		xfs_buf_flags_t incore_flags);
> > > > >  int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> > > > > -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > > > > +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> > > > >  #endif /* __XFS_ATTR_REMOTE_H__ */
> > > > > diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> > > > > index bfad669..aaa7e66 100644
> > > > > --- a/fs/xfs/xfs_attr_inactive.c
> > > > > +++ b/fs/xfs/xfs_attr_inactive.c
> > > > > @@ -15,10 +15,10 @@
> > > > >  #include "xfs_da_format.h"
> > > > >  #include "xfs_da_btree.h"
> > > > >  #include "xfs_inode.h"
> > > > > +#include "xfs_attr.h"
> > > > >  #include "xfs_attr_remote.h"
> > > > >  #include "xfs_trans.h"
> > > > >  #include "xfs_bmap.h"
> > > > > -#include "xfs_attr.h"
> > > > >  #include "xfs_attr_leaf.h"
> > > > >  #include "xfs_quota.h"
> > > > >  #include "xfs_dir2.h"
> > > > > -- 
> > > > > 2.7.4
> > > > > 
> > > > 
> > > 
> > 
> 



* Re: [PATCH v12 1/8] xfs: Add delay ready attr remove routines
  2020-09-02 12:22           ` Brian Foster
@ 2020-09-04 23:03             ` Allison Collins
  2020-09-08 14:43               ` Brian Foster
  0 siblings, 1 reply; 21+ messages in thread
From: Allison Collins @ 2020-09-04 23:03 UTC (permalink / raw)
  To: Brian Foster, Darrick J. Wong; +Cc: linux-xfs



On 9/2/20 5:22 AM, Brian Foster wrote:
> On Tue, Sep 01, 2020 at 11:31:34AM -0700, Darrick J. Wong wrote:
>> On Tue, Sep 01, 2020 at 02:07:41PM -0400, Brian Foster wrote:
>>> On Tue, Sep 01, 2020 at 10:20:21AM -0700, Darrick J. Wong wrote:
>>>> On Tue, Sep 01, 2020 at 01:00:20PM -0400, Brian Foster wrote:
>>>>> On Wed, Aug 26, 2020 at 05:35:11PM -0700, Allison Collins wrote:
>>>>>> This patch modifies the attr remove routines to be delay ready. This
>>>>>> means they no longer roll or commit transactions, but instead return
>>>>>> -EAGAIN to have the calling routine roll and refresh the transaction. In
>>>>>> this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
>>>>>> uses a sort of state machine like switch to keep track of where it was
>>>>>> when EAGAIN was returned. xfs_attr_node_removename has also been
>>>>>> modified to use the switch, and a new version of xfs_attr_remove_args
>>>>>> consists of a simple loop to refresh the transaction until the operation
>>>>>> is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
>>>>>> transaction where ever the existing code used to.
>>>>>>
>>>>>> Calls to xfs_attr_rmtval_remove are replaced with the delay ready
>>>>>> version __xfs_attr_rmtval_remove. We will rename
>>>>>> __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
>>>>>> done.
>>>>>>
>>>>>> xfs_attr_rmtval_remove itself is still in use by the set routines (used
>>>>>> during a rename).  For reasons of perserving existing function, we
>>>>>
>>>>> Nit:				preserving
ok, will fix
>>>>>
>>>>>> modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
>>>>>> set.  Similar to how xfs_attr_remove_args does here.  Once we transition
>>>>>> the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
>>>>>> used and will be removed.
>>>>>>
>>>>>> This patch also adds a new struct xfs_delattr_context, which we will use
>>>>>> to keep track of the current state of an attribute operation. The new
>>>>>> xfs_delattr_state enum is used to track various operations that are in
>>>>>> progress so that we know not to repeat them, and resume where we left
>>>>>> off before EAGAIN was returned to cycle out the transaction. Other
>>>>>> members take the place of local variables that need to retain their
>>>>>> values across multiple function recalls.  See xfs_attr.h for a more
>>>>>> detailed diagram of the states.
>>>>>>
>>>>>> Signed-off-by: Allison Collins <allison.henderson@oracle.com>
>>>>>> ---
>>>>>>   fs/xfs/libxfs/xfs_attr.c        | 162 ++++++++++++++++++++++++++++++----------
>>>>>>   fs/xfs/libxfs/xfs_attr.h        |  73 ++++++++++++++++++
>>>>>>   fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
>>>>>>   fs/xfs/libxfs/xfs_attr_remote.c |  39 +++++-----
>>>>>>   fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
>>>>>>   fs/xfs/xfs_attr_inactive.c      |   2 +-
>>>>>>   6 files changed, 220 insertions(+), 60 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>>>>>> index 2e055c0..ea50fc3 100644
>>>>>> --- a/fs/xfs/libxfs/xfs_attr.c
>>>>>> +++ b/fs/xfs/libxfs/xfs_attr.c
>>>>> ...
>>>>>> @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
>>>>>>   }
>>>>>>   
>>>>>>   /*
>>>>>> + * Checks to see if a delayed attribute transaction should be rolled.  If so,
>>>>>> + * also checks for a defer finish.  Transaction is finished and rolled as
>>>>>> + * needed, and returns true of false if the delayed operation should continue.
>>>>>> + */
>>>>>> +int
>>>>>> +xfs_attr_trans_roll(
>>>>>> +	struct xfs_delattr_context	*dac)
>>>>>> +{
>>>>>> +	struct xfs_da_args              *args = dac->da_args;
>>>>>> +	int				error = 0;
>>>>>> +
>>>>>> +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
>>>>>> +		/*
>>>>>> +		 * The caller wants us to finish all the deferred ops so that we
>>>>>> +		 * avoid pinning the log tail with a large number of deferred
>>>>>> +		 * ops.
>>>>>> +		 */
>>>>>> +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
>>>>>> +		error = xfs_defer_finish(&args->trans);
>>>>>> +		if (error)
>>>>>> +			return error;
>>>>>> +	}
>>>>>> +
>>>>>> +	return xfs_trans_roll_inode(&args->trans, args->dp);
>>>>>
>>>>> I'm not sure there's a need to roll the transaction again if the
>>>>> defer path above executes. xfs_defer_finish() completes the dfops and
>>>>> always returns a clean transaction.
>>>>
>>>> I'm not sure we even really need a DEFER_FINISH flag if (a) xfs_defer.c
>>>> gets patched to finish all the other defer items before coming back to
>>>> the next step of the delattr state machine and (b) Allison removes the
>>>> _iter functions in favor of using the defer op mechanism even when we're
>>>> not pushing the state changes through the log.
>>>>
>>>
>>> What do you mean by using the dfops mechanism without pushing state
>>> changes through the log? My understanding was that dfops would be
>>> involved with the new intent based attr ops and the state management
>>> handles the original ops until we no longer have to support them..
>>
>> I think you were probably still out when Dave and Allison and I had the
>> brain fart^Wstorm that nothing in the defer ops code actually requires
>> you to log anything, which means that you can use it to manage a long
>> running operation that spans multiple transaction rolls! :)
>>
> 
> Ok..
> 
>> ->create_intent and ->create_done are supposed to create log items and
>> attach them to the transaction, but the defer finish loop will still
>> call ->finish_item even if they return NULL pointers.  If the
>> finish_item call steps around the null pointers and calls whatever upper
>> level functions are needed to make progress, that works fine.  There's
>> no log recovery, obviously.
>>
>> In other words, we can (ab)use defer ops for attr set/remove even in the
>> non-logged case, which eliminates the need for the separate control
>> loop.
>>
> 
> Right, that all makes sense. I'm still missing how this impacts the
> lower level functional code driven by the control loop...
> 
>> FWIW, I've implemented that strategy as a proof of concept for extent
>> swapping:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=atomic-file-updates&id=a85883c36e2f3eff50db50fcf58a71d4f13d1f64
>>
>> Wherein you get atomic swapext if you have the log items enabled, and
>> if not, you get the old "rmap swapext" that doesn't have log tracking.
>>
> 
> Interesting, thanks. The whole dfops reuse idea sounds neat to me in
> that we can presumably condense the new/old implementations even further
> than originally expected, but I think this side steps the concern
> related to my initial comment around refactoring. AFAICT this model
> doesn't necessarily dictate what the underlying code looks like. In the
> example above, it looks like the swapext code reenters into a
> xfs_swapext_finish_one() function that trivially understands how to pick
> up where it left off. This is a fortunate implementation detail of the
> swapext operation (along with the whole notion of the
> xfs_op_has_more_work() pattern, which as we've already touched on can be
> difficult for things like xattr set, etc.).
> 
> By contrast, the xattr code is currently a ball of wire that rolls
> transactions at various points up and down its implementation (generally
> speaking). The primary intent of all this refactoring work is to isolate
> the transaction rolling to a single mechanism so we have the ability to
> use something like dfops in the first place. I don't see how the
> insertion of unlogged dfops in the design really changes much in that
> regard. Is there more to the previous discussion that I'm missing?
> 
> ISTM that we're potentially talking about different aspects of the
> implementation. If so, we either need to continue to refactor the xattr
> code to untangle the existing mess so it can be driven by a single entry
> point (just like the swapext example), or that retrofitting the existing
> implementation into the dfops mechanism means something more involved
> like creating new dfops op types per sub-component of a particular xattr
> op and queueing/running those individually. Though TBH, the latter
> sounds like it is getting a bit into crazy infrastructure territory. ;P
> Thoughts?
> 
> Brian

Yeah, I'll try some experimenting to see what that ends up looking like.
I've looked at the swap extent code from the link above, and I think I
understand now what Darrick is describing with the reuse/abuse of the
dfops mechanics.  We modify the *_create_intent routine to return NULL
to skip recording it to the log. I THINK this should still work because
the state machine context is carried around in the xfs_attr_item, not
the xfs_attri_log_item.  So we might be able to make it work without
too much crazy.
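
To make the idea concrete, here is a minimal userspace sketch of it, not
kernel code.  Every toy_* name is hypothetical and only mimics the shape
of the ->create_intent and ->finish_item hooks discussed above; the real
xfs_defer.c interfaces take different arguments.  It assumes, as Darrick
describes, that the finish loop keeps calling ->finish_item even when no
intent was created, and that the resume state lives in the in-core item
rather than in the log item.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are the real XFS types. */
struct toy_intent {
	int	id;
};

struct toy_attr_item {
	int	blocks_left;	/* remaining work, carried in the in-core item */
	int	steps_run;	/* number of finish_item calls made so far */
};

struct toy_defer_ops {
	struct toy_intent *(*create_intent)(struct toy_attr_item *item);
	int (*finish_item)(struct toy_intent *intent,
			   struct toy_attr_item *item);
};

/* Unlogged variant: nothing goes to the log, so no intent is created. */
static struct toy_intent *
toy_create_intent_unlogged(struct toy_attr_item *item)
{
	(void)item;
	return NULL;
}

/*
 * Do one unit of work.  The intent may be NULL and is never dereferenced,
 * so progress depends only on the state kept in the item.
 */
static int
toy_finish_item(struct toy_intent *intent, struct toy_attr_item *item)
{
	(void)intent;
	item->steps_run++;
	if (item->blocks_left > 0)
		item->blocks_left--;
	return item->blocks_left > 0 ? -EAGAIN : 0;
}

/* Toy finish loop: re-enter finish_item until it stops asking for more. */
static int
toy_defer_finish(const struct toy_defer_ops *ops, struct toy_attr_item *item)
{
	struct toy_intent	*intent = ops->create_intent(item);
	int			error;

	do {
		error = ops->finish_item(intent, item);
		/* A real implementation would roll the transaction here. */
	} while (error == -EAGAIN);

	return error;
}

static const struct toy_defer_ops toy_unlogged_ops = {
	.create_intent	= toy_create_intent_unlogged,
	.finish_item	= toy_finish_item,
};
```

Calling toy_defer_finish(&toy_unlogged_ops, &item) on an item with a few
blocks left runs the operation to completion without ever creating an
intent.  That is the property we would be relying on for the non-logged
case; whether the real attr code fits this shape is what the experiment
needs to show.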



> 
>>>> (I'm working on (a) still, will have something in a few days...)
>>>>
>>>>>> +}
>>>>>> +
>>>>>> +/*
>>>>>>    * Set the attribute specified in @args.
>>>>>>    */
>>>>>>   int
>>>>> ...
>>>>>> @@ -1218,21 +1288,35 @@ xfs_attr_node_remove_rmt(
>>>>>>    * This will involve walking down the Btree, and may involve joining
>>>>>>    * leaf nodes and even joining intermediate nodes up to and including
>>>>>>    * the root node (a special case of an intermediate node).
>>>>>> + *
>>>>>> + * This routine is meant to function as either an inline or delayed operation,
>>>>>> + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
>>>>>> + * functions will need to handle this, and recall the function until a
>>>>>> + * successful error code is returned.
>>>>>>    */
>>>>>>   STATIC int
>>>>>>   xfs_attr_node_removename(
>>>>>> -	struct xfs_da_args	*args)
>>>>>> +	struct xfs_delattr_context	*dac)
>>>>>>   {
>>>>>> -	struct xfs_da_state	*state;
>>>>>> -	struct xfs_da_state_blk	*blk;
>>>>>> -	int			retval, error;
>>>>>> -	struct xfs_inode	*dp = args->dp;
>>>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>>>> +	struct xfs_da_state		*state;
>>>>>> +	struct xfs_da_state_blk		*blk;
>>>>>> +	int				retval, error;
>>>>>> +	struct xfs_inode		*dp = args->dp;
>>>>>>   
>>>>>>   	trace_xfs_attr_node_removename(args);
>>>>>> +	state = dac->da_state;
>>>>>> +	blk = dac->blk;
>>>>>>   
>>>>>> -	error = xfs_attr_node_removename_setup(args, &state);
>>>>>> -	if (error)
>>>>>> -		goto out;
>>>>>> +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
>>>>>> +		goto das_rm_shrink;
>>>>>> +
>>>>>> +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
>>>>>> +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
>>>>>> +		error = xfs_attr_node_removename_setup(dac, &state);
>>>>>> +		if (error)
>>>>>> +			goto out;
>>>>>> +	}
>>>>>>   
>>>>>>   	/*
>>>>>>   	 * If there is an out-of-line value, de-allocate the blocks.
>>>>>> @@ -1240,8 +1324,13 @@ xfs_attr_node_removename(
>>>>>>   	 * overflow the maximum size of a transaction and/or hit a deadlock.
>>>>>>   	 */
>>>>>>   	if (args->rmtblkno > 0) {
>>>>>> -		error = xfs_attr_node_remove_rmt(args, state);
>>>>>> -		if (error)
>>>>>> +		/*
>>>>>> +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
>>>>>> +		 */
>>>>>> +		error = xfs_attr_node_remove_rmt(dac, state);
>>>>>> +		if (error == -EAGAIN)
>>>>>> +			return error;
>>>>>> +		else if (error)
>>>>>>   			goto out;
>>>>>>   	}
>>>>>>   
>>>>>> @@ -1260,17 +1349,14 @@ xfs_attr_node_removename(
>>>>>>   		error = xfs_da3_join(state);
>>>>>>   		if (error)
>>>>>>   			goto out;
>>>>>> -		error = xfs_defer_finish(&args->trans);
>>>>>> -		if (error)
>>>>>> -			goto out;
>>>>>> -		/*
>>>>>> -		 * Commit the Btree join operation and start a new trans.
>>>>>> -		 */
>>>>>> -		error = xfs_trans_roll_inode(&args->trans, dp);
>>>>>> -		if (error)
>>>>>> -			goto out;
>>>>>> +
>>>>>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>>>>> +		dac->dela_state = XFS_DAS_RM_SHRINK;
>>>>>> +		return -EAGAIN;
>>>>>>   	}
>>>>>>   
>>>>>> +das_rm_shrink:
>>>>>> +
>>>>>>   	/*
>>>>>>   	 * If the result is small enough, push it all into the inode.
>>>>>>   	 */
>>>>>
>>>>> ISTR that Dave or Darrick previously suggested that we should try to
>>>>> isolate the state transition code as much as possible to a single
>>>>> location. That basically means we should look at any place a particular
>>>>> state check travels through multiple functions and see if we can
>>>>> refactor things to flatten the state processing code. I tend to agree
>>>>> that is the ideal approach given how difficult it can be to track state
>>>>> changes through multiple functions.
>>>>
>>>> Yes. :)
>>>>
>>>>> In light of that (and as an example), I think the whole
>>>>> xfs_attr_node_removename() path should be refactored so it looks
>>>>> something like the following (with obvious error
>>>>> handling/comment/aesthetic cleanups etc.):
>>>>>
>>>>> xfs_attr_node_removename_iter()
>>>>> {
>>>>> 	...
>>>>>
>>>>> 	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
>>>>> 		<do init stuff>
>>>>> 	}
>>>>>
>>>>> 	switch (dac->dela_state) {
>>>>> 	case 0:
>>>>
>>>> I kinda wish "0" had its own name, but I don't also want to start
>>>> another round of naming bikeshed. :)
>>>>
>>>>> 		/*
>>>>> 		 * repeatedly remove remote blocks, remove the entry and
>>>>> 		 * join. returns -EAGAIN or 0 for completion of the step.
>>>>> 		 */
>>>>> 		error = xfs_attr_node_remove_step(dac, state);
>>>>> 		if (error)
>>>>> 			break;
>>>>>
>>>>> 		/* check whether to shrink or return success */
>>>>> 		if (!error && xfs_bmap_one_block(...)) {
>>>>> 			dac->dela_state = XFS_DAS_RM_SHRINK;
>>>>> 			error = -EAGAIN;
>>>>> 		}
>>>>> 		break;
>>>>> 	case XFS_DAS_RM_SHRINK:
>>>>> 		/* shrink the fork, no reentry, no next step */
>>>>> 		error = xfs_attr_node_shrink_step(args, state);	
>>>>> 		break;
>>>>
>>>> <nod> The ASCII art diagrams help assuage my nerves about the fact that
>>>> we branch based on dela_state but not all the branches actually show us
>>>> moving to the next state.
>>>>
>>>> I've gotten the distinct sense, though, that throwing the new state all
>>>> the way back up to _iter() to set it is probably a lot more fuss than
>>>> it's worth for the attr set case, though...
>>>>
>>>
>>> That's quite possible. :P
Sure, I will see if I can get something similar to this worked out, at 
least for the remove path.  But yes, the set path would be a bit more of 
a challenge.
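
For reference while I work on it, here is a rough userspace model of the
_iter()/_step() split Brian sketched above.  It is only an illustration
under assumptions: all toy_* names are made up, TOY_DAS_UNINIT is a
hypothetical name for the unnamed "0" state, and the fits_inline field
stands in for the real check on whether the fork should be shrunk.  The
step functions only return -EAGAIN or 0, and every state transition
happens in the one _iter() function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_state {
	TOY_DAS_UNINIT = 0,	/* hypothetical name for the "0" state */
	TOY_DAS_RM_SHRINK,	/* shrinking the tree */
};

#define TOY_DAC_INIT	0x01	/* one-time setup has been done */

struct toy_dac {
	unsigned int	flags;
	enum toy_state	dela_state;
	int		rmt_blocks;	/* remote blocks still to remove */
	bool		fits_inline;	/* stand-in for the shrink check */
	bool		shrunk;
};

/* Step: remove one remote block per call.  No state management here. */
static int
toy_remove_step(struct toy_dac *dac)
{
	if (dac->rmt_blocks > 0 && --dac->rmt_blocks > 0)
		return -EAGAIN;
	return 0;
}

/* Step: shrink the fork.  No reentry and no next step. */
static int
toy_shrink_step(struct toy_dac *dac)
{
	dac->shrunk = true;
	return 0;
}

/* The only place that reads or writes dela_state. */
static int
toy_remove_iter(struct toy_dac *dac)
{
	int	error;

	if (!(dac->flags & TOY_DAC_INIT))
		dac->flags |= TOY_DAC_INIT;	/* setup would go here */

	switch (dac->dela_state) {
	case TOY_DAS_UNINIT:
		error = toy_remove_step(dac);
		if (error)
			break;
		/* Step finished: decide whether to move to the next state. */
		if (dac->fits_inline) {
			dac->dela_state = TOY_DAS_RM_SHRINK;
			error = -EAGAIN;
		}
		break;
	case TOY_DAS_RM_SHRINK:
		error = toy_shrink_step(dac);
		break;
	default:
		assert(0);
		return -EINVAL;
	}

	return error;
}

/* Caller loop: re-enter _iter() after each -EAGAIN. */
static int
toy_remove_args(struct toy_dac *dac, int *rolls)
{
	int	error;

	for (;;) {
		error = toy_remove_iter(dac);
		if (error != -EAGAIN)
			break;
		(*rolls)++;	/* a real caller would roll the trans here */
	}

	return error;
}
```

In this model every -EAGAIN corresponds to one transaction roll in the
caller, which is the part I would expect to carry over to the real
remove path.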

Thanks all!

Allison

>>>
>>>>> 	default:
>>>>> 		ASSERT(0);
>>>>> 		return -EINVAL;
>>>>> 	}
>>>>>
>>>>> 	if (error == -EAGAIN)
>>>>> 		return error;
>>>>>
>>>>> 	<do cleanup stuff>
>>>>> 	...
>>>>> 	return error;
>>>>> }
>>>>>
>>>>> The idea here is that we have one _iter() function that does all the
>>>>> state management for a particular operation and has minimal other logic.
>>>>> That way we can see the states that repeat, transition, etc. all in one
>>>>> place. The _step() functions implement the functional components of each
>>>>> state and do no state management whatsoever beyond return -EAGAIN to
>>>>> request reentry or return 0 for completion. In the case of the latter,
>>>>> the _iter() function decides whether to transition to another state
>>>>> (returning -EAGAIN itself) or complete the operation. If a _step()
>>>>> function ever needs to set or check ->dela_state, then that is clear
>>>>> indication it must be broken up into multiple _step() functions.
>>>>
>>>> ...because I've frequently had the same thought that the state machine
>>>> handling ought to be in the same place.  But then I start reading
>>>> through the xattr code to figure out how that would be done, and get
>>>> trapped by the fact that some of the decisions about the next state have
>>>> to happen pretty deep in the xattr code-- stuff like allocating an
>>>> extent for a remote value, where depending on whether or not we got enough
>>>> blocks to satisfy the space requirements, either we can move on to the
>>>> next state and return EAGAIN, or we have to save the current state and
>>>> EAGAIN to try to get more blocks.
>>>>
>>>
>>> I haven't walked through the set code in a while, but this sort of
>>> sounds like more of the same (heavy refactoring followed by insertion of
>>> state management).
>>>
>>>> Maybe it would help a little if the setting of DEFER_FINISH and changing
>>>> of dela_state could be put into a little helper with a tracepoint so
>>>> that future us can ftrace the state machine to make sure it's working
>>>> correctly?
>>>>
>>>
>>> I like the idea, but not sure it helps with following the code as much
>>> as runtime analysis.
>>
>> <nod>
>>
>>>>> I think this implements the separation of state and functionality model
>>>>> we're after without introduction of crazy state processing frameworks,
>>>>
>>>> "crazy state processing frameworks"... like xfs_defer.c? :)
>>>>
>>>
>>> Re: my question above, I'm curious about reusing dfops as a mechanism
>>> for both modes if somebody can elaborate on the idea or point me at a
>>> reference where it was previously discussed..? I could have lost track
>>> or missed a discussion while I was out...
>>
>> (See above...)
>>
>>>>> etc., but I admit I've so far only thought about it wrt the remove case
>>>>> (which is more simple than the set case). Also note that as usual, any
>>>>> associated refactoring of the functional components should come as
>>>>> preliminary patches such that this patch only introduces state bits.
>>>>> Thoughts?
>>>>
>>>> (I thought/hoped we'd done all the refactoring in the 23-patch megalith
>>>> that I tossed into 5.9... :))
>>>>
>>>
>>> Heh. I'm glad to see that snowball got tossed. ;)
>>
>> :)
>>
>> --D
>>
>>> Brian
>>>
>>>> --D
>>>>
>>>>> Brian
>>>>>
>>>>>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>>>>>> index 3e97a93..9573949 100644
>>>>>> --- a/fs/xfs/libxfs/xfs_attr.h
>>>>>> +++ b/fs/xfs/libxfs/xfs_attr.h
>>>>>> @@ -74,6 +74,75 @@ struct xfs_attr_list_context {
>>>>>>   };
>>>>>>   
>>>>>>   
>>>>>> +/*
>>>>>> + * ========================================================================
>>>>>> + * Structure used to pass context around among the delayed routines.
>>>>>> + * ========================================================================
>>>>>> + */
>>>>>> +
>>>>>> +/*
>>>>>> + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
>>>>>> + * states indicate places where the function would return -EAGAIN, and then
>>>>>> + * immediately resume from after being recalled by the calling function. States
>>>>>> + * marked as a "subroutine state" indicate that they belong to a subroutine, and
>>>>>> + * so the calling function needs to pass them back to that subroutine to allow
>>>>>> + * it to finish where it left off. But they otherwise do not have a role in the
>>>>>> + * calling function other than just passing through.
>>>>>> + *
>>>>>> + * xfs_attr_remove_iter()
>>>>>> + *	  XFS_DAS_RM_SHRINK ─┐
>>>>>> + *	  (subroutine state) │
>>>>>> + *	                     └─>xfs_attr_node_removename()
>>>>>> + *	                                      │
>>>>>> + *	                                      v
>>>>>> + *	                                   need to
>>>>>> + *	                                shrink tree? ─n─┐
>>>>>> + *	                                      │         │
>>>>>> + *	                                      y         │
>>>>>> + *	                                      │         │
>>>>>> + *	                                      v         │
>>>>>> + *	                              XFS_DAS_RM_SHRINK │
>>>>>> + *	                                      │         │
>>>>>> + *	                                      v         │
>>>>>> + *	                                     done <─────┘
>>>>>> + *
>>>>>> + */
>>>>>> +
>>>>>> +/*
>>>>>> + * Enum values for xfs_delattr_context.da_state
>>>>>> + *
>>>>>> + * These values are used by delayed attribute operations to keep track  of where
>>>>>> + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
>>>>>> + * calling function to roll the transaction, and then recall the subroutine to
>>>>>> + * finish the operation.  The enum is then used by the subroutine to jump back
>>>>>> + * to where it was and resume executing where it left off.
>>>>>> + */
>>>>>> +enum xfs_delattr_state {
>>>>>> +				      /* Zero is uninitalized */
>>>>>> +	XFS_DAS_RM_SHRINK	= 1,  /* We are shrinking the tree */
>>>>>> +};
>>>>>> +
>>>>>> +/*
>>>>>> + * Defines for xfs_delattr_context.flags
>>>>>> + */
>>>>>> +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>>>>>> +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
>>>>>> +
>>>>>> +/*
>>>>>> + * Context used for keeping track of delayed attribute operations
>>>>>> + */
>>>>>> +struct xfs_delattr_context {
>>>>>> +	struct xfs_da_args      *da_args;
>>>>>> +
>>>>>> +	/* Used in xfs_attr_node_removename to roll through removing blocks */
>>>>>> +	struct xfs_da_state     *da_state;
>>>>>> +	struct xfs_da_state_blk *blk;
>>>>>> +
>>>>>> +	/* Used to keep track of current state of delayed operation */
>>>>>> +	unsigned int            flags;
>>>>>> +	enum xfs_delattr_state  dela_state;
>>>>>> +};
>>>>>> +
>>>>>>   /*========================================================================
>>>>>>    * Function prototypes for the kernel.
>>>>>>    *========================================================================*/
>>>>>> @@ -91,6 +160,10 @@ int xfs_attr_set(struct xfs_da_args *args);
>>>>>>   int xfs_attr_set_args(struct xfs_da_args *args);
>>>>>>   int xfs_has_attr(struct xfs_da_args *args);
>>>>>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>>>>>> +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>>>>>> +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
>>>>>>   bool xfs_attr_namecheck(const void *name, size_t length);
>>>>>> +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>>>>> +			      struct xfs_da_args *args);
>>>>>>   
>>>>>>   #endif	/* __XFS_ATTR_H__ */
>>>>>> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
>>>>>> index 8623c81..4ed7b31 100644
>>>>>> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
>>>>>> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
>>>>>> @@ -19,8 +19,8 @@
>>>>>>   #include "xfs_bmap_btree.h"
>>>>>>   #include "xfs_bmap.h"
>>>>>>   #include "xfs_attr_sf.h"
>>>>>> -#include "xfs_attr_remote.h"
>>>>>>   #include "xfs_attr.h"
>>>>>> +#include "xfs_attr_remote.h"
>>>>>>   #include "xfs_attr_leaf.h"
>>>>>>   #include "xfs_error.h"
>>>>>>   #include "xfs_trace.h"
>>>>>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
>>>>>> index 3f80ced..7f81b48 100644
>>>>>> --- a/fs/xfs/libxfs/xfs_attr_remote.c
>>>>>> +++ b/fs/xfs/libxfs/xfs_attr_remote.c
>>>>>> @@ -676,10 +676,14 @@ xfs_attr_rmtval_invalidate(
>>>>>>    */
>>>>>>   int
>>>>>>   xfs_attr_rmtval_remove(
>>>>>> -	struct xfs_da_args      *args)
>>>>>> +	struct xfs_da_args		*args)
>>>>>>   {
>>>>>> -	int			error;
>>>>>> -	int			retval;
>>>>>> +	xfs_dablk_t			lblkno;
>>>>>> +	int				blkcnt;
>>>>>> +	int				error;
>>>>>> +	struct xfs_delattr_context	dac  = {
>>>>>> +		.da_args	= args,
>>>>>> +	};
>>>>>>   
>>>>>>   	trace_xfs_attr_rmtval_remove(args);
>>>>>>   
>>>>>> @@ -687,19 +691,17 @@ xfs_attr_rmtval_remove(
>>>>>>   	 * Keep de-allocating extents until the remote-value region is gone.
>>>>>>   	 */
>>>>>>   	do {
>>>>>> -		retval = __xfs_attr_rmtval_remove(args);
>>>>>> -		if (retval && retval != -EAGAIN)
>>>>>> -			return retval;
>>>>>> +		error = __xfs_attr_rmtval_remove(&dac);
>>>>>> +		if (error != -EAGAIN)
>>>>>> +			break;
>>>>>>   
>>>>>> -		/*
>>>>>> -		 * Close out trans and start the next one in the chain.
>>>>>> -		 */
>>>>>> -		error = xfs_trans_roll_inode(&args->trans, args->dp);
>>>>>> +		error = xfs_attr_trans_roll(&dac);
>>>>>>   		if (error)
>>>>>>   			return error;
>>>>>> -	} while (retval == -EAGAIN);
>>>>>>   
>>>>>> -	return 0;
>>>>>> +	} while (true);
>>>>>> +
>>>>>> +	return error;
>>>>>>   }
>>>>>>   
>>>>>>   /*
>>>>>> @@ -709,9 +711,10 @@ xfs_attr_rmtval_remove(
>>>>>>    */
>>>>>>   int
>>>>>>   __xfs_attr_rmtval_remove(
>>>>>> -	struct xfs_da_args	*args)
>>>>>> +	struct xfs_delattr_context	*dac)
>>>>>>   {
>>>>>> -	int			error, done;
>>>>>> +	struct xfs_da_args		*args = dac->da_args;
>>>>>> +	int				error, done;
>>>>>>   
>>>>>>   	/*
>>>>>>   	 * Unmap value blocks for this attr.
>>>>>> @@ -721,12 +724,10 @@ __xfs_attr_rmtval_remove(
>>>>>>   	if (error)
>>>>>>   		return error;
>>>>>>   
>>>>>> -	error = xfs_defer_finish(&args->trans);
>>>>>> -	if (error)
>>>>>> -		return error;
>>>>>> -
>>>>>> -	if (!done)
>>>>>> +	if (!done) {
>>>>>> +		dac->flags |= XFS_DAC_DEFER_FINISH;
>>>>>>   		return -EAGAIN;
>>>>>> +	}
>>>>>>   
>>>>>>   	return error;
>>>>>>   }
>>>>>> diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
>>>>>> index 9eee615..002fd30 100644
>>>>>> --- a/fs/xfs/libxfs/xfs_attr_remote.h
>>>>>> +++ b/fs/xfs/libxfs/xfs_attr_remote.h
>>>>>> @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
>>>>>>   int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
>>>>>>   		xfs_buf_flags_t incore_flags);
>>>>>>   int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
>>>>>> -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
>>>>>> +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
>>>>>>   #endif /* __XFS_ATTR_REMOTE_H__ */
>>>>>> diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
>>>>>> index bfad669..aaa7e66 100644
>>>>>> --- a/fs/xfs/xfs_attr_inactive.c
>>>>>> +++ b/fs/xfs/xfs_attr_inactive.c
>>>>>> @@ -15,10 +15,10 @@
>>>>>>   #include "xfs_da_format.h"
>>>>>>   #include "xfs_da_btree.h"
>>>>>>   #include "xfs_inode.h"
>>>>>> +#include "xfs_attr.h"
>>>>>>   #include "xfs_attr_remote.h"
>>>>>>   #include "xfs_trans.h"
>>>>>>   #include "xfs_bmap.h"
>>>>>> -#include "xfs_attr.h"
>>>>>>   #include "xfs_attr_leaf.h"
>>>>>>   #include "xfs_quota.h"
>>>>>>   #include "xfs_dir2.h"
>>>>>> -- 
>>>>>> 2.7.4
>>>>>>
>>>>>
>>>>
>>>
>>
> 


* Re: [PATCH v12 1/8] xfs: Add delay ready attr remove routines
  2020-09-04 23:03             ` Allison Collins
@ 2020-09-08 14:43               ` Brian Foster
  0 siblings, 0 replies; 21+ messages in thread
From: Brian Foster @ 2020-09-08 14:43 UTC (permalink / raw)
  To: Allison Collins; +Cc: Darrick J. Wong, linux-xfs

On Fri, Sep 04, 2020 at 04:03:59PM -0700, Allison Collins wrote:
> 
> 
> On 9/2/20 5:22 AM, Brian Foster wrote:
> > On Tue, Sep 01, 2020 at 11:31:34AM -0700, Darrick J. Wong wrote:
> > > On Tue, Sep 01, 2020 at 02:07:41PM -0400, Brian Foster wrote:
> > > > On Tue, Sep 01, 2020 at 10:20:21AM -0700, Darrick J. Wong wrote:
> > > > > On Tue, Sep 01, 2020 at 01:00:20PM -0400, Brian Foster wrote:
> > > > > > On Wed, Aug 26, 2020 at 05:35:11PM -0700, Allison Collins wrote:
> > > > > > > This patch modifies the attr remove routines to be delay ready. This
> > > > > > > means they no longer roll or commit transactions, but instead return
> > > > > > > -EAGAIN to have the calling routine roll and refresh the transaction. In
> > > > > > > this series, xfs_attr_remove_args has become xfs_attr_remove_iter, which
> > > > > > > uses a sort of state machine like switch to keep track of where it was
> > > > > > > when EAGAIN was returned. xfs_attr_node_removename has also been
> > > > > > > modified to use the switch, and a new version of xfs_attr_remove_args
> > > > > > > consists of a simple loop to refresh the transaction until the operation
> > > > > > > is completed.  A new XFS_DAC_DEFER_FINISH flag is used to finish the
> > > > > > > transaction where ever the existing code used to.
> > > > > > > 
> > > > > > > Calls to xfs_attr_rmtval_remove are replaced with the delay ready
> > > > > > > version __xfs_attr_rmtval_remove. We will rename
> > > > > > > __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
> > > > > > > done.
> > > > > > > 
> > > > > > > xfs_attr_rmtval_remove itself is still in use by the set routines (used
> > > > > > > during a rename).  For reasons of perserving existing function, we
> > > > > > 
> > > > > > Nit:				preserving
> ok, will fix
> > > > > > 
> > > > > > > modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
> > > > > > > set.  Similar to how xfs_attr_remove_args does here.  Once we transition
> > > > > > > the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
> > > > > > > used and will be removed.
> > > > > > > 
> > > > > > > This patch also adds a new struct xfs_delattr_context, which we will use
> > > > > > > to keep track of the current state of an attribute operation. The new
> > > > > > > xfs_delattr_state enum is used to track various operations that are in
> > > > > > > progress so that we know not to repeat them, and resume where we left
> > > > > > > off before EAGAIN was returned to cycle out the transaction. Other
> > > > > > > members take the place of local variables that need to retain their
> > > > > > > values across multiple function recalls.  See xfs_attr.h for a more
> > > > > > > detailed diagram of the states.
> > > > > > > 
> > > > > > > Signed-off-by: Allison Collins <allison.henderson@oracle.com>
> > > > > > > ---
> > > > > > >   fs/xfs/libxfs/xfs_attr.c        | 162 ++++++++++++++++++++++++++++++----------
> > > > > > >   fs/xfs/libxfs/xfs_attr.h        |  73 ++++++++++++++++++
> > > > > > >   fs/xfs/libxfs/xfs_attr_leaf.c   |   2 +-
> > > > > > >   fs/xfs/libxfs/xfs_attr_remote.c |  39 +++++-----
> > > > > > >   fs/xfs/libxfs/xfs_attr_remote.h |   2 +-
> > > > > > >   fs/xfs/xfs_attr_inactive.c      |   2 +-
> > > > > > >   6 files changed, 220 insertions(+), 60 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > > > > index 2e055c0..ea50fc3 100644
> > > > > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> > > > > > ...
> > > > > > > @@ -264,6 +264,33 @@ xfs_attr_set_shortform(
> > > > > > >   }
> > > > > > >   /*
> > > > > > > + * Checks to see if a delayed attribute transaction should be rolled.  If so,
> > > > > > > + * also checks for a defer finish.  Transaction is finished and rolled as
> > > > > > > + * needed, and returns true or false if the delayed operation should continue.
> > > > > > > + */
> > > > > > > +int
> > > > > > > +xfs_attr_trans_roll(
> > > > > > > +	struct xfs_delattr_context	*dac)
> > > > > > > +{
> > > > > > > +	struct xfs_da_args              *args = dac->da_args;
> > > > > > > +	int				error = 0;
> > > > > > > +
> > > > > > > +	if (dac->flags & XFS_DAC_DEFER_FINISH) {
> > > > > > > +		/*
> > > > > > > +		 * The caller wants us to finish all the deferred ops so that we
> > > > > > > +		 * avoid pinning the log tail with a large number of deferred
> > > > > > > +		 * ops.
> > > > > > > +		 */
> > > > > > > +		dac->flags &= ~XFS_DAC_DEFER_FINISH;
> > > > > > > +		error = xfs_defer_finish(&args->trans);
> > > > > > > +		if (error)
> > > > > > > +			return error;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	return xfs_trans_roll_inode(&args->trans, args->dp);
> > > > > > 
> > > > > > I'm not sure there's a need to roll the transaction again if the
> > > > > > defer path above executes. xfs_defer_finish() completes the dfops and
> > > > > > always returns a clean transaction.
> > > > > 
> > > > > I'm not sure we even really need a DEFER_FINISH flag if (a) xfs_defer.c
> > > > > gets patched to finish all the other defer items before coming back to
> > > > > the next step of the delattr state machine and (b) Allison removes the
> > > > > _iter functions in favor of using the defer op mechanism even when we're
> > > > > not pushing the state changes through the log.
> > > > > 
> > > > 
> > > > What do you mean by using the dfops mechanism without pushing state
> > > > changes through the log? My understanding was that dfops would be
> > > > involved with the new intent based attr ops and the state management
> > > > handles the original ops until we no longer have to support them..
> > > 
> > > I think you were probably still out when Dave and Allison and I had the
> > > brain fart^Wstorm that nothing in the defer ops code actually requires
> > > you to log anything, which means that you can use it to manage a long
> > > running operation that spans multiple transaction rolls! :)
> > > 
> > 
> > Ok..
> > 
> > > ->create_intent and ->create_done are supposed to create log items and
> > > attach them to the transaction, but the defer finish loop will still
> > > call ->finish_item even if they return NULL pointers.  If the
> > > finish_item call steps around the null pointers and calls whatever upper
> > > level functions are needed to make progress, that works fine.  There's
> > > no log recovery, obviously.
> > > 
> > > In other words, we can (ab)use defer ops for attr set/remove even in the
> > > non-logged case, which eliminates the need for the separate control
> > > loop.
> > > 
> > 
> > Right, that all makes sense. I'm still missing how this impacts the
> > lower level functional code driven by the control loop...
> > 
> > > FWIW, I've implemented that strategy as a proof of concept for extent
> > > swapping:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=atomic-file-updates&id=a85883c36e2f3eff50db50fcf58a71d4f13d1f64
> > > 
> > > Wherein you get atomic swapext if you have the log items enabled, and
> > > if not, you get the old "rmap swapext" that doesn't have log tracking.
> > > 
> > 
> > Interesting, thanks. The whole dfops reuse idea sounds neat to me in
> > that we can presumably condense the new/old implementations even further
> > than originally expected, but I think this side steps the concern
> > related to my initial comment around refactoring. AFAICT this model
> > doesn't necessarily dictate what the underlying code looks like. In the
> > example above, it looks like the swapext code reenters into a
> > xfs_swapext_finish_one() function that trivially understands how to pick
> > up where it left off. This is a fortunate implementation detail of the
> > swapext operation (along with the whole notion of the
> > xfs_op_has_more_work() pattern, which as we've already touched on can be
> > difficult for things like xattr set, etc.).
> > 
> > By contrast, the xattr code is currently a ball of wire that rolls
> > transactions at various points up and down its implementation (generally
> > speaking). The primary intent of all this refactoring work is to isolate
> > the transaction rolling to a single mechanism so we have the ability to
> > use something like dfops in the first place. I don't see how the
> > insertion of unlogged dfops in the design really changes much in that
> > regard. Is there more to the previous discussion that I'm missing?
> > 
> > ISTM that we're potentially talking about different aspects of the
> > implementation. If so, we either need to continue to refactor the xattr
> > code to untangle the existing mess so it can be driven by a single entry
> > point (just like the swapext example), or that retrofitting the existing
> > implementation into the dfops mechanism means something more involved
> > like creating new dfops op types per sub-component of a particular xattr
> > op and queueing/running those individually. Though TBH, the latter
> > sounds like it is getting a bit into crazy infrastructure territory. ;P
> > Thoughts?
> > 
> > Brian
> 
> Yeah, I'll try some experimenting to see what that ends up looking like.
> I've looked at the swap extent code from the link above, and I think I
> understand now what Darrick is describing with the reuse/abuse of the dfops
> mechanics.  We modify the *_create_intent routine to return NULL to skip
> recording it to the log. I THINK this should still work because the state
> machine context is carried around in the xfs_attr_item, not the
> xfs_attri_log_item.  So we might be able to make it work without too
> much crazy.
> 

Just FWIW, while I think the dfops thing makes sense I wouldn't put it
too far ahead of the refactoring effort. It's not immediately clear to
me whether the dfops thing required further dfops changes and/or we
wanted the isolated swapext implementation to land and get some soak
time first. I think the factoring of the underlying xattr implementation
has a chance to generate more opinions/feedback and will probably result
in more patches to review before the whole mechanism lands. The positive
is that I think once the underlying code is cleaned up, the higher level
delattr changes should be much more straightforward...

Brian

> 
> 
> > 
> > > > > (I'm working on (a) still, will have something in a few days...)
> > > > > 
> > > > > > > +}
> > > > > > > +
> > > > > > > +/*
> > > > > > >    * Set the attribute specified in @args.
> > > > > > >    */
> > > > > > >   int
> > > > > > ...
> > > > > > > @@ -1218,21 +1288,35 @@ xfs_attr_node_remove_rmt(
> > > > > > >    * This will involve walking down the Btree, and may involve joining
> > > > > > >    * leaf nodes and even joining intermediate nodes up to and including
> > > > > > >    * the root node (a special case of an intermediate node).
> > > > > > > + *
> > > > > > > + * This routine is meant to function as either an inline or delayed operation,
> > > > > > > + * and may return -EAGAIN when the transaction needs to be rolled.  Calling
> > > > > > > + * functions will need to handle this, and recall the function until a
> > > > > > > + * successful error code is returned.
> > > > > > >    */
> > > > > > >   STATIC int
> > > > > > >   xfs_attr_node_removename(
> > > > > > > -	struct xfs_da_args	*args)
> > > > > > > +	struct xfs_delattr_context	*dac)
> > > > > > >   {
> > > > > > > -	struct xfs_da_state	*state;
> > > > > > > -	struct xfs_da_state_blk	*blk;
> > > > > > > -	int			retval, error;
> > > > > > > -	struct xfs_inode	*dp = args->dp;
> > > > > > > +	struct xfs_da_args		*args = dac->da_args;
> > > > > > > +	struct xfs_da_state		*state;
> > > > > > > +	struct xfs_da_state_blk		*blk;
> > > > > > > +	int				retval, error;
> > > > > > > +	struct xfs_inode		*dp = args->dp;
> > > > > > >   	trace_xfs_attr_node_removename(args);
> > > > > > > +	state = dac->da_state;
> > > > > > > +	blk = dac->blk;
> > > > > > > -	error = xfs_attr_node_removename_setup(args, &state);
> > > > > > > -	if (error)
> > > > > > > -		goto out;
> > > > > > > +	if (dac->dela_state == XFS_DAS_RM_SHRINK)
> > > > > > > +		goto das_rm_shrink;
> > > > > > > +
> > > > > > > +	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > > > > > > +		dac->flags |= XFS_DAC_NODE_RMVNAME_INIT;
> > > > > > > +		error = xfs_attr_node_removename_setup(dac, &state);
> > > > > > > +		if (error)
> > > > > > > +			goto out;
> > > > > > > +	}
> > > > > > >   	/*
> > > > > > >   	 * If there is an out-of-line value, de-allocate the blocks.
> > > > > > > @@ -1240,8 +1324,13 @@ xfs_attr_node_removename(
> > > > > > >   	 * overflow the maximum size of a transaction and/or hit a deadlock.
> > > > > > >   	 */
> > > > > > >   	if (args->rmtblkno > 0) {
> > > > > > > -		error = xfs_attr_node_remove_rmt(args, state);
> > > > > > > -		if (error)
> > > > > > > +		/*
> > > > > > > +		 * May return -EAGAIN. Remove blocks until args->rmtblkno == 0
> > > > > > > +		 */
> > > > > > > +		error = xfs_attr_node_remove_rmt(dac, state);
> > > > > > > +		if (error == -EAGAIN)
> > > > > > > +			return error;
> > > > > > > +		else if (error)
> > > > > > >   			goto out;
> > > > > > >   	}
> > > > > > > @@ -1260,17 +1349,14 @@ xfs_attr_node_removename(
> > > > > > >   		error = xfs_da3_join(state);
> > > > > > >   		if (error)
> > > > > > >   			goto out;
> > > > > > > -		error = xfs_defer_finish(&args->trans);
> > > > > > > -		if (error)
> > > > > > > -			goto out;
> > > > > > > -		/*
> > > > > > > -		 * Commit the Btree join operation and start a new trans.
> > > > > > > -		 */
> > > > > > > -		error = xfs_trans_roll_inode(&args->trans, dp);
> > > > > > > -		if (error)
> > > > > > > -			goto out;
> > > > > > > +
> > > > > > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > > > > > > +		dac->dela_state = XFS_DAS_RM_SHRINK;
> > > > > > > +		return -EAGAIN;
> > > > > > >   	}
> > > > > > > +das_rm_shrink:
> > > > > > > +
> > > > > > >   	/*
> > > > > > >   	 * If the result is small enough, push it all into the inode.
> > > > > > >   	 */
> > > > > > 
> > > > > > ISTR that Dave or Darrick previously suggested that we should try to
> > > > > > isolate the state transition code as much as possible to a single
> > > > > > location. That basically means we should look at any place a particular
> > > > > > state check travels through multiple functions and see if we can
> > > > > > refactor things to flatten the state processing code. I tend to agree
> > > > > > that is the ideal approach given how difficult it can be to track state
> > > > > > changes through multiple functions.
> > > > > 
> > > > > Yes. :)
> > > > > 
> > > > > > In light of that (and as an example), I think the whole
> > > > > > xfs_attr_node_removename() path should be refactored so it looks
> > > > > > something like the following (with obvious error
> > > > > > handling/comment/aesthetic cleanups etc.):
> > > > > > 
> > > > > > xfs_attr_node_removename_iter()
> > > > > > {
> > > > > > 	...
> > > > > > 
> > > > > > 	if ((dac->flags & XFS_DAC_NODE_RMVNAME_INIT) == 0) {
> > > > > > 		<do init stuff>
> > > > > > 	}
> > > > > > 
> > > > > > 	switch (dac->dela_state) {
> > > > > > 	case 0:
> > > > > 
> > > > > I kinda wish "0" had its own name, but I don't also want to start
> > > > > another round of naming bikeshed. :)
> > > > > 
> > > > > > 		/*
> > > > > > 		 * repeatedly remove remote blocks, remove the entry and
> > > > > > 		 * join. returns -EAGAIN or 0 for completion of the step.
> > > > > > 		 */
> > > > > > 		error = xfs_attr_node_remove_step(dac, state);
> > > > > > 		if (error)
> > > > > > 			break;
> > > > > > 
> > > > > > 		/* check whether to shrink or return success */
> > > > > > 		if (!error && xfs_bmap_one_block(...)) {
> > > > > > 			dac->dela_state = XFS_DAS_RM_SHRINK;
> > > > > > 			error = -EAGAIN;
> > > > > > 		}
> > > > > > 		break;
> > > > > > 	case XFS_DAS_RM_SHRINK:
> > > > > > 		/* shrink the fork, no reentry, no next step */
> > > > > > 		error = xfs_attr_node_shrink_step(args, state);	
> > > > > > 		break;
> > > > > 
> > > > > <nod> The ASCII art diagrams help assuage my nerves about the fact that
> > > > > we branch based on dela_state but not all the branches actually show us
> > > > > moving to the next state.
> > > > > 
> > > > > I've gotten the distinct sense, though, that throwing the new state all
> > > > > the way back up to _iter() to set it is probably a lot more fuss than
> > > > > it's worth for the attr set case, though...
> > > > > 
> > > > 
> > > > That's quite possible. :P
> Sure, I will see if I can get something similar to this worked out, at least
> for the remove path.  But yes, the set path would be a bit more of a
> challenge.
> 
> Thanks all!
> 
> Allison
> 
> > > > 
> > > > > > 	default:
> > > > > > 		ASSERT(0);
> > > > > > 		return -EINVAL;
> > > > > > 	}
> > > > > > 
> > > > > > 	if (error == -EAGAIN)
> > > > > > 		return error;
> > > > > > 
> > > > > > 	<do cleanup stuff>
> > > > > > 	...
> > > > > > 	return error;
> > > > > > }
> > > > > > 
> > > > > > The idea here is that we have one _iter() function that does all the
> > > > > > state management for a particular operation and has minimal other logic.
> > > > > > That way we can see the states that repeat, transition, etc. all in one
> > > > > > > place. The _step() functions implement the functional components of each
> > > > > > > state and do no state management whatsoever beyond returning -EAGAIN to
> > > > > > > request reentry or returning 0 for completion. In the case of the latter,
> > > > > > > the _iter() function decides whether to transition to another state
> > > > > > > (returning -EAGAIN itself) or complete the operation. If a _step()
> > > > > > > function ever needs to set or check ->dela_state, then that is a clear
> > > > > > > indication it must be broken up into multiple _step() functions.
> > > > > 
> > > > > ...because I've frequently had the same thought that the state machine
> > > > > handling ought to be in the same place.  But then I start reading
> > > > > through the xattr code to figure out how that would be done, and get
> > > > > trapped by the fact that some of the decisions about the next state have
> > > > > to happen pretty deep in the xattr code-- stuff like allocating an
> > > > > extent for a remote value, where depending on whether or not we got enough
> > > > > blocks to satisfy the space requirements, either we can move on to the
> > > > > next state and return EAGAIN, or we have to save the current state and
> > > > > EAGAIN to try to get more blocks.
> > > > > 
> > > > 
> > > > I haven't walked through the set code in a while, but this sort of
> > > > sounds like more of the same (heavy refactoring followed by insertion of
> > > > state management).
> > > > 
> > > > > Maybe it would help a little if the setting of DEFER_FINISH and changing
> > > > > of dela_state could be put into a little helper with a tracepoint so
> > > > > that future us can ftrace the state machine to make sure it's working
> > > > > correctly?
> > > > > 
> > > > 
> > > > I like the idea, but not sure it helps with following the code as much
> > > > as runtime analysis.
> > > 
> > > <nod>
> > > 
> > > > > > I think this implements the separation of state and functionality model
> > > > > > we're after without introduction of crazy state processing frameworks,
> > > > > 
> > > > > "crazy state processing frameworks"... like xfs_defer.c? :)
> > > > > 
> > > > 
> > > > Re: my question above, I'm curious about reusing dfops as a mechanism
> > > > for both modes if somebody can elaborate on the idea or point me at a
> > > > reference where it was previously discussed..? I could have lost track
> > > > or missed a discussion while I was out...
> > > 
> > > (See above...)
> > > 
> > > > > > etc., but I admit I've so far only thought about it wrt the remove case
> > > > > > (which is more simple than the set case). Also note that as usual, any
> > > > > > associated refactoring of the functional components should come as
> > > > > > preliminary patches such that this patch only introduces state bits.
> > > > > > Thoughts?
> > > > > 
> > > > > (I thought/hoped we'd done all the refactoring in the 23-patch megalith
> > > > > that I tossed into 5.9... :))
> > > > > 
> > > > 
> > > > Heh. I'm glad to see that snowball got tossed. ;)
> > > 
> > > :)
> > > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > > > --D
> > > > > 
> > > > > > Brian
> > > > > > 
> > > > > > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > > > > > index 3e97a93..9573949 100644
> > > > > > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > > > > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > > > > > @@ -74,6 +74,75 @@ struct xfs_attr_list_context {
> > > > > > >   };
> > > > > > > +/*
> > > > > > > + * ========================================================================
> > > > > > > + * Structure used to pass context around among the delayed routines.
> > > > > > > + * ========================================================================
> > > > > > > + */
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Below is a state machine diagram for attr remove operations. The  XFS_DAS_*
> > > > > > > + * states indicate places where the function would return -EAGAIN, and then
> > > > > > > + * immediately resume from after being recalled by the calling function. States
> > > > > > > + * marked as a "subroutine state" indicate that they belong to a subroutine, and
> > > > > > > + * so the calling function needs to pass them back to that subroutine to allow
> > > > > > > + * it to finish where it left off. But they otherwise do not have a role in the
> > > > > > > + * calling function other than just passing through.
> > > > > > > + *
> > > > > > > + * xfs_attr_remove_iter()
> > > > > > > + *	  XFS_DAS_RM_SHRINK ─┐
> > > > > > > + *	  (subroutine state) │
> > > > > > > + *	                     └─>xfs_attr_node_removename()
> > > > > > > + *	                                      │
> > > > > > > + *	                                      v
> > > > > > > + *	                                   need to
> > > > > > > + *	                                shrink tree? ─n─┐
> > > > > > > + *	                                      │         │
> > > > > > > + *	                                      y         │
> > > > > > > + *	                                      │         │
> > > > > > > + *	                                      v         │
> > > > > > > + *	                              XFS_DAS_RM_SHRINK │
> > > > > > > + *	                                      │         │
> > > > > > > + *	                                      v         │
> > > > > > > + *	                                     done <─────┘
> > > > > > > + *
> > > > > > > + */
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Enum values for xfs_delattr_context.dela_state
> > > > > > > + *
> > > > > > > + * These values are used by delayed attribute operations to keep track  of where
> > > > > > > + * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
> > > > > > > + * calling function to roll the transaction, and then recall the subroutine to
> > > > > > > + * finish the operation.  The enum is then used by the subroutine to jump back
> > > > > > > + * to where it was and resume executing where it left off.
> > > > > > > + */
> > > > > > > +enum xfs_delattr_state {
> > > > > > > +				      /* Zero is uninitialized */
> > > > > > > +	XFS_DAS_RM_SHRINK	= 1,  /* We are shrinking the tree */
> > > > > > > +};
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Defines for xfs_delattr_context.flags
> > > > > > > + */
> > > > > > > +#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
> > > > > > > +#define XFS_DAC_NODE_RMVNAME_INIT	0x02 /* xfs_attr_node_removename init */
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Context used for keeping track of delayed attribute operations
> > > > > > > + */
> > > > > > > +struct xfs_delattr_context {
> > > > > > > +	struct xfs_da_args      *da_args;
> > > > > > > +
> > > > > > > +	/* Used in xfs_attr_node_removename to roll through removing blocks */
> > > > > > > +	struct xfs_da_state     *da_state;
> > > > > > > +	struct xfs_da_state_blk *blk;
> > > > > > > +
> > > > > > > +	/* Used to keep track of current state of delayed operation */
> > > > > > > +	unsigned int            flags;
> > > > > > > +	enum xfs_delattr_state  dela_state;
> > > > > > > +};
> > > > > > > +
> > > > > > >   /*========================================================================
> > > > > > >    * Function prototypes for the kernel.
> > > > > > >    *========================================================================*/
> > > > > > > @@ -91,6 +160,10 @@ int xfs_attr_set(struct xfs_da_args *args);
> > > > > > >   int xfs_attr_set_args(struct xfs_da_args *args);
> > > > > > >   int xfs_has_attr(struct xfs_da_args *args);
> > > > > > >   int xfs_attr_remove_args(struct xfs_da_args *args);
> > > > > > > +int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
> > > > > > > +int xfs_attr_trans_roll(struct xfs_delattr_context *dac);
> > > > > > >   bool xfs_attr_namecheck(const void *name, size_t length);
> > > > > > > +void xfs_delattr_context_init(struct xfs_delattr_context *dac,
> > > > > > > +			      struct xfs_da_args *args);
> > > > > > >   #endif	/* __XFS_ATTR_H__ */
> > > > > > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > > > > index 8623c81..4ed7b31 100644
> > > > > > > --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > > > > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > > > > @@ -19,8 +19,8 @@
> > > > > > >   #include "xfs_bmap_btree.h"
> > > > > > >   #include "xfs_bmap.h"
> > > > > > >   #include "xfs_attr_sf.h"
> > > > > > > -#include "xfs_attr_remote.h"
> > > > > > >   #include "xfs_attr.h"
> > > > > > > +#include "xfs_attr_remote.h"
> > > > > > >   #include "xfs_attr_leaf.h"
> > > > > > >   #include "xfs_error.h"
> > > > > > >   #include "xfs_trace.h"
> > > > > > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
> > > > > > > index 3f80ced..7f81b48 100644
> > > > > > > --- a/fs/xfs/libxfs/xfs_attr_remote.c
> > > > > > > +++ b/fs/xfs/libxfs/xfs_attr_remote.c
> > > > > > > @@ -676,10 +676,14 @@ xfs_attr_rmtval_invalidate(
> > > > > > >    */
> > > > > > >   int
> > > > > > >   xfs_attr_rmtval_remove(
> > > > > > > -	struct xfs_da_args      *args)
> > > > > > > +	struct xfs_da_args		*args)
> > > > > > >   {
> > > > > > > -	int			error;
> > > > > > > -	int			retval;
> > > > > > > +	xfs_dablk_t			lblkno;
> > > > > > > +	int				blkcnt;
> > > > > > > +	int				error;
> > > > > > > +	struct xfs_delattr_context	dac  = {
> > > > > > > +		.da_args	= args,
> > > > > > > +	};
> > > > > > >   	trace_xfs_attr_rmtval_remove(args);
> > > > > > > @@ -687,19 +691,17 @@ xfs_attr_rmtval_remove(
> > > > > > >   	 * Keep de-allocating extents until the remote-value region is gone.
> > > > > > >   	 */
> > > > > > >   	do {
> > > > > > > -		retval = __xfs_attr_rmtval_remove(args);
> > > > > > > -		if (retval && retval != -EAGAIN)
> > > > > > > -			return retval;
> > > > > > > +		error = __xfs_attr_rmtval_remove(&dac);
> > > > > > > +		if (error != -EAGAIN)
> > > > > > > +			break;
> > > > > > > -		/*
> > > > > > > -		 * Close out trans and start the next one in the chain.
> > > > > > > -		 */
> > > > > > > -		error = xfs_trans_roll_inode(&args->trans, args->dp);
> > > > > > > +		error = xfs_attr_trans_roll(&dac);
> > > > > > >   		if (error)
> > > > > > >   			return error;
> > > > > > > -	} while (retval == -EAGAIN);
> > > > > > > -	return 0;
> > > > > > > +	} while (true);
> > > > > > > +
> > > > > > > +	return error;
> > > > > > >   }
> > > > > > >   /*
> > > > > > > @@ -709,9 +711,10 @@ xfs_attr_rmtval_remove(
> > > > > > >    */
> > > > > > >   int
> > > > > > >   __xfs_attr_rmtval_remove(
> > > > > > > -	struct xfs_da_args	*args)
> > > > > > > +	struct xfs_delattr_context	*dac)
> > > > > > >   {
> > > > > > > -	int			error, done;
> > > > > > > +	struct xfs_da_args		*args = dac->da_args;
> > > > > > > +	int				error, done;
> > > > > > >   	/*
> > > > > > >   	 * Unmap value blocks for this attr.
> > > > > > > @@ -721,12 +724,10 @@ __xfs_attr_rmtval_remove(
> > > > > > >   	if (error)
> > > > > > >   		return error;
> > > > > > > -	error = xfs_defer_finish(&args->trans);
> > > > > > > -	if (error)
> > > > > > > -		return error;
> > > > > > > -
> > > > > > > -	if (!done)
> > > > > > > +	if (!done) {
> > > > > > > +		dac->flags |= XFS_DAC_DEFER_FINISH;
> > > > > > >   		return -EAGAIN;
> > > > > > > +	}
> > > > > > >   	return error;
> > > > > > >   }
> > > > > > > diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
> > > > > > > index 9eee615..002fd30 100644
> > > > > > > --- a/fs/xfs/libxfs/xfs_attr_remote.h
> > > > > > > +++ b/fs/xfs/libxfs/xfs_attr_remote.h
> > > > > > > @@ -14,5 +14,5 @@ int xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > > > > > >   int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
> > > > > > >   		xfs_buf_flags_t incore_flags);
> > > > > > >   int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
> > > > > > > -int __xfs_attr_rmtval_remove(struct xfs_da_args *args);
> > > > > > > +int __xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
> > > > > > >   #endif /* __XFS_ATTR_REMOTE_H__ */
> > > > > > > diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c
> > > > > > > index bfad669..aaa7e66 100644
> > > > > > > --- a/fs/xfs/xfs_attr_inactive.c
> > > > > > > +++ b/fs/xfs/xfs_attr_inactive.c
> > > > > > > @@ -15,10 +15,10 @@
> > > > > > >   #include "xfs_da_format.h"
> > > > > > >   #include "xfs_da_btree.h"
> > > > > > >   #include "xfs_inode.h"
> > > > > > > +#include "xfs_attr.h"
> > > > > > >   #include "xfs_attr_remote.h"
> > > > > > >   #include "xfs_trans.h"
> > > > > > >   #include "xfs_bmap.h"
> > > > > > > -#include "xfs_attr.h"
> > > > > > >   #include "xfs_attr_leaf.h"
> > > > > > >   #include "xfs_quota.h"
> > > > > > >   #include "xfs_dir2.h"
> > > > > > > -- 
> > > > > > > 2.7.4
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 



end of thread, other threads:[~2020-09-08 20:16 UTC | newest]

Thread overview: 21+ messages
2020-08-27  0:35 [PATCH v12 0/8] xfs: Delayed Attributes Allison Collins
2020-08-27  0:35 ` [PATCH v12 1/8] xfs: Add delay ready attr remove routines Allison Collins
2020-09-01 17:00   ` Brian Foster
2020-09-01 17:20     ` Darrick J. Wong
2020-09-01 18:07       ` Brian Foster
2020-09-01 18:31         ` Darrick J. Wong
2020-09-02 12:22           ` Brian Foster
2020-09-04 23:03             ` Allison Collins
2020-09-08 14:43               ` Brian Foster
2020-08-27  0:35 ` [PATCH v12 2/8] xfs: Add delay ready attr set routines Allison Collins
2020-08-27  0:35 ` [PATCH v12 3/8] xfs: Rename __xfs_attr_rmtval_remove Allison Collins
2020-08-27  0:35 ` [PATCH v12 4/8] xfs: Set up infastructure for deferred attribute operations Allison Collins
2020-08-28 21:27   ` Darrick J. Wong
2020-09-02  0:46     ` Allison Collins
2020-09-02  2:33       ` Allison Collins
2020-08-27  0:35 ` [PATCH v12 5/8] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred Allison Collins
2020-08-27  0:35 ` [PATCH v12 6/8] xfs: Add feature bit XFS_SB_FEAT_INCOMPAT_LOG_DELATTR Allison Collins
2020-08-27  0:35 ` [PATCH v12 7/8] xfs: Enable delayed attributes Allison Collins
2020-08-27  0:35 ` [PATCH v12 8/8] xfs_io: Add delayed attributes error tag Allison Collins
2020-08-28 16:02   ` Darrick J. Wong
2020-08-28 18:00     ` Allison Collins
