linux-xfs.vger.kernel.org archive mirror
* [PATCH v24 00/11] Log Attribute Replay
@ 2021-08-24 22:44 Allison Henderson
  2021-08-24 22:44 ` [PATCH v24 01/11] xfs: Return from xfs_attr_set_iter if there are no more rmtblks to process Allison Henderson
                   ` (12 more replies)
  0 siblings, 13 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-24 22:44 UTC (permalink / raw)
  To: linux-xfs

Hi all,

This set is a subset of a larger series for parent pointers. Delayed attributes
allow attribute operations (set and remove) to be logged and committed in the
same way that other delayed operations are. This allows more complex operations
(like parent pointers) to be broken up into multiple smaller transactions. To do
this, the existing attr operations must be modified to operate as delayed
operations.  This means that they cannot roll, commit, or finish transactions.
Instead, they return -EAGAIN to allow the calling function to handle the
transaction.  In this series, we focus on only the delayed attribute portion.
We will introduce parent pointers in a later set.
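
For readers new to the pattern, the -EAGAIN convention described above can be
sketched in user space roughly as follows. This is an illustration only, not
code from this series: the demo_* names and the three states are hypothetical,
and the "roll" is just a counter standing in for rolling a real transaction.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical states for a multi-step operation (not the XFS_DAS_* set). */
enum demo_state { DEMO_START, DEMO_ALLOC, DEMO_WRITE, DEMO_DONE };

struct demo_ctx {
	enum demo_state	state;	/* where to resume on the next call */
	int		rolls;	/* transactions rolled by the caller */
};

/*
 * One step of the operation.  It never rolls or commits a transaction
 * itself; it returns -EAGAIN when the caller should do so and call again.
 */
int demo_op_iter(struct demo_ctx *ctx)
{
	switch (ctx->state) {
	case DEMO_START:
		ctx->state = DEMO_ALLOC;
		return -EAGAIN;
	case DEMO_ALLOC:
		ctx->state = DEMO_WRITE;
		return -EAGAIN;
	case DEMO_WRITE:
		ctx->state = DEMO_DONE;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Caller side: owns the transaction and rolls it between steps. */
int demo_op_run(struct demo_ctx *ctx)
{
	int error;

	assert(ctx != NULL);
	do {
		error = demo_op_iter(ctx);
		if (error == -EAGAIN)
			ctx->rolls++;	/* stand-in for rolling the transaction */
	} while (error == -EAGAIN);

	return error;
}
```

Keeping the resume point in the context is what lets the same step function be
driven either by a simple loop like this one or by a deferred-operation
framework that decides for itself when to call back in.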

The set as a whole is a bit much to digest at once, so I usually send out the
smaller sub series to reduce reviewer burn out.  But the entire extended series
is visible through the included github links.

Updates since v23:

xfs: Return from xfs_attr_set_iter if there are no more rmtblks to process
   Fixed xfs/125 fail

xfs: Capture buffers for delayed ops
   NEW

xfs: Set up infrastructure for deferred attribute operations
   Commit message updated with new name
   Removed hasdelattr() from xfs_attri_validate

xfs: Implement attr logging and replay
   Fixed xfs_sb_version_hasdelattr to use sb_features_log_incompat
   Removed redundant flag set in xfs_trans_attr_finish_update
   Renamed XFS_SB_FEAT_INCOMPAT_LOG_DELATTR to XFS_SB_FEAT_INCOMPAT_LOG_XATTRS
   Renamed xfs_sb_version_hasdelattr sb_version_haslogxattrs
   Removed unneeded xfs_qm_dqattach_locked in xfs_trans_attr_finish_update
   Changed xfs_hasdelattr to sb_version_haslogxattrs in xfs_attr_create_intent
   Simplified args pointer in xfs_attri_item_recover
   Removed unneeded buffer rejoin logic

RFC xfs: Skip flip flags for delayed attrs
   Fixed xfs/125 fail

 xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
   added delayed sample to avoid potential race when enabling feature 
   moved warn_once to after error check

xfs: Remove unused xfs_attr_*_args
   Removed local variable leaf_bp

xfs: Add log attribute error tag
   Renamed error tag to ERRTAG_LARP

xfs: Add larp debug option
   NEW

xfs: Merge xfs_delattr_context into xfs_attr_item
   Rebase adjustments

xfs: Add helper function xfs_attr_leaf_addname
    Rebase adjustments

This series can be viewed on github here:
https://github.com/allisonhenderson/xfs_work/tree/delayed_attrs_v24

As well as the extended delayed attribute and parent pointer series:
https://github.com/allisonhenderson/xfs_work/tree/delayed_attrs_v24_extended

And the test cases:
https://github.com/allisonhenderson/xfs_work/tree/pptr_xfstestsv4
In order to run the test cases, you will also need the corresponding xfsprogs
changes, which can be found here:
https://github.com/allisonhenderson/xfs_work/tree/delayed_attrs_xfsprogs_v24
https://github.com/allisonhenderson/xfs_work/tree/delayed_attrs_xfsprogs_v24_extended

To run the xfs attributes tests run:
check -g attr

To run as delayed attributes run:
echo 1 > /sys/fs/xfs/debug/larp;
check -g attr

To run parent pointer tests:
check -g parent

I've also made the corresponding updates to the user space side, and ported
anything they need to seat correctly.

Questions, comments and feedback appreciated!

Allison

Allison Collins (1):
  xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred

Allison Henderson (10):
  xfs: Return from xfs_attr_set_iter if there are no more rmtblks to
    process
  xfs: Capture buffers for delayed ops
  xfs: Set up infrastructure for log atrribute replay
  xfs: Implement attr logging and replay
  RFC xfs: Skip flip flags for delayed attrs
  xfs: Remove unused xfs_attr_*_args
  xfs: Add log attribute error tag
  xfs: Add larp debug option
  xfs: Merge xfs_delattr_context into xfs_attr_item
  xfs: Add helper function xfs_attr_leaf_addname

 fs/xfs/Makefile                 |   1 +
 fs/xfs/libxfs/xfs_attr.c        | 454 +++++++++---------
 fs/xfs/libxfs/xfs_attr.h        |  57 ++-
 fs/xfs/libxfs/xfs_attr_leaf.c   |   3 +-
 fs/xfs/libxfs/xfs_attr_remote.c |  37 +-
 fs/xfs/libxfs/xfs_attr_remote.h |   6 +-
 fs/xfs/libxfs/xfs_defer.c       |   8 +-
 fs/xfs/libxfs/xfs_defer.h       |   7 +-
 fs/xfs/libxfs/xfs_errortag.h    |   4 +-
 fs/xfs/libxfs/xfs_format.h      |  10 +-
 fs/xfs/libxfs/xfs_log_format.h  |  44 +-
 fs/xfs/libxfs/xfs_log_recover.h |   2 +
 fs/xfs/scrub/common.c           |   2 +
 fs/xfs/xfs_attr_item.c          | 814 ++++++++++++++++++++++++++++++++
 fs/xfs/xfs_attr_item.h          |  52 ++
 fs/xfs/xfs_attr_list.c          |   1 +
 fs/xfs/xfs_bmap_item.c          |   2 +-
 fs/xfs/xfs_buf.c                |   1 +
 fs/xfs/xfs_buf.h                |   1 +
 fs/xfs/xfs_error.c              |   3 +
 fs/xfs/xfs_extfree_item.c       |   2 +-
 fs/xfs/xfs_globals.c            |   1 +
 fs/xfs/xfs_ioctl32.c            |   2 +
 fs/xfs/xfs_iops.c               |   2 +
 fs/xfs/xfs_log.c                |  45 ++
 fs/xfs/xfs_log.h                |   1 +
 fs/xfs/xfs_log_recover.c        |   9 +
 fs/xfs/xfs_ondisk.h             |   2 +
 fs/xfs/xfs_refcount_item.c      |   2 +-
 fs/xfs/xfs_rmap_item.c          |   2 +-
 fs/xfs/xfs_sysctl.h             |   1 +
 fs/xfs/xfs_sysfs.c              |  24 +
 fs/xfs/xfs_trace.h              |   1 +
 33 files changed, 1333 insertions(+), 270 deletions(-)
 create mode 100644 fs/xfs/xfs_attr_item.c
 create mode 100644 fs/xfs/xfs_attr_item.h

-- 
2.25.1



* [PATCH v24 01/11] xfs: Return from xfs_attr_set_iter if there are no more rmtblks to process
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
@ 2021-08-24 22:44 ` Allison Henderson
  2021-08-24 22:44 ` [PATCH v24 02/11] xfs: Capture buffers for delayed ops Allison Henderson
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-24 22:44 UTC (permalink / raw)
  To: linux-xfs

During an attr rename operation, blocks are saved for later removal
as rmtblkno2. The rmtblkno is used when more blocks must be allocated
because not enough were available.  However, in the case
that no further blocks need to be added or removed, we can return as soon
as xfs_attr_node_addname completes, rather than rolling the transaction
with an -EAGAIN return.  This extra loop does not hurt anything right
now, but it will be a problem later when we get into log items because
we end up with an empty log transaction.  So, add a simple check to
cut out the unneeded iteration.
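
To make the effect concrete, here is a small user-space sketch (an
illustration, not the kernel code) that counts how many times a caller has to
roll the transaction with and without the early return. The demo_* names and
the addname_done flag are hypothetical; DEMO_OP_RENAME stands in for
XFS_DA_OP_RENAME.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_OP_RENAME	0x1	/* hypothetical stand-in for XFS_DA_OP_RENAME */

struct demo_args {
	unsigned int	rmtblkno;	/* nonzero: remote blocks still to handle */
	unsigned int	op_flags;
	bool		addname_done;	/* has the addname step already run? */
};

/* Returns 0 when finished, -EAGAIN when the caller must roll and call again. */
int demo_set_iter(struct demo_args *args, bool early_return)
{
	assert(args != NULL);

	if (!args->addname_done) {
		args->addname_done = true;	/* the step that dirties the transaction */
		if (early_return && !args->rmtblkno &&
		    !(args->op_flags & DEMO_OP_RENAME))
			return 0;		/* nothing left to alloc or remove */
		return -EAGAIN;
	}

	/* Follow-up pass: remote block or rename handling would go here. */
	return 0;
}

/* Count how many times the caller has to roll the transaction. */
int demo_count_rolls(struct demo_args args, bool early_return)
{
	int rolls = 0;

	while (demo_set_iter(&args, early_return) == -EAGAIN)
		rolls++;
	return rolls;
}
```

For a plain add with no remote blocks, the version without the check needs one
extra roll whose follow-up pass does no work; with the check that roll goes
away, while rename and remote-block cases still get their second pass.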

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index fbc9d816882c..50b91b4461e7 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -412,6 +412,14 @@ xfs_attr_set_iter(
 			if (error)
 				return error;
 
+			/*
+			 * If addname was successful, and we dont need to alloc
+			 * or remove anymore blks, we're done.
+			 */
+			if (!args->rmtblkno &&
+			    !(args->op_flags & XFS_DA_OP_RENAME))
+				return 0;
+
 			dac->dela_state = XFS_DAS_FOUND_NBLK;
 		}
 		trace_xfs_attr_set_iter_return(dac->dela_state,	args->dp);
-- 
2.25.1



* [PATCH v24 02/11] xfs: Capture buffers for delayed ops
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
  2021-08-24 22:44 ` [PATCH v24 01/11] xfs: Return from xfs_attr_set_iter if there are no more rmtblks to process Allison Henderson
@ 2021-08-24 22:44 ` Allison Henderson
  2021-08-27 11:33   ` Chandan Babu R
  2021-08-30 17:44   ` Darrick J. Wong
  2021-08-24 22:44 ` [PATCH v24 03/11] xfs: Set up infrastructure for log atrribute replay Allison Henderson
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-24 22:44 UTC (permalink / raw)
  To: linux-xfs

This patch enables delayed operations to capture held buffers within
the xfs_defer_capture. Buffers are then rejoined to the new
transaction in xlog_finish_defer_ops.
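
The capture-then-rejoin flow can be sketched in user space as below. This is
only an illustration under simplifying assumptions: the demo_* types and the
tiny intrusive list are hypothetical stand-ins for struct xfs_buf, struct
xfs_defer_capture and the kernel's list_head, and "joining" a buffer is
reduced to setting an owner pointer and a held flag.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list (stand-in for struct list_head). */
struct demo_list { struct demo_list *prev, *next; };

void demo_list_init(struct demo_list *n) { n->prev = n->next = n; }
int demo_list_is_empty(const struct demo_list *n) { return n->next == n; }

void demo_list_push_tail(struct demo_list *head, struct demo_list *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

void demo_list_unlink(struct demo_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	demo_list_init(n);
}

struct demo_trans { int id; };

struct demo_buf {
	struct demo_trans	*owner;	/* plays the role of b_transp */
	int			held;
	struct demo_list	delay;	/* plays the role of b_delay */
};

struct demo_capture { struct demo_list buffers; };	/* like dfc_buffers */

/* Capture side: remember the buffer only if it belongs to this transaction. */
void demo_capture_buf(struct demo_capture *dfc, struct demo_trans *tp,
		      struct demo_buf *bp)
{
	if (bp && bp->owner == tp)
		demo_list_push_tail(&dfc->buffers, &bp->delay);
}

/* Continue side: rejoin and hold every captured buffer; returns the count. */
int demo_rejoin_bufs(struct demo_capture *dfc, struct demo_trans *tp)
{
	int n = 0;

	assert(tp != NULL);
	while (!demo_list_is_empty(&dfc->buffers)) {
		struct demo_list *node = dfc->buffers.next;
		struct demo_buf *bp = (struct demo_buf *)
			((char *)node - offsetof(struct demo_buf, delay));

		demo_list_unlink(node);	/* safe: next entry is re-read each pass */
		bp->owner = tp;
		bp->held = 1;
		n++;
	}
	return n;
}
```

Passing a NULL buffer, as the existing intent recovery callers do, simply
leaves the capture list empty, so those paths behave as before.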

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c  | 7 ++++++-
 fs/xfs/libxfs/xfs_defer.h  | 4 +++-
 fs/xfs/xfs_bmap_item.c     | 2 +-
 fs/xfs/xfs_buf.c           | 1 +
 fs/xfs/xfs_buf.h           | 1 +
 fs/xfs/xfs_extfree_item.c  | 2 +-
 fs/xfs/xfs_log_recover.c   | 7 +++++++
 fs/xfs/xfs_refcount_item.c | 2 +-
 fs/xfs/xfs_rmap_item.c     | 2 +-
 9 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index eff4a127188e..d1d09b6aca55 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -639,6 +639,7 @@ xfs_defer_ops_capture(
 	dfc = kmem_zalloc(sizeof(*dfc), KM_NOFS);
 	INIT_LIST_HEAD(&dfc->dfc_list);
 	INIT_LIST_HEAD(&dfc->dfc_dfops);
+	INIT_LIST_HEAD(&dfc->dfc_buffers);
 
 	xfs_defer_create_intents(tp);
 
@@ -690,7 +691,8 @@ int
 xfs_defer_ops_capture_and_commit(
 	struct xfs_trans		*tp,
 	struct xfs_inode		*capture_ip,
-	struct list_head		*capture_list)
+	struct list_head		*capture_list,
+	struct xfs_buf			*bp)
 {
 	struct xfs_mount		*mp = tp->t_mountp;
 	struct xfs_defer_capture	*dfc;
@@ -703,6 +705,9 @@ xfs_defer_ops_capture_and_commit(
 	if (!dfc)
 		return xfs_trans_commit(tp);
 
+	if (bp && bp->b_transp == tp)
+		list_add_tail(&bp->b_delay, &dfc->dfc_buffers);
+
 	/* Commit the transaction and add the capture structure to the list. */
 	error = xfs_trans_commit(tp);
 	if (error) {
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 05472f71fffe..739f70d72fd5 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -74,6 +74,7 @@ struct xfs_defer_capture {
 
 	/* Deferred ops state saved from the transaction. */
 	struct list_head	dfc_dfops;
+	struct list_head	dfc_buffers;
 	unsigned int		dfc_tpflags;
 
 	/* Block reservations for the data and rt devices. */
@@ -95,7 +96,8 @@ struct xfs_defer_capture {
  * This doesn't normally happen except log recovery.
  */
 int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp,
-		struct xfs_inode *capture_ip, struct list_head *capture_list);
+		struct xfs_inode *capture_ip, struct list_head *capture_list,
+		struct xfs_buf *bp);
 void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp,
 		struct xfs_inode **captured_ipp);
 void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 03159970133f..51ba8ee368ca 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -532,7 +532,7 @@ xfs_bui_item_recover(
 	 * Commit transaction, which frees the transaction and saves the inode
 	 * for later replay activities.
 	 */
-	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list);
+	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list, NULL);
 	if (error)
 		goto err_unlock;
 
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 047bd6e3f389..29b4655a0a65 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -233,6 +233,7 @@ _xfs_buf_alloc(
 	init_completion(&bp->b_iowait);
 	INIT_LIST_HEAD(&bp->b_lru);
 	INIT_LIST_HEAD(&bp->b_list);
+	INIT_LIST_HEAD(&bp->b_delay);
 	INIT_LIST_HEAD(&bp->b_li_list);
 	sema_init(&bp->b_sema, 0); /* held, no waiters */
 	spin_lock_init(&bp->b_lock);
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 6b0200b8007d..c51445705dc6 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -151,6 +151,7 @@ struct xfs_buf {
 	int			b_io_error;	/* internal IO error state */
 	wait_queue_head_t	b_waiters;	/* unpin waiters */
 	struct list_head	b_list;
+	struct list_head	b_delay;	/* delayed operations list */
 	struct xfs_perag	*b_pag;		/* contains rbtree root */
 	struct xfs_mount	*b_mount;
 	struct xfs_buftarg	*b_target;	/* buffer target (device) */
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 3f8a0713573a..046f21338c48 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -637,7 +637,7 @@ xfs_efi_item_recover(
 
 	}
 
-	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
+	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
 
 abort_error:
 	xfs_trans_cancel(tp);
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 10562ecbd9ea..6a3c0bb16b69 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2465,6 +2465,7 @@ xlog_finish_defer_ops(
 	struct list_head	*capture_list)
 {
 	struct xfs_defer_capture *dfc, *next;
+	struct xfs_buf		*bp, *bnext;
 	struct xfs_trans	*tp;
 	struct xfs_inode	*ip;
 	int			error = 0;
@@ -2489,6 +2490,12 @@ xlog_finish_defer_ops(
 			return error;
 		}
 
+		list_for_each_entry_safe(bp, bnext, &dfc->dfc_buffers, b_delay) {
+			xfs_trans_bjoin(tp, bp);
+			xfs_trans_bhold(tp, bp);
+			list_del_init(&bp->b_delay);
+		}
+
 		/*
 		 * Transfer to this new transaction all the dfops we captured
 		 * from recovering a single intent item.
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 46904b793bd4..a6e7351ca4f9 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -557,7 +557,7 @@ xfs_cui_item_recover(
 	}
 
 	xfs_refcount_finish_one_cleanup(tp, rcur, error);
-	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
+	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
 
 abort_error:
 	xfs_refcount_finish_one_cleanup(tp, rcur, error);
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 5f0695980467..8c70a4af80a9 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -587,7 +587,7 @@ xfs_rui_item_recover(
 	}
 
 	xfs_rmap_finish_one_cleanup(tp, rcur, error);
-	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
+	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
 
 abort_error:
 	xfs_rmap_finish_one_cleanup(tp, rcur, error);
-- 
2.25.1



* [PATCH v24 03/11] xfs: Set up infrastructure for log atrribute replay
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
  2021-08-24 22:44 ` [PATCH v24 01/11] xfs: Return from xfs_attr_set_iter if there are no more rmtblks to process Allison Henderson
  2021-08-24 22:44 ` [PATCH v24 02/11] xfs: Capture buffers for delayed ops Allison Henderson
@ 2021-08-24 22:44 ` Allison Henderson
  2021-08-27 14:17   ` Chandan Babu R
                     ` (2 more replies)
  2021-08-24 22:44 ` [PATCH v24 04/11] xfs: Implement attr logging and replay Allison Henderson
                   ` (9 subsequent siblings)
  12 siblings, 3 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-24 22:44 UTC (permalink / raw)
  To: linux-xfs

Currently attributes are modified directly across one or more
transactions, but they are not logged or replayed in the event of an
error. The goal of log attr replay is to enable logging and replaying
of attribute operations using the existing delayed operations
infrastructure.  This will later enable the attributes to become part of
larger multi-part operations that also must first be recorded to the
log.  This is mostly of interest in the scheme of parent pointers which
would need to maintain an attribute containing parent inode information
any time an inode is moved, created, or removed.  Parent pointers would
then be of interest to any feature that would need to quickly derive an
inode path from the mount point. Online scrub, NFS lookups and fs grow
or shrink operations are all features that could take advantage of this.

This patch adds two new log item types for setting or removing
attributes as deferred operations.  The xfs_attri_log_item will log an
intent to set or remove an attribute.  The corresponding
xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
freed once the transaction is done.  Both log items use a generic
xfs_attr_log_format structure that contains the attribute name, value,
flags, inode, and an op_flag that indicates if the operation is a set
or remove.
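
The intent/done pairing can be pictured with the following user-space sketch.
It is an illustration only: the demo_* names are hypothetical, a plain int
replaces the atomic reference count, and the freed flag exists purely so the
lifetime can be observed. The initial count of two (one reference for the log,
one for the done item) mirrors what xfs_attri_init does in this patch.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_intent {
	int	refcount;	/* the patch uses an atomic_t here */
	int	*freed;		/* set to 1 when the intent is freed */
};

struct demo_done {
	struct demo_intent	*intent;	/* reference held on the intent */
};

struct demo_intent *demo_intent_init(int *freed_flag)
{
	struct demo_intent *ip = calloc(1, sizeof(*ip));

	if (!ip)
		return NULL;
	ip->refcount = 2;	/* one for the log, one for the done item */
	ip->freed = freed_flag;
	return ip;
}

/* Drop one reference; only the last caller frees the intent. */
void demo_intent_release(struct demo_intent *ip)
{
	assert(ip->refcount > 0);
	if (--ip->refcount == 0) {
		*ip->freed = 1;
		free(ip);
	}
}

struct demo_done *demo_done_init(struct demo_intent *ip)
{
	struct demo_done *dp = calloc(1, sizeof(*dp));

	if (!dp)
		return NULL;
	dp->intent = ip;
	return dp;
}

/* Releasing the done item drops its intent reference, then frees itself. */
void demo_done_release(struct demo_done *dp)
{
	demo_intent_release(dp->intent);
	free(dp);
}
```

Because either side may drop its reference first, neither the log nor the done
item has to know which of them runs last.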

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/Makefile                 |   1 +
 fs/xfs/libxfs/xfs_attr.c        |   5 +-
 fs/xfs/libxfs/xfs_attr.h        |  31 +++
 fs/xfs/libxfs/xfs_defer.h       |   2 +
 fs/xfs/libxfs/xfs_log_format.h  |  44 +++-
 fs/xfs/libxfs/xfs_log_recover.h |   2 +
 fs/xfs/scrub/common.c           |   2 +
 fs/xfs/xfs_attr_item.c          | 453 ++++++++++++++++++++++++++++++++
 fs/xfs/xfs_attr_item.h          |  52 ++++
 fs/xfs/xfs_attr_list.c          |   1 +
 fs/xfs/xfs_ioctl32.c            |   2 +
 fs/xfs/xfs_iops.c               |   2 +
 fs/xfs/xfs_log.c                |   4 +
 fs/xfs/xfs_log_recover.c        |   2 +
 fs/xfs/xfs_ondisk.h             |   2 +
 15 files changed, 600 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 04611a1068b4..b056cfc6398e 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
 				   xfs_buf_item_recover.o \
 				   xfs_dquot_item_recover.o \
 				   xfs_extfree_item.o \
+				   xfs_attr_item.o \
 				   xfs_icreate_item.o \
 				   xfs_inode_item.o \
 				   xfs_inode_item_recover.o \
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 50b91b4461e7..dfff81024e46 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -24,6 +24,7 @@
 #include "xfs_quota.h"
 #include "xfs_trans_space.h"
 #include "xfs_trace.h"
+#include "xfs_attr_item.h"
 
 /*
  * xfs_attr.c
@@ -61,8 +62,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 				 struct xfs_da_state **state);
 STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
 STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
-STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
-			     struct xfs_buf **leaf_bp);
 STATIC int xfs_attr_node_removename(struct xfs_da_args *args,
 				    struct xfs_da_state *state);
 
@@ -166,7 +165,7 @@ xfs_attr_get(
 /*
  * Calculate how many blocks we need for the new attribute,
  */
-STATIC int
+int
 xfs_attr_calc_size(
 	struct xfs_da_args	*args,
 	int			*local)
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 5e71f719bdd5..aa33cdcf26b8 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -28,6 +28,11 @@ struct xfs_attr_list_context;
  */
 #define	ATTR_MAX_VALUELEN	(64*1024)	/* max length of a value */
 
+static inline bool xfs_has_larp(struct xfs_mount *mp)
+{
+	return false;
+}
+
 /*
  * Kernel-internal version of the attrlist cursor.
  */
@@ -454,6 +459,7 @@ enum xfs_delattr_state {
  */
 #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
 #define XFS_DAC_LEAF_ADDNAME_INIT	0x02 /* xfs_attr_leaf_addname init*/
+#define XFS_DAC_DELAYED_OP_INIT		0x04 /* delayed operations init*/
 
 /*
  * Context used for keeping track of delayed attribute operations
@@ -461,6 +467,11 @@ enum xfs_delattr_state {
 struct xfs_delattr_context {
 	struct xfs_da_args      *da_args;
 
+	/*
+	 * Used by xfs_attr_set to hold a leaf buffer across a transaction roll
+	 */
+	struct xfs_buf		*leaf_bp;
+
 	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
 	struct xfs_bmbt_irec	map;
 	xfs_dablk_t		lblkno;
@@ -474,6 +485,23 @@ struct xfs_delattr_context {
 	enum xfs_delattr_state  dela_state;
 };
 
+/*
+ * List of attrs to commit later.
+ */
+struct xfs_attr_item {
+	struct xfs_delattr_context	xattri_dac;
+
+	/*
+	 * Indicates if the attr operation is a set or a remove
+	 * XFS_ATTR_OP_FLAGS_{SET,REMOVE}
+	 */
+	unsigned int			xattri_op_flags;
+
+	/* used to log this item to an intent */
+	struct list_head		xattri_list;
+};
+
+
 /*========================================================================
  * Function prototypes for the kernel.
  *========================================================================*/
@@ -490,10 +518,13 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_args(struct xfs_da_args *args);
+int xfs_attr_set_iter(struct xfs_delattr_context *dac,
+		      struct xfs_buf **leaf_bp);
 int xfs_attr_remove_args(struct xfs_da_args *args);
 int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
 bool xfs_attr_namecheck(const void *name, size_t length);
 void xfs_delattr_context_init(struct xfs_delattr_context *dac,
 			      struct xfs_da_args *args);
+int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 
 #endif	/* __XFS_ATTR_H__ */
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 739f70d72fd5..89719146c5eb 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -63,6 +63,8 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
 extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
 extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
+extern const struct xfs_defer_op_type xfs_attr_defer_type;
+
 
 /*
  * This structure enables a dfops user to detach the chain of deferred
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index b322db523d65..3301c369e815 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -114,7 +114,12 @@ struct xfs_unmount_log_format {
 #define XLOG_REG_TYPE_CUD_FORMAT	24
 #define XLOG_REG_TYPE_BUI_FORMAT	25
 #define XLOG_REG_TYPE_BUD_FORMAT	26
-#define XLOG_REG_TYPE_MAX		26
+#define XLOG_REG_TYPE_ATTRI_FORMAT	27
+#define XLOG_REG_TYPE_ATTRD_FORMAT	28
+#define XLOG_REG_TYPE_ATTR_NAME	29
+#define XLOG_REG_TYPE_ATTR_VALUE	30
+#define XLOG_REG_TYPE_MAX		30
+
 
 /*
  * Flags to log operation header
@@ -237,6 +242,8 @@ typedef struct xfs_trans_header {
 #define	XFS_LI_CUD		0x1243
 #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
 #define	XFS_LI_BUD		0x1245
+#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
+#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
 
 #define XFS_LI_TYPE_DESC \
 	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
@@ -252,7 +259,9 @@ typedef struct xfs_trans_header {
 	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
 	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
 	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
-	{ XFS_LI_BUD,		"XFS_LI_BUD" }
+	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
+	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
+	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
 
 /*
  * Inode Log Item Format definitions.
@@ -869,4 +878,35 @@ struct xfs_icreate_log {
 	__be32		icl_gen;	/* inode generation number to use */
 };
 
+/*
+ * Flags for deferred attribute operations.
+ * Upper bits are flags, lower byte is type code
+ */
+#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
+#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
+#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
+
+/*
+ * This is the structure used to lay out an attr log item in the
+ * log.
+ */
+struct xfs_attri_log_format {
+	uint16_t	alfi_type;	/* attri log item type */
+	uint16_t	alfi_size;	/* size of this item */
+	uint32_t	__pad;		/* pad to 64 bit aligned */
+	uint64_t	alfi_id;	/* attri identifier */
+	uint64_t	alfi_ino;	/* the inode for this attr operation */
+	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
+	uint32_t	alfi_name_len;	/* attr name length */
+	uint32_t	alfi_value_len;	/* attr value length */
+	uint32_t	alfi_attr_flags;/* attr flags */
+};
+
+struct xfs_attrd_log_format {
+	uint16_t	alfd_type;	/* attrd log item type */
+	uint16_t	alfd_size;	/* size of this item */
+	uint32_t	__pad;		/* pad to 64 bit aligned */
+	uint64_t	alfd_alf_id;	/* id of corresponding attri */
+};
+
 #endif /* __XFS_LOG_FORMAT_H__ */
diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index ff69a0000817..32e216255cb0 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
 extern const struct xlog_recover_item_ops xlog_rud_item_ops;
 extern const struct xlog_recover_item_ops xlog_cui_item_ops;
 extern const struct xlog_recover_item_ops xlog_cud_item_ops;
+extern const struct xlog_recover_item_ops xlog_attri_item_ops;
+extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
 
 /*
  * Macros, structures, prototypes for internal log manager use.
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index bf1f3607d0b6..97b54ac3075f 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -23,6 +23,8 @@
 #include "xfs_rmap_btree.h"
 #include "xfs_log.h"
 #include "xfs_trans_priv.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_reflink.h"
 #include "xfs_ag.h"
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
new file mode 100644
index 000000000000..879a39ec58a6
--- /dev/null
+++ b/fs/xfs/xfs_attr_item.c
@@ -0,0 +1,453 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2021 Oracle.  All Rights Reserved.
+ * Author: Allison Collins <allison.henderson@oracle.com>
+ */
+
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_shared.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans_priv.h"
+#include "xfs_buf_item.h"
+#include "xfs_log.h"
+#include "xfs_btree.h"
+#include "xfs_rmap.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr.h"
+#include "xfs_shared.h"
+#include "xfs_attr_item.h"
+#include "xfs_alloc.h"
+#include "xfs_bmap.h"
+#include "xfs_trace.h"
+#include "libxfs/xfs_da_format.h"
+#include "xfs_inode.h"
+#include "xfs_quota.h"
+#include "xfs_trans_space.h"
+#include "xfs_error.h"
+#include "xfs_log_priv.h"
+#include "xfs_log_recover.h"
+
+static const struct xfs_item_ops xfs_attri_item_ops;
+static const struct xfs_item_ops xfs_attrd_item_ops;
+
+/* iovec length must be 32-bit aligned */
+static inline size_t ATTR_NVEC_SIZE(size_t size)
+{
+	return size == sizeof(int32_t) ? size :
+	       sizeof(int32_t) + round_up(size, sizeof(int32_t));
+}
+
+static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
+{
+	return container_of(lip, struct xfs_attri_log_item, attri_item);
+}
+
+STATIC void
+xfs_attri_item_free(
+	struct xfs_attri_log_item	*attrip)
+{
+	kmem_free(attrip->attri_item.li_lv_shadow);
+	kmem_free(attrip);
+}
+
+/*
+ * Freeing the attrip requires that we remove it from the AIL if it has already
+ * been placed there. However, the ATTRI may not yet have been placed in the
+ * AIL when called by xfs_attri_release() from ATTRD processing due to the
+ * ordering of committed vs unpin operations in bulk insert operations. Hence
+ * the reference count to ensure only the last caller frees the ATTRI.
+ */
+STATIC void
+xfs_attri_release(
+	struct xfs_attri_log_item	*attrip)
+{
+	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
+	if (atomic_dec_and_test(&attrip->attri_refcount)) {
+		xfs_trans_ail_delete(&attrip->attri_item,
+				     SHUTDOWN_LOG_IO_ERROR);
+		xfs_attri_item_free(attrip);
+	}
+}
+
+STATIC void
+xfs_attri_item_size(
+	struct xfs_log_item	*lip,
+	int			*nvecs,
+	int			*nbytes)
+{
+	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
+
+	*nvecs += 1;
+	*nbytes += sizeof(struct xfs_attri_log_format);
+
+	/* Attr set and remove operations require a name */
+	ASSERT(attrip->attri_name_len > 0);
+
+	*nvecs += 1;
+	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
+
+	if (attrip->attri_value_len > 0) {
+		*nvecs += 1;
+		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
+	}
+}
+
+/*
+ * This is called to fill in the log iovecs for the given attri log
+ * item. We use  1 iovec for the attri_format_item, 1 for the name, and
+ * another for the value if it is present
+ */
+STATIC void
+xfs_attri_item_format(
+	struct xfs_log_item	*lip,
+	struct xfs_log_vec	*lv)
+{
+	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
+	struct xfs_log_iovec		*vecp = NULL;
+
+	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
+	attrip->attri_format.alfi_size = 1;
+
+	/*
+	 * This size accounting must be done before copying the attrip into the
+	 * iovec.  If we do it after, the wrong size will be recorded to the log
+	 * and we trip across assertion checks for bad region sizes later during
+	 * the log recovery.
+	 */
+
+	ASSERT(attrip->attri_name_len > 0);
+	attrip->attri_format.alfi_size++;
+
+	if (attrip->attri_value_len > 0)
+		attrip->attri_format.alfi_size++;
+
+	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
+			&attrip->attri_format,
+			sizeof(struct xfs_attri_log_format));
+	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
+			attrip->attri_name,
+			ATTR_NVEC_SIZE(attrip->attri_name_len));
+	if (attrip->attri_value_len > 0)
+		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
+				attrip->attri_value,
+				ATTR_NVEC_SIZE(attrip->attri_value_len));
+}
+
+/*
+ * The unpin operation is the last place an ATTRI is manipulated in the log. It
+ * is either inserted in the AIL or aborted in the event of a log I/O error. In
+ * either case, the ATTRI transaction has been successfully committed to make
+ * it this far. Therefore, we expect whoever committed the ATTRI to either
+ * construct and commit the ATTRD or drop the ATTRD's reference in the event of
+ * error. Simply drop the log's ATTRI reference now that the log is done with
+ * it.
+ */
+STATIC void
+xfs_attri_item_unpin(
+	struct xfs_log_item	*lip,
+	int			remove)
+{
+	xfs_attri_release(ATTRI_ITEM(lip));
+}
+
+
+STATIC void
+xfs_attri_item_release(
+	struct xfs_log_item	*lip)
+{
+	xfs_attri_release(ATTRI_ITEM(lip));
+}
+
+/*
+ * Allocate and initialize an attri item.  Caller may allocate an additional
+ * trailing buffer of the specified size
+ */
+STATIC struct xfs_attri_log_item *
+xfs_attri_init(
+	struct xfs_mount		*mp,
+	int				buffer_size)
+
+{
+	struct xfs_attri_log_item	*attrip;
+	uint				size;
+
+	size = sizeof(struct xfs_attri_log_item) + buffer_size;
+	attrip = kvmalloc(size, KM_ZERO);
+	if (attrip == NULL)
+		return NULL;
+
+	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
+			  &xfs_attri_item_ops);
+	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
+	atomic_set(&attrip->attri_refcount, 2);
+
+	return attrip;
+}
+
+/*
+ * Copy an attr format buffer from the given buf, and into the destination attr
+ * format structure.
+ */
+STATIC int
+xfs_attri_copy_format(
+	struct xfs_log_iovec		*buf,
+	struct xfs_attri_log_format	*dst_attr_fmt)
+{
+	struct xfs_attri_log_format	*src_attr_fmt = buf->i_addr;
+	uint				len;
+
+	len = sizeof(struct xfs_attri_log_format);
+	if (buf->i_len != len) {
+		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, NULL);
+		return -EFSCORRUPTED;
+	}
+
+	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
+	return 0;
+}
+
+static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
+{
+	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
+}
+
+STATIC void
+xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
+{
+	kmem_free(attrdp->attrd_item.li_lv_shadow);
+	kmem_free(attrdp);
+}
+
+STATIC void
+xfs_attrd_item_size(
+	struct xfs_log_item		*lip,
+	int				*nvecs,
+	int				*nbytes)
+{
+	*nvecs += 1;
+	*nbytes += sizeof(struct xfs_attrd_log_format);
+}
+
+/*
+ * This is called to fill in the log iovecs for the given attrd log item. We use
+ * only 1 iovec for the attrd_format, and we point that at the attr_log_format
+ * structure embedded in the attrd item.
+ */
+STATIC void
+xfs_attrd_item_format(
+	struct xfs_log_item	*lip,
+	struct xfs_log_vec	*lv)
+{
+	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
+	struct xfs_log_iovec		*vecp = NULL;
+
+	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
+	attrdp->attrd_format.alfd_size = 1;
+
+	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
+			&attrdp->attrd_format,
+			sizeof(struct xfs_attrd_log_format));
+}
+
+/*
+ * The ATTRD is either committed or aborted if the transaction is canceled. If
+ * the transaction is canceled, drop our reference to the ATTRI and free the
+ * ATTRD.
+ */
+STATIC void
+xfs_attrd_item_release(
+	struct xfs_log_item		*lip)
+{
+	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
+
+	xfs_attri_release(attrdp->attrd_attrip);
+	xfs_attrd_item_free(attrdp);
+}
+
+STATIC xfs_lsn_t
+xfs_attri_item_committed(
+	struct xfs_log_item		*lip,
+	xfs_lsn_t			lsn)
+{
+	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
+
+	/*
+	 * The attrip refers to xfs_attr_item memory to log the name and value
+	 * with the intent item. This already occurred when the intent was
+	 * committed so these fields are no longer accessed. Clear them out of
+	 * caution since we're about to free the xfs_attr_item.
+	 */
+	attrip->attri_name = NULL;
+	attrip->attri_value = NULL;
+
+	/*
+	 * The ATTRI is logged only once and cannot be moved in the log, so
+	 * simply return the lsn at which it's been logged.
+	 */
+	return lsn;
+}
+
+STATIC bool
+xfs_attri_item_match(
+	struct xfs_log_item	*lip,
+	uint64_t		intent_id)
+{
+	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
+}
+
+static const struct xfs_item_ops xfs_attrd_item_ops = {
+	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
+	.iop_size	= xfs_attrd_item_size,
+	.iop_format	= xfs_attrd_item_format,
+	.iop_release    = xfs_attrd_item_release,
+};
+
+/* Is this recovered ATTRI ok? */
+static inline bool
+xfs_attri_validate(
+	struct xfs_mount		*mp,
+	struct xfs_attri_log_item	*attrip)
+{
+	struct xfs_attri_log_format     *attrp = &attrip->attri_format;
+	unsigned int			op = attrp->alfi_op_flags &
+					     XFS_ATTR_OP_FLAGS_TYPE_MASK;
+
+	/* alfi_op_flags should be either a set or remove */
+	if (op != XFS_ATTR_OP_FLAGS_SET && op != XFS_ATTR_OP_FLAGS_REMOVE)
+		return false;
+
+	if (attrp->alfi_value_len > XATTR_SIZE_MAX)
+		return false;
+
+	if ((attrp->alfi_name_len > XATTR_NAME_MAX) ||
+	    (attrp->alfi_name_len == 0))
+		return false;
+
+	return xfs_verify_ino(mp, attrp->alfi_ino);
+}
+
+static const struct xfs_item_ops xfs_attri_item_ops = {
+	.iop_size	= xfs_attri_item_size,
+	.iop_format	= xfs_attri_item_format,
+	.iop_unpin	= xfs_attri_item_unpin,
+	.iop_committed	= xfs_attri_item_committed,
+	.iop_release    = xfs_attri_item_release,
+	.iop_match	= xfs_attri_item_match,
+};
+
+
+
+STATIC int
+xlog_recover_attri_commit_pass2(
+	struct xlog                     *log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item        *item,
+	xfs_lsn_t                       lsn)
+{
+	int                             error;
+	struct xfs_mount                *mp = log->l_mp;
+	struct xfs_attri_log_item       *attrip;
+	struct xfs_attri_log_format     *attri_formatp;
+	char				*name = NULL;
+	char				*value = NULL;
+	int				region = 0;
+	int				buffer_size;
+
+	attri_formatp = item->ri_buf[region].i_addr;
+
+	/* Validate xfs_attri_log_format */
+	if (attri_formatp->__pad != 0 || attri_formatp->alfi_name_len == 0 ||
+	    (attri_formatp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE &&
+	    attri_formatp->alfi_value_len != 0)) {
+		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
+		return -EFSCORRUPTED;
+	}
+
+	buffer_size = attri_formatp->alfi_name_len +
+		      attri_formatp->alfi_value_len;
+
+	attrip = xfs_attri_init(mp, buffer_size);
+	if (attrip == NULL)
+		return -ENOMEM;
+
+	error = xfs_attri_copy_format(&item->ri_buf[region],
+				      &attrip->attri_format);
+	if (error) {
+		xfs_attri_item_free(attrip);
+		return error;
+	}
+
+	attrip->attri_name_len = attri_formatp->alfi_name_len;
+	attrip->attri_value_len = attri_formatp->alfi_value_len;
+	region++;
+	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
+	memcpy(name, item->ri_buf[region].i_addr, attrip->attri_name_len);
+	attrip->attri_name = name;
+
+	if (attrip->attri_value_len > 0) {
+		region++;
+		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
+			attrip->attri_name_len;
+		memcpy(value, item->ri_buf[region].i_addr,
+			attrip->attri_value_len);
+		attrip->attri_value = value;
+	}
+
+	/*
+	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
+	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
+	 * directly and drop the ATTRI reference. Note that
+	 * xfs_trans_ail_update() drops the AIL lock.
+	 */
+	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
+	xfs_attri_release(attrip);
+	return 0;
+}
+
+const struct xlog_recover_item_ops xlog_attri_item_ops = {
+	.item_type	= XFS_LI_ATTRI,
+	.commit_pass2	= xlog_recover_attri_commit_pass2,
+};
+
+/*
+ * This routine is called when an ATTRD format structure is found in a committed
+ * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
+ * it was still in the log. To do this it searches the AIL for the ATTRI with
+ * an id equal to that in the ATTRD format structure. If we find it we drop
+ * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
+ */
+STATIC int
+xlog_recover_attrd_commit_pass2(
+	struct xlog			*log,
+	struct list_head		*buffer_list,
+	struct xlog_recover_item	*item,
+	xfs_lsn_t			lsn)
+{
+	struct xfs_attrd_log_format	*attrd_formatp;
+
+	attrd_formatp = item->ri_buf[0].i_addr;
+	if (item->ri_buf[0].i_len != sizeof(struct xfs_attrd_log_format)) {
+		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, NULL);
+		return -EFSCORRUPTED;
+	}
+
+	xlog_recover_release_intent(log, XFS_LI_ATTRI,
+				    attrd_formatp->alfd_alf_id);
+	return 0;
+}
+
+const struct xlog_recover_item_ops xlog_attrd_item_ops = {
+	.item_type	= XFS_LI_ATTRD,
+	.commit_pass2	= xlog_recover_attrd_commit_pass2,
+};
diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
new file mode 100644
index 000000000000..ce33e9b5a9d3
--- /dev/null
+++ b/fs/xfs/xfs_attr_item.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (C) 2021 Oracle.  All Rights Reserved.
+ * Author: Allison Collins <allison.henderson@oracle.com>
+ */
+#ifndef	__XFS_ATTR_ITEM_H__
+#define	__XFS_ATTR_ITEM_H__
+
+/* kernel only ATTRI/ATTRD definitions */
+
+struct xfs_mount;
+struct kmem_zone;
+
+/*
+ * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
+ */
+#define	XFS_ATTRI_RECOVERED	1
+
+
+/*
+ * This is the "attr intention" log item.  It is used to log the fact that some
+ * attribute operations need to be processed.  An operation is currently either
+ * a set or remove.  Set or remove operations are described by the xfs_attr_item
+ * which may be logged to this intent.
+ *
+ * During a normal attr operation, name and value point to the name and value
+ * fields of the calling functions xfs_da_args.  During a recovery, the name
+ * and value buffers are copied from the log, and stored in a trailing buffer
+ * attached to the xfs_attr_item until they are committed.  They are freed when
+ * the xfs_attr_item itself is freed when the work is done.
+ */
+struct xfs_attri_log_item {
+	struct xfs_log_item		attri_item;
+	atomic_t			attri_refcount;
+	int				attri_name_len;
+	int				attri_value_len;
+	void				*attri_name;
+	void				*attri_value;
+	struct xfs_attri_log_format	attri_format;
+};
+
+/*
+ * This is the "attr done" log item.  It is used to log the fact that the attr
+ * operation described by an earlier attri item has been completed.
+ */
+struct xfs_attrd_log_item {
+	struct xfs_attri_log_item	*attrd_attrip;
+	struct xfs_log_item		attrd_item;
+	struct xfs_attrd_log_format	attrd_format;
+};
+
+#endif	/* __XFS_ATTR_ITEM_H__ */
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 2d1e5134cebe..90a14e85e76d 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -15,6 +15,7 @@
 #include "xfs_inode.h"
 #include "xfs_trans.h"
 #include "xfs_bmap.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_attr_sf.h"
 #include "xfs_attr_leaf.h"
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 8783af203cfc..ab543c5b1371 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -17,6 +17,8 @@
 #include "xfs_itable.h"
 #include "xfs_fsops.h"
 #include "xfs_rtalloc.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_ioctl.h"
 #include "xfs_ioctl32.h"
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 0ff0cca94092..f81e5d55341b 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -13,6 +13,8 @@
 #include "xfs_inode.h"
 #include "xfs_acl.h"
 #include "xfs_quota.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
 #include "xfs_attr.h"
 #include "xfs_trans.h"
 #include "xfs_trace.h"
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index f6cd2d4aa770..4402c5d09269 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2157,6 +2157,10 @@ xlog_print_tic_res(
 	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
 	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
 	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
+	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
+	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
+	    REG_TYPE_STR(ATTR_NAME, "attr name"),
+	    REG_TYPE_STR(ATTR_VALUE, "attr value"),
 	};
 	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
 #undef REG_TYPE_STR
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 6a3c0bb16b69..c248fd5cf9ea 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1800,6 +1800,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
 	&xlog_cud_item_ops,
 	&xlog_bui_item_ops,
 	&xlog_bud_item_ops,
+	&xlog_attri_item_ops,
+	&xlog_attrd_item_ops,
 };
 
 static const struct xlog_recover_item_ops *
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 25991923c1a8..758702b9495f 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -132,6 +132,8 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
 
 	/*
 	 * The v5 superblock format extended several v4 header structures with
-- 
2.25.1



* [PATCH v24 04/11] xfs: Implement attr logging and replay
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
                   ` (2 preceding siblings ...)
  2021-08-24 22:44 ` [PATCH v24 03/11] xfs: Set up infrastructure for log atrribute replay Allison Henderson
@ 2021-08-24 22:44 ` Allison Henderson
  2021-08-30  7:47   ` Chandan Babu R
  2021-08-24 22:44 ` [PATCH v24 05/11] RFC xfs: Skip flip flags for delayed attrs Allison Henderson
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 45+ messages in thread
From: Allison Henderson @ 2021-08-24 22:44 UTC (permalink / raw)
  To: linux-xfs

This patch adds the needed routines to create, log and recover logged
extended attribute intents.
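
For readers following the intent/done lifecycle, here is a rough userspace
sketch of the reference counting that the comments in the diff describe:
the intent starts with two references (xfs_attri_init() sets the count to
2), one dropped when the log is done with it and one dropped when the done
item is released.  All toy_* names are hypothetical stand-ins, and the
free-on-last-drop behaviour of xfs_attri_release() is assumed here, since
its body is not part of this hunk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xfs_attri_log_item. */
struct toy_intent {
	atomic_int	refcount;
	bool		*freed;		/* set when the last reference goes away */
};

/* Like xfs_attri_init(): a new intent starts with two references. */
static struct toy_intent *toy_intent_init(bool *freed)
{
	struct toy_intent *ip = calloc(1, sizeof(*ip));

	if (!ip)
		return NULL;
	atomic_init(&ip->refcount, 2);
	ip->freed = freed;
	*freed = false;
	return ip;
}

/* Assumed shape of xfs_attri_release(): free on the final drop. */
static bool toy_intent_release(struct toy_intent *ip)
{
	assert(atomic_load(&ip->refcount) > 0);
	if (atomic_fetch_sub(&ip->refcount, 1) == 1) {
		*ip->freed = true;
		free(ip);
		return true;
	}
	return false;
}

/* Walk the normal lifecycle; returns the number of drops needed to free. */
static int toy_intent_lifecycle(void)
{
	bool freed;
	struct toy_intent *ip = toy_intent_init(&freed);
	int drops = 0;

	if (!ip)
		return -1;

	/* First drop: the log is done with the intent (the unpin path). */
	drops++;
	if (toy_intent_release(ip))
		return drops;

	/* Second drop: the done item is released. */
	drops++;
	toy_intent_release(ip);
	return freed ? drops : -1;
}
```

In this model toy_intent_lifecycle() needs exactly two drops before the
intent is freed, which is why the abort and cancel paths each have to
account for the reference they own.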

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c  |   1 +
 fs/xfs/libxfs/xfs_defer.h  |   1 +
 fs/xfs/libxfs/xfs_format.h |  10 +-
 fs/xfs/xfs_attr_item.c     | 358 +++++++++++++++++++++++++++++++++++++
 4 files changed, 369 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index d1d09b6aca55..01fcf5e93be5 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -178,6 +178,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
 	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
 	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
 	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
+	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
 };
 
 static void
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 89719146c5eb..d70525c57b5c 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -19,6 +19,7 @@ enum xfs_defer_ops_type {
 	XFS_DEFER_OPS_TYPE_RMAP,
 	XFS_DEFER_OPS_TYPE_FREE,
 	XFS_DEFER_OPS_TYPE_AGFL_FREE,
+	XFS_DEFER_OPS_TYPE_ATTR,
 	XFS_DEFER_OPS_TYPE_MAX,
 };
 
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 2d7057b7984b..2e0937bbff6d 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -388,7 +388,9 @@ xfs_sb_has_incompat_feature(
 	return (sbp->sb_features_incompat & feature) != 0;
 }
 
-#define XFS_SB_FEAT_INCOMPAT_LOG_ALL 0
+#define XFS_SB_FEAT_INCOMPAT_LOG_XATTRS   (1 << 0)	/* Delayed Attributes */
+#define XFS_SB_FEAT_INCOMPAT_LOG_ALL \
+	(XFS_SB_FEAT_INCOMPAT_LOG_XATTRS)
 #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_LOG_ALL
 static inline bool
 xfs_sb_has_incompat_log_feature(
@@ -413,6 +415,12 @@ xfs_sb_add_incompat_log_features(
 	sbp->sb_features_log_incompat |= features;
 }
 
+static inline bool sb_version_haslogxattrs(struct xfs_sb *sbp)
+{
+	return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
+		(sbp->sb_features_log_incompat &
+		 XFS_SB_FEAT_INCOMPAT_LOG_XATTRS);
+}
 
 static inline bool
 xfs_is_quota_inode(struct xfs_sb *sbp, xfs_ino_t ino)
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 879a39ec58a6..c6d5ed34b424 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -275,6 +275,163 @@ xfs_attrd_item_release(
 	xfs_attrd_item_free(attrdp);
 }
 
+/*
+ * Performs one step of an attribute update intent and marks the attrd item
+ * dirty.  An attr operation may be a set or a remove.  Note that the
+ * transaction is marked dirty regardless of whether the operation succeeds or
+ * fails to support the ATTRI/ATTRD lifecycle rules.
+ */
+STATIC int
+xfs_trans_attr_finish_update(
+	struct xfs_delattr_context	*dac,
+	struct xfs_attrd_log_item	*attrdp,
+	struct xfs_buf			**leaf_bp,
+	uint32_t			op_flags)
+{
+	struct xfs_da_args		*args = dac->da_args;
+	unsigned int			op = op_flags &
+					     XFS_ATTR_OP_FLAGS_TYPE_MASK;
+	int				error;
+
+	switch (op) {
+	case XFS_ATTR_OP_FLAGS_SET:
+		error = xfs_attr_set_iter(dac, leaf_bp);
+		break;
+	case XFS_ATTR_OP_FLAGS_REMOVE:
+		ASSERT(XFS_IFORK_Q(args->dp));
+		error = xfs_attr_remove_iter(dac);
+		break;
+	default:
+		error = -EFSCORRUPTED;
+		break;
+	}
+
+	/*
+	 * Mark the transaction dirty, even on error. This ensures the
+	 * transaction is aborted, which:
+	 *
+	 * 1.) releases the ATTRI and frees the ATTRD
+	 * 2.) shuts down the filesystem
+	 */
+	args->trans->t_flags |= XFS_TRANS_DIRTY;
+
+	/*
+	 * attr intent/done items are null when delayed attributes are disabled
+	 */
+	if (attrdp)
+		set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
+
+	return error;
+}
+
+/* Log an attr to the intent item. */
+STATIC void
+xfs_attr_log_item(
+	struct xfs_trans		*tp,
+	struct xfs_attri_log_item	*attrip,
+	struct xfs_attr_item		*attr)
+{
+	struct xfs_attri_log_format	*attrp;
+
+	tp->t_flags |= XFS_TRANS_DIRTY;
+	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
+
+	/*
+	 * At this point the xfs_attr_item has been constructed, and we've
+	 * created the log intent. Fill in the attri log item and log format
+	 * structure with fields from this xfs_attr_item
+	 */
+	attrp = &attrip->attri_format;
+	attrp->alfi_ino = attr->xattri_dac.da_args->dp->i_ino;
+	attrp->alfi_op_flags = attr->xattri_op_flags;
+	attrp->alfi_value_len = attr->xattri_dac.da_args->valuelen;
+	attrp->alfi_name_len = attr->xattri_dac.da_args->namelen;
+	attrp->alfi_attr_flags = attr->xattri_dac.da_args->attr_filter;
+
+	attrip->attri_name = (void *)attr->xattri_dac.da_args->name;
+	attrip->attri_value = attr->xattri_dac.da_args->value;
+	attrip->attri_name_len = attr->xattri_dac.da_args->namelen;
+	attrip->attri_value_len = attr->xattri_dac.da_args->valuelen;
+}
+
+/* Get an ATTRI. */
+static struct xfs_log_item *
+xfs_attr_create_intent(
+	struct xfs_trans		*tp,
+	struct list_head		*items,
+	unsigned int			count,
+	bool				sort)
+{
+	struct xfs_mount		*mp = tp->t_mountp;
+	struct xfs_attri_log_item	*attrip;
+	struct xfs_attr_item		*attr;
+
+	ASSERT(count == 1);
+
+	if (!sb_version_haslogxattrs(&mp->m_sb))
+		return NULL;
+
+	attrip = xfs_attri_init(mp, 0);
+	if (attrip == NULL)
+		return NULL;
+
+	xfs_trans_add_item(tp, &attrip->attri_item);
+	list_for_each_entry(attr, items, xattri_list)
+		xfs_attr_log_item(tp, attrip, attr);
+	return &attrip->attri_item;
+}
+
+/* Process an attr. */
+STATIC int
+xfs_attr_finish_item(
+	struct xfs_trans		*tp,
+	struct xfs_log_item		*done,
+	struct list_head		*item,
+	struct xfs_btree_cur		**state)
+{
+	struct xfs_attr_item		*attr;
+	struct xfs_attrd_log_item	*done_item = NULL;
+	int				error;
+	struct xfs_delattr_context	*dac;
+
+	attr = container_of(item, struct xfs_attr_item, xattri_list);
+	dac = &attr->xattri_dac;
+	if (done)
+		done_item = ATTRD_ITEM(done);
+
+	/*
+	 * Always reset trans after EAGAIN cycle
+	 * since the transaction is new
+	 */
+	dac->da_args->trans = tp;
+
+	error = xfs_trans_attr_finish_update(dac, done_item, &dac->leaf_bp,
+					     attr->xattri_op_flags);
+	if (error != -EAGAIN)
+		kmem_free(attr);
+
+	return error;
+}
+
+/* Abort all pending ATTRs. */
+STATIC void
+xfs_attr_abort_intent(
+	struct xfs_log_item		*intent)
+{
+	xfs_attri_release(ATTRI_ITEM(intent));
+}
+
+/* Cancel an attr */
+STATIC void
+xfs_attr_cancel_item(
+	struct list_head		*item)
+{
+	struct xfs_attr_item		*attr;
+
+	attr = container_of(item, struct xfs_attr_item, xattri_list);
+	kmem_free(attr);
+}
+
 STATIC xfs_lsn_t
 xfs_attri_item_committed(
 	struct xfs_log_item		*lip,
@@ -306,6 +463,30 @@ xfs_attri_item_match(
 	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
 }
 
+/*
+ * This routine is called to allocate an "attr done" log item.
+ */
+static struct xfs_attrd_log_item *
+xfs_trans_get_attrd(struct xfs_trans		*tp,
+		  struct xfs_attri_log_item	*attrip)
+{
+	struct xfs_attrd_log_item		*attrdp;
+	uint					size;
+
+	ASSERT(tp != NULL);
+
+	size = sizeof(struct xfs_attrd_log_item);
+	attrdp = kmem_zalloc(size, 0);
+
+	xfs_log_item_init(tp->t_mountp, &attrdp->attrd_item, XFS_LI_ATTRD,
+			  &xfs_attrd_item_ops);
+	attrdp->attrd_attrip = attrip;
+	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
+
+	xfs_trans_add_item(tp, &attrdp->attrd_item);
+	return attrdp;
+}
+
 static const struct xfs_item_ops xfs_attrd_item_ops = {
 	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
 	.iop_size	= xfs_attrd_item_size,
@@ -313,6 +494,29 @@ static const struct xfs_item_ops xfs_attrd_item_ops = {
 	.iop_release    = xfs_attrd_item_release,
 };
 
+
+/* Get an ATTRD so we can process all the attrs. */
+static struct xfs_log_item *
+xfs_attr_create_done(
+	struct xfs_trans		*tp,
+	struct xfs_log_item		*intent,
+	unsigned int			count)
+{
+	if (!intent)
+		return NULL;
+
+	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
+}
+
+const struct xfs_defer_op_type xfs_attr_defer_type = {
+	.max_items	= 1,
+	.create_intent	= xfs_attr_create_intent,
+	.abort_intent	= xfs_attr_abort_intent,
+	.create_done	= xfs_attr_create_done,
+	.finish_item	= xfs_attr_finish_item,
+	.cancel_item	= xfs_attr_cancel_item,
+};
+
 /* Is this recovered ATTRI ok? */
 static inline bool
 xfs_attri_validate(
@@ -337,13 +541,167 @@ xfs_attri_validate(
 	return xfs_verify_ino(mp, attrp->alfi_ino);
 }
 
+/*
+ * Process an attr intent item that was recovered from the log.  We need to
+ * replay the set or remove operation that it describes.
+ */
+STATIC int
+xfs_attri_item_recover(
+	struct xfs_log_item		*lip,
+	struct list_head		*capture_list)
+{
+	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
+	struct xfs_attr_item		*attr;
+	struct xfs_mount		*mp = lip->li_mountp;
+	struct xfs_inode		*ip;
+	struct xfs_da_args		*args;
+	struct xfs_trans		*tp;
+	struct xfs_trans_res		tres;
+	struct xfs_attri_log_format	*attrp;
+	int				error, ret = 0;
+	int				total;
+	int				local;
+	struct xfs_attrd_log_item	*done_item = NULL;
+
+	/*
+	 * First check the validity of the attr described by the ATTRI.  If any
+	 * are bad, then assume that all are bad and just toss the ATTRI.
+	 */
+	attrp = &attrip->attri_format;
+	if (!xfs_attri_validate(mp, attrip))
+		return -EFSCORRUPTED;
+
+	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
+	if (error)
+		return error;
+
+	attr = kmem_zalloc(sizeof(struct xfs_attr_item) +
+			   sizeof(struct xfs_da_args), KM_NOFS);
+	args = (struct xfs_da_args *)(attr + 1);
+
+	attr->xattri_dac.da_args = args;
+	attr->xattri_op_flags = attrp->alfi_op_flags;
+
+	args->dp = ip;
+	args->geo = mp->m_attr_geo;
+	args->op_flags = attrp->alfi_op_flags;
+	args->whichfork = XFS_ATTR_FORK;
+	args->name = attrip->attri_name;
+	args->namelen = attrp->alfi_name_len;
+	args->hashval = xfs_da_hashname(args->name, args->namelen);
+	args->attr_filter = attrp->alfi_attr_flags;
+
+	if (attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET) {
+		args->value = attrip->attri_value;
+		args->valuelen = attrp->alfi_value_len;
+		args->total = xfs_attr_calc_size(args, &local);
+
+		tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres +
+				 M_RES(mp)->tr_attrsetrt.tr_logres *
+					args->total;
+		tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
+		tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
+		total = args->total;
+	} else {
+		tres = M_RES(mp)->tr_attrrm;
+		total = XFS_ATTRRM_SPACE_RES(mp);
+	}
+	error = xfs_trans_alloc(mp, &tres, total, 0, XFS_TRANS_RESERVE, &tp);
+	if (error)
+		goto out;
+
+	args->trans = tp;
+	done_item = xfs_trans_get_attrd(tp, attrip);
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, 0);
+
+	ret = xfs_trans_attr_finish_update(&attr->xattri_dac, done_item,
+					   &attr->xattri_dac.leaf_bp,
+					   attrp->alfi_op_flags);
+	if (ret == -EAGAIN) {
+		/* There's more work to do, so add it to this transaction */
+		xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
+	} else
+		error = ret;
+
+	if (error) {
+		xfs_trans_cancel(tp);
+		goto out_unlock;
+	}
+
+	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list,
+						 attr->xattri_dac.leaf_bp);
+
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_irele(ip);
+out:
+	if (ret != -EAGAIN)
+		kmem_free(attr);
+	return error;
+}
+
+/* Re-log an intent item to push the log tail forward. */
+static struct xfs_log_item *
+xfs_attri_item_relog(
+	struct xfs_log_item		*intent,
+	struct xfs_trans		*tp)
+{
+	struct xfs_attrd_log_item	*attrdp;
+	struct xfs_attri_log_item	*old_attrip;
+	struct xfs_attri_log_item	*new_attrip;
+	struct xfs_attri_log_format	*new_attrp;
+	struct xfs_attri_log_format	*old_attrp;
+	int				buffer_size;
+
+	old_attrip = ATTRI_ITEM(intent);
+	old_attrp = &old_attrip->attri_format;
+	buffer_size = old_attrp->alfi_value_len + old_attrp->alfi_name_len;
+
+	tp->t_flags |= XFS_TRANS_DIRTY;
+	attrdp = xfs_trans_get_attrd(tp, old_attrip);
+	set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
+
+	new_attrip = xfs_attri_init(tp->t_mountp, buffer_size);
+	new_attrp = &new_attrip->attri_format;
+
+	new_attrp->alfi_ino = old_attrp->alfi_ino;
+	new_attrp->alfi_op_flags = old_attrp->alfi_op_flags;
+	new_attrp->alfi_value_len = old_attrp->alfi_value_len;
+	new_attrp->alfi_name_len = old_attrp->alfi_name_len;
+	new_attrp->alfi_attr_flags = old_attrp->alfi_attr_flags;
+
+	new_attrip->attri_name_len = old_attrip->attri_name_len;
+	new_attrip->attri_name = ((char *)new_attrip) +
+				 sizeof(struct xfs_attri_log_item);
+	memcpy(new_attrip->attri_name, old_attrip->attri_name,
+		new_attrip->attri_name_len);
+
+	new_attrip->attri_value_len = old_attrip->attri_value_len;
+	if (new_attrip->attri_value_len > 0) {
+		new_attrip->attri_value = new_attrip->attri_name +
+					  new_attrip->attri_name_len;
+
+		memcpy(new_attrip->attri_value, old_attrip->attri_value,
+		       new_attrip->attri_value_len);
+	}
+
+	xfs_trans_add_item(tp, &new_attrip->attri_item);
+	set_bit(XFS_LI_DIRTY, &new_attrip->attri_item.li_flags);
+
+	return &new_attrip->attri_item;
+}
+
 static const struct xfs_item_ops xfs_attri_item_ops = {
 	.iop_size	= xfs_attri_item_size,
 	.iop_format	= xfs_attri_item_format,
 	.iop_unpin	= xfs_attri_item_unpin,
 	.iop_committed	= xfs_attri_item_committed,
 	.iop_release    = xfs_attri_item_release,
+	.iop_recover	= xfs_attri_item_recover,
 	.iop_match	= xfs_attri_item_match,
+	.iop_relog	= xfs_attri_item_relog,
 };
 
 
-- 
2.25.1



* [PATCH v24 05/11] RFC xfs: Skip flip flags for delayed attrs
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
                   ` (3 preceding siblings ...)
  2021-08-24 22:44 ` [PATCH v24 04/11] xfs: Implement attr logging and replay Allison Henderson
@ 2021-08-24 22:44 ` Allison Henderson
  2021-08-30 10:15   ` Chandan Babu R
  2021-08-24 22:44 ` [PATCH v24 06/11] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred Allison Henderson
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 45+ messages in thread
From: Allison Henderson @ 2021-08-24 22:44 UTC (permalink / raw)
  To: linux-xfs

This is a cleanup patch that skips the flip flag logic for delayed attr
renames.  Since the log replay keeps the inode locked, we do not need to
worry about race windows with attr lookups.  So we can skip over
flipping the flag and the extra transaction roll for it.
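
To make the saved transaction roll concrete, here is a small userspace
sketch of the pattern the diff below applies to xfs_attr_set_iter: without
the feature, the state machine flips the flag, records the next state, and
returns -EAGAIN so the caller rolls the transaction; with it, the code
falls straight through to the next state.  The toy_* names and the "larp"
field are hypothetical stand-ins (the latter for xfs_has_larp(mp)), not the
real kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_state { TOY_START, TOY_FLIP_FLAG, TOY_DONE };

struct toy_ctx {
	enum toy_state	state;
	bool		larp;	/* hypothetical stand-in for xfs_has_larp(mp) */
	int		flips;	/* how many times the flag was flipped */
};

/* One pass of the state machine; -EAGAIN asks the caller to roll. */
static int toy_set_iter(struct toy_ctx *ctx)
{
	switch (ctx->state) {
	case TOY_START:
		if (!ctx->larp) {
			/* Flip the flag, then commit it in its own transaction. */
			ctx->flips++;
			ctx->state = TOY_FLIP_FLAG;
			return -EAGAIN;
		}
		/* fallthrough */
	case TOY_FLIP_FLAG:
		/* Dismantle the "old" attribute (elided in this sketch). */
		ctx->state = TOY_DONE;
		return 0;
	case TOY_DONE:
		return 0;
	}
	return -EINVAL;
}

/* Drive the state machine to completion and count transaction rolls. */
static int toy_count_rolls(bool larp)
{
	struct toy_ctx ctx = { .state = TOY_START, .larp = larp };
	int rolls = 0;
	int error;

	while ((error = toy_set_iter(&ctx)) == -EAGAIN)
		rolls++;
	assert(error == 0 && ctx.state == TOY_DONE);
	return rolls;
}
```

In this model toy_count_rolls(false) takes one roll and
toy_count_rolls(true) takes none, which is the only behavioural difference
the patch is after.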

RFC: In the last review, folks asked for some performance analysis, so I
did a few perf captures with and without this patch.  What I found was
that there wasn't much difference at all between having the patch and
not having it.  Of the time we do spend in the affected code, the
percentage is small.  Most of the time we spend about 0.03% of the time
in this function, with or without the patch.  Occasionally we get
0.02%, though not often.  So I think this starts to challenge needing
this patch at all.  This patch was requested some number of reviews ago,
but perhaps in light of the findings, it may no longer be of interest.

     0.03%     0.00%  fsstress  [xfs]               [k] xfs_attr_set_iter

Keep it or drop it?

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c      | 54 +++++++++++++++++++++--------------
 fs/xfs/libxfs/xfs_attr_leaf.c |  3 +-
 2 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index dfff81024e46..fce67c717be2 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -355,6 +355,7 @@ xfs_attr_set_iter(
 	struct xfs_inode		*dp = args->dp;
 	struct xfs_buf			*bp = NULL;
 	int				forkoff, error = 0;
+	struct xfs_mount		*mp = args->dp->i_mount;
 
 	/* State machine switch */
 	switch (dac->dela_state) {
@@ -477,16 +478,21 @@ xfs_attr_set_iter(
 		 * In a separate transaction, set the incomplete flag on the
 		 * "old" attr and clear the incomplete flag on the "new" attr.
 		 */
-		error = xfs_attr3_leaf_flipflags(args);
-		if (error)
-			return error;
-		/*
-		 * Commit the flag value change and start the next trans in
-		 * series.
-		 */
-		dac->dela_state = XFS_DAS_FLIP_LFLAG;
-		trace_xfs_attr_set_iter_return(dac->dela_state, args->dp);
-		return -EAGAIN;
+		if (!xfs_has_larp(mp)) {
+			error = xfs_attr3_leaf_flipflags(args);
+			if (error)
+				return error;
+			/*
+			 * Commit the flag value change and start the next trans
+			 * in series.
+			 */
+			dac->dela_state = XFS_DAS_FLIP_LFLAG;
+			trace_xfs_attr_set_iter_return(dac->dela_state,
+						       args->dp);
+			return -EAGAIN;
+		}
+
+		/* fallthrough */
 	case XFS_DAS_FLIP_LFLAG:
 		/*
 		 * Dismantle the "old" attribute/value pair by removing a
@@ -589,17 +595,21 @@ xfs_attr_set_iter(
 		 * In a separate transaction, set the incomplete flag on the
 		 * "old" attr and clear the incomplete flag on the "new" attr.
 		 */
-		error = xfs_attr3_leaf_flipflags(args);
-		if (error)
-			goto out;
-		/*
-		 * Commit the flag value change and start the next trans in
-		 * series
-		 */
-		dac->dela_state = XFS_DAS_FLIP_NFLAG;
-		trace_xfs_attr_set_iter_return(dac->dela_state, args->dp);
-		return -EAGAIN;
+		if (!xfs_has_larp(mp)) {
+			error = xfs_attr3_leaf_flipflags(args);
+			if (error)
+				goto out;
+			/*
+			 * Commit the flag value change and start the next trans
+			 * in series
+			 */
+			dac->dela_state = XFS_DAS_FLIP_NFLAG;
+			trace_xfs_attr_set_iter_return(dac->dela_state,
+						       args->dp);
+			return -EAGAIN;
+		}
 
+		/* fallthrough */
 	case XFS_DAS_FLIP_NFLAG:
 		/*
 		 * Dismantle the "old" attribute/value pair by removing a
@@ -1236,6 +1246,7 @@ xfs_attr_node_addname_clear_incomplete(
 {
 	struct xfs_da_args		*args = dac->da_args;
 	struct xfs_da_state		*state = NULL;
+	struct xfs_mount		*mp = args->dp->i_mount;
 	int				retval = 0;
 	int				error = 0;
 
@@ -1243,7 +1254,8 @@ xfs_attr_node_addname_clear_incomplete(
 	 * Re-find the "old" attribute entry after any split ops. The INCOMPLETE
 	 * flag means that we will find the "old" attr, not the "new" one.
 	 */
-	args->attr_filter |= XFS_ATTR_INCOMPLETE;
+	if (!xfs_has_larp(mp))
+		args->attr_filter |= XFS_ATTR_INCOMPLETE;
 	state = xfs_da_state_alloc(args);
 	state->inleaf = 0;
 	error = xfs_da3_node_lookup_int(state, &retval);
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index e1d11e314228..a0a352bdea59 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -1487,7 +1487,8 @@ xfs_attr3_leaf_add_work(
 	if (tmp)
 		entry->flags |= XFS_ATTR_LOCAL;
 	if (args->op_flags & XFS_DA_OP_RENAME) {
-		entry->flags |= XFS_ATTR_INCOMPLETE;
+		if (!xfs_has_larp(mp))
+			entry->flags |= XFS_ATTR_INCOMPLETE;
 		if ((args->blkno2 == args->blkno) &&
 		    (args->index2 <= args->index)) {
 			args->index2++;
-- 
2.25.1



* [PATCH v24 06/11] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
                   ` (4 preceding siblings ...)
  2021-08-24 22:44 ` [PATCH v24 05/11] RFC xfs: Skip flip flags for delayed attrs Allison Henderson
@ 2021-08-24 22:44 ` Allison Henderson
  2021-08-30 10:27   ` Chandan Babu R
  2021-09-01  3:47   ` Dave Chinner
  2021-08-24 22:44 ` [PATCH v24 07/11] xfs: Remove unused xfs_attr_*_args Allison Henderson
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-24 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Allison Collins <allison.henderson@oracle.com>

These routines set up and queue new deferred attribute operations.
These functions are meant to be called by any routine needing to
initiate a deferred attribute operation as opposed to the existing
inline operations. A new helper function, xfs_attr_item_init, is also
added.

Finally enable delayed attributes in xfs_attr_set and xfs_attr_remove.
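
As a rough picture of what "set up and queue" means here, the userspace
sketch below allocates an item, appends it to a per-transaction queue, and
then drains the queue, calling the finish step again whenever it returns
-EAGAIN and freeing the item once it stops doing so (matching the
kmem_free in xfs_attr_finish_item).  Everything named toy_* is
hypothetical, and the drain loop is only an assumption about how the
deferred-ops machinery drives these items; it is not the real xfs_defer
code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

enum toy_op { TOY_OP_SET, TOY_OP_REMOVE };

/* Hypothetical stand-in for struct xfs_attr_item. */
struct toy_attr_item {
	enum toy_op		op;
	int			steps_left;	/* transactions still needed */
	struct toy_attr_item	*next;
};

/* Hypothetical stand-in for a transaction with a deferred-op queue. */
struct toy_trans {
	struct toy_attr_item	*head;
	struct toy_attr_item	**tailp;
	int			rolls;
};

/* Loosely shaped like xfs_attr_item_init(): allocate and fill one item. */
static int toy_attr_item_init(enum toy_op op, int steps,
			      struct toy_attr_item **itemp)
{
	struct toy_attr_item *item = calloc(1, sizeof(*item));

	if (!item)
		return -ENOMEM;
	item->op = op;
	item->steps_left = steps;
	*itemp = item;
	return 0;
}

/* Queue an item at the tail, as a deferred set/remove helper might. */
static int toy_attr_defer(struct toy_trans *tp, enum toy_op op, int steps)
{
	struct toy_attr_item *item;
	int error = toy_attr_item_init(op, steps, &item);

	if (error)
		return error;
	*tp->tailp = item;
	tp->tailp = &item->next;
	return 0;
}

/* One step of work; -EAGAIN means "roll the transaction and call again". */
static int toy_finish_item(struct toy_attr_item *item)
{
	return --item->steps_left > 0 ? -EAGAIN : 0;
}

/* Drain the queue, freeing each item once it stops returning -EAGAIN. */
static int toy_defer_finish(struct toy_trans *tp)
{
	while (tp->head) {
		struct toy_attr_item *item = tp->head;

		if (toy_finish_item(item) == -EAGAIN) {
			tp->rolls++;
			continue;
		}
		tp->head = item->next;
		free(item);
	}
	tp->tailp = &tp->head;
	return 0;
}

/* Queue one set and one remove, then report how many rolls were needed. */
static int toy_run(int set_steps, int remove_steps)
{
	struct toy_trans tp = { .head = NULL, .rolls = 0 };
	int error;

	tp.tailp = &tp.head;
	error = toy_attr_defer(&tp, TOY_OP_SET, set_steps);
	if (!error)
		error = toy_attr_defer(&tp, TOY_OP_REMOVE, remove_steps);
	if (error) {
		/* Cancel path: free anything that was already queued. */
		while (tp.head) {
			struct toy_attr_item *item = tp.head;

			tp.head = item->next;
			free(item);
		}
		return error;
	}
	error = toy_defer_finish(&tp);
	return error ? error : tp.rolls;
}
```

A three-step set followed by a one-step remove, toy_run(3, 1), needs two
rolls in this model; the caller never rolls the transaction itself, which
is the property the deferred helpers are meant to provide.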

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c | 71 ++++++++++++++++++++++++++++++++++++++--
 fs/xfs/libxfs/xfs_attr.h |  2 ++
 fs/xfs/xfs_log.c         | 41 +++++++++++++++++++++++
 fs/xfs/xfs_log.h         |  1 +
 4 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index fce67c717be2..6877683e2e35 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -25,6 +25,8 @@
 #include "xfs_trans_space.h"
 #include "xfs_trace.h"
 #include "xfs_attr_item.h"
+#include "xfs_attr.h"
+#include "xfs_log.h"
 
 /*
  * xfs_attr.c
@@ -726,6 +728,7 @@ xfs_attr_set(
 	int			error, local;
 	int			rmt_blks = 0;
 	unsigned int		total;
+	int			delayed = xfs_has_larp(mp);
 
 	if (xfs_is_shutdown(dp->i_mount))
 		return -EIO;
@@ -782,13 +785,19 @@ xfs_attr_set(
 		rmt_blks = xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
 	}
 
+	if (delayed) {
+		error = xfs_attr_use_log_assist(mp);
+		if (error)
+			return error;
+	}
+
 	/*
 	 * Root fork attributes can use reserved data blocks for this
 	 * operation if necessary
 	 */
 	error = xfs_trans_alloc_inode(dp, &tres, total, 0, rsvd, &args->trans);
 	if (error)
-		return error;
+		goto drop_incompat;
 
 	if (args->value || xfs_inode_hasattr(dp)) {
 		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
@@ -806,9 +815,10 @@ xfs_attr_set(
 		if (error != -ENOATTR && error != -EEXIST)
 			goto out_trans_cancel;
 
-		error = xfs_attr_set_args(args);
+		error = xfs_attr_set_deferred(args);
 		if (error)
 			goto out_trans_cancel;
+
 		/* shortform attribute has already been committed */
 		if (!args->trans)
 			goto out_unlock;
@@ -816,7 +826,7 @@ xfs_attr_set(
 		if (error != -EEXIST)
 			goto out_trans_cancel;
 
-		error = xfs_attr_remove_args(args);
+		error = xfs_attr_remove_deferred(args);
 		if (error)
 			goto out_trans_cancel;
 	}
@@ -838,6 +848,9 @@ xfs_attr_set(
 	error = xfs_trans_commit(args->trans);
 out_unlock:
 	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+drop_incompat:
+	if (delayed)
+		xlog_drop_incompat_feat(mp->m_log);
 	return error;
 
 out_trans_cancel:
@@ -846,6 +859,58 @@ xfs_attr_set(
 	goto out_unlock;
 }
 
+STATIC int
+xfs_attr_item_init(
+	struct xfs_da_args	*args,
+	unsigned int		op_flags,	/* op flag (set or remove) */
+	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
+{
+
+	struct xfs_attr_item	*new;
+
+	new = kmem_zalloc(sizeof(struct xfs_attr_item), KM_NOFS);
+	new->xattri_op_flags = op_flags;
+	new->xattri_dac.da_args = args;
+
+	*attr = new;
+	return 0;
+}
+
+/* Sets an attribute for an inode as a deferred operation */
+int
+xfs_attr_set_deferred(
+	struct xfs_da_args	*args)
+{
+	struct xfs_attr_item	*new;
+	int			error = 0;
+
+	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
+	if (error)
+		return error;
+
+	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
+
+	return 0;
+}
+
+/* Removes an attribute for an inode as a deferred operation */
+int
+xfs_attr_remove_deferred(
+	struct xfs_da_args	*args)
+{
+
+	struct xfs_attr_item	*new;
+	int			error;
+
+	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
+	if (error)
+		return error;
+
+	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
+
+	return 0;
+}
+
 /*========================================================================
  * External routines when attribute list is inside the inode
  *========================================================================*/
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index aa33cdcf26b8..0f326c28ab7c 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -526,5 +526,7 @@ bool xfs_attr_namecheck(const void *name, size_t length);
 void xfs_delattr_context_init(struct xfs_delattr_context *dac,
 			      struct xfs_da_args *args);
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
+int xfs_attr_set_deferred(struct xfs_da_args *args);
+int xfs_attr_remove_deferred(struct xfs_da_args *args);
 
 #endif	/* __XFS_ATTR_H__ */
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 4402c5d09269..0d0afa1aae59 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3993,3 +3993,44 @@ xlog_drop_incompat_feat(
 {
 	up_read(&log->l_incompat_users);
 }
+
+/*
+ * Get permission to use log-assisted extended attribute updates.
+ *
+ * Callers must not be running any transactions or hold any inode locks, and
+ * they must release the permission by calling xlog_drop_incompat_feat
+ * when they're done.
+ */
+int
+xfs_attr_use_log_assist(
+	struct xfs_mount	*mp)
+{
+	int			error = 0;
+
+	/*
+	 * Protect ourselves from an idle log clearing the logged xattrs log
+	 * incompat feature bit.
+	 */
+	xlog_use_incompat_feat(mp->m_log);
+
+	/*
+	 * If log-assisted xattrs are already enabled, the caller can use the
+	 * log-assisted xattr functions with the log-incompat reference we got.
+	 */
+	if (sb_version_haslogxattrs(&mp->m_sb))
+		return 0;
+
+	/* Enable log-assisted xattrs. */
+	error = xfs_add_incompat_log_feature(mp,
+			XFS_SB_FEAT_INCOMPAT_LOG_XATTRS);
+	if (error)
+		goto drop_incompat;
+
+	xfs_warn_once(mp,
+"EXPERIMENTAL logged extended attributes feature added. Use at your own risk!");
+
+	return 0;
+drop_incompat:
+	xlog_drop_incompat_feat(mp->m_log);
+	return error;
+}
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index dc1b77b92fc1..4504ab60ac85 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -144,5 +144,6 @@ bool	  xlog_force_shutdown(struct xlog *log, int shutdown_flags);
 
 void xlog_use_incompat_feat(struct xlog *log);
 void xlog_drop_incompat_feat(struct xlog *log);
+int xfs_attr_use_log_assist(struct xfs_mount *mp);
 
 #endif	/* __XFS_LOG_H__ */
-- 
2.25.1
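
The reference handling in this patch (take the log-incompat reference up
front, drop it on every exit path) can be sketched in userspace as below.
This is a rough model under assumptions: a plain counter stands in for
the l_incompat_users rwsem, and fake_log, fail_enable and fail_alloc are
hypothetical names used only to exercise the error paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the log; a counter stands in for the rwsem. */
struct fake_log {
	int	incompat_users;	/* outstanding references */
	bool	xattrs_enabled;	/* feature bit already set? */
	bool	fail_enable;	/* test hook: enabling the feature fails */
};

static void use_incompat_feat(struct fake_log *lg)
{
	lg->incompat_users++;
}

static void drop_incompat_feat(struct fake_log *lg)
{
	assert(lg->incompat_users > 0);
	lg->incompat_users--;
}

/* On success the caller holds a reference; on failure it holds none. */
static int attr_use_log_assist(struct fake_log *lg)
{
	use_incompat_feat(lg);

	if (lg->xattrs_enabled)
		return 0;

	if (lg->fail_enable) {
		drop_incompat_feat(lg);
		return -EIO;
	}
	lg->xattrs_enabled = true;
	return 0;
}

/* Caller side: every path after a successful get must reach the drop. */
static int attr_set(struct fake_log *lg, bool delayed, bool fail_alloc)
{
	int error = 0;

	if (delayed) {
		error = attr_use_log_assist(lg);
		if (error)
			return error;	/* nothing to drop */
	}

	if (fail_alloc) {
		error = -ENOSPC;	/* stands in for a failed trans alloc */
		goto drop_incompat;
	}

	/* ... queue the deferred work and commit here ... */

drop_incompat:
	if (delayed)
		drop_incompat_feat(lg);
	return error;
}
```

The invariant worth checking in review is that incompat_users returns to
its starting value no matter which path is taken.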


* [PATCH v24 07/11] xfs: Remove unused xfs_attr_*_args
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
                   ` (5 preceding siblings ...)
  2021-08-24 22:44 ` [PATCH v24 06/11] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred Allison Henderson
@ 2021-08-24 22:44 ` Allison Henderson
  2021-08-24 22:44 ` [PATCH v24 08/11] xfs: Add log attribute error tag Allison Henderson
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-24 22:44 UTC (permalink / raw)
  To: linux-xfs

Remove xfs_attr_set_args, xfs_attr_remove_args, and xfs_attr_trans_roll.
These high-level loops are now driven by the delayed operations code,
and can be removed.

Additionally, collapse the leaf_bp parameter of xfs_attr_set_iter into
the dac, since its only caller passes dac->leaf_bp.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_attr.c        | 106 +++-----------------------------
 fs/xfs/libxfs/xfs_attr.h        |  10 +--
 fs/xfs/libxfs/xfs_attr_remote.c |   1 -
 fs/xfs/xfs_attr_item.c          |   6 +-
 4 files changed, 14 insertions(+), 109 deletions(-)
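
As a reminder of what the removed loops did, here is a minimal userspace
sketch of the -EAGAIN convention: the iterator does one step and asks the
caller to roll the transaction, and the caller repeats until it gets
something other than -EAGAIN. The states, attr_ctx and trans_roll are
hypothetical simplifications, not the real xfs_delattr_state values.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, much reduced state machine. */
enum das_state { DAS_UNINIT, DAS_ALLOC, DAS_FLIP, DAS_DONE };

struct attr_ctx {
	enum das_state	state;
	int		blocks_left;	/* remote blocks still to allocate */
	int		rolls;		/* how often the caller rolled */
};

/* Do one step; -EAGAIN means "roll the transaction and call me again". */
static int attr_set_iter(struct attr_ctx *ctx)
{
	switch (ctx->state) {
	case DAS_UNINIT:
		ctx->state = DAS_ALLOC;
		return -EAGAIN;
	case DAS_ALLOC:
		if (ctx->blocks_left > 0) {
			/* One block per transaction, same state next time. */
			ctx->blocks_left--;
			return -EAGAIN;
		}
		ctx->state = DAS_FLIP;
		return -EAGAIN;
	case DAS_FLIP:
		ctx->state = DAS_DONE;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Stand-in for rolling the transaction; it only counts calls. */
static int trans_roll(struct attr_ctx *ctx)
{
	ctx->rolls++;
	return 0;
}

/* The kind of driver loop this patch removes from the attr code. */
static int attr_set_args(struct attr_ctx *ctx)
{
	int error;

	do {
		error = attr_set_iter(ctx);
		if (error != -EAGAIN)
			break;

		error = trans_roll(ctx);
		if (error)
			return error;
	} while (1);

	return error;
}
```

With deferred ops, the loop body moves out of the attr code: the deferred
operations code sees -EAGAIN from the finish_item callback and calls it
again in a new transaction.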

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 6877683e2e35..d62ab53c3b07 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -244,64 +244,9 @@ xfs_attr_is_shortform(
 		ip->i_afp->if_nextents == 0);
 }
 
-/*
- * Checks to see if a delayed attribute transaction should be rolled.  If so,
- * transaction is finished or rolled as needed.
- */
-STATIC int
-xfs_attr_trans_roll(
-	struct xfs_delattr_context	*dac)
-{
-	struct xfs_da_args		*args = dac->da_args;
-	int				error;
-
-	if (dac->flags & XFS_DAC_DEFER_FINISH) {
-		/*
-		 * The caller wants us to finish all the deferred ops so that we
-		 * avoid pinning the log tail with a large number of deferred
-		 * ops.
-		 */
-		dac->flags &= ~XFS_DAC_DEFER_FINISH;
-		error = xfs_defer_finish(&args->trans);
-	} else
-		error = xfs_trans_roll_inode(&args->trans, args->dp);
-
-	return error;
-}
-
-/*
- * Set the attribute specified in @args.
- */
-int
-xfs_attr_set_args(
-	struct xfs_da_args		*args)
-{
-	struct xfs_buf			*leaf_bp = NULL;
-	int				error = 0;
-	struct xfs_delattr_context	dac = {
-		.da_args	= args,
-	};
-
-	do {
-		error = xfs_attr_set_iter(&dac, &leaf_bp);
-		if (error != -EAGAIN)
-			break;
-
-		error = xfs_attr_trans_roll(&dac);
-		if (error) {
-			if (leaf_bp)
-				xfs_trans_brelse(args->trans, leaf_bp);
-			return error;
-		}
-	} while (true);
-
-	return error;
-}
-
 STATIC int
 xfs_attr_sf_addname(
-	struct xfs_delattr_context	*dac,
-	struct xfs_buf			**leaf_bp)
+	struct xfs_delattr_context	*dac)
 {
 	struct xfs_da_args		*args = dac->da_args;
 	struct xfs_inode		*dp = args->dp;
@@ -320,7 +265,7 @@ xfs_attr_sf_addname(
 	 * It won't fit in the shortform, transform to a leaf block.  GROT:
 	 * another possible req'mt for a double-split btree op.
 	 */
-	error = xfs_attr_shortform_to_leaf(args, leaf_bp);
+	error = xfs_attr_shortform_to_leaf(args, &dac->leaf_bp);
 	if (error)
 		return error;
 
@@ -329,7 +274,7 @@ xfs_attr_sf_addname(
 	 * push cannot grab the half-baked leaf buffer and run into problems
 	 * with the write verifier.
 	 */
-	xfs_trans_bhold(args->trans, *leaf_bp);
+	xfs_trans_bhold(args->trans, dac->leaf_bp);
 
 	/*
 	 * We're still in XFS_DAS_UNINIT state here.  We've converted
@@ -337,7 +282,6 @@ xfs_attr_sf_addname(
 	 * add.
 	 */
 	trace_xfs_attr_sf_addname_return(XFS_DAS_UNINIT, args->dp);
-	dac->flags |= XFS_DAC_DEFER_FINISH;
 	return -EAGAIN;
 }
 
@@ -350,8 +294,7 @@ xfs_attr_sf_addname(
  */
 int
 xfs_attr_set_iter(
-	struct xfs_delattr_context	*dac,
-	struct xfs_buf			**leaf_bp)
+	struct xfs_delattr_context	*dac)
 {
 	struct xfs_da_args              *args = dac->da_args;
 	struct xfs_inode		*dp = args->dp;
@@ -370,14 +313,14 @@ xfs_attr_set_iter(
 		 * release the hold once we return with a clean transaction.
 		 */
 		if (xfs_attr_is_shortform(dp))
-			return xfs_attr_sf_addname(dac, leaf_bp);
-		if (*leaf_bp != NULL) {
-			xfs_trans_bhold_release(args->trans, *leaf_bp);
-			*leaf_bp = NULL;
+			return xfs_attr_sf_addname(dac);
+		if (dac->leaf_bp != NULL) {
+			xfs_trans_bhold_release(args->trans, dac->leaf_bp);
+			dac->leaf_bp = NULL;
 		}
 
 		if (xfs_attr_is_leaf(dp)) {
-			error = xfs_attr_leaf_try_add(args, *leaf_bp);
+			error = xfs_attr_leaf_try_add(args, dac->leaf_bp);
 			if (error == -ENOSPC) {
 				error = xfs_attr3_leaf_to_node(args);
 				if (error)
@@ -396,7 +339,6 @@ xfs_attr_set_iter(
 				 * be a node, so we'll fall down into the node
 				 * handling code below
 				 */
-				dac->flags |= XFS_DAC_DEFER_FINISH;
 				trace_xfs_attr_set_iter_return(
 					dac->dela_state, args->dp);
 				return -EAGAIN;
@@ -687,32 +629,6 @@ xfs_attr_lookup(
 	return xfs_attr_node_hasname(args, NULL);
 }
 
-/*
- * Remove the attribute specified in @args.
- */
-int
-xfs_attr_remove_args(
-	struct xfs_da_args	*args)
-{
-	int				error;
-	struct xfs_delattr_context	dac = {
-		.da_args	= args,
-	};
-
-	do {
-		error = xfs_attr_remove_iter(&dac);
-		if (error != -EAGAIN)
-			break;
-
-		error = xfs_attr_trans_roll(&dac);
-		if (error)
-			return error;
-
-	} while (true);
-
-	return error;
-}
-
 /*
  * Note: If args->value is NULL the attribute will be removed, just like the
  * Linux ->setattr API.
@@ -1275,7 +1191,6 @@ xfs_attr_node_addname(
 			 * this. dela_state is still unset by this function at
 			 * this point.
 			 */
-			dac->flags |= XFS_DAC_DEFER_FINISH;
 			trace_xfs_attr_node_addname_return(
 					dac->dela_state, args->dp);
 			return -EAGAIN;
@@ -1290,7 +1205,6 @@ xfs_attr_node_addname(
 		error = xfs_da3_split(state);
 		if (error)
 			goto out;
-		dac->flags |= XFS_DAC_DEFER_FINISH;
 	} else {
 		/*
 		 * Addition succeeded, update Btree hashvals.
@@ -1544,7 +1458,6 @@ xfs_attr_remove_iter(
 			if (error)
 				goto out;
 			dac->dela_state = XFS_DAS_RM_NAME;
-			dac->flags |= XFS_DAC_DEFER_FINISH;
 			trace_xfs_attr_remove_iter_return(dac->dela_state, args->dp);
 			return -EAGAIN;
 		}
@@ -1572,7 +1485,6 @@ xfs_attr_remove_iter(
 			if (error)
 				goto out;
 
-			dac->flags |= XFS_DAC_DEFER_FINISH;
 			dac->dela_state = XFS_DAS_RM_SHRINK;
 			trace_xfs_attr_remove_iter_return(
 					dac->dela_state, args->dp);
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 0f326c28ab7c..efb7ac4fc41c 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -457,9 +457,8 @@ enum xfs_delattr_state {
 /*
  * Defines for xfs_delattr_context.flags
  */
-#define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
-#define XFS_DAC_LEAF_ADDNAME_INIT	0x02 /* xfs_attr_leaf_addname init*/
-#define XFS_DAC_DELAYED_OP_INIT		0x04 /* delayed operations init*/
+#define XFS_DAC_LEAF_ADDNAME_INIT	0x01 /* xfs_attr_leaf_addname init*/
+#define XFS_DAC_DELAYED_OP_INIT		0x02 /* delayed operations init*/
 
 /*
  * Context used for keeping track of delayed attribute operations
@@ -517,10 +516,7 @@ bool xfs_attr_is_leaf(struct xfs_inode *ip);
 int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
-int xfs_attr_set_args(struct xfs_da_args *args);
-int xfs_attr_set_iter(struct xfs_delattr_context *dac,
-		      struct xfs_buf **leaf_bp);
-int xfs_attr_remove_args(struct xfs_da_args *args);
+int xfs_attr_set_iter(struct xfs_delattr_context *dac);
 int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
 bool xfs_attr_namecheck(const void *name, size_t length);
 void xfs_delattr_context_init(struct xfs_delattr_context *dac,
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 83b95be9ded8..c806319134fb 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -695,7 +695,6 @@ xfs_attr_rmtval_remove(
 	 * the parent
 	 */
 	if (!done) {
-		dac->flags |= XFS_DAC_DEFER_FINISH;
 		trace_xfs_attr_rmtval_remove_return(dac->dela_state, args->dp);
 		return -EAGAIN;
 	}
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index c6d5ed34b424..928c0076a2fd 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -285,7 +285,6 @@ STATIC int
 xfs_trans_attr_finish_update(
 	struct xfs_delattr_context	*dac,
 	struct xfs_attrd_log_item	*attrdp,
-	struct xfs_buf			**leaf_bp,
 	uint32_t			op_flags)
 {
 	struct xfs_da_args		*args = dac->da_args;
@@ -295,7 +294,7 @@ xfs_trans_attr_finish_update(
 
 	switch (op) {
 	case XFS_ATTR_OP_FLAGS_SET:
-		error = xfs_attr_set_iter(dac, leaf_bp);
+		error = xfs_attr_set_iter(dac);
 		break;
 	case XFS_ATTR_OP_FLAGS_REMOVE:
 		ASSERT(XFS_IFORK_Q(args->dp));
@@ -405,7 +404,7 @@ xfs_attr_finish_item(
 	 */
 	dac->da_args->trans = tp;
 
-	error = xfs_trans_attr_finish_update(dac, done_item, &dac->leaf_bp,
+	error = xfs_trans_attr_finish_update(dac, done_item,
 					     attr->xattri_op_flags);
 	if (error != -EAGAIN)
 		kmem_free(attr);
@@ -617,7 +616,6 @@ xfs_attri_item_recover(
 	xfs_trans_ijoin(tp, ip, 0);
 
 	ret = xfs_trans_attr_finish_update(&attr->xattri_dac, done_item,
-					   &attr->xattri_dac.leaf_bp,
 					   attrp->alfi_op_flags);
 	if (ret == -EAGAIN) {
 		/* There's more work to do, so add it to this transaction */
-- 
2.25.1


* [PATCH v24 08/11] xfs: Add log attribute error tag
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
                   ` (6 preceding siblings ...)
  2021-08-24 22:44 ` [PATCH v24 07/11] xfs: Remove unused xfs_attr_*_args Allison Henderson
@ 2021-08-24 22:44 ` Allison Henderson
  2021-08-24 22:44 ` [PATCH v24 09/11] xfs: Add larp debug option Allison Henderson
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-24 22:44 UTC (permalink / raw)
  To: linux-xfs

This patch adds an error tag that we can use to test log attribute
recovery and replay.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_errortag.h | 4 +++-
 fs/xfs/xfs_attr_item.c       | 7 +++++++
 fs/xfs/xfs_error.c           | 3 +++
 3 files changed, 13 insertions(+), 1 deletion(-)
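
A simplified userspace model of how such an error tag gates a code path
is sketched below. It assumes the usual convention from xfs_errortag.h (a
random factor of 1 means always, 2 means half the time) and additionally
assumes that a factor of 0 means the tag is off. fake_mount, test_error
and attr_finish_update are hypothetical names, and rand() stands in for
the kernel's random source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

enum { ERRTAG_LARP = 0, ERRTAG_MAX = 1 };

/* Hypothetical mount: one random factor per tag, 0 means disabled. */
struct fake_mount {
	unsigned int	errtag_randfactor[ERRTAG_MAX];
};

/* True if expr is already true, or if the tag fires this time. */
static bool test_error(bool expr, const struct fake_mount *mp, int tag)
{
	unsigned int factor;

	assert(tag >= 0 && tag < ERRTAG_MAX);
	if (expr)
		return true;

	factor = mp->errtag_randfactor[tag];
	if (factor == 0)
		return false;

	/* factor 1 always fires, factor 2 fires about half the time */
	return (unsigned int)rand() % factor == 0;
}

/* Bail out with -EIO before touching anything when the tag fires. */
static int attr_finish_update(const struct fake_mount *mp, int *applied)
{
	if (test_error(false, mp, ERRTAG_LARP))
		return -EIO;

	(*applied)++;	/* stands in for the real set/remove step */
	return 0;
}
```

Failing before any attr work is done leaves the intent logged but not
completed, which is the situation log recovery then has to replay.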

diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h
index a23a52e643ad..c15d2340220c 100644
--- a/fs/xfs/libxfs/xfs_errortag.h
+++ b/fs/xfs/libxfs/xfs_errortag.h
@@ -59,7 +59,8 @@
 #define XFS_ERRTAG_REDUCE_MAX_IEXTENTS			36
 #define XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT		37
 #define XFS_ERRTAG_AG_RESV_FAIL				38
-#define XFS_ERRTAG_MAX					39
+#define XFS_ERRTAG_LARP					39
+#define XFS_ERRTAG_MAX					40
 
 /*
  * Random factors for above tags, 1 means always, 2 means 1/2 time, etc.
@@ -103,5 +104,6 @@
 #define XFS_RANDOM_REDUCE_MAX_IEXTENTS			1
 #define XFS_RANDOM_BMAP_ALLOC_MINLEN_EXTENT		1
 #define XFS_RANDOM_AG_RESV_FAIL				1
+#define XFS_RANDOM_LARP					1
 
 #endif /* __XFS_ERRORTAG_H_ */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 928c0076a2fd..69646a8b6e09 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -34,6 +34,7 @@
 #include "xfs_inode.h"
 #include "xfs_quota.h"
 #include "xfs_trans_space.h"
+#include "xfs_errortag.h"
 #include "xfs_error.h"
 #include "xfs_log_priv.h"
 #include "xfs_log_recover.h"
@@ -292,6 +293,11 @@ xfs_trans_attr_finish_update(
 					     XFS_ATTR_OP_FLAGS_TYPE_MASK;
 	int				error;
 
+	if (XFS_TEST_ERROR(false, args->dp->i_mount, XFS_ERRTAG_LARP)) {
+		error = -EIO;
+		goto out;
+	}
+
 	switch (op) {
 	case XFS_ATTR_OP_FLAGS_SET:
 		error = xfs_attr_set_iter(dac);
@@ -305,6 +311,7 @@ xfs_trans_attr_finish_update(
 		break;
 	}
 
+out:
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the
 	 * transaction is aborted, which:
diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c
index 81c445e9489b..d4b2256ba00b 100644
--- a/fs/xfs/xfs_error.c
+++ b/fs/xfs/xfs_error.c
@@ -57,6 +57,7 @@ static unsigned int xfs_errortag_random_default[] = {
 	XFS_RANDOM_REDUCE_MAX_IEXTENTS,
 	XFS_RANDOM_BMAP_ALLOC_MINLEN_EXTENT,
 	XFS_RANDOM_AG_RESV_FAIL,
+	XFS_RANDOM_LARP,
 };
 
 struct xfs_errortag_attr {
@@ -170,6 +171,7 @@ XFS_ERRORTAG_ATTR_RW(buf_ioerror,	XFS_ERRTAG_BUF_IOERROR);
 XFS_ERRORTAG_ATTR_RW(reduce_max_iextents,	XFS_ERRTAG_REDUCE_MAX_IEXTENTS);
 XFS_ERRORTAG_ATTR_RW(bmap_alloc_minlen_extent,	XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT);
 XFS_ERRORTAG_ATTR_RW(ag_resv_fail, XFS_ERRTAG_AG_RESV_FAIL);
+XFS_ERRORTAG_ATTR_RW(larp,		XFS_ERRTAG_LARP);
 
 static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(noerror),
@@ -211,6 +213,7 @@ static struct attribute *xfs_errortag_attrs[] = {
 	XFS_ERRORTAG_ATTR_LIST(reduce_max_iextents),
 	XFS_ERRORTAG_ATTR_LIST(bmap_alloc_minlen_extent),
 	XFS_ERRORTAG_ATTR_LIST(ag_resv_fail),
+	XFS_ERRORTAG_ATTR_LIST(larp),
 	NULL,
 };
 
-- 
2.25.1


* [PATCH v24 09/11] xfs: Add larp debug option
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
                   ` (7 preceding siblings ...)
  2021-08-24 22:44 ` [PATCH v24 08/11] xfs: Add log attribute error tag Allison Henderson
@ 2021-08-24 22:44 ` Allison Henderson
  2021-08-30 12:03   ` Chandan Babu R
  2021-08-24 22:44 ` [PATCH v24 10/11] xfs: Merge xfs_delattr_context into xfs_attr_item Allison Henderson
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 45+ messages in thread
From: Allison Henderson @ 2021-08-24 22:44 UTC (permalink / raw)
  To: linux-xfs

This patch adds a debug option (a sysfs knob, available on DEBUG builds)
to enable log attribute replay. Eventually this can be removed when
delayed attrs become permanent.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.h |  2 +-
 fs/xfs/xfs_globals.c     |  1 +
 fs/xfs/xfs_sysctl.h      |  1 +
 fs/xfs/xfs_sysfs.c       | 24 ++++++++++++++++++++++++
 4 files changed, 27 insertions(+), 1 deletion(-)
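
The store/show pair for a boolean knob like this can be modelled in
userspace as follows. It is a sketch under assumptions: parse_bool is a
hypothetical helper that accepts only a small subset of what the kernel's
kstrtobool takes (it looks at the first character only), and the function
signatures are simplified compared to the sysfs callbacks in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for xfs_globals.larp. */
static bool larp_enabled;

/* Hypothetical, simplified parser: first character decides. */
static int parse_bool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y':
	case 'Y':
	case '1':
		*res = true;
		return 0;
	case 'n':
	case 'N':
	case '0':
		*res = false;
		return 0;
	default:
		return -EINVAL;
	}
}

/* Return the number of bytes consumed, or a negative errno. */
static long larp_store(const char *buf, size_t count)
{
	int ret = parse_bool(buf, &larp_enabled);

	if (ret < 0)
		return ret;	/* knob keeps its previous value */
	return (long)count;
}

/* Format the current value the same way the patch does: "%d\n". */
static int larp_show(char *buf, size_t size)
{
	return snprintf(buf, size, "%d\n", larp_enabled);
}

/* What the attr code consults to pick the deferred path. */
static bool has_larp(void)
{
	return larp_enabled;
}
```

Note that the knob is global rather than per-mount, so flipping it
affects every mounted filesystem that consults it.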

diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index efb7ac4fc41c..492762541174 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -30,7 +30,7 @@ struct xfs_attr_list_context;
 
 static inline bool xfs_has_larp(struct xfs_mount *mp)
 {
-	return false;
+	return xfs_globals.larp;
 }
 
 /*
diff --git a/fs/xfs/xfs_globals.c b/fs/xfs/xfs_globals.c
index f62fa652c2fd..4d0a98f920ca 100644
--- a/fs/xfs/xfs_globals.c
+++ b/fs/xfs/xfs_globals.c
@@ -41,5 +41,6 @@ struct xfs_globals xfs_globals = {
 #endif
 #ifdef DEBUG
 	.pwork_threads		=	-1,	/* automatic thread detection */
+	.larp			=	false,	/* log attribute replay */
 #endif
 };
diff --git a/fs/xfs/xfs_sysctl.h b/fs/xfs/xfs_sysctl.h
index 7692e76ead33..f78ad6b10ea5 100644
--- a/fs/xfs/xfs_sysctl.h
+++ b/fs/xfs/xfs_sysctl.h
@@ -83,6 +83,7 @@ extern xfs_param_t	xfs_params;
 struct xfs_globals {
 #ifdef DEBUG
 	int	pwork_threads;		/* parallel workqueue threads */
+	bool	larp;			/* log attribute replay */
 #endif
 	int	log_recovery_delay;	/* log recovery delay (secs) */
 	int	mount_delay;		/* mount setup delay (secs) */
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 18dc5eca6c04..74180e05e8ed 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -227,6 +227,29 @@ pwork_threads_show(
 	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.pwork_threads);
 }
 XFS_SYSFS_ATTR_RW(pwork_threads);
+
+static ssize_t
+larp_store(
+	struct kobject	*kobject,
+	const char	*buf,
+	size_t		count)
+{
+	ssize_t		ret;
+
+	ret = kstrtobool(buf, &xfs_globals.larp);
+	if (ret < 0)
+		return ret;
+	return count;
+}
+
+STATIC ssize_t
+larp_show(
+	struct kobject	*kobject,
+	char		*buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.larp);
+}
+XFS_SYSFS_ATTR_RW(larp);
 #endif /* DEBUG */
 
 static struct attribute *xfs_dbg_attrs[] = {
@@ -236,6 +259,7 @@ static struct attribute *xfs_dbg_attrs[] = {
 	ATTR_LIST(always_cow),
 #ifdef DEBUG
 	ATTR_LIST(pwork_threads),
+	ATTR_LIST(larp),
 #endif
 	NULL,
 };
-- 
2.25.1


* [PATCH v24 10/11] xfs: Merge xfs_delattr_context into xfs_attr_item
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
                   ` (8 preceding siblings ...)
  2021-08-24 22:44 ` [PATCH v24 09/11] xfs: Add larp debug option Allison Henderson
@ 2021-08-24 22:44 ` Allison Henderson
  2021-08-24 22:44 ` [PATCH v24 11/11] xfs: Add helper function xfs_attr_leaf_addname Allison Henderson
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-24 22:44 UTC (permalink / raw)
  To: linux-xfs

This is a cleanup patch that merges xfs_delattr_context into
xfs_attr_item.  Now that the refactoring is complete and the delayed
operation infrastructure is in place, we can combine these to eliminate
the extra struct.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
---
 fs/xfs/libxfs/xfs_attr.c        | 162 +++++++++++++++++---------------
 fs/xfs/libxfs/xfs_attr.h        |  40 ++++----
 fs/xfs/libxfs/xfs_attr_remote.c |  36 +++----
 fs/xfs/libxfs/xfs_attr_remote.h |   6 +-
 fs/xfs/xfs_attr_item.c          |  38 ++++----
 5 files changed, 141 insertions(+), 141 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index d62ab53c3b07..c3fdf232cd51 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -56,10 +56,9 @@ STATIC int xfs_attr_leaf_try_add(struct xfs_da_args *args, struct xfs_buf *bp);
  */
 STATIC int xfs_attr_node_get(xfs_da_args_t *args);
 STATIC void xfs_attr_restore_rmt_blk(struct xfs_da_args *args);
-STATIC int xfs_attr_node_addname(struct xfs_delattr_context *dac);
-STATIC int xfs_attr_node_addname_find_attr(struct xfs_delattr_context *dac);
-STATIC int xfs_attr_node_addname_clear_incomplete(
-				struct xfs_delattr_context *dac);
+STATIC int xfs_attr_node_addname(struct xfs_attr_item *attr);
+STATIC int xfs_attr_node_addname_find_attr(struct xfs_attr_item *attr);
+STATIC int xfs_attr_node_addname_clear_incomplete(struct xfs_attr_item *attr);
 STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
 				 struct xfs_da_state **state);
 STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
@@ -246,9 +245,9 @@ xfs_attr_is_shortform(
 
 STATIC int
 xfs_attr_sf_addname(
-	struct xfs_delattr_context	*dac)
+	struct xfs_attr_item		*attr)
 {
-	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_args		*args = attr->xattri_da_args;
 	struct xfs_inode		*dp = args->dp;
 	int				error = 0;
 
@@ -265,7 +264,7 @@ xfs_attr_sf_addname(
 	 * It won't fit in the shortform, transform to a leaf block.  GROT:
 	 * another possible req'mt for a double-split btree op.
 	 */
-	error = xfs_attr_shortform_to_leaf(args, &dac->leaf_bp);
+	error = xfs_attr_shortform_to_leaf(args, &attr->xattri_leaf_bp);
 	if (error)
 		return error;
 
@@ -274,7 +273,7 @@ xfs_attr_sf_addname(
 	 * push cannot grab the half-baked leaf buffer and run into problems
 	 * with the write verifier.
 	 */
-	xfs_trans_bhold(args->trans, dac->leaf_bp);
+	xfs_trans_bhold(args->trans, attr->xattri_leaf_bp);
 
 	/*
 	 * We're still in XFS_DAS_UNINIT state here.  We've converted
@@ -294,16 +293,16 @@ xfs_attr_sf_addname(
  */
 int
 xfs_attr_set_iter(
-	struct xfs_delattr_context	*dac)
+	struct xfs_attr_item		*attr)
 {
-	struct xfs_da_args              *args = dac->da_args;
+	struct xfs_da_args              *args = attr->xattri_da_args;
 	struct xfs_inode		*dp = args->dp;
 	struct xfs_buf			*bp = NULL;
 	int				forkoff, error = 0;
 	struct xfs_mount		*mp = args->dp->i_mount;
 
 	/* State machine switch */
-	switch (dac->dela_state) {
+	switch (attr->xattri_dela_state) {
 	case XFS_DAS_UNINIT:
 		/*
 		 * If the fork is shortform, attempt to add the attr. If there
@@ -313,14 +312,16 @@ xfs_attr_set_iter(
 		 * release the hold once we return with a clean transaction.
 		 */
 		if (xfs_attr_is_shortform(dp))
-			return xfs_attr_sf_addname(dac);
-		if (dac->leaf_bp != NULL) {
-			xfs_trans_bhold_release(args->trans, dac->leaf_bp);
-			dac->leaf_bp = NULL;
+			return xfs_attr_sf_addname(attr);
+		if (attr->xattri_leaf_bp != NULL) {
+			xfs_trans_bhold_release(args->trans,
+						attr->xattri_leaf_bp);
+			attr->xattri_leaf_bp = NULL;
 		}
 
 		if (xfs_attr_is_leaf(dp)) {
-			error = xfs_attr_leaf_try_add(args, dac->leaf_bp);
+			error = xfs_attr_leaf_try_add(args,
+						      attr->xattri_leaf_bp);
 			if (error == -ENOSPC) {
 				error = xfs_attr3_leaf_to_node(args);
 				if (error)
@@ -340,19 +341,19 @@ xfs_attr_set_iter(
 				 * handling code below
 				 */
 				trace_xfs_attr_set_iter_return(
-					dac->dela_state, args->dp);
+					attr->xattri_dela_state, args->dp);
 				return -EAGAIN;
 			} else if (error) {
 				return error;
 			}
 
-			dac->dela_state = XFS_DAS_FOUND_LBLK;
+			attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
 		} else {
-			error = xfs_attr_node_addname_find_attr(dac);
+			error = xfs_attr_node_addname_find_attr(attr);
 			if (error)
 				return error;
 
-			error = xfs_attr_node_addname(dac);
+			error = xfs_attr_node_addname(attr);
 			if (error)
 				return error;
 
@@ -364,9 +365,10 @@ xfs_attr_set_iter(
 			    !(args->op_flags & XFS_DA_OP_RENAME))
 				return 0;
 
-			dac->dela_state = XFS_DAS_FOUND_NBLK;
+			attr->xattri_dela_state = XFS_DAS_FOUND_NBLK;
 		}
-		trace_xfs_attr_set_iter_return(dac->dela_state,	args->dp);
+		trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
+					       args->dp);
 		return -EAGAIN;
 	case XFS_DAS_FOUND_LBLK:
 		/*
@@ -377,10 +379,10 @@ xfs_attr_set_iter(
 		 */
 
 		/* Open coded xfs_attr_rmtval_set without trans handling */
-		if ((dac->flags & XFS_DAC_LEAF_ADDNAME_INIT) == 0) {
-			dac->flags |= XFS_DAC_LEAF_ADDNAME_INIT;
+		if ((attr->xattri_flags & XFS_DAC_LEAF_ADDNAME_INIT) == 0) {
+			attr->xattri_flags |= XFS_DAC_LEAF_ADDNAME_INIT;
 			if (args->rmtblkno > 0) {
-				error = xfs_attr_rmtval_find_space(dac);
+				error = xfs_attr_rmtval_find_space(attr);
 				if (error)
 					return error;
 			}
@@ -390,11 +392,11 @@ xfs_attr_set_iter(
 		 * Repeat allocating remote blocks for the attr value until
 		 * blkcnt drops to zero.
 		 */
-		if (dac->blkcnt > 0) {
-			error = xfs_attr_rmtval_set_blk(dac);
+		if (attr->xattri_blkcnt > 0) {
+			error = xfs_attr_rmtval_set_blk(attr);
 			if (error)
 				return error;
-			trace_xfs_attr_set_iter_return(dac->dela_state,
+			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
 						       args->dp);
 			return -EAGAIN;
 		}
@@ -430,8 +432,8 @@ xfs_attr_set_iter(
 			 * Commit the flag value change and start the next trans
 			 * in series.
 			 */
-			dac->dela_state = XFS_DAS_FLIP_LFLAG;
-			trace_xfs_attr_set_iter_return(dac->dela_state,
+			attr->xattri_dela_state = XFS_DAS_FLIP_LFLAG;
+			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
 						       args->dp);
 			return -EAGAIN;
 		}
@@ -450,17 +452,18 @@ xfs_attr_set_iter(
 		fallthrough;
 	case XFS_DAS_RM_LBLK:
 		/* Set state in case xfs_attr_rmtval_remove returns -EAGAIN */
-		dac->dela_state = XFS_DAS_RM_LBLK;
+		attr->xattri_dela_state = XFS_DAS_RM_LBLK;
 		if (args->rmtblkno) {
-			error = xfs_attr_rmtval_remove(dac);
+			error = xfs_attr_rmtval_remove(attr);
 			if (error == -EAGAIN)
 				trace_xfs_attr_set_iter_return(
-					dac->dela_state, args->dp);
+					attr->xattri_dela_state, args->dp);
 			if (error)
 				return error;
 
-			dac->dela_state = XFS_DAS_RD_LEAF;
-			trace_xfs_attr_set_iter_return(dac->dela_state, args->dp);
+			attr->xattri_dela_state = XFS_DAS_RD_LEAF;
+			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
+						       args->dp);
 			return -EAGAIN;
 		}
 
@@ -491,7 +494,7 @@ xfs_attr_set_iter(
 		 * state.
 		 */
 		if (args->rmtblkno > 0) {
-			error = xfs_attr_rmtval_find_space(dac);
+			error = xfs_attr_rmtval_find_space(attr);
 			if (error)
 				return error;
 		}
@@ -504,14 +507,14 @@ xfs_attr_set_iter(
 		 * after we create the attribute so that we don't overflow the
 		 * maximum size of a transaction and/or hit a deadlock.
 		 */
-		dac->dela_state = XFS_DAS_ALLOC_NODE;
+		attr->xattri_dela_state = XFS_DAS_ALLOC_NODE;
 		if (args->rmtblkno > 0) {
-			if (dac->blkcnt > 0) {
-				error = xfs_attr_rmtval_set_blk(dac);
+			if (attr->xattri_blkcnt > 0) {
+				error = xfs_attr_rmtval_set_blk(attr);
 				if (error)
 					return error;
 				trace_xfs_attr_set_iter_return(
-					dac->dela_state, args->dp);
+					attr->xattri_dela_state, args->dp);
 				return -EAGAIN;
 			}
 
@@ -547,8 +550,8 @@ xfs_attr_set_iter(
 			 * Commit the flag value change and start the next trans
 			 * in series
 			 */
-			dac->dela_state = XFS_DAS_FLIP_NFLAG;
-			trace_xfs_attr_set_iter_return(dac->dela_state,
+			attr->xattri_dela_state = XFS_DAS_FLIP_NFLAG;
+			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
 						       args->dp);
 			return -EAGAIN;
 		}
@@ -568,18 +571,19 @@ xfs_attr_set_iter(
 		fallthrough;
 	case XFS_DAS_RM_NBLK:
 		/* Set state in case xfs_attr_rmtval_remove returns -EAGAIN */
-		dac->dela_state = XFS_DAS_RM_NBLK;
+		attr->xattri_dela_state = XFS_DAS_RM_NBLK;
 		if (args->rmtblkno) {
-			error = xfs_attr_rmtval_remove(dac);
+			error = xfs_attr_rmtval_remove(attr);
 			if (error == -EAGAIN)
 				trace_xfs_attr_set_iter_return(
-					dac->dela_state, args->dp);
+					attr->xattri_dela_state, args->dp);
 
 			if (error)
 				return error;
 
-			dac->dela_state = XFS_DAS_CLR_FLAG;
-			trace_xfs_attr_set_iter_return(dac->dela_state, args->dp);
+			attr->xattri_dela_state = XFS_DAS_CLR_FLAG;
+			trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
+						       args->dp);
 			return -EAGAIN;
 		}
 
@@ -589,7 +593,7 @@ xfs_attr_set_iter(
 		 * The last state for node format. Look up the old attr and
 		 * remove it.
 		 */
-		error = xfs_attr_node_addname_clear_incomplete(dac);
+		error = xfs_attr_node_addname_clear_incomplete(attr);
 		break;
 	default:
 		ASSERT(0);
@@ -786,7 +790,7 @@ xfs_attr_item_init(
 
 	new = kmem_zalloc(sizeof(struct xfs_attr_item), KM_NOFS);
 	new->xattri_op_flags = op_flags;
-	new->xattri_dac.da_args = args;
+	new->xattri_da_args = args;
 
 	*attr = new;
 	return 0;
@@ -1099,16 +1103,16 @@ xfs_attr_node_hasname(
 
 STATIC int
 xfs_attr_node_addname_find_attr(
-	struct xfs_delattr_context	*dac)
+	 struct xfs_attr_item		*attr)
 {
-	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_args		*args = attr->xattri_da_args;
 	int				retval;
 
 	/*
 	 * Search to see if name already exists, and get back a pointer
 	 * to where it should go.
 	 */
-	retval = xfs_attr_node_hasname(args, &dac->da_state);
+	retval = xfs_attr_node_hasname(args, &attr->xattri_da_state);
 	if (retval != -ENOATTR && retval != -EEXIST)
 		return retval;
 
@@ -1136,8 +1140,8 @@ xfs_attr_node_addname_find_attr(
 
 	return 0;
 error:
-	if (dac->da_state)
-		xfs_da_state_free(dac->da_state);
+	if (attr->xattri_da_state)
+		xfs_da_state_free(attr->xattri_da_state);
 	return retval;
 }
 
@@ -1158,10 +1162,10 @@ xfs_attr_node_addname_find_attr(
  */
 STATIC int
 xfs_attr_node_addname(
-	struct xfs_delattr_context	*dac)
+	struct xfs_attr_item		*attr)
 {
-	struct xfs_da_args		*args = dac->da_args;
-	struct xfs_da_state		*state = dac->da_state;
+	struct xfs_da_args		*args = attr->xattri_da_args;
+	struct xfs_da_state		*state = attr->xattri_da_state;
 	struct xfs_da_state_blk		*blk;
 	int				error;
 
@@ -1192,7 +1196,7 @@ xfs_attr_node_addname(
 			 * this point.
 			 */
 			trace_xfs_attr_node_addname_return(
-					dac->dela_state, args->dp);
+					attr->xattri_dela_state, args->dp);
 			return -EAGAIN;
 		}
 
@@ -1221,9 +1225,9 @@ xfs_attr_node_addname(
 
 STATIC int
 xfs_attr_node_addname_clear_incomplete(
-	struct xfs_delattr_context	*dac)
+	struct xfs_attr_item		*attr)
 {
-	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_args		*args = attr->xattri_da_args;
 	struct xfs_da_state		*state = NULL;
 	struct xfs_mount		*mp = args->dp->i_mount;
 	int				retval = 0;
@@ -1327,10 +1331,10 @@ xfs_attr_leaf_mark_incomplete(
  */
 STATIC
 int xfs_attr_node_removename_setup(
-	struct xfs_delattr_context	*dac)
+	struct xfs_attr_item		*attr)
 {
-	struct xfs_da_args		*args = dac->da_args;
-	struct xfs_da_state		**state = &dac->da_state;
+	struct xfs_da_args		*args = attr->xattri_da_args;
+	struct xfs_da_state		**state = &attr->xattri_da_state;
 	int				error;
 
 	error = xfs_attr_node_hasname(args, state);
@@ -1389,16 +1393,16 @@ xfs_attr_node_removename(
  */
 int
 xfs_attr_remove_iter(
-	struct xfs_delattr_context	*dac)
+	struct xfs_attr_item		*attr)
 {
-	struct xfs_da_args		*args = dac->da_args;
-	struct xfs_da_state		*state = dac->da_state;
+	struct xfs_da_args		*args = attr->xattri_da_args;
+	struct xfs_da_state		*state = attr->xattri_da_state;
 	int				retval, error = 0;
 	struct xfs_inode		*dp = args->dp;
 
 	trace_xfs_attr_node_removename(args);
 
-	switch (dac->dela_state) {
+	switch (attr->xattri_dela_state) {
 	case XFS_DAS_UNINIT:
 		if (!xfs_inode_hasattr(dp))
 			return -ENOATTR;
@@ -1417,16 +1421,16 @@ xfs_attr_remove_iter(
 		 * Node format may require transaction rolls. Set up the
 		 * state context and fall into the state machine.
 		 */
-		if (!dac->da_state) {
-			error = xfs_attr_node_removename_setup(dac);
+		if (!attr->xattri_da_state) {
+			error = xfs_attr_node_removename_setup(attr);
 			if (error)
 				return error;
-			state = dac->da_state;
+			state = attr->xattri_da_state;
 		}
 
 		fallthrough;
 	case XFS_DAS_RMTBLK:
-		dac->dela_state = XFS_DAS_RMTBLK;
+		attr->xattri_dela_state = XFS_DAS_RMTBLK;
 
 		/*
 		 * If there is an out-of-line value, de-allocate the blocks.
@@ -1439,10 +1443,10 @@ xfs_attr_remove_iter(
 			 * May return -EAGAIN. Roll and repeat until all remote
 			 * blocks are removed.
 			 */
-			error = xfs_attr_rmtval_remove(dac);
+			error = xfs_attr_rmtval_remove(attr);
 			if (error == -EAGAIN) {
 				trace_xfs_attr_remove_iter_return(
-						dac->dela_state, args->dp);
+					attr->xattri_dela_state, args->dp);
 				return error;
 			} else if (error) {
 				goto out;
@@ -1457,8 +1461,10 @@ xfs_attr_remove_iter(
 			error = xfs_attr_refillstate(state);
 			if (error)
 				goto out;
-			dac->dela_state = XFS_DAS_RM_NAME;
-			trace_xfs_attr_remove_iter_return(dac->dela_state, args->dp);
+
+			attr->xattri_dela_state = XFS_DAS_RM_NAME;
+			trace_xfs_attr_remove_iter_return(
+					attr->xattri_dela_state, args->dp);
 			return -EAGAIN;
 		}
 
@@ -1468,7 +1474,7 @@ xfs_attr_remove_iter(
 		 * If we came here fresh from a transaction roll, reattach all
 		 * the buffers to the current transaction.
 		 */
-		if (dac->dela_state == XFS_DAS_RM_NAME) {
+		if (attr->xattri_dela_state == XFS_DAS_RM_NAME) {
 			error = xfs_attr_refillstate(state);
 			if (error)
 				goto out;
@@ -1485,9 +1491,9 @@ xfs_attr_remove_iter(
 			if (error)
 				goto out;
 
-			dac->dela_state = XFS_DAS_RM_SHRINK;
+			attr->xattri_dela_state = XFS_DAS_RM_SHRINK;
 			trace_xfs_attr_remove_iter_return(
-					dac->dela_state, args->dp);
+					attr->xattri_dela_state, args->dp);
 			return -EAGAIN;
 		}
 
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 492762541174..ceb2c18b9cca 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -430,7 +430,7 @@ struct xfs_attr_list_context {
  */
 
 /*
- * Enum values for xfs_delattr_context.da_state
+ * Enum values for xfs_attr_item.xattri_da_state
  *
  * These values are used by delayed attribute operations to keep track  of where
  * they were before they returned -EAGAIN.  A return code of -EAGAIN signals the
@@ -455,7 +455,7 @@ enum xfs_delattr_state {
 };
 
 /*
- * Defines for xfs_delattr_context.flags
+ * Defines for xfs_attr_item.xattri_flags
  */
 #define XFS_DAC_LEAF_ADDNAME_INIT	0x01 /* xfs_attr_leaf_addname init*/
 #define XFS_DAC_DELAYED_OP_INIT		0x02 /* delayed operations init*/
@@ -463,32 +463,25 @@ enum xfs_delattr_state {
 /*
  * Context used for keeping track of delayed attribute operations
  */
-struct xfs_delattr_context {
-	struct xfs_da_args      *da_args;
+struct xfs_attr_item {
+	struct xfs_da_args		*xattri_da_args;
 
 	/*
 	 * Used by xfs_attr_set to hold a leaf buffer across a transaction roll
 	 */
-	struct xfs_buf		*leaf_bp;
+	struct xfs_buf			*xattri_leaf_bp;
 
 	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
-	struct xfs_bmbt_irec	map;
-	xfs_dablk_t		lblkno;
-	int			blkcnt;
+	struct xfs_bmbt_irec		xattri_map;
+	xfs_dablk_t			xattri_lblkno;
+	int				xattri_blkcnt;
 
 	/* Used in xfs_attr_node_removename to roll through removing blocks */
-	struct xfs_da_state     *da_state;
+	struct xfs_da_state		*xattri_da_state;
 
 	/* Used to keep track of current state of delayed operation */
-	unsigned int            flags;
-	enum xfs_delattr_state  dela_state;
-};
-
-/*
- * List of attrs to commit later.
- */
-struct xfs_attr_item {
-	struct xfs_delattr_context	xattri_dac;
+	unsigned int			xattri_flags;
+	enum xfs_delattr_state		xattri_dela_state;
 
 	/*
 	 * Indicates if the attr operation is a set or a remove
@@ -496,7 +489,10 @@ struct xfs_attr_item {
 	 */
 	unsigned int			xattri_op_flags;
 
-	/* used to log this item to an intent */
+	/*
+	 * used to log this item to an intent containing a list of attrs to
+	 * commit later
+	 */
 	struct list_head		xattri_list;
 };
 
@@ -516,11 +512,9 @@ bool xfs_attr_is_leaf(struct xfs_inode *ip);
 int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
-int xfs_attr_set_iter(struct xfs_delattr_context *dac);
-int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
+int xfs_attr_set_iter(struct xfs_attr_item *attr);
+int xfs_attr_remove_iter(struct xfs_attr_item *attr);
 bool xfs_attr_namecheck(const void *name, size_t length);
-void xfs_delattr_context_init(struct xfs_delattr_context *dac,
-			      struct xfs_da_args *args);
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 int xfs_attr_set_deferred(struct xfs_da_args *args);
 int xfs_attr_remove_deferred(struct xfs_da_args *args);
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index c806319134fb..4250159ecced 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -568,14 +568,14 @@ xfs_attr_rmtval_stale(
  */
 int
 xfs_attr_rmtval_find_space(
-	struct xfs_delattr_context	*dac)
+	struct xfs_attr_item		*attr)
 {
-	struct xfs_da_args		*args = dac->da_args;
-	struct xfs_bmbt_irec		*map = &dac->map;
+	struct xfs_da_args		*args = attr->xattri_da_args;
+	struct xfs_bmbt_irec		*map = &attr->xattri_map;
 	int				error;
 
-	dac->lblkno = 0;
-	dac->blkcnt = 0;
+	attr->xattri_lblkno = 0;
+	attr->xattri_blkcnt = 0;
 	args->rmtblkcnt = 0;
 	args->rmtblkno = 0;
 	memset(map, 0, sizeof(struct xfs_bmbt_irec));
@@ -584,8 +584,8 @@ xfs_attr_rmtval_find_space(
 	if (error)
 		return error;
 
-	dac->blkcnt = args->rmtblkcnt;
-	dac->lblkno = args->rmtblkno;
+	attr->xattri_blkcnt = args->rmtblkcnt;
+	attr->xattri_lblkno = args->rmtblkno;
 
 	return 0;
 }
@@ -598,17 +598,18 @@ xfs_attr_rmtval_find_space(
  */
 int
 xfs_attr_rmtval_set_blk(
-	struct xfs_delattr_context	*dac)
+	struct xfs_attr_item		*attr)
 {
-	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_args		*args = attr->xattri_da_args;
 	struct xfs_inode		*dp = args->dp;
-	struct xfs_bmbt_irec		*map = &dac->map;
+	struct xfs_bmbt_irec		*map = &attr->xattri_map;
 	int nmap;
 	int error;
 
 	nmap = 1;
-	error = xfs_bmapi_write(args->trans, dp, (xfs_fileoff_t)dac->lblkno,
-			dac->blkcnt, XFS_BMAPI_ATTRFORK, args->total,
+	error = xfs_bmapi_write(args->trans, dp,
+			(xfs_fileoff_t)attr->xattri_lblkno,
+			attr->xattri_blkcnt, XFS_BMAPI_ATTRFORK, args->total,
 			map, &nmap);
 	if (error)
 		return error;
@@ -618,8 +619,8 @@ xfs_attr_rmtval_set_blk(
 	       (map->br_startblock != HOLESTARTBLOCK));
 
 	/* roll attribute extent map forwards */
-	dac->lblkno += map->br_blockcount;
-	dac->blkcnt -= map->br_blockcount;
+	attr->xattri_lblkno += map->br_blockcount;
+	attr->xattri_blkcnt -= map->br_blockcount;
 
 	return 0;
 }
@@ -673,9 +674,9 @@ xfs_attr_rmtval_invalidate(
  */
 int
 xfs_attr_rmtval_remove(
-	struct xfs_delattr_context	*dac)
+	struct xfs_attr_item		*attr)
 {
-	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_args		*args = attr->xattri_da_args;
 	int				error, done;
 
 	/*
@@ -695,7 +696,8 @@ xfs_attr_rmtval_remove(
 	 * the parent
 	 */
 	if (!done) {
-		trace_xfs_attr_rmtval_remove_return(dac->dela_state, args->dp);
+		trace_xfs_attr_rmtval_remove_return(attr->xattri_dela_state,
+						    args->dp);
 		return -EAGAIN;
 	}
 
diff --git a/fs/xfs/libxfs/xfs_attr_remote.h b/fs/xfs/libxfs/xfs_attr_remote.h
index d72eff30ca18..62b398edec3f 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.h
+++ b/fs/xfs/libxfs/xfs_attr_remote.h
@@ -12,9 +12,9 @@ int xfs_attr_rmtval_get(struct xfs_da_args *args);
 int xfs_attr_rmtval_stale(struct xfs_inode *ip, struct xfs_bmbt_irec *map,
 		xfs_buf_flags_t incore_flags);
 int xfs_attr_rmtval_invalidate(struct xfs_da_args *args);
-int xfs_attr_rmtval_remove(struct xfs_delattr_context *dac);
+int xfs_attr_rmtval_remove(struct xfs_attr_item *attr);
 int xfs_attr_rmt_find_hole(struct xfs_da_args *args);
 int xfs_attr_rmtval_set_value(struct xfs_da_args *args);
-int xfs_attr_rmtval_set_blk(struct xfs_delattr_context *dac);
-int xfs_attr_rmtval_find_space(struct xfs_delattr_context *dac);
+int xfs_attr_rmtval_set_blk(struct xfs_attr_item *attr);
+int xfs_attr_rmtval_find_space(struct xfs_attr_item *attr);
 #endif /* __XFS_ATTR_REMOTE_H__ */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 69646a8b6e09..2f48e84754ae 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -284,11 +284,11 @@ xfs_attrd_item_release(
  */
 STATIC int
 xfs_trans_attr_finish_update(
-	struct xfs_delattr_context	*dac,
+	struct xfs_attr_item		*attr,
 	struct xfs_attrd_log_item	*attrdp,
 	uint32_t			op_flags)
 {
-	struct xfs_da_args		*args = dac->da_args;
+	struct xfs_da_args		*args = attr->xattri_da_args;
 	unsigned int			op = op_flags &
 					     XFS_ATTR_OP_FLAGS_TYPE_MASK;
 	int				error;
@@ -300,11 +300,11 @@ xfs_trans_attr_finish_update(
 
 	switch (op) {
 	case XFS_ATTR_OP_FLAGS_SET:
-		error = xfs_attr_set_iter(dac);
+		error = xfs_attr_set_iter(attr);
 		break;
 	case XFS_ATTR_OP_FLAGS_REMOVE:
 		ASSERT(XFS_IFORK_Q(args->dp));
-		error = xfs_attr_remove_iter(dac);
+		error = xfs_attr_remove_iter(attr);
 		break;
 	default:
 		error = -EFSCORRUPTED;
@@ -348,16 +348,16 @@ xfs_attr_log_item(
 	 * structure with fields from this xfs_attr_item
 	 */
 	attrp = &attrip->attri_format;
-	attrp->alfi_ino = attr->xattri_dac.da_args->dp->i_ino;
+	attrp->alfi_ino = attr->xattri_da_args->dp->i_ino;
 	attrp->alfi_op_flags = attr->xattri_op_flags;
-	attrp->alfi_value_len = attr->xattri_dac.da_args->valuelen;
-	attrp->alfi_name_len = attr->xattri_dac.da_args->namelen;
-	attrp->alfi_attr_flags = attr->xattri_dac.da_args->attr_filter;
-
-	attrip->attri_name = (void *)attr->xattri_dac.da_args->name;
-	attrip->attri_value = attr->xattri_dac.da_args->value;
-	attrip->attri_name_len = attr->xattri_dac.da_args->namelen;
-	attrip->attri_value_len = attr->xattri_dac.da_args->valuelen;
+	attrp->alfi_value_len = attr->xattri_da_args->valuelen;
+	attrp->alfi_name_len = attr->xattri_da_args->namelen;
+	attrp->alfi_attr_flags = attr->xattri_da_args->attr_filter;
+
+	attrip->attri_name = (void *)attr->xattri_da_args->name;
+	attrip->attri_value = attr->xattri_da_args->value;
+	attrip->attri_name_len = attr->xattri_da_args->namelen;
+	attrip->attri_value_len = attr->xattri_da_args->valuelen;
 }
 
 /* Get an ATTRI. */
@@ -398,10 +398,8 @@ xfs_attr_finish_item(
 	struct xfs_attr_item		*attr;
 	struct xfs_attrd_log_item	*done_item = NULL;
 	int				error;
-	struct xfs_delattr_context	*dac;
 
 	attr = container_of(item, struct xfs_attr_item, xattri_list);
-	dac = &attr->xattri_dac;
 	if (done)
 		done_item = ATTRD_ITEM(done);
 
@@ -409,9 +407,9 @@ xfs_attr_finish_item(
 	 * Always reset trans after EAGAIN cycle
 	 * since the transaction is new
 	 */
-	dac->da_args->trans = tp;
+	attr->xattri_da_args->trans = tp;
 
-	error = xfs_trans_attr_finish_update(dac, done_item,
+	error = xfs_trans_attr_finish_update(attr, done_item,
 					     attr->xattri_op_flags);
 	if (error != -EAGAIN)
 		kmem_free(attr);
@@ -585,7 +583,7 @@ xfs_attri_item_recover(
 			   sizeof(struct xfs_da_args), KM_NOFS);
 	args = (struct xfs_da_args *)(attr + 1);
 
-	attr->xattri_dac.da_args = args;
+	attr->xattri_da_args = args;
 	attr->xattri_op_flags = attrp->alfi_op_flags;
 
 	args->dp = ip;
@@ -622,7 +620,7 @@ xfs_attri_item_recover(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	ret = xfs_trans_attr_finish_update(&attr->xattri_dac, done_item,
+	ret = xfs_trans_attr_finish_update(attr, done_item,
 					   attrp->alfi_op_flags);
 	if (ret == -EAGAIN) {
 		/* There's more work to do, so add it to this transaction */
@@ -636,7 +634,7 @@ xfs_attri_item_recover(
 	}
 
 	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list,
-						 attr->xattri_dac.leaf_bp);
+						 attr->xattri_leaf_bp);
 
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-- 
2.25.1



* [PATCH v24 11/11] xfs: Add helper function xfs_attr_leaf_addname
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
                   ` (9 preceding siblings ...)
  2021-08-24 22:44 ` [PATCH v24 10/11] xfs: Merge xfs_delattr_context into xfs_attr_item Allison Henderson
@ 2021-08-24 22:44 ` Allison Henderson
  2021-08-30 14:17   ` Chandan Babu R
  2021-08-31  0:20 ` [PATCH v24 00/11] Log Attribute Replay Dave Chinner
  2021-09-01  7:30 ` [PATCH 0/5] xfs: various logged attribute fixes Dave Chinner
  12 siblings, 1 reply; 45+ messages in thread
From: Allison Henderson @ 2021-08-24 22:44 UTC (permalink / raw)
  To: linux-xfs

This patch adds a helper function xfs_attr_leaf_addname.  While this
does help to break down xfs_attr_set_iter, it does also hoist out some
of the state management.  This patch has been moved to the end of the
clean up series for further discussion.

Suggested-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c | 110 +++++++++++++++++++++------------------
 fs/xfs/xfs_trace.h       |   1 +
 2 files changed, 61 insertions(+), 50 deletions(-)
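
For readers following the state handling, here is a minimal userspace
sketch of the pattern this helper takes part in: a step function records
where it stopped and returns -EAGAIN, and only the caller rolls the
transaction before calling back in.  All names in it (demo_state,
demo_item, demo_step, demo_run) are hypothetical and are not part of the
patch; a counter stands in for the transaction roll.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical states; the real enum is xfs_delattr_state. */
enum demo_state {
	DEMO_UNINIT,
	DEMO_FOUND_BLK,
	DEMO_FINISHED,
};

/* Hypothetical stand-in for the per-operation context. */
struct demo_item {
	enum demo_state	state;		/* where to resume after -EAGAIN */
	int		blks_left;	/* remaining blocks to allocate */
	int		rolls;		/* rolls performed by the caller */
};

/*
 * One step of the operation.  Returns -EAGAIN when the caller must roll
 * the transaction and call again, 0 when the operation is finished.
 */
static int
demo_step(
	struct demo_item	*item)
{
	switch (item->state) {
	case DEMO_UNINIT:
		/* Name added; record the next state before asking to roll. */
		item->state = DEMO_FOUND_BLK;
		return -EAGAIN;
	case DEMO_FOUND_BLK:
		if (item->blks_left > 0) {
			/* One block per transaction, then roll. */
			item->blks_left--;
			return -EAGAIN;
		}
		item->state = DEMO_FINISHED;
		return 0;
	case DEMO_FINISHED:
		return 0;
	}
	return -EINVAL;
}

/* Caller side: the step function never rolls, only the caller does. */
static int
demo_run(
	struct demo_item	*item)
{
	int			error;

	while ((error = demo_step(item)) == -EAGAIN)
		item->rolls++;	/* stands in for a transaction roll */

	assert(error != -EAGAIN);
	return error;
}
```

With two blocks left to allocate, demo_run() rolls once after the name is
added and once per block before returning 0.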

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index c3fdf232cd51..7150f0e051a0 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -284,6 +284,65 @@ xfs_attr_sf_addname(
 	return -EAGAIN;
 }
 
+STATIC int
+xfs_attr_leaf_addname(
+	struct xfs_attr_item	*attr)
+{
+	struct xfs_da_args	*args = attr->xattri_da_args;
+	struct xfs_inode	*dp = args->dp;
+	int			error;
+
+	if (xfs_attr_is_leaf(dp)) {
+		error = xfs_attr_leaf_try_add(args, attr->xattri_leaf_bp);
+		if (error == -ENOSPC) {
+			error = xfs_attr3_leaf_to_node(args);
+			if (error)
+				return error;
+
+			/*
+			 * Finish any deferred work items and roll the
+			 * transaction once more.  The goal here is to call
+			 * node_addname with the inode and transaction in the
+			 * same state (inode locked and joined, transaction
+			 * clean) no matter how we got to this step.
+			 *
+			 * At this point, we are still in XFS_DAS_UNINIT, but
+			 * when we come back, we'll be a node, so we'll fall
+			 * down into the node handling code below
+			 */
+			trace_xfs_attr_set_iter_return(
+				attr->xattri_dela_state, args->dp);
+			return -EAGAIN;
+		}
+
+		if (error)
+			return error;
+
+		attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
+	} else {
+		error = xfs_attr_node_addname_find_attr(attr);
+		if (error)
+			return error;
+
+		error = xfs_attr_node_addname(attr);
+		if (error)
+			return error;
+
+		/*
+		 * If addname was successful, and we dont need to alloc or
+		 * remove anymore blks, we're done.
+		 */
+		if (!args->rmtblkno &&
+		    !(args->op_flags & XFS_DA_OP_RENAME))
+			return 0;
+
+		attr->xattri_dela_state = XFS_DAS_FOUND_NBLK;
+	}
+
+	trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp);
+	return -EAGAIN;
+}
+
 /*
  * Set the attribute specified in @args.
  * This routine is meant to function as a delayed operation, and may return
@@ -319,57 +378,8 @@ xfs_attr_set_iter(
 			attr->xattri_leaf_bp = NULL;
 		}
 
-		if (xfs_attr_is_leaf(dp)) {
-			error = xfs_attr_leaf_try_add(args,
-						      attr->xattri_leaf_bp);
-			if (error == -ENOSPC) {
-				error = xfs_attr3_leaf_to_node(args);
-				if (error)
-					return error;
-
-				/*
-				 * Finish any deferred work items and roll the
-				 * transaction once more.  The goal here is to
-				 * call node_addname with the inode and
-				 * transaction in the same state (inode locked
-				 * and joined, transaction clean) no matter how
-				 * we got to this step.
-				 *
-				 * At this point, we are still in
-				 * XFS_DAS_UNINIT, but when we come back, we'll
-				 * be a node, so we'll fall down into the node
-				 * handling code below
-				 */
-				trace_xfs_attr_set_iter_return(
-					attr->xattri_dela_state, args->dp);
-				return -EAGAIN;
-			} else if (error) {
-				return error;
-			}
-
-			attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
-		} else {
-			error = xfs_attr_node_addname_find_attr(attr);
-			if (error)
-				return error;
+		return xfs_attr_leaf_addname(attr);
 
-			error = xfs_attr_node_addname(attr);
-			if (error)
-				return error;
-
-			/*
-			 * If addname was successful, and we dont need to alloc
-			 * or remove anymore blks, we're done.
-			 */
-			if (!args->rmtblkno &&
-			    !(args->op_flags & XFS_DA_OP_RENAME))
-				return 0;
-
-			attr->xattri_dela_state = XFS_DAS_FOUND_NBLK;
-		}
-		trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
-					       args->dp);
-		return -EAGAIN;
 	case XFS_DAS_FOUND_LBLK:
 		/*
 		 * If there was an out-of-line value, allocate the blocks we
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 1033a95fbf8e..77a78b5b1a29 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -4132,6 +4132,7 @@ DEFINE_EVENT(xfs_das_state_class, name, \
 	TP_ARGS(das, ip))
 DEFINE_DAS_STATE_EVENT(xfs_attr_sf_addname_return);
 DEFINE_DAS_STATE_EVENT(xfs_attr_set_iter_return);
+DEFINE_DAS_STATE_EVENT(xfs_attr_leaf_addname_return);
 DEFINE_DAS_STATE_EVENT(xfs_attr_node_addname_return);
 DEFINE_DAS_STATE_EVENT(xfs_attr_remove_iter_return);
 DEFINE_DAS_STATE_EVENT(xfs_attr_rmtval_remove_return);
-- 
2.25.1



* Re: [PATCH v24 02/11] xfs: Capture buffers for delayed ops
  2021-08-24 22:44 ` [PATCH v24 02/11] xfs: Capture buffers for delayed ops Allison Henderson
@ 2021-08-27 11:33   ` Chandan Babu R
  2021-08-31 18:03     ` Allison Henderson
  2021-08-30 17:44   ` Darrick J. Wong
  1 sibling, 1 reply; 45+ messages in thread
From: Chandan Babu R @ 2021-08-27 11:33 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On 25 Aug 2021 at 04:14, Allison Henderson wrote:
> This patch enables delayed operations to capture held buffers within
> the xfs_defer_capture. Buffers are then rejoined to the new
> transaction in xlog_finish_defer_ops.
>

Looks good to me.

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
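
As a reading aid, the capture and rejoin steps can be sketched in plain
userspace C.  This is only an illustration under stated assumptions: the
demo_* types and functions are hypothetical, a singly linked list stands
in for the kernel list_head, and setting a flag stands in for
xfs_trans_bjoin and xfs_trans_bhold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the transaction, buffer and capture. */
struct demo_trans {
	int			id;
};

struct demo_buf {
	struct demo_trans	*owner;		/* transaction holding the buffer */
	int			held;		/* set when held across a commit */
	struct demo_buf		*next_captured;	/* capture list linkage */
};

struct demo_capture {
	struct demo_buf		*buffers;	/* captured held buffers */
};

/* Capture bp only if it is attached to the committing transaction. */
static void
demo_capture_buf(
	struct demo_capture	*dfc,
	struct demo_trans	*tp,
	struct demo_buf		*bp)
{
	if (!bp || bp->owner != tp)
		return;

	bp->next_captured = dfc->buffers;
	dfc->buffers = bp;
	bp->owner = NULL;	/* the commit detaches it from tp */
}

/* Rejoin and hold every captured buffer; returns how many were moved. */
static int
demo_rejoin_bufs(
	struct demo_capture	*dfc,
	struct demo_trans	*tp)
{
	int			count = 0;

	assert(tp != NULL);
	while (dfc->buffers) {
		struct demo_buf	*bp = dfc->buffers;

		dfc->buffers = bp->next_captured;
		bp->next_captured = NULL;
		bp->owner = tp;
		bp->held = 1;
		count++;
	}
	return count;
}
```

A buffer that belongs to some other transaction, or a NULL buffer as
passed by the other recovery callers, is left alone.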

> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c  | 7 ++++++-
>  fs/xfs/libxfs/xfs_defer.h  | 4 +++-
>  fs/xfs/xfs_bmap_item.c     | 2 +-
>  fs/xfs/xfs_buf.c           | 1 +
>  fs/xfs/xfs_buf.h           | 1 +
>  fs/xfs/xfs_extfree_item.c  | 2 +-
>  fs/xfs/xfs_log_recover.c   | 7 +++++++
>  fs/xfs/xfs_refcount_item.c | 2 +-
>  fs/xfs/xfs_rmap_item.c     | 2 +-
>  9 files changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index eff4a127188e..d1d09b6aca55 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -639,6 +639,7 @@ xfs_defer_ops_capture(
>  	dfc = kmem_zalloc(sizeof(*dfc), KM_NOFS);
>  	INIT_LIST_HEAD(&dfc->dfc_list);
>  	INIT_LIST_HEAD(&dfc->dfc_dfops);
> +	INIT_LIST_HEAD(&dfc->dfc_buffers);
>  
>  	xfs_defer_create_intents(tp);
>  
> @@ -690,7 +691,8 @@ int
>  xfs_defer_ops_capture_and_commit(
>  	struct xfs_trans		*tp,
>  	struct xfs_inode		*capture_ip,
> -	struct list_head		*capture_list)
> +	struct list_head		*capture_list,
> +	struct xfs_buf			*bp)
>  {
>  	struct xfs_mount		*mp = tp->t_mountp;
>  	struct xfs_defer_capture	*dfc;
> @@ -703,6 +705,9 @@ xfs_defer_ops_capture_and_commit(
>  	if (!dfc)
>  		return xfs_trans_commit(tp);
>  
> +	if (bp && bp->b_transp == tp)
> +		list_add_tail(&bp->b_delay, &dfc->dfc_buffers);
> +
>  	/* Commit the transaction and add the capture structure to the list. */
>  	error = xfs_trans_commit(tp);
>  	if (error) {
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 05472f71fffe..739f70d72fd5 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -74,6 +74,7 @@ struct xfs_defer_capture {
>  
>  	/* Deferred ops state saved from the transaction. */
>  	struct list_head	dfc_dfops;
> +	struct list_head	dfc_buffers;
>  	unsigned int		dfc_tpflags;
>  
>  	/* Block reservations for the data and rt devices. */
> @@ -95,7 +96,8 @@ struct xfs_defer_capture {
>   * This doesn't normally happen except log recovery.
>   */
>  int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp,
> -		struct xfs_inode *capture_ip, struct list_head *capture_list);
> +		struct xfs_inode *capture_ip, struct list_head *capture_list,
> +		struct xfs_buf *bp);
>  void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp,
>  		struct xfs_inode **captured_ipp);
>  void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index 03159970133f..51ba8ee368ca 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -532,7 +532,7 @@ xfs_bui_item_recover(
>  	 * Commit transaction, which frees the transaction and saves the inode
>  	 * for later replay activities.
>  	 */
> -	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list);
> +	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list, NULL);
>  	if (error)
>  		goto err_unlock;
>  
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 047bd6e3f389..29b4655a0a65 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -233,6 +233,7 @@ _xfs_buf_alloc(
>  	init_completion(&bp->b_iowait);
>  	INIT_LIST_HEAD(&bp->b_lru);
>  	INIT_LIST_HEAD(&bp->b_list);
> +	INIT_LIST_HEAD(&bp->b_delay);
>  	INIT_LIST_HEAD(&bp->b_li_list);
>  	sema_init(&bp->b_sema, 0); /* held, no waiters */
>  	spin_lock_init(&bp->b_lock);
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index 6b0200b8007d..c51445705dc6 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -151,6 +151,7 @@ struct xfs_buf {
>  	int			b_io_error;	/* internal IO error state */
>  	wait_queue_head_t	b_waiters;	/* unpin waiters */
>  	struct list_head	b_list;
> +	struct list_head	b_delay;	/* delayed operations list */
>  	struct xfs_perag	*b_pag;		/* contains rbtree root */
>  	struct xfs_mount	*b_mount;
>  	struct xfs_buftarg	*b_target;	/* buffer target (device) */
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 3f8a0713573a..046f21338c48 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -637,7 +637,7 @@ xfs_efi_item_recover(
>  
>  	}
>  
> -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
>  
>  abort_error:
>  	xfs_trans_cancel(tp);
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 10562ecbd9ea..6a3c0bb16b69 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2465,6 +2465,7 @@ xlog_finish_defer_ops(
>  	struct list_head	*capture_list)
>  {
>  	struct xfs_defer_capture *dfc, *next;
> +	struct xfs_buf		*bp, *bnext;
>  	struct xfs_trans	*tp;
>  	struct xfs_inode	*ip;
>  	int			error = 0;
> @@ -2489,6 +2490,12 @@ xlog_finish_defer_ops(
>  			return error;
>  		}
>  
> +		list_for_each_entry_safe(bp, bnext, &dfc->dfc_buffers, b_delay) {
> +			xfs_trans_bjoin(tp, bp);
> +			xfs_trans_bhold(tp, bp);
> +			list_del_init(&bp->b_delay);
> +		}
> +
>  		/*
>  		 * Transfer to this new transaction all the dfops we captured
>  		 * from recovering a single intent item.
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index 46904b793bd4..a6e7351ca4f9 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -557,7 +557,7 @@ xfs_cui_item_recover(
>  	}
>  
>  	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
>  
>  abort_error:
>  	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index 5f0695980467..8c70a4af80a9 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -587,7 +587,7 @@ xfs_rui_item_recover(
>  	}
>  
>  	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
>  
>  abort_error:
>  	xfs_rmap_finish_one_cleanup(tp, rcur, error);


-- 
chandan


* Re: [PATCH v24 03/11] xfs: Set up infrastructure for log atrribute replay
  2021-08-24 22:44 ` [PATCH v24 03/11] xfs: Set up infrastructure for log atrribute replay Allison Henderson
@ 2021-08-27 14:17   ` Chandan Babu R
  2021-08-31 18:03     ` Allison Henderson
  2021-08-31  0:48   ` Dave Chinner
  2021-08-31 21:52   ` Dave Chinner
  2 siblings, 1 reply; 45+ messages in thread
From: Chandan Babu R @ 2021-08-27 14:17 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On 25 Aug 2021 at 04:14, Allison Henderson wrote:
> Currently, attributes are modified directly across one or more
> transactions, but they are not logged or replayed in the event of an
> error. The goal of log attr replay is to enable logging and replaying
> of attribute operations using the existing delayed operations
> infrastructure.  This will later enable the attributes to become part of
> larger multi-part operations that also must first be recorded to the
> log.  This is mostly of interest for parent pointers, which
> would need to maintain an attribute containing parent inode information
> any time an inode is moved, created, or removed.  Parent pointers would
> then be of interest to any feature that would need to quickly derive an
> inode path from the mount point. Online scrub, NFS lookups and fs grow
> or shrink operations are all features that could take advantage of this.
>
> This patch adds two new log item types for setting or removing
> attributes as deferred operations.  The xfs_attri_log_item will log an
> intent to set or remove an attribute.  The corresponding
> xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
> freed once the transaction is done.  Both log items use a generic
> xfs_attr_log_format structure that contains the attribute name, value,
> flags, inode, and an op_flag that indicates whether the operation is a
> set or a remove.

Apart from the minor nits mentioned below, the remaining changes look good to
me.

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
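
For anyone new to the intent/done pairing described in the commit
message, a rough userspace sketch follows.  It assumes the usual
deferred-op convention that recovery replays an intent only when no
matching done item made it to the log.  The demo_* names and the flat
array used as a log are hypothetical; this is not how xfs_attr_item.c is
written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record types standing in for the ATTRI and ATTRD items. */
enum demo_rec_type {
	DEMO_REC_INTENT,
	DEMO_REC_DONE,
};

struct demo_log_rec {
	enum demo_rec_type	type;
	unsigned long		id;	/* pairs a done record with its intent */
};

/* Return 1 if some done record in the log refers to intent_id. */
static int
demo_intent_is_done(
	const struct demo_log_rec	*log,
	size_t				nrecs,
	unsigned long			intent_id)
{
	size_t				i;

	for (i = 0; i < nrecs; i++) {
		if (log[i].type == DEMO_REC_DONE && log[i].id == intent_id)
			return 1;
	}
	return 0;
}

/* Count the intents that recovery would still have to replay. */
static size_t
demo_count_replays(
	const struct demo_log_rec	*log,
	size_t				nrecs)
{
	size_t				replays = 0;
	size_t				i;

	assert(log != NULL || nrecs == 0);
	for (i = 0; i < nrecs; i++) {
		if (log[i].type == DEMO_REC_INTENT &&
		    !demo_intent_is_done(log, nrecs, log[i].id))
			replays++;
	}
	return replays;
}
```

An intent followed by its done record needs no replay; an intent that was
logged without one is what recovery has to finish.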

>
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/Makefile                 |   1 +
>  fs/xfs/libxfs/xfs_attr.c        |   5 +-
>  fs/xfs/libxfs/xfs_attr.h        |  31 +++
>  fs/xfs/libxfs/xfs_defer.h       |   2 +
>  fs/xfs/libxfs/xfs_log_format.h  |  44 +++-
>  fs/xfs/libxfs/xfs_log_recover.h |   2 +
>  fs/xfs/scrub/common.c           |   2 +
>  fs/xfs/xfs_attr_item.c          | 453 ++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_attr_item.h          |  52 ++++
>  fs/xfs/xfs_attr_list.c          |   1 +
>  fs/xfs/xfs_ioctl32.c            |   2 +
>  fs/xfs/xfs_iops.c               |   2 +
>  fs/xfs/xfs_log.c                |   4 +
>  fs/xfs/xfs_log_recover.c        |   2 +
>  fs/xfs/xfs_ondisk.h             |   2 +
>  15 files changed, 600 insertions(+), 5 deletions(-)
>
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 04611a1068b4..b056cfc6398e 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
>  				   xfs_buf_item_recover.o \
>  				   xfs_dquot_item_recover.o \
>  				   xfs_extfree_item.o \
> +				   xfs_attr_item.o \
>  				   xfs_icreate_item.o \
>  				   xfs_inode_item.o \
>  				   xfs_inode_item_recover.o \
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 50b91b4461e7..dfff81024e46 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -24,6 +24,7 @@
>  #include "xfs_quota.h"
>  #include "xfs_trans_space.h"
>  #include "xfs_trace.h"
> +#include "xfs_attr_item.h"
>  
>  /*
>   * xfs_attr.c
> @@ -61,8 +62,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>  				 struct xfs_da_state **state);
>  STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>  STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
> -STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> -			     struct xfs_buf **leaf_bp);
>  STATIC int xfs_attr_node_removename(struct xfs_da_args *args,
>  				    struct xfs_da_state *state);
>  
> @@ -166,7 +165,7 @@ xfs_attr_get(
>  /*
>   * Calculate how many blocks we need for the new attribute,
>   */
> -STATIC int
> +int
>  xfs_attr_calc_size(
>  	struct xfs_da_args	*args,
>  	int			*local)
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 5e71f719bdd5..aa33cdcf26b8 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -28,6 +28,11 @@ struct xfs_attr_list_context;
>   */
>  #define	ATTR_MAX_VALUELEN	(64*1024)	/* max length of a value */
>  
> +static inline bool xfs_has_larp(struct xfs_mount *mp)
> +{
> +	return false;
> +}
> +
>  /*
>   * Kernel-internal version of the attrlist cursor.
>   */
> @@ -454,6 +459,7 @@ enum xfs_delattr_state {
>   */
>  #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>  #define XFS_DAC_LEAF_ADDNAME_INIT	0x02 /* xfs_attr_leaf_addname init*/
> +#define XFS_DAC_DELAYED_OP_INIT		0x04 /* delayed operations init*/

A future patch in the series assigns a value of 0x02 to
XFS_DAC_DELAYED_OP_INIT. Also, this macro is not used by any of the patches in
this series.
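
For illustration only: if the value is renumbered later, a check along the
following lines could confirm that the flag bits stay distinct. This is a
user-space sketch; dac_flags_are_disjoint() is a hypothetical helper that does
not exist in the kernel tree, and the three values are copied from this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values as they appear in this patch. */
#define XFS_DAC_DEFER_FINISH		0x01
#define XFS_DAC_LEAF_ADDNAME_INIT	0x02
#define XFS_DAC_DELAYED_OP_INIT		0x04

/*
 * Hypothetical helper, not part of the patch: returns true when every
 * flag is a single bit and no two flags share a bit.
 */
static bool dac_flags_are_disjoint(const unsigned int *flags, size_t n)
{
	unsigned int seen = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int f = flags[i];

		/* Reject zero and multi-bit values. */
		if (f == 0 || (f & (f - 1)) != 0)
			return false;
		/* Reject a bit that an earlier flag already uses. */
		if (seen & f)
			return false;
		seen |= f;
	}
	return true;
}
```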

>  
>  /*
>   * Context used for keeping track of delayed attribute operations
> @@ -461,6 +467,11 @@ enum xfs_delattr_state {
>  struct xfs_delattr_context {
>  	struct xfs_da_args      *da_args;
>  
> +	/*
> +	 * Used by xfs_attr_set to hold a leaf buffer across a transaction roll
> +	 */
> +	struct xfs_buf		*leaf_bp;
> +
>  	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
>  	struct xfs_bmbt_irec	map;
>  	xfs_dablk_t		lblkno;
> @@ -474,6 +485,23 @@ struct xfs_delattr_context {
>  	enum xfs_delattr_state  dela_state;
>  };
>  
> +/*
> + * List of attrs to commit later.
> + */
> +struct xfs_attr_item {
> +	struct xfs_delattr_context	xattri_dac;
> +
> +	/*
> +	 * Indicates if the attr operation is a set or a remove
> +	 * XFS_ATTR_OP_FLAGS_{SET,REMOVE}
> +	 */
> +	unsigned int			xattri_op_flags;
> +
> +	/* used to log this item to an intent */
> +	struct list_head		xattri_list;
> +};
> +
> +
>  /*========================================================================
>   * Function prototypes for the kernel.
>   *========================================================================*/
> @@ -490,10 +518,13 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
>  int xfs_attr_get(struct xfs_da_args *args);
>  int xfs_attr_set(struct xfs_da_args *args);
>  int xfs_attr_set_args(struct xfs_da_args *args);
> +int xfs_attr_set_iter(struct xfs_delattr_context *dac,
> +		      struct xfs_buf **leaf_bp);
>  int xfs_attr_remove_args(struct xfs_da_args *args);
>  int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>  bool xfs_attr_namecheck(const void *name, size_t length);
>  void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>  			      struct xfs_da_args *args);
> +int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>  
>  #endif	/* __XFS_ATTR_H__ */
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 739f70d72fd5..89719146c5eb 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -63,6 +63,8 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
>  extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
>  extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
>  extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
> +extern const struct xfs_defer_op_type xfs_attr_defer_type;
> +
>  
>  /*
>   * This structure enables a dfops user to detach the chain of deferred
> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> index b322db523d65..3301c369e815 100644
> --- a/fs/xfs/libxfs/xfs_log_format.h
> +++ b/fs/xfs/libxfs/xfs_log_format.h
> @@ -114,7 +114,12 @@ struct xfs_unmount_log_format {
>  #define XLOG_REG_TYPE_CUD_FORMAT	24
>  #define XLOG_REG_TYPE_BUI_FORMAT	25
>  #define XLOG_REG_TYPE_BUD_FORMAT	26
> -#define XLOG_REG_TYPE_MAX		26
> +#define XLOG_REG_TYPE_ATTRI_FORMAT	27
> +#define XLOG_REG_TYPE_ATTRD_FORMAT	28
> +#define XLOG_REG_TYPE_ATTR_NAME	29
> +#define XLOG_REG_TYPE_ATTR_VALUE	30
> +#define XLOG_REG_TYPE_MAX		30
> +
>  
>  /*
>   * Flags to log operation header
> @@ -237,6 +242,8 @@ typedef struct xfs_trans_header {
>  #define	XFS_LI_CUD		0x1243
>  #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
>  #define	XFS_LI_BUD		0x1245
> +#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
> +#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
>  
>  #define XFS_LI_TYPE_DESC \
>  	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
> @@ -252,7 +259,9 @@ typedef struct xfs_trans_header {
>  	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
>  	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
>  	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
> -	{ XFS_LI_BUD,		"XFS_LI_BUD" }
> +	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
> +	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
> +	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
>  
>  /*
>   * Inode Log Item Format definitions.
> @@ -869,4 +878,35 @@ struct xfs_icreate_log {
>  	__be32		icl_gen;	/* inode generation number to use */
>  };
>  
> +/*
> + * Flags for deferred attribute operations.
> + * Upper bits are flags, lower byte is type code
> + */
> +#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
> +#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
> +#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
> +
> +/*
> + * This is the structure used to lay out an attr log item in the
> + * log.
> + */
> +struct xfs_attri_log_format {
> +	uint16_t	alfi_type;	/* attri log item type */
> +	uint16_t	alfi_size;	/* size of this item */
> +	uint32_t	__pad;		/* pad to 64 bit aligned */
> +	uint64_t	alfi_id;	/* attri identifier */
> +	uint64_t	alfi_ino;	/* the inode for this attr operation */
> +	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
> +	uint32_t	alfi_name_len;	/* attr name length */
> +	uint32_t	alfi_value_len;	/* attr value length */
> +	uint32_t	alfi_attr_flags;/* attr flags */
> +};
> +
> +struct xfs_attrd_log_format {
> +	uint16_t	alfd_type;	/* attrd log item type */
> +	uint16_t	alfd_size;	/* size of this item */
> +	uint32_t	__pad;		/* pad to 64 bit aligned */
> +	uint64_t	alfd_alf_id;	/* id of corresponding attri */
> +};
> +
>  #endif /* __XFS_LOG_FORMAT_H__ */
> diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
> index ff69a0000817..32e216255cb0 100644
> --- a/fs/xfs/libxfs/xfs_log_recover.h
> +++ b/fs/xfs/libxfs/xfs_log_recover.h
> @@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
>  extern const struct xlog_recover_item_ops xlog_rud_item_ops;
>  extern const struct xlog_recover_item_ops xlog_cui_item_ops;
>  extern const struct xlog_recover_item_ops xlog_cud_item_ops;
> +extern const struct xlog_recover_item_ops xlog_attri_item_ops;
> +extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
>  
>  /*
>   * Macros, structures, prototypes for internal log manager use.
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index bf1f3607d0b6..97b54ac3075f 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -23,6 +23,8 @@
>  #include "xfs_rmap_btree.h"
>  #include "xfs_log.h"
>  #include "xfs_trans_priv.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_reflink.h"
>  #include "xfs_ag.h"
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> new file mode 100644
> index 000000000000..879a39ec58a6
> --- /dev/null
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -0,0 +1,453 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2021 Oracle.  All Rights Reserved.
> + * Author: Allison Collins <allison.henderson@oracle.com>
> + */
> +

Some of the header files included below are not required. I was able to
compile successfully without including xfs_bit.h, xfs_bmap_btree.h,
xfs_buf_item.h, xfs_btree.h, xfs_rmap.h, xfs_icache.h, xfs_alloc.h, xfs_bmap.h
and xfs_quota.h. Also, xfs_shared.h and xfs_inode.h are each included twice.

> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_format.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_bit.h"
> +#include "xfs_shared.h"
> +#include "xfs_mount.h"
> +#include "xfs_defer.h"
> +#include "xfs_log_format.h"
> +#include "xfs_trans.h"
> +#include "xfs_bmap_btree.h"
> +#include "xfs_trans_priv.h"
> +#include "xfs_buf_item.h"
> +#include "xfs_log.h"
> +#include "xfs_btree.h"
> +#include "xfs_rmap.h"
> +#include "xfs_inode.h"
> +#include "xfs_icache.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_attr.h"
> +#include "xfs_shared.h"
> +#include "xfs_attr_item.h"
> +#include "xfs_alloc.h"
> +#include "xfs_bmap.h"
> +#include "xfs_trace.h"
> +#include "libxfs/xfs_da_format.h"
> +#include "xfs_inode.h"
> +#include "xfs_quota.h"
> +#include "xfs_trans_space.h"
> +#include "xfs_error.h"
> +#include "xfs_log_priv.h"
> +#include "xfs_log_recover.h"
> +
> +static const struct xfs_item_ops xfs_attri_item_ops;
> +static const struct xfs_item_ops xfs_attrd_item_ops;
> +
> +/* iovec length must be 32-bit aligned */
> +static inline size_t ATTR_NVEC_SIZE(size_t size)
> +{
> +	return size == sizeof(int32_t) ? size :
> +	       sizeof(int32_t) + round_up(size, sizeof(int32_t));
> +}
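
To make the resulting sizes easier to see, here is a user-space sketch of the
same expression. round_up_to() is a stand-in I wrote for the kernel's
round_up(), and attr_nvec_size() is only a lower-case copy of the helper
above, so treat both names as assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel's round_up(); y must be non-zero. */
static size_t round_up_to(size_t x, size_t y)
{
	return ((x + y - 1) / y) * y;
}

/* Same expression as ATTR_NVEC_SIZE() in the patch. */
static size_t attr_nvec_size(size_t size)
{
	return size == sizeof(int32_t) ? size :
	       sizeof(int32_t) + round_up_to(size, sizeof(int32_t));
}
```

With this expression a 1-byte buffer is accounted as 8 bytes, a 4-byte buffer
as 4 bytes, and 5-byte and 8-byte buffers both as 12 bytes.
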
> +
> +static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
> +{
> +	return container_of(lip, struct xfs_attri_log_item, attri_item);
> +}
> +
> +STATIC void
> +xfs_attri_item_free(
> +	struct xfs_attri_log_item	*attrip)
> +{
> +	kmem_free(attrip->attri_item.li_lv_shadow);
> +	kmem_free(attrip);
> +}
> +
> +/*
> + * Freeing the attrip requires that we remove it from the AIL if it has already
> + * been placed there. However, the ATTRI may not yet have been placed in the
> + * AIL when called by xfs_attri_release() from ATTRD processing due to the
> + * ordering of committed vs unpin operations in bulk insert operations. Hence
> + * the reference count to ensure only the last caller frees the ATTRI.
> + */
> +STATIC void
> +xfs_attri_release(
> +	struct xfs_attri_log_item	*attrip)
> +{
> +	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
> +	if (atomic_dec_and_test(&attrip->attri_refcount)) {
> +		xfs_trans_ail_delete(&attrip->attri_item,
> +				     SHUTDOWN_LOG_IO_ERROR);
> +		xfs_attri_item_free(attrip);
> +	}
> +}
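
The two-reference lifecycle described in the comment above can be sketched in
user space as follows. struct intent, intent_init() and intent_release() are
hypothetical stand-ins for the ATTRI item and its helpers; the AIL removal is
left out, and C11 atomics replace the kernel's atomic_t.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xfs_attri_log_item. */
struct intent {
	atomic_int	refcount;
	bool		*freed;		/* set when the last reference drops */
};

static struct intent *intent_init(bool *freed)
{
	struct intent *ip = calloc(1, sizeof(*ip));

	if (!ip)
		return NULL;
	/* One reference for the log, one for the done item. */
	atomic_init(&ip->refcount, 2);
	ip->freed = freed;
	*freed = false;
	return ip;
}

/* Drop one reference; only the last caller frees the intent. */
static void intent_release(struct intent *ip)
{
	/* atomic_fetch_sub() returns the value before the decrement. */
	if (atomic_fetch_sub(&ip->refcount, 1) == 1) {
		*ip->freed = true;
		free(ip);
	}
}
```

Whichever of the unpin path and the done-item path runs second is the one that
frees the item, so neither side needs to know the ordering.
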
> +
> +STATIC void
> +xfs_attri_item_size(
> +	struct xfs_log_item	*lip,
> +	int			*nvecs,
> +	int			*nbytes)
> +{
> +	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
> +
> +	*nvecs += 1;
> +	*nbytes += sizeof(struct xfs_attri_log_format);
> +
> +	/* Attr set and remove operations require a name */
> +	ASSERT(attrip->attri_name_len > 0);
> +
> +	*nvecs += 1;
> +	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
> +
> +	if (attrip->attri_value_len > 0) {
> +		*nvecs += 1;
> +		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
> +	}
> +}
> +
> +/*
> + * This is called to fill in the log iovecs for the given attri log
> + * item. We use  1 iovec for the attri_format_item, 1 for the name, and
> + * another for the value if it is present
> + */
> +STATIC void
> +xfs_attri_item_format(
> +	struct xfs_log_item	*lip,
> +	struct xfs_log_vec	*lv)
> +{
> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> +	struct xfs_log_iovec		*vecp = NULL;
> +
> +	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
> +	attrip->attri_format.alfi_size = 1;
> +
> +	/*
> +	 * This size accounting must be done before copying the attrip into the
> +	 * iovec.  If we do it after, the wrong size will be recorded to the log
> +	 * and we trip across assertion checks for bad region sizes later during
> +	 * the log recovery.
> +	 */
> +
> +	ASSERT(attrip->attri_name_len > 0);
> +	attrip->attri_format.alfi_size++;
> +
> +	if (attrip->attri_value_len > 0)
> +		attrip->attri_format.alfi_size++;
> +
> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
> +			&attrip->attri_format,
> +			sizeof(struct xfs_attri_log_format));
> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
> +			attrip->attri_name,
> +			ATTR_NVEC_SIZE(attrip->attri_name_len));
> +	if (attrip->attri_value_len > 0)
> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
> +				attrip->attri_value,
> +				ATTR_NVEC_SIZE(attrip->attri_value_len));
> +}
> +
> +/*
> + * The unpin operation is the last place an ATTRI is manipulated in the log. It
> + * is either inserted in the AIL or aborted in the event of a log I/O error. In
> + * either case, the ATTRI transaction has been successfully committed to make
> + * it this far. Therefore, we expect whoever committed the ATTRI to either
> + * construct and commit the ATTRD or drop the ATTRD's reference in the event of
> + * error. Simply drop the log's ATTRI reference now that the log is done with
> + * it.
> + */
> +STATIC void
> +xfs_attri_item_unpin(
> +	struct xfs_log_item	*lip,
> +	int			remove)
> +{
> +	xfs_attri_release(ATTRI_ITEM(lip));
> +}
> +
> +
> +STATIC void
> +xfs_attri_item_release(
> +	struct xfs_log_item	*lip)
> +{
> +	xfs_attri_release(ATTRI_ITEM(lip));
> +}
> +
> +/*
> + * Allocate and initialize an attri item.  Caller may allocate an additional
> + * trailing buffer of the specified size
> + */
> +STATIC struct xfs_attri_log_item *
> +xfs_attri_init(
> +	struct xfs_mount		*mp,
> +	int				buffer_size)
> +
> +{
> +	struct xfs_attri_log_item	*attrip;
> +	uint				size;
> +
> +	size = sizeof(struct xfs_attri_log_item) + buffer_size;
> +	attrip = kvmalloc(size, KM_ZERO);
> +	if (attrip == NULL)
> +		return NULL;
> +
> +	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
> +			  &xfs_attri_item_ops);
> +	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
> +	atomic_set(&attrip->attri_refcount, 2);
> +
> +	return attrip;
> +}
> +
> +/*
> + * Copy an attr format buffer from the given buf, and into the destination attr
> + * format structure.
> + */
> +STATIC int
> +xfs_attri_copy_format(
> +	struct xfs_log_iovec		*buf,
> +	struct xfs_attri_log_format	*dst_attr_fmt)
> +{
> +	struct xfs_attri_log_format	*src_attr_fmt = buf->i_addr;
> +	uint				len;
> +
> +	len = sizeof(struct xfs_attri_log_format);
> +	if (buf->i_len != len) {
> +		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, NULL);
> +		return -EFSCORRUPTED;
> +	}
> +
> +	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
> +	return 0;
> +}
> +
> +static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
> +{
> +	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
> +}
> +
> +STATIC void
> +xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
> +{
> +	kmem_free(attrdp->attrd_item.li_lv_shadow);
> +	kmem_free(attrdp);
> +}
> +
> +STATIC void
> +xfs_attrd_item_size(
> +	struct xfs_log_item		*lip,
> +	int				*nvecs,
> +	int				*nbytes)
> +{
> +	*nvecs += 1;
> +	*nbytes += sizeof(struct xfs_attrd_log_format);
> +}
> +
> +/*
> + * This is called to fill in the log iovecs for the given attrd log item. We use
> + * only 1 iovec for the attrd_format, and we point that at the attr_log_format
> + * structure embedded in the attrd item.
> + */
> +STATIC void
> +xfs_attrd_item_format(
> +	struct xfs_log_item	*lip,
> +	struct xfs_log_vec	*lv)
> +{
> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> +	struct xfs_log_iovec		*vecp = NULL;
> +
> +	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
> +	attrdp->attrd_format.alfd_size = 1;
> +
> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
> +			&attrdp->attrd_format,
> +			sizeof(struct xfs_attrd_log_format));
> +}
> +
> +/*
> + * The ATTRD is either committed or aborted if the transaction is canceled. If
> + * the transaction is canceled, drop our reference to the ATTRI and free the
> + * ATTRD.
> + */
> +STATIC void
> +xfs_attrd_item_release(
> +	struct xfs_log_item		*lip)
> +{
> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
> +
> +	xfs_attri_release(attrdp->attrd_attrip);
> +	xfs_attrd_item_free(attrdp);
> +}
> +
> +STATIC xfs_lsn_t
> +xfs_attri_item_committed(
> +	struct xfs_log_item		*lip,
> +	xfs_lsn_t			lsn)
> +{
> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> +
> +	/*
> +	 * The attrip refers to xfs_attr_item memory to log the name and value
> +	 * with the intent item. This already occurred when the intent was
> +	 * committed so these fields are no longer accessed. Clear them out of
> +	 * caution since we're about to free the xfs_attr_item.
> +	 */
> +	attrip->attri_name = NULL;
> +	attrip->attri_value = NULL;
> +
> +	/*
> +	 * The ATTRI is logged only once and cannot be moved in the log, so
> +	 * simply return the lsn at which it's been logged.
> +	 */
> +	return lsn;
> +}
> +
> +STATIC bool
> +xfs_attri_item_match(
> +	struct xfs_log_item	*lip,
> +	uint64_t		intent_id)
> +{
> +	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
> +}
> +
> +static const struct xfs_item_ops xfs_attrd_item_ops = {
> +	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
> +	.iop_size	= xfs_attrd_item_size,
> +	.iop_format	= xfs_attrd_item_format,
> +	.iop_release    = xfs_attrd_item_release,
> +};
> +
> +/* Is this recovered ATTRI ok? */
> +static inline bool
> +xfs_attri_validate(
> +	struct xfs_mount		*mp,
> +	struct xfs_attri_log_item	*attrip)
> +{
> +	struct xfs_attri_log_format     *attrp = &attrip->attri_format;
> +	unsigned int			op = attrp->alfi_op_flags &
> +					     XFS_ATTR_OP_FLAGS_TYPE_MASK;
> +
> +	/* alfi_op_flags should be either a set or remove */
> +	if (op != XFS_ATTR_OP_FLAGS_SET && op != XFS_ATTR_OP_FLAGS_REMOVE)
> +		return false;
> +
> +	if (attrp->alfi_value_len > XATTR_SIZE_MAX)
> +		return false;
> +
> +	if ((attrp->alfi_name_len > XATTR_NAME_MAX) ||
> +	    (attrp->alfi_name_len == 0))
> +		return false;
> +
> +	return xfs_verify_ino(mp, attrp->alfi_ino);
> +}
> +
> +static const struct xfs_item_ops xfs_attri_item_ops = {
> +	.iop_size	= xfs_attri_item_size,
> +	.iop_format	= xfs_attri_item_format,
> +	.iop_unpin	= xfs_attri_item_unpin,
> +	.iop_committed	= xfs_attri_item_committed,
> +	.iop_release    = xfs_attri_item_release,
> +	.iop_match	= xfs_attri_item_match,
> +};
> +
> +
> +
> +STATIC int
> +xlog_recover_attri_commit_pass2(
> +	struct xlog                     *log,
> +	struct list_head		*buffer_list,
> +	struct xlog_recover_item        *item,
> +	xfs_lsn_t                       lsn)
> +{
> +	int                             error;
> +	struct xfs_mount                *mp = log->l_mp;
> +	struct xfs_attri_log_item       *attrip;
> +	struct xfs_attri_log_format     *attri_formatp;
> +	char				*name = NULL;
> +	char				*value = NULL;
> +	int				region = 0;
> +	int				buffer_size;
> +
> +	attri_formatp = item->ri_buf[region].i_addr;
> +
> +	/* Validate xfs_attri_log_format */
> +	if (attri_formatp->__pad != 0 || attri_formatp->alfi_name_len == 0 ||
> +	    (attri_formatp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE &&
> +	    attri_formatp->alfi_value_len != 0)) {
> +		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
> +		return -EFSCORRUPTED;
> +	}
> +
> +	buffer_size = attri_formatp->alfi_name_len +
> +		      attri_formatp->alfi_value_len;
> +
> +	attrip = xfs_attri_init(mp, buffer_size);
> +	if (attrip == NULL)
> +		return -ENOMEM;
> +
> +	error = xfs_attri_copy_format(&item->ri_buf[region],
> +				      &attrip->attri_format);
> +	if (error) {
> +		xfs_attri_item_free(attrip);
> +		return error;
> +	}
> +
> +	attrip->attri_name_len = attri_formatp->alfi_name_len;
> +	attrip->attri_value_len = attri_formatp->alfi_value_len;
> +	region++;
> +	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
> +	memcpy(name, item->ri_buf[region].i_addr, attrip->attri_name_len);
> +	attrip->attri_name = name;
> +
> +	if (attrip->attri_value_len > 0) {
> +		region++;
> +		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
> +			attrip->attri_name_len;
> +		memcpy(value, item->ri_buf[region].i_addr,
> +			attrip->attri_value_len);
> +		attrip->attri_value = value;
> +	}
> +
> +	/*
> +	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
> +	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
> +	 * directly and drop the ATTRI reference. Note that
> +	 * xfs_trans_ail_update() drops the AIL lock.
> +	 */
> +	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
> +	xfs_attri_release(attrip);
> +	return 0;
> +}
> +
> +const struct xlog_recover_item_ops xlog_attri_item_ops = {
> +	.item_type	= XFS_LI_ATTRI,
> +	.commit_pass2	= xlog_recover_attri_commit_pass2,
> +};
> +
> +/*
> + * This routine is called when an ATTRD format structure is found in a committed
> + * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
> + * it was still in the log. To do this it searches the AIL for the ATTRI with
> + * an id equal to that in the ATTRD format structure. If we find it we drop
> + * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
> + */
> +STATIC int
> +xlog_recover_attrd_commit_pass2(
> +	struct xlog			*log,
> +	struct list_head		*buffer_list,
> +	struct xlog_recover_item	*item,
> +	xfs_lsn_t			lsn)
> +{
> +	struct xfs_attrd_log_format	*attrd_formatp;
> +
> +	attrd_formatp = item->ri_buf[0].i_addr;
> +	if (item->ri_buf[0].i_len != sizeof(struct xfs_attrd_log_format)) {
> +		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, NULL);
> +		return -EFSCORRUPTED;
> +	}
> +
> +	xlog_recover_release_intent(log, XFS_LI_ATTRI,
> +				    attrd_formatp->alfd_alf_id);
> +	return 0;
> +}
> +
> +const struct xlog_recover_item_ops xlog_attrd_item_ops = {
> +	.item_type	= XFS_LI_ATTRD,
> +	.commit_pass2	= xlog_recover_attrd_commit_pass2,
> +};
> diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
> new file mode 100644
> index 000000000000..ce33e9b5a9d3
> --- /dev/null
> +++ b/fs/xfs/xfs_attr_item.h
> @@ -0,0 +1,52 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * Copyright (C) 2021 Oracle.  All Rights Reserved.
> + * Author: Allison Collins <allison.henderson@oracle.com>
> + */
> +#ifndef	__XFS_ATTR_ITEM_H__
> +#define	__XFS_ATTR_ITEM_H__
> +
> +/* kernel only ATTRI/ATTRD definitions */
> +
> +struct xfs_mount;
> +struct kmem_zone;
> +
> +/*
> + * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
> + */
> +#define	XFS_ATTRI_RECOVERED	1

XFS_ATTRI_RECOVERED is not used in this patch or in any of the later patches
in this series.

> +
> +
> +/*
> + * This is the "attr intention" log item.  It is used to log the fact that some
> + * attribute operations need to be processed.  An operation is currently either
> + * a set or remove.  Set or remove operations are described by the xfs_attr_item
> + * which may be logged to this intent.
> + *
> + * During a normal attr operation, name and value point to the name and value
> + * fields of the calling functions xfs_da_args.  During a recovery, the name
> + * and value buffers are copied from the log, and stored in a trailing buffer
> + * attached to the xfs_attr_item until they are committed.  They are freed when
> + * the xfs_attr_item itself is freed when the work is done.
> + */
> +struct xfs_attri_log_item {
> +	struct xfs_log_item		attri_item;
> +	atomic_t			attri_refcount;
> +	int				attri_name_len;
> +	int				attri_value_len;
> +	void				*attri_name;
> +	void				*attri_value;
> +	struct xfs_attri_log_format	attri_format;
> +};
> +
> +/*
> + * This is the "attr done" log item.  It is used to log the fact that some attrs
> + * earlier mentioned in an attri item have been freed.
> + */
> +struct xfs_attrd_log_item {
> +	struct xfs_attri_log_item	*attrd_attrip;
> +	struct xfs_log_item		attrd_item;
> +	struct xfs_attrd_log_format	attrd_format;
> +};
> +
> +#endif	/* __XFS_ATTR_ITEM_H__ */
> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> index 2d1e5134cebe..90a14e85e76d 100644
> --- a/fs/xfs/xfs_attr_list.c
> +++ b/fs/xfs/xfs_attr_list.c
> @@ -15,6 +15,7 @@
>  #include "xfs_inode.h"
>  #include "xfs_trans.h"
>  #include "xfs_bmap.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_attr_sf.h"
>  #include "xfs_attr_leaf.h"
> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> index 8783af203cfc..ab543c5b1371 100644
> --- a/fs/xfs/xfs_ioctl32.c
> +++ b/fs/xfs/xfs_ioctl32.c
> @@ -17,6 +17,8 @@
>  #include "xfs_itable.h"
>  #include "xfs_fsops.h"
>  #include "xfs_rtalloc.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_ioctl.h"
>  #include "xfs_ioctl32.h"
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index 0ff0cca94092..f81e5d55341b 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -13,6 +13,8 @@
>  #include "xfs_inode.h"
>  #include "xfs_acl.h"
>  #include "xfs_quota.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
>  #include "xfs_attr.h"
>  #include "xfs_trans.h"
>  #include "xfs_trace.h"
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index f6cd2d4aa770..4402c5d09269 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2157,6 +2157,10 @@ xlog_print_tic_res(
>  	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
>  	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
>  	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
> +	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
> +	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
> +	    REG_TYPE_STR(ATTR_NAME, "attr name"),
> +	    REG_TYPE_STR(ATTR_VALUE, "attr value"),
>  	};
>  	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
>  #undef REG_TYPE_STR
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 6a3c0bb16b69..c248fd5cf9ea 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -1800,6 +1800,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
>  	&xlog_cud_item_ops,
>  	&xlog_bui_item_ops,
>  	&xlog_bud_item_ops,
> +	&xlog_attri_item_ops,
> +	&xlog_attrd_item_ops,
>  };
>  
>  static const struct xlog_recover_item_ops *
> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> index 25991923c1a8..758702b9495f 100644
> --- a/fs/xfs/xfs_ondisk.h
> +++ b/fs/xfs/xfs_ondisk.h
> @@ -132,6 +132,8 @@ xfs_check_ondisk_structs(void)
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
>  
>  	/*
>  	 * The v5 superblock format extended several v4 header structures with


-- 
chandan

* Re: [PATCH v24 04/11] xfs: Implement attr logging and replay
  2021-08-24 22:44 ` [PATCH v24 04/11] xfs: Implement attr logging and replay Allison Henderson
@ 2021-08-30  7:47   ` Chandan Babu R
  2021-08-31 18:11     ` Allison Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Chandan Babu R @ 2021-08-30  7:47 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On 25 Aug 2021 at 04:14, Allison Henderson wrote:
> This patch adds the needed routines to create, log and recover logged
> extended attribute intents.
>

Apart from the two nits mentioned below, the remaining changes seem to
correctly handle attr set/remove operations regardless of whether delayed
logging is enabled or not.

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>

> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c  |   1 +
>  fs/xfs/libxfs/xfs_defer.h  |   1 +
>  fs/xfs/libxfs/xfs_format.h |  10 +-
>  fs/xfs/xfs_attr_item.c     | 358 +++++++++++++++++++++++++++++++++++++
>  4 files changed, 369 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index d1d09b6aca55..01fcf5e93be5 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -178,6 +178,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
>  	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
>  	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
>  	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
> +	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
>  };
>  
>  static void
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 89719146c5eb..d70525c57b5c 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -19,6 +19,7 @@ enum xfs_defer_ops_type {
>  	XFS_DEFER_OPS_TYPE_RMAP,
>  	XFS_DEFER_OPS_TYPE_FREE,
>  	XFS_DEFER_OPS_TYPE_AGFL_FREE,
> +	XFS_DEFER_OPS_TYPE_ATTR,
>  	XFS_DEFER_OPS_TYPE_MAX,
>  };
>  
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 2d7057b7984b..2e0937bbff6d 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -388,7 +388,9 @@ xfs_sb_has_incompat_feature(
>  	return (sbp->sb_features_incompat & feature) != 0;
>  }
>  
> -#define XFS_SB_FEAT_INCOMPAT_LOG_ALL 0
> +#define XFS_SB_FEAT_INCOMPAT_LOG_XATTRS   (1 << 0)	/* Delayed Attributes */
> +#define XFS_SB_FEAT_INCOMPAT_LOG_ALL \
> +	(XFS_SB_FEAT_INCOMPAT_LOG_XATTRS)
>  #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_LOG_ALL
>  static inline bool
>  xfs_sb_has_incompat_log_feature(
> @@ -413,6 +415,12 @@ xfs_sb_add_incompat_log_features(
>  	sbp->sb_features_log_incompat |= features;
>  }
>

To be consistent with the existing convention, maybe the function below
should be renamed to xfs_sb_version_haslogxattrs()?

> +static inline bool sb_version_haslogxattrs(struct xfs_sb *sbp)
> +{
> +	return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&

The above comparison can be replaced with a call to xfs_sb_is_v5().

> +		(sbp->sb_features_log_incompat &
> +		 XFS_SB_FEAT_INCOMPAT_LOG_XATTRS);
> +}
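
With both suggestions applied, the helper might look roughly like the sketch
below. The struct and macro definitions are simplified user-space stand-ins
for the ones in xfs_format.h, and xfs_sb_is_v5() is re-implemented here only
so that the example compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the definitions in xfs_format.h. */
#define XFS_SB_VERSION_5		5
#define XFS_SB_VERSION_NUMBITS		0x000f
#define XFS_SB_FEAT_INCOMPAT_LOG_XATTRS	(1 << 0)

struct xfs_sb {
	uint16_t	sb_versionnum;
	uint32_t	sb_features_log_incompat;
};

/* Stand-in for the existing helper named in the comment above. */
static inline bool xfs_sb_is_v5(struct xfs_sb *sbp)
{
	return (sbp->sb_versionnum & XFS_SB_VERSION_NUMBITS) ==
	       XFS_SB_VERSION_5;
}

/* The function from the patch with the rename and the helper call. */
static inline bool xfs_sb_version_haslogxattrs(struct xfs_sb *sbp)
{
	return xfs_sb_is_v5(sbp) &&
		(sbp->sb_features_log_incompat &
		 XFS_SB_FEAT_INCOMPAT_LOG_XATTRS);
}
```
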
>  
>  static inline bool
>  xfs_is_quota_inode(struct xfs_sb *sbp, xfs_ino_t ino)
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 879a39ec58a6..c6d5ed34b424 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -275,6 +275,163 @@ xfs_attrd_item_release(
>  	xfs_attrd_item_free(attrdp);
>  }
>  
> +/*
> + * Performs one step of an attribute update intent and marks the attrd item
> + * dirty..  An attr operation may be a set or a remove.  Note that the
> + * transaction is marked dirty regardless of whether the operation succeeds or
> + * fails to support the ATTRI/ATTRD lifecycle rules.
> + */
> +STATIC int
> +xfs_trans_attr_finish_update(
> +	struct xfs_delattr_context	*dac,
> +	struct xfs_attrd_log_item	*attrdp,
> +	struct xfs_buf			**leaf_bp,
> +	uint32_t			op_flags)
> +{
> +	struct xfs_da_args		*args = dac->da_args;
> +	unsigned int			op = op_flags &
> +					     XFS_ATTR_OP_FLAGS_TYPE_MASK;
> +	int				error;
> +
> +	switch (op) {
> +	case XFS_ATTR_OP_FLAGS_SET:
> +		error = xfs_attr_set_iter(dac, leaf_bp);
> +		break;
> +	case XFS_ATTR_OP_FLAGS_REMOVE:
> +		ASSERT(XFS_IFORK_Q(args->dp));
> +		error = xfs_attr_remove_iter(dac);
> +		break;
> +	default:
> +		error = -EFSCORRUPTED;
> +		break;
> +	}
> +
> +	/*
> +	 * Mark the transaction dirty, even on error. This ensures the
> +	 * transaction is aborted, which:
> +	 *
> +	 * 1.) releases the ATTRI and frees the ATTRD
> +	 * 2.) shuts down the filesystem
> +	 */
> +	args->trans->t_flags |= XFS_TRANS_DIRTY;
> +
> +	/*
> +	 * attr intent/done items are null when delayed attributes are disabled
> +	 */
> +	if (attrdp)
> +		set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
> +
> +	return error;
> +}
> +
> +/* Log an attr to the intent item. */
> +STATIC void
> +xfs_attr_log_item(
> +	struct xfs_trans		*tp,
> +	struct xfs_attri_log_item	*attrip,
> +	struct xfs_attr_item		*attr)
> +{
> +	struct xfs_attri_log_format	*attrp;
> +
> +	tp->t_flags |= XFS_TRANS_DIRTY;
> +	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
> +
> +	/*
> +	 * At this point the xfs_attr_item has been constructed, and we've
> +	 * created the log intent. Fill in the attri log item and log format
> +	 * structure with fields from this xfs_attr_item
> +	 */
> +	attrp = &attrip->attri_format;
> +	attrp->alfi_ino = attr->xattri_dac.da_args->dp->i_ino;
> +	attrp->alfi_op_flags = attr->xattri_op_flags;
> +	attrp->alfi_value_len = attr->xattri_dac.da_args->valuelen;
> +	attrp->alfi_name_len = attr->xattri_dac.da_args->namelen;
> +	attrp->alfi_attr_flags = attr->xattri_dac.da_args->attr_filter;
> +
> +	attrip->attri_name = (void *)attr->xattri_dac.da_args->name;
> +	attrip->attri_value = attr->xattri_dac.da_args->value;
> +	attrip->attri_name_len = attr->xattri_dac.da_args->namelen;
> +	attrip->attri_value_len = attr->xattri_dac.da_args->valuelen;
> +}
> +
> +/* Get an ATTRI. */
> +static struct xfs_log_item *
> +xfs_attr_create_intent(
> +	struct xfs_trans		*tp,
> +	struct list_head		*items,
> +	unsigned int			count,
> +	bool				sort)
> +{
> +	struct xfs_mount		*mp = tp->t_mountp;
> +	struct xfs_attri_log_item	*attrip;
> +	struct xfs_attr_item		*attr;
> +
> +	ASSERT(count == 1);
> +
> +	if (!sb_version_haslogxattrs(&mp->m_sb))
> +		return NULL;
> +
> +	attrip = xfs_attri_init(mp, 0);
> +	if (attrip == NULL)
> +		return NULL;
> +
> +	xfs_trans_add_item(tp, &attrip->attri_item);
> +	list_for_each_entry(attr, items, xattri_list)
> +		xfs_attr_log_item(tp, attrip, attr);
> +	return &attrip->attri_item;
> +}
> +
> +/* Process an attr. */
> +STATIC int
> +xfs_attr_finish_item(
> +	struct xfs_trans		*tp,
> +	struct xfs_log_item		*done,
> +	struct list_head		*item,
> +	struct xfs_btree_cur		**state)
> +{
> +	struct xfs_attr_item		*attr;
> +	struct xfs_attrd_log_item	*done_item = NULL;
> +	int				error;
> +	struct xfs_delattr_context	*dac;
> +
> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
> +	dac = &attr->xattri_dac;
> +	if (done)
> +		done_item = ATTRD_ITEM(done);
> +
> +	/*
> +	 * Always reset trans after EAGAIN cycle
> +	 * since the transaction is new
> +	 */
> +	dac->da_args->trans = tp;
> +
> +	error = xfs_trans_attr_finish_update(dac, done_item, &dac->leaf_bp,
> +					     attr->xattri_op_flags);
> +	if (error != -EAGAIN)
> +		kmem_free(attr);
> +
> +	return error;
> +}
> +
> +/* Abort all pending ATTRs. */
> +STATIC void
> +xfs_attr_abort_intent(
> +	struct xfs_log_item		*intent)
> +{
> +	xfs_attri_release(ATTRI_ITEM(intent));
> +}
> +
> +/* Cancel an attr */
> +STATIC void
> +xfs_attr_cancel_item(
> +	struct list_head		*item)
> +{
> +	struct xfs_attr_item		*attr;
> +
> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
> +	kmem_free(attr);
> +}
> +
>  STATIC xfs_lsn_t
>  xfs_attri_item_committed(
>  	struct xfs_log_item		*lip,
> @@ -306,6 +463,30 @@ xfs_attri_item_match(
>  	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
>  }
>  
> +/*
> + * This routine is called to allocate an "attr free done" log item.
> + */
> +static struct xfs_attrd_log_item *
> +xfs_trans_get_attrd(struct xfs_trans		*tp,
> +		  struct xfs_attri_log_item	*attrip)
> +{
> +	struct xfs_attrd_log_item		*attrdp;
> +	uint					size;
> +
> +	ASSERT(tp != NULL);
> +
> +	size = sizeof(struct xfs_attrd_log_item);
> +	attrdp = kmem_zalloc(size, 0);
> +
> +	xfs_log_item_init(tp->t_mountp, &attrdp->attrd_item, XFS_LI_ATTRD,
> +			  &xfs_attrd_item_ops);
> +	attrdp->attrd_attrip = attrip;
> +	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
> +
> +	xfs_trans_add_item(tp, &attrdp->attrd_item);
> +	return attrdp;
> +}
> +
>  static const struct xfs_item_ops xfs_attrd_item_ops = {
>  	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
>  	.iop_size	= xfs_attrd_item_size,
> @@ -313,6 +494,29 @@ static const struct xfs_item_ops xfs_attrd_item_ops = {
>  	.iop_release    = xfs_attrd_item_release,
>  };
>  
> +
> +/* Get an ATTRD so we can process all the attrs. */
> +static struct xfs_log_item *
> +xfs_attr_create_done(
> +	struct xfs_trans		*tp,
> +	struct xfs_log_item		*intent,
> +	unsigned int			count)
> +{
> +	if (!intent)
> +		return NULL;
> +
> +	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
> +}
> +
> +const struct xfs_defer_op_type xfs_attr_defer_type = {
> +	.max_items	= 1,
> +	.create_intent	= xfs_attr_create_intent,
> +	.abort_intent	= xfs_attr_abort_intent,
> +	.create_done	= xfs_attr_create_done,
> +	.finish_item	= xfs_attr_finish_item,
> +	.cancel_item	= xfs_attr_cancel_item,
> +};
> +
>  /* Is this recovered ATTRI ok? */
>  static inline bool
>  xfs_attri_validate(
> @@ -337,13 +541,167 @@ xfs_attri_validate(
>  	return xfs_verify_ino(mp, attrp->alfi_ino);
>  }
>  
> +/*
> + * Process an attr intent item that was recovered from the log.  We need to
> + * set or remove the attr that it describes.
> + */
> +STATIC int
> +xfs_attri_item_recover(
> +	struct xfs_log_item		*lip,
> +	struct list_head		*capture_list)
> +{
> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
> +	struct xfs_attr_item		*attr;
> +	struct xfs_mount		*mp = lip->li_mountp;
> +	struct xfs_inode		*ip;
> +	struct xfs_da_args		*args;
> +	struct xfs_trans		*tp;
> +	struct xfs_trans_res		tres;
> +	struct xfs_attri_log_format	*attrp;
> +	int				error, ret = 0;
> +	int				total;
> +	int				local;
> +	struct xfs_attrd_log_item	*done_item = NULL;
> +
> +	/*
> +	 * First check the validity of the attr described by the ATTRI.  If any
> +	 * are bad, then assume that all are bad and just toss the ATTRI.
> +	 */
> +	attrp = &attrip->attri_format;
> +	if (!xfs_attri_validate(mp, attrip))
> +		return -EFSCORRUPTED;
> +
> +	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
> +	if (error)
> +		return error;
> +
> +	attr = kmem_zalloc(sizeof(struct xfs_attr_item) +
> +			   sizeof(struct xfs_da_args), KM_NOFS);
> +	args = (struct xfs_da_args *)(attr + 1);
> +
> +	attr->xattri_dac.da_args = args;
> +	attr->xattri_op_flags = attrp->alfi_op_flags;
> +
> +	args->dp = ip;
> +	args->geo = mp->m_attr_geo;
> +	args->op_flags = attrp->alfi_op_flags;
> +	args->whichfork = XFS_ATTR_FORK;
> +	args->name = attrip->attri_name;
> +	args->namelen = attrp->alfi_name_len;
> +	args->hashval = xfs_da_hashname(args->name, args->namelen);
> +	args->attr_filter = attrp->alfi_attr_flags;
> +
> +	if (attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET) {
> +		args->value = attrip->attri_value;
> +		args->valuelen = attrp->alfi_value_len;
> +		args->total = xfs_attr_calc_size(args, &local);
> +
> +		tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres +
> +				 M_RES(mp)->tr_attrsetrt.tr_logres *
> +					args->total;
> +		tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
> +		tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
> +		total = args->total;
> +	} else {
> +		tres = M_RES(mp)->tr_attrrm;
> +		total = XFS_ATTRRM_SPACE_RES(mp);
> +	}
> +	error = xfs_trans_alloc(mp, &tres, total, 0, XFS_TRANS_RESERVE, &tp);
> +	if (error)
> +		goto out;
> +
> +	args->trans = tp;
> +	done_item = xfs_trans_get_attrd(tp, attrip);
> +
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> +	xfs_trans_ijoin(tp, ip, 0);
> +
> +	ret = xfs_trans_attr_finish_update(&attr->xattri_dac, done_item,
> +					   &attr->xattri_dac.leaf_bp,
> +					   attrp->alfi_op_flags);
> +	if (ret == -EAGAIN) {
> +		/* There's more work to do, so add it to this transaction */
> +		xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
> +	} else
> +		error = ret;
> +
> +	if (error) {
> +		xfs_trans_cancel(tp);
> +		goto out_unlock;
> +	}
> +
> +	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list,
> +						 attr->xattri_dac.leaf_bp);
> +
> +out_unlock:
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	xfs_irele(ip);
> +out:
> +	if (ret != -EAGAIN)
> +		kmem_free(attr);
> +	return error;
> +}
> +
> +/* Re-log an intent item to push the log tail forward. */
> +static struct xfs_log_item *
> +xfs_attri_item_relog(
> +	struct xfs_log_item		*intent,
> +	struct xfs_trans		*tp)
> +{
> +	struct xfs_attrd_log_item	*attrdp;
> +	struct xfs_attri_log_item	*old_attrip;
> +	struct xfs_attri_log_item	*new_attrip;
> +	struct xfs_attri_log_format	*new_attrp;
> +	struct xfs_attri_log_format	*old_attrp;
> +	int				buffer_size;
> +
> +	old_attrip = ATTRI_ITEM(intent);
> +	old_attrp = &old_attrip->attri_format;
> +	buffer_size = old_attrp->alfi_value_len + old_attrp->alfi_name_len;
> +
> +	tp->t_flags |= XFS_TRANS_DIRTY;
> +	attrdp = xfs_trans_get_attrd(tp, old_attrip);
> +	set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
> +
> +	new_attrip = xfs_attri_init(tp->t_mountp, buffer_size);
> +	new_attrp = &new_attrip->attri_format;
> +
> +	new_attrp->alfi_ino = old_attrp->alfi_ino;
> +	new_attrp->alfi_op_flags = old_attrp->alfi_op_flags;
> +	new_attrp->alfi_value_len = old_attrp->alfi_value_len;
> +	new_attrp->alfi_name_len = old_attrp->alfi_name_len;
> +	new_attrp->alfi_attr_flags = old_attrp->alfi_attr_flags;
> +
> +	new_attrip->attri_name_len = old_attrip->attri_name_len;
> +	new_attrip->attri_name = ((char *)new_attrip) +
> +				 sizeof(struct xfs_attri_log_item);
> +	memcpy(new_attrip->attri_name, old_attrip->attri_name,
> +		new_attrip->attri_name_len);
> +
> +	new_attrip->attri_value_len = old_attrip->attri_value_len;
> +	if (new_attrip->attri_value_len > 0) {
> +		new_attrip->attri_value = new_attrip->attri_name +
> +					  new_attrip->attri_name_len;
> +
> +		memcpy(new_attrip->attri_value, old_attrip->attri_value,
> +		       new_attrip->attri_value_len);
> +	}
> +
> +	xfs_trans_add_item(tp, &new_attrip->attri_item);
> +	set_bit(XFS_LI_DIRTY, &new_attrip->attri_item.li_flags);
> +
> +	return &new_attrip->attri_item;
> +}
> +
>  static const struct xfs_item_ops xfs_attri_item_ops = {
>  	.iop_size	= xfs_attri_item_size,
>  	.iop_format	= xfs_attri_item_format,
>  	.iop_unpin	= xfs_attri_item_unpin,
>  	.iop_committed	= xfs_attri_item_committed,
>  	.iop_release    = xfs_attri_item_release,
> +	.iop_recover	= xfs_attri_item_recover,
>  	.iop_match	= xfs_attri_item_match,
> +	.iop_relog	= xfs_attri_item_relog,
>  };


-- 
chandan

^ permalink raw reply	[flat|nested] 45+ messages in thread
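
The relog path quoted above copies the name and value into a buffer that
trails the new intent item in a single allocation.  A minimal userspace
sketch of that layout follows.  It assumes a hypothetical struct toy_attri
in place of struct xfs_attri_log_item and plain calloc() in place of the
kernel allocator; none of these toy_* names exist in XFS.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct xfs_attri_log_item. */
struct toy_attri {
	char	*name;		/* points into the trailing buffer */
	char	*value;		/* points just past the name, or NULL */
	size_t	name_len;
	size_t	value_len;
};

/* Allocate the item header and its name/value buffer as one block. */
static struct toy_attri *
toy_attri_init(size_t buffer_size)
{
	return calloc(1, sizeof(struct toy_attri) + buffer_size);
}

/* Build a fresh copy of @old, the way a relog builds a new intent. */
static struct toy_attri *
toy_attri_relog(const struct toy_attri *old)
{
	struct toy_attri	*copy;

	assert(old->name_len > 0);
	copy = toy_attri_init(old->name_len + old->value_len);
	if (!copy)
		return NULL;

	/* The name starts immediately after the item header. */
	copy->name_len = old->name_len;
	copy->name = (char *)copy + sizeof(struct toy_attri);
	memcpy(copy->name, old->name, copy->name_len);

	/* The value, if any, starts immediately after the name. */
	copy->value_len = old->value_len;
	if (copy->value_len > 0) {
		copy->value = copy->name + copy->name_len;
		memcpy(copy->value, old->value, copy->value_len);
	}
	return copy;
}
```

Because the header and buffer share one allocation, a single free() of the
returned pointer releases the name and value as well.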

* Re: [PATCH v24 05/11] RFC xfs: Skip flip flags for delayed attrs
  2021-08-24 22:44 ` [PATCH v24 05/11] RFC xfs: Skip flip flags for delayed attrs Allison Henderson
@ 2021-08-30 10:15   ` Chandan Babu R
  2021-08-31 18:11     ` Allison Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Chandan Babu R @ 2021-08-30 10:15 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On 25 Aug 2021 at 04:14, Allison Henderson wrote:
> This is a cleanup patch that skips the flip flag logic for delayed attr
> renames.  Since the log replay keeps the inode locked, we do not need to
> worry about race windows with attr lookups.  So we can skip over
> flipping the flag and the extra transaction roll for it.
>
> RFC: In the last review, folks asked for some performance analysis, so I
> did a few perf captures with and without this patch.  What I found was
> that there wasn't very much difference at all between having the patch or
> not having it.  Of the time we do spend in the affected code, the
> percentage is small.  Most of the time we spend about 0.03% of the time
> in this function, with or without the patch.  Occasionally we get
> 0.02%, though not often.  So I think this starts to challenge needing
> this patch at all. This patch was requested some number of reviews ago,
> but perhaps in light of the findings, it may no longer be of interest.
>
>      0.03%     0.00%  fsstress  [xfs]               [k] xfs_attr_set_iter
>
> Keep it or drop it?

Looks good to me.

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>

>
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_attr.c      | 54 +++++++++++++++++++++--------------
>  fs/xfs/libxfs/xfs_attr_leaf.c |  3 +-
>  2 files changed, 35 insertions(+), 22 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index dfff81024e46..fce67c717be2 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -355,6 +355,7 @@ xfs_attr_set_iter(
>  	struct xfs_inode		*dp = args->dp;
>  	struct xfs_buf			*bp = NULL;
>  	int				forkoff, error = 0;
> +	struct xfs_mount		*mp = args->dp->i_mount;
>  
>  	/* State machine switch */
>  	switch (dac->dela_state) {
> @@ -477,16 +478,21 @@ xfs_attr_set_iter(
>  		 * In a separate transaction, set the incomplete flag on the
>  		 * "old" attr and clear the incomplete flag on the "new" attr.
>  		 */
> -		error = xfs_attr3_leaf_flipflags(args);
> -		if (error)
> -			return error;
> -		/*
> -		 * Commit the flag value change and start the next trans in
> -		 * series.
> -		 */
> -		dac->dela_state = XFS_DAS_FLIP_LFLAG;
> -		trace_xfs_attr_set_iter_return(dac->dela_state, args->dp);
> -		return -EAGAIN;
> +		if (!xfs_has_larp(mp)) {
> +			error = xfs_attr3_leaf_flipflags(args);
> +			if (error)
> +				return error;
> +			/*
> +			 * Commit the flag value change and start the next trans
> +			 * in series.
> +			 */
> +			dac->dela_state = XFS_DAS_FLIP_LFLAG;
> +			trace_xfs_attr_set_iter_return(dac->dela_state,
> +						       args->dp);
> +			return -EAGAIN;
> +		}
> +
> +		/* fallthrough */
>  	case XFS_DAS_FLIP_LFLAG:
>  		/*
>  		 * Dismantle the "old" attribute/value pair by removing a
> @@ -589,17 +595,21 @@ xfs_attr_set_iter(
>  		 * In a separate transaction, set the incomplete flag on the
>  		 * "old" attr and clear the incomplete flag on the "new" attr.
>  		 */
> -		error = xfs_attr3_leaf_flipflags(args);
> -		if (error)
> -			goto out;
> -		/*
> -		 * Commit the flag value change and start the next trans in
> -		 * series
> -		 */
> -		dac->dela_state = XFS_DAS_FLIP_NFLAG;
> -		trace_xfs_attr_set_iter_return(dac->dela_state, args->dp);
> -		return -EAGAIN;
> +		if (!xfs_has_larp(mp)) {
> +			error = xfs_attr3_leaf_flipflags(args);
> +			if (error)
> +				goto out;
> +			/*
> +			 * Commit the flag value change and start the next trans
> +			 * in series
> +			 */
> +			dac->dela_state = XFS_DAS_FLIP_NFLAG;
> +			trace_xfs_attr_set_iter_return(dac->dela_state,
> +						       args->dp);
> +			return -EAGAIN;
> +		}
>  
> +		/* fallthrough */
>  	case XFS_DAS_FLIP_NFLAG:
>  		/*
>  		 * Dismantle the "old" attribute/value pair by removing a
> @@ -1236,6 +1246,7 @@ xfs_attr_node_addname_clear_incomplete(
>  {
>  	struct xfs_da_args		*args = dac->da_args;
>  	struct xfs_da_state		*state = NULL;
> +	struct xfs_mount		*mp = args->dp->i_mount;
>  	int				retval = 0;
>  	int				error = 0;
>  
> @@ -1243,7 +1254,8 @@ xfs_attr_node_addname_clear_incomplete(
>  	 * Re-find the "old" attribute entry after any split ops. The INCOMPLETE
>  	 * flag means that we will find the "old" attr, not the "new" one.
>  	 */
> -	args->attr_filter |= XFS_ATTR_INCOMPLETE;
> +	if (!xfs_has_larp(mp))
> +		args->attr_filter |= XFS_ATTR_INCOMPLETE;
>  	state = xfs_da_state_alloc(args);
>  	state->inleaf = 0;
>  	error = xfs_da3_node_lookup_int(state, &retval);
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index e1d11e314228..a0a352bdea59 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -1487,7 +1487,8 @@ xfs_attr3_leaf_add_work(
>  	if (tmp)
>  		entry->flags |= XFS_ATTR_LOCAL;
>  	if (args->op_flags & XFS_DA_OP_RENAME) {
> -		entry->flags |= XFS_ATTR_INCOMPLETE;
> +		if (!xfs_has_larp(mp))
> +			entry->flags |= XFS_ATTR_INCOMPLETE;
>  		if ((args->blkno2 == args->blkno) &&
>  		    (args->index2 <= args->index)) {
>  			args->index2++;


-- 
chandan

^ permalink raw reply	[flat|nested] 45+ messages in thread
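
The change quoted above lets the state machine fall through the flag-flip
state when logged attributes are enabled, which saves one transaction roll.
A minimal userspace sketch of that control flow follows.  The toy_* names
and the larp field are hypothetical stand-ins for dela_state and
xfs_has_larp(); this is not the XFS code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_state { TOY_FOUND, TOY_FLIP_DONE, TOY_FINISHED };

struct toy_ctx {
	enum toy_state	state;
	bool		larp;	/* hypothetical stand-in for xfs_has_larp() */
	int		flips;	/* number of flag flips performed */
};

/* One step: 0 when finished, -EAGAIN when the caller must roll and retry. */
static int
toy_set_iter(struct toy_ctx *ctx)
{
	switch (ctx->state) {
	case TOY_FOUND:
		if (!ctx->larp) {
			/* Flip the flags, then ask for a new transaction. */
			ctx->flips++;
			ctx->state = TOY_FLIP_DONE;
			return -EAGAIN;
		}
		/* fallthrough */
	case TOY_FLIP_DONE:
		/* Removing the "old" attr would happen here. */
		ctx->state = TOY_FINISHED;
		return 0;
	case TOY_FINISHED:
		return 0;
	}
	return -EINVAL;
}

/* Drive the state machine to completion and count transaction rolls. */
static int
toy_run(struct toy_ctx *ctx)
{
	int	rolls = 0;
	int	error;

	while ((error = toy_set_iter(ctx)) == -EAGAIN)
		rolls++;
	assert(error == 0);
	return rolls;
}
```

With larp clear, toy_run() performs one flip and one roll; with larp set it
performs neither and finishes in a single call.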

* Re: [PATCH v24 06/11] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  2021-08-24 22:44 ` [PATCH v24 06/11] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred Allison Henderson
@ 2021-08-30 10:27   ` Chandan Babu R
  2021-08-31 18:12     ` Allison Henderson
  2021-09-01  3:47   ` Dave Chinner
  1 sibling, 1 reply; 45+ messages in thread
From: Chandan Babu R @ 2021-08-30 10:27 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On 25 Aug 2021 at 04:14, Allison Henderson wrote:
> From: Allison Collins <allison.henderson@oracle.com>
>
> These routines set up and queue a new deferred attribute operation.
> These functions are meant to be called by any routine needing to
> initiate a deferred attribute operation as opposed to the existing
> inline operations. A new helper function, xfs_attr_item_init, is also
> added.
>
> Finally, enable delayed attributes in xfs_attr_set and xfs_attr_remove.
>

Looks good to me.

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>

> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_attr.c | 71 ++++++++++++++++++++++++++++++++++++++--
>  fs/xfs/libxfs/xfs_attr.h |  2 ++
>  fs/xfs/xfs_log.c         | 41 +++++++++++++++++++++++
>  fs/xfs/xfs_log.h         |  1 +
>  4 files changed, 112 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index fce67c717be2..6877683e2e35 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -25,6 +25,8 @@
>  #include "xfs_trans_space.h"
>  #include "xfs_trace.h"
>  #include "xfs_attr_item.h"
> +#include "xfs_attr.h"
> +#include "xfs_log.h"
>  
>  /*
>   * xfs_attr.c
> @@ -726,6 +728,7 @@ xfs_attr_set(
>  	int			error, local;
>  	int			rmt_blks = 0;
>  	unsigned int		total;
> +	int			delayed = xfs_has_larp(mp);
>  
>  	if (xfs_is_shutdown(dp->i_mount))
>  		return -EIO;
> @@ -782,13 +785,19 @@ xfs_attr_set(
>  		rmt_blks = xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
>  	}
>  
> +	if (delayed) {
> +		error = xfs_attr_use_log_assist(mp);
> +		if (error)
> +			return error;
> +	}
> +
>  	/*
>  	 * Root fork attributes can use reserved data blocks for this
>  	 * operation if necessary
>  	 */
>  	error = xfs_trans_alloc_inode(dp, &tres, total, 0, rsvd, &args->trans);
>  	if (error)
> -		return error;
> +		goto drop_incompat;
>  
>  	if (args->value || xfs_inode_hasattr(dp)) {
>  		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
> @@ -806,9 +815,10 @@ xfs_attr_set(
>  		if (error != -ENOATTR && error != -EEXIST)
>  			goto out_trans_cancel;
>  
> -		error = xfs_attr_set_args(args);
> +		error = xfs_attr_set_deferred(args);
>  		if (error)
>  			goto out_trans_cancel;
> +
>  		/* shortform attribute has already been committed */
>  		if (!args->trans)
>  			goto out_unlock;
> @@ -816,7 +826,7 @@ xfs_attr_set(
>  		if (error != -EEXIST)
>  			goto out_trans_cancel;
>  
> -		error = xfs_attr_remove_args(args);
> +		error = xfs_attr_remove_deferred(args);
>  		if (error)
>  			goto out_trans_cancel;
>  	}
> @@ -838,6 +848,9 @@ xfs_attr_set(
>  	error = xfs_trans_commit(args->trans);
>  out_unlock:
>  	xfs_iunlock(dp, XFS_ILOCK_EXCL);
> +drop_incompat:
> +	if (delayed)
> +		xlog_drop_incompat_feat(mp->m_log);
>  	return error;
>  
>  out_trans_cancel:
> @@ -846,6 +859,58 @@ xfs_attr_set(
>  	goto out_unlock;
>  }
>  
> +STATIC int
> +xfs_attr_item_init(
> +	struct xfs_da_args	*args,
> +	unsigned int		op_flags,	/* op flag (set or remove) */
> +	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
> +{
> +
> +	struct xfs_attr_item	*new;
> +
> +	new = kmem_zalloc(sizeof(struct xfs_attr_item), KM_NOFS);
> +	new->xattri_op_flags = op_flags;
> +	new->xattri_dac.da_args = args;
> +
> +	*attr = new;
> +	return 0;
> +}
> +
> +/* Sets an attribute for an inode as a deferred operation */
> +int
> +xfs_attr_set_deferred(
> +	struct xfs_da_args	*args)
> +{
> +	struct xfs_attr_item	*new;
> +	int			error = 0;
> +
> +	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
> +	if (error)
> +		return error;
> +
> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> +
> +	return 0;
> +}
> +
> +/* Removes an attribute for an inode as a deferred operation */
> +int
> +xfs_attr_remove_deferred(
> +	struct xfs_da_args	*args)
> +{
> +
> +	struct xfs_attr_item	*new;
> +	int			error;
> +
> +	error  = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
> +	if (error)
> +		return error;
> +
> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> +
> +	return 0;
> +}
> +
>  /*========================================================================
>   * External routines when attribute list is inside the inode
>   *========================================================================*/
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index aa33cdcf26b8..0f326c28ab7c 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -526,5 +526,7 @@ bool xfs_attr_namecheck(const void *name, size_t length);
>  void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>  			      struct xfs_da_args *args);
>  int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
> +int xfs_attr_set_deferred(struct xfs_da_args *args);
> +int xfs_attr_remove_deferred(struct xfs_da_args *args);
>  
>  #endif	/* __XFS_ATTR_H__ */
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 4402c5d09269..0d0afa1aae59 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -3993,3 +3993,44 @@ xlog_drop_incompat_feat(
>  {
>  	up_read(&log->l_incompat_users);
>  }
> +
> +/*
> + * Get permission to use log-assisted extended attribute updates.
> + *
> + * Callers must not be running any transactions or hold any inode locks, and
> + * they must release the permission by calling xlog_drop_incompat_feat
> + * when they're done.
> + */
> +int
> +xfs_attr_use_log_assist(
> +	struct xfs_mount	*mp)
> +{
> +	int			error = 0;
> +
> +	/*
> +	 * Protect ourselves from an idle log clearing the logged xattrs log
> +	 * incompat feature bit.
> +	 */
> +	xlog_use_incompat_feat(mp->m_log);
> +
> +	/*
> +	 * If log-assisted xattrs are already enabled, the caller can use the
> +	 * logged xattr code with the log-incompat reference we got.
> +	 */
> +	if (sb_version_haslogxattrs(&mp->m_sb))
> +		return 0;
> +
> +	/* Enable log-assisted xattrs. */
> +	error = xfs_add_incompat_log_feature(mp,
> +			XFS_SB_FEAT_INCOMPAT_LOG_XATTRS);
> +	if (error)
> +		goto drop_incompat;
> +
> +	xfs_warn_once(mp,
> +"EXPERIMENTAL logged extended attributes feature added. Use at your own risk!");
> +
> +	return 0;
> +drop_incompat:
> +	xlog_drop_incompat_feat(mp->m_log);
> +	return error;
> +}
> diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
> index dc1b77b92fc1..4504ab60ac85 100644
> --- a/fs/xfs/xfs_log.h
> +++ b/fs/xfs/xfs_log.h
> @@ -144,5 +144,6 @@ bool	  xlog_force_shutdown(struct xlog *log, int shutdown_flags);
>  
>  void xlog_use_incompat_feat(struct xlog *log);
>  void xlog_drop_incompat_feat(struct xlog *log);
> +int xfs_attr_use_log_assist(struct xfs_mount *mp);
>  
>  #endif	/* __XFS_LOG_H__ */


-- 
chandan

^ permalink raw reply	[flat|nested] 45+ messages in thread
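
The xfs_attr_set() change quoted above takes a log-incompat reference before
allocating the transaction and drops it on every exit path through the new
drop_incompat label.  A minimal userspace sketch of that pairing follows.  A
plain counter stands in for the l_incompat_users rwsem, and the toy_* names
and the alloc_fails parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_log {
	int	incompat_users;	/* stand-in for the l_incompat_users rwsem */
};

static void
toy_use_incompat(struct toy_log *log)
{
	log->incompat_users++;
}

static void
toy_drop_incompat(struct toy_log *log)
{
	assert(log->incompat_users > 0);
	log->incompat_users--;
}

/* Take the reference first; release it on success and on failure alike. */
static int
toy_attr_set(struct toy_log *log, bool delayed, bool alloc_fails)
{
	int	error = 0;

	if (delayed)
		toy_use_incompat(log);

	if (alloc_fails) {
		/* Simulated transaction allocation failure. */
		error = -ENOSPC;
		goto drop_incompat;
	}

	/* The attribute update and transaction commit would go here. */

drop_incompat:
	if (delayed)
		toy_drop_incompat(log);
	return error;
}
```

Whatever the return value, incompat_users ends at the value it started
with, which is the property the goto label is there to preserve.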

* Re: [PATCH v24 09/11] xfs: Add larp debug option
  2021-08-24 22:44 ` [PATCH v24 09/11] xfs: Add larp debug option Allison Henderson
@ 2021-08-30 12:03   ` Chandan Babu R
  2021-08-31 18:12     ` Allison Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Chandan Babu R @ 2021-08-30 12:03 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On 25 Aug 2021 at 04:14, Allison Henderson wrote:
> This patch adds a mount option to enable log attribute replay. Eventually

s/mount option/debug option/

> this can be removed when delayed attrs becomes permanent.

The rest looks good to me.

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>

>
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.h |  2 +-
>  fs/xfs/xfs_globals.c     |  1 +
>  fs/xfs/xfs_sysctl.h      |  1 +
>  fs/xfs/xfs_sysfs.c       | 24 ++++++++++++++++++++++++
>  4 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index efb7ac4fc41c..492762541174 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -30,7 +30,7 @@ struct xfs_attr_list_context;
>  
>  static inline bool xfs_has_larp(struct xfs_mount *mp)
>  {
> -	return false;
> +	return xfs_globals.larp;
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_globals.c b/fs/xfs/xfs_globals.c
> index f62fa652c2fd..4d0a98f920ca 100644
> --- a/fs/xfs/xfs_globals.c
> +++ b/fs/xfs/xfs_globals.c
> @@ -41,5 +41,6 @@ struct xfs_globals xfs_globals = {
>  #endif
>  #ifdef DEBUG
>  	.pwork_threads		=	-1,	/* automatic thread detection */
> +	.larp			=	false,	/* log attribute replay */
>  #endif
>  };
> diff --git a/fs/xfs/xfs_sysctl.h b/fs/xfs/xfs_sysctl.h
> index 7692e76ead33..f78ad6b10ea5 100644
> --- a/fs/xfs/xfs_sysctl.h
> +++ b/fs/xfs/xfs_sysctl.h
> @@ -83,6 +83,7 @@ extern xfs_param_t	xfs_params;
>  struct xfs_globals {
>  #ifdef DEBUG
>  	int	pwork_threads;		/* parallel workqueue threads */
> +	bool	larp;			/* log attribute replay */
>  #endif
>  	int	log_recovery_delay;	/* log recovery delay (secs) */
>  	int	mount_delay;		/* mount setup delay (secs) */
> diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
> index 18dc5eca6c04..74180e05e8ed 100644
> --- a/fs/xfs/xfs_sysfs.c
> +++ b/fs/xfs/xfs_sysfs.c
> @@ -227,6 +227,29 @@ pwork_threads_show(
>  	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.pwork_threads);
>  }
>  XFS_SYSFS_ATTR_RW(pwork_threads);
> +
> +static ssize_t
> +larp_store(
> +	struct kobject	*kobject,
> +	const char	*buf,
> +	size_t		count)
> +{
> +	ssize_t		ret;
> +
> +	ret = kstrtobool(buf, &xfs_globals.larp);
> +	if (ret < 0)
> +		return ret;
> +	return count;
> +}
> +
> +STATIC ssize_t
> +larp_show(
> +	struct kobject	*kobject,
> +	char		*buf)
> +{
> +	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.larp);
> +}
> +XFS_SYSFS_ATTR_RW(larp);
>  #endif /* DEBUG */
>  
>  static struct attribute *xfs_dbg_attrs[] = {
> @@ -236,6 +259,7 @@ static struct attribute *xfs_dbg_attrs[] = {
>  	ATTR_LIST(always_cow),
>  #ifdef DEBUG
>  	ATTR_LIST(pwork_threads),
> +	ATTR_LIST(larp),
>  #endif
>  	NULL,
>  };


-- 
chandan

^ permalink raw reply	[flat|nested] 45+ messages in thread
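
The larp knob quoted above follows the usual sysfs store/show shape: store
parses a boolean and returns the byte count on success, show prints the
value as "%d\n".  A minimal userspace sketch follows.  toy_strtobool() is a
simplified, hypothetical parser that only looks at the first character; the
kernel's kstrtobool() accepts more spellings.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

static bool toy_larp;	/* stand-in for xfs_globals.larp */

/* Simplified parser: accepts 1/y/Y as true and 0/n/N as false. */
static int
toy_strtobool(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case '1': case 'y': case 'Y':
		*res = true;
		return 0;
	case '0': case 'n': case 'N':
		*res = false;
		return 0;
	}
	return -EINVAL;
}

/* Store: negative errno on a parse failure, else the bytes consumed. */
static ssize_t
toy_larp_store(const char *buf, size_t count)
{
	int	ret;

	ret = toy_strtobool(buf, &toy_larp);
	if (ret < 0)
		return ret;
	return (ssize_t)count;
}

/* Show: print the flag as 0 or 1 followed by a newline. */
static ssize_t
toy_larp_show(char *buf, size_t size)
{
	assert(buf != NULL);
	return snprintf(buf, size, "%d\n", toy_larp);
}
```

A failed parse leaves the flag untouched, since the result is written only
on the success paths.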

* Re: [PATCH v24 11/11] xfs: Add helper function xfs_attr_leaf_addname
  2021-08-24 22:44 ` [PATCH v24 11/11] xfs: Add helper function xfs_attr_leaf_addname Allison Henderson
@ 2021-08-30 14:17   ` Chandan Babu R
  2021-08-31 18:13     ` Allison Henderson
  0 siblings, 1 reply; 45+ messages in thread
From: Chandan Babu R @ 2021-08-30 14:17 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On 25 Aug 2021 at 04:14, Allison Henderson wrote:
> This patch adds a helper function xfs_attr_leaf_addname.  While this
> does help to break down xfs_attr_set_iter, it does also hoist out some
> of the state management.  This patch has been moved to the end of the
> clean up series for further discussion.
>

Looks good to me.

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>

> Suggested-by: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_attr.c | 110 +++++++++++++++++++++------------------
>  fs/xfs/xfs_trace.h       |   1 +
>  2 files changed, 61 insertions(+), 50 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index c3fdf232cd51..7150f0e051a0 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -284,6 +284,65 @@ xfs_attr_sf_addname(
>  	return -EAGAIN;
>  }
>  
> +STATIC int
> +xfs_attr_leaf_addname(
> +	struct xfs_attr_item	*attr)
> +{
> +	struct xfs_da_args	*args = attr->xattri_da_args;
> +	struct xfs_inode	*dp = args->dp;
> +	int			error;
> +
> +	if (xfs_attr_is_leaf(dp)) {
> +		error = xfs_attr_leaf_try_add(args, attr->xattri_leaf_bp);
> +		if (error == -ENOSPC) {
> +			error = xfs_attr3_leaf_to_node(args);
> +			if (error)
> +				return error;
> +
> +			/*
> +			 * Finish any deferred work items and roll the
> +			 * transaction once more.  The goal here is to call
> +			 * node_addname with the inode and transaction in the
> +			 * same state (inode locked and joined, transaction
> +			 * clean) no matter how we got to this step.
> +			 *
> +			 * At this point, we are still in XFS_DAS_UNINIT, but
> +			 * when we come back, we'll be a node, so we'll fall
> +			 * down into the node handling code below
> +			 */
> +			trace_xfs_attr_set_iter_return(
> +				attr->xattri_dela_state, args->dp);
> +			return -EAGAIN;
> +		}
> +
> +		if (error)
> +			return error;
> +
> +		attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
> +	} else {
> +		error = xfs_attr_node_addname_find_attr(attr);
> +		if (error)
> +			return error;
> +
> +		error = xfs_attr_node_addname(attr);
> +		if (error)
> +			return error;
> +
> +		/*
> +		 * If addname was successful, and we dont need to alloc or
> +		 * remove anymore blks, we're done.
> +		 */
> +		if (!args->rmtblkno &&
> +		    !(args->op_flags & XFS_DA_OP_RENAME))
> +			return 0;
> +
> +		attr->xattri_dela_state = XFS_DAS_FOUND_NBLK;
> +	}
> +
> +	trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp);
> +	return -EAGAIN;
> +}
> +
>  /*
>   * Set the attribute specified in @args.
>   * This routine is meant to function as a delayed operation, and may return
> @@ -319,57 +378,8 @@ xfs_attr_set_iter(
>  			attr->xattri_leaf_bp = NULL;
>  		}
>  
> -		if (xfs_attr_is_leaf(dp)) {
> -			error = xfs_attr_leaf_try_add(args,
> -						      attr->xattri_leaf_bp);
> -			if (error == -ENOSPC) {
> -				error = xfs_attr3_leaf_to_node(args);
> -				if (error)
> -					return error;
> -
> -				/*
> -				 * Finish any deferred work items and roll the
> -				 * transaction once more.  The goal here is to
> -				 * call node_addname with the inode and
> -				 * transaction in the same state (inode locked
> -				 * and joined, transaction clean) no matter how
> -				 * we got to this step.
> -				 *
> -				 * At this point, we are still in
> -				 * XFS_DAS_UNINIT, but when we come back, we'll
> -				 * be a node, so we'll fall down into the node
> -				 * handling code below
> -				 */
> -				trace_xfs_attr_set_iter_return(
> -					attr->xattri_dela_state, args->dp);
> -				return -EAGAIN;
> -			} else if (error) {
> -				return error;
> -			}
> -
> -			attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
> -		} else {
> -			error = xfs_attr_node_addname_find_attr(attr);
> -			if (error)
> -				return error;
> +		return xfs_attr_leaf_addname(attr);
>  
> -			error = xfs_attr_node_addname(attr);
> -			if (error)
> -				return error;
> -
> -			/*
> -			 * If addname was successful, and we dont need to alloc
> -			 * or remove anymore blks, we're done.
> -			 */
> -			if (!args->rmtblkno &&
> -			    !(args->op_flags & XFS_DA_OP_RENAME))
> -				return 0;
> -
> -			attr->xattri_dela_state = XFS_DAS_FOUND_NBLK;
> -		}
> -		trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
> -					       args->dp);
> -		return -EAGAIN;
>  	case XFS_DAS_FOUND_LBLK:
>  		/*
>  		 * If there was an out-of-line value, allocate the blocks we
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 1033a95fbf8e..77a78b5b1a29 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -4132,6 +4132,7 @@ DEFINE_EVENT(xfs_das_state_class, name, \
>  	TP_ARGS(das, ip))
>  DEFINE_DAS_STATE_EVENT(xfs_attr_sf_addname_return);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_set_iter_return);
> +DEFINE_DAS_STATE_EVENT(xfs_attr_leaf_addname_return);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_node_addname_return);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_remove_iter_return);
>  DEFINE_DAS_STATE_EVENT(xfs_attr_rmtval_remove_return);


-- 
chandan

^ permalink raw reply	[flat|nested] 45+ messages in thread
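
The helper quoted above returns -EAGAIN without changing state when a full
leaf is converted to a node, so that the next call takes the node branch.  A
minimal userspace sketch of that retry follows.  The toy_* types, the
two-entry leaf capacity and the more_work flag are hypothetical
simplifications of the real attr fork handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_LEAF_CAPACITY	2	/* hypothetical leaf size */

enum toy_das { TOY_DAS_UNINIT, TOY_DAS_FOUND_LBLK, TOY_DAS_FOUND_NBLK };

struct toy_fork {
	bool	is_leaf;
	int	nattrs;
};

struct toy_item {
	struct toy_fork	*fork;
	enum toy_das	state;
	bool		more_work;	/* e.g. remote blocks or a rename */
};

static int
toy_leaf_try_add(struct toy_fork *fork)
{
	if (fork->nattrs >= TOY_LEAF_CAPACITY)
		return -ENOSPC;
	fork->nattrs++;
	return 0;
}

static int
toy_leaf_addname(struct toy_item *item)
{
	struct toy_fork	*fork = item->fork;
	int		error;

	assert(fork != NULL);

	if (fork->is_leaf) {
		error = toy_leaf_try_add(fork);
		if (error == -ENOSPC) {
			/*
			 * Convert to node form and ask to be called again.
			 * The state is left alone, so re-entry lands in the
			 * node branch below.
			 */
			fork->is_leaf = false;
			return -EAGAIN;
		}
		if (error)
			return error;
		item->state = TOY_DAS_FOUND_LBLK;
	} else {
		fork->nattrs++;
		if (!item->more_work)
			return 0;	/* nothing left to do */
		item->state = TOY_DAS_FOUND_NBLK;
	}
	return -EAGAIN;
}
```

Adding to a full leaf therefore takes two calls: the first converts the
fork, the second performs the insert through the node path.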

* Re: [PATCH v24 02/11] xfs: Capture buffers for delayed ops
  2021-08-24 22:44 ` [PATCH v24 02/11] xfs: Capture buffers for delayed ops Allison Henderson
  2021-08-27 11:33   ` Chandan Babu R
@ 2021-08-30 17:44   ` Darrick J. Wong
  2021-08-31 17:50     ` Allison Henderson
  1 sibling, 1 reply; 45+ messages in thread
From: Darrick J. Wong @ 2021-08-30 17:44 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Tue, Aug 24, 2021 at 03:44:25PM -0700, Allison Henderson wrote:
> This patch enables delayed operations to capture held buffers within
> the xfs_defer_capture. Buffers are then rejoined to the new
> transaction in xlog_finish_defer_ops.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c  | 7 ++++++-
>  fs/xfs/libxfs/xfs_defer.h  | 4 +++-
>  fs/xfs/xfs_bmap_item.c     | 2 +-
>  fs/xfs/xfs_buf.c           | 1 +
>  fs/xfs/xfs_buf.h           | 1 +
>  fs/xfs/xfs_extfree_item.c  | 2 +-
>  fs/xfs/xfs_log_recover.c   | 7 +++++++
>  fs/xfs/xfs_refcount_item.c | 2 +-
>  fs/xfs/xfs_rmap_item.c     | 2 +-
>  9 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index eff4a127188e..d1d09b6aca55 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -639,6 +639,7 @@ xfs_defer_ops_capture(
>  	dfc = kmem_zalloc(sizeof(*dfc), KM_NOFS);
>  	INIT_LIST_HEAD(&dfc->dfc_list);
>  	INIT_LIST_HEAD(&dfc->dfc_dfops);
> +	INIT_LIST_HEAD(&dfc->dfc_buffers);
>  
>  	xfs_defer_create_intents(tp);
>  
> @@ -690,7 +691,8 @@ int
>  xfs_defer_ops_capture_and_commit(
>  	struct xfs_trans		*tp,
>  	struct xfs_inode		*capture_ip,
> -	struct list_head		*capture_list)
> +	struct list_head		*capture_list,
> +	struct xfs_buf			*bp)

I wonder if xfs_defer_ops_capture should learn to pick up the inodes and
buffers to hold automatically from the transaction that's being
committed?  Seeing as xfs_defer_trans_roll already knows how to do that
across transaction rolls, and that's more or less the same thing we're
doing here, but in a much more roundabout way.

>  {
>  	struct xfs_mount		*mp = tp->t_mountp;
>  	struct xfs_defer_capture	*dfc;
> @@ -703,6 +705,9 @@ xfs_defer_ops_capture_and_commit(
>  	if (!dfc)
>  		return xfs_trans_commit(tp);
>  
> +	if (bp && bp->b_transp == tp)
> +		list_add_tail(&bp->b_delay, &dfc->dfc_buffers);
> +
>  	/* Commit the transaction and add the capture structure to the list. */
>  	error = xfs_trans_commit(tp);
>  	if (error) {
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 05472f71fffe..739f70d72fd5 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -74,6 +74,7 @@ struct xfs_defer_capture {
>  
>  	/* Deferred ops state saved from the transaction. */
>  	struct list_head	dfc_dfops;
> +	struct list_head	dfc_buffers;
>  	unsigned int		dfc_tpflags;
>  
>  	/* Block reservations for the data and rt devices. */
> @@ -95,7 +96,8 @@ struct xfs_defer_capture {
>   * This doesn't normally happen except log recovery.
>   */
>  int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp,
> -		struct xfs_inode *capture_ip, struct list_head *capture_list);
> +		struct xfs_inode *capture_ip, struct list_head *capture_list,
> +		struct xfs_buf *bp);
>  void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp,
>  		struct xfs_inode **captured_ipp);
>  void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index 03159970133f..51ba8ee368ca 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -532,7 +532,7 @@ xfs_bui_item_recover(
>  	 * Commit transaction, which frees the transaction and saves the inode
>  	 * for later replay activities.
>  	 */
> -	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list);
> +	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list, NULL);
>  	if (error)
>  		goto err_unlock;
>  
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 047bd6e3f389..29b4655a0a65 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -233,6 +233,7 @@ _xfs_buf_alloc(
>  	init_completion(&bp->b_iowait);
>  	INIT_LIST_HEAD(&bp->b_lru);
>  	INIT_LIST_HEAD(&bp->b_list);
> +	INIT_LIST_HEAD(&bp->b_delay);
>  	INIT_LIST_HEAD(&bp->b_li_list);
>  	sema_init(&bp->b_sema, 0); /* held, no waiters */
>  	spin_lock_init(&bp->b_lock);
> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> index 6b0200b8007d..c51445705dc6 100644
> --- a/fs/xfs/xfs_buf.h
> +++ b/fs/xfs/xfs_buf.h
> @@ -151,6 +151,7 @@ struct xfs_buf {
>  	int			b_io_error;	/* internal IO error state */
>  	wait_queue_head_t	b_waiters;	/* unpin waiters */
>  	struct list_head	b_list;
> +	struct list_head	b_delay;	/* delayed operations list */
>  	struct xfs_perag	*b_pag;		/* contains rbtree root */
>  	struct xfs_mount	*b_mount;
>  	struct xfs_buftarg	*b_target;	/* buffer target (device) */

The bare list-conveyance machinery looks fine to me, but adding 16 bytes
to struct xfs_buf for something that only happens during log recovery is
rather expensive.  Can you reuse b_list for this purpose?  I think the
only users of b_list are the buffer delwri functions, which shouldn't be
active here since the xattr recovery mechanism (a) holds the buffer lock
and (b) doesn't itself use delwri buffer lists for xattr leaf blocks.

(The AIL uses delwri lists, but it won't touch a locked buffer.)

> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 3f8a0713573a..046f21338c48 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -637,7 +637,7 @@ xfs_efi_item_recover(
>  
>  	}
>  
> -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
>  
>  abort_error:
>  	xfs_trans_cancel(tp);
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 10562ecbd9ea..6a3c0bb16b69 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2465,6 +2465,7 @@ xlog_finish_defer_ops(
>  	struct list_head	*capture_list)
>  {
>  	struct xfs_defer_capture *dfc, *next;
> +	struct xfs_buf		*bp, *bnext;
>  	struct xfs_trans	*tp;
>  	struct xfs_inode	*ip;
>  	int			error = 0;
> @@ -2489,6 +2490,12 @@ xlog_finish_defer_ops(
>  			return error;
>  		}
>  
> +		list_for_each_entry_safe(bp, bnext, &dfc->dfc_buffers, b_delay) {
> +			xfs_trans_bjoin(tp, bp);
> +			xfs_trans_bhold(tp, bp);
> +			list_del_init(&bp->b_delay);
> +		}

Why isn't this in xfs_defer_ops_continue, like the code that extracts
the inodes from the capture struct and hands them back to the caller?
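To illustrate the suggestion, here is a minimal userspace sketch of a
"continue" helper that drains the captured buffer list itself, so the
recovery loop does not have to.  Everything in it (struct sketch_buf,
struct sketch_capture, sketch_continue() and the tiny list helpers) is a
hypothetical stand-in for the kernel types, and the joined/held flags only
model the effect of xfs_trans_bjoin()/xfs_trans_bhold():

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list, modelled on the kernel's list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static int list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical stand-ins for struct xfs_buf and struct xfs_defer_capture. */
struct sketch_buf { int id; int joined; int held; struct list_node link; };
struct sketch_capture { struct list_node buffers; };

/* Capture side: remember a buffer that must survive the commit. */
static void sketch_capture_buf(struct sketch_capture *c, struct sketch_buf *bp)
{
	list_add_tail_node(&bp->link, &c->buffers);
}

/* Continue side: rejoin and hold every captured buffer, emptying the list. */
static int sketch_continue(struct sketch_capture *c)
{
	int n = 0;

	while (!list_is_empty(&c->buffers)) {
		struct list_node *node = c->buffers.next;
		struct sketch_buf *bp = (struct sketch_buf *)
			((char *)node - offsetof(struct sketch_buf, link));

		bp->joined = 1;	/* models xfs_trans_bjoin() */
		bp->held = 1;	/* models xfs_trans_bhold() */
		list_del_init_node(node);
		n++;
	}
	return n;
}

/* Capture two buffers, continue, and report how many were rejoined. */
static int sketch_demo(void)
{
	struct sketch_capture cap;
	struct sketch_buf a = { .id = 1 }, b = { .id = 2 };
	int n;

	list_init(&cap.buffers);
	list_init(&a.link);
	list_init(&b.link);
	sketch_capture_buf(&cap, &a);
	sketch_capture_buf(&cap, &b);

	n = sketch_continue(&cap);
	assert(a.joined && a.held && b.joined && b.held);
	assert(list_is_empty(&cap.buffers));
	return n;
}
```

Keeping the drain loop next to the code that hands back the captured inode
means every caller of the continue helper gets the same behaviour.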

> +
>  		/*
>  		 * Transfer to this new transaction all the dfops we captured
>  		 * from recovering a single intent item.
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index 46904b793bd4..a6e7351ca4f9 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -557,7 +557,7 @@ xfs_cui_item_recover(
>  	}
>  
>  	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
>  
>  abort_error:
>  	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index 5f0695980467..8c70a4af80a9 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -587,7 +587,7 @@ xfs_rui_item_recover(
>  	}
>  
>  	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
>  
>  abort_error:
>  	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> -- 
> 2.25.1
> 


* Re: [PATCH v24 00/11] Log Attribute Replay
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
                   ` (10 preceding siblings ...)
  2021-08-24 22:44 ` [PATCH v24 11/11] xfs: Add helper function xfs_attr_leaf_addname Allison Henderson
@ 2021-08-31  0:20 ` Dave Chinner
  2021-08-31 18:27   ` Allison Henderson
  2021-09-01  7:30 ` [PATCH 0/5] xfs: various logged attribute fixes Dave Chinner
  12 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2021-08-31  0:20 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Tue, Aug 24, 2021 at 03:44:23PM -0700, Allison Henderson wrote:
> Hi all,
> 
> This set is a subset of a larger series parent pointers. Delayed attributes allow
> attribute operations (set and remove) to be logged and committed in the same
> way that other delayed operations do. This allows more complex operations (like
> parent pointers) to be broken up into multiple smaller transactions. To do
> this, the existing attr operations must be modified to operate as a delayed
> operation.  This means that they cannot roll, commit, or finish transactions.
> Instead, they return -EAGAIN to allow the calling function to handle the
> transaction.  In this series, we focus on only the delayed attribute portion.
> We will introduce parent pointers in a later set.
> 
> The set as a whole is a bit much to digest at once, so I usually send out the
> smaller sub series to reduce reviewer burn out.  But the entire extended series
> is visible through the included github links.

Ok, so like I did with Darrick's deferred inactivation series, the
first thing I'm doing here is throwing this patchset at
scalability/performance workloads and finding out what is different.

I've merged this series with 5.14 + xfs/for-next + xfs-cil-scale and
then run some tests on it. First up is fsmark creating zero length
files w/ 64 byte xattrs. This should stress only shortform attribute
manipulations.

I have not enabled delayed attributes yet (i.e.
/sys/fs/xfs/debug/larp = 0)

First thing I notice is the transaction commit rate during create is
up around 900k/s, so we are doing 3 transactions per inode - 1 for
create, 2 for attributes. That looks like a regression - existing
shortform attribute creation only takes a single transaction commit,
so this workload prior to this patchset only ran at 600k commits/s.

Note that the only reason I'm getting 900k transactions/s is the
CIL scalability patchset - without that the system tops out at ~800k
transactions/s and so this would be a significant performance
regression (20%) vs the current xfs/for-next code.

Essentially, this looks like we are doing an extra transaction
commit to defer the creation of the attribute, then doing another
transaction to actually modify the attribute. i.e.:

 - 11.04% xfs_attr_set
    - 8.70% xfs_trans_commit
       - 8.69% __xfs_trans_commit
	  - 5.10% xfs_defer_finish_noroll
	     - 3.74% xfs_defer_trans_roll
		- 3.57% xfs_trans_roll
		   - 3.13% __xfs_trans_commit
		      - 3.01% xlog_cil_commit
			   0.66% down_read
			   0.63% xfs_log_ticket_regrant
	     - 1.16% xfs_attr_finish_item
		- 1.06% xfs_trans_attr_finish_update
		   - 1.03% xfs_attr_set_iter
		      - 1.01% xfs_attr_sf_addname
			 - 0.99% xfs_attr_try_sf_addname
			    - 0.61% xfs_attr_shortform_addname
				 0.55% xfs_attr_shortform_add


AFAICT, for non-delayed attributes, this first transaction commit
logs the inode but does not create intent or intent done items
(returns NULL for both operations), so just rolls and runs the
->finish_item. So it would seem that the first transaction just
changes the inode timestamps and does nothing else.

Firstly, this means the inode timestamp change is not atomic w.r.t.
the attribute change the timestamp change relates to and it's
essentially new overhead for the non-delayed path.

Looking at the unlink path, I see the same thing - there's an extra
transaction for the attr remove path, the same as the attr set path.
This drives the unlink path to 1.1 million transaction commits/sec
instead of 800k/s, so it's likely that there's a substantial
performance regression here on a kernel without the CIL scalability
patchset.

IOWs, there are significant behavioural changes with the non-delayed
logging version of this patchset, both in terms of performance and
the atomicity of changes that appear in the journal and hence
recovery behaviour.

At this point I have to ask: why are we trying to retain the "old"
way of doing things (even for testing) if it is substantially
changing behaviour and on-disk journal contents for attribute
modifications?

So, let's turn on delayed logging:

$ sudo sh -c 'echo 1 > /sys/fs/xfs/debug/larp'
$ ~/tests/fsmark-50-test-xfs.sh -t 16 -X 64 -d /dev/mapper/fast -- -l size=2000m -d agcount=67
QUOTA=
MKFSOPTS= -l size=2000m -d agcount=67
DEV=/dev/mapper/fast
THREADS=16
.....

Message from syslogd@test4 at Aug 31 09:12:55 ...
 kernel:[ 2342.737931] XFS: Assertion failed: !test_bit(XFS_LI_DIRTY, &lip->li_flags), file: fs/xfs/xfs_trans.c, line: 652

Instant assert fail and the machine locks up hard.

Actually, now that I reproduce it with a full console trace (which
is terribly interleaved and almost impossible to read) there's
bad stuff all over the place. Null pointer dereferences in
xlog_cil_commit, "sleeping in atomic" failures, and the assert
failure above. 

Ok, run a single thread, and...

[   84.119162] BUG: kernel NULL pointer dereference, address: 000000000000000d
[   84.123541] #PF: supervisor write access in kernel mode
[   84.126028] #PF: error_code(0x0002) - not-present page
[   84.127312] PGD 0 P4D 0 
[   84.127966] Oops: 0002 [#1] PREEMPT SMP
[   84.128960] CPU: 8 PID: 5139 Comm: fs_mark Not tainted 5.14.0-dgc+ #552
[   84.130632] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014
[   84.132723] RIP: 0010:xlog_prepare_iovec+0x59/0xe0
[   84.133951] Code: 4c 89 f8 4c 29 e0 48 c1 f8 04 48 39 d0 7d 76 4d 8d 67 10 8b 43 34 8d 50 0c 83 e2 07 74 0c 83 c0 0b 83 c8 07 83 e8 0b 89 43 34 <45> 89 74 24 0c 48 63 43 34 48 03 43 28 49 89 04 24 c7 40 08 69 00
[   84.137980] RSP: 0018:ffffc900021d7838 EFLAGS: 00010202
[   84.138968] RAX: 0000000000000004 RBX: ffff888140f3e100 RCX: 0000000000000006
[   84.140321] RDX: 0000000000000004 RSI: ffffc900021d7880 RDI: ffff888140f3e100
[   84.141671] RBP: ffffc900021d7868 R08: ffffffff82a8fb88 R09: 000000000000494e
[   84.143069] R10: ffff88823ffd5000 R11: 00000000000319c8 R12: 0000000000000001
[   84.144447] R13: ffffc900021d7880 R14: 000000000000001b R15: 0000000000000000
[   84.145825] FS:  00007f51fb33c740(0000) GS:ffff88823bc00000(0000) knlGS:0000000000000000
[   84.147413] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.148535] CR2: 000000000000000d CR3: 00000001473b1003 CR4: 0000000000770ee0
[   84.149927] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   84.151291] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   84.152331] PKRU: 55555554
[   84.152732] Call Trace:
[   84.153106]  xfs_attri_item_format+0x87/0x230
[   84.153748]  xlog_cil_commit+0x253/0xa00
[   84.154329]  ? kvmalloc_node+0x79/0x80
[   84.154881]  __xfs_trans_commit+0xc1/0x330
[   84.155490]  xfs_trans_roll+0x53/0xe0
[   84.156027]  xfs_defer_trans_roll+0x10d/0x2b0
[   84.156659]  xfs_defer_finish_noroll+0xb1/0x650
[   84.157312]  __xfs_trans_commit+0x143/0x330
[   84.157918]  xfs_trans_commit+0x10/0x20
[   84.158498]  xfs_attr_set+0x41a/0x4e0
[   84.159030]  xfs_xattr_set+0x8d/0xe0
[   84.159554]  __vfs_setxattr+0x6b/0x90
[   84.160090]  __vfs_setxattr_noperm+0x7d/0x1f0
[   84.160718]  __vfs_setxattr_locked+0xdf/0x100
[   84.161346]  vfs_setxattr+0x9b/0x170
[   84.161862]  setxattr+0x110/0x200
[   84.162346]  ? _raw_spin_unlock+0xe/0x20
[   84.162914]  ? __handle_mm_fault+0xc1b/0x16d0
[   84.163556]  ? __might_sleep+0x49/0x80
[   84.164132]  __x64_sys_fsetxattr+0xb1/0xe0
[   84.164782]  do_syscall_64+0x35/0x80

Ok, there's the first failure.

This looks like it's a problem with xfs_attri_item_{size,format} in
calculating the number of bytes to log. They use ATTR_NVEC_SIZE() to
calculate the number of bytes to copy from the attribute item, which
rounds up the length to copy to 4 byte aligned values. I'm not sure
what this function is calculating:

/* iovec length must be 32-bit aligned */
static inline size_t ATTR_NVEC_SIZE(size_t size)
{
        return size == sizeof(int32_t) ? size :
	               sizeof(int32_t) + round_up(size, sizeof(int32_t));
}

It appears to be saying if the size == 4, then return 4, otherwise
return 4 + roundup(size)... which leads me to struct
xfs_attri_log_format:

struct xfs_attri_log_format {
        uint16_t        alfi_type;      /* attri log item type */
        uint16_t        alfi_size;      /* size of this item */
        uint32_t        __pad;          /* pad to 64 bit aligned */
        uint64_t        alfi_id;        /* attri identifier */
        uint64_t        alfi_ino;       /* the inode for this attr operation */
        uint32_t        alfi_op_flags;  /* marks the op as a set or remove */
        uint32_t        alfi_name_len;  /* attr name length */
        uint32_t        alfi_value_len; /* attr value length */
        uint32_t        alfi_attr_flags;/* attr flags */
};

I don't see where the extra 4 bytes for the attribute vector size
comes from. It's not needed to store the length, so this could
oversize the amount of data to be copied from the source
buffer by up to 7 bytes?

I can see that it might need rounding with the existing
log code (because the formatter is responsible for 32 bit alignment
of log vectors), but that goes away with the CIL scalability
patchset that always aligns iovecs to 4 byte alignment so the
formatters do not need to do that.

Hiding it in a "macro" is not necessary, either - look at how
xfs_inode_item_{data,attr}_fork_size handle the rounding up of the
local format fork size. They round up the fork byte count to 4
directly, and the format code copies those bytes because
xfs_idata_realloc() allocates those bytes.

However, for the attribute buffers, this isn't guaranteed. Look at
xfs_xattr_set():

        struct xfs_da_args      args = {
                .dp             = XFS_I(inode),
                .attr_filter    = handler->flags,
                .attr_flags     = flags,
                .name           = name,
                .namelen        = strlen(name),
                .value          = (void *)value,
                .valuelen       = size,
        };

There is no rounding up of the name or value lengths, and these end
up directly referenced by the deferred logging via xfs_attr_log_item()
and attrip->da_args->...

        attrip->attri_name = (void *)attr->xattri_da_args->name;
        attrip->attri_value = attr->xattri_da_args->value;
        attrip->attri_name_len = attr->xattri_da_args->namelen;
        attrip->attri_value_len = attr->xattri_da_args->valuelen;

We then pass those pointers directly to xlog_copy_iovec() but with a
rounded up length that is longer than the source buffer:

        xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
                        attrip->attri_name,
                        ATTR_NVEC_SIZE(attrip->attri_name_len));
        if (attrip->attri_value_len > 0)
                xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
                                attrip->attri_value,
                                ATTR_NVEC_SIZE(attrip->attri_value_len));

So while this might not be the source of the above crash, it's
certainly a bug that needs fixing.
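
To make the over-read concrete, here is a minimal userspace sketch of the
helper quoted above.  It assumes round_up() on a power-of-two alignment
behaves like the mask expression below; attr_nvec_size() and
attr_nvec_overread() are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's round_up() with a power-of-two align. */
static size_t round_up_pow2(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Same arithmetic as the ATTR_NVEC_SIZE() helper quoted from the patch. */
static size_t attr_nvec_size(size_t size)
{
	return size == sizeof(int32_t) ? size :
	       sizeof(int32_t) + round_up_pow2(size, sizeof(int32_t));
}

/* Bytes that would be read beyond a source buffer of exactly 'len' bytes. */
static size_t attr_nvec_overread(size_t len)
{
	return attr_nvec_size(len) - len;
}
```

With this arithmetic a 5 byte name is copied as 12 bytes, which is 7 bytes
past the end of the caller's buffer, and a length of 4 is the only case that
is copied exactly.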

At this point, I'm just going to hack on the code to make it work,
and we can go from there...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v24 03/11] xfs: Set up infrastructure for log atrribute replay
  2021-08-24 22:44 ` [PATCH v24 03/11] xfs: Set up infrastructure for log atrribute replay Allison Henderson
  2021-08-27 14:17   ` Chandan Babu R
@ 2021-08-31  0:48   ` Dave Chinner
  2021-08-31 19:03     ` Allison Henderson
  2021-08-31 21:52   ` Dave Chinner
  2 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2021-08-31  0:48 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Tue, Aug 24, 2021 at 03:44:26PM -0700, Allison Henderson wrote:
> +/*
> + * Allocate and initialize an attri item.  Caller may allocate an additional
> + * trailing buffer of the specified size
> + */
> +STATIC struct xfs_attri_log_item *
> +xfs_attri_init(
> +	struct xfs_mount		*mp,
> +	int				buffer_size)
> +
> +{
> +	struct xfs_attri_log_item	*attrip;
> +	uint				size;
> +
> +	size = sizeof(struct xfs_attri_log_item) + buffer_size;
> +	attrip = kvmalloc(size, KM_ZERO);
> +	if (attrip == NULL)
> +		return NULL;

kvmalloc() takes GFP flags. I think you want GFP_KERNEL | __GFP_ZERO
here.

Also, buffer size is taken directly from on-disk without bounds/length
validation, meaning this could end up being an attacker controlled
memory allocation, so .....

> +STATIC int
> +xlog_recover_attri_commit_pass2(
> +	struct xlog                     *log,
> +	struct list_head		*buffer_list,
> +	struct xlog_recover_item        *item,
> +	xfs_lsn_t                       lsn)
> +{
> +	int                             error;
> +	struct xfs_mount                *mp = log->l_mp;
> +	struct xfs_attri_log_item       *attrip;
> +	struct xfs_attri_log_format     *attri_formatp;
> +	char				*name = NULL;
> +	char				*value = NULL;
> +	int				region = 0;
> +	int				buffer_size;
> +
> +	attri_formatp = item->ri_buf[region].i_addr;
> +
> +	/* Validate xfs_attri_log_format */
> +	if (attri_formatp->__pad != 0 || attri_formatp->alfi_name_len == 0 ||
> +	    (attri_formatp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE &&
> +	    attri_formatp->alfi_value_len != 0)) {
> +		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
> +		return -EFSCORRUPTED;
> +	}
> +
> +	buffer_size = attri_formatp->alfi_name_len +
> +		      attri_formatp->alfi_value_len;
> +
> +	attrip = xfs_attri_init(mp, buffer_size);
> +	if (attrip == NULL)
> +		return -ENOMEM;

There needs to be a lot better validation of the attribute
name/value lengths here.  Also, memory allocation failure here will
abort recovery, so it might be worth adding a comment here....
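
As a rough illustration of the kind of checking meant here, the following
userspace sketch bounds both lengths before any size is computed.  The
struct, the SKETCH_* limits and op values, and sketch_attri_buffer_size() are
all assumptions made for the example (the limits mirror the VFS xattr limits
of 255 byte names and 64k values), and -EINVAL stands in for the kernel's
-EFSCORRUPTED:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical limits and op codes for this sketch only. */
#define SKETCH_NAME_MAX		255u
#define SKETCH_VALUE_MAX	65536u
#define SKETCH_OP_SET		1u
#define SKETCH_OP_REMOVE	2u

/* Cut-down stand-in for the on-disk attri log format header. */
struct sketch_attri_format {
	uint32_t	pad;
	uint32_t	op_flags;
	uint32_t	name_len;
	uint32_t	value_len;
};

/*
 * Validate the untrusted header and return the trailing buffer size needed
 * for name + value, or a negative errno if the header cannot be trusted.
 */
static long sketch_attri_buffer_size(const struct sketch_attri_format *f)
{
	if (f->pad != 0)
		return -EINVAL;
	if (f->op_flags != SKETCH_OP_SET && f->op_flags != SKETCH_OP_REMOVE)
		return -EINVAL;
	if (f->name_len == 0 || f->name_len > SKETCH_NAME_MAX)
		return -EINVAL;
	if (f->value_len > SKETCH_VALUE_MAX)
		return -EINVAL;
	if (f->op_flags == SKETCH_OP_REMOVE && f->value_len != 0)
		return -EINVAL;

	/* Both terms are bounded above, so the sum cannot overflow. */
	return (long)f->name_len + (long)f->value_len;
}
```

Because both lengths are capped before they are added, the allocation size
derived from the log contents has a fixed upper bound.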

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v24 02/11] xfs: Capture buffers for delayed ops
  2021-08-30 17:44   ` Darrick J. Wong
@ 2021-08-31 17:50     ` Allison Henderson
  2021-09-18  0:02       ` Darrick J. Wong
  0 siblings, 1 reply; 45+ messages in thread
From: Allison Henderson @ 2021-08-31 17:50 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs



On 8/30/21 10:44 AM, Darrick J. Wong wrote:
> On Tue, Aug 24, 2021 at 03:44:25PM -0700, Allison Henderson wrote:
>> This patch enables delayed operations to capture held buffers with in
>> the xfs_defer_capture. Buffers are then rejoined to the new
>> transaction in xlog_finish_defer_ops
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_defer.c  | 7 ++++++-
>>   fs/xfs/libxfs/xfs_defer.h  | 4 +++-
>>   fs/xfs/xfs_bmap_item.c     | 2 +-
>>   fs/xfs/xfs_buf.c           | 1 +
>>   fs/xfs/xfs_buf.h           | 1 +
>>   fs/xfs/xfs_extfree_item.c  | 2 +-
>>   fs/xfs/xfs_log_recover.c   | 7 +++++++
>>   fs/xfs/xfs_refcount_item.c | 2 +-
>>   fs/xfs/xfs_rmap_item.c     | 2 +-
>>   9 files changed, 22 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
>> index eff4a127188e..d1d09b6aca55 100644
>> --- a/fs/xfs/libxfs/xfs_defer.c
>> +++ b/fs/xfs/libxfs/xfs_defer.c
>> @@ -639,6 +639,7 @@ xfs_defer_ops_capture(
>>   	dfc = kmem_zalloc(sizeof(*dfc), KM_NOFS);
>>   	INIT_LIST_HEAD(&dfc->dfc_list);
>>   	INIT_LIST_HEAD(&dfc->dfc_dfops);
>> +	INIT_LIST_HEAD(&dfc->dfc_buffers);
>>   
>>   	xfs_defer_create_intents(tp);
>>   
>> @@ -690,7 +691,8 @@ int
>>   xfs_defer_ops_capture_and_commit(
>>   	struct xfs_trans		*tp,
>>   	struct xfs_inode		*capture_ip,
>> -	struct list_head		*capture_list)
>> +	struct list_head		*capture_list,
>> +	struct xfs_buf			*bp)
> 
> I wonder if xfs_defer_ops_capture should learn to pick up the inodes and
> buffers to hold automatically from the transaction that's being
> committed?  Seeing as xfs_defer_trans_roll already knows how to do that
> across transaction rolls, and that's more or less the same thing we're
> doing here, but in a much more roundabout way.
I see, I suppose it could?  But it wouldn't be used in this case,
at least not yet.  I sort of got the impression that people like to see
things added as they are needed, and then unused code culled where it
can be.  I would think that if the need does arise though, b_delay would
be easy to expand into a list of xfs_delay_items or something similar to
what xfs_defer_trans_roll has.

> 
>>   {
>>   	struct xfs_mount		*mp = tp->t_mountp;
>>   	struct xfs_defer_capture	*dfc;
>> @@ -703,6 +705,9 @@ xfs_defer_ops_capture_and_commit(
>>   	if (!dfc)
>>   		return xfs_trans_commit(tp);
>>   
>> +	if (bp && bp->b_transp == tp)
>> +		list_add_tail(&bp->b_delay, &dfc->dfc_buffers);
>> +
>>   	/* Commit the transaction and add the capture structure to the list. */
>>   	error = xfs_trans_commit(tp);
>>   	if (error) {
>> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
>> index 05472f71fffe..739f70d72fd5 100644
>> --- a/fs/xfs/libxfs/xfs_defer.h
>> +++ b/fs/xfs/libxfs/xfs_defer.h
>> @@ -74,6 +74,7 @@ struct xfs_defer_capture {
>>   
>>   	/* Deferred ops state saved from the transaction. */
>>   	struct list_head	dfc_dfops;
>> +	struct list_head	dfc_buffers;
>>   	unsigned int		dfc_tpflags;
>>   
>>   	/* Block reservations for the data and rt devices. */
>> @@ -95,7 +96,8 @@ struct xfs_defer_capture {
>>    * This doesn't normally happen except log recovery.
>>    */
>>   int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp,
>> -		struct xfs_inode *capture_ip, struct list_head *capture_list);
>> +		struct xfs_inode *capture_ip, struct list_head *capture_list,
>> +		struct xfs_buf *bp);
>>   void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp,
>>   		struct xfs_inode **captured_ipp);
>>   void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
>> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
>> index 03159970133f..51ba8ee368ca 100644
>> --- a/fs/xfs/xfs_bmap_item.c
>> +++ b/fs/xfs/xfs_bmap_item.c
>> @@ -532,7 +532,7 @@ xfs_bui_item_recover(
>>   	 * Commit transaction, which frees the transaction and saves the inode
>>   	 * for later replay activities.
>>   	 */
>> -	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list);
>> +	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list, NULL);
>>   	if (error)
>>   		goto err_unlock;
>>   
>> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
>> index 047bd6e3f389..29b4655a0a65 100644
>> --- a/fs/xfs/xfs_buf.c
>> +++ b/fs/xfs/xfs_buf.c
>> @@ -233,6 +233,7 @@ _xfs_buf_alloc(
>>   	init_completion(&bp->b_iowait);
>>   	INIT_LIST_HEAD(&bp->b_lru);
>>   	INIT_LIST_HEAD(&bp->b_list);
>> +	INIT_LIST_HEAD(&bp->b_delay);
>>   	INIT_LIST_HEAD(&bp->b_li_list);
>>   	sema_init(&bp->b_sema, 0); /* held, no waiters */
>>   	spin_lock_init(&bp->b_lock);
>> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
>> index 6b0200b8007d..c51445705dc6 100644
>> --- a/fs/xfs/xfs_buf.h
>> +++ b/fs/xfs/xfs_buf.h
>> @@ -151,6 +151,7 @@ struct xfs_buf {
>>   	int			b_io_error;	/* internal IO error state */
>>   	wait_queue_head_t	b_waiters;	/* unpin waiters */
>>   	struct list_head	b_list;
>> +	struct list_head	b_delay;	/* delayed operations list */
>>   	struct xfs_perag	*b_pag;		/* contains rbtree root */
>>   	struct xfs_mount	*b_mount;
>>   	struct xfs_buftarg	*b_target;	/* buffer target (device) */
> 
> The bare list-conveyance machinery looks fine to me, but adding 16 bytes
> to struct xfs_buf for something that only happens during log recovery is
> rather expensive.  Can you reuse b_list for this purpose?  I think the
> only user of b_list are the buffer delwri functions, which shouldn't be
> active here since the xattr recovery mechanism (a) holds the buffer lock
> and (b) doesn't itself use delwri buffer lists for xattr leaf blocks.
> 
> (The AIL uses delwri lists, but it won't touch a locked buffer.)
> 
Sure, it sounds like it would work, will try it out.

>> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
>> index 3f8a0713573a..046f21338c48 100644
>> --- a/fs/xfs/xfs_extfree_item.c
>> +++ b/fs/xfs/xfs_extfree_item.c
>> @@ -637,7 +637,7 @@ xfs_efi_item_recover(
>>   
>>   	}
>>   
>> -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
>> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
>>   
>>   abort_error:
>>   	xfs_trans_cancel(tp);
>> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
>> index 10562ecbd9ea..6a3c0bb16b69 100644
>> --- a/fs/xfs/xfs_log_recover.c
>> +++ b/fs/xfs/xfs_log_recover.c
>> @@ -2465,6 +2465,7 @@ xlog_finish_defer_ops(
>>   	struct list_head	*capture_list)
>>   {
>>   	struct xfs_defer_capture *dfc, *next;
>> +	struct xfs_buf		*bp, *bnext;
>>   	struct xfs_trans	*tp;
>>   	struct xfs_inode	*ip;
>>   	int			error = 0;
>> @@ -2489,6 +2490,12 @@ xlog_finish_defer_ops(
>>   			return error;
>>   		}
>>   
>> +		list_for_each_entry_safe(bp, bnext, &dfc->dfc_buffers, b_delay) {
>> +			xfs_trans_bjoin(tp, bp);
>> +			xfs_trans_bhold(tp, bp);
>> +			list_del_init(&bp->b_delay);
>> +		}
> 
> Why isn't this in xfs_defer_ops_continue, like the code that extracts
> the inodes from the capture struct and hands them back to the caller?
It's just what was discussed in the last review, is all.  That does look
like a better place for it though.  Will move it there.

Allison

> 
>> +
>>   		/*
>>   		 * Transfer to this new transaction all the dfops we captured
>>   		 * from recovering a single intent item.
>> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
>> index 46904b793bd4..a6e7351ca4f9 100644
>> --- a/fs/xfs/xfs_refcount_item.c
>> +++ b/fs/xfs/xfs_refcount_item.c
>> @@ -557,7 +557,7 @@ xfs_cui_item_recover(
>>   	}
>>   
>>   	xfs_refcount_finish_one_cleanup(tp, rcur, error);
>> -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
>> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
>>   
>>   abort_error:
>>   	xfs_refcount_finish_one_cleanup(tp, rcur, error);
>> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
>> index 5f0695980467..8c70a4af80a9 100644
>> --- a/fs/xfs/xfs_rmap_item.c
>> +++ b/fs/xfs/xfs_rmap_item.c
>> @@ -587,7 +587,7 @@ xfs_rui_item_recover(
>>   	}
>>   
>>   	xfs_rmap_finish_one_cleanup(tp, rcur, error);
>> -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
>> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
>>   
>>   abort_error:
>>   	xfs_rmap_finish_one_cleanup(tp, rcur, error);
>> -- 
>> 2.25.1
>>


* Re: [PATCH v24 02/11] xfs: Capture buffers for delayed ops
  2021-08-27 11:33   ` Chandan Babu R
@ 2021-08-31 18:03     ` Allison Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-31 18:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs



On 8/27/21 4:33 AM, Chandan Babu R wrote:
> On 25 Aug 2021 at 04:14, Allison Henderson wrote:
>> This patch enables delayed operations to capture held buffers with in
>> the xfs_defer_capture. Buffers are then rejoined to the new
>> transaction in xlog_finish_defer_ops
>>
> 
> Looks good to me.
> 
> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Thank you!
Allison

> 
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_defer.c  | 7 ++++++-
>>   fs/xfs/libxfs/xfs_defer.h  | 4 +++-
>>   fs/xfs/xfs_bmap_item.c     | 2 +-
>>   fs/xfs/xfs_buf.c           | 1 +
>>   fs/xfs/xfs_buf.h           | 1 +
>>   fs/xfs/xfs_extfree_item.c  | 2 +-
>>   fs/xfs/xfs_log_recover.c   | 7 +++++++
>>   fs/xfs/xfs_refcount_item.c | 2 +-
>>   fs/xfs/xfs_rmap_item.c     | 2 +-
>>   9 files changed, 22 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
>> index eff4a127188e..d1d09b6aca55 100644
>> --- a/fs/xfs/libxfs/xfs_defer.c
>> +++ b/fs/xfs/libxfs/xfs_defer.c
>> @@ -639,6 +639,7 @@ xfs_defer_ops_capture(
>>   	dfc = kmem_zalloc(sizeof(*dfc), KM_NOFS);
>>   	INIT_LIST_HEAD(&dfc->dfc_list);
>>   	INIT_LIST_HEAD(&dfc->dfc_dfops);
>> +	INIT_LIST_HEAD(&dfc->dfc_buffers);
>>   
>>   	xfs_defer_create_intents(tp);
>>   
>> @@ -690,7 +691,8 @@ int
>>   xfs_defer_ops_capture_and_commit(
>>   	struct xfs_trans		*tp,
>>   	struct xfs_inode		*capture_ip,
>> -	struct list_head		*capture_list)
>> +	struct list_head		*capture_list,
>> +	struct xfs_buf			*bp)
>>   {
>>   	struct xfs_mount		*mp = tp->t_mountp;
>>   	struct xfs_defer_capture	*dfc;
>> @@ -703,6 +705,9 @@ xfs_defer_ops_capture_and_commit(
>>   	if (!dfc)
>>   		return xfs_trans_commit(tp);
>>   
>> +	if (bp && bp->b_transp == tp)
>> +		list_add_tail(&bp->b_delay, &dfc->dfc_buffers);
>> +
>>   	/* Commit the transaction and add the capture structure to the list. */
>>   	error = xfs_trans_commit(tp);
>>   	if (error) {
>> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
>> index 05472f71fffe..739f70d72fd5 100644
>> --- a/fs/xfs/libxfs/xfs_defer.h
>> +++ b/fs/xfs/libxfs/xfs_defer.h
>> @@ -74,6 +74,7 @@ struct xfs_defer_capture {
>>   
>>   	/* Deferred ops state saved from the transaction. */
>>   	struct list_head	dfc_dfops;
>> +	struct list_head	dfc_buffers;
>>   	unsigned int		dfc_tpflags;
>>   
>>   	/* Block reservations for the data and rt devices. */
>> @@ -95,7 +96,8 @@ struct xfs_defer_capture {
>>    * This doesn't normally happen except log recovery.
>>    */
>>   int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp,
>> -		struct xfs_inode *capture_ip, struct list_head *capture_list);
>> +		struct xfs_inode *capture_ip, struct list_head *capture_list,
>> +		struct xfs_buf *bp);
>>   void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp,
>>   		struct xfs_inode **captured_ipp);
>>   void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
>> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
>> index 03159970133f..51ba8ee368ca 100644
>> --- a/fs/xfs/xfs_bmap_item.c
>> +++ b/fs/xfs/xfs_bmap_item.c
>> @@ -532,7 +532,7 @@ xfs_bui_item_recover(
>>   	 * Commit transaction, which frees the transaction and saves the inode
>>   	 * for later replay activities.
>>   	 */
>> -	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list);
>> +	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list, NULL);
>>   	if (error)
>>   		goto err_unlock;
>>   
>> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
>> index 047bd6e3f389..29b4655a0a65 100644
>> --- a/fs/xfs/xfs_buf.c
>> +++ b/fs/xfs/xfs_buf.c
>> @@ -233,6 +233,7 @@ _xfs_buf_alloc(
>>   	init_completion(&bp->b_iowait);
>>   	INIT_LIST_HEAD(&bp->b_lru);
>>   	INIT_LIST_HEAD(&bp->b_list);
>> +	INIT_LIST_HEAD(&bp->b_delay);
>>   	INIT_LIST_HEAD(&bp->b_li_list);
>>   	sema_init(&bp->b_sema, 0); /* held, no waiters */
>>   	spin_lock_init(&bp->b_lock);
>> diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
>> index 6b0200b8007d..c51445705dc6 100644
>> --- a/fs/xfs/xfs_buf.h
>> +++ b/fs/xfs/xfs_buf.h
>> @@ -151,6 +151,7 @@ struct xfs_buf {
>>   	int			b_io_error;	/* internal IO error state */
>>   	wait_queue_head_t	b_waiters;	/* unpin waiters */
>>   	struct list_head	b_list;
>> +	struct list_head	b_delay;	/* delayed operations list */
>>   	struct xfs_perag	*b_pag;		/* contains rbtree root */
>>   	struct xfs_mount	*b_mount;
>>   	struct xfs_buftarg	*b_target;	/* buffer target (device) */
>> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
>> index 3f8a0713573a..046f21338c48 100644
>> --- a/fs/xfs/xfs_extfree_item.c
>> +++ b/fs/xfs/xfs_extfree_item.c
>> @@ -637,7 +637,7 @@ xfs_efi_item_recover(
>>   
>>   	}
>>   
>> -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
>> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
>>   
>>   abort_error:
>>   	xfs_trans_cancel(tp);
>> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
>> index 10562ecbd9ea..6a3c0bb16b69 100644
>> --- a/fs/xfs/xfs_log_recover.c
>> +++ b/fs/xfs/xfs_log_recover.c
>> @@ -2465,6 +2465,7 @@ xlog_finish_defer_ops(
>>   	struct list_head	*capture_list)
>>   {
>>   	struct xfs_defer_capture *dfc, *next;
>> +	struct xfs_buf		*bp, *bnext;
>>   	struct xfs_trans	*tp;
>>   	struct xfs_inode	*ip;
>>   	int			error = 0;
>> @@ -2489,6 +2490,12 @@ xlog_finish_defer_ops(
>>   			return error;
>>   		}
>>   
>> +		list_for_each_entry_safe(bp, bnext, &dfc->dfc_buffers, b_delay) {
>> +			xfs_trans_bjoin(tp, bp);
>> +			xfs_trans_bhold(tp, bp);
>> +			list_del_init(&bp->b_delay);
>> +		}
>> +
>>   		/*
>>   		 * Transfer to this new transaction all the dfops we captured
>>   		 * from recovering a single intent item.
>> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
>> index 46904b793bd4..a6e7351ca4f9 100644
>> --- a/fs/xfs/xfs_refcount_item.c
>> +++ b/fs/xfs/xfs_refcount_item.c
>> @@ -557,7 +557,7 @@ xfs_cui_item_recover(
>>   	}
>>   
>>   	xfs_refcount_finish_one_cleanup(tp, rcur, error);
>> -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
>> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
>>   
>>   abort_error:
>>   	xfs_refcount_finish_one_cleanup(tp, rcur, error);
>> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
>> index 5f0695980467..8c70a4af80a9 100644
>> --- a/fs/xfs/xfs_rmap_item.c
>> +++ b/fs/xfs/xfs_rmap_item.c
>> @@ -587,7 +587,7 @@ xfs_rui_item_recover(
>>   	}
>>   
>>   	xfs_rmap_finish_one_cleanup(tp, rcur, error);
>> -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
>> +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
>>   
>>   abort_error:
>>   	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> 
> 
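
The capture-and-rejoin pattern in the patch above can be modeled outside the
kernel. The following is only a user-space sketch under stated assumptions:
the list helpers are hand-written stand-ins for the kernel's list.h, and
struct trans, struct buf, struct capture, capture_buffer() and
rejoin_buffers() are hypothetical names that mirror, but are not, the XFS
types and functions. Setting b_transp and held stands in for
xfs_trans_bjoin() and xfs_trans_bhold().

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written stand-in for the kernel's intrusive list (not list.h itself). */
struct list_head {
	struct list_head *next, *prev;
};

static inline void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static inline void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static inline void list_del_init(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	init_list_head(item);
}

static inline int list_empty(const struct list_head *h)
{
	return h->next == h;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced models of xfs_trans, xfs_buf, xfs_defer_capture. */
struct trans { int id; };

struct buf {
	struct trans		*b_transp;	/* transaction the buffer is joined to */
	int			held;		/* models the "held" state */
	struct list_head	b_delay;	/* link in a capture's buffer list */
};

struct capture {
	struct list_head	dfc_buffers;
};

/* Capture bp only when it is joined to tp; returns 1 if it was captured. */
int capture_buffer(struct capture *dfc, struct trans *tp, struct buf *bp)
{
	if (!bp || bp->b_transp != tp)
		return 0;
	list_add_tail(&bp->b_delay, &dfc->dfc_buffers);
	return 1;
}

/* Rejoin every captured buffer to the new transaction; returns the count. */
int rejoin_buffers(struct capture *dfc, struct trans *tp)
{
	struct list_head *pos, *next;
	int n = 0;

	for (pos = dfc->dfc_buffers.next; pos != &dfc->dfc_buffers; pos = next) {
		struct buf *bp = container_of(pos, struct buf, b_delay);

		next = pos->next;		/* saved first: the entry is unlinked below */
		bp->b_transp = tp;		/* stands in for xfs_trans_bjoin() */
		bp->held = 1;			/* stands in for xfs_trans_bhold() */
		list_del_init(&bp->b_delay);
		n++;
	}
	return n;
}
```

Saving the next pointer before unlinking is what the _safe iterator variant
does in the patch; it lets each buffer be removed from dfc_buffers while the
list is being walked.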

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v24 03/11] xfs: Set up infrastructure for log attribute replay
  2021-08-27 14:17   ` Chandan Babu R
@ 2021-08-31 18:03     ` Allison Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-31 18:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs



On 8/27/21 7:17 AM, Chandan Babu R wrote:
> On 25 Aug 2021 at 04:14, Allison Henderson wrote:
>> Currently attributes are modified directly across one or more
>> transactions. But they are not logged or replayed in the event of an
>> error. The goal of log attr replay is to enable logging and replaying
>> of attribute operations using the existing delayed operations
>> infrastructure.  This will later enable the attributes to become part of
> >> larger multi-part operations that also must first be recorded to the
>> log.  This is mostly of interest in the scheme of parent pointers which
>> would need to maintain an attribute containing parent inode information
>> any time an inode is moved, created, or removed.  Parent pointers would
>> then be of interest to any feature that would need to quickly derive an
>> inode path from the mount point. Online scrub, nfs lookups and fs grow
>> or shrink operations are all features that could take advantage of this.
>>
>> This patch adds two new log item types for setting or removing
>> attributes as deferred operations.  The xfs_attri_log_item will log an
>> intent to set or remove an attribute.  The corresponding
>> xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
>> freed once the transaction is done.  Both log items use a generic
>> xfs_attr_log_format structure that contains the attribute name, value,
> >> flags, inode, and an op_flag that indicates if the operation is a set
>> or remove.
> 
> Apart from the minor nits mentioned below, the remaining changes look good to
> me.
> 
> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Great, thank you!
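
For readers following the commit message, the intent/done pairing can be
looked at in isolation. This is a user-space sketch only: the two structures
and the flag values are copied from the xfs_log_format.h hunk in this patch,
while attr_op_type() is a hypothetical helper added for illustration. The
size checks follow from the field widths and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined in this patch: lower byte is the type code. */
#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */

/* Intent: describes one attribute set or remove to be performed. */
struct xfs_attri_log_format {
	uint16_t	alfi_type;	/* attri log item type */
	uint16_t	alfi_size;	/* size of this item */
	uint32_t	__pad;		/* pad to 64 bit aligned */
	uint64_t	alfi_id;	/* attri identifier */
	uint64_t	alfi_ino;	/* the inode for this attr operation */
	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
	uint32_t	alfi_name_len;	/* attr name length */
	uint32_t	alfi_value_len;	/* attr value length */
	uint32_t	alfi_attr_flags;/* attr flags */
};

/* Done: refers back to the intent through its identifier. */
struct xfs_attrd_log_format {
	uint16_t	alfd_type;	/* attrd log item type */
	uint16_t	alfd_size;	/* size of this item */
	uint32_t	__pad;		/* pad to 64 bit aligned */
	uint64_t	alfd_alf_id;	/* id of corresponding attri */
};

/* Sizes implied by the field widths above (no implicit padding expected). */
_Static_assert(sizeof(struct xfs_attri_log_format) == 40, "attri format size");
_Static_assert(sizeof(struct xfs_attrd_log_format) == 16, "attrd format size");

/* Hypothetical helper: extract the operation type from the low byte. */
uint32_t attr_op_type(uint32_t op_flags)
{
	return op_flags & XFS_ATTR_OP_FLAGS_TYPE_MASK;
}
```

The name and value are not part of the format structure; they are logged as
separate regions, which is why the intent only records their lengths.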

> 
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/Makefile                 |   1 +
>>   fs/xfs/libxfs/xfs_attr.c        |   5 +-
>>   fs/xfs/libxfs/xfs_attr.h        |  31 +++
>>   fs/xfs/libxfs/xfs_defer.h       |   2 +
>>   fs/xfs/libxfs/xfs_log_format.h  |  44 +++-
>>   fs/xfs/libxfs/xfs_log_recover.h |   2 +
>>   fs/xfs/scrub/common.c           |   2 +
>>   fs/xfs/xfs_attr_item.c          | 453 ++++++++++++++++++++++++++++++++
>>   fs/xfs/xfs_attr_item.h          |  52 ++++
>>   fs/xfs/xfs_attr_list.c          |   1 +
>>   fs/xfs/xfs_ioctl32.c            |   2 +
>>   fs/xfs/xfs_iops.c               |   2 +
>>   fs/xfs/xfs_log.c                |   4 +
>>   fs/xfs/xfs_log_recover.c        |   2 +
>>   fs/xfs/xfs_ondisk.h             |   2 +
>>   15 files changed, 600 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
>> index 04611a1068b4..b056cfc6398e 100644
>> --- a/fs/xfs/Makefile
>> +++ b/fs/xfs/Makefile
>> @@ -102,6 +102,7 @@ xfs-y				+= xfs_log.o \
>>   				   xfs_buf_item_recover.o \
>>   				   xfs_dquot_item_recover.o \
>>   				   xfs_extfree_item.o \
>> +				   xfs_attr_item.o \
>>   				   xfs_icreate_item.o \
>>   				   xfs_inode_item.o \
>>   				   xfs_inode_item_recover.o \
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 50b91b4461e7..dfff81024e46 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -24,6 +24,7 @@
>>   #include "xfs_quota.h"
>>   #include "xfs_trans_space.h"
>>   #include "xfs_trace.h"
>> +#include "xfs_attr_item.h"
>>   
>>   /*
>>    * xfs_attr.c
>> @@ -61,8 +62,6 @@ STATIC int xfs_attr_node_hasname(xfs_da_args_t *args,
>>   				 struct xfs_da_state **state);
>>   STATIC int xfs_attr_fillstate(xfs_da_state_t *state);
>>   STATIC int xfs_attr_refillstate(xfs_da_state_t *state);
>> -STATIC int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>> -			     struct xfs_buf **leaf_bp);
>>   STATIC int xfs_attr_node_removename(struct xfs_da_args *args,
>>   				    struct xfs_da_state *state);
>>   
>> @@ -166,7 +165,7 @@ xfs_attr_get(
>>   /*
>>    * Calculate how many blocks we need for the new attribute,
>>    */
>> -STATIC int
>> +int
>>   xfs_attr_calc_size(
>>   	struct xfs_da_args	*args,
>>   	int			*local)
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index 5e71f719bdd5..aa33cdcf26b8 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -28,6 +28,11 @@ struct xfs_attr_list_context;
>>    */
>>   #define	ATTR_MAX_VALUELEN	(64*1024)	/* max length of a value */
>>   
>> +static inline bool xfs_has_larp(struct xfs_mount *mp)
>> +{
>> +	return false;
>> +}
>> +
>>   /*
>>    * Kernel-internal version of the attrlist cursor.
>>    */
>> @@ -454,6 +459,7 @@ enum xfs_delattr_state {
>>    */
>>   #define XFS_DAC_DEFER_FINISH		0x01 /* finish the transaction */
>>   #define XFS_DAC_LEAF_ADDNAME_INIT	0x02 /* xfs_attr_leaf_addname init*/
>> +#define XFS_DAC_DELAYED_OP_INIT		0x04 /* delayed operations init*/
> 
> A future patch in the series assigns a value of 0x02 to
> XFS_DAC_DELAYED_OP_INIT. Also, this macro is not used by any of the patches in
> this series.
> 
>>   
>>   /*
>>    * Context used for keeping track of delayed attribute operations
>> @@ -461,6 +467,11 @@ enum xfs_delattr_state {
>>   struct xfs_delattr_context {
>>   	struct xfs_da_args      *da_args;
>>   
>> +	/*
>> +	 * Used by xfs_attr_set to hold a leaf buffer across a transaction roll
>> +	 */
>> +	struct xfs_buf		*leaf_bp;
>> +
>>   	/* Used in xfs_attr_rmtval_set_blk to roll through allocating blocks */
>>   	struct xfs_bmbt_irec	map;
>>   	xfs_dablk_t		lblkno;
>> @@ -474,6 +485,23 @@ struct xfs_delattr_context {
>>   	enum xfs_delattr_state  dela_state;
>>   };
>>   
>> +/*
>> + * List of attrs to commit later.
>> + */
>> +struct xfs_attr_item {
>> +	struct xfs_delattr_context	xattri_dac;
>> +
>> +	/*
>> +	 * Indicates if the attr operation is a set or a remove
>> +	 * XFS_ATTR_OP_FLAGS_{SET,REMOVE}
>> +	 */
>> +	unsigned int			xattri_op_flags;
>> +
>> +	/* used to log this item to an intent */
>> +	struct list_head		xattri_list;
>> +};
>> +
>> +
>>   /*========================================================================
>>    * Function prototypes for the kernel.
>>    *========================================================================*/
>> @@ -490,10 +518,13 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
>>   int xfs_attr_get(struct xfs_da_args *args);
>>   int xfs_attr_set(struct xfs_da_args *args);
>>   int xfs_attr_set_args(struct xfs_da_args *args);
>> +int xfs_attr_set_iter(struct xfs_delattr_context *dac,
>> +		      struct xfs_buf **leaf_bp);
>>   int xfs_attr_remove_args(struct xfs_da_args *args);
>>   int xfs_attr_remove_iter(struct xfs_delattr_context *dac);
>>   bool xfs_attr_namecheck(const void *name, size_t length);
>>   void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>   			      struct xfs_da_args *args);
>> +int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>>   
>>   #endif	/* __XFS_ATTR_H__ */
>> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
>> index 739f70d72fd5..89719146c5eb 100644
>> --- a/fs/xfs/libxfs/xfs_defer.h
>> +++ b/fs/xfs/libxfs/xfs_defer.h
>> @@ -63,6 +63,8 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type;
>>   extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
>>   extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
>>   extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
>> +extern const struct xfs_defer_op_type xfs_attr_defer_type;
>> +
>>   
>>   /*
>>    * This structure enables a dfops user to detach the chain of deferred
>> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
>> index b322db523d65..3301c369e815 100644
>> --- a/fs/xfs/libxfs/xfs_log_format.h
>> +++ b/fs/xfs/libxfs/xfs_log_format.h
>> @@ -114,7 +114,12 @@ struct xfs_unmount_log_format {
>>   #define XLOG_REG_TYPE_CUD_FORMAT	24
>>   #define XLOG_REG_TYPE_BUI_FORMAT	25
>>   #define XLOG_REG_TYPE_BUD_FORMAT	26
>> -#define XLOG_REG_TYPE_MAX		26
>> +#define XLOG_REG_TYPE_ATTRI_FORMAT	27
>> +#define XLOG_REG_TYPE_ATTRD_FORMAT	28
>> +#define XLOG_REG_TYPE_ATTR_NAME	29
>> +#define XLOG_REG_TYPE_ATTR_VALUE	30
>> +#define XLOG_REG_TYPE_MAX		30
>> +
>>   
>>   /*
>>    * Flags to log operation header
>> @@ -237,6 +242,8 @@ typedef struct xfs_trans_header {
>>   #define	XFS_LI_CUD		0x1243
>>   #define	XFS_LI_BUI		0x1244	/* bmbt update intent */
>>   #define	XFS_LI_BUD		0x1245
>> +#define	XFS_LI_ATTRI		0x1246  /* attr set/remove intent*/
>> +#define	XFS_LI_ATTRD		0x1247  /* attr set/remove done */
>>   
>>   #define XFS_LI_TYPE_DESC \
>>   	{ XFS_LI_EFI,		"XFS_LI_EFI" }, \
>> @@ -252,7 +259,9 @@ typedef struct xfs_trans_header {
>>   	{ XFS_LI_CUI,		"XFS_LI_CUI" }, \
>>   	{ XFS_LI_CUD,		"XFS_LI_CUD" }, \
>>   	{ XFS_LI_BUI,		"XFS_LI_BUI" }, \
>> -	{ XFS_LI_BUD,		"XFS_LI_BUD" }
>> +	{ XFS_LI_BUD,		"XFS_LI_BUD" }, \
>> +	{ XFS_LI_ATTRI,		"XFS_LI_ATTRI" }, \
>> +	{ XFS_LI_ATTRD,		"XFS_LI_ATTRD" }
>>   
>>   /*
>>    * Inode Log Item Format definitions.
>> @@ -869,4 +878,35 @@ struct xfs_icreate_log {
>>   	__be32		icl_gen;	/* inode generation number to use */
>>   };
>>   
>> +/*
>> + * Flags for deferred attribute operations.
>> + * Upper bits are flags, lower byte is type code
>> + */
>> +#define XFS_ATTR_OP_FLAGS_SET		1	/* Set the attribute */
>> +#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* Remove the attribute */
>> +#define XFS_ATTR_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
>> +
>> +/*
>> + * This is the structure used to lay out an attr log item in the
>> + * log.
>> + */
>> +struct xfs_attri_log_format {
>> +	uint16_t	alfi_type;	/* attri log item type */
>> +	uint16_t	alfi_size;	/* size of this item */
>> +	uint32_t	__pad;		/* pad to 64 bit aligned */
>> +	uint64_t	alfi_id;	/* attri identifier */
>> +	uint64_t	alfi_ino;	/* the inode for this attr operation */
>> +	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
>> +	uint32_t	alfi_name_len;	/* attr name length */
>> +	uint32_t	alfi_value_len;	/* attr value length */
>> +	uint32_t	alfi_attr_flags;/* attr flags */
>> +};
>> +
>> +struct xfs_attrd_log_format {
>> +	uint16_t	alfd_type;	/* attrd log item type */
>> +	uint16_t	alfd_size;	/* size of this item */
>> +	uint32_t	__pad;		/* pad to 64 bit aligned */
>> +	uint64_t	alfd_alf_id;	/* id of corresponding attri */
>> +};
>> +
>>   #endif /* __XFS_LOG_FORMAT_H__ */
>> diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
>> index ff69a0000817..32e216255cb0 100644
>> --- a/fs/xfs/libxfs/xfs_log_recover.h
>> +++ b/fs/xfs/libxfs/xfs_log_recover.h
>> @@ -72,6 +72,8 @@ extern const struct xlog_recover_item_ops xlog_rui_item_ops;
>>   extern const struct xlog_recover_item_ops xlog_rud_item_ops;
>>   extern const struct xlog_recover_item_ops xlog_cui_item_ops;
>>   extern const struct xlog_recover_item_ops xlog_cud_item_ops;
>> +extern const struct xlog_recover_item_ops xlog_attri_item_ops;
>> +extern const struct xlog_recover_item_ops xlog_attrd_item_ops;
>>   
>>   /*
>>    * Macros, structures, prototypes for internal log manager use.
>> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
>> index bf1f3607d0b6..97b54ac3075f 100644
>> --- a/fs/xfs/scrub/common.c
>> +++ b/fs/xfs/scrub/common.c
>> @@ -23,6 +23,8 @@
>>   #include "xfs_rmap_btree.h"
>>   #include "xfs_log.h"
>>   #include "xfs_trans_priv.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_reflink.h"
>>   #include "xfs_ag.h"
>> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
>> new file mode 100644
>> index 000000000000..879a39ec58a6
>> --- /dev/null
>> +++ b/fs/xfs/xfs_attr_item.c
>> @@ -0,0 +1,453 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * Copyright (C) 2021 Oracle.  All Rights Reserved.
>> + * Author: Allison Collins <allison.henderson@oracle.com>
>> + */
>> +
> 
> Some of the header files included below are not required. I was able to
> compile successfully without including xfs_bit.h, xfs_bmap_btree.h,
> xfs_buf_item.h, xfs_btree.h, xfs_rmap.h, xfs_icache.h, xfs_alloc.h, xfs_bmap.h
> and xfs_quota.h. Also xfs_shared.h is included twice.

Ok, I will see if I can clean out some includes here.
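
Separately from the include list, the size accounting done by
ATTR_NVEC_SIZE() and xfs_attri_item_size() in the file quoted below can be
worked through in user space. This is a sketch of the code as posted, not a
statement of how the log sizes regions in general: round_up_pow2(),
attr_nvec_size() and attri_item_nbytes() are hypothetical names, and the
constant 40 assumes sizeof(struct xfs_attri_log_format) as laid out in this
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of align, which must be a power of two. */
static inline size_t round_up_pow2(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Mirrors ATTR_NVEC_SIZE() from the patch as posted. */
size_t attr_nvec_size(size_t size)
{
	return size == sizeof(int32_t) ? size :
	       sizeof(int32_t) + round_up_pow2(size, sizeof(int32_t));
}

/* Assumed size of struct xfs_attri_log_format from this patch. */
#define ATTRI_FORMAT_SIZE	40

/*
 * Mirrors the byte accounting in xfs_attri_item_size(): one region for the
 * format structure, one for the name, and one for the value if present.
 */
size_t attri_item_nbytes(size_t name_len, size_t value_len)
{
	size_t nbytes = ATTRI_FORMAT_SIZE;

	/* Attr set and remove operations require a name */
	assert(name_len > 0);
	nbytes += attr_nvec_size(name_len);

	if (value_len > 0)
		nbytes += attr_nvec_size(value_len);
	return nbytes;
}
```

With these definitions a 4-byte name is accounted as 4 bytes, while a 5-byte
name is accounted as 12 (4 plus 5 rounded up to 8).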

> 
>> +#include "xfs.h"
>> +#include "xfs_fs.h"
>> +#include "xfs_format.h"
>> +#include "xfs_trans_resv.h"
>> +#include "xfs_bit.h"
>> +#include "xfs_shared.h"
>> +#include "xfs_mount.h"
>> +#include "xfs_defer.h"
>> +#include "xfs_log_format.h"
>> +#include "xfs_trans.h"
>> +#include "xfs_bmap_btree.h"
>> +#include "xfs_trans_priv.h"
>> +#include "xfs_buf_item.h"
>> +#include "xfs_log.h"
>> +#include "xfs_btree.h"
>> +#include "xfs_rmap.h"
>> +#include "xfs_inode.h"
>> +#include "xfs_icache.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>> +#include "xfs_attr.h"
>> +#include "xfs_shared.h"
>> +#include "xfs_attr_item.h"
>> +#include "xfs_alloc.h"
>> +#include "xfs_bmap.h"
>> +#include "xfs_trace.h"
>> +#include "libxfs/xfs_da_format.h"
>> +#include "xfs_inode.h"
>> +#include "xfs_quota.h"
>> +#include "xfs_trans_space.h"
>> +#include "xfs_error.h"
>> +#include "xfs_log_priv.h"
>> +#include "xfs_log_recover.h"
>> +
>> +static const struct xfs_item_ops xfs_attri_item_ops;
>> +static const struct xfs_item_ops xfs_attrd_item_ops;
>> +
>> +/* iovec length must be 32-bit aligned */
>> +static inline size_t ATTR_NVEC_SIZE(size_t size)
>> +{
>> +	return size == sizeof(int32_t) ? size :
>> +	       sizeof(int32_t) + round_up(size, sizeof(int32_t));
>> +}
>> +
>> +static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
>> +{
>> +	return container_of(lip, struct xfs_attri_log_item, attri_item);
>> +}
>> +
>> +STATIC void
>> +xfs_attri_item_free(
>> +	struct xfs_attri_log_item	*attrip)
>> +{
>> +	kmem_free(attrip->attri_item.li_lv_shadow);
>> +	kmem_free(attrip);
>> +}
>> +
>> +/*
>> + * Freeing the attrip requires that we remove it from the AIL if it has already
>> + * been placed there. However, the ATTRI may not yet have been placed in the
>> + * AIL when called by xfs_attri_release() from ATTRD processing due to the
>> + * ordering of committed vs unpin operations in bulk insert operations. Hence
>> + * the reference count to ensure only the last caller frees the ATTRI.
>> + */
>> +STATIC void
>> +xfs_attri_release(
>> +	struct xfs_attri_log_item	*attrip)
>> +{
>> +	ASSERT(atomic_read(&attrip->attri_refcount) > 0);
>> +	if (atomic_dec_and_test(&attrip->attri_refcount)) {
>> +		xfs_trans_ail_delete(&attrip->attri_item,
>> +				     SHUTDOWN_LOG_IO_ERROR);
>> +		xfs_attri_item_free(attrip);
>> +	}
>> +}
>> +
>> +STATIC void
>> +xfs_attri_item_size(
>> +	struct xfs_log_item	*lip,
>> +	int			*nvecs,
>> +	int			*nbytes)
>> +{
>> +	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
>> +
>> +	*nvecs += 1;
>> +	*nbytes += sizeof(struct xfs_attri_log_format);
>> +
>> +	/* Attr set and remove operations require a name */
>> +	ASSERT(attrip->attri_name_len > 0);
>> +
>> +	*nvecs += 1;
>> +	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
>> +
>> +	if (attrip->attri_value_len > 0) {
>> +		*nvecs += 1;
>> +		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
>> +	}
>> +}
>> +
>> +/*
>> + * This is called to fill in the log iovecs for the given attri log
>> + * item. We use  1 iovec for the attri_format_item, 1 for the name, and
>> + * another for the value if it is present
>> + */
>> +STATIC void
>> +xfs_attri_item_format(
>> +	struct xfs_log_item	*lip,
>> +	struct xfs_log_vec	*lv)
>> +{
>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>> +	struct xfs_log_iovec		*vecp = NULL;
>> +
>> +	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
>> +	attrip->attri_format.alfi_size = 1;
>> +
>> +	/*
>> +	 * This size accounting must be done before copying the attrip into the
>> +	 * iovec.  If we do it after, the wrong size will be recorded to the log
>> +	 * and we trip across assertion checks for bad region sizes later during
>> +	 * the log recovery.
>> +	 */
>> +
>> +	ASSERT(attrip->attri_name_len > 0);
>> +	attrip->attri_format.alfi_size++;
>> +
>> +	if (attrip->attri_value_len > 0)
>> +		attrip->attri_format.alfi_size++;
>> +
>> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
>> +			&attrip->attri_format,
>> +			sizeof(struct xfs_attri_log_format));
>> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
>> +			attrip->attri_name,
>> +			ATTR_NVEC_SIZE(attrip->attri_name_len));
>> +	if (attrip->attri_value_len > 0)
>> +		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
>> +				attrip->attri_value,
>> +				ATTR_NVEC_SIZE(attrip->attri_value_len));
>> +}
>> +
>> +/*
>> + * The unpin operation is the last place an ATTRI is manipulated in the log. It
>> + * is either inserted in the AIL or aborted in the event of a log I/O error. In
>> + * either case, the ATTRI transaction has been successfully committed to make
>> + * it this far. Therefore, we expect whoever committed the ATTRI to either
>> + * construct and commit the ATTRD or drop the ATTRD's reference in the event of
>> + * error. Simply drop the log's ATTRI reference now that the log is done with
>> + * it.
>> + */
>> +STATIC void
>> +xfs_attri_item_unpin(
>> +	struct xfs_log_item	*lip,
>> +	int			remove)
>> +{
>> +	xfs_attri_release(ATTRI_ITEM(lip));
>> +}
>> +
>> +
>> +STATIC void
>> +xfs_attri_item_release(
>> +	struct xfs_log_item	*lip)
>> +{
>> +	xfs_attri_release(ATTRI_ITEM(lip));
>> +}
>> +
>> +/*
>> + * Allocate and initialize an attri item.  Caller may allocate an additional
>> + * trailing buffer of the specified size
>> + */
>> +STATIC struct xfs_attri_log_item *
>> +xfs_attri_init(
>> +	struct xfs_mount		*mp,
>> +	int				buffer_size)
>> +
>> +{
>> +	struct xfs_attri_log_item	*attrip;
>> +	uint				size;
>> +
>> +	size = sizeof(struct xfs_attri_log_item) + buffer_size;
>> +	attrip = kvmalloc(size, KM_ZERO);
>> +	if (attrip == NULL)
>> +		return NULL;
>> +
>> +	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
>> +			  &xfs_attri_item_ops);
>> +	attrip->attri_format.alfi_id = (uintptr_t)(void *)attrip;
>> +	atomic_set(&attrip->attri_refcount, 2);
>> +
>> +	return attrip;
>> +}
>> +
>> +/*
>> + * Copy an attr format buffer from the given buf, and into the destination attr
>> + * format structure.
>> + */
>> +STATIC int
>> +xfs_attri_copy_format(
>> +	struct xfs_log_iovec		*buf,
>> +	struct xfs_attri_log_format	*dst_attr_fmt)
>> +{
>> +	struct xfs_attri_log_format	*src_attr_fmt = buf->i_addr;
>> +	uint				len;
>> +
>> +	len = sizeof(struct xfs_attri_log_format);
>> +	if (buf->i_len != len) {
>> +		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, NULL);
>> +		return -EFSCORRUPTED;
>> +	}
>> +
>> +	memcpy((char *)dst_attr_fmt, (char *)src_attr_fmt, len);
>> +	return 0;
>> +}
>> +
>> +static inline struct xfs_attrd_log_item *ATTRD_ITEM(struct xfs_log_item *lip)
>> +{
>> +	return container_of(lip, struct xfs_attrd_log_item, attrd_item);
>> +}
>> +
>> +STATIC void
>> +xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
>> +{
>> +	kmem_free(attrdp->attrd_item.li_lv_shadow);
>> +	kmem_free(attrdp);
>> +}
>> +
>> +STATIC void
>> +xfs_attrd_item_size(
>> +	struct xfs_log_item		*lip,
>> +	int				*nvecs,
>> +	int				*nbytes)
>> +{
>> +	*nvecs += 1;
>> +	*nbytes += sizeof(struct xfs_attrd_log_format);
>> +}
>> +
>> +/*
>> + * This is called to fill in the log iovecs for the given attrd log item. We use
>> + * only 1 iovec for the attrd_format, and we point that at the attr_log_format
>> + * structure embedded in the attrd item.
>> + */
>> +STATIC void
>> +xfs_attrd_item_format(
>> +	struct xfs_log_item	*lip,
>> +	struct xfs_log_vec	*lv)
>> +{
>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>> +	struct xfs_log_iovec		*vecp = NULL;
>> +
>> +	attrdp->attrd_format.alfd_type = XFS_LI_ATTRD;
>> +	attrdp->attrd_format.alfd_size = 1;
>> +
>> +	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRD_FORMAT,
>> +			&attrdp->attrd_format,
>> +			sizeof(struct xfs_attrd_log_format));
>> +}
>> +
>> +/*
>> + * The ATTRD is either committed or aborted if the transaction is canceled. If
>> + * the transaction is canceled, drop our reference to the ATTRI and free the
>> + * ATTRD.
>> + */
>> +STATIC void
>> +xfs_attrd_item_release(
>> +	struct xfs_log_item		*lip)
>> +{
>> +	struct xfs_attrd_log_item	*attrdp = ATTRD_ITEM(lip);
>> +
>> +	xfs_attri_release(attrdp->attrd_attrip);
>> +	xfs_attrd_item_free(attrdp);
>> +}
>> +
>> +STATIC xfs_lsn_t
>> +xfs_attri_item_committed(
>> +	struct xfs_log_item		*lip,
>> +	xfs_lsn_t			lsn)
>> +{
>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>> +
>> +	/*
>> +	 * The attrip refers to xfs_attr_item memory to log the name and value
>> +	 * with the intent item. This already occurred when the intent was
>> +	 * committed so these fields are no longer accessed. Clear them out of
>> +	 * caution since we're about to free the xfs_attr_item.
>> +	 */
>> +	attrip->attri_name = NULL;
>> +	attrip->attri_value = NULL;
>> +
>> +	/*
>> +	 * The ATTRI is logged only once and cannot be moved in the log, so
>> +	 * simply return the lsn at which it's been logged.
>> +	 */
>> +	return lsn;
>> +}
>> +
>> +STATIC bool
>> +xfs_attri_item_match(
>> +	struct xfs_log_item	*lip,
>> +	uint64_t		intent_id)
>> +{
>> +	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
>> +}
>> +
>> +static const struct xfs_item_ops xfs_attrd_item_ops = {
>> +	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
>> +	.iop_size	= xfs_attrd_item_size,
>> +	.iop_format	= xfs_attrd_item_format,
>> +	.iop_release    = xfs_attrd_item_release,
>> +};
>> +
>> +/* Is this recovered ATTRI ok? */
>> +static inline bool
>> +xfs_attri_validate(
>> +	struct xfs_mount		*mp,
>> +	struct xfs_attri_log_item	*attrip)
>> +{
>> +	struct xfs_attri_log_format     *attrp = &attrip->attri_format;
>> +	unsigned int			op = attrp->alfi_op_flags &
>> +					     XFS_ATTR_OP_FLAGS_TYPE_MASK;
>> +
>> +	/* alfi_op_flags should be either a set or remove */
>> +	if (op != XFS_ATTR_OP_FLAGS_SET && op != XFS_ATTR_OP_FLAGS_REMOVE)
>> +		return false;
>> +
>> +	if (attrp->alfi_value_len > XATTR_SIZE_MAX)
>> +		return false;
>> +
>> +	if ((attrp->alfi_name_len > XATTR_NAME_MAX) ||
>> +	    (attrp->alfi_name_len == 0))
>> +		return false;
>> +
>> +	return xfs_verify_ino(mp, attrp->alfi_ino);
>> +}
>> +
>> +static const struct xfs_item_ops xfs_attri_item_ops = {
>> +	.iop_size	= xfs_attri_item_size,
>> +	.iop_format	= xfs_attri_item_format,
>> +	.iop_unpin	= xfs_attri_item_unpin,
>> +	.iop_committed	= xfs_attri_item_committed,
>> +	.iop_release    = xfs_attri_item_release,
>> +	.iop_match	= xfs_attri_item_match,
>> +};
>> +
>> +
>> +
>> +STATIC int
>> +xlog_recover_attri_commit_pass2(
>> +	struct xlog                     *log,
>> +	struct list_head		*buffer_list,
>> +	struct xlog_recover_item        *item,
>> +	xfs_lsn_t                       lsn)
>> +{
>> +	int                             error;
>> +	struct xfs_mount                *mp = log->l_mp;
>> +	struct xfs_attri_log_item       *attrip;
>> +	struct xfs_attri_log_format     *attri_formatp;
>> +	char				*name = NULL;
>> +	char				*value = NULL;
>> +	int				region = 0;
>> +	int				buffer_size;
>> +
>> +	attri_formatp = item->ri_buf[region].i_addr;
>> +
>> +	/* Validate xfs_attri_log_format */
>> +	if (attri_formatp->__pad != 0 || attri_formatp->alfi_name_len == 0 ||
>> +	    (attri_formatp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE &&
>> +	    attri_formatp->alfi_value_len != 0)) {
>> +		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
>> +		return -EFSCORRUPTED;
>> +	}
>> +
>> +	buffer_size = attri_formatp->alfi_name_len +
>> +		      attri_formatp->alfi_value_len;
>> +
>> +	attrip = xfs_attri_init(mp, buffer_size);
>> +	if (attrip == NULL)
>> +		return -ENOMEM;
>> +
>> +	error = xfs_attri_copy_format(&item->ri_buf[region],
>> +				      &attrip->attri_format);
>> +	if (error) {
>> +		xfs_attri_item_free(attrip);
>> +		return error;
>> +	}
>> +
>> +	attrip->attri_name_len = attri_formatp->alfi_name_len;
>> +	attrip->attri_value_len = attri_formatp->alfi_value_len;
>> +	region++;
>> +	name = ((char *)attrip) + sizeof(struct xfs_attri_log_item);
>> +	memcpy(name, item->ri_buf[region].i_addr, attrip->attri_name_len);
>> +	attrip->attri_name = name;
>> +
>> +	if (attrip->attri_value_len > 0) {
>> +		region++;
>> +		value = ((char *)attrip) + sizeof(struct xfs_attri_log_item) +
>> +			attrip->attri_name_len;
>> +		memcpy(value, item->ri_buf[region].i_addr,
>> +			attrip->attri_value_len);
>> +		attrip->attri_value = value;
>> +	}
>> +
>> +	/*
>> +	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
>> +	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
>> +	 * directly and drop the ATTRI reference. Note that
>> +	 * xfs_trans_ail_update() drops the AIL lock.
>> +	 */
>> +	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
>> +	xfs_attri_release(attrip);
>> +	return 0;
>> +}
>> +
>> +const struct xlog_recover_item_ops xlog_attri_item_ops = {
>> +	.item_type	= XFS_LI_ATTRI,
>> +	.commit_pass2	= xlog_recover_attri_commit_pass2,
>> +};
>> +
>> +/*
>> + * This routine is called when an ATTRD format structure is found in a committed
>> + * transaction in the log. Its purpose is to cancel the corresponding ATTRI if
>> + * it was still in the log. To do this it searches the AIL for the ATTRI with
>> + * an id equal to that in the ATTRD format structure. If we find it we drop
>> + * the ATTRD reference, which removes the ATTRI from the AIL and frees it.
>> + */
>> +STATIC int
>> +xlog_recover_attrd_commit_pass2(
>> +	struct xlog			*log,
>> +	struct list_head		*buffer_list,
>> +	struct xlog_recover_item	*item,
>> +	xfs_lsn_t			lsn)
>> +{
>> +	struct xfs_attrd_log_format	*attrd_formatp;
>> +
>> +	attrd_formatp = item->ri_buf[0].i_addr;
>> +	if (item->ri_buf[0].i_len != sizeof(struct xfs_attrd_log_format)) {
>> +		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, NULL);
>> +		return -EFSCORRUPTED;
>> +	}
>> +
>> +	xlog_recover_release_intent(log, XFS_LI_ATTRI,
>> +				    attrd_formatp->alfd_alf_id);
>> +	return 0;
>> +}
>> +
>> +const struct xlog_recover_item_ops xlog_attrd_item_ops = {
>> +	.item_type	= XFS_LI_ATTRD,
>> +	.commit_pass2	= xlog_recover_attrd_commit_pass2,
>> +};
>> diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
>> new file mode 100644
>> index 000000000000..ce33e9b5a9d3
>> --- /dev/null
>> +++ b/fs/xfs/xfs_attr_item.h
>> @@ -0,0 +1,52 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later
>> + *
>> + * Copyright (C) 2021 Oracle.  All Rights Reserved.
>> + * Author: Allison Collins <allison.henderson@oracle.com>
>> + */
>> +#ifndef	__XFS_ATTR_ITEM_H__
>> +#define	__XFS_ATTR_ITEM_H__
>> +
>> +/* kernel only ATTRI/ATTRD definitions */
>> +
>> +struct xfs_mount;
>> +struct kmem_zone;
>> +
>> +/*
>> + * Define ATTR flag bits. Manipulated by set/clear/test_bit operators.
>> + */
>> +#define	XFS_ATTRI_RECOVERED	1
> 
> XFS_ATTRI_RECOVERED isn't used in this patch nor in any of the future patches
> in this series.
I think the code that used this was removed in one of the older reviews.
Will clean out.

Thanks!
Allison
> 
>> +
>> +
>> +/*
>> + * This is the "attr intention" log item.  It is used to log the fact that some
>> + * attribute operations need to be processed.  An operation is currently either
>> + * a set or remove.  Set or remove operations are described by the xfs_attr_item
>> + * which may be logged to this intent.
>> + *
>> + * During a normal attr operation, name and value point to the name and value
>> + * fields of the calling functions xfs_da_args.  During a recovery, the name
>> + * and value buffers are copied from the log, and stored in a trailing buffer
>> + * attached to the xfs_attr_item until they are committed.  They are freed when
>> + * the xfs_attr_item itself is freed when the work is done.
>> + */
>> +struct xfs_attri_log_item {
>> +	struct xfs_log_item		attri_item;
>> +	atomic_t			attri_refcount;
>> +	int				attri_name_len;
>> +	int				attri_value_len;
>> +	void				*attri_name;
>> +	void				*attri_value;
>> +	struct xfs_attri_log_format	attri_format;
>> +};
>> +
>> +/*
>> + * This is the "attr done" log item.  It is used to log the fact that some attrs
>> + * earlier mentioned in an attri item have been freed.
>> + */
>> +struct xfs_attrd_log_item {
>> +	struct xfs_attri_log_item	*attrd_attrip;
>> +	struct xfs_log_item		attrd_item;
>> +	struct xfs_attrd_log_format	attrd_format;
>> +};
>> +
>> +#endif	/* __XFS_ATTR_ITEM_H__ */
>> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
>> index 2d1e5134cebe..90a14e85e76d 100644
>> --- a/fs/xfs/xfs_attr_list.c
>> +++ b/fs/xfs/xfs_attr_list.c
>> @@ -15,6 +15,7 @@
>>   #include "xfs_inode.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_bmap.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_attr_sf.h"
>>   #include "xfs_attr_leaf.h"
>> diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
>> index 8783af203cfc..ab543c5b1371 100644
>> --- a/fs/xfs/xfs_ioctl32.c
>> +++ b/fs/xfs/xfs_ioctl32.c
>> @@ -17,6 +17,8 @@
>>   #include "xfs_itable.h"
>>   #include "xfs_fsops.h"
>>   #include "xfs_rtalloc.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_ioctl.h"
>>   #include "xfs_ioctl32.h"
>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>> index 0ff0cca94092..f81e5d55341b 100644
>> --- a/fs/xfs/xfs_iops.c
>> +++ b/fs/xfs/xfs_iops.c
>> @@ -13,6 +13,8 @@
>>   #include "xfs_inode.h"
>>   #include "xfs_acl.h"
>>   #include "xfs_quota.h"
>> +#include "xfs_da_format.h"
>> +#include "xfs_da_btree.h"
>>   #include "xfs_attr.h"
>>   #include "xfs_trans.h"
>>   #include "xfs_trace.h"
>> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
>> index f6cd2d4aa770..4402c5d09269 100644
>> --- a/fs/xfs/xfs_log.c
>> +++ b/fs/xfs/xfs_log.c
>> @@ -2157,6 +2157,10 @@ xlog_print_tic_res(
>>   	    REG_TYPE_STR(CUD_FORMAT, "cud_format"),
>>   	    REG_TYPE_STR(BUI_FORMAT, "bui_format"),
>>   	    REG_TYPE_STR(BUD_FORMAT, "bud_format"),
>> +	    REG_TYPE_STR(ATTRI_FORMAT, "attri_format"),
>> +	    REG_TYPE_STR(ATTRD_FORMAT, "attrd_format"),
>> +	    REG_TYPE_STR(ATTR_NAME, "attr name"),
>> +	    REG_TYPE_STR(ATTR_VALUE, "attr value"),
>>   	};
>>   	BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1);
>>   #undef REG_TYPE_STR
>> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
>> index 6a3c0bb16b69..c248fd5cf9ea 100644
>> --- a/fs/xfs/xfs_log_recover.c
>> +++ b/fs/xfs/xfs_log_recover.c
>> @@ -1800,6 +1800,8 @@ static const struct xlog_recover_item_ops *xlog_recover_item_ops[] = {
>>   	&xlog_cud_item_ops,
>>   	&xlog_bui_item_ops,
>>   	&xlog_bud_item_ops,
>> +	&xlog_attri_item_ops,
>> +	&xlog_attrd_item_ops,
>>   };
>>   
>>   static const struct xlog_recover_item_ops *
>> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
>> index 25991923c1a8..758702b9495f 100644
>> --- a/fs/xfs/xfs_ondisk.h
>> +++ b/fs/xfs/xfs_ondisk.h
>> @@ -132,6 +132,8 @@ xfs_check_ondisk_structs(void)
>>   	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
>>   	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
>>   	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
>> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
>> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
>>   
>>   	/*
>>   	 * The v5 superblock format extended several v4 header structures with
> 
> 
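For anyone following the recovery path above: the ATTRD pass2 handler cancels a
pending ATTRI by matching the id recorded in the ATTRD against the intents still
tracked in the AIL. Below is a minimal userspace sketch of that matching,
assuming a plain array in place of the AIL. The names toy_intent and
toy_release_intent are hypothetical and are not kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an intent item tracked in the AIL. */
struct toy_intent {
	uint64_t	id;		/* plays the role of alfi_id */
	bool		pending;	/* still waiting for a done item */
};

/*
 * Cancel the pending intent whose id matches the id recorded in a done
 * item.  Returns true if a matching intent was found and released.
 */
static bool
toy_release_intent(
	struct toy_intent	*intents,
	size_t			count,
	uint64_t		done_id)
{
	size_t			i;

	for (i = 0; i < count; i++) {
		if (intents[i].pending && intents[i].id == done_id) {
			intents[i].pending = false;
			return true;
		}
	}
	return false;
}
```

In this sketch, any intent still marked pending after all done items have been
processed is one that would need to be replayed.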

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v24 04/11] xfs: Implement attr logging and replay
  2021-08-30  7:47   ` Chandan Babu R
@ 2021-08-31 18:11     ` Allison Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-31 18:11 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs



On 8/30/21 12:47 AM, Chandan Babu R wrote:
> On 25 Aug 2021 at 04:14, Allison Henderson wrote:
>> This patch adds the needed routines to create, log and recover logged
>> extended attribute intents.
>>
> 
> Apart from the two nits mentioned below, the remaining changes seem to
> correctly handle attr set/remove operations regardless of whether delayed
> logging is enabled or not.
> 
> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Ok, thank you!

> 
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_defer.c  |   1 +
>>   fs/xfs/libxfs/xfs_defer.h  |   1 +
>>   fs/xfs/libxfs/xfs_format.h |  10 +-
>>   fs/xfs/xfs_attr_item.c     | 358 +++++++++++++++++++++++++++++++++++++
>>   4 files changed, 369 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
>> index d1d09b6aca55..01fcf5e93be5 100644
>> --- a/fs/xfs/libxfs/xfs_defer.c
>> +++ b/fs/xfs/libxfs/xfs_defer.c
>> @@ -178,6 +178,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
>>   	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
>>   	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
>>   	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
>> +	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
>>   };
>>   
>>   static void
>> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
>> index 89719146c5eb..d70525c57b5c 100644
>> --- a/fs/xfs/libxfs/xfs_defer.h
>> +++ b/fs/xfs/libxfs/xfs_defer.h
>> @@ -19,6 +19,7 @@ enum xfs_defer_ops_type {
>>   	XFS_DEFER_OPS_TYPE_RMAP,
>>   	XFS_DEFER_OPS_TYPE_FREE,
>>   	XFS_DEFER_OPS_TYPE_AGFL_FREE,
>> +	XFS_DEFER_OPS_TYPE_ATTR,
>>   	XFS_DEFER_OPS_TYPE_MAX,
>>   };
>>   
>> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
>> index 2d7057b7984b..2e0937bbff6d 100644
>> --- a/fs/xfs/libxfs/xfs_format.h
>> +++ b/fs/xfs/libxfs/xfs_format.h
>> @@ -388,7 +388,9 @@ xfs_sb_has_incompat_feature(
>>   	return (sbp->sb_features_incompat & feature) != 0;
>>   }
>>   
>> -#define XFS_SB_FEAT_INCOMPAT_LOG_ALL 0
>> +#define XFS_SB_FEAT_INCOMPAT_LOG_XATTRS   (1 << 0)	/* Delayed Attributes */
>> +#define XFS_SB_FEAT_INCOMPAT_LOG_ALL \
>> +	(XFS_SB_FEAT_INCOMPAT_LOG_XATTRS)
>>   #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_LOG_ALL
>>   static inline bool
>>   xfs_sb_has_incompat_log_feature(
>> @@ -413,6 +415,12 @@ xfs_sb_add_incompat_log_features(
>>   	sbp->sb_features_log_incompat |= features;
>>   }
>>
> 
> To be consistent with the existing convention, maybe the function below
> should be renamed to xfs_sb_version_haslogxattrs()?
Sure, will add xfs_ prefix

> 
>> +static inline bool sb_version_haslogxattrs(struct xfs_sb *sbp)
>> +{
>> +	return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
> 
> The above comparison can be replaced with a call to xfs_sb_is_v5().
Ok, will update. Thanks!
Allison

> 
>> +		(sbp->sb_features_log_incompat &
>> +		 XFS_SB_FEAT_INCOMPAT_LOG_XATTRS);
>> +}
>>   
>>   static inline bool
>>   xfs_is_quota_inode(struct xfs_sb *sbp, xfs_ino_t ino)
>> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
>> index 879a39ec58a6..c6d5ed34b424 100644
>> --- a/fs/xfs/xfs_attr_item.c
>> +++ b/fs/xfs/xfs_attr_item.c
>> @@ -275,6 +275,163 @@ xfs_attrd_item_release(
>>   	xfs_attrd_item_free(attrdp);
>>   }
>>   
>> +/*
>> + * Performs one step of an attribute update intent and marks the attrd item
>> + * dirty.  An attr operation may be a set or a remove.  Note that, to support
>> + * the ATTRI/ATTRD lifecycle rules, the transaction is marked dirty regardless
>> + * of whether the operation succeeds or fails.
>> + */
>> +STATIC int
>> +xfs_trans_attr_finish_update(
>> +	struct xfs_delattr_context	*dac,
>> +	struct xfs_attrd_log_item	*attrdp,
>> +	struct xfs_buf			**leaf_bp,
>> +	uint32_t			op_flags)
>> +{
>> +	struct xfs_da_args		*args = dac->da_args;
>> +	unsigned int			op = op_flags &
>> +					     XFS_ATTR_OP_FLAGS_TYPE_MASK;
>> +	int				error;
>> +
>> +	switch (op) {
>> +	case XFS_ATTR_OP_FLAGS_SET:
>> +		error = xfs_attr_set_iter(dac, leaf_bp);
>> +		break;
>> +	case XFS_ATTR_OP_FLAGS_REMOVE:
>> +		ASSERT(XFS_IFORK_Q(args->dp));
>> +		error = xfs_attr_remove_iter(dac);
>> +		break;
>> +	default:
>> +		error = -EFSCORRUPTED;
>> +		break;
>> +	}
>> +
>> +	/*
>> +	 * Mark the transaction dirty, even on error. This ensures the
>> +	 * transaction is aborted, which:
>> +	 *
>> +	 * 1.) releases the ATTRI and frees the ATTRD
>> +	 * 2.) shuts down the filesystem
>> +	 */
>> +	args->trans->t_flags |= XFS_TRANS_DIRTY;
>> +
>> +	/*
>> +	 * attr intent/done items are null when delayed attributes are disabled
>> +	 */
>> +	if (attrdp)
>> +		set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
>> +
>> +	return error;
>> +}
>> +
>> +/* Log an attr to the intent item. */
>> +STATIC void
>> +xfs_attr_log_item(
>> +	struct xfs_trans		*tp,
>> +	struct xfs_attri_log_item	*attrip,
>> +	struct xfs_attr_item		*attr)
>> +{
>> +	struct xfs_attri_log_format	*attrp;
>> +
>> +	tp->t_flags |= XFS_TRANS_DIRTY;
>> +	set_bit(XFS_LI_DIRTY, &attrip->attri_item.li_flags);
>> +
>> +	/*
>> +	 * At this point the xfs_attr_item has been constructed, and we've
>> +	 * created the log intent. Fill in the attri log item and log format
>> +	 * structure with fields from this xfs_attr_item
>> +	 */
>> +	attrp = &attrip->attri_format;
>> +	attrp->alfi_ino = attr->xattri_dac.da_args->dp->i_ino;
>> +	attrp->alfi_op_flags = attr->xattri_op_flags;
>> +	attrp->alfi_value_len = attr->xattri_dac.da_args->valuelen;
>> +	attrp->alfi_name_len = attr->xattri_dac.da_args->namelen;
>> +	attrp->alfi_attr_flags = attr->xattri_dac.da_args->attr_filter;
>> +
>> +	attrip->attri_name = (void *)attr->xattri_dac.da_args->name;
>> +	attrip->attri_value = attr->xattri_dac.da_args->value;
>> +	attrip->attri_name_len = attr->xattri_dac.da_args->namelen;
>> +	attrip->attri_value_len = attr->xattri_dac.da_args->valuelen;
>> +}
>> +
>> +/* Get an ATTRI. */
>> +static struct xfs_log_item *
>> +xfs_attr_create_intent(
>> +	struct xfs_trans		*tp,
>> +	struct list_head		*items,
>> +	unsigned int			count,
>> +	bool				sort)
>> +{
>> +	struct xfs_mount		*mp = tp->t_mountp;
>> +	struct xfs_attri_log_item	*attrip;
>> +	struct xfs_attr_item		*attr;
>> +
>> +	ASSERT(count == 1);
>> +
>> +	if (!sb_version_haslogxattrs(&mp->m_sb))
>> +		return NULL;
>> +
>> +	attrip = xfs_attri_init(mp, 0);
>> +	if (attrip == NULL)
>> +		return NULL;
>> +
>> +	xfs_trans_add_item(tp, &attrip->attri_item);
>> +	list_for_each_entry(attr, items, xattri_list)
>> +		xfs_attr_log_item(tp, attrip, attr);
>> +	return &attrip->attri_item;
>> +}
>> +
>> +/* Process an attr. */
>> +STATIC int
>> +xfs_attr_finish_item(
>> +	struct xfs_trans		*tp,
>> +	struct xfs_log_item		*done,
>> +	struct list_head		*item,
>> +	struct xfs_btree_cur		**state)
>> +{
>> +	struct xfs_attr_item		*attr;
>> +	struct xfs_attrd_log_item	*done_item = NULL;
>> +	int				error;
>> +	struct xfs_delattr_context	*dac;
>> +
>> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
>> +	dac = &attr->xattri_dac;
>> +	if (done)
>> +		done_item = ATTRD_ITEM(done);
>> +
>> +	/*
>> +	 * Always reset trans after EAGAIN cycle
>> +	 * since the transaction is new
>> +	 */
>> +	dac->da_args->trans = tp;
>> +
>> +	error = xfs_trans_attr_finish_update(dac, done_item, &dac->leaf_bp,
>> +					     attr->xattri_op_flags);
>> +	if (error != -EAGAIN)
>> +		kmem_free(attr);
>> +
>> +	return error;
>> +}
>> +
>> +/* Abort all pending ATTRs. */
>> +STATIC void
>> +xfs_attr_abort_intent(
>> +	struct xfs_log_item		*intent)
>> +{
>> +	xfs_attri_release(ATTRI_ITEM(intent));
>> +}
>> +
>> +/* Cancel an attr */
>> +STATIC void
>> +xfs_attr_cancel_item(
>> +	struct list_head		*item)
>> +{
>> +	struct xfs_attr_item		*attr;
>> +
>> +	attr = container_of(item, struct xfs_attr_item, xattri_list);
>> +	kmem_free(attr);
>> +}
>> +
>>   STATIC xfs_lsn_t
>>   xfs_attri_item_committed(
>>   	struct xfs_log_item		*lip,
>> @@ -306,6 +463,30 @@ xfs_attri_item_match(
>>   	return ATTRI_ITEM(lip)->attri_format.alfi_id == intent_id;
>>   }
>>   
>> +/*
>> + * This routine is called to allocate an "attr free done" log item.
>> + */
>> +static struct xfs_attrd_log_item *
>> +xfs_trans_get_attrd(struct xfs_trans		*tp,
>> +		  struct xfs_attri_log_item	*attrip)
>> +{
>> +	struct xfs_attrd_log_item		*attrdp;
>> +	uint					size;
>> +
>> +	ASSERT(tp != NULL);
>> +
>> +	size = sizeof(struct xfs_attrd_log_item);
>> +	attrdp = kmem_zalloc(size, 0);
>> +
>> +	xfs_log_item_init(tp->t_mountp, &attrdp->attrd_item, XFS_LI_ATTRD,
>> +			  &xfs_attrd_item_ops);
>> +	attrdp->attrd_attrip = attrip;
>> +	attrdp->attrd_format.alfd_alf_id = attrip->attri_format.alfi_id;
>> +
>> +	xfs_trans_add_item(tp, &attrdp->attrd_item);
>> +	return attrdp;
>> +}
>> +
>>   static const struct xfs_item_ops xfs_attrd_item_ops = {
>>   	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
>>   	.iop_size	= xfs_attrd_item_size,
>> @@ -313,6 +494,29 @@ static const struct xfs_item_ops xfs_attrd_item_ops = {
>>   	.iop_release    = xfs_attrd_item_release,
>>   };
>>   
>> +
>> +/* Get an ATTRD so we can process all the attrs. */
>> +static struct xfs_log_item *
>> +xfs_attr_create_done(
>> +	struct xfs_trans		*tp,
>> +	struct xfs_log_item		*intent,
>> +	unsigned int			count)
>> +{
>> +	if (!intent)
>> +		return NULL;
>> +
>> +	return &xfs_trans_get_attrd(tp, ATTRI_ITEM(intent))->attrd_item;
>> +}
>> +
>> +const struct xfs_defer_op_type xfs_attr_defer_type = {
>> +	.max_items	= 1,
>> +	.create_intent	= xfs_attr_create_intent,
>> +	.abort_intent	= xfs_attr_abort_intent,
>> +	.create_done	= xfs_attr_create_done,
>> +	.finish_item	= xfs_attr_finish_item,
>> +	.cancel_item	= xfs_attr_cancel_item,
>> +};
>> +
>>   /* Is this recovered ATTRI ok? */
>>   static inline bool
>>   xfs_attri_validate(
>> @@ -337,13 +541,167 @@ xfs_attri_validate(
>>   	return xfs_verify_ino(mp, attrp->alfi_ino);
>>   }
>>   
>> +/*
>> + * Process an attr intent item that was recovered from the log.  We need to
>> + * perform the attr operation (set or remove) that it describes.
>> + */
>> +STATIC int
>> +xfs_attri_item_recover(
>> +	struct xfs_log_item		*lip,
>> +	struct list_head		*capture_list)
>> +{
>> +	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
>> +	struct xfs_attr_item		*attr;
>> +	struct xfs_mount		*mp = lip->li_mountp;
>> +	struct xfs_inode		*ip;
>> +	struct xfs_da_args		*args;
>> +	struct xfs_trans		*tp;
>> +	struct xfs_trans_res		tres;
>> +	struct xfs_attri_log_format	*attrp;
>> +	int				error, ret = 0;
>> +	int				total;
>> +	int				local;
>> +	struct xfs_attrd_log_item	*done_item = NULL;
>> +
>> +	/*
>> +	 * First check the validity of the attr described by the ATTRI.  If any
>> +	 * are bad, then assume that all are bad and just toss the ATTRI.
>> +	 */
>> +	attrp = &attrip->attri_format;
>> +	if (!xfs_attri_validate(mp, attrip))
>> +		return -EFSCORRUPTED;
>> +
>> +	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
>> +	if (error)
>> +		return error;
>> +
>> +	attr = kmem_zalloc(sizeof(struct xfs_attr_item) +
>> +			   sizeof(struct xfs_da_args), KM_NOFS);
>> +	args = (struct xfs_da_args *)(attr + 1);
>> +
>> +	attr->xattri_dac.da_args = args;
>> +	attr->xattri_op_flags = attrp->alfi_op_flags;
>> +
>> +	args->dp = ip;
>> +	args->geo = mp->m_attr_geo;
>> +	args->op_flags = attrp->alfi_op_flags;
>> +	args->whichfork = XFS_ATTR_FORK;
>> +	args->name = attrip->attri_name;
>> +	args->namelen = attrp->alfi_name_len;
>> +	args->hashval = xfs_da_hashname(args->name, args->namelen);
>> +	args->attr_filter = attrp->alfi_attr_flags;
>> +
>> +	if (attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET) {
>> +		args->value = attrip->attri_value;
>> +		args->valuelen = attrp->alfi_value_len;
>> +		args->total = xfs_attr_calc_size(args, &local);
>> +
>> +		tres.tr_logres = M_RES(mp)->tr_attrsetm.tr_logres +
>> +				 M_RES(mp)->tr_attrsetrt.tr_logres *
>> +					args->total;
>> +		tres.tr_logcount = XFS_ATTRSET_LOG_COUNT;
>> +		tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
>> +		total = args->total;
>> +	} else {
>> +		tres = M_RES(mp)->tr_attrrm;
>> +		total = XFS_ATTRRM_SPACE_RES(mp);
>> +	}
>> +	error = xfs_trans_alloc(mp, &tres, total, 0, XFS_TRANS_RESERVE, &tp);
>> +	if (error)
>> +		goto out;
>> +
>> +	args->trans = tp;
>> +	done_item = xfs_trans_get_attrd(tp, attrip);
>> +
>> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
>> +	xfs_trans_ijoin(tp, ip, 0);
>> +
>> +	ret = xfs_trans_attr_finish_update(&attr->xattri_dac, done_item,
>> +					   &attr->xattri_dac.leaf_bp,
>> +					   attrp->alfi_op_flags);
>> +	if (ret == -EAGAIN) {
>> +		/* There's more work to do, so add it to this transaction */
>> +		xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
>> +	} else
>> +		error = ret;
>> +
>> +	if (error) {
>> +		xfs_trans_cancel(tp);
>> +		goto out_unlock;
>> +	}
>> +
>> +	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list,
>> +						 attr->xattri_dac.leaf_bp);
>> +
>> +out_unlock:
>> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>> +	xfs_irele(ip);
>> +out:
>> +	if (ret != -EAGAIN)
>> +		kmem_free(attr);
>> +	return error;
>> +}
>> +
>> +/* Re-log an intent item to push the log tail forward. */
>> +static struct xfs_log_item *
>> +xfs_attri_item_relog(
>> +	struct xfs_log_item		*intent,
>> +	struct xfs_trans		*tp)
>> +{
>> +	struct xfs_attrd_log_item	*attrdp;
>> +	struct xfs_attri_log_item	*old_attrip;
>> +	struct xfs_attri_log_item	*new_attrip;
>> +	struct xfs_attri_log_format	*new_attrp;
>> +	struct xfs_attri_log_format	*old_attrp;
>> +	int				buffer_size;
>> +
>> +	old_attrip = ATTRI_ITEM(intent);
>> +	old_attrp = &old_attrip->attri_format;
>> +	buffer_size = old_attrp->alfi_value_len + old_attrp->alfi_name_len;
>> +
>> +	tp->t_flags |= XFS_TRANS_DIRTY;
>> +	attrdp = xfs_trans_get_attrd(tp, old_attrip);
>> +	set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
>> +
>> +	new_attrip = xfs_attri_init(tp->t_mountp, buffer_size);
>> +	new_attrp = &new_attrip->attri_format;
>> +
>> +	new_attrp->alfi_ino = old_attrp->alfi_ino;
>> +	new_attrp->alfi_op_flags = old_attrp->alfi_op_flags;
>> +	new_attrp->alfi_value_len = old_attrp->alfi_value_len;
>> +	new_attrp->alfi_name_len = old_attrp->alfi_name_len;
>> +	new_attrp->alfi_attr_flags = old_attrp->alfi_attr_flags;
>> +
>> +	new_attrip->attri_name_len = old_attrip->attri_name_len;
>> +	new_attrip->attri_name = ((char *)new_attrip) +
>> +				 sizeof(struct xfs_attri_log_item);
>> +	memcpy(new_attrip->attri_name, old_attrip->attri_name,
>> +		new_attrip->attri_name_len);
>> +
>> +	new_attrip->attri_value_len = old_attrip->attri_value_len;
>> +	if (new_attrip->attri_value_len > 0) {
>> +		new_attrip->attri_value = new_attrip->attri_name +
>> +					  new_attrip->attri_name_len;
>> +
>> +		memcpy(new_attrip->attri_value, old_attrip->attri_value,
>> +		       new_attrip->attri_value_len);
>> +	}
>> +
>> +	xfs_trans_add_item(tp, &new_attrip->attri_item);
>> +	set_bit(XFS_LI_DIRTY, &new_attrip->attri_item.li_flags);
>> +
>> +	return &new_attrip->attri_item;
>> +}
>> +
>>   static const struct xfs_item_ops xfs_attri_item_ops = {
>>   	.iop_size	= xfs_attri_item_size,
>>   	.iop_format	= xfs_attri_item_format,
>>   	.iop_unpin	= xfs_attri_item_unpin,
>>   	.iop_committed	= xfs_attri_item_committed,
>>   	.iop_release    = xfs_attri_item_release,
>> +	.iop_recover	= xfs_attri_item_recover,
>>   	.iop_match	= xfs_attri_item_match,
>> +	.iop_relog	= xfs_attri_item_relog,
>>   };
> 
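The relog path above copies the name and value into a buffer that trails the
new log item, with the value placed directly after the name. A standalone
sketch of that layout follows, assuming userspace calloc() in place of
xfs_attri_init(). The struct toy_attri and toy_attri_alloc() are hypothetical
names used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical item with the name/value stored in a trailing buffer. */
struct toy_attri {
	size_t	name_len;
	size_t	value_len;
	char	*name;		/* points just past the struct */
	char	*value;		/* points just past the name, or NULL */
};

/* Allocate the item and its trailing buffer in one allocation. */
static struct toy_attri *
toy_attri_alloc(
	const void		*name,
	size_t			name_len,
	const void		*value,
	size_t			value_len)
{
	struct toy_attri	*ip;

	ip = calloc(1, sizeof(*ip) + name_len + value_len);
	if (!ip)
		return NULL;

	ip->name_len = name_len;
	ip->name = (char *)(ip + 1);
	memcpy(ip->name, name, name_len);

	ip->value_len = value_len;
	if (value_len > 0) {
		ip->value = ip->name + name_len;
		memcpy(ip->value, value, value_len);
	}
	return ip;
}
```

A single free() of the item releases the name and value as well, which is the
property the trailing buffer is meant to give.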
> 
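As a rough illustration of the two nits above (the xfs_ prefix and the v5
check through a helper), here is a standalone sketch that compiles outside the
kernel. The struct, the version mask and version number values, and
toy_sb_is_v5() are stand-ins assumed for illustration. Only the feature bit
value (1 << 0) is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_SB_VERSION_NUMBITS		0x000f		/* assumed version mask */
#define TOY_SB_VERSION_5		5		/* assumed v5 number */
#define TOY_SB_FEAT_INCOMPAT_LOG_XATTRS	(1u << 0)	/* bit from the patch */

/* Hypothetical cut-down superblock with only the fields used here. */
struct toy_sb {
	uint16_t	sb_versionnum;
	uint32_t	sb_features_log_incompat;
};

/* Stand-in for the v5 check the review suggests calling. */
static inline bool toy_sb_is_v5(const struct toy_sb *sbp)
{
	return (sbp->sb_versionnum & TOY_SB_VERSION_NUMBITS) ==
		TOY_SB_VERSION_5;
}

/* Logged xattrs need a v5 superblock and the log incompat bit set. */
static inline bool toy_sb_version_haslogxattrs(const struct toy_sb *sbp)
{
	return toy_sb_is_v5(sbp) &&
		(sbp->sb_features_log_incompat &
		 TOY_SB_FEAT_INCOMPAT_LOG_XATTRS);
}
```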

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v24 05/11] RFC xfs: Skip flip flags for delayed attrs
  2021-08-30 10:15   ` Chandan Babu R
@ 2021-08-31 18:11     ` Allison Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-31 18:11 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs



On 8/30/21 3:15 AM, Chandan Babu R wrote:
> On 25 Aug 2021 at 04:14, Allison Henderson wrote:
>> This is a clean up patch that skips the flip flag logic for delayed attr
>> renames.  Since the log replay keeps the inode locked, we do not need to
>> worry about race windows with attr lookups.  So we can skip over
>> flipping the flag and the extra transaction roll for it.
>>
>> RFC: In the last review, folks asked for some performance analysis, so I
>> did a few perf captures with and without this patch.  What I found was
>> that there wasn't very much difference at all between having the patch or
>> not having it.  Of the time we do spend in the affected code, the
>> percentage is small.  Most of the time we spend about 0.03% of the time
>> in this function, with or without the patch.  Occasionally we get a
>> 0.02%, though not often.  So I think this starts to challenge needing
>> this patch at all. This patch was requested some number of reviews ago,
>> but perhaps in light of the findings, it may no longer be of interest.
>>
>>       0.03%     0.00%  fsstress  [xfs]               [k] xfs_attr_set_iter
>>
>> Keep it or drop it?
> 
> Looks good to me.
> 
> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Ok, thank you!
Allison

> 
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c      | 54 +++++++++++++++++++++--------------
>>   fs/xfs/libxfs/xfs_attr_leaf.c |  3 +-
>>   2 files changed, 35 insertions(+), 22 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index dfff81024e46..fce67c717be2 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -355,6 +355,7 @@ xfs_attr_set_iter(
>>   	struct xfs_inode		*dp = args->dp;
>>   	struct xfs_buf			*bp = NULL;
>>   	int				forkoff, error = 0;
>> +	struct xfs_mount		*mp = args->dp->i_mount;
>>   
>>   	/* State machine switch */
>>   	switch (dac->dela_state) {
>> @@ -477,16 +478,21 @@ xfs_attr_set_iter(
>>   		 * In a separate transaction, set the incomplete flag on the
>>   		 * "old" attr and clear the incomplete flag on the "new" attr.
>>   		 */
>> -		error = xfs_attr3_leaf_flipflags(args);
>> -		if (error)
>> -			return error;
>> -		/*
>> -		 * Commit the flag value change and start the next trans in
>> -		 * series.
>> -		 */
>> -		dac->dela_state = XFS_DAS_FLIP_LFLAG;
>> -		trace_xfs_attr_set_iter_return(dac->dela_state, args->dp);
>> -		return -EAGAIN;
>> +		if (!xfs_has_larp(mp)) {
>> +			error = xfs_attr3_leaf_flipflags(args);
>> +			if (error)
>> +				return error;
>> +			/*
>> +			 * Commit the flag value change and start the next trans
>> +			 * in series.
>> +			 */
>> +			dac->dela_state = XFS_DAS_FLIP_LFLAG;
>> +			trace_xfs_attr_set_iter_return(dac->dela_state,
>> +						       args->dp);
>> +			return -EAGAIN;
>> +		}
>> +
>> +		/* fallthrough */
>>   	case XFS_DAS_FLIP_LFLAG:
>>   		/*
>>   		 * Dismantle the "old" attribute/value pair by removing a
>> @@ -589,17 +595,21 @@ xfs_attr_set_iter(
>>   		 * In a separate transaction, set the incomplete flag on the
>>   		 * "old" attr and clear the incomplete flag on the "new" attr.
>>   		 */
>> -		error = xfs_attr3_leaf_flipflags(args);
>> -		if (error)
>> -			goto out;
>> -		/*
>> -		 * Commit the flag value change and start the next trans in
>> -		 * series
>> -		 */
>> -		dac->dela_state = XFS_DAS_FLIP_NFLAG;
>> -		trace_xfs_attr_set_iter_return(dac->dela_state, args->dp);
>> -		return -EAGAIN;
>> +		if (!xfs_has_larp(mp)) {
>> +			error = xfs_attr3_leaf_flipflags(args);
>> +			if (error)
>> +				goto out;
>> +			/*
>> +			 * Commit the flag value change and start the next trans
>> +			 * in series
>> +			 */
>> +			dac->dela_state = XFS_DAS_FLIP_NFLAG;
>> +			trace_xfs_attr_set_iter_return(dac->dela_state,
>> +						       args->dp);
>> +			return -EAGAIN;
>> +		}
>>   
>> +		/* fallthrough */
>>   	case XFS_DAS_FLIP_NFLAG:
>>   		/*
>>   		 * Dismantle the "old" attribute/value pair by removing a
>> @@ -1236,6 +1246,7 @@ xfs_attr_node_addname_clear_incomplete(
>>   {
>>   	struct xfs_da_args		*args = dac->da_args;
>>   	struct xfs_da_state		*state = NULL;
>> +	struct xfs_mount		*mp = args->dp->i_mount;
>>   	int				retval = 0;
>>   	int				error = 0;
>>   
>> @@ -1243,7 +1254,8 @@ xfs_attr_node_addname_clear_incomplete(
>>   	 * Re-find the "old" attribute entry after any split ops. The INCOMPLETE
>>   	 * flag means that we will find the "old" attr, not the "new" one.
>>   	 */
>> -	args->attr_filter |= XFS_ATTR_INCOMPLETE;
>> +	if (!xfs_has_larp(mp))
>> +		args->attr_filter |= XFS_ATTR_INCOMPLETE;
>>   	state = xfs_da_state_alloc(args);
>>   	state->inleaf = 0;
>>   	error = xfs_da3_node_lookup_int(state, &retval);
>> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
>> index e1d11e314228..a0a352bdea59 100644
>> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
>> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
>> @@ -1487,7 +1487,8 @@ xfs_attr3_leaf_add_work(
>>   	if (tmp)
>>   		entry->flags |= XFS_ATTR_LOCAL;
>>   	if (args->op_flags & XFS_DA_OP_RENAME) {
>> -		entry->flags |= XFS_ATTR_INCOMPLETE;
>> +		if (!xfs_has_larp(mp))
>> +			entry->flags |= XFS_ATTR_INCOMPLETE;
>>   		if ((args->blkno2 == args->blkno) &&
>>   		    (args->index2 <= args->index)) {
>>   			args->index2++;
> 
> 
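To make the control flow in the hunks above easier to picture, here is a small
userspace sketch of a state machine that returns -EAGAIN when the caller must
roll the transaction, and that falls through past the flag flip when logged
attributes are enabled. The states, struct toy_ctx, toy_set_iter() and
toy_run() are hypothetical simplifications and do not mirror the real
xfs_attr_set_iter() states.

```c
#include <errno.h>
#include <stdbool.h>

enum toy_state {
	TOY_START,		/* nothing done yet */
	TOY_FLIP_FLAG,		/* flags flipped, old attr still present */
	TOY_DONE,
};

struct toy_ctx {
	enum toy_state	state;
	bool		larp;	/* logged attrs enabled: skip the flip */
	int		steps;	/* units of work performed so far */
};

/*
 * Perform one step.  Returns -EAGAIN when the caller must roll the
 * transaction and call again, 0 when the operation is complete.
 */
static int toy_set_iter(struct toy_ctx *ctx)
{
	switch (ctx->state) {
	case TOY_START:
		ctx->steps++;		/* add the "new" attr */
		if (!ctx->larp) {
			ctx->steps++;	/* flip the INCOMPLETE flags */
			ctx->state = TOY_FLIP_FLAG;
			return -EAGAIN;
		}
		/* fallthrough */
	case TOY_FLIP_FLAG:
		ctx->steps++;		/* remove the "old" attr */
		ctx->state = TOY_DONE;
		return 0;
	case TOY_DONE:
		return 0;
	}
	return -EINVAL;
}

/* Driver loop: counts how many transaction rolls were needed. */
static int toy_run(struct toy_ctx *ctx, int *rolls)
{
	int	error;

	*rolls = 0;
	while ((error = toy_set_iter(ctx)) == -EAGAIN)
		(*rolls)++;
	return error;
}
```

In this toy model the non-larp path needs one roll and three units of work,
while the larp path finishes in a single call with two. That is the saving the
patch is after, and the perf numbers above suggest it is small in practice.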

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v24 06/11] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  2021-08-30 10:27   ` Chandan Babu R
@ 2021-08-31 18:12     ` Allison Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-31 18:12 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs



On 8/30/21 3:27 AM, Chandan Babu R wrote:
> On 25 Aug 2021 at 04:14, Allison Henderson wrote:
>> From: Allison Collins <allison.henderson@oracle.com>
>>
>> These routines set up and queue new deferred attribute operations.
>> These functions are meant to be called by any routine needing to
>> initiate a deferred attribute operation as opposed to the existing
>> inline operations. A new helper function, xfs_attr_item_init, is also added.
>>
>> Finally, enable delayed attributes in xfs_attr_set and xfs_attr_remove.
>>
> 
> Looks good to me.
> 
> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>

Thank you!

Allison

> 
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c | 71 ++++++++++++++++++++++++++++++++++++++--
>>   fs/xfs/libxfs/xfs_attr.h |  2 ++
>>   fs/xfs/xfs_log.c         | 41 +++++++++++++++++++++++
>>   fs/xfs/xfs_log.h         |  1 +
>>   4 files changed, 112 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index fce67c717be2..6877683e2e35 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -25,6 +25,8 @@
>>   #include "xfs_trans_space.h"
>>   #include "xfs_trace.h"
>>   #include "xfs_attr_item.h"
>> +#include "xfs_attr.h"
>> +#include "xfs_log.h"
>>   
>>   /*
>>    * xfs_attr.c
>> @@ -726,6 +728,7 @@ xfs_attr_set(
>>   	int			error, local;
>>   	int			rmt_blks = 0;
>>   	unsigned int		total;
>> +	int			delayed = xfs_has_larp(mp);
>>   
>>   	if (xfs_is_shutdown(dp->i_mount))
>>   		return -EIO;
>> @@ -782,13 +785,19 @@ xfs_attr_set(
>>   		rmt_blks = xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX);
>>   	}
>>   
>> +	if (delayed) {
>> +		error = xfs_attr_use_log_assist(mp);
>> +		if (error)
>> +			return error;
>> +	}
>> +
>>   	/*
>>   	 * Root fork attributes can use reserved data blocks for this
>>   	 * operation if necessary
>>   	 */
>>   	error = xfs_trans_alloc_inode(dp, &tres, total, 0, rsvd, &args->trans);
>>   	if (error)
>> -		return error;
>> +		goto drop_incompat;
>>   
>>   	if (args->value || xfs_inode_hasattr(dp)) {
>>   		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
>> @@ -806,9 +815,10 @@ xfs_attr_set(
>>   		if (error != -ENOATTR && error != -EEXIST)
>>   			goto out_trans_cancel;
>>   
>> -		error = xfs_attr_set_args(args);
>> +		error = xfs_attr_set_deferred(args);
>>   		if (error)
>>   			goto out_trans_cancel;
>> +
>>   		/* shortform attribute has already been committed */
>>   		if (!args->trans)
>>   			goto out_unlock;
>> @@ -816,7 +826,7 @@ xfs_attr_set(
>>   		if (error != -EEXIST)
>>   			goto out_trans_cancel;
>>   
>> -		error = xfs_attr_remove_args(args);
>> +		error = xfs_attr_remove_deferred(args);
>>   		if (error)
>>   			goto out_trans_cancel;
>>   	}
>> @@ -838,6 +848,9 @@ xfs_attr_set(
>>   	error = xfs_trans_commit(args->trans);
>>   out_unlock:
>>   	xfs_iunlock(dp, XFS_ILOCK_EXCL);
>> +drop_incompat:
>> +	if (delayed)
>> +		xlog_drop_incompat_feat(mp->m_log);
>>   	return error;
>>   
>>   out_trans_cancel:
>> @@ -846,6 +859,58 @@ xfs_attr_set(
>>   	goto out_unlock;
>>   }
>>   
>> +STATIC int
>> +xfs_attr_item_init(
>> +	struct xfs_da_args	*args,
>> +	unsigned int		op_flags,	/* op flag (set or remove) */
>> +	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
>> +{
>> +
>> +	struct xfs_attr_item	*new;
>> +
>> +	new = kmem_zalloc(sizeof(struct xfs_attr_item), KM_NOFS);
>> +	new->xattri_op_flags = op_flags;
>> +	new->xattri_dac.da_args = args;
>> +
>> +	*attr = new;
>> +	return 0;
>> +}
>> +
>> +/* Sets an attribute for an inode as a deferred operation */
>> +int
>> +xfs_attr_set_deferred(
>> +	struct xfs_da_args	*args)
>> +{
>> +	struct xfs_attr_item	*new;
>> +	int			error = 0;
>> +
>> +	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
>> +	if (error)
>> +		return error;
>> +
>> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
>> +
>> +	return 0;
>> +}
>> +
>> +/* Removes an attribute for an inode as a deferred operation */
>> +int
>> +xfs_attr_remove_deferred(
>> +	struct xfs_da_args	*args)
>> +{
>> +
>> +	struct xfs_attr_item	*new;
>> +	int			error;
>> +
>> +	error  = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
>> +	if (error)
>> +		return error;
>> +
>> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
>> +
>> +	return 0;
>> +}
>> +
>>   /*========================================================================
>>    * External routines when attribute list is inside the inode
>>    *========================================================================*/
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index aa33cdcf26b8..0f326c28ab7c 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -526,5 +526,7 @@ bool xfs_attr_namecheck(const void *name, size_t length);
>>   void xfs_delattr_context_init(struct xfs_delattr_context *dac,
>>   			      struct xfs_da_args *args);
>>   int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>> +int xfs_attr_set_deferred(struct xfs_da_args *args);
>> +int xfs_attr_remove_deferred(struct xfs_da_args *args);
>>   
>>   #endif	/* __XFS_ATTR_H__ */
>> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
>> index 4402c5d09269..0d0afa1aae59 100644
>> --- a/fs/xfs/xfs_log.c
>> +++ b/fs/xfs/xfs_log.c
>> @@ -3993,3 +3993,44 @@ xlog_drop_incompat_feat(
>>   {
>>   	up_read(&log->l_incompat_users);
>>   }
>> +
>> +/*
>> + * Get permission to use log-assisted extended attribute updates.
>> + *
>> + * Callers must not be running any transactions or hold any inode locks, and
>> + * they must release the permission by calling xlog_drop_incompat_feat
>> + * when they're done.
>> + */
>> +int
>> +xfs_attr_use_log_assist(
>> +	struct xfs_mount	*mp)
>> +{
>> +	int			error = 0;
>> +
>> +	/*
>> +	 * Protect ourselves from an idle log clearing the logged xattrs log
>> +	 * incompat feature bit.
>> +	 */
>> +	xlog_use_incompat_feat(mp->m_log);
>> +
>> +	/*
>> +	 * If log-assisted xattrs are already enabled, the caller can use the
>> +	 * log assisted swap functions with the log-incompat reference we got.
>> +	 */
>> +	if (sb_version_haslogxattrs(&mp->m_sb))
>> +		return 0;
>> +
>> +	/* Enable log-assisted xattrs. */
>> +	error = xfs_add_incompat_log_feature(mp,
>> +			XFS_SB_FEAT_INCOMPAT_LOG_XATTRS);
>> +	if (error)
>> +		goto drop_incompat;
>> +
>> +	xfs_warn_once(mp,
>> +"EXPERIMENTAL logged extended attributes feature added. Use at your own risk!");
>> +
>> +	return 0;
>> +drop_incompat:
>> +	xlog_drop_incompat_feat(mp->m_log);
>> +	return error;
>> +}
>> diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
>> index dc1b77b92fc1..4504ab60ac85 100644
>> --- a/fs/xfs/xfs_log.h
>> +++ b/fs/xfs/xfs_log.h
>> @@ -144,5 +144,6 @@ bool	  xlog_force_shutdown(struct xlog *log, int shutdown_flags);
>>   
>>   void xlog_use_incompat_feat(struct xlog *log);
>>   void xlog_drop_incompat_feat(struct xlog *log);
>> +int xfs_attr_use_log_assist(struct xfs_mount *mp);
>>   
>>   #endif	/* __XFS_LOG_H__ */
> 
> 


* Re: [PATCH v24 09/11] xfs: Add larp debug option
  2021-08-30 12:03   ` Chandan Babu R
@ 2021-08-31 18:12     ` Allison Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-31 18:12 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs



On 8/30/21 5:03 AM, Chandan Babu R wrote:
> On 25 Aug 2021 at 04:14, Allison Henderson wrote:
>> This patch adds a mount option to enable log attribute replay. Eventually
> 
> s/mount option/debug option/
Ok, will fix.

> 
>> this can be removed when delayed attrs becomes permanent.
> 
> The rest looks good to me.
> 
> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Thanks!
Allison

> 
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/libxfs/xfs_attr.h |  2 +-
>>   fs/xfs/xfs_globals.c     |  1 +
>>   fs/xfs/xfs_sysctl.h      |  1 +
>>   fs/xfs/xfs_sysfs.c       | 24 ++++++++++++++++++++++++
>>   4 files changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
>> index efb7ac4fc41c..492762541174 100644
>> --- a/fs/xfs/libxfs/xfs_attr.h
>> +++ b/fs/xfs/libxfs/xfs_attr.h
>> @@ -30,7 +30,7 @@ struct xfs_attr_list_context;
>>   
>>   static inline bool xfs_has_larp(struct xfs_mount *mp)
>>   {
>> -	return false;
>> +	return xfs_globals.larp;
>>   }
>>   
>>   /*
>> diff --git a/fs/xfs/xfs_globals.c b/fs/xfs/xfs_globals.c
>> index f62fa652c2fd..4d0a98f920ca 100644
>> --- a/fs/xfs/xfs_globals.c
>> +++ b/fs/xfs/xfs_globals.c
>> @@ -41,5 +41,6 @@ struct xfs_globals xfs_globals = {
>>   #endif
>>   #ifdef DEBUG
>>   	.pwork_threads		=	-1,	/* automatic thread detection */
>> +	.larp			=	false,	/* log attribute replay */
>>   #endif
>>   };
>> diff --git a/fs/xfs/xfs_sysctl.h b/fs/xfs/xfs_sysctl.h
>> index 7692e76ead33..f78ad6b10ea5 100644
>> --- a/fs/xfs/xfs_sysctl.h
>> +++ b/fs/xfs/xfs_sysctl.h
>> @@ -83,6 +83,7 @@ extern xfs_param_t	xfs_params;
>>   struct xfs_globals {
>>   #ifdef DEBUG
>>   	int	pwork_threads;		/* parallel workqueue threads */
>> +	bool	larp;			/* log attribute replay */
>>   #endif
>>   	int	log_recovery_delay;	/* log recovery delay (secs) */
>>   	int	mount_delay;		/* mount setup delay (secs) */
>> diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
>> index 18dc5eca6c04..74180e05e8ed 100644
>> --- a/fs/xfs/xfs_sysfs.c
>> +++ b/fs/xfs/xfs_sysfs.c
>> @@ -227,6 +227,29 @@ pwork_threads_show(
>>   	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.pwork_threads);
>>   }
>>   XFS_SYSFS_ATTR_RW(pwork_threads);
>> +
>> +static ssize_t
>> +larp_store(
>> +	struct kobject	*kobject,
>> +	const char	*buf,
>> +	size_t		count)
>> +{
>> +	ssize_t		ret;
>> +
>> +	ret = kstrtobool(buf, &xfs_globals.larp);
>> +	if (ret < 0)
>> +		return ret;
>> +	return count;
>> +}
>> +
>> +STATIC ssize_t
>> +larp_show(
>> +	struct kobject	*kobject,
>> +	char		*buf)
>> +{
>> +	return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.larp);
>> +}
>> +XFS_SYSFS_ATTR_RW(larp);
>>   #endif /* DEBUG */
>>   
>>   static struct attribute *xfs_dbg_attrs[] = {
>> @@ -236,6 +259,7 @@ static struct attribute *xfs_dbg_attrs[] = {
>>   	ATTR_LIST(always_cow),
>>   #ifdef DEBUG
>>   	ATTR_LIST(pwork_threads),
>> +	ATTR_LIST(larp),
>>   #endif
>>   	NULL,
>>   };
> 
> 


* Re: [PATCH v24 11/11] xfs: Add helper function xfs_attr_leaf_addname
  2021-08-30 14:17   ` Chandan Babu R
@ 2021-08-31 18:13     ` Allison Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-31 18:13 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs



On 8/30/21 7:17 AM, Chandan Babu R wrote:
> On 25 Aug 2021 at 04:14, Allison Henderson wrote:
>> This patch adds a helper function xfs_attr_leaf_addname.  While this
>> does help to break down xfs_attr_set_iter, it does also hoist out some
>> of the state management.  This patch has been moved to the end of the
>> clean up series for further discussion.
>>
> 
> Looks good to me.
> 
> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>

Great, thank you!

Allison

> 
>> Suggested-by: Darrick J. Wong <djwong@kernel.org>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> ---
>>   fs/xfs/libxfs/xfs_attr.c | 110 +++++++++++++++++++++------------------
>>   fs/xfs/xfs_trace.h       |   1 +
>>   2 files changed, 61 insertions(+), 50 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index c3fdf232cd51..7150f0e051a0 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -284,6 +284,65 @@ xfs_attr_sf_addname(
>>   	return -EAGAIN;
>>   }
>>   
>> +STATIC int
>> +xfs_attr_leaf_addname(
>> +	struct xfs_attr_item	*attr)
>> +{
>> +	struct xfs_da_args	*args = attr->xattri_da_args;
>> +	struct xfs_inode	*dp = args->dp;
>> +	int			error;
>> +
>> +	if (xfs_attr_is_leaf(dp)) {
>> +		error = xfs_attr_leaf_try_add(args, attr->xattri_leaf_bp);
>> +		if (error == -ENOSPC) {
>> +			error = xfs_attr3_leaf_to_node(args);
>> +			if (error)
>> +				return error;
>> +
>> +			/*
>> +			 * Finish any deferred work items and roll the
>> +			 * transaction once more.  The goal here is to call
>> +			 * node_addname with the inode and transaction in the
>> +			 * same state (inode locked and joined, transaction
>> +			 * clean) no matter how we got to this step.
>> +			 *
>> +			 * At this point, we are still in XFS_DAS_UNINIT, but
>> +			 * when we come back, we'll be a node, so we'll fall
>> +			 * down into the node handling code below
>> +			 */
>> +			trace_xfs_attr_set_iter_return(
>> +				attr->xattri_dela_state, args->dp);
>> +			return -EAGAIN;
>> +		}
>> +
>> +		if (error)
>> +			return error;
>> +
>> +		attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
>> +	} else {
>> +		error = xfs_attr_node_addname_find_attr(attr);
>> +		if (error)
>> +			return error;
>> +
>> +		error = xfs_attr_node_addname(attr);
>> +		if (error)
>> +			return error;
>> +
>> +		/*
>> +		 * If addname was successful, and we dont need to alloc or
>> +		 * remove anymore blks, we're done.
>> +		 */
>> +		if (!args->rmtblkno &&
>> +		    !(args->op_flags & XFS_DA_OP_RENAME))
>> +			return 0;
>> +
>> +		attr->xattri_dela_state = XFS_DAS_FOUND_NBLK;
>> +	}
>> +
>> +	trace_xfs_attr_leaf_addname_return(attr->xattri_dela_state, args->dp);
>> +	return -EAGAIN;
>> +}
>> +
>>   /*
>>    * Set the attribute specified in @args.
>>    * This routine is meant to function as a delayed operation, and may return
>> @@ -319,57 +378,8 @@ xfs_attr_set_iter(
>>   			attr->xattri_leaf_bp = NULL;
>>   		}
>>   
>> -		if (xfs_attr_is_leaf(dp)) {
>> -			error = xfs_attr_leaf_try_add(args,
>> -						      attr->xattri_leaf_bp);
>> -			if (error == -ENOSPC) {
>> -				error = xfs_attr3_leaf_to_node(args);
>> -				if (error)
>> -					return error;
>> -
>> -				/*
>> -				 * Finish any deferred work items and roll the
>> -				 * transaction once more.  The goal here is to
>> -				 * call node_addname with the inode and
>> -				 * transaction in the same state (inode locked
>> -				 * and joined, transaction clean) no matter how
>> -				 * we got to this step.
>> -				 *
>> -				 * At this point, we are still in
>> -				 * XFS_DAS_UNINIT, but when we come back, we'll
>> -				 * be a node, so we'll fall down into the node
>> -				 * handling code below
>> -				 */
>> -				trace_xfs_attr_set_iter_return(
>> -					attr->xattri_dela_state, args->dp);
>> -				return -EAGAIN;
>> -			} else if (error) {
>> -				return error;
>> -			}
>> -
>> -			attr->xattri_dela_state = XFS_DAS_FOUND_LBLK;
>> -		} else {
>> -			error = xfs_attr_node_addname_find_attr(attr);
>> -			if (error)
>> -				return error;
>> +		return xfs_attr_leaf_addname(attr);
>>   
>> -			error = xfs_attr_node_addname(attr);
>> -			if (error)
>> -				return error;
>> -
>> -			/*
>> -			 * If addname was successful, and we dont need to alloc
>> -			 * or remove anymore blks, we're done.
>> -			 */
>> -			if (!args->rmtblkno &&
>> -			    !(args->op_flags & XFS_DA_OP_RENAME))
>> -				return 0;
>> -
>> -			attr->xattri_dela_state = XFS_DAS_FOUND_NBLK;
>> -		}
>> -		trace_xfs_attr_set_iter_return(attr->xattri_dela_state,
>> -					       args->dp);
>> -		return -EAGAIN;
>>   	case XFS_DAS_FOUND_LBLK:
>>   		/*
>>   		 * If there was an out-of-line value, allocate the blocks we
>> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
>> index 1033a95fbf8e..77a78b5b1a29 100644
>> --- a/fs/xfs/xfs_trace.h
>> +++ b/fs/xfs/xfs_trace.h
>> @@ -4132,6 +4132,7 @@ DEFINE_EVENT(xfs_das_state_class, name, \
>>   	TP_ARGS(das, ip))
>>   DEFINE_DAS_STATE_EVENT(xfs_attr_sf_addname_return);
>>   DEFINE_DAS_STATE_EVENT(xfs_attr_set_iter_return);
>> +DEFINE_DAS_STATE_EVENT(xfs_attr_leaf_addname_return);
>>   DEFINE_DAS_STATE_EVENT(xfs_attr_node_addname_return);
>>   DEFINE_DAS_STATE_EVENT(xfs_attr_remove_iter_return);
>>   DEFINE_DAS_STATE_EVENT(xfs_attr_rmtval_remove_return);
> 
> 
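
The -EAGAIN convention this helper follows (do one step, record the next state, and hand control back so the caller can roll the transaction) can be sketched in userspace C. This is only a toy model under stated assumptions: every demo_* name is hypothetical, the states are reduced to three, and incrementing a counter stands in for rolling the transaction. It is not the code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Reduced, hypothetical set of states; the real enum has many more. */
enum demo_das_state {
	DEMO_DAS_UNINIT,
	DEMO_DAS_FOUND_LBLK,
	DEMO_DAS_DONE,
};

struct demo_attr_item {
	enum demo_das_state	state;
	int			rolls;	/* times the caller "rolled" */
};

/*
 * Perform one step of the operation.  Returns -EAGAIN when the caller
 * must roll the transaction and call again, 0 when finished.
 */
int demo_attr_set_iter(struct demo_attr_item *attr)
{
	assert(attr != NULL);

	switch (attr->state) {
	case DEMO_DAS_UNINIT:
		/* First step done; remember where to resume. */
		attr->state = DEMO_DAS_FOUND_LBLK;
		return -EAGAIN;
	case DEMO_DAS_FOUND_LBLK:
		attr->state = DEMO_DAS_DONE;
		return 0;
	default:
		return -EINVAL;
	}
}

/* The caller owns the transaction: roll on -EAGAIN, stop otherwise. */
int demo_attr_set(struct demo_attr_item *attr)
{
	int	error;

	do {
		error = demo_attr_set_iter(attr);
		if (error == -EAGAIN)
			attr->rolls++;	/* stand-in for a transaction roll */
	} while (error == -EAGAIN);

	return error;
}
```

Starting from DEMO_DAS_UNINIT, demo_attr_set() returns 0 after a single roll. The step function never touches the transaction itself, which is the property the series relies on.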


* Re: [PATCH v24 00/11] Log Attribute Replay
  2021-08-31  0:20 ` [PATCH v24 00/11] Log Attribute Replay Dave Chinner
@ 2021-08-31 18:27   ` Allison Henderson
  2021-09-01  6:29     ` Dave Chinner
  0 siblings, 1 reply; 45+ messages in thread
From: Allison Henderson @ 2021-08-31 18:27 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs



On 8/30/21 5:20 PM, Dave Chinner wrote:
> On Tue, Aug 24, 2021 at 03:44:23PM -0700, Allison Henderson wrote:
>> Hi all,
>>
> >> This set is a subset of a larger series for parent pointers. Delayed attributes allow
>> attribute operations (set and remove) to be logged and committed in the same
>> way that other delayed operations do. This allows more complex operations (like
>> parent pointers) to be broken up into multiple smaller transactions. To do
>> this, the existing attr operations must be modified to operate as a delayed
>> operation.  This means that they cannot roll, commit, or finish transactions.
>> Instead, they return -EAGAIN to allow the calling function to handle the
>> transaction.  In this series, we focus on only the delayed attribute portion.
>> We will introduce parent pointers in a later set.
>>
>> The set as a whole is a bit much to digest at once, so I usually send out the
>> smaller sub series to reduce reviewer burn out.  But the entire extended series
>> is visible through the included github links.

Some of this we worked out in the chat last night, but I will echo it 
here for the archives.

> 
> Ok, so like I did with Darrick's deferred inactivation series, the
> first thing I'm doing here is throwing this patchset at
> scalability/performance workloads and finding out what is different.
> 
> I've merged this series with 5.14 + xfs/for-next + xfs-cil-scale and
> then run some tests on it. First up is fsmark creating zero length
> files w/ 64 byte xattrs. This should stress only shortform attribute
> manipulations.
> 
> I have not enabled delayed attributes yet (i.e.
> /sys/fs/xfs/debug/larp = 0)
> 
> First thing I notice is the transaction commit rate during create is
> up around 900k/s, so we are doing 3 transactions per inode - 1 for
> create, 2 for attributes. That looks like a regression - existing
> shortform attribute creation only takes a single transaction commit,
> so this workload prior to this patchset only ran at 600k commits/s.
> 
> Note that the only reason I'm getting 900k transactions/s is the
> CIL scalability patchset - without that the system tops out at ~800k
> transactions/s and so this would be a significant performance
> regression (20%) vs the current xfs/for-next code.
> 
> Essentially, this looks like we are doing an extra transaction
> commit to defer the creation of the attribute, then doing another
> transaction to actually modify the attribute. i.e.:
> 
>   - 11.04% xfs_attr_set
>      - 8.70% xfs_trans_commit
>         - 8.69% __xfs_trans_commit
> 	  - 5.10% xfs_defer_finish_noroll
> 	     - 3.74% xfs_defer_trans_roll
> 		- 3.57% xfs_trans_roll
> 		   - 3.13% __xfs_trans_commit
> 		      - 3.01% xlog_cil_commit
> 			   0.66% down_read
> 			   0.63% xfs_log_ticket_regrant
> 	     - 1.16% xfs_attr_finish_item
> 		- 1.06% xfs_trans_attr_finish_update
> 		   - 1.03% xfs_attr_set_iter
> 		      - 1.01% xfs_attr_sf_addname
> 			 - 0.99% xfs_attr_try_sf_addname
> 			    - 0.61% xfs_attr_shortform_addname
> 				 0.55% xfs_attr_shortform_add
> 
> 
> AFAICT, for non-delayed attributes, this first transaction commit
> logs the inode but does not create intent or intent done items
> (returns NULL for both operations), so just rolls and runs the
> ->finish_item. So it would seem that the first transaction just
> changes the inode timestamps and does nothing else.
> 
> Firstly, this means the inode timestamp change is not atomic w.r.t.
> the attribute change the timestamp change relates to and it's
> essentially new overhead for the non-delayed path.
> 
> Looking at the unlink path, I see the same thing - there's an extra
> transaction for the attr remove path, the same as the attr set path.
> This drives the unlink path to 1.1 million transaction commits/sec
> instead of 800k/s, so it's likely that there's a substantial
> performance regression here on a kernel without the CIL scalability
> patchset.
> 
> IOWs, there's significant behavioural changes with the non-delayed
> logging version of this patchset, both in terms of performance and
> the atomicity of changes that appear in the journal and hence
> recovery behaviour.
> 
> At this point I have to ask: why are we trying to retain the "old"
> way of doing things (even for testing) if it is substantially
> changing behaviour and on-disk journal contents for attribute
> modifications?
Per the chat discussion, we have to keep both methods since sb v4 would 
not use the new log entries.

> 
> So, let's turn on delayed logging:
> 
> $ sudo sh -c 'echo 1 > /sys/fs/xfs/debug/larp'
> $ ~/tests/fsmark-50-test-xfs.sh -t 16 -X 64 -d /dev/mapper/fast -- -l size=2000m -d agcount=67
> QUOTA=
> MKFSOPTS= -l size=2000m -d agcount=67
> DEV=/dev/mapper/fast
> THREADS=16
> .....
> 
> Message from syslogd@test4 at Aug 31 09:12:55 ...
>   kernel:[ 2342.737931] XFS: Assertion failed: !test_bit(XFS_LI_DIRTY, &lip->li_flags), file: fs/xfs/xfs_trans.c, line: 652
> 
> Instant assert fail and the machine locks up hard.
> 
> Actually, now that I reproduce it with a full console trace (which
> is terribly interleaved and almost impossible to read) there's
> bad stuff all over the place. Null pointer dereferences in
> xlog_cil_commit, "sleeping in atomic" failures, and the assert
> failure above.
> 
> Ok, run a single thread, and...
> 
> [   84.119162] BUG: kernel NULL pointer dereference, address: 000000000000000d
> [   84.123541] #PF: supervisor write access in kernel mode
> [   84.126028] #PF: error_code(0x0002) - not-present page
> [   84.127312] PGD 0 P4D 0
> [   84.127966] Oops: 0002 [#1] PREEMPT SMP
> [   84.128960] CPU: 8 PID: 5139 Comm: fs_mark Not tainted 5.14.0-dgc+ #552
> [   84.130632] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014
> [   84.132723] RIP: 0010:xlog_prepare_iovec+0x59/0xe0
> [   84.133951] Code: 4c 89 f8 4c 29 e0 48 c1 f8 04 48 39 d0 7d 76 4d 8d 67 10 8b 43 34 8d 50 0c 83 e2 07 74 0c 83 c0 0b 83 c8 07 83 e8 0b 89 43 34 <45> 89 74 24 0c 48 63 43 34 48 03 43 28 49 89 04 24 c7 40 08 69 00
> [   84.137980] RSP: 0018:ffffc900021d7838 EFLAGS: 00010202
> [   84.138968] RAX: 0000000000000004 RBX: ffff888140f3e100 RCX: 0000000000000006
> [   84.140321] RDX: 0000000000000004 RSI: ffffc900021d7880 RDI: ffff888140f3e100
> [   84.141671] RBP: ffffc900021d7868 R08: ffffffff82a8fb88 R09: 000000000000494e
> [   84.143069] R10: ffff88823ffd5000 R11: 00000000000319c8 R12: 0000000000000001
> [   84.144447] R13: ffffc900021d7880 R14: 000000000000001b R15: 0000000000000000
> [   84.145825] FS:  00007f51fb33c740(0000) GS:ffff88823bc00000(0000) knlGS:0000000000000000
> [   84.147413] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   84.148535] CR2: 000000000000000d CR3: 00000001473b1003 CR4: 0000000000770ee0
> [   84.149927] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   84.151291] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   84.152331] PKRU: 55555554
> [   84.152732] Call Trace:
> [   84.153106]  xfs_attri_item_format+0x87/0x230
> [   84.153748]  xlog_cil_commit+0x253/0xa00
> [   84.154329]  ? kvmalloc_node+0x79/0x80
> [   84.154881]  __xfs_trans_commit+0xc1/0x330
> [   84.155490]  xfs_trans_roll+0x53/0xe0
> [   84.156027]  xfs_defer_trans_roll+0x10d/0x2b0
> [   84.156659]  xfs_defer_finish_noroll+0xb1/0x650
> [   84.157312]  __xfs_trans_commit+0x143/0x330
> [   84.157918]  xfs_trans_commit+0x10/0x20
> [   84.158498]  xfs_attr_set+0x41a/0x4e0
> [   84.159030]  xfs_xattr_set+0x8d/0xe0
> [   84.159554]  __vfs_setxattr+0x6b/0x90
> [   84.160090]  __vfs_setxattr_noperm+0x7d/0x1f0
> [   84.160718]  __vfs_setxattr_locked+0xdf/0x100
> [   84.161346]  vfs_setxattr+0x9b/0x170
> [   84.161862]  setxattr+0x110/0x200
> [   84.162346]  ? _raw_spin_unlock+0xe/0x20
> [   84.162914]  ? __handle_mm_fault+0xc1b/0x16d0
> [   84.163556]  ? __might_sleep+0x49/0x80
> [   84.164132]  __x64_sys_fsetxattr+0xb1/0xe0
> [   84.164782]  do_syscall_64+0x35/0x80
> 
> Ok, there's the first failure.
> 
> This looks like it's a problem with xfs_attri_item_{size,format} in
> calculating the number of bytes to log. They use ATTR_NVEC_SIZE() to
> calculate the number of bytes to copy from the attribute item, which
> rounds up the length to copy to 4 byte aligned values. I'm not sure
> what this function is calculating:
> 
> /* iovec length must be 32-bit aligned */
> static inline size_t ATTR_NVEC_SIZE(size_t size)
> {
>          return size == sizeof(int32_t) ? size :
> 	               sizeof(int32_t) + round_up(size, sizeof(int32_t));
> }
> 
> It appears to be saying if the size == 4, then return 4, otherwise
> return 4 + roundup(size)... which leads me to struct
> xfs_attri_log_format:
> 
> struct xfs_attri_log_format {
>          uint16_t        alfi_type;      /* attri log item type */
>          uint16_t        alfi_size;      /* size of this item */
>          uint32_t        __pad;          /* pad to 64 bit aligned */
>          uint64_t        alfi_id;        /* attri identifier */
>          uint64_t        alfi_ino;       /* the inode for this attr operation */
>          uint32_t        alfi_op_flags;  /* marks the op as a set or remove */
>          uint32_t        alfi_name_len;  /* attr name length */
>          uint32_t        alfi_value_len; /* attr value length */
>          uint32_t        alfi_attr_flags;/* attr flags */
> };
> 
> I don't see where the extra 4 bytes for the attribute vector size
> comes from. It's not needed to store the length, so this could
> oversize the amount of data to be copied from the source
> buffer by up to 7 bytes?
> 
> I can see that it might need rounding with the existing
> log code (because the formatter is responsible for 32 bit alignment
> of log vectors), but that goes away with the CIL scalability
> patchset that always aligns iovecs to 4 byte alignment so the
> formatters do not need to do that.

I think we figured this out last night.  Initially this was here for an 
assertion check in the log code, but I think just the round up will 
suffice for the check.
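
To make the arithmetic concrete, here is a minimal userspace sketch (not kernel code) comparing the helper as quoted with a plain round-up. The names attr_nvec_size_v24(), attr_nvec_size_roundup() and overread() are made up for illustration, and ROUND_UP() is a local stand-in for the kernel's round_up().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of y; y must be a power of two. */
#define ROUND_UP(x, y)	(((x) + ((size_t)(y) - 1)) & ~((size_t)(y) - 1))

/* The helper as quoted in the review: 4, or 4 + round_up(size, 4). */
static inline size_t attr_nvec_size_v24(size_t size)
{
	return size == sizeof(int32_t) ? size :
		sizeof(int32_t) + ROUND_UP(size, sizeof(int32_t));
}

/* Hypothetical replacement: 32-bit alignment only, no extra word. */
static inline size_t attr_nvec_size_roundup(size_t size)
{
	return ROUND_UP(size, sizeof(int32_t));
}

/* Bytes a copy of copy_len would read past a src_len byte buffer. */
static inline size_t overread(size_t copy_len, size_t src_len)
{
	assert(copy_len >= src_len);
	return copy_len - src_len;
}
```

With these definitions a 5 byte name gives 12 bytes from the quoted helper (7 past the end of the source) and 8 from the plain round-up (still 3 past the end). Rounding the length alone therefore only helps if the source buffer is padded to match, or if the pad bytes are written separately from the copy.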

> 
> Hiding it in a "macro" is not necessary, either - look at how
> xfs_inode_item_{data,attr}_fork_size handle the rounding up of the
> local format fork size. They round up the fork byte count to 4
> directly, and the format code copies those bytes because
> xfs_idata_realloc() allocates those bytes.
> 
> However, for the attribute buffers, this isn't guaranteed. Look at
> xfs_xattr_set():
> 
>          struct xfs_da_args      args = {
>                  .dp             = XFS_I(inode),
>                  .attr_filter    = handler->flags,
>                  .attr_flags     = flags,
>                  .name           = name,
>                  .namelen        = strlen(name),
>                  .value          = (void *)value,
>                  .valuelen       = size,
>          };
> 
> There is no rounding up of the name or value lengths, and these end
> up directly referenced by the deferred logging via xfs_attr_log_item()
> and attrip->da_args->...
> 
>          attrip->attri_name = (void *)attr->xattri_da_args->name;
>          attrip->attri_value = attr->xattri_da_args->value;
>          attrip->attri_name_len = attr->xattri_da_args->namelen;
>          attrip->attri_value_len = attr->xattri_da_args->valuelen;
> 
> We then pass those pointers directly to xlog_copy_iovec() but with a
> rounded up length that is longer than the source buffer:
> 
>          xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
>                          attrip->attri_name,
>                          ATTR_NVEC_SIZE(attrip->attri_name_len));
>          if (attrip->attri_value_len > 0)
>                  xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
>                                  attrip->attri_value,
>                                  ATTR_NVEC_SIZE(attrip->attri_value_len));
> 
> So while this might not be the source of the above crash, it's
> certainly a bug that needs fixing.
> 
> At this point, I'm just going to hack on the code to make it work,
> and we can go from there...

I think when we left off last night, we were more concerned with the 
extra transaction overhead in the perf captures.  I will see if I can 
replicate what you are seeing with perf and maybe we can work out some 
shortcuts.  Thanks for your help here!

Allison

> 
> Cheers,
> 
> Dave.
> 


* Re: [PATCH v24 03/11] xfs: Set up infrastructure for log atrribute replay
  2021-08-31  0:48   ` Dave Chinner
@ 2021-08-31 19:03     ` Allison Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-08-31 19:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs



On 8/30/21 5:48 PM, Dave Chinner wrote:
> On Tue, Aug 24, 2021 at 03:44:26PM -0700, Allison Henderson wrote:
>> +/*
>> + * Allocate and initialize an attri item.  Caller may allocate an additional
>> + * trailing buffer of the specified size
>> + */
>> +STATIC struct xfs_attri_log_item *
>> +xfs_attri_init(
>> +	struct xfs_mount		*mp,
>> +	int				buffer_size)
>> +
>> +{
>> +	struct xfs_attri_log_item	*attrip;
>> +	uint				size;
>> +
>> +	size = sizeof(struct xfs_attri_log_item) + buffer_size;
>> +	attrip = kvmalloc(size, KM_ZERO);
>> +	if (attrip == NULL)
>> +		return NULL;
> 
> kvmalloc() takes GFP flags. I think you want GFP_KERNEL | __GFP_ZERO
> here.
> 
> Also, buffer size is taken directly from on-disk without bounds/length
> validation, meaning this could end up being an attacker controlled
> memory allocation, so .....
> 
Ok, will fix

>> +STATIC int
>> +xlog_recover_attri_commit_pass2(
>> +	struct xlog                     *log,
>> +	struct list_head		*buffer_list,
>> +	struct xlog_recover_item        *item,
>> +	xfs_lsn_t                       lsn)
>> +{
>> +	int                             error;
>> +	struct xfs_mount                *mp = log->l_mp;
>> +	struct xfs_attri_log_item       *attrip;
>> +	struct xfs_attri_log_format     *attri_formatp;
>> +	char				*name = NULL;
>> +	char				*value = NULL;
>> +	int				region = 0;
>> +	int				buffer_size;
>> +
>> +	attri_formatp = item->ri_buf[region].i_addr;
>> +
>> +	/* Validate xfs_attri_log_format */
>> +	if (attri_formatp->__pad != 0 || attri_formatp->alfi_name_len == 0 ||
>> +	    (attri_formatp->alfi_op_flags == XFS_ATTR_OP_FLAGS_REMOVE &&
>> +	    attri_formatp->alfi_value_len != 0)) {
>> +		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
>> +		return -EFSCORRUPTED;
>> +	}
>> +
>> +	buffer_size = attri_formatp->alfi_name_len +
>> +		      attri_formatp->alfi_value_len;
>> +
>> +	attrip = xfs_attri_init(mp, buffer_size);
>> +	if (attrip == NULL)
>> +		return -ENOMEM;
> 
> There needs to be a lot better validation of the attribute
> name/value lengths here.  Also, memory allocation failure here will
> abort recovery, so it might be worth adding a comment here....
Maybe we can add a call to xfs_attri_validate here?  I think we can just 
modify it to directly check the xfs_attri_log_format.

Thanks!
Allison

> 
> Cheers,
> 
> Dave.
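
As a rough illustration of the kind of bounds checking asked for above, the sketch below validates the length fields of a log format structure before anything is allocated from them. It is a userspace model under assumptions: the demo_* names, the two size limits and the op flag values are placeholders rather than values taken from the patch, and -EINVAL stands in for the kernel's -EFSCORRUPTED.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder limits and op codes; assumptions for this sketch only. */
#define DEMO_ATTR_NAME_MAX	255u
#define DEMO_ATTR_VALUE_MAX	65536u
#define DEMO_OP_SET		1u
#define DEMO_OP_REMOVE		2u

/* Same field layout as the xfs_attri_log_format quoted in this thread. */
struct demo_attri_log_format {
	uint16_t	alfi_type;
	uint16_t	alfi_size;
	uint32_t	pad;		/* must be zero */
	uint64_t	alfi_id;
	uint64_t	alfi_ino;
	uint32_t	alfi_op_flags;
	uint32_t	alfi_name_len;
	uint32_t	alfi_value_len;
	uint32_t	alfi_attr_flags;
};

/* Return 0 if the lengths are sane, -EINVAL otherwise. */
int demo_attri_validate(const struct demo_attri_log_format *f)
{
	assert(f != NULL);

	if (f->pad != 0)
		return -EINVAL;
	if (f->alfi_op_flags != DEMO_OP_SET &&
	    f->alfi_op_flags != DEMO_OP_REMOVE)
		return -EINVAL;
	/* A name is always required and must be bounded. */
	if (f->alfi_name_len == 0 || f->alfi_name_len > DEMO_ATTR_NAME_MAX)
		return -EINVAL;
	/* Bound the value before it can size an allocation. */
	if (f->alfi_value_len > DEMO_ATTR_VALUE_MAX)
		return -EINVAL;
	/* A remove carries no value. */
	if (f->alfi_op_flags == DEMO_OP_REMOVE && f->alfi_value_len != 0)
		return -EINVAL;
	return 0;
}
```

Checking both lengths before computing the buffer size means an oversized length read from disk is rejected rather than turned into an allocation request.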
> 



* Re: [PATCH v24 03/11] xfs: Set up infrastructure for log atrribute replay
  2021-08-24 22:44 ` [PATCH v24 03/11] xfs: Set up infrastructure for log atrribute replay Allison Henderson
  2021-08-27 14:17   ` Chandan Babu R
  2021-08-31  0:48   ` Dave Chinner
@ 2021-08-31 21:52   ` Dave Chinner
  2021-09-01  5:34     ` Allison Henderson
  2 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2021-08-31 21:52 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Tue, Aug 24, 2021 at 03:44:26PM -0700, Allison Henderson wrote:
> Currently attributes are modified directly across one or more
> transactions. But they are not logged or replayed in the event of an
> error. The goal of log attr replay is to enable logging and replaying
> of attribute operations using the existing delayed operations
> infrastructure.  This will later enable the attributes to become part of
> larger multi part operations that also must first be recorded to the
> log.  This is mostly of interest in the scheme of parent pointers which
> would need to maintain an attribute containing parent inode information
> any time an inode is moved, created, or removed.  Parent pointers would
> then be of interest to any feature that would need to quickly derive an
> inode path from the mount point. Online scrub, nfs lookups and fs grow
> or shrink operations are all features that could take advantage of this.
> 
> This patch adds two new log item types for setting or removing
> attributes as deferred operations.  The xfs_attri_log_item will log an
> intent to set or remove an attribute.  The corresponding
> xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
> freed once the transaction is done.  Both log items use a generic
> xfs_attr_log_format structure that contains the attribute name, value,
> flags, inode, and an op_flag that indicates if the operation is a set
> or remove.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/Makefile                 |   1 +
>  fs/xfs/libxfs/xfs_attr.c        |   5 +-
>  fs/xfs/libxfs/xfs_attr.h        |  31 +++
>  fs/xfs/libxfs/xfs_defer.h       |   2 +
>  fs/xfs/libxfs/xfs_log_format.h  |  44 +++-
>  fs/xfs/libxfs/xfs_log_recover.h |   2 +
>  fs/xfs/scrub/common.c           |   2 +
>  fs/xfs/xfs_attr_item.c          | 453 ++++++++++++++++++++++++++++++++

Comment on the overall structure of this file now I've been trying
to navigate through it for a little while. It is structured like:

<some attri stuff>
<some attrd stuff>
static const struct xfs_item_ops xfs_attrd_item_ops = {...}
<some more attri stuff>
static const struct xfs_item_ops xfs_attri_item_ops = {...}
<some attri log recovery stuff>
<some attrd log recovery stuff>

IOWs, the attri and attrd functions are interleaved in non-obvious
ways and that makes it hard to navigate around when trying to find
related information. It would make more sense to me to structure
this as:

<attri stuff>
<attri log recovery stuff>
<some attrd stuff>
<attrd log recovery stuff>
static const struct xfs_item_ops xfs_attri_item_ops = {...}
const struct xlog_recover_item_ops xlog_attri_item_ops = {...}
static const struct xfs_item_ops xfs_attrd_item_ops = {...}
const struct xlog_recover_item_ops xlog_attrd_item_ops = {...}

because then all the related functionality is grouped together. It
also puts all the ops structures together in the one place, so we
don't have to jump around all over the file when just looking at
what ops the items run at different times...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v24 06/11] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  2021-08-24 22:44 ` [PATCH v24 06/11] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred Allison Henderson
  2021-08-30 10:27   ` Chandan Babu R
@ 2021-09-01  3:47   ` Dave Chinner
  2021-09-01  5:34     ` Allison Henderson
  1 sibling, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2021-09-01  3:47 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Tue, Aug 24, 2021 at 03:44:29PM -0700, Allison Henderson wrote:
> From: Allison Collins <allison.henderson@oracle.com>
> 
> These routines set up and queue new deferred attribute operations.
> These functions are meant to be called by any routine needing to
> initiate a deferred attribute operation as opposed to the existing
> inline operations. New helper function xfs_attr_item_init also added.
> 
> Finally enable delayed attributes in xfs_attr_set and xfs_attr_remove.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
.....
>  
> +STATIC int
> +xfs_attr_item_init(
> +	struct xfs_da_args	*args,
> +	unsigned int		op_flags,	/* op flag (set or remove) */
> +	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
> +{
> +
> +	struct xfs_attr_item	*new;
> +
> +	new = kmem_zalloc(sizeof(struct xfs_attr_item), KM_NOFS);

In transaction context here so we don't need KM_NOFS.

> +	new->xattri_op_flags = op_flags;
> +	new->xattri_dac.da_args = args;
> +
> +	*attr = new;
> +	return 0;
> +}

Why doesn't this just return the object or NULL on allocation 
failure? What other error could it ever return?

> +
> +/* Sets an attribute for an inode as a deferred operation */
> +int
> +xfs_attr_set_deferred(
> +	struct xfs_da_args	*args)
> +{
> +	struct xfs_attr_item	*new;
> +	int			error = 0;
> +
> +	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
> +	if (error)
> +		return error;

i.e.
	attri = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET);
	if (!attri)
		return -ENOMEM;

> +
> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> +
> +	return 0;
> +}
> +
> +/* Removes an attribute for an inode as a deferred operation */
> +int
> +xfs_attr_remove_deferred(
> +	struct xfs_da_args	*args)
> +{
> +
> +	struct xfs_attr_item	*new;
> +	int			error;
> +
> +	error  = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
> +	if (error)
> +		return error;
> +
> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
> +
> +	return 0;
> +}

We really should not use "new" as a variable name. As a general
rule, the common pattern set by this file is that xfs_attri_item
objects in a function are named "attri". Just because it's newly
allocated doesn't mean we should use a different convention for
naming xfs_attri_item objects...
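
For the archives, both points above (return the object or NULL, and
call it "attri") can be sketched in userspace roughly as follows. This
is only an illustration under stated assumptions: calloc() stands in
for kmem_zalloc(), the structures are cut down to the fields used here
(the da_args pointer is flattened into the item), the flag values are
placeholders, and attr_set_deferred_sketch() is a hypothetical name,
not a function from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Cut-down stand-ins for the kernel structures; assumptions only. */
struct xfs_da_args {
	const char		*name;
};

struct xfs_attr_item {
	struct xfs_da_args	*da_args;	/* flattened for the sketch */
	unsigned int		xattri_op_flags;
};

#define XFS_ATTR_OP_FLAGS_SET		1	/* placeholder value */
#define XFS_ATTR_OP_FLAGS_REMOVE	2	/* placeholder value */

/* Return the new item, or NULL if the allocation fails. */
static struct xfs_attr_item *
xfs_attr_item_init(
	struct xfs_da_args	*args,
	unsigned int		op_flags)
{
	struct xfs_attr_item	*attri;

	/* calloc() is the userspace stand-in for kmem_zalloc() */
	attri = calloc(1, sizeof(*attri));
	if (!attri)
		return NULL;

	attri->xattri_op_flags = op_flags;
	attri->da_args = args;
	return attri;
}

/* Hypothetical caller: NULL is the only failure, so map it to -ENOMEM. */
static int
attr_set_deferred_sketch(
	struct xfs_da_args	*args,
	struct xfs_attr_item	**out)
{
	struct xfs_attr_item	*attri;

	attri = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET);
	if (!attri)
		return -ENOMEM;

	/* the kernel code would hand attri to xfs_defer_add() here */
	*out = attri;
	return 0;
}
```

The caller no longer needs an output parameter or an error variable
for the init step, which is the simplification being asked for.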

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v24 03/11] xfs: Set up infrastructure for log attribute replay
  2021-08-31 21:52   ` Dave Chinner
@ 2021-09-01  5:34     ` Allison Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-09-01  5:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs



On 8/31/21 2:52 PM, Dave Chinner wrote:
> On Tue, Aug 24, 2021 at 03:44:26PM -0700, Allison Henderson wrote:
>> Currently attributes are modified directly across one or more
>> transactions. But they are not logged or replayed in the event of an
>> error. The goal of log attr replay is to enable logging and replaying
>> of attribute operations using the existing delayed operations
>> infrastructure.  This will later enable the attributes to become part of
>> larger multi part operations that also must first be recorded to the
>> log.  This is mostly of interest in the scheme of parent pointers which
>> would need to maintain an attribute containing parent inode information
>> any time an inode is moved, created, or removed.  Parent pointers would
>> then be of interest to any feature that would need to quickly derive an
>> inode path from the mount point. Online scrub, nfs lookups and fs grow
>> or shrink operations are all features that could take advantage of this.
>>
>> This patch adds two new log item types for setting or removing
>> attributes as deferred operations.  The xfs_attri_log_item will log an
>> intent to set or remove an attribute.  The corresponding
>> xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
>> freed once the transaction is done.  Both log items use a generic
>> xfs_attr_log_format structure that contains the attribute name, value,
>> flags, inode, and an op_flag that indicates if the operation is a set
>> or remove.
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> ---
>>   fs/xfs/Makefile                 |   1 +
>>   fs/xfs/libxfs/xfs_attr.c        |   5 +-
>>   fs/xfs/libxfs/xfs_attr.h        |  31 +++
>>   fs/xfs/libxfs/xfs_defer.h       |   2 +
>>   fs/xfs/libxfs/xfs_log_format.h  |  44 +++-
>>   fs/xfs/libxfs/xfs_log_recover.h |   2 +
>>   fs/xfs/scrub/common.c           |   2 +
>>   fs/xfs/xfs_attr_item.c          | 453 ++++++++++++++++++++++++++++++++
> 
> Comment on the overall structure of this file now I've been trying
> to navigate through it for a little while. It is structured like:
> 
> <some attri stuff>
> <some attrd stuff>
> static const struct xfs_item_ops xfs_attrd_item_ops = {...}
> <some more attri stuff>
> static const struct xfs_item_ops xfs_attri_item_ops = {...}
> <some attri log recovery stuff>
> <some attrd log recovery stuff>
> 
> IOWs, the attri and attrd functions are interleaved in non-obvious
> ways and that makes it hard to navigate around when trying to find
> related information. It would make more sense to me to structure
> this as:
> 
> <attri stuff>
> <attri log recovery stuff>
> <some attrd stuff>
> <attrd log recovery stuff>
> static const struct xfs_item_ops xfs_attri_item_ops = {...}
> const struct xlog_recover_item_ops xlog_attri_item_ops = {...}
> static const struct xfs_item_ops xfs_attrd_item_ops = {...}
> const struct xlog_recover_item_ops xlog_attrd_item_ops = {...}
> 
> because then all the related functionality is grouped together. It
> also puts all the ops structures together in the one place, so we
> don't have to jump around all over the file when just looking at
> what ops the items run at different times...
> 
Sure, will make a note to re-arrange some of these in the next version

Allison

> Cheers,
> 
> Dave.
> 


* Re: [PATCH v24 06/11] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred
  2021-09-01  3:47   ` Dave Chinner
@ 2021-09-01  5:34     ` Allison Henderson
  0 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-09-01  5:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs



On 8/31/21 8:47 PM, Dave Chinner wrote:
> On Tue, Aug 24, 2021 at 03:44:29PM -0700, Allison Henderson wrote:
>> From: Allison Collins <allison.henderson@oracle.com>
>>
>> These routines set up and queue a new deferred attribute operation.
>> These functions are meant to be called by any routine needing to
>> initiate a deferred attribute operation as opposed to the existing
>> inline operations. New helper function xfs_attr_item_init also added.
>>
>> Finally enable delayed attributes in xfs_attr_set and xfs_attr_remove.
>>
>> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> .....
>>   
>> +STATIC int
>> +xfs_attr_item_init(
>> +	struct xfs_da_args	*args,
>> +	unsigned int		op_flags,	/* op flag (set or remove) */
>> +	struct xfs_attr_item	**attr)		/* new xfs_attr_item */
>> +{
>> +
>> +	struct xfs_attr_item	*new;
>> +
>> +	new = kmem_zalloc(sizeof(struct xfs_attr_item), KM_NOFS);
> 
> In transaction context here so we don't need KM_NOFS.
ok, will remove

> 
>> +	new->xattri_op_flags = op_flags;
>> +	new->xattri_dac.da_args = args;
>> +
>> +	*attr = new;
>> +	return 0;
>> +}
> 
> Why doesn't this just return the object or NULL on allocation
> failure? What other error could it ever return?
I had adopted this function signature just to be consistent with other 
*_item_init routines at the time.  Mostly just trying to be uniform, 
though they may have changed since.  This patch spent some time on the 
back burner while we were more focused on the state machine refactoring.

> 
>> +
>> +/* Sets an attribute for an inode as a deferred operation */
>> +int
>> +xfs_attr_set_deferred(
>> +	struct xfs_da_args	*args)
>> +{
>> +	struct xfs_attr_item	*new;
>> +	int			error = 0;
>> +
>> +	error = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET, &new);
>> +	if (error)
>> +		return error;
> 
> i.e.
> 	attri = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_SET);
> 	if (!attri)
> 		return -ENOMEM;
> 
>> +
>> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
>> +
>> +	return 0;
>> +}
>> +
>> +/* Removes an attribute for an inode as a deferred operation */
>> +int
>> +xfs_attr_remove_deferred(
>> +	struct xfs_da_args	*args)
>> +{
>> +
>> +	struct xfs_attr_item	*new;
>> +	int			error;
>> +
>> +	error  = xfs_attr_item_init(args, XFS_ATTR_OP_FLAGS_REMOVE, &new);
>> +	if (error)
>> +		return error;
>> +
>> +	xfs_defer_add(args->trans, XFS_DEFER_OPS_TYPE_ATTR, &new->xattri_list);
>> +
>> +	return 0;
>> +}
> 
> We really should not use "new" as a variable name. As a general
> rule, the common pattern set by this file is that xfs_attri_item
> objects in a function are named "attri". Just because it's newly
> allocated doesn't mean we should use a different convention for
> naming xfs_attri_item objects...
> 
Ok, I had seen the pattern around and reused it.  Will change to attri

Allison

> Cheers,
> 
> Dave.
> 


* Re: [PATCH v24 00/11] Log Attribute Replay
  2021-08-31 18:27   ` Allison Henderson
@ 2021-09-01  6:29     ` Dave Chinner
  0 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2021-09-01  6:29 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Tue, Aug 31, 2021 at 11:27:48AM -0700, Allison Henderson wrote:
> 
> 
> On 8/30/21 5:20 PM, Dave Chinner wrote:
> > On Tue, Aug 24, 2021 at 03:44:23PM -0700, Allison Henderson wrote:
> > > Hi all,
> > > 
> > > This set is a subset of a larger series for parent pointers. Delayed attributes allow
> > > attribute operations (set and remove) to be logged and committed in the same
> > > way that other delayed operations do. This allows more complex operations (like
> > > parent pointers) to be broken up into multiple smaller transactions. To do
> > > this, the existing attr operations must be modified to operate as a delayed
> > > operation.  This means that they cannot roll, commit, or finish transactions.
> > > Instead, they return -EAGAIN to allow the calling function to handle the
> > > transaction.  In this series, we focus on only the delayed attribute portion.
> > > We will introduce parent pointers in a later set.
> > > 
> > > The set as a whole is a bit much to digest at once, so I usually send out the
> > > smaller sub series to reduce reviewer burn out.  But the entire extended series
> > > is visible through the included github links.
> 
> Some of this we worked out in the chat last night, but I will echo it here
> for the archives
> 
> > 
> > Ok, so like I did with Darrick's deferred inactivation series, the
> > first thing I'm doing here is throwing this patchset at
> > scalability/performance workloads and finding out what is different.
> > 
> > I've merged this series with 5.14 + xfs/for-next + xfs-cil-scale and
> > then run some tests on it. First up is fsmark creating zero length
> > files w/ 64 byte xattrs. This should stress only shortform attribute
> > manipulations.
> > 
> > I have not enabled delayed attributes yet (i.e.
> > /sys/fs/xfs/debug/larp = 0)
> > 
> > First thing I notice is the transaction commit rate during create is
> > up around 900k/s, so we are doing 3 transactions per inode - 1 for
> > create, 2 for attributes. That looks like a regression - existing
> > shortform attribute creation only takes a single transaction commit,
> > so this workload prior to this patchset only ran at 600k commits/s.
> > 
> > Note that the only reason I'm getting 900k transactions/s is the
> > CIL scalability patchset - without that the system tops out at ~800k
> > transactions/s and so this would be a significant performance
> > regression (20%) vs the current xfs/for-next code.
> > 
> > Essentially, this looks like we are doing an extra transaction
> > commit to defer the creation of the attribute, then doing another
> > transaction to actually modify the attribute. i.e.:
> > 
> >   - 11.04% xfs_attr_set
> >      - 8.70% xfs_trans_commit
> >         - 8.69% __xfs_trans_commit
> > 	  - 5.10% xfs_defer_finish_noroll
> > 	     - 3.74% xfs_defer_trans_roll
> > 		- 3.57% xfs_trans_roll
> > 		   - 3.13% __xfs_trans_commit
> > 		      - 3.01% xlog_cil_commit
> > 			   0.66% down_read
> > 			   0.63% xfs_log_ticket_regrant
> > 	     - 1.16% xfs_attr_finish_item
> > 		- 1.06% xfs_trans_attr_finish_update
> > 		   - 1.03% xfs_attr_set_iter
> > 		      - 1.01% xfs_attr_sf_addname
> > 			 - 0.99% xfs_attr_try_sf_addname
> > 			    - 0.61% xfs_attr_shortform_addname
> > 				 0.55% xfs_attr_shortform_add
> > 
> > 
> > AFAICT, for non-delayed attributes, this first transaction commit
> > logs the inode but does not create intent or intent done items
> > (returns NULL for both operations), so just rolls and runs the
> > ->finish_item. So it would seem that the first transaction just
> > changes the inode timestamps and does nothing else.
> > 
> > Firstly, this means the inode timestamp change is not atomic w.r.t.
> > the attribute change the timestamp change relates to and it's
> > essentially new overhead for the non-delayed path.
> > 
> > Looking at the unlink path, I see the same thing - there's an extra
> > transaction for the attr remove path, the same as the attr set path.
> > This drives the unlink path to 1.1 million transaction commits/sec
> > instead of 800k/s, so it's likely that there's a substantial
> > performance regression here on a kernel without the CIL scalability
> > patchset.
> > 
> > IOWs, there's significant behavioural changes with the non-delayed
> > logging version of this patchset, both in terms of performance and
> > the atomicity of changes that appear in the journal and hence
> > recovery behaviour.
> > 
> > At this point I have to ask: why are we trying to retain the "old"
> > way of doing things (even for testing) if it is substantially
> > changing behaviour and on-disk journal contents for attribute
> > modifications?
> Per the chat discussion, we have to keep both methods since sb v4 would not
> use the new log entries.

Yup, but we likely have better reasons than this. Tests I've run
today have indicated that as the xattr size goes up, the performance
regression of the delayed attrs gets worse. At max sized xattrs,
performance is only ~25% of the existing attr implementation.

Largely the problem is the number of copies of the xattr value we
end up with in memory when we are using delayed attrs. In the current
case, we have this all in a single syscall context:

	- xattr value allocated by VFS
	- attached to xfs_da_args
	- xfs_da_args passed to xfs_attr_rmtval_set_value() where we
	  allocate buffer(s) to hold the encoded value.
	- buffer is written to disk.
	- xattr VFS value freed.

And the xfs_buf is freed when memory pressure occurs.

So, essentially we have two copies of the xattr value and two
memcpy()s of it to get it to disk.

Now, with delayed attrs we have this in the syscall context:

	- xattr value allocated by VFS
	- attached to xfs_da_args
	- attached to xfs_attri_item
	- attri item intent gets created, allocates a 64k+ shadow
	  buffer and memcpy()s the value into it.
	- intent gets committed and attached to the CIL
	- xfs_da_args passed to xfs_attr_rmtval_set_value() where we
	  allocate buffer(s) to hold the encoded value.
	- buffer is written to disk.
	- xattr VFS value freed.

Now userspace runs more xattr creation syscalls, all queuing up
shadow buffers on the CIL. memory footprint goes up massively.

Some time later, the CIL commits and we:
	- shadow buffer attached to CIL commit
	- memcpy() the value from the shadow buffer to the iclog
	- commit the CIL checkpoint
	- on CIL checkpoint completion, shadow buffer is freed.

And the xfs_buf is freed when memory pressure occurs.

The result is that with delayed attrs, large attr creation hammers
the page allocator really badly - more than 50% of the CPU time is
spent trying to allocate pages and compact memory into contiguous
regions and so performance goes way down. e.g.

- 51.90% xlog_cil_commit
 - 36.96% kvmalloc_node
    - 36.51% __kmalloc_node
       - 36.32% kmalloc_large_node
	  - 36.24% __alloc_pages
	     - 34.53% __alloc_pages_slowpath.constprop.0
		- 33.80% __alloc_pages_direct_compact
		   - 33.62% try_to_compact_pages
		      - compact_zone_order
			 - 30.58% compact_zone
			      14.72% PageHuge

I suspect we need to look at our use of kvmalloc() and the flags we
pass it because I think if we can't get contiguous ranges
immediately, we should go straight to vmalloc rather than burn a
dozen CPUs trying to compact memory...

The amount we log also goes way up.  At 16 threads and 4kB xattrs,
the log alone is sustaining more than 1GB/s throughput at 50k xattr
creates/s.  There is over 2GB/s going to disk.

With 64k xattrs, there is over 2.5GB/s being written to the log and
just over 3GB/s being written to disk. Performance is about 9000
xattr creates/s.

In comparison, with delayed attrs turned off for 64k xattrs, log
throughput is roughly 500MB/s, disk throughput is between 3.5-4GB/s
and the create rate is around 45,000 xattrs/s. So you can see that
logging large xattrs really hurts performance.

We want delayed attrs for modifying attributes atomically
with other inode modifications (i.e. parent pointers, security
labels, ACLs, etc) and none of these use cases use large xattrs -
they are all relatively small and so the logging and memory overhead
of delayed attrs isn't a huge deal for them. For anything else,
small or large, a 10-15% regression is a deal breaker (think ceph,
samba, etc).

So I think we're going to need to select attribute modification
behaviour at the call site. e.g. from xattr syscalls we don't get
any real integrity benefit from logging xattrs, so maybe this path
always uses the unlogged path. Then we can change the internal create
path to use delayed attrs for parent pointers and security attrs.
As Darrick said, this is likely what the current larp debug knob
should turn into - caller selected behaviour.
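
To make "caller selected behaviour" concrete, here is a purely
illustrative userspace sketch of such a policy. None of these names
exist in the series; the enum, its members and attr_use_logged_path()
are hypothetical and only encode the split suggested above: xattr
syscalls take the unlogged path, internal parent pointer and security
attr updates take the logged path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: who is asking for the attribute modification. */
enum attr_caller {
	ATTR_CALLER_SYSCALL,		/* setxattr()/removexattr() from userspace */
	ATTR_CALLER_PARENT_POINTER,	/* internal create/rename/unlink path */
	ATTR_CALLER_SECURITY,		/* security attr set at inode creation */
};

/*
 * Hypothetical policy: log the attr only where it must be atomic with
 * other inode modifications. Everything else stays on the unlogged path.
 */
static bool
attr_use_logged_path(
	enum attr_caller	caller)
{
	switch (caller) {
	case ATTR_CALLER_PARENT_POINTER:
	case ATTR_CALLER_SECURITY:
		return true;
	case ATTR_CALLER_SYSCALL:
	default:
		return false;
	}
}
```

The point of the sketch is only that the decision is made per call
site rather than by a global debug knob.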

> > Ok, there's the first failure.
> > 
> > This looks like it's a problem with xfs_attri_item_{size,format} in
> > calculating the number of bytes to log. They use ATTR_NVEC_SIZE() to
> > calculate the number of bytes of copy from the attribute item which
> > rounds up the length to copy to 4 byte aligned values. I'm not sure
> > what this function is calculating:
> > 
> > /* iovec length must be 32-bit aligned */
> > static inline size_t ATTR_NVEC_SIZE(size_t size)
> > {
> >          return size == sizeof(int32_t) ? size :
> > 	               sizeof(int32_t) + round_up(size, sizeof(int32_t));
> > }
> > 
> > It appears to be saying if the size == 4, then return 4, otherwise
> > return 4 + roundup(size)... which leads me to struct
> > xfs_attri_log_format:
> > 
> > struct xfs_attri_log_format {
> >          uint16_t        alfi_type;      /* attri log item type */
> >          uint16_t        alfi_size;      /* size of this item */
> >          uint32_t        __pad;          /* pad to 64 bit aligned */
> >          uint64_t        alfi_id;        /* attri identifier */
> >          uint64_t        alfi_ino;       /* the inode for this attr operation */
> >          uint32_t        alfi_op_flags;  /* marks the op as a set or remove */
> >          uint32_t        alfi_name_len;  /* attr name length */
> >          uint32_t        alfi_value_len; /* attr value length */
> >          uint32_t        alfi_attr_flags;/* attr flags */
> > };
> > 
> > I don't see where the extra 4 bytes for the attribute vector size
> > comes from. It's not needed to store the length, so this could
> > oversize the amount of data to be copied from the source
> > buffer by up to 7 bytes?
> > 
> > I can see that it might need rounding with the existing
> > log code (because the formatter is responsible for 32 bit alignment
> > of log vectors), but that goes away with the CIL scalability
> > patchset that always aligns iovecs to 4 byte alignment so the
> > formatters do not need to do that.
> 
> I think we figured this out last night, initially this was here for an
> assertion check in the log code, but I think just the round up will suffice
> for the check.

I've got a cleaner solution in the works - we should be hiding the
alignment behind the log iovec API, not force callers to know about
it and always get it right.
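
For anyone following along, the over-read is easy to see in
userspace. The sketch below re-implements the quoted ATTR_NVEC_SIZE()
calculation next to a plain round-up to 4 bytes. The function names
(attr_nvec_size, round_up_4, overread) are mine and hypothetical; only
the arithmetic is taken from the code quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain round-up to a 4 byte boundary. */
static inline size_t round_up_4(size_t size)
{
	return (size + 3) & ~(size_t)3;
}

/* Same arithmetic as the ATTR_NVEC_SIZE() helper quoted above. */
static inline size_t attr_nvec_size(size_t size)
{
	return size == sizeof(int32_t) ? size :
	       sizeof(int32_t) + round_up_4(size);
}

/* Bytes read beyond the end of a source buffer of "size" bytes. */
static inline size_t overread(size_t copy_len, size_t size)
{
	return copy_len - size;
}
```

A 1 byte name gives a copy length of 8, i.e. 7 bytes beyond the
source buffer, and an already aligned 8 byte name still gives 12.
The plain round-up is smaller but can still read up to 3 bytes too
many if the caller's buffer is not padded, which is why copying the
exact length and rounding only the recorded iovec length is the
safer split.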

> > Hiding it in a "macro" is not necessary, either - look at how
> > xfs_inode_item_{data,attr}_fork_size handle the rounding up of the
> > local format fork size. They round up the fork byte count to 4
> > directly, and the format code copies those bytes because
> > xfs_idata_realloc() allocates those bytes.
> > 
> > However, for the attribute buffers, this isn't guaranteed. Look at
> > xfs_xattr_set():
> > 
> >          struct xfs_da_args      args = {
> >                  .dp             = XFS_I(inode),
> >                  .attr_filter    = handler->flags,
> >                  .attr_flags     = flags,
> >                  .name           = name,
> >                  .namelen        = strlen(name),
> >                  .value          = (void *)value,
> >                  .valuelen       = size,
> >          };
> > 
> > There is no rounding up of the name or value lengths, and these end
> > up directly referenced by the deferred logging via xfs_attr_log_item()
> > and attrip->da_args->...
> > 
> >          attrip->attri_name = (void *)attr->xattri_da_args->name;
> >          attrip->attri_value = attr->xattri_da_args->value;
> >          attrip->attri_name_len = attr->xattri_da_args->namelen;
> >          attrip->attri_value_len = attr->xattri_da_args->valuelen;
> > 
> > We then pass those pointers directly to xlog_copy_iovec() but with a
> > rounded up length that is longer than the source buffer:
> > 
> >          xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
> >                          attrip->attri_name,
> >                          ATTR_NVEC_SIZE(attrip->attri_name_len));
> >          if (attrip->attri_value_len > 0)
> >                  xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
> >                                  attrip->attri_value,
> >                                  ATTR_NVEC_SIZE(attrip->attri_value_len));
> > 
> > So while this might not be the source of the above crash, it's
> > certainly a bug that needs fixing.
> > 
> > At this point, I'm just going to hack on the code to make it work,
> > and we can go from there...
> 
> I think when we left off last night, we are more concerned with the extra
> transaction overhead in the perf captures.  I will see if I can replicate
> what you are seeing with perf and maybe we can work out some short cuts.
> Thanks for your help here!

Yup, there's a few issues. I've hacked out the initial transaction
for non-delayed attrs and that brings the commit count down as
expected. It might be acceptable, but we'll see.

I've also added intent whiteouts to try to avoid having to log
intents that are completed before the intent has been written to the
log. With that in place, the above large xattr creation workloads
have relatively consistent log throughput of around 600-700MB/s, and
performance for 4k and 64kB xattrs roughly doubles. It's still way
behind the current implementation, though, because it hammers the
page allocator even harder now. However, intent whiteouts are
generic, so I'll write this up into a separate patchset that
addresses all the other types of intents we have as well.

There's some other things we can potentially do to minimise the
memory footprint and the number of memcpy()s of the logged values,
so I'll look into these tomorrow.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* [PATCH 0/5] xfs: various logged attribute fixes
  2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
                   ` (11 preceding siblings ...)
  2021-08-31  0:20 ` [PATCH v24 00/11] Log Attribute Replay Dave Chinner
@ 2021-09-01  7:30 ` Dave Chinner
  2021-09-01  7:30   ` [PATCH 1/5] xfs: fix fallthrough annotations in xfs_attr_set_iter() Dave Chinner
                     ` (5 more replies)
  12 siblings, 6 replies; 45+ messages in thread
From: Dave Chinner @ 2021-09-01  7:30 UTC (permalink / raw)
  To: linux-xfs; +Cc: allison.henderson

Hi Allison,

Here is the first set of fixups I've made while testing this
series. The intent whiteouts are generic so I'm keeping that
separate, even though it's the change that fixes most of the
performance regressions for small xattr sizes that result from
enabling logged attributes.

Cheers,

Dave.



* [PATCH 1/5] xfs: fix fallthrough annotations in xfs_attr_set_iter()
  2021-09-01  7:30 ` [PATCH 0/5] xfs: various logged attribute fixes Dave Chinner
@ 2021-09-01  7:30   ` Dave Chinner
  2021-09-01  7:30   ` [PATCH 2/5] xfs: fix flags passed to kvmalloc() by xfs_attri_init() Dave Chinner
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2021-09-01  7:30 UTC (permalink / raw)
  To: linux-xfs; +Cc: allison.henderson

From: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_attr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 7150f0e051a0..e49284325d04 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -448,7 +448,7 @@ xfs_attr_set_iter(
 			return -EAGAIN;
 		}
 
-		/* fallthrough */
+		fallthrough;
 	case XFS_DAS_FLIP_LFLAG:
 		/*
 		 * Dismantle the "old" attribute/value pair by removing a
@@ -566,7 +566,7 @@ xfs_attr_set_iter(
 			return -EAGAIN;
 		}
 
-		/* fallthrough */
+		fallthrough;
 	case XFS_DAS_FLIP_NFLAG:
 		/*
 		 * Dismantle the "old" attribute/value pair by removing a
-- 
2.31.1



* [PATCH 2/5] xfs: fix flags passed to kvmalloc() by xfs_attri_init()
  2021-09-01  7:30 ` [PATCH 0/5] xfs: various logged attribute fixes Dave Chinner
  2021-09-01  7:30   ` [PATCH 1/5] xfs: fix fallthrough annotations in xfs_attr_set_iter() Dave Chinner
@ 2021-09-01  7:30   ` Dave Chinner
  2021-09-01  7:30   ` [PATCH 3/5] xfs: hide log iovec alignment constraints Dave Chinner
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2021-09-01  7:30 UTC (permalink / raw)
  To: linux-xfs; +Cc: allison.henderson

From: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_item.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 2f48e84754ae..4072972a2ed8 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -184,7 +184,7 @@ xfs_attri_init(
 	uint				size;
 
 	size = sizeof(struct xfs_attri_log_item) + buffer_size;
-	attrip = kvmalloc(size, KM_ZERO);
+	attrip = kvmalloc(size, GFP_KERNEL | __GFP_ZERO);
 	if (attrip == NULL)
 		return NULL;
 
-- 
2.31.1



* [PATCH 3/5] xfs: hide log iovec alignment constraints
  2021-09-01  7:30 ` [PATCH 0/5] xfs: various logged attribute fixes Dave Chinner
  2021-09-01  7:30   ` [PATCH 1/5] xfs: fix fallthrough annotations in xfs_attr_set_iter() Dave Chinner
  2021-09-01  7:30   ` [PATCH 2/5] xfs: fix flags passed to kvmalloc() by xfs_attri_init() Dave Chinner
@ 2021-09-01  7:30   ` Dave Chinner
  2021-09-01  7:30   ` [PATCH 4/5] xfs: fix attribute log iovec sizing Dave Chinner
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2021-09-01  7:30 UTC (permalink / raw)
  To: linux-xfs; +Cc: allison.henderson

From: Dave Chinner <dchinner@redhat.com>

Callers currently have to round out the size of buffers to match the
alignment constraints of log iovecs and xlog_write(). They should not
need to know this detail, so introduce a new function to calculate
the iovec length (for use in ->iop_size implementations). Also
modify xlog_finish_iovec() to round up the length to the correct
alignment so the callers don't need to do this, either.

Convert the only user - inode forks - of this alignment rounding to
use the new interface.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_fork.c | 10 ++--------
 fs/xfs/xfs_inode_item.c        | 25 +++++++------------------
 fs/xfs/xfs_log.h               | 20 ++++++++++++++++++++
 3 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 1d174909f9bd..2690f5f47b7e 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -50,8 +50,7 @@ xfs_init_local_fork(
 		mem_size++;
 
 	if (size) {
-		real_size = roundup(mem_size, 4);
-		ifp->if_u1.if_data = kmem_alloc(real_size, KM_NOFS);
+		ifp->if_u1.if_data = kmem_alloc(mem_size, KM_NOFS);
 		memcpy(ifp->if_u1.if_data, data, size);
 		if (zero_terminate)
 			ifp->if_u1.if_data[size] = '\0';
@@ -497,12 +496,7 @@ xfs_idata_realloc(
 		return;
 	}
 
-	/*
-	 * For inline data, the underlying buffer must be a multiple of 4 bytes
-	 * in size so that it can be logged and stay on word boundaries.
-	 * We enforce that here.
-	 */
-	ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, roundup(new_size, 4),
+	ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, new_size,
 				      GFP_NOFS | __GFP_NOFAIL);
 	ifp->if_bytes = new_size;
 }
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 0659d19c211e..806562d82ced 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -70,7 +70,7 @@ xfs_inode_item_data_fork_size(
 	case XFS_DINODE_FMT_LOCAL:
 		if ((iip->ili_fields & XFS_ILOG_DDATA) &&
 		    ip->i_df.if_bytes > 0) {
-			*nbytes += roundup(ip->i_df.if_bytes, 4);
+			*nbytes += xlog_calc_iovec_len(ip->i_df.if_bytes);
 			*nvecs += 1;
 		}
 		break;
@@ -111,7 +111,7 @@ xfs_inode_item_attr_fork_size(
 	case XFS_DINODE_FMT_LOCAL:
 		if ((iip->ili_fields & XFS_ILOG_ADATA) &&
 		    ip->i_afp->if_bytes > 0) {
-			*nbytes += roundup(ip->i_afp->if_bytes, 4);
+			*nbytes += xlog_calc_iovec_len(ip->i_afp->if_bytes);
 			*nvecs += 1;
 		}
 		break;
@@ -203,17 +203,12 @@ xfs_inode_item_format_data_fork(
 			~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT | XFS_ILOG_DEV);
 		if ((iip->ili_fields & XFS_ILOG_DDATA) &&
 		    ip->i_df.if_bytes > 0) {
-			/*
-			 * Round i_bytes up to a word boundary.
-			 * The underlying memory is guaranteed
-			 * to be there by xfs_idata_realloc().
-			 */
-			data_bytes = roundup(ip->i_df.if_bytes, 4);
 			ASSERT(ip->i_df.if_u1.if_data != NULL);
 			ASSERT(ip->i_disk_size > 0);
 			xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_ILOCAL,
-					ip->i_df.if_u1.if_data, data_bytes);
-			ilf->ilf_dsize = (unsigned)data_bytes;
+					ip->i_df.if_u1.if_data,
+					ip->i_df.if_bytes);
+			ilf->ilf_dsize = (unsigned)ip->i_df.if_bytes;
 			ilf->ilf_size++;
 		} else {
 			iip->ili_fields &= ~XFS_ILOG_DDATA;
@@ -287,17 +282,11 @@ xfs_inode_item_format_attr_fork(
 
 		if ((iip->ili_fields & XFS_ILOG_ADATA) &&
 		    ip->i_afp->if_bytes > 0) {
-			/*
-			 * Round i_bytes up to a word boundary.
-			 * The underlying memory is guaranteed
-			 * to be there by xfs_idata_realloc().
-			 */
-			data_bytes = roundup(ip->i_afp->if_bytes, 4);
 			ASSERT(ip->i_afp->if_u1.if_data != NULL);
 			xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_IATTR_LOCAL,
 					ip->i_afp->if_u1.if_data,
-					data_bytes);
-			ilf->ilf_asize = (unsigned)data_bytes;
+					ip->i_afp->if_bytes);
+			ilf->ilf_asize = (unsigned)ip->i_afp->if_bytes;
 			ilf->ilf_size++;
 		} else {
 			iip->ili_fields &= ~XFS_ILOG_ADATA;
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index b324d2136e94..0ee0ec7f96c8 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -22,6 +22,17 @@ struct xfs_log_vec {
 
 #define XFS_LOG_VEC_ORDERED	(-1)
 
+/*
+ * Calculate the log iovec length for a given user buffer length. Intended to be
+ * used by ->iop_size implementations when sizing buffers of arbitrary
+ * alignments.
+ */
+static inline int
+xlog_calc_iovec_len(int len)
+{
+	return roundup(len, 4);
+}
+
 void *xlog_prepare_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec **vecp,
 		uint type);
 
@@ -30,6 +41,12 @@ xlog_finish_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec *vec, int len)
 {
 	struct xlog_op_header	*oph = vec->i_addr;
 
+	/*
+	 * Always round up the length to the correct alignment so callers don't
+	 * need to know anything about this log vec layout requirement.
+	 */
+	len = xlog_calc_iovec_len(len);
+
 	/* opheader tracks payload length, logvec tracks region length */
 	oph->oh_len = len;
 
@@ -39,6 +56,9 @@ xlog_finish_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec *vec, int len)
 	vec->i_len = len;
 }
 
+/*
+ * Copy the amount of data requested by the caller into a new log iovec.
+ */
 static inline void *
 xlog_copy_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec **vecp,
 		uint type, void *data, int len)
-- 
2.31.1



* [PATCH 4/5] xfs: fix attribute log iovec sizing
  2021-09-01  7:30 ` [PATCH 0/5] xfs: various logged attribute fixes Dave Chinner
                     ` (2 preceding siblings ...)
  2021-09-01  7:30   ` [PATCH 3/5] xfs: hide log iovec alignment constraints Dave Chinner
@ 2021-09-01  7:30   ` Dave Chinner
  2021-09-01  7:30   ` [PATCH 5/5] [RFC] xfs: don't commit the first deferred transaction without intents Dave Chinner
  2021-09-01 21:43   ` [PATCH 0/5] xfs: various logged attribute fixes Allison Henderson
  5 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2021-09-01  7:30 UTC (permalink / raw)
  To: linux-xfs; +Cc: allison.henderson

From: Dave Chinner <dchinner@redhat.com>

The sizing of the attri name and value log iovecs is incorrect.
This results in memory corruption and crashes on a kernel with
the current CIL scalability patchset applied as it relies on the
callers playing by slightly different alignment rules.

Convert the attri code to the new xlog_calc_iovec_len() API to
fix these issues.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_attr_item.c | 29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 4072972a2ed8..bd4089eb8087 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -42,13 +42,6 @@
 static const struct xfs_item_ops xfs_attri_item_ops;
 static const struct xfs_item_ops xfs_attrd_item_ops;
 
-/* iovec length must be 32-bit aligned */
-static inline size_t ATTR_NVEC_SIZE(size_t size)
-{
-	return size == sizeof(int32_t) ? size :
-	       sizeof(int32_t) + round_up(size, sizeof(int32_t));
-}
-
 static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
 {
 	return container_of(lip, struct xfs_attri_log_item, attri_item);
@@ -89,19 +82,15 @@ xfs_attri_item_size(
 {
 	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
 
-	*nvecs += 1;
-	*nbytes += sizeof(struct xfs_attri_log_format);
+	*nvecs += 2;
+	*nbytes += sizeof(struct xfs_attri_log_format) +
+			xlog_calc_iovec_len(attrip->attri_name_len);
 
-	/* Attr set and remove operations require a name */
-	ASSERT(attrip->attri_name_len > 0);
+	if (!attrip->attri_value_len)
+		return;
 
 	*nvecs += 1;
-	*nbytes += ATTR_NVEC_SIZE(attrip->attri_name_len);
-
-	if (attrip->attri_value_len > 0) {
-		*nvecs += 1;
-		*nbytes += ATTR_NVEC_SIZE(attrip->attri_value_len);
-	}
+	*nbytes += xlog_calc_iovec_len(attrip->attri_value_len);
 }
 
 /*
@@ -137,12 +126,10 @@ xfs_attri_item_format(
 			&attrip->attri_format,
 			sizeof(struct xfs_attri_log_format));
 	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
-			attrip->attri_name,
-			ATTR_NVEC_SIZE(attrip->attri_name_len));
+			attrip->attri_name, attrip->attri_name_len);
 	if (attrip->attri_value_len > 0)
 		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
-				attrip->attri_value,
-				ATTR_NVEC_SIZE(attrip->attri_value_len));
+				attrip->attri_value, attrip->attri_value_len);
 }
 
 /*
-- 
2.31.1
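
To make the sizing change above concrete, the removed helper and the new accounting can be compared in a small userspace sketch. The function names below are hypothetical, and `fmt_size` stands in for sizeof(struct xfs_attri_log_format), which the sketch does not define.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The helper removed by the patch, transcribed for comparison. */
static inline size_t old_attr_nvec_size(size_t size)
{
	return size == sizeof(int32_t) ? size :
	       sizeof(int32_t) + ((size + 3) & ~(size_t)3);
}

/* Userspace equivalent of xlog_calc_iovec_len(): round up to 4 bytes. */
static inline size_t sketch_calc_iovec_len(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/*
 * Mirror of the patched xfs_attri_item_size() accounting. fmt_size is a
 * placeholder for sizeof(struct xfs_attri_log_format).
 */
static inline void sketch_attri_item_size(size_t fmt_size, size_t name_len,
					  size_t value_len, int *nvecs,
					  size_t *nbytes)
{
	/* The format header and the name are always counted. */
	*nvecs += 2;
	*nbytes += fmt_size + sketch_calc_iovec_len(name_len);

	if (!value_len)
		return;

	/* The value region is only counted when there is a value. */
	*nvecs += 1;
	*nbytes += sketch_calc_iovec_len(value_len);
}
```

For a 5-byte name the old helper yields 12 bytes while the new one yields 8, and the two only agree when the length is exactly 4.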


* [PATCH 5/5] [RFC] xfs: don't commit the first deferred transaction without intents
  2021-09-01  7:30 ` [PATCH 0/5] xfs: various logged attribute fixes Dave Chinner
                     ` (3 preceding siblings ...)
  2021-09-01  7:30   ` [PATCH 4/5] xfs: fix attribute log iovec sizing Dave Chinner
@ 2021-09-01  7:30   ` Dave Chinner
  2021-09-01 21:43   ` [PATCH 0/5] xfs: various logged attribute fixes Allison Henderson
  5 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2021-09-01  7:30 UTC (permalink / raw)
  To: linux-xfs; +Cc: allison.henderson

From: Dave Chinner <dchinner@redhat.com>

If the first operation in a string of defer ops has no intents,
then there is no reason to commit it before running the first call
to xfs_defer_finish_one(). This allows the defer ops to be used
effectively for non-intent based operations without requiring an
unnecessary extra transaction commit when first called.

This fixes a regression in per-attribute modification transaction
count when delayed attributes are not being used.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_defer.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 01fcf5e93be5..05983a4a4f91 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -181,7 +181,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
 	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
 };
 
-static void
+static bool
 xfs_defer_create_intent(
 	struct xfs_trans		*tp,
 	struct xfs_defer_pending	*dfp,
@@ -192,6 +192,7 @@ xfs_defer_create_intent(
 	if (!dfp->dfp_intent)
 		dfp->dfp_intent = ops->create_intent(tp, &dfp->dfp_work,
 						     dfp->dfp_count, sort);
+	return dfp->dfp_intent;
 }
 
 /*
@@ -199,16 +200,18 @@ xfs_defer_create_intent(
  * associated extents, then add the entire intake list to the end of
  * the pending list.
  */
-STATIC void
+STATIC bool
 xfs_defer_create_intents(
 	struct xfs_trans		*tp)
 {
 	struct xfs_defer_pending	*dfp;
+	bool				ret = false;
 
 	list_for_each_entry(dfp, &tp->t_dfops, dfp_list) {
 		trace_xfs_defer_create_intent(tp->t_mountp, dfp);
-		xfs_defer_create_intent(tp, dfp, true);
+		ret |= xfs_defer_create_intent(tp, dfp, true);
 	}
+	return ret;
 }
 
 /* Abort all the intents that were committed. */
@@ -459,7 +462,7 @@ int
 xfs_defer_finish_noroll(
 	struct xfs_trans		**tp)
 {
-	struct xfs_defer_pending	*dfp;
+	struct xfs_defer_pending	*dfp = NULL;
 	int				error = 0;
 	LIST_HEAD(dop_pending);
 
@@ -478,17 +481,19 @@ xfs_defer_finish_noroll(
 		 * of time that any one intent item can stick around in memory,
 		 * pinning the log tail.
 		 */
-		xfs_defer_create_intents(*tp);
+		bool has_intents = xfs_defer_create_intents(*tp);
 		list_splice_init(&(*tp)->t_dfops, &dop_pending);
 
-		error = xfs_defer_trans_roll(tp);
-		if (error)
-			goto out_shutdown;
+		if (has_intents || dfp) {
+			error = xfs_defer_trans_roll(tp);
+			if (error)
+				goto out_shutdown;
 
-		/* Possibly relog intent items to keep the log moving. */
-		error = xfs_defer_relog(tp, &dop_pending);
-		if (error)
-			goto out_shutdown;
+			/* Possibly relog intent items to keep the log moving. */
+			error = xfs_defer_relog(tp, &dop_pending);
+			if (error)
+				goto out_shutdown;
+		}
 
 		dfp = list_first_entry(&dop_pending, struct xfs_defer_pending,
 				       dfp_list);
-- 
2.31.1
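
The effect of the new `has_intents || dfp` check on the number of rolls inside this loop can be sketched with a small userspace model. The function names are hypothetical, and the model assumes that all items are queued before the loop starts, that each item finishes in one pass, and that no item queues further work. Any roll or commit done by the caller after the loop is not counted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Roll decision from the patched loop: roll when the newly queued work
 * created intents, or when an earlier pass has already run an item.
 */
static inline bool sketch_should_roll(bool has_intents, bool ran_item_before)
{
	return has_intents || ran_item_before;
}

/*
 * Count the rolls made inside the loop for nr_items deferred items, under
 * the simplifying assumptions stated in the text above.
 */
static inline int sketch_count_rolls(int nr_items, bool items_log_intents)
{
	bool	ran_item_before = false;	/* models "dfp != NULL" */
	int	rolls = 0;
	int	i;

	for (i = 0; i < nr_items; i++) {
		/* Only the first pass finds freshly queued work to log. */
		bool	has_intents = (i == 0) && items_log_intents;

		if (sketch_should_roll(has_intents, ran_item_before))
			rolls++;
		ran_item_before = true;
	}
	return rolls;
}
```

Under these assumptions a single deferred item that logs no intent is finished without any roll in the loop, whereas the pre-patch code would have rolled once before running it.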


* Re: [PATCH 0/5] xfs: various logged attribute fixes
  2021-09-01  7:30 ` [PATCH 0/5] xfs: various logged attribute fixes Dave Chinner
                     ` (4 preceding siblings ...)
  2021-09-01  7:30   ` [PATCH 5/5] [RFC] xfs: don't commit the first deferred transaction without intents Dave Chinner
@ 2021-09-01 21:43   ` Allison Henderson
  5 siblings, 0 replies; 45+ messages in thread
From: Allison Henderson @ 2021-09-01 21:43 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

Alrighty, thanks for all your help on this.  These changes make sense to 
me.  I can add patches 3 and 5 to the set, and the rest look like they 
can be amended into their respective patches in the series.

I will keep an eye out for the whiteout set you are working on then.
Thank you!!

Allison

On 9/1/21 12:30 AM, Dave Chinner wrote:
> Hi Allison,
> 
> Here is the first set of fixups I've made while testing this
> series. The intent whiteouts are generic so I'm keeping that
> separate, even though it's the change that fixes most of the
> performance regressions for small xattr sizes that result from
> enabling logged attributes.
> 
> Cheers,
> 
> Dave.
> 

* Re: [PATCH v24 02/11] xfs: Capture buffers for delayed ops
  2021-08-31 17:50     ` Allison Henderson
@ 2021-09-18  0:02       ` Darrick J. Wong
  0 siblings, 0 replies; 45+ messages in thread
From: Darrick J. Wong @ 2021-09-18  0:02 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Tue, Aug 31, 2021 at 10:50:52AM -0700, Allison Henderson wrote:
> 
> 
> On 8/30/21 10:44 AM, Darrick J. Wong wrote:
> > On Tue, Aug 24, 2021 at 03:44:25PM -0700, Allison Henderson wrote:
> > > This patch enables delayed operations to capture held buffers within
> > > the xfs_defer_capture. Buffers are then rejoined to the new
> > > transaction in xlog_finish_defer_ops.
> > > 
> > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > ---
> > >   fs/xfs/libxfs/xfs_defer.c  | 7 ++++++-
> > >   fs/xfs/libxfs/xfs_defer.h  | 4 +++-
> > >   fs/xfs/xfs_bmap_item.c     | 2 +-
> > >   fs/xfs/xfs_buf.c           | 1 +
> > >   fs/xfs/xfs_buf.h           | 1 +
> > >   fs/xfs/xfs_extfree_item.c  | 2 +-
> > >   fs/xfs/xfs_log_recover.c   | 7 +++++++
> > >   fs/xfs/xfs_refcount_item.c | 2 +-
> > >   fs/xfs/xfs_rmap_item.c     | 2 +-
> > >   9 files changed, 22 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > > index eff4a127188e..d1d09b6aca55 100644
> > > --- a/fs/xfs/libxfs/xfs_defer.c
> > > +++ b/fs/xfs/libxfs/xfs_defer.c
> > > @@ -639,6 +639,7 @@ xfs_defer_ops_capture(
> > >   	dfc = kmem_zalloc(sizeof(*dfc), KM_NOFS);
> > >   	INIT_LIST_HEAD(&dfc->dfc_list);
> > >   	INIT_LIST_HEAD(&dfc->dfc_dfops);
> > > +	INIT_LIST_HEAD(&dfc->dfc_buffers);
> > >   	xfs_defer_create_intents(tp);
> > > @@ -690,7 +691,8 @@ int
> > >   xfs_defer_ops_capture_and_commit(
> > >   	struct xfs_trans		*tp,
> > >   	struct xfs_inode		*capture_ip,
> > > -	struct list_head		*capture_list)
> > > +	struct list_head		*capture_list,
> > > +	struct xfs_buf			*bp)
> > 
> > I wonder if xfs_defer_ops_capture should learn to pick up the inodes and
> > buffers to hold automatically from the transaction that's being
> > committed?  Seeing as xfs_defer_trans_roll already knows how to do that
> > across transaction rolls, and that's more or less the same thing we're
> > doing here, but in a much more roundabout way.
> I see, I suppose it could?  But it wouldn't be used in this case, at
> least not yet.  I sort of got the impression that people like to see things
> added as they are needed, and then unused code culled where it can be.  I
> would think that if the need does arise though, b_delay would be easy to
> expand into a list of xfs_delay_items or something similar to what
> xfs_defer_trans_roll has.

On further thought, I decided that log recovery of intent items really
ought to be able to hold the same number and type of resources across a
transaction "roll" that we can do at runtime, since there really ought
to be no difference.

I wrote a quick patch series to hoist the runtime code path to store the
held buffers and inodes during xfs_defer_trans_roll in an explicitly
named structure with separate save and restore helpers, then refactored
log recovery to use it.

This cleans up the API a bit so that the defer capture system picks
up the held resources from the log item state just like a regular defer
roll.  It occurs to me that another way to handle intent item recovery
would be to create workqueue items for each recovered intent and then
kick the transactions off in background threads so that the end of all
the _recover functions would be xfs_trans_commit.  But that's a lot more
surgery on the recovery code and I don't want to go there right now.

Anyway, I'll send the RFC series shortly and cc you on it, in case you
want to incorporate it at the head of your series and nuke patch #2.

Note: the end of the xattri recovery function then becomes:

	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
	if (error) {
		/* pretend I handled the error */
	}

	if (attr->xattri_dac.leaf_bp)
		xfs_buf_relse(attr->xattri_dac.leaf_bp);
	xfs_iunlock(ip, XFS_ILOCK_EXCL);
	xfs_irele(ip);
	return 0;

I think it's a bug that patch 4 doesn't release leaf_bp, since we
require code that holds a buffer across a commit to relse it explicitly.

--D

> 
> > 
> > >   {
> > >   	struct xfs_mount		*mp = tp->t_mountp;
> > >   	struct xfs_defer_capture	*dfc;
> > > @@ -703,6 +705,9 @@ xfs_defer_ops_capture_and_commit(
> > >   	if (!dfc)
> > >   		return xfs_trans_commit(tp);
> > > +	if (bp && bp->b_transp == tp)
> > > +		list_add_tail(&bp->b_delay, &dfc->dfc_buffers);
> > > +
> > >   	/* Commit the transaction and add the capture structure to the list. */
> > >   	error = xfs_trans_commit(tp);
> > >   	if (error) {
> > > diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> > > index 05472f71fffe..739f70d72fd5 100644
> > > --- a/fs/xfs/libxfs/xfs_defer.h
> > > +++ b/fs/xfs/libxfs/xfs_defer.h
> > > @@ -74,6 +74,7 @@ struct xfs_defer_capture {
> > >   	/* Deferred ops state saved from the transaction. */
> > >   	struct list_head	dfc_dfops;
> > > +	struct list_head	dfc_buffers;
> > >   	unsigned int		dfc_tpflags;
> > >   	/* Block reservations for the data and rt devices. */
> > > @@ -95,7 +96,8 @@ struct xfs_defer_capture {
> > >    * This doesn't normally happen except log recovery.
> > >    */
> > >   int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp,
> > > -		struct xfs_inode *capture_ip, struct list_head *capture_list);
> > > +		struct xfs_inode *capture_ip, struct list_head *capture_list,
> > > +		struct xfs_buf *bp);
> > >   void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp,
> > >   		struct xfs_inode **captured_ipp);
> > >   void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
> > > diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> > > index 03159970133f..51ba8ee368ca 100644
> > > --- a/fs/xfs/xfs_bmap_item.c
> > > +++ b/fs/xfs/xfs_bmap_item.c
> > > @@ -532,7 +532,7 @@ xfs_bui_item_recover(
> > >   	 * Commit transaction, which frees the transaction and saves the inode
> > >   	 * for later replay activities.
> > >   	 */
> > > -	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list);
> > > +	error = xfs_defer_ops_capture_and_commit(tp, ip, capture_list, NULL);
> > >   	if (error)
> > >   		goto err_unlock;
> > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > index 047bd6e3f389..29b4655a0a65 100644
> > > --- a/fs/xfs/xfs_buf.c
> > > +++ b/fs/xfs/xfs_buf.c
> > > @@ -233,6 +233,7 @@ _xfs_buf_alloc(
> > >   	init_completion(&bp->b_iowait);
> > >   	INIT_LIST_HEAD(&bp->b_lru);
> > >   	INIT_LIST_HEAD(&bp->b_list);
> > > +	INIT_LIST_HEAD(&bp->b_delay);
> > >   	INIT_LIST_HEAD(&bp->b_li_list);
> > >   	sema_init(&bp->b_sema, 0); /* held, no waiters */
> > >   	spin_lock_init(&bp->b_lock);
> > > diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
> > > index 6b0200b8007d..c51445705dc6 100644
> > > --- a/fs/xfs/xfs_buf.h
> > > +++ b/fs/xfs/xfs_buf.h
> > > @@ -151,6 +151,7 @@ struct xfs_buf {
> > >   	int			b_io_error;	/* internal IO error state */
> > >   	wait_queue_head_t	b_waiters;	/* unpin waiters */
> > >   	struct list_head	b_list;
> > > +	struct list_head	b_delay;	/* delayed operations list */
> > >   	struct xfs_perag	*b_pag;		/* contains rbtree root */
> > >   	struct xfs_mount	*b_mount;
> > >   	struct xfs_buftarg	*b_target;	/* buffer target (device) */
> > 
> > The bare list-conveyance machinery looks fine to me, but adding 16 bytes
> > to struct xfs_buf for something that only happens during log recovery is
> > rather expensive.  Can you reuse b_list for this purpose?  I think the
> > only user of b_list are the buffer delwri functions, which shouldn't be
> > active here since the xattr recovery mechanism (a) holds the buffer lock
> > and (b) doesn't itself use delwri buffer lists for xattr leaf blocks.
> > 
> > (The AIL uses delwri lists, but it won't touch a locked buffer.)
> > 
> Sure, it sounds like it would work, will try it out.
> 
> > > diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> > > index 3f8a0713573a..046f21338c48 100644
> > > --- a/fs/xfs/xfs_extfree_item.c
> > > +++ b/fs/xfs/xfs_extfree_item.c
> > > @@ -637,7 +637,7 @@ xfs_efi_item_recover(
> > >   	}
> > > -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
> > > +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
> > >   abort_error:
> > >   	xfs_trans_cancel(tp);
> > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > > index 10562ecbd9ea..6a3c0bb16b69 100644
> > > --- a/fs/xfs/xfs_log_recover.c
> > > +++ b/fs/xfs/xfs_log_recover.c
> > > @@ -2465,6 +2465,7 @@ xlog_finish_defer_ops(
> > >   	struct list_head	*capture_list)
> > >   {
> > >   	struct xfs_defer_capture *dfc, *next;
> > > +	struct xfs_buf		*bp, *bnext;
> > >   	struct xfs_trans	*tp;
> > >   	struct xfs_inode	*ip;
> > >   	int			error = 0;
> > > @@ -2489,6 +2490,12 @@ xlog_finish_defer_ops(
> > >   			return error;
> > >   		}
> > > +		list_for_each_entry_safe(bp, bnext, &dfc->dfc_buffers, b_delay) {
> > > +			xfs_trans_bjoin(tp, bp);
> > > +			xfs_trans_bhold(tp, bp);
> > > +			list_del_init(&bp->b_delay);
> > > +		}
> > 
> > Why isn't this in xfs_defer_ops_continue, like the code that extracts
> > the inodes from the capture struct and hands them back to the caller?
> > It's just what was discussed in the last review, is all.  That does look like
> a better place for it though.  Will move there.
> 
> Allison
> 
> > 
> > > +
> > >   		/*
> > >   		 * Transfer to this new transaction all the dfops we captured
> > >   		 * from recovering a single intent item.
> > > diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> > > index 46904b793bd4..a6e7351ca4f9 100644
> > > --- a/fs/xfs/xfs_refcount_item.c
> > > +++ b/fs/xfs/xfs_refcount_item.c
> > > @@ -557,7 +557,7 @@ xfs_cui_item_recover(
> > >   	}
> > >   	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> > > -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
> > > +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
> > >   abort_error:
> > >   	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> > > diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> > > index 5f0695980467..8c70a4af80a9 100644
> > > --- a/fs/xfs/xfs_rmap_item.c
> > > +++ b/fs/xfs/xfs_rmap_item.c
> > > @@ -587,7 +587,7 @@ xfs_rui_item_recover(
> > >   	}
> > >   	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> > > -	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list);
> > > +	return xfs_defer_ops_capture_and_commit(tp, NULL, capture_list, NULL);
> > >   abort_error:
> > >   	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> > > -- 
> > > 2.25.1
> > > 

end of thread, other threads:[~2021-09-18  0:02 UTC | newest]

Thread overview: 45+ messages
2021-08-24 22:44 [PATCH v24 00/11] Log Attribute Replay Allison Henderson
2021-08-24 22:44 ` [PATCH v24 01/11] xfs: Return from xfs_attr_set_iter if there are no more rmtblks to process Allison Henderson
2021-08-24 22:44 ` [PATCH v24 02/11] xfs: Capture buffers for delayed ops Allison Henderson
2021-08-27 11:33   ` Chandan Babu R
2021-08-31 18:03     ` Allison Henderson
2021-08-30 17:44   ` Darrick J. Wong
2021-08-31 17:50     ` Allison Henderson
2021-09-18  0:02       ` Darrick J. Wong
2021-08-24 22:44 ` [PATCH v24 03/11] xfs: Set up infrastructure for log atrribute replay Allison Henderson
2021-08-27 14:17   ` Chandan Babu R
2021-08-31 18:03     ` Allison Henderson
2021-08-31  0:48   ` Dave Chinner
2021-08-31 19:03     ` Allison Henderson
2021-08-31 21:52   ` Dave Chinner
2021-09-01  5:34     ` Allison Henderson
2021-08-24 22:44 ` [PATCH v24 04/11] xfs: Implement attr logging and replay Allison Henderson
2021-08-30  7:47   ` Chandan Babu R
2021-08-31 18:11     ` Allison Henderson
2021-08-24 22:44 ` [PATCH v24 05/11] RFC xfs: Skip flip flags for delayed attrs Allison Henderson
2021-08-30 10:15   ` Chandan Babu R
2021-08-31 18:11     ` Allison Henderson
2021-08-24 22:44 ` [PATCH v24 06/11] xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred Allison Henderson
2021-08-30 10:27   ` Chandan Babu R
2021-08-31 18:12     ` Allison Henderson
2021-09-01  3:47   ` Dave Chinner
2021-09-01  5:34     ` Allison Henderson
2021-08-24 22:44 ` [PATCH v24 07/11] xfs: Remove unused xfs_attr_*_args Allison Henderson
2021-08-24 22:44 ` [PATCH v24 08/11] xfs: Add log attribute error tag Allison Henderson
2021-08-24 22:44 ` [PATCH v24 09/11] xfs: Add larp debug option Allison Henderson
2021-08-30 12:03   ` Chandan Babu R
2021-08-31 18:12     ` Allison Henderson
2021-08-24 22:44 ` [PATCH v24 10/11] xfs: Merge xfs_delattr_context into xfs_attr_item Allison Henderson
2021-08-24 22:44 ` [PATCH v24 11/11] xfs: Add helper function xfs_attr_leaf_addname Allison Henderson
2021-08-30 14:17   ` Chandan Babu R
2021-08-31 18:13     ` Allison Henderson
2021-08-31  0:20 ` [PATCH v24 00/11] Log Attribute Replay Dave Chinner
2021-08-31 18:27   ` Allison Henderson
2021-09-01  6:29     ` Dave Chinner
2021-09-01  7:30 ` [PATCH 0/5] xfs: various logged attribute fixes Dave Chinner
2021-09-01  7:30   ` [PATCH 1/5] xfs: fix fallthrough annotations in xfs_attr_set_iter() Dave Chinner
2021-09-01  7:30   ` [PATCH 2/5] xfs: fix flags passed to kvmalloc() by xfs_attri_init() Dave Chinner
2021-09-01  7:30   ` [PATCH 3/5] xfs: hide log iovec alignment constraints Dave Chinner
2021-09-01  7:30   ` [PATCH 4/5] xfs: fix attribute log iovec sizing Dave Chinner
2021-09-01  7:30   ` [PATCH 5/5] [RFC] xfs: don't commit the first deferred transaction without intents Dave Chinner
2021-09-01 21:43   ` [PATCH 0/5] xfs: various logged attribute fixes Allison Henderson
