All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8 v3] xfs: intent whiteouts
@ 2022-03-14 22:06 Dave Chinner
  2022-03-14 22:06 ` [PATCH 1/8] xfs: hide log iovec alignment constraints Dave Chinner
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Dave Chinner @ 2022-03-14 22:06 UTC (permalink / raw)
  To: linux-xfs

This is a patchset inspired by the performance regressions that were
seen from logging 64k xattrs with Allison's delayed attribute
patchset and trying to work out how to minimise the impact of
logging xattrs. Most of that is explained in the "xfs: intent item
whiteouts" patch, so I won't repeat it here.

The whiteouts massively reduce the journal write overhead of logging
xattrs - with this patchset I've reduced 2.5GB/s of log traffic (16
way file create w/64k xattr workload) down to approximately 220MB of
log traffic, and performance has increased from 9k creates/s to 36k
creates/s. The workload still writes to disk at 2.5GB/s, but that's
what writing 35k x 64k xattrs to disk does.

This is still short of the non-logged attribute mechanism, which
runs at 40-45k creates a second and 3.5-4GB/s to disk, but it brings
logged attrs to within roughly 5-15% of non-logged attrs across the
full range of attribute sizes.

So, while this patchset was clearly insired and has major positive
impact on Allison's delayed attribute work, it also applies
generically to all other intent/intent done pairs that already
exist. Hence I've created this patchset as a stand-alone patchset
that isn't dependent on the delayed attributes being committed, nor
does the delayed attribute patchset need this to function properly.
IOWs, they can be merged in parallel and then the attribute log item
implementation be updated to support whiteouts after the fact.

This patchset is separate to the attr code, though, because
intent whiteouts are not specific to the attr code. They are a
generic mechanism that can be applied to all the intent/intent done
item pairs we already have. This patch set modifies all those
intents to use whiteouts, and so there is benefits from the patch
set for all operations that use these intents.

With respect to the delayed attribute patchset, it can be merged
without whiteout support and still work correctly with/without this
patchset in place. Once both intent whiteouts and delayed attrs are
merged, we can add whiteout support to delayed attributes with only
a few lines of extra code.

Changelog:

Version 3:
- rebased on 5.17-rc4 + xlog-write-rework
- no longer dependent on xfs-cil-scalability, so there's some porting changes
  that was needed to remove all the per-cpu CIL dependencies.

Version 2:
- not published
- rebased on 5.15-rc2 + xfs-cil-scalability
- dropped the kvmalloc changes for CIL shadow buffers as that's a
  separate perf problem and not something related to intent
  whiteouts.
- dropped all the delayed attribute modifications so that the
  patchset is not dependent on Allison's dev tree.
- Thanks to Allison for an initial quick review pass - I haven't
  included those RVB tags because every patch in the series has
  changed since the original RFC posting.

RFC:
- https://lore.kernel.org/linux-xfs/20210909212133.GE2361455@dread.disaster.area/


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/8] xfs: hide log iovec alignment constraints
  2022-03-14 22:06 [PATCH 0/8 v3] xfs: intent whiteouts Dave Chinner
@ 2022-03-14 22:06 ` Dave Chinner
  2022-04-11  5:23   ` Alli
  2022-03-14 22:06 ` [PATCH 2/8] xfs: don't commit the first deferred transaction without intents Dave Chinner
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2022-03-14 22:06 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Callers currently have to round out the size of buffers to match the
aligment constraints of log iovecs and xlog_write(). They should not
need to know this detail, so introduce a new function to calculate
the iovec length (for use in ->iop_size implementations). Also
modify xlog_finish_iovec() to round up the length to the correct
alignment so the callers don't need to do this, either.

Convert the only user - inode forks - of this alignment rounding to
use the new interface.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_fork.c  | 12 +++---------
 fs/xfs/xfs_inode_item.c         | 25 +++++++------------------
 fs/xfs/xfs_inode_item_recover.c |  4 ++--
 fs/xfs/xfs_log.h                | 23 +++++++++++++++++++++++
 4 files changed, 35 insertions(+), 29 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 9149f4f796fc..a46379f6c812 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -36,7 +36,7 @@ xfs_init_local_fork(
 	int64_t			size)
 {
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
-	int			mem_size = size, real_size = 0;
+	int			mem_size = size;
 	bool			zero_terminate;
 
 	/*
@@ -50,8 +50,7 @@ xfs_init_local_fork(
 		mem_size++;
 
 	if (size) {
-		real_size = roundup(mem_size, 4);
-		ifp->if_u1.if_data = kmem_alloc(real_size, KM_NOFS);
+		ifp->if_u1.if_data = kmem_alloc(mem_size, KM_NOFS);
 		memcpy(ifp->if_u1.if_data, data, size);
 		if (zero_terminate)
 			ifp->if_u1.if_data[size] = '\0';
@@ -497,12 +496,7 @@ xfs_idata_realloc(
 		return;
 	}
 
-	/*
-	 * For inline data, the underlying buffer must be a multiple of 4 bytes
-	 * in size so that it can be logged and stay on word boundaries.
-	 * We enforce that here.
-	 */
-	ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, roundup(new_size, 4),
+	ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, new_size,
 				      GFP_NOFS | __GFP_NOFAIL);
 	ifp->if_bytes = new_size;
 }
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 90d8e591baf8..19dc3e37bb4d 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -70,7 +70,7 @@ xfs_inode_item_data_fork_size(
 	case XFS_DINODE_FMT_LOCAL:
 		if ((iip->ili_fields & XFS_ILOG_DDATA) &&
 		    ip->i_df.if_bytes > 0) {
-			*nbytes += roundup(ip->i_df.if_bytes, 4);
+			*nbytes += xlog_calc_iovec_len(ip->i_df.if_bytes);
 			*nvecs += 1;
 		}
 		break;
@@ -111,7 +111,7 @@ xfs_inode_item_attr_fork_size(
 	case XFS_DINODE_FMT_LOCAL:
 		if ((iip->ili_fields & XFS_ILOG_ADATA) &&
 		    ip->i_afp->if_bytes > 0) {
-			*nbytes += roundup(ip->i_afp->if_bytes, 4);
+			*nbytes += xlog_calc_iovec_len(ip->i_afp->if_bytes);
 			*nvecs += 1;
 		}
 		break;
@@ -203,17 +203,12 @@ xfs_inode_item_format_data_fork(
 			~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT | XFS_ILOG_DEV);
 		if ((iip->ili_fields & XFS_ILOG_DDATA) &&
 		    ip->i_df.if_bytes > 0) {
-			/*
-			 * Round i_bytes up to a word boundary.
-			 * The underlying memory is guaranteed
-			 * to be there by xfs_idata_realloc().
-			 */
-			data_bytes = roundup(ip->i_df.if_bytes, 4);
 			ASSERT(ip->i_df.if_u1.if_data != NULL);
 			ASSERT(ip->i_disk_size > 0);
 			xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_ILOCAL,
-					ip->i_df.if_u1.if_data, data_bytes);
-			ilf->ilf_dsize = (unsigned)data_bytes;
+					ip->i_df.if_u1.if_data,
+					ip->i_df.if_bytes);
+			ilf->ilf_dsize = (unsigned)ip->i_df.if_bytes;
 			ilf->ilf_size++;
 		} else {
 			iip->ili_fields &= ~XFS_ILOG_DDATA;
@@ -287,17 +282,11 @@ xfs_inode_item_format_attr_fork(
 
 		if ((iip->ili_fields & XFS_ILOG_ADATA) &&
 		    ip->i_afp->if_bytes > 0) {
-			/*
-			 * Round i_bytes up to a word boundary.
-			 * The underlying memory is guaranteed
-			 * to be there by xfs_idata_realloc().
-			 */
-			data_bytes = roundup(ip->i_afp->if_bytes, 4);
 			ASSERT(ip->i_afp->if_u1.if_data != NULL);
 			xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_IATTR_LOCAL,
 					ip->i_afp->if_u1.if_data,
-					data_bytes);
-			ilf->ilf_asize = (unsigned)data_bytes;
+					ip->i_afp->if_bytes);
+			ilf->ilf_asize = (unsigned)ip->i_afp->if_bytes;
 			ilf->ilf_size++;
 		} else {
 			iip->ili_fields &= ~XFS_ILOG_ADATA;
diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
index 239dd2e3384e..044504a4b399 100644
--- a/fs/xfs/xfs_inode_item_recover.c
+++ b/fs/xfs/xfs_inode_item_recover.c
@@ -401,7 +401,7 @@ xlog_recover_inode_commit_pass2(
 	ASSERT(in_f->ilf_size <= 4);
 	ASSERT((in_f->ilf_size == 3) || (fields & XFS_ILOG_AFORK));
 	ASSERT(!(fields & XFS_ILOG_DFORK) ||
-	       (len == in_f->ilf_dsize));
+	       (len == xlog_calc_iovec_len(in_f->ilf_dsize)));
 
 	switch (fields & XFS_ILOG_DFORK) {
 	case XFS_ILOG_DDATA:
@@ -436,7 +436,7 @@ xlog_recover_inode_commit_pass2(
 		}
 		len = item->ri_buf[attr_index].i_len;
 		src = item->ri_buf[attr_index].i_addr;
-		ASSERT(len == in_f->ilf_asize);
+		ASSERT(len == xlog_calc_iovec_len(in_f->ilf_asize));
 
 		switch (in_f->ilf_fields & XFS_ILOG_AFORK) {
 		case XFS_ILOG_ADATA:
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 816f44d7dc81..fbe3e764cee3 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -21,6 +21,17 @@ struct xfs_log_vec {
 
 #define XFS_LOG_VEC_ORDERED	(-1)
 
+/*
+ * Calculate the log iovec length for a given user buffer length. Intended to be
+ * used by ->iop_size implementations when sizing buffers of arbitrary
+ * alignments.
+ */
+static inline int
+xlog_calc_iovec_len(int len)
+{
+	return roundup(len, 4);
+}
+
 void *xlog_prepare_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec **vecp,
 		uint type);
 
@@ -29,6 +40,12 @@ xlog_finish_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec *vec, int len)
 {
 	struct xlog_op_header	*oph = vec->i_addr;
 
+	/*
+	 * Always round up the length to the correct alignment so callers don't
+	 * need to know anything about this log vec layout requirement.
+	 */
+	len = xlog_calc_iovec_len(len);
+
 	/* opheader tracks payload length, logvec tracks region length */
 	oph->oh_len = cpu_to_be32(len);
 
@@ -36,8 +53,14 @@ xlog_finish_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec *vec, int len)
 	lv->lv_buf_len += len;
 	lv->lv_bytes += len;
 	vec->i_len = len;
+
+	/* Catch buffer overruns */
+	ASSERT((void *)lv->lv_buf + lv->lv_bytes <= (void *)lv + lv->lv_size);
 }
 
+/*
+ * Copy the amount of data requested by the caller into a new log iovec.
+ */
 static inline void *
 xlog_copy_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec **vecp,
 		uint type, void *data, int len)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/8] xfs: don't commit the first deferred transaction without intents
  2022-03-14 22:06 [PATCH 0/8 v3] xfs: intent whiteouts Dave Chinner
  2022-03-14 22:06 ` [PATCH 1/8] xfs: hide log iovec alignment constraints Dave Chinner
@ 2022-03-14 22:06 ` Dave Chinner
  2022-04-11  5:22   ` Alli
  2022-03-14 22:06 ` [PATCH 3/8] xfs: add log item flags to indicate intents Dave Chinner
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2022-03-14 22:06 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

If the first operation in a string of defer ops has no intents,
then there is no reason to commit it before running the first call
to xfs_defer_finish_one(). This allows the defer ops to be used
effectively for non-intent based operations without requiring an
unnecessary extra transaction commit when first called.

This fixes a regression in per-attribute modification transaction
count when delayed attributes are not being used.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_defer.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 0805ade2d300..66b4555bda8e 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -186,7 +186,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
 	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
 };
 
-static void
+static bool
 xfs_defer_create_intent(
 	struct xfs_trans		*tp,
 	struct xfs_defer_pending	*dfp,
@@ -197,6 +197,7 @@ xfs_defer_create_intent(
 	if (!dfp->dfp_intent)
 		dfp->dfp_intent = ops->create_intent(tp, &dfp->dfp_work,
 						     dfp->dfp_count, sort);
+	return dfp->dfp_intent;
 }
 
 /*
@@ -204,16 +205,18 @@ xfs_defer_create_intent(
  * associated extents, then add the entire intake list to the end of
  * the pending list.
  */
-STATIC void
+static bool
 xfs_defer_create_intents(
 	struct xfs_trans		*tp)
 {
 	struct xfs_defer_pending	*dfp;
+	bool				ret = false;
 
 	list_for_each_entry(dfp, &tp->t_dfops, dfp_list) {
 		trace_xfs_defer_create_intent(tp->t_mountp, dfp);
-		xfs_defer_create_intent(tp, dfp, true);
+		ret |= xfs_defer_create_intent(tp, dfp, true);
 	}
+	return ret;
 }
 
 /* Abort all the intents that were committed. */
@@ -487,7 +490,7 @@ int
 xfs_defer_finish_noroll(
 	struct xfs_trans		**tp)
 {
-	struct xfs_defer_pending	*dfp;
+	struct xfs_defer_pending	*dfp = NULL;
 	int				error = 0;
 	LIST_HEAD(dop_pending);
 
@@ -506,17 +509,19 @@ xfs_defer_finish_noroll(
 		 * of time that any one intent item can stick around in memory,
 		 * pinning the log tail.
 		 */
-		xfs_defer_create_intents(*tp);
+		bool has_intents = xfs_defer_create_intents(*tp);
 		list_splice_init(&(*tp)->t_dfops, &dop_pending);
 
-		error = xfs_defer_trans_roll(tp);
-		if (error)
-			goto out_shutdown;
+		if (has_intents || dfp) {
+			error = xfs_defer_trans_roll(tp);
+			if (error)
+				goto out_shutdown;
 
-		/* Possibly relog intent items to keep the log moving. */
-		error = xfs_defer_relog(tp, &dop_pending);
-		if (error)
-			goto out_shutdown;
+			/* Possibly relog intent items to keep the log moving. */
+			error = xfs_defer_relog(tp, &dop_pending);
+			if (error)
+				goto out_shutdown;
+		}
 
 		dfp = list_first_entry(&dop_pending, struct xfs_defer_pending,
 				       dfp_list);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/8] xfs: add log item flags to indicate intents
  2022-03-14 22:06 [PATCH 0/8 v3] xfs: intent whiteouts Dave Chinner
  2022-03-14 22:06 ` [PATCH 1/8] xfs: hide log iovec alignment constraints Dave Chinner
  2022-03-14 22:06 ` [PATCH 2/8] xfs: don't commit the first deferred transaction without intents Dave Chinner
@ 2022-03-14 22:06 ` Dave Chinner
  2022-04-11  5:23   ` Alli
  2022-03-14 22:06 ` [PATCH 4/8] xfs: tag transactions that contain intent done items Dave Chinner
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2022-03-14 22:06 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We currently have a couple of helper functions that try to infer
whether the log item is an intent or intent done item from the
combinations of operations it supports.  This is incredibly fragile
and not very efficient as it requires checking specific combinations
of ops.

We need to be able to identify intent and intent done items quickly
and easily in upcoming patches, so simply add intent and intent done
type flags to the log item ops flags. These are static flags to
begin with, so intent items should have been typed like this from
the start.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap_item.c     |  4 +++-
 fs/xfs/xfs_extfree_item.c  |  4 +++-
 fs/xfs/xfs_refcount_item.c |  4 +++-
 fs/xfs/xfs_rmap_item.c     |  4 +++-
 fs/xfs/xfs_trans.h         | 25 +++++++++++++------------
 5 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index e1f4d7d5a011..ead27d40ef78 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -202,7 +202,8 @@ xfs_bud_item_release(
 }
 
 static const struct xfs_item_ops xfs_bud_item_ops = {
-	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
+	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
+			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_bud_item_size,
 	.iop_format	= xfs_bud_item_format,
 	.iop_release	= xfs_bud_item_release,
@@ -584,6 +585,7 @@ xfs_bui_item_relog(
 }
 
 static const struct xfs_item_ops xfs_bui_item_ops = {
+	.flags		= XFS_ITEM_INTENT,
 	.iop_size	= xfs_bui_item_size,
 	.iop_format	= xfs_bui_item_format,
 	.iop_unpin	= xfs_bui_item_unpin,
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 47ef9c9c5c17..6913db61d770 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -307,7 +307,8 @@ xfs_efd_item_release(
 }
 
 static const struct xfs_item_ops xfs_efd_item_ops = {
-	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
+	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
+			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_efd_item_size,
 	.iop_format	= xfs_efd_item_format,
 	.iop_release	= xfs_efd_item_release,
@@ -688,6 +689,7 @@ xfs_efi_item_relog(
 }
 
 static const struct xfs_item_ops xfs_efi_item_ops = {
+	.flags		= XFS_ITEM_INTENT,
 	.iop_size	= xfs_efi_item_size,
 	.iop_format	= xfs_efi_item_format,
 	.iop_unpin	= xfs_efi_item_unpin,
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index d3da67772d57..bffde82b109c 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -208,7 +208,8 @@ xfs_cud_item_release(
 }
 
 static const struct xfs_item_ops xfs_cud_item_ops = {
-	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
+	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
+			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_cud_item_size,
 	.iop_format	= xfs_cud_item_format,
 	.iop_release	= xfs_cud_item_release,
@@ -600,6 +601,7 @@ xfs_cui_item_relog(
 }
 
 static const struct xfs_item_ops xfs_cui_item_ops = {
+	.flags		= XFS_ITEM_INTENT,
 	.iop_size	= xfs_cui_item_size,
 	.iop_format	= xfs_cui_item_format,
 	.iop_unpin	= xfs_cui_item_unpin,
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index c3966b4c58ef..c78e31bc84e1 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -231,7 +231,8 @@ xfs_rud_item_release(
 }
 
 static const struct xfs_item_ops xfs_rud_item_ops = {
-	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
+	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
+			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_rud_item_size,
 	.iop_format	= xfs_rud_item_format,
 	.iop_release	= xfs_rud_item_release,
@@ -630,6 +631,7 @@ xfs_rui_item_relog(
 }
 
 static const struct xfs_item_ops xfs_rui_item_ops = {
+	.flags		= XFS_ITEM_INTENT,
 	.iop_size	= xfs_rui_item_size,
 	.iop_format	= xfs_rui_item_format,
 	.iop_unpin	= xfs_rui_item_unpin,
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index a487b264a9eb..93cb4be33f7a 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -79,28 +79,29 @@ struct xfs_item_ops {
 			struct xfs_trans *tp);
 };
 
-/* Is this log item a deferred action intent? */
+/*
+ * Log item ops flags
+ */
+/*
+ * Release the log item when the journal commits instead of inserting into the
+ * AIL for writeback tracking and/or log tail pinning.
+ */
+#define XFS_ITEM_RELEASE_WHEN_COMMITTED	(1 << 0)
+#define XFS_ITEM_INTENT			(1 << 1)
+#define XFS_ITEM_INTENT_DONE		(1 << 2)
+
 static inline bool
 xlog_item_is_intent(struct xfs_log_item *lip)
 {
-	return lip->li_ops->iop_recover != NULL &&
-	       lip->li_ops->iop_match != NULL;
+	return lip->li_ops->flags & XFS_ITEM_INTENT;
 }
 
-/* Is this a log intent-done item? */
 static inline bool
 xlog_item_is_intent_done(struct xfs_log_item *lip)
 {
-	return lip->li_ops->iop_unpin == NULL &&
-	       lip->li_ops->iop_push == NULL;
+	return lip->li_ops->flags & XFS_ITEM_INTENT_DONE;
 }
 
-/*
- * Release the log item as soon as committed.  This is for items just logging
- * intents that never need to be written back in place.
- */
-#define XFS_ITEM_RELEASE_WHEN_COMMITTED	(1 << 0)
-
 void	xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item,
 			  int type, const struct xfs_item_ops *ops);
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/8] xfs: tag transactions that contain intent done items
  2022-03-14 22:06 [PATCH 0/8 v3] xfs: intent whiteouts Dave Chinner
                   ` (2 preceding siblings ...)
  2022-03-14 22:06 ` [PATCH 3/8] xfs: add log item flags to indicate intents Dave Chinner
@ 2022-03-14 22:06 ` Dave Chinner
  2022-04-11  5:23   ` Alli
  2022-03-14 22:06 ` [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c Dave Chinner
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2022-03-14 22:06 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Intent whiteouts will require extra work to be done during
transaction commit if the transaction contains an intent done item.

To determine if a transaction contains an intent done item, we want
to avoid having to walk all the items in the transaction to check if
they are intent done items. Hence when we add an intent done item to
a transaction, tag the transaction to indicate that it contains such
an item.

We don't tag the transaction when the defer ops is relogging an
intent to move it forward in the log. Whiteouts will never apply to
these cases, so we don't need to bother looking for them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_shared.h | 24 +++++++++++++++++-------
 fs/xfs/xfs_bmap_item.c     |  2 +-
 fs/xfs/xfs_extfree_item.c  |  2 +-
 fs/xfs/xfs_refcount_item.c |  2 +-
 fs/xfs/xfs_rmap_item.c     |  2 +-
 5 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 25c4cab58851..e96618dbde29 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -54,13 +54,23 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
 /*
  * Values for t_flags.
  */
-#define	XFS_TRANS_DIRTY		0x01	/* something needs to be logged */
-#define	XFS_TRANS_SB_DIRTY	0x02	/* superblock is modified */
-#define	XFS_TRANS_PERM_LOG_RES	0x04	/* xact took a permanent log res */
-#define	XFS_TRANS_SYNC		0x08	/* make commit synchronous */
-#define XFS_TRANS_RESERVE	0x20    /* OK to use reserved data blocks */
-#define XFS_TRANS_NO_WRITECOUNT 0x40	/* do not elevate SB writecount */
-#define XFS_TRANS_RES_FDBLKS	0x80	/* reserve newly freed blocks */
+/* Transaction needs to be logged */
+#define XFS_TRANS_DIRTY			(1 << 0)
+/* Superblock is dirty and needs to be logged */
+#define XFS_TRANS_SB_DIRTY		(1 << 1)
+/* Transaction took a permanent log reservation */
+#define XFS_TRANS_PERM_LOG_RES		(1 << 2)
+/* Synchronous transaction commit needed */
+#define XFS_TRANS_SYNC			(1 << 3)
+/* Transaction can use reserve block pool */
+#define XFS_TRANS_RESERVE		(1 << 4)
+/* Transaction should avoid VFS level superblock write accounting */
+#define XFS_TRANS_NO_WRITECOUNT		(1 << 5)
+/* Transaction has freed blocks returned to it's reservation */
+#define XFS_TRANS_RES_FDBLKS		(1 << 6)
+/* Transaction contains an intent done log item */
+#define XFS_TRANS_HAS_INTENT_DONE	(1 << 7)
+
 /*
  * LOWMODE is used by the allocator to activate the lowspace algorithm - when
  * free space is running low the extent allocator may choose to allocate an
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index ead27d40ef78..45dd03272e5d 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -255,7 +255,7 @@ xfs_trans_log_finish_bmap_update(
 	 * 1.) releases the BUI and frees the BUD
 	 * 2.) shuts down the filesystem
 	 */
-	tp->t_flags |= XFS_TRANS_DIRTY;
+	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
 	set_bit(XFS_LI_DIRTY, &budp->bud_item.li_flags);
 
 	return error;
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 6913db61d770..ed1229cb6807 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -381,7 +381,7 @@ xfs_trans_free_extent(
 	 * 1.) releases the EFI and frees the EFD
 	 * 2.) shuts down the filesystem
 	 */
-	tp->t_flags |= XFS_TRANS_DIRTY;
+	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
 	set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);
 
 	next_extent = efdp->efd_next_extent;
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index bffde82b109c..642bcff72a71 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -260,7 +260,7 @@ xfs_trans_log_finish_refcount_update(
 	 * 1.) releases the CUI and frees the CUD
 	 * 2.) shuts down the filesystem
 	 */
-	tp->t_flags |= XFS_TRANS_DIRTY;
+	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
 	set_bit(XFS_LI_DIRTY, &cudp->cud_item.li_flags);
 
 	return error;
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index c78e31bc84e1..4285b94465d2 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -328,7 +328,7 @@ xfs_trans_log_finish_rmap_update(
 	 * 1.) releases the RUI and frees the RUD
 	 * 2.) shuts down the filesystem
 	 */
-	tp->t_flags |= XFS_TRANS_DIRTY;
+	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
 	set_bit(XFS_LI_DIRTY, &rudp->rud_item.li_flags);
 
 	return error;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c
  2022-03-14 22:06 [PATCH 0/8 v3] xfs: intent whiteouts Dave Chinner
                   ` (3 preceding siblings ...)
  2022-03-14 22:06 ` [PATCH 4/8] xfs: tag transactions that contain intent done items Dave Chinner
@ 2022-03-14 22:06 ` Dave Chinner
  2022-04-11  5:24   ` Alli
  2022-03-14 22:06 ` [PATCH 6/8] xfs: add log item method to return related intents Dave Chinner
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2022-03-14 22:06 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

In preparation for adding support for intent item whiteouts.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c | 119 ++++++++++++++++++++++++-------------------
 1 file changed, 67 insertions(+), 52 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 5179436b6603..dda71f1a25c5 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -47,6 +47,38 @@ xlog_cil_ticket_alloc(
 	return tic;
 }
 
+/*
+ * Check if the current log item was first committed in this sequence.
+ * We can't rely on just the log item being in the CIL, we have to check
+ * the recorded commit sequence number.
+ *
+ * Note: for this to be used in a non-racy manner, it has to be called with
+ * CIL flushing locked out. As a result, it should only be used during the
+ * transaction commit process when deciding what to format into the item.
+ */
+static bool
+xlog_item_in_current_chkpt(
+	struct xfs_cil		*cil,
+	struct xfs_log_item	*lip)
+{
+	if (list_empty(&lip->li_cil))
+		return false;
+
+	/*
+	 * li_seq is written on the first commit of a log item to record the
+	 * first checkpoint it is written to. Hence if it is different to the
+	 * current sequence, we're in a new checkpoint.
+	 */
+	return lip->li_seq == READ_ONCE(cil->xc_current_sequence);
+}
+
+bool
+xfs_log_item_in_current_chkpt(
+	struct xfs_log_item *lip)
+{
+	return xlog_item_in_current_chkpt(lip->li_mountp->m_log->l_cilp, lip);
+}
+
 /*
  * Unavoidable forward declaration - xlog_cil_push_work() calls
  * xlog_cil_ctx_alloc() itself.
@@ -924,6 +956,40 @@ xlog_cil_build_trans_hdr(
 	tic->t_curr_res -= lvhdr->lv_bytes;
 }
 
+/*
+ * Pull all the log vectors off the items in the CIL, and remove the items from
+ * the CIL. We don't need the CIL lock here because it's only needed on the
+ * transaction commit side which is currently locked out by the flush lock.
+ */
+static void
+xlog_cil_build_lv_chain(
+	struct xfs_cil		*cil,
+	struct xfs_cil_ctx	*ctx,
+	uint32_t		*num_iovecs,
+	uint32_t		*num_bytes)
+{
+	struct xfs_log_vec	*lv = NULL;
+
+	while (!list_empty(&cil->xc_cil)) {
+		struct xfs_log_item	*item;
+
+		item = list_first_entry(&cil->xc_cil,
+					struct xfs_log_item, li_cil);
+		list_del_init(&item->li_cil);
+		if (!ctx->lv_chain)
+			ctx->lv_chain = item->li_lv;
+		else
+			lv->lv_next = item->li_lv;
+		lv = item->li_lv;
+		item->li_lv = NULL;
+		*num_iovecs += lv->lv_niovecs;
+
+		/* we don't write ordered log vectors */
+		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
+			*num_bytes += lv->lv_bytes;
+	}
+}
+
 /*
  * Push the Committed Item List to the log.
  *
@@ -946,7 +1012,6 @@ xlog_cil_push_work(
 		container_of(work, struct xfs_cil_ctx, push_work);
 	struct xfs_cil		*cil = ctx->cil;
 	struct xlog		*log = cil->xc_log;
-	struct xfs_log_vec	*lv;
 	struct xfs_cil_ctx	*new_ctx;
 	int			num_iovecs = 0;
 	int			num_bytes = 0;
@@ -1043,31 +1108,7 @@ xlog_cil_push_work(
 	xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
 				&bdev_flush);
 
-	/*
-	 * Pull all the log vectors off the items in the CIL, and remove the
-	 * items from the CIL. We don't need the CIL lock here because it's only
-	 * needed on the transaction commit side which is currently locked out
-	 * by the flush lock.
-	 */
-	lv = NULL;
-	while (!list_empty(&cil->xc_cil)) {
-		struct xfs_log_item	*item;
-
-		item = list_first_entry(&cil->xc_cil,
-					struct xfs_log_item, li_cil);
-		list_del_init(&item->li_cil);
-		if (!ctx->lv_chain)
-			ctx->lv_chain = item->li_lv;
-		else
-			lv->lv_next = item->li_lv;
-		lv = item->li_lv;
-		item->li_lv = NULL;
-		num_iovecs += lv->lv_niovecs;
-
-		/* we don't write ordered log vectors */
-		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
-			num_bytes += lv->lv_bytes;
-	}
+	xlog_cil_build_lv_chain(cil, ctx, &num_iovecs, &num_bytes);
 
 	/*
 	 * Switch the contexts so we can drop the context lock and move out
@@ -1508,32 +1549,6 @@ xlog_cil_force_seq(
 	return 0;
 }
 
-/*
- * Check if the current log item was first committed in this sequence.
- * We can't rely on just the log item being in the CIL, we have to check
- * the recorded commit sequence number.
- *
- * Note: for this to be used in a non-racy manner, it has to be called with
- * CIL flushing locked out. As a result, it should only be used during the
- * transaction commit process when deciding what to format into the item.
- */
-bool
-xfs_log_item_in_current_chkpt(
-	struct xfs_log_item	*lip)
-{
-	struct xfs_cil		*cil = lip->li_mountp->m_log->l_cilp;
-
-	if (list_empty(&lip->li_cil))
-		return false;
-
-	/*
-	 * li_seq is written on the first commit of a log item to record the
-	 * first checkpoint it is written to. Hence if it is different to the
-	 * current sequence, we're in a new checkpoint.
-	 */
-	return lip->li_seq == READ_ONCE(cil->xc_current_sequence);
-}
-
 /*
  * Perform initial CIL structure initialisation.
  */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/8] xfs: add log item method to return related intents
  2022-03-14 22:06 [PATCH 0/8 v3] xfs: intent whiteouts Dave Chinner
                   ` (4 preceding siblings ...)
  2022-03-14 22:06 ` [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c Dave Chinner
@ 2022-03-14 22:06 ` Dave Chinner
  2022-04-11  5:24   ` Alli
  2022-03-14 22:06 ` [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL Dave Chinner
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2022-03-14 22:06 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

To apply a whiteout to an intent item when an intent done item is
committed, we need to be able to retrieve the intent item from the
the intent done item. Add a log item op method for doing this, and
wire all the intent done items up to it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap_item.c     | 8 ++++++++
 fs/xfs/xfs_extfree_item.c  | 8 ++++++++
 fs/xfs/xfs_refcount_item.c | 8 ++++++++
 fs/xfs/xfs_rmap_item.c     | 8 ++++++++
 fs/xfs/xfs_trans.h         | 1 +
 5 files changed, 33 insertions(+)

diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 45dd03272e5d..2e7abfe35644 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -201,12 +201,20 @@ xfs_bud_item_release(
 	kmem_cache_free(xfs_bud_cache, budp);
 }
 
+static struct xfs_log_item *
+xfs_bud_item_intent(
+	struct xfs_log_item	*lip)
+{
+	return &BUD_ITEM(lip)->bud_buip->bui_item;
+}
+
 static const struct xfs_item_ops xfs_bud_item_ops = {
 	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
 			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_bud_item_size,
 	.iop_format	= xfs_bud_item_format,
 	.iop_release	= xfs_bud_item_release,
+	.iop_intent	= xfs_bud_item_intent,
 };
 
 static struct xfs_bud_log_item *
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index ed1229cb6807..1d0e5cdc15f9 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -306,12 +306,20 @@ xfs_efd_item_release(
 	xfs_efd_item_free(efdp);
 }
 
+static struct xfs_log_item *
+xfs_efd_item_intent(
+	struct xfs_log_item	*lip)
+{
+	return &EFD_ITEM(lip)->efd_efip->efi_item;
+}
+
 static const struct xfs_item_ops xfs_efd_item_ops = {
 	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
 			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_efd_item_size,
 	.iop_format	= xfs_efd_item_format,
 	.iop_release	= xfs_efd_item_release,
+	.iop_intent	= xfs_efd_item_intent,
 };
 
 /*
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 642bcff72a71..ada5793ce550 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -207,12 +207,20 @@ xfs_cud_item_release(
 	kmem_cache_free(xfs_cud_cache, cudp);
 }
 
+static struct xfs_log_item *
+xfs_cud_item_intent(
+	struct xfs_log_item	*lip)
+{
+	return &CUD_ITEM(lip)->cud_cuip->cui_item;
+}
+
 static const struct xfs_item_ops xfs_cud_item_ops = {
 	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
 			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_cud_item_size,
 	.iop_format	= xfs_cud_item_format,
 	.iop_release	= xfs_cud_item_release,
+	.iop_intent	= xfs_cud_item_intent,
 };
 
 static struct xfs_cud_log_item *
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 4285b94465d2..6e66e7718902 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -230,12 +230,20 @@ xfs_rud_item_release(
 	kmem_cache_free(xfs_rud_cache, rudp);
 }
 
+static struct xfs_log_item *
+xfs_rud_item_intent(
+	struct xfs_log_item	*lip)
+{
+	return &RUD_ITEM(lip)->rud_ruip->rui_item;
+}
+
 static const struct xfs_item_ops xfs_rud_item_ops = {
 	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
 			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_rud_item_size,
 	.iop_format	= xfs_rud_item_format,
 	.iop_release	= xfs_rud_item_release,
+	.iop_intent	= xfs_rud_item_intent,
 };
 
 static struct xfs_rud_log_item *
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 93cb4be33f7a..6182c97cb8e7 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -77,6 +77,7 @@ struct xfs_item_ops {
 	bool (*iop_match)(struct xfs_log_item *item, uint64_t id);
 	struct xfs_log_item *(*iop_relog)(struct xfs_log_item *intent,
 			struct xfs_trans *tp);
+	struct xfs_log_item *(*iop_intent)(struct xfs_log_item *intent_done);
 };
 
 /*
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL
  2022-03-14 22:06 [PATCH 0/8 v3] xfs: intent whiteouts Dave Chinner
                   ` (5 preceding siblings ...)
  2022-03-14 22:06 ` [PATCH 6/8] xfs: add log item method to return related intents Dave Chinner
@ 2022-03-14 22:06 ` Dave Chinner
  2022-04-11  5:27   ` Alli
  2022-03-14 22:06 ` [PATCH 8/8] xfs: intent item whiteouts Dave Chinner
  2022-04-11  5:22 ` [PATCH 0/8 v3] xfs: intent whiteouts Alli
  8 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2022-03-14 22:06 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When we release an intent that a whiteout applies to, it will not
have been committed to the journal and so won't be in the AIL. Hence
when we drop the last reference to the intent, we do not want to try
to remove it from the AIL as that will trigger a filesystem
shutdown. Hence make the removal of intents from the AIL conditional
on them actually being in the AIL so we do the correct thing.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap_item.c     | 8 +++++---
 fs/xfs/xfs_extfree_item.c  | 8 +++++---
 fs/xfs/xfs_refcount_item.c | 8 +++++---
 fs/xfs/xfs_rmap_item.c     | 8 +++++---
 4 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 2e7abfe35644..c6f47c13da27 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -54,10 +54,12 @@ xfs_bui_release(
 	struct xfs_bui_log_item	*buip)
 {
 	ASSERT(atomic_read(&buip->bui_refcount) > 0);
-	if (atomic_dec_and_test(&buip->bui_refcount)) {
+	if (!atomic_dec_and_test(&buip->bui_refcount))
+		return;
+
+	if (test_bit(XFS_LI_IN_AIL, &buip->bui_item.li_flags))
 		xfs_trans_ail_delete(&buip->bui_item, SHUTDOWN_LOG_IO_ERROR);
-		xfs_bui_item_free(buip);
-	}
+	xfs_bui_item_free(buip);
 }
 
 
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 1d0e5cdc15f9..36eeac9413f5 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -58,10 +58,12 @@ xfs_efi_release(
 	struct xfs_efi_log_item	*efip)
 {
 	ASSERT(atomic_read(&efip->efi_refcount) > 0);
-	if (atomic_dec_and_test(&efip->efi_refcount)) {
+	if (!atomic_dec_and_test(&efip->efi_refcount))
+		return;
+
+	if (test_bit(XFS_LI_IN_AIL, &efip->efi_item.li_flags))
 		xfs_trans_ail_delete(&efip->efi_item, SHUTDOWN_LOG_IO_ERROR);
-		xfs_efi_item_free(efip);
-	}
+	xfs_efi_item_free(efip);
 }
 
 /*
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index ada5793ce550..d4632f2ceb89 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -53,10 +53,12 @@ xfs_cui_release(
 	struct xfs_cui_log_item	*cuip)
 {
 	ASSERT(atomic_read(&cuip->cui_refcount) > 0);
-	if (atomic_dec_and_test(&cuip->cui_refcount)) {
+	if (!atomic_dec_and_test(&cuip->cui_refcount))
+		return;
+
+	if (test_bit(XFS_LI_IN_AIL, &cuip->cui_item.li_flags))
 		xfs_trans_ail_delete(&cuip->cui_item, SHUTDOWN_LOG_IO_ERROR);
-		xfs_cui_item_free(cuip);
-	}
+	xfs_cui_item_free(cuip);
 }
 
 
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 6e66e7718902..fa691a6ae737 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -53,10 +53,12 @@ xfs_rui_release(
 	struct xfs_rui_log_item	*ruip)
 {
 	ASSERT(atomic_read(&ruip->rui_refcount) > 0);
-	if (atomic_dec_and_test(&ruip->rui_refcount)) {
+	if (!atomic_dec_and_test(&ruip->rui_refcount))
+		return;
+
+	if (test_bit(XFS_LI_IN_AIL, &ruip->rui_item.li_flags))
 		xfs_trans_ail_delete(&ruip->rui_item, SHUTDOWN_LOG_IO_ERROR);
-		xfs_rui_item_free(ruip);
-	}
+	xfs_rui_item_free(ruip);
 }
 
 STATIC void
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 8/8] xfs: intent item whiteouts
  2022-03-14 22:06 [PATCH 0/8 v3] xfs: intent whiteouts Dave Chinner
                   ` (6 preceding siblings ...)
  2022-03-14 22:06 ` [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL Dave Chinner
@ 2022-03-14 22:06 ` Dave Chinner
  2022-04-11  5:22 ` [PATCH 0/8 v3] xfs: intent whiteouts Alli
  8 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2022-03-14 22:06 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When we log modifications based on intents, we add both intent
and intent done items to the modification being made. These get
written to the log to ensure that the operation is re-run if the
intent done is not found in the log.

However, for operations that complete wholly within a single
checkpoint, the change in the checkpoint is atomic and will never
need replay. In this case, we don't need to actually write the
intent and intent done items to the journal because log recovery
will never need to manually restart this modification.

Log recovery currently handles intent/intent done matching by
inserting the intent into the AIL, then removing it when a matching
intent done item is found. Hence for all the intent-based operations
that complete within a checkpoint, we spend all that time parsing
the intent/intent done items just to cancel them and do nothing with
them.

Hence it follows that the only time we actually need intents in the
log is when the modification crosses checkpoint boundaries in the
log and so may only be partially complete in the journal. Hence if
we commit and intent done item to the CIL and the intent item is in
the same checkpoint, we don't actually have to write them to the
journal because log recovery will always cancel the intents.

We've never really worried about the overhead of logging intents
unnecessarily like this because the intents we log are generally
very much smaller than the change being made. e.g. freeing an extent
involves modifying at lease two freespace btree blocks and the AGF,
so the EFI/EFD overhead is only a small increase in space and
processing time compared to the overall cost of freeing an extent.

However, delayed attributes change this cost equation dramatically,
especially for inline attributes. In the case of adding an inline
attribute, we only log the inode core and attribute fork at present.
With delayed attributes, we now log the attr intent which includes
the name and value, the inode core adn attr fork, and finally the
attr intent done item. We increase the number of items we log from 1
to 3, and the number of log vectors (regions) goes up from 3 to 7.
Hence we tripple the number of objects that the CIL has to process,
and more than double the number of log vectors that need to be
written to the journal.

At scale, this means delayed attributes cause a non-pipelined CIL to
become CPU bound processing all the extra items, resulting in a > 40%
performance degradation on 16-way file+xattr create worklaods.
Pipelining the CIL (as per 5.15) reduces the performance degradation
to 20%, but now the limitation is the rate at which the log items
can be written to the iclogs and iclogs be dispatched for IO and
completed.

Even log IO completion is slowed down by these intents, because it
now has to process 3x the number of items in the checkpoint.
Processing completed intents is especially inefficient here, because
we first insert the intent into the AIL, then remove it from the AIL
when the intent done is processed. IOWs, we are also doing expensive
operations in log IO completion we could completely avoid if we
didn't log completed intent/intent done pairs.

Enter log item whiteouts.

When an intent done is committed, we can check to see if the
associated intent is in the same checkpoint as we are currently
committing the intent done to. If so, we can mark the intent log
item with a whiteout and immediately free the intent done item
rather than committing it to the CIL. We can basically skip the
entire formatting and CIL insertion steps for the intent done item.

However, we cannot remove the intent item from the CIL at this point
because the unlocked per-cpu CIL item lists do not permit removal
without holding the CIL context lock exclusively. Transaction commit
only holds the context lock shared, hence the best we can do is mark
the intent item with a whiteout so that the CIL push can release it
rather than writing it to the log.

This means we never write the intent to the log if the intent done
has also been committed to the same checkpoint, but we'll always
write the intent if the intent done has not been committed or has
been committed to a different checkpoint. This will result in
correct log recovery behaviour in all cases, without the overhead of
logging unnecessary intents.

This intent whiteout concept is generic - we can apply it to all
intent/intent done pairs that have a direct 1:1 relationship. The
way deferred ops iterate and relog intents mean that all intents
currently have a 1:1 relationship with their done intent, and hence
we can apply this cancellation to all existing intent/intent done
implementations.

For delayed attributes with a 16-way 64kB xattr create workload,
whiteouts reduce the amount of journalled metadata from ~2.5GB/s
down to ~600MB/s and improve the creation rate from 9000/s to
14000/s.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap_item.c |  2 ++
 fs/xfs/xfs_log_cil.c   | 80 ++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_rmap_item.c |  2 ++
 fs/xfs/xfs_trace.h     |  3 ++
 fs/xfs/xfs_trans.h     |  7 ++--
 5 files changed, 89 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index c6f47c13da27..fa710067aac2 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -39,6 +39,7 @@ STATIC void
 xfs_bui_item_free(
 	struct xfs_bui_log_item	*buip)
 {
+	kmem_free(buip->bui_item.li_lv_shadow);
 	kmem_cache_free(xfs_bui_cache, buip);
 }
 
@@ -199,6 +200,7 @@ xfs_bud_item_release(
 {
 	struct xfs_bud_log_item	*budp = BUD_ITEM(lip);
 
+	kmem_free(budp->bud_item.li_lv_shadow);
 	xfs_bui_release(budp->bud_buip);
 	kmem_cache_free(xfs_bud_cache, budp);
 }
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index dda71f1a25c5..3d8ebf2a1e55 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -476,7 +476,8 @@ xlog_cil_insert_format_items(
 static void
 xlog_cil_insert_items(
 	struct xlog		*log,
-	struct xfs_trans	*tp)
+	struct xfs_trans	*tp,
+	uint32_t		released_space)
 {
 	struct xfs_cil		*cil = log->l_cilp;
 	struct xfs_cil_ctx	*ctx = cil->xc_ctx;
@@ -526,6 +527,7 @@ xlog_cil_insert_items(
 	}
 	tp->t_ticket->t_curr_res -= len;
 	ctx->space_used += len;
+	ctx->space_used -= released_space;
 
 	/*
 	 * If we've overrun the reservation, dump the tx details before we move
@@ -960,11 +962,16 @@ xlog_cil_build_trans_hdr(
  * Pull all the log vectors off the items in the CIL, and remove the items from
  * the CIL. We don't need the CIL lock here because it's only needed on the
  * transaction commit side which is currently locked out by the flush lock.
+ *
+ * If a log item is marked with a whiteout, we do not need to write it to the
+ * journal and so we just move them to the whiteout list for the caller to
+ * dispose of appropriately.
  */
 static void
 xlog_cil_build_lv_chain(
 	struct xfs_cil		*cil,
 	struct xfs_cil_ctx	*ctx,
+	struct list_head	*whiteouts,
 	uint32_t		*num_iovecs,
 	uint32_t		*num_bytes)
 {
@@ -975,6 +982,13 @@ xlog_cil_build_lv_chain(
 
 		item = list_first_entry(&cil->xc_cil,
 					struct xfs_log_item, li_cil);
+
+		if (test_bit(XFS_LI_WHITEOUT, &item->li_flags)) {
+			list_move(&item->li_cil, whiteouts);
+			trace_xfs_cil_whiteout_skip(item);
+			continue;
+		}
+
 		list_del_init(&item->li_cil);
 		if (!ctx->lv_chain)
 			ctx->lv_chain = item->li_lv;
@@ -1023,6 +1037,7 @@ xlog_cil_push_work(
 	struct bio		bio;
 	DECLARE_COMPLETION_ONSTACK(bdev_flush);
 	bool			push_commit_stable;
+	LIST_HEAD		(whiteouts);
 
 	new_ctx = xlog_cil_ctx_alloc();
 	new_ctx->ticket = xlog_cil_ticket_alloc(log);
@@ -1108,7 +1123,7 @@ xlog_cil_push_work(
 	xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
 				&bdev_flush);
 
-	xlog_cil_build_lv_chain(cil, ctx, &num_iovecs, &num_bytes);
+	xlog_cil_build_lv_chain(cil, ctx, &whiteouts, &num_iovecs, &num_bytes);
 
 	/*
 	 * Switch the contexts so we can drop the context lock and move out
@@ -1217,6 +1232,15 @@ xlog_cil_push_work(
 	/* Not safe to reference ctx now! */
 
 	spin_unlock(&log->l_icloglock);
+
+	/* clean up log items that had whiteouts */
+	while (!list_empty(&whiteouts)) {
+		struct xfs_log_item *item = list_first_entry(&whiteouts,
+						struct xfs_log_item, li_cil);
+		list_del_init(&item->li_cil);
+		trace_xfs_cil_whiteout_unpin(item);
+		item->li_ops->iop_unpin(item, 1);
+	}
 	return;
 
 out_skip:
@@ -1228,6 +1252,14 @@ xlog_cil_push_work(
 out_abort_free_ticket:
 	xfs_log_ticket_ungrant(log, ctx->ticket);
 	ASSERT(xlog_is_shutdown(log));
+	while (!list_empty(&whiteouts)) {
+		struct xfs_log_item *item = list_first_entry(&whiteouts,
+						struct xfs_log_item, li_cil);
+		list_del_init(&item->li_cil);
+		trace_xfs_cil_whiteout_unpin(item);
+		item->li_ops->iop_unpin(item, 1);
+	}
+
 	if (!ctx->commit_iclog) {
 		xlog_cil_committed(ctx);
 		return;
@@ -1367,6 +1399,43 @@ xlog_cil_empty(
 	return empty;
 }
 
+/*
+ * If there are intent done items in this transaction and the related intent was
+ * committed in the current (same) CIL checkpoint, we don't need to write either
+ * the intent or intent done item to the journal as the change will be
+ * journalled atomically within this checkpoint. As we cannot remove items from
+ * the CIL here, mark the related intent with a whiteout so that the CIL push
+ * can remove it rather than writing it to the journal. Then remove the intent
+ * done item from the current transaction and release it so it doesn't get put
+ * into the CIL at all.
+ */
+static uint32_t
+xlog_cil_process_intents(
+	struct xfs_cil		*cil,
+	struct xfs_trans	*tp)
+{
+	struct xfs_log_item	*lip, *ilip, *next;
+	uint32_t		len = 0;
+
+	list_for_each_entry_safe(lip, next, &tp->t_items, li_trans) {
+		if (!(lip->li_ops->flags & XFS_ITEM_INTENT_DONE))
+			continue;
+
+		ilip = lip->li_ops->iop_intent(lip);
+		if (!ilip || !xlog_item_in_current_chkpt(cil, ilip))
+			continue;
+		set_bit(XFS_LI_WHITEOUT, &ilip->li_flags);
+		trace_xfs_cil_whiteout_mark(ilip);
+		len += ilip->li_lv->lv_bytes;
+		kmem_free(ilip->li_lv);
+		ilip->li_lv = NULL;
+
+		xfs_trans_del_item(lip);
+		lip->li_ops->iop_release(lip);
+	}
+	return len;
+}
+
 /*
  * Commit a transaction with the given vector to the Committed Item List.
  *
@@ -1389,6 +1458,7 @@ xlog_cil_commit(
 {
 	struct xfs_cil		*cil = log->l_cilp;
 	struct xfs_log_item	*lip, *next;
+	uint32_t		released_space = 0;
 
 	/*
 	 * Do all necessary memory allocation before we lock the CIL.
@@ -1400,7 +1470,11 @@ xlog_cil_commit(
 	/* lock out background commit */
 	down_read(&cil->xc_ctx_lock);
 
-	xlog_cil_insert_items(log, tp);
+	if (tp->t_flags & XFS_TRANS_HAS_INTENT_DONE)
+		released_space = xlog_cil_process_intents(cil, tp);
+
+	xlog_cil_insert_items(log, tp, released_space);
+	tp->t_ticket->t_curr_res += released_space;
 
 	if (regrant && !xlog_is_shutdown(log))
 		xfs_log_ticket_regrant(log, tp->t_ticket);
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index fa691a6ae737..cb0490919b2c 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -35,6 +35,7 @@ STATIC void
 xfs_rui_item_free(
 	struct xfs_rui_log_item	*ruip)
 {
+	kmem_free(ruip->rui_item.li_lv_shadow);
 	if (ruip->rui_format.rui_nextents > XFS_RUI_MAX_FAST_EXTENTS)
 		kmem_free(ruip);
 	else
@@ -228,6 +229,7 @@ xfs_rud_item_release(
 {
 	struct xfs_rud_log_item	*rudp = RUD_ITEM(lip);
 
+	kmem_free(rudp->rud_item.li_lv_shadow);
 	xfs_rui_release(rudp->rud_ruip);
 	kmem_cache_free(xfs_rud_cache, rudp);
 }
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 4a8076ef8cb4..585bd9853b6b 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1348,6 +1348,9 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_push);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_locked);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing);
+DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_mark);
+DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_skip);
+DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_unpin);
 
 DECLARE_EVENT_CLASS(xfs_ail_class,
 	TP_PROTO(struct xfs_log_item *lip, xfs_lsn_t old_lsn, xfs_lsn_t new_lsn),
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 6182c97cb8e7..85dca2c9b559 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -54,13 +54,16 @@ struct xfs_log_item {
 #define	XFS_LI_IN_AIL	0
 #define	XFS_LI_ABORTED	1
 #define	XFS_LI_FAILED	2
-#define	XFS_LI_DIRTY	3	/* log item dirty in transaction */
+#define	XFS_LI_DIRTY	3
+#define	XFS_LI_WHITEOUT	4
 
 #define XFS_LI_FLAGS \
 	{ (1 << XFS_LI_IN_AIL),		"IN_AIL" }, \
 	{ (1 << XFS_LI_ABORTED),	"ABORTED" }, \
 	{ (1 << XFS_LI_FAILED),		"FAILED" }, \
-	{ (1 << XFS_LI_DIRTY),		"DIRTY" }
+	{ (1 << XFS_LI_DIRTY),		"DIRTY" }, \
+	{ (1 << XFS_LI_WHITEOUT),	"WHITEOUT" }
+
 
 struct xfs_item_ops {
 	unsigned flags;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/8 v3] xfs: intent whiteouts
  2022-03-14 22:06 [PATCH 0/8 v3] xfs: intent whiteouts Dave Chinner
                   ` (7 preceding siblings ...)
  2022-03-14 22:06 ` [PATCH 8/8] xfs: intent item whiteouts Dave Chinner
@ 2022-04-11  5:22 ` Alli
  8 siblings, 0 replies; 20+ messages in thread
From: Alli @ 2022-04-11  5:22 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Tue, 2022-03-15 at 09:06 +1100, Dave Chinner wrote:
> This is a patchset inspired by the performance regressions that were
> seen from logging 64k xattrs with Allison's delayed attribute
> patchset and trying to work out how to minimise the impact of
> logging xattrs. Most of that is explained in the "xfs: intent item
> whiteouts" patch, so I won't repeat it here.
> 
> The whiteouts massively reduce the journal write overhead of logging
> xattrs - with this patchset I've reduced 2.5GB/s of log traffic (16
> way file create w/64k xattr workload) down to approximately 220MB of
> log traffic, and performance has increased from 9k creates/s to 36k
> creates/s. The workload still writes to disk at 2.5GB/s, but that's
> what writing 35k x 64k xattrs to disk does.
> 
> This is still short of the non-logged attribute mechanism, which
> runs at 40-45k creates a second and 3.5-4GB/s to disk, but it brings
> logged attrs to within roughly 5-15% of non-logged attrs across the
> full range of attribute sizes.
> 
> So, while this patchset was clearly insired and has major positive
> impact on Allison's delayed attribute work, it also applies
> generically to all other intent/intent done pairs that already
> exist. Hence I've created this patchset as a stand-alone patchset
> that isn't dependent on the delayed attributes being committed, nor
> does the delayed attribute patchset need this to function properly.
> IOWs, they can be merged in parallel and then the attribute log item
> implementation be updated to support whiteouts after the fact.
> 
> This patchset is separate to the attr code, though, because
> intent whiteouts are not specific to the attr code. They are a
> generic mechanism that can be applied to all the intent/intent done
> item pairs we already have. This patch set modifies all those
> intents to use whiteouts, and so there is benefits from the patch
> set for all operations that use these intents.
> 
> With respect to the delayed attribute patchset, it can be merged
> without whiteout support and still work correctly with/without this
> patchset in place. Once both intent whiteouts and delayed attrs are
> merged, we can add whiteout support to delayed attributes with only
> a few lines of extra code.
> 
I applied this set to the for-next tag:
01728b44ef1b (tag: xfs-5.18-merge-2), and ran into a few merge
conflicts.  I just went through and fixed them to try and get things
working, but I'm getting some failed assert checks on generic/466:

[  155.219780] XFS: Assertion failed: reg->i_len % sizeof(int32_t) ==
0, file: fs/xfs/xfs_log.c, line: 2536

So maybe still some bugglies to work out.  I'll go over the conflicts
in the review.  Also, I'll push out the branch I have so far. Since
I've already gone through the rebase effort, I may as well share it:
https://github.com/allisonhenderson/xfs_work/tree/whiteouts

Allison

> Changelog:
> 
> Version 3:
> - rebased on 5.17-rc4 + xlog-write-rework
> - no longer dependent on xfs-cil-scalability, so there's some porting
> changes
>   that was needed to remove all the per-cpu CIL dependencies.
> 
> Version 2:
> - not published
> - rebased on 5.15-rc2 + xfs-cil-scalability
> - dropped the kvmalloc changes for CIL shadow buffers as that's a
>   separate perf problem and not something related to intent
>   whiteouts.
> - dropped all the delayed attribute modifications so that the
>   patchset is not dependent on Allison's dev tree.
> - Thanks to Allison for an initial quick review pass - I haven't
>   included those RVB tags because every patch in the series has
>   changed since the original RFC posting.
> 
> RFC:
> - 
> https://lore.kernel.org/linux-xfs/20210909212133.GE2361455@dread.disaster.area/
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/8] xfs: don't commit the first deferred transaction without intents
  2022-03-14 22:06 ` [PATCH 2/8] xfs: don't commit the first deferred transaction without intents Dave Chinner
@ 2022-04-11  5:22   ` Alli
  2022-04-12 10:21     ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Alli @ 2022-04-11  5:22 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Tue, 2022-03-15 at 09:06 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> If the first operation in a string of defer ops has no intents,
> then there is no reason to commit it before running the first call
> to xfs_defer_finish_one(). This allows the defer ops to be used
> effectively for non-intent based operations without requiring an
> unnecessary extra transaction commit when first called.
> 
> This fixes a regression in per-attribute modification transaction
> count when delayed attributes are not being used.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

I recall some time ago, you had given me this patch, and I added it to
the delayed attribute series series.  The reviews created a slightly
more simplified version of this, so if you are ok with how that one
turned out, you can just omit this patch from the white out series.  Or
if you prefer to keep it with this set, you can just adopt the second
patch of the larp series, and I can omit it from there.  Either was
should be fine I think?

Here are the reviews for this patch:
v25
https://lore.kernel.org/all/20211117041343.3050202-3-allison.henderson@oracle.com/
v26
https://lore.kernel.org/all/20220124052708.580016-3-allison.henderson@oracle.com/
v27
https://lore.kernel.org/all/20220216013713.1191082-3-allison.henderson@oracle.com/
v28
https://lore.kernel.org/all/20220228195147.1913281-3-allison.henderson@oracle.com/

Allison

> ---
>  fs/xfs/libxfs/xfs_defer.c | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 0805ade2d300..66b4555bda8e 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -186,7 +186,7 @@ static const struct xfs_defer_op_type
> *defer_op_types[] = {
>  	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
>  };
>  
> -static void
> +static bool
>  xfs_defer_create_intent(
>  	struct xfs_trans		*tp,
>  	struct xfs_defer_pending	*dfp,
> @@ -197,6 +197,7 @@ xfs_defer_create_intent(
>  	if (!dfp->dfp_intent)
>  		dfp->dfp_intent = ops->create_intent(tp, &dfp-
> >dfp_work,
>  						     dfp->dfp_count,
> sort);
> +	return dfp->dfp_intent;
>  }
>  
>  /*
> @@ -204,16 +205,18 @@ xfs_defer_create_intent(
>   * associated extents, then add the entire intake list to the end of
>   * the pending list.
>   */
> -STATIC void
> +static bool
>  xfs_defer_create_intents(
>  	struct xfs_trans		*tp)
>  {
>  	struct xfs_defer_pending	*dfp;
> +	bool				ret = false;
>  
>  	list_for_each_entry(dfp, &tp->t_dfops, dfp_list) {
>  		trace_xfs_defer_create_intent(tp->t_mountp, dfp);
> -		xfs_defer_create_intent(tp, dfp, true);
> +		ret |= xfs_defer_create_intent(tp, dfp, true);
>  	}
> +	return ret;
>  }
>  
>  /* Abort all the intents that were committed. */
> @@ -487,7 +490,7 @@ int
>  xfs_defer_finish_noroll(
>  	struct xfs_trans		**tp)
>  {
> -	struct xfs_defer_pending	*dfp;
> +	struct xfs_defer_pending	*dfp = NULL;
>  	int				error = 0;
>  	LIST_HEAD(dop_pending);
>  
> @@ -506,17 +509,19 @@ xfs_defer_finish_noroll(
>  		 * of time that any one intent item can stick around in
> memory,
>  		 * pinning the log tail.
>  		 */
> -		xfs_defer_create_intents(*tp);
> +		bool has_intents = xfs_defer_create_intents(*tp);
>  		list_splice_init(&(*tp)->t_dfops, &dop_pending);
>  
> -		error = xfs_defer_trans_roll(tp);
> -		if (error)
> -			goto out_shutdown;
> +		if (has_intents || dfp) {
> +			error = xfs_defer_trans_roll(tp);
> +			if (error)
> +				goto out_shutdown;
>  
> -		/* Possibly relog intent items to keep the log moving.
> */
> -		error = xfs_defer_relog(tp, &dop_pending);
> -		if (error)
> -			goto out_shutdown;
> +			/* Possibly relog intent items to keep the log
> moving. */
> +			error = xfs_defer_relog(tp, &dop_pending);
> +			if (error)
> +				goto out_shutdown;
> +		}
>  
>  		dfp = list_first_entry(&dop_pending, struct
> xfs_defer_pending,
>  				       dfp_list);


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/8] xfs: hide log iovec alignment constraints
  2022-03-14 22:06 ` [PATCH 1/8] xfs: hide log iovec alignment constraints Dave Chinner
@ 2022-04-11  5:23   ` Alli
  2022-04-12 10:13     ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Alli @ 2022-04-11  5:23 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Tue, 2022-03-15 at 09:06 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Callers currently have to round out the size of buffers to match the
> aligment constraints of log iovecs and xlog_write(). They should not
> need to know this detail, so introduce a new function to calculate
> the iovec length (for use in ->iop_size implementations). Also
> modify xlog_finish_iovec() to round up the length to the correct
> alignment so the callers don't need to do this, either.
> 
> Convert the only user - inode forks - of this alignment rounding to
> use the new interface.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_inode_fork.c  | 12 +++---------
>  fs/xfs/xfs_inode_item.c         | 25 +++++++------------------
>  fs/xfs/xfs_inode_item_recover.c |  4 ++--
>  fs/xfs/xfs_log.h                | 23 +++++++++++++++++++++++
>  4 files changed, 35 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c
> b/fs/xfs/libxfs/xfs_inode_fork.c
> index 9149f4f796fc..a46379f6c812 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -36,7 +36,7 @@ xfs_init_local_fork(
>  	int64_t			size)
>  {
>  	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
> -	int			mem_size = size, real_size = 0;
> +	int			mem_size = size;
>  	bool			zero_terminate;
>  
>  	/*
> @@ -50,8 +50,7 @@ xfs_init_local_fork(
>  		mem_size++;
>  
>  	if (size) {
> -		real_size = roundup(mem_size, 4);
> -		ifp->if_u1.if_data = kmem_alloc(real_size, KM_NOFS);
> +		ifp->if_u1.if_data = kmem_alloc(mem_size, KM_NOFS);
>  		memcpy(ifp->if_u1.if_data, data, size);
>  		if (zero_terminate)
>  			ifp->if_u1.if_data[size] = '\0';
> @@ -497,12 +496,7 @@ xfs_idata_realloc(
>  		return;
>  	}
>  
> -	/*
> -	 * For inline data, the underlying buffer must be a multiple of
> 4 bytes
> -	 * in size so that it can be logged and stay on word
> boundaries.
> -	 * We enforce that here.
> -	 */
> -	ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data,
> roundup(new_size, 4),
> +	ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, new_size,
>  				      GFP_NOFS | __GFP_NOFAIL);
>  	ifp->if_bytes = new_size;
>  }
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index 90d8e591baf8..19dc3e37bb4d 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -70,7 +70,7 @@ xfs_inode_item_data_fork_size(
>  	case XFS_DINODE_FMT_LOCAL:
>  		if ((iip->ili_fields & XFS_ILOG_DDATA) &&
>  		    ip->i_df.if_bytes > 0) {
> -			*nbytes += roundup(ip->i_df.if_bytes, 4);
> +			*nbytes += xlog_calc_iovec_len(ip-
> >i_df.if_bytes);
>  			*nvecs += 1;
>  		}
>  		break;
> @@ -111,7 +111,7 @@ xfs_inode_item_attr_fork_size(
>  	case XFS_DINODE_FMT_LOCAL:
>  		if ((iip->ili_fields & XFS_ILOG_ADATA) &&
>  		    ip->i_afp->if_bytes > 0) {
> -			*nbytes += roundup(ip->i_afp->if_bytes, 4);
> +			*nbytes += xlog_calc_iovec_len(ip->i_afp-
> >if_bytes);
>  			*nvecs += 1;
>  		}
>  		break;
> @@ -203,17 +203,12 @@ xfs_inode_item_format_data_fork(
>  			~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT |
> XFS_ILOG_DEV);
>  		if ((iip->ili_fields & XFS_ILOG_DDATA) &&
>  		    ip->i_df.if_bytes > 0) {
> -			/*
> -			 * Round i_bytes up to a word boundary.
> -			 * The underlying memory is guaranteed
> -			 * to be there by xfs_idata_realloc().
> -			 */
> -			data_bytes = roundup(ip->i_df.if_bytes, 4);
>  			ASSERT(ip->i_df.if_u1.if_data != NULL);
>  			ASSERT(ip->i_disk_size > 0);
>  			xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_ILOCAL,
> -					ip->i_df.if_u1.if_data,
> data_bytes);
> -			ilf->ilf_dsize = (unsigned)data_bytes;
> +					ip->i_df.if_u1.if_data,
> +					ip->i_df.if_bytes);
> +			ilf->ilf_dsize = (unsigned)ip->i_df.if_bytes;
>  			ilf->ilf_size++;
>  		} else {
>  			iip->ili_fields &= ~XFS_ILOG_DDATA;
> @@ -287,17 +282,11 @@ xfs_inode_item_format_attr_fork(
>  
>  		if ((iip->ili_fields & XFS_ILOG_ADATA) &&
>  		    ip->i_afp->if_bytes > 0) {
> -			/*
> -			 * Round i_bytes up to a word boundary.
> -			 * The underlying memory is guaranteed
> -			 * to be there by xfs_idata_realloc().
> -			 */
> -			data_bytes = roundup(ip->i_afp->if_bytes, 4);
>  			ASSERT(ip->i_afp->if_u1.if_data != NULL);
>  			xlog_copy_iovec(lv, vecp,
> XLOG_REG_TYPE_IATTR_LOCAL,
>  					ip->i_afp->if_u1.if_data,
> -					data_bytes);
> -			ilf->ilf_asize = (unsigned)data_bytes;
> +					ip->i_afp->if_bytes);
> +			ilf->ilf_asize = (unsigned)ip->i_afp->if_bytes;
>  			ilf->ilf_size++;
>  		} else {
>  			iip->ili_fields &= ~XFS_ILOG_ADATA;
> diff --git a/fs/xfs/xfs_inode_item_recover.c
> b/fs/xfs/xfs_inode_item_recover.c
> index 239dd2e3384e..044504a4b399 100644
> --- a/fs/xfs/xfs_inode_item_recover.c
> +++ b/fs/xfs/xfs_inode_item_recover.c
> @@ -401,7 +401,7 @@ xlog_recover_inode_commit_pass2(
>  	ASSERT(in_f->ilf_size <= 4);
>  	ASSERT((in_f->ilf_size == 3) || (fields & XFS_ILOG_AFORK));
>  	ASSERT(!(fields & XFS_ILOG_DFORK) ||
> -	       (len == in_f->ilf_dsize));
> +	       (len == xlog_calc_iovec_len(in_f->ilf_dsize)));
>  
>  	switch (fields & XFS_ILOG_DFORK) {
>  	case XFS_ILOG_DDATA:
> @@ -436,7 +436,7 @@ xlog_recover_inode_commit_pass2(
>  		}
>  		len = item->ri_buf[attr_index].i_len;
>  		src = item->ri_buf[attr_index].i_addr;
> -		ASSERT(len == in_f->ilf_asize);
> +		ASSERT(len == xlog_calc_iovec_len(in_f->ilf_asize));
>  
>  		switch (in_f->ilf_fields & XFS_ILOG_AFORK) {
>  		case XFS_ILOG_ADATA:
> diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
> index 816f44d7dc81..fbe3e764cee3 100644
> --- a/fs/xfs/xfs_log.h
> +++ b/fs/xfs/xfs_log.h
> @@ -21,6 +21,17 @@ struct xfs_log_vec {
>  
>  #define XFS_LOG_VEC_ORDERED	(-1)
>  
> +/*
> + * Calculate the log iovec length for a given user buffer length.
> Intended to be
> + * used by ->iop_size implementations when sizing buffers of
> arbitrary
> + * alignments.
> + */
> +static inline int
> +xlog_calc_iovec_len(int len)
> +{
> +	return roundup(len, 4);
> +}
> +
>  void *xlog_prepare_iovec(struct xfs_log_vec *lv, struct
> xfs_log_iovec **vecp,
>  		uint type);
>  
> @@ -29,6 +40,12 @@ xlog_finish_iovec(struct xfs_log_vec *lv, struct
> xfs_log_iovec *vec, int len)
>  {
>  	struct xlog_op_header	*oph = vec->i_addr;
>  
> +	/*
> +	 * Always round up the length to the correct alignment so
> callers don't
> +	 * need to know anything about this log vec layout requirement.
> +	 */
> +	len = xlog_calc_iovec_len(len);Hmm, what code base was this on?
> > 
Hmm, I'm getting some merge conflicts in this area.  It looks like the
round_up logic was already added in:

bde7cff67c39227c6ad503394e19e58debdbc5e3
"xfs: format log items write directly into the linear CIL buffer"

So I think it's ok to drop this bit about rounding length.

Other than that I think the patch looks ok.
Allison

> +
>  	/* opheader tracks payload length, logvec tracks region length
> */
>  	oph->oh_len = cpu_to_be32(len);
>  
> @@ -36,8 +53,14 @@ xlog_finish_iovec(struct xfs_log_vec *lv, struct
> xfs_log_iovec *vec, int len)
>  	lv->lv_buf_len += len;
>  	lv->lv_bytes += len;
>  	vec->i_len = len;
> +
> +	/* Catch buffer overruns */
> +	ASSERT((void *)lv->lv_buf + lv->lv_bytes <= (void *)lv + lv-
> >lv_size);
>  }
>  
> +/*
> + * Copy the amount of data requested by the caller into a new log
> iovec.
> + */
>  static inline void *
>  xlog_copy_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec **vecp,
>  		uint type, void *data, int len)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/8] xfs: add log item flags to indicate intents
  2022-03-14 22:06 ` [PATCH 3/8] xfs: add log item flags to indicate intents Dave Chinner
@ 2022-04-11  5:23   ` Alli
  0 siblings, 0 replies; 20+ messages in thread
From: Alli @ 2022-04-11  5:23 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Tue, 2022-03-15 at 09:06 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We currently have a couple of helper functions that try to infer
> whether the log item is an intent or intent done item from the
> combinations of operations it supports.  This is incredibly fragile
> and not very efficient as it requires checking specific combinations
> of ops.
> 
> We need to be able to identify intent and intent done items quickly
> and easily in upcoming patches, so simply add intent and intent done
> type flags to the log item ops flags. These are static flags to
> begin with, so intent items should have been typed like this from
> the start.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Ok, this one looks pretty straight forward.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
>  fs/xfs/xfs_bmap_item.c     |  4 +++-
>  fs/xfs/xfs_extfree_item.c  |  4 +++-
>  fs/xfs/xfs_refcount_item.c |  4 +++-
>  fs/xfs/xfs_rmap_item.c     |  4 +++-
>  fs/xfs/xfs_trans.h         | 25 +++++++++++++------------
>  5 files changed, 25 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index e1f4d7d5a011..ead27d40ef78 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -202,7 +202,8 @@ xfs_bud_item_release(
>  }
>  
>  static const struct xfs_item_ops xfs_bud_item_ops = {
> -	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
> +	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
> +			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_bud_item_size,
>  	.iop_format	= xfs_bud_item_format,
>  	.iop_release	= xfs_bud_item_release,
> @@ -584,6 +585,7 @@ xfs_bui_item_relog(
>  }
>  
>  static const struct xfs_item_ops xfs_bui_item_ops = {
> +	.flags		= XFS_ITEM_INTENT,
>  	.iop_size	= xfs_bui_item_size,
>  	.iop_format	= xfs_bui_item_format,
>  	.iop_unpin	= xfs_bui_item_unpin,
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 47ef9c9c5c17..6913db61d770 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -307,7 +307,8 @@ xfs_efd_item_release(
>  }
>  
>  static const struct xfs_item_ops xfs_efd_item_ops = {
> -	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
> +	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
> +			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_efd_item_size,
>  	.iop_format	= xfs_efd_item_format,
>  	.iop_release	= xfs_efd_item_release,
> @@ -688,6 +689,7 @@ xfs_efi_item_relog(
>  }
>  
>  static const struct xfs_item_ops xfs_efi_item_ops = {
> +	.flags		= XFS_ITEM_INTENT,
>  	.iop_size	= xfs_efi_item_size,
>  	.iop_format	= xfs_efi_item_format,
>  	.iop_unpin	= xfs_efi_item_unpin,
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index d3da67772d57..bffde82b109c 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -208,7 +208,8 @@ xfs_cud_item_release(
>  }
>  
>  static const struct xfs_item_ops xfs_cud_item_ops = {
> -	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
> +	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
> +			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_cud_item_size,
>  	.iop_format	= xfs_cud_item_format,
>  	.iop_release	= xfs_cud_item_release,
> @@ -600,6 +601,7 @@ xfs_cui_item_relog(
>  }
>  
>  static const struct xfs_item_ops xfs_cui_item_ops = {
> +	.flags		= XFS_ITEM_INTENT,
>  	.iop_size	= xfs_cui_item_size,
>  	.iop_format	= xfs_cui_item_format,
>  	.iop_unpin	= xfs_cui_item_unpin,
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index c3966b4c58ef..c78e31bc84e1 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -231,7 +231,8 @@ xfs_rud_item_release(
>  }
>  
>  static const struct xfs_item_ops xfs_rud_item_ops = {
> -	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
> +	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
> +			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_rud_item_size,
>  	.iop_format	= xfs_rud_item_format,
>  	.iop_release	= xfs_rud_item_release,
> @@ -630,6 +631,7 @@ xfs_rui_item_relog(
>  }
>  
>  static const struct xfs_item_ops xfs_rui_item_ops = {
> +	.flags		= XFS_ITEM_INTENT,
>  	.iop_size	= xfs_rui_item_size,
>  	.iop_format	= xfs_rui_item_format,
>  	.iop_unpin	= xfs_rui_item_unpin,
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index a487b264a9eb..93cb4be33f7a 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -79,28 +79,29 @@ struct xfs_item_ops {
>  			struct xfs_trans *tp);
>  };
>  
> -/* Is this log item a deferred action intent? */
> +/*
> + * Log item ops flags
> + */
> +/*
> + * Release the log item when the journal commits instead of
> inserting into the
> + * AIL for writeback tracking and/or log tail pinning.
> + */
> +#define XFS_ITEM_RELEASE_WHEN_COMMITTED	(1 << 0)
> +#define XFS_ITEM_INTENT			(1 << 1)
> +#define XFS_ITEM_INTENT_DONE		(1 << 2)
> +
>  static inline bool
>  xlog_item_is_intent(struct xfs_log_item *lip)
>  {
> -	return lip->li_ops->iop_recover != NULL &&
> -	       lip->li_ops->iop_match != NULL;
> +	return lip->li_ops->flags & XFS_ITEM_INTENT;
>  }
>  
> -/* Is this a log intent-done item? */
>  static inline bool
>  xlog_item_is_intent_done(struct xfs_log_item *lip)
>  {
> -	return lip->li_ops->iop_unpin == NULL &&
> -	       lip->li_ops->iop_push == NULL;
> +	return lip->li_ops->flags & XFS_ITEM_INTENT_DONE;
>  }
>  
> -/*
> - * Release the log item as soon as committed.  This is for items
> just logging
> - * intents that never need to be written back in place.
> - */
> -#define XFS_ITEM_RELEASE_WHEN_COMMITTED	(1 << 0)
> -
>  void	xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item
> *item,
>  			  int type, const struct xfs_item_ops *ops);
>  


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/8] xfs: tag transactions that contain intent done items
  2022-03-14 22:06 ` [PATCH 4/8] xfs: tag transactions that contain intent done items Dave Chinner
@ 2022-04-11  5:23   ` Alli
  0 siblings, 0 replies; 20+ messages in thread
From: Alli @ 2022-04-11  5:23 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Tue, 2022-03-15 at 09:06 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Intent whiteouts will require extra work to be done during
> transaction commit if the transaction contains an intent done item.
> 
> To determine if a transaction contains an intent done item, we want
> to avoid having to walk all the items in the transaction to check if
> they are intent done items. Hence when we add an intent done item to
> a transaction, tag the transaction to indicate that it contains such
> an item.
> 
> We don't tag the transaction when the defer ops is relogging an
> intent to move it forward in the log. Whiteouts will never apply to
> these cases, so we don't need to bother looking for them.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Ok, makes sense:
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_shared.h | 24 +++++++++++++++++-------
>  fs/xfs/xfs_bmap_item.c     |  2 +-
>  fs/xfs/xfs_extfree_item.c  |  2 +-
>  fs/xfs/xfs_refcount_item.c |  2 +-
>  fs/xfs/xfs_rmap_item.c     |  2 +-
>  5 files changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
> index 25c4cab58851..e96618dbde29 100644
> --- a/fs/xfs/libxfs/xfs_shared.h
> +++ b/fs/xfs/libxfs/xfs_shared.h
> @@ -54,13 +54,23 @@ void	xfs_log_get_max_trans_res(struct
> xfs_mount *mp,
>  /*
>   * Values for t_flags.
>   */
> -#define	XFS_TRANS_DIRTY		0x01	/* something needs to
> be logged */
> -#define	XFS_TRANS_SB_DIRTY	0x02	/* superblock is
> modified */
> -#define	XFS_TRANS_PERM_LOG_RES	0x04	/* xact took a
> permanent log res */
> -#define	XFS_TRANS_SYNC		0x08	/* make commit
> synchronous */
> -#define XFS_TRANS_RESERVE	0x20    /* OK to use reserved data
> blocks */
> -#define XFS_TRANS_NO_WRITECOUNT 0x40	/* do not elevate SB writecount
> */
> -#define XFS_TRANS_RES_FDBLKS	0x80	/* reserve newly freed blocks
> */
> +/* Transaction needs to be logged */
> +#define XFS_TRANS_DIRTY			(1 << 0)
> +/* Superblock is dirty and needs to be logged */
> +#define XFS_TRANS_SB_DIRTY		(1 << 1)
> +/* Transaction took a permanent log reservation */
> +#define XFS_TRANS_PERM_LOG_RES		(1 << 2)
> +/* Synchronous transaction commit needed */
> +#define XFS_TRANS_SYNC			(1 << 3)
> +/* Transaction can use reserve block pool */
> +#define XFS_TRANS_RESERVE		(1 << 4)
> +/* Transaction should avoid VFS level superblock write accounting */
> +#define XFS_TRANS_NO_WRITECOUNT		(1 << 5)
> +/* Transaction has freed blocks returned to it's reservation */
> +#define XFS_TRANS_RES_FDBLKS		(1 << 6)
> +/* Transaction contains an intent done log item */
> +#define XFS_TRANS_HAS_INTENT_DONE	(1 << 7)
> +
>  /*
>   * LOWMODE is used by the allocator to activate the lowspace
> algorithm - when
>   * free space is running low the extent allocator may choose to
> allocate an
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index ead27d40ef78..45dd03272e5d 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -255,7 +255,7 @@ xfs_trans_log_finish_bmap_update(
>  	 * 1.) releases the BUI and frees the BUD
>  	 * 2.) shuts down the filesystem
>  	 */
> -	tp->t_flags |= XFS_TRANS_DIRTY;
> +	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
>  	set_bit(XFS_LI_DIRTY, &budp->bud_item.li_flags);
>  
>  	return error;
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 6913db61d770..ed1229cb6807 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -381,7 +381,7 @@ xfs_trans_free_extent(
>  	 * 1.) releases the EFI and frees the EFD
>  	 * 2.) shuts down the filesystem
>  	 */
> -	tp->t_flags |= XFS_TRANS_DIRTY;
> +	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
>  	set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);
>  
>  	next_extent = efdp->efd_next_extent;
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index bffde82b109c..642bcff72a71 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -260,7 +260,7 @@ xfs_trans_log_finish_refcount_update(
>  	 * 1.) releases the CUI and frees the CUD
>  	 * 2.) shuts down the filesystem
>  	 */
> -	tp->t_flags |= XFS_TRANS_DIRTY;
> +	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
>  	set_bit(XFS_LI_DIRTY, &cudp->cud_item.li_flags);
>  
>  	return error;
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index c78e31bc84e1..4285b94465d2 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -328,7 +328,7 @@ xfs_trans_log_finish_rmap_update(
>  	 * 1.) releases the RUI and frees the RUD
>  	 * 2.) shuts down the filesystem
>  	 */
> -	tp->t_flags |= XFS_TRANS_DIRTY;
> +	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
>  	set_bit(XFS_LI_DIRTY, &rudp->rud_item.li_flags);
>  
>  	return error;


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c
  2022-03-14 22:06 ` [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c Dave Chinner
@ 2022-04-11  5:24   ` Alli
  2022-04-12 10:25     ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Alli @ 2022-04-11  5:24 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Tue, 2022-03-15 at 09:06 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In preparation for adding support for intent item whiteouts.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_cil.c | 119 ++++++++++++++++++++++++-----------------
> --
>  1 file changed, 67 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 5179436b6603..dda71f1a25c5 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -47,6 +47,38 @@ xlog_cil_ticket_alloc(
>  	return tic;
>  }
>  
> +/*
> + * Check if the current log item was first committed in this
> sequence.
> + * We can't rely on just the log item being in the CIL, we have to
> check
> + * the recorded commit sequence number.
> + *
> + * Note: for this to be used in a non-racy manner, it has to be
> called with
> + * CIL flushing locked out. As a result, it should only be used
> during the
> + * transaction commit process when deciding what to format into the
> item.
> + */
> +static bool
> +xlog_item_in_current_chkpt(
> +	struct xfs_cil		*cil,
> +	struct xfs_log_item	*lip)
> +{
> +	if (list_empty(&lip->li_cil))
> +		return false;
> +
> +	/*
> +	 * li_seq is written on the first commit of a log item to
> record the
> +	 * first checkpoint it is written to. Hence if it is different
> to the
> +	 * current sequence, we're in a new checkpoint.
> +	 */
> +	return lip->li_seq == READ_ONCE(cil->xc_current_sequence);
> +}
> +
> +bool
> +xfs_log_item_in_current_chkpt(
> +	struct xfs_log_item *lip)
> +{
> +	return xlog_item_in_current_chkpt(lip->li_mountp->m_log-
> >l_cilp, lip);

I think this turns into "lip->li_log->l_mp->m_log->l_cilp" in the new
code base

> +}
> +
>  /*
>   * Unavoidable forward declaration - xlog_cil_push_work() calls
>   * xlog_cil_ctx_alloc() itself.
> @@ -924,6 +956,40 @@ xlog_cil_build_trans_hdr(
>  	tic->t_curr_res -= lvhdr->lv_bytes;
>  }
>  
> +/*
> + * Pull all the log vectors off the items in the CIL, and remove the
> items from
> + * the CIL. We don't need the CIL lock here because it's only needed
> on the
> + * transaction commit side which is currently locked out by the
> flush lock.
> + */
> +static void
> +xlog_cil_build_lv_chain(
> +	struct xfs_cil		*cil,
> +	struct xfs_cil_ctx	*ctx,
> +	uint32_t		*num_iovecs,
> +	uint32_t		*num_bytes)
> +{
> +	struct xfs_log_vec	*lv = NULL;
> +
> +	while (!list_empty(&cil->xc_cil)) {
> +		struct xfs_log_item	*item;
> +
> +		item = list_first_entry(&cil->xc_cil,
> +					struct xfs_log_item, li_cil);
> +		list_del_init(&item->li_cil);
> +		if (!ctx->lv_chain)
> +			ctx->lv_chain = item->li_lv;
> +		else
> +			lv->lv_next = item->li_lv;
> +		lv = item->li_lv;
> +		item->li_lv = NULL;
> +		*num_iovecs += lv->lv_niovecs;
> 

This part below does not appear in the new rebase, so this would go
away in the hoisted helper

> +
> +		/* we don't write ordered log vectors */
> +		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
> +			*num_bytes += lv->lv_bytes;
> +	}
> +}
> +
>  /*
>   * Push the Committed Item List to the log.
>   *
> @@ -946,7 +1012,6 @@ xlog_cil_push_work(
>  		container_of(work, struct xfs_cil_ctx, push_work);
>  	struct xfs_cil		*cil = ctx->cil;
>  	struct xlog		*log = cil->xc_log;
> -	struct xfs_log_vec	*lv;
>  	struct xfs_cil_ctx	*new_ctx;
>  	int			num_iovecs = 0;

For me, I had to add the new num_bytes variable here:
	int			num_iovecs, num_bytes;

I think the new helper does not need the num_bytes parameter in the new
rebase, so we may be able to just remove num_bytes  entirely.

Otherwise these hoists look straight forward.

Allison

>  	int			num_bytes = 0;
> @@ -1043,31 +1108,7 @@ xlog_cil_push_work(
>  	xfs_flush_bdev_async(&bio, log->l_mp->m_ddev_targp->bt_bdev,
>  				&bdev_flush);
>  
> -	/*
> -	 * Pull all the log vectors off the items in the CIL, and
> remove the
> -	 * items from the CIL. We don't need the CIL lock here because
> it's only
> -	 * needed on the transaction commit side which is currently
> locked out
> -	 * by the flush lock.
> -	 */
> -	lv = NULL;
> -	while (!list_empty(&cil->xc_cil)) {
> -		struct xfs_log_item	*item;
> -
> -		item = list_first_entry(&cil->xc_cil,
> -					struct xfs_log_item, li_cil);
> -		list_del_init(&item->li_cil);
> -		if (!ctx->lv_chain)
> -			ctx->lv_chain = item->li_lv;
> -		else
> -			lv->lv_next = item->li_lv;
> -		lv = item->li_lv;
> -		item->li_lv = NULL;
> -		num_iovecs += lv->lv_niovecs;
> -
> -		/* we don't write ordered log vectors */
> -		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
> -			num_bytes += lv->lv_bytes;
> -	}
> +	xlog_cil_build_lv_chain(cil, ctx, &num_iovecs, &num_bytes);
>  
>  	/*
>  	 * Switch the contexts so we can drop the context lock and move
> out
> @@ -1508,32 +1549,6 @@ xlog_cil_force_seq(
>  	return 0;
>  }
>  
> -/*
> - * Check if the current log item was first committed in this
> sequence.
> - * We can't rely on just the log item being in the CIL, we have to
> check
> - * the recorded commit sequence number.
> - *
> - * Note: for this to be used in a non-racy manner, it has to be
> called with
> - * CIL flushing locked out. As a result, it should only be used
> during the
> - * transaction commit process when deciding what to format into the
> item.
> - */
> -bool
> -xfs_log_item_in_current_chkpt(
> -	struct xfs_log_item	*lip)
> -{
> -	struct xfs_cil		*cil = lip->li_mountp->m_log->l_cilp;
> -
> -	if (list_empty(&lip->li_cil))
> -		return false;
> -
> -	/*
> -	 * li_seq is written on the first commit of a log item to
> record the
> -	 * first checkpoint it is written to. Hence if it is different
> to the
> -	 * current sequence, we're in a new checkpoint.
> -	 */
> -	return lip->li_seq == READ_ONCE(cil->xc_current_sequence);
> -}
> -
>  /*
>   * Perform initial CIL structure initialisation.
>   */


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/8] xfs: add log item method to return related intents
  2022-03-14 22:06 ` [PATCH 6/8] xfs: add log item method to return related intents Dave Chinner
@ 2022-04-11  5:24   ` Alli
  0 siblings, 0 replies; 20+ messages in thread
From: Alli @ 2022-04-11  5:24 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Tue, 2022-03-15 at 09:06 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> To apply a whiteout to an intent item when an intent done item is
> committed, we need to be able to retrieve the intent item from the
> the intent done item. Add a log item op method for doing this, and
> wire all the intent done items up to it.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Looks ok to me
Reviewed-by Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/xfs_bmap_item.c     | 8 ++++++++
>  fs/xfs/xfs_extfree_item.c  | 8 ++++++++
>  fs/xfs/xfs_refcount_item.c | 8 ++++++++
>  fs/xfs/xfs_rmap_item.c     | 8 ++++++++
>  fs/xfs/xfs_trans.h         | 1 +
>  5 files changed, 33 insertions(+)
> 
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index 45dd03272e5d..2e7abfe35644 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -201,12 +201,20 @@ xfs_bud_item_release(
>  	kmem_cache_free(xfs_bud_cache, budp);
>  }
>  
> +static struct xfs_log_item *
> +xfs_bud_item_intent(
> +	struct xfs_log_item	*lip)
> +{
> +	return &BUD_ITEM(lip)->bud_buip->bui_item;
> +}
> +
>  static const struct xfs_item_ops xfs_bud_item_ops = {
>  	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
>  			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_bud_item_size,
>  	.iop_format	= xfs_bud_item_format,
>  	.iop_release	= xfs_bud_item_release,
> +	.iop_intent	= xfs_bud_item_intent,
>  };
>  
>  static struct xfs_bud_log_item *
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index ed1229cb6807..1d0e5cdc15f9 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -306,12 +306,20 @@ xfs_efd_item_release(
>  	xfs_efd_item_free(efdp);
>  }
>  
> +static struct xfs_log_item *
> +xfs_efd_item_intent(
> +	struct xfs_log_item	*lip)
> +{
> +	return &EFD_ITEM(lip)->efd_efip->efi_item;
> +}
> +
>  static const struct xfs_item_ops xfs_efd_item_ops = {
>  	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
>  			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_efd_item_size,
>  	.iop_format	= xfs_efd_item_format,
>  	.iop_release	= xfs_efd_item_release,
> +	.iop_intent	= xfs_efd_item_intent,
>  };
>  
>  /*
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index 642bcff72a71..ada5793ce550 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -207,12 +207,20 @@ xfs_cud_item_release(
>  	kmem_cache_free(xfs_cud_cache, cudp);
>  }
>  
> +static struct xfs_log_item *
> +xfs_cud_item_intent(
> +	struct xfs_log_item	*lip)
> +{
> +	return &CUD_ITEM(lip)->cud_cuip->cui_item;
> +}
> +
>  static const struct xfs_item_ops xfs_cud_item_ops = {
>  	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
>  			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_cud_item_size,
>  	.iop_format	= xfs_cud_item_format,
>  	.iop_release	= xfs_cud_item_release,
> +	.iop_intent	= xfs_cud_item_intent,
>  };
>  
>  static struct xfs_cud_log_item *
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index 4285b94465d2..6e66e7718902 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -230,12 +230,20 @@ xfs_rud_item_release(
>  	kmem_cache_free(xfs_rud_cache, rudp);
>  }
>  
> +static struct xfs_log_item *
> +xfs_rud_item_intent(
> +	struct xfs_log_item	*lip)
> +{
> +	return &RUD_ITEM(lip)->rud_ruip->rui_item;
> +}
> +
>  static const struct xfs_item_ops xfs_rud_item_ops = {
>  	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
>  			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_rud_item_size,
>  	.iop_format	= xfs_rud_item_format,
>  	.iop_release	= xfs_rud_item_release,
> +	.iop_intent	= xfs_rud_item_intent,
>  };
>  
>  static struct xfs_rud_log_item *
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index 93cb4be33f7a..6182c97cb8e7 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -77,6 +77,7 @@ struct xfs_item_ops {
>  	bool (*iop_match)(struct xfs_log_item *item, uint64_t id);
>  	struct xfs_log_item *(*iop_relog)(struct xfs_log_item *intent,
>  			struct xfs_trans *tp);
> +	struct xfs_log_item *(*iop_intent)(struct xfs_log_item
> *intent_done);
>  };
>  
>  /*


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL
  2022-03-14 22:06 ` [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL Dave Chinner
@ 2022-04-11  5:27   ` Alli
  0 siblings, 0 replies; 20+ messages in thread
From: Alli @ 2022-04-11  5:27 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Tue, 2022-03-15 at 09:06 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we release an intent that a whiteout applies to, it will not
> have been committed to the journal and so won't be in the AIL. Hence
> when we drop the last reference to the intent, we do not want to try
> to remove it from the AIL as that will trigger a filesystem
> shutdown. Hence make the removal of intents from the AIL conditional
> on them actually being in the AIL so we do the correct thing.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Ok, makes sense
Reviewed-by Allison Henderson <allison.henderson@oracle.com>

> ---
>  fs/xfs/xfs_bmap_item.c     | 8 +++++---
>  fs/xfs/xfs_extfree_item.c  | 8 +++++---
>  fs/xfs/xfs_refcount_item.c | 8 +++++---
>  fs/xfs/xfs_rmap_item.c     | 8 +++++---
>  4 files changed, 20 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index 2e7abfe35644..c6f47c13da27 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -54,10 +54,12 @@ xfs_bui_release(
>  	struct xfs_bui_log_item	*buip)
>  {
>  	ASSERT(atomic_read(&buip->bui_refcount) > 0);
> -	if (atomic_dec_and_test(&buip->bui_refcount)) {
> +	if (!atomic_dec_and_test(&buip->bui_refcount))
> +		return;
> +
> +	if (test_bit(XFS_LI_IN_AIL, &buip->bui_item.li_flags))
>  		xfs_trans_ail_delete(&buip->bui_item,
> SHUTDOWN_LOG_IO_ERROR);
> -		xfs_bui_item_free(buip);
> -	}
> +	xfs_bui_item_free(buip);
>  }
>  
>  
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 1d0e5cdc15f9..36eeac9413f5 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -58,10 +58,12 @@ xfs_efi_release(
>  	struct xfs_efi_log_item	*efip)
>  {
>  	ASSERT(atomic_read(&efip->efi_refcount) > 0);
> -	if (atomic_dec_and_test(&efip->efi_refcount)) {
> +	if (!atomic_dec_and_test(&efip->efi_refcount))
> +		return;
> +
> +	if (test_bit(XFS_LI_IN_AIL, &efip->efi_item.li_flags))
>  		xfs_trans_ail_delete(&efip->efi_item,
> SHUTDOWN_LOG_IO_ERROR);
> -		xfs_efi_item_free(efip);
> -	}
> +	xfs_efi_item_free(efip);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index ada5793ce550..d4632f2ceb89 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -53,10 +53,12 @@ xfs_cui_release(
>  	struct xfs_cui_log_item	*cuip)
>  {
>  	ASSERT(atomic_read(&cuip->cui_refcount) > 0);
> -	if (atomic_dec_and_test(&cuip->cui_refcount)) {
> +	if (!atomic_dec_and_test(&cuip->cui_refcount))
> +		return;
> +
> +	if (test_bit(XFS_LI_IN_AIL, &cuip->cui_item.li_flags))
>  		xfs_trans_ail_delete(&cuip->cui_item,
> SHUTDOWN_LOG_IO_ERROR);
> -		xfs_cui_item_free(cuip);
> -	}
> +	xfs_cui_item_free(cuip);
>  }
>  
>  
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index 6e66e7718902..fa691a6ae737 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -53,10 +53,12 @@ xfs_rui_release(
>  	struct xfs_rui_log_item	*ruip)
>  {
>  	ASSERT(atomic_read(&ruip->rui_refcount) > 0);
> -	if (atomic_dec_and_test(&ruip->rui_refcount)) {
> +	if (!atomic_dec_and_test(&ruip->rui_refcount))
> +		return;
> +
> +	if (test_bit(XFS_LI_IN_AIL, &ruip->rui_item.li_flags))
>  		xfs_trans_ail_delete(&ruip->rui_item,
> SHUTDOWN_LOG_IO_ERROR);
> -		xfs_rui_item_free(ruip);
> -	}
> +	xfs_rui_item_free(ruip);
>  }
>  
>  STATIC void


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/8] xfs: hide log iovec alignment constraints
  2022-04-11  5:23   ` Alli
@ 2022-04-12 10:13     ` Dave Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2022-04-12 10:13 UTC (permalink / raw)
  To: Alli; +Cc: linux-xfs

On Sun, Apr 10, 2022 at 10:23:09PM -0700, Alli wrote:
> On Tue, 2022-03-15 at 09:06 +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Callers currently have to round out the size of buffers to match the
> > aligment constraints of log iovecs and xlog_write(). They should not
> > need to know this detail, so introduce a new function to calculate
> > the iovec length (for use in ->iop_size implementations). Also
> > modify xlog_finish_iovec() to round up the length to the correct
> > alignment so the callers don't need to do this, either.
> > 
> > Convert the only user - inode forks - of this alignment rounding to
> > use the new interface.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
.....
> >  void *xlog_prepare_iovec(struct xfs_log_vec *lv, struct
> > xfs_log_iovec **vecp,
> >  		uint type);
> >  
> > @@ -29,6 +40,12 @@ xlog_finish_iovec(struct xfs_log_vec *lv, struct
> > xfs_log_iovec *vec, int len)
> >  {
> >  	struct xlog_op_header	*oph = vec->i_addr;
> >  
> > +	/*
> > +	 * Always round up the length to the correct alignment so
> > callers don't
> > +	 * need to know anything about this log vec layout requirement.
> > +	 */
> > +	len = xlog_calc_iovec_len(len);Hmm, what code base was this on?
> > > 
> Hmm, I'm getting some merge conflicts in this area.  It looks like the
> round_up logic was already added in:
> 
> bde7cff67c39227c6ad503394e19e58debdbc5e3
> "xfs: format log items write directly into the linear CIL buffer"
> 
> So I think it's ok to drop this bit about rounding length.

Ok, I think that's why you are getting rounding assert failures in
the log code - this code replaces the fixed 4 byte allocation
roundup that is done for the inode fork data buffers, and if you
remove both the round-up I added to xlog_finish_iovec() and the
inode fork roundup, you get unaligned regions and assert failures in
xlog_write()....

The posted patchset was based on top of the xlog-write-rewrite
patchset I posted before this one, so I'd say that's where the
conflicts applying this to a base 5.18-rc2 kernel came from.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/8] xfs: don't commit the first deferred transaction without intents
  2022-04-11  5:22   ` Alli
@ 2022-04-12 10:21     ` Dave Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2022-04-12 10:21 UTC (permalink / raw)
  To: Alli; +Cc: linux-xfs

On Sun, Apr 10, 2022 at 10:22:48PM -0700, Alli wrote:
> On Tue, 2022-03-15 at 09:06 +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > If the first operation in a string of defer ops has no intents,
> > then there is no reason to commit it before running the first call
> > to xfs_defer_finish_one(). This allows the defer ops to be used
> > effectively for non-intent based operations without requiring an
> > unnecessary extra transaction commit when first called.
> > 
> > This fixes a regression in per-attribute modification transaction
> > count when delayed attributes are not being used.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> I recall some time ago, you had given me this patch, and I added it to
> the delayed attribute series series.  The reviews created a slightly
> more simplified version of this, so if you are ok with how that one
> turned out, you can just omit this patch from the white out series.  Or
> if you prefer to keep it with this set, you can just adopt the second
> patch of the larp series, and I can omit it from there.  Either was
> should be fine I think?

The version in this patch set is quite different in implementation
scope - the original was just a scatter-gun that checked if the
transaction was not dirty. This one checks if there are intents
being logged, so the conditions under which it skips the commit are
much more refined.

Hence I don't the reviews carry over, and I think the version in the
whiteout patchset is the version we want...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c
  2022-04-11  5:24   ` Alli
@ 2022-04-12 10:25     ` Dave Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2022-04-12 10:25 UTC (permalink / raw)
  To: Alli; +Cc: linux-xfs

On Sun, Apr 10, 2022 at 10:24:21PM -0700, Alli wrote:
> On Tue, 2022-03-15 at 09:06 +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > In preparation for adding support for intent item whiteouts.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_log_cil.c | 119 ++++++++++++++++++++++++-----------------
> > --
> >  1 file changed, 67 insertions(+), 52 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 5179436b6603..dda71f1a25c5 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -47,6 +47,38 @@ xlog_cil_ticket_alloc(
> >  	return tic;
> >  }
> >  
> > +/*
> > + * Check if the current log item was first committed in this
> > sequence.
> > + * We can't rely on just the log item being in the CIL, we have to
> > check
> > + * the recorded commit sequence number.
> > + *
> > + * Note: for this to be used in a non-racy manner, it has to be
> > called with
> > + * CIL flushing locked out. As a result, it should only be used
> > during the
> > + * transaction commit process when deciding what to format into the
> > item.
> > + */
> > +static bool
> > +xlog_item_in_current_chkpt(
> > +	struct xfs_cil		*cil,
> > +	struct xfs_log_item	*lip)
> > +{
> > +	if (list_empty(&lip->li_cil))
> > +		return false;
> > +
> > +	/*
> > +	 * li_seq is written on the first commit of a log item to
> > record the
> > +	 * first checkpoint it is written to. Hence if it is different
> > to the
> > +	 * current sequence, we're in a new checkpoint.
> > +	 */
> > +	return lip->li_seq == READ_ONCE(cil->xc_current_sequence);
> > +}
> > +
> > +bool
> > +xfs_log_item_in_current_chkpt(
> > +	struct xfs_log_item *lip)
> > +{
> > +	return xlog_item_in_current_chkpt(lip->li_mountp->m_log-
> > >l_cilp, lip);
> 
> I think this turns into "lip->li_log->l_mp->m_log->l_cilp" in the new
> code base

Should just be lip->li_log->l_cilp.

> 
> > +}
> > +
> >  /*
> >   * Unavoidable forward declaration - xlog_cil_push_work() calls
> >   * xlog_cil_ctx_alloc() itself.
> > @@ -924,6 +956,40 @@ xlog_cil_build_trans_hdr(
> >  	tic->t_curr_res -= lvhdr->lv_bytes;
> >  }
> >  
> > +/*
> > + * Pull all the log vectors off the items in the CIL, and remove the
> > items from
> > + * the CIL. We don't need the CIL lock here because it's only needed
> > on the
> > + * transaction commit side which is currently locked out by the
> > flush lock.
> > + */
> > +static void
> > +xlog_cil_build_lv_chain(
> > +	struct xfs_cil		*cil,
> > +	struct xfs_cil_ctx	*ctx,
> > +	uint32_t		*num_iovecs,
> > +	uint32_t		*num_bytes)
> > +{
> > +	struct xfs_log_vec	*lv = NULL;
> > +
> > +	while (!list_empty(&cil->xc_cil)) {
> > +		struct xfs_log_item	*item;
> > +
> > +		item = list_first_entry(&cil->xc_cil,
> > +					struct xfs_log_item, li_cil);
> > +		list_del_init(&item->li_cil);
> > +		if (!ctx->lv_chain)
> > +			ctx->lv_chain = item->li_lv;
> > +		else
> > +			lv->lv_next = item->li_lv;
> > +		lv = item->li_lv;
> > +		item->li_lv = NULL;
> > +		*num_iovecs += lv->lv_niovecs;
> > 
> 
> This part below does not appear in the new rebase, so this would go
> away in the hoisted helper
> 
> > +
> > +		/* we don't write ordered log vectors */
> > +		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
> > +			*num_bytes += lv->lv_bytes;
> > +	}
> > +}
> > +
> >  /*
> >   * Push the Committed Item List to the log.
> >   *
> > @@ -946,7 +1012,6 @@ xlog_cil_push_work(
> >  		container_of(work, struct xfs_cil_ctx, push_work);
> >  	struct xfs_cil		*cil = ctx->cil;
> >  	struct xlog		*log = cil->xc_log;
> > -	struct xfs_log_vec	*lv;
> >  	struct xfs_cil_ctx	*new_ctx;
> >  	int			num_iovecs = 0;
> 
> For me, I had to add the new num_bytes variable here:
> 	int			num_iovecs, num_bytes;
> 
> I think the new helper does not need the num_bytes parameter in the new
> rebase, so we may be able to just remove num_bytes  entirely.

Right, so this is all stuff that changes in the xlog-write rework
patch series that this was based on. There's a lot of changes
to the CIL push code in that series that this builds on....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-04-12 11:43 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-14 22:06 [PATCH 0/8 v3] xfs: intent whiteouts Dave Chinner
2022-03-14 22:06 ` [PATCH 1/8] xfs: hide log iovec alignment constraints Dave Chinner
2022-04-11  5:23   ` Alli
2022-04-12 10:13     ` Dave Chinner
2022-03-14 22:06 ` [PATCH 2/8] xfs: don't commit the first deferred transaction without intents Dave Chinner
2022-04-11  5:22   ` Alli
2022-04-12 10:21     ` Dave Chinner
2022-03-14 22:06 ` [PATCH 3/8] xfs: add log item flags to indicate intents Dave Chinner
2022-04-11  5:23   ` Alli
2022-03-14 22:06 ` [PATCH 4/8] xfs: tag transactions that contain intent done items Dave Chinner
2022-04-11  5:23   ` Alli
2022-03-14 22:06 ` [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c Dave Chinner
2022-04-11  5:24   ` Alli
2022-04-12 10:25     ` Dave Chinner
2022-03-14 22:06 ` [PATCH 6/8] xfs: add log item method to return related intents Dave Chinner
2022-04-11  5:24   ` Alli
2022-03-14 22:06 ` [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL Dave Chinner
2022-04-11  5:27   ` Alli
2022-03-14 22:06 ` [PATCH 8/8] xfs: intent item whiteouts Dave Chinner
2022-04-11  5:22 ` [PATCH 0/8 v3] xfs: intent whiteouts Alli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.