All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8 v5] xfs: intent whiteouts
@ 2022-04-27  2:22 Dave Chinner
  2022-04-27  2:22 ` [PATCH 1/8] xfs: hide log iovec alignment constraints Dave Chinner
                   ` (7 more replies)
  0 siblings, 8 replies; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  2:22 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

This is a patchset inspired by the performance regressions that were
seen from logging 64k xattrs with Allison's delayed attribute
patchset and trying to work out how to minimise the impact of
logging xattrs. Most of that is explained in the "xfs: intent item
whiteouts" patch, so I won't repeat it here.

The whiteouts massively reduce the journal write overhead of logging
xattrs - with this patchset I've reduced 2.5GB/s of log traffic (16
way file create w/64k xattr workload) down to approximately 220MB of
log traffic, and performance has increased from 9k creates/s to 36k
creates/s. The workload still writes to disk at 2.5GB/s, but that's
what writing 35k x 64k xattrs to disk does.

This is still short of the non-logged attribute mechanism, which
runs at 40-45k creates a second and 3.5-4GB/s to disk, but it brings
logged attrs to within roughly 5-15% of non-logged attrs across the
full range of attribute sizes.

So, while this patchset was clearly insired and has major positive
impact on Allison's delayed attribute work, it also applies
generically to all other intent/intent done pairs that already
exist. Hence I've created this patchset as a stand-alone patchset
that isn't dependent on the delayed attributes being committed, nor
does the delayed attribute patchset need this to funciton properly.
IOWs, they can be merged in parallel and then the attribute log item
implementation be updated to support whiteouts after the fact.

This patchset is separate to the attr code, though, because
intent whiteouts are not specific to the attr code. They are a
generic mechanism that can be applied to all the intent/intent done
item pairs we already have. This patch set modifies all those
intents to use whiteouts, and so there is benefits from the patch
set for all operations that use these intents.

With respect to the delayed attribute patchset, it can be merged
with/without whiteout support and still work correctly with/without
this patchset in place. Once both intent whiteouts and delayed attrs
are merged, we can add whiteout support to delayed attributes with
only a few lines of extra code.

-Dave.

Changelog:

Version 5:
- rebased on 5.18-rc2 + linux-xfs/for-next
- converted transaction flags to unsigned to match tp->t_flags definition

Version 4:
- not published
- rebased on 5.17 + for-next + log shutdown fixes + xlog-write-rework
- fixed memory leak of CUI shadow buffers from log recovery when clearing
  leftover reflink entries.

Version 3:
- https://lore.kernel.org/linux-xfs/20220314220631.3093283-1-david@fromorbit.com/
- rebased on 5.17-rc4 + xlog-write-rework
- no longer dependent on xfs-cil-scalability, so there's some porting changes
  that was needed to remove all the per-cpu CIL dependencies.

Version 2:
- not published
- rebased on 5.15-rc2 + xfs-cil-scalability
- dropped the kvmalloc changes for CIL shadow buffers as that's a
  separate perf problem and not something related to intent
  whiteouts.
- dropped all the delayed attribute modifications so that the
  patchset is not dependent on Allison's dev tree.
- Thanks to Allison for an initial quick review pass - I haven't
  included those RVB tags because every patch in the series has
  changed since the original RFC posting.

RFC:
- https://lore.kernel.org/linux-xfs/20210909212133.GE2361455@dread.disaster.area/


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/8] xfs: hide log iovec alignment constraints
  2022-04-27  2:22 [PATCH 0/8 v5] xfs: intent whiteouts Dave Chinner
@ 2022-04-27  2:22 ` Dave Chinner
  2022-04-27  3:14   ` Darrick J. Wong
  2022-04-28 13:00   ` Christoph Hellwig
  2022-04-27  2:22 ` [PATCH 2/8] xfs: don't commit the first deferred transaction without intents Dave Chinner
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  2:22 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Callers currently have to round out the size of buffers to match the
aligment constraints of log iovecs and xlog_write(). They should not
need to know this detail, so introduce a new function to calculate
the iovec length (for use in ->iop_size implementations). Also
modify xlog_finish_iovec() to round up the length to the correct
alignment so the callers don't need to do this, either.

Convert the only user - inode forks - of this alignment rounding to
use the new interface.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_fork.c  | 12 +++---------
 fs/xfs/xfs_inode_item.c         | 25 +++++++------------------
 fs/xfs/xfs_inode_item_recover.c |  4 ++--
 fs/xfs/xfs_log.h                | 23 +++++++++++++++++++++++
 4 files changed, 35 insertions(+), 29 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 9aee4a1e2fe9..1a4cdf550f6d 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -36,7 +36,7 @@ xfs_init_local_fork(
 	int64_t			size)
 {
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
-	int			mem_size = size, real_size = 0;
+	int			mem_size = size;
 	bool			zero_terminate;
 
 	/*
@@ -50,8 +50,7 @@ xfs_init_local_fork(
 		mem_size++;
 
 	if (size) {
-		real_size = roundup(mem_size, 4);
-		ifp->if_u1.if_data = kmem_alloc(real_size, KM_NOFS);
+		ifp->if_u1.if_data = kmem_alloc(mem_size, KM_NOFS);
 		memcpy(ifp->if_u1.if_data, data, size);
 		if (zero_terminate)
 			ifp->if_u1.if_data[size] = '\0';
@@ -497,12 +496,7 @@ xfs_idata_realloc(
 		return;
 	}
 
-	/*
-	 * For inline data, the underlying buffer must be a multiple of 4 bytes
-	 * in size so that it can be logged and stay on word boundaries.
-	 * We enforce that here.
-	 */
-	ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, roundup(new_size, 4),
+	ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, new_size,
 				      GFP_NOFS | __GFP_NOFAIL);
 	ifp->if_bytes = new_size;
 }
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 00733a18ccdc..721def0639fd 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -71,7 +71,7 @@ xfs_inode_item_data_fork_size(
 	case XFS_DINODE_FMT_LOCAL:
 		if ((iip->ili_fields & XFS_ILOG_DDATA) &&
 		    ip->i_df.if_bytes > 0) {
-			*nbytes += roundup(ip->i_df.if_bytes, 4);
+			*nbytes += xlog_calc_iovec_len(ip->i_df.if_bytes);
 			*nvecs += 1;
 		}
 		break;
@@ -112,7 +112,7 @@ xfs_inode_item_attr_fork_size(
 	case XFS_DINODE_FMT_LOCAL:
 		if ((iip->ili_fields & XFS_ILOG_ADATA) &&
 		    ip->i_afp->if_bytes > 0) {
-			*nbytes += roundup(ip->i_afp->if_bytes, 4);
+			*nbytes += xlog_calc_iovec_len(ip->i_afp->if_bytes);
 			*nvecs += 1;
 		}
 		break;
@@ -204,17 +204,12 @@ xfs_inode_item_format_data_fork(
 			~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT | XFS_ILOG_DEV);
 		if ((iip->ili_fields & XFS_ILOG_DDATA) &&
 		    ip->i_df.if_bytes > 0) {
-			/*
-			 * Round i_bytes up to a word boundary.
-			 * The underlying memory is guaranteed
-			 * to be there by xfs_idata_realloc().
-			 */
-			data_bytes = roundup(ip->i_df.if_bytes, 4);
 			ASSERT(ip->i_df.if_u1.if_data != NULL);
 			ASSERT(ip->i_disk_size > 0);
 			xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_ILOCAL,
-					ip->i_df.if_u1.if_data, data_bytes);
-			ilf->ilf_dsize = (unsigned)data_bytes;
+					ip->i_df.if_u1.if_data,
+					ip->i_df.if_bytes);
+			ilf->ilf_dsize = (unsigned)ip->i_df.if_bytes;
 			ilf->ilf_size++;
 		} else {
 			iip->ili_fields &= ~XFS_ILOG_DDATA;
@@ -288,17 +283,11 @@ xfs_inode_item_format_attr_fork(
 
 		if ((iip->ili_fields & XFS_ILOG_ADATA) &&
 		    ip->i_afp->if_bytes > 0) {
-			/*
-			 * Round i_bytes up to a word boundary.
-			 * The underlying memory is guaranteed
-			 * to be there by xfs_idata_realloc().
-			 */
-			data_bytes = roundup(ip->i_afp->if_bytes, 4);
 			ASSERT(ip->i_afp->if_u1.if_data != NULL);
 			xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_IATTR_LOCAL,
 					ip->i_afp->if_u1.if_data,
-					data_bytes);
-			ilf->ilf_asize = (unsigned)data_bytes;
+					ip->i_afp->if_bytes);
+			ilf->ilf_asize = (unsigned)ip->i_afp->if_bytes;
 			ilf->ilf_size++;
 		} else {
 			iip->ili_fields &= ~XFS_ILOG_ADATA;
diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
index 6d44f5fd6d7e..d28ffaebd067 100644
--- a/fs/xfs/xfs_inode_item_recover.c
+++ b/fs/xfs/xfs_inode_item_recover.c
@@ -462,7 +462,7 @@ xlog_recover_inode_commit_pass2(
 	ASSERT(in_f->ilf_size <= 4);
 	ASSERT((in_f->ilf_size == 3) || (fields & XFS_ILOG_AFORK));
 	ASSERT(!(fields & XFS_ILOG_DFORK) ||
-	       (len == in_f->ilf_dsize));
+	       (len == xlog_calc_iovec_len(in_f->ilf_dsize)));
 
 	switch (fields & XFS_ILOG_DFORK) {
 	case XFS_ILOG_DDATA:
@@ -497,7 +497,7 @@ xlog_recover_inode_commit_pass2(
 		}
 		len = item->ri_buf[attr_index].i_len;
 		src = item->ri_buf[attr_index].i_addr;
-		ASSERT(len == in_f->ilf_asize);
+		ASSERT(len == xlog_calc_iovec_len(in_f->ilf_asize));
 
 		switch (in_f->ilf_fields & XFS_ILOG_AFORK) {
 		case XFS_ILOG_ADATA:
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 8dafe8f771c7..d9aebca3f971 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -21,6 +21,17 @@ struct xfs_log_vec {
 
 #define XFS_LOG_VEC_ORDERED	(-1)
 
+/*
+ * Calculate the log iovec length for a given user buffer length. Intended to be
+ * used by ->iop_size implementations when sizing buffers of arbitrary
+ * alignments.
+ */
+static inline int
+xlog_calc_iovec_len(int len)
+{
+	return roundup(len, 4);
+}
+
 void *xlog_prepare_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec **vecp,
 		uint type);
 
@@ -29,6 +40,12 @@ xlog_finish_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec *vec, int len)
 {
 	struct xlog_op_header	*oph = vec->i_addr;
 
+	/*
+	 * Always round up the length to the correct alignment so callers don't
+	 * need to know anything about this log vec layout requirement.
+	 */
+	len = xlog_calc_iovec_len(len);
+
 	/* opheader tracks payload length, logvec tracks region length */
 	oph->oh_len = cpu_to_be32(len);
 
@@ -36,8 +53,14 @@ xlog_finish_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec *vec, int len)
 	lv->lv_buf_len += len;
 	lv->lv_bytes += len;
 	vec->i_len = len;
+
+	/* Catch buffer overruns */
+	ASSERT((void *)lv->lv_buf + lv->lv_bytes <= (void *)lv + lv->lv_size);
 }
 
+/*
+ * Copy the amount of data requested by the caller into a new log iovec.
+ */
 static inline void *
 xlog_copy_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec **vecp,
 		uint type, void *data, int len)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/8] xfs: don't commit the first deferred transaction without intents
  2022-04-27  2:22 [PATCH 0/8 v5] xfs: intent whiteouts Dave Chinner
  2022-04-27  2:22 ` [PATCH 1/8] xfs: hide log iovec alignment constraints Dave Chinner
@ 2022-04-27  2:22 ` Dave Chinner
  2022-04-27  3:03   ` Darrick J. Wong
                     ` (2 more replies)
  2022-04-27  2:22 ` [PATCH 3/8] xfs: add log item flags to indicate intents Dave Chinner
                   ` (5 subsequent siblings)
  7 siblings, 3 replies; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  2:22 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

If the first operation in a string of defer ops has no intents,
then there is no reason to commit it before running the first call
to xfs_defer_finish_one(). This allows the defer ops to be used
effectively for non-intent based operations without requiring an
unnecessary extra transaction commit when first called.

This fixes a regression in per-attribute modification transaction
count when delayed attributes are not being used.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_defer.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 0805ade2d300..66b4555bda8e 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -186,7 +186,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
 	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
 };
 
-static void
+static bool
 xfs_defer_create_intent(
 	struct xfs_trans		*tp,
 	struct xfs_defer_pending	*dfp,
@@ -197,6 +197,7 @@ xfs_defer_create_intent(
 	if (!dfp->dfp_intent)
 		dfp->dfp_intent = ops->create_intent(tp, &dfp->dfp_work,
 						     dfp->dfp_count, sort);
+	return dfp->dfp_intent;
 }
 
 /*
@@ -204,16 +205,18 @@ xfs_defer_create_intent(
  * associated extents, then add the entire intake list to the end of
  * the pending list.
  */
-STATIC void
+static bool
 xfs_defer_create_intents(
 	struct xfs_trans		*tp)
 {
 	struct xfs_defer_pending	*dfp;
+	bool				ret = false;
 
 	list_for_each_entry(dfp, &tp->t_dfops, dfp_list) {
 		trace_xfs_defer_create_intent(tp->t_mountp, dfp);
-		xfs_defer_create_intent(tp, dfp, true);
+		ret |= xfs_defer_create_intent(tp, dfp, true);
 	}
+	return ret;
 }
 
 /* Abort all the intents that were committed. */
@@ -487,7 +490,7 @@ int
 xfs_defer_finish_noroll(
 	struct xfs_trans		**tp)
 {
-	struct xfs_defer_pending	*dfp;
+	struct xfs_defer_pending	*dfp = NULL;
 	int				error = 0;
 	LIST_HEAD(dop_pending);
 
@@ -506,17 +509,19 @@ xfs_defer_finish_noroll(
 		 * of time that any one intent item can stick around in memory,
 		 * pinning the log tail.
 		 */
-		xfs_defer_create_intents(*tp);
+		bool has_intents = xfs_defer_create_intents(*tp);
 		list_splice_init(&(*tp)->t_dfops, &dop_pending);
 
-		error = xfs_defer_trans_roll(tp);
-		if (error)
-			goto out_shutdown;
+		if (has_intents || dfp) {
+			error = xfs_defer_trans_roll(tp);
+			if (error)
+				goto out_shutdown;
 
-		/* Possibly relog intent items to keep the log moving. */
-		error = xfs_defer_relog(tp, &dop_pending);
-		if (error)
-			goto out_shutdown;
+			/* Possibly relog intent items to keep the log moving. */
+			error = xfs_defer_relog(tp, &dop_pending);
+			if (error)
+				goto out_shutdown;
+		}
 
 		dfp = list_first_entry(&dop_pending, struct xfs_defer_pending,
 				       dfp_list);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/8] xfs: add log item flags to indicate intents
  2022-04-27  2:22 [PATCH 0/8 v5] xfs: intent whiteouts Dave Chinner
  2022-04-27  2:22 ` [PATCH 1/8] xfs: hide log iovec alignment constraints Dave Chinner
  2022-04-27  2:22 ` [PATCH 2/8] xfs: don't commit the first deferred transaction without intents Dave Chinner
@ 2022-04-27  2:22 ` Dave Chinner
  2022-04-27  3:04   ` Darrick J. Wong
  2022-04-27  2:22 ` [PATCH 4/8] xfs: tag transactions that contain intent done items Dave Chinner
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  2:22 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We currently have a couple of helper functions that try to infer
whether the log item is an intent or intent done item from the
combinations of operations it supports.  This is incredibly fragile
and not very efficient as it requires checking specific combinations
of ops.

We need to be able to identify intent and intent done items quickly
and easily in upcoming patches, so simply add intent and intent done
type flags to the log item ops flags. These are static flags to
begin with, so intent items should have been typed like this from
the start.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/xfs_bmap_item.c     |  4 +++-
 fs/xfs/xfs_extfree_item.c  |  4 +++-
 fs/xfs/xfs_refcount_item.c |  4 +++-
 fs/xfs/xfs_rmap_item.c     |  4 +++-
 fs/xfs/xfs_trans.h         | 25 +++++++++++++------------
 5 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 593ac29cffc7..ed67c0028a68 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -202,7 +202,8 @@ xfs_bud_item_release(
 }
 
 static const struct xfs_item_ops xfs_bud_item_ops = {
-	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
+	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
+			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_bud_item_size,
 	.iop_format	= xfs_bud_item_format,
 	.iop_release	= xfs_bud_item_release,
@@ -586,6 +587,7 @@ xfs_bui_item_relog(
 }
 
 static const struct xfs_item_ops xfs_bui_item_ops = {
+	.flags		= XFS_ITEM_INTENT,
 	.iop_size	= xfs_bui_item_size,
 	.iop_format	= xfs_bui_item_format,
 	.iop_unpin	= xfs_bui_item_unpin,
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 0e50f2c9348e..21a159f9d8c5 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -307,7 +307,8 @@ xfs_efd_item_release(
 }
 
 static const struct xfs_item_ops xfs_efd_item_ops = {
-	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
+	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
+			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_efd_item_size,
 	.iop_format	= xfs_efd_item_format,
 	.iop_release	= xfs_efd_item_release,
@@ -688,6 +689,7 @@ xfs_efi_item_relog(
 }
 
 static const struct xfs_item_ops xfs_efi_item_ops = {
+	.flags		= XFS_ITEM_INTENT,
 	.iop_size	= xfs_efi_item_size,
 	.iop_format	= xfs_efi_item_format,
 	.iop_unpin	= xfs_efi_item_unpin,
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 0d868c93144d..6536eea4c6ea 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -208,7 +208,8 @@ xfs_cud_item_release(
 }
 
 static const struct xfs_item_ops xfs_cud_item_ops = {
-	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
+	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
+			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_cud_item_size,
 	.iop_format	= xfs_cud_item_format,
 	.iop_release	= xfs_cud_item_release,
@@ -600,6 +601,7 @@ xfs_cui_item_relog(
 }
 
 static const struct xfs_item_ops xfs_cui_item_ops = {
+	.flags		= XFS_ITEM_INTENT,
 	.iop_size	= xfs_cui_item_size,
 	.iop_format	= xfs_cui_item_format,
 	.iop_unpin	= xfs_cui_item_unpin,
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index a22b2d19ef91..c2bb8cfc231e 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -231,7 +231,8 @@ xfs_rud_item_release(
 }
 
 static const struct xfs_item_ops xfs_rud_item_ops = {
-	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
+	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
+			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_rud_item_size,
 	.iop_format	= xfs_rud_item_format,
 	.iop_release	= xfs_rud_item_release,
@@ -630,6 +631,7 @@ xfs_rui_item_relog(
 }
 
 static const struct xfs_item_ops xfs_rui_item_ops = {
+	.flags		= XFS_ITEM_INTENT,
 	.iop_size	= xfs_rui_item_size,
 	.iop_format	= xfs_rui_item_format,
 	.iop_unpin	= xfs_rui_item_unpin,
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 87e940b5366e..f68e74e46026 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -80,28 +80,29 @@ struct xfs_item_ops {
 			struct xfs_trans *tp);
 };
 
-/* Is this log item a deferred action intent? */
+/*
+ * Log item ops flags
+ */
+/*
+ * Release the log item when the journal commits instead of inserting into the
+ * AIL for writeback tracking and/or log tail pinning.
+ */
+#define XFS_ITEM_RELEASE_WHEN_COMMITTED	(1 << 0)
+#define XFS_ITEM_INTENT			(1 << 1)
+#define XFS_ITEM_INTENT_DONE		(1 << 2)
+
 static inline bool
 xlog_item_is_intent(struct xfs_log_item *lip)
 {
-	return lip->li_ops->iop_recover != NULL &&
-	       lip->li_ops->iop_match != NULL;
+	return lip->li_ops->flags & XFS_ITEM_INTENT;
 }
 
-/* Is this a log intent-done item? */
 static inline bool
 xlog_item_is_intent_done(struct xfs_log_item *lip)
 {
-	return lip->li_ops->iop_unpin == NULL &&
-	       lip->li_ops->iop_push == NULL;
+	return lip->li_ops->flags & XFS_ITEM_INTENT_DONE;
 }
 
-/*
- * Release the log item as soon as committed.  This is for items just logging
- * intents that never need to be written back in place.
- */
-#define XFS_ITEM_RELEASE_WHEN_COMMITTED	(1 << 0)
-
 void	xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item,
 			  int type, const struct xfs_item_ops *ops);
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 4/8] xfs: tag transactions that contain intent done items
  2022-04-27  2:22 [PATCH 0/8 v5] xfs: intent whiteouts Dave Chinner
                   ` (2 preceding siblings ...)
  2022-04-27  2:22 ` [PATCH 3/8] xfs: add log item flags to indicate intents Dave Chinner
@ 2022-04-27  2:22 ` Dave Chinner
  2022-04-27  3:06   ` Darrick J. Wong
  2022-04-28 13:05   ` Christoph Hellwig
  2022-04-27  2:22 ` [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c Dave Chinner
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  2:22 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Intent whiteouts will require extra work to be done during
transaction commit if the transaction contains an intent done item.

To determine if a transaction contains an intent done item, we want
to avoid having to walk all the items in the transaction to check if
they are intent done items. Hence when we add an intent done item to
a transaction, tag the transaction to indicate that it contains such
an item.

We don't tag the transaction when the defer ops is relogging an
intent to move it forward in the log. Whiteouts will never apply to
these cases, so we don't need to bother looking for them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_shared.h | 24 +++++++++++++++++-------
 fs/xfs/xfs_bmap_item.c     |  2 +-
 fs/xfs/xfs_extfree_item.c  |  2 +-
 fs/xfs/xfs_refcount_item.c |  2 +-
 fs/xfs/xfs_rmap_item.c     |  2 +-
 5 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 25c4cab58851..c4381388c0c1 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -54,13 +54,23 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
 /*
  * Values for t_flags.
  */
-#define	XFS_TRANS_DIRTY		0x01	/* something needs to be logged */
-#define	XFS_TRANS_SB_DIRTY	0x02	/* superblock is modified */
-#define	XFS_TRANS_PERM_LOG_RES	0x04	/* xact took a permanent log res */
-#define	XFS_TRANS_SYNC		0x08	/* make commit synchronous */
-#define XFS_TRANS_RESERVE	0x20    /* OK to use reserved data blocks */
-#define XFS_TRANS_NO_WRITECOUNT 0x40	/* do not elevate SB writecount */
-#define XFS_TRANS_RES_FDBLKS	0x80	/* reserve newly freed blocks */
+/* Transaction needs to be logged */
+#define XFS_TRANS_DIRTY			(1u << 0)
+/* Superblock is dirty and needs to be logged */
+#define XFS_TRANS_SB_DIRTY		(1u << 1)
+/* Transaction took a permanent log reservation */
+#define XFS_TRANS_PERM_LOG_RES		(1u << 2)
+/* Synchronous transaction commit needed */
+#define XFS_TRANS_SYNC			(1u << 3)
+/* Transaction can use reserve block pool */
+#define XFS_TRANS_RESERVE		(1u << 4)
+/* Transaction should avoid VFS level superblock write accounting */
+#define XFS_TRANS_NO_WRITECOUNT		(1u << 5)
+/* Transaction has freed blocks returned to it's reservation */
+#define XFS_TRANS_RES_FDBLKS		(1u << 6)
+/* Transaction contains an intent done log item */
+#define XFS_TRANS_HAS_INTENT_DONE	(1u << 7)
+
 /*
  * LOWMODE is used by the allocator to activate the lowspace algorithm - when
  * free space is running low the extent allocator may choose to allocate an
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index ed67c0028a68..3b968b31911b 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -255,7 +255,7 @@ xfs_trans_log_finish_bmap_update(
 	 * 1.) releases the BUI and frees the BUD
 	 * 2.) shuts down the filesystem
 	 */
-	tp->t_flags |= XFS_TRANS_DIRTY;
+	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
 	set_bit(XFS_LI_DIRTY, &budp->bud_item.li_flags);
 
 	return error;
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 21a159f9d8c5..96735f23d12d 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -381,7 +381,7 @@ xfs_trans_free_extent(
 	 * 1.) releases the EFI and frees the EFD
 	 * 2.) shuts down the filesystem
 	 */
-	tp->t_flags |= XFS_TRANS_DIRTY;
+	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
 	set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);
 
 	next_extent = efdp->efd_next_extent;
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 6536eea4c6ea..b523ce2c775b 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -260,7 +260,7 @@ xfs_trans_log_finish_refcount_update(
 	 * 1.) releases the CUI and frees the CUD
 	 * 2.) shuts down the filesystem
 	 */
-	tp->t_flags |= XFS_TRANS_DIRTY;
+	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
 	set_bit(XFS_LI_DIRTY, &cudp->cud_item.li_flags);
 
 	return error;
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index c2bb8cfc231e..b269e68407b9 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -328,7 +328,7 @@ xfs_trans_log_finish_rmap_update(
 	 * 1.) releases the RUI and frees the RUD
 	 * 2.) shuts down the filesystem
 	 */
-	tp->t_flags |= XFS_TRANS_DIRTY;
+	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
 	set_bit(XFS_LI_DIRTY, &rudp->rud_item.li_flags);
 
 	return error;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c
  2022-04-27  2:22 [PATCH 0/8 v5] xfs: intent whiteouts Dave Chinner
                   ` (3 preceding siblings ...)
  2022-04-27  2:22 ` [PATCH 4/8] xfs: tag transactions that contain intent done items Dave Chinner
@ 2022-04-27  2:22 ` Dave Chinner
  2022-04-27  3:15   ` Darrick J. Wong
                     ` (2 more replies)
  2022-04-27  2:22 ` [PATCH 6/8] xfs: add log item method to return related intents Dave Chinner
                   ` (2 subsequent siblings)
  7 siblings, 3 replies; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  2:22 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

In preparation for adding support for intent item whiteouts.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c | 119 ++++++++++++++++++++++++-------------------
 1 file changed, 67 insertions(+), 52 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index e5ab62f08c19..0d8d092447ad 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -47,6 +47,38 @@ xlog_cil_ticket_alloc(
 	return tic;
 }
 
+/*
+ * Check if the current log item was first committed in this sequence.
+ * We can't rely on just the log item being in the CIL, we have to check
+ * the recorded commit sequence number.
+ *
+ * Note: for this to be used in a non-racy manner, it has to be called with
+ * CIL flushing locked out. As a result, it should only be used during the
+ * transaction commit process when deciding what to format into the item.
+ */
+static bool
+xlog_item_in_current_chkpt(
+	struct xfs_cil		*cil,
+	struct xfs_log_item	*lip)
+{
+	if (list_empty(&lip->li_cil))
+		return false;
+
+	/*
+	 * li_seq is written on the first commit of a log item to record the
+	 * first checkpoint it is written to. Hence if it is different to the
+	 * current sequence, we're in a new checkpoint.
+	 */
+	return lip->li_seq == READ_ONCE(cil->xc_current_sequence);
+}
+
+bool
+xfs_log_item_in_current_chkpt(
+	struct xfs_log_item *lip)
+{
+	return xlog_item_in_current_chkpt(lip->li_log->l_cilp, lip);
+}
+
 /*
  * Unavoidable forward declaration - xlog_cil_push_work() calls
  * xlog_cil_ctx_alloc() itself.
@@ -934,6 +966,40 @@ xlog_cil_build_trans_hdr(
 	tic->t_curr_res -= lvhdr->lv_bytes;
 }
 
+/*
+ * Pull all the log vectors off the items in the CIL, and remove the items from
+ * the CIL. We don't need the CIL lock here because it's only needed on the
+ * transaction commit side which is currently locked out by the flush lock.
+ */
+static void
+xlog_cil_build_lv_chain(
+	struct xfs_cil		*cil,
+	struct xfs_cil_ctx	*ctx,
+	uint32_t		*num_iovecs,
+	uint32_t		*num_bytes)
+{
+	struct xfs_log_vec	*lv = NULL;
+
+	while (!list_empty(&cil->xc_cil)) {
+		struct xfs_log_item	*item;
+
+		item = list_first_entry(&cil->xc_cil,
+					struct xfs_log_item, li_cil);
+		list_del_init(&item->li_cil);
+		if (!ctx->lv_chain)
+			ctx->lv_chain = item->li_lv;
+		else
+			lv->lv_next = item->li_lv;
+		lv = item->li_lv;
+		item->li_lv = NULL;
+		*num_iovecs += lv->lv_niovecs;
+
+		/* we don't write ordered log vectors */
+		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
+			*num_bytes += lv->lv_bytes;
+	}
+}
+
 /*
  * Push the Committed Item List to the log.
  *
@@ -956,7 +1022,6 @@ xlog_cil_push_work(
 		container_of(work, struct xfs_cil_ctx, push_work);
 	struct xfs_cil		*cil = ctx->cil;
 	struct xlog		*log = cil->xc_log;
-	struct xfs_log_vec	*lv;
 	struct xfs_cil_ctx	*new_ctx;
 	int			num_iovecs = 0;
 	int			num_bytes = 0;
@@ -1033,31 +1098,7 @@ xlog_cil_push_work(
 	list_add(&ctx->committing, &cil->xc_committing);
 	spin_unlock(&cil->xc_push_lock);
 
-	/*
-	 * Pull all the log vectors off the items in the CIL, and remove the
-	 * items from the CIL. We don't need the CIL lock here because it's only
-	 * needed on the transaction commit side which is currently locked out
-	 * by the flush lock.
-	 */
-	lv = NULL;
-	while (!list_empty(&cil->xc_cil)) {
-		struct xfs_log_item	*item;
-
-		item = list_first_entry(&cil->xc_cil,
-					struct xfs_log_item, li_cil);
-		list_del_init(&item->li_cil);
-		if (!ctx->lv_chain)
-			ctx->lv_chain = item->li_lv;
-		else
-			lv->lv_next = item->li_lv;
-		lv = item->li_lv;
-		item->li_lv = NULL;
-		num_iovecs += lv->lv_niovecs;
-
-		/* we don't write ordered log vectors */
-		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
-			num_bytes += lv->lv_bytes;
-	}
+	xlog_cil_build_lv_chain(cil, ctx, &num_iovecs, &num_bytes);
 
 	/*
 	 * Switch the contexts so we can drop the context lock and move out
@@ -1508,32 +1549,6 @@ xlog_cil_force_seq(
 	return 0;
 }
 
-/*
- * Check if the current log item was first committed in this sequence.
- * We can't rely on just the log item being in the CIL, we have to check
- * the recorded commit sequence number.
- *
- * Note: for this to be used in a non-racy manner, it has to be called with
- * CIL flushing locked out. As a result, it should only be used during the
- * transaction commit process when deciding what to format into the item.
- */
-bool
-xfs_log_item_in_current_chkpt(
-	struct xfs_log_item	*lip)
-{
-	struct xfs_cil		*cil = lip->li_log->l_cilp;
-
-	if (list_empty(&lip->li_cil))
-		return false;
-
-	/*
-	 * li_seq is written on the first commit of a log item to record the
-	 * first checkpoint it is written to. Hence if it is different to the
-	 * current sequence, we're in a new checkpoint.
-	 */
-	return lip->li_seq == READ_ONCE(cil->xc_current_sequence);
-}
-
 /*
  * Perform initial CIL structure initialisation.
  */
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 6/8] xfs: add log item method to return related intents
  2022-04-27  2:22 [PATCH 0/8 v5] xfs: intent whiteouts Dave Chinner
                   ` (4 preceding siblings ...)
  2022-04-27  2:22 ` [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c Dave Chinner
@ 2022-04-27  2:22 ` Dave Chinner
  2022-04-27  3:18   ` Darrick J. Wong
  2022-04-28 13:10   ` Christoph Hellwig
  2022-04-27  2:22 ` [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL Dave Chinner
  2022-04-27  2:22 ` [PATCH 8/8] xfs: intent item whiteouts Dave Chinner
  7 siblings, 2 replies; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  2:22 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

To apply a whiteout to an intent item when an intent done item is
committed, we need to be able to retrieve the intent item from the
the intent done item. Add a log item op method for doing this, and
wire all the intent done items up to it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/xfs_bmap_item.c     | 8 ++++++++
 fs/xfs/xfs_extfree_item.c  | 8 ++++++++
 fs/xfs/xfs_refcount_item.c | 8 ++++++++
 fs/xfs/xfs_rmap_item.c     | 8 ++++++++
 fs/xfs/xfs_trans.h         | 1 +
 5 files changed, 33 insertions(+)

diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 3b968b31911b..e1b0e321d604 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -201,12 +201,20 @@ xfs_bud_item_release(
 	kmem_cache_free(xfs_bud_cache, budp);
 }
 
+static struct xfs_log_item *
+xfs_bud_item_intent(
+	struct xfs_log_item	*lip)
+{
+	return &BUD_ITEM(lip)->bud_buip->bui_item;
+}
+
 static const struct xfs_item_ops xfs_bud_item_ops = {
 	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
 			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_bud_item_size,
 	.iop_format	= xfs_bud_item_format,
 	.iop_release	= xfs_bud_item_release,
+	.iop_intent	= xfs_bud_item_intent,
 };
 
 static struct xfs_bud_log_item *
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 96735f23d12d..032db5269e97 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -306,12 +306,20 @@ xfs_efd_item_release(
 	xfs_efd_item_free(efdp);
 }
 
+static struct xfs_log_item *
+xfs_efd_item_intent(
+	struct xfs_log_item	*lip)
+{
+	return &EFD_ITEM(lip)->efd_efip->efi_item;
+}
+
 static const struct xfs_item_ops xfs_efd_item_ops = {
 	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
 			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_efd_item_size,
 	.iop_format	= xfs_efd_item_format,
 	.iop_release	= xfs_efd_item_release,
+	.iop_intent	= xfs_efd_item_intent,
 };
 
 /*
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index b523ce2c775b..a2213b5ee344 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -207,12 +207,20 @@ xfs_cud_item_release(
 	kmem_cache_free(xfs_cud_cache, cudp);
 }
 
+static struct xfs_log_item *
+xfs_cud_item_intent(
+	struct xfs_log_item	*lip)
+{
+	return &CUD_ITEM(lip)->cud_cuip->cui_item;
+}
+
 static const struct xfs_item_ops xfs_cud_item_ops = {
 	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
 			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_cud_item_size,
 	.iop_format	= xfs_cud_item_format,
 	.iop_release	= xfs_cud_item_release,
+	.iop_intent	= xfs_cud_item_intent,
 };
 
 static struct xfs_cud_log_item *
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index b269e68407b9..053eb135380c 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -230,12 +230,20 @@ xfs_rud_item_release(
 	kmem_cache_free(xfs_rud_cache, rudp);
 }
 
+static struct xfs_log_item *
+xfs_rud_item_intent(
+	struct xfs_log_item	*lip)
+{
+	return &RUD_ITEM(lip)->rud_ruip->rui_item;
+}
+
 static const struct xfs_item_ops xfs_rud_item_ops = {
 	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
 			  XFS_ITEM_INTENT_DONE,
 	.iop_size	= xfs_rud_item_size,
 	.iop_format	= xfs_rud_item_format,
 	.iop_release	= xfs_rud_item_release,
+	.iop_intent	= xfs_rud_item_intent,
 };
 
 static struct xfs_rud_log_item *
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index f68e74e46026..d72a5995d33e 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -78,6 +78,7 @@ struct xfs_item_ops {
 	bool (*iop_match)(struct xfs_log_item *item, uint64_t id);
 	struct xfs_log_item *(*iop_relog)(struct xfs_log_item *intent,
 			struct xfs_trans *tp);
+	struct xfs_log_item *(*iop_intent)(struct xfs_log_item *intent_done);
 };
 
 /*
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL
  2022-04-27  2:22 [PATCH 0/8 v5] xfs: intent whiteouts Dave Chinner
                   ` (5 preceding siblings ...)
  2022-04-27  2:22 ` [PATCH 6/8] xfs: add log item method to return related intents Dave Chinner
@ 2022-04-27  2:22 ` Dave Chinner
  2022-04-27  3:19   ` Darrick J. Wong
  2022-04-28 13:15   ` Christoph Hellwig
  2022-04-27  2:22 ` [PATCH 8/8] xfs: intent item whiteouts Dave Chinner
  7 siblings, 2 replies; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  2:22 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When we release an intent that a whiteout applies to, it will not
have been committed to the journal and so won't be in the AIL. Hence
when we drop the last reference to the intent, we do not want to try
to remove it from the AIL as that will trigger a filesystem
shutdown. Hence make the removal of intents from the AIL conditional
on them actually being in the AIL so we do the correct thing.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/xfs_bmap_item.c     | 8 +++++---
 fs/xfs/xfs_extfree_item.c  | 8 +++++---
 fs/xfs/xfs_refcount_item.c | 8 +++++---
 fs/xfs/xfs_rmap_item.c     | 8 +++++---
 4 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index e1b0e321d604..59aa5f9bf769 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -54,10 +54,12 @@ xfs_bui_release(
 	struct xfs_bui_log_item	*buip)
 {
 	ASSERT(atomic_read(&buip->bui_refcount) > 0);
-	if (atomic_dec_and_test(&buip->bui_refcount)) {
+	if (!atomic_dec_and_test(&buip->bui_refcount))
+		return;
+
+	if (test_bit(XFS_LI_IN_AIL, &buip->bui_item.li_flags))
 		xfs_trans_ail_delete(&buip->bui_item, SHUTDOWN_LOG_IO_ERROR);
-		xfs_bui_item_free(buip);
-	}
+	xfs_bui_item_free(buip);
 }
 
 
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 032db5269e97..893a7dd15cbb 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -58,10 +58,12 @@ xfs_efi_release(
 	struct xfs_efi_log_item	*efip)
 {
 	ASSERT(atomic_read(&efip->efi_refcount) > 0);
-	if (atomic_dec_and_test(&efip->efi_refcount)) {
+	if (!atomic_dec_and_test(&efip->efi_refcount))
+		return;
+
+	if (test_bit(XFS_LI_IN_AIL, &efip->efi_item.li_flags))
 		xfs_trans_ail_delete(&efip->efi_item, SHUTDOWN_LOG_IO_ERROR);
-		xfs_efi_item_free(efip);
-	}
+	xfs_efi_item_free(efip);
 }
 
 /*
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index a2213b5ee344..1b82b818f515 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -53,10 +53,12 @@ xfs_cui_release(
 	struct xfs_cui_log_item	*cuip)
 {
 	ASSERT(atomic_read(&cuip->cui_refcount) > 0);
-	if (atomic_dec_and_test(&cuip->cui_refcount)) {
+	if (!atomic_dec_and_test(&cuip->cui_refcount))
+		return;
+
+	if (test_bit(XFS_LI_IN_AIL, &cuip->cui_item.li_flags))
 		xfs_trans_ail_delete(&cuip->cui_item, SHUTDOWN_LOG_IO_ERROR);
-		xfs_cui_item_free(cuip);
-	}
+	xfs_cui_item_free(cuip);
 }
 
 
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 053eb135380c..99dfb3ae7e9c 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -53,10 +53,12 @@ xfs_rui_release(
 	struct xfs_rui_log_item	*ruip)
 {
 	ASSERT(atomic_read(&ruip->rui_refcount) > 0);
-	if (atomic_dec_and_test(&ruip->rui_refcount)) {
+	if (!atomic_dec_and_test(&ruip->rui_refcount))
+		return;
+
+	if (test_bit(XFS_LI_IN_AIL, &ruip->rui_item.li_flags))
 		xfs_trans_ail_delete(&ruip->rui_item, SHUTDOWN_LOG_IO_ERROR);
-		xfs_rui_item_free(ruip);
-	}
+	xfs_rui_item_free(ruip);
 }
 
 STATIC void
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 8/8] xfs: intent item whiteouts
  2022-04-27  2:22 [PATCH 0/8 v5] xfs: intent whiteouts Dave Chinner
                   ` (6 preceding siblings ...)
  2022-04-27  2:22 ` [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL Dave Chinner
@ 2022-04-27  2:22 ` Dave Chinner
  2022-04-27  3:32   ` Darrick J. Wong
  2022-04-28 13:22   ` Christoph Hellwig
  7 siblings, 2 replies; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  2:22 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When we log modifications based on intents, we add both intent
and intent done items to the modification being made. These get
written to the log to ensure that the operation is re-run if the
intent done is not found in the log.

However, for operations that complete wholly within a single
checkpoint, the change in the checkpoint is atomic and will never
need replay. In this case, we don't need to actually write the
intent and intent done items to the journal because log recovery
will never need to manually restart this modification.

Log recovery currently handles intent/intent done matching by
inserting the intent into the AIL, then removing it when a matching
intent done item is found. Hence for all the intent-based operations
that complete within a checkpoint, we spend all that time parsing
the intent/intent done items just to cancel them and do nothing with
them.

Hence it follows that the only time we actually need intents in the
log is when the modification crosses checkpoint boundaries in the
log and so may only be partially complete in the journal. Hence if
we commit and intent done item to the CIL and the intent item is in
the same checkpoint, we don't actually have to write them to the
journal because log recovery will always cancel the intents.

We've never really worried about the overhead of logging intents
unnecessarily like this because the intents we log are generally
very much smaller than the change being made. e.g. freeing an extent
involves modifying at lease two freespace btree blocks and the AGF,
so the EFI/EFD overhead is only a small increase in space and
processing time compared to the overall cost of freeing an extent.

However, delayed attributes change this cost equation dramatically,
especially for inline attributes. In the case of adding an inline
attribute, we only log the inode core and attribute fork at present.
With delayed attributes, we now log the attr intent which includes
the name and value, the inode core adn attr fork, and finally the
attr intent done item. We increase the number of items we log from 1
to 3, and the number of log vectors (regions) goes up from 3 to 7.
Hence we tripple the number of objects that the CIL has to process,
and more than double the number of log vectors that need to be
written to the journal.

At scale, this means delayed attributes cause a non-pipelined CIL to
become CPU bound processing all the extra items, resulting in a > 40%
performance degradation on 16-way file+xattr create worklaods.
Pipelining the CIL (as per 5.15) reduces the performance degradation
to 20%, but now the limitation is the rate at which the log items
can be written to the iclogs and iclogs be dispatched for IO and
completed.

Even log IO completion is slowed down by these intents, because it
now has to process 3x the number of items in the checkpoint.
Processing completed intents is especially inefficient here, because
we first insert the intent into the AIL, then remove it from the AIL
when the intent done is processed. IOWs, we are also doing expensive
operations in log IO completion we could completely avoid if we
didn't log completed intent/intent done pairs.

Enter log item whiteouts.

When an intent done is committed, we can check to see if the
associated intent is in the same checkpoint as we are currently
committing the intent done to. If so, we can mark the intent log
item with a whiteout and immediately free the intent done item
rather than committing it to the CIL. We can basically skip the
entire formatting and CIL insertion steps for the intent done item.

However, we cannot remove the intent item from the CIL at this point
because the unlocked per-cpu CIL item lists do not permit removal
without holding the CIL context lock exclusively. Transaction commit
only holds the context lock shared, hence the best we can do is mark
the intent item with a whiteout so that the CIL push can release it
rather than writing it to the log.

This means we never write the intent to the log if the intent done
has also been committed to the same checkpoint, but we'll always
write the intent if the intent done has not been committed or has
been committed to a different checkpoint. This will result in
correct log recovery behaviour in all cases, without the overhead of
logging unnecessary intents.

This intent whiteout concept is generic - we can apply it to all
intent/intent done pairs that have a direct 1:1 relationship. The
way deferred ops iterate and relog intents mean that all intents
currently have a 1:1 relationship with their done intent, and hence
we can apply this cancellation to all existing intent/intent done
implementations.

For delayed attributes with a 16-way 64kB xattr create workload,
whiteouts reduce the amount of journalled metadata from ~2.5GB/s
down to ~600MB/s and improve the creation rate from 9000/s to
14000/s.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap_item.c     |  2 +
 fs/xfs/xfs_log_cil.c       | 80 ++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_refcount_item.c |  1 +
 fs/xfs/xfs_rmap_item.c     |  2 +
 fs/xfs/xfs_trace.h         |  3 ++
 fs/xfs/xfs_trans.h         |  6 ++-
 6 files changed, 89 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index 59aa5f9bf769..670d074a71cc 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -39,6 +39,7 @@ STATIC void
 xfs_bui_item_free(
 	struct xfs_bui_log_item	*buip)
 {
+	kmem_free(buip->bui_item.li_lv_shadow);
 	kmem_cache_free(xfs_bui_cache, buip);
 }
 
@@ -199,6 +200,7 @@ xfs_bud_item_release(
 {
 	struct xfs_bud_log_item	*budp = BUD_ITEM(lip);
 
+	kmem_free(budp->bud_item.li_lv_shadow);
 	xfs_bui_release(budp->bud_buip);
 	kmem_cache_free(xfs_bud_cache, budp);
 }
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 0d8d092447ad..d894511428f2 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -476,7 +476,8 @@ xlog_cil_insert_format_items(
 static void
 xlog_cil_insert_items(
 	struct xlog		*log,
-	struct xfs_trans	*tp)
+	struct xfs_trans	*tp,
+	uint32_t		released_space)
 {
 	struct xfs_cil		*cil = log->l_cilp;
 	struct xfs_cil_ctx	*ctx = cil->xc_ctx;
@@ -526,6 +527,7 @@ xlog_cil_insert_items(
 	}
 	tp->t_ticket->t_curr_res -= len;
 	ctx->space_used += len;
+	ctx->space_used -= released_space;
 
 	/*
 	 * If we've overrun the reservation, dump the tx details before we move
@@ -970,11 +972,16 @@ xlog_cil_build_trans_hdr(
  * Pull all the log vectors off the items in the CIL, and remove the items from
  * the CIL. We don't need the CIL lock here because it's only needed on the
  * transaction commit side which is currently locked out by the flush lock.
+ *
+ * If a log item is marked with a whiteout, we do not need to write it to the
+ * journal and so we just move them to the whiteout list for the caller to
+ * dispose of appropriately.
  */
 static void
 xlog_cil_build_lv_chain(
 	struct xfs_cil		*cil,
 	struct xfs_cil_ctx	*ctx,
+	struct list_head	*whiteouts,
 	uint32_t		*num_iovecs,
 	uint32_t		*num_bytes)
 {
@@ -985,6 +992,13 @@ xlog_cil_build_lv_chain(
 
 		item = list_first_entry(&cil->xc_cil,
 					struct xfs_log_item, li_cil);
+
+		if (test_bit(XFS_LI_WHITEOUT, &item->li_flags)) {
+			list_move(&item->li_cil, whiteouts);
+			trace_xfs_cil_whiteout_skip(item);
+			continue;
+		}
+
 		list_del_init(&item->li_cil);
 		if (!ctx->lv_chain)
 			ctx->lv_chain = item->li_lv;
@@ -1030,6 +1044,7 @@ xlog_cil_push_work(
 	struct xfs_log_vec	lvhdr = { NULL };
 	xfs_csn_t		push_seq;
 	bool			push_commit_stable;
+	LIST_HEAD		(whiteouts);
 
 	new_ctx = xlog_cil_ctx_alloc();
 	new_ctx->ticket = xlog_cil_ticket_alloc(log);
@@ -1098,7 +1113,7 @@ xlog_cil_push_work(
 	list_add(&ctx->committing, &cil->xc_committing);
 	spin_unlock(&cil->xc_push_lock);
 
-	xlog_cil_build_lv_chain(cil, ctx, &num_iovecs, &num_bytes);
+	xlog_cil_build_lv_chain(cil, ctx, &whiteouts, &num_iovecs, &num_bytes);
 
 	/*
 	 * Switch the contexts so we can drop the context lock and move out
@@ -1201,6 +1216,15 @@ xlog_cil_push_work(
 	/* Not safe to reference ctx now! */
 
 	spin_unlock(&log->l_icloglock);
+
+	/* clean up log items that had whiteouts */
+	while (!list_empty(&whiteouts)) {
+		struct xfs_log_item *item = list_first_entry(&whiteouts,
+						struct xfs_log_item, li_cil);
+		list_del_init(&item->li_cil);
+		trace_xfs_cil_whiteout_unpin(item);
+		item->li_ops->iop_unpin(item, 1);
+	}
 	return;
 
 out_skip:
@@ -1212,6 +1236,14 @@ xlog_cil_push_work(
 out_abort_free_ticket:
 	xfs_log_ticket_ungrant(log, ctx->ticket);
 	ASSERT(xlog_is_shutdown(log));
+	while (!list_empty(&whiteouts)) {
+		struct xfs_log_item *item = list_first_entry(&whiteouts,
+						struct xfs_log_item, li_cil);
+		list_del_init(&item->li_cil);
+		trace_xfs_cil_whiteout_unpin(item);
+		item->li_ops->iop_unpin(item, 1);
+	}
+
 	if (!ctx->commit_iclog) {
 		xlog_cil_committed(ctx);
 		return;
@@ -1360,6 +1392,43 @@ xlog_cil_empty(
 	return empty;
 }
 
+/*
+ * If there are intent done items in this transaction and the related intent was
+ * committed in the current (same) CIL checkpoint, we don't need to write either
+ * the intent or intent done item to the journal as the change will be
+ * journalled atomically within this checkpoint. As we cannot remove items from
+ * the CIL here, mark the related intent with a whiteout so that the CIL push
+ * can remove it rather than writing it to the journal. Then remove the intent
+ * done item from the current transaction and release it so it doesn't get put
+ * into the CIL at all.
+ */
+static uint32_t
+xlog_cil_process_intents(
+	struct xfs_cil		*cil,
+	struct xfs_trans	*tp)
+{
+	struct xfs_log_item	*lip, *ilip, *next;
+	uint32_t		len = 0;
+
+	list_for_each_entry_safe(lip, next, &tp->t_items, li_trans) {
+		if (!(lip->li_ops->flags & XFS_ITEM_INTENT_DONE))
+			continue;
+
+		ilip = lip->li_ops->iop_intent(lip);
+		if (!ilip || !xlog_item_in_current_chkpt(cil, ilip))
+			continue;
+		set_bit(XFS_LI_WHITEOUT, &ilip->li_flags);
+		trace_xfs_cil_whiteout_mark(ilip);
+		len += ilip->li_lv->lv_bytes;
+		kmem_free(ilip->li_lv);
+		ilip->li_lv = NULL;
+
+		xfs_trans_del_item(lip);
+		lip->li_ops->iop_release(lip);
+	}
+	return len;
+}
+
 /*
  * Commit a transaction with the given vector to the Committed Item List.
  *
@@ -1382,6 +1451,7 @@ xlog_cil_commit(
 {
 	struct xfs_cil		*cil = log->l_cilp;
 	struct xfs_log_item	*lip, *next;
+	uint32_t		released_space = 0;
 
 	/*
 	 * Do all necessary memory allocation before we lock the CIL.
@@ -1393,7 +1463,11 @@ xlog_cil_commit(
 	/* lock out background commit */
 	down_read(&cil->xc_ctx_lock);
 
-	xlog_cil_insert_items(log, tp);
+	if (tp->t_flags & XFS_TRANS_HAS_INTENT_DONE)
+		released_space = xlog_cil_process_intents(cil, tp);
+
+	xlog_cil_insert_items(log, tp, released_space);
+	tp->t_ticket->t_curr_res += released_space;
 
 	if (regrant && !xlog_is_shutdown(log))
 		xfs_log_ticket_regrant(log, tp->t_ticket);
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 1b82b818f515..3fee1090e9a8 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -35,6 +35,7 @@ STATIC void
 xfs_cui_item_free(
 	struct xfs_cui_log_item	*cuip)
 {
+	kmem_free(cuip->cui_item.li_lv_shadow);
 	if (cuip->cui_format.cui_nextents > XFS_CUI_MAX_FAST_EXTENTS)
 		kmem_free(cuip);
 	else
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 99dfb3ae7e9c..546bd824cdf7 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -35,6 +35,7 @@ STATIC void
 xfs_rui_item_free(
 	struct xfs_rui_log_item	*ruip)
 {
+	kmem_free(ruip->rui_item.li_lv_shadow);
 	if (ruip->rui_format.rui_nextents > XFS_RUI_MAX_FAST_EXTENTS)
 		kmem_free(ruip);
 	else
@@ -228,6 +229,7 @@ xfs_rud_item_release(
 {
 	struct xfs_rud_log_item	*rudp = RUD_ITEM(lip);
 
+	kmem_free(rudp->rud_item.li_lv_shadow);
 	xfs_rui_release(rudp->rud_ruip);
 	kmem_cache_free(xfs_rud_cache, rudp);
 }
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index e1197f9ad97e..75934e3c3f55 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1332,6 +1332,9 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_push);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_locked);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing);
+DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_mark);
+DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_skip);
+DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_unpin);
 
 DECLARE_EVENT_CLASS(xfs_ail_class,
 	TP_PROTO(struct xfs_log_item *lip, xfs_lsn_t old_lsn, xfs_lsn_t new_lsn),
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index d72a5995d33e..9561f193e7e1 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -55,13 +55,15 @@ struct xfs_log_item {
 #define	XFS_LI_IN_AIL	0
 #define	XFS_LI_ABORTED	1
 #define	XFS_LI_FAILED	2
-#define	XFS_LI_DIRTY	3	/* log item dirty in transaction */
+#define	XFS_LI_DIRTY	3
+#define	XFS_LI_WHITEOUT	4
 
 #define XFS_LI_FLAGS \
 	{ (1u << XFS_LI_IN_AIL),	"IN_AIL" }, \
 	{ (1u << XFS_LI_ABORTED),	"ABORTED" }, \
 	{ (1u << XFS_LI_FAILED),	"FAILED" }, \
-	{ (1u << XFS_LI_DIRTY),		"DIRTY" }
+	{ (1u << XFS_LI_DIRTY),		"DIRTY" }, \
+	{ (1u << XFS_LI_WHITEOUT),	"WHITEOUT" }
 
 struct xfs_item_ops {
 	unsigned flags;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/8] xfs: don't commit the first deferred transaction without intents
  2022-04-27  2:22 ` [PATCH 2/8] xfs: don't commit the first deferred transaction without intents Dave Chinner
@ 2022-04-27  3:03   ` Darrick J. Wong
  2022-04-27  4:52     ` Dave Chinner
  2022-04-28 13:02   ` Christoph Hellwig
  2022-04-30 17:02   ` Alli
  2 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2022-04-27  3:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 12:22:53PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> If the first operation in a string of defer ops has no intents,
> then there is no reason to commit it before running the first call
> to xfs_defer_finish_one(). This allows the defer ops to be used
> effectively for non-intent based operations without requiring an
> unnecessary extra transaction commit when first called.
> 
> This fixes a regression in per-attribute modification transaction
> count when delayed attributes are not being used.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 0805ade2d300..66b4555bda8e 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -186,7 +186,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
>  	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
>  };
>  
> -static void
> +static bool
>  xfs_defer_create_intent(
>  	struct xfs_trans		*tp,
>  	struct xfs_defer_pending	*dfp,
> @@ -197,6 +197,7 @@ xfs_defer_create_intent(
>  	if (!dfp->dfp_intent)
>  		dfp->dfp_intent = ops->create_intent(tp, &dfp->dfp_work,
>  						     dfp->dfp_count, sort);
> +	return dfp->dfp_intent;

Same comment as last time -- please make it more obvious that we're
returning whether or not ->create_intent actually added a log item:

	return dfp->dfp_intent != NULL;

and not returning the log intent item itself.

Otherwise looks ok, so with that fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>  }
>  
>  /*
> @@ -204,16 +205,18 @@ xfs_defer_create_intent(
>   * associated extents, then add the entire intake list to the end of
>   * the pending list.
>   */
> -STATIC void
> +static bool
>  xfs_defer_create_intents(
>  	struct xfs_trans		*tp)
>  {
>  	struct xfs_defer_pending	*dfp;
> +	bool				ret = false;
>  
>  	list_for_each_entry(dfp, &tp->t_dfops, dfp_list) {
>  		trace_xfs_defer_create_intent(tp->t_mountp, dfp);
> -		xfs_defer_create_intent(tp, dfp, true);
> +		ret |= xfs_defer_create_intent(tp, dfp, true);
>  	}
> +	return ret;
>  }
>  
>  /* Abort all the intents that were committed. */
> @@ -487,7 +490,7 @@ int
>  xfs_defer_finish_noroll(
>  	struct xfs_trans		**tp)
>  {
> -	struct xfs_defer_pending	*dfp;
> +	struct xfs_defer_pending	*dfp = NULL;
>  	int				error = 0;
>  	LIST_HEAD(dop_pending);
>  
> @@ -506,17 +509,19 @@ xfs_defer_finish_noroll(
>  		 * of time that any one intent item can stick around in memory,
>  		 * pinning the log tail.
>  		 */
> -		xfs_defer_create_intents(*tp);
> +		bool has_intents = xfs_defer_create_intents(*tp);
>  		list_splice_init(&(*tp)->t_dfops, &dop_pending);
>  
> -		error = xfs_defer_trans_roll(tp);
> -		if (error)
> -			goto out_shutdown;
> +		if (has_intents || dfp) {
> +			error = xfs_defer_trans_roll(tp);
> +			if (error)
> +				goto out_shutdown;
>  
> -		/* Possibly relog intent items to keep the log moving. */
> -		error = xfs_defer_relog(tp, &dop_pending);
> -		if (error)
> -			goto out_shutdown;
> +			/* Possibly relog intent items to keep the log moving. */
> +			error = xfs_defer_relog(tp, &dop_pending);
> +			if (error)
> +				goto out_shutdown;
> +		}
>  
>  		dfp = list_first_entry(&dop_pending, struct xfs_defer_pending,
>  				       dfp_list);
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/8] xfs: add log item flags to indicate intents
  2022-04-27  2:22 ` [PATCH 3/8] xfs: add log item flags to indicate intents Dave Chinner
@ 2022-04-27  3:04   ` Darrick J. Wong
  2022-04-28 13:04     ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2022-04-27  3:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 12:22:54PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We currently have a couple of helper functions that try to infer
> whether the log item is an intent or intent done item from the
> combinations of operations it supports.  This is incredibly fragile
> and not very efficient as it requires checking specific combinations
> of ops.
> 
> We need to be able to identify intent and intent done items quickly
> and easily in upcoming patches, so simply add intent and intent done
> type flags to the log item ops flags. These are static flags to
> begin with, so intent items should have been typed like this from
> the start.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

Heh, I remember being told to infer intentness or intentdoneness instead
of using explicit flags...

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_bmap_item.c     |  4 +++-
>  fs/xfs/xfs_extfree_item.c  |  4 +++-
>  fs/xfs/xfs_refcount_item.c |  4 +++-
>  fs/xfs/xfs_rmap_item.c     |  4 +++-
>  fs/xfs/xfs_trans.h         | 25 +++++++++++++------------
>  5 files changed, 25 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index 593ac29cffc7..ed67c0028a68 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -202,7 +202,8 @@ xfs_bud_item_release(
>  }
>  
>  static const struct xfs_item_ops xfs_bud_item_ops = {
> -	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
> +	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
> +			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_bud_item_size,
>  	.iop_format	= xfs_bud_item_format,
>  	.iop_release	= xfs_bud_item_release,
> @@ -586,6 +587,7 @@ xfs_bui_item_relog(
>  }
>  
>  static const struct xfs_item_ops xfs_bui_item_ops = {
> +	.flags		= XFS_ITEM_INTENT,
>  	.iop_size	= xfs_bui_item_size,
>  	.iop_format	= xfs_bui_item_format,
>  	.iop_unpin	= xfs_bui_item_unpin,
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 0e50f2c9348e..21a159f9d8c5 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -307,7 +307,8 @@ xfs_efd_item_release(
>  }
>  
>  static const struct xfs_item_ops xfs_efd_item_ops = {
> -	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
> +	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
> +			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_efd_item_size,
>  	.iop_format	= xfs_efd_item_format,
>  	.iop_release	= xfs_efd_item_release,
> @@ -688,6 +689,7 @@ xfs_efi_item_relog(
>  }
>  
>  static const struct xfs_item_ops xfs_efi_item_ops = {
> +	.flags		= XFS_ITEM_INTENT,
>  	.iop_size	= xfs_efi_item_size,
>  	.iop_format	= xfs_efi_item_format,
>  	.iop_unpin	= xfs_efi_item_unpin,
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index 0d868c93144d..6536eea4c6ea 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -208,7 +208,8 @@ xfs_cud_item_release(
>  }
>  
>  static const struct xfs_item_ops xfs_cud_item_ops = {
> -	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
> +	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
> +			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_cud_item_size,
>  	.iop_format	= xfs_cud_item_format,
>  	.iop_release	= xfs_cud_item_release,
> @@ -600,6 +601,7 @@ xfs_cui_item_relog(
>  }
>  
>  static const struct xfs_item_ops xfs_cui_item_ops = {
> +	.flags		= XFS_ITEM_INTENT,
>  	.iop_size	= xfs_cui_item_size,
>  	.iop_format	= xfs_cui_item_format,
>  	.iop_unpin	= xfs_cui_item_unpin,
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index a22b2d19ef91..c2bb8cfc231e 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -231,7 +231,8 @@ xfs_rud_item_release(
>  }
>  
>  static const struct xfs_item_ops xfs_rud_item_ops = {
> -	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED,
> +	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
> +			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_rud_item_size,
>  	.iop_format	= xfs_rud_item_format,
>  	.iop_release	= xfs_rud_item_release,
> @@ -630,6 +631,7 @@ xfs_rui_item_relog(
>  }
>  
>  static const struct xfs_item_ops xfs_rui_item_ops = {
> +	.flags		= XFS_ITEM_INTENT,
>  	.iop_size	= xfs_rui_item_size,
>  	.iop_format	= xfs_rui_item_format,
>  	.iop_unpin	= xfs_rui_item_unpin,
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index 87e940b5366e..f68e74e46026 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -80,28 +80,29 @@ struct xfs_item_ops {
>  			struct xfs_trans *tp);
>  };
>  
> -/* Is this log item a deferred action intent? */
> +/*
> + * Log item ops flags
> + */
> +/*
> + * Release the log item when the journal commits instead of inserting into the
> + * AIL for writeback tracking and/or log tail pinning.
> + */
> +#define XFS_ITEM_RELEASE_WHEN_COMMITTED	(1 << 0)
> +#define XFS_ITEM_INTENT			(1 << 1)
> +#define XFS_ITEM_INTENT_DONE		(1 << 2)
> +
>  static inline bool
>  xlog_item_is_intent(struct xfs_log_item *lip)
>  {
> -	return lip->li_ops->iop_recover != NULL &&
> -	       lip->li_ops->iop_match != NULL;
> +	return lip->li_ops->flags & XFS_ITEM_INTENT;
>  }
>  
> -/* Is this a log intent-done item? */
>  static inline bool
>  xlog_item_is_intent_done(struct xfs_log_item *lip)
>  {
> -	return lip->li_ops->iop_unpin == NULL &&
> -	       lip->li_ops->iop_push == NULL;
> +	return lip->li_ops->flags & XFS_ITEM_INTENT_DONE;
>  }
>  
> -/*
> - * Release the log item as soon as committed.  This is for items just logging
> - * intents that never need to be written back in place.
> - */
> -#define XFS_ITEM_RELEASE_WHEN_COMMITTED	(1 << 0)
> -
>  void	xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item,
>  			  int type, const struct xfs_item_ops *ops);
>  
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/8] xfs: tag transactions that contain intent done items
  2022-04-27  2:22 ` [PATCH 4/8] xfs: tag transactions that contain intent done items Dave Chinner
@ 2022-04-27  3:06   ` Darrick J. Wong
  2022-04-28 13:05   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2022-04-27  3:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 12:22:55PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Intent whiteouts will require extra work to be done during
> transaction commit if the transaction contains an intent done item.
> 
> To determine if a transaction contains an intent done item, we want
> to avoid having to walk all the items in the transaction to check if
> they are intent done items. Hence when we add an intent done item to
> a transaction, tag the transaction to indicate that it contains such
> an item.
> 
> We don't tag the transaction when the defer ops is relogging an
> intent to move it forward in the log. Whiteouts will never apply to
> these cases, so we don't need to bother looking for them.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_shared.h | 24 +++++++++++++++++-------
>  fs/xfs/xfs_bmap_item.c     |  2 +-
>  fs/xfs/xfs_extfree_item.c  |  2 +-
>  fs/xfs/xfs_refcount_item.c |  2 +-
>  fs/xfs/xfs_rmap_item.c     |  2 +-
>  5 files changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
> index 25c4cab58851..c4381388c0c1 100644
> --- a/fs/xfs/libxfs/xfs_shared.h
> +++ b/fs/xfs/libxfs/xfs_shared.h
> @@ -54,13 +54,23 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
>  /*
>   * Values for t_flags.
>   */
> -#define	XFS_TRANS_DIRTY		0x01	/* something needs to be logged */
> -#define	XFS_TRANS_SB_DIRTY	0x02	/* superblock is modified */
> -#define	XFS_TRANS_PERM_LOG_RES	0x04	/* xact took a permanent log res */
> -#define	XFS_TRANS_SYNC		0x08	/* make commit synchronous */
> -#define XFS_TRANS_RESERVE	0x20    /* OK to use reserved data blocks */
> -#define XFS_TRANS_NO_WRITECOUNT 0x40	/* do not elevate SB writecount */
> -#define XFS_TRANS_RES_FDBLKS	0x80	/* reserve newly freed blocks */
> +/* Transaction needs to be logged */
> +#define XFS_TRANS_DIRTY			(1u << 0)
> +/* Superblock is dirty and needs to be logged */
> +#define XFS_TRANS_SB_DIRTY		(1u << 1)
> +/* Transaction took a permanent log reservation */
> +#define XFS_TRANS_PERM_LOG_RES		(1u << 2)
> +/* Synchronous transaction commit needed */
> +#define XFS_TRANS_SYNC			(1u << 3)
> +/* Transaction can use reserve block pool */
> +#define XFS_TRANS_RESERVE		(1u << 4)
> +/* Transaction should avoid VFS level superblock write accounting */
> +#define XFS_TRANS_NO_WRITECOUNT		(1u << 5)
> +/* Transaction has freed blocks returned to it's reservation */
> +#define XFS_TRANS_RES_FDBLKS		(1u << 6)

Yesssss documentation finally!

> +/* Transaction contains an intent done log item */
> +#define XFS_TRANS_HAS_INTENT_DONE	(1u << 7)

I guess I'll see what this one does in the next few patches.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +
>  /*
>   * LOWMODE is used by the allocator to activate the lowspace algorithm - when
>   * free space is running low the extent allocator may choose to allocate an
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index ed67c0028a68..3b968b31911b 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -255,7 +255,7 @@ xfs_trans_log_finish_bmap_update(
>  	 * 1.) releases the BUI and frees the BUD
>  	 * 2.) shuts down the filesystem
>  	 */
> -	tp->t_flags |= XFS_TRANS_DIRTY;
> +	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
>  	set_bit(XFS_LI_DIRTY, &budp->bud_item.li_flags);
>  
>  	return error;
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 21a159f9d8c5..96735f23d12d 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -381,7 +381,7 @@ xfs_trans_free_extent(
>  	 * 1.) releases the EFI and frees the EFD
>  	 * 2.) shuts down the filesystem
>  	 */
> -	tp->t_flags |= XFS_TRANS_DIRTY;
> +	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
>  	set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);
>  
>  	next_extent = efdp->efd_next_extent;
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index 6536eea4c6ea..b523ce2c775b 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -260,7 +260,7 @@ xfs_trans_log_finish_refcount_update(
>  	 * 1.) releases the CUI and frees the CUD
>  	 * 2.) shuts down the filesystem
>  	 */
> -	tp->t_flags |= XFS_TRANS_DIRTY;
> +	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
>  	set_bit(XFS_LI_DIRTY, &cudp->cud_item.li_flags);
>  
>  	return error;
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index c2bb8cfc231e..b269e68407b9 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -328,7 +328,7 @@ xfs_trans_log_finish_rmap_update(
>  	 * 1.) releases the RUI and frees the RUD
>  	 * 2.) shuts down the filesystem
>  	 */
> -	tp->t_flags |= XFS_TRANS_DIRTY;
> +	tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
>  	set_bit(XFS_LI_DIRTY, &rudp->rud_item.li_flags);
>  
>  	return error;
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/8] xfs: hide log iovec alignment constraints
  2022-04-27  2:22 ` [PATCH 1/8] xfs: hide log iovec alignment constraints Dave Chinner
@ 2022-04-27  3:14   ` Darrick J. Wong
  2022-04-27  4:50     ` Dave Chinner
  2022-04-28 13:00   ` Christoph Hellwig
  1 sibling, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2022-04-27  3:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 12:22:52PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Callers currently have to round out the size of buffers to match the
> aligment constraints of log iovecs and xlog_write(). They should not
> need to know this detail, so introduce a new function to calculate
> the iovec length (for use in ->iop_size implementations). Also
> modify xlog_finish_iovec() to round up the length to the correct
> alignment so the callers don't need to do this, either.
> 
> Convert the only user - inode forks - of this alignment rounding to
> use the new interface.

Hmm.  So currently, we require that the inode fork buffer be rounded up
to the next 4 bytes, and then I guess the log will copy that into the
log iovec?  IOWs, if we have a 37-byte data fork, we'll allocate a 40
byte buffer for the xfs_ifork, and the log will copy all 40 bytes into a
40 byte iovec.

Now it looks like we'd allocate a 37-byte buffer for the xfs_ifork, but
the log iovec will still be 40 bytes.  So ... do we copy 37 bytes out of
the ifork buffer and zero the last 3 bytes in the iovec?  Does we leak
kernel memory in those last 3 bytes?  Or do we copy 40 bytes and
overrun?

It sorta looks like (at least for the local format case) xlog_copy_iovec
will copy 37 bytes and leave the last 3 bytes of the iovec in whatever
state it was in previously.  Is that zeroed?  Because it then looks like
xlog_finish_iovec will round that 37 up to 40.

(FWIW I'm just checking for kernel memory exposure vectors here.)

--D

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_inode_fork.c  | 12 +++---------
>  fs/xfs/xfs_inode_item.c         | 25 +++++++------------------
>  fs/xfs/xfs_inode_item_recover.c |  4 ++--
>  fs/xfs/xfs_log.h                | 23 +++++++++++++++++++++++
>  4 files changed, 35 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index 9aee4a1e2fe9..1a4cdf550f6d 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -36,7 +36,7 @@ xfs_init_local_fork(
>  	int64_t			size)
>  {
>  	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
> -	int			mem_size = size, real_size = 0;
> +	int			mem_size = size;
>  	bool			zero_terminate;
>  
>  	/*
> @@ -50,8 +50,7 @@ xfs_init_local_fork(
>  		mem_size++;
>  
>  	if (size) {
> -		real_size = roundup(mem_size, 4);
> -		ifp->if_u1.if_data = kmem_alloc(real_size, KM_NOFS);
> +		ifp->if_u1.if_data = kmem_alloc(mem_size, KM_NOFS);
>  		memcpy(ifp->if_u1.if_data, data, size);
>  		if (zero_terminate)
>  			ifp->if_u1.if_data[size] = '\0';
> @@ -497,12 +496,7 @@ xfs_idata_realloc(
>  		return;
>  	}
>  
> -	/*
> -	 * For inline data, the underlying buffer must be a multiple of 4 bytes
> -	 * in size so that it can be logged and stay on word boundaries.
> -	 * We enforce that here.
> -	 */
> -	ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, roundup(new_size, 4),
> +	ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, new_size,
>  				      GFP_NOFS | __GFP_NOFAIL);
>  	ifp->if_bytes = new_size;
>  }
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index 00733a18ccdc..721def0639fd 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -71,7 +71,7 @@ xfs_inode_item_data_fork_size(
>  	case XFS_DINODE_FMT_LOCAL:
>  		if ((iip->ili_fields & XFS_ILOG_DDATA) &&
>  		    ip->i_df.if_bytes > 0) {
> -			*nbytes += roundup(ip->i_df.if_bytes, 4);
> +			*nbytes += xlog_calc_iovec_len(ip->i_df.if_bytes);
>  			*nvecs += 1;
>  		}
>  		break;
> @@ -112,7 +112,7 @@ xfs_inode_item_attr_fork_size(
>  	case XFS_DINODE_FMT_LOCAL:
>  		if ((iip->ili_fields & XFS_ILOG_ADATA) &&
>  		    ip->i_afp->if_bytes > 0) {
> -			*nbytes += roundup(ip->i_afp->if_bytes, 4);
> +			*nbytes += xlog_calc_iovec_len(ip->i_afp->if_bytes);
>  			*nvecs += 1;
>  		}
>  		break;
> @@ -204,17 +204,12 @@ xfs_inode_item_format_data_fork(
>  			~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT | XFS_ILOG_DEV);
>  		if ((iip->ili_fields & XFS_ILOG_DDATA) &&
>  		    ip->i_df.if_bytes > 0) {
> -			/*
> -			 * Round i_bytes up to a word boundary.
> -			 * The underlying memory is guaranteed
> -			 * to be there by xfs_idata_realloc().
> -			 */
> -			data_bytes = roundup(ip->i_df.if_bytes, 4);
>  			ASSERT(ip->i_df.if_u1.if_data != NULL);
>  			ASSERT(ip->i_disk_size > 0);
>  			xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_ILOCAL,
> -					ip->i_df.if_u1.if_data, data_bytes);
> -			ilf->ilf_dsize = (unsigned)data_bytes;
> +					ip->i_df.if_u1.if_data,
> +					ip->i_df.if_bytes);
> +			ilf->ilf_dsize = (unsigned)ip->i_df.if_bytes;
>  			ilf->ilf_size++;
>  		} else {
>  			iip->ili_fields &= ~XFS_ILOG_DDATA;
> @@ -288,17 +283,11 @@ xfs_inode_item_format_attr_fork(
>  
>  		if ((iip->ili_fields & XFS_ILOG_ADATA) &&
>  		    ip->i_afp->if_bytes > 0) {
> -			/*
> -			 * Round i_bytes up to a word boundary.
> -			 * The underlying memory is guaranteed
> -			 * to be there by xfs_idata_realloc().
> -			 */
> -			data_bytes = roundup(ip->i_afp->if_bytes, 4);
>  			ASSERT(ip->i_afp->if_u1.if_data != NULL);
>  			xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_IATTR_LOCAL,
>  					ip->i_afp->if_u1.if_data,
> -					data_bytes);
> -			ilf->ilf_asize = (unsigned)data_bytes;
> +					ip->i_afp->if_bytes);
> +			ilf->ilf_asize = (unsigned)ip->i_afp->if_bytes;
>  			ilf->ilf_size++;
>  		} else {
>  			iip->ili_fields &= ~XFS_ILOG_ADATA;
> diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c
> index 6d44f5fd6d7e..d28ffaebd067 100644
> --- a/fs/xfs/xfs_inode_item_recover.c
> +++ b/fs/xfs/xfs_inode_item_recover.c
> @@ -462,7 +462,7 @@ xlog_recover_inode_commit_pass2(
>  	ASSERT(in_f->ilf_size <= 4);
>  	ASSERT((in_f->ilf_size == 3) || (fields & XFS_ILOG_AFORK));
>  	ASSERT(!(fields & XFS_ILOG_DFORK) ||
> -	       (len == in_f->ilf_dsize));
> +	       (len == xlog_calc_iovec_len(in_f->ilf_dsize)));
>  
>  	switch (fields & XFS_ILOG_DFORK) {
>  	case XFS_ILOG_DDATA:
> @@ -497,7 +497,7 @@ xlog_recover_inode_commit_pass2(
>  		}
>  		len = item->ri_buf[attr_index].i_len;
>  		src = item->ri_buf[attr_index].i_addr;
> -		ASSERT(len == in_f->ilf_asize);
> +		ASSERT(len == xlog_calc_iovec_len(in_f->ilf_asize));
>  
>  		switch (in_f->ilf_fields & XFS_ILOG_AFORK) {
>  		case XFS_ILOG_ADATA:
> diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
> index 8dafe8f771c7..d9aebca3f971 100644
> --- a/fs/xfs/xfs_log.h
> +++ b/fs/xfs/xfs_log.h
> @@ -21,6 +21,17 @@ struct xfs_log_vec {
>  
>  #define XFS_LOG_VEC_ORDERED	(-1)
>  
> +/*
> + * Calculate the log iovec length for a given user buffer length. Intended to be
> + * used by ->iop_size implementations when sizing buffers of arbitrary
> + * alignments.
> + */
> +static inline int
> +xlog_calc_iovec_len(int len)
> +{
> +	return roundup(len, 4);
> +}
> +
>  void *xlog_prepare_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec **vecp,
>  		uint type);
>  
> @@ -29,6 +40,12 @@ xlog_finish_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec *vec, int len)
>  {
>  	struct xlog_op_header	*oph = vec->i_addr;
>  
> +	/*
> +	 * Always round up the length to the correct alignment so callers don't
> +	 * need to know anything about this log vec layout requirement.
> +	 */
> +	len = xlog_calc_iovec_len(len);
> +
>  	/* opheader tracks payload length, logvec tracks region length */
>  	oph->oh_len = cpu_to_be32(len);
>  
> @@ -36,8 +53,14 @@ xlog_finish_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec *vec, int len)
>  	lv->lv_buf_len += len;
>  	lv->lv_bytes += len;
>  	vec->i_len = len;
> +
> +	/* Catch buffer overruns */
> +	ASSERT((void *)lv->lv_buf + lv->lv_bytes <= (void *)lv + lv->lv_size);
>  }
>  
> +/*
> + * Copy the amount of data requested by the caller into a new log iovec.
> + */
>  static inline void *
>  xlog_copy_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec **vecp,
>  		uint type, void *data, int len)
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c
  2022-04-27  2:22 ` [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c Dave Chinner
@ 2022-04-27  3:15   ` Darrick J. Wong
  2022-04-27  4:56     ` Dave Chinner
  2022-04-28 13:06   ` Christoph Hellwig
  2022-04-29  1:56   ` Alli
  2 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2022-04-27  3:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 12:22:56PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In preparation for adding support for intent item whiteouts.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks like a straight hoist?
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_log_cil.c | 119 ++++++++++++++++++++++++-------------------
>  1 file changed, 67 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index e5ab62f08c19..0d8d092447ad 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -47,6 +47,38 @@ xlog_cil_ticket_alloc(
>  	return tic;
>  }
>  
> +/*
> + * Check if the current log item was first committed in this sequence.
> + * We can't rely on just the log item being in the CIL, we have to check
> + * the recorded commit sequence number.
> + *
> + * Note: for this to be used in a non-racy manner, it has to be called with
> + * CIL flushing locked out. As a result, it should only be used during the
> + * transaction commit process when deciding what to format into the item.
> + */
> +static bool
> +xlog_item_in_current_chkpt(
> +	struct xfs_cil		*cil,
> +	struct xfs_log_item	*lip)
> +{
> +	if (list_empty(&lip->li_cil))
> +		return false;
> +
> +	/*
> +	 * li_seq is written on the first commit of a log item to record the
> +	 * first checkpoint it is written to. Hence if it is different to the
> +	 * current sequence, we're in a new checkpoint.
> +	 */
> +	return lip->li_seq == READ_ONCE(cil->xc_current_sequence);
> +}
> +
> +bool
> +xfs_log_item_in_current_chkpt(
> +	struct xfs_log_item *lip)
> +{
> +	return xlog_item_in_current_chkpt(lip->li_log->l_cilp, lip);
> +}
> +
>  /*
>   * Unavoidable forward declaration - xlog_cil_push_work() calls
>   * xlog_cil_ctx_alloc() itself.
> @@ -934,6 +966,40 @@ xlog_cil_build_trans_hdr(
>  	tic->t_curr_res -= lvhdr->lv_bytes;
>  }
>  
> +/*
> + * Pull all the log vectors off the items in the CIL, and remove the items from
> + * the CIL. We don't need the CIL lock here because it's only needed on the
> + * transaction commit side which is currently locked out by the flush lock.
> + */
> +static void
> +xlog_cil_build_lv_chain(
> +	struct xfs_cil		*cil,
> +	struct xfs_cil_ctx	*ctx,
> +	uint32_t		*num_iovecs,
> +	uint32_t		*num_bytes)
> +{
> +	struct xfs_log_vec	*lv = NULL;
> +
> +	while (!list_empty(&cil->xc_cil)) {
> +		struct xfs_log_item	*item;
> +
> +		item = list_first_entry(&cil->xc_cil,
> +					struct xfs_log_item, li_cil);
> +		list_del_init(&item->li_cil);
> +		if (!ctx->lv_chain)
> +			ctx->lv_chain = item->li_lv;
> +		else
> +			lv->lv_next = item->li_lv;
> +		lv = item->li_lv;
> +		item->li_lv = NULL;
> +		*num_iovecs += lv->lv_niovecs;
> +
> +		/* we don't write ordered log vectors */
> +		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
> +			*num_bytes += lv->lv_bytes;
> +	}
> +}
> +
>  /*
>   * Push the Committed Item List to the log.
>   *
> @@ -956,7 +1022,6 @@ xlog_cil_push_work(
>  		container_of(work, struct xfs_cil_ctx, push_work);
>  	struct xfs_cil		*cil = ctx->cil;
>  	struct xlog		*log = cil->xc_log;
> -	struct xfs_log_vec	*lv;
>  	struct xfs_cil_ctx	*new_ctx;
>  	int			num_iovecs = 0;
>  	int			num_bytes = 0;
> @@ -1033,31 +1098,7 @@ xlog_cil_push_work(
>  	list_add(&ctx->committing, &cil->xc_committing);
>  	spin_unlock(&cil->xc_push_lock);
>  
> -	/*
> -	 * Pull all the log vectors off the items in the CIL, and remove the
> -	 * items from the CIL. We don't need the CIL lock here because it's only
> -	 * needed on the transaction commit side which is currently locked out
> -	 * by the flush lock.
> -	 */
> -	lv = NULL;
> -	while (!list_empty(&cil->xc_cil)) {
> -		struct xfs_log_item	*item;
> -
> -		item = list_first_entry(&cil->xc_cil,
> -					struct xfs_log_item, li_cil);
> -		list_del_init(&item->li_cil);
> -		if (!ctx->lv_chain)
> -			ctx->lv_chain = item->li_lv;
> -		else
> -			lv->lv_next = item->li_lv;
> -		lv = item->li_lv;
> -		item->li_lv = NULL;
> -		num_iovecs += lv->lv_niovecs;
> -
> -		/* we don't write ordered log vectors */
> -		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
> -			num_bytes += lv->lv_bytes;
> -	}
> +	xlog_cil_build_lv_chain(cil, ctx, &num_iovecs, &num_bytes);
>  
>  	/*
>  	 * Switch the contexts so we can drop the context lock and move out
> @@ -1508,32 +1549,6 @@ xlog_cil_force_seq(
>  	return 0;
>  }
>  
> -/*
> - * Check if the current log item was first committed in this sequence.
> - * We can't rely on just the log item being in the CIL, we have to check
> - * the recorded commit sequence number.
> - *
> - * Note: for this to be used in a non-racy manner, it has to be called with
> - * CIL flushing locked out. As a result, it should only be used during the
> - * transaction commit process when deciding what to format into the item.
> - */
> -bool
> -xfs_log_item_in_current_chkpt(
> -	struct xfs_log_item	*lip)
> -{
> -	struct xfs_cil		*cil = lip->li_log->l_cilp;
> -
> -	if (list_empty(&lip->li_cil))
> -		return false;
> -
> -	/*
> -	 * li_seq is written on the first commit of a log item to record the
> -	 * first checkpoint it is written to. Hence if it is different to the
> -	 * current sequence, we're in a new checkpoint.
> -	 */
> -	return lip->li_seq == READ_ONCE(cil->xc_current_sequence);
> -}
> -
>  /*
>   * Perform initial CIL structure initialisation.
>   */
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 6/8] xfs: add log item method to return related intents
  2022-04-27  2:22 ` [PATCH 6/8] xfs: add log item method to return related intents Dave Chinner
@ 2022-04-27  3:18   ` Darrick J. Wong
  2022-04-28 13:10   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2022-04-27  3:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 12:22:57PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> To apply a whiteout to an intent item when an intent done item is
> committed, we need to be able to retrieve the intent item from the
> the intent done item. Add a log item op method for doing this, and
> wire all the intent done items up to it.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

We'll see what I think about the last patch, but the code changes here
look acceptable.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_bmap_item.c     | 8 ++++++++
>  fs/xfs/xfs_extfree_item.c  | 8 ++++++++
>  fs/xfs/xfs_refcount_item.c | 8 ++++++++
>  fs/xfs/xfs_rmap_item.c     | 8 ++++++++
>  fs/xfs/xfs_trans.h         | 1 +
>  5 files changed, 33 insertions(+)
> 
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index 3b968b31911b..e1b0e321d604 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -201,12 +201,20 @@ xfs_bud_item_release(
>  	kmem_cache_free(xfs_bud_cache, budp);
>  }
>  
> +static struct xfs_log_item *
> +xfs_bud_item_intent(
> +	struct xfs_log_item	*lip)
> +{
> +	return &BUD_ITEM(lip)->bud_buip->bui_item;
> +}
> +
>  static const struct xfs_item_ops xfs_bud_item_ops = {
>  	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
>  			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_bud_item_size,
>  	.iop_format	= xfs_bud_item_format,
>  	.iop_release	= xfs_bud_item_release,
> +	.iop_intent	= xfs_bud_item_intent,
>  };
>  
>  static struct xfs_bud_log_item *
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 96735f23d12d..032db5269e97 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -306,12 +306,20 @@ xfs_efd_item_release(
>  	xfs_efd_item_free(efdp);
>  }
>  
> +static struct xfs_log_item *
> +xfs_efd_item_intent(
> +	struct xfs_log_item	*lip)
> +{
> +	return &EFD_ITEM(lip)->efd_efip->efi_item;
> +}
> +
>  static const struct xfs_item_ops xfs_efd_item_ops = {
>  	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
>  			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_efd_item_size,
>  	.iop_format	= xfs_efd_item_format,
>  	.iop_release	= xfs_efd_item_release,
> +	.iop_intent	= xfs_efd_item_intent,
>  };
>  
>  /*
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index b523ce2c775b..a2213b5ee344 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -207,12 +207,20 @@ xfs_cud_item_release(
>  	kmem_cache_free(xfs_cud_cache, cudp);
>  }
>  
> +static struct xfs_log_item *
> +xfs_cud_item_intent(
> +	struct xfs_log_item	*lip)
> +{
> +	return &CUD_ITEM(lip)->cud_cuip->cui_item;
> +}
> +
>  static const struct xfs_item_ops xfs_cud_item_ops = {
>  	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
>  			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_cud_item_size,
>  	.iop_format	= xfs_cud_item_format,
>  	.iop_release	= xfs_cud_item_release,
> +	.iop_intent	= xfs_cud_item_intent,
>  };
>  
>  static struct xfs_cud_log_item *
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index b269e68407b9..053eb135380c 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -230,12 +230,20 @@ xfs_rud_item_release(
>  	kmem_cache_free(xfs_rud_cache, rudp);
>  }
>  
> +static struct xfs_log_item *
> +xfs_rud_item_intent(
> +	struct xfs_log_item	*lip)
> +{
> +	return &RUD_ITEM(lip)->rud_ruip->rui_item;
> +}
> +
>  static const struct xfs_item_ops xfs_rud_item_ops = {
>  	.flags		= XFS_ITEM_RELEASE_WHEN_COMMITTED |
>  			  XFS_ITEM_INTENT_DONE,
>  	.iop_size	= xfs_rud_item_size,
>  	.iop_format	= xfs_rud_item_format,
>  	.iop_release	= xfs_rud_item_release,
> +	.iop_intent	= xfs_rud_item_intent,
>  };
>  
>  static struct xfs_rud_log_item *
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index f68e74e46026..d72a5995d33e 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -78,6 +78,7 @@ struct xfs_item_ops {
>  	bool (*iop_match)(struct xfs_log_item *item, uint64_t id);
>  	struct xfs_log_item *(*iop_relog)(struct xfs_log_item *intent,
>  			struct xfs_trans *tp);
> +	struct xfs_log_item *(*iop_intent)(struct xfs_log_item *intent_done);
>  };
>  
>  /*
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL
  2022-04-27  2:22 ` [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL Dave Chinner
@ 2022-04-27  3:19   ` Darrick J. Wong
  2022-04-28 13:15   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2022-04-27  3:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 12:22:58PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we release an intent that a whiteout applies to, it will not
> have been committed to the journal and so won't be in the AIL. Hence
> when we drop the last reference to the intent, we do not want to try
> to remove it from the AIL as that will trigger a filesystem
> shutdown. Hence make the removal of intents from the AIL conditional
> on them actually being in the AIL so we do the correct thing.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

Same provisional RVB as the last patch.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_bmap_item.c     | 8 +++++---
>  fs/xfs/xfs_extfree_item.c  | 8 +++++---
>  fs/xfs/xfs_refcount_item.c | 8 +++++---
>  fs/xfs/xfs_rmap_item.c     | 8 +++++---
>  4 files changed, 20 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index e1b0e321d604..59aa5f9bf769 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -54,10 +54,12 @@ xfs_bui_release(
>  	struct xfs_bui_log_item	*buip)
>  {
>  	ASSERT(atomic_read(&buip->bui_refcount) > 0);
> -	if (atomic_dec_and_test(&buip->bui_refcount)) {
> +	if (!atomic_dec_and_test(&buip->bui_refcount))
> +		return;
> +
> +	if (test_bit(XFS_LI_IN_AIL, &buip->bui_item.li_flags))
>  		xfs_trans_ail_delete(&buip->bui_item, SHUTDOWN_LOG_IO_ERROR);
> -		xfs_bui_item_free(buip);
> -	}
> +	xfs_bui_item_free(buip);
>  }
>  
>  
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 032db5269e97..893a7dd15cbb 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -58,10 +58,12 @@ xfs_efi_release(
>  	struct xfs_efi_log_item	*efip)
>  {
>  	ASSERT(atomic_read(&efip->efi_refcount) > 0);
> -	if (atomic_dec_and_test(&efip->efi_refcount)) {
> +	if (!atomic_dec_and_test(&efip->efi_refcount))
> +		return;
> +
> +	if (test_bit(XFS_LI_IN_AIL, &efip->efi_item.li_flags))
>  		xfs_trans_ail_delete(&efip->efi_item, SHUTDOWN_LOG_IO_ERROR);
> -		xfs_efi_item_free(efip);
> -	}
> +	xfs_efi_item_free(efip);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index a2213b5ee344..1b82b818f515 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -53,10 +53,12 @@ xfs_cui_release(
>  	struct xfs_cui_log_item	*cuip)
>  {
>  	ASSERT(atomic_read(&cuip->cui_refcount) > 0);
> -	if (atomic_dec_and_test(&cuip->cui_refcount)) {
> +	if (!atomic_dec_and_test(&cuip->cui_refcount))
> +		return;
> +
> +	if (test_bit(XFS_LI_IN_AIL, &cuip->cui_item.li_flags))
>  		xfs_trans_ail_delete(&cuip->cui_item, SHUTDOWN_LOG_IO_ERROR);
> -		xfs_cui_item_free(cuip);
> -	}
> +	xfs_cui_item_free(cuip);
>  }
>  
>  
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index 053eb135380c..99dfb3ae7e9c 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -53,10 +53,12 @@ xfs_rui_release(
>  	struct xfs_rui_log_item	*ruip)
>  {
>  	ASSERT(atomic_read(&ruip->rui_refcount) > 0);
> -	if (atomic_dec_and_test(&ruip->rui_refcount)) {
> +	if (!atomic_dec_and_test(&ruip->rui_refcount))
> +		return;
> +
> +	if (test_bit(XFS_LI_IN_AIL, &ruip->rui_item.li_flags))
>  		xfs_trans_ail_delete(&ruip->rui_item, SHUTDOWN_LOG_IO_ERROR);
> -		xfs_rui_item_free(ruip);
> -	}
> +	xfs_rui_item_free(ruip);
>  }
>  
>  STATIC void
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 8/8] xfs: intent item whiteouts
  2022-04-27  2:22 ` [PATCH 8/8] xfs: intent item whiteouts Dave Chinner
@ 2022-04-27  3:32   ` Darrick J. Wong
  2022-04-27  5:47     ` Dave Chinner
  2022-04-28 13:22   ` Christoph Hellwig
  1 sibling, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2022-04-27  3:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 12:22:59PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we log modifications based on intents, we add both intent
> and intent done items to the modification being made. These get
> written to the log to ensure that the operation is re-run if the
> intent done is not found in the log.
> 
> However, for operations that complete wholly within a single
> checkpoint, the change in the checkpoint is atomic and will never
> need replay. In this case, we don't need to actually write the
> intent and intent done items to the journal because log recovery
> will never need to manually restart this modification.
> 
> Log recovery currently handles intent/intent done matching by
> inserting the intent into the AIL, then removing it when a matching
> intent done item is found. Hence for all the intent-based operations
> that complete within a checkpoint, we spend all that time parsing
> the intent/intent done items just to cancel them and do nothing with
> them.
> 
> Hence it follows that the only time we actually need intents in the
> log is when the modification crosses checkpoint boundaries in the
> log and so may only be partially complete in the journal. Hence if
> we commit and intent done item to the CIL and the intent item is in
> the same checkpoint, we don't actually have to write them to the
> journal because log recovery will always cancel the intents.
> 
> We've never really worried about the overhead of logging intents
> unnecessarily like this because the intents we log are generally
> very much smaller than the change being made. e.g. freeing an extent
> involves modifying at lease two freespace btree blocks and the AGF,
> so the EFI/EFD overhead is only a small increase in space and
> processing time compared to the overall cost of freeing an extent.

Question: If we whiteout enough intent items, does that make it possible
to cram more updates into a checkpoint?

Are changes required to the existing intent item code to support
whiteouts, or does the log code autodetect an *I/*D pair in the same
checkpoint and elide them automatically?

(I might be fishing here for "Does this make generic/447 faster when it
deletes the million extents?")

> However, delayed attributes change this cost equation dramatically,
> especially for inline attributes. In the case of adding an inline
> attribute, we only log the inode core and attribute fork at present.
> With delayed attributes, we now log the attr intent which includes
> the name and value, the inode core adn attr fork, and finally the
> attr intent done item. We increase the number of items we log from 1
> to 3, and the number of log vectors (regions) goes up from 3 to 7.
> Hence we tripple the number of objects that the CIL has to process,
> and more than double the number of log vectors that need to be
> written to the journal.
> 
> At scale, this means delayed attributes cause a non-pipelined CIL to
> become CPU bound processing all the extra items, resulting in a > 40%
> performance degradation on 16-way file+xattr create worklaods.
> Pipelining the CIL (as per 5.15) reduces the performance degradation
> to 20%, but now the limitation is the rate at which the log items
> can be written to the iclogs and iclogs be dispatched for IO and
> completed.
> 
> Even log IO completion is slowed down by these intents, because it
> now has to process 3x the number of items in the checkpoint.
> Processing completed intents is especially inefficient here, because
> we first insert the intent into the AIL, then remove it from the AIL
> when the intent done is processed. IOWs, we are also doing expensive
> operations in log IO completion we could completely avoid if we
> didn't log completed intent/intent done pairs.
> 
> Enter log item whiteouts.
> 
> When an intent done is committed, we can check to see if the
> associated intent is in the same checkpoint as we are currently
> committing the intent done to. If so, we can mark the intent log
> item with a whiteout and immediately free the intent done item
> rather than committing it to the CIL. We can basically skip the
> entire formatting and CIL insertion steps for the intent done item.
> 
> However, we cannot remove the intent item from the CIL at this point
> because the unlocked per-cpu CIL item lists do not permit removal
> without holding the CIL context lock exclusively. Transaction commit
> only holds the context lock shared, hence the best we can do is mark
> the intent item with a whiteout so that the CIL push can release it
> rather than writing it to the log.
> 
> This means we never write the intent to the log if the intent done
> has also been committed to the same checkpoint, but we'll always
> write the intent if the intent done has not been committed or has
> been committed to a different checkpoint. This will result in
> correct log recovery behaviour in all cases, without the overhead of
> logging unnecessary intents.
> 
> This intent whiteout concept is generic - we can apply it to all
> intent/intent done pairs that have a direct 1:1 relationship. The
> way deferred ops iterate and relog intents mean that all intents
> currently have a 1:1 relationship with their done intent, and hence
> we can apply this cancellation to all existing intent/intent done
> implementations.
> 
> For delayed attributes with a 16-way 64kB xattr create workload,
> whiteouts reduce the amount of journalled metadata from ~2.5GB/s
> down to ~600MB/s and improve the creation rate from 9000/s to
> 14000/s.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_bmap_item.c     |  2 +
>  fs/xfs/xfs_log_cil.c       | 80 ++++++++++++++++++++++++++++++++++++--
>  fs/xfs/xfs_refcount_item.c |  1 +
>  fs/xfs/xfs_rmap_item.c     |  2 +
>  fs/xfs/xfs_trace.h         |  3 ++
>  fs/xfs/xfs_trans.h         |  6 ++-
>  6 files changed, 89 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> index 59aa5f9bf769..670d074a71cc 100644
> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -39,6 +39,7 @@ STATIC void
>  xfs_bui_item_free(
>  	struct xfs_bui_log_item	*buip)
>  {
> +	kmem_free(buip->bui_item.li_lv_shadow);

Why is it necessary for log items to free their own shadow buffer?

>  	kmem_cache_free(xfs_bui_cache, buip);
>  }
>  
> @@ -199,6 +200,7 @@ xfs_bud_item_release(
>  {
>  	struct xfs_bud_log_item	*budp = BUD_ITEM(lip);
>  
> +	kmem_free(budp->bud_item.li_lv_shadow);
>  	xfs_bui_release(budp->bud_buip);
>  	kmem_cache_free(xfs_bud_cache, budp);
>  }
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 0d8d092447ad..d894511428f2 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -476,7 +476,8 @@ xlog_cil_insert_format_items(
>  static void
>  xlog_cil_insert_items(
>  	struct xlog		*log,
> -	struct xfs_trans	*tp)
> +	struct xfs_trans	*tp,
> +	uint32_t		released_space)
>  {
>  	struct xfs_cil		*cil = log->l_cilp;
>  	struct xfs_cil_ctx	*ctx = cil->xc_ctx;
> @@ -526,6 +527,7 @@ xlog_cil_insert_items(
>  	}
>  	tp->t_ticket->t_curr_res -= len;
>  	ctx->space_used += len;
> +	ctx->space_used -= released_space;
>  
>  	/*
>  	 * If we've overrun the reservation, dump the tx details before we move
> @@ -970,11 +972,16 @@ xlog_cil_build_trans_hdr(
>   * Pull all the log vectors off the items in the CIL, and remove the items from
>   * the CIL. We don't need the CIL lock here because it's only needed on the
>   * transaction commit side which is currently locked out by the flush lock.
> + *
> + * If a log item is marked with a whiteout, we do not need to write it to the
> + * journal and so we just move them to the whiteout list for the caller to
> + * dispose of appropriately.
>   */
>  static void
>  xlog_cil_build_lv_chain(
>  	struct xfs_cil		*cil,
>  	struct xfs_cil_ctx	*ctx,
> +	struct list_head	*whiteouts,
>  	uint32_t		*num_iovecs,
>  	uint32_t		*num_bytes)
>  {
> @@ -985,6 +992,13 @@ xlog_cil_build_lv_chain(
>  
>  		item = list_first_entry(&cil->xc_cil,
>  					struct xfs_log_item, li_cil);
> +
> +		if (test_bit(XFS_LI_WHITEOUT, &item->li_flags)) {
> +			list_move(&item->li_cil, whiteouts);
> +			trace_xfs_cil_whiteout_skip(item);
> +			continue;
> +		}
> +
>  		list_del_init(&item->li_cil);
>  		if (!ctx->lv_chain)
>  			ctx->lv_chain = item->li_lv;
> @@ -1030,6 +1044,7 @@ xlog_cil_push_work(
>  	struct xfs_log_vec	lvhdr = { NULL };
>  	xfs_csn_t		push_seq;
>  	bool			push_commit_stable;
> +	LIST_HEAD		(whiteouts);
>  
>  	new_ctx = xlog_cil_ctx_alloc();
>  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
> @@ -1098,7 +1113,7 @@ xlog_cil_push_work(
>  	list_add(&ctx->committing, &cil->xc_committing);
>  	spin_unlock(&cil->xc_push_lock);
>  
> -	xlog_cil_build_lv_chain(cil, ctx, &num_iovecs, &num_bytes);
> +	xlog_cil_build_lv_chain(cil, ctx, &whiteouts, &num_iovecs, &num_bytes);
>  
>  	/*
>  	 * Switch the contexts so we can drop the context lock and move out
> @@ -1201,6 +1216,15 @@ xlog_cil_push_work(
>  	/* Not safe to reference ctx now! */
>  
>  	spin_unlock(&log->l_icloglock);
> +
> +	/* clean up log items that had whiteouts */
> +	while (!list_empty(&whiteouts)) {
> +		struct xfs_log_item *item = list_first_entry(&whiteouts,
> +						struct xfs_log_item, li_cil);
> +		list_del_init(&item->li_cil);
> +		trace_xfs_cil_whiteout_unpin(item);
> +		item->li_ops->iop_unpin(item, 1);
> +	}
>  	return;
>  
>  out_skip:
> @@ -1212,6 +1236,14 @@ xlog_cil_push_work(
>  out_abort_free_ticket:
>  	xfs_log_ticket_ungrant(log, ctx->ticket);
>  	ASSERT(xlog_is_shutdown(log));
> +	while (!list_empty(&whiteouts)) {
> +		struct xfs_log_item *item = list_first_entry(&whiteouts,
> +						struct xfs_log_item, li_cil);
> +		list_del_init(&item->li_cil);
> +		trace_xfs_cil_whiteout_unpin(item);
> +		item->li_ops->iop_unpin(item, 1);
> +	}
> +
>  	if (!ctx->commit_iclog) {
>  		xlog_cil_committed(ctx);
>  		return;
> @@ -1360,6 +1392,43 @@ xlog_cil_empty(
>  	return empty;
>  }
>  
> +/*
> + * If there are intent done items in this transaction and the related intent was
> + * committed in the current (same) CIL checkpoint, we don't need to write either
> + * the intent or intent done item to the journal as the change will be
> + * journalled atomically within this checkpoint. As we cannot remove items from
> + * the CIL here, mark the related intent with a whiteout so that the CIL push
> + * can remove it rather than writing it to the journal. Then remove the intent
> + * done item from the current transaction and release it so it doesn't get put
> + * into the CIL at all.
> + */
> +static uint32_t
> +xlog_cil_process_intents(
> +	struct xfs_cil		*cil,
> +	struct xfs_trans	*tp)
> +{
> +	struct xfs_log_item	*lip, *ilip, *next;
> +	uint32_t		len = 0;
> +
> +	list_for_each_entry_safe(lip, next, &tp->t_items, li_trans) {
> +		if (!(lip->li_ops->flags & XFS_ITEM_INTENT_DONE))
> +			continue;
> +
> +		ilip = lip->li_ops->iop_intent(lip);
> +		if (!ilip || !xlog_item_in_current_chkpt(cil, ilip))
> +			continue;
> +		set_bit(XFS_LI_WHITEOUT, &ilip->li_flags);
> +		trace_xfs_cil_whiteout_mark(ilip);
> +		len += ilip->li_lv->lv_bytes;
> +		kmem_free(ilip->li_lv);
> +		ilip->li_lv = NULL;
> +
> +		xfs_trans_del_item(lip);
> +		lip->li_ops->iop_release(lip);
> +	}
> +	return len;
> +}
> +
>  /*
>   * Commit a transaction with the given vector to the Committed Item List.
>   *
> @@ -1382,6 +1451,7 @@ xlog_cil_commit(
>  {
>  	struct xfs_cil		*cil = log->l_cilp;
>  	struct xfs_log_item	*lip, *next;
> +	uint32_t		released_space = 0;
>  
>  	/*
>  	 * Do all necessary memory allocation before we lock the CIL.
> @@ -1393,7 +1463,11 @@ xlog_cil_commit(
>  	/* lock out background commit */
>  	down_read(&cil->xc_ctx_lock);
>  
> -	xlog_cil_insert_items(log, tp);
> +	if (tp->t_flags & XFS_TRANS_HAS_INTENT_DONE)
> +		released_space = xlog_cil_process_intents(cil, tp);
> +
> +	xlog_cil_insert_items(log, tp, released_space);
> +	tp->t_ticket->t_curr_res += released_space;

I'm a little tired, so why isn't this adjustment a part of
xlog_cil_insert_items?  A similar adjustment is made to
ctx->space_used to release the unused space back to the committing tx,
right?

--D

>  
>  	if (regrant && !xlog_is_shutdown(log))
>  		xfs_log_ticket_regrant(log, tp->t_ticket);
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index 1b82b818f515..3fee1090e9a8 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -35,6 +35,7 @@ STATIC void
>  xfs_cui_item_free(
>  	struct xfs_cui_log_item	*cuip)
>  {
> +	kmem_free(cuip->cui_item.li_lv_shadow);
>  	if (cuip->cui_format.cui_nextents > XFS_CUI_MAX_FAST_EXTENTS)
>  		kmem_free(cuip);
>  	else
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index 99dfb3ae7e9c..546bd824cdf7 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -35,6 +35,7 @@ STATIC void
>  xfs_rui_item_free(
>  	struct xfs_rui_log_item	*ruip)
>  {
> +	kmem_free(ruip->rui_item.li_lv_shadow);
>  	if (ruip->rui_format.rui_nextents > XFS_RUI_MAX_FAST_EXTENTS)
>  		kmem_free(ruip);
>  	else
> @@ -228,6 +229,7 @@ xfs_rud_item_release(
>  {
>  	struct xfs_rud_log_item	*rudp = RUD_ITEM(lip);
>  
> +	kmem_free(rudp->rud_item.li_lv_shadow);
>  	xfs_rui_release(rudp->rud_ruip);
>  	kmem_cache_free(xfs_rud_cache, rudp);
>  }
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index e1197f9ad97e..75934e3c3f55 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1332,6 +1332,9 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_push);
>  DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned);
>  DEFINE_LOG_ITEM_EVENT(xfs_ail_locked);
>  DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing);
> +DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_mark);
> +DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_skip);
> +DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_unpin);
>  
>  DECLARE_EVENT_CLASS(xfs_ail_class,
>  	TP_PROTO(struct xfs_log_item *lip, xfs_lsn_t old_lsn, xfs_lsn_t new_lsn),
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index d72a5995d33e..9561f193e7e1 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -55,13 +55,15 @@ struct xfs_log_item {
>  #define	XFS_LI_IN_AIL	0
>  #define	XFS_LI_ABORTED	1
>  #define	XFS_LI_FAILED	2
> -#define	XFS_LI_DIRTY	3	/* log item dirty in transaction */
> +#define	XFS_LI_DIRTY	3
> +#define	XFS_LI_WHITEOUT	4
>  
>  #define XFS_LI_FLAGS \
>  	{ (1u << XFS_LI_IN_AIL),	"IN_AIL" }, \
>  	{ (1u << XFS_LI_ABORTED),	"ABORTED" }, \
>  	{ (1u << XFS_LI_FAILED),	"FAILED" }, \
> -	{ (1u << XFS_LI_DIRTY),		"DIRTY" }
> +	{ (1u << XFS_LI_DIRTY),		"DIRTY" }, \
> +	{ (1u << XFS_LI_WHITEOUT),	"WHITEOUT" }
>  
>  struct xfs_item_ops {
>  	unsigned flags;
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/8] xfs: hide log iovec alignment constraints
  2022-04-27  3:14   ` Darrick J. Wong
@ 2022-04-27  4:50     ` Dave Chinner
  2022-04-27 16:45       ` Darrick J. Wong
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  4:50 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 26, 2022 at 08:14:45PM -0700, Darrick J. Wong wrote:
> On Wed, Apr 27, 2022 at 12:22:52PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Callers currently have to round out the size of buffers to match the
> > aligment constraints of log iovecs and xlog_write(). They should not
> > need to know this detail, so introduce a new function to calculate
> > the iovec length (for use in ->iop_size implementations). Also
> > modify xlog_finish_iovec() to round up the length to the correct
> > alignment so the callers don't need to do this, either.
> > 
> > Convert the only user - inode forks - of this alignment rounding to
> > use the new interface.
> 
> Hmm.  So currently, we require that the inode fork buffer be rounded up
> to the next 4 bytes, and then I guess the log will copy that into the
> log iovec?  IOWs, if we have a 37-byte data fork, we'll allocate a 40
> byte buffer for the xfs_ifork, and the log will copy all 40 bytes into a
> 40 byte iovec.

Yes, that's how the current code works. It ends up leaking whatever
was in those 3 bytes into the shadow buffer that we then copy into
the log region. i.e. the existing code "leaks" non-zeroed allocated
memory to the journal.

> Now it looks like we'd allocate a 37-byte buffer for the xfs_ifork, but
> the log iovec will still be 40 bytes.  So ... do we copy 37 bytes out of
> the ifork buffer and zero the last 3 bytes in the iovec?

Yes, we copy 37 bytes out of the ifork buffer now into the shadow
buffer so we do not overrun the inode fork buffer.

> Does we leak
> kernel memory in those last 3 bytes?

We does indeed still leak the remaining 3 bytes as they are not
zeroed.

> Or do we copy 40 bytes and
> overrun?

No, we definitely don't do that - KASAN gets very unhappy when you
do that...

> It sorta looks like (at least for the local format case) xlog_copy_iovec
> will copy 37 bytes and leave the last 3 bytes of the iovec in whatever
> state it was in previously.  Is that zeroed?  Because it then looks like
> xlog_finish_iovec will round that 37 up to 40.

The shadow buffer is only partially zeroed - the part that makes io
the header and iovec pointer array is zeroed, but the region that
the journal data is written to is not zeroed.

> (FWIW I'm just checking for kernel memory exposure vectors here.)

Yup, I hadn't even considered that aspect of the code because we
aren't actually leaking anything to userspace. If an unprivileged
user can read 3 bytes of uninitialised data out of the journal we've
got much, much bigger security problems to deal with.

It should be trivial to fix, though. I'll do the initial fix as a
standalone patch, though, and then roll it into this one because the
problem has been around for a long while and fixing this patch
doesn't produce an easily backportable fix...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/8] xfs: don't commit the first deferred transaction without intents
  2022-04-27  3:03   ` Darrick J. Wong
@ 2022-04-27  4:52     ` Dave Chinner
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  4:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 26, 2022 at 08:03:30PM -0700, Darrick J. Wong wrote:
> On Wed, Apr 27, 2022 at 12:22:53PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > If the first operation in a string of defer ops has no intents,
> > then there is no reason to commit it before running the first call
> > to xfs_defer_finish_one(). This allows the defer ops to be used
> > effectively for non-intent based operations without requiring an
> > unnecessary extra transaction commit when first called.
> > 
> > This fixes a regression in per-attribute modification transaction
> > count when delayed attributes are not being used.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_defer.c | 29 +++++++++++++++++------------
> >  1 file changed, 17 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > index 0805ade2d300..66b4555bda8e 100644
> > --- a/fs/xfs/libxfs/xfs_defer.c
> > +++ b/fs/xfs/libxfs/xfs_defer.c
> > @@ -186,7 +186,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
> >  	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
> >  };
> >  
> > -static void
> > +static bool
> >  xfs_defer_create_intent(
> >  	struct xfs_trans		*tp,
> >  	struct xfs_defer_pending	*dfp,
> > @@ -197,6 +197,7 @@ xfs_defer_create_intent(
> >  	if (!dfp->dfp_intent)
> >  		dfp->dfp_intent = ops->create_intent(tp, &dfp->dfp_work,
> >  						     dfp->dfp_count, sort);
> > +	return dfp->dfp_intent;
> 
> Same comment as last time -- please make it more obvious that we're
> returning whether or not ->create_intent actually added a log item:
> 
> 	return dfp->dfp_intent != NULL;

Oh, sorry, I must have missed that. My fault! Fixed.

> and not returning the log intent item itself.
> 
> Otherwise looks ok, so with that fixed,
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c
  2022-04-27  3:15   ` Darrick J. Wong
@ 2022-04-27  4:56     ` Dave Chinner
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  4:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 26, 2022 at 08:15:30PM -0700, Darrick J. Wong wrote:
> On Wed, Apr 27, 2022 at 12:22:56PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > In preparation for adding support for intent item whiteouts.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Looks like a straight hoist?

Yup, I pulled this from the CIL scalability patchset because it
provides better isolation for the upcoming whiteout changes in this
code.

> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 8/8] xfs: intent item whiteouts
  2022-04-27  3:32   ` Darrick J. Wong
@ 2022-04-27  5:47     ` Dave Chinner
  2022-04-27 17:31       ` Darrick J. Wong
  0 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2022-04-27  5:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Apr 26, 2022 at 08:32:52PM -0700, Darrick J. Wong wrote:
> On Wed, Apr 27, 2022 at 12:22:59PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When we log modifications based on intents, we add both intent
> > and intent done items to the modification being made. These get
> > written to the log to ensure that the operation is re-run if the
> > intent done is not found in the log.
> > 
> > However, for operations that complete wholly within a single
> > checkpoint, the change in the checkpoint is atomic and will never
> > need replay. In this case, we don't need to actually write the
> > intent and intent done items to the journal because log recovery
> > will never need to manually restart this modification.
> > 
> > Log recovery currently handles intent/intent done matching by
> > inserting the intent into the AIL, then removing it when a matching
> > intent done item is found. Hence for all the intent-based operations
> > that complete within a checkpoint, we spend all that time parsing
> > the intent/intent done items just to cancel them and do nothing with
> > them.
> > 
> > Hence it follows that the only time we actually need intents in the
> > log is when the modification crosses checkpoint boundaries in the
> > log and so may only be partially complete in the journal. Hence if
> > we commit and intent done item to the CIL and the intent item is in
> > the same checkpoint, we don't actually have to write them to the
> > journal because log recovery will always cancel the intents.
> > 
> > We've never really worried about the overhead of logging intents
> > unnecessarily like this because the intents we log are generally
> > very much smaller than the change being made. e.g. freeing an extent
> > involves modifying at lease two freespace btree blocks and the AGF,
> > so the EFI/EFD overhead is only a small increase in space and
> > processing time compared to the overall cost of freeing an extent.
> 
> Question: If we whiteout enough intent items, does that make it possible
> to cram more updates into a checkpoint?

Yes - we release the space the cancelled intent pair used from the
ctx->space_used counter that tracks the size of the CIL checkpoint.

> Are changes required to the existing intent item code to support
> whiteouts, or does the log code autodetect an *I/*D pair in the same
> checkpoint and elide them automatically?

The log code automagically detects it. That's what the ->iop_intent
op is for - when a done intent committed, it looks up it's intent
pair via ->iop_intent and then checks if it is in the current
checkpoint via xlog_item_in_current_chkpt() and if that returns true
then we place a whiteout on the intent and release the space it
consumes.

We don't cull the intent from the CIL until the context checkpoint
commits - we could remove it immediately, but then when the CIL
scalability code gets placed on top of this we can't remove log
items from the per-cpu CIL in transaction commit context and so we
have to use whiteouts to delay removal to the push context. So I
just implemented it that way to start with....

> (I might be fishing here for "Does this make generic/447 faster when it
> deletes the million extents?")

I think it knocks it down a little - maybe 10%? One machine would
appear to drop 126->116s, another it is 52->48s.

The intents are generally tiny compared to the rest of the changes
being (re-)logged, so I'm not expecting miracles for the
rmap/reflink code. The big wins come when intents contain large
chunks of data...

> > diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> > index 59aa5f9bf769..670d074a71cc 100644
> > --- a/fs/xfs/xfs_bmap_item.c
> > +++ b/fs/xfs/xfs_bmap_item.c
> > @@ -39,6 +39,7 @@ STATIC void
> >  xfs_bui_item_free(
> >  	struct xfs_bui_log_item	*buip)
> >  {
> > +	kmem_free(buip->bui_item.li_lv_shadow);
> 
> Why is it necessary for log items to free their own shadow buffer?

Twisty unpin passages...

Intents with whiteouts on them were leaking them when they
were unpinned from the whiteout list in xlog_cil_push_work(). The
log vectors no longer get attached to the CIL context and freed
via xlog_cil_committed()->xlog_cil_free_logvec(), and so when they
are unpinned by xlog_cil_push_work() the last reference is released
and we have to free the log vector attached to the item as it is
still attached.

The reason we can't do it directly from ->iop_unpin() is that we
also call ->iop_unpin from xlog_cil_committed()->
xfs_trans_committed_bulk(), and if we are aborting there we do not
want to free the shadow buffer because it is still linked into the
lv chain attached to the CIL ctx and will get freed once
xfs_trans_committed_bulk() returns....

> > @@ -1393,7 +1463,11 @@ xlog_cil_commit(
> >  	/* lock out background commit */
> >  	down_read(&cil->xc_ctx_lock);
> >  
> > -	xlog_cil_insert_items(log, tp);
> > +	if (tp->t_flags & XFS_TRANS_HAS_INTENT_DONE)
> > +		released_space = xlog_cil_process_intents(cil, tp);
> > +
> > +	xlog_cil_insert_items(log, tp, released_space);
> > +	tp->t_ticket->t_curr_res += released_space;
> 
> I'm a little tired, so why isn't this adjustment a part of
> xlog_cil_insert_items?  A similar adjustment is made to
> ctx->space_used to release the unused space back to the committing tx,
> right?

Probably because it was a bug fix I added at some point and not
original code....

I'm not fussed where it ends up - I can move it if you want.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/8] xfs: hide log iovec alignment constraints
  2022-04-27  4:50     ` Dave Chinner
@ 2022-04-27 16:45       ` Darrick J. Wong
  0 siblings, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2022-04-27 16:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 02:50:34PM +1000, Dave Chinner wrote:
> On Tue, Apr 26, 2022 at 08:14:45PM -0700, Darrick J. Wong wrote:
> > On Wed, Apr 27, 2022 at 12:22:52PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Callers currently have to round out the size of buffers to match the
> > > aligment constraints of log iovecs and xlog_write(). They should not
> > > need to know this detail, so introduce a new function to calculate
> > > the iovec length (for use in ->iop_size implementations). Also
> > > modify xlog_finish_iovec() to round up the length to the correct
> > > alignment so the callers don't need to do this, either.
> > > 
> > > Convert the only user - inode forks - of this alignment rounding to
> > > use the new interface.
> > 
> > Hmm.  So currently, we require that the inode fork buffer be rounded up
> > to the next 4 bytes, and then I guess the log will copy that into the
> > log iovec?  IOWs, if we have a 37-byte data fork, we'll allocate a 40
> > byte buffer for the xfs_ifork, and the log will copy all 40 bytes into a
> > 40 byte iovec.
> 
> Yes, that's how the current code works. It ends up leaking whatever
> was in those 3 bytes into the shadow buffer that we then copy into
> the log region. i.e. the existing code "leaks" non-zeroed allocated
> memory to the journal.
> 
> > Now it looks like we'd allocate a 37-byte buffer for the xfs_ifork, but
> > the log iovec will still be 40 bytes.  So ... do we copy 37 bytes out of
> > the ifork buffer and zero the last 3 bytes in the iovec?
> 
> Yes, we copy 37 bytes out of the ifork buffer now into the shadow
> buffer so we do not overrun the inode fork buffer.
> 
> > Does we leak
> > kernel memory in those last 3 bytes?
> 
> We does indeed still leak the remaining 3 bytes as they are not
> zeroed.
> 
> > Or do we copy 40 bytes and
> > overrun?
> 
> No, we definitely don't do that - KASAN gets very unhappy when you
> do that...
> 
> > It sorta looks like (at least for the local format case) xlog_copy_iovec
> > will copy 37 bytes and leave the last 3 bytes of the iovec in whatever
> > state it was in previously.  Is that zeroed?  Because it then looks like
> > xlog_finish_iovec will round that 37 up to 40.
> 
> The shadow buffer is only partially zeroed - the part that makes io
> the header and iovec pointer array is zeroed, but the region that
> the journal data is written to is not zeroed.
> 
> > (FWIW I'm just checking for kernel memory exposure vectors here.)
> 
> Yup, I hadn't even considered that aspect of the code because we
> aren't actually leaking anything to userspace. If an unprivileged
> user can read 3 bytes of uninitialised data out of the journal we've
> got much, much bigger security problems to deal with.
> 
> It should be trivial to fix, though. I'll do the initial fix as a
> standalone patch, though, and then roll it into this one because the
> problem has been around for a long while and fixing this patch
> doesn't produce an easily backportable fix...

<nod> I agree that it's a very minor disclosure vulnerability (certainly
less severe than ALLOCSP) since you'd need CAP_SYS_RAWIO to exploit it.
But certainly worth patching before someone discovers that a former
pagecache page with your credit card numbers on it got recycled into a
log vector page.  Thanks for doing the fix. :)

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 8/8] xfs: intent item whiteouts
  2022-04-27  5:47     ` Dave Chinner
@ 2022-04-27 17:31       ` Darrick J. Wong
  2022-04-27 22:05         ` Dave Chinner
  0 siblings, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2022-04-27 17:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 03:47:57PM +1000, Dave Chinner wrote:
> On Tue, Apr 26, 2022 at 08:32:52PM -0700, Darrick J. Wong wrote:
> > On Wed, Apr 27, 2022 at 12:22:59PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > When we log modifications based on intents, we add both intent
> > > and intent done items to the modification being made. These get
> > > written to the log to ensure that the operation is re-run if the
> > > intent done is not found in the log.
> > > 
> > > However, for operations that complete wholly within a single
> > > checkpoint, the change in the checkpoint is atomic and will never
> > > need replay. In this case, we don't need to actually write the
> > > intent and intent done items to the journal because log recovery
> > > will never need to manually restart this modification.
> > > 
> > > Log recovery currently handles intent/intent done matching by
> > > inserting the intent into the AIL, then removing it when a matching
> > > intent done item is found. Hence for all the intent-based operations
> > > that complete within a checkpoint, we spend all that time parsing
> > > the intent/intent done items just to cancel them and do nothing with
> > > them.
> > > 
> > > Hence it follows that the only time we actually need intents in the
> > > log is when the modification crosses checkpoint boundaries in the
> > > log and so may only be partially complete in the journal. Hence if
> > > we commit and intent done item to the CIL and the intent item is in
> > > the same checkpoint, we don't actually have to write them to the
> > > journal because log recovery will always cancel the intents.
> > > 
> > > We've never really worried about the overhead of logging intents
> > > unnecessarily like this because the intents we log are generally
> > > very much smaller than the change being made. e.g. freeing an extent
> > > involves modifying at lease two freespace btree blocks and the AGF,
> > > so the EFI/EFD overhead is only a small increase in space and
> > > processing time compared to the overall cost of freeing an extent.
> > 
> > Question: If we whiteout enough intent items, does that make it possible
> > to cram more updates into a checkpoint?
> 
> Yes - we release the space the cancelled intent pair used from the
> ctx->space_used counter that tracks the size of the CIL checkpoint.

<nod> Good, that's what I was thinking.

> > Are changes required to the existing intent item code to support
> > whiteouts, or does the log code autodetect an *I/*D pair in the same
> > checkpoint and elide them automatically?
> 
> The log code automagically detects it. That's what the ->iop_intent
> op is for - when a done intent committed, it looks up it's intent
> pair via ->iop_intent and then checks if it is in the current
> checkpoint via xlog_item_in_current_chkpt() and if that returns true
> then we place a whiteout on the intent and release the space it
> consumes.
> 
> We don't cull the intent from the CIL until the context checkpoint
> commits - we could remove it immediately, but then when the CIL
> scalability code gets placed on top of this we can't remove log
> items from the per-cpu CIL in transaction commit context and so we
> have to use whiteouts to delay removal to the push context. So I
> just implemented it that way to start with....

<nod> Sounds reasonable to me.

> > (I might be fishing here for "Does this make generic/447 faster when it
> > deletes the million extents?")
> 
> I think it knocks it down a little - maybe 10%? One machine would
> appear to drop 126->116s, another it is 52->48s.
> 
> The intents are generally tiny compared to the rest of the changes
> being (re-)logged, so I'm not expecting miracles for the
> rmap/reflink code. The big wins come when intents contain large
> chunks of data...

I'll take a 10% win. :)

> > > diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> > > index 59aa5f9bf769..670d074a71cc 100644
> > > --- a/fs/xfs/xfs_bmap_item.c
> > > +++ b/fs/xfs/xfs_bmap_item.c
> > > @@ -39,6 +39,7 @@ STATIC void
> > >  xfs_bui_item_free(
> > >  	struct xfs_bui_log_item	*buip)
> > >  {
> > > +	kmem_free(buip->bui_item.li_lv_shadow);
> > 
> > Why is it necessary for log items to free their own shadow buffer?
> 
> Twisty unpin passages...
> 
> Intents with whiteouts on them were leaking them when they
> were unpinned from the whiteout list in xlog_cil_push_work(). The
> log vectors no longer get attached to the CIL context and freed
> via xlog_cil_committed()->xlog_cil_free_logvec(), and so when they
> are unpinned by xlog_cil_push_work() the last reference is released
> and we have to free the log vector attached to the item as it is
> still attached.
> 
> The reason we can't do it directly from ->iop_unpin() is that we
> also call ->iop_unpin from xlog_cil_committed()->
> xfs_trans_committed_bulk(), and if we are aborting there we do not
> want to free the shadow buffer because it is still linked into the
> lv chain attached to the CIL ctx and will get freed once
> xfs_trans_committed_bulk() returns....

Huh.  So now that I'm more awake, I noticed that you didn't patch
xfs_extfree_item.c to free the shadow buffers because the
xfs_ef[id]_item_free functions already have code to free the shadow
buffer.  git blame says that was added in:

b1c5ebb21301 ("xfs: allocate log vector buffers outside CIL context lock")

This commit was added in 4.8-rc1, and just prior to merging the rmap
patches.

When do log intent items get shadow buffers, since they should only be
committed once?  Looking at that old commit, I think what's going on is
that we preallocate the shadow buffers for every log item at commit time
to avoid a memory allocation when we have the ctx lock, so now it's
necessary for all log intent items to free them?

Does that mean RUI/CUI/BUI log items could have been leaking shadow
buffers since the beginning, and we just haven't noticed because the CIL
has freed them for us?  Which means that the changes to xfs_*_item.c
could, in theory, be a separate patch that fixes a theoretical memory
leak?

(I'm not asking for you to separate the changes; I'm checking my
understanding of something that caused me to go "Eh???" on first
reading.)

> > > @@ -1393,7 +1463,11 @@ xlog_cil_commit(
> > >  	/* lock out background commit */
> > >  	down_read(&cil->xc_ctx_lock);
> > >  
> > > -	xlog_cil_insert_items(log, tp);
> > > +	if (tp->t_flags & XFS_TRANS_HAS_INTENT_DONE)
> > > +		released_space = xlog_cil_process_intents(cil, tp);
> > > +
> > > +	xlog_cil_insert_items(log, tp, released_space);
> > > +	tp->t_ticket->t_curr_res += released_space;
> > 
> > I'm a little tired, so why isn't this adjustment a part of
> > xlog_cil_insert_items?  A similar adjustment is made to
> > ctx->space_used to release the unused space back to the committing tx,
> > right?
> 
> Probably because it was a bug fix I added at some point and not
> original code....
> 
> I'm not fussed where it ends up - I can move it if you want.

Yes please, since xlog_cil_insert_items already adjusts
tp->t_ticket->t_curr_res and it would seem to make more sense to keep
all those adjustments together.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 8/8] xfs: intent item whiteouts
  2022-04-27 17:31       ` Darrick J. Wong
@ 2022-04-27 22:05         ` Dave Chinner
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Chinner @ 2022-04-27 22:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 10:31:45AM -0700, Darrick J. Wong wrote:
> On Wed, Apr 27, 2022 at 03:47:57PM +1000, Dave Chinner wrote:
> > > > diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
> > > > index 59aa5f9bf769..670d074a71cc 100644
> > > > --- a/fs/xfs/xfs_bmap_item.c
> > > > +++ b/fs/xfs/xfs_bmap_item.c
> > > > @@ -39,6 +39,7 @@ STATIC void
> > > >  xfs_bui_item_free(
> > > >  	struct xfs_bui_log_item	*buip)
> > > >  {
> > > > +	kmem_free(buip->bui_item.li_lv_shadow);
> > > 
> > > Why is it necessary for log items to free their own shadow buffer?
> > 
> > Twisty unpin passages...
> > 
> > Intents with whiteouts on them were leaking them when they
> > were unpinned from the whiteout list in xlog_cil_push_work(). The
> > log vectors no longer get attached to the CIL context and freed
> > via xlog_cil_committed()->xlog_cil_free_logvec(), and so when they
> > are unpinned by xlog_cil_push_work() the last reference is released
> > and we have to free the log vector attached to the item as it is
> > still attached.
> > 
> > The reason we can't do it directly from ->iop_unpin() is that we
> > also call ->iop_unpin from xlog_cil_committed()->
> > xfs_trans_committed_bulk(), and if we are aborting there we do not
> > want to free the shadow buffer because it is still linked into the
> > lv chain attached to the CIL ctx and will get freed once
> > xfs_trans_committed_bulk() returns....
> 
> Huh.  So now that I'm more awake, I noticed that you didn't patch
> xfs_extfree_item.c to free the shadow buffers because the
> xfs_ef[id]_item_free functions already have code to free the shadow
> buffer.  git blame says that was added in:
> 
> b1c5ebb21301 ("xfs: allocate log vector buffers outside CIL context lock")
> 
> This commit was added in 4.8-rc1, and just prior to merging the rmap
> patches.

Right - that was the commit that introduced the shadow buffers...

> When do log intent items get shadow buffers, since they should only be
> committed once?  Looking at that old commit, I think what's going on is
> that we preallocate the shadow buffers for every log item at commit time
> to avoid a memory allocation when we have the ctx lock,

Yes.

> so now it's
> necessary for all log intent items to free them?

Ever since commit b1c5ebb21301 it's been necessary in certain
situations. Not just for intents, but for all log items that are
logged to the journal. i.e. inode, dquot and buffer log items free
li_lv_shadow in their destroy routines.

Essentially, until the last reference to the log item goes away, we
don't know if the CIL holds the other reference to the log item and
so may be actively using the shadow buffer. Hence the only time it
is actually safe to free the shadow buffer is when there are no
remaining references to the log item.

> Does that mean RUI/CUI/BUI log items could have been leaking shadow
> buffers since the beginning, and we just haven't noticed because the CIL
> has freed them for us?  Which means that the changes to xfs_*_item.c
> could, in theory, be a separate patch that fixes a theoretical memory
> leak?

Thinking on it, in theory you are right. In practice, I think this
risk is very low, and KASAN certainly tells us it pretty much isn't
occuring during testing.

AFAICT the only likely time it was occurring is during forced
shutdowns when transactions are being cancelled between intent
commit and done-intent create/link. Once the done-intent is linked
to the intent, cleanup on shutdown seems to works correctly and we
don't have leaks occurring.

However, whiteouts effectively release the done-intent during commit
whilst leaving the whiteout intent as the sole reference to the log
item in the CIL, which then unpin-aborts it to clean it up rather
than chains it and frees it on checkpoint completion. Hence whiteouts
effectively drive a bulldozer through this window and so it was
leaking a BUI/RUI/CUI on every whiteout cancellation as the
CIL wasn't chaining and freeing the log vector that was built for
the commit.

> (I'm not asking for you to separate the changes; I'm checking my
> understanding of something that caused me to go "Eh???" on first
> reading.)
> 
> > > > @@ -1393,7 +1463,11 @@ xlog_cil_commit(
> > > >  	/* lock out background commit */
> > > >  	down_read(&cil->xc_ctx_lock);
> > > >  
> > > > -	xlog_cil_insert_items(log, tp);
> > > > +	if (tp->t_flags & XFS_TRANS_HAS_INTENT_DONE)
> > > > +		released_space = xlog_cil_process_intents(cil, tp);
> > > > +
> > > > +	xlog_cil_insert_items(log, tp, released_space);
> > > > +	tp->t_ticket->t_curr_res += released_space;
> > > 
> > > I'm a little tired, so why isn't this adjustment a part of
> > > xlog_cil_insert_items?  A similar adjustment is made to
> > > ctx->space_used to release the unused space back to the committing tx,
> > > right?
> > 
> > Probably because it was a bug fix I added at some point and not
> > original code....
> > 
> > I'm not fussed where it ends up - I can move it if you want.
> 
> Yes please, since xlog_cil_insert_items already adjusts
> tp->t_ticket->t_curr_res and it would seem to make more sense to keep
> all those adjustments together.

Ok, will do.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/8] xfs: hide log iovec alignment constraints
  2022-04-27  2:22 ` [PATCH 1/8] xfs: hide log iovec alignment constraints Dave Chinner
  2022-04-27  3:14   ` Darrick J. Wong
@ 2022-04-28 13:00   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2022-04-28 13:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/8] xfs: don't commit the first deferred transaction without intents
  2022-04-27  2:22 ` [PATCH 2/8] xfs: don't commit the first deferred transaction without intents Dave Chinner
  2022-04-27  3:03   ` Darrick J. Wong
@ 2022-04-28 13:02   ` Christoph Hellwig
  2022-04-30 17:02   ` Alli
  2 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2022-04-28 13:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 12:22:53PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> If the first operation in a string of defer ops has no intents,
> then there is no reason to commit it before running the first call
> to xfs_defer_finish_one(). This allows the defer ops to be used
> effectively for non-intent based operations without requiring an
> unnecessary extra transaction commit when first called.
> 
> This fixes a regression in per-attribute modification transaction
> count when delayed attributes are not being used.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 0805ade2d300..66b4555bda8e 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -186,7 +186,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
>  	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
>  };
>  
> -static void
> +static bool
>  xfs_defer_create_intent(
>  	struct xfs_trans		*tp,
>  	struct xfs_defer_pending	*dfp,
> @@ -197,6 +197,7 @@ xfs_defer_create_intent(
>  	if (!dfp->dfp_intent)
>  		dfp->dfp_intent = ops->create_intent(tp, &dfp->dfp_work,
>  						     dfp->dfp_count, sort);
> +	return dfp->dfp_intent;
>  }
>  
>  /*
> @@ -204,16 +205,18 @@ xfs_defer_create_intent(
>   * associated extents, then add the entire intake list to the end of
>   * the pending list.
>   */
> -STATIC void
> +static bool
>  xfs_defer_create_intents(
>  	struct xfs_trans		*tp)
>  {
>  	struct xfs_defer_pending	*dfp;
> +	bool				ret = false;
>  
>  	list_for_each_entry(dfp, &tp->t_dfops, dfp_list) {
>  		trace_xfs_defer_create_intent(tp->t_mountp, dfp);
> -		xfs_defer_create_intent(tp, dfp, true);
> +		ret |= xfs_defer_create_intent(tp, dfp, true);
>  	}
> +	return ret;
>  }
>  
>  /* Abort all the intents that were committed. */
> @@ -487,7 +490,7 @@ int
>  xfs_defer_finish_noroll(
>  	struct xfs_trans		**tp)
>  {
> -	struct xfs_defer_pending	*dfp;
> +	struct xfs_defer_pending	*dfp = NULL;
>  	int				error = 0;
>  	LIST_HEAD(dop_pending);
>  
> @@ -506,17 +509,19 @@ xfs_defer_finish_noroll(
>  		 * of time that any one intent item can stick around in memory,
>  		 * pinning the log tail.
>  		 */
> -		xfs_defer_create_intents(*tp);
> +		bool has_intents = xfs_defer_create_intents(*tp);
>  		list_splice_init(&(*tp)->t_dfops, &dop_pending);

While we're nitpicking:  an emptry line under the variable declaration
would be nice to have.

> +			/* Possibly relog intent items to keep the log moving. */

And a really annoying 81 character line that just wraps around here.

But except for the nitpicks this looks good to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/8] xfs: add log item flags to indicate intents
  2022-04-27  3:04   ` Darrick J. Wong
@ 2022-04-28 13:04     ` Christoph Hellwig
  0 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2022-04-28 13:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Tue, Apr 26, 2022 at 08:04:39PM -0700, Darrick J. Wong wrote:
> > We need to be able to identify intent and intent done items quickly
> > and easily in upcoming patches, so simply add intent and intent done
> > type flags to the log item ops flags. These are static flags to
> > begin with, so intent items should have been typed like this from
> > the start.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
> 
> Heh, I remember being told to infer intentness or intentdoneness instead
> of using explicit flags...
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

And I still think having this information redundant and not inferred
seems like a bad move.  But I'll scan the following patchs to see if
there is a good rationale for it, because based on this patch alone
it doesn't actually seem like a good idea to me.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/8] xfs: tag transactions that contain intent done items
  2022-04-27  2:22 ` [PATCH 4/8] xfs: tag transactions that contain intent done items Dave Chinner
  2022-04-27  3:06   ` Darrick J. Wong
@ 2022-04-28 13:05   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2022-04-28 13:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c
  2022-04-27  2:22 ` [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c Dave Chinner
  2022-04-27  3:15   ` Darrick J. Wong
@ 2022-04-28 13:06   ` Christoph Hellwig
  2022-04-29  1:56   ` Alli
  2 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2022-04-28 13:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 6/8] xfs: add log item method to return related intents
  2022-04-27  2:22 ` [PATCH 6/8] xfs: add log item method to return related intents Dave Chinner
  2022-04-27  3:18   ` Darrick J. Wong
@ 2022-04-28 13:10   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2022-04-28 13:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL
  2022-04-27  2:22 ` [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL Dave Chinner
  2022-04-27  3:19   ` Darrick J. Wong
@ 2022-04-28 13:15   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2022-04-28 13:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Apr 27, 2022 at 12:22:58PM +1000, Dave Chinner wrote:
> +	if (!atomic_dec_and_test(&buip->bui_refcount))
> +		return;
> +
> +	if (test_bit(XFS_LI_IN_AIL, &buip->bui_item.li_flags))
>  		xfs_trans_ail_delete(&buip->bui_item, SHUTDOWN_LOG_IO_ERROR);

Wouldn't it make more sense to just call xfs_trans_ail_delete with
shutdown_type set to 0 here?  That mean there will be another ail_lock
cycle if it isn't in the AIL (do we even need that the I think single
remaining other caller?), but the code would be simpler and 
self-documenting.

Same for the other instances.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 8/8] xfs: intent item whiteouts
  2022-04-27  2:22 ` [PATCH 8/8] xfs: intent item whiteouts Dave Chinner
  2022-04-27  3:32   ` Darrick J. Wong
@ 2022-04-28 13:22   ` Christoph Hellwig
  2022-04-28 21:38     ` Dave Chinner
  1 sibling, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2022-04-28 13:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

> --- a/fs/xfs/xfs_bmap_item.c
> +++ b/fs/xfs/xfs_bmap_item.c
> @@ -39,6 +39,7 @@ STATIC void
>  xfs_bui_item_free(
>  	struct xfs_bui_log_item	*buip)
>  {
> +	kmem_free(buip->bui_item.li_lv_shadow);
>  	kmem_cache_free(xfs_bui_cache, buip);
>  }

Based on the discussion with Darrick:  what about splitting adding
the frees of the shadow buffers into a separate prep patch?

> +	/* clean up log items that had whiteouts */
> +	while (!list_empty(&whiteouts)) {
> +		struct xfs_log_item *item = list_first_entry(&whiteouts,
> +						struct xfs_log_item, li_cil);
> +		list_del_init(&item->li_cil);
> +		trace_xfs_cil_whiteout_unpin(item);
> +		item->li_ops->iop_unpin(item, 1);
> +	}
>  	return;
>  
>  out_skip:
> @@ -1212,6 +1236,14 @@ xlog_cil_push_work(
>  out_abort_free_ticket:
>  	xfs_log_ticket_ungrant(log, ctx->ticket);
>  	ASSERT(xlog_is_shutdown(log));
> +	while (!list_empty(&whiteouts)) {
> +		struct xfs_log_item *item = list_first_entry(&whiteouts,
> +						struct xfs_log_item, li_cil);
> +		list_del_init(&item->li_cil);
> +		trace_xfs_cil_whiteout_unpin(item);
> +		item->li_ops->iop_unpin(item, 1);
> +	}

Sees like this would benefit from a little helper instead of
duplicating the logic?

Otherwise this does look surprisingly nice and simple:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 8/8] xfs: intent item whiteouts
  2022-04-28 13:22   ` Christoph Hellwig
@ 2022-04-28 21:38     ` Dave Chinner
  0 siblings, 0 replies; 35+ messages in thread
From: Dave Chinner @ 2022-04-28 21:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Thu, Apr 28, 2022 at 06:22:06AM -0700, Christoph Hellwig wrote:
> > --- a/fs/xfs/xfs_bmap_item.c
> > +++ b/fs/xfs/xfs_bmap_item.c
> > @@ -39,6 +39,7 @@ STATIC void
> >  xfs_bui_item_free(
> >  	struct xfs_bui_log_item	*buip)
> >  {
> > +	kmem_free(buip->bui_item.li_lv_shadow);
> >  	kmem_cache_free(xfs_bui_cache, buip);
> >  }
> 
> Based on the discussion with Darrick:  what about splitting adding
> the frees of the shadow buffers into a separate prep patch?

Ok, I can do that easily enough.

> > +	/* clean up log items that had whiteouts */
> > +	while (!list_empty(&whiteouts)) {
> > +		struct xfs_log_item *item = list_first_entry(&whiteouts,
> > +						struct xfs_log_item, li_cil);
> > +		list_del_init(&item->li_cil);
> > +		trace_xfs_cil_whiteout_unpin(item);
> > +		item->li_ops->iop_unpin(item, 1);
> > +	}
> >  	return;
> >  
> >  out_skip:
> > @@ -1212,6 +1236,14 @@ xlog_cil_push_work(
> >  out_abort_free_ticket:
> >  	xfs_log_ticket_ungrant(log, ctx->ticket);
> >  	ASSERT(xlog_is_shutdown(log));
> > +	while (!list_empty(&whiteouts)) {
> > +		struct xfs_log_item *item = list_first_entry(&whiteouts,
> > +						struct xfs_log_item, li_cil);
> > +		list_del_init(&item->li_cil);
> > +		trace_xfs_cil_whiteout_unpin(item);
> > +		item->li_ops->iop_unpin(item, 1);
> > +	}
> 
> Sees like this would benefit from a little helper instead of
> duplicating the logic?

Yeah, when I looked over this yesterday I was in two minds whether a
helper was justified or not. Seeing as you've suggested it too, I'll
factor it out....

> Otherwise this does look surprisingly nice and simple:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c
  2022-04-27  2:22 ` [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c Dave Chinner
  2022-04-27  3:15   ` Darrick J. Wong
  2022-04-28 13:06   ` Christoph Hellwig
@ 2022-04-29  1:56   ` Alli
  2 siblings, 0 replies; 35+ messages in thread
From: Alli @ 2022-04-29  1:56 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Wed, 2022-04-27 at 12:22 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In preparation for adding support for intent item whiteouts.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Looks good to me
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
>  fs/xfs/xfs_log_cil.c | 119 ++++++++++++++++++++++++-----------------
> --
>  1 file changed, 67 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index e5ab62f08c19..0d8d092447ad 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -47,6 +47,38 @@ xlog_cil_ticket_alloc(
>  	return tic;
>  }
>  
> +/*
> + * Check if the current log item was first committed in this
> sequence.
> + * We can't rely on just the log item being in the CIL, we have to
> check
> + * the recorded commit sequence number.
> + *
> + * Note: for this to be used in a non-racy manner, it has to be
> called with
> + * CIL flushing locked out. As a result, it should only be used
> during the
> + * transaction commit process when deciding what to format into the
> item.
> + */
> +static bool
> +xlog_item_in_current_chkpt(
> +	struct xfs_cil		*cil,
> +	struct xfs_log_item	*lip)
> +{
> +	if (list_empty(&lip->li_cil))
> +		return false;
> +
> +	/*
> +	 * li_seq is written on the first commit of a log item to
> record the
> +	 * first checkpoint it is written to. Hence if it is different
> to the
> +	 * current sequence, we're in a new checkpoint.
> +	 */
> +	return lip->li_seq == READ_ONCE(cil->xc_current_sequence);
> +}
> +
> +bool
> +xfs_log_item_in_current_chkpt(
> +	struct xfs_log_item *lip)
> +{
> +	return xlog_item_in_current_chkpt(lip->li_log->l_cilp, lip);
> +}
> +
>  /*
>   * Unavoidable forward declaration - xlog_cil_push_work() calls
>   * xlog_cil_ctx_alloc() itself.
> @@ -934,6 +966,40 @@ xlog_cil_build_trans_hdr(
>  	tic->t_curr_res -= lvhdr->lv_bytes;
>  }
>  
> +/*
> + * Pull all the log vectors off the items in the CIL, and remove the
> items from
> + * the CIL. We don't need the CIL lock here because it's only needed
> on the
> + * transaction commit side which is currently locked out by the
> flush lock.
> + */
> +static void
> +xlog_cil_build_lv_chain(
> +	struct xfs_cil		*cil,
> +	struct xfs_cil_ctx	*ctx,
> +	uint32_t		*num_iovecs,
> +	uint32_t		*num_bytes)
> +{
> +	struct xfs_log_vec	*lv = NULL;
> +
> +	while (!list_empty(&cil->xc_cil)) {
> +		struct xfs_log_item	*item;
> +
> +		item = list_first_entry(&cil->xc_cil,
> +					struct xfs_log_item, li_cil);
> +		list_del_init(&item->li_cil);
> +		if (!ctx->lv_chain)
> +			ctx->lv_chain = item->li_lv;
> +		else
> +			lv->lv_next = item->li_lv;
> +		lv = item->li_lv;
> +		item->li_lv = NULL;
> +		*num_iovecs += lv->lv_niovecs;
> +
> +		/* we don't write ordered log vectors */
> +		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
> +			*num_bytes += lv->lv_bytes;
> +	}
> +}
> +
>  /*
>   * Push the Committed Item List to the log.
>   *
> @@ -956,7 +1022,6 @@ xlog_cil_push_work(
>  		container_of(work, struct xfs_cil_ctx, push_work);
>  	struct xfs_cil		*cil = ctx->cil;
>  	struct xlog		*log = cil->xc_log;
> -	struct xfs_log_vec	*lv;
>  	struct xfs_cil_ctx	*new_ctx;
>  	int			num_iovecs = 0;
>  	int			num_bytes = 0;
> @@ -1033,31 +1098,7 @@ xlog_cil_push_work(
>  	list_add(&ctx->committing, &cil->xc_committing);
>  	spin_unlock(&cil->xc_push_lock);
>  
> -	/*
> -	 * Pull all the log vectors off the items in the CIL, and
> remove the
> -	 * items from the CIL. We don't need the CIL lock here because
> it's only
> -	 * needed on the transaction commit side which is currently
> locked out
> -	 * by the flush lock.
> -	 */
> -	lv = NULL;
> -	while (!list_empty(&cil->xc_cil)) {
> -		struct xfs_log_item	*item;
> -
> -		item = list_first_entry(&cil->xc_cil,
> -					struct xfs_log_item, li_cil);
> -		list_del_init(&item->li_cil);
> -		if (!ctx->lv_chain)
> -			ctx->lv_chain = item->li_lv;
> -		else
> -			lv->lv_next = item->li_lv;
> -		lv = item->li_lv;
> -		item->li_lv = NULL;
> -		num_iovecs += lv->lv_niovecs;
> -
> -		/* we don't write ordered log vectors */
> -		if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED)
> -			num_bytes += lv->lv_bytes;
> -	}
> +	xlog_cil_build_lv_chain(cil, ctx, &num_iovecs, &num_bytes);
>  
>  	/*
>  	 * Switch the contexts so we can drop the context lock and move
> out
> @@ -1508,32 +1549,6 @@ xlog_cil_force_seq(
>  	return 0;
>  }
>  
> -/*
> - * Check if the current log item was first committed in this
> sequence.
> - * We can't rely on just the log item being in the CIL, we have to
> check
> - * the recorded commit sequence number.
> - *
> - * Note: for this to be used in a non-racy manner, it has to be
> called with
> - * CIL flushing locked out. As a result, it should only be used
> during the
> - * transaction commit process when deciding what to format into the
> item.
> - */
> -bool
> -xfs_log_item_in_current_chkpt(
> -	struct xfs_log_item	*lip)
> -{
> -	struct xfs_cil		*cil = lip->li_log->l_cilp;
> -
> -	if (list_empty(&lip->li_cil))
> -		return false;
> -
> -	/*
> -	 * li_seq is written on the first commit of a log item to
> record the
> -	 * first checkpoint it is written to. Hence if it is different
> to the
> -	 * current sequence, we're in a new checkpoint.
> -	 */
> -	return lip->li_seq == READ_ONCE(cil->xc_current_sequence);
> -}
> -
>  /*
>   * Perform initial CIL structure initialisation.
>   */


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/8] xfs: don't commit the first deferred transaction without intents
  2022-04-27  2:22 ` [PATCH 2/8] xfs: don't commit the first deferred transaction without intents Dave Chinner
  2022-04-27  3:03   ` Darrick J. Wong
  2022-04-28 13:02   ` Christoph Hellwig
@ 2022-04-30 17:02   ` Alli
  2 siblings, 0 replies; 35+ messages in thread
From: Alli @ 2022-04-30 17:02 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On Wed, 2022-04-27 at 12:22 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> If the first operation in a string of defer ops has no intents,
> then there is no reason to commit it before running the first call
> to xfs_defer_finish_one(). This allows the defer ops to be used
> effectively for non-intent based operations without requiring an
> unnecessary extra transaction commit when first called.
> 
> This fixes a regression in per-attribute modification transaction
> count when delayed attributes are not being used.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Just realized I forgot to send this along in my last review.

Nits from other reviews aside, I think this one looks good
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
>  fs/xfs/libxfs/xfs_defer.c | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 0805ade2d300..66b4555bda8e 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -186,7 +186,7 @@ static const struct xfs_defer_op_type
> *defer_op_types[] = {
>  	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
>  };
>  
> -static void
> +static bool
>  xfs_defer_create_intent(
>  	struct xfs_trans		*tp,
>  	struct xfs_defer_pending	*dfp,
> @@ -197,6 +197,7 @@ xfs_defer_create_intent(
>  	if (!dfp->dfp_intent)
>  		dfp->dfp_intent = ops->create_intent(tp, &dfp-
> >dfp_work,
>  						     dfp->dfp_count,
> sort);
> +	return dfp->dfp_intent;
>  }
>  
>  /*
> @@ -204,16 +205,18 @@ xfs_defer_create_intent(
>   * associated extents, then add the entire intake list to the end of
>   * the pending list.
>   */
> -STATIC void
> +static bool
>  xfs_defer_create_intents(
>  	struct xfs_trans		*tp)
>  {
>  	struct xfs_defer_pending	*dfp;
> +	bool				ret = false;
>  
>  	list_for_each_entry(dfp, &tp->t_dfops, dfp_list) {
>  		trace_xfs_defer_create_intent(tp->t_mountp, dfp);
> -		xfs_defer_create_intent(tp, dfp, true);
> +		ret |= xfs_defer_create_intent(tp, dfp, true);
>  	}
> +	return ret;
>  }
>  
>  /* Abort all the intents that were committed. */
> @@ -487,7 +490,7 @@ int
>  xfs_defer_finish_noroll(
>  	struct xfs_trans		**tp)
>  {
> -	struct xfs_defer_pending	*dfp;
> +	struct xfs_defer_pending	*dfp = NULL;
>  	int				error = 0;
>  	LIST_HEAD(dop_pending);
>  
> @@ -506,17 +509,19 @@ xfs_defer_finish_noroll(
>  		 * of time that any one intent item can stick around in
> memory,
>  		 * pinning the log tail.
>  		 */
> -		xfs_defer_create_intents(*tp);
> +		bool has_intents = xfs_defer_create_intents(*tp);
>  		list_splice_init(&(*tp)->t_dfops, &dop_pending);
>  
> -		error = xfs_defer_trans_roll(tp);
> -		if (error)
> -			goto out_shutdown;
> +		if (has_intents || dfp) {
> +			error = xfs_defer_trans_roll(tp);
> +			if (error)
> +				goto out_shutdown;
>  
> -		/* Possibly relog intent items to keep the log moving.
> */
> -		error = xfs_defer_relog(tp, &dop_pending);
> -		if (error)
> -			goto out_shutdown;
> +			/* Possibly relog intent items to keep the log
> moving. */
> +			error = xfs_defer_relog(tp, &dop_pending);
> +			if (error)
> +				goto out_shutdown;
> +		}
>  
>  		dfp = list_first_entry(&dop_pending, struct
> xfs_defer_pending,
>  				       dfp_list);


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2022-04-30 17:02 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-27  2:22 [PATCH 0/8 v5] xfs: intent whiteouts Dave Chinner
2022-04-27  2:22 ` [PATCH 1/8] xfs: hide log iovec alignment constraints Dave Chinner
2022-04-27  3:14   ` Darrick J. Wong
2022-04-27  4:50     ` Dave Chinner
2022-04-27 16:45       ` Darrick J. Wong
2022-04-28 13:00   ` Christoph Hellwig
2022-04-27  2:22 ` [PATCH 2/8] xfs: don't commit the first deferred transaction without intents Dave Chinner
2022-04-27  3:03   ` Darrick J. Wong
2022-04-27  4:52     ` Dave Chinner
2022-04-28 13:02   ` Christoph Hellwig
2022-04-30 17:02   ` Alli
2022-04-27  2:22 ` [PATCH 3/8] xfs: add log item flags to indicate intents Dave Chinner
2022-04-27  3:04   ` Darrick J. Wong
2022-04-28 13:04     ` Christoph Hellwig
2022-04-27  2:22 ` [PATCH 4/8] xfs: tag transactions that contain intent done items Dave Chinner
2022-04-27  3:06   ` Darrick J. Wong
2022-04-28 13:05   ` Christoph Hellwig
2022-04-27  2:22 ` [PATCH 5/8] xfs: factor and move some code in xfs_log_cil.c Dave Chinner
2022-04-27  3:15   ` Darrick J. Wong
2022-04-27  4:56     ` Dave Chinner
2022-04-28 13:06   ` Christoph Hellwig
2022-04-29  1:56   ` Alli
2022-04-27  2:22 ` [PATCH 6/8] xfs: add log item method to return related intents Dave Chinner
2022-04-27  3:18   ` Darrick J. Wong
2022-04-28 13:10   ` Christoph Hellwig
2022-04-27  2:22 ` [PATCH 7/8] xfs: whiteouts release intents that are not in the AIL Dave Chinner
2022-04-27  3:19   ` Darrick J. Wong
2022-04-28 13:15   ` Christoph Hellwig
2022-04-27  2:22 ` [PATCH 8/8] xfs: intent item whiteouts Dave Chinner
2022-04-27  3:32   ` Darrick J. Wong
2022-04-27  5:47     ` Dave Chinner
2022-04-27 17:31       ` Darrick J. Wong
2022-04-27 22:05         ` Dave Chinner
2022-04-28 13:22   ` Christoph Hellwig
2022-04-28 21:38     ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.