All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/5] xfs: fix how we deal with new intents during recovery
@ 2020-09-29 17:43 Darrick J. Wong
  2020-09-29 17:43 ` [PATCH 1/5] xfs: remove xfs_defer_reset Darrick J. Wong
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Darrick J. Wong @ 2020-09-29 17:43 UTC (permalink / raw)
  To: darrick.wong
  Cc: Dave Chinner, Christoph Hellwig, linux-xfs, david, hch, bfoster

Hi all,

This series of log fixes dates back to an earlier discussion that Dave
and I had about the weird way that log recovery works w.r.t. intent
items.  The current code juggles nested transactions so that it can
siphon off new deferred items for later; this we replace with a new
dfops freezer that captures the log reservation type and remaining block
reservation so that we finish the new deferred items with the same
transaction context as we would have had the system not gone down.

v2: rework the defer capture api per hch suggestions
v3: rework the api again, per bfoster suggestions, so now xfs_defer_capture
is only responsible for creating the capture device, and log recovery still
has to do some work to commit a transaction and free resources
v4: kill XFS_LI_RECOVERED and move all the defer capture commit and release
code to xfs_defer.c

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fix-recovery-intent-chaining-5.10
---
 fs/xfs/libxfs/xfs_defer.c  |  127 +++++++++++++++++++++++++++++++++++---------
 fs/xfs/libxfs/xfs_defer.h  |   21 +++++++
 fs/xfs/xfs_bmap_item.c     |   16 +-----
 fs/xfs/xfs_extfree_item.c  |    7 +-
 fs/xfs/xfs_log_recover.c   |  110 +++++++++++++++++++-------------------
 fs/xfs/xfs_refcount_item.c |   16 +-----
 fs/xfs/xfs_rmap_item.c     |    7 +-
 fs/xfs/xfs_trans.h         |    7 +-
 8 files changed, 193 insertions(+), 118 deletions(-)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/5] xfs: remove xfs_defer_reset
  2020-09-29 17:43 [PATCH v4 0/5] xfs: fix how we deal with new intents during recovery Darrick J. Wong
@ 2020-09-29 17:43 ` Darrick J. Wong
  2020-10-01 17:30   ` Brian Foster
  2020-09-29 17:43 ` [PATCH 2/5] xfs: remove XFS_LI_RECOVERED Darrick J. Wong
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Darrick J. Wong @ 2020-09-29 17:43 UTC (permalink / raw)
  To: darrick.wong
  Cc: Dave Chinner, Christoph Hellwig, linux-xfs, david, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Remove this one-line helper since the assert is trivially true in one
call site and the rest obscures a bitmask operation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_defer.c |   24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 29e9762f3b77..36c103c14bc9 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -312,22 +312,6 @@ xfs_defer_trans_roll(
 	return error;
 }
 
-/*
- * Reset an already used dfops after finish.
- */
-static void
-xfs_defer_reset(
-	struct xfs_trans	*tp)
-{
-	ASSERT(list_empty(&tp->t_dfops));
-
-	/*
-	 * Low mode state transfers across transaction rolls to mirror dfops
-	 * lifetime. Clear it now that dfops is reset.
-	 */
-	tp->t_flags &= ~XFS_TRANS_LOWMODE;
-}
-
 /*
  * Free up any items left in the list.
  */
@@ -477,7 +461,10 @@ xfs_defer_finish(
 			return error;
 		}
 	}
-	xfs_defer_reset(*tp);
+
+	/* Reset LOWMODE now that we've finished all the dfops. */
+	ASSERT(list_empty(&(*tp)->t_dfops));
+	(*tp)->t_flags &= ~XFS_TRANS_LOWMODE;
 	return 0;
 }
 
@@ -551,8 +538,7 @@ xfs_defer_move(
 	 * that behavior.
 	 */
 	dtp->t_flags |= (stp->t_flags & XFS_TRANS_LOWMODE);
-
-	xfs_defer_reset(stp);
+	stp->t_flags &= ~XFS_TRANS_LOWMODE;
 }
 
 /*


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/5] xfs: remove XFS_LI_RECOVERED
  2020-09-29 17:43 [PATCH v4 0/5] xfs: fix how we deal with new intents during recovery Darrick J. Wong
  2020-09-29 17:43 ` [PATCH 1/5] xfs: remove xfs_defer_reset Darrick J. Wong
@ 2020-09-29 17:43 ` Darrick J. Wong
  2020-10-01 17:30   ` Brian Foster
  2020-10-02  7:16   ` Christoph Hellwig
  2020-09-29 17:43 ` [PATCH 3/5] xfs: proper replay of deferred ops queued during log recovery Darrick J. Wong
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 24+ messages in thread
From: Darrick J. Wong @ 2020-09-29 17:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

The ->iop_recover method of a log intent item removes the recovered
intent item from the AIL by logging an intent done item and committing
the transaction, so it's superfluous to have this flag check.  Nothing
else uses it, so get rid of the flag entirely.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_log_recover.c |    8 +++-----
 fs/xfs/xfs_trans.h       |    4 +---
 2 files changed, 4 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index e0675071b39e..84f876c6d498 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2539,11 +2539,9 @@ xlog_recover_process_intents(
 		 * this routine or else those subsequent intents will get
 		 * replayed in the wrong order!
 		 */
-		if (!test_and_set_bit(XFS_LI_RECOVERED, &lip->li_flags)) {
-			spin_unlock(&ailp->ail_lock);
-			error = lip->li_ops->iop_recover(lip, parent_tp);
-			spin_lock(&ailp->ail_lock);
-		}
+		spin_unlock(&ailp->ail_lock);
+		error = lip->li_ops->iop_recover(lip, parent_tp);
+		spin_lock(&ailp->ail_lock);
 		if (error)
 			goto out;
 		lip = xfs_trans_ail_cursor_next(ailp, &cur);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index a71b4f443e39..ced62a35a62b 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -55,14 +55,12 @@ struct xfs_log_item {
 #define	XFS_LI_ABORTED	1
 #define	XFS_LI_FAILED	2
 #define	XFS_LI_DIRTY	3	/* log item dirty in transaction */
-#define	XFS_LI_RECOVERED 4	/* log intent item has been recovered */
 
 #define XFS_LI_FLAGS \
 	{ (1 << XFS_LI_IN_AIL),		"IN_AIL" }, \
 	{ (1 << XFS_LI_ABORTED),	"ABORTED" }, \
 	{ (1 << XFS_LI_FAILED),		"FAILED" }, \
-	{ (1 << XFS_LI_DIRTY),		"DIRTY" }, \
-	{ (1 << XFS_LI_RECOVERED),	"RECOVERED" }
+	{ (1 << XFS_LI_DIRTY),		"DIRTY" }
 
 struct xfs_item_ops {
 	unsigned flags;


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/5] xfs: proper replay of deferred ops queued during log recovery
  2020-09-29 17:43 [PATCH v4 0/5] xfs: fix how we deal with new intents during recovery Darrick J. Wong
  2020-09-29 17:43 ` [PATCH 1/5] xfs: remove xfs_defer_reset Darrick J. Wong
  2020-09-29 17:43 ` [PATCH 2/5] xfs: remove XFS_LI_RECOVERED Darrick J. Wong
@ 2020-09-29 17:43 ` Darrick J. Wong
  2020-10-01 17:31   ` Brian Foster
  2020-10-02  7:18   ` Christoph Hellwig
  2020-09-29 17:43 ` [PATCH 4/5] xfs: xfs_defer_capture should absorb remaining block reservation Darrick J. Wong
  2020-09-29 17:43 ` [PATCH 5/5] xfs: xfs_defer_capture should absorb remaining transaction reservation Darrick J. Wong
  4 siblings, 2 replies; 24+ messages in thread
From: Darrick J. Wong @ 2020-09-29 17:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

When we replay unfinished intent items that have been recovered from the
log, it's possible that the replay will cause the creation of more
deferred work items.  As outlined in commit 509955823cc9c ("xfs: log
recovery should replay deferred ops in order"), later work items have an
implicit ordering dependency on earlier work items.  Therefore, recovery
must replay the items (both recovered and created) in the same order
that they would have been during normal operation.

For log recovery, we enforce this ordering by using an empty transaction
to collect deferred ops that get created in the process of recovering a
log intent item to prevent them from being committed before the rest of
the recovered intent items.  After we finish committing all the
recovered log items, we allocate a transaction with an enormous block
reservation, splice our huge list of created deferred ops into that
transaction, and commit it, thereby finishing all those ops.

This is /really/ hokey -- it's the one place in XFS where we allow
nested transactions; the splicing of the defer ops list is is inelegant
and has to be done twice per recovery function; and the broken way we
handle inode pointers and block reservations cause subtle use-after-free
and allocator problems that will be fixed by this patch and the two
patches after it.

Therefore, replace the hokey empty transaction with a structure designed
to capture each chain of deferred ops that are created as part of
recovering a single unfinished log intent.  Finally, refactor the loop
that replays those chains to do so using one transaction per chain.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c  |   89 +++++++++++++++++++++++++++++++--
 fs/xfs/libxfs/xfs_defer.h  |   19 +++++++
 fs/xfs/xfs_bmap_item.c     |   16 +-----
 fs/xfs/xfs_extfree_item.c  |    7 +--
 fs/xfs/xfs_log_recover.c   |  118 +++++++++++++++++++++++++-------------------
 fs/xfs/xfs_refcount_item.c |   16 +-----
 fs/xfs/xfs_rmap_item.c     |    7 +--
 fs/xfs/xfs_trans.h         |    3 +
 8 files changed, 184 insertions(+), 91 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 36c103c14bc9..85c371d29e8d 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -549,14 +549,89 @@ xfs_defer_move(
  *
  * Create and log intent items for all the work that we're capturing so that we
  * can be assured that the items will get replayed if the system goes down
- * before log recovery gets a chance to finish the work it put off.  Then we
- * move the chain from stp to dtp.
+ * before log recovery gets a chance to finish the work it put off.  The entire
+ * deferred ops state is transferred to the capture structure and the
+ * transaction is then ready for the caller to commit it.  If there are no
+ * intent items to capture, this function returns NULL.
+ */
+static struct xfs_defer_capture *
+xfs_defer_ops_capture(
+	struct xfs_trans		*tp)
+{
+	struct xfs_defer_capture	*dfc;
+
+	if (list_empty(&tp->t_dfops))
+		return NULL;
+
+	/* Create an object to capture the defer ops. */
+	dfc = kmem_zalloc(sizeof(*dfc), KM_NOFS);
+	INIT_LIST_HEAD(&dfc->dfc_list);
+	INIT_LIST_HEAD(&dfc->dfc_dfops);
+
+	xfs_defer_create_intents(tp);
+
+	/* Move the dfops chain and transaction state to the capture struct. */
+	list_splice_init(&tp->t_dfops, &dfc->dfc_dfops);
+	dfc->dfc_tpflags = tp->t_flags & XFS_TRANS_LOWMODE;
+	tp->t_flags &= ~XFS_TRANS_LOWMODE;
+
+	return dfc;
+}
+
+/* Release all resources that we used to capture deferred ops. */
+void
+xfs_defer_ops_release(
+	struct xfs_mount		*mp,
+	struct xfs_defer_capture	*dfc)
+{
+	xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
+	kmem_free(dfc);
+}
+
+/*
+ * Capture any deferred ops and commit the transaction.  This is the last step
+ * needed to finish a log intent item that we recovered from the log.
+ */
+int
+xfs_defer_ops_capture_and_commit(
+	struct xfs_trans		*tp,
+	struct list_head		*capture_list)
+{
+	struct xfs_mount		*mp = tp->t_mountp;
+	struct xfs_defer_capture	*dfc;
+	int				error;
+
+	/* If we don't capture anything, commit transaction and exit. */
+	dfc = xfs_defer_ops_capture(tp);
+	if (!dfc)
+		return xfs_trans_commit(tp);
+
+	/* Commit the transaction and add the capture structure to the list. */
+	error = xfs_trans_commit(tp);
+	if (error) {
+		xfs_defer_ops_release(mp, dfc);
+		return error;
+	}
+
+	list_add_tail(&dfc->dfc_list, capture_list);
+	return 0;
+}
+
+/*
+ * Attach a chain of captured deferred ops to a new transaction and free the
+ * capture structure.
  */
 void
-xfs_defer_capture(
-	struct xfs_trans	*dtp,
-	struct xfs_trans	*stp)
+xfs_defer_ops_continue(
+	struct xfs_defer_capture	*dfc,
+	struct xfs_trans		*tp)
 {
-	xfs_defer_create_intents(stp);
-	xfs_defer_move(dtp, stp);
+	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
+	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
+
+	/* Move captured dfops chain and state to the transaction. */
+	list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
+	tp->t_flags |= dfc->dfc_tpflags;
+
+	kmem_free(dfc);
 }
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 3164199162b6..3af82ebc1249 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -8,6 +8,7 @@
 
 struct xfs_btree_cur;
 struct xfs_defer_op_type;
+struct xfs_defer_capture;
 
 /*
  * Header for deferred operation list.
@@ -63,10 +64,26 @@ extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
 extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
 
+/*
+ * This structure enables a dfops user to detach the chain of deferred
+ * operations from a transaction so that they can be continued later.
+ */
+struct xfs_defer_capture {
+	/* List of other capture structures. */
+	struct list_head	dfc_list;
+
+	/* Deferred ops state saved from the transaction. */
+	struct list_head	dfc_dfops;
+	unsigned int		dfc_tpflags;
+};
+
 /*
  * Functions to capture a chain of deferred operations and continue them later.
  * This doesn't normally happen except log recovery.
  */
-void xfs_defer_capture(struct xfs_trans *dtp, struct xfs_trans *stp);
+int xfs_defer_ops_capture_and_commit(struct xfs_trans *tp,
+		struct list_head *capture_list);
+void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp);
+void xfs_defer_ops_release(struct xfs_mount *mp, struct xfs_defer_capture *d);
 
 #endif /* __XFS_DEFER_H__ */
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index b04ebcd78316..126df48dae5f 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -424,13 +424,13 @@ const struct xfs_defer_op_type xfs_bmap_update_defer_type = {
 STATIC int
 xfs_bui_item_recover(
 	struct xfs_log_item		*lip,
-	struct xfs_trans		*parent_tp)
+	struct list_head		*capture_list)
 {
 	struct xfs_bmbt_irec		irec;
 	struct xfs_bui_log_item		*buip = BUI_ITEM(lip);
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip = NULL;
-	struct xfs_mount		*mp = parent_tp->t_mountp;
+	struct xfs_mount		*mp = lip->li_mountp;
 	struct xfs_map_extent		*bmap;
 	struct xfs_bud_log_item		*budp;
 	xfs_fsblock_t			startblock_fsb;
@@ -478,12 +478,7 @@ xfs_bui_item_recover(
 			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
 	if (error)
 		return error;
-	/*
-	 * Recovery stashes all deferred ops during intent processing and
-	 * finishes them on completion. Transfer current dfops state to this
-	 * transaction and transfer the result back before we return.
-	 */
-	xfs_defer_move(tp, parent_tp);
+
 	budp = xfs_trans_get_bud(tp, buip);
 
 	/* Grab the inode. */
@@ -531,15 +526,12 @@ xfs_bui_item_recover(
 		xfs_bmap_unmap_extent(tp, ip, &irec);
 	}
 
-	xfs_defer_capture(parent_tp, tp);
-	error = xfs_trans_commit(tp);
+	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_irele(ip);
-
 	return error;
 
 err_inode:
-	xfs_defer_move(parent_tp, tp);
 	xfs_trans_cancel(tp);
 	if (ip) {
 		xfs_iunlock(ip, XFS_ILOCK_EXCL);
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 9093d2e7afdf..17d36fe5cfd0 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -585,10 +585,10 @@ const struct xfs_defer_op_type xfs_agfl_free_defer_type = {
 STATIC int
 xfs_efi_item_recover(
 	struct xfs_log_item		*lip,
-	struct xfs_trans		*parent_tp)
+	struct list_head		*capture_list)
 {
 	struct xfs_efi_log_item		*efip = EFI_ITEM(lip);
-	struct xfs_mount		*mp = parent_tp->t_mountp;
+	struct xfs_mount		*mp = lip->li_mountp;
 	struct xfs_efd_log_item		*efdp;
 	struct xfs_trans		*tp;
 	struct xfs_extent		*extp;
@@ -627,8 +627,7 @@ xfs_efi_item_recover(
 
 	}
 
-	error = xfs_trans_commit(tp);
-	return error;
+	return xfs_defer_ops_capture_and_commit(tp, capture_list);
 
 abort_error:
 	xfs_trans_cancel(tp);
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 84f876c6d498..550d0fa8057a 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2434,37 +2434,62 @@ xlog_recover_process_data(
 /* Take all the collected deferred ops and finish them in order. */
 static int
 xlog_finish_defer_ops(
-	struct xfs_trans	*parent_tp)
+	struct xfs_mount	*mp,
+	struct list_head	*capture_list)
 {
-	struct xfs_mount	*mp = parent_tp->t_mountp;
+	struct xfs_defer_capture *dfc, *next;
 	struct xfs_trans	*tp;
 	int64_t			freeblks;
-	uint			resblks;
-	int			error;
+	uint64_t		resblks;
+	int			error = 0;
 
-	/*
-	 * We're finishing the defer_ops that accumulated as a result of
-	 * recovering unfinished intent items during log recovery.  We
-	 * reserve an itruncate transaction because it is the largest
-	 * permanent transaction type.  Since we're the only user of the fs
-	 * right now, take 93% (15/16) of the available free blocks.  Use
-	 * weird math to avoid a 64-bit division.
-	 */
-	freeblks = percpu_counter_sum(&mp->m_fdblocks);
-	if (freeblks <= 0)
-		return -ENOSPC;
-	resblks = min_t(int64_t, UINT_MAX, freeblks);
-	resblks = (resblks * 15) >> 4;
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
-			0, XFS_TRANS_RESERVE, &tp);
-	if (error)
-		return error;
-	/* transfer all collected dfops to this transaction */
-	xfs_defer_move(tp, parent_tp);
+	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
+		/*
+		 * We're finishing the defer_ops that accumulated as a result
+		 * of recovering unfinished intent items during log recovery.
+		 * We reserve an itruncate transaction because it is the
+		 * largest permanent transaction type.  Since we're the only
+		 * user of the fs right now, take 93% (15/16) of the available
+		 * free blocks.  Use weird math to avoid a 64-bit division.
+		 */
+		freeblks = percpu_counter_sum(&mp->m_fdblocks);
+		if (freeblks <= 0)
+			return -ENOSPC;
 
-	return xfs_trans_commit(tp);
+		resblks = min_t(uint64_t, UINT_MAX, freeblks);
+		resblks = (resblks * 15) >> 4;
+		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
+				0, XFS_TRANS_RESERVE, &tp);
+		if (error)
+			return error;
+
+		/* Transfer all collected dfops to this transaction. */
+		list_del_init(&dfc->dfc_list);
+		xfs_defer_ops_continue(dfc, tp);
+
+		error = xfs_trans_commit(tp);
+		if (error)
+			return error;
+	}
+
+	ASSERT(list_empty(capture_list));
+	return 0;
 }
 
+/* Release all the captured defer ops and capture structures in this list. */
+static void
+xlog_abort_defer_ops(
+	struct xfs_mount		*mp,
+	struct list_head		*capture_list)
+{
+	struct xfs_defer_capture	*dfc;
+	struct xfs_defer_capture	*next;
+
+	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
+		list_del_init(&dfc->dfc_list);
+		xfs_defer_ops_release(mp, dfc);
+	}
+}
 /*
  * When this is called, all of the log intent items which did not have
  * corresponding log done items should be in the AIL.  What we do now
@@ -2485,35 +2510,23 @@ STATIC int
 xlog_recover_process_intents(
 	struct xlog		*log)
 {
-	struct xfs_trans	*parent_tp;
+	LIST_HEAD(capture_list);
 	struct xfs_ail_cursor	cur;
 	struct xfs_log_item	*lip;
 	struct xfs_ail		*ailp;
-	int			error;
+	int			error = 0;
 #if defined(DEBUG) || defined(XFS_WARN)
 	xfs_lsn_t		last_lsn;
 #endif
 
-	/*
-	 * The intent recovery handlers commit transactions to complete recovery
-	 * for individual intents, but any new deferred operations that are
-	 * queued during that process are held off until the very end. The
-	 * purpose of this transaction is to serve as a container for deferred
-	 * operations. Each intent recovery handler must transfer dfops here
-	 * before its local transaction commits, and we'll finish the entire
-	 * list below.
-	 */
-	error = xfs_trans_alloc_empty(log->l_mp, &parent_tp);
-	if (error)
-		return error;
-
 	ailp = log->l_ailp;
 	spin_lock(&ailp->ail_lock);
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
 #if defined(DEBUG) || defined(XFS_WARN)
 	last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block);
 #endif
-	while (lip != NULL) {
+	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
+	     lip != NULL;
+	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
 		/*
 		 * We're done when we see something other than an intent.
 		 * There should be no intents left in the AIL now.
@@ -2535,24 +2548,29 @@ xlog_recover_process_intents(
 
 		/*
 		 * NOTE: If your intent processing routine can create more
-		 * deferred ops, you /must/ attach them to the transaction in
-		 * this routine or else those subsequent intents will get
+		 * deferred ops, you /must/ attach them to the capture list in
+		 * the recover routine or else those subsequent intents will be
 		 * replayed in the wrong order!
 		 */
 		spin_unlock(&ailp->ail_lock);
-		error = lip->li_ops->iop_recover(lip, parent_tp);
+		error = lip->li_ops->iop_recover(lip, &capture_list);
 		spin_lock(&ailp->ail_lock);
 		if (error)
-			goto out;
-		lip = xfs_trans_ail_cursor_next(ailp, &cur);
+			break;
 	}
-out:
+
 	xfs_trans_ail_cursor_done(&cur);
 	spin_unlock(&ailp->ail_lock);
-	if (!error)
-		error = xlog_finish_defer_ops(parent_tp);
-	xfs_trans_cancel(parent_tp);
+	if (error)
+		goto err;
 
+	error = xlog_finish_defer_ops(log->l_mp, &capture_list);
+	if (error)
+		goto err;
+
+	return 0;
+err:
+	xlog_abort_defer_ops(log->l_mp, &capture_list);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 3e34b7662361..0478374add64 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -424,7 +424,7 @@ const struct xfs_defer_op_type xfs_refcount_update_defer_type = {
 STATIC int
 xfs_cui_item_recover(
 	struct xfs_log_item		*lip,
-	struct xfs_trans		*parent_tp)
+	struct list_head		*capture_list)
 {
 	struct xfs_bmbt_irec		irec;
 	struct xfs_cui_log_item		*cuip = CUI_ITEM(lip);
@@ -432,7 +432,7 @@ xfs_cui_item_recover(
 	struct xfs_cud_log_item		*cudp;
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*rcur = NULL;
-	struct xfs_mount		*mp = parent_tp->t_mountp;
+	struct xfs_mount		*mp = lip->li_mountp;
 	xfs_fsblock_t			startblock_fsb;
 	xfs_fsblock_t			new_fsb;
 	xfs_extlen_t			new_len;
@@ -487,12 +487,7 @@ xfs_cui_item_recover(
 			mp->m_refc_maxlevels * 2, 0, XFS_TRANS_RESERVE, &tp);
 	if (error)
 		return error;
-	/*
-	 * Recovery stashes all deferred ops during intent processing and
-	 * finishes them on completion. Transfer current dfops state to this
-	 * transaction and transfer the result back before we return.
-	 */
-	xfs_defer_move(tp, parent_tp);
+
 	cudp = xfs_trans_get_cud(tp, cuip);
 
 	for (i = 0; i < cuip->cui_format.cui_nextents; i++) {
@@ -549,13 +544,10 @@ xfs_cui_item_recover(
 	}
 
 	xfs_refcount_finish_one_cleanup(tp, rcur, error);
-	xfs_defer_capture(parent_tp, tp);
-	error = xfs_trans_commit(tp);
-	return error;
+	return xfs_defer_ops_capture_and_commit(tp, capture_list);
 
 abort_error:
 	xfs_refcount_finish_one_cleanup(tp, rcur, error);
-	xfs_defer_move(parent_tp, tp);
 	xfs_trans_cancel(tp);
 	return error;
 }
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index e38ec5d736be..0d8fa707f079 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -467,14 +467,14 @@ const struct xfs_defer_op_type xfs_rmap_update_defer_type = {
 STATIC int
 xfs_rui_item_recover(
 	struct xfs_log_item		*lip,
-	struct xfs_trans		*parent_tp)
+	struct list_head		*capture_list)
 {
 	struct xfs_rui_log_item		*ruip = RUI_ITEM(lip);
 	struct xfs_map_extent		*rmap;
 	struct xfs_rud_log_item		*rudp;
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*rcur = NULL;
-	struct xfs_mount		*mp = parent_tp->t_mountp;
+	struct xfs_mount		*mp = lip->li_mountp;
 	xfs_fsblock_t			startblock_fsb;
 	enum xfs_rmap_intent_type	type;
 	xfs_exntst_t			state;
@@ -567,8 +567,7 @@ xfs_rui_item_recover(
 	}
 
 	xfs_rmap_finish_one_cleanup(tp, rcur, error);
-	error = xfs_trans_commit(tp);
-	return error;
+	return xfs_defer_ops_capture_and_commit(tp, capture_list);
 
 abort_error:
 	xfs_rmap_finish_one_cleanup(tp, rcur, error);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index ced62a35a62b..186e77d08cc1 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -72,7 +72,8 @@ struct xfs_item_ops {
 	void (*iop_committing)(struct xfs_log_item *, xfs_lsn_t commit_lsn);
 	void (*iop_release)(struct xfs_log_item *);
 	xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t);
-	int (*iop_recover)(struct xfs_log_item *lip, struct xfs_trans *tp);
+	int (*iop_recover)(struct xfs_log_item *lip,
+			   struct list_head *capture_list);
 	bool (*iop_match)(struct xfs_log_item *item, uint64_t id);
 };
 


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/5] xfs: xfs_defer_capture should absorb remaining block reservation
  2020-09-29 17:43 [PATCH v4 0/5] xfs: fix how we deal with new intents during recovery Darrick J. Wong
                   ` (2 preceding siblings ...)
  2020-09-29 17:43 ` [PATCH 3/5] xfs: proper replay of deferred ops queued during log recovery Darrick J. Wong
@ 2020-09-29 17:43 ` Darrick J. Wong
  2020-10-01 17:32   ` Brian Foster
  2020-10-02  4:20   ` [PATCH v4.2 " Darrick J. Wong
  2020-09-29 17:43 ` [PATCH 5/5] xfs: xfs_defer_capture should absorb remaining transaction reservation Darrick J. Wong
  4 siblings, 2 replies; 24+ messages in thread
From: Darrick J. Wong @ 2020-09-29 17:43 UTC (permalink / raw)
  To: darrick.wong
  Cc: Christoph Hellwig, Dave Chinner, linux-xfs, david, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should absorb the remaining block reservation so
that when we continue the dfops chain, we still have those blocks to
use.

This adds the requirement that every log intent item recovery function
must be careful to reserve enough blocks to handle both itself and all
defer ops that it can queue.  On the other hand, this enables us to do
away with the handwaving block estimation nonsense that was going on in
xlog_finish_defer_ops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_defer.c |    5 +++++
 fs/xfs/libxfs/xfs_defer.h |    1 +
 fs/xfs/xfs_log_recover.c  |   18 +-----------------
 3 files changed, 7 insertions(+), 17 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 85c371d29e8d..0cceebb390c4 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -575,6 +575,10 @@ xfs_defer_ops_capture(
 	dfc->dfc_tpflags = tp->t_flags & XFS_TRANS_LOWMODE;
 	tp->t_flags &= ~XFS_TRANS_LOWMODE;
 
+	/* Capture the block reservation along with the dfops. */
+	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
+	tp->t_blk_res = tp->t_blk_res_used;
+
 	return dfc;
 }
 
@@ -632,6 +636,7 @@ xfs_defer_ops_continue(
 	/* Move captured dfops chain and state to the transaction. */
 	list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
 	tp->t_flags |= dfc->dfc_tpflags;
+	tp->t_blk_res += dfc->dfc_blkres;
 
 	kmem_free(dfc);
 }
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 3af82ebc1249..b1c7b761afd5 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -75,6 +75,7 @@ struct xfs_defer_capture {
 	/* Deferred ops state saved from the transaction. */
 	struct list_head	dfc_dfops;
 	unsigned int		dfc_tpflags;
+	unsigned int		dfc_blkres;
 };
 
 /*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 550d0fa8057a..b06c9881a13d 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2439,26 +2439,10 @@ xlog_finish_defer_ops(
 {
 	struct xfs_defer_capture *dfc, *next;
 	struct xfs_trans	*tp;
-	int64_t			freeblks;
-	uint64_t		resblks;
 	int			error = 0;
 
 	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
-		/*
-		 * We're finishing the defer_ops that accumulated as a result
-		 * of recovering unfinished intent items during log recovery.
-		 * We reserve an itruncate transaction because it is the
-		 * largest permanent transaction type.  Since we're the only
-		 * user of the fs right now, take 93% (15/16) of the available
-		 * free blocks.  Use weird math to avoid a 64-bit division.
-		 */
-		freeblks = percpu_counter_sum(&mp->m_fdblocks);
-		if (freeblks <= 0)
-			return -ENOSPC;
-
-		resblks = min_t(uint64_t, UINT_MAX, freeblks);
-		resblks = (resblks * 15) >> 4;
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
+		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0,
 				0, XFS_TRANS_RESERVE, &tp);
 		if (error)
 			return error;


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/5] xfs: xfs_defer_capture should absorb remaining transaction reservation
  2020-09-29 17:43 [PATCH v4 0/5] xfs: fix how we deal with new intents during recovery Darrick J. Wong
                   ` (3 preceding siblings ...)
  2020-09-29 17:43 ` [PATCH 4/5] xfs: xfs_defer_capture should absorb remaining block reservation Darrick J. Wong
@ 2020-09-29 17:43 ` Darrick J. Wong
  2020-10-01 17:32   ` Brian Foster
  2020-10-02  4:21   ` [PATCH v4.2 " Darrick J. Wong
  4 siblings, 2 replies; 24+ messages in thread
From: Darrick J. Wong @ 2020-09-29 17:43 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should record the transaction reservation type
from the old transaction so that when we continue the dfops chain, we
still use the same reservation parameters.

Doing this means that the log item recovery functions get to determine
the transaction reservation instead of abusing tr_itruncate in yet
another part of xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c |    9 +++++++++
 fs/xfs/libxfs/xfs_defer.h |    1 +
 fs/xfs/xfs_log_recover.c  |    4 ++--
 3 files changed, 12 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 0cceebb390c4..4caaf5527403 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -579,6 +579,15 @@ xfs_defer_ops_capture(
 	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
 	tp->t_blk_res = tp->t_blk_res_used;
 
+	/*
+	 * Preserve the transaction reservation type.  The logcount is
+	 * hardwired to 1 to so that we can make forward progress in recovery
+	 * no matter how full the log might be, at a cost of more regrants.
+	 */
+	dfc->dfc_tres.tr_logres = tp->t_log_res;
+	dfc->dfc_tres.tr_logcount = 1;
+	dfc->dfc_tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
+
 	return dfc;
 }
 
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index b1c7b761afd5..c447c79bbe74 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -76,6 +76,7 @@ struct xfs_defer_capture {
 	struct list_head	dfc_dfops;
 	unsigned int		dfc_tpflags;
 	unsigned int		dfc_blkres;
+	struct xfs_trans_res	dfc_tres;
 };
 
 /*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index b06c9881a13d..46e750279634 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2442,8 +2442,8 @@ xlog_finish_defer_ops(
 	int			error = 0;
 
 	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0,
-				0, XFS_TRANS_RESERVE, &tp);
+		error = xfs_trans_alloc(mp, &dfc->dfc_tres, 0, 0,
+				XFS_TRANS_RESERVE, &tp);
 		if (error)
 			return error;
 


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/5] xfs: remove xfs_defer_reset
  2020-09-29 17:43 ` [PATCH 1/5] xfs: remove xfs_defer_reset Darrick J. Wong
@ 2020-10-01 17:30   ` Brian Foster
  0 siblings, 0 replies; 24+ messages in thread
From: Brian Foster @ 2020-10-01 17:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, Christoph Hellwig, linux-xfs, david

On Tue, Sep 29, 2020 at 10:43:18AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Remove this one-line helper since the assert is trivially true in one
> call site and the rest obscures a bitmask operation.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_defer.c |   24 +++++-------------------
>  1 file changed, 5 insertions(+), 19 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 29e9762f3b77..36c103c14bc9 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -312,22 +312,6 @@ xfs_defer_trans_roll(
>  	return error;
>  }
>  
> -/*
> - * Reset an already used dfops after finish.
> - */
> -static void
> -xfs_defer_reset(
> -	struct xfs_trans	*tp)
> -{
> -	ASSERT(list_empty(&tp->t_dfops));
> -
> -	/*
> -	 * Low mode state transfers across transaction rolls to mirror dfops
> -	 * lifetime. Clear it now that dfops is reset.
> -	 */
> -	tp->t_flags &= ~XFS_TRANS_LOWMODE;
> -}
> -
>  /*
>   * Free up any items left in the list.
>   */
> @@ -477,7 +461,10 @@ xfs_defer_finish(
>  			return error;
>  		}
>  	}
> -	xfs_defer_reset(*tp);
> +
> +	/* Reset LOWMODE now that we've finished all the dfops. */
> +	ASSERT(list_empty(&(*tp)->t_dfops));
> +	(*tp)->t_flags &= ~XFS_TRANS_LOWMODE;
>  	return 0;
>  }
>  
> @@ -551,8 +538,7 @@ xfs_defer_move(
>  	 * that behavior.
>  	 */
>  	dtp->t_flags |= (stp->t_flags & XFS_TRANS_LOWMODE);
> -
> -	xfs_defer_reset(stp);
> +	stp->t_flags &= ~XFS_TRANS_LOWMODE;
>  }
>  
>  /*
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/5] xfs: remove XFS_LI_RECOVERED
  2020-09-29 17:43 ` [PATCH 2/5] xfs: remove XFS_LI_RECOVERED Darrick J. Wong
@ 2020-10-01 17:30   ` Brian Foster
  2020-10-02  7:16   ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Brian Foster @ 2020-10-01 17:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, hch

On Tue, Sep 29, 2020 at 10:43:25AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> The ->iop_recover method of a log intent item removes the recovered
> intent item from the AIL by logging an intent done item and committing
> the transaction, so it's superfluous to have this flag check.  Nothing
> else uses it, so get rid of the flag entirely.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_log_recover.c |    8 +++-----
>  fs/xfs/xfs_trans.h       |    4 +---
>  2 files changed, 4 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index e0675071b39e..84f876c6d498 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2539,11 +2539,9 @@ xlog_recover_process_intents(
>  		 * this routine or else those subsequent intents will get
>  		 * replayed in the wrong order!
>  		 */
> -		if (!test_and_set_bit(XFS_LI_RECOVERED, &lip->li_flags)) {
> -			spin_unlock(&ailp->ail_lock);
> -			error = lip->li_ops->iop_recover(lip, parent_tp);
> -			spin_lock(&ailp->ail_lock);
> -		}
> +		spin_unlock(&ailp->ail_lock);
> +		error = lip->li_ops->iop_recover(lip, parent_tp);
> +		spin_lock(&ailp->ail_lock);
>  		if (error)
>  			goto out;
>  		lip = xfs_trans_ail_cursor_next(ailp, &cur);
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index a71b4f443e39..ced62a35a62b 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -55,14 +55,12 @@ struct xfs_log_item {
>  #define	XFS_LI_ABORTED	1
>  #define	XFS_LI_FAILED	2
>  #define	XFS_LI_DIRTY	3	/* log item dirty in transaction */
> -#define	XFS_LI_RECOVERED 4	/* log intent item has been recovered */
>  
>  #define XFS_LI_FLAGS \
>  	{ (1 << XFS_LI_IN_AIL),		"IN_AIL" }, \
>  	{ (1 << XFS_LI_ABORTED),	"ABORTED" }, \
>  	{ (1 << XFS_LI_FAILED),		"FAILED" }, \
> -	{ (1 << XFS_LI_DIRTY),		"DIRTY" }, \
> -	{ (1 << XFS_LI_RECOVERED),	"RECOVERED" }
> +	{ (1 << XFS_LI_DIRTY),		"DIRTY" }
>  
>  struct xfs_item_ops {
>  	unsigned flags;
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/5] xfs: proper replay of deferred ops queued during log recovery
  2020-09-29 17:43 ` [PATCH 3/5] xfs: proper replay of deferred ops queued during log recovery Darrick J. Wong
@ 2020-10-01 17:31   ` Brian Foster
  2020-10-01 17:41     ` Darrick J. Wong
  2020-10-02  7:18   ` Christoph Hellwig
  1 sibling, 1 reply; 24+ messages in thread
From: Brian Foster @ 2020-10-01 17:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, hch

On Tue, Sep 29, 2020 at 10:43:31AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we replay unfinished intent items that have been recovered from the
> log, it's possible that the replay will cause the creation of more
> deferred work items.  As outlined in commit 509955823cc9c ("xfs: log
> recovery should replay deferred ops in order"), later work items have an
> implicit ordering dependency on earlier work items.  Therefore, recovery
> must replay the items (both recovered and created) in the same order
> that they would have been during normal operation.
> 
> For log recovery, we enforce this ordering by using an empty transaction
> to collect deferred ops that get created in the process of recovering a
> log intent item to prevent them from being committed before the rest of
> the recovered intent items.  After we finish committing all the
> recovered log items, we allocate a transaction with an enormous block
> reservation, splice our huge list of created deferred ops into that
> transaction, and commit it, thereby finishing all those ops.
> 
> This is /really/ hokey -- it's the one place in XFS where we allow
> nested transactions; the splicing of the defer ops list is is inelegant
> and has to be done twice per recovery function; and the broken way we
> handle inode pointers and block reservations cause subtle use-after-free
> and allocator problems that will be fixed by this patch and the two
> patches after it.
> 
> Therefore, replace the hokey empty transaction with a structure designed
> to capture each chain of deferred ops that are created as part of
> recovering a single unfinished log intent.  Finally, refactor the loop
> that replays those chains to do so using one transaction per chain.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c  |   89 +++++++++++++++++++++++++++++++--
>  fs/xfs/libxfs/xfs_defer.h  |   19 +++++++
>  fs/xfs/xfs_bmap_item.c     |   16 +-----
>  fs/xfs/xfs_extfree_item.c  |    7 +--
>  fs/xfs/xfs_log_recover.c   |  118 +++++++++++++++++++++++++-------------------
>  fs/xfs/xfs_refcount_item.c |   16 +-----
>  fs/xfs/xfs_rmap_item.c     |    7 +--
>  fs/xfs/xfs_trans.h         |    3 +
>  8 files changed, 184 insertions(+), 91 deletions(-)
> 
> 
...
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 84f876c6d498..550d0fa8057a 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2434,37 +2434,62 @@ xlog_recover_process_data(
>  /* Take all the collected deferred ops and finish them in order. */
>  static int
>  xlog_finish_defer_ops(
> -	struct xfs_trans	*parent_tp)
> +	struct xfs_mount	*mp,
> +	struct list_head	*capture_list)
>  {
> -	struct xfs_mount	*mp = parent_tp->t_mountp;
> +	struct xfs_defer_capture *dfc, *next;
>  	struct xfs_trans	*tp;
>  	int64_t			freeblks;
> -	uint			resblks;
> -	int			error;
> +	uint64_t		resblks;
> +	int			error = 0;
>  
> -	/*
> -	 * We're finishing the defer_ops that accumulated as a result of
> -	 * recovering unfinished intent items during log recovery.  We
> -	 * reserve an itruncate transaction because it is the largest
> -	 * permanent transaction type.  Since we're the only user of the fs
> -	 * right now, take 93% (15/16) of the available free blocks.  Use
> -	 * weird math to avoid a 64-bit division.
> -	 */
> -	freeblks = percpu_counter_sum(&mp->m_fdblocks);
> -	if (freeblks <= 0)
> -		return -ENOSPC;
> -	resblks = min_t(int64_t, UINT_MAX, freeblks);
> -	resblks = (resblks * 15) >> 4;
> -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> -			0, XFS_TRANS_RESERVE, &tp);
> -	if (error)
> -		return error;
> -	/* transfer all collected dfops to this transaction */
> -	xfs_defer_move(tp, parent_tp);
> +	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> +		/*
> +		 * We're finishing the defer_ops that accumulated as a result
> +		 * of recovering unfinished intent items during log recovery.
> +		 * We reserve an itruncate transaction because it is the
> +		 * largest permanent transaction type.  Since we're the only
> +		 * user of the fs right now, take 93% (15/16) of the available
> +		 * free blocks.  Use weird math to avoid a 64-bit division.
> +		 */
> +		freeblks = percpu_counter_sum(&mp->m_fdblocks);
> +		if (freeblks <= 0)
> +			return -ENOSPC;
>  
> -	return xfs_trans_commit(tp);
> +		resblks = min_t(uint64_t, UINT_MAX, freeblks);
> +		resblks = (resblks * 15) >> 4;
> +		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> +				0, XFS_TRANS_RESERVE, &tp);
> +		if (error)
> +			return error;
> +
> +		/* Transfer all collected dfops to this transaction. */

This old comment is a little misleading now that we have per-intent
capture lists. That nit aside, this looks good to me and the updated
factoring is easier to follow:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +		list_del_init(&dfc->dfc_list);
> +		xfs_defer_ops_continue(dfc, tp);
> +
> +		error = xfs_trans_commit(tp);
> +		if (error)
> +			return error;
> +	}
> +
> +	ASSERT(list_empty(capture_list));
> +	return 0;
>  }
>  
> +/* Release all the captured defer ops and capture structures in this list. */
> +static void
> +xlog_abort_defer_ops(
> +	struct xfs_mount		*mp,
> +	struct list_head		*capture_list)
> +{
> +	struct xfs_defer_capture	*dfc;
> +	struct xfs_defer_capture	*next;
> +
> +	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> +		list_del_init(&dfc->dfc_list);
> +		xfs_defer_ops_release(mp, dfc);
> +	}
> +}
>  /*
>   * When this is called, all of the log intent items which did not have
>   * corresponding log done items should be in the AIL.  What we do now
> @@ -2485,35 +2510,23 @@ STATIC int
>  xlog_recover_process_intents(
>  	struct xlog		*log)
>  {
> -	struct xfs_trans	*parent_tp;
> +	LIST_HEAD(capture_list);
>  	struct xfs_ail_cursor	cur;
>  	struct xfs_log_item	*lip;
>  	struct xfs_ail		*ailp;
> -	int			error;
> +	int			error = 0;
>  #if defined(DEBUG) || defined(XFS_WARN)
>  	xfs_lsn_t		last_lsn;
>  #endif
>  
> -	/*
> -	 * The intent recovery handlers commit transactions to complete recovery
> -	 * for individual intents, but any new deferred operations that are
> -	 * queued during that process are held off until the very end. The
> -	 * purpose of this transaction is to serve as a container for deferred
> -	 * operations. Each intent recovery handler must transfer dfops here
> -	 * before its local transaction commits, and we'll finish the entire
> -	 * list below.
> -	 */
> -	error = xfs_trans_alloc_empty(log->l_mp, &parent_tp);
> -	if (error)
> -		return error;
> -
>  	ailp = log->l_ailp;
>  	spin_lock(&ailp->ail_lock);
> -	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
>  #if defined(DEBUG) || defined(XFS_WARN)
>  	last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block);
>  #endif
> -	while (lip != NULL) {
> +	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
> +	     lip != NULL;
> +	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
>  		/*
>  		 * We're done when we see something other than an intent.
>  		 * There should be no intents left in the AIL now.
> @@ -2535,24 +2548,29 @@ xlog_recover_process_intents(
>  
>  		/*
>  		 * NOTE: If your intent processing routine can create more
> -		 * deferred ops, you /must/ attach them to the transaction in
> -		 * this routine or else those subsequent intents will get
> +		 * deferred ops, you /must/ attach them to the capture list in
> +		 * the recover routine or else those subsequent intents will be
>  		 * replayed in the wrong order!
>  		 */
>  		spin_unlock(&ailp->ail_lock);
> -		error = lip->li_ops->iop_recover(lip, parent_tp);
> +		error = lip->li_ops->iop_recover(lip, &capture_list);
>  		spin_lock(&ailp->ail_lock);
>  		if (error)
> -			goto out;
> -		lip = xfs_trans_ail_cursor_next(ailp, &cur);
> +			break;
>  	}
> -out:
> +
>  	xfs_trans_ail_cursor_done(&cur);
>  	spin_unlock(&ailp->ail_lock);
> -	if (!error)
> -		error = xlog_finish_defer_ops(parent_tp);
> -	xfs_trans_cancel(parent_tp);
> +	if (error)
> +		goto err;
>  
> +	error = xlog_finish_defer_ops(log->l_mp, &capture_list);
> +	if (error)
> +		goto err;
> +
> +	return 0;
> +err:
> +	xlog_abort_defer_ops(log->l_mp, &capture_list);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> index 3e34b7662361..0478374add64 100644
> --- a/fs/xfs/xfs_refcount_item.c
> +++ b/fs/xfs/xfs_refcount_item.c
> @@ -424,7 +424,7 @@ const struct xfs_defer_op_type xfs_refcount_update_defer_type = {
>  STATIC int
>  xfs_cui_item_recover(
>  	struct xfs_log_item		*lip,
> -	struct xfs_trans		*parent_tp)
> +	struct list_head		*capture_list)
>  {
>  	struct xfs_bmbt_irec		irec;
>  	struct xfs_cui_log_item		*cuip = CUI_ITEM(lip);
> @@ -432,7 +432,7 @@ xfs_cui_item_recover(
>  	struct xfs_cud_log_item		*cudp;
>  	struct xfs_trans		*tp;
>  	struct xfs_btree_cur		*rcur = NULL;
> -	struct xfs_mount		*mp = parent_tp->t_mountp;
> +	struct xfs_mount		*mp = lip->li_mountp;
>  	xfs_fsblock_t			startblock_fsb;
>  	xfs_fsblock_t			new_fsb;
>  	xfs_extlen_t			new_len;
> @@ -487,12 +487,7 @@ xfs_cui_item_recover(
>  			mp->m_refc_maxlevels * 2, 0, XFS_TRANS_RESERVE, &tp);
>  	if (error)
>  		return error;
> -	/*
> -	 * Recovery stashes all deferred ops during intent processing and
> -	 * finishes them on completion. Transfer current dfops state to this
> -	 * transaction and transfer the result back before we return.
> -	 */
> -	xfs_defer_move(tp, parent_tp);
> +
>  	cudp = xfs_trans_get_cud(tp, cuip);
>  
>  	for (i = 0; i < cuip->cui_format.cui_nextents; i++) {
> @@ -549,13 +544,10 @@ xfs_cui_item_recover(
>  	}
>  
>  	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> -	xfs_defer_capture(parent_tp, tp);
> -	error = xfs_trans_commit(tp);
> -	return error;
> +	return xfs_defer_ops_capture_and_commit(tp, capture_list);
>  
>  abort_error:
>  	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> -	xfs_defer_move(parent_tp, tp);
>  	xfs_trans_cancel(tp);
>  	return error;
>  }
> diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> index e38ec5d736be..0d8fa707f079 100644
> --- a/fs/xfs/xfs_rmap_item.c
> +++ b/fs/xfs/xfs_rmap_item.c
> @@ -467,14 +467,14 @@ const struct xfs_defer_op_type xfs_rmap_update_defer_type = {
>  STATIC int
>  xfs_rui_item_recover(
>  	struct xfs_log_item		*lip,
> -	struct xfs_trans		*parent_tp)
> +	struct list_head		*capture_list)
>  {
>  	struct xfs_rui_log_item		*ruip = RUI_ITEM(lip);
>  	struct xfs_map_extent		*rmap;
>  	struct xfs_rud_log_item		*rudp;
>  	struct xfs_trans		*tp;
>  	struct xfs_btree_cur		*rcur = NULL;
> -	struct xfs_mount		*mp = parent_tp->t_mountp;
> +	struct xfs_mount		*mp = lip->li_mountp;
>  	xfs_fsblock_t			startblock_fsb;
>  	enum xfs_rmap_intent_type	type;
>  	xfs_exntst_t			state;
> @@ -567,8 +567,7 @@ xfs_rui_item_recover(
>  	}
>  
>  	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> -	error = xfs_trans_commit(tp);
> -	return error;
> +	return xfs_defer_ops_capture_and_commit(tp, capture_list);
>  
>  abort_error:
>  	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index ced62a35a62b..186e77d08cc1 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -72,7 +72,8 @@ struct xfs_item_ops {
>  	void (*iop_committing)(struct xfs_log_item *, xfs_lsn_t commit_lsn);
>  	void (*iop_release)(struct xfs_log_item *);
>  	xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t);
> -	int (*iop_recover)(struct xfs_log_item *lip, struct xfs_trans *tp);
> +	int (*iop_recover)(struct xfs_log_item *lip,
> +			   struct list_head *capture_list);
>  	bool (*iop_match)(struct xfs_log_item *item, uint64_t id);
>  };
>  
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/5] xfs: xfs_defer_capture should absorb remaining block reservation
  2020-09-29 17:43 ` [PATCH 4/5] xfs: xfs_defer_capture should absorb remaining block reservation Darrick J. Wong
@ 2020-10-01 17:32   ` Brian Foster
  2020-10-01 17:46     ` Darrick J. Wong
  2020-10-02  4:20   ` [PATCH v4.2 " Darrick J. Wong
  1 sibling, 1 reply; 24+ messages in thread
From: Brian Foster @ 2020-10-01 17:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, Dave Chinner, linux-xfs, david

On Tue, Sep 29, 2020 at 10:43:38AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When xfs_defer_capture extracts the deferred ops and transaction state
> from a transaction, it should absorb the remaining block reservation so
> that when we continue the dfops chain, we still have those blocks to
> use.
> 
> This adds the requirement that every log intent item recovery function
> must be careful to reserve enough blocks to handle both itself and all
> defer ops that it can queue.  On the other hand, this enables us to do
> away with the handwaving block estimation nonsense that was going on in
> xlog_finish_defer_ops.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c |    5 +++++
>  fs/xfs/libxfs/xfs_defer.h |    1 +
>  fs/xfs/xfs_log_recover.c  |   18 +-----------------
>  3 files changed, 7 insertions(+), 17 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 85c371d29e8d..0cceebb390c4 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -575,6 +575,10 @@ xfs_defer_ops_capture(
>  	dfc->dfc_tpflags = tp->t_flags & XFS_TRANS_LOWMODE;
>  	tp->t_flags &= ~XFS_TRANS_LOWMODE;
>  
> +	/* Capture the block reservation along with the dfops. */
> +	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
> +	tp->t_blk_res = tp->t_blk_res_used;
> +
>  	return dfc;
>  }
>  
> @@ -632,6 +636,7 @@ xfs_defer_ops_continue(
>  	/* Move captured dfops chain and state to the transaction. */
>  	list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
>  	tp->t_flags |= dfc->dfc_tpflags;
> +	tp->t_blk_res += dfc->dfc_blkres;
>  
>  	kmem_free(dfc);
>  }

Seems sane, but I'm curious why we need to modify the transactions
directly in both of these contexts. Rather than building up and holding
a growing block reservation across transactions during intent
processing, could we just sample the unused blocks in the transaction at
capture time and use that as a resblks parameter when we allocate the
transaction to continue the chain? Then we at least have some validation
via the traditional allocation path if we ever screw up the accounting..

Brian

> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 3af82ebc1249..b1c7b761afd5 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -75,6 +75,7 @@ struct xfs_defer_capture {
>  	/* Deferred ops state saved from the transaction. */
>  	struct list_head	dfc_dfops;
>  	unsigned int		dfc_tpflags;
> +	unsigned int		dfc_blkres;
>  };
>  
>  /*
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 550d0fa8057a..b06c9881a13d 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2439,26 +2439,10 @@ xlog_finish_defer_ops(
>  {
>  	struct xfs_defer_capture *dfc, *next;
>  	struct xfs_trans	*tp;
> -	int64_t			freeblks;
> -	uint64_t		resblks;
>  	int			error = 0;
>  
>  	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> -		/*
> -		 * We're finishing the defer_ops that accumulated as a result
> -		 * of recovering unfinished intent items during log recovery.
> -		 * We reserve an itruncate transaction because it is the
> -		 * largest permanent transaction type.  Since we're the only
> -		 * user of the fs right now, take 93% (15/16) of the available
> -		 * free blocks.  Use weird math to avoid a 64-bit division.
> -		 */
> -		freeblks = percpu_counter_sum(&mp->m_fdblocks);
> -		if (freeblks <= 0)
> -			return -ENOSPC;
> -
> -		resblks = min_t(uint64_t, UINT_MAX, freeblks);
> -		resblks = (resblks * 15) >> 4;
> -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> +		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0,
>  				0, XFS_TRANS_RESERVE, &tp);
>  		if (error)
>  			return error;
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/5] xfs: xfs_defer_capture should absorb remaining transaction reservation
  2020-09-29 17:43 ` [PATCH 5/5] xfs: xfs_defer_capture should absorb remaining transaction reservation Darrick J. Wong
@ 2020-10-01 17:32   ` Brian Foster
  2020-10-01 17:52     ` Darrick J. Wong
  2020-10-02  4:21   ` [PATCH v4.2 " Darrick J. Wong
  1 sibling, 1 reply; 24+ messages in thread
From: Brian Foster @ 2020-10-01 17:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, hch

On Tue, Sep 29, 2020 at 10:43:44AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When xfs_defer_capture extracts the deferred ops and transaction state
> from a transaction, it should record the transaction reservation type
> from the old transaction so that when we continue the dfops chain, we
> still use the same reservation parameters.
> 
> Doing this means that the log item recovery functions get to determine
> the transaction reservation instead of abusing tr_itruncate in yet
> another part of xfs.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Much nicer, and FWIW this is pretty much the approach I was wondering
about wrt to the block reservation in the previous patch..

>  fs/xfs/libxfs/xfs_defer.c |    9 +++++++++
>  fs/xfs/libxfs/xfs_defer.h |    1 +
>  fs/xfs/xfs_log_recover.c  |    4 ++--
>  3 files changed, 12 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 0cceebb390c4..4caaf5527403 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -579,6 +579,15 @@ xfs_defer_ops_capture(
>  	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
>  	tp->t_blk_res = tp->t_blk_res_used;
>  
> +	/*
> +	 * Preserve the transaction reservation type.  The logcount is
> +	 * hardwired to 1 to so that we can make forward progress in recovery
> +	 * no matter how full the log might be, at a cost of more regrants.
> +	 */
> +	dfc->dfc_tres.tr_logres = tp->t_log_res;
> +	dfc->dfc_tres.tr_logcount = 1;
> +	dfc->dfc_tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;

Any real need to allocate these last two fields in every captured chain
when they're basically hardcoded? If not, it might be a bit more
efficient to put an xfs_trans_res on the stack in
xlog_finish_defer_ops() and just save the logres value here.

Brian

> +
>  	return dfc;
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index b1c7b761afd5..c447c79bbe74 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -76,6 +76,7 @@ struct xfs_defer_capture {
>  	struct list_head	dfc_dfops;
>  	unsigned int		dfc_tpflags;
>  	unsigned int		dfc_blkres;
> +	struct xfs_trans_res	dfc_tres;
>  };
>  
>  /*
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index b06c9881a13d..46e750279634 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2442,8 +2442,8 @@ xlog_finish_defer_ops(
>  	int			error = 0;
>  
>  	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0,
> -				0, XFS_TRANS_RESERVE, &tp);
> +		error = xfs_trans_alloc(mp, &dfc->dfc_tres, 0, 0,
> +				XFS_TRANS_RESERVE, &tp);
>  		if (error)
>  			return error;
>  
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/5] xfs: proper replay of deferred ops queued during log recovery
  2020-10-01 17:31   ` Brian Foster
@ 2020-10-01 17:41     ` Darrick J. Wong
  0 siblings, 0 replies; 24+ messages in thread
From: Darrick J. Wong @ 2020-10-01 17:41 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, david, hch

On Thu, Oct 01, 2020 at 01:31:42PM -0400, Brian Foster wrote:
> On Tue, Sep 29, 2020 at 10:43:31AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When we replay unfinished intent items that have been recovered from the
> > log, it's possible that the replay will cause the creation of more
> > deferred work items.  As outlined in commit 509955823cc9c ("xfs: log
> > recovery should replay deferred ops in order"), later work items have an
> > implicit ordering dependency on earlier work items.  Therefore, recovery
> > must replay the items (both recovered and created) in the same order
> > that they would have been during normal operation.
> > 
> > For log recovery, we enforce this ordering by using an empty transaction
> > to collect deferred ops that get created in the process of recovering a
> > log intent item to prevent them from being committed before the rest of
> > the recovered intent items.  After we finish committing all the
> > recovered log items, we allocate a transaction with an enormous block
> > reservation, splice our huge list of created deferred ops into that
> > transaction, and commit it, thereby finishing all those ops.
> > 
> > This is /really/ hokey -- it's the one place in XFS where we allow
> > nested transactions; the splicing of the defer ops list is is inelegant
> > and has to be done twice per recovery function; and the broken way we
> > handle inode pointers and block reservations cause subtle use-after-free
> > and allocator problems that will be fixed by this patch and the two
> > patches after it.
> > 
> > Therefore, replace the hokey empty transaction with a structure designed
> > to capture each chain of deferred ops that are created as part of
> > recovering a single unfinished log intent.  Finally, refactor the loop
> > that replays those chains to do so using one transaction per chain.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_defer.c  |   89 +++++++++++++++++++++++++++++++--
> >  fs/xfs/libxfs/xfs_defer.h  |   19 +++++++
> >  fs/xfs/xfs_bmap_item.c     |   16 +-----
> >  fs/xfs/xfs_extfree_item.c  |    7 +--
> >  fs/xfs/xfs_log_recover.c   |  118 +++++++++++++++++++++++++-------------------
> >  fs/xfs/xfs_refcount_item.c |   16 +-----
> >  fs/xfs/xfs_rmap_item.c     |    7 +--
> >  fs/xfs/xfs_trans.h         |    3 +
> >  8 files changed, 184 insertions(+), 91 deletions(-)
> > 
> > 
> ...
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index 84f876c6d498..550d0fa8057a 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -2434,37 +2434,62 @@ xlog_recover_process_data(
> >  /* Take all the collected deferred ops and finish them in order. */
> >  static int
> >  xlog_finish_defer_ops(
> > -	struct xfs_trans	*parent_tp)
> > +	struct xfs_mount	*mp,
> > +	struct list_head	*capture_list)
> >  {
> > -	struct xfs_mount	*mp = parent_tp->t_mountp;
> > +	struct xfs_defer_capture *dfc, *next;
> >  	struct xfs_trans	*tp;
> >  	int64_t			freeblks;
> > -	uint			resblks;
> > -	int			error;
> > +	uint64_t		resblks;
> > +	int			error = 0;
> >  
> > -	/*
> > -	 * We're finishing the defer_ops that accumulated as a result of
> > -	 * recovering unfinished intent items during log recovery.  We
> > -	 * reserve an itruncate transaction because it is the largest
> > -	 * permanent transaction type.  Since we're the only user of the fs
> > -	 * right now, take 93% (15/16) of the available free blocks.  Use
> > -	 * weird math to avoid a 64-bit division.
> > -	 */
> > -	freeblks = percpu_counter_sum(&mp->m_fdblocks);
> > -	if (freeblks <= 0)
> > -		return -ENOSPC;
> > -	resblks = min_t(int64_t, UINT_MAX, freeblks);
> > -	resblks = (resblks * 15) >> 4;
> > -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> > -			0, XFS_TRANS_RESERVE, &tp);
> > -	if (error)
> > -		return error;
> > -	/* transfer all collected dfops to this transaction */
> > -	xfs_defer_move(tp, parent_tp);
> > +	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> > +		/*
> > +		 * We're finishing the defer_ops that accumulated as a result
> > +		 * of recovering unfinished intent items during log recovery.
> > +		 * We reserve an itruncate transaction because it is the
> > +		 * largest permanent transaction type.  Since we're the only
> > +		 * user of the fs right now, take 93% (15/16) of the available
> > +		 * free blocks.  Use weird math to avoid a 64-bit division.
> > +		 */
> > +		freeblks = percpu_counter_sum(&mp->m_fdblocks);
> > +		if (freeblks <= 0)
> > +			return -ENOSPC;
> >  
> > -	return xfs_trans_commit(tp);
> > +		resblks = min_t(uint64_t, UINT_MAX, freeblks);
> > +		resblks = (resblks * 15) >> 4;
> > +		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> > +				0, XFS_TRANS_RESERVE, &tp);
> > +		if (error)
> > +			return error;
> > +
> > +		/* Transfer all collected dfops to this transaction. */
> 
> This old comment is a little misleading now that we have per-intent
> capture lists. That nit aside, this looks good to me and the updated
> factoring is easier to follow:

I'll change it to:

		/*
		 * Transfer to this new transaction all the dfops we captured
		 * from recovering a single intent item.
		 */

> Reviewed-by: Brian Foster <bfoster@redhat.com>

Thanks for the review!

--D

> 
> > +		list_del_init(&dfc->dfc_list);
> > +		xfs_defer_ops_continue(dfc, tp);
> > +
> > +		error = xfs_trans_commit(tp);
> > +		if (error)
> > +			return error;
> > +	}
> > +
> > +	ASSERT(list_empty(capture_list));
> > +	return 0;
> >  }
> >  
> > +/* Release all the captured defer ops and capture structures in this list. */
> > +static void
> > +xlog_abort_defer_ops(
> > +	struct xfs_mount		*mp,
> > +	struct list_head		*capture_list)
> > +{
> > +	struct xfs_defer_capture	*dfc;
> > +	struct xfs_defer_capture	*next;
> > +
> > +	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> > +		list_del_init(&dfc->dfc_list);
> > +		xfs_defer_ops_release(mp, dfc);
> > +	}
> > +}
> >  /*
> >   * When this is called, all of the log intent items which did not have
> >   * corresponding log done items should be in the AIL.  What we do now
> > @@ -2485,35 +2510,23 @@ STATIC int
> >  xlog_recover_process_intents(
> >  	struct xlog		*log)
> >  {
> > -	struct xfs_trans	*parent_tp;
> > +	LIST_HEAD(capture_list);
> >  	struct xfs_ail_cursor	cur;
> >  	struct xfs_log_item	*lip;
> >  	struct xfs_ail		*ailp;
> > -	int			error;
> > +	int			error = 0;
> >  #if defined(DEBUG) || defined(XFS_WARN)
> >  	xfs_lsn_t		last_lsn;
> >  #endif
> >  
> > -	/*
> > -	 * The intent recovery handlers commit transactions to complete recovery
> > -	 * for individual intents, but any new deferred operations that are
> > -	 * queued during that process are held off until the very end. The
> > -	 * purpose of this transaction is to serve as a container for deferred
> > -	 * operations. Each intent recovery handler must transfer dfops here
> > -	 * before its local transaction commits, and we'll finish the entire
> > -	 * list below.
> > -	 */
> > -	error = xfs_trans_alloc_empty(log->l_mp, &parent_tp);
> > -	if (error)
> > -		return error;
> > -
> >  	ailp = log->l_ailp;
> >  	spin_lock(&ailp->ail_lock);
> > -	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
> >  #if defined(DEBUG) || defined(XFS_WARN)
> >  	last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block);
> >  #endif
> > -	while (lip != NULL) {
> > +	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
> > +	     lip != NULL;
> > +	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
> >  		/*
> >  		 * We're done when we see something other than an intent.
> >  		 * There should be no intents left in the AIL now.
> > @@ -2535,24 +2548,29 @@ xlog_recover_process_intents(
> >  
> >  		/*
> >  		 * NOTE: If your intent processing routine can create more
> > -		 * deferred ops, you /must/ attach them to the transaction in
> > -		 * this routine or else those subsequent intents will get
> > +		 * deferred ops, you /must/ attach them to the capture list in
> > +		 * the recover routine or else those subsequent intents will be
> >  		 * replayed in the wrong order!
> >  		 */
> >  		spin_unlock(&ailp->ail_lock);
> > -		error = lip->li_ops->iop_recover(lip, parent_tp);
> > +		error = lip->li_ops->iop_recover(lip, &capture_list);
> >  		spin_lock(&ailp->ail_lock);
> >  		if (error)
> > -			goto out;
> > -		lip = xfs_trans_ail_cursor_next(ailp, &cur);
> > +			break;
> >  	}
> > -out:
> > +
> >  	xfs_trans_ail_cursor_done(&cur);
> >  	spin_unlock(&ailp->ail_lock);
> > -	if (!error)
> > -		error = xlog_finish_defer_ops(parent_tp);
> > -	xfs_trans_cancel(parent_tp);
> > +	if (error)
> > +		goto err;
> >  
> > +	error = xlog_finish_defer_ops(log->l_mp, &capture_list);
> > +	if (error)
> > +		goto err;
> > +
> > +	return 0;
> > +err:
> > +	xlog_abort_defer_ops(log->l_mp, &capture_list);
> >  	return error;
> >  }
> >  
> > diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
> > index 3e34b7662361..0478374add64 100644
> > --- a/fs/xfs/xfs_refcount_item.c
> > +++ b/fs/xfs/xfs_refcount_item.c
> > @@ -424,7 +424,7 @@ const struct xfs_defer_op_type xfs_refcount_update_defer_type = {
> >  STATIC int
> >  xfs_cui_item_recover(
> >  	struct xfs_log_item		*lip,
> > -	struct xfs_trans		*parent_tp)
> > +	struct list_head		*capture_list)
> >  {
> >  	struct xfs_bmbt_irec		irec;
> >  	struct xfs_cui_log_item		*cuip = CUI_ITEM(lip);
> > @@ -432,7 +432,7 @@ xfs_cui_item_recover(
> >  	struct xfs_cud_log_item		*cudp;
> >  	struct xfs_trans		*tp;
> >  	struct xfs_btree_cur		*rcur = NULL;
> > -	struct xfs_mount		*mp = parent_tp->t_mountp;
> > +	struct xfs_mount		*mp = lip->li_mountp;
> >  	xfs_fsblock_t			startblock_fsb;
> >  	xfs_fsblock_t			new_fsb;
> >  	xfs_extlen_t			new_len;
> > @@ -487,12 +487,7 @@ xfs_cui_item_recover(
> >  			mp->m_refc_maxlevels * 2, 0, XFS_TRANS_RESERVE, &tp);
> >  	if (error)
> >  		return error;
> > -	/*
> > -	 * Recovery stashes all deferred ops during intent processing and
> > -	 * finishes them on completion. Transfer current dfops state to this
> > -	 * transaction and transfer the result back before we return.
> > -	 */
> > -	xfs_defer_move(tp, parent_tp);
> > +
> >  	cudp = xfs_trans_get_cud(tp, cuip);
> >  
> >  	for (i = 0; i < cuip->cui_format.cui_nextents; i++) {
> > @@ -549,13 +544,10 @@ xfs_cui_item_recover(
> >  	}
> >  
> >  	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> > -	xfs_defer_capture(parent_tp, tp);
> > -	error = xfs_trans_commit(tp);
> > -	return error;
> > +	return xfs_defer_ops_capture_and_commit(tp, capture_list);
> >  
> >  abort_error:
> >  	xfs_refcount_finish_one_cleanup(tp, rcur, error);
> > -	xfs_defer_move(parent_tp, tp);
> >  	xfs_trans_cancel(tp);
> >  	return error;
> >  }
> > diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
> > index e38ec5d736be..0d8fa707f079 100644
> > --- a/fs/xfs/xfs_rmap_item.c
> > +++ b/fs/xfs/xfs_rmap_item.c
> > @@ -467,14 +467,14 @@ const struct xfs_defer_op_type xfs_rmap_update_defer_type = {
> >  STATIC int
> >  xfs_rui_item_recover(
> >  	struct xfs_log_item		*lip,
> > -	struct xfs_trans		*parent_tp)
> > +	struct list_head		*capture_list)
> >  {
> >  	struct xfs_rui_log_item		*ruip = RUI_ITEM(lip);
> >  	struct xfs_map_extent		*rmap;
> >  	struct xfs_rud_log_item		*rudp;
> >  	struct xfs_trans		*tp;
> >  	struct xfs_btree_cur		*rcur = NULL;
> > -	struct xfs_mount		*mp = parent_tp->t_mountp;
> > +	struct xfs_mount		*mp = lip->li_mountp;
> >  	xfs_fsblock_t			startblock_fsb;
> >  	enum xfs_rmap_intent_type	type;
> >  	xfs_exntst_t			state;
> > @@ -567,8 +567,7 @@ xfs_rui_item_recover(
> >  	}
> >  
> >  	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> > -	error = xfs_trans_commit(tp);
> > -	return error;
> > +	return xfs_defer_ops_capture_and_commit(tp, capture_list);
> >  
> >  abort_error:
> >  	xfs_rmap_finish_one_cleanup(tp, rcur, error);
> > diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> > index ced62a35a62b..186e77d08cc1 100644
> > --- a/fs/xfs/xfs_trans.h
> > +++ b/fs/xfs/xfs_trans.h
> > @@ -72,7 +72,8 @@ struct xfs_item_ops {
> >  	void (*iop_committing)(struct xfs_log_item *, xfs_lsn_t commit_lsn);
> >  	void (*iop_release)(struct xfs_log_item *);
> >  	xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t);
> > -	int (*iop_recover)(struct xfs_log_item *lip, struct xfs_trans *tp);
> > +	int (*iop_recover)(struct xfs_log_item *lip,
> > +			   struct list_head *capture_list);
> >  	bool (*iop_match)(struct xfs_log_item *item, uint64_t id);
> >  };
> >  
> > 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/5] xfs: xfs_defer_capture should absorb remaining block reservation
  2020-10-01 17:32   ` Brian Foster
@ 2020-10-01 17:46     ` Darrick J. Wong
  0 siblings, 0 replies; 24+ messages in thread
From: Darrick J. Wong @ 2020-10-01 17:46 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, Dave Chinner, linux-xfs, david

On Thu, Oct 01, 2020 at 01:32:24PM -0400, Brian Foster wrote:
> On Tue, Sep 29, 2020 at 10:43:38AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When xfs_defer_capture extracts the deferred ops and transaction state
> > from a transaction, it should absorb the remaining block reservation so
> > that when we continue the dfops chain, we still have those blocks to
> > use.
> > 
> > This adds the requirement that every log intent item recovery function
> > must be careful to reserve enough blocks to handle both itself and all
> > defer ops that it can queue.  On the other hand, this enables us to do
> > away with the handwaving block estimation nonsense that was going on in
> > xlog_finish_defer_ops.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_defer.c |    5 +++++
> >  fs/xfs/libxfs/xfs_defer.h |    1 +
> >  fs/xfs/xfs_log_recover.c  |   18 +-----------------
> >  3 files changed, 7 insertions(+), 17 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > index 85c371d29e8d..0cceebb390c4 100644
> > --- a/fs/xfs/libxfs/xfs_defer.c
> > +++ b/fs/xfs/libxfs/xfs_defer.c
> > @@ -575,6 +575,10 @@ xfs_defer_ops_capture(
> >  	dfc->dfc_tpflags = tp->t_flags & XFS_TRANS_LOWMODE;
> >  	tp->t_flags &= ~XFS_TRANS_LOWMODE;
> >  
> > +	/* Capture the block reservation along with the dfops. */
> > +	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
> > +	tp->t_blk_res = tp->t_blk_res_used;
> > +
> >  	return dfc;
> >  }
> >  
> > @@ -632,6 +636,7 @@ xfs_defer_ops_continue(
> >  	/* Move captured dfops chain and state to the transaction. */
> >  	list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
> >  	tp->t_flags |= dfc->dfc_tpflags;
> > +	tp->t_blk_res += dfc->dfc_blkres;
> >  
> >  	kmem_free(dfc);
> >  }
> 
> Seems sane, but I'm curious why we need to modify the transactions
> directly in both of these contexts. Rather than building up and holding
> a growing block reservation across transactions during intent
> processing, could we just sample the unused blocks in the transaction at
> capture time and use that as a resblks parameter when we allocate the
> transaction to continue the chain? Then we at least have some validation
> via the traditional allocation path if we ever screw up the accounting..

Good idea.  I'll also make this patch save the rt block reservation.

--D

> Brian
> 
> > diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> > index 3af82ebc1249..b1c7b761afd5 100644
> > --- a/fs/xfs/libxfs/xfs_defer.h
> > +++ b/fs/xfs/libxfs/xfs_defer.h
> > @@ -75,6 +75,7 @@ struct xfs_defer_capture {
> >  	/* Deferred ops state saved from the transaction. */
> >  	struct list_head	dfc_dfops;
> >  	unsigned int		dfc_tpflags;
> > +	unsigned int		dfc_blkres;
> >  };
> >  
> >  /*
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index 550d0fa8057a..b06c9881a13d 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -2439,26 +2439,10 @@ xlog_finish_defer_ops(
> >  {
> >  	struct xfs_defer_capture *dfc, *next;
> >  	struct xfs_trans	*tp;
> > -	int64_t			freeblks;
> > -	uint64_t		resblks;
> >  	int			error = 0;
> >  
> >  	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> > -		/*
> > -		 * We're finishing the defer_ops that accumulated as a result
> > -		 * of recovering unfinished intent items during log recovery.
> > -		 * We reserve an itruncate transaction because it is the
> > -		 * largest permanent transaction type.  Since we're the only
> > -		 * user of the fs right now, take 93% (15/16) of the available
> > -		 * free blocks.  Use weird math to avoid a 64-bit division.
> > -		 */
> > -		freeblks = percpu_counter_sum(&mp->m_fdblocks);
> > -		if (freeblks <= 0)
> > -			return -ENOSPC;
> > -
> > -		resblks = min_t(uint64_t, UINT_MAX, freeblks);
> > -		resblks = (resblks * 15) >> 4;
> > -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> > +		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0,
> >  				0, XFS_TRANS_RESERVE, &tp);
> >  		if (error)
> >  			return error;
> > 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/5] xfs: xfs_defer_capture should absorb remaining transaction reservation
  2020-10-01 17:32   ` Brian Foster
@ 2020-10-01 17:52     ` Darrick J. Wong
  0 siblings, 0 replies; 24+ messages in thread
From: Darrick J. Wong @ 2020-10-01 17:52 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, david, hch

On Thu, Oct 01, 2020 at 01:32:56PM -0400, Brian Foster wrote:
> On Tue, Sep 29, 2020 at 10:43:44AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When xfs_defer_capture extracts the deferred ops and transaction state
> > from a transaction, it should record the transaction reservation type
> > from the old transaction so that when we continue the dfops chain, we
> > still use the same reservation parameters.
> > 
> > Doing this means that the log item recovery functions get to determine
> > the transaction reservation instead of abusing tr_itruncate in yet
> > another part of xfs.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> Much nicer, and FWIW this is pretty much the approach I was wondering
> about wrt to the block reservation in the previous patch..
> 
> >  fs/xfs/libxfs/xfs_defer.c |    9 +++++++++
> >  fs/xfs/libxfs/xfs_defer.h |    1 +
> >  fs/xfs/xfs_log_recover.c  |    4 ++--
> >  3 files changed, 12 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > index 0cceebb390c4..4caaf5527403 100644
> > --- a/fs/xfs/libxfs/xfs_defer.c
> > +++ b/fs/xfs/libxfs/xfs_defer.c
> > @@ -579,6 +579,15 @@ xfs_defer_ops_capture(
> >  	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
> >  	tp->t_blk_res = tp->t_blk_res_used;
> >  
> > +	/*
> > +	 * Preserve the transaction reservation type.  The logcount is
> > +	 * hardwired to 1 to so that we can make forward progress in recovery
> > +	 * no matter how full the log might be, at a cost of more regrants.
> > +	 */
> > +	dfc->dfc_tres.tr_logres = tp->t_log_res;
> > +	dfc->dfc_tres.tr_logcount = 1;
> > +	dfc->dfc_tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
> 
> Any real need to allocate these last two fields in every captured chain
> when they're basically hardcoded? If not, it might be a bit more
> efficient to put an xfs_trans_res on the stack in
> xlog_finish_defer_ops() and just save the logres value here.

Ok, will do.

--D

> 
> Brian
> 
> > +
> >  	return dfc;
> >  }
> >  
> > diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> > index b1c7b761afd5..c447c79bbe74 100644
> > --- a/fs/xfs/libxfs/xfs_defer.h
> > +++ b/fs/xfs/libxfs/xfs_defer.h
> > @@ -76,6 +76,7 @@ struct xfs_defer_capture {
> >  	struct list_head	dfc_dfops;
> >  	unsigned int		dfc_tpflags;
> >  	unsigned int		dfc_blkres;
> > +	struct xfs_trans_res	dfc_tres;
> >  };
> >  
> >  /*
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index b06c9881a13d..46e750279634 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -2442,8 +2442,8 @@ xlog_finish_defer_ops(
> >  	int			error = 0;
> >  
> >  	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> > -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0,
> > -				0, XFS_TRANS_RESERVE, &tp);
> > +		error = xfs_trans_alloc(mp, &dfc->dfc_tres, 0, 0,
> > +				XFS_TRANS_RESERVE, &tp);
> >  		if (error)
> >  			return error;
> >  
> > 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4.2 4/5] xfs: xfs_defer_capture should absorb remaining block reservation
  2020-09-29 17:43 ` [PATCH 4/5] xfs: xfs_defer_capture should absorb remaining block reservation Darrick J. Wong
  2020-10-01 17:32   ` Brian Foster
@ 2020-10-02  4:20   ` Darrick J. Wong
  2020-10-02  7:22     ` Christoph Hellwig
  2020-10-02 10:39     ` Brian Foster
  1 sibling, 2 replies; 24+ messages in thread
From: Darrick J. Wong @ 2020-10-02  4:20 UTC (permalink / raw)
  To: Christoph Hellwig, Dave Chinner, linux-xfs, david, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should record the remaining block reservations so
that when we continue the dfops chain, we can reserve the same number of
blocks to use.  We capture the reservations for both data and realtime
volumes.

This adds the requirement that every log intent item recovery function
must be careful to reserve enough blocks to handle both itself and all
defer ops that it can queue.  On the other hand, this enables us to do
away with the handwaving block estimation nonsense that was going on in
xlog_finish_defer_ops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v4.2: don't fiddle with transaction internals, and save the rt
reservation too
---
 fs/xfs/libxfs/xfs_defer.c |    4 ++++
 fs/xfs/libxfs/xfs_defer.h |    4 ++++
 fs/xfs/xfs_log_recover.c  |   21 +++------------------
 3 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 85c371d29e8d..10aeae7353ab 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -575,6 +575,10 @@ xfs_defer_ops_capture(
 	dfc->dfc_tpflags = tp->t_flags & XFS_TRANS_LOWMODE;
 	tp->t_flags &= ~XFS_TRANS_LOWMODE;
 
+	/* Capture the remaining block reservations along with the dfops. */
+	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
+	dfc->dfc_rtxres = tp->t_rtx_res - tp->t_rtx_res_used;
+
 	return dfc;
 }
 
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 3af82ebc1249..5c0e59b69ffa 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -75,6 +75,10 @@ struct xfs_defer_capture {
 	/* Deferred ops state saved from the transaction. */
 	struct list_head	dfc_dfops;
 	unsigned int		dfc_tpflags;
+
+	/* Block reservations for the data and rt devices. */
+	unsigned int		dfc_blkres;
+	unsigned int		dfc_rtxres;
 };
 
 /*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 7804906d145b..1be5208e2a2f 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2439,27 +2439,12 @@ xlog_finish_defer_ops(
 {
 	struct xfs_defer_capture *dfc, *next;
 	struct xfs_trans	*tp;
-	int64_t			freeblks;
-	uint64_t		resblks;
 	int			error = 0;
 
 	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
-		/*
-		 * We're finishing the defer_ops that accumulated as a result
-		 * of recovering unfinished intent items during log recovery.
-		 * We reserve an itruncate transaction because it is the
-		 * largest permanent transaction type.  Since we're the only
-		 * user of the fs right now, take 93% (15/16) of the available
-		 * free blocks.  Use weird math to avoid a 64-bit division.
-		 */
-		freeblks = percpu_counter_sum(&mp->m_fdblocks);
-		if (freeblks <= 0)
-			return -ENOSPC;
-
-		resblks = min_t(uint64_t, UINT_MAX, freeblks);
-		resblks = (resblks * 15) >> 4;
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
-				0, XFS_TRANS_RESERVE, &tp);
+		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
+				dfc->dfc_blkres, dfc->dfc_rtxres,
+				XFS_TRANS_RESERVE, &tp);
 		if (error)
 			return error;
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4.2 5/5] xfs: xfs_defer_capture should absorb remaining transaction reservation
  2020-09-29 17:43 ` [PATCH 5/5] xfs: xfs_defer_capture should absorb remaining transaction reservation Darrick J. Wong
  2020-10-01 17:32   ` Brian Foster
@ 2020-10-02  4:21   ` Darrick J. Wong
  2020-10-02  7:24     ` Christoph Hellwig
  2020-10-02 10:39     ` Brian Foster
  1 sibling, 2 replies; 24+ messages in thread
From: Darrick J. Wong @ 2020-10-02  4:21 UTC (permalink / raw)
  To: linux-xfs, david, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should record the transaction reservation type
from the old transaction so that when we continue the dfops chain, we
still use the same reservation parameters.

Doing this means that the log item recovery functions get to determine
the transaction reservation instead of abusing tr_itruncate in yet
another part of xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v4.2: save only the log reservation, and hardcode logcount and flags.
---
 fs/xfs/libxfs/xfs_defer.c |    3 +++
 fs/xfs/libxfs/xfs_defer.h |    3 +++
 fs/xfs/xfs_log_recover.c  |   17 ++++++++++++++---
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 10aeae7353ab..e19dc1ced7e6 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -579,6 +579,9 @@ xfs_defer_ops_capture(
 	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
 	dfc->dfc_rtxres = tp->t_rtx_res - tp->t_rtx_res_used;
 
+	/* Preserve the log reservation size. */
+	dfc->dfc_logres = tp->t_log_res;
+
 	return dfc;
 }
 
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 5c0e59b69ffa..6cde6f0713f7 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -79,6 +79,9 @@ struct xfs_defer_capture {
 	/* Block reservations for the data and rt devices. */
 	unsigned int		dfc_blkres;
 	unsigned int		dfc_rtxres;
+
+	/* Log reservation saved from the transaction. */
+	unsigned int		dfc_logres;
 };
 
 /*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 1be5208e2a2f..001e1585ddc6 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2442,9 +2442,20 @@ xlog_finish_defer_ops(
 	int			error = 0;
 
 	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
-				dfc->dfc_blkres, dfc->dfc_rtxres,
-				XFS_TRANS_RESERVE, &tp);
+		struct xfs_trans_res	resv;
+
+		/*
+		 * Create a new transaction reservation from the captured
+		 * information.  Set logcount to 1 to force the new transaction
+		 * to regrant every roll so that we can make forward progress
+		 * in recovery no matter how full the log might be.
+		 */
+		resv.tr_logres = dfc->dfc_logres;
+		resv.tr_logcount = 1;
+		resv.tr_logflags = XFS_TRANS_PERM_LOG_RES;
+
+		error = xfs_trans_alloc(mp, &resv, dfc->dfc_blkres,
+				dfc->dfc_rtxres, XFS_TRANS_RESERVE, &tp);
 		if (error)
 			return error;
 

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/5] xfs: remove XFS_LI_RECOVERED
  2020-09-29 17:43 ` [PATCH 2/5] xfs: remove XFS_LI_RECOVERED Darrick J. Wong
  2020-10-01 17:30   ` Brian Foster
@ 2020-10-02  7:16   ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2020-10-02  7:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, hch, bfoster

On Tue, Sep 29, 2020 at 10:43:25AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> The ->iop_recover method of a log intent item removes the recovered
> intent item from the AIL by logging an intent done item and committing
> the transaction, so it's superfluous to have this flag check.  Nothing
> else uses it, so get rid of the flag entirely.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/5] xfs: proper replay of deferred ops queued during log recovery
  2020-09-29 17:43 ` [PATCH 3/5] xfs: proper replay of deferred ops queued during log recovery Darrick J. Wong
  2020-10-01 17:31   ` Brian Foster
@ 2020-10-02  7:18   ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2020-10-02  7:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, hch, bfoster

On Tue, Sep 29, 2020 at 10:43:31AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we replay unfinished intent items that have been recovered from the
> log, it's possible that the replay will cause the creation of more
> deferred work items.  As outlined in commit 509955823cc9c ("xfs: log
> recovery should replay deferred ops in order"), later work items have an
> implicit ordering dependency on earlier work items.  Therefore, recovery
> must replay the items (both recovered and created) in the same order
> that they would have been during normal operation.
> 
> For log recovery, we enforce this ordering by using an empty transaction
> to collect deferred ops that get created in the process of recovering a
> log intent item to prevent them from being committed before the rest of
> the recovered intent items.  After we finish committing all the
> recovered log items, we allocate a transaction with an enormous block
> reservation, splice our huge list of created deferred ops into that
> transaction, and commit it, thereby finishing all those ops.
> 
> This is /really/ hokey -- it's the one place in XFS where we allow
> nested transactions; the splicing of the defer ops list is is inelegant
> and has to be done twice per recovery function; and the broken way we
> handle inode pointers and block reservations cause subtle use-after-free
> and allocator problems that will be fixed by this patch and the two
> patches after it.
> 
> Therefore, replace the hokey empty transaction with a structure designed
> to capture each chain of deferred ops that are created as part of
> recovering a single unfinished log intent.  Finally, refactor the loop
> that replays those chains to do so using one transaction per chain.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c  |   89 +++++++++++++++++++++++++++++++--
>  fs/xfs/libxfs/xfs_defer.h  |   19 +++++++
>  fs/xfs/xfs_bmap_item.c     |   16 +-----
>  fs/xfs/xfs_extfree_item.c  |    7 +--
>  fs/xfs/xfs_log_recover.c   |  118 +++++++++++++++++++++++++-------------------
>  fs/xfs/xfs_refcount_item.c |   16 +-----
>  fs/xfs/xfs_rmap_item.c     |    7 +--
>  fs/xfs/xfs_trans.h         |    3 +
>  8 files changed, 184 insertions(+), 91 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 36c103c14bc9..85c371d29e8d 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -549,14 +549,89 @@ xfs_defer_move(
>   *
>   * Create and log intent items for all the work that we're capturing so that we
>   * can be assured that the items will get replayed if the system goes down
> - * before log recovery gets a chance to finish the work it put off.  Then we
> - * move the chain from stp to dtp.
> + * before log recovery gets a chance to finish the work it put off.  The entire
> + * deferred ops state is transferred to the capture structure and the
> + * transaction is then ready for the caller to commit it.  If there are no
> + * intent items to capture, this function returns NULL.
> + */
> +static struct xfs_defer_capture *
> +xfs_defer_ops_capture(
> +	struct xfs_trans		*tp)
> +{
> +	struct xfs_defer_capture	*dfc;
> +
> +	if (list_empty(&tp->t_dfops))
> +		return NULL;

Nit: keeping the list_empty check in the caller would seems more obvious
to me.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4.2 4/5] xfs: xfs_defer_capture should absorb remaining block reservation
  2020-10-02  4:20   ` [PATCH v4.2 " Darrick J. Wong
@ 2020-10-02  7:22     ` Christoph Hellwig
  2020-10-02 10:39     ` Brian Foster
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2020-10-02  7:22 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Dave Chinner, linux-xfs, david, bfoster

On Thu, Oct 01, 2020 at 09:20:15PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When xfs_defer_capture extracts the deferred ops and transaction state
> from a transaction, it should record the remaining block reservations so
> that when we continue the dfops chain, we can reserve the same number of
> blocks to use.  We capture the reservations for both data and realtime
> volumes.
> 
> This adds the requirement that every log intent item recovery function
> must be careful to reserve enough blocks to handle both itself and all
> defer ops that it can queue.  On the other hand, this enables us to do
> away with the handwaving block estimation nonsense that was going on in
> xlog_finish_defer_ops.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

I like this version better as well:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4.2 5/5] xfs: xfs_defer_capture should absorb remaining transaction reservation
  2020-10-02  4:21   ` [PATCH v4.2 " Darrick J. Wong
@ 2020-10-02  7:24     ` Christoph Hellwig
  2020-10-02  7:32       ` Christoph Hellwig
  2020-10-02 10:39     ` Brian Foster
  1 sibling, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2020-10-02  7:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, hch, bfoster

On Thu, Oct 01, 2020 at 09:21:03PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When xfs_defer_capture extracts the deferred ops and transaction state
> from a transaction, it should record the transaction reservation type
> from the old transaction so that when we continue the dfops chain, we
> still use the same reservation parameters.
> 
> Doing this means that the log item recovery functions get to determine
> the transaction reservation instead of abusing tr_itruncate in yet
> another part of xfs.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v4.2: save only the log reservation, and hardcode logcount and flags.
> ---
>  fs/xfs/libxfs/xfs_defer.c |    3 +++
>  fs/xfs/libxfs/xfs_defer.h |    3 +++
>  fs/xfs/xfs_log_recover.c  |   17 ++++++++++++++---
>  3 files changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 10aeae7353ab..e19dc1ced7e6 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -579,6 +579,9 @@ xfs_defer_ops_capture(
>  	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
>  	dfc->dfc_rtxres = tp->t_rtx_res - tp->t_rtx_res_used;
>  
> +	/* Preserve the log reservation size. */
> +	dfc->dfc_logres = tp->t_log_res;
> +
>  	return dfc;
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 5c0e59b69ffa..6cde6f0713f7 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -79,6 +79,9 @@ struct xfs_defer_capture {
>  	/* Block reservations for the data and rt devices. */
>  	unsigned int		dfc_blkres;
>  	unsigned int		dfc_rtxres;
> +
> +	/* Log reservation saved from the transaction. */
> +	unsigned int		dfc_logres;

> +		struct xfs_trans_res	resv;
> +
> +		/*
> +		 * Create a new transaction reservation from the captured
> +		 * information.  Set logcount to 1 to force the new transaction
> +		 * to regrant every roll so that we can make forward progress
> +		 * in recovery no matter how full the log might be.
> +		 */
> +		resv.tr_logres = dfc->dfc_logres;
> +		resv.tr_logcount = 1;
> +		resv.tr_logflags = XFS_TRANS_PERM_LOG_RES;
> +
> +		error = xfs_trans_alloc(mp, &resv, dfc->dfc_blkres,
> +				dfc->dfc_rtxres, XFS_TRANS_RESERVE, &tp);

What about just embedding the struct xfs_trans_res into
struct xfs_defer_capture directly?  That probably also means merging
this and the previous patch.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4.2 5/5] xfs: xfs_defer_capture should absorb remaining transaction reservation
  2020-10-02  7:24     ` Christoph Hellwig
@ 2020-10-02  7:32       ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2020-10-02  7:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, hch, bfoster

On Fri, Oct 02, 2020 at 09:24:16AM +0200, Christoph Hellwig wrote:
> What about just embedding the struct xfs_trans_res into
> struct xfs_defer_capture directly?  That probably also means merging
> this and the previous patch.

Strike that as it doesn't make a whole lot of sense.  More coffee..

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4.2 4/5] xfs: xfs_defer_capture should absorb remaining block reservation
  2020-10-02  4:20   ` [PATCH v4.2 " Darrick J. Wong
  2020-10-02  7:22     ` Christoph Hellwig
@ 2020-10-02 10:39     ` Brian Foster
  1 sibling, 0 replies; 24+ messages in thread
From: Brian Foster @ 2020-10-02 10:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, Dave Chinner, linux-xfs, david

On Thu, Oct 01, 2020 at 09:20:15PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When xfs_defer_capture extracts the deferred ops and transaction state
> from a transaction, it should record the remaining block reservations so
> that when we continue the dfops chain, we can reserve the same number of
> blocks to use.  We capture the reservations for both data and realtime
> volumes.
> 
> This adds the requirement that every log intent item recovery function
> must be careful to reserve enough blocks to handle both itself and all
> defer ops that it can queue.  On the other hand, this enables us to do
> away with the handwaving block estimation nonsense that was going on in
> xlog_finish_defer_ops.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v4.2: don't fiddle with transaction internals, and save the rt
> reservation too
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_defer.c |    4 ++++
>  fs/xfs/libxfs/xfs_defer.h |    4 ++++
>  fs/xfs/xfs_log_recover.c  |   21 +++------------------
>  3 files changed, 11 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 85c371d29e8d..10aeae7353ab 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -575,6 +575,10 @@ xfs_defer_ops_capture(
>  	dfc->dfc_tpflags = tp->t_flags & XFS_TRANS_LOWMODE;
>  	tp->t_flags &= ~XFS_TRANS_LOWMODE;
>  
> +	/* Capture the remaining block reservations along with the dfops. */
> +	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
> +	dfc->dfc_rtxres = tp->t_rtx_res - tp->t_rtx_res_used;
> +
>  	return dfc;
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 3af82ebc1249..5c0e59b69ffa 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -75,6 +75,10 @@ struct xfs_defer_capture {
>  	/* Deferred ops state saved from the transaction. */
>  	struct list_head	dfc_dfops;
>  	unsigned int		dfc_tpflags;
> +
> +	/* Block reservations for the data and rt devices. */
> +	unsigned int		dfc_blkres;
> +	unsigned int		dfc_rtxres;
>  };
>  
>  /*
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 7804906d145b..1be5208e2a2f 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2439,27 +2439,12 @@ xlog_finish_defer_ops(
>  {
>  	struct xfs_defer_capture *dfc, *next;
>  	struct xfs_trans	*tp;
> -	int64_t			freeblks;
> -	uint64_t		resblks;
>  	int			error = 0;
>  
>  	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> -		/*
> -		 * We're finishing the defer_ops that accumulated as a result
> -		 * of recovering unfinished intent items during log recovery.
> -		 * We reserve an itruncate transaction because it is the
> -		 * largest permanent transaction type.  Since we're the only
> -		 * user of the fs right now, take 93% (15/16) of the available
> -		 * free blocks.  Use weird math to avoid a 64-bit division.
> -		 */
> -		freeblks = percpu_counter_sum(&mp->m_fdblocks);
> -		if (freeblks <= 0)
> -			return -ENOSPC;
> -
> -		resblks = min_t(uint64_t, UINT_MAX, freeblks);
> -		resblks = (resblks * 15) >> 4;
> -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> -				0, XFS_TRANS_RESERVE, &tp);
> +		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
> +				dfc->dfc_blkres, dfc->dfc_rtxres,
> +				XFS_TRANS_RESERVE, &tp);
>  		if (error)
>  			return error;
>  
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4.2 5/5] xfs: xfs_defer_capture should absorb remaining transaction reservation
  2020-10-02  4:21   ` [PATCH v4.2 " Darrick J. Wong
  2020-10-02  7:24     ` Christoph Hellwig
@ 2020-10-02 10:39     ` Brian Foster
  1 sibling, 0 replies; 24+ messages in thread
From: Brian Foster @ 2020-10-02 10:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, hch

On Thu, Oct 01, 2020 at 09:21:03PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When xfs_defer_capture extracts the deferred ops and transaction state
> from a transaction, it should record the transaction reservation type
> from the old transaction so that when we continue the dfops chain, we
> still use the same reservation parameters.
> 
> Doing this means that the log item recovery functions get to determine
> the transaction reservation instead of abusing tr_itruncate in yet
> another part of xfs.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v4.2: save only the log reservation, and hardcode logcount and flags.
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_defer.c |    3 +++
>  fs/xfs/libxfs/xfs_defer.h |    3 +++
>  fs/xfs/xfs_log_recover.c  |   17 ++++++++++++++---
>  3 files changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 10aeae7353ab..e19dc1ced7e6 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -579,6 +579,9 @@ xfs_defer_ops_capture(
>  	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
>  	dfc->dfc_rtxres = tp->t_rtx_res - tp->t_rtx_res_used;
>  
> +	/* Preserve the log reservation size. */
> +	dfc->dfc_logres = tp->t_log_res;
> +
>  	return dfc;
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 5c0e59b69ffa..6cde6f0713f7 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -79,6 +79,9 @@ struct xfs_defer_capture {
>  	/* Block reservations for the data and rt devices. */
>  	unsigned int		dfc_blkres;
>  	unsigned int		dfc_rtxres;
> +
> +	/* Log reservation saved from the transaction. */
> +	unsigned int		dfc_logres;
>  };
>  
>  /*
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 1be5208e2a2f..001e1585ddc6 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -2442,9 +2442,20 @@ xlog_finish_defer_ops(
>  	int			error = 0;
>  
>  	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> -		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate,
> -				dfc->dfc_blkres, dfc->dfc_rtxres,
> -				XFS_TRANS_RESERVE, &tp);
> +		struct xfs_trans_res	resv;
> +
> +		/*
> +		 * Create a new transaction reservation from the captured
> +		 * information.  Set logcount to 1 to force the new transaction
> +		 * to regrant every roll so that we can make forward progress
> +		 * in recovery no matter how full the log might be.
> +		 */
> +		resv.tr_logres = dfc->dfc_logres;
> +		resv.tr_logcount = 1;
> +		resv.tr_logflags = XFS_TRANS_PERM_LOG_RES;
> +
> +		error = xfs_trans_alloc(mp, &resv, dfc->dfc_blkres,
> +				dfc->dfc_rtxres, XFS_TRANS_RESERVE, &tp);
>  		if (error)
>  			return error;
>  
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/5] xfs: remove XFS_LI_RECOVERED
  2020-10-05 18:19 [PATCH v5 0/5] xfs: fix how we deal with new intents during recovery Darrick J. Wong
@ 2020-10-05 18:20 ` Darrick J. Wong
  0 siblings, 0 replies; 24+ messages in thread
From: Darrick J. Wong @ 2020-10-05 18:20 UTC (permalink / raw)
  To: darrick.wong
  Cc: Brian Foster, Christoph Hellwig, linux-xfs, david, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

The ->iop_recover method of a log intent item removes the recovered
intent item from the AIL by logging an intent done item and committing
the transaction, so it's superfluous to have this flag check.  Nothing
else uses it, so get rid of the flag entirely.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log_recover.c |    8 +++-----
 fs/xfs/xfs_trans.h       |    4 +---
 2 files changed, 4 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index e0675071b39e..84f876c6d498 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2539,11 +2539,9 @@ xlog_recover_process_intents(
 		 * this routine or else those subsequent intents will get
 		 * replayed in the wrong order!
 		 */
-		if (!test_and_set_bit(XFS_LI_RECOVERED, &lip->li_flags)) {
-			spin_unlock(&ailp->ail_lock);
-			error = lip->li_ops->iop_recover(lip, parent_tp);
-			spin_lock(&ailp->ail_lock);
-		}
+		spin_unlock(&ailp->ail_lock);
+		error = lip->li_ops->iop_recover(lip, parent_tp);
+		spin_lock(&ailp->ail_lock);
 		if (error)
 			goto out;
 		lip = xfs_trans_ail_cursor_next(ailp, &cur);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index a71b4f443e39..ced62a35a62b 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -55,14 +55,12 @@ struct xfs_log_item {
 #define	XFS_LI_ABORTED	1
 #define	XFS_LI_FAILED	2
 #define	XFS_LI_DIRTY	3	/* log item dirty in transaction */
-#define	XFS_LI_RECOVERED 4	/* log intent item has been recovered */
 
 #define XFS_LI_FLAGS \
 	{ (1 << XFS_LI_IN_AIL),		"IN_AIL" }, \
 	{ (1 << XFS_LI_ABORTED),	"ABORTED" }, \
 	{ (1 << XFS_LI_FAILED),		"FAILED" }, \
-	{ (1 << XFS_LI_DIRTY),		"DIRTY" }, \
-	{ (1 << XFS_LI_RECOVERED),	"RECOVERED" }
+	{ (1 << XFS_LI_DIRTY),		"DIRTY" }
 
 struct xfs_item_ops {
 	unsigned flags;


^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-10-05 18:20 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-29 17:43 [PATCH v4 0/5] xfs: fix how we deal with new intents during recovery Darrick J. Wong
2020-09-29 17:43 ` [PATCH 1/5] xfs: remove xfs_defer_reset Darrick J. Wong
2020-10-01 17:30   ` Brian Foster
2020-09-29 17:43 ` [PATCH 2/5] xfs: remove XFS_LI_RECOVERED Darrick J. Wong
2020-10-01 17:30   ` Brian Foster
2020-10-02  7:16   ` Christoph Hellwig
2020-09-29 17:43 ` [PATCH 3/5] xfs: proper replay of deferred ops queued during log recovery Darrick J. Wong
2020-10-01 17:31   ` Brian Foster
2020-10-01 17:41     ` Darrick J. Wong
2020-10-02  7:18   ` Christoph Hellwig
2020-09-29 17:43 ` [PATCH 4/5] xfs: xfs_defer_capture should absorb remaining block reservation Darrick J. Wong
2020-10-01 17:32   ` Brian Foster
2020-10-01 17:46     ` Darrick J. Wong
2020-10-02  4:20   ` [PATCH v4.2 " Darrick J. Wong
2020-10-02  7:22     ` Christoph Hellwig
2020-10-02 10:39     ` Brian Foster
2020-09-29 17:43 ` [PATCH 5/5] xfs: xfs_defer_capture should absorb remaining transaction reservation Darrick J. Wong
2020-10-01 17:32   ` Brian Foster
2020-10-01 17:52     ` Darrick J. Wong
2020-10-02  4:21   ` [PATCH v4.2 " Darrick J. Wong
2020-10-02  7:24     ` Christoph Hellwig
2020-10-02  7:32       ` Christoph Hellwig
2020-10-02 10:39     ` Brian Foster
2020-10-05 18:19 [PATCH v5 0/5] xfs: fix how we deal with new intents during recovery Darrick J. Wong
2020-10-05 18:20 ` [PATCH 2/5] xfs: remove XFS_LI_RECOVERED Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.