All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/4] xfs: fix how we deal with new intents during recovery
@ 2020-09-27 23:41 Darrick J. Wong
  2020-09-27 23:41 ` [PATCH 1/4] xfs: remove xfs_defer_reset Darrick J. Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-27 23:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: Christoph Hellwig, linux-xfs, david, hch, bfoster

Hi all,

This series of log fixes dates back to an earlier discussion that Dave
and I had about the weird way that log recovery works w.r.t. intent
items.  The current code juggles nested transactions so that it can
siphon off new deferred items for later; this we replace with a new
dfops freezer that captures the log reservation type and remaining block
reservation so that we finish the new deferred items with the same
transaction context as we would have had the system not gone down.

v2: rework the defer capture api per hch suggestions
v3: rework the api again, per bfoster suggestions, so now xfs_defer_capture
is only responsible for creating the capture device, and log recovery still
has to do some work to commit a transaction and free resources

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fix-recovery-intent-chaining-5.10
---
 fs/xfs/libxfs/xfs_defer.c       |   89 ++++++++++++++++++--------
 fs/xfs/libxfs/xfs_defer.h       |   22 ++++++
 fs/xfs/libxfs/xfs_log_recover.h |    2 +
 fs/xfs/xfs_bmap_item.c          |   16 +----
 fs/xfs/xfs_extfree_item.c       |    7 +-
 fs/xfs/xfs_log_recover.c        |  136 ++++++++++++++++++++++++---------------
 fs/xfs/xfs_refcount_item.c      |   16 +----
 fs/xfs/xfs_rmap_item.c          |    7 +-
 fs/xfs/xfs_trans.h              |    3 +
 9 files changed, 184 insertions(+), 114 deletions(-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] xfs: remove xfs_defer_reset
  2020-09-27 23:41 [PATCH v3 0/4] xfs: fix how we deal with new intents during recovery Darrick J. Wong
@ 2020-09-27 23:41 ` Darrick J. Wong
  2020-09-28  4:42   ` Dave Chinner
  2020-09-28  6:20   ` Christoph Hellwig
  2020-09-27 23:41 ` [PATCH 2/4] xfs: proper replay of deferred ops queued during log recovery Darrick J. Wong
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-27 23:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Remove this one-line helper.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c |   24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 29e9762f3b77..36c103c14bc9 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -312,22 +312,6 @@ xfs_defer_trans_roll(
 	return error;
 }
 
-/*
- * Reset an already used dfops after finish.
- */
-static void
-xfs_defer_reset(
-	struct xfs_trans	*tp)
-{
-	ASSERT(list_empty(&tp->t_dfops));
-
-	/*
-	 * Low mode state transfers across transaction rolls to mirror dfops
-	 * lifetime. Clear it now that dfops is reset.
-	 */
-	tp->t_flags &= ~XFS_TRANS_LOWMODE;
-}
-
 /*
  * Free up any items left in the list.
  */
@@ -477,7 +461,10 @@ xfs_defer_finish(
 			return error;
 		}
 	}
-	xfs_defer_reset(*tp);
+
+	/* Reset LOWMODE now that we've finished all the dfops. */
+	ASSERT(list_empty(&(*tp)->t_dfops));
+	(*tp)->t_flags &= ~XFS_TRANS_LOWMODE;
 	return 0;
 }
 
@@ -551,8 +538,7 @@ xfs_defer_move(
 	 * that behavior.
 	 */
 	dtp->t_flags |= (stp->t_flags & XFS_TRANS_LOWMODE);
-
-	xfs_defer_reset(stp);
+	stp->t_flags &= ~XFS_TRANS_LOWMODE;
 }
 
 /*


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] xfs: proper replay of deferred ops queued during log recovery
  2020-09-27 23:41 [PATCH v3 0/4] xfs: fix how we deal with new intents during recovery Darrick J. Wong
  2020-09-27 23:41 ` [PATCH 1/4] xfs: remove xfs_defer_reset Darrick J. Wong
@ 2020-09-27 23:41 ` Darrick J. Wong
  2020-09-28  5:26   ` Dave Chinner
  2020-09-27 23:41 ` [PATCH 3/4] xfs: xfs_defer_capture should absorb remaining block reservation Darrick J. Wong
  2020-09-27 23:41 ` [PATCH 4/4] xfs: xfs_defer_capture should absorb remaining transaction reservation Darrick J. Wong
  3 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-27 23:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

When we replay unfinished intent items that have been recovered from the
log, it's possible that the replay will cause the creation of more
deferred work items.  As outlined in commit 509955823cc9c ("xfs: log
recovery should replay deferred ops in order"), later work items have an
implicit ordering dependency on earlier work items.  Therefore, recovery
must replay the items (both recovered and created) in the same order
that they would have been during normal operation.

For log recovery, we enforce this ordering by using an empty transaction
to collect deferred ops that get created in the process of recovering a
log intent item to prevent them from being committed before the rest of
the recovered intent items.  After we finish committing all the
recovered log items, we allocate a transaction with an enormous block
reservation, splice our huge list of created deferred ops into that
transaction, and commit it, thereby finishing all those ops.

This is /really/ hokey -- it's the one place in XFS where we allow
nested transactions; the splicing of the defer ops list is is inelegant
and has to be done twice per recovery function; and the broken way we
handle inode pointers and block reservations cause subtle use-after-free
and allocator problems that will be fixed by this patch and the two
patches after it.

Therefore, replace the hokey empty transaction with a structure designed
to capture each chain of deferred ops that are created as part of
recovering a single unfinished log intent.  Finally, refactor the loop
that replays those chains to do so using one transaction per chain.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c       |   56 +++++++++++++--
 fs/xfs/libxfs/xfs_defer.h       |   20 +++++
 fs/xfs/libxfs/xfs_log_recover.h |    2 +
 fs/xfs/xfs_bmap_item.c          |   16 +---
 fs/xfs/xfs_extfree_item.c       |    7 +-
 fs/xfs/xfs_log_recover.c        |  150 +++++++++++++++++++++++++--------------
 fs/xfs/xfs_refcount_item.c      |   16 +---
 fs/xfs/xfs_rmap_item.c          |    7 +-
 fs/xfs/xfs_trans.h              |    3 +
 9 files changed, 183 insertions(+), 94 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 36c103c14bc9..0de7672fe63d 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -549,14 +549,56 @@ xfs_defer_move(
  *
  * Create and log intent items for all the work that we're capturing so that we
  * can be assured that the items will get replayed if the system goes down
- * before log recovery gets a chance to finish the work it put off.  Then we
- * move the chain from stp to dtp.
+ * before log recovery gets a chance to finish the work it put off.  The entire
+ * deferred ops state is transferred to the capture structure and the
+ * transaction is then ready for the caller to commit it.  If there are no
+ * intent items to capture, this function returns NULL.
  */
-void
+struct xfs_defer_capture *
 xfs_defer_capture(
-	struct xfs_trans	*dtp,
-	struct xfs_trans	*stp)
+	struct xfs_trans		*tp)
 {
-	xfs_defer_create_intents(stp);
-	xfs_defer_move(dtp, stp);
+	struct xfs_defer_capture	*dfc;
+
+	if (list_empty(&tp->t_dfops))
+		return NULL;
+
+	/* Create an object to capture the defer ops. */
+	dfc = kmem_zalloc(sizeof(*dfc), KM_NOFS);
+	INIT_LIST_HEAD(&dfc->dfc_list);
+	INIT_LIST_HEAD(&dfc->dfc_dfops);
+
+	xfs_defer_create_intents(tp);
+
+	/* Move the dfops chain and transaction state to the capture struct. */
+	list_splice_init(&tp->t_dfops, &dfc->dfc_dfops);
+	dfc->dfc_tpflags = tp->t_flags & XFS_TRANS_LOWMODE;
+	tp->t_flags &= ~XFS_TRANS_LOWMODE;
+
+	return dfc;
+}
+
+/* Attach a chain of captured deferred ops to a new transaction. */
+void
+xfs_defer_continue(
+	struct xfs_defer_capture	*dfc,
+	struct xfs_trans		*tp)
+{
+	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
+	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
+
+	/* Move captured dfops chain and state to the transaction. */
+	list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
+	tp->t_flags |= dfc->dfc_tpflags;
+	dfc->dfc_tpflags = 0;
+}
+
+/* Release all resources that we used to capture deferred ops. */
+void
+xfs_defer_capture_free(
+	struct xfs_mount		*mp,
+	struct xfs_defer_capture	*dfc)
+{
+	xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
+	kmem_free(dfc);
 }
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 3164199162b6..bc7493bf4542 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -8,6 +8,7 @@
 
 struct xfs_btree_cur;
 struct xfs_defer_op_type;
+struct xfs_defer_capture;
 
 /*
  * Header for deferred operation list.
@@ -63,10 +64,27 @@ extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
 extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
 extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
 
+/*
+ * Deferred operation freezer.  This structure enables a dfops user to detach
+ * the chain of deferred operations from a transaction so that they can be
+ * continued later.
+ */
+struct xfs_defer_capture {
+	/* List of other freezer heads. */
+	struct list_head	dfc_list;
+
+	/* Deferred ops state saved from the transaction. */
+	struct list_head	dfc_dfops;
+	unsigned int		dfc_tpflags;
+};
+
 /*
  * Functions to capture a chain of deferred operations and continue them later.
  * This doesn't normally happen except log recovery.
  */
-void xfs_defer_capture(struct xfs_trans *dtp, struct xfs_trans *stp);
+struct xfs_defer_capture *xfs_defer_capture(struct xfs_trans *tp);
+void xfs_defer_continue(struct xfs_defer_capture *dfc, struct xfs_trans *tp);
+void xfs_defer_capture_free(struct xfs_mount *mp,
+		struct xfs_defer_capture *dfc);
 
 #endif /* __XFS_DEFER_H__ */
diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h
index 3cca2bfe714c..8ad44b4195e8 100644
--- a/fs/xfs/libxfs/xfs_log_recover.h
+++ b/fs/xfs/libxfs/xfs_log_recover.h
@@ -124,5 +124,7 @@ bool xlog_is_buffer_cancelled(struct xlog *log, xfs_daddr_t blkno, uint len);
 
 void xlog_recover_release_intent(struct xlog *log, unsigned short intent_type,
 		uint64_t intent_id);
+int xlog_recover_trans_commit(struct xfs_trans *tp,
+		struct list_head *capture_list);
 
 #endif	/* __XFS_LOG_RECOVER_H__ */
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c
index b04ebcd78316..b73f0a0890a2 100644
--- a/fs/xfs/xfs_bmap_item.c
+++ b/fs/xfs/xfs_bmap_item.c
@@ -424,13 +424,13 @@ const struct xfs_defer_op_type xfs_bmap_update_defer_type = {
 STATIC int
 xfs_bui_item_recover(
 	struct xfs_log_item		*lip,
-	struct xfs_trans		*parent_tp)
+	struct list_head		*capture_list)
 {
 	struct xfs_bmbt_irec		irec;
 	struct xfs_bui_log_item		*buip = BUI_ITEM(lip);
 	struct xfs_trans		*tp;
 	struct xfs_inode		*ip = NULL;
-	struct xfs_mount		*mp = parent_tp->t_mountp;
+	struct xfs_mount		*mp = lip->li_mountp;
 	struct xfs_map_extent		*bmap;
 	struct xfs_bud_log_item		*budp;
 	xfs_fsblock_t			startblock_fsb;
@@ -478,12 +478,7 @@ xfs_bui_item_recover(
 			XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp);
 	if (error)
 		return error;
-	/*
-	 * Recovery stashes all deferred ops during intent processing and
-	 * finishes them on completion. Transfer current dfops state to this
-	 * transaction and transfer the result back before we return.
-	 */
-	xfs_defer_move(tp, parent_tp);
+
 	budp = xfs_trans_get_bud(tp, buip);
 
 	/* Grab the inode. */
@@ -531,15 +526,12 @@ xfs_bui_item_recover(
 		xfs_bmap_unmap_extent(tp, ip, &irec);
 	}
 
-	xfs_defer_capture(parent_tp, tp);
-	error = xfs_trans_commit(tp);
+	error = xlog_recover_trans_commit(tp, capture_list);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_irele(ip);
-
 	return error;
 
 err_inode:
-	xfs_defer_move(parent_tp, tp);
 	xfs_trans_cancel(tp);
 	if (ip) {
 		xfs_iunlock(ip, XFS_ILOCK_EXCL);
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 9093d2e7afdf..be0186875566 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -585,10 +585,10 @@ const struct xfs_defer_op_type xfs_agfl_free_defer_type = {
 STATIC int
 xfs_efi_item_recover(
 	struct xfs_log_item		*lip,
-	struct xfs_trans		*parent_tp)
+	struct list_head		*capture_list)
 {
 	struct xfs_efi_log_item		*efip = EFI_ITEM(lip);
-	struct xfs_mount		*mp = parent_tp->t_mountp;
+	struct xfs_mount		*mp = lip->li_mountp;
 	struct xfs_efd_log_item		*efdp;
 	struct xfs_trans		*tp;
 	struct xfs_extent		*extp;
@@ -627,8 +627,7 @@ xfs_efi_item_recover(
 
 	}
 
-	error = xfs_trans_commit(tp);
-	return error;
+	return xlog_recover_trans_commit(tp, capture_list);
 
 abort_error:
 	xfs_trans_cancel(tp);
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index e0675071b39e..107965acc57e 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1755,6 +1755,38 @@ xlog_recover_release_intent(
 	spin_unlock(&ailp->ail_lock);
 }
 
+/*
+ * Capture any deferred ops and commit the transaction.  This is the last step
+ * needed to finish a log intent item that we recovered from the log, and will
+ * take care of releasing all the relevant resources.
+ */
+int
+xlog_recover_trans_commit(
+	struct xfs_trans		*tp,
+	struct list_head		*capture_list)
+{
+	struct xfs_mount		*mp = tp->t_mountp;
+	struct xfs_defer_capture	*dfc = xfs_defer_capture(tp);
+	int				error;
+
+	/* If we don't capture anything, commit tp and exit. */
+	if (!dfc)
+		return xfs_trans_commit(tp);
+
+	/*
+	 * Commit the transaction.  If that fails, clean up the defer ops and
+	 * the dfc that we just created.  Otherwise, add the dfc to the list.
+	 */
+	error = xfs_trans_commit(tp);
+	if (error) {
+		xfs_defer_capture_free(mp, dfc);
+		return error;
+	}
+
+	list_add_tail(&dfc->dfc_list, capture_list);
+	return 0;
+}
+
 /******************************************************************************
  *
  *		Log recover routines
@@ -2431,38 +2463,62 @@ xlog_recover_process_data(
 	return 0;
 }
 
+static void
+xlog_cancel_defer_ops(
+	struct xfs_mount	*mp,
+	struct list_head	*capture_list)
+{
+	struct xfs_defer_capture *dfc, *next;
+
+	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
+		list_del_init(&dfc->dfc_list);
+		xfs_defer_capture_free(mp, dfc);
+	}
+}
+
 /* Take all the collected deferred ops and finish them in order. */
 static int
 xlog_finish_defer_ops(
-	struct xfs_trans	*parent_tp)
+	struct xfs_mount	*mp,
+	struct list_head	*capture_list)
 {
-	struct xfs_mount	*mp = parent_tp->t_mountp;
+	struct xfs_defer_capture *dfc, *next;
 	struct xfs_trans	*tp;
 	int64_t			freeblks;
-	uint			resblks;
-	int			error;
+	uint64_t		resblks;
+	int			error = 0;
 
-	/*
-	 * We're finishing the defer_ops that accumulated as a result of
-	 * recovering unfinished intent items during log recovery.  We
-	 * reserve an itruncate transaction because it is the largest
-	 * permanent transaction type.  Since we're the only user of the fs
-	 * right now, take 93% (15/16) of the available free blocks.  Use
-	 * weird math to avoid a 64-bit division.
-	 */
-	freeblks = percpu_counter_sum(&mp->m_fdblocks);
-	if (freeblks <= 0)
-		return -ENOSPC;
-	resblks = min_t(int64_t, UINT_MAX, freeblks);
-	resblks = (resblks * 15) >> 4;
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
-			0, XFS_TRANS_RESERVE, &tp);
-	if (error)
-		return error;
-	/* transfer all collected dfops to this transaction */
-	xfs_defer_move(tp, parent_tp);
+	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
+		/*
+		 * We're finishing the defer_ops that accumulated as a result
+		 * of recovering unfinished intent items during log recovery.
+		 * We reserve an itruncate transaction because it is the
+		 * largest permanent transaction type.  Since we're the only
+		 * user of the fs right now, take 93% (15/16) of the available
+		 * free blocks.  Use weird math to avoid a 64-bit division.
+		 */
+		freeblks = percpu_counter_sum(&mp->m_fdblocks);
+		if (freeblks <= 0)
+			return -ENOSPC;
 
-	return xfs_trans_commit(tp);
+		resblks = min_t(uint64_t, UINT_MAX, freeblks);
+		resblks = (resblks * 15) >> 4;
+		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
+				0, XFS_TRANS_RESERVE, &tp);
+		if (error)
+			return error;
+
+		/* transfer all collected dfops to this transaction */
+		list_del_init(&dfc->dfc_list);
+		xfs_defer_continue(dfc, tp);
+
+		error = xfs_trans_commit(tp);
+		xfs_defer_capture_free(mp, dfc);
+		if (error)
+			return error;
+	}
+
+	return 0;
 }
 
 /*
@@ -2485,35 +2541,23 @@ STATIC int
 xlog_recover_process_intents(
 	struct xlog		*log)
 {
-	struct xfs_trans	*parent_tp;
+	LIST_HEAD(capture_list);
 	struct xfs_ail_cursor	cur;
 	struct xfs_log_item	*lip;
 	struct xfs_ail		*ailp;
-	int			error;
+	int			error = 0;
 #if defined(DEBUG) || defined(XFS_WARN)
 	xfs_lsn_t		last_lsn;
 #endif
 
-	/*
-	 * The intent recovery handlers commit transactions to complete recovery
-	 * for individual intents, but any new deferred operations that are
-	 * queued during that process are held off until the very end. The
-	 * purpose of this transaction is to serve as a container for deferred
-	 * operations. Each intent recovery handler must transfer dfops here
-	 * before its local transaction commits, and we'll finish the entire
-	 * list below.
-	 */
-	error = xfs_trans_alloc_empty(log->l_mp, &parent_tp);
-	if (error)
-		return error;
-
 	ailp = log->l_ailp;
 	spin_lock(&ailp->ail_lock);
-	lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
 #if defined(DEBUG) || defined(XFS_WARN)
 	last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block);
 #endif
-	while (lip != NULL) {
+	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
+	     lip != NULL;
+	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
 		/*
 		 * We're done when we see something other than an intent.
 		 * There should be no intents left in the AIL now.
@@ -2533,28 +2577,28 @@ xlog_recover_process_intents(
 		 */
 		ASSERT(XFS_LSN_CMP(last_lsn, lip->li_lsn) >= 0);
 
+		if (test_and_set_bit(XFS_LI_RECOVERED, &lip->li_flags))
+			continue;
+
 		/*
 		 * NOTE: If your intent processing routine can create more
-		 * deferred ops, you /must/ attach them to the transaction in
-		 * this routine or else those subsequent intents will get
+		 * deferred ops, you /must/ attach them to the capture list in
+		 * the recover routine or else those subsequent intents will be
 		 * replayed in the wrong order!
 		 */
-		if (!test_and_set_bit(XFS_LI_RECOVERED, &lip->li_flags)) {
-			spin_unlock(&ailp->ail_lock);
-			error = lip->li_ops->iop_recover(lip, parent_tp);
-			spin_lock(&ailp->ail_lock);
-		}
+		spin_unlock(&ailp->ail_lock);
+		error = lip->li_ops->iop_recover(lip, &capture_list);
+		spin_lock(&ailp->ail_lock);
 		if (error)
-			goto out;
-		lip = xfs_trans_ail_cursor_next(ailp, &cur);
+			break;
 	}
-out:
+
 	xfs_trans_ail_cursor_done(&cur);
 	spin_unlock(&ailp->ail_lock);
 	if (!error)
-		error = xlog_finish_defer_ops(parent_tp);
-	xfs_trans_cancel(parent_tp);
+		error = xlog_finish_defer_ops(log->l_mp, &capture_list);
 
+	xlog_cancel_defer_ops(log->l_mp, &capture_list);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 3e34b7662361..7a57b4de9ee7 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -424,7 +424,7 @@ const struct xfs_defer_op_type xfs_refcount_update_defer_type = {
 STATIC int
 xfs_cui_item_recover(
 	struct xfs_log_item		*lip,
-	struct xfs_trans		*parent_tp)
+	struct list_head		*capture_list)
 {
 	struct xfs_bmbt_irec		irec;
 	struct xfs_cui_log_item		*cuip = CUI_ITEM(lip);
@@ -432,7 +432,7 @@ xfs_cui_item_recover(
 	struct xfs_cud_log_item		*cudp;
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*rcur = NULL;
-	struct xfs_mount		*mp = parent_tp->t_mountp;
+	struct xfs_mount		*mp = lip->li_mountp;
 	xfs_fsblock_t			startblock_fsb;
 	xfs_fsblock_t			new_fsb;
 	xfs_extlen_t			new_len;
@@ -487,12 +487,7 @@ xfs_cui_item_recover(
 			mp->m_refc_maxlevels * 2, 0, XFS_TRANS_RESERVE, &tp);
 	if (error)
 		return error;
-	/*
-	 * Recovery stashes all deferred ops during intent processing and
-	 * finishes them on completion. Transfer current dfops state to this
-	 * transaction and transfer the result back before we return.
-	 */
-	xfs_defer_move(tp, parent_tp);
+
 	cudp = xfs_trans_get_cud(tp, cuip);
 
 	for (i = 0; i < cuip->cui_format.cui_nextents; i++) {
@@ -549,13 +544,10 @@ xfs_cui_item_recover(
 	}
 
 	xfs_refcount_finish_one_cleanup(tp, rcur, error);
-	xfs_defer_capture(parent_tp, tp);
-	error = xfs_trans_commit(tp);
-	return error;
+	return xlog_recover_trans_commit(tp, capture_list);
 
 abort_error:
 	xfs_refcount_finish_one_cleanup(tp, rcur, error);
-	xfs_defer_move(parent_tp, tp);
 	xfs_trans_cancel(tp);
 	return error;
 }
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index e38ec5d736be..16c7a6385c3f 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -467,14 +467,14 @@ const struct xfs_defer_op_type xfs_rmap_update_defer_type = {
 STATIC int
 xfs_rui_item_recover(
 	struct xfs_log_item		*lip,
-	struct xfs_trans		*parent_tp)
+	struct list_head		*capture_list)
 {
 	struct xfs_rui_log_item		*ruip = RUI_ITEM(lip);
 	struct xfs_map_extent		*rmap;
 	struct xfs_rud_log_item		*rudp;
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*rcur = NULL;
-	struct xfs_mount		*mp = parent_tp->t_mountp;
+	struct xfs_mount		*mp = lip->li_mountp;
 	xfs_fsblock_t			startblock_fsb;
 	enum xfs_rmap_intent_type	type;
 	xfs_exntst_t			state;
@@ -567,8 +567,7 @@ xfs_rui_item_recover(
 	}
 
 	xfs_rmap_finish_one_cleanup(tp, rcur, error);
-	error = xfs_trans_commit(tp);
-	return error;
+	return xlog_recover_trans_commit(tp, capture_list);
 
 abort_error:
 	xfs_rmap_finish_one_cleanup(tp, rcur, error);
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index a71b4f443e39..e3875a92a541 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -74,7 +74,8 @@ struct xfs_item_ops {
 	void (*iop_committing)(struct xfs_log_item *, xfs_lsn_t commit_lsn);
 	void (*iop_release)(struct xfs_log_item *);
 	xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t);
-	int (*iop_recover)(struct xfs_log_item *lip, struct xfs_trans *tp);
+	int (*iop_recover)(struct xfs_log_item *lip,
+			   struct list_head *capture_list);
 	bool (*iop_match)(struct xfs_log_item *item, uint64_t id);
 };
 


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] xfs: xfs_defer_capture should absorb remaining block reservation
  2020-09-27 23:41 [PATCH v3 0/4] xfs: fix how we deal with new intents during recovery Darrick J. Wong
  2020-09-27 23:41 ` [PATCH 1/4] xfs: remove xfs_defer_reset Darrick J. Wong
  2020-09-27 23:41 ` [PATCH 2/4] xfs: proper replay of deferred ops queued during log recovery Darrick J. Wong
@ 2020-09-27 23:41 ` Darrick J. Wong
  2020-09-28  5:28   ` Dave Chinner
  2020-09-27 23:41 ` [PATCH 4/4] xfs: xfs_defer_capture should absorb remaining transaction reservation Darrick J. Wong
  3 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-27 23:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: Christoph Hellwig, linux-xfs, david, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should absorb the remaining block reservation so
that when we continue the dfops chain, we still have those blocks to
use.

This adds the requirement that every log intent item recovery function
must be careful to reserve enough blocks to handle both itself and all
defer ops that it can queue.  On the other hand, this enables us to do
away with the handwaving block estimation nonsense that was going on in
xlog_finish_defer_ops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_defer.c |    4 ++++
 fs/xfs/libxfs/xfs_defer.h |    1 +
 fs/xfs/xfs_log_recover.c  |   18 +-----------------
 3 files changed, 6 insertions(+), 17 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 0de7672fe63d..85d70f1edc1c 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -574,6 +574,8 @@ xfs_defer_capture(
 	list_splice_init(&tp->t_dfops, &dfc->dfc_dfops);
 	dfc->dfc_tpflags = tp->t_flags & XFS_TRANS_LOWMODE;
 	tp->t_flags &= ~XFS_TRANS_LOWMODE;
+	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
+	tp->t_blk_res = tp->t_blk_res_used;
 
 	return dfc;
 }
@@ -591,6 +593,8 @@ xfs_defer_continue(
 	list_splice_init(&dfc->dfc_dfops, &tp->t_dfops);
 	tp->t_flags |= dfc->dfc_tpflags;
 	dfc->dfc_tpflags = 0;
+	tp->t_blk_res += dfc->dfc_blkres;
+	dfc->dfc_blkres = 0;
 }
 
 /* Release all resources that we used to capture deferred ops. */
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index bc7493bf4542..999adbb8c449 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -76,6 +76,7 @@ struct xfs_defer_capture {
 	/* Deferred ops state saved from the transaction. */
 	struct list_head	dfc_dfops;
 	unsigned int		dfc_tpflags;
+	unsigned int		dfc_blkres;
 };
 
 /*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 107965acc57e..60670ac4e5ae 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2484,26 +2484,10 @@ xlog_finish_defer_ops(
 {
 	struct xfs_defer_capture *dfc, *next;
 	struct xfs_trans	*tp;
-	int64_t			freeblks;
-	uint64_t		resblks;
 	int			error = 0;
 
 	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
-		/*
-		 * We're finishing the defer_ops that accumulated as a result
-		 * of recovering unfinished intent items during log recovery.
-		 * We reserve an itruncate transaction because it is the
-		 * largest permanent transaction type.  Since we're the only
-		 * user of the fs right now, take 93% (15/16) of the available
-		 * free blocks.  Use weird math to avoid a 64-bit division.
-		 */
-		freeblks = percpu_counter_sum(&mp->m_fdblocks);
-		if (freeblks <= 0)
-			return -ENOSPC;
-
-		resblks = min_t(uint64_t, UINT_MAX, freeblks);
-		resblks = (resblks * 15) >> 4;
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
+		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0,
 				0, XFS_TRANS_RESERVE, &tp);
 		if (error)
 			return error;


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] xfs: xfs_defer_capture should absorb remaining transaction reservation
  2020-09-27 23:41 [PATCH v3 0/4] xfs: fix how we deal with new intents during recovery Darrick J. Wong
                   ` (2 preceding siblings ...)
  2020-09-27 23:41 ` [PATCH 3/4] xfs: xfs_defer_capture should absorb remaining block reservation Darrick J. Wong
@ 2020-09-27 23:41 ` Darrick J. Wong
  2020-09-28  5:52   ` Dave Chinner
  3 siblings, 1 reply; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-27 23:41 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, david, hch, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should record the transaction reservation type
from the old transaction so that when we continue the dfops chain, we
still use the same reservation parameters.

This avoids a potential failure vector by ensuring that we never ask for
more log reservation space than we would have asked for had the system
not gone down.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c |    5 +++++
 fs/xfs/libxfs/xfs_defer.h |    1 +
 fs/xfs/xfs_log_recover.c  |    4 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 85d70f1edc1c..c53443252389 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -577,6 +577,11 @@ xfs_defer_capture(
 	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
 	tp->t_blk_res = tp->t_blk_res_used;
 
+	/* Preserve the transaction reservation type. */
+	dfc->dfc_tres.tr_logres = tp->t_log_res;
+	dfc->dfc_tres.tr_logcount = tp->t_log_count;
+	dfc->dfc_tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
+
 	return dfc;
 }
 
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 999adbb8c449..063276c063b6 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -77,6 +77,7 @@ struct xfs_defer_capture {
 	struct list_head	dfc_dfops;
 	unsigned int		dfc_tpflags;
 	unsigned int		dfc_blkres;
+	struct xfs_trans_res	dfc_tres;
 };
 
 /*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 60670ac4e5ae..0d899ab7df2e 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2487,8 +2487,8 @@ xlog_finish_defer_ops(
 	int			error = 0;
 
 	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
-		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0,
-				0, XFS_TRANS_RESERVE, &tp);
+		error = xfs_trans_alloc(mp, &dfc->dfc_tres, 0, 0,
+				XFS_TRANS_RESERVE, &tp);
 		if (error)
 			return error;
 


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] xfs: remove xfs_defer_reset
  2020-09-27 23:41 ` [PATCH 1/4] xfs: remove xfs_defer_reset Darrick J. Wong
@ 2020-09-28  4:42   ` Dave Chinner
  2020-09-28  6:20   ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2020-09-28  4:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, bfoster

On Sun, Sep 27, 2020 at 04:41:14PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Remove this one-line helper.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c |   24 +++++-------------------
>  1 file changed, 5 insertions(+), 19 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 29e9762f3b77..36c103c14bc9 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -312,22 +312,6 @@ xfs_defer_trans_roll(
>  	return error;
>  }
>  
> -/*
> - * Reset an already used dfops after finish.
> - */
> -static void
> -xfs_defer_reset(
> -	struct xfs_trans	*tp)
> -{
> -	ASSERT(list_empty(&tp->t_dfops));
> -
> -	/*
> -	 * Low mode state transfers across transaction rolls to mirror dfops
> -	 * lifetime. Clear it now that dfops is reset.
> -	 */
> -	tp->t_flags &= ~XFS_TRANS_LOWMODE;
> -}
> -
>  /*
>   * Free up any items left in the list.
>   */
> @@ -477,7 +461,10 @@ xfs_defer_finish(
>  			return error;
>  		}
>  	}
> -	xfs_defer_reset(*tp);
> +
> +	/* Reset LOWMODE now that we've finished all the dfops. */
> +	ASSERT(list_empty(&(*tp)->t_dfops));
> +	(*tp)->t_flags &= ~XFS_TRANS_LOWMODE;
>  	return 0;
>  }
>  
> @@ -551,8 +538,7 @@ xfs_defer_move(
>  	 * that behavior.
>  	 */
>  	dtp->t_flags |= (stp->t_flags & XFS_TRANS_LOWMODE);
> -
> -	xfs_defer_reset(stp);
> +	stp->t_flags &= ~XFS_TRANS_LOWMODE;
>  }
>  
>  /*

looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] xfs: proper replay of deferred ops queued during log recovery
  2020-09-27 23:41 ` [PATCH 2/4] xfs: proper replay of deferred ops queued during log recovery Darrick J. Wong
@ 2020-09-28  5:26   ` Dave Chinner
  2020-09-28  6:37     ` Christoph Hellwig
  2020-09-28 17:47     ` Darrick J. Wong
  0 siblings, 2 replies; 15+ messages in thread
From: Dave Chinner @ 2020-09-28  5:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, bfoster

On Sun, Sep 27, 2020 at 04:41:20PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we replay unfinished intent items that have been recovered from the
> log, it's possible that the replay will cause the creation of more
> deferred work items.  As outlined in commit 509955823cc9c ("xfs: log
> recovery should replay deferred ops in order"), later work items have an
> implicit ordering dependency on earlier work items.  Therefore, recovery
> must replay the items (both recovered and created) in the same order
> that they would have been during normal operation.
> 
> For log recovery, we enforce this ordering by using an empty transaction
> to collect deferred ops that get created in the process of recovering a
> log intent item to prevent them from being committed before the rest of
> the recovered intent items.  After we finish committing all the
> recovered log items, we allocate a transaction with an enormous block
> reservation, splice our huge list of created deferred ops into that
> transaction, and commit it, thereby finishing all those ops.
> 
> This is /really/ hokey -- it's the one place in XFS where we allow
> nested transactions; the splicing of the defer ops list is is inelegant
> and has to be done twice per recovery function; and the broken way we
> handle inode pointers and block reservations cause subtle use-after-free
> and allocator problems that will be fixed by this patch and the two
> patches after it.
> 
> Therefore, replace the hokey empty transaction with a structure designed
> to capture each chain of deferred ops that are created as part of
> recovering a single unfinished log intent.  Finally, refactor the loop
> that replays those chains to do so using one transaction per chain.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c       |   56 +++++++++++++--
>  fs/xfs/libxfs/xfs_defer.h       |   20 +++++
>  fs/xfs/libxfs/xfs_log_recover.h |    2 +
>  fs/xfs/xfs_bmap_item.c          |   16 +---
>  fs/xfs/xfs_extfree_item.c       |    7 +-
>  fs/xfs/xfs_log_recover.c        |  150 +++++++++++++++++++++++++--------------
>  fs/xfs/xfs_refcount_item.c      |   16 +---
>  fs/xfs/xfs_rmap_item.c          |    7 +-
>  fs/xfs/xfs_trans.h              |    3 +
>  9 files changed, 183 insertions(+), 94 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 36c103c14bc9..0de7672fe63d 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -549,14 +549,56 @@ xfs_defer_move(
>   *
>   * Create and log intent items for all the work that we're capturing so that we
>   * can be assured that the items will get replayed if the system goes down
> - * before log recovery gets a chance to finish the work it put off.  Then we
> - * move the chain from stp to dtp.
> + * before log recovery gets a chance to finish the work it put off.  The entire
> + * deferred ops state is transferred to the capture structure and the
> + * transaction is then ready for the caller to commit it.  If there are no
> + * intent items to capture, this function returns NULL.
>   */
> -void
> +struct xfs_defer_capture *
>  xfs_defer_capture(
> -	struct xfs_trans	*dtp,
> -	struct xfs_trans	*stp)
> +	struct xfs_trans		*tp)

"capture" what?

Perhaps this whole API reads better as:

xfs_defer_ops_capture()
xfs_defer_ops_continue()
xfs_defer_ops_release()

because what it is doing is moving deferops from a transaction to a
capture structure and back again...

.....

> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 3164199162b6..bc7493bf4542 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -8,6 +8,7 @@
>  
>  struct xfs_btree_cur;
>  struct xfs_defer_op_type;
> +struct xfs_defer_capture;
>  
>  /*
>   * Header for deferred operation list.
> @@ -63,10 +64,27 @@ extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
>  extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
>  extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
>  
> +/*
> + * Deferred operation freezer.  This structure enables a dfops user to detach
> + * the chain of deferred operations from a transaction so that they can be
> + * continued later.
> + */

"freezer"?

Stale comment?

.....

> @@ -531,15 +526,12 @@ xfs_bui_item_recover(
>  		xfs_bmap_unmap_extent(tp, ip, &irec);
>  	}
>  
> -	xfs_defer_capture(parent_tp, tp);
> -	error = xfs_trans_commit(tp);
> +	error = xlog_recover_trans_commit(tp, capture_list);
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	xfs_irele(ip);
> -
>  	return error;

hmmmm.

> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index e0675071b39e..107965acc57e 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -1755,6 +1755,38 @@ xlog_recover_release_intent(
>  	spin_unlock(&ailp->ail_lock);
>  }
>  
> +/*
> + * Capture any deferred ops and commit the transaction.  This is the last step
> + * needed to finish a log intent item that we recovered from the log, and will
> + * take care of releasing all the relevant resources.

What does "take care of releasing all the relevant resources"
mean? 

> + */
> +int
> +xlog_recover_trans_commit(
> +	struct xfs_trans		*tp,
> +	struct list_head		*capture_list)
> +{
> +	struct xfs_mount		*mp = tp->t_mountp;
> +	struct xfs_defer_capture	*dfc = xfs_defer_capture(tp);
> +	int				error;
> +
> +	/* If we don't capture anything, commit tp and exit. */
> +	if (!dfc)
> +		return xfs_trans_commit(tp);
> +
> +	/*
> +	 * Commit the transaction.  If that fails, clean up the defer ops and
> +	 * the dfc that we just created.  Otherwise, add the dfc to the list.
> +	 */
> +	error = xfs_trans_commit(tp);
> +	if (error) {
> +		xfs_defer_capture_free(mp, dfc);
> +		return error;
> +	}
> +
> +	list_add_tail(&dfc->dfc_list, capture_list);
> +	return 0;
> +}

And, really, this is more than a "transaction commit" operation; it
doesn't have anything recovery specific to it, so if the
xfs_defer_capture() API is "generic xfs_defer" functionality, why
isn't this placed next to it and nameed
xfs_defer_capture_and_commit()?

> @@ -2431,38 +2463,62 @@ xlog_recover_process_data(
>  	return 0;
>  }
>  
> +static void
> +xlog_cancel_defer_ops(
> +	struct xfs_mount	*mp,
> +	struct list_head	*capture_list)
> +{
> +	struct xfs_defer_capture *dfc, *next;
> +
> +	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> +		list_del_init(&dfc->dfc_list);
> +		xfs_defer_capture_free(mp, dfc);
> +	}
> +}

Same - there is nothing log recovery specific here.

>  /* Take all the collected deferred ops and finish them in order. */
>  static int
>  xlog_finish_defer_ops(
> -	struct xfs_trans	*parent_tp)
> +	struct xfs_mount	*mp,
> +	struct list_head	*capture_list)
>  {
> -	struct xfs_mount	*mp = parent_tp->t_mountp;
> +	struct xfs_defer_capture *dfc, *next;
>  	struct xfs_trans	*tp;
>  	int64_t			freeblks;
> -	uint			resblks;
> -	int			error;
> +	uint64_t		resblks;
> +	int			error = 0;
>  
> -	/*
> -	 * We're finishing the defer_ops that accumulated as a result of
> -	 * recovering unfinished intent items during log recovery.  We
> -	 * reserve an itruncate transaction because it is the largest
> -	 * permanent transaction type.  Since we're the only user of the fs
> -	 * right now, take 93% (15/16) of the available free blocks.  Use
> -	 * weird math to avoid a 64-bit division.
> -	 */
> -	freeblks = percpu_counter_sum(&mp->m_fdblocks);
> -	if (freeblks <= 0)
> -		return -ENOSPC;
> -	resblks = min_t(int64_t, UINT_MAX, freeblks);
> -	resblks = (resblks * 15) >> 4;
> -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> -			0, XFS_TRANS_RESERVE, &tp);
> -	if (error)
> -		return error;
> -	/* transfer all collected dfops to this transaction */
> -	xfs_defer_move(tp, parent_tp);
> +	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> +		/*
> +		 * We're finishing the defer_ops that accumulated as a result
> +		 * of recovering unfinished intent items during log recovery.
> +		 * We reserve an itruncate transaction because it is the
> +		 * largest permanent transaction type.  Since we're the only
> +		 * user of the fs right now, take 93% (15/16) of the available
> +		 * free blocks.  Use weird math to avoid a 64-bit division.
> +		 */
> +		freeblks = percpu_counter_sum(&mp->m_fdblocks);
> +		if (freeblks <= 0)
> +			return -ENOSPC;
>  
> -	return xfs_trans_commit(tp);
> +		resblks = min_t(uint64_t, UINT_MAX, freeblks);
> +		resblks = (resblks * 15) >> 4;
> +		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> +				0, XFS_TRANS_RESERVE, &tp);
> +		if (error)
> +			return error;
> +
> +		/* transfer all collected dfops to this transaction */
> +		list_del_init(&dfc->dfc_list);
> +		xfs_defer_continue(dfc, tp);
> +
> +		error = xfs_trans_commit(tp);
> +		xfs_defer_capture_free(mp, dfc);

Why does this need to call xfs_defer_cancel_list() here? Shouldn't
dfc->dfc_dfops already be empty here? And if it isn't, shouldn't
that throw an error rather than silently cancle work that hasn't be
done that should have been?


>  #endif
> -	while (lip != NULL) {
> +	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
> +	     lip != NULL;
> +	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
>  		/*
>  		 * We're done when we see something other than an intent.
>  		 * There should be no intents left in the AIL now.
> @@ -2533,28 +2577,28 @@ xlog_recover_process_intents(
>  		 */
>  		ASSERT(XFS_LSN_CMP(last_lsn, lip->li_lsn) >= 0);
>  
> +		if (test_and_set_bit(XFS_LI_RECOVERED, &lip->li_flags))
> +			continue;
> +

Why do we still need XFS_LI_RECOVERED here? This log item is going to get
removed from the AIL by the committing of the first transaction
in the ->iop_recover() sequence we are running, so we'll never find
it again in the AIL. Nothing else checks for XFS_LI_RECOVERED
anymore, so this seems unnecessary now...


> -out:
> +
>  	xfs_trans_ail_cursor_done(&cur);
>  	spin_unlock(&ailp->ail_lock);
>  	if (!error)
> -		error = xlog_finish_defer_ops(parent_tp);
> -	xfs_trans_cancel(parent_tp);
> +		error = xlog_finish_defer_ops(log->l_mp, &capture_list);
>  
> +	xlog_cancel_defer_ops(log->l_mp, &capture_list);
>  	return error;
>  }

Again, why are we cancelling the capture list if we just
successfully processed the defer ops on the capture list?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] xfs: xfs_defer_capture should absorb remaining block reservation
  2020-09-27 23:41 ` [PATCH 3/4] xfs: xfs_defer_capture should absorb remaining block reservation Darrick J. Wong
@ 2020-09-28  5:28   ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2020-09-28  5:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, bfoster

On Sun, Sep 27, 2020 at 04:41:27PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When xfs_defer_capture extracts the deferred ops and transaction state
> from a transaction, it should absorb the remaining block reservation so
> that when we continue the dfops chain, we still have those blocks to
> use.
> 
> This adds the requirement that every log intent item recovery function
> must be careful to reserve enough blocks to handle both itself and all
> defer ops that it can queue.  On the other hand, this enables us to do
> away with the handwaving block estimation nonsense that was going on in
> xlog_finish_defer_ops.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/libxfs/xfs_defer.c |    4 ++++
>  fs/xfs/libxfs/xfs_defer.h |    1 +
>  fs/xfs/xfs_log_recover.c  |   18 +-----------------
>  3 files changed, 6 insertions(+), 17 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 0de7672fe63d..85d70f1edc1c 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -574,6 +574,8 @@ xfs_defer_capture(
>  	list_splice_init(&tp->t_dfops, &dfc->dfc_dfops);
>  	dfc->dfc_tpflags = tp->t_flags & XFS_TRANS_LOWMODE;
>  	tp->t_flags &= ~XFS_TRANS_LOWMODE;
> +	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
> +	tp->t_blk_res = tp->t_blk_res_used;
>  
>  	return dfc;
>  }

I think this needs an explanation in the function/API comment to
explain why it is being done.

Otherwise looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] xfs: xfs_defer_capture should absorb remaining transaction reservation
  2020-09-27 23:41 ` [PATCH 4/4] xfs: xfs_defer_capture should absorb remaining transaction reservation Darrick J. Wong
@ 2020-09-28  5:52   ` Dave Chinner
  2020-09-28 17:15     ` Darrick J. Wong
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2020-09-28  5:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, hch, bfoster

On Sun, Sep 27, 2020 at 04:41:33PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When xfs_defer_capture extracts the deferred ops and transaction state
> from a transaction, it should record the transaction reservation type
> from the old transaction so that when we continue the dfops chain, we
> still use the same reservation parameters.
> 
> This avoids a potential failure vector by ensuring that we never ask for
> more log reservation space than we would have asked for had the system
> not gone down.

Nope, it does not do that.

> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c |    5 +++++
>  fs/xfs/libxfs/xfs_defer.h |    1 +
>  fs/xfs/xfs_log_recover.c  |    4 ++--
>  3 files changed, 8 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 85d70f1edc1c..c53443252389 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -577,6 +577,11 @@ xfs_defer_capture(
>  	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
>  	tp->t_blk_res = tp->t_blk_res_used;
>  
> +	/* Preserve the transaction reservation type. */
> +	dfc->dfc_tres.tr_logres = tp->t_log_res;
> +	dfc->dfc_tres.tr_logcount = tp->t_log_count;
> +	dfc->dfc_tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;

This means every child deferop takes a full tp->t_log_count
reservation, whilst in memory the child reservation would ahve been
handled by the parent via the log ticket unit count being
decremented by one. Hence child deferops -never- run with the same
maximal reservation that their parents held.

The difference is that at runtime we are rolling transaction which
regrant space from the initial reservation of (tp->t_log_count *
tp->t_log_res) made a run time. i.e. the first child deferop that
runs has a total log space grant of ((tp->t_log_count - 1)
* tp->t_log_res), the second it is "- 2", and so on right down to
when the log ticket runs out of initial reservation and so it goes
to reserving a single unit (tp->t_log_res) at a time.

Hence both the intents being recovered and all their children are
over-reserving log space by using the default log count for the
&M_RES(mp)->tr_itruncate reservation. Even if we ignore the initial
reservation being incorrect, the child reservations of the same size
as the parent are definitely incorrect. They really should be
allowed only a single unit reservation, and if the transaction rolls
to process defer ops, it needs to regrant new log space during the
commit process.

Hence I think this can only be correct as:

	dfc->dfc_tres.tr_log_count = 1;

Regardless of how many units the parent recovery reservation
obtained. (Which I also think can only be correct as 1 because we
don't know how many units of reservation space the parent had
consumed when it was logged.)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] xfs: remove xfs_defer_reset
  2020-09-27 23:41 ` [PATCH 1/4] xfs: remove xfs_defer_reset Darrick J. Wong
  2020-09-28  4:42   ` Dave Chinner
@ 2020-09-28  6:20   ` Christoph Hellwig
  2020-09-28 16:43     ` Darrick J. Wong
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2020-09-28  6:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david, hch, bfoster

On Sun, Sep 27, 2020 at 04:41:14PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Remove this one-line helper.

Maybe expand the rationale here a little more?

Otherwise looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] xfs: proper replay of deferred ops queued during log recovery
  2020-09-28  5:26   ` Dave Chinner
@ 2020-09-28  6:37     ` Christoph Hellwig
  2020-09-28 17:53       ` Darrick J. Wong
  2020-09-28 17:47     ` Darrick J. Wong
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2020-09-28  6:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs, hch, bfoster

On Mon, Sep 28, 2020 at 03:26:18PM +1000, Dave Chinner wrote:
> > +	struct xfs_mount		*mp = tp->t_mountp;
> > +	struct xfs_defer_capture	*dfc = xfs_defer_capture(tp);
> > +	int				error;
> > +
> > +	/* If we don't capture anything, commit tp and exit. */
> > +	if (!dfc)
> > +		return xfs_trans_commit(tp);
> > +
> > +	/*
> > +	 * Commit the transaction.  If that fails, clean up the defer ops and
> > +	 * the dfc that we just created.  Otherwise, add the dfc to the list.
> > +	 */
> > +	error = xfs_trans_commit(tp);
> > +	if (error) {
> > +		xfs_defer_capture_free(mp, dfc);
> > +		return error;
> > +	}
> > +
> > +	list_add_tail(&dfc->dfc_list, capture_list);
> > +	return 0;
> > +}
> 
> And, really, this is more than a "transaction commit" operation; it
> doesn't have anything recovery specific to it, so if the
> xfs_defer_capture() API is "generic xfs_defer" functionality, why
> isn't this placed next to it and nameed
> xfs_defer_capture_and_commit()?

Agreed.  I find the xlog_recover_trans_commit naming pretty weird.

> > @@ -2533,28 +2577,28 @@ xlog_recover_process_intents(
> >  		 */
> >  		ASSERT(XFS_LSN_CMP(last_lsn, lip->li_lsn) >= 0);
> >  
> > +		if (test_and_set_bit(XFS_LI_RECOVERED, &lip->li_flags))
> > +			continue;
> > +
> 
> Why do we still need XFS_LI_RECOVERED here? This log item is going to get
> removed from the AIL by the committing of the first transaction
> in the ->iop_recover() sequence we are running, so we'll never find
> it again in the AIL. Nothing else checks for XFS_LI_RECOVERED
> anymore, so this seems unnecessary now...

We also never restart the list walk as far as I can tell.  So yes,
XFS_LI_RECOVERED seems entirely superflous and should probably be
removed in a prep patch.

> > -out:
> > +
> >  	xfs_trans_ail_cursor_done(&cur);
> >  	spin_unlock(&ailp->ail_lock);
> >  	if (!error)
> > -		error = xlog_finish_defer_ops(parent_tp);
> > -	xfs_trans_cancel(parent_tp);
> > +		error = xlog_finish_defer_ops(log->l_mp, &capture_list);
> >  
> > +	xlog_cancel_defer_ops(log->l_mp, &capture_list);
> >  	return error;
> >  }
> 
> Again, why are we cancelling the capture list if we just
> successfully processed the defer ops on the capture list?

Yes, we'll probably just want to assert it is non-empty at the end of
xlog_finish_defer_ops.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] xfs: remove xfs_defer_reset
  2020-09-28  6:20   ` Christoph Hellwig
@ 2020-09-28 16:43     ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-28 16:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, david, bfoster

On Mon, Sep 28, 2020 at 08:20:34AM +0200, Christoph Hellwig wrote:
> On Sun, Sep 27, 2020 at 04:41:14PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Remove this one-line helper.
> 
> Maybe expand the rationale here a little more?

I'll change it to:
"Remove this one-line helper since the assert is trivially true in one
call site and the rest just obscures a bitmask operation."

--D

> Otherwise looks fine:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] xfs: xfs_defer_capture should absorb remaining transaction reservation
  2020-09-28  5:52   ` Dave Chinner
@ 2020-09-28 17:15     ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-28 17:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, hch, bfoster

On Mon, Sep 28, 2020 at 03:52:30PM +1000, Dave Chinner wrote:
> On Sun, Sep 27, 2020 at 04:41:33PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When xfs_defer_capture extracts the deferred ops and transaction state
> > from a transaction, it should record the transaction reservation type
> > from the old transaction so that when we continue the dfops chain, we
> > still use the same reservation parameters.
> > 
> > This avoids a potential failure vector by ensuring that we never ask for
> > more log reservation space than we would have asked for had the system
> > not gone down.
> 
> Nope, it does not do that.
> 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_defer.c |    5 +++++
> >  fs/xfs/libxfs/xfs_defer.h |    1 +
> >  fs/xfs/xfs_log_recover.c  |    4 ++--
> >  3 files changed, 8 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > index 85d70f1edc1c..c53443252389 100644
> > --- a/fs/xfs/libxfs/xfs_defer.c
> > +++ b/fs/xfs/libxfs/xfs_defer.c
> > @@ -577,6 +577,11 @@ xfs_defer_capture(
> >  	dfc->dfc_blkres = tp->t_blk_res - tp->t_blk_res_used;
> >  	tp->t_blk_res = tp->t_blk_res_used;
> >  
> > +	/* Preserve the transaction reservation type. */
> > +	dfc->dfc_tres.tr_logres = tp->t_log_res;
> > +	dfc->dfc_tres.tr_logcount = tp->t_log_count;
> > +	dfc->dfc_tres.tr_logflags = XFS_TRANS_PERM_LOG_RES;
> 
> This means every child deferop takes a full tp->t_log_count
> reservation, whilst in memory the child reservation would ahve been
> handled by the parent via the log ticket unit count being
> decremented by one. Hence child deferops -never- run with the same
> maximal reservation that their parents held.
> 
> The difference is that at runtime we are rolling transaction which
> regrant space from the initial reservation of (tp->t_log_count *
> tp->t_log_res) made a run time. i.e. the first child deferop that
> runs has a total log space grant of ((tp->t_log_count - 1)
> * tp->t_log_res), the second it is "- 2", and so on right down to
> when the log ticket runs out of initial reservation and so it goes
> to reserving a single unit (tp->t_log_res) at a time.
> 
> Hence both the intents being recovered and all their children are
> over-reserving log space by using the default log count for the
> &M_RES(mp)->tr_itruncate reservation. Even if we ignore the initial
> reservation being incorrect, the child reservations of the same size
> as the parent are definitely incorrect. They really should be
> allowed only a single unit reservation, and if the transaction rolls
> to process defer ops, it needs to regrant new log space during the
> commit process.
> 
> Hence I think this can only be correct as:
> 
> 	dfc->dfc_tres.tr_log_count = 1;
> 
> Regardless of how many units the parent recovery reservation
> obtained. (Which I also think can only be correct as 1 because we
> don't know how many units of reservation space the parent had
> consumed when it was logged.)

Can we reach down into the transaction's xlog_ticket.t_cnt to find out
how many were left?

Hm.  Maybe that doesn't matter, seeing as recovery basically
single-steps through whatever it salvaged from the log, I think it's
better for recovery to regrant every time it rolls the transaction
because the amount of space required for a single step will (once we get
the logres part sorted out) never be larger than the sum of the logres
of all transactions that were running when we crashed.

It also occurs to me (esp. given the earlier conversation about the
transaction reservations used to recover intents) that in general we
might need to finish one item to push the tail forward to free up space,
so therefore log recovery should try to use as few resources as
possible.

Well, I'll try hardcoding logcount=1 and see what happens. :)

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] xfs: proper replay of deferred ops queued during log recovery
  2020-09-28  5:26   ` Dave Chinner
  2020-09-28  6:37     ` Christoph Hellwig
@ 2020-09-28 17:47     ` Darrick J. Wong
  1 sibling, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-28 17:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, hch, bfoster

On Mon, Sep 28, 2020 at 03:26:18PM +1000, Dave Chinner wrote:
> On Sun, Sep 27, 2020 at 04:41:20PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When we replay unfinished intent items that have been recovered from the
> > log, it's possible that the replay will cause the creation of more
> > deferred work items.  As outlined in commit 509955823cc9c ("xfs: log
> > recovery should replay deferred ops in order"), later work items have an
> > implicit ordering dependency on earlier work items.  Therefore, recovery
> > must replay the items (both recovered and created) in the same order
> > that they would have been during normal operation.
> > 
> > For log recovery, we enforce this ordering by using an empty transaction
> > to collect deferred ops that get created in the process of recovering a
> > log intent item to prevent them from being committed before the rest of
> > the recovered intent items.  After we finish committing all the
> > recovered log items, we allocate a transaction with an enormous block
> > reservation, splice our huge list of created deferred ops into that
> > transaction, and commit it, thereby finishing all those ops.
> > 
> > This is /really/ hokey -- it's the one place in XFS where we allow
> > nested transactions; the splicing of the defer ops list is is inelegant
> > and has to be done twice per recovery function; and the broken way we
> > handle inode pointers and block reservations cause subtle use-after-free
> > and allocator problems that will be fixed by this patch and the two
> > patches after it.
> > 
> > Therefore, replace the hokey empty transaction with a structure designed
> > to capture each chain of deferred ops that are created as part of
> > recovering a single unfinished log intent.  Finally, refactor the loop
> > that replays those chains to do so using one transaction per chain.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_defer.c       |   56 +++++++++++++--
> >  fs/xfs/libxfs/xfs_defer.h       |   20 +++++
> >  fs/xfs/libxfs/xfs_log_recover.h |    2 +
> >  fs/xfs/xfs_bmap_item.c          |   16 +---
> >  fs/xfs/xfs_extfree_item.c       |    7 +-
> >  fs/xfs/xfs_log_recover.c        |  150 +++++++++++++++++++++++++--------------
> >  fs/xfs/xfs_refcount_item.c      |   16 +---
> >  fs/xfs/xfs_rmap_item.c          |    7 +-
> >  fs/xfs/xfs_trans.h              |    3 +
> >  9 files changed, 183 insertions(+), 94 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > index 36c103c14bc9..0de7672fe63d 100644
> > --- a/fs/xfs/libxfs/xfs_defer.c
> > +++ b/fs/xfs/libxfs/xfs_defer.c
> > @@ -549,14 +549,56 @@ xfs_defer_move(
> >   *
> >   * Create and log intent items for all the work that we're capturing so that we
> >   * can be assured that the items will get replayed if the system goes down
> > - * before log recovery gets a chance to finish the work it put off.  Then we
> > - * move the chain from stp to dtp.
> > + * before log recovery gets a chance to finish the work it put off.  The entire
> > + * deferred ops state is transferred to the capture structure and the
> > + * transaction is then ready for the caller to commit it.  If there are no
> > + * intent items to capture, this function returns NULL.
> >   */
> > -void
> > +struct xfs_defer_capture *
> >  xfs_defer_capture(
> > -	struct xfs_trans	*dtp,
> > -	struct xfs_trans	*stp)
> > +	struct xfs_trans		*tp)
> 
> "capture" what?
> 
> Perhaps this whole API reads better as:
> 
> xfs_defer_ops_capture()
> xfs_defer_ops_continue()
> xfs_defer_ops_release()

Yes!  Finally a better set of names!  I've been stuck on that for a
while!  I like these names very much!

> because what it is doing is moving deferops from a transaction to a
> capture structure and back again...
> 
> .....
> 
> > diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> > index 3164199162b6..bc7493bf4542 100644
> > --- a/fs/xfs/libxfs/xfs_defer.h
> > +++ b/fs/xfs/libxfs/xfs_defer.h
> > @@ -8,6 +8,7 @@
> >  
> >  struct xfs_btree_cur;
> >  struct xfs_defer_op_type;
> > +struct xfs_defer_capture;
> >  
> >  /*
> >   * Header for deferred operation list.
> > @@ -63,10 +64,27 @@ extern const struct xfs_defer_op_type xfs_rmap_update_defer_type;
> >  extern const struct xfs_defer_op_type xfs_extent_free_defer_type;
> >  extern const struct xfs_defer_op_type xfs_agfl_free_defer_type;
> >  
> > +/*
> > + * Deferred operation freezer.  This structure enables a dfops user to detach
> > + * the chain of deferred operations from a transaction so that they can be
> > + * continued later.
> > + */
> 
> "freezer"?
> 
> Stale comment?

Yep.  Fixed.

> .....
> 
> > @@ -531,15 +526,12 @@ xfs_bui_item_recover(
> >  		xfs_bmap_unmap_extent(tp, ip, &irec);
> >  	}
> >  
> > -	xfs_defer_capture(parent_tp, tp);
> > -	error = xfs_trans_commit(tp);
> > +	error = xlog_recover_trans_commit(tp, capture_list);
> >  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >  	xfs_irele(ip);
> > -
> >  	return error;
> 
> hmmmm.
> 
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index e0675071b39e..107965acc57e 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -1755,6 +1755,38 @@ xlog_recover_release_intent(
> >  	spin_unlock(&ailp->ail_lock);
> >  }
> >  
> > +/*
> > + * Capture any deferred ops and commit the transaction.  This is the last step
> > + * needed to finish a log intent item that we recovered from the log, and will
> > + * take care of releasing all the relevant resources.
> 
> What does "take care of releasing all the relevant resources"
> mean? 

Not much.  I'll remove it.

> > + */
> > +int
> > +xlog_recover_trans_commit(
> > +	struct xfs_trans		*tp,
> > +	struct list_head		*capture_list)
> > +{
> > +	struct xfs_mount		*mp = tp->t_mountp;
> > +	struct xfs_defer_capture	*dfc = xfs_defer_capture(tp);
> > +	int				error;
> > +
> > +	/* If we don't capture anything, commit tp and exit. */
> > +	if (!dfc)
> > +		return xfs_trans_commit(tp);
> > +
> > +	/*
> > +	 * Commit the transaction.  If that fails, clean up the defer ops and
> > +	 * the dfc that we just created.  Otherwise, add the dfc to the list.
> > +	 */
> > +	error = xfs_trans_commit(tp);
> > +	if (error) {
> > +		xfs_defer_capture_free(mp, dfc);
> > +		return error;
> > +	}
> > +
> > +	list_add_tail(&dfc->dfc_list, capture_list);
> > +	return 0;
> > +}
> 
> And, really, this is more than a "transaction commit" operation; it
> doesn't have anything recovery specific to it, so if the
> xfs_defer_capture() API is "generic xfs_defer" functionality, why
> isn't this placed next to it and nameed
> xfs_defer_capture_and_commit()?
> 
> > @@ -2431,38 +2463,62 @@ xlog_recover_process_data(
> >  	return 0;
> >  }
> >  
> > +static void
> > +xlog_cancel_defer_ops(
> > +	struct xfs_mount	*mp,
> > +	struct list_head	*capture_list)
> > +{
> > +	struct xfs_defer_capture *dfc, *next;
> > +
> > +	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> > +		list_del_init(&dfc->dfc_list);
> > +		xfs_defer_capture_free(mp, dfc);
> > +	}
> > +}
> 
> Same - there is nothing log recovery specific here.

Ok, I'll move these two functions to xfs_defer.c.

> >  /* Take all the collected deferred ops and finish them in order. */
> >  static int
> >  xlog_finish_defer_ops(
> > -	struct xfs_trans	*parent_tp)
> > +	struct xfs_mount	*mp,
> > +	struct list_head	*capture_list)
> >  {
> > -	struct xfs_mount	*mp = parent_tp->t_mountp;
> > +	struct xfs_defer_capture *dfc, *next;
> >  	struct xfs_trans	*tp;
> >  	int64_t			freeblks;
> > -	uint			resblks;
> > -	int			error;
> > +	uint64_t		resblks;
> > +	int			error = 0;
> >  
> > -	/*
> > -	 * We're finishing the defer_ops that accumulated as a result of
> > -	 * recovering unfinished intent items during log recovery.  We
> > -	 * reserve an itruncate transaction because it is the largest
> > -	 * permanent transaction type.  Since we're the only user of the fs
> > -	 * right now, take 93% (15/16) of the available free blocks.  Use
> > -	 * weird math to avoid a 64-bit division.
> > -	 */
> > -	freeblks = percpu_counter_sum(&mp->m_fdblocks);
> > -	if (freeblks <= 0)
> > -		return -ENOSPC;
> > -	resblks = min_t(int64_t, UINT_MAX, freeblks);
> > -	resblks = (resblks * 15) >> 4;
> > -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> > -			0, XFS_TRANS_RESERVE, &tp);
> > -	if (error)
> > -		return error;
> > -	/* transfer all collected dfops to this transaction */
> > -	xfs_defer_move(tp, parent_tp);
> > +	list_for_each_entry_safe(dfc, next, capture_list, dfc_list) {
> > +		/*
> > +		 * We're finishing the defer_ops that accumulated as a result
> > +		 * of recovering unfinished intent items during log recovery.
> > +		 * We reserve an itruncate transaction because it is the
> > +		 * largest permanent transaction type.  Since we're the only
> > +		 * user of the fs right now, take 93% (15/16) of the available
> > +		 * free blocks.  Use weird math to avoid a 64-bit division.
> > +		 */
> > +		freeblks = percpu_counter_sum(&mp->m_fdblocks);
> > +		if (freeblks <= 0)
> > +			return -ENOSPC;
> >  
> > -	return xfs_trans_commit(tp);
> > +		resblks = min_t(uint64_t, UINT_MAX, freeblks);
> > +		resblks = (resblks * 15) >> 4;
> > +		error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
> > +				0, XFS_TRANS_RESERVE, &tp);
> > +		if (error)
> > +			return error;
> > +
> > +		/* transfer all collected dfops to this transaction */
> > +		list_del_init(&dfc->dfc_list);
> > +		xfs_defer_continue(dfc, tp);
> > +
> > +		error = xfs_trans_commit(tp);
> > +		xfs_defer_capture_free(mp, dfc);
> 
> Why does this need to call xfs_defer_cancel_list() here? Shouldn't

Um, this function doesn't cancel the list at all.  Are you asking why it
is that "reattach captured dfops to transaction" and "free the capture
structure" are separate steps?

> dfc->dfc_dfops already be empty here? And if it isn't, shouldn't
> that throw an error rather than silently cancle work that hasn't be
> done that should have been?

Yes, dfc_dfops should be empty at this point.  In the BUI recovery UAF
patch, this xfs_defer_capture_free call is used to unlock and release
the captured inode(s) after the transaction commits.

An alternate way to design this would have been that xfs_defer_continue
totally consumes the *dfc (and frees it) as part of reattaching the
dfops list to the transaction; and in the later patch that adds inodes
to the capture structure, we'd pass out any inode pointers and
let xlog_finish_defer_ops unlock and release it directly.

Hmm, maybe that /would/ be more clear than this setup...

> 
> >  #endif
> > -	while (lip != NULL) {
> > +	for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
> > +	     lip != NULL;
> > +	     lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
> >  		/*
> >  		 * We're done when we see something other than an intent.
> >  		 * There should be no intents left in the AIL now.
> > @@ -2533,28 +2577,28 @@ xlog_recover_process_intents(
> >  		 */
> >  		ASSERT(XFS_LSN_CMP(last_lsn, lip->li_lsn) >= 0);
> >  
> > +		if (test_and_set_bit(XFS_LI_RECOVERED, &lip->li_flags))
> > +			continue;
> > +
> 
> Why do we still need XFS_LI_RECOVERED here? This log item is going to get
> removed from the AIL by the committing of the first transaction
> in the ->iop_recover() sequence we are running, so we'll never find
> it again in the AIL. Nothing else checks for XFS_LI_RECOVERED
> anymore, so this seems unnecessary now...

<nod> I've suspected for a while that this wasn't necessary...

> > -out:
> > +
> >  	xfs_trans_ail_cursor_done(&cur);
> >  	spin_unlock(&ailp->ail_lock);
> >  	if (!error)
> > -		error = xlog_finish_defer_ops(parent_tp);
> > -	xfs_trans_cancel(parent_tp);
> > +		error = xlog_finish_defer_ops(log->l_mp, &capture_list);
> >  
> > +	xlog_cancel_defer_ops(log->l_mp, &capture_list);
> >  	return error;
> >  }
> 
> Again, why are we cancelling the capture list if we just
> successfully processed the defer ops on the capture list?

Overly compact code, I suppose.  If xlog_finish_defer_ops succeeds, the
call is a nop since the list is empty; but if there was an error, then
we still need to walk the list and free the remaining capture
structures.

This probably would have been clearer had I ended the function like
this:

	xfs_trans_ail_cursor_done(&cur);
	spin_unlock(&ailp->ail_lock);
	if (error)
		goto err;

	error = xlog_finish_defer_ops(log->l_mp, &capture_list);
	if (error)
		goto err;

	return 0;
err:
	xfs_defer_ops_release_all(log->l_mp, &capture_list);
	return error;
}

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] xfs: proper replay of deferred ops queued during log recovery
  2020-09-28  6:37     ` Christoph Hellwig
@ 2020-09-28 17:53       ` Darrick J. Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Darrick J. Wong @ 2020-09-28 17:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dave Chinner, linux-xfs, bfoster

On Mon, Sep 28, 2020 at 08:37:17AM +0200, Christoph Hellwig wrote:
> On Mon, Sep 28, 2020 at 03:26:18PM +1000, Dave Chinner wrote:
> > > +	struct xfs_mount		*mp = tp->t_mountp;
> > > +	struct xfs_defer_capture	*dfc = xfs_defer_capture(tp);
> > > +	int				error;
> > > +
> > > +	/* If we don't capture anything, commit tp and exit. */
> > > +	if (!dfc)
> > > +		return xfs_trans_commit(tp);
> > > +
> > > +	/*
> > > +	 * Commit the transaction.  If that fails, clean up the defer ops and
> > > +	 * the dfc that we just created.  Otherwise, add the dfc to the list.
> > > +	 */
> > > +	error = xfs_trans_commit(tp);
> > > +	if (error) {
> > > +		xfs_defer_capture_free(mp, dfc);
> > > +		return error;
> > > +	}
> > > +
> > > +	list_add_tail(&dfc->dfc_list, capture_list);
> > > +	return 0;
> > > +}
> > 
> > And, really, this is more than a "transaction commit" operation; it
> > doesn't have anything recovery specific to it, so if the
> > xfs_defer_capture() API is "generic xfs_defer" functionality, why
> > isn't this placed next to it and nameed
> > xfs_defer_capture_and_commit()?
> 
> Agreed.  I find the xlog_recover_trans_commit naming pretty weird.

<nod>

The final list of functions are:

xfs_defer_ops_capture_and_commit: capture a transaction's dfops, commit
	the transaction, and add the capture structure to the list, just like
	xlog_recover_trans_commit did in this patch.

xfs_defer_ops_continue: restore the captured dfops and transaction state
	to a fresh transaction, and free the capture structure.

xfs_defer_ops_release: free all captured dfops and the structure, in case
	recovery failed somewhere and we have to bail out.

> > > @@ -2533,28 +2577,28 @@ xlog_recover_process_intents(
> > >  		 */
> > >  		ASSERT(XFS_LSN_CMP(last_lsn, lip->li_lsn) >= 0);
> > >  
> > > +		if (test_and_set_bit(XFS_LI_RECOVERED, &lip->li_flags))
> > > +			continue;
> > > +
> > 
> > Why do we still need XFS_LI_RECOVERED here? This log item is going to get
> > removed from the AIL by the committing of the first transaction
> > in the ->iop_recover() sequence we are running, so we'll never find
> > it again in the AIL. Nothing else checks for XFS_LI_RECOVERED
> > anymore, so this seems unnecessary now...
> 
> We also never restart the list walk as far as I can tell.  So yes,
> XFS_LI_RECOVERED seems entirely superflous and should probably be
> removed in a prep patch.

Ok.

> > > -out:
> > > +
> > >  	xfs_trans_ail_cursor_done(&cur);
> > >  	spin_unlock(&ailp->ail_lock);
> > >  	if (!error)
> > > -		error = xlog_finish_defer_ops(parent_tp);
> > > -	xfs_trans_cancel(parent_tp);
> > > +		error = xlog_finish_defer_ops(log->l_mp, &capture_list);
> > >  
> > > +	xlog_cancel_defer_ops(log->l_mp, &capture_list);
> > >  	return error;
> > >  }
> > 
> > Again, why are we cancelling the capture list if we just
> > successfully processed the defer ops on the capture list?
> 
> Yes, we'll probably just want to assert it is non-empty at the end of
> xlog_finish_defer_ops.

Done.

--D

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-09-28 17:53 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-27 23:41 [PATCH v3 0/4] xfs: fix how we deal with new intents during recovery Darrick J. Wong
2020-09-27 23:41 ` [PATCH 1/4] xfs: remove xfs_defer_reset Darrick J. Wong
2020-09-28  4:42   ` Dave Chinner
2020-09-28  6:20   ` Christoph Hellwig
2020-09-28 16:43     ` Darrick J. Wong
2020-09-27 23:41 ` [PATCH 2/4] xfs: proper replay of deferred ops queued during log recovery Darrick J. Wong
2020-09-28  5:26   ` Dave Chinner
2020-09-28  6:37     ` Christoph Hellwig
2020-09-28 17:53       ` Darrick J. Wong
2020-09-28 17:47     ` Darrick J. Wong
2020-09-27 23:41 ` [PATCH 3/4] xfs: xfs_defer_capture should absorb remaining block reservation Darrick J. Wong
2020-09-28  5:28   ` Dave Chinner
2020-09-27 23:41 ` [PATCH 4/4] xfs: xfs_defer_capture should absorb remaining transaction reservation Darrick J. Wong
2020-09-28  5:52   ` Dave Chinner
2020-09-28 17:15     ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.