All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 02/13] xfs: add log item precommit operation
Date: Wed, 12 Aug 2020 19:25:45 +1000	[thread overview]
Message-ID: <20200812092556.2567285-3-david@fromorbit.com> (raw)
In-Reply-To: <20200812092556.2567285-1-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

For inodes that are dirty, we have an attached cluster buffer that
we want to use to track the dirty inode through the AIL.
Unfortunately, locking the cluster buffer and adding it to the
transaction when the inode is first logged in a transaction leads to
buffer lock ordering inversions.

The specific problem is ordering against the AGI buffer. When
modifying unlinked lists, the buffer lock order is AGI -> inode
cluster buffer as the AGI buffer lock serialises all access to the
unlinked lists. Unfortunately, functionality like xfs_droplink()
logs the inode before calling xfs_iunlink(), as do various directory
manipulation functions. The inode can be logged way down in the
stack as far as the bmapi routines and hence, without a major
rewrite of lots of APIs there's no way we can avoid the inode being
logged by something until after the AGI has been logged.

As we are going to be using ordered buffers for inode AIL tracking,
there isn't a need to actually lock that buffer against modification
as all the modifications are captured by logging the inode item
itself. Hence we don't actually need to join the cluster buffer into
the transaction until just before it is committed. This means we do
not perturb any of the existing buffer lock orders in transactions,
and the inode cluster buffer is always locked last in a transaction
that doesn't otherwise touch inode cluster buffers.

We do this by introducing a precommit log item method. A log item
method is used because it is likely dquots will be moved to this
same ordered buffer tracking scheme and hence will need a similar
callout. This commit just introduces the mechanism; the inode item
implementation is in followup commits.

The precommit items need to be sorted into consistent order as we
may be locking multiple items here. Hence if we have two dirty
inodes in cluster buffers A and B, and some other transaction has
two separate dirty inodes in the same cluster buffers, locking them
in different orders opens us up to ABBA deadlocks. Hence we sort the
items on the transaction based on the presence of a sort log item
method.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_icache.c |  1 +
 fs/xfs/xfs_trans.c  | 91 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_trans.h  |  6 ++-
 3 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index aa6aad258670..5cdded02cdc8 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1065,6 +1065,7 @@ xfs_reclaim_inode(
 	ip->i_ino = 0;
 	spin_unlock(&ip->i_flags_lock);
 
+	ASSERT(!ip->i_itemp || ip->i_itemp->ili_item.li_buf == NULL);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 
 	XFS_STATS_INC(ip->i_mount, xs_ig_reclaims);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index ed72867b1a19..68b03446db8e 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -816,6 +816,90 @@ xfs_trans_committed_bulk(
 	spin_unlock(&ailp->ail_lock);
 }
 
+/*
+ * Sort transaction items prior to running precommit operations. This will
+ * attempt to order the items such that they will always be locked in the same
+ * order. Items that have no sort function are moved to the end of the list
+ * and so are locked last (XXX: need to check the logic matches the comment).
+ *
+ * This may need refinement as different types of objects add sort functions.
+ *
+ * Function is more complex than it needs to be because we are comparing 64 bit
+ * values and the function only returns 32 bit values.
+ */
+static int
+xfs_trans_precommit_sort(
+	void			*unused_arg,
+	struct list_head	*a,
+	struct list_head	*b)
+{
+	struct xfs_log_item	*lia = container_of(a,
+					struct xfs_log_item, li_trans);
+	struct xfs_log_item	*lib = container_of(b,
+					struct xfs_log_item, li_trans);
+	int64_t			diff;
+
+	/*
+	 * If both items are non-sortable, leave them alone. If only one is
+	 * sortable, move the non-sortable item towards the end of the list.
+	 */
+	if (!lia->li_ops->iop_sort && !lib->li_ops->iop_sort)
+		return 0;
+	if (!lia->li_ops->iop_sort)
+		return 1;
+	if (!lib->li_ops->iop_sort)
+		return -1;
+
+	diff = lia->li_ops->iop_sort(lia) - lib->li_ops->iop_sort(lib);
+	if (diff < 0)
+		return -1;
+	if (diff > 0)
+		return 1;
+	return 0;
+}
+
+/*
+ * Run transaction precommit functions.
+ *
+ * If there is an error in any of the callouts, then stop immediately and
+ * trigger a shutdown to abort the transaction. There is no recovery possible
+ * from errors at this point as the transaction is dirty....
+ */
+static int
+xfs_trans_run_precommits(
+	struct xfs_trans	*tp)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_log_item	*lip, *n;
+	int			error = 0;
+
+	/*
+	 * Sort the item list to avoid ABBA deadlocks with other transactions
+	 * running precommit operations that lock multiple shared items such as
+	 * inode cluster buffers.
+	 */
+	list_sort(NULL, &tp->t_items, xfs_trans_precommit_sort);
+
+	/*
+	 * Precommit operations can remove the log item from the transaction
+	 * if the log item exists purely to delay modifications until they
+	 * can be ordered against other operations. Hence we have to use
+	 * list_for_each_entry_safe() here.
+	 */
+	list_for_each_entry_safe(lip, n, &tp->t_items, li_trans) {
+		if (!test_bit(XFS_LI_DIRTY, &lip->li_flags))
+			continue;
+		if (lip->li_ops->iop_precommit) {
+			error = lip->li_ops->iop_precommit(tp, lip);
+			if (error)
+				break;
+		}
+	}
+	if (error)
+		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+	return error;
+}
+
 /*
  * Commit the given transaction to the log.
  *
@@ -840,6 +924,13 @@ __xfs_trans_commit(
 
 	trace_xfs_trans_commit(tp, _RET_IP_);
 
+	error = xfs_trans_run_precommits(tp);
+	if (error) {
+		if (tp->t_flags & XFS_TRANS_PERM_LOG_RES)
+			xfs_defer_cancel(tp);
+		goto out_unreserve;
+	}
+
 	/*
 	 * Finish deferred items on final commit. Only permanent transactions
 	 * should ever have deferred ops.
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index b752501818d2..26ea19bd0621 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -70,10 +70,12 @@ struct xfs_item_ops {
 	void (*iop_format)(struct xfs_log_item *, struct xfs_log_vec *);
 	void (*iop_pin)(struct xfs_log_item *);
 	void (*iop_unpin)(struct xfs_log_item *, int remove);
-	uint (*iop_push)(struct xfs_log_item *, struct list_head *);
+	uint64_t (*iop_sort)(struct xfs_log_item *);
+	int (*iop_precommit)(struct xfs_trans *, struct xfs_log_item *);
 	void (*iop_committing)(struct xfs_log_item *, xfs_lsn_t commit_lsn);
-	void (*iop_release)(struct xfs_log_item *);
 	xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t);
+	uint (*iop_push)(struct xfs_log_item *, struct list_head *);
+	void (*iop_release)(struct xfs_log_item *);
 	int (*iop_recover)(struct xfs_log_item *lip, struct xfs_trans *tp);
 	bool (*iop_match)(struct xfs_log_item *item, uint64_t id);
 };
-- 
2.26.2.761.g0e0b3e54be


  parent reply	other threads:[~2020-08-12  9:26 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-12  9:25 [PATCH 00/13] xfs: in memory inode unlink log items Dave Chinner
2020-08-12  9:25 ` [PATCH 01/13] xfs: xfs_iflock is no longer a completion Dave Chinner
2020-08-18 23:44   ` Darrick J. Wong
2020-08-22  7:41   ` Christoph Hellwig
2020-08-12  9:25 ` Dave Chinner [this message]
2020-08-22  9:06   ` [PATCH 02/13] xfs: add log item precommit operation Christoph Hellwig
2020-08-12  9:25 ` [PATCH 03/13] xfs: factor the xfs_iunlink functions Dave Chinner
2020-08-18 23:49   ` Darrick J. Wong
2020-08-22  7:45   ` Christoph Hellwig
2020-08-12  9:25 ` [PATCH 04/13] xfs: arrange all unlinked inodes into one list Dave Chinner
2020-08-18 23:59   ` Darrick J. Wong
2020-08-19  0:45     ` Dave Chinner
2020-08-19  0:58     ` Gao Xiang
2020-08-22  9:01       ` Christoph Hellwig
2020-08-23 17:24         ` Gao Xiang
2020-08-24  8:19           ` [RFC PATCH] xfs: use log_incompat feature instead of speculate matching Gao Xiang
2020-08-24  8:34             ` Gao Xiang
2020-08-24 15:08               ` Darrick J. Wong
2020-08-24 15:41                 ` Gao Xiang
2020-08-25 10:06                   ` [PATCH] " Gao Xiang
2020-08-25 14:54                     ` Darrick J. Wong
2020-08-25 15:30                       ` Gao Xiang
2020-08-27  7:19                       ` Christoph Hellwig
2020-08-12  9:25 ` [PATCH 05/13] xfs: add unlink list pointers to xfs_inode Dave Chinner
2020-08-19  0:02   ` Darrick J. Wong
2020-08-19  0:47     ` Dave Chinner
2020-08-22  9:03   ` Christoph Hellwig
2020-08-25  5:17     ` Dave Chinner
2020-08-27  7:21       ` Christoph Hellwig
2020-08-12  9:25 ` [PATCH 06/13] xfs: replace iunlink backref lookups with list lookups Dave Chinner
2020-08-19  0:13   ` Darrick J. Wong
2020-08-19  0:52     ` Dave Chinner
2020-08-12  9:25 ` [PATCH 07/13] xfs: mapping unlinked inodes is now redundant Dave Chinner
2020-08-19  0:14   ` Darrick J. Wong
2020-08-12  9:25 ` [PATCH 08/13] xfs: updating i_next_unlinked doesn't need to return old value Dave Chinner
2020-08-19  0:19   ` Darrick J. Wong
2020-08-12  9:25 ` [PATCH 09/13] xfs: validate the unlinked list pointer on update Dave Chinner
2020-08-19  0:23   ` Darrick J. Wong
2020-08-12  9:25 ` [PATCH 10/13] xfs: re-order AGI updates in unlink list updates Dave Chinner
2020-08-19  0:29   ` Darrick J. Wong
2020-08-19  1:01     ` Dave Chinner
2020-08-12  9:25 ` [PATCH 11/13] xfs: combine iunlink inode update functions Dave Chinner
2020-08-19  0:30   ` Darrick J. Wong
2020-08-12  9:25 ` [PATCH 12/13] xfs: add in-memory iunlink log item Dave Chinner
2020-08-19  0:35   ` Darrick J. Wong
2020-08-12  9:25 ` [PATCH 13/13] xfs: reorder iunlink remove operation in xfs_ifree Dave Chinner
2020-08-12 11:12   ` Gao Xiang
2020-08-19  0:45   ` Darrick J. Wong
2020-08-18 18:17 ` [PATCH 00/13] xfs: in memory inode unlink log items Darrick J. Wong
2020-08-18 20:01   ` Gao Xiang
2020-08-18 21:42   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200812092556.2567285-3-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.