* [PATCH 0/9 v2] xfs: byte-based grant head reservation tracking
@ 2022-08-09 23:03 Dave Chinner
  2022-08-09 23:03 ` [PATCH 1/9] xfs: move and rename xfs_trans_committed_bulk Dave Chinner
                   ` (8 more replies)
  0 siblings, 9 replies; 39+ messages in thread
From: Dave Chinner @ 2022-08-09 23:03 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

One of the significant limitations of the log reservation code is
that it uses physical tracking of the reservation space to account
for both the space used in the journal as well as the reservations
held in memory by the CIL and active running transactions. Because
this in-memory reservation tracking requires byte-level granularity,
the "LSN" that the grant head stores its location in is split into
32 bits for the log cycle and 32 bits for the grant
head offset into the log.

Storing a byte count as the grant head offset into the log means
that we can only index 4GB of space with the grant head. This is one
of the primary limiting factors preventing us from increasing the
physical log size beyond 2GB. Hence to increase the physical log
size, we have to increase the space available for storing the grant
head.

Needing more physical space to store the grant head is an issue
because we use lockless atomic accounting for the grant head to
minimise the overhead of new incoming transaction reservations.
These have unbounded concurrency, and hence any lock in the
reservation path will cause serious scalability issues. The lockless
accounting fast path was the solution to these scalability problems
that we had over a decade ago, and hence we know we cannot go back
to a lock based solution.

Therefore we are still largely limited to the storage space we can
perform atomic operations on. We already use 64 bit compare/exchange
operations, and there is not widespread hardware support for 128 bit
atomic compare/exchange operations so increasing the grant head LSN
to a structure > 64 bits in size is not really an option.

Hence we have to look for a different solution - one that doesn't
require us to increase the amount of storage space for the grant
head. This is where we need to recognise that the grant head is
actually tracking three things:

1. physical log space that is tracked by the AIL;
2. physical log space that the CIL will soon consume; and
3. potential log space that active transactions *may* consume.

One of the tricks that the grant heads play is that they don't need
to explicitly track the space consumed by the AIL (#1), because the
consumed log space is simply "grant head - log tail", and so it
doesn't matter how the consumed space moves between the
three separate accounting groups. Reservation space is automatically
returned to the "available pool" by the AIL moving the log tail
forwards. Hence the grant head only needs to account for the
journal space that transactions consume as they complete, and never
has to be updated to account for metadata writeback emptying the
journal.

This all works because xlog_space_left() is a calculation of the
difference between two LSNs - the log tail and the grant head. When
the grant head wraps the log tail, we've run out of log space and
the journal reservations get throttled until the log tail is moved
forward to "unwrap" the grant head and make space available again.
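
To make that concrete, the current calculation is conceptually just
the following (a simplified sketch, not the kernel's xlog_space_left()
implementation - unit conversions and error cases are omitted, and
both head and tail are treated as byte offsets within a cycle):

static int space_left_sketch(int log_size,
		int tail_cycle, int tail_bytes,
		int head_cycle, int head_bytes)
{
	/* head and tail in the same cycle: the head has not wrapped */
	if (head_cycle == tail_cycle)
		return log_size - (head_bytes - tail_bytes);
	/* head has wrapped into the cycle after the tail */
	return tail_bytes - head_bytes;
}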

But there's no reason why we have to track log space in this way
to determine that we've run out of reservation space - all we need
is for xlog_space_left() to be able to accurately calculate when
we've run out of space. So let's break this down.

Firstly, the AIL tracks all the items in the journal, and so at
any given time it should know exactly where the on-disk head and
tail of the journal are located. At the moment, we only know where
the tail is (xfs_ail_min_lsn()), and we update the log tail
(log->l_tail_lsn) whenever the AIL minimum LSN changes.

The AIL will see the maximum committed LSN, but it does not track
this. Instead, the log tracks this as log->l_last_sync_lsn and
updates this directly in iclog IO completion when an iclog has
callbacks attached. That is, log->l_last_sync_lsn is updated
whenever journal IO completion is going to insert the latest
committed log items into the AIL. If the AIL is empty, the log tail
is assigned the value stored in l_last_sync_lsn as the log tail
now points to the last written checkpoint in the journal.
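
In other words, the current log tail assignment boils down to this
(a sketch of the logic only, not the actual xlog_assign_tail_lsn()
code):

static uint64_t log_tail_sketch(uint64_t ail_min_lsn,
		uint64_t last_sync_lsn)
{
	/* an ail_min_lsn of 0 means the AIL is empty */
	return ail_min_lsn ? ail_min_lsn : last_sync_lsn;
}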

The simplest way I can describe how we track the log space is
as follows:

   l_tail_lsn		l_last_sync_lsn		grant head lsn
	|-----------------------|+++++++++++++++++++++|
	|    physical space	|   in memory space   |
	| - - - - - - xlog_space_left() - - - - - - - |

It is simple for the AIL to track the maximum LSN that has been
inserted into the AIL. If we do this, we no longer need to track
log->l_last_sync_lsn in the journal itself and we can always get the
physical space tracked by the journal directly from the AIL. The AIL
functions can calculate the "log tail space" dynamically when either
the log tail or the max LSN seen changes, thereby removing all need
for the log itself to track this state. Hence we now have:

   l_tail_lsn		  ail_head_lsn		grant head lsn
	|-----------------------|+++++++++++++++++++++|
	|    log->l_tail_space	|   in memory space   |
	| - - - - - - xlog_space_left() - - - - - - - |

And we've solved the problem of efficiently calculating the amount
of physical space the log is consuming. All that is left now is
calculating how much space we are consuming in memory.
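
For illustration, the physical "log tail space" is just a cycle-aware
subtraction of the two AIL LSNs, something like the sketch below. A
helper of this shape is added later in the series (xlog_lsn_sub() in
patch 3); the names here are illustrative only. An LSN packs a 32 bit
cycle count in the high word and a 32 bit basic block (512 byte)
number in the low word, and the head is never more than one cycle
ahead of the tail:

static uint64_t tail_space_sketch(uint64_t log_size,
		uint64_t ail_head_lsn, uint64_t tail_lsn)
{
	uint32_t head_cycle = ail_head_lsn >> 32;
	uint32_t head_block = ail_head_lsn & 0xffffffff;
	uint32_t tail_cycle = tail_lsn >> 32;
	uint32_t tail_block = tail_lsn & 0xffffffff;

	if (head_cycle == tail_cycle)
		return (uint64_t)(head_block - tail_block) << 9;
	/* head has wrapped into the next cycle */
	return log_size - ((uint64_t)(tail_block - head_block) << 9);
}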

Luckily for us, we've just added all the update hooks needed to do
this. From the above diagram, two things are obvious:

1. when the tail moves, only log->l_tail_space reduces
2. when the ail_head_lsn increases, log->l_tail_space increases
   and "in memory space" reduces by the same amount.

IOWs, we now have a mechanism that can transfer the in-memory
reservation space directly to the on-disk tail space accounting. At
this point, we can change the grant head from tracking physical
location to tracking a simple byte count:

   l_tail_lsn		  ail_head_lsn		grant head bytes
	|-----------------------|+++++++++++++++++++++|
	|    log->l_tail_space	|     grant space     |
	| - - - - - - xlog_space_left() - - - - - - - |

and xlog_space_left() simply changes to:

space left = log->l_logsize - grant space - log->l_tail_space;

All of the complex grant head cracking, combining and
compare/exchange code gets replaced by simple atomic add/sub
operations, and the grant heads can now track a full 64 bit byte
count. The fastpath reservation accounting is also much faster
because it is much simpler.
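
To make the end state concrete, here is a userspace-style sketch of
what the fast path reduces to. This is illustrative only - the field
and function names are stand-ins, not the actual patch code:

#include <stdatomic.h>
#include <stdint.h>

/* Stand-ins for the relevant fields in struct xlog. */
struct grant_sketch {
	int64_t		logsize;	/* log->l_logsize */
	_Atomic int64_t	grant;		/* grant head: in-memory reservation bytes */
	_Atomic int64_t	tail_space;	/* log->l_tail_space: bytes tracked by the AIL */
};

/* Taking and returning reservation space is a single atomic op each. */
static void grant_reserve(struct grant_sketch *g, int64_t bytes)
{
	atomic_fetch_add(&g->grant, bytes);
}

static void grant_release(struct grant_sketch *g, int64_t bytes)
{
	atomic_fetch_sub(&g->grant, bytes);
}

/*
 * When a checkpoint completes and the AIL head moves forward, the
 * space that checkpoint consumed transfers from the in-memory grant
 * space to the physical tail space; the sum stays constant.
 */
static void checkpoint_completed(struct grant_sketch *g, int64_t ckpt_bytes)
{
	atomic_fetch_add(&g->tail_space, ckpt_bytes);
	atomic_fetch_sub(&g->grant, ckpt_bytes);
}

static int64_t space_left(struct grant_sketch *g)
{
	return g->logsize - atomic_load(&g->grant) -
			atomic_load(&g->tail_space);
}

Note that the real code also needs ordering guarantees between the
tail space update and the grant space return so that a racing space
calculation never sees one without the other - that is what the
memory barriers mentioned in the v2 notes below are about.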

There's one little problem, though. The transaction reservation code
has to set the LSN target for the AIL to push to ensure that the log
tail keeps moving forward (xlog_grant_push_ail()), and the deferred
intent logging code also tries to keep abreast of the amount of
space available in the log via xlog_grant_push_threshold().

The AIL pushing problem is actually easy to solve - we don't need to
push the AIL from the transaction reservation code as the AIL
already tracks all the space used by the journal. All the
transaction reservation code does is try to keep 25% of the journal
physically free once the AIL has items in it. Of course there is the
corner case where the AIL can be empty and the reservations fully
depleted, in which case we have to ensure that we kick the AIL
regardless of its state when a transaction goes to sleep waiting
for reservation space.

Hence before we start changing any of the grant head accounting, we
remove all the AIL pushing hooks from the reservation code and let
the AIL determine its own push target. We also allow the deferred
intent logging code to determine if the AIL should be tail pushing,
similar to how it currently checks if we are running out of log
space, so the intent relogging still works as it
should.

With these changes in place, there is no external code that is
dependent on the grant heads tracking physical space, and hence we
can then implement the change to pure in-memory reservation space
tracking in the grant heads.....

This all passes fstests for the default and rmapbt enabled configs.
Performance tests also show good improvements where the transaction
accounting is the bottleneck. This has been written and tested on
top of the CIL scalability, inode unlink item and lockless buffer
lookup patchsets, so if you want to test this you are probably best
to start with all of them applied first.

-Dave.

---

Version 2
- reorder moving xfs_trans_committed_bulk() patch to start of series
- fix failure to consider NULLCOMMITLSN push target in AIL
- grant space release based on ctx->start_lsn fails to release the
  space used in the checkpoint that was just committed. Release
  needs to be based on the ctx->commit_lsn, which is the end of
  the region that the checkpoint consumes in the log.
- rename ail_max_seen_lsn to ail_head_lsn, and convert it to
  tracking the commit lsn of the latest checkpoint. This effectively
  replaces log->l_last_sync_lsn.
- move AIL lsn updates and grant space returns to before we process
  the logvec chain to insert the new items into the AIL. This is
  necessary to avoid a transient window where the head of the AIL
  moves forward, increasing log tail space, but we haven't yet
  reduced the grant reservation space and hence available log space
  drops by the size of the checkpoint for the duration of the AIL
  insertion process before returning to where it should be.
- add memory barriers to the grant head return and xlog_space_left()
  functions to ensure that xlog_space_left() will always see the
  updated log tail space if it sees a grant head that has had the
  space returned to it. This prevents transients where the tail can
  lag the head by 2 cycles as the log head wraps.
- lots of other minor stuff....

Original RFC:
- https://lore.kernel.org/linux-xfs/20220708015558.1134330-1-david@fromorbit.com/




* [PATCH 1/9] xfs: move and rename xfs_trans_committed_bulk
  2022-08-09 23:03 [PATCH 0/9 v2] xfs: byte-based grant head reservation tracking Dave Chinner
@ 2022-08-09 23:03 ` Dave Chinner
  2022-08-10 14:17   ` kernel test robot
                     ` (3 more replies)
  2022-08-09 23:03 ` [PATCH 2/9] xfs: AIL doesn't need manual pushing Dave Chinner
                   ` (7 subsequent siblings)
  8 siblings, 4 replies; 39+ messages in thread
From: Dave Chinner @ 2022-08-09 23:03 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Ever since the CIL and delayed logging were introduced,
xfs_trans_committed_bulk() has been a purely CIL checkpoint
completion function and not a transaction commit completion
function. Now that we are adding log specific updates to this
function, it really does not have anything to do with the
transaction subsystem - it is really log and log item level
functionality.

This should be part of the CIL code as it is the callback
that moves log items from the CIL checkpoint to the AIL. Move it
and rename it to xlog_cil_ail_insert().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c    | 132 +++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trans.c      | 129 ---------------------------------------
 fs/xfs/xfs_trans_priv.h |   3 -
 3 files changed, 131 insertions(+), 133 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index eccbfb99e894..475a18493c37 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -683,6 +683,136 @@ xlog_cil_insert_items(
 	}
 }
 
+static inline void
+xlog_cil_ail_insert_batch(
+	struct xfs_ail		*ailp,
+	struct xfs_ail_cursor	*cur,
+	struct xfs_log_item	**log_items,
+	int			nr_items,
+	xfs_lsn_t		commit_lsn)
+{
+	int	i;
+
+	spin_lock(&ailp->ail_lock);
+	/* xfs_trans_ail_update_bulk drops ailp->ail_lock */
+	xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn);
+
+	for (i = 0; i < nr_items; i++) {
+		struct xfs_log_item *lip = log_items[i];
+
+		if (lip->li_ops->iop_unpin)
+			lip->li_ops->iop_unpin(lip, 0);
+	}
+}
+
+/*
+ * Take the checkpoint's log vector chain of items and insert the attached log
+ * items into the AIL. This uses bulk insertion techniques to minimise AIL
+ * lock traffic.
+ *
+ * If we are called with the aborted flag set, it is because a log write during
+ * a CIL checkpoint commit has failed. In this case, all the items in the
+ * checkpoint have already gone through iop_committed and iop_committing, which
+ * means that checkpoint commit abort handling is treated exactly the same as an
+ * iclog write error even though we haven't started any IO yet. Hence in this
+ * case all we need to do is iop_committed processing, followed by an
+ * iop_unpin(aborted) call.
+ *
+ * The AIL cursor is used to optimise the insert process. If commit_lsn is not
+ * at the end of the AIL, the insert cursor avoids the need to walk the AIL to
+ * find the insertion point on every xfs_log_item_batch_insert() call. This
+ * saves a lot of needless list walking and is a net win, even though it
+ * slightly increases the amount of AIL lock traffic to set it up and tear it
+ * down.
+ */
+void
+xlog_cil_ail_insert(
+	struct xlog		*log,
+	struct list_head	*lv_chain,
+	xfs_lsn_t		commit_lsn,
+	bool			aborted)
+{
+#define LOG_ITEM_BATCH_SIZE	32
+	struct xfs_ail		*ailp = log->l_ailp;
+	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
+	struct xfs_log_vec	*lv;
+	struct xfs_ail_cursor	cur;
+	int			i = 0;
+
+	spin_lock(&ailp->ail_lock);
+	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
+	spin_unlock(&ailp->ail_lock);
+
+	/* unpin all the log items */
+	list_for_each_entry(lv, lv_chain, lv_list) {
+		struct xfs_log_item	*lip = lv->lv_item;
+		xfs_lsn_t		item_lsn;
+
+		if (aborted)
+			set_bit(XFS_LI_ABORTED, &lip->li_flags);
+
+		if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
+			lip->li_ops->iop_release(lip);
+			continue;
+		}
+
+		if (lip->li_ops->iop_committed)
+			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
+		else
+			item_lsn = commit_lsn;
+
+		/* item_lsn of -1 means the item needs no further processing */
+		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
+			continue;
+
+		/*
+		 * if we are aborting the operation, no point in inserting the
+		 * object into the AIL as we are in a shutdown situation.
+		 */
+		if (aborted) {
+			ASSERT(xlog_is_shutdown(ailp->ail_log));
+			if (lip->li_ops->iop_unpin)
+				lip->li_ops->iop_unpin(lip, 1);
+			continue;
+		}
+
+		if (item_lsn != commit_lsn) {
+
+			/*
+			 * Not a bulk update option due to unusual item_lsn.
+			 * Push into AIL immediately, rechecking the lsn once
+			 * we have the ail lock. Then unpin the item. This does
+			 * not affect the AIL cursor the bulk insert path is
+			 * using.
+			 */
+			spin_lock(&ailp->ail_lock);
+			if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
+				xfs_trans_ail_update(ailp, lip, item_lsn);
+			else
+				spin_unlock(&ailp->ail_lock);
+			if (lip->li_ops->iop_unpin)
+				lip->li_ops->iop_unpin(lip, 0);
+			continue;
+		}
+
+		/* Item is a candidate for bulk AIL insert.  */
+		log_items[i++] = lv->lv_item;
+		if (i >= LOG_ITEM_BATCH_SIZE) {
+			xlog_cil_ail_insert_batch(ailp, &cur, log_items,
+					LOG_ITEM_BATCH_SIZE, commit_lsn);
+			i = 0;
+		}
+	}
+
+	/* make sure we insert the remainder! */
+	if (i)
+		xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, commit_lsn);
+
+	spin_lock(&ailp->ail_lock);
+	xfs_trans_ail_cursor_done(&cur);
+	spin_unlock(&ailp->ail_lock);
+}
+
 static void
 xlog_cil_free_logvec(
 	struct list_head	*lv_chain)
@@ -792,7 +922,7 @@ xlog_cil_committed(
 		spin_unlock(&ctx->cil->xc_push_lock);
 	}
 
-	xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, &ctx->lv_chain,
+	xlog_cil_ail_insert(ctx->cil->xc_log, &ctx->lv_chain,
 					ctx->start_lsn, abort);
 
 	xfs_extent_busy_sort(&ctx->busy_extents);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 7bd16fbff534..58c4e875eb12 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -715,135 +715,6 @@ xfs_trans_free_items(
 	}
 }
 
-static inline void
-xfs_log_item_batch_insert(
-	struct xfs_ail		*ailp,
-	struct xfs_ail_cursor	*cur,
-	struct xfs_log_item	**log_items,
-	int			nr_items,
-	xfs_lsn_t		commit_lsn)
-{
-	int	i;
-
-	spin_lock(&ailp->ail_lock);
-	/* xfs_trans_ail_update_bulk drops ailp->ail_lock */
-	xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn);
-
-	for (i = 0; i < nr_items; i++) {
-		struct xfs_log_item *lip = log_items[i];
-
-		if (lip->li_ops->iop_unpin)
-			lip->li_ops->iop_unpin(lip, 0);
-	}
-}
-
-/*
- * Bulk operation version of xfs_trans_committed that takes a log vector of
- * items to insert into the AIL. This uses bulk AIL insertion techniques to
- * minimise lock traffic.
- *
- * If we are called with the aborted flag set, it is because a log write during
- * a CIL checkpoint commit has failed. In this case, all the items in the
- * checkpoint have already gone through iop_committed and iop_committing, which
- * means that checkpoint commit abort handling is treated exactly the same
- * as an iclog write error even though we haven't started any IO yet. Hence in
- * this case all we need to do is iop_committed processing, followed by an
- * iop_unpin(aborted) call.
- *
- * The AIL cursor is used to optimise the insert process. If commit_lsn is not
- * at the end of the AIL, the insert cursor avoids the need to walk
- * the AIL to find the insertion point on every xfs_log_item_batch_insert()
- * call. This saves a lot of needless list walking and is a net win, even
- * though it slightly increases that amount of AIL lock traffic to set it up
- * and tear it down.
- */
-void
-xfs_trans_committed_bulk(
-	struct xfs_ail		*ailp,
-	struct list_head	*lv_chain,
-	xfs_lsn_t		commit_lsn,
-	bool			aborted)
-{
-#define LOG_ITEM_BATCH_SIZE	32
-	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
-	struct xfs_log_vec	*lv;
-	struct xfs_ail_cursor	cur;
-	int			i = 0;
-
-	spin_lock(&ailp->ail_lock);
-	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
-	spin_unlock(&ailp->ail_lock);
-
-	/* unpin all the log items */
-	list_for_each_entry(lv, lv_chain, lv_list) {
-		struct xfs_log_item	*lip = lv->lv_item;
-		xfs_lsn_t		item_lsn;
-
-		if (aborted)
-			set_bit(XFS_LI_ABORTED, &lip->li_flags);
-
-		if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
-			lip->li_ops->iop_release(lip);
-			continue;
-		}
-
-		if (lip->li_ops->iop_committed)
-			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
-		else
-			item_lsn = commit_lsn;
-
-		/* item_lsn of -1 means the item needs no further processing */
-		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
-			continue;
-
-		/*
-		 * if we are aborting the operation, no point in inserting the
-		 * object into the AIL as we are in a shutdown situation.
-		 */
-		if (aborted) {
-			ASSERT(xlog_is_shutdown(ailp->ail_log));
-			if (lip->li_ops->iop_unpin)
-				lip->li_ops->iop_unpin(lip, 1);
-			continue;
-		}
-
-		if (item_lsn != commit_lsn) {
-
-			/*
-			 * Not a bulk update option due to unusual item_lsn.
-			 * Push into AIL immediately, rechecking the lsn once
-			 * we have the ail lock. Then unpin the item. This does
-			 * not affect the AIL cursor the bulk insert path is
-			 * using.
-			 */
-			spin_lock(&ailp->ail_lock);
-			if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
-				xfs_trans_ail_update(ailp, lip, item_lsn);
-			else
-				spin_unlock(&ailp->ail_lock);
-			if (lip->li_ops->iop_unpin)
-				lip->li_ops->iop_unpin(lip, 0);
-			continue;
-		}
-
-		/* Item is a candidate for bulk AIL insert.  */
-		log_items[i++] = lv->lv_item;
-		if (i >= LOG_ITEM_BATCH_SIZE) {
-			xfs_log_item_batch_insert(ailp, &cur, log_items,
-					LOG_ITEM_BATCH_SIZE, commit_lsn);
-			i = 0;
-		}
-	}
-
-	/* make sure we insert the remainder! */
-	if (i)
-		xfs_log_item_batch_insert(ailp, &cur, log_items, i, commit_lsn);
-
-	spin_lock(&ailp->ail_lock);
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
-}
-
 /*
  * Sort transaction items prior to running precommit operations. This will
  * attempt to order the items such that they will always be locked in the same
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index d5400150358e..52a45f0a5ef1 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -19,9 +19,6 @@ void	xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *);
 void	xfs_trans_del_item(struct xfs_log_item *);
 void	xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp);
 
-void	xfs_trans_committed_bulk(struct xfs_ail *ailp,
-				struct list_head *lv_chain,
-				xfs_lsn_t commit_lsn, bool aborted);
 /*
  * AIL traversal cursor.
  *
-- 
2.36.1



* [PATCH 2/9] xfs: AIL doesn't need manual pushing
  2022-08-09 23:03 [PATCH 0/9 v2] xfs: byte-based grant head reservation tracking Dave Chinner
  2022-08-09 23:03 ` [PATCH 1/9] xfs: move and rename xfs_trans_committed_bulk Dave Chinner
@ 2022-08-09 23:03 ` Dave Chinner
  2022-08-22 17:08   ` Darrick J. Wong
  2022-09-07 14:01   ` Christoph Hellwig
  2022-08-09 23:03 ` [PATCH 3/9] xfs: background AIL push targets physical space, not grant space Dave Chinner
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 39+ messages in thread
From: Dave Chinner @ 2022-08-09 23:03 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We have a mechanism that checks the amount of log space remaining
available every time we make a transaction reservation. If the
amount of space is below a threshold (25% free) we push on the AIL
to tell it to do more work. To do this, we end up calculating the
LSN that the AIL needs to push to on every reservation and updating
the push target for the AIL with that new target LSN.

This is silly and expensive. The AIL is perfectly capable of
calculating the push target itself, and it will always be running
when the AIL contains objects.

Modify the AIL to calculate its 25% push target before it starts a
push using the same reserve grant head based calculation as is
currently used, and remove all the places where we ask the AIL to
push to a new 25% free target.

This does still require a manual push in certain circumstances.
These circumstances arise when the AIL is not full, but the
reservation grants consume all of the free space in the log.
In this case, we still need to push on the AIL to free up space, so
when we hit this condition (i.e. a reservation going to sleep to wait
on log space) we do a single push to tell the AIL it should empty
itself. This will keep the AIL moving as new reservations come in
and want more space, rather than keep queuing them and having to
push the AIL repeatedly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_defer.c |   4 +-
 fs/xfs/xfs_log.c          | 135 ++-----------------------------
 fs/xfs/xfs_log.h          |   1 -
 fs/xfs/xfs_log_priv.h     |   2 +
 fs/xfs/xfs_trans_ail.c    | 165 +++++++++++++++++---------------------
 fs/xfs/xfs_trans_priv.h   |  33 ++++++--
 6 files changed, 110 insertions(+), 230 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 5a321b783398..79c077078785 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -12,12 +12,14 @@
 #include "xfs_mount.h"
 #include "xfs_defer.h"
 #include "xfs_trans.h"
+#include "xfs_trans_priv.h"
 #include "xfs_buf_item.h"
 #include "xfs_inode.h"
 #include "xfs_inode_item.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_log.h"
+#include "xfs_log_priv.h"
 #include "xfs_rmap.h"
 #include "xfs_refcount.h"
 #include "xfs_bmap.h"
@@ -439,7 +441,7 @@ xfs_defer_relog(
 		 * the log threshold once per call.
 		 */
 		if (threshold_lsn == NULLCOMMITLSN) {
-			threshold_lsn = xlog_grant_push_threshold(log, 0);
+			threshold_lsn = xfs_ail_push_target(log->l_ailp);
 			if (threshold_lsn == NULLCOMMITLSN)
 				break;
 		}
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 4b1c0a9c6368..c609c188bd8a 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -30,10 +30,6 @@ xlog_alloc_log(
 	struct xfs_buftarg	*log_target,
 	xfs_daddr_t		blk_offset,
 	int			num_bblks);
-STATIC int
-xlog_space_left(
-	struct xlog		*log,
-	atomic64_t		*head);
 STATIC void
 xlog_dealloc_log(
 	struct xlog		*log);
@@ -51,10 +47,6 @@ xlog_state_get_iclog_space(
 	struct xlog_ticket	*ticket,
 	int			*logoffsetp);
 STATIC void
-xlog_grant_push_ail(
-	struct xlog		*log,
-	int			need_bytes);
-STATIC void
 xlog_sync(
 	struct xlog		*log,
 	struct xlog_in_core	*iclog,
@@ -242,42 +234,15 @@ xlog_grant_head_wake(
 {
 	struct xlog_ticket	*tic;
 	int			need_bytes;
-	bool			woken_task = false;
 
 	list_for_each_entry(tic, &head->waiters, t_queue) {
-
-		/*
-		 * There is a chance that the size of the CIL checkpoints in
-		 * progress at the last AIL push target calculation resulted in
-		 * limiting the target to the log head (l_last_sync_lsn) at the
-		 * time. This may not reflect where the log head is now as the
-		 * CIL checkpoints may have completed.
-		 *
-		 * Hence when we are woken here, it may be that the head of the
-		 * log that has moved rather than the tail. As the tail didn't
-		 * move, there still won't be space available for the
-		 * reservation we require.  However, if the AIL has already
-		 * pushed to the target defined by the old log head location, we
-		 * will hang here waiting for something else to update the AIL
-		 * push target.
-		 *
-		 * Therefore, if there isn't space to wake the first waiter on
-		 * the grant head, we need to push the AIL again to ensure the
-		 * target reflects both the current log tail and log head
-		 * position before we wait for the tail to move again.
-		 */
-
 		need_bytes = xlog_ticket_reservation(log, head, tic);
-		if (*free_bytes < need_bytes) {
-			if (!woken_task)
-				xlog_grant_push_ail(log, need_bytes);
+		if (*free_bytes < need_bytes)
 			return false;
-		}
 
 		*free_bytes -= need_bytes;
 		trace_xfs_log_grant_wake_up(log, tic);
 		wake_up_process(tic->t_task);
-		woken_task = true;
 	}
 
 	return true;
@@ -296,13 +261,15 @@ xlog_grant_head_wait(
 	do {
 		if (xlog_is_shutdown(log))
 			goto shutdown;
-		xlog_grant_push_ail(log, need_bytes);
 
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		spin_unlock(&head->lock);
 
 		XFS_STATS_INC(log->l_mp, xs_sleep_logspace);
 
+		/* Push on the AIL to free up all the log space. */
+		xfs_ail_push_all(log->l_ailp);
+
 		trace_xfs_log_grant_sleep(log, tic);
 		schedule();
 		trace_xfs_log_grant_wake(log, tic);
@@ -418,9 +385,6 @@ xfs_log_regrant(
 	 * of rolling transactions in the log easily.
 	 */
 	tic->t_tid++;
-
-	xlog_grant_push_ail(log, tic->t_unit_res);
-
 	tic->t_curr_res = tic->t_unit_res;
 	if (tic->t_cnt > 0)
 		return 0;
@@ -477,12 +441,7 @@ xfs_log_reserve(
 	ASSERT(*ticp == NULL);
 	tic = xlog_ticket_alloc(log, unit_bytes, cnt, permanent);
 	*ticp = tic;
-
-	xlog_grant_push_ail(log, tic->t_cnt ? tic->t_unit_res * tic->t_cnt
-					    : tic->t_unit_res);
-
 	trace_xfs_log_reserve(log, tic);
-
 	error = xlog_grant_head_check(log, &log->l_reserve_head, tic,
 				      &need_bytes);
 	if (error)
@@ -1337,7 +1296,7 @@ xlog_assign_tail_lsn(
  * shortcut invalidity asserts in this case so that we don't trigger them
  * falsely.
  */
-STATIC int
+int
 xlog_space_left(
 	struct xlog	*log,
 	atomic64_t	*head)
@@ -1678,89 +1637,6 @@ xlog_alloc_log(
 	return ERR_PTR(error);
 }	/* xlog_alloc_log */
 
-/*
- * Compute the LSN that we'd need to push the log tail towards in order to have
- * (a) enough on-disk log space to log the number of bytes specified, (b) at
- * least 25% of the log space free, and (c) at least 256 blocks free.  If the
- * log free space already meets all three thresholds, this function returns
- * NULLCOMMITLSN.
- */
-xfs_lsn_t
-xlog_grant_push_threshold(
-	struct xlog	*log,
-	int		need_bytes)
-{
-	xfs_lsn_t	threshold_lsn = 0;
-	xfs_lsn_t	last_sync_lsn;
-	int		free_blocks;
-	int		free_bytes;
-	int		threshold_block;
-	int		threshold_cycle;
-	int		free_threshold;
-
-	ASSERT(BTOBB(need_bytes) < log->l_logBBsize);
-
-	free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
-	free_blocks = BTOBBT(free_bytes);
-
-	/*
-	 * Set the threshold for the minimum number of free blocks in the
-	 * log to the maximum of what the caller needs, one quarter of the
-	 * log, and 256 blocks.
-	 */
-	free_threshold = BTOBB(need_bytes);
-	free_threshold = max(free_threshold, (log->l_logBBsize >> 2));
-	free_threshold = max(free_threshold, 256);
-	if (free_blocks >= free_threshold)
-		return NULLCOMMITLSN;
-
-	xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle,
-						&threshold_block);
-	threshold_block += free_threshold;
-	if (threshold_block >= log->l_logBBsize) {
-		threshold_block -= log->l_logBBsize;
-		threshold_cycle += 1;
-	}
-	threshold_lsn = xlog_assign_lsn(threshold_cycle,
-					threshold_block);
-	/*
-	 * Don't pass in an lsn greater than the lsn of the last
-	 * log record known to be on disk. Use a snapshot of the last sync lsn
-	 * so that it doesn't change between the compare and the set.
-	 */
-	last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
-	if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0)
-		threshold_lsn = last_sync_lsn;
-
-	return threshold_lsn;
-}
-
-/*
- * Push the tail of the log if we need to do so to maintain the free log space
- * thresholds set out by xlog_grant_push_threshold.  We may need to adopt a
- * policy which pushes on an lsn which is further along in the log once we
- * reach the high water mark.  In this manner, we would be creating a low water
- * mark.
- */
-STATIC void
-xlog_grant_push_ail(
-	struct xlog	*log,
-	int		need_bytes)
-{
-	xfs_lsn_t	threshold_lsn;
-
-	threshold_lsn = xlog_grant_push_threshold(log, need_bytes);
-	if (threshold_lsn == NULLCOMMITLSN || xlog_is_shutdown(log))
-		return;
-
-	/*
-	 * Get the transaction layer to kick the dirty buffers out to
-	 * disk asynchronously. No point in trying to do this if
-	 * the filesystem is shutting down.
-	 */
-	xfs_ail_push(log->l_ailp, threshold_lsn);
-}
-
 /*
  * Stamp cycle number in every block
  */
@@ -2725,7 +2601,6 @@ xlog_state_set_callback(
 		return;
 
 	atomic64_set(&log->l_last_sync_lsn, header_lsn);
-	xlog_grant_push_ail(log, 0);
 }
 
 /*
diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
index 2728886c2963..6b6ee35b3885 100644
--- a/fs/xfs/xfs_log.h
+++ b/fs/xfs/xfs_log.h
@@ -156,7 +156,6 @@ int	xfs_log_quiesce(struct xfs_mount *mp);
 void	xfs_log_clean(struct xfs_mount *mp);
 bool	xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t);
 
-xfs_lsn_t xlog_grant_push_threshold(struct xlog *log, int need_bytes);
 bool	  xlog_force_shutdown(struct xlog *log, uint32_t shutdown_flags);
 
 void xlog_use_incompat_feat(struct xlog *log);
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 1bd2963e8fbd..91a8c74f4626 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -573,6 +573,8 @@ xlog_assign_grant_head(atomic64_t *head, int cycle, int space)
 	atomic64_set(head, xlog_assign_grant_head_val(cycle, space));
 }
 
+int xlog_space_left(struct xlog	 *log, atomic64_t *head);
+
 /*
  * Committed Item List interfaces
  */
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index d3a97a028560..243d6b05e5a9 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -134,25 +134,6 @@ xfs_ail_min_lsn(
 	return lsn;
 }
 
-/*
- * Return the maximum lsn held in the AIL, or zero if the AIL is empty.
- */
-static xfs_lsn_t
-xfs_ail_max_lsn(
-	struct xfs_ail		*ailp)
-{
-	xfs_lsn_t       	lsn = 0;
-	struct xfs_log_item	*lip;
-
-	spin_lock(&ailp->ail_lock);
-	lip = xfs_ail_max(ailp);
-	if (lip)
-		lsn = lip->li_lsn;
-	spin_unlock(&ailp->ail_lock);
-
-	return lsn;
-}
-
 /*
  * The cursor keeps track of where our current traversal is up to by tracking
  * the next item in the list for us. However, for this to be safe, removing an
@@ -414,6 +395,57 @@ xfsaild_push_item(
 	return lip->li_ops->iop_push(lip, &ailp->ail_buf_list);
 }
 
+/*
+ * Compute the LSN that we'd need to push the log tail towards in order to have
+ * at least 25% of the log space free.  If the log free space already meets this
+ * threshold, this function returns NULLCOMMITLSN.
+ */
+xfs_lsn_t
+__xfs_ail_push_target(
+	struct xfs_ail		*ailp)
+{
+	struct xlog	*log = ailp->ail_log;
+	xfs_lsn_t	threshold_lsn = 0;
+	xfs_lsn_t	last_sync_lsn;
+	int		free_blocks;
+	int		free_bytes;
+	int		threshold_block;
+	int		threshold_cycle;
+	int		free_threshold;
+
+	free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
+	free_blocks = BTOBBT(free_bytes);
+
+	/*
+	 * Set the threshold for the minimum number of free blocks in the
+	 * log to the maximum of what the caller needs, one quarter of the
+	 * log, and 256 blocks.
+	 */
+	free_threshold = log->l_logBBsize >> 2;
+	if (free_blocks >= free_threshold)
+		return NULLCOMMITLSN;
+
+	xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle,
+						&threshold_block);
+	threshold_block += free_threshold;
+	if (threshold_block >= log->l_logBBsize) {
+		threshold_block -= log->l_logBBsize;
+		threshold_cycle += 1;
+	}
+	threshold_lsn = xlog_assign_lsn(threshold_cycle,
+					threshold_block);
+	/*
+	 * Don't pass in an lsn greater than the lsn of the last
+	 * log record known to be on disk. Use a snapshot of the last sync lsn
+	 * so that it doesn't change between the compare and the set.
+	 */
+	last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
+	if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0)
+		threshold_lsn = last_sync_lsn;
+
+	return threshold_lsn;
+}
+
 static long
 xfsaild_push(
 	struct xfs_ail		*ailp)
@@ -422,7 +454,7 @@ xfsaild_push(
 	struct xfs_ail_cursor	cur;
 	struct xfs_log_item	*lip;
 	xfs_lsn_t		lsn;
-	xfs_lsn_t		target;
+	xfs_lsn_t		target = NULLCOMMITLSN;
 	long			tout;
 	int			stuck = 0;
 	int			flushing = 0;
@@ -454,21 +486,24 @@ xfsaild_push(
 	 * capture updates that occur after the sync push waiter has gone to
 	 * sleep.
 	 */
-	if (waitqueue_active(&ailp->ail_empty)) {
+	if (test_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate) ||
+	    waitqueue_active(&ailp->ail_empty)) {
 		lip = xfs_ail_max(ailp);
 		if (lip)
 			target = lip->li_lsn;
+		else
+			clear_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate);
 	} else {
-		/* barrier matches the ail_target update in xfs_ail_push() */
-		smp_rmb();
-		target = ailp->ail_target;
-		ailp->ail_target_prev = target;
+		target = __xfs_ail_push_target(ailp);
 	}
 
+	if (target == NULLCOMMITLSN)
+		goto out_done;
+
 	/* we're done if the AIL is empty or our push has reached the end */
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn);
 	if (!lip)
-		goto out_done;
+		goto out_done_cursor;
 
 	XFS_STATS_INC(mp, xs_push_ail);
 
@@ -551,8 +586,9 @@ xfsaild_push(
 		lsn = lip->li_lsn;
 	}
 
-out_done:
+out_done_cursor:
 	xfs_trans_ail_cursor_done(&cur);
+out_done:
 	spin_unlock(&ailp->ail_lock);
 
 	if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list))
@@ -601,7 +637,7 @@ xfsaild(
 	set_freezable();
 
 	while (1) {
-		if (tout && tout <= 20)
+		if (tout)
 			set_current_state(TASK_KILLABLE);
 		else
 			set_current_state(TASK_INTERRUPTIBLE);
@@ -637,21 +673,9 @@ xfsaild(
 			break;
 		}
 
+		/* Idle if the AIL is empty. */
 		spin_lock(&ailp->ail_lock);
-
-		/*
-		 * Idle if the AIL is empty and we are not racing with a target
-		 * update. We check the AIL after we set the task to a sleep
-		 * state to guarantee that we either catch an ail_target update
-		 * or that a wake_up resets the state to TASK_RUNNING.
-		 * Otherwise, we run the risk of sleeping indefinitely.
-		 *
-		 * The barrier matches the ail_target update in xfs_ail_push().
-		 */
-		smp_rmb();
-		if (!xfs_ail_min(ailp) &&
-		    ailp->ail_target == ailp->ail_target_prev &&
-		    list_empty(&ailp->ail_buf_list)) {
+		if (!xfs_ail_min(ailp) && list_empty(&ailp->ail_buf_list)) {
 			spin_unlock(&ailp->ail_lock);
 			freezable_schedule();
 			tout = 0;
@@ -673,56 +697,6 @@ xfsaild(
 	return 0;
 }
 
-/*
- * This routine is called to move the tail of the AIL forward.  It does this by
- * trying to flush items in the AIL whose lsns are below the given
- * threshold_lsn.
- *
- * The push is run asynchronously in a workqueue, which means the caller needs
- * to handle waiting on the async flush for space to become available.
- * We don't want to interrupt any push that is in progress, hence we only queue
- * work if we set the pushing bit appropriately.
- *
- * We do this unlocked - we only need to know whether there is anything in the
- * AIL at the time we are called. We don't need to access the contents of
- * any of the objects, so the lock is not needed.
- */
-void
-xfs_ail_push(
-	struct xfs_ail		*ailp,
-	xfs_lsn_t		threshold_lsn)
-{
-	struct xfs_log_item	*lip;
-
-	lip = xfs_ail_min(ailp);
-	if (!lip || xlog_is_shutdown(ailp->ail_log) ||
-	    XFS_LSN_CMP(threshold_lsn, ailp->ail_target) <= 0)
-		return;
-
-	/*
-	 * Ensure that the new target is noticed in push code before it clears
-	 * the XFS_AIL_PUSHING_BIT.
-	 */
-	smp_wmb();
-	xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target, &threshold_lsn);
-	smp_wmb();
-
-	wake_up_process(ailp->ail_task);
-}
-
-/*
- * Push out all items in the AIL immediately
- */
-void
-xfs_ail_push_all(
-	struct xfs_ail  *ailp)
-{
-	xfs_lsn_t       threshold_lsn = xfs_ail_max_lsn(ailp);
-
-	if (threshold_lsn)
-		xfs_ail_push(ailp, threshold_lsn);
-}
-
 /*
  * Push out all items in the AIL immediately and wait until the AIL is empty.
  */
@@ -828,6 +802,13 @@ xfs_trans_ail_update_bulk(
 	if (!list_empty(&tmp))
 		xfs_ail_splice(ailp, cur, &tmp, lsn);
 
+	/*
+	 * If this is the first insert, wake up the push daemon so it can
+	 * actively scan for items to push.
+	 */
+	if (!mlip)
+		wake_up_process(ailp->ail_task);
+
 	xfs_ail_update_finish(ailp, tail_lsn);
 }
 
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 52a45f0a5ef1..9a131e7fae94 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -52,16 +52,18 @@ struct xfs_ail {
 	struct xlog		*ail_log;
 	struct task_struct	*ail_task;
 	struct list_head	ail_head;
-	xfs_lsn_t		ail_target;
-	xfs_lsn_t		ail_target_prev;
 	struct list_head	ail_cursors;
 	spinlock_t		ail_lock;
 	xfs_lsn_t		ail_last_pushed_lsn;
 	int			ail_log_flush;
+	unsigned long		ail_opstate;
 	struct list_head	ail_buf_list;
 	wait_queue_head_t	ail_empty;
 };
 
+/* Push all items out of the AIL immediately. */
+#define XFS_AIL_OPSTATE_PUSH_ALL	0u
+
 /*
  * From xfs_trans_ail.c
  */
@@ -98,10 +100,29 @@ void xfs_ail_update_finish(struct xfs_ail *ailp, xfs_lsn_t old_lsn)
 			__releases(ailp->ail_lock);
 void xfs_trans_ail_delete(struct xfs_log_item *lip, int shutdown_type);
 
-void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);
-void			xfs_ail_push_all(struct xfs_ail *);
-void			xfs_ail_push_all_sync(struct xfs_ail *);
-struct xfs_log_item	*xfs_ail_min(struct xfs_ail  *ailp);
+static inline void xfs_ail_push(struct xfs_ail *ailp)
+{
+	wake_up_process(ailp->ail_task);
+}
+
+static inline void xfs_ail_push_all(struct xfs_ail *ailp)
+{
+	if (!test_and_set_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate))
+		xfs_ail_push(ailp);
+}
+
+xfs_lsn_t		__xfs_ail_push_target(struct xfs_ail *ailp);
+static inline xfs_lsn_t xfs_ail_push_target(struct xfs_ail *ailp)
+{
+	xfs_lsn_t	lsn;
+
+	spin_lock(&ailp->ail_lock);
+	lsn = __xfs_ail_push_target(ailp);
+	spin_unlock(&ailp->ail_lock);
+	return lsn;
+}
+
+void			xfs_ail_push_all_sync(struct xfs_ail *ailp);
 xfs_lsn_t		xfs_ail_min_lsn(struct xfs_ail *ailp);
 
 struct xfs_log_item *	xfs_trans_ail_cursor_first(struct xfs_ail *ailp,
-- 
2.36.1



* [PATCH 3/9] xfs: background AIL push targets physical space, not grant space
  2022-08-09 23:03 [PATCH 0/9 v2] xfs: byte-based grant head reservation tracking Dave Chinner
  2022-08-09 23:03 ` [PATCH 1/9] xfs: move and rename xfs_trans_committed_bulk Dave Chinner
  2022-08-09 23:03 ` [PATCH 2/9] xfs: AIL doesn't need manual pushing Dave Chinner
@ 2022-08-09 23:03 ` Dave Chinner
  2022-08-22 19:00   ` Darrick J. Wong
  2022-09-07 14:04   ` Christoph Hellwig
  2022-08-09 23:03 ` [PATCH 4/9] xfs: ensure log tail is always up to date Dave Chinner
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 39+ messages in thread
From: Dave Chinner @ 2022-08-09 23:03 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Currently the AIL attempts to keep 25% of the "log space" free,
where the current used space is tracked by the reserve grant head.
That is, it tracks both physical space used plus the amount reserved
by transactions in progress.

When we start tail pushing, we are trying to make space for new
reservations by writing back older metadata. At this point the log is
generally physically full of dirty metadata, and reservations for
modifications in flight take up whatever space the AIL can physically
free up.

Hence we don't really need to take into account the reservation
space that has been used - we just need to keep the log tail moving
as fast as we can to free up space for more reservations to be made.
We know exactly how much physical space the journal is consuming in
the AIL (i.e. max LSN - min LSN) so we can base push thresholds
directly on this state rather than having to look at grant head
reservations to determine how much to physically push out of the
log.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_priv.h  | 18 ++++++++++++
 fs/xfs/xfs_trans_ail.c | 67 +++++++++++++++++++-----------------------
 2 files changed, 49 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 91a8c74f4626..9f8c601a302b 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -622,6 +622,24 @@ xlog_wait(
 
 int xlog_wait_on_iclog(struct xlog_in_core *iclog);
 
+/* Calculate the distance between two LSNs in bytes */
+static inline uint64_t
+xlog_lsn_sub(
+	struct xlog	*log,
+	xfs_lsn_t	high,
+	xfs_lsn_t	low)
+{
+	uint32_t	hi_cycle = CYCLE_LSN(high);
+	uint32_t	hi_block = BLOCK_LSN(high);
+	uint32_t	lo_cycle = CYCLE_LSN(low);
+	uint32_t	lo_block = BLOCK_LSN(low);
+
+	if (hi_cycle == lo_cycle)
+	       return BBTOB(hi_block - lo_block);
+	ASSERT((hi_cycle == lo_cycle + 1) || xlog_is_shutdown(log));
+	return (uint64_t)log->l_logsize - BBTOB(lo_block - hi_block);
+}
+
 /*
  * The LSN is valid so long as it is behind the current LSN. If it isn't, this
  * means that the next log record that includes this metadata could have a
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 243d6b05e5a9..d3dcb4942d6a 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -398,52 +398,47 @@ xfsaild_push_item(
 /*
  * Compute the LSN that we'd need to push the log tail towards in order to have
  * at least 25% of the log space free.  If the log free space already meets this
- * threshold, this function returns NULLCOMMITLSN.
+ * threshold, this function returns the lowest LSN in the AIL to slowly keep
+ * writeback ticking over and the tail of the log moving forward.
  */
 xfs_lsn_t
 __xfs_ail_push_target(
 	struct xfs_ail		*ailp)
 {
-	struct xlog	*log = ailp->ail_log;
-	xfs_lsn_t	threshold_lsn = 0;
-	xfs_lsn_t	last_sync_lsn;
-	int		free_blocks;
-	int		free_bytes;
-	int		threshold_block;
-	int		threshold_cycle;
-	int		free_threshold;
-
-	free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
-	free_blocks = BTOBBT(free_bytes);
+	struct xlog		*log = ailp->ail_log;
+	struct xfs_log_item	*lip;
 
-	/*
-	 * Set the threshold for the minimum number of free blocks in the
-	 * log to the maximum of what the caller needs, one quarter of the
-	 * log, and 256 blocks.
-	 */
-	free_threshold = log->l_logBBsize >> 2;
-	if (free_blocks >= free_threshold)
+	xfs_lsn_t	target_lsn = 0;
+	xfs_lsn_t	max_lsn;
+	xfs_lsn_t	min_lsn;
+	int32_t		free_bytes;
+	uint32_t	target_block;
+	uint32_t	target_cycle;
+
+	lockdep_assert_held(&ailp->ail_lock);
+
+	lip = xfs_ail_max(ailp);
+	if (!lip)
+		return NULLCOMMITLSN;
+	max_lsn = lip->li_lsn;
+	min_lsn = __xfs_ail_min_lsn(ailp);
+
+	free_bytes = log->l_logsize - xlog_lsn_sub(log, max_lsn, min_lsn);
+	if (free_bytes >= log->l_logsize >> 2)
 		return NULLCOMMITLSN;
 
-	xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle,
-						&threshold_block);
-	threshold_block += free_threshold;
-	if (threshold_block >= log->l_logBBsize) {
-		threshold_block -= log->l_logBBsize;
-		threshold_cycle += 1;
+	target_cycle = CYCLE_LSN(min_lsn);
+	target_block = BLOCK_LSN(min_lsn) + (log->l_logBBsize >> 2);
+	if (target_block >= log->l_logBBsize) {
+		target_block -= log->l_logBBsize;
+		target_cycle += 1;
 	}
-	threshold_lsn = xlog_assign_lsn(threshold_cycle,
-					threshold_block);
-	/*
-	 * Don't pass in an lsn greater than the lsn of the last
-	 * log record known to be on disk. Use a snapshot of the last sync lsn
-	 * so that it doesn't change between the compare and the set.
-	 */
-	last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
-	if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0)
-		threshold_lsn = last_sync_lsn;
+	target_lsn = xlog_assign_lsn(target_cycle, target_block);
 
-	return threshold_lsn;
+	/* Cap the target to the highest LSN known to be in the AIL. */
+	if (XFS_LSN_CMP(target_lsn, max_lsn) > 0)
+		return max_lsn;
+	return target_lsn;
 }
 
 static long
-- 
2.36.1



* [PATCH 4/9] xfs: ensure log tail is always up to date
  2022-08-09 23:03 [PATCH 0/9 v2] xfs: byte-based grant head reservation tracking Dave Chinner
                   ` (2 preceding siblings ...)
  2022-08-09 23:03 ` [PATCH 3/9] xfs: background AIL push targets physical space, not grant space Dave Chinner
@ 2022-08-09 23:03 ` Dave Chinner
  2022-08-23  0:33   ` Darrick J. Wong
  2022-09-07 14:06   ` Christoph Hellwig
  2022-08-09 23:03 ` [PATCH 5/9] xfs: l_last_sync_lsn is really AIL state Dave Chinner
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 39+ messages in thread
From: Dave Chinner @ 2022-08-09 23:03 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Whenever we write an iclog, we call xlog_assign_tail_lsn() to update
the current tail before we write it into the iclog header. This
means we have to take the AIL lock on every iclog write just to
check if the tail of the log has moved.

This doesn't avoid races with log tail updates - the log tail could
move immediately after we assign the tail to the iclog header and
hence by the time the iclog reaches stable storage the tail LSN has
moved forward in memory. Hence the log tail LSN in the iclog header
is really just a point in time snapshot of the current state of the
AIL.

With this in mind, if we simply update the in memory log->l_tail_lsn
every time it changes in the AIL, there is no need to update the in
memory value when we are writing it into an iclog - it will already
be up-to-date in memory and checking the AIL again will not change
this. Hence xlog_state_release_iclog() does not need to check the
AIL to update the tail lsn and can just sample it directly without
needing to take the AIL lock.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c       |  5 ++---
 fs/xfs/xfs_trans_ail.c | 17 +++++++++++++++--
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index c609c188bd8a..042744fe37b7 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -530,7 +530,6 @@ xlog_state_release_iclog(
 	struct xlog_in_core	*iclog,
 	struct xlog_ticket	*ticket)
 {
-	xfs_lsn_t		tail_lsn;
 	bool			last_ref;
 
 	lockdep_assert_held(&log->l_icloglock);
@@ -545,8 +544,8 @@ xlog_state_release_iclog(
 	if ((iclog->ic_state == XLOG_STATE_WANT_SYNC ||
 	     (iclog->ic_flags & XLOG_ICL_NEED_FUA)) &&
 	    !iclog->ic_header.h_tail_lsn) {
-		tail_lsn = xlog_assign_tail_lsn(log->l_mp);
-		iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn);
+		iclog->ic_header.h_tail_lsn =
+				cpu_to_be64(atomic64_read(&log->l_tail_lsn));
 	}
 
 	last_ref = atomic_dec_and_test(&iclog->ic_refcnt);
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index d3dcb4942d6a..5f40509877f7 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -715,6 +715,13 @@ xfs_ail_push_all_sync(
 	finish_wait(&ailp->ail_empty, &wait);
 }
 
+/*
+ * Callers should pass the original tail lsn so that we can detect if the
+ * tail has moved as a result of the operation that was performed. If the caller
+ * needs to force a tail LSN update, it should pass NULLCOMMITLSN to bypass the
+ * "did the tail LSN change?" checks. If the caller wants to avoid a tail update
+ * (e.g. it knows the tail did not change) it should pass an @old_lsn of 0.
+ */
 void
 xfs_ail_update_finish(
 	struct xfs_ail		*ailp,
@@ -799,10 +806,16 @@ xfs_trans_ail_update_bulk(
 
 	/*
 	 * If this is the first insert, wake up the push daemon so it can
-	 * actively scan for items to push.
+	 * actively scan for items to push. We also need to do a log tail
+	 * LSN update to ensure that it is correctly tracked by the log, so
+	 * set the tail_lsn to NULLCOMMITLSN so that xfs_ail_update_finish()
+	 * will see that the tail lsn has changed and will update the tail
+	 * appropriately.
 	 */
-	if (!mlip)
+	if (!mlip) {
 		wake_up_process(ailp->ail_task);
+		tail_lsn = NULLCOMMITLSN;
+	}
 
 	xfs_ail_update_finish(ailp, tail_lsn);
 }
-- 
2.36.1



* [PATCH 5/9] xfs: l_last_sync_lsn is really AIL state
  2022-08-09 23:03 [PATCH 0/9 v2] xfs: byte-based grant head reservation tracking Dave Chinner
                   ` (3 preceding siblings ...)
  2022-08-09 23:03 ` [PATCH 4/9] xfs: ensure log tail is always up to date Dave Chinner
@ 2022-08-09 23:03 ` Dave Chinner
  2022-08-26 22:19   ` Darrick J. Wong
  2022-09-07 14:11   ` Christoph Hellwig
  2022-08-09 23:03 ` [PATCH 6/9] xfs: collapse xlog_state_set_callback in caller Dave Chinner
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 39+ messages in thread
From: Dave Chinner @ 2022-08-09 23:03 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The current implementation of xlog_assign_tail_lsn() assumes that
when the AIL is empty, the log tail matches the LSN of the last
written commit record. This is recorded in xlog_state_set_callback()
as log->l_last_sync_lsn when the iclog state changes to
XLOG_STATE_CALLBACK. This change is then immediately followed by
running the callbacks on the iclog which then insert the log items
into the AIL at the "commit lsn" of that checkpoint.

The AIL tracks log items via the start record LSN of the checkpoint,
not the commit record LSN. This is because we can pipeline multiple
checkpoints, and so the start record of checkpoint N+1 can be
written before the commit record of checkpoint N. i.e:

     start N			commit N
	+-------------+------------+----------------+
		  start N+1			commit N+1

The tail of the log cannot be moved to the LSN of commit N when all
the items of that checkpoint are written back, because then the
start record for N+1 is no longer in the active portion of the log
and recovery will fail/corrupt the filesystem.

Hence when all the log items in checkpoint N are written back, the
tail of the log must now only move as far forwards as the start LSN
of checkpoint N+1.

Hence we cannot use the maximum start record LSN the AIL sees as a
replacement for the pointer to the current head of the on-disk log
records. However, we currently only use the l_last_sync_lsn when the
AIL is empty - when there is no start LSN remaining, the tail of the
log moves to the LSN of the last commit record as this is where
recovery needs to start searching for recoverable records. The next
checkpoint will have a start record LSN that is higher than
l_last_sync_lsn, and so everything still works correctly when new
checkpoints are written to an otherwise empty log.

l_last_sync_lsn is an atomic variable because it is currently
updated when an iclog with callbacks attached moves to the CALLBACK
state. While we hold the icloglock at this point, we don't hold the
AIL lock. When we assign the log tail, we hold the AIL lock, not the
icloglock because we have to look up the AIL. Hence it is an atomic
variable so it's not bound to a specific lock context.

However, the iclog callbacks are only used for CIL checkpoints. We
don't use callbacks with unmount record writes, so the
l_last_sync_lsn variable only gets updated when we are processing
CIL checkpoint callbacks. And those callbacks run under AIL lock
contexts, not icloglock context. The CIL checkpoint already knows
what the LSN of the iclog the commit record was written to (obtained
when written into the iclog before submission) and so we can update
the l_last_sync_lsn under the AIL lock in this callback. No other
iclog callbacks will run until the currently executing one
completes, and hence we can update the l_last_sync_lsn under the AIL
lock safely.

This means l_last_sync_lsn can move to the AIL as the "ail_head_lsn"
and it can be used to replace the atomic l_last_sync_lsn in the
iclog code. This makes tracking the log tail belong entirely to the
AIL, rather than being smeared across log, iclog and AIL state and
locking.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c         | 81 +++++-----------------------------------
 fs/xfs/xfs_log_cil.c     | 54 ++++++++++++++++++++-------
 fs/xfs/xfs_log_priv.h    |  9 ++---
 fs/xfs/xfs_log_recover.c | 19 +++++-----
 fs/xfs/xfs_trace.c       |  1 +
 fs/xfs/xfs_trace.h       |  8 ++--
 fs/xfs/xfs_trans_ail.c   | 26 +++++++++++--
 fs/xfs/xfs_trans_priv.h  | 13 +++++++
 8 files changed, 102 insertions(+), 109 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 042744fe37b7..e420591b1a8a 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1237,47 +1237,6 @@ xfs_log_cover(
 	return error;
 }
 
-/*
- * We may be holding the log iclog lock upon entering this routine.
- */
-xfs_lsn_t
-xlog_assign_tail_lsn_locked(
-	struct xfs_mount	*mp)
-{
-	struct xlog		*log = mp->m_log;
-	struct xfs_log_item	*lip;
-	xfs_lsn_t		tail_lsn;
-
-	assert_spin_locked(&mp->m_ail->ail_lock);
-
-	/*
-	 * To make sure we always have a valid LSN for the log tail we keep
-	 * track of the last LSN which was committed in log->l_last_sync_lsn,
-	 * and use that when the AIL was empty.
-	 */
-	lip = xfs_ail_min(mp->m_ail);
-	if (lip)
-		tail_lsn = lip->li_lsn;
-	else
-		tail_lsn = atomic64_read(&log->l_last_sync_lsn);
-	trace_xfs_log_assign_tail_lsn(log, tail_lsn);
-	atomic64_set(&log->l_tail_lsn, tail_lsn);
-	return tail_lsn;
-}
-
-xfs_lsn_t
-xlog_assign_tail_lsn(
-	struct xfs_mount	*mp)
-{
-	xfs_lsn_t		tail_lsn;
-
-	spin_lock(&mp->m_ail->ail_lock);
-	tail_lsn = xlog_assign_tail_lsn_locked(mp);
-	spin_unlock(&mp->m_ail->ail_lock);
-
-	return tail_lsn;
-}
-
 /*
  * Return the space in the log between the tail and the head.  The head
  * is passed in the cycle/bytes formal parms.  In the special case where
@@ -1511,7 +1470,6 @@ xlog_alloc_log(
 	log->l_prev_block  = -1;
 	/* log->l_tail_lsn = 0x100000000LL; cycle = 1; current block = 0 */
 	xlog_assign_atomic_lsn(&log->l_tail_lsn, 1, 0);
-	xlog_assign_atomic_lsn(&log->l_last_sync_lsn, 1, 0);
 	log->l_curr_cycle  = 1;	    /* 0 is bad since this is initial value */
 
 	if (xfs_has_logv2(mp) && mp->m_sb.sb_logsunit > 1)
@@ -2562,44 +2520,23 @@ xlog_get_lowest_lsn(
 	return lowest_lsn;
 }
 
-/*
- * Completion of a iclog IO does not imply that a transaction has completed, as
- * transactions can be large enough to span many iclogs. We cannot change the
- * tail of the log half way through a transaction as this may be the only
- * transaction in the log and moving the tail to point to the middle of it
- * will prevent recovery from finding the start of the transaction. Hence we
- * should only update the last_sync_lsn if this iclog contains transaction
- * completion callbacks on it.
- *
- * We have to do this before we drop the icloglock to ensure we are the only one
- * that can update it.
- *
- * If we are moving the last_sync_lsn forwards, we also need to ensure we kick
- * the reservation grant head pushing. This is due to the fact that the push
- * target is bound by the current last_sync_lsn value. Hence if we have a large
- * amount of log space bound up in this committing transaction then the
- * last_sync_lsn value may be the limiting factor preventing tail pushing from
- * freeing space in the log. Hence once we've updated the last_sync_lsn we
- * should push the AIL to ensure the push target (and hence the grant head) is
- * no longer bound by the old log head location and can move forwards and make
- * progress again.
- */
 static void
 xlog_state_set_callback(
 	struct xlog		*log,
 	struct xlog_in_core	*iclog,
 	xfs_lsn_t		header_lsn)
 {
+	/*
+	 * If there are no callbacks on this iclog, we can mark it clean
+	 * immediately and return. Otherwise we need to run the
+	 * callbacks.
+	 */
+	if (list_empty(&iclog->ic_callbacks)) {
+		xlog_state_clean_iclog(log, iclog);
+		return;
+	}
 	trace_xlog_iclog_callback(iclog, _RET_IP_);
 	iclog->ic_state = XLOG_STATE_CALLBACK;
-
-	ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
-			   header_lsn) <= 0);
-
-	if (list_empty_careful(&iclog->ic_callbacks))
-		return;
-
-	atomic64_set(&log->l_last_sync_lsn, header_lsn);
 }
 
 /*
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 475a18493c37..843764d40232 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -710,6 +710,24 @@ xlog_cil_ail_insert_batch(
  * items into the the AIL. This uses bulk insertion techniques to minimise AIL
  * lock traffic.
  *
+ * The AIL tracks log items via the start record LSN of the checkpoint,
+ * not the commit record LSN. This is because we can pipeline multiple
+ * checkpoints, and so the start record of checkpoint N+1 can be
+ * written before the commit record of checkpoint N. i.e:
+ *
+ *   start N			commit N
+ *	+-------------+------------+----------------+
+ *		  start N+1			commit N+1
+ *
+ * The tail of the log cannot be moved to the LSN of commit N when all
+ * the items of that checkpoint are written back, because then the
+ * start record for N+1 is no longer in the active portion of the log
+ * and recovery will fail/corrupt the filesystem.
+ *
+ * Hence when all the log items in checkpoint N are written back, the
+ * tail of the log must now only move as far forwards as the start LSN
+ * of checkpoint N+1.
+ *
  * If we are called with the aborted flag set, it is because a log write during
  * a CIL checkpoint commit has failed. In this case, all the items in the
  * checkpoint have already gone through iop_committed and iop_committing, which
@@ -727,24 +745,33 @@ xlog_cil_ail_insert_batch(
  */
 void
 xlog_cil_ail_insert(
-	struct xlog		*log,
-	struct list_head	*lv_chain,
-	xfs_lsn_t		commit_lsn,
+	struct xfs_cil_ctx	*ctx,
 	bool			aborted)
 {
 #define LOG_ITEM_BATCH_SIZE	32
-	struct xfs_ail		*ailp = log->l_ailp;
+	struct xfs_ail		*ailp = ctx->cil->xc_log->l_ailp;
 	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
 	struct xfs_log_vec	*lv;
 	struct xfs_ail_cursor	cur;
 	int			i = 0;
 
+	/*
+	 * Update the AIL head LSN with the commit record LSN of this
+	 * checkpoint. As iclogs are always completed in order, this should
+	 * always be the same (as iclogs can contain multiple commit records) or
+	 * higher LSN than the current head. We do this before insertion of the
+	 * items so that log space checks during insertion will reflect the
+	 * space that this checkpoint has already consumed.
+	 */
+	ASSERT(XFS_LSN_CMP(ctx->commit_lsn, ailp->ail_head_lsn) >= 0 ||
+			aborted);
 	spin_lock(&ailp->ail_lock);
-	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
+	ailp->ail_head_lsn = ctx->commit_lsn;
+	xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn);
 	spin_unlock(&ailp->ail_lock);
 
 	/* unpin all the log items */
-	list_for_each_entry(lv, lv_chain, lv_list) {
+	list_for_each_entry(lv, &ctx->lv_chain, lv_list) {
 		struct xfs_log_item	*lip = lv->lv_item;
 		xfs_lsn_t		item_lsn;
 
@@ -757,9 +784,10 @@ xlog_cil_ail_insert(
 		}
 
 		if (lip->li_ops->iop_committed)
-			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
+			item_lsn = lip->li_ops->iop_committed(lip,
+					ctx->start_lsn);
 		else
-			item_lsn = commit_lsn;
+			item_lsn = ctx->start_lsn;
 
 		/* item_lsn of -1 means the item needs no further processing */
 		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
@@ -776,7 +804,7 @@ xlog_cil_ail_insert(
 			continue;
 		}
 
-		if (item_lsn != commit_lsn) {
+		if (item_lsn != ctx->start_lsn) {
 
 			/*
 			 * Not a bulk update option due to unusual item_lsn.
@@ -799,14 +827,15 @@ xlog_cil_ail_insert(
 		log_items[i++] = lv->lv_item;
 		if (i >= LOG_ITEM_BATCH_SIZE) {
 			xlog_cil_ail_insert_batch(ailp, &cur, log_items,
-					LOG_ITEM_BATCH_SIZE, commit_lsn);
+					LOG_ITEM_BATCH_SIZE, ctx->start_lsn);
 			i = 0;
 		}
 	}
 
 	/* make sure we insert the remainder! */
 	if (i)
-		xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, commit_lsn);
+		xlog_cil_ail_insert_batch(ailp, &cur, log_items, i,
+				ctx->start_lsn);
 
 	spin_lock(&ailp->ail_lock);
 	xfs_trans_ail_cursor_done(&cur);
@@ -922,8 +951,7 @@ xlog_cil_committed(
 		spin_unlock(&ctx->cil->xc_push_lock);
 	}
 
-	xlog_cil_ail_insert(ctx->cil->xc_log, &ctx->lv_chain,
-					ctx->start_lsn, abort);
+	xlog_cil_ail_insert(ctx, abort);
 
 	xfs_extent_busy_sort(&ctx->busy_extents);
 	xfs_extent_busy_clear(mp, &ctx->busy_extents,
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 9f8c601a302b..5f4358f18224 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -426,13 +426,10 @@ struct xlog {
 	int			l_prev_block;   /* previous logical log block */
 
 	/*
-	 * l_last_sync_lsn and l_tail_lsn are atomics so they can be set and
-	 * read without needing to hold specific locks. To avoid operations
-	 * contending with other hot objects, place each of them on a separate
-	 * cacheline.
+	 * l_tail_lsn is atomic so it can be set and read without needing to
+	 * hold specific locks. To avoid operations contending with other hot
+	 * objects, place it on a separate cacheline.
 	 */
-	/* lsn of last LR on disk */
-	atomic64_t		l_last_sync_lsn ____cacheline_aligned_in_smp;
 	/* lsn of 1st LR with unflushed * buffers */
 	atomic64_t		l_tail_lsn ____cacheline_aligned_in_smp;
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 9e0e7ff76e02..d9997714f975 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1177,8 +1177,8 @@ xlog_check_unmount_rec(
 			 */
 			xlog_assign_atomic_lsn(&log->l_tail_lsn,
 					log->l_curr_cycle, after_umount_blk);
-			xlog_assign_atomic_lsn(&log->l_last_sync_lsn,
-					log->l_curr_cycle, after_umount_blk);
+			log->l_ailp->ail_head_lsn =
+					atomic64_read(&log->l_tail_lsn);
 			*tail_blk = after_umount_blk;
 
 			*clean = true;
@@ -1212,7 +1212,7 @@ xlog_set_state(
 	if (bump_cycle)
 		log->l_curr_cycle++;
 	atomic64_set(&log->l_tail_lsn, be64_to_cpu(rhead->h_tail_lsn));
-	atomic64_set(&log->l_last_sync_lsn, be64_to_cpu(rhead->h_lsn));
+	log->l_ailp->ail_head_lsn = be64_to_cpu(rhead->h_lsn);
 	xlog_assign_grant_head(&log->l_reserve_head.grant, log->l_curr_cycle,
 					BBTOB(log->l_curr_block));
 	xlog_assign_grant_head(&log->l_write_head.grant, log->l_curr_cycle,
@@ -3294,14 +3294,13 @@ xlog_do_recover(
 
 	/*
 	 * We now update the tail_lsn since much of the recovery has completed
-	 * and there may be space available to use.  If there were no extent
-	 * or iunlinks, we can free up the entire log and set the tail_lsn to
-	 * be the last_sync_lsn.  This was set in xlog_find_tail to be the
-	 * lsn of the last known good LR on disk.  If there are extent frees
-	 * or iunlinks they will have some entries in the AIL; so we look at
-	 * the AIL to determine how to set the tail_lsn.
+	 * and there may be space available to use.  If there were no extent or
+	 * iunlinks, we can free up the entire log.  This was set in
+	 * xlog_find_tail to be the lsn of the last known good LR on disk.  If
+	 * there are extent frees or iunlinks they will have some entries in the
+	 * AIL; so we look at the AIL to determine how to set the tail_lsn.
 	 */
-	xlog_assign_tail_lsn(mp);
+	xfs_ail_assign_tail_lsn(log->l_ailp);
 
 	/*
 	 * Now that we've finished replaying all buffer and inode updates,
diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
index d269ef57ff01..dcf9af0108c1 100644
--- a/fs/xfs/xfs_trace.c
+++ b/fs/xfs/xfs_trace.c
@@ -22,6 +22,7 @@
 #include "xfs_trans.h"
 #include "xfs_log.h"
 #include "xfs_log_priv.h"
+#include "xfs_trans_priv.h"
 #include "xfs_buf_item.h"
 #include "xfs_quota.h"
 #include "xfs_dquot_item.h"
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index f9057af6e0c8..886cde292c95 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1383,19 +1383,19 @@ TRACE_EVENT(xfs_log_assign_tail_lsn,
 		__field(dev_t, dev)
 		__field(xfs_lsn_t, new_lsn)
 		__field(xfs_lsn_t, old_lsn)
-		__field(xfs_lsn_t, last_sync_lsn)
+		__field(xfs_lsn_t, head_lsn)
 	),
 	TP_fast_assign(
 		__entry->dev = log->l_mp->m_super->s_dev;
 		__entry->new_lsn = new_lsn;
 		__entry->old_lsn = atomic64_read(&log->l_tail_lsn);
-		__entry->last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
+		__entry->head_lsn = log->l_ailp->ail_head_lsn;
 	),
-	TP_printk("dev %d:%d new tail lsn %d/%d, old lsn %d/%d, last sync %d/%d",
+	TP_printk("dev %d:%d new tail lsn %d/%d, old lsn %d/%d, head lsn %d/%d",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  CYCLE_LSN(__entry->new_lsn), BLOCK_LSN(__entry->new_lsn),
 		  CYCLE_LSN(__entry->old_lsn), BLOCK_LSN(__entry->old_lsn),
-		  CYCLE_LSN(__entry->last_sync_lsn), BLOCK_LSN(__entry->last_sync_lsn))
+		  CYCLE_LSN(__entry->head_lsn), BLOCK_LSN(__entry->head_lsn))
 )
 
 DECLARE_EVENT_CLASS(xfs_file_class,
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 5f40509877f7..fe3f8b80e687 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -715,6 +715,26 @@ xfs_ail_push_all_sync(
 	finish_wait(&ailp->ail_empty, &wait);
 }
 
+void
+__xfs_ail_assign_tail_lsn(
+	struct xfs_ail		*ailp)
+{
+	struct xlog		*log = ailp->ail_log;
+	xfs_lsn_t		tail_lsn;
+
+	assert_spin_locked(&ailp->ail_lock);
+
+	if (xlog_is_shutdown(log))
+		return;
+
+	tail_lsn = __xfs_ail_min_lsn(ailp);
+	if (!tail_lsn)
+		tail_lsn = ailp->ail_head_lsn;
+
+	trace_xfs_log_assign_tail_lsn(log, tail_lsn);
+	atomic64_set(&log->l_tail_lsn, tail_lsn);
+}
+
 /*
  * Callers should pass the the original tail lsn so that we can detect if the
  * tail has moved as a result of the operation that was performed. If the caller
@@ -729,15 +749,13 @@ xfs_ail_update_finish(
 {
 	struct xlog		*log = ailp->ail_log;
 
-	/* if the tail lsn hasn't changed, don't do updates or wakeups. */
+	/* If the tail lsn hasn't changed, don't do updates or wakeups. */
 	if (!old_lsn || old_lsn == __xfs_ail_min_lsn(ailp)) {
 		spin_unlock(&ailp->ail_lock);
 		return;
 	}
 
-	if (!xlog_is_shutdown(log))
-		xlog_assign_tail_lsn_locked(log->l_mp);
-
+	__xfs_ail_assign_tail_lsn(ailp);
 	if (list_empty(&ailp->ail_head))
 		wake_up_all(&ailp->ail_empty);
 	spin_unlock(&ailp->ail_lock);
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 9a131e7fae94..6541a6c3ea22 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -55,6 +55,7 @@ struct xfs_ail {
 	struct list_head	ail_cursors;
 	spinlock_t		ail_lock;
 	xfs_lsn_t		ail_last_pushed_lsn;
+	xfs_lsn_t		ail_head_lsn;
 	int			ail_log_flush;
 	unsigned long		ail_opstate;
 	struct list_head	ail_buf_list;
@@ -135,6 +136,18 @@ struct xfs_log_item *	xfs_trans_ail_cursor_next(struct xfs_ail *ailp,
 					struct xfs_ail_cursor *cur);
 void			xfs_trans_ail_cursor_done(struct xfs_ail_cursor *cur);
 
+void			__xfs_ail_assign_tail_lsn(struct xfs_ail *ailp);
+
+static inline void
+xfs_ail_assign_tail_lsn(
+	struct xfs_ail		*ailp)
+{
+
+	spin_lock(&ailp->ail_lock);
+	__xfs_ail_assign_tail_lsn(ailp);
+	spin_unlock(&ailp->ail_lock);
+}
+
 #if BITS_PER_LONG != 64
 static inline void
 xfs_trans_ail_copy_lsn(
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 6/9] xfs: collapse xlog_state_set_callback in caller
  2022-08-09 23:03 [PATCH 0/9 v2] xfs: byte-base grant head reservation tracking Dave Chinner
                   ` (4 preceding siblings ...)
  2022-08-09 23:03 ` [PATCH 5/9] xfs: l_last_sync_lsn is really AIL state Dave Chinner
@ 2022-08-09 23:03 ` Dave Chinner
  2022-08-26 22:20   ` Darrick J. Wong
  2022-09-07 14:12   ` Christoph Hellwig
  2022-08-09 23:03 ` [PATCH 7/9] xfs: track log space pinned by the AIL Dave Chinner
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 39+ messages in thread
From: Dave Chinner @ 2022-08-09 23:03 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The function is called from a single place, and it isn't just
setting the iclog state to XLOG_STATE_CALLBACK - it can mark iclogs
clean, which moves them to states after CALLBACK. Hence the function
is now badly named, and should just be folded into the caller where
the iclog completion logic makes a whole lot more sense.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index e420591b1a8a..5b7c91a42edf 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2520,25 +2520,6 @@ xlog_get_lowest_lsn(
 	return lowest_lsn;
 }
 
-static void
-xlog_state_set_callback(
-	struct xlog		*log,
-	struct xlog_in_core	*iclog,
-	xfs_lsn_t		header_lsn)
-{
-	/*
-	 * If there are no callbacks on this iclog, we can mark it clean
-	 * immediately and return. Otherwise we need to run the
-	 * callbacks.
-	 */
-	if (list_empty(&iclog->ic_callbacks)) {
-		xlog_state_clean_iclog(log, iclog);
-		return;
-	}
-	trace_xlog_iclog_callback(iclog, _RET_IP_);
-	iclog->ic_state = XLOG_STATE_CALLBACK;
-}
-
 /*
  * Return true if we need to stop processing, false to continue to the next
  * iclog. The caller will need to run callbacks if the iclog is returned in the
@@ -2570,7 +2551,17 @@ xlog_state_iodone_process_iclog(
 		lowest_lsn = xlog_get_lowest_lsn(log);
 		if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, header_lsn) < 0)
 			return false;
-		xlog_state_set_callback(log, iclog, header_lsn);
+		/*
+		 * If there are no callbacks on this iclog, we can mark it clean
+		 * immediately and return. Otherwise we need to run the
+		 * callbacks.
+		 */
+		if (list_empty(&iclog->ic_callbacks)) {
+			xlog_state_clean_iclog(log, iclog);
+			return false;
+		}
+		trace_xlog_iclog_callback(iclog, _RET_IP_);
+		iclog->ic_state = XLOG_STATE_CALLBACK;
 		return false;
 	default:
 		/*
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 7/9] xfs: track log space pinned by the AIL
  2022-08-09 23:03 [PATCH 0/9 v2] xfs: byte-base grant head reservation tracking Dave Chinner
                   ` (5 preceding siblings ...)
  2022-08-09 23:03 ` [PATCH 6/9] xfs: collapse xlog_state_set_callback in caller Dave Chinner
@ 2022-08-09 23:03 ` Dave Chinner
  2022-08-26 22:39   ` Darrick J. Wong
  2022-08-09 23:03 ` [PATCH 8/9] xfs: pass the full grant head to accounting functions Dave Chinner
  2022-08-09 23:03 ` [PATCH 9/9] xfs: grant heads track byte counts, not LSNs Dave Chinner
  8 siblings, 1 reply; 39+ messages in thread
From: Dave Chinner @ 2022-08-09 23:03 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Currently we track space used in the log by the grant heads. These
store the reserved space as a physical log location, combining the
space reserved for future use with the space already used in the
log in a single variable. The amount of space consumed in the log
is then calculated as the distance between the log tail and the
grant head.

The problem with tracking the grant head as a physical location
comes from the fact that it tracks both the log cycle count and the
byte offset into the log in a single 64 bit variable. Because the
cycle count on disk is a 32 bit number, this also limits the offset
into the log to 32 bits. And because that offset is in bytes, we
are limited to tracking only 2GB of log space in the grant head.

Hence to support larger physical logs, we need to track used space
differently in the grant head. We no longer use the grant head for
guiding AIL pushing, so the only thing it is now used for is
determining if we've run out of reservation space via the
calculation in xlog_space_left().

What we really need to do is move the grant heads away from tracking
physical space in the log. The issue here is that space consumed in
the log is not directly tracked by the current mechanism - the
space consumed in the log by grant head reservations gets returned
to the free pool by the tail of the log moving forward. i.e. the
space isn't directly tracked or calculated, but the used grant space
gets "freed" as the physical limits of the log are updated without
actually needing to update the grant heads.

Hence to move away from implicit, zero-update log space tracking we
need to explicitly track the amount of physical space the log
actually consumes separately to the in-memory reservations for
operations that will be committed to the journal. Luckily, we
already track the information we need to calculate this in the AIL
itself.

That is, the space currently consumed by the journal is the maximum
LSN that the AIL has seen minus the current log tail. As we update
both of these values dynamically as the head and tail of the log
move, we always know exactly how much space the journal consumes.
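
As a purely illustrative userspace sketch of that calculation (the
lsn structure, names and values below are made up for the example;
this is not the code in this patch):

	#include <stdint.h>
	#include <stdio.h>

	/* Stand-in for an LSN: log cycle plus byte offset into the log. */
	struct sketch_lsn {
		uint32_t	cycle;
		uint32_t	bytes;
	};

	/*
	 * Space the journal currently consumes: the distance from the
	 * log tail LSN to the maximum LSN the AIL has seen. If the head
	 * has wrapped into the next cycle, the distance wraps around the
	 * end of the log. Assumes the head never falls behind the tail.
	 */
	static uint64_t journal_space_used(uint64_t logsize,
			struct sketch_lsn head, struct sketch_lsn tail)
	{
		if (head.cycle == tail.cycle)
			return head.bytes - tail.bytes;
		return logsize - tail.bytes + head.bytes;
	}

	int main(void)
	{
		struct sketch_lsn tail = { .cycle = 7, .bytes = 900 << 10 };
		struct sketch_lsn head = { .cycle = 8, .bytes = 128 << 10 };

		printf("journal space used: %llu bytes\n",
			(unsigned long long)journal_space_used(1U << 30,
				head, tail));
		return 0;
	}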

This means that we also know exactly how much space the currently
active reservations require, and exactly how much free space we have
remaining for new reservations to be made. Most importantly, we know
what these spaces are independently of the physical locations of
the head and tail of the log.

Hence by separating out the physical space consumed by the journal,
we can now track reservations in the grant heads purely as a byte
count, and the log can be considered full when the tail space +
reservation space exceeds the size of the log. This means we can use
the full 64 bits of grant head space for reservation space,
completely removing the 32 bit byte count limitation on log size
that they impose.

Hence the first step in this conversion is to track and update the
"log tail space" every time the AIL tail or maximum seen LSN
changes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c   | 9 ++++++---
 fs/xfs/xfs_log_priv.h  | 1 +
 fs/xfs/xfs_trans_ail.c | 9 ++++++---
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 843764d40232..e482ae9fc01c 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -761,14 +761,17 @@ xlog_cil_ail_insert(
 	 * always be the same (as iclogs can contain multiple commit records) or
 	 * higher LSN than the current head. We do this before insertion of the
 	 * items so that log space checks during insertion will reflect the
-	 * space that this checkpoint has already consumed.
+	 * space that this checkpoint has already consumed.  We call
+	 * xfs_ail_update_finish() so that tail space and space-based wakeups
+	 * will be recalculated appropriately.
 	 */
 	ASSERT(XFS_LSN_CMP(ctx->commit_lsn, ailp->ail_head_lsn) >= 0 ||
 			aborted);
 	spin_lock(&ailp->ail_lock);
-	ailp->ail_head_lsn = ctx->commit_lsn;
 	xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn);
-	spin_unlock(&ailp->ail_lock);
+	ailp->ail_head_lsn = ctx->commit_lsn;
+	/* xfs_ail_update_finish() drops the ail_lock */
+	xfs_ail_update_finish(ailp, NULLCOMMITLSN);
 
 	/* unpin all the log items */
 	list_for_each_entry(lv, &ctx->lv_chain, lv_list) {
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 5f4358f18224..8a005cb08a02 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -435,6 +435,7 @@ struct xlog {
 
 	struct xlog_grant_head	l_reserve_head;
 	struct xlog_grant_head	l_write_head;
+	uint64_t		l_tail_space;
 
 	struct xfs_kobj		l_kobj;
 
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index fe3f8b80e687..5d0ddd6d68e9 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -731,6 +731,8 @@ __xfs_ail_assign_tail_lsn(
 	if (!tail_lsn)
 		tail_lsn = ailp->ail_head_lsn;
 
+	WRITE_ONCE(log->l_tail_space,
+			xlog_lsn_sub(log, ailp->ail_head_lsn, tail_lsn));
 	trace_xfs_log_assign_tail_lsn(log, tail_lsn);
 	atomic64_set(&log->l_tail_lsn, tail_lsn);
 }
@@ -738,9 +740,10 @@ __xfs_ail_assign_tail_lsn(
 /*
  * Callers should pass the the original tail lsn so that we can detect if the
  * tail has moved as a result of the operation that was performed. If the caller
- * needs to force a tail LSN update, it should pass NULLCOMMITLSN to bypass the
- * "did the tail LSN change?" checks. If the caller wants to avoid a tail update
- * (e.g. it knows the tail did not change) it should pass an @old_lsn of 0.
+ * needs to force a tail space update, it should pass NULLCOMMITLSN to bypass
+ * the "did the tail LSN change?" checks. If the caller wants to avoid a tail
+ * update (e.g. it knows the tail did not change) it should pass an @old_lsn of
+ * 0.
  */
 void
 xfs_ail_update_finish(
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 8/9] xfs: pass the full grant head to accounting functions
  2022-08-09 23:03 [PATCH 0/9 v2] xfs: byte-base grant head reservation tracking Dave Chinner
                   ` (6 preceding siblings ...)
  2022-08-09 23:03 ` [PATCH 7/9] xfs: track log space pinned by the AIL Dave Chinner
@ 2022-08-09 23:03 ` Dave Chinner
  2022-08-26 22:25   ` Darrick J. Wong
  2022-08-09 23:03 ` [PATCH 9/9] xfs: grant heads track byte counts, not LSNs Dave Chinner
  8 siblings, 1 reply; 39+ messages in thread
From: Dave Chinner @ 2022-08-09 23:03 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Because we are going to need them soon. API change only, no logic
changes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c      | 157 +++++++++++++++++++++---------------------
 fs/xfs/xfs_log_priv.h |   2 -
 2 files changed, 77 insertions(+), 82 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 5b7c91a42edf..459c0f438c89 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -136,10 +136,10 @@ xlog_prepare_iovec(
 static void
 xlog_grant_sub_space(
 	struct xlog		*log,
-	atomic64_t		*head,
+	struct xlog_grant_head	*head,
 	int			bytes)
 {
-	int64_t	head_val = atomic64_read(head);
+	int64_t	head_val = atomic64_read(&head->grant);
 	int64_t new, old;
 
 	do {
@@ -155,17 +155,17 @@ xlog_grant_sub_space(
 
 		old = head_val;
 		new = xlog_assign_grant_head_val(cycle, space);
-		head_val = atomic64_cmpxchg(head, old, new);
+		head_val = atomic64_cmpxchg(&head->grant, old, new);
 	} while (head_val != old);
 }
 
 static void
 xlog_grant_add_space(
 	struct xlog		*log,
-	atomic64_t		*head,
+	struct xlog_grant_head	*head,
 	int			bytes)
 {
-	int64_t	head_val = atomic64_read(head);
+	int64_t	head_val = atomic64_read(&head->grant);
 	int64_t new, old;
 
 	do {
@@ -184,7 +184,7 @@ xlog_grant_add_space(
 
 		old = head_val;
 		new = xlog_assign_grant_head_val(cycle, space);
-		head_val = atomic64_cmpxchg(head, old, new);
+		head_val = atomic64_cmpxchg(&head->grant, old, new);
 	} while (head_val != old);
 }
 
@@ -197,6 +197,63 @@ xlog_grant_head_init(
 	spin_lock_init(&head->lock);
 }
 
+/*
+ * Return the space in the log between the tail and the head.  The head
+ * is passed in the cycle/bytes formal parms.  In the special case where
+ * the reserve head has wrapped passed the tail, this calculation is no
+ * longer valid.  In this case, just return 0 which means there is no space
+ * in the log.  This works for all places where this function is called
+ * with the reserve head.  Of course, if the write head were to ever
+ * wrap the tail, we should blow up.  Rather than catch this case here,
+ * we depend on other ASSERTions in other parts of the code.   XXXmiken
+ *
+ * If reservation head is behind the tail, we have a problem. Warn about it,
+ * but then treat it as if the log is empty.
+ *
+ * If the log is shut down, the head and tail may be invalid or out of whack, so
+ * shortcut invalidity asserts in this case so that we don't trigger them
+ * falsely.
+ */
+static int
+xlog_grant_space_left(
+	struct xlog		*log,
+	struct xlog_grant_head	*head)
+{
+	int			tail_bytes;
+	int			tail_cycle;
+	int			head_cycle;
+	int			head_bytes;
+
+	xlog_crack_grant_head(&head->grant, &head_cycle, &head_bytes);
+	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes);
+	tail_bytes = BBTOB(tail_bytes);
+	if (tail_cycle == head_cycle && head_bytes >= tail_bytes)
+		return log->l_logsize - (head_bytes - tail_bytes);
+	if (tail_cycle + 1 < head_cycle)
+		return 0;
+
+	/* Ignore potential inconsistency when shutdown. */
+	if (xlog_is_shutdown(log))
+		return log->l_logsize;
+
+	if (tail_cycle < head_cycle) {
+		ASSERT(tail_cycle == (head_cycle - 1));
+		return tail_bytes - head_bytes;
+	}
+
+	/*
+	 * The reservation head is behind the tail. In this case we just want to
+	 * return the size of the log as the amount of space left.
+	 */
+	xfs_alert(log->l_mp, "xlog_grant_space_left: head behind tail");
+	xfs_alert(log->l_mp, "  tail_cycle = %d, tail_bytes = %d",
+		  tail_cycle, tail_bytes);
+	xfs_alert(log->l_mp, "  GH   cycle = %d, GH   bytes = %d",
+		  head_cycle, head_bytes);
+	ASSERT(0);
+	return log->l_logsize;
+}
+
 STATIC void
 xlog_grant_head_wake_all(
 	struct xlog_grant_head	*head)
@@ -277,7 +334,7 @@ xlog_grant_head_wait(
 		spin_lock(&head->lock);
 		if (xlog_is_shutdown(log))
 			goto shutdown;
-	} while (xlog_space_left(log, &head->grant) < need_bytes);
+	} while (xlog_grant_space_left(log, head) < need_bytes);
 
 	list_del_init(&tic->t_queue);
 	return 0;
@@ -322,7 +379,7 @@ xlog_grant_head_check(
 	 * otherwise try to get some space for this transaction.
 	 */
 	*need_bytes = xlog_ticket_reservation(log, head, tic);
-	free_bytes = xlog_space_left(log, &head->grant);
+	free_bytes = xlog_grant_space_left(log, head);
 	if (!list_empty_careful(&head->waiters)) {
 		spin_lock(&head->lock);
 		if (!xlog_grant_head_wake(log, head, &free_bytes) ||
@@ -396,7 +453,7 @@ xfs_log_regrant(
 	if (error)
 		goto out_error;
 
-	xlog_grant_add_space(log, &log->l_write_head.grant, need_bytes);
+	xlog_grant_add_space(log, &log->l_write_head, need_bytes);
 	trace_xfs_log_regrant_exit(log, tic);
 	xlog_verify_grant_tail(log);
 	return 0;
@@ -447,8 +504,8 @@ xfs_log_reserve(
 	if (error)
 		goto out_error;
 
-	xlog_grant_add_space(log, &log->l_reserve_head.grant, need_bytes);
-	xlog_grant_add_space(log, &log->l_write_head.grant, need_bytes);
+	xlog_grant_add_space(log, &log->l_reserve_head, need_bytes);
+	xlog_grant_add_space(log, &log->l_write_head, need_bytes);
 	trace_xfs_log_reserve_exit(log, tic);
 	xlog_verify_grant_tail(log);
 	return 0;
@@ -1114,7 +1171,7 @@ xfs_log_space_wake(
 		ASSERT(!xlog_in_recovery(log));
 
 		spin_lock(&log->l_write_head.lock);
-		free_bytes = xlog_space_left(log, &log->l_write_head.grant);
+		free_bytes = xlog_grant_space_left(log, &log->l_write_head);
 		xlog_grant_head_wake(log, &log->l_write_head, &free_bytes);
 		spin_unlock(&log->l_write_head.lock);
 	}
@@ -1123,7 +1180,7 @@ xfs_log_space_wake(
 		ASSERT(!xlog_in_recovery(log));
 
 		spin_lock(&log->l_reserve_head.lock);
-		free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
+		free_bytes = xlog_grant_space_left(log, &log->l_reserve_head);
 		xlog_grant_head_wake(log, &log->l_reserve_head, &free_bytes);
 		spin_unlock(&log->l_reserve_head.lock);
 	}
@@ -1237,64 +1294,6 @@ xfs_log_cover(
 	return error;
 }
 
-/*
- * Return the space in the log between the tail and the head.  The head
- * is passed in the cycle/bytes formal parms.  In the special case where
- * the reserve head has wrapped passed the tail, this calculation is no
- * longer valid.  In this case, just return 0 which means there is no space
- * in the log.  This works for all places where this function is called
- * with the reserve head.  Of course, if the write head were to ever
- * wrap the tail, we should blow up.  Rather than catch this case here,
- * we depend on other ASSERTions in other parts of the code.   XXXmiken
- *
- * If reservation head is behind the tail, we have a problem. Warn about it,
- * but then treat it as if the log is empty.
- *
- * If the log is shut down, the head and tail may be invalid or out of whack, so
- * shortcut invalidity asserts in this case so that we don't trigger them
- * falsely.
- */
-int
-xlog_space_left(
-	struct xlog	*log,
-	atomic64_t	*head)
-{
-	int		tail_bytes;
-	int		tail_cycle;
-	int		head_cycle;
-	int		head_bytes;
-
-	xlog_crack_grant_head(head, &head_cycle, &head_bytes);
-	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes);
-	tail_bytes = BBTOB(tail_bytes);
-	if (tail_cycle == head_cycle && head_bytes >= tail_bytes)
-		return log->l_logsize - (head_bytes - tail_bytes);
-	if (tail_cycle + 1 < head_cycle)
-		return 0;
-
-	/* Ignore potential inconsistency when shutdown. */
-	if (xlog_is_shutdown(log))
-		return log->l_logsize;
-
-	if (tail_cycle < head_cycle) {
-		ASSERT(tail_cycle == (head_cycle - 1));
-		return tail_bytes - head_bytes;
-	}
-
-	/*
-	 * The reservation head is behind the tail. In this case we just want to
-	 * return the size of the log as the amount of space left.
-	 */
-	xfs_alert(log->l_mp, "xlog_space_left: head behind tail");
-	xfs_alert(log->l_mp, "  tail_cycle = %d, tail_bytes = %d",
-		  tail_cycle, tail_bytes);
-	xfs_alert(log->l_mp, "  GH   cycle = %d, GH   bytes = %d",
-		  head_cycle, head_bytes);
-	ASSERT(0);
-	return log->l_logsize;
-}
-
-
 static void
 xlog_ioend_work(
 	struct work_struct	*work)
@@ -1883,8 +1882,8 @@ xlog_sync(
 	if (ticket) {
 		ticket->t_curr_res -= roundoff;
 	} else {
-		xlog_grant_add_space(log, &log->l_reserve_head.grant, roundoff);
-		xlog_grant_add_space(log, &log->l_write_head.grant, roundoff);
+		xlog_grant_add_space(log, &log->l_reserve_head, roundoff);
+		xlog_grant_add_space(log, &log->l_write_head, roundoff);
 	}
 
 	/* put cycle number in every block */
@@ -2815,17 +2814,15 @@ xfs_log_ticket_regrant(
 	if (ticket->t_cnt > 0)
 		ticket->t_cnt--;
 
-	xlog_grant_sub_space(log, &log->l_reserve_head.grant,
-					ticket->t_curr_res);
-	xlog_grant_sub_space(log, &log->l_write_head.grant,
-					ticket->t_curr_res);
+	xlog_grant_sub_space(log, &log->l_reserve_head, ticket->t_curr_res);
+	xlog_grant_sub_space(log, &log->l_write_head, ticket->t_curr_res);
 	ticket->t_curr_res = ticket->t_unit_res;
 
 	trace_xfs_log_ticket_regrant_sub(log, ticket);
 
 	/* just return if we still have some of the pre-reserved space */
 	if (!ticket->t_cnt) {
-		xlog_grant_add_space(log, &log->l_reserve_head.grant,
+		xlog_grant_add_space(log, &log->l_reserve_head,
 				     ticket->t_unit_res);
 		trace_xfs_log_ticket_regrant_exit(log, ticket);
 
@@ -2873,8 +2870,8 @@ xfs_log_ticket_ungrant(
 		bytes += ticket->t_unit_res*ticket->t_cnt;
 	}
 
-	xlog_grant_sub_space(log, &log->l_reserve_head.grant, bytes);
-	xlog_grant_sub_space(log, &log->l_write_head.grant, bytes);
+	xlog_grant_sub_space(log, &log->l_reserve_head, bytes);
+	xlog_grant_sub_space(log, &log->l_write_head, bytes);
 
 	trace_xfs_log_ticket_ungrant_exit(log, ticket);
 
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 8a005cb08a02..86b5959b5ef2 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -571,8 +571,6 @@ xlog_assign_grant_head(atomic64_t *head, int cycle, int space)
 	atomic64_set(head, xlog_assign_grant_head_val(cycle, space));
 }
 
-int xlog_space_left(struct xlog	 *log, atomic64_t *head);
-
 /*
  * Committed Item List interfaces
  */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 9/9] xfs: grant heads track byte counts, not LSNs
  2022-08-09 23:03 [PATCH 0/9 v2] xfs: byte-base grant head reservation tracking Dave Chinner
                   ` (7 preceding siblings ...)
  2022-08-09 23:03 ` [PATCH 8/9] xfs: pass the full grant head to accounting functions Dave Chinner
@ 2022-08-09 23:03 ` Dave Chinner
  2022-08-26 23:45   ` Darrick J. Wong
  8 siblings, 1 reply; 39+ messages in thread
From: Dave Chinner @ 2022-08-09 23:03 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The grant heads in the log track the space reserved in the log for
running transactions. They do this by tracking how far ahead of
the tail the reservation has reached, and the units for doing this
are {cycle,bytes} for the reserve head rather than the
{cycle,blocks} normally used by LSNs.

This is annoyingly complex because we have to split, crack and
combine these tuples for any calculation we do to determine log
space and targets. This is computationally expensive, difficult to
do atomically and locklessly, and it also limits the size of the
log to 2^32 bytes.
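
For reference, the packed encoding being replaced looks roughly
like this - a sketch of the xlog_assign_grant_head_val() and
xlog_crack_grant_head_val() helpers that are removed below, not new
code:

	#include <stdint.h>

	/*
	 * Old encoding: cycle in the upper 32 bits, byte offset into the
	 * log in the lower 32 bits, so the whole grant head fits in one
	 * 64 bit word that a single atomic cmpxchg can update. The lower
	 * 32 bits are what cap the byte offset, and hence the log size,
	 * at 2^32.
	 */
	static inline int64_t grant_head_pack(int cycle, int space)
	{
		return ((int64_t)cycle << 32) | space;
	}

	static inline void grant_head_crack(int64_t val, int *cycle,
			int *space)
	{
		*cycle = val >> 32;
		*space = val & 0xffffffff;
	}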

Really, though, all the grant heads are tracking is how much space
is currently available for use in the log. We can track this as a
simple byte count - we don't actually care where in the log the
head and tail physically are, just how much space we have remaining
before the head and tail overlap.

So, convert the grant heads to track the byte reservations that are
active rather than the current (cycle, offset) tuples. This means an
empty log has zero bytes consumed, and a full log is when the
reservations reach the size of the log minus the space consumed by
the AIL.
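
In the byte count model that check becomes simple arithmetic. A
minimal sketch, simplified from the xlog_grant_space_left() change
below (names local to the example):

	#include <stdint.h>

	/*
	 * Free reservation space is whatever is left of the log once the
	 * space pinned by the AIL and the bytes already granted to
	 * reservations are subtracted. Overrun simply reads as zero
	 * space left.
	 */
	static uint64_t grant_space_left(uint64_t logsize,
			uint64_t tail_space, uint64_t granted_bytes)
	{
		int64_t	free_bytes;

		free_bytes = (int64_t)logsize - (int64_t)tail_space -
				(int64_t)granted_bytes;
		if (free_bytes > 0)
			return free_bytes;
		return 0;
	}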

This greatly simplifies the accounting and checks for whether there
is space available. We no longer need to crack or combine LSNs to
determine how much space the log has left, nor do we need to look at
the head or tail of the log to determine how close to full we are.

There is, however, a complexity that needs to be handled. We now
know how much space is being tracked by the AIL via
log->l_tail_space, and the log tickets track active reservations
and return their unused portions to the grant heads when ungranted.
Unfortunately, we don't track the used portion of a grant, so when
we transfer log items from the CIL to the AIL, the space accounted
to the grant heads is transferred to the log tail space.  Hence
when we move the AIL head forwards on item insert, we have to
remove that space from the grant heads.
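
A sketch of that hand-off, with plain integers standing in for the
atomic grant counters (the names here are illustrative, this is not
the patch code):

	#include <stdint.h>

	/* Stand-in for the relevant struct xlog accounting state. */
	struct sketch_log {
		uint64_t	tail_space;	/* bytes pinned by the AIL */
		uint64_t	reserve_grant;	/* reserve head byte count */
		uint64_t	write_grant;	/* write head byte count */
	};

	/*
	 * When checkpoint insertion moves the AIL head forwards by
	 * 'moved' bytes, that space is now accounted in the log tail
	 * space, so the same number of bytes must come off both grant
	 * heads or it would be counted twice and the log would appear
	 * full too early.
	 */
	static void sketch_ail_head_moved(struct sketch_log *log,
			uint64_t moved)
	{
		log->tail_space += moved;
		log->reserve_grant -= moved;
		log->write_grant -= moved;
	}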

We also remove the xlog_verify_grant_tail() debug function as it is
no longer useful. The check it performs has been racy since delayed
logging was introduced, and now it clearly only detects false
positives, so remove it.

The result of this substantially simpler accounting algorithm is an
increase in sustained transaction rate from ~1.3 million
transactions/s to ~1.9 million transactions/s with no increase in
CPU usage. We also remove the 32 bit space limitation on the grant
heads, which will allow us to increase the journal size beyond 2GB
in future.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c         | 205 ++++++++++++---------------------------
 fs/xfs/xfs_log_cil.c     |  12 +++
 fs/xfs/xfs_log_priv.h    |  45 +++------
 fs/xfs/xfs_log_recover.c |   4 -
 fs/xfs/xfs_sysfs.c       |  17 ++--
 fs/xfs/xfs_trace.h       |  33 ++++---
 6 files changed, 113 insertions(+), 203 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 459c0f438c89..148214cf7032 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -53,9 +53,6 @@ xlog_sync(
 	struct xlog_ticket	*ticket);
 #if defined(DEBUG)
 STATIC void
-xlog_verify_grant_tail(
-	struct xlog *log);
-STATIC void
 xlog_verify_iclog(
 	struct xlog		*log,
 	struct xlog_in_core	*iclog,
@@ -65,7 +62,6 @@ xlog_verify_tail_lsn(
 	struct xlog		*log,
 	struct xlog_in_core	*iclog);
 #else
-#define xlog_verify_grant_tail(a)
 #define xlog_verify_iclog(a,b,c)
 #define xlog_verify_tail_lsn(a,b)
 #endif
@@ -133,30 +129,13 @@ xlog_prepare_iovec(
 	return buf;
 }
 
-static void
+void
 xlog_grant_sub_space(
 	struct xlog		*log,
 	struct xlog_grant_head	*head,
 	int			bytes)
 {
-	int64_t	head_val = atomic64_read(&head->grant);
-	int64_t new, old;
-
-	do {
-		int	cycle, space;
-
-		xlog_crack_grant_head_val(head_val, &cycle, &space);
-
-		space -= bytes;
-		if (space < 0) {
-			space += log->l_logsize;
-			cycle--;
-		}
-
-		old = head_val;
-		new = xlog_assign_grant_head_val(cycle, space);
-		head_val = atomic64_cmpxchg(&head->grant, old, new);
-	} while (head_val != old);
+	atomic64_sub(bytes, &head->grant);
 }
 
 static void
@@ -165,93 +144,39 @@ xlog_grant_add_space(
 	struct xlog_grant_head	*head,
 	int			bytes)
 {
-	int64_t	head_val = atomic64_read(&head->grant);
-	int64_t new, old;
-
-	do {
-		int		tmp;
-		int		cycle, space;
-
-		xlog_crack_grant_head_val(head_val, &cycle, &space);
-
-		tmp = log->l_logsize - space;
-		if (tmp > bytes)
-			space += bytes;
-		else {
-			space = bytes - tmp;
-			cycle++;
-		}
-
-		old = head_val;
-		new = xlog_assign_grant_head_val(cycle, space);
-		head_val = atomic64_cmpxchg(&head->grant, old, new);
-	} while (head_val != old);
+	atomic64_add(bytes, &head->grant);
 }
 
-STATIC void
+static void
 xlog_grant_head_init(
 	struct xlog_grant_head	*head)
 {
-	xlog_assign_grant_head(&head->grant, 1, 0);
+	atomic64_set(&head->grant, 0);
 	INIT_LIST_HEAD(&head->waiters);
 	spin_lock_init(&head->lock);
 }
 
 /*
- * Return the space in the log between the tail and the head.  The head
- * is passed in the cycle/bytes formal parms.  In the special case where
- * the reserve head has wrapped passed the tail, this calculation is no
- * longer valid.  In this case, just return 0 which means there is no space
- * in the log.  This works for all places where this function is called
- * with the reserve head.  Of course, if the write head were to ever
- * wrap the tail, we should blow up.  Rather than catch this case here,
- * we depend on other ASSERTions in other parts of the code.   XXXmiken
- *
- * If reservation head is behind the tail, we have a problem. Warn about it,
- * but then treat it as if the log is empty.
- *
- * If the log is shut down, the head and tail may be invalid or out of whack, so
- * shortcut invalidity asserts in this case so that we don't trigger them
- * falsely.
+ * Return the space in the log between the tail and the head.  In the case where
+ * we have overrun available reservation space, return 0. The memory barrier
+ * pairs with the smp_wmb() in xlog_cil_ail_insert() to ensure that grant head
+ * vs tail space updates are seen in the correct order and hence avoid
+ * transients as space is transferred from the grant heads to the AIL on commit
+ * completion.
  */
-static int
+static uint64_t
 xlog_grant_space_left(
 	struct xlog		*log,
 	struct xlog_grant_head	*head)
 {
-	int			tail_bytes;
-	int			tail_cycle;
-	int			head_cycle;
-	int			head_bytes;
-
-	xlog_crack_grant_head(&head->grant, &head_cycle, &head_bytes);
-	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes);
-	tail_bytes = BBTOB(tail_bytes);
-	if (tail_cycle == head_cycle && head_bytes >= tail_bytes)
-		return log->l_logsize - (head_bytes - tail_bytes);
-	if (tail_cycle + 1 < head_cycle)
-		return 0;
-
-	/* Ignore potential inconsistency when shutdown. */
-	if (xlog_is_shutdown(log))
-		return log->l_logsize;
-
-	if (tail_cycle < head_cycle) {
-		ASSERT(tail_cycle == (head_cycle - 1));
-		return tail_bytes - head_bytes;
-	}
+	int64_t			free_bytes;
 
-	/*
-	 * The reservation head is behind the tail. In this case we just want to
-	 * return the size of the log as the amount of space left.
-	 */
-	xfs_alert(log->l_mp, "xlog_grant_space_left: head behind tail");
-	xfs_alert(log->l_mp, "  tail_cycle = %d, tail_bytes = %d",
-		  tail_cycle, tail_bytes);
-	xfs_alert(log->l_mp, "  GH   cycle = %d, GH   bytes = %d",
-		  head_cycle, head_bytes);
-	ASSERT(0);
-	return log->l_logsize;
+	smp_rmb();	// paired with smp_wmb in xlog_cil_ail_insert()
+	free_bytes = log->l_logsize - READ_ONCE(log->l_tail_space) -
+			atomic64_read(&head->grant);
+	if (free_bytes > 0)
+		return free_bytes;
+	return 0;
 }
 
 STATIC void
@@ -455,7 +380,6 @@ xfs_log_regrant(
 
 	xlog_grant_add_space(log, &log->l_write_head, need_bytes);
 	trace_xfs_log_regrant_exit(log, tic);
-	xlog_verify_grant_tail(log);
 	return 0;
 
 out_error:
@@ -507,7 +431,6 @@ xfs_log_reserve(
 	xlog_grant_add_space(log, &log->l_reserve_head, need_bytes);
 	xlog_grant_add_space(log, &log->l_write_head, need_bytes);
 	trace_xfs_log_reserve_exit(log, tic);
-	xlog_verify_grant_tail(log);
 	return 0;
 
 out_error:
@@ -3343,42 +3266,27 @@ xlog_ticket_alloc(
 }
 
 #if defined(DEBUG)
-/*
- * Check to make sure the grant write head didn't just over lap the tail.  If
- * the cycles are the same, we can't be overlapping.  Otherwise, make sure that
- * the cycles differ by exactly one and check the byte count.
- *
- * This check is run unlocked, so can give false positives. Rather than assert
- * on failures, use a warn-once flag and a panic tag to allow the admin to
- * determine if they want to panic the machine when such an error occurs. For
- * debug kernels this will have the same effect as using an assert but, unlinke
- * an assert, it can be turned off at runtime.
- */
-STATIC void
-xlog_verify_grant_tail(
-	struct xlog	*log)
+static void
+xlog_verify_dump_tail(
+	struct xlog		*log,
+	struct xlog_in_core	*iclog)
 {
-	int		tail_cycle, tail_blocks;
-	int		cycle, space;
-
-	xlog_crack_grant_head(&log->l_write_head.grant, &cycle, &space);
-	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_blocks);
-	if (tail_cycle != cycle) {
-		if (cycle - 1 != tail_cycle &&
-		    !test_and_set_bit(XLOG_TAIL_WARN, &log->l_opstate)) {
-			xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES,
-				"%s: cycle - 1 != tail_cycle", __func__);
-		}
-
-		if (space > BBTOB(tail_blocks) &&
-		    !test_and_set_bit(XLOG_TAIL_WARN, &log->l_opstate)) {
-			xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES,
-				"%s: space > BBTOB(tail_blocks)", __func__);
-		}
-	}
-}
-
-/* check if it will fit */
+	xfs_alert(log->l_mp,
+"ran out of log space tail 0x%llx/0x%llx, head lsn 0x%llx, head 0x%x/0x%x, prev head 0x%x/0x%x",
+			iclog ? be64_to_cpu(iclog->ic_header.h_tail_lsn) : -1,
+			atomic64_read(&log->l_tail_lsn),
+			log->l_ailp->ail_head_lsn,
+			log->l_curr_cycle, log->l_curr_block,
+			log->l_prev_cycle, log->l_prev_block);
+	xfs_alert(log->l_mp,
+"write grant 0x%llx, reserve grant 0x%llx, tail_space 0x%llx, size 0x%x, iclog flags 0x%x",
+			atomic64_read(&log->l_write_head.grant),
+			atomic64_read(&log->l_reserve_head.grant),
+			log->l_tail_space, log->l_logsize,
+			iclog ? iclog->ic_flags : -1);
+}
+
+/* Check if the new iclog will fit in the log. */
 STATIC void
 xlog_verify_tail_lsn(
 	struct xlog		*log,
@@ -3387,21 +3295,34 @@ xlog_verify_tail_lsn(
 	xfs_lsn_t	tail_lsn = be64_to_cpu(iclog->ic_header.h_tail_lsn);
 	int		blocks;
 
-    if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) {
-	blocks =
-	    log->l_logBBsize - (log->l_prev_block - BLOCK_LSN(tail_lsn));
-	if (blocks < BTOBB(iclog->ic_offset)+BTOBB(log->l_iclog_hsize))
-		xfs_emerg(log->l_mp, "%s: ran out of log space", __func__);
-    } else {
-	ASSERT(CYCLE_LSN(tail_lsn)+1 == log->l_prev_cycle);
+	if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) {
+		blocks = log->l_logBBsize -
+				(log->l_prev_block - BLOCK_LSN(tail_lsn));
+		if (blocks < BTOBB(iclog->ic_offset) +
+					BTOBB(log->l_iclog_hsize)) {
+			xfs_emerg(log->l_mp,
+					"%s: ran out of log space", __func__);
+			xlog_verify_dump_tail(log, iclog);
+		}
+		return;
+	}
 
-	if (BLOCK_LSN(tail_lsn) == log->l_prev_block)
+	if (CYCLE_LSN(tail_lsn) + 1 != log->l_prev_cycle) {
+		xfs_emerg(log->l_mp, "%s: head has wrapped tail.", __func__);
+		xlog_verify_dump_tail(log, iclog);
+		return;
+	}
+	if (BLOCK_LSN(tail_lsn) == log->l_prev_block) {
 		xfs_emerg(log->l_mp, "%s: tail wrapped", __func__);
+		xlog_verify_dump_tail(log, iclog);
+		return;
+	}
 
 	blocks = BLOCK_LSN(tail_lsn) - log->l_prev_block;
-	if (blocks < BTOBB(iclog->ic_offset) + 1)
-		xfs_emerg(log->l_mp, "%s: ran out of log space", __func__);
-    }
+	if (blocks < BTOBB(iclog->ic_offset) + 1) {
+		xfs_emerg(log->l_mp, "%s: ran out of iclog space", __func__);
+		xlog_verify_dump_tail(log, iclog);
+	}
 }
 
 /*
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index e482ae9fc01c..7ff4814b7d87 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -753,6 +753,7 @@ xlog_cil_ail_insert(
 	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
 	struct xfs_log_vec	*lv;
 	struct xfs_ail_cursor	cur;
+	xfs_lsn_t		old_head;
 	int			i = 0;
 
 	/*
@@ -769,10 +770,21 @@ xlog_cil_ail_insert(
 			aborted);
 	spin_lock(&ailp->ail_lock);
 	xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn);
+	old_head = ailp->ail_head_lsn;
 	ailp->ail_head_lsn = ctx->commit_lsn;
 	/* xfs_ail_update_finish() drops the ail_lock */
 	xfs_ail_update_finish(ailp, NULLCOMMITLSN);
 
+	/*
+	 * We move the AIL head forwards to account for the space used in the
+	 * log before we remove that space from the grant heads. This prevents a
+	 * transient condition where reservation space appears to become
+	 * available on return, only for it to disappear again immediately as
+	 * the AIL head update accounts that space in the log tail space.
+	 */
+	smp_wmb();	// paired with smp_rmb in xlog_grant_space_left
+	xlog_grant_return_space(ailp->ail_log, old_head, ailp->ail_head_lsn);
+
 	/* unpin all the log items */
 	list_for_each_entry(lv, &ctx->lv_chain, lv_list) {
 		struct xfs_log_item	*lip = lv->lv_item;
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 86b5959b5ef2..c7ae9172dcd9 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -541,36 +541,6 @@ xlog_assign_atomic_lsn(atomic64_t *lsn, uint cycle, uint block)
 	atomic64_set(lsn, xlog_assign_lsn(cycle, block));
 }
 
-/*
- * When we crack the grant head, we sample it first so that the value will not
- * change while we are cracking it into the component values. This means we
- * will always get consistent component values to work from.
- */
-static inline void
-xlog_crack_grant_head_val(int64_t val, int *cycle, int *space)
-{
-	*cycle = val >> 32;
-	*space = val & 0xffffffff;
-}
-
-static inline void
-xlog_crack_grant_head(atomic64_t *head, int *cycle, int *space)
-{
-	xlog_crack_grant_head_val(atomic64_read(head), cycle, space);
-}
-
-static inline int64_t
-xlog_assign_grant_head_val(int cycle, int space)
-{
-	return ((int64_t)cycle << 32) | space;
-}
-
-static inline void
-xlog_assign_grant_head(atomic64_t *head, int cycle, int space)
-{
-	atomic64_set(head, xlog_assign_grant_head_val(cycle, space));
-}
-
 /*
  * Committed Item List interfaces
  */
@@ -636,6 +606,21 @@ xlog_lsn_sub(
 	return (uint64_t)log->l_logsize - BBTOB(lo_block - hi_block);
 }
 
+void	xlog_grant_sub_space(struct xlog *log, struct xlog_grant_head *head,
+			int bytes);
+
+static inline void
+xlog_grant_return_space(
+	struct xlog	*log,
+	xfs_lsn_t	old_head,
+	xfs_lsn_t	new_head)
+{
+	int64_t		diff = xlog_lsn_sub(log, new_head, old_head);
+
+	xlog_grant_sub_space(log, &log->l_reserve_head, diff);
+	xlog_grant_sub_space(log, &log->l_write_head, diff);
+}
+
 /*
  * The LSN is valid so long as it is behind the current LSN. If it isn't, this
  * means that the next log record that includes this metadata could have a
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index d9997714f975..0c1da8c13f52 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1213,10 +1213,6 @@ xlog_set_state(
 		log->l_curr_cycle++;
 	atomic64_set(&log->l_tail_lsn, be64_to_cpu(rhead->h_tail_lsn));
 	log->l_ailp->ail_head_lsn = be64_to_cpu(rhead->h_lsn);
-	xlog_assign_grant_head(&log->l_reserve_head.grant, log->l_curr_cycle,
-					BBTOB(log->l_curr_block));
-	xlog_assign_grant_head(&log->l_write_head.grant, log->l_curr_cycle,
-					BBTOB(log->l_curr_block));
 }
 
 /*
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index f7faf6e70d7f..0b19acea28cb 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -376,14 +376,11 @@ STATIC ssize_t
 reserve_grant_head_show(
 	struct kobject	*kobject,
 	char		*buf)
-
 {
-	int cycle;
-	int bytes;
-	struct xlog *log = to_xlog(kobject);
+	struct xlog	*log = to_xlog(kobject);
+	uint64_t	bytes = atomic64_read(&log->l_reserve_head.grant);
 
-	xlog_crack_grant_head(&log->l_reserve_head.grant, &cycle, &bytes);
-	return sysfs_emit(buf, "%d:%d\n", cycle, bytes);
+	return sysfs_emit(buf, "%lld\n", bytes);
 }
 XFS_SYSFS_ATTR_RO(reserve_grant_head);
 
@@ -392,12 +389,10 @@ write_grant_head_show(
 	struct kobject	*kobject,
 	char		*buf)
 {
-	int cycle;
-	int bytes;
-	struct xlog *log = to_xlog(kobject);
+	struct xlog	*log = to_xlog(kobject);
+	uint64_t	bytes = atomic64_read(&log->l_write_head.grant);
 
-	xlog_crack_grant_head(&log->l_write_head.grant, &cycle, &bytes);
-	return sysfs_emit(buf, "%d:%d\n", cycle, bytes);
+	return sysfs_emit(buf, "%lld\n", bytes);
 }
 XFS_SYSFS_ATTR_RO(write_grant_head);
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 886cde292c95..5c1871e5747e 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1206,6 +1206,7 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class,
 	TP_ARGS(log, tic),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
+		__field(unsigned long, tic)
 		__field(char, ocnt)
 		__field(char, cnt)
 		__field(int, curr_res)
@@ -1213,16 +1214,16 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class,
 		__field(unsigned int, flags)
 		__field(int, reserveq)
 		__field(int, writeq)
-		__field(int, grant_reserve_cycle)
-		__field(int, grant_reserve_bytes)
-		__field(int, grant_write_cycle)
-		__field(int, grant_write_bytes)
+		__field(uint64_t, grant_reserve_bytes)
+		__field(uint64_t, grant_write_bytes)
+		__field(uint64_t, tail_space)
 		__field(int, curr_cycle)
 		__field(int, curr_block)
 		__field(xfs_lsn_t, tail_lsn)
 	),
 	TP_fast_assign(
 		__entry->dev = log->l_mp->m_super->s_dev;
+		__entry->tic = (unsigned long)tic;
 		__entry->ocnt = tic->t_ocnt;
 		__entry->cnt = tic->t_cnt;
 		__entry->curr_res = tic->t_curr_res;
@@ -1230,23 +1231,23 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class,
 		__entry->flags = tic->t_flags;
 		__entry->reserveq = list_empty(&log->l_reserve_head.waiters);
 		__entry->writeq = list_empty(&log->l_write_head.waiters);
-		xlog_crack_grant_head(&log->l_reserve_head.grant,
-				&__entry->grant_reserve_cycle,
-				&__entry->grant_reserve_bytes);
-		xlog_crack_grant_head(&log->l_write_head.grant,
-				&__entry->grant_write_cycle,
-				&__entry->grant_write_bytes);
+		__entry->tail_space = READ_ONCE(log->l_tail_space);
+		__entry->grant_reserve_bytes = __entry->tail_space +
+			atomic64_read(&log->l_reserve_head.grant);
+		__entry->grant_write_bytes = __entry->tail_space +
+			atomic64_read(&log->l_write_head.grant);
 		__entry->curr_cycle = log->l_curr_cycle;
 		__entry->curr_block = log->l_curr_block;
 		__entry->tail_lsn = atomic64_read(&log->l_tail_lsn);
 	),
-	TP_printk("dev %d:%d t_ocnt %u t_cnt %u t_curr_res %u "
+	TP_printk("dev %d:%d tic 0x%lx t_ocnt %u t_cnt %u t_curr_res %u "
 		  "t_unit_res %u t_flags %s reserveq %s "
-		  "writeq %s grant_reserve_cycle %d "
-		  "grant_reserve_bytes %d grant_write_cycle %d "
-		  "grant_write_bytes %d curr_cycle %d curr_block %d "
+		  "writeq %s "
+		  "tail space %llu grant_reserve_bytes %llu "
+		  "grant_write_bytes %llu curr_cycle %d curr_block %d "
 		  "tail_cycle %d tail_block %d",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->tic,
 		  __entry->ocnt,
 		  __entry->cnt,
 		  __entry->curr_res,
@@ -1254,9 +1255,8 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class,
 		  __print_flags(__entry->flags, "|", XLOG_TIC_FLAGS),
 		  __entry->reserveq ? "empty" : "active",
 		  __entry->writeq ? "empty" : "active",
-		  __entry->grant_reserve_cycle,
+		  __entry->tail_space,
 		  __entry->grant_reserve_bytes,
-		  __entry->grant_write_cycle,
 		  __entry->grant_write_bytes,
 		  __entry->curr_cycle,
 		  __entry->curr_block,
@@ -1284,6 +1284,7 @@ DEFINE_LOGGRANT_EVENT(xfs_log_ticket_ungrant);
 DEFINE_LOGGRANT_EVENT(xfs_log_ticket_ungrant_sub);
 DEFINE_LOGGRANT_EVENT(xfs_log_ticket_ungrant_exit);
 DEFINE_LOGGRANT_EVENT(xfs_log_cil_wait);
+DEFINE_LOGGRANT_EVENT(xfs_log_cil_return);
 
 DECLARE_EVENT_CLASS(xfs_log_item_class,
 	TP_PROTO(struct xfs_log_item *lip),
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] xfs: move and xfs_trans_committed_bulk
  2022-08-09 23:03 ` [PATCH 1/9] xfs: move and xfs_trans_committed_bulk Dave Chinner
@ 2022-08-10 14:17   ` kernel test robot
  2022-08-10 17:08   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 39+ messages in thread
From: kernel test robot @ 2022-08-10 14:17 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs; +Cc: kbuild-all

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on linus/master next-20220810]
[cannot apply to v5.19]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Chinner/xfs-byte-base-grant-head-reservation-tracking/20220810-072405
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: i386-allyesconfig (https://download.01.org/0day-ci/archive/20220810/202208102203.TPVkxa2S-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/f02000d53b0e6d6ac32e63c1ac72be9aa7c1b69c
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Dave-Chinner/xfs-byte-base-grant-head-reservation-tracking/20220810-072405
        git checkout f02000d53b0e6d6ac32e63c1ac72be9aa7c1b69c
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash fs/xfs/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/xfs/xfs_log_cil.c:729:1: warning: no previous prototype for 'xlog_cil_ail_insert' [-Wmissing-prototypes]
     729 | xlog_cil_ail_insert(
         | ^~~~~~~~~~~~~~~~~~~


vim +/xlog_cil_ail_insert +729 fs/xfs/xfs_log_cil.c

   707	
   708	/*
   709	 * Take the checkpoint's log vector chain of items and insert the attached log
   710	 * items into the the AIL. This uses bulk insertion techniques to minimise AIL
   711	 * lock traffic.
   712	 *
   713	 * If we are called with the aborted flag set, it is because a log write during
   714	 * a CIL checkpoint commit has failed. In this case, all the items in the
   715	 * checkpoint have already gone through iop_committed and iop_committing, which
   716	 * means that checkpoint commit abort handling is treated exactly the same as an
   717	 * iclog write error even though we haven't started any IO yet. Hence in this
   718	 * case all we need to do is iop_committed processing, followed by an
   719	 * iop_unpin(aborted) call.
   720	 *
   721	 * The AIL cursor is used to optimise the insert process. If commit_lsn is not
   722	 * at the end of the AIL, the insert cursor avoids the need to walk the AIL to
   723	 * find the insertion point on every xfs_log_item_batch_insert() call. This
   724	 * saves a lot of needless list walking and is a net win, even though it
   725	 * slightly increases that amount of AIL lock traffic to set it up and tear it
   726	 * down.
   727	 */
   728	void
 > 729	xlog_cil_ail_insert(
   730		struct xlog		*log,
   731		struct list_head	*lv_chain,
   732		xfs_lsn_t		commit_lsn,
   733		bool			aborted)
   734	{
   735	#define LOG_ITEM_BATCH_SIZE	32
   736		struct xfs_ail		*ailp = log->l_ailp;
   737		struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
   738		struct xfs_log_vec	*lv;
   739		struct xfs_ail_cursor	cur;
   740		int			i = 0;
   741	
   742		spin_lock(&ailp->ail_lock);
   743		xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
   744		spin_unlock(&ailp->ail_lock);
   745	
   746		/* unpin all the log items */
   747		list_for_each_entry(lv, lv_chain, lv_list) {
   748			struct xfs_log_item	*lip = lv->lv_item;
   749			xfs_lsn_t		item_lsn;
   750	
   751			if (aborted)
   752				set_bit(XFS_LI_ABORTED, &lip->li_flags);
   753	
   754			if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
   755				lip->li_ops->iop_release(lip);
   756				continue;
   757			}
   758	
   759			if (lip->li_ops->iop_committed)
   760				item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
   761			else
   762				item_lsn = commit_lsn;
   763	
   764			/* item_lsn of -1 means the item needs no further processing */
   765			if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
   766				continue;
   767	
   768			/*
   769			 * if we are aborting the operation, no point in inserting the
   770			 * object into the AIL as we are in a shutdown situation.
   771			 */
   772			if (aborted) {
   773				ASSERT(xlog_is_shutdown(ailp->ail_log));
   774				if (lip->li_ops->iop_unpin)
   775					lip->li_ops->iop_unpin(lip, 1);
   776				continue;
   777			}
   778	
   779			if (item_lsn != commit_lsn) {
   780	
   781				/*
   782				 * Not a bulk update option due to unusual item_lsn.
   783				 * Push into AIL immediately, rechecking the lsn once
   784				 * we have the ail lock. Then unpin the item. This does
   785				 * not affect the AIL cursor the bulk insert path is
   786				 * using.
   787				 */
   788				spin_lock(&ailp->ail_lock);
   789				if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
   790					xfs_trans_ail_update(ailp, lip, item_lsn);
   791				else
   792					spin_unlock(&ailp->ail_lock);
   793				if (lip->li_ops->iop_unpin)
   794					lip->li_ops->iop_unpin(lip, 0);
   795				continue;
   796			}
   797	
   798			/* Item is a candidate for bulk AIL insert.  */
   799			log_items[i++] = lv->lv_item;
   800			if (i >= LOG_ITEM_BATCH_SIZE) {
   801				xlog_cil_ail_insert_batch(ailp, &cur, log_items,
   802						LOG_ITEM_BATCH_SIZE, commit_lsn);
   803				i = 0;
   804			}
   805		}
   806	
   807		/* make sure we insert the remainder! */
   808		if (i)
   809			xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, commit_lsn);
   810	
   811		spin_lock(&ailp->ail_lock);
   812		xfs_trans_ail_cursor_done(&cur);
   813		spin_unlock(&ailp->ail_lock);
   814	}
   815	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] xfs: move and xfs_trans_committed_bulk
  2022-08-09 23:03 ` [PATCH 1/9] xfs: move and xfs_trans_committed_bulk Dave Chinner
  2022-08-10 14:17   ` kernel test robot
@ 2022-08-10 17:08   ` kernel test robot
  2022-08-22 15:03   ` Darrick J. Wong
  2022-09-07 13:51   ` Christoph Hellwig
  3 siblings, 0 replies; 39+ messages in thread
From: kernel test robot @ 2022-08-10 17:08 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs; +Cc: llvm, kbuild-all

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on linus/master next-20220810]
[cannot apply to v5.19]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Chinner/xfs-byte-base-grant-head-reservation-tracking/20220810-072405
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: arm64-randconfig-r004-20220810 (https://download.01.org/0day-ci/archive/20220811/202208110057.CxJjzzoM-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 5f1c7e2cc5a3c07cbc2412e851a7283c1841f520)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/f02000d53b0e6d6ac32e63c1ac72be9aa7c1b69c
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Dave-Chinner/xfs-byte-base-grant-head-reservation-tracking/20220810-072405
        git checkout f02000d53b0e6d6ac32e63c1ac72be9aa7c1b69c
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash fs/xfs/

If you fix the issue, kindly add the following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/xfs/xfs_log_cil.c:729:1: warning: no previous prototype for function 'xlog_cil_ail_insert' [-Wmissing-prototypes]
   xlog_cil_ail_insert(
   ^
   fs/xfs/xfs_log_cil.c:728:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void
   ^
   static 
   1 warning generated.
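
One way to address -Wmissing-prototypes, sketched here purely for illustration
(the header placement is an assumption; marking the function static, as the
note above suggests, works just as well if nothing outside xfs_log_cil.c calls
it), is to declare the function somewhere its callers can see it:

	/* e.g. in fs/xfs/xfs_log_priv.h; signature copied from the definition below */
	void	xlog_cil_ail_insert(struct xlog *log, struct list_head *lv_chain,
				xfs_lsn_t commit_lsn, bool aborted);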


vim +/xlog_cil_ail_insert +729 fs/xfs/xfs_log_cil.c

   707	
   708	/*
   709	 * Take the checkpoint's log vector chain of items and insert the attached log
   710	 * items into the the AIL. This uses bulk insertion techniques to minimise AIL
   711	 * lock traffic.
   712	 *
   713	 * If we are called with the aborted flag set, it is because a log write during
   714	 * a CIL checkpoint commit has failed. In this case, all the items in the
   715	 * checkpoint have already gone through iop_committed and iop_committing, which
   716	 * means that checkpoint commit abort handling is treated exactly the same as an
   717	 * iclog write error even though we haven't started any IO yet. Hence in this
   718	 * case all we need to do is iop_committed processing, followed by an
   719	 * iop_unpin(aborted) call.
   720	 *
   721	 * The AIL cursor is used to optimise the insert process. If commit_lsn is not
   722	 * at the end of the AIL, the insert cursor avoids the need to walk the AIL to
   723	 * find the insertion point on every xfs_log_item_batch_insert() call. This
   724	 * saves a lot of needless list walking and is a net win, even though it
   725	 * slightly increases that amount of AIL lock traffic to set it up and tear it
   726	 * down.
   727	 */
   728	void
 > 729	xlog_cil_ail_insert(
   730		struct xlog		*log,
   731		struct list_head	*lv_chain,
   732		xfs_lsn_t		commit_lsn,
   733		bool			aborted)
   734	{
   735	#define LOG_ITEM_BATCH_SIZE	32
   736		struct xfs_ail		*ailp = log->l_ailp;
   737		struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
   738		struct xfs_log_vec	*lv;
   739		struct xfs_ail_cursor	cur;
   740		int			i = 0;
   741	
   742		spin_lock(&ailp->ail_lock);
   743		xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
   744		spin_unlock(&ailp->ail_lock);
   745	
   746		/* unpin all the log items */
   747		list_for_each_entry(lv, lv_chain, lv_list) {
   748			struct xfs_log_item	*lip = lv->lv_item;
   749			xfs_lsn_t		item_lsn;
   750	
   751			if (aborted)
   752				set_bit(XFS_LI_ABORTED, &lip->li_flags);
   753	
   754			if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
   755				lip->li_ops->iop_release(lip);
   756				continue;
   757			}
   758	
   759			if (lip->li_ops->iop_committed)
   760				item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
   761			else
   762				item_lsn = commit_lsn;
   763	
   764			/* item_lsn of -1 means the item needs no further processing */
   765			if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
   766				continue;
   767	
   768			/*
   769			 * if we are aborting the operation, no point in inserting the
   770			 * object into the AIL as we are in a shutdown situation.
   771			 */
   772			if (aborted) {
   773				ASSERT(xlog_is_shutdown(ailp->ail_log));
   774				if (lip->li_ops->iop_unpin)
   775					lip->li_ops->iop_unpin(lip, 1);
   776				continue;
   777			}
   778	
   779			if (item_lsn != commit_lsn) {
   780	
   781				/*
   782				 * Not a bulk update option due to unusual item_lsn.
   783				 * Push into AIL immediately, rechecking the lsn once
   784				 * we have the ail lock. Then unpin the item. This does
   785				 * not affect the AIL cursor the bulk insert path is
   786				 * using.
   787				 */
   788				spin_lock(&ailp->ail_lock);
   789				if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
   790					xfs_trans_ail_update(ailp, lip, item_lsn);
   791				else
   792					spin_unlock(&ailp->ail_lock);
   793				if (lip->li_ops->iop_unpin)
   794					lip->li_ops->iop_unpin(lip, 0);
   795				continue;
   796			}
   797	
   798			/* Item is a candidate for bulk AIL insert.  */
   799			log_items[i++] = lv->lv_item;
   800			if (i >= LOG_ITEM_BATCH_SIZE) {
   801				xlog_cil_ail_insert_batch(ailp, &cur, log_items,
   802						LOG_ITEM_BATCH_SIZE, commit_lsn);
   803				i = 0;
   804			}
   805		}
   806	
   807		/* make sure we insert the remainder! */
   808		if (i)
   809			xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, commit_lsn);
   810	
   811		spin_lock(&ailp->ail_lock);
   812		xfs_trans_ail_cursor_done(&cur);
   813		spin_unlock(&ailp->ail_lock);
   814	}
   815	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] xfs: move and xfs_trans_committed_bulk
  2022-08-09 23:03 ` [PATCH 1/9] xfs: move and xfs_trans_committed_bulk Dave Chinner
  2022-08-10 14:17   ` kernel test robot
  2022-08-10 17:08   ` kernel test robot
@ 2022-08-22 15:03   ` Darrick J. Wong
  2022-09-07 13:51   ` Christoph Hellwig
  3 siblings, 0 replies; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-22 15:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Aug 10, 2022 at 09:03:45AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Ever since the CIL and delayed logging was introduced,
> xfs_trans_committed_bulk() has been a purely CIL checkpoint
> completion function and not a transaction commit completion
> function. Now that we are adding log specific updates to this
> function, it really does not have anything to do with the
> transaction subsystem - it is really log and log item level
> functionality.
> 
> This should be part of the CIL code as it is the callback
> that moves log items from the CIL checkpoint to the AIL. Move it
> and rename it to xlog_cil_ail_insert().
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

/me has been wondering why these two functions weren't lumped into the
rest of the cil code for quite some time, so thx for clarifying. :)

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_log_cil.c    | 132 +++++++++++++++++++++++++++++++++++++++-
>  fs/xfs/xfs_trans.c      | 129 ---------------------------------------
>  fs/xfs/xfs_trans_priv.h |   3 -
>  3 files changed, 131 insertions(+), 133 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index eccbfb99e894..475a18493c37 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -683,6 +683,136 @@ xlog_cil_insert_items(
>  	}
>  }
>  
> +static inline void
> +xlog_cil_ail_insert_batch(
> +	struct xfs_ail		*ailp,
> +	struct xfs_ail_cursor	*cur,
> +	struct xfs_log_item	**log_items,
> +	int			nr_items,
> +	xfs_lsn_t		commit_lsn)
> +{
> +	int	i;
> +
> +	spin_lock(&ailp->ail_lock);
> +	/* xfs_trans_ail_update_bulk drops ailp->ail_lock */
> +	xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn);
> +
> +	for (i = 0; i < nr_items; i++) {
> +		struct xfs_log_item *lip = log_items[i];
> +
> +		if (lip->li_ops->iop_unpin)
> +			lip->li_ops->iop_unpin(lip, 0);
> +	}
> +}
> +
> +/*
> + * Take the checkpoint's log vector chain of items and insert the attached log
> + * items into the the AIL. This uses bulk insertion techniques to minimise AIL
> + * lock traffic.
> + *
> + * If we are called with the aborted flag set, it is because a log write during
> + * a CIL checkpoint commit has failed. In this case, all the items in the
> + * checkpoint have already gone through iop_committed and iop_committing, which
> + * means that checkpoint commit abort handling is treated exactly the same as an
> + * iclog write error even though we haven't started any IO yet. Hence in this
> + * case all we need to do is iop_committed processing, followed by an
> + * iop_unpin(aborted) call.
> + *
> + * The AIL cursor is used to optimise the insert process. If commit_lsn is not
> + * at the end of the AIL, the insert cursor avoids the need to walk the AIL to
> + * find the insertion point on every xfs_log_item_batch_insert() call. This
> + * saves a lot of needless list walking and is a net win, even though it
> + * slightly increases that amount of AIL lock traffic to set it up and tear it
> + * down.
> + */
> +void
> +xlog_cil_ail_insert(
> +	struct xlog		*log,
> +	struct list_head	*lv_chain,
> +	xfs_lsn_t		commit_lsn,
> +	bool			aborted)
> +{
> +#define LOG_ITEM_BATCH_SIZE	32
> +	struct xfs_ail		*ailp = log->l_ailp;
> +	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
> +	struct xfs_log_vec	*lv;
> +	struct xfs_ail_cursor	cur;
> +	int			i = 0;
> +
> +	spin_lock(&ailp->ail_lock);
> +	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
> +	spin_unlock(&ailp->ail_lock);
> +
> +	/* unpin all the log items */
> +	list_for_each_entry(lv, lv_chain, lv_list) {
> +		struct xfs_log_item	*lip = lv->lv_item;
> +		xfs_lsn_t		item_lsn;
> +
> +		if (aborted)
> +			set_bit(XFS_LI_ABORTED, &lip->li_flags);
> +
> +		if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
> +			lip->li_ops->iop_release(lip);
> +			continue;
> +		}
> +
> +		if (lip->li_ops->iop_committed)
> +			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
> +		else
> +			item_lsn = commit_lsn;
> +
> +		/* item_lsn of -1 means the item needs no further processing */
> +		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
> +			continue;
> +
> +		/*
> +		 * if we are aborting the operation, no point in inserting the
> +		 * object into the AIL as we are in a shutdown situation.
> +		 */
> +		if (aborted) {
> +			ASSERT(xlog_is_shutdown(ailp->ail_log));
> +			if (lip->li_ops->iop_unpin)
> +				lip->li_ops->iop_unpin(lip, 1);
> +			continue;
> +		}
> +
> +		if (item_lsn != commit_lsn) {
> +
> +			/*
> +			 * Not a bulk update option due to unusual item_lsn.
> +			 * Push into AIL immediately, rechecking the lsn once
> +			 * we have the ail lock. Then unpin the item. This does
> +			 * not affect the AIL cursor the bulk insert path is
> +			 * using.
> +			 */
> +			spin_lock(&ailp->ail_lock);
> +			if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
> +				xfs_trans_ail_update(ailp, lip, item_lsn);
> +			else
> +				spin_unlock(&ailp->ail_lock);
> +			if (lip->li_ops->iop_unpin)
> +				lip->li_ops->iop_unpin(lip, 0);
> +			continue;
> +		}
> +
> +		/* Item is a candidate for bulk AIL insert.  */
> +		log_items[i++] = lv->lv_item;
> +		if (i >= LOG_ITEM_BATCH_SIZE) {
> +			xlog_cil_ail_insert_batch(ailp, &cur, log_items,
> +					LOG_ITEM_BATCH_SIZE, commit_lsn);
> +			i = 0;
> +		}
> +	}
> +
> +	/* make sure we insert the remainder! */
> +	if (i)
> +		xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, commit_lsn);
> +
> +	spin_lock(&ailp->ail_lock);
> +	xfs_trans_ail_cursor_done(&cur);
> +	spin_unlock(&ailp->ail_lock);
> +}
> +
>  static void
>  xlog_cil_free_logvec(
>  	struct list_head	*lv_chain)
> @@ -792,7 +922,7 @@ xlog_cil_committed(
>  		spin_unlock(&ctx->cil->xc_push_lock);
>  	}
>  
> -	xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, &ctx->lv_chain,
> +	xlog_cil_ail_insert(ctx->cil->xc_log, &ctx->lv_chain,
>  					ctx->start_lsn, abort);
>  
>  	xfs_extent_busy_sort(&ctx->busy_extents);
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 7bd16fbff534..58c4e875eb12 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -715,135 +715,6 @@ xfs_trans_free_items(
>  	}
>  }
>  
> -static inline void
> -xfs_log_item_batch_insert(
> -	struct xfs_ail		*ailp,
> -	struct xfs_ail_cursor	*cur,
> -	struct xfs_log_item	**log_items,
> -	int			nr_items,
> -	xfs_lsn_t		commit_lsn)
> -{
> -	int	i;
> -
> -	spin_lock(&ailp->ail_lock);
> -	/* xfs_trans_ail_update_bulk drops ailp->ail_lock */
> -	xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn);
> -
> -	for (i = 0; i < nr_items; i++) {
> -		struct xfs_log_item *lip = log_items[i];
> -
> -		if (lip->li_ops->iop_unpin)
> -			lip->li_ops->iop_unpin(lip, 0);
> -	}
> -}
> -
> -/*
> - * Bulk operation version of xfs_trans_committed that takes a log vector of
> - * items to insert into the AIL. This uses bulk AIL insertion techniques to
> - * minimise lock traffic.
> - *
> - * If we are called with the aborted flag set, it is because a log write during
> - * a CIL checkpoint commit has failed. In this case, all the items in the
> - * checkpoint have already gone through iop_committed and iop_committing, which
> - * means that checkpoint commit abort handling is treated exactly the same
> - * as an iclog write error even though we haven't started any IO yet. Hence in
> - * this case all we need to do is iop_committed processing, followed by an
> - * iop_unpin(aborted) call.
> - *
> - * The AIL cursor is used to optimise the insert process. If commit_lsn is not
> - * at the end of the AIL, the insert cursor avoids the need to walk
> - * the AIL to find the insertion point on every xfs_log_item_batch_insert()
> - * call. This saves a lot of needless list walking and is a net win, even
> - * though it slightly increases that amount of AIL lock traffic to set it up
> - * and tear it down.
> - */
> -void
> -xfs_trans_committed_bulk(
> -	struct xfs_ail		*ailp,
> -	struct list_head	*lv_chain,
> -	xfs_lsn_t		commit_lsn,
> -	bool			aborted)
> -{
> -#define LOG_ITEM_BATCH_SIZE	32
> -	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
> -	struct xfs_log_vec	*lv;
> -	struct xfs_ail_cursor	cur;
> -	int			i = 0;
> -
> -	spin_lock(&ailp->ail_lock);
> -	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
> -	spin_unlock(&ailp->ail_lock);
> -
> -	/* unpin all the log items */
> -	list_for_each_entry(lv, lv_chain, lv_list) {
> -		struct xfs_log_item	*lip = lv->lv_item;
> -		xfs_lsn_t		item_lsn;
> -
> -		if (aborted)
> -			set_bit(XFS_LI_ABORTED, &lip->li_flags);
> -
> -		if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
> -			lip->li_ops->iop_release(lip);
> -			continue;
> -		}
> -
> -		if (lip->li_ops->iop_committed)
> -			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
> -		else
> -			item_lsn = commit_lsn;
> -
> -		/* item_lsn of -1 means the item needs no further processing */
> -		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
> -			continue;
> -
> -		/*
> -		 * if we are aborting the operation, no point in inserting the
> -		 * object into the AIL as we are in a shutdown situation.
> -		 */
> -		if (aborted) {
> -			ASSERT(xlog_is_shutdown(ailp->ail_log));
> -			if (lip->li_ops->iop_unpin)
> -				lip->li_ops->iop_unpin(lip, 1);
> -			continue;
> -		}
> -
> -		if (item_lsn != commit_lsn) {
> -
> -			/*
> -			 * Not a bulk update option due to unusual item_lsn.
> -			 * Push into AIL immediately, rechecking the lsn once
> -			 * we have the ail lock. Then unpin the item. This does
> -			 * not affect the AIL cursor the bulk insert path is
> -			 * using.
> -			 */
> -			spin_lock(&ailp->ail_lock);
> -			if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
> -				xfs_trans_ail_update(ailp, lip, item_lsn);
> -			else
> -				spin_unlock(&ailp->ail_lock);
> -			if (lip->li_ops->iop_unpin)
> -				lip->li_ops->iop_unpin(lip, 0);
> -			continue;
> -		}
> -
> -		/* Item is a candidate for bulk AIL insert.  */
> -		log_items[i++] = lv->lv_item;
> -		if (i >= LOG_ITEM_BATCH_SIZE) {
> -			xfs_log_item_batch_insert(ailp, &cur, log_items,
> -					LOG_ITEM_BATCH_SIZE, commit_lsn);
> -			i = 0;
> -		}
> -	}
> -
> -	/* make sure we insert the remainder! */
> -	if (i)
> -		xfs_log_item_batch_insert(ailp, &cur, log_items, i, commit_lsn);
> -
> -	spin_lock(&ailp->ail_lock);
> -	xfs_trans_ail_cursor_done(&cur);
> -	spin_unlock(&ailp->ail_lock);
> -}
> -
>  /*
>   * Sort transaction items prior to running precommit operations. This will
>   * attempt to order the items such that they will always be locked in the same
> diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
> index d5400150358e..52a45f0a5ef1 100644
> --- a/fs/xfs/xfs_trans_priv.h
> +++ b/fs/xfs/xfs_trans_priv.h
> @@ -19,9 +19,6 @@ void	xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *);
>  void	xfs_trans_del_item(struct xfs_log_item *);
>  void	xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp);
>  
> -void	xfs_trans_committed_bulk(struct xfs_ail *ailp,
> -				struct list_head *lv_chain,
> -				xfs_lsn_t commit_lsn, bool aborted);
>  /*
>   * AIL traversal cursor.
>   *
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/9] xfs: AIL doesn't need manual pushing
  2022-08-09 23:03 ` [PATCH 2/9] xfs: AIL doesn't need manual pushing Dave Chinner
@ 2022-08-22 17:08   ` Darrick J. Wong
  2022-08-23  1:51     ` Dave Chinner
  2022-09-07 14:01   ` Christoph Hellwig
  1 sibling, 1 reply; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-22 17:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Aug 10, 2022 at 09:03:46AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We have a mechanism that checks the amount of log space remaining
> available every time we make a transaction reservation. If the
> amount of space is below a threshold (25% free) we push on the AIL
> to tell it to do more work. To do this, we end up calculating the
> LSN that the AIL needs to push to on every reservation and updating
> the push target for the AIL with that new target LSN.
> 
> This is silly and expensive. The AIL is perfectly capable of
> calculating the push target itself, and it will always be running
> when the AIL contains objects.
> 
> Modify the AIL to calculate its 25% push target before it starts a
> push using the same reserve grant head based calculation as is
> currently used, and remove all the places where we ask the AIL to
> push to a new 25% free target.
> 
> This does still require a manual push in certain circumstances.
> These circumstances arise when the AIL is not full, but the
> reservation grants consume the entire of the free space in the log.
> In this case, we still need to push on the AIL to free up space, so
> when we hit this condition (i.e. reservation going to sleep to wait
> on log space) we do a single push to tell the AIL it should empty
> itself. This will keep the AIL moving as new reservations come in
> and want more space, rather than keep queuing them and having to
> push the AIL repeatedly.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c |   4 +-
>  fs/xfs/xfs_log.c          | 135 ++-----------------------------
>  fs/xfs/xfs_log.h          |   1 -
>  fs/xfs/xfs_log_priv.h     |   2 +
>  fs/xfs/xfs_trans_ail.c    | 165 +++++++++++++++++---------------------
>  fs/xfs/xfs_trans_priv.h   |  33 ++++++--
>  6 files changed, 110 insertions(+), 230 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 5a321b783398..79c077078785 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -12,12 +12,14 @@
>  #include "xfs_mount.h"
>  #include "xfs_defer.h"
>  #include "xfs_trans.h"
> +#include "xfs_trans_priv.h"
>  #include "xfs_buf_item.h"
>  #include "xfs_inode.h"
>  #include "xfs_inode_item.h"
>  #include "xfs_trace.h"
>  #include "xfs_icache.h"
>  #include "xfs_log.h"
> +#include "xfs_log_priv.h"
>  #include "xfs_rmap.h"
>  #include "xfs_refcount.h"
>  #include "xfs_bmap.h"
> @@ -439,7 +441,7 @@ xfs_defer_relog(
>  		 * the log threshold once per call.
>  		 */
>  		if (threshold_lsn == NULLCOMMITLSN) {
> -			threshold_lsn = xlog_grant_push_threshold(log, 0);
> +			threshold_lsn = xfs_ail_push_target(log->l_ailp);
>  			if (threshold_lsn == NULLCOMMITLSN)
>  				break;
>  		}
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 4b1c0a9c6368..c609c188bd8a 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -30,10 +30,6 @@ xlog_alloc_log(
>  	struct xfs_buftarg	*log_target,
>  	xfs_daddr_t		blk_offset,
>  	int			num_bblks);
> -STATIC int
> -xlog_space_left(
> -	struct xlog		*log,
> -	atomic64_t		*head);
>  STATIC void
>  xlog_dealloc_log(
>  	struct xlog		*log);
> @@ -51,10 +47,6 @@ xlog_state_get_iclog_space(
>  	struct xlog_ticket	*ticket,
>  	int			*logoffsetp);
>  STATIC void
> -xlog_grant_push_ail(
> -	struct xlog		*log,
> -	int			need_bytes);
> -STATIC void
>  xlog_sync(
>  	struct xlog		*log,
>  	struct xlog_in_core	*iclog,
> @@ -242,42 +234,15 @@ xlog_grant_head_wake(
>  {
>  	struct xlog_ticket	*tic;
>  	int			need_bytes;
> -	bool			woken_task = false;
>  
>  	list_for_each_entry(tic, &head->waiters, t_queue) {
> -
> -		/*
> -		 * There is a chance that the size of the CIL checkpoints in
> -		 * progress at the last AIL push target calculation resulted in
> -		 * limiting the target to the log head (l_last_sync_lsn) at the
> -		 * time. This may not reflect where the log head is now as the
> -		 * CIL checkpoints may have completed.
> -		 *
> -		 * Hence when we are woken here, it may be that the head of the
> -		 * log that has moved rather than the tail. As the tail didn't
> -		 * move, there still won't be space available for the
> -		 * reservation we require.  However, if the AIL has already
> -		 * pushed to the target defined by the old log head location, we
> -		 * will hang here waiting for something else to update the AIL
> -		 * push target.
> -		 *
> -		 * Therefore, if there isn't space to wake the first waiter on
> -		 * the grant head, we need to push the AIL again to ensure the
> -		 * target reflects both the current log tail and log head
> -		 * position before we wait for the tail to move again.
> -		 */
> -
>  		need_bytes = xlog_ticket_reservation(log, head, tic);
> -		if (*free_bytes < need_bytes) {
> -			if (!woken_task)
> -				xlog_grant_push_ail(log, need_bytes);
> +		if (*free_bytes < need_bytes)
>  			return false;
> -		}
>  
>  		*free_bytes -= need_bytes;
>  		trace_xfs_log_grant_wake_up(log, tic);
>  		wake_up_process(tic->t_task);
> -		woken_task = true;
>  	}
>  
>  	return true;
> @@ -296,13 +261,15 @@ xlog_grant_head_wait(
>  	do {
>  		if (xlog_is_shutdown(log))
>  			goto shutdown;
> -		xlog_grant_push_ail(log, need_bytes);
>  
>  		__set_current_state(TASK_UNINTERRUPTIBLE);
>  		spin_unlock(&head->lock);
>  
>  		XFS_STATS_INC(log->l_mp, xs_sleep_logspace);
>  
> +		/* Push on the AIL to free up all the log space. */
> +		xfs_ail_push_all(log->l_ailp);
> +
>  		trace_xfs_log_grant_sleep(log, tic);
>  		schedule();
>  		trace_xfs_log_grant_wake(log, tic);
> @@ -418,9 +385,6 @@ xfs_log_regrant(
>  	 * of rolling transactions in the log easily.
>  	 */
>  	tic->t_tid++;
> -
> -	xlog_grant_push_ail(log, tic->t_unit_res);
> -
>  	tic->t_curr_res = tic->t_unit_res;
>  	if (tic->t_cnt > 0)
>  		return 0;
> @@ -477,12 +441,7 @@ xfs_log_reserve(
>  	ASSERT(*ticp == NULL);
>  	tic = xlog_ticket_alloc(log, unit_bytes, cnt, permanent);
>  	*ticp = tic;
> -
> -	xlog_grant_push_ail(log, tic->t_cnt ? tic->t_unit_res * tic->t_cnt
> -					    : tic->t_unit_res);
> -
>  	trace_xfs_log_reserve(log, tic);
> -
>  	error = xlog_grant_head_check(log, &log->l_reserve_head, tic,
>  				      &need_bytes);
>  	if (error)
> @@ -1337,7 +1296,7 @@ xlog_assign_tail_lsn(
>   * shortcut invalidity asserts in this case so that we don't trigger them
>   * falsely.
>   */
> -STATIC int
> +int
>  xlog_space_left(
>  	struct xlog	*log,
>  	atomic64_t	*head)
> @@ -1678,89 +1637,6 @@ xlog_alloc_log(
>  	return ERR_PTR(error);
>  }	/* xlog_alloc_log */
>  
> -/*
> - * Compute the LSN that we'd need to push the log tail towards in order to have
> - * (a) enough on-disk log space to log the number of bytes specified, (b) at
> - * least 25% of the log space free, and (c) at least 256 blocks free.  If the
> - * log free space already meets all three thresholds, this function returns
> - * NULLCOMMITLSN.
> - */
> -xfs_lsn_t
> -xlog_grant_push_threshold(
> -	struct xlog	*log,
> -	int		need_bytes)
> -{
> -	xfs_lsn_t	threshold_lsn = 0;
> -	xfs_lsn_t	last_sync_lsn;
> -	int		free_blocks;
> -	int		free_bytes;
> -	int		threshold_block;
> -	int		threshold_cycle;
> -	int		free_threshold;
> -
> -	ASSERT(BTOBB(need_bytes) < log->l_logBBsize);
> -
> -	free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
> -	free_blocks = BTOBBT(free_bytes);
> -
> -	/*
> -	 * Set the threshold for the minimum number of free blocks in the
> -	 * log to the maximum of what the caller needs, one quarter of the
> -	 * log, and 256 blocks.
> -	 */
> -	free_threshold = BTOBB(need_bytes);
> -	free_threshold = max(free_threshold, (log->l_logBBsize >> 2));
> -	free_threshold = max(free_threshold, 256);
> -	if (free_blocks >= free_threshold)
> -		return NULLCOMMITLSN;
> -
> -	xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle,
> -						&threshold_block);
> -	threshold_block += free_threshold;
> -	if (threshold_block >= log->l_logBBsize) {
> -		threshold_block -= log->l_logBBsize;
> -		threshold_cycle += 1;
> -	}
> -	threshold_lsn = xlog_assign_lsn(threshold_cycle,
> -					threshold_block);
> -	/*
> -	 * Don't pass in an lsn greater than the lsn of the last
> -	 * log record known to be on disk. Use a snapshot of the last sync lsn
> -	 * so that it doesn't change between the compare and the set.
> -	 */
> -	last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
> -	if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0)
> -		threshold_lsn = last_sync_lsn;
> -
> -	return threshold_lsn;
> -}
> -
> -/*
> - * Push the tail of the log if we need to do so to maintain the free log space
> - * thresholds set out by xlog_grant_push_threshold.  We may need to adopt a
> - * policy which pushes on an lsn which is further along in the log once we
> - * reach the high water mark.  In this manner, we would be creating a low water
> - * mark.
> - */
> -STATIC void
> -xlog_grant_push_ail(
> -	struct xlog	*log,
> -	int		need_bytes)
> -{
> -	xfs_lsn_t	threshold_lsn;
> -
> -	threshold_lsn = xlog_grant_push_threshold(log, need_bytes);
> -	if (threshold_lsn == NULLCOMMITLSN || xlog_is_shutdown(log))
> -		return;
> -
> -	/*
> -	 * Get the transaction layer to kick the dirty buffers out to
> -	 * disk asynchronously. No point in trying to do this if
> -	 * the filesystem is shutting down.
> -	 */
> -	xfs_ail_push(log->l_ailp, threshold_lsn);
> -}
> -
>  /*
>   * Stamp cycle number in every block
>   */
> @@ -2725,7 +2601,6 @@ xlog_state_set_callback(
>  		return;
>  
>  	atomic64_set(&log->l_last_sync_lsn, header_lsn);
> -	xlog_grant_push_ail(log, 0);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h
> index 2728886c2963..6b6ee35b3885 100644
> --- a/fs/xfs/xfs_log.h
> +++ b/fs/xfs/xfs_log.h
> @@ -156,7 +156,6 @@ int	xfs_log_quiesce(struct xfs_mount *mp);
>  void	xfs_log_clean(struct xfs_mount *mp);
>  bool	xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t);
>  
> -xfs_lsn_t xlog_grant_push_threshold(struct xlog *log, int need_bytes);
>  bool	  xlog_force_shutdown(struct xlog *log, uint32_t shutdown_flags);
>  
>  void xlog_use_incompat_feat(struct xlog *log);
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 1bd2963e8fbd..91a8c74f4626 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -573,6 +573,8 @@ xlog_assign_grant_head(atomic64_t *head, int cycle, int space)
>  	atomic64_set(head, xlog_assign_grant_head_val(cycle, space));
>  }
>  
> +int xlog_space_left(struct xlog	 *log, atomic64_t *head);
> +
>  /*
>   * Committed Item List interfaces
>   */
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index d3a97a028560..243d6b05e5a9 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -134,25 +134,6 @@ xfs_ail_min_lsn(
>  	return lsn;
>  }
>  
> -/*
> - * Return the maximum lsn held in the AIL, or zero if the AIL is empty.
> - */
> -static xfs_lsn_t
> -xfs_ail_max_lsn(
> -	struct xfs_ail		*ailp)
> -{
> -	xfs_lsn_t       	lsn = 0;
> -	struct xfs_log_item	*lip;
> -
> -	spin_lock(&ailp->ail_lock);
> -	lip = xfs_ail_max(ailp);
> -	if (lip)
> -		lsn = lip->li_lsn;
> -	spin_unlock(&ailp->ail_lock);
> -
> -	return lsn;
> -}
> -
>  /*
>   * The cursor keeps track of where our current traversal is up to by tracking
>   * the next item in the list for us. However, for this to be safe, removing an
> @@ -414,6 +395,57 @@ xfsaild_push_item(
>  	return lip->li_ops->iop_push(lip, &ailp->ail_buf_list);
>  }
>  
> +/*
> + * Compute the LSN that we'd need to push the log tail towards in order to have
> + * at least 25% of the log space free.  If the log free space already meets this
> + * threshold, this function returns NULLCOMMITLSN.
> + */
> +xfs_lsn_t
> +__xfs_ail_push_target(
> +	struct xfs_ail		*ailp)
> +{
> +	struct xlog	*log = ailp->ail_log;
> +	xfs_lsn_t	threshold_lsn = 0;
> +	xfs_lsn_t	last_sync_lsn;
> +	int		free_blocks;
> +	int		free_bytes;
> +	int		threshold_block;
> +	int		threshold_cycle;
> +	int		free_threshold;
> +
> +	free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
> +	free_blocks = BTOBBT(free_bytes);
> +
> +	/*
> +	 * Set the threshold for the minimum number of free blocks in the
> +	 * log to the maximum of what the caller needs, one quarter of the
> +	 * log, and 256 blocks.
> +	 */
> +	free_threshold = log->l_logBBsize >> 2;
> +	if (free_blocks >= free_threshold)

What happened to the "free_threshold = max(free_threshold, 256);" from
the old code?  Or is the documented 256 block minimum no longer
necessary?

> +		return NULLCOMMITLSN;
> +
> +	xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle,
> +						&threshold_block);
> +	threshold_block += free_threshold;
> +	if (threshold_block >= log->l_logBBsize) {
> +		threshold_block -= log->l_logBBsize;
> +		threshold_cycle += 1;
> +	}
> +	threshold_lsn = xlog_assign_lsn(threshold_cycle,
> +					threshold_block);
> +	/*
> +	 * Don't pass in an lsn greater than the lsn of the last
> +	 * log record known to be on disk. Use a snapshot of the last sync lsn
> +	 * so that it doesn't change between the compare and the set.
> +	 */
> +	last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
> +	if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0)
> +		threshold_lsn = last_sync_lsn;
> +
> +	return threshold_lsn;
> +}
> +
>  static long
>  xfsaild_push(
>  	struct xfs_ail		*ailp)
> @@ -422,7 +454,7 @@ xfsaild_push(
>  	struct xfs_ail_cursor	cur;
>  	struct xfs_log_item	*lip;
>  	xfs_lsn_t		lsn;
> -	xfs_lsn_t		target;
> +	xfs_lsn_t		target = NULLCOMMITLSN;
>  	long			tout;
>  	int			stuck = 0;
>  	int			flushing = 0;
> @@ -454,21 +486,24 @@ xfsaild_push(
>  	 * capture updates that occur after the sync push waiter has gone to
>  	 * sleep.
>  	 */
> -	if (waitqueue_active(&ailp->ail_empty)) {
> +	if (test_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate) ||
> +	    waitqueue_active(&ailp->ail_empty)) {
>  		lip = xfs_ail_max(ailp);
>  		if (lip)
>  			target = lip->li_lsn;
> +		else
> +			clear_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate);
>  	} else {
> -		/* barrier matches the ail_target update in xfs_ail_push() */
> -		smp_rmb();
> -		target = ailp->ail_target;
> -		ailp->ail_target_prev = target;
> +		target = __xfs_ail_push_target(ailp);

Hmm.  So now the AIL decides how far it ought to push itself: until 25%
of the log is free if nobody's watching, or all the way to the end if
there are xfs_ail_push_all_sync waiters or OPSTATE_PUSH_ALL is set
because someone needs grant space?

So the xlog*grant* callers now merely wake up the AIL and let it push
whatever it will, instead of telling the AIL how far to push itself?
Does that mean that those grant callers might have to wait until the AIL
empties itself?

--D

>  	}
>  
> +	if (target == NULLCOMMITLSN)
> +		goto out_done;
> +
>  	/* we're done if the AIL is empty or our push has reached the end */
>  	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->ail_last_pushed_lsn);
>  	if (!lip)
> -		goto out_done;
> +		goto out_done_cursor;
>  
>  	XFS_STATS_INC(mp, xs_push_ail);
>  
> @@ -551,8 +586,9 @@ xfsaild_push(
>  		lsn = lip->li_lsn;
>  	}
>  
> -out_done:
> +out_done_cursor:
>  	xfs_trans_ail_cursor_done(&cur);
> +out_done:
>  	spin_unlock(&ailp->ail_lock);
>  
>  	if (xfs_buf_delwri_submit_nowait(&ailp->ail_buf_list))
> @@ -601,7 +637,7 @@ xfsaild(
>  	set_freezable();
>  
>  	while (1) {
> -		if (tout && tout <= 20)
> +		if (tout)
>  			set_current_state(TASK_KILLABLE);
>  		else
>  			set_current_state(TASK_INTERRUPTIBLE);
> @@ -637,21 +673,9 @@ xfsaild(
>  			break;
>  		}
>  
> +		/* Idle if the AIL is empty. */
>  		spin_lock(&ailp->ail_lock);
> -
> -		/*
> -		 * Idle if the AIL is empty and we are not racing with a target
> -		 * update. We check the AIL after we set the task to a sleep
> -		 * state to guarantee that we either catch an ail_target update
> -		 * or that a wake_up resets the state to TASK_RUNNING.
> -		 * Otherwise, we run the risk of sleeping indefinitely.
> -		 *
> -		 * The barrier matches the ail_target update in xfs_ail_push().
> -		 */
> -		smp_rmb();
> -		if (!xfs_ail_min(ailp) &&
> -		    ailp->ail_target == ailp->ail_target_prev &&
> -		    list_empty(&ailp->ail_buf_list)) {
> +		if (!xfs_ail_min(ailp) && list_empty(&ailp->ail_buf_list)) {
>  			spin_unlock(&ailp->ail_lock);
>  			freezable_schedule();
>  			tout = 0;
> @@ -673,56 +697,6 @@ xfsaild(
>  	return 0;
>  }
>  
> -/*
> - * This routine is called to move the tail of the AIL forward.  It does this by
> - * trying to flush items in the AIL whose lsns are below the given
> - * threshold_lsn.
> - *
> - * The push is run asynchronously in a workqueue, which means the caller needs
> - * to handle waiting on the async flush for space to become available.
> - * We don't want to interrupt any push that is in progress, hence we only queue
> - * work if we set the pushing bit appropriately.
> - *
> - * We do this unlocked - we only need to know whether there is anything in the
> - * AIL at the time we are called. We don't need to access the contents of
> - * any of the objects, so the lock is not needed.
> - */
> -void
> -xfs_ail_push(
> -	struct xfs_ail		*ailp,
> -	xfs_lsn_t		threshold_lsn)
> -{
> -	struct xfs_log_item	*lip;
> -
> -	lip = xfs_ail_min(ailp);
> -	if (!lip || xlog_is_shutdown(ailp->ail_log) ||
> -	    XFS_LSN_CMP(threshold_lsn, ailp->ail_target) <= 0)
> -		return;
> -
> -	/*
> -	 * Ensure that the new target is noticed in push code before it clears
> -	 * the XFS_AIL_PUSHING_BIT.
> -	 */
> -	smp_wmb();
> -	xfs_trans_ail_copy_lsn(ailp, &ailp->ail_target, &threshold_lsn);
> -	smp_wmb();
> -
> -	wake_up_process(ailp->ail_task);
> -}
> -
> -/*
> - * Push out all items in the AIL immediately
> - */
> -void
> -xfs_ail_push_all(
> -	struct xfs_ail  *ailp)
> -{
> -	xfs_lsn_t       threshold_lsn = xfs_ail_max_lsn(ailp);
> -
> -	if (threshold_lsn)
> -		xfs_ail_push(ailp, threshold_lsn);
> -}
> -
>  /*
>   * Push out all items in the AIL immediately and wait until the AIL is empty.
>   */
> @@ -828,6 +802,13 @@ xfs_trans_ail_update_bulk(
>  	if (!list_empty(&tmp))
>  		xfs_ail_splice(ailp, cur, &tmp, lsn);
>  
> +	/*
> +	 * If this is the first insert, wake up the push daemon so it can
> +	 * actively scan for items to push.
> +	 */
> +	if (!mlip)
> +		wake_up_process(ailp->ail_task);
> +
>  	xfs_ail_update_finish(ailp, tail_lsn);
>  }
>  
> diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
> index 52a45f0a5ef1..9a131e7fae94 100644
> --- a/fs/xfs/xfs_trans_priv.h
> +++ b/fs/xfs/xfs_trans_priv.h
> @@ -52,16 +52,18 @@ struct xfs_ail {
>  	struct xlog		*ail_log;
>  	struct task_struct	*ail_task;
>  	struct list_head	ail_head;
> -	xfs_lsn_t		ail_target;
> -	xfs_lsn_t		ail_target_prev;
>  	struct list_head	ail_cursors;
>  	spinlock_t		ail_lock;
>  	xfs_lsn_t		ail_last_pushed_lsn;
>  	int			ail_log_flush;
> +	unsigned long		ail_opstate;
>  	struct list_head	ail_buf_list;
>  	wait_queue_head_t	ail_empty;
>  };
>  
> +/* Push all items out of the AIL immediately. */
> +#define XFS_AIL_OPSTATE_PUSH_ALL	0u
> +
>  /*
>   * From xfs_trans_ail.c
>   */
> @@ -98,10 +100,29 @@ void xfs_ail_update_finish(struct xfs_ail *ailp, xfs_lsn_t old_lsn)
>  			__releases(ailp->ail_lock);
>  void xfs_trans_ail_delete(struct xfs_log_item *lip, int shutdown_type);
>  
> -void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);
> -void			xfs_ail_push_all(struct xfs_ail *);
> -void			xfs_ail_push_all_sync(struct xfs_ail *);
> -struct xfs_log_item	*xfs_ail_min(struct xfs_ail  *ailp);
> +static inline void xfs_ail_push(struct xfs_ail *ailp)
> +{
> +	wake_up_process(ailp->ail_task);
> +}
> +
> +static inline void xfs_ail_push_all(struct xfs_ail *ailp)
> +{
> +	if (!test_and_set_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate))
> +		xfs_ail_push(ailp);
> +}
> +
> +xfs_lsn_t		__xfs_ail_push_target(struct xfs_ail *ailp);
> +static inline xfs_lsn_t xfs_ail_push_target(struct xfs_ail *ailp)
> +{
> +	xfs_lsn_t	lsn;
> +
> +	spin_lock(&ailp->ail_lock);
> +	lsn = __xfs_ail_push_target(ailp);
> +	spin_unlock(&ailp->ail_lock);
> +	return lsn;
> +}
> +
> +void			xfs_ail_push_all_sync(struct xfs_ail *ailp);
>  xfs_lsn_t		xfs_ail_min_lsn(struct xfs_ail *ailp);
>  
>  struct xfs_log_item *	xfs_trans_ail_cursor_first(struct xfs_ail *ailp,
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/9] xfs: background AIL push targets physical space, not grant space
  2022-08-09 23:03 ` [PATCH 3/9] xfs: background AIL push targets physical space, not grant space Dave Chinner
@ 2022-08-22 19:00   ` Darrick J. Wong
  2022-08-23  2:01     ` Dave Chinner
  2022-09-07 14:04   ` Christoph Hellwig
  1 sibling, 1 reply; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-22 19:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Aug 10, 2022 at 09:03:47AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently the AIL attempts to keep 25% of the "log space" free,
> where the current used space is tracked by the reserve grant head.
> That is, it tracks both physical space used plus the amount reserved
> by transactions in progress.
> 
> When we start tail pushing, we are trying to make space for new
> reservations by writing back older metadata and the log is generally
> physically full of dirty metadata, and reservations for modifications
> in flight take up whatever space the AIL can physically free up.
> 
> Hence we don't really need to take into account the reservation
> space that has been used - we just need to keep the log tail moving
> as fast as we can to free up space for more reservations to be made.
> We know exactly how much physical space the journal is consuming in
> the AIL (i.e. max LSN - min LSN) so we can base push thresholds
> directly on this state rather than have to look at grant head
> reservations to determine how much to physically push out of the
> log.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Makes sense, I think.  Though I was wondering about the last patch --
pushing the AIL until it's empty when a trans_alloc can't find grant
reservation could take a while on slow storage.  Does this mean that
we're trading the incremental freeing-up of the existing code for
potentially higher transaction allocation latency in the hopes that more
threads can get reservation?  Or do the "keep the AIL going" bits make
up for that?

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D
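
As a standalone illustration of the "max LSN - min LSN" byte accounting this
patch bases the push threshold on, here is a userspace sketch of the
xlog_lsn_sub() helper quoted below; all names are invented and it is not
kernel code, just the same cycle/block arithmetic:

/* demo_lsn_sub.c - compile with: cc -o demo_lsn_sub demo_lsn_sub.c */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_BBSIZE	512ULL			/* basic block size in bytes */
#define DEMO_LSN(c, b)	(((uint64_t)(c) << 32) | (b))
#define DEMO_CYCLE(l)	((uint32_t)((l) >> 32))
#define DEMO_BLOCK(l)	((uint32_t)(l))

/* byte distance from low up to high, allowing high to be one cycle ahead */
static uint64_t demo_lsn_sub(uint64_t logsize, uint64_t high, uint64_t low)
{
	if (DEMO_CYCLE(high) == DEMO_CYCLE(low))
		return (DEMO_BLOCK(high) - DEMO_BLOCK(low)) * DEMO_BBSIZE;
	assert(DEMO_CYCLE(high) == DEMO_CYCLE(low) + 1);
	return logsize - (DEMO_BLOCK(low) - DEMO_BLOCK(high)) * DEMO_BBSIZE;
}

int main(void)
{
	uint64_t logsize = 64ULL << 20;		/* 64MB journal */
	uint64_t tail = DEMO_LSN(7, 100000);	/* oldest item in the AIL */
	uint64_t head = DEMO_LSN(8, 20000);	/* newest item, wrapped a cycle */
	uint64_t used = demo_lsn_sub(logsize, head, tail);

	printf("used %llu bytes, free %llu bytes, below 25%% free: %s\n",
			(unsigned long long)used,
			(unsigned long long)(logsize - used),
			(logsize - used) < (logsize >> 2) ? "yes" : "no");
	return 0;
}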

> ---
>  fs/xfs/xfs_log_priv.h  | 18 ++++++++++++
>  fs/xfs/xfs_trans_ail.c | 67 +++++++++++++++++++-----------------------
>  2 files changed, 49 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 91a8c74f4626..9f8c601a302b 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -622,6 +622,24 @@ xlog_wait(
>  
>  int xlog_wait_on_iclog(struct xlog_in_core *iclog);
>  
> +/* Calculate the distance between two LSNs in bytes */
> +static inline uint64_t
> +xlog_lsn_sub(
> +	struct xlog	*log,
> +	xfs_lsn_t	high,
> +	xfs_lsn_t	low)
> +{
> +	uint32_t	hi_cycle = CYCLE_LSN(high);
> +	uint32_t	hi_block = BLOCK_LSN(high);
> +	uint32_t	lo_cycle = CYCLE_LSN(low);
> +	uint32_t	lo_block = BLOCK_LSN(low);
> +
> +	if (hi_cycle == lo_cycle)
> +	       return BBTOB(hi_block - lo_block);
> +	ASSERT((hi_cycle == lo_cycle + 1) || xlog_is_shutdown(log));
> +	return (uint64_t)log->l_logsize - BBTOB(lo_block - hi_block);
> +}
> +
>  /*
>   * The LSN is valid so long as it is behind the current LSN. If it isn't, this
>   * means that the next log record that includes this metadata could have a
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 243d6b05e5a9..d3dcb4942d6a 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -398,52 +398,47 @@ xfsaild_push_item(
>  /*
>   * Compute the LSN that we'd need to push the log tail towards in order to have
>   * at least 25% of the log space free.  If the log free space already meets this
> - * threshold, this function returns NULLCOMMITLSN.
> + * threshold, this function returns the lowest LSN in the AIL to slowly keep
> + * writeback ticking over and the tail of the log moving forward.
>   */
>  xfs_lsn_t
>  __xfs_ail_push_target(
>  	struct xfs_ail		*ailp)
>  {
> -	struct xlog	*log = ailp->ail_log;
> -	xfs_lsn_t	threshold_lsn = 0;
> -	xfs_lsn_t	last_sync_lsn;
> -	int		free_blocks;
> -	int		free_bytes;
> -	int		threshold_block;
> -	int		threshold_cycle;
> -	int		free_threshold;
> -
> -	free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
> -	free_blocks = BTOBBT(free_bytes);
> +	struct xlog		*log = ailp->ail_log;
> +	struct xfs_log_item	*lip;
>  
> -	/*
> -	 * Set the threshold for the minimum number of free blocks in the
> -	 * log to the maximum of what the caller needs, one quarter of the
> -	 * log, and 256 blocks.
> -	 */
> -	free_threshold = log->l_logBBsize >> 2;
> -	if (free_blocks >= free_threshold)
> +	xfs_lsn_t	target_lsn = 0;
> +	xfs_lsn_t	max_lsn;
> +	xfs_lsn_t	min_lsn;
> +	int32_t		free_bytes;
> +	uint32_t	target_block;
> +	uint32_t	target_cycle;
> +
> +	lockdep_assert_held(&ailp->ail_lock);
> +
> +	lip = xfs_ail_max(ailp);
> +	if (!lip)
> +		return NULLCOMMITLSN;
> +	max_lsn = lip->li_lsn;
> +	min_lsn = __xfs_ail_min_lsn(ailp);
> +
> +	free_bytes = log->l_logsize - xlog_lsn_sub(log, max_lsn, min_lsn);
> +	if (free_bytes >= log->l_logsize >> 2)
>  		return NULLCOMMITLSN;
>  
> -	xlog_crack_atomic_lsn(&log->l_tail_lsn, &threshold_cycle,
> -						&threshold_block);
> -	threshold_block += free_threshold;
> -	if (threshold_block >= log->l_logBBsize) {
> -		threshold_block -= log->l_logBBsize;
> -		threshold_cycle += 1;
> +	target_cycle = CYCLE_LSN(min_lsn);
> +	target_block = BLOCK_LSN(min_lsn) + (log->l_logBBsize >> 2);
> +	if (target_block >= log->l_logBBsize) {
> +		target_block -= log->l_logBBsize;
> +		target_cycle += 1;
>  	}
> -	threshold_lsn = xlog_assign_lsn(threshold_cycle,
> -					threshold_block);
> -	/*
> -	 * Don't pass in an lsn greater than the lsn of the last
> -	 * log record known to be on disk. Use a snapshot of the last sync lsn
> -	 * so that it doesn't change between the compare and the set.
> -	 */
> -	last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
> -	if (XFS_LSN_CMP(threshold_lsn, last_sync_lsn) > 0)
> -		threshold_lsn = last_sync_lsn;
> +	target_lsn = xlog_assign_lsn(target_cycle, target_block);
>  
> -	return threshold_lsn;
> +	/* Cap the target to the highest LSN known to be in the AIL. */
> +	if (XFS_LSN_CMP(target_lsn, max_lsn) > 0)
> +		return max_lsn;
> +	return target_lsn;
>  }
>  
>  static long
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/9] xfs: ensure log tail is always up to date
  2022-08-09 23:03 ` [PATCH 4/9] xfs: ensure log tail is always up to date Dave Chinner
@ 2022-08-23  0:33   ` Darrick J. Wong
  2022-08-23  2:18     ` Dave Chinner
  2022-09-07 14:06   ` Christoph Hellwig
  1 sibling, 1 reply; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-23  0:33 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Aug 10, 2022 at 09:03:48AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Whenever we write an iclog, we call xlog_assign_tail_lsn() to update
> the current tail before we write it into the iclog header. This
> means we have to take the AIL lock on every iclog write just to
> check if the tail of the log has moved.
> 
> This doesn't avoid races with log tail updates - the log tail could
> move immediately after we assign the tail to the iclog header and
> hence by the time the iclog reaches stable storage the tail LSN has
> moved forward in memory. Hence the log tail LSN in the iclog header
> is really just a point in time snapshot of the current state of the
> AIL.
> 
> With this in mind, if we simply update the in memory log->l_tail_lsn
> every time it changes in the AIL, there is no need to update the in
> memory value when we are writing it into an iclog - it will already
> be up-to-date in memory and checking the AIL again will not change
> this.

This is too subtle for me to understand -- does the codebase
already update l_tail_lsn?  Does this patch make it do that?

--D

> Hence xlog_state_release_iclog() does not need to check the
> AIL to update the tail lsn and can just sample it directly without
> needing to take the AIL lock.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c       |  5 ++---
>  fs/xfs/xfs_trans_ail.c | 17 +++++++++++++++--
>  2 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index c609c188bd8a..042744fe37b7 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -530,7 +530,6 @@ xlog_state_release_iclog(
>  	struct xlog_in_core	*iclog,
>  	struct xlog_ticket	*ticket)
>  {
> -	xfs_lsn_t		tail_lsn;
>  	bool			last_ref;
>  
>  	lockdep_assert_held(&log->l_icloglock);
> @@ -545,8 +544,8 @@ xlog_state_release_iclog(
>  	if ((iclog->ic_state == XLOG_STATE_WANT_SYNC ||
>  	     (iclog->ic_flags & XLOG_ICL_NEED_FUA)) &&
>  	    !iclog->ic_header.h_tail_lsn) {
> -		tail_lsn = xlog_assign_tail_lsn(log->l_mp);
> -		iclog->ic_header.h_tail_lsn = cpu_to_be64(tail_lsn);
> +		iclog->ic_header.h_tail_lsn =
> +				cpu_to_be64(atomic64_read(&log->l_tail_lsn));
>  	}
>  
>  	last_ref = atomic_dec_and_test(&iclog->ic_refcnt);
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index d3dcb4942d6a..5f40509877f7 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -715,6 +715,13 @@ xfs_ail_push_all_sync(
>  	finish_wait(&ailp->ail_empty, &wait);
>  }
>  
> +/*
> + * Callers should pass the the original tail lsn so that we can detect if the
> + * tail has moved as a result of the operation that was performed. If the caller
> + * needs to force a tail LSN update, it should pass NULLCOMMITLSN to bypass the
> + * "did the tail LSN change?" checks. If the caller wants to avoid a tail update
> + * (e.g. it knows the tail did not change) it should pass an @old_lsn of 0.
> + */
>  void
>  xfs_ail_update_finish(
>  	struct xfs_ail		*ailp,
> @@ -799,10 +806,16 @@ xfs_trans_ail_update_bulk(
>  
>  	/*
>  	 * If this is the first insert, wake up the push daemon so it can
> -	 * actively scan for items to push.
> +	 * actively scan for items to push. We also need to do a log tail
> +	 * LSN update to ensure that it is correctly tracked by the log, so
> +	 * set the tail_lsn to NULLCOMMITLSN so that xfs_ail_update_finish()
> +	 * will see that the tail lsn has changed and will update the tail
> +	 * appropriately.
>  	 */
> -	if (!mlip)
> +	if (!mlip) {
>  		wake_up_process(ailp->ail_task);
> +		tail_lsn = NULLCOMMITLSN;
> +	}
>  
>  	xfs_ail_update_finish(ailp, tail_lsn);
>  }
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/9] xfs: AIL doesn't need manual pushing
  2022-08-22 17:08   ` Darrick J. Wong
@ 2022-08-23  1:51     ` Dave Chinner
  2022-08-26 15:46       ` Darrick J. Wong
  0 siblings, 1 reply; 39+ messages in thread
From: Dave Chinner @ 2022-08-23  1:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Aug 22, 2022 at 10:08:04AM -0700, Darrick J. Wong wrote:
> On Wed, Aug 10, 2022 at 09:03:46AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > We have a mechanism that checks the amount of log space remaining
> > available every time we make a transaction reservation. If the
> > amount of space is below a threshold (25% free) we push on the AIL
> > to tell it to do more work. To do this, we end up calculating the
> > LSN that the AIL needs to push to on every reservation and updating
> > the push target for the AIL with that new target LSN.
> > 
> > This is silly and expensive. The AIL is perfectly capable of
> > calculating the push target itself, and it will always be running
> > when the AIL contains objects.
> > 
> > Modify the AIL to calculate its 25% push target before it starts a
> > push using the same reserve grant head based calculation as is
> > currently used, and remove all the places where we ask the AIL to
> > push to a new 25% free target.
.....
> > @@ -414,6 +395,57 @@ xfsaild_push_item(
> >  	return lip->li_ops->iop_push(lip, &ailp->ail_buf_list);
> >  }
> >  
> > +/*
> > + * Compute the LSN that we'd need to push the log tail towards in order to have
> > + * at least 25% of the log space free.  If the log free space already meets this
> > + * threshold, this function returns NULLCOMMITLSN.
> > + */
> > +xfs_lsn_t
> > +__xfs_ail_push_target(
> > +	struct xfs_ail		*ailp)
> > +{
> > +	struct xlog	*log = ailp->ail_log;
> > +	xfs_lsn_t	threshold_lsn = 0;
> > +	xfs_lsn_t	last_sync_lsn;
> > +	int		free_blocks;
> > +	int		free_bytes;
> > +	int		threshold_block;
> > +	int		threshold_cycle;
> > +	int		free_threshold;
> > +
> > +	free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
> > +	free_blocks = BTOBBT(free_bytes);
> > +
> > +	/*
> > +	 * Set the threshold for the minimum number of free blocks in the
> > +	 * log to the maximum of what the caller needs, one quarter of the
> > +	 * log, and 256 blocks.
> > +	 */
> > +	free_threshold = log->l_logBBsize >> 2;
> > +	if (free_blocks >= free_threshold)
> 
> What happened to the "free_threshold = max(free_threshold, 256);" from
> the old code?  Or is the documented 256 block minimum no longer
> necessary?

Oh, I must have dropped the comment change when fixing the last
round of rebase conflicts. The minimum of 256 blocks is largely
useless because the even the smallest logs we create on tiny
filesystems are around 1000 filesystem blocks in size. So a minimum
free threshold of 128kB (256 BBs) is always going to be less than
one quarter the size of the journal....
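
(Rough numbers for illustration, assuming 4kB filesystem blocks: a ~1000
block log is ~4MB, i.e. ~8000 basic blocks, so the one quarter threshold is
already ~2000 BBs (~1MB), roughly eight times the old 256 BB / 128kB floor.)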


> > @@ -454,21 +486,24 @@ xfsaild_push(
> >  	 * capture updates that occur after the sync push waiter has gone to
> >  	 * sleep.
> >  	 */
> > -	if (waitqueue_active(&ailp->ail_empty)) {
> > +	if (test_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate) ||
> > +	    waitqueue_active(&ailp->ail_empty)) {
> >  		lip = xfs_ail_max(ailp);
> >  		if (lip)
> >  			target = lip->li_lsn;
> > +		else
> > +			clear_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate);
> >  	} else {
> > -		/* barrier matches the ail_target update in xfs_ail_push() */
> > -		smp_rmb();
> > -		target = ailp->ail_target;
> > -		ailp->ail_target_prev = target;
> > +		target = __xfs_ail_push_target(ailp);
> 
> Hmm.  So now the AIL decides how far it ought to push itself: until 25%
> of the log is free if nobody's watching, or all the way to the end if
> there are xfs_ail_push_all_sync waiters or OPSTATE_PUSH_ALL is set
> because someone needs grant space?

Kind of. What the target does is determine if the AIL needs to do
any work before it goes back to sleep. If we haven't run out of
reservation space or memory (or some other push all trigger), it
will simply go back to sleep for a while if there is more than 25%
of the journal space free without doing anything.

If there are items in the AIL at a lower LSN than the target, it
will try to push up to the target or to the point of getting stuck
before going back to sleep and trying again soon after.

If the OPSTATE_PUSH_ALL flag is set, it will keep updating the
push target until the log is empty every time it loops. This is
slightly different behaviour to the existing "push all" code which
selects a LSN to push towards and it doesn't try to push beyond that
even if new items are inserted into the AIL after the push_all has
been triggered.

However, because push_all_sync() effectively waits until the AIL is
empty (i.e. keep looping updating the push target until the AIL is
empty), and async pushes never wait for it to complete, there is no
practical difference between the current implementation and this
one.

> So the xlog*grant* callers now merely wake up the AIL and let it push
> whatever it will, instead of telling the AIL how far to push itself?

Yes.

> Does that mean that those grant callers might have to wait until the AIL
> empties itself?

No. The moment the log tail moves forward because of a removal from
the tail of the AIL via xfs_ail_update_finish(), we call
xlog_assign_tail_lsn_locked() to move the l_tail_lsn forwards and
make grant space available, then we call xfs_log_space_wake() to
wake up any grant waiters that are waiting on the space to be made
available.
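
(Sketching the call chain as it stands with this series, from memory, so
treat the exact call sites as approximate: xfs_trans_ail_delete() removes the
item and calls xfs_ail_update_finish(); when the removal changed the minimum
LSN in the AIL, that calls xlog_assign_tail_lsn_locked() and then
xfs_log_space_wake(), which wakes the tickets sleeping on the grant head
waiter queues.)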

The reason for using the "push all" when grant space runs out is
that we can run out of grant space when there is more than 25% of
the log free. Small logs are notorious for this, and we have a hack
in the log callback code (xlog_state_set_callback()) where we push
the AIL because the *head* moved, to ensure that we kick the AIL
when we consume space in it, because that can push us over the "less
than 25% available" threshold that starts tail pushing back up
again.

Hence when we run out of grant space and are going to sleep, we have
to consider that the grant space may be consuming almost all the log
space and there is almost nothing in the AIL. In this situation, the
AIL pins the tail and moving the tail forwards is the only way the
grant space will become available, so we have to force the AIL to push
everything to guarantee grant space will eventually be returned.
Hence triggering a "push all" just before sleeping removes all the
nasty corner cases we have in other parts of the code that work
around the "we didn't ask the AIL to push enough to free grant
space" condition that leads to log space hangs...
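
The trigger itself is tiny - something like the sketch below. The
helper name is made up; in the series the grant code effectively does
this just before it sleeps in xlog_grant_head_wait() waiting for
xfs_log_space_wake() to fire:

/* Illustrative sketch only: ask the AIL to push everything. */
static void
grant_trigger_push_all(
	struct xfs_ail		*ailp)
{
	set_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate);
	wake_up_process(ailp->ail_task);
}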

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/9] xfs: background AIL push targets physical space, not grant space
  2022-08-22 19:00   ` Darrick J. Wong
@ 2022-08-23  2:01     ` Dave Chinner
  2022-08-26 15:47       ` Darrick J. Wong
  0 siblings, 1 reply; 39+ messages in thread
From: Dave Chinner @ 2022-08-23  2:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Aug 22, 2022 at 12:00:03PM -0700, Darrick J. Wong wrote:
> On Wed, Aug 10, 2022 at 09:03:47AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Currently the AIL attempts to keep 25% of the "log space" free,
> > where the current used space is tracked by the reserve grant head.
> > That is, it tracks both physical space used plus the amount reserved
> > by transactions in progress.
> > 
> > When we start tail pushing, we are trying to make space for new
> > reservations by writing back older metadata and the log is generally
> > physically full of dirty metadata, and reservations for modifications
> > in flight take up whatever space the AIL can physically free up.
> > 
> > Hence we don't really need to take into account the reservation
> > space that has been used - we just need to keep the log tail moving
> > as fast as we can to free up space for more reservations to be made.
> > We know exactly how much physical space the journal is consuming in
> > the AIL (i.e. max LSN - min LSN) so we can base push thresholds
> > directly on this state rather than have to look at grant head
> > reservations to determine how much to physically push out of the
> > log.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Makes sense, I think.  Though I was wondering about the last patch --
> pushing the AIL until it's empty when a trans_alloc can't find grant
> reservation could take a while on a slow storage.

The push in the grant reservation code is not a blocking push - it
just tells the AIL to start pushing everything, then it goes to
sleep waiting for the tail to move and space to become available. The
AIL behaviour is largely unchanged, especially if the application is
running under even slight memory pressure as the inode shrinker will
repeatedly kick the AIL push-all trigger regardless of consumed
journal/grant space.

> Does this mean that
> we're trading the incremental freeing-up of the existing code for
> potentially higher transaction allocation latency in the hopes that more
> threads can get reservation?  Or does the "keep the AIL going" bits make
> up for that?

So far I've typically measured slightly lower worst case latencies
with this mechanism than with the existing "repeatedly push to 25%
free" that we currently have. It's not really significant enough to
make statements about (unlike cpu usage reductions or perf
increases), but it does seem to be a bit better...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/9] xfs: ensure log tail is always up to date
  2022-08-23  0:33   ` Darrick J. Wong
@ 2022-08-23  2:18     ` Dave Chinner
  2022-08-26 21:39       ` Darrick J. Wong
  0 siblings, 1 reply; 39+ messages in thread
From: Dave Chinner @ 2022-08-23  2:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Aug 22, 2022 at 05:33:19PM -0700, Darrick J. Wong wrote:
> On Wed, Aug 10, 2022 at 09:03:48AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Whenever we write an iclog, we call xlog_assign_tail_lsn() to update
> > the current tail before we write it into the iclog header. This
> > means we have to take the AIL lock on every iclog write just to
> > check if the tail of the log has moved.
> > 
> > This doesn't avoid races with log tail updates - the log tail could
> > move immediately after we assign the tail to the iclog header and
> > hence by the time the iclog reaches stable storage the tail LSN has
> > moved forward in memory. Hence the log tail LSN in the iclog header
> > is really just a point in time snapshot of the current state of the
> > AIL.
> > 
> > With this in mind, if we simply update the in memory log->l_tail_lsn
> > every time it changes in the AIL, there is no need to update the in
> > memory value when we are writing it into an iclog - it will already
> > be up-to-date in memory and checking the AIL again will not change
> > this.
> 
> This is too subtle for me to understand -- does the codebase
> already update l_tail_lsn?  Does this patch make it do that?

tl;dr: if the AIL is empty, log->l_tail_lsn is not updated on the
first insert of a new item into the AIL, and hence is stale.
xlog_state_release_iclog() currently works around that by calling
xlog_assign_tail_lsn() to get the tail lsn from the AIL. This change
makes sure log->l_tail_lsn is always up to date.

In more detail:

The tail update occurs in xfs_ail_update_finish(), but only if we
pass in a non-zero tail_lsn. xfs_trans_ail_update_bulk() will only
set a non-zero tail_lsn if it moves the log item at the tail of the
log (i.e. we relog the tail item and move it forwards in the AIL).

Hence if we pass a non-zero tail_lsn to xfs_ail_update_finish(), it
indicates it needs to check it against the LSN of the item currently
at the tail of the AIL. If the tail LSN has not changed, we do
nothing, if it has changed, then we call
xlog_assign_tail_lsn_locked() to update the log tail.

The problem with the current code is that if the AIL is empty when
we insert the first item, we've actually moved the log tail but we
do not update the log tail (i.e. tail_lsn is zero in this case). If
we then release an iclog for writing at this point in time, the tail
lsn it writes into the iclog header would be wrong - it does not
reflect the log tail as defined by the AIL and the checkpoint that
has just been committed.

Hence xlog_state_release_iclog() called xlog_assign_tail_lsn() to
ensure that it checked that the tail LSN it applies to the iclog
reflects the current state of the AIL. i.e. it checks if there is an
item in the AIL, and if so, grabs the tail_lsn from the AIL. This
works around the fact the AIL doesn't update the log tail on the
first insert.

Hence what this patch does is have xfs_trans_ail_update_bulk set
the tail_lsn passed to xfs_ail_update_finish() to NULLCOMMITLSN when
it does the first insert into the AIL. NULLCOMMITLSN is a
non-zero value that won't match with the LSN of items we just
inserted into the AIL, and hence xfs_ail_update_finish() will go and
update the log tail in this case.

Hence we close the hole when the log->l_tail_lsn is incorrect after
the first insert into the AIL, and hence we no longer need to update
the log->l_tail_lsn when reading it into the iclog header -
log->l_tail_lsn is always up to date, and so we can now just read it
in xlog_state_release_iclog() rather than having to grab the AIL
lock and check the AIL to update log->l_tail_lsn with the correct
tail value from iclog IO submission....
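
For reference, here is a minimal sketch of the shape of that change.
It is not the verbatim patch hunk - the wrapper and variable names are
illustrative, and locking and the actual splice of items into the AIL
are elided:

/* Sketch of the tail_lsn selection done in xfs_trans_ail_update_bulk(). */
static void
ail_insert_sketch(
	struct xfs_ail		*ailp,
	struct xfs_log_item	*lip)
{
	struct xfs_log_item	*mlip = xfs_ail_min(ailp);	/* current tail item */
	xfs_lsn_t		tail_lsn = 0;

	if (!mlip) {
		/*
		 * First insert into an empty AIL: the log tail moves, but
		 * there is no old tail item LSN to pass on. NULLCOMMITLSN
		 * is non-zero and cannot match anything we insert, so
		 * xfs_ail_update_finish() treats the tail as changed and
		 * updates log->l_tail_lsn.
		 */
		tail_lsn = NULLCOMMITLSN;
	} else if (mlip == lip) {
		/* Relogging the tail item itself also moves the tail. */
		tail_lsn = mlip->li_lsn;
	}

	/* ... splice lip into the AIL at its new LSN ... */

	xfs_ail_update_finish(ailp, tail_lsn);
}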

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/9] xfs: AIL doesn't need manual pushing
  2022-08-23  1:51     ` Dave Chinner
@ 2022-08-26 15:46       ` Darrick J. Wong
  0 siblings, 0 replies; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-26 15:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Aug 23, 2022 at 11:51:56AM +1000, Dave Chinner wrote:
> On Mon, Aug 22, 2022 at 10:08:04AM -0700, Darrick J. Wong wrote:
> > On Wed, Aug 10, 2022 at 09:03:46AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > We have a mechanism that checks the amount of log space remaining
> > > available every time we make a transaction reservation. If the
> > > amount of space is below a threshold (25% free) we push on the AIL
> > > to tell it to do more work. To do this, we end up calculating the
> > > LSN that the AIL needs to push to on every reservation and updating
> > > the push target for the AIL with that new target LSN.
> > > 
> > > This is silly and expensive. The AIL is perfectly capable of
> > > calculating the push target itself, and it will always be running
> > > when the AIL contains objects.
> > > 
> > > Modify the AIL to calculate it's 25% push target before it starts a
> > > push using the same reserve grant head based calculation as is
> > > currently used, and remove all the places where we ask the AIL to
> > > push to a new 25% free target.
> .....
> > > @@ -414,6 +395,57 @@ xfsaild_push_item(
> > >  	return lip->li_ops->iop_push(lip, &ailp->ail_buf_list);
> > >  }
> > >  
> > > +/*
> > > + * Compute the LSN that we'd need to push the log tail towards in order to have
> > > + * at least 25% of the log space free.  If the log free space already meets this
> > > + * threshold, this function returns NULLCOMMITLSN.
> > > + */
> > > +xfs_lsn_t
> > > +__xfs_ail_push_target(
> > > +	struct xfs_ail		*ailp)
> > > +{
> > > +	struct xlog	*log = ailp->ail_log;
> > > +	xfs_lsn_t	threshold_lsn = 0;
> > > +	xfs_lsn_t	last_sync_lsn;
> > > +	int		free_blocks;
> > > +	int		free_bytes;
> > > +	int		threshold_block;
> > > +	int		threshold_cycle;
> > > +	int		free_threshold;
> > > +
> > > +	free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
> > > +	free_blocks = BTOBBT(free_bytes);
> > > +
> > > +	/*
> > > +	 * Set the threshold for the minimum number of free blocks in the
> > > +	 * log to the maximum of what the caller needs, one quarter of the
> > > +	 * log, and 256 blocks.
> > > +	 */
> > > +	free_threshold = log->l_logBBsize >> 2;
> > > +	if (free_blocks >= free_threshold)
> > 
> > What happened to the "free_threshold = max(free_threshold, 256);" from
> > the old code?  Or is the documented 256 block minimum no longer
> > necessary?
> 
> Oh, I must have dropped the comment change when fixing the last
> round of rebase conflicts. The minimum of 256 blocks is largely
> useless because even the smallest logs we create on tiny
> filesystems are around 1000 filesystem blocks in size. So a minimum
> free threshold of 128kB (256 BBs) is always going to be less than
> one quarter the size of the journal....

<nod> And even more pointless now that we've effectively mandated 64M
logs for all new filesystems.

> 
> > > @@ -454,21 +486,24 @@ xfsaild_push(
> > >  	 * capture updates that occur after the sync push waiter has gone to
> > >  	 * sleep.
> > >  	 */
> > > -	if (waitqueue_active(&ailp->ail_empty)) {
> > > +	if (test_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate) ||
> > > +	    waitqueue_active(&ailp->ail_empty)) {
> > >  		lip = xfs_ail_max(ailp);
> > >  		if (lip)
> > >  			target = lip->li_lsn;
> > > +		else
> > > +			clear_bit(XFS_AIL_OPSTATE_PUSH_ALL, &ailp->ail_opstate);
> > >  	} else {
> > > -		/* barrier matches the ail_target update in xfs_ail_push() */
> > > -		smp_rmb();
> > > -		target = ailp->ail_target;
> > > -		ailp->ail_target_prev = target;
> > > +		target = __xfs_ail_push_target(ailp);
> > 
> > Hmm.  So now the AIL decides how far it ought to push itself: until 25%
> > of the log is free if nobody's watching, or all the way to the end if
> > there are xfs_ail_push_all_sync waiters or OPSTATE_PUSH_ALL is set
> > because someone needs grant space?
> 
> Kind of. What the target does is determine if the AIL needs to do
> any work before it goes back to sleep. If we haven't run out of
> reservation space or memory (or some other push all trigger), it
> will simply go back to sleep for a while without doing anything if
> there is more than 25% of the journal space free.
> 
> If there are items in the AIL at a lower LSN than the target, it
> will try to push up to the target or to the point of getting stuck
> before going back to sleep and trying again soon after.
> 
> If the OPSTATE_PUSH_ALL flag is set, it will keep updating the
> push target every time it loops until the AIL is empty. This is
> slightly different behaviour to the existing "push all" code, which
> selects an LSN to push towards and doesn't try to push beyond that
> even if new items are inserted into the AIL after the push_all has
> been triggered.

<nod> Ok, that's what I thought I was seeing -- the target is now a
little more dynamic, which means a "push all" will be more aggressive,
with perhaps fewer latency spikes later.

> However, because push_all_sync() effectively waits until the AIL is
> empty (i.e. it keeps looping and updating the push target until the AIL is
> empty), and async pushes never wait for it to complete, there is no
> practical difference between the current implementation and this
> one.
> 
> > So the xlog*grant* callers now merely wake up the AIL and let push
> > whatever it will, instead of telling the AIL how far to push itself?
> 
> Yes.
> 
> > Does that mean that those grant callers might have to wait until the AIL
> > empties itself?
> 
> No. The moment the log tail moves forward because of a removal from
> the tail of the AIL via xfs_ail_update_finish(), we call
> xlog_assign_tail_lsn_locked() to move the l_tail_lsn forwards and
> make grant space available, then we call xfs_log_space_wake() to
> wake up any grant waiters that are waiting on the space to be made
> available.

Aha!  There's the missing piece, thank you.

> The reason for using the "push all" when grant space runs out is
> that we can run out of grant space when there is more than 25% of
> the log free. Small logs are notorious for this, and we have a hack
> in the log callback code (xlog_state_set_callback()) where we push
> the AIL because the *head* moved, to ensure that we kick the AIL
> when we consume space in it, because that can push us over the "less
> than 25% available" threshold that starts tail pushing back up
> again.

...and thank you for the reminder of why that was there, because I was
puzzling over what that (now removed) line of code was doing.

> Hence when we run out of grant space and are going to sleep, we have
> to consider that the grant space may be consuming almost all the log
> space and there is almost nothing in the AIL. In this situation, the
> AIL pins the tail and moving the tail forwards is the only way the
> grant space will become available, so we have to force the AIL to push
> everything to guarantee grant space will eventually be returned.
> Hence triggering a "push all" just before sleeping removes all the
> nasty corner cases we have in other parts of the code that work
> around the "we didn't ask the AIL to push enough to free grant
> space" condition that leads to log space hangs...

<nod> I'll resume reading now.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/9] xfs: background AIL push targets physical space, not grant space
  2022-08-23  2:01     ` Dave Chinner
@ 2022-08-26 15:47       ` Darrick J. Wong
  2022-08-26 23:49         ` Darrick J. Wong
  0 siblings, 1 reply; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-26 15:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Aug 23, 2022 at 12:01:03PM +1000, Dave Chinner wrote:
> On Mon, Aug 22, 2022 at 12:00:03PM -0700, Darrick J. Wong wrote:
> > On Wed, Aug 10, 2022 at 09:03:47AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Currently the AIL attempts to keep 25% of the "log space" free,
> > > where the current used space is tracked by the reserve grant head.
> > > That is, it tracks both physical space used plus the amount reserved
> > > by transactions in progress.
> > > 
> > > When we start tail pushing, we are trying to make space for new
> > > reservations by writing back older metadata and the log is generally
> > > physically full of dirty metadata, and reservations for modifications
> > > in flight take up whatever space the AIL can physically free up.
> > > 
> > > Hence we don't really need to take into account the reservation
> > > space that has been used - we just need to keep the log tail moving
> > > as fast as we can to free up space for more reservations to be made.
> > > We know exactly how much physical space the journal is consuming in
> > > the AIL (i.e. max LSN - min LSN) so we can base push thresholds
> > > directly on this state rather than have to look at grant head
> > > reservations to determine how much to physically push out of the
> > > log.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > 
> > Makes sense, I think.  Though I was wondering about the last patch --
> > pushing the AIL until it's empty when a trans_alloc can't find grant
> > reservation could take a while on a slow storage.
> 
> The push in the grant reservation code is not a blocking push - it
> just tells the AIL to start pushing everything, then it goes to
> sleep waiting for the tail to move and space to become available. The
> AIL behaviour is largely unchanged, especially if the application is
> running under even slight memory pressure as the inode shrinker will
> repeatedly kick the AIL push-all trigger regardless of consumed
> journal/grant space.

Ok.

> > Does this mean that
> > we're trading the incremental freeing-up of the existing code for
> > potentially higher transaction allocation latency in the hopes that more
> > threads can get reservation?  Or does the "keep the AIL going" bits make
> > up for that?
> 
> So far I've typically measured slightly lower worst case latencies
> with this mechanism than with the existing "repeatedly push to 25%
> free" that we currently have. It's not really significant enough to
> make statements about (unlike cpu usage reductions or perf
> increases), but it does seem to be a bit better...

<nod>

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/9] xfs: ensure log tail is always up to date
  2022-08-23  2:18     ` Dave Chinner
@ 2022-08-26 21:39       ` Darrick J. Wong
  2022-08-26 23:49         ` Darrick J. Wong
  0 siblings, 1 reply; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-26 21:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Aug 23, 2022 at 12:18:47PM +1000, Dave Chinner wrote:
> On Mon, Aug 22, 2022 at 05:33:19PM -0700, Darrick J. Wong wrote:
> > On Wed, Aug 10, 2022 at 09:03:48AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Whenever we write an iclog, we call xlog_assign_tail_lsn() to update
> > > the current tail before we write it into the iclog header. This
> > > means we have to take the AIL lock on every iclog write just to
> > > check if the tail of the log has moved.
> > > 
> > > This doesn't avoid races with log tail updates - the log tail could
> > > move immediately after we assign the tail to the iclog header and
> > > hence by the time the iclog reaches stable storage the tail LSN has
> > > moved forward in memory. Hence the log tail LSN in the iclog header
> > > is really just a point in time snapshot of the current state of the
> > > AIL.
> > > 
> > > With this in mind, if we simply update the in memory log->l_tail_lsn
> > > every time it changes in the AIL, there is no need to update the in
> > > memory value when we are writing it into an iclog - it will already
> > > be up-to-date in memory and checking the AIL again will not change
> > > this.
> > 
> > This is too subtle for me to understand -- does the codebase
> > already update l_tail_lsn?  Does this patch make it do that?
> 
> tl;dr: if the AIL is empty, log->l_tail_lsn is not updated on the
> first insert of a new item into the AIL, and hence is stale.
> xlog_state_release_iclog() currently works around that by calling
> xlog_assign_tail_lsn() to get the tail lsn from the AIL. This change
> makes sure log->l_tail_lsn is always up to date.
> 
> In more detail:
> 
> The tail update occurs in xfs_ail_update_finish(), but only if we
> pass in a non-zero tail_lsn. xfs_trans_ail_update_bulk() will only
> set a non-zero tail_lsn if it moves the log item at the tail of the
> log (i.e. we relog the tail item and move it forwards in the AIL).
> 
> Hence if we pass a non-zero tail_lsn to xfs_ail_update_finish(), it
> indicates it needs to check it against the LSN of the item currently
> at the tail of the AIL. If the tail LSN has not changed, we do
> nothing, if it has changed, then we call
> xlog_assign_tail_lsn_locked() to update the log tail.
> 
> The problem with the current code is that if the AIL is empty when
> we insert the first item, we've actually moved the log tail but we
> do not update the log tail (i.e. tail_lsn is zero in this case). If
> we then release an iclog for writing at this point in time, the tail
> lsn it writes into the iclog header would be wrong - it does not
> reflect the log tail as defined by the AIL and the checkpoint that
> has just been committed.
> 
> Hence xlog_state_release_iclog() called xlog_assign_tail_lsn() to
> ensure that it checked that the tail LSN it applies to the iclog
> reflects the current state of the AIL. i.e. it checks if there is an
> item in the AIL, and if so, grabs the tail_lsn from the AIL. This
> works around the fact the AIL doesn't update the log tail on the
> first insert.
> 
> Hence what this patch does is have xfs_trans_ail_update_bulk set
> the tail_lsn passed to xfs_ail_update_finish() to NULLCOMMITLSN when
> it does the first insert into the AIL. NULLCOMMITLSN is a
> non-zero value that won't match with the LSN of items we just
> inserted into the AIL, and hence xfs_ail_update_finish() will go and
> update the log tail in this case.
> 
> Hence we close the hole when the log->l_tail_lsn is incorrect after
> the first insert into the AIL, and hence we no longer need to update
> the log->l_tail_lsn when reading it into the iclog header -
> log->l_tail_lsn is always up to date, and so we can now just read it
> in xlog_state_release_iclog() rather than having to grab the AIL
> lock and check the AIL to update log->l_tail_lsn with the correct
> tail value from iclog IO submission....

Ahhh, ok, I get it now.  Thanks for the explanation.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/9] xfs: l_last_sync_lsn is really AIL state
  2022-08-09 23:03 ` [PATCH 5/9] xfs: l_last_sync_lsn is really AIL state Dave Chinner
@ 2022-08-26 22:19   ` Darrick J. Wong
  2022-09-07 14:11   ` Christoph Hellwig
  1 sibling, 0 replies; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-26 22:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Aug 10, 2022 at 09:03:49AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The current implementation of xlog_assign_tail_lsn() assumes that
> when the AIL is empty, the log tail matches the LSN of the last
> written commit record. This is recorded in xlog_state_set_callback()
> as log->l_last_sync_lsn when the iclog state changes to
> XLOG_STATE_CALLBACK. This change is then immediately followed by
> running the callbacks on the iclog which then insert the log items
> into the AIL at the "commit lsn" of that checkpoint.
> 
> The AIL tracks log items via the start record LSN of the checkpoint,
> not the commit record LSN. THis is because we can pipeline multiple
> checkpoints, and so the start record of checkpoint N+1 can be
> written before the commit record of checkpoint N. i.e:
> 
>      start N			commit N
> 	+-------------+------------+----------------+
> 		  start N+1			commit N+1
> 
> The tail of the log cannot be moved to the LSN of commit N when all
> the items of that checkpoint are written back, because then the
> start record for N+1 is no longer in the active portion of the log
> and recovery will fail/corrupt the filesystem.
> 
> Hence when all the log items in checkpoint N are written back, the
> tail of the log most now only move as far forwards as the start LSN
> of checkpoint N+1.
> 
> Hence we cannot use the maximum start record LSN the AIL sees as a
> replacement the pointer to the current head of the on-disk log
> records. However, we currently only use the l_last_sync_lsn when the
> AIL is empty - when there is no start LSN remaining, the tail of the
> log moves to the LSN of the last commit record as this is where
> recovery needs to start searching for recoverable records. THe next
> checkpoint will have a start record LSN that is higher than
> l_last_sync_lsn, and so everything still works correctly when new
> checkpoints are written to an otherwise empty log.
> 
> l_last_sync_lsn is an atomic variable because it is currently
> updated when an iclog with callbacks attached moves to the CALLBACK
> state. While we hold the icloglock at this point, we don't hold the
> AIL lock. When we assign the log tail, we hold the AIL lock, not the
> icloglock because we have to look up the AIL. Hence it is an atomic
> variable so it's not bound to a specific lock context.
> 
> However, the iclog callbacks are only used for CIL checkpoints. We
> don't use callbacks with unmount record writes, so the
> l_last_sync_lsn variable only gets updated when we are processing
> CIL checkpoint callbacks. And those callbacks run under AIL lock
> contexts, not icloglock context. The CIL checkpoint already knows
> what the LSN of the iclog the commit record was written to (obtained
> when written into the iclog before submission) and so we can update
> the l_last_sync_lsn under the AIL lock in this callback. No other
> iclog callbacks will run until the currently executing one
> completes, and hence we can update the l_last_sync_lsn under the AIL
> lock safely.
> 
> This means l_last_sync_lsn can move to the AIL as the "ail_head_lsn"
> and it can be used to replace the atomic l_last_sync_lsn in the
> iclog code. This makes tracking the log tail belong entirely to the
> AIL, rather than being smeared across log, iclog and AIL state and
> locking.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c         | 81 +++++-----------------------------------
>  fs/xfs/xfs_log_cil.c     | 54 ++++++++++++++++++++-------
>  fs/xfs/xfs_log_priv.h    |  9 ++---
>  fs/xfs/xfs_log_recover.c | 19 +++++-----
>  fs/xfs/xfs_trace.c       |  1 +
>  fs/xfs/xfs_trace.h       |  8 ++--
>  fs/xfs/xfs_trans_ail.c   | 26 +++++++++++--
>  fs/xfs/xfs_trans_priv.h  | 13 +++++++
>  8 files changed, 102 insertions(+), 109 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 042744fe37b7..e420591b1a8a 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1237,47 +1237,6 @@ xfs_log_cover(
>  	return error;
>  }
>  
> -/*
> - * We may be holding the log iclog lock upon entering this routine.
> - */
> -xfs_lsn_t
> -xlog_assign_tail_lsn_locked(
> -	struct xfs_mount	*mp)
> -{
> -	struct xlog		*log = mp->m_log;
> -	struct xfs_log_item	*lip;
> -	xfs_lsn_t		tail_lsn;
> -
> -	assert_spin_locked(&mp->m_ail->ail_lock);
> -
> -	/*
> -	 * To make sure we always have a valid LSN for the log tail we keep
> -	 * track of the last LSN which was committed in log->l_last_sync_lsn,
> -	 * and use that when the AIL was empty.
> -	 */
> -	lip = xfs_ail_min(mp->m_ail);
> -	if (lip)
> -		tail_lsn = lip->li_lsn;
> -	else
> -		tail_lsn = atomic64_read(&log->l_last_sync_lsn);
> -	trace_xfs_log_assign_tail_lsn(log, tail_lsn);
> -	atomic64_set(&log->l_tail_lsn, tail_lsn);
> -	return tail_lsn;
> -}
> -
> -xfs_lsn_t
> -xlog_assign_tail_lsn(
> -	struct xfs_mount	*mp)
> -{
> -	xfs_lsn_t		tail_lsn;
> -
> -	spin_lock(&mp->m_ail->ail_lock);
> -	tail_lsn = xlog_assign_tail_lsn_locked(mp);
> -	spin_unlock(&mp->m_ail->ail_lock);
> -
> -	return tail_lsn;
> -}
> -
>  /*
>   * Return the space in the log between the tail and the head.  The head
>   * is passed in the cycle/bytes formal parms.  In the special case where
> @@ -1511,7 +1470,6 @@ xlog_alloc_log(
>  	log->l_prev_block  = -1;
>  	/* log->l_tail_lsn = 0x100000000LL; cycle = 1; current block = 0 */
>  	xlog_assign_atomic_lsn(&log->l_tail_lsn, 1, 0);
> -	xlog_assign_atomic_lsn(&log->l_last_sync_lsn, 1, 0);
>  	log->l_curr_cycle  = 1;	    /* 0 is bad since this is initial value */
>  
>  	if (xfs_has_logv2(mp) && mp->m_sb.sb_logsunit > 1)
> @@ -2562,44 +2520,23 @@ xlog_get_lowest_lsn(
>  	return lowest_lsn;
>  }
>  
> -/*
> - * Completion of a iclog IO does not imply that a transaction has completed, as
> - * transactions can be large enough to span many iclogs. We cannot change the
> - * tail of the log half way through a transaction as this may be the only
> - * transaction in the log and moving the tail to point to the middle of it
> - * will prevent recovery from finding the start of the transaction. Hence we
> - * should only update the last_sync_lsn if this iclog contains transaction
> - * completion callbacks on it.
> - *
> - * We have to do this before we drop the icloglock to ensure we are the only one
> - * that can update it.
> - *
> - * If we are moving the last_sync_lsn forwards, we also need to ensure we kick
> - * the reservation grant head pushing. This is due to the fact that the push
> - * target is bound by the current last_sync_lsn value. Hence if we have a large
> - * amount of log space bound up in this committing transaction then the
> - * last_sync_lsn value may be the limiting factor preventing tail pushing from
> - * freeing space in the log. Hence once we've updated the last_sync_lsn we
> - * should push the AIL to ensure the push target (and hence the grant head) is
> - * no longer bound by the old log head location and can move forwards and make
> - * progress again.
> - */
>  static void
>  xlog_state_set_callback(
>  	struct xlog		*log,
>  	struct xlog_in_core	*iclog,
>  	xfs_lsn_t		header_lsn)
>  {
> +	/*
> +	 * If there are no callbacks on this iclog, we can mark it clean
> +	 * immediately and return. Otherwise we need to run the
> +	 * callbacks.
> +	 */
> +	if (list_empty(&iclog->ic_callbacks)) {
> +		xlog_state_clean_iclog(log, iclog);
> +		return;
> +	}
>  	trace_xlog_iclog_callback(iclog, _RET_IP_);
>  	iclog->ic_state = XLOG_STATE_CALLBACK;
> -
> -	ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
> -			   header_lsn) <= 0);
> -
> -	if (list_empty_careful(&iclog->ic_callbacks))
> -		return;
> -
> -	atomic64_set(&log->l_last_sync_lsn, header_lsn);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 475a18493c37..843764d40232 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -710,6 +710,24 @@ xlog_cil_ail_insert_batch(
>   * items into the the AIL. This uses bulk insertion techniques to minimise AIL
>   * lock traffic.
>   *
> + * The AIL tracks log items via the start record LSN of the checkpoint,
> + * not the commit record LSN. THis is because we can pipeline multiple

Silly nit: s/TH/Th/

So far so good otherwise.

--D


> + * checkpoints, and so the start record of checkpoint N+1 can be
> + * written before the commit record of checkpoint N. i.e:
> + *
> + *   start N			commit N
> + *	+-------------+------------+----------------+
> + *		  start N+1			commit N+1
> + *
> + * The tail of the log cannot be moved to the LSN of commit N when all
> + * the items of that checkpoint are written back, because then the
> + * start record for N+1 is no longer in the active portion of the log
> + * and recovery will fail/corrupt the filesystem.
> + *
> + * Hence when all the log items in checkpoint N are written back, the
> + * tail of the log most now only move as far forwards as the start LSN
> + * of checkpoint N+1.
> + *
>   * If we are called with the aborted flag set, it is because a log write during
>   * a CIL checkpoint commit has failed. In this case, all the items in the
>   * checkpoint have already gone through iop_committed and iop_committing, which
> @@ -727,24 +745,33 @@ xlog_cil_ail_insert_batch(
>   */
>  void
>  xlog_cil_ail_insert(
> -	struct xlog		*log,
> -	struct list_head	*lv_chain,
> -	xfs_lsn_t		commit_lsn,
> +	struct xfs_cil_ctx	*ctx,
>  	bool			aborted)
>  {
>  #define LOG_ITEM_BATCH_SIZE	32
> -	struct xfs_ail		*ailp = log->l_ailp;
> +	struct xfs_ail		*ailp = ctx->cil->xc_log->l_ailp;
>  	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
>  	struct xfs_log_vec	*lv;
>  	struct xfs_ail_cursor	cur;
>  	int			i = 0;
>  
> +	/*
> +	 * Update the AIL head LSN with the commit record LSN of this
> +	 * checkpoint. As iclogs are always completed in order, this should
> +	 * always be the same (as iclogs can contain multiple commit records) or
> +	 * higher LSN than the current head. We do this before insertion of the
> +	 * items so that log space checks during insertion will reflect the
> +	 * space that this checkpoint has already consumed.
> +	 */
> +	ASSERT(XFS_LSN_CMP(ctx->commit_lsn, ailp->ail_head_lsn) >= 0 ||
> +			aborted);
>  	spin_lock(&ailp->ail_lock);
> -	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
> +	ailp->ail_head_lsn = ctx->commit_lsn;
> +	xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn);
>  	spin_unlock(&ailp->ail_lock);
>  
>  	/* unpin all the log items */
> -	list_for_each_entry(lv, lv_chain, lv_list) {
> +	list_for_each_entry(lv, &ctx->lv_chain, lv_list) {
>  		struct xfs_log_item	*lip = lv->lv_item;
>  		xfs_lsn_t		item_lsn;
>  
> @@ -757,9 +784,10 @@ xlog_cil_ail_insert(
>  		}
>  
>  		if (lip->li_ops->iop_committed)
> -			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
> +			item_lsn = lip->li_ops->iop_committed(lip,
> +					ctx->start_lsn);
>  		else
> -			item_lsn = commit_lsn;
> +			item_lsn = ctx->start_lsn;
>  
>  		/* item_lsn of -1 means the item needs no further processing */
>  		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
> @@ -776,7 +804,7 @@ xlog_cil_ail_insert(
>  			continue;
>  		}
>  
> -		if (item_lsn != commit_lsn) {
> +		if (item_lsn != ctx->start_lsn) {
>  
>  			/*
>  			 * Not a bulk update option due to unusual item_lsn.
> @@ -799,14 +827,15 @@ xlog_cil_ail_insert(
>  		log_items[i++] = lv->lv_item;
>  		if (i >= LOG_ITEM_BATCH_SIZE) {
>  			xlog_cil_ail_insert_batch(ailp, &cur, log_items,
> -					LOG_ITEM_BATCH_SIZE, commit_lsn);
> +					LOG_ITEM_BATCH_SIZE, ctx->start_lsn);
>  			i = 0;
>  		}
>  	}
>  
>  	/* make sure we insert the remainder! */
>  	if (i)
> -		xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, commit_lsn);
> +		xlog_cil_ail_insert_batch(ailp, &cur, log_items, i,
> +				ctx->start_lsn);
>  
>  	spin_lock(&ailp->ail_lock);
>  	xfs_trans_ail_cursor_done(&cur);
> @@ -922,8 +951,7 @@ xlog_cil_committed(
>  		spin_unlock(&ctx->cil->xc_push_lock);
>  	}
>  
> -	xlog_cil_ail_insert(ctx->cil->xc_log, &ctx->lv_chain,
> -					ctx->start_lsn, abort);
> +	xlog_cil_ail_insert(ctx, abort);
>  
>  	xfs_extent_busy_sort(&ctx->busy_extents);
>  	xfs_extent_busy_clear(mp, &ctx->busy_extents,
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 9f8c601a302b..5f4358f18224 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -426,13 +426,10 @@ struct xlog {
>  	int			l_prev_block;   /* previous logical log block */
>  
>  	/*
> -	 * l_last_sync_lsn and l_tail_lsn are atomics so they can be set and
> -	 * read without needing to hold specific locks. To avoid operations
> -	 * contending with other hot objects, place each of them on a separate
> -	 * cacheline.
> +	 * l_tail_lsn is atomic so it can be set and read without needing to
> +	 * hold specific locks. To avoid operations contending with other hot
> +	 * objects, it on a separate cacheline.
>  	 */
> -	/* lsn of last LR on disk */
> -	atomic64_t		l_last_sync_lsn ____cacheline_aligned_in_smp;
>  	/* lsn of 1st LR with unflushed * buffers */
>  	atomic64_t		l_tail_lsn ____cacheline_aligned_in_smp;
>  
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 9e0e7ff76e02..d9997714f975 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -1177,8 +1177,8 @@ xlog_check_unmount_rec(
>  			 */
>  			xlog_assign_atomic_lsn(&log->l_tail_lsn,
>  					log->l_curr_cycle, after_umount_blk);
> -			xlog_assign_atomic_lsn(&log->l_last_sync_lsn,
> -					log->l_curr_cycle, after_umount_blk);
> +			log->l_ailp->ail_head_lsn =
> +					atomic64_read(&log->l_tail_lsn);
>  			*tail_blk = after_umount_blk;
>  
>  			*clean = true;
> @@ -1212,7 +1212,7 @@ xlog_set_state(
>  	if (bump_cycle)
>  		log->l_curr_cycle++;
>  	atomic64_set(&log->l_tail_lsn, be64_to_cpu(rhead->h_tail_lsn));
> -	atomic64_set(&log->l_last_sync_lsn, be64_to_cpu(rhead->h_lsn));
> +	log->l_ailp->ail_head_lsn = be64_to_cpu(rhead->h_lsn);
>  	xlog_assign_grant_head(&log->l_reserve_head.grant, log->l_curr_cycle,
>  					BBTOB(log->l_curr_block));
>  	xlog_assign_grant_head(&log->l_write_head.grant, log->l_curr_cycle,
> @@ -3294,14 +3294,13 @@ xlog_do_recover(
>  
>  	/*
>  	 * We now update the tail_lsn since much of the recovery has completed
> -	 * and there may be space available to use.  If there were no extent
> -	 * or iunlinks, we can free up the entire log and set the tail_lsn to
> -	 * be the last_sync_lsn.  This was set in xlog_find_tail to be the
> -	 * lsn of the last known good LR on disk.  If there are extent frees
> -	 * or iunlinks they will have some entries in the AIL; so we look at
> -	 * the AIL to determine how to set the tail_lsn.
> +	 * and there may be space available to use.  If there were no extent or
> +	 * iunlinks, we can free up the entire log.  This was set in
> +	 * xlog_find_tail to be the lsn of the last known good LR on disk.  If
> +	 * there are extent frees or iunlinks they will have some entries in the
> +	 * AIL; so we look at the AIL to determine how to set the tail_lsn.
>  	 */
> -	xlog_assign_tail_lsn(mp);
> +	xfs_ail_assign_tail_lsn(log->l_ailp);
>  
>  	/*
>  	 * Now that we've finished replaying all buffer and inode updates,
> diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
> index d269ef57ff01..dcf9af0108c1 100644
> --- a/fs/xfs/xfs_trace.c
> +++ b/fs/xfs/xfs_trace.c
> @@ -22,6 +22,7 @@
>  #include "xfs_trans.h"
>  #include "xfs_log.h"
>  #include "xfs_log_priv.h"
> +#include "xfs_trans_priv.h"
>  #include "xfs_buf_item.h"
>  #include "xfs_quota.h"
>  #include "xfs_dquot_item.h"
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index f9057af6e0c8..886cde292c95 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1383,19 +1383,19 @@ TRACE_EVENT(xfs_log_assign_tail_lsn,
>  		__field(dev_t, dev)
>  		__field(xfs_lsn_t, new_lsn)
>  		__field(xfs_lsn_t, old_lsn)
> -		__field(xfs_lsn_t, last_sync_lsn)
> +		__field(xfs_lsn_t, head_lsn)
>  	),
>  	TP_fast_assign(
>  		__entry->dev = log->l_mp->m_super->s_dev;
>  		__entry->new_lsn = new_lsn;
>  		__entry->old_lsn = atomic64_read(&log->l_tail_lsn);
> -		__entry->last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
> +		__entry->head_lsn = log->l_ailp->ail_head_lsn;
>  	),
> -	TP_printk("dev %d:%d new tail lsn %d/%d, old lsn %d/%d, last sync %d/%d",
> +	TP_printk("dev %d:%d new tail lsn %d/%d, old lsn %d/%d, head lsn %d/%d",
>  		  MAJOR(__entry->dev), MINOR(__entry->dev),
>  		  CYCLE_LSN(__entry->new_lsn), BLOCK_LSN(__entry->new_lsn),
>  		  CYCLE_LSN(__entry->old_lsn), BLOCK_LSN(__entry->old_lsn),
> -		  CYCLE_LSN(__entry->last_sync_lsn), BLOCK_LSN(__entry->last_sync_lsn))
> +		  CYCLE_LSN(__entry->head_lsn), BLOCK_LSN(__entry->head_lsn))
>  )
>  
>  DECLARE_EVENT_CLASS(xfs_file_class,
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 5f40509877f7..fe3f8b80e687 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -715,6 +715,26 @@ xfs_ail_push_all_sync(
>  	finish_wait(&ailp->ail_empty, &wait);
>  }
>  
> +void
> +__xfs_ail_assign_tail_lsn(
> +	struct xfs_ail		*ailp)
> +{
> +	struct xlog		*log = ailp->ail_log;
> +	xfs_lsn_t		tail_lsn;
> +
> +	assert_spin_locked(&ailp->ail_lock);
> +
> +	if (xlog_is_shutdown(log))
> +		return;
> +
> +	tail_lsn = __xfs_ail_min_lsn(ailp);
> +	if (!tail_lsn)
> +		tail_lsn = ailp->ail_head_lsn;
> +
> +	trace_xfs_log_assign_tail_lsn(log, tail_lsn);
> +	atomic64_set(&log->l_tail_lsn, tail_lsn);
> +}
> +
>  /*
>   * Callers should pass the the original tail lsn so that we can detect if the
>   * tail has moved as a result of the operation that was performed. If the caller
> @@ -729,15 +749,13 @@ xfs_ail_update_finish(
>  {
>  	struct xlog		*log = ailp->ail_log;
>  
> -	/* if the tail lsn hasn't changed, don't do updates or wakeups. */
> +	/* If the tail lsn hasn't changed, don't do updates or wakeups. */
>  	if (!old_lsn || old_lsn == __xfs_ail_min_lsn(ailp)) {
>  		spin_unlock(&ailp->ail_lock);
>  		return;
>  	}
>  
> -	if (!xlog_is_shutdown(log))
> -		xlog_assign_tail_lsn_locked(log->l_mp);
> -
> +	__xfs_ail_assign_tail_lsn(ailp);
>  	if (list_empty(&ailp->ail_head))
>  		wake_up_all(&ailp->ail_empty);
>  	spin_unlock(&ailp->ail_lock);
> diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
> index 9a131e7fae94..6541a6c3ea22 100644
> --- a/fs/xfs/xfs_trans_priv.h
> +++ b/fs/xfs/xfs_trans_priv.h
> @@ -55,6 +55,7 @@ struct xfs_ail {
>  	struct list_head	ail_cursors;
>  	spinlock_t		ail_lock;
>  	xfs_lsn_t		ail_last_pushed_lsn;
> +	xfs_lsn_t		ail_head_lsn;
>  	int			ail_log_flush;
>  	unsigned long		ail_opstate;
>  	struct list_head	ail_buf_list;
> @@ -135,6 +136,18 @@ struct xfs_log_item *	xfs_trans_ail_cursor_next(struct xfs_ail *ailp,
>  					struct xfs_ail_cursor *cur);
>  void			xfs_trans_ail_cursor_done(struct xfs_ail_cursor *cur);
>  
> +void			__xfs_ail_assign_tail_lsn(struct xfs_ail *ailp);
> +
> +static inline void
> +xfs_ail_assign_tail_lsn(
> +	struct xfs_ail		*ailp)
> +{
> +
> +	spin_lock(&ailp->ail_lock);
> +	__xfs_ail_assign_tail_lsn(ailp);
> +	spin_unlock(&ailp->ail_lock);
> +}
> +
>  #if BITS_PER_LONG != 64
>  static inline void
>  xfs_trans_ail_copy_lsn(
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/9] xfs: collapse xlog_state_set_callback in caller
  2022-08-09 23:03 ` [PATCH 6/9] xfs: collapse xlog_state_set_callback in caller Dave Chinner
@ 2022-08-26 22:20   ` Darrick J. Wong
  2022-09-07 14:12   ` Christoph Hellwig
  1 sibling, 0 replies; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-26 22:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Aug 10, 2022 at 09:03:50AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The function is called from a single place, and it isn't just
> setting the iclog state to XLOG_STATE_CALLBACK - it can mark iclogs
> clean, which moves tehm to states after CALLBACK. Hence the function

Nit: s/tehm/them/

> is now badly named, and should just be folded into the caller where
> the iclog completion logic makes a whole lot more sense.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

I had wondered what xlog_state_set_callback thought it was doing until I
looked ahead and saw this.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_log.c | 31 +++++++++++--------------------
>  1 file changed, 11 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index e420591b1a8a..5b7c91a42edf 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2520,25 +2520,6 @@ xlog_get_lowest_lsn(
>  	return lowest_lsn;
>  }
>  
> -static void
> -xlog_state_set_callback(
> -	struct xlog		*log,
> -	struct xlog_in_core	*iclog,
> -	xfs_lsn_t		header_lsn)
> -{
> -	/*
> -	 * If there are no callbacks on this iclog, we can mark it clean
> -	 * immediately and return. Otherwise we need to run the
> -	 * callbacks.
> -	 */
> -	if (list_empty(&iclog->ic_callbacks)) {
> -		xlog_state_clean_iclog(log, iclog);
> -		return;
> -	}
> -	trace_xlog_iclog_callback(iclog, _RET_IP_);
> -	iclog->ic_state = XLOG_STATE_CALLBACK;
> -}
> -
>  /*
>   * Return true if we need to stop processing, false to continue to the next
>   * iclog. The caller will need to run callbacks if the iclog is returned in the
> @@ -2570,7 +2551,17 @@ xlog_state_iodone_process_iclog(
>  		lowest_lsn = xlog_get_lowest_lsn(log);
>  		if (lowest_lsn && XFS_LSN_CMP(lowest_lsn, header_lsn) < 0)
>  			return false;
> -		xlog_state_set_callback(log, iclog, header_lsn);
> +		/*
> +		 * If there are no callbacks on this iclog, we can mark it clean
> +		 * immediately and return. Otherwise we need to run the
> +		 * callbacks.
> +		 */
> +		if (list_empty(&iclog->ic_callbacks)) {
> +			xlog_state_clean_iclog(log, iclog);
> +			return false;
> +		}
> +		trace_xlog_iclog_callback(iclog, _RET_IP_);
> +		iclog->ic_state = XLOG_STATE_CALLBACK;
>  		return false;
>  	default:
>  		/*
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 8/9] xfs: pass the full grant head to accounting functions
  2022-08-09 23:03 ` [PATCH 8/9] xfs: pass the full grant head to accounting functions Dave Chinner
@ 2022-08-26 22:25   ` Darrick J. Wong
  0 siblings, 0 replies; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-26 22:25 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Aug 10, 2022 at 09:03:52AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Because we are going to need them soon. API change only, no logic
> changes.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Would've been nice to do the xlog_grant_space_left move as a separate
change, but as I've already squinted at both to verify that there's
nothing changing here besides the function signature, let's just leave
this as it is:

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

(ok, I lie, I actually just open the patch twice in gvim, increase the
transparency on one of the gvims, and then overlay them :P)

--D

> ---
>  fs/xfs/xfs_log.c      | 157 +++++++++++++++++++++---------------------
>  fs/xfs/xfs_log_priv.h |   2 -
>  2 files changed, 77 insertions(+), 82 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 5b7c91a42edf..459c0f438c89 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -136,10 +136,10 @@ xlog_prepare_iovec(
>  static void
>  xlog_grant_sub_space(
>  	struct xlog		*log,
> -	atomic64_t		*head,
> +	struct xlog_grant_head	*head,
>  	int			bytes)
>  {
> -	int64_t	head_val = atomic64_read(head);
> +	int64_t	head_val = atomic64_read(&head->grant);
>  	int64_t new, old;
>  
>  	do {
> @@ -155,17 +155,17 @@ xlog_grant_sub_space(
>  
>  		old = head_val;
>  		new = xlog_assign_grant_head_val(cycle, space);
> -		head_val = atomic64_cmpxchg(head, old, new);
> +		head_val = atomic64_cmpxchg(&head->grant, old, new);
>  	} while (head_val != old);
>  }
>  
>  static void
>  xlog_grant_add_space(
>  	struct xlog		*log,
> -	atomic64_t		*head,
> +	struct xlog_grant_head	*head,
>  	int			bytes)
>  {
> -	int64_t	head_val = atomic64_read(head);
> +	int64_t	head_val = atomic64_read(&head->grant);
>  	int64_t new, old;
>  
>  	do {
> @@ -184,7 +184,7 @@ xlog_grant_add_space(
>  
>  		old = head_val;
>  		new = xlog_assign_grant_head_val(cycle, space);
> -		head_val = atomic64_cmpxchg(head, old, new);
> +		head_val = atomic64_cmpxchg(&head->grant, old, new);
>  	} while (head_val != old);
>  }
>  
> @@ -197,6 +197,63 @@ xlog_grant_head_init(
>  	spin_lock_init(&head->lock);
>  }
>  
> +/*
> + * Return the space in the log between the tail and the head.  The head
> + * is passed in the cycle/bytes formal parms.  In the special case where
> + * the reserve head has wrapped passed the tail, this calculation is no
> + * longer valid.  In this case, just return 0 which means there is no space
> + * in the log.  This works for all places where this function is called
> + * with the reserve head.  Of course, if the write head were to ever
> + * wrap the tail, we should blow up.  Rather than catch this case here,
> + * we depend on other ASSERTions in other parts of the code.   XXXmiken
> + *
> + * If reservation head is behind the tail, we have a problem. Warn about it,
> + * but then treat it as if the log is empty.
> + *
> + * If the log is shut down, the head and tail may be invalid or out of whack, so
> + * shortcut invalidity asserts in this case so that we don't trigger them
> + * falsely.
> + */
> +static int
> +xlog_grant_space_left(
> +	struct xlog		*log,
> +	struct xlog_grant_head	*head)
> +{
> +	int			tail_bytes;
> +	int			tail_cycle;
> +	int			head_cycle;
> +	int			head_bytes;
> +
> +	xlog_crack_grant_head(&head->grant, &head_cycle, &head_bytes);
> +	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes);
> +	tail_bytes = BBTOB(tail_bytes);
> +	if (tail_cycle == head_cycle && head_bytes >= tail_bytes)
> +		return log->l_logsize - (head_bytes - tail_bytes);
> +	if (tail_cycle + 1 < head_cycle)
> +		return 0;
> +
> +	/* Ignore potential inconsistency when shutdown. */
> +	if (xlog_is_shutdown(log))
> +		return log->l_logsize;
> +
> +	if (tail_cycle < head_cycle) {
> +		ASSERT(tail_cycle == (head_cycle - 1));
> +		return tail_bytes - head_bytes;
> +	}
> +
> +	/*
> +	 * The reservation head is behind the tail. In this case we just want to
> +	 * return the size of the log as the amount of space left.
> +	 */
> +	xfs_alert(log->l_mp, "xlog_grant_space_left: head behind tail");
> +	xfs_alert(log->l_mp, "  tail_cycle = %d, tail_bytes = %d",
> +		  tail_cycle, tail_bytes);
> +	xfs_alert(log->l_mp, "  GH   cycle = %d, GH   bytes = %d",
> +		  head_cycle, head_bytes);
> +	ASSERT(0);
> +	return log->l_logsize;
> +}
> +
>  STATIC void
>  xlog_grant_head_wake_all(
>  	struct xlog_grant_head	*head)
> @@ -277,7 +334,7 @@ xlog_grant_head_wait(
>  		spin_lock(&head->lock);
>  		if (xlog_is_shutdown(log))
>  			goto shutdown;
> -	} while (xlog_space_left(log, &head->grant) < need_bytes);
> +	} while (xlog_grant_space_left(log, head) < need_bytes);
>  
>  	list_del_init(&tic->t_queue);
>  	return 0;
> @@ -322,7 +379,7 @@ xlog_grant_head_check(
>  	 * otherwise try to get some space for this transaction.
>  	 */
>  	*need_bytes = xlog_ticket_reservation(log, head, tic);
> -	free_bytes = xlog_space_left(log, &head->grant);
> +	free_bytes = xlog_grant_space_left(log, head);
>  	if (!list_empty_careful(&head->waiters)) {
>  		spin_lock(&head->lock);
>  		if (!xlog_grant_head_wake(log, head, &free_bytes) ||
> @@ -396,7 +453,7 @@ xfs_log_regrant(
>  	if (error)
>  		goto out_error;
>  
> -	xlog_grant_add_space(log, &log->l_write_head.grant, need_bytes);
> +	xlog_grant_add_space(log, &log->l_write_head, need_bytes);
>  	trace_xfs_log_regrant_exit(log, tic);
>  	xlog_verify_grant_tail(log);
>  	return 0;
> @@ -447,8 +504,8 @@ xfs_log_reserve(
>  	if (error)
>  		goto out_error;
>  
> -	xlog_grant_add_space(log, &log->l_reserve_head.grant, need_bytes);
> -	xlog_grant_add_space(log, &log->l_write_head.grant, need_bytes);
> +	xlog_grant_add_space(log, &log->l_reserve_head, need_bytes);
> +	xlog_grant_add_space(log, &log->l_write_head, need_bytes);
>  	trace_xfs_log_reserve_exit(log, tic);
>  	xlog_verify_grant_tail(log);
>  	return 0;
> @@ -1114,7 +1171,7 @@ xfs_log_space_wake(
>  		ASSERT(!xlog_in_recovery(log));
>  
>  		spin_lock(&log->l_write_head.lock);
> -		free_bytes = xlog_space_left(log, &log->l_write_head.grant);
> +		free_bytes = xlog_grant_space_left(log, &log->l_write_head);
>  		xlog_grant_head_wake(log, &log->l_write_head, &free_bytes);
>  		spin_unlock(&log->l_write_head.lock);
>  	}
> @@ -1123,7 +1180,7 @@ xfs_log_space_wake(
>  		ASSERT(!xlog_in_recovery(log));
>  
>  		spin_lock(&log->l_reserve_head.lock);
> -		free_bytes = xlog_space_left(log, &log->l_reserve_head.grant);
> +		free_bytes = xlog_grant_space_left(log, &log->l_reserve_head);
>  		xlog_grant_head_wake(log, &log->l_reserve_head, &free_bytes);
>  		spin_unlock(&log->l_reserve_head.lock);
>  	}
> @@ -1237,64 +1294,6 @@ xfs_log_cover(
>  	return error;
>  }
>  
> -/*
> - * Return the space in the log between the tail and the head.  The head
> - * is passed in the cycle/bytes formal parms.  In the special case where
> - * the reserve head has wrapped passed the tail, this calculation is no
> - * longer valid.  In this case, just return 0 which means there is no space
> - * in the log.  This works for all places where this function is called
> - * with the reserve head.  Of course, if the write head were to ever
> - * wrap the tail, we should blow up.  Rather than catch this case here,
> - * we depend on other ASSERTions in other parts of the code.   XXXmiken
> - *
> - * If reservation head is behind the tail, we have a problem. Warn about it,
> - * but then treat it as if the log is empty.
> - *
> - * If the log is shut down, the head and tail may be invalid or out of whack, so
> - * shortcut invalidity asserts in this case so that we don't trigger them
> - * falsely.
> - */
> -int
> -xlog_space_left(
> -	struct xlog	*log,
> -	atomic64_t	*head)
> -{
> -	int		tail_bytes;
> -	int		tail_cycle;
> -	int		head_cycle;
> -	int		head_bytes;
> -
> -	xlog_crack_grant_head(head, &head_cycle, &head_bytes);
> -	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes);
> -	tail_bytes = BBTOB(tail_bytes);
> -	if (tail_cycle == head_cycle && head_bytes >= tail_bytes)
> -		return log->l_logsize - (head_bytes - tail_bytes);
> -	if (tail_cycle + 1 < head_cycle)
> -		return 0;
> -
> -	/* Ignore potential inconsistency when shutdown. */
> -	if (xlog_is_shutdown(log))
> -		return log->l_logsize;
> -
> -	if (tail_cycle < head_cycle) {
> -		ASSERT(tail_cycle == (head_cycle - 1));
> -		return tail_bytes - head_bytes;
> -	}
> -
> -	/*
> -	 * The reservation head is behind the tail. In this case we just want to
> -	 * return the size of the log as the amount of space left.
> -	 */
> -	xfs_alert(log->l_mp, "xlog_space_left: head behind tail");
> -	xfs_alert(log->l_mp, "  tail_cycle = %d, tail_bytes = %d",
> -		  tail_cycle, tail_bytes);
> -	xfs_alert(log->l_mp, "  GH   cycle = %d, GH   bytes = %d",
> -		  head_cycle, head_bytes);
> -	ASSERT(0);
> -	return log->l_logsize;
> -}
> -
> -
>  static void
>  xlog_ioend_work(
>  	struct work_struct	*work)
> @@ -1883,8 +1882,8 @@ xlog_sync(
>  	if (ticket) {
>  		ticket->t_curr_res -= roundoff;
>  	} else {
> -		xlog_grant_add_space(log, &log->l_reserve_head.grant, roundoff);
> -		xlog_grant_add_space(log, &log->l_write_head.grant, roundoff);
> +		xlog_grant_add_space(log, &log->l_reserve_head, roundoff);
> +		xlog_grant_add_space(log, &log->l_write_head, roundoff);
>  	}
>  
>  	/* put cycle number in every block */
> @@ -2815,17 +2814,15 @@ xfs_log_ticket_regrant(
>  	if (ticket->t_cnt > 0)
>  		ticket->t_cnt--;
>  
> -	xlog_grant_sub_space(log, &log->l_reserve_head.grant,
> -					ticket->t_curr_res);
> -	xlog_grant_sub_space(log, &log->l_write_head.grant,
> -					ticket->t_curr_res);
> +	xlog_grant_sub_space(log, &log->l_reserve_head, ticket->t_curr_res);
> +	xlog_grant_sub_space(log, &log->l_write_head, ticket->t_curr_res);
>  	ticket->t_curr_res = ticket->t_unit_res;
>  
>  	trace_xfs_log_ticket_regrant_sub(log, ticket);
>  
>  	/* just return if we still have some of the pre-reserved space */
>  	if (!ticket->t_cnt) {
> -		xlog_grant_add_space(log, &log->l_reserve_head.grant,
> +		xlog_grant_add_space(log, &log->l_reserve_head,
>  				     ticket->t_unit_res);
>  		trace_xfs_log_ticket_regrant_exit(log, ticket);
>  
> @@ -2873,8 +2870,8 @@ xfs_log_ticket_ungrant(
>  		bytes += ticket->t_unit_res*ticket->t_cnt;
>  	}
>  
> -	xlog_grant_sub_space(log, &log->l_reserve_head.grant, bytes);
> -	xlog_grant_sub_space(log, &log->l_write_head.grant, bytes);
> +	xlog_grant_sub_space(log, &log->l_reserve_head, bytes);
> +	xlog_grant_sub_space(log, &log->l_write_head, bytes);
>  
>  	trace_xfs_log_ticket_ungrant_exit(log, ticket);
>  
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 8a005cb08a02..86b5959b5ef2 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -571,8 +571,6 @@ xlog_assign_grant_head(atomic64_t *head, int cycle, int space)
>  	atomic64_set(head, xlog_assign_grant_head_val(cycle, space));
>  }
>  
> -int xlog_space_left(struct xlog	 *log, atomic64_t *head);
> -
>  /*
>   * Committed Item List interfaces
>   */
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/9] xfs: track log space pinned by the AIL
  2022-08-09 23:03 ` [PATCH 7/9] xfs: track log space pinned by the AIL Dave Chinner
@ 2022-08-26 22:39   ` Darrick J. Wong
  0 siblings, 0 replies; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-26 22:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Aug 10, 2022 at 09:03:51AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently we track space used in the log via the grant heads. These
> store the reserved space as a physical log location and combine both
> the space reserved for future use and the space already consumed in
> the log in a single variable. The amount of space consumed in the
> log is then calculated as the distance between the log tail and the
> grant head.
> 
> The problem with tracking the grant head as a physical location
> comes from the fact that it tracks both the log cycle count and the
> byte offset into the log in a single 64 bit variable. Because the
> cycle count on disk is a 32 bit number, this also limits the offset
> into the log to 32 bits. And because that offset is in bytes, we are
> limited to tracking only 2GB of log space in the grant head.
> 
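For reference, the existing encoding packs the two values into a single
64 bit word - a sketch (illustrative names) along the lines of the
xlog_assign_grant_head_val()/xlog_crack_grant_head_val() helpers that
patch 9 removes:

/* cycle in the high 32 bits, byte offset into the log in the low 32 */
static inline int64_t grant_head_val(int cycle, int space)
{
	return ((int64_t)cycle << 32) | space;
}

static inline void grant_head_crack(int64_t val, int *cycle, int *space)
{
	*cycle = val >> 32;
	*space = val & 0xffffffff;
}

With only 32 bits available for the byte offset, the grant head simply
cannot address more log space than that, which is where the limit above
comes from.
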
> Hence to support larger physical logs, we need to track used space
> differently in the grant head. We no longer use the grant head for
> guiding AIL pushing, so the only thing it is now used for is
> determining if we've run out of reservation space via the
> calculation in xlog_space_left().
> 
> What we really need to do is move the grant heads away from tracking
> physical space in the log. The issue here is that space consumed in
> the log is not directly tracked by the current mechanism - the
> space consumed in the log by grant head reservations gets returned
> to the free pool by the tail of the log moving forward. i.e. the
> space isn't directly tracked or calculated, but the used grant space
> gets "freed" as the physical limits of the log are updated without
> actually needing to update the grant heads.
> 
> Hence to move away from implicit, zero-update log space tracking we
> need to explicitly track the amount of physical space the log
> actually consumes separately to the in-memory reservations for
> operations that will be committed to the journal. Luckily, we
> already track the information we need to calculate this in the AIL
> itself.
> 
> That is, the space currently consumed by the journal is the maximum
> LSN that the AIL has seen minus the current log tail. As we update
> both of these items dynamically as the head and tail of the log
> move, we always know exactly how much space the journal consumes.
> 
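To make that concrete, the byte distance between two physical LSNs
(each a {cycle, basic block} pair) can be computed roughly as follows -
a simplified sketch of what the xlog_lsn_sub() helper used later in the
series does, not the literal kernel code:

/* Sketch only: bytes from @tail up to @head in a log of l_logsize bytes. */
static uint64_t lsn_distance_bytes(struct xlog *log, xfs_lsn_t head,
		xfs_lsn_t tail)
{
	uint32_t	head_block = BLOCK_LSN(head);
	uint32_t	tail_block = BLOCK_LSN(tail);

	/* same cycle: simple forward distance */
	if (CYCLE_LSN(head) == CYCLE_LSN(tail))
		return BBTOB(head_block - tail_block);

	/* the head has wrapped into the next cycle */
	return (uint64_t)log->l_logsize - BBTOB(tail_block - head_block);
}
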
> This means that we also know exactly how much space the currently
> active reservations require, and exactly how much free space we have
> remaining for new reservations to be made. Most importantly, we know
> what these spaces are independently of the physical locations of
> the head and tail of the log.
> 
> Hence by separating out the physical space consumed by the journal,
> we can now track reservations in the grant heads purely as a byte
> count, and the log can be considered full when the tail space +
> reservation space exceeds the size of the log. This means we can use
> the full 64 bits of grant head space for reservation space,
> completely removing the 32 bit byte count limitation on log size
> that they impose.
> 
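In code terms, the end state being described boils the "out of
reservation space" test down to something like this sketch (the series
implements it as xlog_grant_space_left() in patch 9, with an additional
memory barrier):

/* Sketch: free reservation space once the grant heads are byte counts. */
static uint64_t grant_space_left(struct xlog *log, struct xlog_grant_head *head)
{
	int64_t		free_bytes;

	free_bytes = log->l_logsize - READ_ONCE(log->l_tail_space) -
			atomic64_read(&head->grant);
	return free_bytes > 0 ? free_bytes : 0;
}
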
> Hence the first step in this conversion is to track and update the
> "log tail space" every time the AIL tail or maximum seen LSN
> changes.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_cil.c   | 9 ++++++---
>  fs/xfs/xfs_log_priv.h  | 1 +
>  fs/xfs/xfs_trans_ail.c | 9 ++++++---
>  3 files changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 843764d40232..e482ae9fc01c 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -761,14 +761,17 @@ xlog_cil_ail_insert(
>  	 * always be the same (as iclogs can contain multiple commit records) or
>  	 * higher LSN than the current head. We do this before insertion of the
>  	 * items so that log space checks during insertion will reflect the
> -	 * space that this checkpoint has already consumed.
> +	 * space that this checkpoint has already consumed.  We call
> +	 * xfs_ail_update_finish() so that tail space and space-based wakeups
> +	 * will be recalculated appropriately.
>  	 */
>  	ASSERT(XFS_LSN_CMP(ctx->commit_lsn, ailp->ail_head_lsn) >= 0 ||
>  			aborted);
>  	spin_lock(&ailp->ail_lock);
> -	ailp->ail_head_lsn = ctx->commit_lsn;
>  	xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn);
> -	spin_unlock(&ailp->ail_lock);
> +	ailp->ail_head_lsn = ctx->commit_lsn;
> +	/* xfs_ail_update_finish() drops the ail_lock */
> +	xfs_ail_update_finish(ailp, NULLCOMMITLSN);

Hmm.  I think this change makes it so that any time we add items to the
AIL, we update the head lsn, recalculate the amount of space being used
by the ondisk(?) journal, and possibly start waking threads up if we've
pushed the tail ahead far enough that somebody can get some grant space?

If I grokked that, then:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>  
>  	/* unpin all the log items */
>  	list_for_each_entry(lv, &ctx->lv_chain, lv_list) {
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 5f4358f18224..8a005cb08a02 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -435,6 +435,7 @@ struct xlog {
>  
>  	struct xlog_grant_head	l_reserve_head;
>  	struct xlog_grant_head	l_write_head;
> +	uint64_t		l_tail_space;
>  
>  	struct xfs_kobj		l_kobj;
>  
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index fe3f8b80e687..5d0ddd6d68e9 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -731,6 +731,8 @@ __xfs_ail_assign_tail_lsn(
>  	if (!tail_lsn)
>  		tail_lsn = ailp->ail_head_lsn;
>  
> +	WRITE_ONCE(log->l_tail_space,
> +			xlog_lsn_sub(log, ailp->ail_head_lsn, tail_lsn));
>  	trace_xfs_log_assign_tail_lsn(log, tail_lsn);
>  	atomic64_set(&log->l_tail_lsn, tail_lsn);
>  }
> @@ -738,9 +740,10 @@ __xfs_ail_assign_tail_lsn(
>  /*
>   * Callers should pass the the original tail lsn so that we can detect if the
>   * tail has moved as a result of the operation that was performed. If the caller
> - * needs to force a tail LSN update, it should pass NULLCOMMITLSN to bypass the
> - * "did the tail LSN change?" checks. If the caller wants to avoid a tail update
> - * (e.g. it knows the tail did not change) it should pass an @old_lsn of 0.
> + * needs to force a tail space update, it should pass NULLCOMMITLSN to bypass
> + * the "did the tail LSN change?" checks. If the caller wants to avoid a tail
> + * update (e.g. it knows the tail did not change) it should pass an @old_lsn of
> + * 0.
>   */
>  void
>  xfs_ail_update_finish(
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 9/9] xfs: grant heads track byte counts, not LSNs
  2022-08-09 23:03 ` [PATCH 9/9] xfs: grant heads track byte counts, not LSNs Dave Chinner
@ 2022-08-26 23:45   ` Darrick J. Wong
  0 siblings, 0 replies; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-26 23:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Aug 10, 2022 at 09:03:53AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The grant heads in the log track the space reserved in the log for
> running transactions. They do this by tracking how far ahead of the
> tail the reservation has reached, and the units for doing this are
> {cycle,bytes} for the reserve head rather than the {cycle,blocks}
> normally used by LSNs.
> 
> This is annoyingly complex because we have to split, crack and
> combine these tuples for any calculation we do to determine log
> space and targets. This is computationally expensive as well as
> difficult to do atomically and locklessly, and it also limits the
> size of the log to 2^32 bytes.
> 
> Really, though, all the grant heads are tracking is how much space
> is currently available for use in the log. We can track this as a
> simple byte count - we just don't care where in the log the head and
> tail physically sit, only how much space we have remaining before
> the head and tail overlap.
> 
> So, convert the grant heads to track the byte reservations that are
> active rather than the current (cycle, offset) tuples. This means an
> empty log has zero bytes consumed, and a full log is when the
> reservations reach the size of the log minus the space consumed by
> the AIL.

Checking my understanding here -- the "space consumed by the AIL" is the
space used by the ondisk journal between the last iclog we committed to
disk, and the oldest ondisk transaction that xfsaild has written back to
the filesystem?  So that's ail_head_lsn - l_tail_lsn, if we go back to
the picture from the cover letter:

   l_tail_lsn             ail_head_lsn          grantheadbytes  logsize
        |-----------------------|+++++++++++++++++++++|~~~~~~~~~~~~|
        |    log->l_tail_space  |     grant space     |            |
        | - - - - - - xlog_space_left() - - - - - - - | - free - - |

The "grant space" now is just a simple byte counter of all the space
reserved for log tickets by running transactions?  And now that you've
made it so the log tracks the AIL space used with a byte counter, you're
making the grant heads also use a byte counter?

So the first ~4 or so patches are disentangling all the AIL tail pushing
code, and it's really these last 4 or so that actually do the unit
conversion to simplify the "How full is the log?" accounting?
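
As a purely illustrative instance of that picture (numbers invented):
with a 64MB log, 24MB of l_tail_space pinned by the AIL and 30MB of
byte-count reservations held by the grant heads, xlog_grant_space_left()
reports 64 - 24 - 30 = 10MB free, with no cycle/offset cracking involved.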

> This greatly simplifies the accounting and checks for whether there
> is space available. We no longer need to crack or combine LSNs to
> determine how much space the log has left, nor do we need to look at
> the head or tail of the log to determine how close to full we are.
> 
> There is, however, a complexity that needs to be handled. We now
> know how much space is being tracked in the AIL via
> log->l_tail_space, and the log tickets track active reservations and
> return the unused portions to the grant heads when ungranted.
> Unfortunately, we don't
> track the used portion of the grant, so when we transfer log items
> from the CIL to the AIL, the space accounted to the grant heads is
> transferred to the log tail space.  Hence when we move the AIL head
> forwards on item insert, we have to remove that space from the grant
> heads.
> 
> We also remove the xlog_verify_grant_tail() debug function as it is
> no longer useful. The check it performs has been racy since delayed
> logging was introduced, but now it is clearly only detecting false
> positives so remove it.

I /was/ going to ask about generic/650 -- I've been seeing sporadic
reports from it about log reservation being over by ~40 bytes or so.
I hadn't gotten to it due to (a) other P1 escalations and (b) CPU
hotplug developing some weird problem in 6.0-rc1, so I left it alone.

Anyway, this looks reasonable to me, I'll go back to the first four
patches and add some tags there too.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

> The result of this substantially simpler accounting algorithm is an
> increase in sustained transaction rate from ~1.3 million
> transactions/s to ~1.9 million transactions/s with no increase in
> CPU usage. We also remove the 32 bit space limitation on the grant
> heads, which will allow us to increase the journal size beyond 2GB
> in future.

...and what is that?  A log-incompat change where the space component of
an LSN is now in units of (say) log sector size?  Which gets us to
2^(31+9) == 1TB of log now?  Or 8TB if you go for 4k sector drives?

--D

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c         | 205 ++++++++++++---------------------------
>  fs/xfs/xfs_log_cil.c     |  12 +++
>  fs/xfs/xfs_log_priv.h    |  45 +++------
>  fs/xfs/xfs_log_recover.c |   4 -
>  fs/xfs/xfs_sysfs.c       |  17 ++--
>  fs/xfs/xfs_trace.h       |  33 ++++---
>  6 files changed, 113 insertions(+), 203 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 459c0f438c89..148214cf7032 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -53,9 +53,6 @@ xlog_sync(
>  	struct xlog_ticket	*ticket);
>  #if defined(DEBUG)
>  STATIC void
> -xlog_verify_grant_tail(
> -	struct xlog *log);
> -STATIC void
>  xlog_verify_iclog(
>  	struct xlog		*log,
>  	struct xlog_in_core	*iclog,
> @@ -65,7 +62,6 @@ xlog_verify_tail_lsn(
>  	struct xlog		*log,
>  	struct xlog_in_core	*iclog);
>  #else
> -#define xlog_verify_grant_tail(a)
>  #define xlog_verify_iclog(a,b,c)
>  #define xlog_verify_tail_lsn(a,b)
>  #endif
> @@ -133,30 +129,13 @@ xlog_prepare_iovec(
>  	return buf;
>  }
>  
> -static void
> +void
>  xlog_grant_sub_space(
>  	struct xlog		*log,
>  	struct xlog_grant_head	*head,
>  	int			bytes)
>  {
> -	int64_t	head_val = atomic64_read(&head->grant);
> -	int64_t new, old;
> -
> -	do {
> -		int	cycle, space;
> -
> -		xlog_crack_grant_head_val(head_val, &cycle, &space);
> -
> -		space -= bytes;
> -		if (space < 0) {
> -			space += log->l_logsize;
> -			cycle--;
> -		}
> -
> -		old = head_val;
> -		new = xlog_assign_grant_head_val(cycle, space);
> -		head_val = atomic64_cmpxchg(&head->grant, old, new);
> -	} while (head_val != old);
> +	atomic64_sub(bytes, &head->grant);
>  }
>  
>  static void
> @@ -165,93 +144,39 @@ xlog_grant_add_space(
>  	struct xlog_grant_head	*head,
>  	int			bytes)
>  {
> -	int64_t	head_val = atomic64_read(&head->grant);
> -	int64_t new, old;
> -
> -	do {
> -		int		tmp;
> -		int		cycle, space;
> -
> -		xlog_crack_grant_head_val(head_val, &cycle, &space);
> -
> -		tmp = log->l_logsize - space;
> -		if (tmp > bytes)
> -			space += bytes;
> -		else {
> -			space = bytes - tmp;
> -			cycle++;
> -		}
> -
> -		old = head_val;
> -		new = xlog_assign_grant_head_val(cycle, space);
> -		head_val = atomic64_cmpxchg(&head->grant, old, new);
> -	} while (head_val != old);
> +	atomic64_add(bytes, &head->grant);
>  }
>  
> -STATIC void
> +static void
>  xlog_grant_head_init(
>  	struct xlog_grant_head	*head)
>  {
> -	xlog_assign_grant_head(&head->grant, 1, 0);
> +	atomic64_set(&head->grant, 0);
>  	INIT_LIST_HEAD(&head->waiters);
>  	spin_lock_init(&head->lock);
>  }
>  
>  /*
> - * Return the space in the log between the tail and the head.  The head
> - * is passed in the cycle/bytes formal parms.  In the special case where
> - * the reserve head has wrapped passed the tail, this calculation is no
> - * longer valid.  In this case, just return 0 which means there is no space
> - * in the log.  This works for all places where this function is called
> - * with the reserve head.  Of course, if the write head were to ever
> - * wrap the tail, we should blow up.  Rather than catch this case here,
> - * we depend on other ASSERTions in other parts of the code.   XXXmiken
> - *
> - * If reservation head is behind the tail, we have a problem. Warn about it,
> - * but then treat it as if the log is empty.
> - *
> - * If the log is shut down, the head and tail may be invalid or out of whack, so
> - * shortcut invalidity asserts in this case so that we don't trigger them
> - * falsely.
> + * Return the space in the log between the tail and the head.  In the case where
> + * we have overrun available reservation space, return 0. The memory barrier
> + * pairs with the smp_wmb() in xlog_cil_ail_insert() to ensure that grant head
> + * vs tail space updates are seen in the correct order and hence avoid
> + * transients as space is transferred from the grant heads to the AIL on commit
> + * completion.
>   */
> -static int
> +static uint64_t
>  xlog_grant_space_left(
>  	struct xlog		*log,
>  	struct xlog_grant_head	*head)
>  {
> -	int			tail_bytes;
> -	int			tail_cycle;
> -	int			head_cycle;
> -	int			head_bytes;
> -
> -	xlog_crack_grant_head(&head->grant, &head_cycle, &head_bytes);
> -	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_bytes);
> -	tail_bytes = BBTOB(tail_bytes);
> -	if (tail_cycle == head_cycle && head_bytes >= tail_bytes)
> -		return log->l_logsize - (head_bytes - tail_bytes);
> -	if (tail_cycle + 1 < head_cycle)
> -		return 0;
> -
> -	/* Ignore potential inconsistency when shutdown. */
> -	if (xlog_is_shutdown(log))
> -		return log->l_logsize;
> -
> -	if (tail_cycle < head_cycle) {
> -		ASSERT(tail_cycle == (head_cycle - 1));
> -		return tail_bytes - head_bytes;
> -	}
> +	int64_t			free_bytes;
>  
> -	/*
> -	 * The reservation head is behind the tail. In this case we just want to
> -	 * return the size of the log as the amount of space left.
> -	 */
> -	xfs_alert(log->l_mp, "xlog_grant_space_left: head behind tail");
> -	xfs_alert(log->l_mp, "  tail_cycle = %d, tail_bytes = %d",
> -		  tail_cycle, tail_bytes);
> -	xfs_alert(log->l_mp, "  GH   cycle = %d, GH   bytes = %d",
> -		  head_cycle, head_bytes);
> -	ASSERT(0);
> -	return log->l_logsize;
> +	smp_rmb();	// paired with smp_wmb in xlog_cil_ail_insert()
> +	free_bytes = log->l_logsize - READ_ONCE(log->l_tail_space) -
> +			atomic64_read(&head->grant);
> +	if (free_bytes > 0)
> +		return free_bytes;
> +	return 0;
>  }
>  
>  STATIC void
> @@ -455,7 +380,6 @@ xfs_log_regrant(
>  
>  	xlog_grant_add_space(log, &log->l_write_head, need_bytes);
>  	trace_xfs_log_regrant_exit(log, tic);
> -	xlog_verify_grant_tail(log);
>  	return 0;
>  
>  out_error:
> @@ -507,7 +431,6 @@ xfs_log_reserve(
>  	xlog_grant_add_space(log, &log->l_reserve_head, need_bytes);
>  	xlog_grant_add_space(log, &log->l_write_head, need_bytes);
>  	trace_xfs_log_reserve_exit(log, tic);
> -	xlog_verify_grant_tail(log);
>  	return 0;
>  
>  out_error:
> @@ -3343,42 +3266,27 @@ xlog_ticket_alloc(
>  }
>  
>  #if defined(DEBUG)
> -/*
> - * Check to make sure the grant write head didn't just over lap the tail.  If
> - * the cycles are the same, we can't be overlapping.  Otherwise, make sure that
> - * the cycles differ by exactly one and check the byte count.
> - *
> - * This check is run unlocked, so can give false positives. Rather than assert
> - * on failures, use a warn-once flag and a panic tag to allow the admin to
> - * determine if they want to panic the machine when such an error occurs. For
> - * debug kernels this will have the same effect as using an assert but, unlinke
> - * an assert, it can be turned off at runtime.
> - */
> -STATIC void
> -xlog_verify_grant_tail(
> -	struct xlog	*log)
> +static void
> +xlog_verify_dump_tail(
> +	struct xlog		*log,
> +	struct xlog_in_core	*iclog)
>  {
> -	int		tail_cycle, tail_blocks;
> -	int		cycle, space;
> -
> -	xlog_crack_grant_head(&log->l_write_head.grant, &cycle, &space);
> -	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_blocks);
> -	if (tail_cycle != cycle) {
> -		if (cycle - 1 != tail_cycle &&
> -		    !test_and_set_bit(XLOG_TAIL_WARN, &log->l_opstate)) {
> -			xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES,
> -				"%s: cycle - 1 != tail_cycle", __func__);
> -		}
> -
> -		if (space > BBTOB(tail_blocks) &&
> -		    !test_and_set_bit(XLOG_TAIL_WARN, &log->l_opstate)) {
> -			xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES,
> -				"%s: space > BBTOB(tail_blocks)", __func__);
> -		}
> -	}
> -}
> -
> -/* check if it will fit */
> +	xfs_alert(log->l_mp,
> +"ran out of log space tail 0x%llx/0x%llx, head lsn 0x%llx, head 0x%x/0x%x, prev head 0x%x/0x%x",
> +			iclog ? be64_to_cpu(iclog->ic_header.h_tail_lsn) : -1,
> +			atomic64_read(&log->l_tail_lsn),
> +			log->l_ailp->ail_head_lsn,
> +			log->l_curr_cycle, log->l_curr_block,
> +			log->l_prev_cycle, log->l_prev_block);
> +	xfs_alert(log->l_mp,
> +"write grant 0x%llx, reserve grant 0x%llx, tail_space 0x%llx, size 0x%x, iclog flags 0x%x",
> +			atomic64_read(&log->l_write_head.grant),
> +			atomic64_read(&log->l_reserve_head.grant),
> +			log->l_tail_space, log->l_logsize,
> +			iclog ? iclog->ic_flags : -1);
> +}
> +
> +/* Check if the new iclog will fit in the log. */
>  STATIC void
>  xlog_verify_tail_lsn(
>  	struct xlog		*log,
> @@ -3387,21 +3295,34 @@ xlog_verify_tail_lsn(
>  	xfs_lsn_t	tail_lsn = be64_to_cpu(iclog->ic_header.h_tail_lsn);
>  	int		blocks;
>  
> -    if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) {
> -	blocks =
> -	    log->l_logBBsize - (log->l_prev_block - BLOCK_LSN(tail_lsn));
> -	if (blocks < BTOBB(iclog->ic_offset)+BTOBB(log->l_iclog_hsize))
> -		xfs_emerg(log->l_mp, "%s: ran out of log space", __func__);
> -    } else {
> -	ASSERT(CYCLE_LSN(tail_lsn)+1 == log->l_prev_cycle);
> +	if (CYCLE_LSN(tail_lsn) == log->l_prev_cycle) {
> +		blocks = log->l_logBBsize -
> +				(log->l_prev_block - BLOCK_LSN(tail_lsn));
> +		if (blocks < BTOBB(iclog->ic_offset) +
> +					BTOBB(log->l_iclog_hsize)) {
> +			xfs_emerg(log->l_mp,
> +					"%s: ran out of log space", __func__);
> +			xlog_verify_dump_tail(log, iclog);
> +		}
> +		return;
> +	}
>  
> -	if (BLOCK_LSN(tail_lsn) == log->l_prev_block)
> +	if (CYCLE_LSN(tail_lsn) + 1 != log->l_prev_cycle) {
> +		xfs_emerg(log->l_mp, "%s: head has wrapped tail.", __func__);
> +		xlog_verify_dump_tail(log, iclog);
> +		return;
> +	}
> +	if (BLOCK_LSN(tail_lsn) == log->l_prev_block) {
>  		xfs_emerg(log->l_mp, "%s: tail wrapped", __func__);
> +		xlog_verify_dump_tail(log, iclog);
> +		return;
> +	}
>  
>  	blocks = BLOCK_LSN(tail_lsn) - log->l_prev_block;
> -	if (blocks < BTOBB(iclog->ic_offset) + 1)
> -		xfs_emerg(log->l_mp, "%s: ran out of log space", __func__);
> -    }
> +	if (blocks < BTOBB(iclog->ic_offset) + 1) {
> +		xfs_emerg(log->l_mp, "%s: ran out of iclog space", __func__);
> +		xlog_verify_dump_tail(log, iclog);
> +	}
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index e482ae9fc01c..7ff4814b7d87 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -753,6 +753,7 @@ xlog_cil_ail_insert(
>  	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
>  	struct xfs_log_vec	*lv;
>  	struct xfs_ail_cursor	cur;
> +	xfs_lsn_t		old_head;
>  	int			i = 0;
>  
>  	/*
> @@ -769,10 +770,21 @@ xlog_cil_ail_insert(
>  			aborted);
>  	spin_lock(&ailp->ail_lock);
>  	xfs_trans_ail_cursor_last(ailp, &cur, ctx->start_lsn);
> +	old_head = ailp->ail_head_lsn;
>  	ailp->ail_head_lsn = ctx->commit_lsn;
>  	/* xfs_ail_update_finish() drops the ail_lock */
>  	xfs_ail_update_finish(ailp, NULLCOMMITLSN);
>  
> +	/*
> +	 * We move the AIL head forwards to account for the space used in the
> +	 * log before we remove that space from the grant heads. This prevents a
> +	 * transient condition where reservation space appears to become
> +	 * available on return, only for it to disappear again immediately as
> +	 * the AIL head update accounts in the log tail space.
> +	 */
> +	smp_wmb();	// paired with smp_rmb in xlog_grant_space_left
> +	xlog_grant_return_space(ailp->ail_log, old_head, ailp->ail_head_lsn);
> +
>  	/* unpin all the log items */
>  	list_for_each_entry(lv, &ctx->lv_chain, lv_list) {
>  		struct xfs_log_item	*lip = lv->lv_item;
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 86b5959b5ef2..c7ae9172dcd9 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -541,36 +541,6 @@ xlog_assign_atomic_lsn(atomic64_t *lsn, uint cycle, uint block)
>  	atomic64_set(lsn, xlog_assign_lsn(cycle, block));
>  }
>  
> -/*
> - * When we crack the grant head, we sample it first so that the value will not
> - * change while we are cracking it into the component values. This means we
> - * will always get consistent component values to work from.
> - */
> -static inline void
> -xlog_crack_grant_head_val(int64_t val, int *cycle, int *space)
> -{
> -	*cycle = val >> 32;
> -	*space = val & 0xffffffff;
> -}
> -
> -static inline void
> -xlog_crack_grant_head(atomic64_t *head, int *cycle, int *space)
> -{
> -	xlog_crack_grant_head_val(atomic64_read(head), cycle, space);
> -}
> -
> -static inline int64_t
> -xlog_assign_grant_head_val(int cycle, int space)
> -{
> -	return ((int64_t)cycle << 32) | space;
> -}
> -
> -static inline void
> -xlog_assign_grant_head(atomic64_t *head, int cycle, int space)
> -{
> -	atomic64_set(head, xlog_assign_grant_head_val(cycle, space));
> -}
> -
>  /*
>   * Committed Item List interfaces
>   */
> @@ -636,6 +606,21 @@ xlog_lsn_sub(
>  	return (uint64_t)log->l_logsize - BBTOB(lo_block - hi_block);
>  }
>  
> +void	xlog_grant_sub_space(struct xlog *log, struct xlog_grant_head *head,
> +			int bytes);
> +
> +static inline void
> +xlog_grant_return_space(
> +	struct xlog	*log,
> +	xfs_lsn_t	old_head,
> +	xfs_lsn_t	new_head)
> +{
> +	int64_t		diff = xlog_lsn_sub(log, new_head, old_head);
> +
> +	xlog_grant_sub_space(log, &log->l_reserve_head, diff);
> +	xlog_grant_sub_space(log, &log->l_write_head, diff);
> +}
> +
>  /*
>   * The LSN is valid so long as it is behind the current LSN. If it isn't, this
>   * means that the next log record that includes this metadata could have a
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index d9997714f975..0c1da8c13f52 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -1213,10 +1213,6 @@ xlog_set_state(
>  		log->l_curr_cycle++;
>  	atomic64_set(&log->l_tail_lsn, be64_to_cpu(rhead->h_tail_lsn));
>  	log->l_ailp->ail_head_lsn = be64_to_cpu(rhead->h_lsn);
> -	xlog_assign_grant_head(&log->l_reserve_head.grant, log->l_curr_cycle,
> -					BBTOB(log->l_curr_block));
> -	xlog_assign_grant_head(&log->l_write_head.grant, log->l_curr_cycle,
> -					BBTOB(log->l_curr_block));
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
> index f7faf6e70d7f..0b19acea28cb 100644
> --- a/fs/xfs/xfs_sysfs.c
> +++ b/fs/xfs/xfs_sysfs.c
> @@ -376,14 +376,11 @@ STATIC ssize_t
>  reserve_grant_head_show(
>  	struct kobject	*kobject,
>  	char		*buf)
> -
>  {
> -	int cycle;
> -	int bytes;
> -	struct xlog *log = to_xlog(kobject);
> +	struct xlog	*log = to_xlog(kobject);
> +	uint64_t	bytes = atomic64_read(&log->l_reserve_head.grant);
>  
> -	xlog_crack_grant_head(&log->l_reserve_head.grant, &cycle, &bytes);
> -	return sysfs_emit(buf, "%d:%d\n", cycle, bytes);
> +	return sysfs_emit(buf, "%lld\n", bytes);
>  }
>  XFS_SYSFS_ATTR_RO(reserve_grant_head);
>  
> @@ -392,12 +389,10 @@ write_grant_head_show(
>  	struct kobject	*kobject,
>  	char		*buf)
>  {
> -	int cycle;
> -	int bytes;
> -	struct xlog *log = to_xlog(kobject);
> +	struct xlog	*log = to_xlog(kobject);
> +	uint64_t	bytes = atomic64_read(&log->l_write_head.grant);
>  
> -	xlog_crack_grant_head(&log->l_write_head.grant, &cycle, &bytes);
> -	return sysfs_emit(buf, "%d:%d\n", cycle, bytes);
> +	return sysfs_emit(buf, "%lld\n", bytes);
>  }
>  XFS_SYSFS_ATTR_RO(write_grant_head);
>  
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 886cde292c95..5c1871e5747e 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1206,6 +1206,7 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class,
>  	TP_ARGS(log, tic),
>  	TP_STRUCT__entry(
>  		__field(dev_t, dev)
> +		__field(unsigned long, tic)
>  		__field(char, ocnt)
>  		__field(char, cnt)
>  		__field(int, curr_res)
> @@ -1213,16 +1214,16 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class,
>  		__field(unsigned int, flags)
>  		__field(int, reserveq)
>  		__field(int, writeq)
> -		__field(int, grant_reserve_cycle)
> -		__field(int, grant_reserve_bytes)
> -		__field(int, grant_write_cycle)
> -		__field(int, grant_write_bytes)
> +		__field(uint64_t, grant_reserve_bytes)
> +		__field(uint64_t, grant_write_bytes)
> +		__field(uint64_t, tail_space)
>  		__field(int, curr_cycle)
>  		__field(int, curr_block)
>  		__field(xfs_lsn_t, tail_lsn)
>  	),
>  	TP_fast_assign(
>  		__entry->dev = log->l_mp->m_super->s_dev;
> +		__entry->tic = (unsigned long)tic;
>  		__entry->ocnt = tic->t_ocnt;
>  		__entry->cnt = tic->t_cnt;
>  		__entry->curr_res = tic->t_curr_res;
> @@ -1230,23 +1231,23 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class,
>  		__entry->flags = tic->t_flags;
>  		__entry->reserveq = list_empty(&log->l_reserve_head.waiters);
>  		__entry->writeq = list_empty(&log->l_write_head.waiters);
> -		xlog_crack_grant_head(&log->l_reserve_head.grant,
> -				&__entry->grant_reserve_cycle,
> -				&__entry->grant_reserve_bytes);
> -		xlog_crack_grant_head(&log->l_write_head.grant,
> -				&__entry->grant_write_cycle,
> -				&__entry->grant_write_bytes);
> +		__entry->tail_space = READ_ONCE(log->l_tail_space);
> +		__entry->grant_reserve_bytes = __entry->tail_space +
> +			atomic64_read(&log->l_reserve_head.grant);
> +		__entry->grant_write_bytes = __entry->tail_space +
> +			atomic64_read(&log->l_write_head.grant);
>  		__entry->curr_cycle = log->l_curr_cycle;
>  		__entry->curr_block = log->l_curr_block;
>  		__entry->tail_lsn = atomic64_read(&log->l_tail_lsn);
>  	),
> -	TP_printk("dev %d:%d t_ocnt %u t_cnt %u t_curr_res %u "
> +	TP_printk("dev %d:%d tic 0x%lx t_ocnt %u t_cnt %u t_curr_res %u "
>  		  "t_unit_res %u t_flags %s reserveq %s "
> -		  "writeq %s grant_reserve_cycle %d "
> -		  "grant_reserve_bytes %d grant_write_cycle %d "
> -		  "grant_write_bytes %d curr_cycle %d curr_block %d "
> +		  "writeq %s "
> +		  "tail space %llu grant_reserve_bytes %llu "
> +		  "grant_write_bytes %llu curr_cycle %d curr_block %d "
>  		  "tail_cycle %d tail_block %d",
>  		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->tic,
>  		  __entry->ocnt,
>  		  __entry->cnt,
>  		  __entry->curr_res,
> @@ -1254,9 +1255,8 @@ DECLARE_EVENT_CLASS(xfs_loggrant_class,
>  		  __print_flags(__entry->flags, "|", XLOG_TIC_FLAGS),
>  		  __entry->reserveq ? "empty" : "active",
>  		  __entry->writeq ? "empty" : "active",
> -		  __entry->grant_reserve_cycle,
> +		  __entry->tail_space,
>  		  __entry->grant_reserve_bytes,
> -		  __entry->grant_write_cycle,
>  		  __entry->grant_write_bytes,
>  		  __entry->curr_cycle,
>  		  __entry->curr_block,
> @@ -1284,6 +1284,7 @@ DEFINE_LOGGRANT_EVENT(xfs_log_ticket_ungrant);
>  DEFINE_LOGGRANT_EVENT(xfs_log_ticket_ungrant_sub);
>  DEFINE_LOGGRANT_EVENT(xfs_log_ticket_ungrant_exit);
>  DEFINE_LOGGRANT_EVENT(xfs_log_cil_wait);
> +DEFINE_LOGGRANT_EVENT(xfs_log_cil_return);
>  
>  DECLARE_EVENT_CLASS(xfs_log_item_class,
>  	TP_PROTO(struct xfs_log_item *lip),
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/9] xfs: ensure log tail is always up to date
  2022-08-26 21:39       ` Darrick J. Wong
@ 2022-08-26 23:49         ` Darrick J. Wong
  0 siblings, 0 replies; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-26 23:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Aug 26, 2022 at 02:39:50PM -0700, Darrick J. Wong wrote:
> On Tue, Aug 23, 2022 at 12:18:47PM +1000, Dave Chinner wrote:
> > On Mon, Aug 22, 2022 at 05:33:19PM -0700, Darrick J. Wong wrote:
> > > On Wed, Aug 10, 2022 at 09:03:48AM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > Whenever we write an iclog, we call xlog_assign_tail_lsn() to update
> > > > the current tail before we write it into the iclog header. This
> > > > means we have to take the AIL lock on every iclog write just to
> > > > check if the tail of the log has moved.
> > > > 
> > > > This doesn't avoid races with log tail updates - the log tail could
> > > > move immediately after we assign the tail to the iclog header and
> > > > hence by the time the iclog reaches stable storage the tail LSN has
> > > > moved forward in memory. Hence the log tail LSN in the iclog header
> > > > is really just a point in time snapshot of the current state of the
> > > > AIL.
> > > > 
> > > > With this in mind, if we simply update the in memory log->l_tail_lsn
> > > > every time it changes in the AIL, there is no need to update the in
> > > > memory value when we are writing it into an iclog - it will already
> > > > be up-to-date in memory and checking the AIL again will not change
> > > > this.
> > > 
> > > This is too subtle for me to understand -- does the codebase
> > > already update l_tail_lsn?  Does this patch make it do that?
> > 
> > tl;dr: if the AIL is empty, log->l_tail_lsn is not updated on the
> > first insert of a new item into the AIL, and hence is stale.
> > xlog_state_release_iclog() currently works around that by calling
> > xlog_assign_tail_lsn() to get the tail lsn from the AIL. This change
> > makes sure log->l_tail_lsn is always up to date.
> > 
> > In more detail:
> > 
> > The tail update occurs in xfs_ail_update_finish(), but only if we
> > pass in a non-zero tail_lsn. xfs_trans_ail_update_bulk() will only
> > set a non-zero tail_lsn if it moves the log item at the tail of the
> > log (i.e. we relog the tail item and move it forwards in the AIL).
> > 
> > Hence if we pass a non-zero tail_lsn to xfs_ail_update_finish(), it
> > indicates it needs to check it against the LSN of the item currently
> > at the tail of the AIL. If the tail LSN has not changed, we do
> > nothing, if it has changed, then we call
> > xlog_assign_tail_lsn_locked() to update the log tail.
> > 
> > The problem with the current code is that if the AIL is empty when
> > we insert the first item, we've actually moved the log tail but we
> > do not update the log tail (i.e. tail_lsn is zero in this case). If
> > we then release an iclog for writing at this point in time, the tail
> > lsn it writes into the iclog header would be wrong - it does not
> > reflect the log tail as defined by the AIL and the checkpoint that
> > has just been committed.
> > 
> > Hence xlog_state_release_iclog() called xlog_assign_tail_lsn() to
> > ensure that it checked that the tail LSN it applies to the iclog
> > reflects the current state of the AIL. i.e. it checks if there is an
> > item in the AIL, and if so, grabs the tail_lsn from the AIL. This
> > works around the fact the AIL doesn't update the log tail on the
> > first insert.
> > 
> > Hence what this patch does is have xfs_trans_ail_update_bulk set
> > the tail_lsn passed to xfs_ail_update_finish() to NULLCOMMITLSN when
> > it does the first insert into the AIL. NULLCOMMITLSN is a
> > non-zero value that won't match with the LSN of items we just
> > inserted into the AIL, and hence xfs_ail_update_finish() will go and
> > update the log tail in this case.
> > 
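Put as a rough sketch (simplified; "ail_was_empty" is a hypothetical
flag standing in for the check the patch actually performs inside
xfs_trans_ail_update_bulk()):

	/*
	 * Sketch of the idea only: inserting into an empty AIL moves the
	 * log tail even though no existing tail item was relogged, so pass
	 * a sentinel LSN that cannot match any item and thereby force
	 * xfs_ail_update_finish() to recompute log->l_tail_lsn.
	 */
	if (ail_was_empty)
		tail_lsn = NULLCOMMITLSN;
	/* xfs_ail_update_finish() drops the ail_lock */
	xfs_ail_update_finish(ailp, tail_lsn);
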
> > Hence we close the hole where log->l_tail_lsn is incorrect after
> > the first insert into the AIL, and hence we no longer need to update
> > the log->l_tail_lsn when reading it into the iclog header -
> > log->l_tail_lsn is always up to date, and so we can now just read it
> > in xlog_state_release_iclog() rather than having to grab the AIL
> > lock and check the AIL to update log->l_tail_lsn with the correct
> > tail value at iclog IO submission....
> 
> Ahhh, ok, I get it now.  Thanks for the explanation.

Looks ok to me now,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> 
> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/9] xfs: background AIL push targets physical space, not grant space
  2022-08-26 15:47       ` Darrick J. Wong
@ 2022-08-26 23:49         ` Darrick J. Wong
  0 siblings, 0 replies; 39+ messages in thread
From: Darrick J. Wong @ 2022-08-26 23:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Aug 26, 2022 at 08:47:35AM -0700, Darrick J. Wong wrote:
> On Tue, Aug 23, 2022 at 12:01:03PM +1000, Dave Chinner wrote:
> > On Mon, Aug 22, 2022 at 12:00:03PM -0700, Darrick J. Wong wrote:
> > > On Wed, Aug 10, 2022 at 09:03:47AM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > Currently the AIL attempts to keep 25% of the "log space" free,
> > > > where the current used space is tracked by the reserve grant head.
> > > > That is, it tracks both physical space used plus the amount reserved
> > > > by transactions in progress.
> > > > 
> > > > When we start tail pushing, we are trying to make space for new
> > > > reservations by writing back older metadata and the log is generally
> > > > physically full of dirty metadata, and reservations for modifications
> > > > in flight take up whatever space the AIL can physically free up.
> > > > 
> > > > Hence we don't really need to take into account the reservation
> > > > space that has been used - we just need to keep the log tail moving
> > > > as fast as we can to free up space for more reservations to be made.
> > > > We know exactly how much physical space the journal is consuming in
> > > > the AIL (i.e. max LSN - min LSN) so we can base push thresholds
> > > > directly on this state rather than have to look at grant head
> > > > reservations to determine how much to physically push out of the
> > > > log.
> > > > 
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Makes sense, I think.  Though I was wondering about the last patch --
> > > pushing the AIL until it's empty when a trans_alloc can't find grant
> > > reservation could take a while on a slow storage.

Now that I've had a chance to see where we're going...
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> > 
> > The push in the grant reservation code is not a blocking push - it
> > just tells the AIL to start pushing everything, then it goes to
> > sleep waiting for the tail to move and space to come available. The
> > AIL behaviour is largely unchanged, especially if the application is
> > running under even slight memory pressure as the inode shrinker will
> > repeatedly kick the AIL push-all trigger regardless of consumed
> > journal/grant space.
> 
> Ok.
> 
> > > Does this mean that
> > > we're trading the incremental freeing-up of the existing code for
> > > potentially higher transaction allocation latency in the hopes that more
> > > threads can get reservation?  Or does the "keep the AIL going" bits make
> > > up for that?
> > 
> > So far I've typically measured slightly lower worst case latencies
> > with this mechanism than with the existing "repeatedly push to 25%
> > free" that we currently have. It's not really significant enough to
> > make statements about (unlike cpu usage reductions or perf
> > increases), but it does seem to be a bit better...
> 
> <nod>
> 
> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] xfs: move and xfs_trans_committed_bulk
  2022-08-09 23:03 ` [PATCH 1/9] xfs: move and xfs_trans_committed_bulk Dave Chinner
                     ` (2 preceding siblings ...)
  2022-08-22 15:03   ` Darrick J. Wong
@ 2022-09-07 13:51   ` Christoph Hellwig
  3 siblings, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-09-07 13:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

As the bot pointed out, xlog_cil_ail_insert should be marked static
now.  Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/9] xfs: AIL doesn't need manual pushing
  2022-08-09 23:03 ` [PATCH 2/9] xfs: AIL doesn't need manual pushing Dave Chinner
  2022-08-22 17:08   ` Darrick J. Wong
@ 2022-09-07 14:01   ` Christoph Hellwig
  2023-10-12  8:44     ` Christoph Hellwig
  1 sibling, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2022-09-07 14:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

First a note: the explanations in this email thread were really
helpful in understanding the patch.  Can you make sure they are
recorded in the next version of the commit message?

Otherwise just some nitpicks:

> -xfs_lsn_t
> -xlog_grant_push_threshold(
> -	struct xlog	*log,
> -	int		need_bytes)

This is moved around and reappears as xfs_ail_push_target.  Maybe
split the move, rename and drop of the need_bytes into a separate
cleanup patch after this main one.

> +int xlog_space_left(struct xlog	 *log, atomic64_t *head);

Odd indentation with the tab before *log here.

> +xfs_lsn_t		__xfs_ail_push_target(struct xfs_ail *ailp);
> +static inline xfs_lsn_t xfs_ail_push_target(struct xfs_ail *ailp)
> +{
> +	xfs_lsn_t	lsn;
> +
> +	spin_lock(&ailp->ail_lock);
> +	lsn = __xfs_ail_push_target(ailp);
> +	spin_unlock(&ailp->ail_lock);
> +	return lsn;
> +}

Before this patch xfs_defer_relog called xlog_grant_push_threshold
without the ail_lock, why is ail_lock needed now?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/9] xfs: background AIL push targets physical space, not grant space
  2022-08-09 23:03 ` [PATCH 3/9] xfs: background AIL push targets physical space, not grant space Dave Chinner
  2022-08-22 19:00   ` Darrick J. Wong
@ 2022-09-07 14:04   ` Christoph Hellwig
  1 sibling, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-09-07 14:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

>  __xfs_ail_push_target(
>  	struct xfs_ail		*ailp)
>  {
> +	struct xlog		*log = ailp->ail_log;
> +	struct xfs_log_item	*lip;
>  
> +	xfs_lsn_t	target_lsn = 0;

Any reason for the empty line and different indentation here?

> +	xfs_lsn_t	max_lsn;
> +	xfs_lsn_t	min_lsn;
> +	int32_t		free_bytes;
> +	uint32_t	target_block;
> +	uint32_t	target_cycle;
> +
> +	lockdep_assert_held(&ailp->ail_lock);
> +
> +	lip = xfs_ail_max(ailp);
> +	if (!lip)
> +		return NULLCOMMITLSN;
> +	max_lsn = lip->li_lsn;
> +	min_lsn = __xfs_ail_min_lsn(ailp);

Ok, this appears to be when we actually need the ail_lock added in the
previous patch.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/9] xfs: ensure log tail is always up to date
  2022-08-09 23:03 ` [PATCH 4/9] xfs: ensure log tail is always up to date Dave Chinner
  2022-08-23  0:33   ` Darrick J. Wong
@ 2022-09-07 14:06   ` Christoph Hellwig
  1 sibling, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-09-07 14:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/9] xfs: l_last_sync_lsn is really AIL state
  2022-08-09 23:03 ` [PATCH 5/9] xfs: l_last_sync_lsn is really AIL state Dave Chinner
  2022-08-26 22:19   ` Darrick J. Wong
@ 2022-09-07 14:11   ` Christoph Hellwig
  1 sibling, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-09-07 14:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/9] xfs: collapse xlog_state_set_callback in caller
  2022-08-09 23:03 ` [PATCH 6/9] xfs: collapse xlog_state_set_callback in caller Dave Chinner
  2022-08-26 22:20   ` Darrick J. Wong
@ 2022-09-07 14:12   ` Christoph Hellwig
  1 sibling, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2022-09-07 14:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/9] xfs: AIL doesn't need manual pushing
  2022-09-07 14:01   ` Christoph Hellwig
@ 2023-10-12  8:44     ` Christoph Hellwig
  0 siblings, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2023-10-12  8:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Sep 07, 2022 at 07:01:11AM -0700, Christoph Hellwig wrote:
> > +static inline xfs_lsn_t xfs_ail_push_target(struct xfs_ail *ailp)
> > +{
> > +	xfs_lsn_t	lsn;
> > +
> > +	spin_lock(&ailp->ail_lock);
> > +	lsn = __xfs_ail_push_target(ailp);
> > +	spin_unlock(&ailp->ail_lock);
> > +	return lsn;
> > +}
> 
> Before this patch xfs_defer_relog called xlog_grant_push_threshold
> without the ail_lock, why is ail_lock needed now?

Looking through the most recent version of the patch this is still
there and I'm also not seeing an explanation in the patch.  Can
you comment on this change in the commit log?

I also still find it not very helpful that xlog_grant_push_threshold
gets moved and renamed as part of a huge behavior change patch.

The rest still looks good to me even in the last version.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] xfs: move and xfs_trans_committed_bulk
  2023-09-21  1:48 ` [PATCH 1/9] xfs: move and xfs_trans_committed_bulk Dave Chinner
@ 2023-10-12  8:54   ` Christoph Hellwig
  0 siblings, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2023-10-12  8:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

s/move and/move and rename/ in the subject.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/9] xfs: move and xfs_trans_committed_bulk
  2023-09-21  1:48 [PATCH 0/9] xfs: byte-based grant head reservation tracking Dave Chinner
@ 2023-09-21  1:48 ` Dave Chinner
  2023-10-12  8:54   ` Christoph Hellwig
  0 siblings, 1 reply; 39+ messages in thread
From: Dave Chinner @ 2023-09-21  1:48 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Ever since the CIL and delayed logging were introduced,
xfs_trans_committed_bulk() has been a purely CIL checkpoint
completion function and not a transaction commit completion
function. Now that we are adding log specific updates to this
function, it really does not have anything to do with the
transaction subsystem - it is really log and log item level
functionality.

This should be part of the CIL code as it is the callback
that moves log items from the CIL checkpoint to the AIL. Move it
and rename it to xlog_cil_ail_insert().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_log_cil.c    | 132 +++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trans.c      | 129 ---------------------------------------
 fs/xfs/xfs_trans_priv.h |   3 -
 3 files changed, 131 insertions(+), 133 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index ebc70aaa299c..c1fee14be5c2 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -695,6 +695,136 @@ xlog_cil_insert_items(
 	}
 }
 
+static inline void
+xlog_cil_ail_insert_batch(
+	struct xfs_ail		*ailp,
+	struct xfs_ail_cursor	*cur,
+	struct xfs_log_item	**log_items,
+	int			nr_items,
+	xfs_lsn_t		commit_lsn)
+{
+	int	i;
+
+	spin_lock(&ailp->ail_lock);
+	/* xfs_trans_ail_update_bulk drops ailp->ail_lock */
+	xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn);
+
+	for (i = 0; i < nr_items; i++) {
+		struct xfs_log_item *lip = log_items[i];
+
+		if (lip->li_ops->iop_unpin)
+			lip->li_ops->iop_unpin(lip, 0);
+	}
+}
+
+/*
+ * Take the checkpoint's log vector chain of items and insert the attached log
+ * items into the the AIL. This uses bulk insertion techniques to minimise AIL
+ * lock traffic.
+ *
+ * If we are called with the aborted flag set, it is because a log write during
+ * a CIL checkpoint commit has failed. In this case, all the items in the
+ * checkpoint have already gone through iop_committed and iop_committing, which
+ * means that checkpoint commit abort handling is treated exactly the same as an
+ * iclog write error even though we haven't started any IO yet. Hence in this
+ * case all we need to do is iop_committed processing, followed by an
+ * iop_unpin(aborted) call.
+ *
+ * The AIL cursor is used to optimise the insert process. If commit_lsn is not
+ * at the end of the AIL, the insert cursor avoids the need to walk the AIL to
+ * find the insertion point on every xfs_log_item_batch_insert() call. This
+ * saves a lot of needless list walking and is a net win, even though it
+ * slightly increases that amount of AIL lock traffic to set it up and tear it
+ * down.
+ */
+static void
+xlog_cil_ail_insert(
+	struct xlog		*log,
+	struct list_head	*lv_chain,
+	xfs_lsn_t		commit_lsn,
+	bool			aborted)
+{
+#define LOG_ITEM_BATCH_SIZE	32
+	struct xfs_ail		*ailp = log->l_ailp;
+	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
+	struct xfs_log_vec	*lv;
+	struct xfs_ail_cursor	cur;
+	int			i = 0;
+
+	spin_lock(&ailp->ail_lock);
+	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
+	spin_unlock(&ailp->ail_lock);
+
+	/* unpin all the log items */
+	list_for_each_entry(lv, lv_chain, lv_list) {
+		struct xfs_log_item	*lip = lv->lv_item;
+		xfs_lsn_t		item_lsn;
+
+		if (aborted)
+			set_bit(XFS_LI_ABORTED, &lip->li_flags);
+
+		if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
+			lip->li_ops->iop_release(lip);
+			continue;
+		}
+
+		if (lip->li_ops->iop_committed)
+			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
+		else
+			item_lsn = commit_lsn;
+
+		/* item_lsn of -1 means the item needs no further processing */
+		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
+			continue;
+
+		/*
+		 * if we are aborting the operation, no point in inserting the
+		 * object into the AIL as we are in a shutdown situation.
+		 */
+		if (aborted) {
+			ASSERT(xlog_is_shutdown(ailp->ail_log));
+			if (lip->li_ops->iop_unpin)
+				lip->li_ops->iop_unpin(lip, 1);
+			continue;
+		}
+
+		if (item_lsn != commit_lsn) {
+
+			/*
+			 * Not a bulk update option due to unusual item_lsn.
+			 * Push into AIL immediately, rechecking the lsn once
+			 * we have the ail lock. Then unpin the item. This does
+			 * not affect the AIL cursor the bulk insert path is
+			 * using.
+			 */
+			spin_lock(&ailp->ail_lock);
+			if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
+				xfs_trans_ail_update(ailp, lip, item_lsn);
+			else
+				spin_unlock(&ailp->ail_lock);
+			if (lip->li_ops->iop_unpin)
+				lip->li_ops->iop_unpin(lip, 0);
+			continue;
+		}
+
+		/* Item is a candidate for bulk AIL insert.  */
+		log_items[i++] = lv->lv_item;
+		if (i >= LOG_ITEM_BATCH_SIZE) {
+			xlog_cil_ail_insert_batch(ailp, &cur, log_items,
+					LOG_ITEM_BATCH_SIZE, commit_lsn);
+			i = 0;
+		}
+	}
+
+	/* make sure we insert the remainder! */
+	if (i)
+		xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, commit_lsn);
+
+	spin_lock(&ailp->ail_lock);
+	xfs_trans_ail_cursor_done(&cur);
+	spin_unlock(&ailp->ail_lock);
+}
+
 static void
 xlog_cil_free_logvec(
 	struct list_head	*lv_chain)
@@ -804,7 +934,7 @@ xlog_cil_committed(
 		spin_unlock(&ctx->cil->xc_push_lock);
 	}
 
-	xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, &ctx->lv_chain,
+	xlog_cil_ail_insert(ctx->cil->xc_log, &ctx->lv_chain,
 					ctx->start_lsn, abort);
 
 	xfs_extent_busy_sort(&ctx->busy_extents);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 8c0bfc9a33b1..4ebef316c128 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -717,135 +717,6 @@ xfs_trans_free_items(
 	}
 }
 
-static inline void
-xfs_log_item_batch_insert(
-	struct xfs_ail		*ailp,
-	struct xfs_ail_cursor	*cur,
-	struct xfs_log_item	**log_items,
-	int			nr_items,
-	xfs_lsn_t		commit_lsn)
-{
-	int	i;
-
-	spin_lock(&ailp->ail_lock);
-	/* xfs_trans_ail_update_bulk drops ailp->ail_lock */
-	xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn);
-
-	for (i = 0; i < nr_items; i++) {
-		struct xfs_log_item *lip = log_items[i];
-
-		if (lip->li_ops->iop_unpin)
-			lip->li_ops->iop_unpin(lip, 0);
-	}
-}
-
-/*
- * Bulk operation version of xfs_trans_committed that takes a log vector of
- * items to insert into the AIL. This uses bulk AIL insertion techniques to
- * minimise lock traffic.
- *
- * If we are called with the aborted flag set, it is because a log write during
- * a CIL checkpoint commit has failed. In this case, all the items in the
- * checkpoint have already gone through iop_committed and iop_committing, which
- * means that checkpoint commit abort handling is treated exactly the same
- * as an iclog write error even though we haven't started any IO yet. Hence in
- * this case all we need to do is iop_committed processing, followed by an
- * iop_unpin(aborted) call.
- *
- * The AIL cursor is used to optimise the insert process. If commit_lsn is not
- * at the end of the AIL, the insert cursor avoids the need to walk
- * the AIL to find the insertion point on every xfs_log_item_batch_insert()
- * call. This saves a lot of needless list walking and is a net win, even
- * though it slightly increases that amount of AIL lock traffic to set it up
- * and tear it down.
- */
-void
-xfs_trans_committed_bulk(
-	struct xfs_ail		*ailp,
-	struct list_head	*lv_chain,
-	xfs_lsn_t		commit_lsn,
-	bool			aborted)
-{
-#define LOG_ITEM_BATCH_SIZE	32
-	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
-	struct xfs_log_vec	*lv;
-	struct xfs_ail_cursor	cur;
-	int			i = 0;
-
-	spin_lock(&ailp->ail_lock);
-	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
-	spin_unlock(&ailp->ail_lock);
-
-	/* unpin all the log items */
-	list_for_each_entry(lv, lv_chain, lv_list) {
-		struct xfs_log_item	*lip = lv->lv_item;
-		xfs_lsn_t		item_lsn;
-
-		if (aborted)
-			set_bit(XFS_LI_ABORTED, &lip->li_flags);
-
-		if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
-			lip->li_ops->iop_release(lip);
-			continue;
-		}
-
-		if (lip->li_ops->iop_committed)
-			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
-		else
-			item_lsn = commit_lsn;
-
-		/* item_lsn of -1 means the item needs no further processing */
-		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
-			continue;
-
-		/*
-		 * if we are aborting the operation, no point in inserting the
-		 * object into the AIL as we are in a shutdown situation.
-		 */
-		if (aborted) {
-			ASSERT(xlog_is_shutdown(ailp->ail_log));
-			if (lip->li_ops->iop_unpin)
-				lip->li_ops->iop_unpin(lip, 1);
-			continue;
-		}
-
-		if (item_lsn != commit_lsn) {
-
-			/*
-			 * Not a bulk update option due to unusual item_lsn.
-			 * Push into AIL immediately, rechecking the lsn once
-			 * we have the ail lock. Then unpin the item. This does
-			 * not affect the AIL cursor the bulk insert path is
-			 * using.
-			 */
-			spin_lock(&ailp->ail_lock);
-			if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
-				xfs_trans_ail_update(ailp, lip, item_lsn);
-			else
-				spin_unlock(&ailp->ail_lock);
-			if (lip->li_ops->iop_unpin)
-				lip->li_ops->iop_unpin(lip, 0);
-			continue;
-		}
-
-		/* Item is a candidate for bulk AIL insert.  */
-		log_items[i++] = lv->lv_item;
-		if (i >= LOG_ITEM_BATCH_SIZE) {
-			xfs_log_item_batch_insert(ailp, &cur, log_items,
-					LOG_ITEM_BATCH_SIZE, commit_lsn);
-			i = 0;
-		}
-	}
-
-	/* make sure we insert the remainder! */
-	if (i)
-		xfs_log_item_batch_insert(ailp, &cur, log_items, i, commit_lsn);
-
-	spin_lock(&ailp->ail_lock);
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
-}
-
 /*
  * Sort transaction items prior to running precommit operations. This will
  * attempt to order the items such that they will always be locked in the same
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index d5400150358e..52a45f0a5ef1 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -19,9 +19,6 @@ void	xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *);
 void	xfs_trans_del_item(struct xfs_log_item *);
 void	xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp);
 
-void	xfs_trans_committed_bulk(struct xfs_ail *ailp,
-				struct list_head *lv_chain,
-				xfs_lsn_t commit_lsn, bool aborted);
 /*
  * AIL traversal cursor.
  *
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 1/9] xfs: move and xfs_trans_committed_bulk
  2022-12-20 23:22 [PATCH 0/9 v3] xfs: byte-based grant head reservation tracking Dave Chinner
@ 2022-12-20 23:23 ` Dave Chinner
  0 siblings, 0 replies; 39+ messages in thread
From: Dave Chinner @ 2022-12-20 23:23 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Ever since the CIL and delayed logging were introduced,
xfs_trans_committed_bulk() has been a purely CIL checkpoint
completion function and not a transaction commit completion
function. Now that we are adding log specific updates to this
function, it really does not have anything to do with the
transaction subsystem - it is really log and log item level
functionality.

This should be part of the CIL code as it is the callback
that moves log items from the CIL checkpoint to the AIL. Move it
and rename it to xlog_cil_ail_insert().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_log_cil.c    | 132 +++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trans.c      | 129 ---------------------------------------
 fs/xfs/xfs_trans_priv.h |   3 -
 3 files changed, 131 insertions(+), 133 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index eccbfb99e894..a430ef863c55 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -683,6 +683,136 @@ xlog_cil_insert_items(
 	}
 }
 
+static inline void
+xlog_cil_ail_insert_batch(
+	struct xfs_ail		*ailp,
+	struct xfs_ail_cursor	*cur,
+	struct xfs_log_item	**log_items,
+	int			nr_items,
+	xfs_lsn_t		commit_lsn)
+{
+	int	i;
+
+	spin_lock(&ailp->ail_lock);
+	/* xfs_trans_ail_update_bulk drops ailp->ail_lock */
+	xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn);
+
+	for (i = 0; i < nr_items; i++) {
+		struct xfs_log_item *lip = log_items[i];
+
+		if (lip->li_ops->iop_unpin)
+			lip->li_ops->iop_unpin(lip, 0);
+	}
+}
+
+/*
+ * Take the checkpoint's log vector chain of items and insert the attached log
+ * items into the AIL. This uses bulk insertion techniques to minimise AIL
+ * lock traffic.
+ *
+ * If we are called with the aborted flag set, it is because a log write during
+ * a CIL checkpoint commit has failed. In this case, all the items in the
+ * checkpoint have already gone through iop_committed and iop_committing, which
+ * means that checkpoint commit abort handling is treated exactly the same as an
+ * iclog write error even though we haven't started any IO yet. Hence in this
+ * case all we need to do is iop_committed processing, followed by an
+ * iop_unpin(aborted) call.
+ *
+ * The AIL cursor is used to optimise the insert process. If commit_lsn is not
+ * at the end of the AIL, the insert cursor avoids the need to walk the AIL to
+ * find the insertion point on every xlog_cil_ail_insert_batch() call. This
+ * saves a lot of needless list walking and is a net win, even though it
+ * slightly increases the amount of AIL lock traffic to set it up and tear it
+ * down.
+ */
+static void
+xlog_cil_ail_insert(
+	struct xlog		*log,
+	struct list_head	*lv_chain,
+	xfs_lsn_t		commit_lsn,
+	bool			aborted)
+{
+#define LOG_ITEM_BATCH_SIZE	32
+	struct xfs_ail		*ailp = log->l_ailp;
+	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
+	struct xfs_log_vec	*lv;
+	struct xfs_ail_cursor	cur;
+	int			i = 0;
+
+	spin_lock(&ailp->ail_lock);
+	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
+	spin_unlock(&ailp->ail_lock);
+
+	/* unpin all the log items */
+	list_for_each_entry(lv, lv_chain, lv_list) {
+		struct xfs_log_item	*lip = lv->lv_item;
+		xfs_lsn_t		item_lsn;
+
+		if (aborted)
+			set_bit(XFS_LI_ABORTED, &lip->li_flags);
+
+		if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
+			lip->li_ops->iop_release(lip);
+			continue;
+		}
+
+		if (lip->li_ops->iop_committed)
+			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
+		else
+			item_lsn = commit_lsn;
+
+		/* item_lsn of -1 means the item needs no further processing */
+		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
+			continue;
+
+		/*
+		 * if we are aborting the operation, no point in inserting the
+		 * object into the AIL as we are in a shutdown situation.
+		 */
+		if (aborted) {
+			ASSERT(xlog_is_shutdown(ailp->ail_log));
+			if (lip->li_ops->iop_unpin)
+				lip->li_ops->iop_unpin(lip, 1);
+			continue;
+		}
+
+		if (item_lsn != commit_lsn) {
+
+			/*
+			 * Not a bulk update option due to unusual item_lsn.
+			 * Push into AIL immediately, rechecking the lsn once
+			 * we have the ail lock. Then unpin the item. This does
+			 * not affect the AIL cursor the bulk insert path is
+			 * using.
+			 */
+			spin_lock(&ailp->ail_lock);
+			if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
+				xfs_trans_ail_update(ailp, lip, item_lsn);
+			else
+				spin_unlock(&ailp->ail_lock);
+			if (lip->li_ops->iop_unpin)
+				lip->li_ops->iop_unpin(lip, 0);
+			continue;
+		}
+
+		/* Item is a candidate for bulk AIL insert.  */
+		log_items[i++] = lv->lv_item;
+		if (i >= LOG_ITEM_BATCH_SIZE) {
+			xlog_cil_ail_insert_batch(ailp, &cur, log_items,
+					LOG_ITEM_BATCH_SIZE, commit_lsn);
+			i = 0;
+		}
+	}
+
+	/* make sure we insert the remainder! */
+	if (i)
+		xlog_cil_ail_insert_batch(ailp, &cur, log_items, i, commit_lsn);
+
+	spin_lock(&ailp->ail_lock);
+	xfs_trans_ail_cursor_done(&cur);
+	spin_unlock(&ailp->ail_lock);
+}
+
 static void
 xlog_cil_free_logvec(
 	struct list_head	*lv_chain)
@@ -792,7 +922,7 @@ xlog_cil_committed(
 		spin_unlock(&ctx->cil->xc_push_lock);
 	}
 
-	xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, &ctx->lv_chain,
+	xlog_cil_ail_insert(ctx->cil->xc_log, &ctx->lv_chain,
 					ctx->start_lsn, abort);
 
 	xfs_extent_busy_sort(&ctx->busy_extents);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 7bd16fbff534..58c4e875eb12 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -715,135 +715,6 @@ xfs_trans_free_items(
 	}
 }
 
-static inline void
-xfs_log_item_batch_insert(
-	struct xfs_ail		*ailp,
-	struct xfs_ail_cursor	*cur,
-	struct xfs_log_item	**log_items,
-	int			nr_items,
-	xfs_lsn_t		commit_lsn)
-{
-	int	i;
-
-	spin_lock(&ailp->ail_lock);
-	/* xfs_trans_ail_update_bulk drops ailp->ail_lock */
-	xfs_trans_ail_update_bulk(ailp, cur, log_items, nr_items, commit_lsn);
-
-	for (i = 0; i < nr_items; i++) {
-		struct xfs_log_item *lip = log_items[i];
-
-		if (lip->li_ops->iop_unpin)
-			lip->li_ops->iop_unpin(lip, 0);
-	}
-}
-
-/*
- * Bulk operation version of xfs_trans_committed that takes a log vector of
- * items to insert into the AIL. This uses bulk AIL insertion techniques to
- * minimise lock traffic.
- *
- * If we are called with the aborted flag set, it is because a log write during
- * a CIL checkpoint commit has failed. In this case, all the items in the
- * checkpoint have already gone through iop_committed and iop_committing, which
- * means that checkpoint commit abort handling is treated exactly the same
- * as an iclog write error even though we haven't started any IO yet. Hence in
- * this case all we need to do is iop_committed processing, followed by an
- * iop_unpin(aborted) call.
- *
- * The AIL cursor is used to optimise the insert process. If commit_lsn is not
- * at the end of the AIL, the insert cursor avoids the need to walk
- * the AIL to find the insertion point on every xfs_log_item_batch_insert()
- * call. This saves a lot of needless list walking and is a net win, even
- * though it slightly increases that amount of AIL lock traffic to set it up
- * and tear it down.
- */
-void
-xfs_trans_committed_bulk(
-	struct xfs_ail		*ailp,
-	struct list_head	*lv_chain,
-	xfs_lsn_t		commit_lsn,
-	bool			aborted)
-{
-#define LOG_ITEM_BATCH_SIZE	32
-	struct xfs_log_item	*log_items[LOG_ITEM_BATCH_SIZE];
-	struct xfs_log_vec	*lv;
-	struct xfs_ail_cursor	cur;
-	int			i = 0;
-
-	spin_lock(&ailp->ail_lock);
-	xfs_trans_ail_cursor_last(ailp, &cur, commit_lsn);
-	spin_unlock(&ailp->ail_lock);
-
-	/* unpin all the log items */
-	list_for_each_entry(lv, lv_chain, lv_list) {
-		struct xfs_log_item	*lip = lv->lv_item;
-		xfs_lsn_t		item_lsn;
-
-		if (aborted)
-			set_bit(XFS_LI_ABORTED, &lip->li_flags);
-
-		if (lip->li_ops->flags & XFS_ITEM_RELEASE_WHEN_COMMITTED) {
-			lip->li_ops->iop_release(lip);
-			continue;
-		}
-
-		if (lip->li_ops->iop_committed)
-			item_lsn = lip->li_ops->iop_committed(lip, commit_lsn);
-		else
-			item_lsn = commit_lsn;
-
-		/* item_lsn of -1 means the item needs no further processing */
-		if (XFS_LSN_CMP(item_lsn, (xfs_lsn_t)-1) == 0)
-			continue;
-
-		/*
-		 * if we are aborting the operation, no point in inserting the
-		 * object into the AIL as we are in a shutdown situation.
-		 */
-		if (aborted) {
-			ASSERT(xlog_is_shutdown(ailp->ail_log));
-			if (lip->li_ops->iop_unpin)
-				lip->li_ops->iop_unpin(lip, 1);
-			continue;
-		}
-
-		if (item_lsn != commit_lsn) {
-
-			/*
-			 * Not a bulk update option due to unusual item_lsn.
-			 * Push into AIL immediately, rechecking the lsn once
-			 * we have the ail lock. Then unpin the item. This does
-			 * not affect the AIL cursor the bulk insert path is
-			 * using.
-			 */
-			spin_lock(&ailp->ail_lock);
-			if (XFS_LSN_CMP(item_lsn, lip->li_lsn) > 0)
-				xfs_trans_ail_update(ailp, lip, item_lsn);
-			else
-				spin_unlock(&ailp->ail_lock);
-			if (lip->li_ops->iop_unpin)
-				lip->li_ops->iop_unpin(lip, 0);
-			continue;
-		}
-
-		/* Item is a candidate for bulk AIL insert.  */
-		log_items[i++] = lv->lv_item;
-		if (i >= LOG_ITEM_BATCH_SIZE) {
-			xfs_log_item_batch_insert(ailp, &cur, log_items,
-					LOG_ITEM_BATCH_SIZE, commit_lsn);
-			i = 0;
-		}
-	}
-
-	/* make sure we insert the remainder! */
-	if (i)
-		xfs_log_item_batch_insert(ailp, &cur, log_items, i, commit_lsn);
-
-	spin_lock(&ailp->ail_lock);
-	xfs_trans_ail_cursor_done(&cur);
-	spin_unlock(&ailp->ail_lock);
-}
-
 /*
  * Sort transaction items prior to running precommit operations. This will
  * attempt to order the items such that they will always be locked in the same
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index d5400150358e..52a45f0a5ef1 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -19,9 +19,6 @@ void	xfs_trans_add_item(struct xfs_trans *, struct xfs_log_item *);
 void	xfs_trans_del_item(struct xfs_log_item *);
 void	xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp);
 
-void	xfs_trans_committed_bulk(struct xfs_ail *ailp,
-				struct list_head *lv_chain,
-				xfs_lsn_t commit_lsn, bool aborted);
 /*
  * AIL traversal cursor.
  *
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread
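The comment above xlog_cil_ail_insert() describes the other half of the
optimisation: because every item in the checkpoint lands at (or near)
commit_lsn, the insertion point is looked up once with
xfs_trans_ail_cursor_last() and reused for every batch rather than
re-walking the AIL each time. A minimal userspace sketch of that
insertion-cursor idea - all names and types below are hypothetical
stand-ins, not XFS code:

	#include <stdio.h>
	#include <stdlib.h>

	struct node {
		long		lsn;	/* stand-in for an LSN */
		struct node	*next;
	};

	/* Insert @lsn into the sorted list, searching from @cursor. */
	static struct node *
	insert_from_cursor(struct node **head, struct node *cursor, long lsn)
	{
		struct node	*new = malloc(sizeof(*new));
		struct node	**link;

		new->lsn = lsn;

		/* Only fall back to a full walk from the head if the
		 * cursor is ahead of the value being inserted. */
		if (cursor && cursor->lsn <= lsn)
			link = &cursor->next;
		else
			link = head;
		while (*link && (*link)->lsn <= lsn)
			link = &(*link)->next;

		new->next = *link;
		*link = new;
		return new;	/* the new node becomes the cursor */
	}

	int main(void)
	{
		struct node	*head = NULL, *cursor = NULL;

		/* Every insert shares the same "commit lsn", so after the
		 * first one each subsequent insert is O(1) from the cursor. */
		for (int i = 0; i < 8; i++)
			cursor = insert_from_cursor(&head, cursor, 100);

		for (struct node *n = head; n; n = n->next)
			printf("%ld ", n->lsn);
		printf("\n");
		return 0;	/* list intentionally leaked; only a sketch */
	}

Releasing the cursor when the loop is done corresponds to the
xfs_trans_ail_cursor_done() call at the end of xlog_cil_ail_insert().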

end of thread, other threads:[~2023-10-12  8:54 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-09 23:03 [PATCH 0/9 v2] xfs: byte-base grant head reservation tracking Dave Chinner
2022-08-09 23:03 ` [PATCH 1/9] xfs: move and rename xfs_trans_committed_bulk Dave Chinner
2022-08-10 14:17   ` kernel test robot
2022-08-10 17:08   ` kernel test robot
2022-08-22 15:03   ` Darrick J. Wong
2022-09-07 13:51   ` Christoph Hellwig
2022-08-09 23:03 ` [PATCH 2/9] xfs: AIL doesn't need manual pushing Dave Chinner
2022-08-22 17:08   ` Darrick J. Wong
2022-08-23  1:51     ` Dave Chinner
2022-08-26 15:46       ` Darrick J. Wong
2022-09-07 14:01   ` Christoph Hellwig
2023-10-12  8:44     ` Christoph Hellwig
2022-08-09 23:03 ` [PATCH 3/9] xfs: background AIL push targets physical space, not grant space Dave Chinner
2022-08-22 19:00   ` Darrick J. Wong
2022-08-23  2:01     ` Dave Chinner
2022-08-26 15:47       ` Darrick J. Wong
2022-08-26 23:49         ` Darrick J. Wong
2022-09-07 14:04   ` Christoph Hellwig
2022-08-09 23:03 ` [PATCH 4/9] xfs: ensure log tail is always up to date Dave Chinner
2022-08-23  0:33   ` Darrick J. Wong
2022-08-23  2:18     ` Dave Chinner
2022-08-26 21:39       ` Darrick J. Wong
2022-08-26 23:49         ` Darrick J. Wong
2022-09-07 14:06   ` Christoph Hellwig
2022-08-09 23:03 ` [PATCH 5/9] xfs: l_last_sync_lsn is really AIL state Dave Chinner
2022-08-26 22:19   ` Darrick J. Wong
2022-09-07 14:11   ` Christoph Hellwig
2022-08-09 23:03 ` [PATCH 6/9] xfs: collapse xlog_state_set_callback in caller Dave Chinner
2022-08-26 22:20   ` Darrick J. Wong
2022-09-07 14:12   ` Christoph Hellwig
2022-08-09 23:03 ` [PATCH 7/9] xfs: track log space pinned by the AIL Dave Chinner
2022-08-26 22:39   ` Darrick J. Wong
2022-08-09 23:03 ` [PATCH 8/9] xfs: pass the full grant head to accounting functions Dave Chinner
2022-08-26 22:25   ` Darrick J. Wong
2022-08-09 23:03 ` [PATCH 9/9] xfs: grant heads track byte counts, not LSNs Dave Chinner
2022-08-26 23:45   ` Darrick J. Wong
2022-12-20 23:22 [PATCH 0/9 v3] xfs: byte-based grant head reservation tracking Dave Chinner
2022-12-20 23:23 ` [PATCH 1/9] xfs: move and rename xfs_trans_committed_bulk Dave Chinner
2023-09-21  1:48 [PATCH 0/9] xfs: byte-based grant head reservation tracking Dave Chinner
2023-09-21  1:48 ` [PATCH 1/9] xfs: move and rename xfs_trans_committed_bulk Dave Chinner
2023-10-12  8:54   ` Christoph Hellwig
