All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 11/39] xfs: CIL work is serialised, not pipelined
Date: Thu,  3 Jun 2021 15:22:12 +1000	[thread overview]
Message-ID: <20210603052240.171998-12-david@fromorbit.com> (raw)
In-Reply-To: <20210603052240.171998-1-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

Because we use a single work structure attached to the CIL rather
than the CIL context, we can only queue a single work item at a
time. This results in the CIL being single threaded and limits
performance when it becomes CPU bound.

The design of the CIL is that it is pipelined and multiple commits
can be running concurrently, but the way the work is currently
implemented means that it is not pipelining as it was intended. The
critical work to switch the CIL context can take a few milliseconds
to run, but the rest of the CIL context flush can take hundreds of
milliseconds to complete. The context switching is the serialisation
point of the CIL, once the context has been switched the rest of the
context push can run asynchrnously with all other context pushes.

Hence we can move the work to the CIL context so that we can run
multiple CIL pushes at the same time and spread the majority of
the work out over multiple CPUs. We can keep the per-cpu CIL commit
state on the CIL rather than the context, because the context is
pinned to the CIL until the switch is done and we aggregate and
drain the per-cpu state held on the CIL during the context switch.

However, because we no longer serialise the CIL work, we can have
effectively unlimited CIL pushes in progress. We don't want to do
this - not only does it create contention on the iclogs and the
state machine locks, we can run the log right out of space with
outstanding pushes. Instead, limit the work concurrency to 4
concurrent works being processed at a time. This is enough
concurrency to remove the CIL from being a CPU bound bottleneck but
not enough to create new contention points or unbound concurrency
issues.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_log_cil.c  | 80 +++++++++++++++++++++++--------------------
 fs/xfs/xfs_log_priv.h |  2 +-
 fs/xfs/xfs_super.c    |  6 +++-
 3 files changed, 48 insertions(+), 40 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index cb849e67b1c4..713ea66d4c0c 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -47,6 +47,34 @@ xlog_cil_ticket_alloc(
 	return tic;
 }
 
+/*
+ * Unavoidable forward declaration - xlog_cil_push_work() calls
+ * xlog_cil_ctx_alloc() itself.
+ */
+static void xlog_cil_push_work(struct work_struct *work);
+
+static struct xfs_cil_ctx *
+xlog_cil_ctx_alloc(void)
+{
+	struct xfs_cil_ctx	*ctx;
+
+	ctx = kmem_zalloc(sizeof(*ctx), KM_NOFS);
+	INIT_LIST_HEAD(&ctx->committing);
+	INIT_LIST_HEAD(&ctx->busy_extents);
+	INIT_WORK(&ctx->push_work, xlog_cil_push_work);
+	return ctx;
+}
+
+static void
+xlog_cil_ctx_switch(
+	struct xfs_cil		*cil,
+	struct xfs_cil_ctx	*ctx)
+{
+	ctx->sequence = ++cil->xc_current_sequence;
+	ctx->cil = cil;
+	cil->xc_ctx = ctx;
+}
+
 /*
  * After the first stage of log recovery is done, we know where the head and
  * tail of the log are. We need this log initialisation done before we can
@@ -641,11 +669,11 @@ static void
 xlog_cil_push_work(
 	struct work_struct	*work)
 {
-	struct xfs_cil		*cil =
-		container_of(work, struct xfs_cil, xc_push_work);
+	struct xfs_cil_ctx	*ctx =
+		container_of(work, struct xfs_cil_ctx, push_work);
+	struct xfs_cil		*cil = ctx->cil;
 	struct xlog		*log = cil->xc_log;
 	struct xfs_log_vec	*lv;
-	struct xfs_cil_ctx	*ctx;
 	struct xfs_cil_ctx	*new_ctx;
 	struct xlog_in_core	*commit_iclog;
 	struct xlog_ticket	*tic;
@@ -660,11 +688,10 @@ xlog_cil_push_work(
 	DECLARE_COMPLETION_ONSTACK(bdev_flush);
 	bool			push_commit_stable;
 
-	new_ctx = kmem_zalloc(sizeof(*new_ctx), KM_NOFS);
+	new_ctx = xlog_cil_ctx_alloc();
 	new_ctx->ticket = xlog_cil_ticket_alloc(log);
 
 	down_write(&cil->xc_ctx_lock);
-	ctx = cil->xc_ctx;
 
 	spin_lock(&cil->xc_push_lock);
 	push_seq = cil->xc_push_seq;
@@ -696,7 +723,7 @@ xlog_cil_push_work(
 
 
 	/* check for a previously pushed sequence */
-	if (push_seq < cil->xc_ctx->sequence) {
+	if (push_seq < ctx->sequence) {
 		spin_unlock(&cil->xc_push_lock);
 		goto out_skip;
 	}
@@ -761,19 +788,7 @@ xlog_cil_push_work(
 	}
 
 	/*
-	 * initialise the new context and attach it to the CIL. Then attach
-	 * the current context to the CIL committing list so it can be found
-	 * during log forces to extract the commit lsn of the sequence that
-	 * needs to be forced.
-	 */
-	INIT_LIST_HEAD(&new_ctx->committing);
-	INIT_LIST_HEAD(&new_ctx->busy_extents);
-	new_ctx->sequence = ctx->sequence + 1;
-	new_ctx->cil = cil;
-	cil->xc_ctx = new_ctx;
-
-	/*
-	 * The switch is now done, so we can drop the context lock and move out
+	 * Switch the contexts so we can drop the context lock and move out
 	 * of a shared context. We can't just go straight to the commit record,
 	 * though - we need to synchronise with previous and future commits so
 	 * that the commit records are correctly ordered in the log to ensure
@@ -798,7 +813,7 @@ xlog_cil_push_work(
 	 * deferencing a freed context pointer.
 	 */
 	spin_lock(&cil->xc_push_lock);
-	cil->xc_current_sequence = new_ctx->sequence;
+	xlog_cil_ctx_switch(cil, new_ctx);
 	spin_unlock(&cil->xc_push_lock);
 	up_write(&cil->xc_ctx_lock);
 
@@ -970,7 +985,7 @@ xlog_cil_push_background(
 	spin_lock(&cil->xc_push_lock);
 	if (cil->xc_push_seq < cil->xc_current_sequence) {
 		cil->xc_push_seq = cil->xc_current_sequence;
-		queue_work(log->l_mp->m_cil_workqueue, &cil->xc_push_work);
+		queue_work(log->l_mp->m_cil_workqueue, &cil->xc_ctx->push_work);
 	}
 
 	/*
@@ -1036,7 +1051,7 @@ xlog_cil_push_now(
 
 	/* start on any pending background push to minimise wait time on it */
 	if (!async)
-		flush_work(&cil->xc_push_work);
+		flush_workqueue(log->l_mp->m_cil_workqueue);
 
 	/*
 	 * If the CIL is empty or we've already pushed the sequence then
@@ -1050,7 +1065,7 @@ xlog_cil_push_now(
 
 	cil->xc_push_seq = push_seq;
 	cil->xc_push_commit_stable = async;
-	queue_work(log->l_mp->m_cil_workqueue, &cil->xc_push_work);
+	queue_work(log->l_mp->m_cil_workqueue, &cil->xc_ctx->push_work);
 	spin_unlock(&cil->xc_push_lock);
 }
 
@@ -1290,13 +1305,6 @@ xlog_cil_init(
 	if (!cil)
 		return -ENOMEM;
 
-	ctx = kmem_zalloc(sizeof(*ctx), KM_MAYFAIL);
-	if (!ctx) {
-		kmem_free(cil);
-		return -ENOMEM;
-	}
-
-	INIT_WORK(&cil->xc_push_work, xlog_cil_push_work);
 	INIT_LIST_HEAD(&cil->xc_cil);
 	INIT_LIST_HEAD(&cil->xc_committing);
 	spin_lock_init(&cil->xc_cil_lock);
@@ -1304,16 +1312,12 @@ xlog_cil_init(
 	init_waitqueue_head(&cil->xc_push_wait);
 	init_rwsem(&cil->xc_ctx_lock);
 	init_waitqueue_head(&cil->xc_commit_wait);
-
-	INIT_LIST_HEAD(&ctx->committing);
-	INIT_LIST_HEAD(&ctx->busy_extents);
-	ctx->sequence = 1;
-	ctx->cil = cil;
-	cil->xc_ctx = ctx;
-	cil->xc_current_sequence = ctx->sequence;
-
 	cil->xc_log = log;
 	log->l_cilp = cil;
+
+	ctx = xlog_cil_ctx_alloc();
+	xlog_cil_ctx_switch(cil, ctx);
+
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index a863ccb5ece6..87447fa34c43 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -245,6 +245,7 @@ struct xfs_cil_ctx {
 	struct list_head	iclog_entry;
 	struct list_head	committing;	/* ctx committing list */
 	struct work_struct	discard_endio_work;
+	struct work_struct	push_work;
 };
 
 /*
@@ -277,7 +278,6 @@ struct xfs_cil {
 	struct list_head	xc_committing;
 	wait_queue_head_t	xc_commit_wait;
 	xfs_csn_t		xc_current_sequence;
-	struct work_struct	xc_push_work;
 	wait_queue_head_t	xc_push_wait;	/* background push throttle */
 } ____cacheline_aligned_in_smp;
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 9500ad115233..017ba9c24c2d 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -500,9 +500,13 @@ xfs_init_mount_workqueues(
 	if (!mp->m_unwritten_workqueue)
 		goto out_destroy_buf;
 
+	/*
+	 * Limit the CIL pipeline depth to 4 concurrent works to bound the
+	 * concurrency the log spinlocks will be exposed to.
+	 */
 	mp->m_cil_workqueue = alloc_workqueue("xfs-cil/%s",
 			XFS_WQFLAGS(WQ_FREEZABLE | WQ_MEM_RECLAIM | WQ_UNBOUND),
-			0, mp->m_super->s_id);
+			4, mp->m_super->s_id);
 	if (!mp->m_cil_workqueue)
 		goto out_destroy_unwritten;
 
-- 
2.31.1


  parent reply	other threads:[~2021-06-03  5:23 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-03  5:22 [PATCH 00/39 v5] xfs: CIL and log optimisations Dave Chinner
2021-06-03  5:22 ` [PATCH 01/39] xfs: log stripe roundoff is a property of the log Dave Chinner
2021-06-03  5:22 ` [PATCH 02/39] xfs: separate CIL commit record IO Dave Chinner
2021-06-03  5:22 ` [PATCH 03/39] xfs: remove xfs_blkdev_issue_flush Dave Chinner
2021-06-03  5:22 ` [PATCH 04/39] xfs: async blkdev cache flush Dave Chinner
2021-06-03  5:22 ` [PATCH 05/39] xfs: CIL checkpoint flushes caches unconditionally Dave Chinner
2021-06-03  5:22 ` [PATCH 06/39] xfs: remove need_start_rec parameter from xlog_write() Dave Chinner
2021-06-03 22:29   ` Allison Henderson
2021-06-03  5:22 ` [PATCH 07/39] xfs: journal IO cache flush reductions Dave Chinner
2021-06-03 22:29   ` Allison Henderson
2021-06-04  5:15   ` [xfs] 80d287de7b: stress-ng.dir.ops_per_sec 80.1% improvement kernel test robot
2021-06-04  5:15     ` kernel test robot
2021-06-03  5:22 ` [PATCH 08/39] xfs: Fix CIL throttle hang when CIL space used going backwards Dave Chinner
2021-06-03 22:29   ` Allison Henderson
2021-06-03  5:22 ` [PATCH 09/39] xfs: xfs_log_force_lsn isn't passed a LSN Dave Chinner
2021-06-03 22:29   ` Allison Henderson
2021-06-03  5:22 ` [PATCH 10/39] xfs: AIL needs asynchronous CIL forcing Dave Chinner
2021-06-03 22:29   ` Allison Henderson
2021-06-03  5:22 ` Dave Chinner [this message]
2021-06-03  5:22 ` [PATCH 12/39] xfs: factor out the CIL transaction header building Dave Chinner
2021-06-03  5:22 ` [PATCH 13/39] xfs: only CIL pushes require a start record Dave Chinner
2021-06-03  5:22 ` [PATCH 14/39] xfs: embed the xlog_op_header in the unmount record Dave Chinner
2021-06-03  5:22 ` [PATCH 15/39] xfs: embed the xlog_op_header in the commit record Dave Chinner
2021-06-03  5:22 ` [PATCH 16/39] xfs: log tickets don't need log client id Dave Chinner
2021-06-03  5:22 ` [PATCH 17/39] xfs: move log iovec alignment to preparation function Dave Chinner
2021-06-03  5:22 ` [PATCH 18/39] xfs: reserve space and initialise xlog_op_header in item formatting Dave Chinner
2021-06-03  5:22 ` [PATCH 19/39] xfs: log ticket region debug is largely useless Dave Chinner
2021-06-03  5:22 ` [PATCH 20/39] xfs: pass lv chain length into xlog_write() Dave Chinner
2021-06-03  5:22 ` [PATCH 21/39] xfs: introduce xlog_write_single() Dave Chinner
2021-06-03  5:22 ` [PATCH 22/39] xfs:_introduce xlog_write_partial() Dave Chinner
2021-06-03  5:22 ` [PATCH 23/39] xfs: xlog_write() no longer needs contwr state Dave Chinner
2021-06-03  5:22 ` [PATCH 24/39] xfs: xlog_write() doesn't need optype anymore Dave Chinner
2021-06-03  5:22 ` [PATCH 25/39] xfs: CIL context doesn't need to count iovecs Dave Chinner
2021-06-03  5:22 ` [PATCH 26/39] xfs: use the CIL space used counter for emptiness checks Dave Chinner
2021-06-03  5:22 ` [PATCH 27/39] xfs: lift init CIL reservation out of xc_cil_lock Dave Chinner
2021-06-03  5:22 ` [PATCH 28/39] xfs: rework per-iclog header CIL reservation Dave Chinner
2021-06-03  5:22 ` [PATCH 29/39] xfs: introduce per-cpu CIL tracking structure Dave Chinner
2021-06-03  9:18   ` kernel test robot
2021-06-03  9:18     ` kernel test robot
2021-06-03 10:08   ` kernel test robot
2021-06-03 10:08     ` kernel test robot
2021-06-03 16:49   ` Darrick J. Wong
2021-06-03  5:22 ` [PATCH 30/39] xfs: implement percpu cil space used calculation Dave Chinner
2021-06-03 16:44   ` Darrick J. Wong
2021-06-03  5:22 ` [PATCH 31/39] xfs: track CIL ticket reservation in percpu structure Dave Chinner
2021-06-03 16:43   ` Darrick J. Wong
2021-06-03  5:22 ` [PATCH 32/39] xfs: convert CIL busy extents to per-cpu Dave Chinner
2021-06-03  5:22 ` [PATCH 33/39] xfs: Add order IDs to log items in CIL Dave Chinner
2021-06-03  5:22 ` [PATCH 34/39] xfs: convert CIL to unordered per cpu lists Dave Chinner
2021-06-03  5:22 ` [PATCH 35/39] xfs: convert log vector chain to use list heads Dave Chinner
2021-06-03 10:16   ` kernel test robot
2021-06-03 10:16     ` kernel test robot
2021-06-03  5:22 ` [PATCH 36/39] xfs: move CIL ordering to the logvec chain Dave Chinner
2021-06-03  5:22 ` [PATCH 37/39] xfs: avoid cil push lock if possible Dave Chinner
2021-06-03  5:22 ` [PATCH 38/39] xfs: xlog_sync() manually adjusts grant head space Dave Chinner
2021-06-03  5:22 ` [PATCH 39/39] xfs: expanding delayed logging design with background material Dave Chinner
2021-06-03 17:05 ` [PATCH 00/39 v5] xfs: CIL and log optimisations Darrick J. Wong
2021-06-03 22:43   ` Dave Chinner
  -- strict thread matches above, loose matches on Subject: below --
2021-05-19 12:12 [PATCH 00/39 v4] " Dave Chinner
2021-05-19 12:12 ` [PATCH 11/39] xfs: CIL work is serialised, not pipelined Dave Chinner
2021-05-21  0:32   ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210603052240.171998-12-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.