All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 5/8] xfs: CIL checkpoint flushes caches unconditionally
Date: Wed, 24 Feb 2021 12:57:42 -0800	[thread overview]
Message-ID: <20210224205742.GU7272@magnolia> (raw)
In-Reply-To: <20210223033442.3267258-6-david@fromorbit.com>

On Tue, Feb 23, 2021 at 02:34:39PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently every journal IO is issued as REQ_PREFLUSH | REQ_FUA to
> guarantee the ordering requirements the journal has w.r.t. metadata
> writeback. THe two ordering constraints are:
> 
> 1. we cannot overwrite metadata in the journal until we guarantee
> that the dirty metadata has been written back in place and is
> stable.
> 
> 2. we cannot write back dirty metadata until it has been written to
> the journal and guaranteed to be stable (and hence recoverable) in
> the journal.
> 
> These rules apply to the atomic transactions recorded in the
> journal, not to the journal IO itself. Hence we need to ensure
> metadata is stable before we start writing a new transaction to the
> journal (guarantee #1), and we need to ensure the entire transaction
> is stable in the journal before we start metadata writeback
> (guarantee #2).
> 
> The ordering guarantees of #1 are currently provided by REQ_PREFLUSH
> being added to every iclog IO. This causes the journal IO to issue a
> cache flush and wait for it to complete before issuing the write IO
> to the journal. Hence all completed metadata IO is guaranteed to be
> stable before the journal overwrites the old metadata.
> 
> However, for long running CIL checkpoints that might do a thousand
> journal IOs, we don't need every single one of these iclog IOs to
> issue a cache flush - the cache flush done before the first iclog is
> submitted is sufficient to cover the entire range in the log that
> the checkpoint will overwrite because the CIL space reservation
> guarantees the tail of the log (completed metadata) is already
> beyond the range of the checkpoint write.
> 
> Hence we only need a full cache flush between closing off the CIL
> checkpoint context (i.e. when the push switches it out) and issuing
> the first journal IO. Rather than plumbing this through to the
> journal IO, we can start this cache flush the moment the CIL context
> is owned exclusively by the push worker. The cache flush can be in
> progress while we process the CIL ready for writing, hence
> reducing the latency of the initial iclog write. This is especially
> true for large checkpoints, where we might have to process hundreds
> of thousands of log vectors before we issue the first iclog write.
> In these cases, it is likely the cache flush has already been
> completed by the time we have built the CIL log vector chain.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_log_cil.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index c5cc1b7ad25e..8bcacd463f06 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -656,6 +656,7 @@ xlog_cil_push_work(
>  	struct xfs_log_vec	lvhdr = { NULL };
>  	xfs_lsn_t		commit_lsn;
>  	xfs_lsn_t		push_seq;
> +	DECLARE_COMPLETION_ONSTACK(bdev_flush);
>  
>  	new_ctx = kmem_zalloc(sizeof(*new_ctx), KM_NOFS);
>  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
> @@ -719,10 +720,24 @@ xlog_cil_push_work(
>  	spin_unlock(&cil->xc_push_lock);
>  
>  	/*
> -	 * pull all the log vectors off the items in the CIL, and
> -	 * remove the items from the CIL. We don't need the CIL lock
> -	 * here because it's only needed on the transaction commit
> -	 * side which is currently locked out by the flush lock.
> +	 * The CIL is stable at this point - nothing new will be added to it
> +	 * because we hold the flush lock exclusively. Hence we can now issue
> +	 * a cache flush to ensure all the completed metadata in the journal we
> +	 * are about to overwrite is on stable storage.
> +	 *
> +	 * This avoids the need to have the iclogs issue REQ_PREFLUSH based
> +	 * cache flushes to provide this ordering guarantee, and hence for CIL
> +	 * checkpoints that require hundreds or thousands of log writes no
> +	 * longer need to issue device cache flushes to provide metadata
> +	 * writeback ordering.
> +	 */
> +	xfs_flush_bdev_async(log->l_mp->m_ddev_targp->bt_bdev, &bdev_flush);
> +
> +	/*
> +	 * Pull all the log vectors off the items in the CIL, and remove the
> +	 * items from the CIL. We don't need the CIL lock here because it's only
> +	 * needed on the transaction commit side which is currently locked out
> +	 * by the flush lock.
>  	 */
>  	lv = NULL;
>  	num_iovecs = 0;
> @@ -806,6 +821,12 @@ xlog_cil_push_work(
>  	lvhdr.lv_iovecp = &lhdr;
>  	lvhdr.lv_next = ctx->lv_chain;
>  
> +	/*
> +	 * Before we format and submit the first iclog, we have to ensure that
> +	 * the metadata writeback ordering cache flush is complete.
> +	 */
> +	wait_for_completion(&bdev_flush);
> +
>  	error = xlog_write(log, &lvhdr, tic, &ctx->start_lsn, NULL, 0, true);
>  	if (error)
>  		goto out_abort_free_ticket;
> -- 
> 2.28.0
> 

  parent reply	other threads:[~2021-02-24 20:58 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-23  3:34 [PATCH v2] xfs: various log stuff Dave Chinner
2021-02-23  3:34 ` [PATCH 1/8] xfs: log stripe roundoff is a property of the log Dave Chinner
2021-02-23 10:29   ` Chandan Babu R
2021-02-24 20:14   ` Darrick J. Wong
2021-02-25  8:32   ` Christoph Hellwig
2021-03-01 15:13   ` Brian Foster
2021-02-23  3:34 ` [PATCH 2/8] xfs: separate CIL commit record IO Dave Chinner
2021-02-23 12:12   ` Chandan Babu R
2021-02-24 20:34   ` Darrick J. Wong
2021-02-24 21:44     ` Dave Chinner
2021-02-24 23:06       ` Darrick J. Wong
2021-02-25  8:34       ` Christoph Hellwig
2021-02-25 20:47         ` Dave Chinner
2021-03-01  9:09           ` Christoph Hellwig
2021-03-03  0:11             ` Dave Chinner
2021-02-26  2:48         ` Darrick J. Wong
2021-02-28 16:36           ` Brian Foster
2021-02-28 23:46             ` Dave Chinner
2021-03-01 15:33               ` Brian Foster
2021-03-01 15:19   ` Brian Foster
2021-03-03  0:41     ` Dave Chinner
2021-03-03 15:22       ` Brian Foster
2021-03-04 22:57         ` Dave Chinner
2021-03-05  0:44       ` Dave Chinner
2021-02-23  3:34 ` [PATCH 3/8] xfs: move and rename xfs_blkdev_issue_flush Dave Chinner
2021-02-23 12:57   ` Chandan Babu R
2021-02-24 20:45   ` Darrick J. Wong
2021-02-24 22:01     ` Dave Chinner
2021-02-25  8:36   ` Christoph Hellwig
2021-02-23  3:34 ` [PATCH 4/8] xfs: async blkdev cache flush Dave Chinner
2021-02-23  5:29   ` Chaitanya Kulkarni
2021-02-23 14:02   ` Chandan Babu R
2021-02-24 20:51   ` Darrick J. Wong
2021-02-23  3:34 ` [PATCH 5/8] xfs: CIL checkpoint flushes caches unconditionally Dave Chinner
2021-02-24  7:16   ` Chandan Babu R
2021-02-24 20:57   ` Darrick J. Wong [this message]
2021-02-25  8:42   ` Christoph Hellwig
2021-02-25 21:07     ` Dave Chinner
2021-02-23  3:34 ` [PATCH 6/8] xfs: remove need_start_rec parameter from xlog_write() Dave Chinner
2021-02-24  7:17   ` Chandan Babu R
2021-02-24 20:59   ` Darrick J. Wong
2021-02-25  8:49   ` Christoph Hellwig
2021-02-25 20:55     ` Dave Chinner
2021-02-23  3:34 ` [PATCH 7/8] xfs: journal IO cache flush reductions Dave Chinner
2021-02-23  8:05   ` [PATCH 7/8 v2] " Dave Chinner
2021-02-24 12:27     ` Chandan Babu R
2021-02-24 20:32       ` Dave Chinner
2021-02-24 21:13     ` Darrick J. Wong
2021-02-24 22:03       ` Dave Chinner
2021-02-25  4:09     ` Chandan Babu R
2021-02-25  7:13       ` Chandan Babu R
2021-03-01  5:44       ` Dave Chinner
2021-03-01  5:56         ` Dave Chinner
2021-02-25  8:58     ` Christoph Hellwig
2021-02-25 21:06       ` Dave Chinner
2021-03-01 19:29     ` Brian Foster
2021-02-23  3:34 ` [PATCH 8/8] xfs: Fix CIL throttle hang when CIL space used going backwards Dave Chinner
2021-02-24 21:18   ` Darrick J. Wong
2021-02-24 22:05     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210224205742.GU7272@magnolia \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.