From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 44/45] xfs: xlog_sync() manually adjusts grant head space
Date: Wed, 10 Mar 2021 18:00:45 -0800
Message-ID: <20210311020045.GR3419940@magnolia>
In-Reply-To: <20210305051143.182133-45-david@fromorbit.com>

On Fri, Mar 05, 2021 at 04:11:42PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When xlog_sync() rounds off the tail of the iclog that is being
> flushed, it manually subtracts that space from the grant heads. This
> space is actually reserved by the transaction ticket that covers
> the xlog_sync() call from xlog_write(), but we don't plumb the
> ticket down far enough for it to account for the space consumed in
> the current log ticket.
> 
> The grant heads are hot, so we really should be accounting this to
> the ticket if we can, rather than adding thousands of extra grant
> head updates every CIL commit.
> 
> Interestingly, this actually indicates a potential log space overrun
> can occur when we force the log. By the time that xfs_log_force()
> pushes out an active iclog and consumes the roundoff space, the

Ok I was wondering about that when I was trying to figure out what all
this ticket space stealing code was doing.

So in addition to fixing the theoretical overrun, I guess the
performance fix here is that every time we write an iclog we might have
to move the grant heads forward so that we always write a full log
sector / log stripe unit?  And since a CIL context might write a lot of
iclogs, it's cheaper to make those grant adjustments to the CIL ticket
(which already asked for enough space to handle the roundoffs) since the
ticket only jumps in the hot path once when the ticket is ungranted?

If I got that right,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D
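
For readers following along, here is a minimal sketch of the roundoff
arithmetic being discussed: pad the bytes in an iclog up to a whole write
unit (a log sector or log stripe unit) and report how much padding was
added. sketch_iclog_roundoff() and its parameters are hypothetical names
for illustration only; this is not the kernel's xlog_calc_iclog_size().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: round an iclog payload of
 * 'size' bytes up to a multiple of 'unit' bytes (sector or stripe
 * unit). The padded length is stored in *count and the number of
 * padding bytes (the "roundoff") is returned.
 */
static uint32_t
sketch_iclog_roundoff(
	uint32_t	size,
	uint32_t	unit,
	uint32_t	*count)
{
	assert(unit > 0);

	/* integer round-up to the next multiple of unit */
	*count = ((size + unit - 1) / unit) * unit;
	return *count - size;
}
```

The returned roundoff is log space that gets written but carries no
payload, which is why something has to account for it on every iclog
write.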

> reservation for that roundoff space has been returned to the grant
> heads and is no longer covered by a reservation. In theory the
> roundoff added to log force on an already full log could push the
> write head past the tail. In practice, the CIL commit that writes to
> the log and needs the iclog pushed will have reserved space for
> roundoff, so when it releases the ticket there will still be
> physical space for the roundoff to be committed to the log, even
> though it is no longer reserved. This roundoff won't be enough space
> to allow a transaction to be woken if the log is full, so overruns
> should not actually occur in practice.
> 
> That said, it indicates that we should not release the CIL context
> log ticket until after we've released the commit iclog. It also
> means that xlog_sync() still needs the direct grant head
> manipulation if we don't provide it with a ticket. Log forces are
> rare when we are in fast paths running 1.5 million transactions/s
> that make the grant heads hot, so let's optimise the hot case and
> pass CIL log tickets down to the xlog_sync() code.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log.c      | 39 +++++++++++++++++++++++++--------------
>  fs/xfs/xfs_log_cil.c  | 19 ++++++++++++++-----
>  fs/xfs/xfs_log_priv.h |  3 ++-
>  3 files changed, 41 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index fd58c3213ebf..1c7d522b12cd 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -55,7 +55,8 @@ xlog_grant_push_ail(
>  STATIC void
>  xlog_sync(
>  	struct xlog		*log,
> -	struct xlog_in_core	*iclog);
> +	struct xlog_in_core	*iclog,
> +	struct xlog_ticket	*ticket);
>  #if defined(DEBUG)
>  STATIC void
>  xlog_verify_dest_ptr(
> @@ -535,7 +536,8 @@ __xlog_state_release_iclog(
>  int
>  xlog_state_release_iclog(
>  	struct xlog		*log,
> -	struct xlog_in_core	*iclog)
> +	struct xlog_in_core	*iclog,
> +	struct xlog_ticket	*ticket)
>  {
>  	lockdep_assert_held(&log->l_icloglock);
>  
> @@ -545,7 +547,7 @@ xlog_state_release_iclog(
>  	if (atomic_dec_and_test(&iclog->ic_refcnt) &&
>  	    __xlog_state_release_iclog(log, iclog)) {
>  		spin_unlock(&log->l_icloglock);
> -		xlog_sync(log, iclog);
> +		xlog_sync(log, iclog, ticket);
>  		spin_lock(&log->l_icloglock);
>  	}
>  
> @@ -898,7 +900,7 @@ xlog_unmount_write(
>  	else
>  		ASSERT(iclog->ic_state == XLOG_STATE_WANT_SYNC ||
>  		       iclog->ic_state == XLOG_STATE_IOERROR);
> -	error = xlog_state_release_iclog(log, iclog);
> +	error = xlog_state_release_iclog(log, iclog, tic);
>  	xlog_wait_on_iclog(iclog);
>  
>  	if (tic) {
> @@ -1930,7 +1932,8 @@ xlog_calc_iclog_size(
>  STATIC void
>  xlog_sync(
>  	struct xlog		*log,
> -	struct xlog_in_core	*iclog)
> +	struct xlog_in_core	*iclog,
> +	struct xlog_ticket	*ticket)
>  {
>  	unsigned int		count;		/* byte count of bwrite */
>  	unsigned int		roundoff;       /* roundoff to BB or stripe */
> @@ -1941,12 +1944,20 @@ xlog_sync(
>  
>  	count = xlog_calc_iclog_size(log, iclog, &roundoff);
>  
> -	/* move grant heads by roundoff in sync */
> -	xlog_grant_add_space(log, &log->l_reserve_head.grant, roundoff);
> -	xlog_grant_add_space(log, &log->l_write_head.grant, roundoff);
> +	/*
> +	 * If we have a ticket, account for the roundoff via the ticket
> +	 * reservation to avoid touching the hot grant heads needlessly.
> +	 * Otherwise, we have to move grant heads directly.
> +	 */
> +	if (ticket) {
> +		ticket->t_curr_res -= roundoff;
> +	} else {
> +		xlog_grant_add_space(log, &log->l_reserve_head.grant, roundoff);
> +		xlog_grant_add_space(log, &log->l_write_head.grant, roundoff);
> +	}
>  
>  	/* put cycle number in every block */
> -	xlog_pack_data(log, iclog, roundoff); 
> +	xlog_pack_data(log, iclog, roundoff);
>  
>  	/* real byte length */
>  	size = iclog->ic_offset;
> @@ -2187,7 +2198,7 @@ xlog_write_get_more_iclog_space(
>  	xlog_state_finish_copy(log, iclog, *record_cnt, *data_cnt);
>  	ASSERT(iclog->ic_state == XLOG_STATE_WANT_SYNC ||
>  	       iclog->ic_state == XLOG_STATE_IOERROR);
> -	error = xlog_state_release_iclog(log, iclog);
> +	error = xlog_state_release_iclog(log, iclog, ticket);
>  	spin_unlock(&log->l_icloglock);
>  	if (error)
>  		return error;
> @@ -2470,7 +2481,7 @@ xlog_write(
>  		ASSERT(optype & XLOG_COMMIT_TRANS);
>  		*commit_iclog = iclog;
>  	} else {
> -		error = xlog_state_release_iclog(log, iclog);
> +		error = xlog_state_release_iclog(log, iclog, ticket);
>  	}
>  	spin_unlock(&log->l_icloglock);
>  
> @@ -2929,7 +2940,7 @@ xlog_state_get_iclog_space(
>  		 * reference to the iclog.
>  		 */
>  		if (!atomic_add_unless(&iclog->ic_refcnt, -1, 1))
> -			error = xlog_state_release_iclog(log, iclog);
> +			error = xlog_state_release_iclog(log, iclog, ticket);
>  		spin_unlock(&log->l_icloglock);
>  		if (error)
>  			return error;
> @@ -3157,7 +3168,7 @@ xfs_log_force(
>  			atomic_inc(&iclog->ic_refcnt);
>  			lsn = be64_to_cpu(iclog->ic_header.h_lsn);
>  			xlog_state_switch_iclogs(log, iclog, 0);
> -			if (xlog_state_release_iclog(log, iclog))
> +			if (xlog_state_release_iclog(log, iclog, NULL))
>  				goto out_error;
>  
>  			if (be64_to_cpu(iclog->ic_header.h_lsn) != lsn)
> @@ -3250,7 +3261,7 @@ xlog_force_lsn(
>  		}
>  		atomic_inc(&iclog->ic_refcnt);
>  		xlog_state_switch_iclogs(log, iclog, 0);
> -		if (xlog_state_release_iclog(log, iclog))
> +		if (xlog_state_release_iclog(log, iclog, NULL))
>  			goto out_error;
>  		if (log_flushed)
>  			*log_flushed = 1;
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index d60c72ad391a..aef60f19ab05 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -804,6 +804,7 @@ xlog_cil_push_work(
>  	int			cpu;
>  	struct xlog_cil_pcp	*cilpcp;
>  	LIST_HEAD		(log_items);
> +	struct xlog_ticket	*ticket;
>  
>  	new_ctx = xlog_cil_ctx_alloc();
>  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
> @@ -1037,12 +1038,10 @@ xlog_cil_push_work(
>  	if (error)
>  		goto out_abort_free_ticket;
>  
> -	xfs_log_ticket_ungrant(log, ctx->ticket);
> -
>  	spin_lock(&commit_iclog->ic_callback_lock);
>  	if (commit_iclog->ic_state == XLOG_STATE_IOERROR) {
>  		spin_unlock(&commit_iclog->ic_callback_lock);
> -		goto out_abort;
> +		goto out_abort_free_ticket;
>  	}
>  	ASSERT_ALWAYS(commit_iclog->ic_state == XLOG_STATE_ACTIVE ||
>  		      commit_iclog->ic_state == XLOG_STATE_WANT_SYNC);
> @@ -1073,12 +1072,23 @@ xlog_cil_push_work(
>  		commit_iclog->ic_flags &= ~XLOG_ICL_NEED_FLUSH;
>  	}
>  
> +	/*
> +	 * Pull the ticket off the ctx so we can ungrant it after releasing the
> +	 * commit_iclog. The ctx may be freed by the time we return from
> +	 * releasing the commit_iclog (i.e. checkpoint has been completed and
> +	 * callback run) so we can't reference the ctx after the call to
> +	 * xlog_state_release_iclog().
> +	 */
> +	ticket = ctx->ticket;
> +
>  	/* release the hounds! */
>  	spin_lock(&log->l_icloglock);
>  	if (commit_iclog_sync && commit_iclog->ic_state == XLOG_STATE_ACTIVE)
>  		xlog_state_switch_iclogs(log, commit_iclog, 0);
> -	xlog_state_release_iclog(log, commit_iclog);
> +	xlog_state_release_iclog(log, commit_iclog, ticket);
>  	spin_unlock(&log->l_icloglock);
> +
> +	xfs_log_ticket_ungrant(log, ticket);
>  	return;
>  
>  out_skip:
> @@ -1089,7 +1099,6 @@ xlog_cil_push_work(
>  
>  out_abort_free_ticket:
>  	xfs_log_ticket_ungrant(log, ctx->ticket);
> -out_abort:
>  	ASSERT(XLOG_FORCED_SHUTDOWN(log));
>  	xlog_cil_committed(ctx);
>  }
> diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
> index 6a4160200417..3d43d3940757 100644
> --- a/fs/xfs/xfs_log_priv.h
> +++ b/fs/xfs/xfs_log_priv.h
> @@ -487,7 +487,8 @@ int	xlog_commit_record(struct xlog *log, struct xlog_ticket *ticket,
>  		struct xlog_in_core **iclog, xfs_lsn_t *lsn);
>  void	xlog_state_switch_iclogs(struct xlog *log, struct xlog_in_core *iclog,
>  		int eventual_size);
> -int	xlog_state_release_iclog(struct xlog *xlog, struct xlog_in_core *iclog);
> +int	xlog_state_release_iclog(struct xlog *xlog, struct xlog_in_core *iclog,
> +		struct xlog_ticket *ticket);
>  
>  void	xfs_log_ticket_ungrant(struct xlog *log, struct xlog_ticket *ticket);
>  void	xfs_log_ticket_regrant(struct xlog *log, struct xlog_ticket *ticket);
> -- 
> 2.28.0
> 
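
As a rough userspace model of the accounting change in the patch above:
with a ticket, roundoff is charged to the ticket's private reservation
and the shared grant heads are only touched again at ungrant time;
without a ticket, every roundoff updates both shared counters. All
sketch_* names and the 'updates' counter are assumptions made for
illustration, not the XFS structures or functions, and the model is
single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; none of these are the real XFS structures. */
struct sketch_grant_head {
	atomic_int_fast64_t	grant;		/* shared, contended counter */
	unsigned long		updates;	/* shared-counter touches (model only) */
};

struct sketch_ticket {
	int64_t			curr_res;	/* bytes still reserved by the ticket */
};

static void
sketch_grant_head_init(struct sketch_grant_head *head)
{
	atomic_init(&head->grant, 0);
	head->updates = 0;
}

static void
sketch_grant_add_space(struct sketch_grant_head *head, int64_t bytes)
{
	atomic_fetch_add(&head->grant, bytes);
	head->updates++;
}

/* Reserve 'bytes' on both heads and record them in the ticket. */
static void
sketch_ticket_reserve(struct sketch_grant_head *reserve,
		struct sketch_grant_head *write,
		struct sketch_ticket *ticket, int64_t bytes)
{
	sketch_grant_add_space(reserve, bytes);
	sketch_grant_add_space(write, bytes);
	ticket->curr_res = bytes;
}

/* Charge roundoff to the ticket if there is one, else to the heads. */
static void
sketch_account_roundoff(struct sketch_grant_head *reserve,
		struct sketch_grant_head *write,
		struct sketch_ticket *ticket, int64_t roundoff)
{
	if (ticket) {
		ticket->curr_res -= roundoff;	/* private, no contention */
		return;
	}
	sketch_grant_add_space(reserve, roundoff);
	sketch_grant_add_space(write, roundoff);
}

/* Hand the unused part of the reservation back to both heads. */
static void
sketch_ticket_ungrant(struct sketch_grant_head *reserve,
		struct sketch_grant_head *write,
		struct sketch_ticket *ticket)
{
	sketch_grant_add_space(reserve, -ticket->curr_res);
	sketch_grant_add_space(write, -ticket->curr_res);
	ticket->curr_res = 0;
}
```

In this model, three roundoffs charged to a ticket leave the grant heads
holding exactly the roundoff bytes after ungrant, with one shared update
per head for the reserve and one for the ungrant. The ticketless path
reaches the same totals but touches both heads on every roundoff.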
