All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 25/45] xfs: reserve space and initialise xlog_op_header in item formatting
Date: Tue, 16 Mar 2021 10:53:01 -0400	[thread overview]
Message-ID: <YFDGTRSdC/PtkVLv@bfoster> (raw)
In-Reply-To: <20210305051143.182133-26-david@fromorbit.com>

On Fri, Mar 05, 2021 at 04:11:23PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Current xlog_write() adds op headers to the log manually for every
> log item region that is in the vector passed to it. While
> xlog_write() needs to stamp the transaction ID into the ophdr, we
> already know it's length, flags, clientid, etc at CIL commit time.
> 
> This means the only time that xlog write really needs to format and
> reserve space for a new ophdr is when a region is split across two
> iclogs. Adding the opheader and accounting for it as part of the
> normal formatted item region means we simplify the accounting
> of space used by a transaction and we don't have to special case
> reserving of space in for the ophdrs in xlog_write(). It also means
> we can largely initialise the ophdr in transaction commit instead
> of xlog_write, making the xlog_write formatting inner loop much
> tighter.
> 
> xlog_prepare_iovec() is now too large to stay as an inline function,
> so we move it out of line and into xfs_log.c.
> 
> Object sizes:
> text	   data	    bss	    dec	    hex	filename
> 1125934	 305951	    484	1432369	 15db31 fs/xfs/built-in.a.before
> 1123360	 305951	    484	1429795	 15d123 fs/xfs/built-in.a.after
> 
> So the code is a roughly 2.5kB smaller with xlog_prepare_iovec() now
> out of line, even though it grew in size itself.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Looks mostly reasonable, a couple or so questions...

>  fs/xfs/xfs_log.c     | 115 +++++++++++++++++++++++++++++--------------
>  fs/xfs/xfs_log.h     |  42 +++-------------
>  fs/xfs/xfs_log_cil.c |  25 +++++-----
>  3 files changed, 99 insertions(+), 83 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 429cb1e7cc67..98de45be80c0 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -89,6 +89,62 @@ xlog_iclogs_empty(
>  static int
>  xfs_log_cover(struct xfs_mount *);
>  
> +/*
> + * We need to make sure the buffer pointer returned is naturally aligned for the
> + * biggest basic data type we put into it. We have already accounted for this
> + * padding when sizing the buffer.
> + *
> + * However, this padding does not get written into the log, and hence we have to
> + * track the space used by the log vectors separately to prevent log space hangs
> + * due to inaccurate accounting (i.e. a leak) of the used log space through the
> + * CIL context ticket.
> + *
> + * We also add space for the xlog_op_header that describes this region in the
> + * log. This prepends the data region we return to the caller to copy their data
> + * into, so do all the static initialisation of the ophdr now. Because the ophdr
> + * is not 8 byte aligned, we have to be careful to ensure that we align the
> + * start of the buffer such that the region we return to the call is 8 byte
> + * aligned and packed against the tail of the ophdr.
> + */
> +void *
> +xlog_prepare_iovec(
> +	struct xfs_log_vec	*lv,
> +	struct xfs_log_iovec	**vecp,
> +	uint			type)
> +{
> +	struct xfs_log_iovec	*vec = *vecp;
> +	struct xlog_op_header	*oph;
> +	uint32_t		len;
> +	void			*buf;
> +
> +	if (vec) {
> +		ASSERT(vec - lv->lv_iovecp < lv->lv_niovecs);
> +		vec++;
> +	} else {
> +		vec = &lv->lv_iovecp[0];
> +	}
> +
> +	len = lv->lv_buf_len + sizeof(struct xlog_op_header);
> +	if (!IS_ALIGNED(len, sizeof(uint64_t))) {
> +		lv->lv_buf_len = round_up(len, sizeof(uint64_t)) -
> +					sizeof(struct xlog_op_header);
> +	}
> +
> +	vec->i_type = type;
> +	vec->i_addr = lv->lv_buf + lv->lv_buf_len;
> +
> +	oph = vec->i_addr;
> +	oph->oh_clientid = XFS_TRANSACTION;
> +	oph->oh_res2 = 0;
> +	oph->oh_flags = 0;
> +
> +	buf = vec->i_addr + sizeof(struct xlog_op_header);
> +	ASSERT(IS_ALIGNED((unsigned long)buf, sizeof(uint64_t)));

Why is it the buffer portion needs to be 8 byte aligned but not ->i_addr
itself?

> +
> +	*vecp = vec;
> +	return buf;
> +}
> +

Not worth changing now, but it's helpful to reduce the size of the patch
by separating out mechanical changes like moving functions around.

>  static void
>  xlog_grant_sub_space(
>  	struct xlog		*log,
...
> @@ -2149,18 +2205,7 @@ xlog_write_calc_vec_length(
>  			xlog_tic_add_region(ticket, vecp->i_len, vecp->i_type);
>  		}
>  	}
> -
> -	/* Don't account for regions with embedded ophdrs */
> -	if (optype && headers > 0) {
> -		headers--;
> -		if (optype & XLOG_START_TRANS) {
> -			ASSERT(headers >= 1);
> -			headers--;
> -		}
> -	}
> -
>  	ticket->t_res_num_ophdrs += headers;
> -	len += headers * sizeof(struct xlog_op_header);

Hm, this seems to suggest something was off wrt to ->t_res_num_ophdrs
prior to this change.  Granted this looks like it's just a debug field,
but the previous logic filtered out embedded op headers unconditionally
whereas now it looks like we go back to accounting them. Am I missing
something?

>  
>  	return len;
>  }
...
> @@ -2404,21 +2448,25 @@ xlog_write(
>  			ASSERT((unsigned long)ptr % sizeof(int32_t) == 0);
>  
>  			/*
> -			 * The XLOG_START_TRANS has embedded ophdrs for the
> -			 * start record and transaction header. They will always
> -			 * be the first two regions in the lv chain. Commit and
> -			 * unmount records also have embedded ophdrs.
> +			 * Regions always have their ophdr at the start of the
> +			 * region, except for:
> +			 * - a transaction start which has a start record ophdr
> +			 *   before the first region ophdr; and
> +			 * - the previous region didn't fully fit into an iclog
> +			 *   so needs a continuation ophdr to prepend the region
> +			 *   in this new iclog.
>  			 */
> -			if (optype) {
> -				ophdr = reg->i_addr;
> -				if (index)
> -					optype &= ~XLOG_START_TRANS;
> -			} else {
> +			ophdr = reg->i_addr;
> +			if (optype && index) {
> +				optype &= ~XLOG_START_TRANS;
> +			} else if (partial_copy) {
>                                  ophdr = xlog_write_setup_ophdr(ptr, ticket);
>  				xlog_write_adv_cnt(&ptr, &len, &log_offset,
>  					   sizeof(struct xlog_op_header));
>  				added_ophdr = true;
>  			}

So in the partial_copy continuation case we're still stamping an ophdr
directly into the iclog. Otherwise we're processing/modifying flags and
whatnot on the ophdr already stamped at commit time in the log vector.
However, this is Ok because a relog would reformat the op header.

> +			ophdr->oh_tid = cpu_to_be32(ticket->t_tid);
> +
>  			len += xlog_write_setup_copy(ticket, ophdr,
>  						     iclog->ic_size-log_offset,
>  						     reg->i_len,
> @@ -2436,20 +2484,11 @@ xlog_write(
>  				ophdr->oh_len = cpu_to_be32(copy_len -
>  						sizeof(struct xlog_op_header));
>  			}
> -			/*
> -			 * Copy region.
> -			 *
> -			 * Commit records just log an opheader, so
> -			 * we can have empty payloads with no data region to
> -			 * copy.  Hence we only copy the payload if the vector
> -			 * says it has data to copy.
> -			 */
> -			ASSERT(copy_len >= 0);
> -			if (copy_len > 0) {
> -				memcpy(ptr, reg->i_addr + copy_off, copy_len);
> -				xlog_write_adv_cnt(&ptr, &len, &log_offset,
> -						   copy_len);
> -			}
> +
> +			ASSERT(copy_len > 0);
> +			memcpy(ptr, reg->i_addr + copy_off, copy_len);
> +			xlog_write_adv_cnt(&ptr, &len, &log_offset, copy_len);
> +

I assume the checks in xlog_write_copy_finish() to require a minimum of
one op header worth of space in the iclog prevent doing a partial write
across an embedded op header boundary, but it would be nice to have an
assert or something that ensures that. For example, assert for something
like if a partial_copy occurs, partial_copy_len was at least the length
of an op header into the region.

>  			if (added_ophdr)
>  				copy_len += sizeof(struct xlog_op_header);
>  			record_cnt++;
...
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 0c81c13e2cf6..7a5e6bdb7876 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -181,13 +181,20 @@ xlog_cil_alloc_shadow_bufs(
>  		}
>  
>  		/*
> -		 * We 64-bit align the length of each iovec so that the start
> -		 * of the next one is naturally aligned.  We'll need to
> -		 * account for that slack space here. Then round nbytes up
> -		 * to 64-bit alignment so that the initial buffer alignment is
> -		 * easy to calculate and verify.
> +		 * We 64-bit align the length of each iovec so that the start of
> +		 * the next one is naturally aligned.  We'll need to account for
> +		 * that slack space here.
> +		 *

Related to my question above, I'm a little confused by the (preexisting)
comment. If the start of the next iovec is now the ophdr, doesn't that
mean the "start of the next one (iovec)" is technically no longer
naturally aligned?

Brian

> +		 * We also add the xlog_op_header to each region when
> +		 * formatting, but that's not accounted to the size of the item
> +		 * at this point. Hence we'll need an addition number of bytes
> +		 * for each vector to hold an opheader.
> +		 *
> +		 * Then round nbytes up to 64-bit alignment so that the initial
> +		 * buffer alignment is easy to calculate and verify.
>  		 */
> -		nbytes += niovecs * sizeof(uint64_t);
> +		nbytes += niovecs *
> +			(sizeof(uint64_t) + sizeof(struct xlog_op_header));
>  		nbytes = round_up(nbytes, sizeof(uint64_t));
>  
>  		/*
> @@ -433,11 +440,6 @@ xlog_cil_insert_items(
>  
>  	spin_lock(&cil->xc_cil_lock);
>  
> -	/* account for space used by new iovec headers  */
> -	iovhdr_res = diff_iovecs * sizeof(xlog_op_header_t);
> -	len += iovhdr_res;
> -	ctx->nvecs += diff_iovecs;
> -
>  	/* attach the transaction to the CIL if it has any busy extents */
>  	if (!list_empty(&tp->t_busy))
>  		list_splice_init(&tp->t_busy, &ctx->busy_extents);
> @@ -469,6 +471,7 @@ xlog_cil_insert_items(
>  	}
>  	tp->t_ticket->t_curr_res -= len;
>  	ctx->space_used += len;
> +	ctx->nvecs += diff_iovecs;
>  
>  	/*
>  	 * If we've overrun the reservation, dump the tx details before we move
> -- 
> 2.28.0
> 


  parent reply	other threads:[~2021-03-16 14:53 UTC|newest]

Thread overview: 145+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-05  5:10 [PATCH 00/45 v3] xfs: consolidated log and optimisation changes Dave Chinner
2021-03-05  5:10 ` [PATCH 01/45] xfs: initialise attr fork on inode create Dave Chinner
2021-03-08 22:20   ` Darrick J. Wong
2021-03-16  8:35   ` Christoph Hellwig
2021-03-05  5:11 ` [PATCH 02/45] xfs: log stripe roundoff is a property of the log Dave Chinner
2021-03-05  5:11 ` [PATCH 03/45] xfs: separate CIL commit record IO Dave Chinner
2021-03-08  8:34   ` Chandan Babu R
2021-03-15 14:40   ` Brian Foster
2021-03-16  8:40   ` Christoph Hellwig
2021-03-05  5:11 ` [PATCH 04/45] xfs: remove xfs_blkdev_issue_flush Dave Chinner
2021-03-08  9:31   ` Chandan Babu R
2021-03-08 22:21   ` Darrick J. Wong
2021-03-15 14:40   ` Brian Foster
2021-03-16  8:41   ` Christoph Hellwig
2021-03-05  5:11 ` [PATCH 05/45] xfs: async blkdev cache flush Dave Chinner
2021-03-08  9:48   ` Chandan Babu R
2021-03-08 22:24     ` Darrick J. Wong
2021-03-15 14:41       ` Brian Foster
2021-03-15 16:32         ` Darrick J. Wong
2021-03-16  8:43           ` Christoph Hellwig
2021-03-08 22:26   ` Darrick J. Wong
2021-03-15 14:42   ` Brian Foster
2021-03-05  5:11 ` [PATCH 06/45] xfs: CIL checkpoint flushes caches unconditionally Dave Chinner
2021-03-15 14:43   ` Brian Foster
2021-03-16  8:47   ` Christoph Hellwig
2021-03-05  5:11 ` [PATCH 07/45] xfs: remove need_start_rec parameter from xlog_write() Dave Chinner
2021-03-15 14:45   ` Brian Foster
2021-03-16 14:15   ` Christoph Hellwig
2021-03-05  5:11 ` [PATCH 08/45] xfs: journal IO cache flush reductions Dave Chinner
2021-03-08 10:49   ` Chandan Babu R
2021-03-08 12:25   ` Brian Foster
2021-03-09  1:13     ` Dave Chinner
2021-03-10 20:49       ` Brian Foster
2021-03-10 21:28         ` Dave Chinner
2021-03-05  5:11 ` [PATCH 09/45] xfs: Fix CIL throttle hang when CIL space used going backwards Dave Chinner
2021-03-05  5:11 ` [PATCH 10/45] xfs: reduce buffer log item shadow allocations Dave Chinner
2021-03-15 14:52   ` Brian Foster
2021-03-05  5:11 ` [PATCH 11/45] xfs: xfs_buf_item_size_segment() needs to pass segment offset Dave Chinner
2021-03-05  5:11 ` [PATCH 12/45] xfs: optimise xfs_buf_item_size/format for contiguous regions Dave Chinner
2021-03-05  5:11 ` [PATCH 13/45] xfs: xfs_log_force_lsn isn't passed a LSN Dave Chinner
2021-03-08 22:53   ` Darrick J. Wong
2021-03-11  0:26     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 14/45] xfs: AIL needs asynchronous CIL forcing Dave Chinner
2021-03-08 23:45   ` Darrick J. Wong
2021-03-05  5:11 ` [PATCH 15/45] xfs: CIL work is serialised, not pipelined Dave Chinner
2021-03-08 23:14   ` Darrick J. Wong
2021-03-08 23:38     ` Dave Chinner
2021-03-09  1:55       ` Darrick J. Wong
2021-03-09 22:35         ` Andi Kleen
2021-03-10  6:11           ` Dave Chinner
2021-03-05  5:11 ` [PATCH 16/45] xfs: type verification is expensive Dave Chinner
2021-03-05  5:11 ` [PATCH 17/45] xfs: No need for inode number error injection in __xfs_dir3_data_check Dave Chinner
2021-03-05  5:11 ` [PATCH 18/45] xfs: reduce debug overhead of dir leaf/node checks Dave Chinner
2021-03-05  5:11 ` [PATCH 19/45] xfs: factor out the CIL transaction header building Dave Chinner
2021-03-08 23:47   ` Darrick J. Wong
2021-03-16 14:50   ` Brian Foster
2021-03-05  5:11 ` [PATCH 20/45] xfs: only CIL pushes require a start record Dave Chinner
2021-03-09  0:07   ` Darrick J. Wong
2021-03-16 14:51   ` Brian Foster
2021-03-05  5:11 ` [PATCH 21/45] xfs: embed the xlog_op_header in the unmount record Dave Chinner
2021-03-09  0:15   ` Darrick J. Wong
2021-03-11  2:54     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 22/45] xfs: embed the xlog_op_header in the commit record Dave Chinner
2021-03-09  0:17   ` Darrick J. Wong
2021-03-05  5:11 ` [PATCH 23/45] xfs: log tickets don't need log client id Dave Chinner
2021-03-09  0:21   ` Darrick J. Wong
2021-03-09  1:19     ` Dave Chinner
2021-03-09  1:48       ` Darrick J. Wong
2021-03-11  3:01         ` Dave Chinner
2021-03-16 14:51   ` Brian Foster
2021-03-05  5:11 ` [PATCH 24/45] xfs: move log iovec alignment to preparation function Dave Chinner
2021-03-09  2:14   ` Darrick J. Wong
2021-03-16 14:51   ` Brian Foster
2021-03-05  5:11 ` [PATCH 25/45] xfs: reserve space and initialise xlog_op_header in item formatting Dave Chinner
2021-03-09  2:21   ` Darrick J. Wong
2021-03-11  3:29     ` Dave Chinner
2021-03-11  3:41       ` Darrick J. Wong
2021-03-16 14:54         ` Brian Foster
2021-03-16 14:53   ` Brian Foster [this message]
2021-05-19  3:18     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 26/45] xfs: log ticket region debug is largely useless Dave Chinner
2021-03-09  2:31   ` Darrick J. Wong
2021-03-16 14:55   ` Brian Foster
2021-05-19  3:27     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 27/45] xfs: pass lv chain length into xlog_write() Dave Chinner
2021-03-09  2:36   ` Darrick J. Wong
2021-03-11  3:37     ` Dave Chinner
2021-03-16 18:38   ` Brian Foster
2021-03-05  5:11 ` [PATCH 28/45] xfs: introduce xlog_write_single() Dave Chinner
2021-03-09  2:39   ` Darrick J. Wong
2021-03-11  4:19     ` Dave Chinner
2021-03-16 18:39   ` Brian Foster
2021-05-19  3:44     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 29/45] xfs:_introduce xlog_write_partial() Dave Chinner
2021-03-09  2:59   ` Darrick J. Wong
2021-03-11  4:33     ` Dave Chinner
2021-03-18 13:22   ` Brian Foster
2021-05-19  4:49     ` Dave Chinner
2021-05-20 12:33       ` Brian Foster
2021-05-27 18:03         ` Darrick J. Wong
2021-03-05  5:11 ` [PATCH 30/45] xfs: xlog_write() no longer needs contwr state Dave Chinner
2021-03-09  3:01   ` Darrick J. Wong
2021-03-05  5:11 ` [PATCH 31/45] xfs: CIL context doesn't need to count iovecs Dave Chinner
2021-03-09  3:16   ` Darrick J. Wong
2021-03-11  5:03     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 32/45] xfs: use the CIL space used counter for emptiness checks Dave Chinner
2021-03-10 23:01   ` Darrick J. Wong
2021-03-05  5:11 ` [PATCH 33/45] xfs: lift init CIL reservation out of xc_cil_lock Dave Chinner
2021-03-10 23:25   ` Darrick J. Wong
2021-03-11  5:42     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 34/45] xfs: rework per-iclog header CIL reservation Dave Chinner
2021-03-11  0:03   ` Darrick J. Wong
2021-03-11  6:03     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 35/45] xfs: introduce per-cpu CIL tracking sructure Dave Chinner
2021-03-11  0:11   ` Darrick J. Wong
2021-03-11  6:33     ` Dave Chinner
2021-03-11  6:42       ` Dave Chinner
2021-03-05  5:11 ` [PATCH 36/45] xfs: implement percpu cil space used calculation Dave Chinner
2021-03-11  0:20   ` Darrick J. Wong
2021-03-11  6:51     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 37/45] xfs: track CIL ticket reservation in percpu structure Dave Chinner
2021-03-11  0:26   ` Darrick J. Wong
2021-03-12  0:47     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 38/45] xfs: convert CIL busy extents to per-cpu Dave Chinner
2021-03-11  0:36   ` Darrick J. Wong
2021-03-12  1:15     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 39/45] xfs: Add order IDs to log items in CIL Dave Chinner
2021-03-11  1:00   ` Darrick J. Wong
2021-03-05  5:11 ` [PATCH 40/45] xfs: convert CIL to unordered per cpu lists Dave Chinner
2021-03-11  1:15   ` Darrick J. Wong
2021-03-12  2:18     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 41/45] xfs: move CIL ordering to the logvec chain Dave Chinner
2021-03-11  1:34   ` Darrick J. Wong
2021-03-12  2:29     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 42/45] xfs: __percpu_counter_compare() inode count debug too expensive Dave Chinner
2021-03-11  1:36   ` Darrick J. Wong
2021-03-05  5:11 ` [PATCH 43/45] xfs: avoid cil push lock if possible Dave Chinner
2021-03-11  1:47   ` Darrick J. Wong
2021-03-12  2:36     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 44/45] xfs: xlog_sync() manually adjusts grant head space Dave Chinner
2021-03-11  2:00   ` Darrick J. Wong
2021-03-16  3:04     ` Dave Chinner
2021-03-05  5:11 ` [PATCH 45/45] xfs: expanding delayed logging design with background material Dave Chinner
2021-03-11  2:30   ` Darrick J. Wong
2021-03-16  3:28     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YFDGTRSdC/PtkVLv@bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.