From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 30/39] xfs: implement percpu cil space used calculation
Date: Thu, 3 Jun 2021 09:47:47 +1000
Message-ID: <20210602234747.GY664593@dread.disaster.area>
In-Reply-To: <20210527184121.GM202144@locust>

On Thu, May 27, 2021 at 11:41:21AM -0700, Darrick J. Wong wrote:
> On Wed, May 19, 2021 at 10:13:08PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Now that we have the CIL percpu structures in place, implement the
> > space used counter with a fast sum check similar to the
> > percpu_counter infrastructure.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_log_cil.c  | 61 ++++++++++++++++++++++++++++++++++++++-----
> >  fs/xfs/xfs_log_priv.h |  2 +-
> >  2 files changed, 55 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index ba1c6979a4c7..72693fba929b 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -76,6 +76,24 @@ xlog_cil_ctx_alloc(void)
> >  	return ctx;
> >  }
> >  
> > +/*
> > + * Aggregate the CIL per cpu structures into global counts, lists, etc and
> > + * clear the percpu state ready for the next context to use.
> > + */
> > +static void
> > +xlog_cil_pcp_aggregate(
> > +	struct xfs_cil		*cil,
> > +	struct xfs_cil_ctx	*ctx)
> > +{
> > +	struct xlog_cil_pcp	*cilpcp;
> > +	int			cpu;
> > +
> > +	for_each_online_cpu(cpu) {
> > +		cilpcp = per_cpu_ptr(cil->xc_pcp, cpu);
> > +		cilpcp->space_used = 0;
> 
> How does this aggregate anything?  All I see here is zeroing a counter?

Yup, zeroing all the percpu counters is an aggregation function....

By definition "aggregate != sum".

An aggregate is formed by the collection of discrete units into a
larger whole; the collective definition involves manipulating all
discrete units as a single whole entity. e.g. a percpu counter is
an aggregate of percpu variables that, via aggregation, can sum the
discrete variables into a single value. IOWs, percpu_counter_sum()
is an aggregation function that sums...

> I see that we /can/ add the percpu space_used counter to the cil context
> if we're over the space limits, but I don't actually see where...

In this case, the global CIL space used counter is summed by the
per-cpu counter update context and not an aggregation context. For
it to work as a global counter from a distinct point in time, it
needs an aggregation operation that zeros all the discrete units of
the counter at a single point in time. IOWs, the aggregation
function of this counter is a zeroing operation, not a summing
operation. This is what xlog_cil_pcp_aggregate() is doing here.

Put simply, an aggregation function is not a summing function, but a
function that operates on all the discrete units of the
aggregate so that it can operate correctly as a single unit....

I don't know of a better way of describing what this function does.
At the end of the series, this function will zero some units. In
other cases it will sum units. In some cases it will do both. Not to
mention that it will merge discrete lists into a global list. And so
on. The only common thing between these operations is that they are
all aggregation functions that allow the CIL context to operate as a
whole unit...
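
To make the distinction concrete, here is a minimal userspace sketch.
It is not the kernel code: struct pcp_model, PCP_MODEL_NR_CPUS and the
pcp_model_*() helpers are hypothetical names invented for illustration,
and a plain array stands in for real per-cpu storage. It only shows
that "sum every unit" and "zero every unit" are both operations over
the whole set of discrete units:

```c
#include <assert.h>
#include <stddef.h>

#define PCP_MODEL_NR_CPUS 4	/* hypothetical fixed CPU count for the model */

/* Userspace stand-in for one per-cpu unit; not the kernel structure. */
struct pcp_model {
	long space_used;
};

/* Aggregation that sums: visit every unit and leave each one untouched. */
static long pcp_model_sum(const struct pcp_model *pcp, size_t n)
{
	long total = 0;

	for (size_t i = 0; i < n; i++)
		total += pcp[i].space_used;
	return total;
}

/* Aggregation that zeroes: visit every unit and reset it for the next user. */
static void pcp_model_zero(struct pcp_model *pcp, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pcp[i].space_used = 0;
}

/* Usage: both helpers treat the array of units as one entity. */
static void pcp_model_demo(void)
{
	struct pcp_model pcp[PCP_MODEL_NR_CPUS] = { {10}, {20}, {30}, {40} };

	assert(pcp_model_sum(pcp, PCP_MODEL_NR_CPUS) == 100);
	pcp_model_zero(pcp, PCP_MODEL_NR_CPUS);
	assert(pcp_model_sum(pcp, PCP_MODEL_NR_CPUS) == 0);
}
```

In this model the zeroing helper plays the role described for
xlog_cil_pcp_aggregate() at this point in the series; the summing
helper is only there for contrast.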

If you've got a better name, then I'm all ears :)

....

> > @@ -480,16 +501,34 @@ xlog_cil_insert_items(
> >  		atomic_sub(tp->t_ticket->t_iclog_hdrs, &cil->xc_iclog_hdrs);
> >  	}
> >  
> > +	/*
> > +	 * Update the CIL percpu pointer. This updates the global counter when
> > +	 * over the percpu batch size or when the CIL is over the space limit.
> > +	 * This means low lock overhead for normal updates, and when over the
> > +	 * limit the space used is immediately accounted. This makes enforcing
> > +	 * the hard limit much more accurate. The per cpu fold threshold is
> > +	 * based on how close we are to the hard limit.
> > +	 */
> > +	cilpcp = get_cpu_ptr(cil->xc_pcp);
> > +	cilpcp->space_used += len;
> > +	if (space_used >= XLOG_CIL_SPACE_LIMIT(log) ||
> > +	    cilpcp->space_used >
> > +			((XLOG_CIL_BLOCKING_SPACE_LIMIT(log) - space_used) /
> > +					num_online_cpus())) {
> > +		atomic_add(cilpcp->space_used, &ctx->space_used);
> > +		cilpcp->space_used = 0;
> > +	}
> > +	put_cpu_ptr(cilpcp);
> > +
> >  	spin_lock(&cil->xc_cil_lock);
> > -	tp->t_ticket->t_curr_res -= ctx_res + len;
> >  	ctx->ticket->t_unit_res += ctx_res;
> >  	ctx->ticket->t_curr_res += ctx_res;
> > -	ctx->space_used += len;
> 
> ...this update happens if we're not over the space limit?

It's the second case in the above if statement. Once the space used in
the percpu counter goes over its fraction of the remaining space
limit (limit remaining / num_online_cpus()), it adds the
pcp counter back into the global counter. Essentially it is:

	if (over push threshold ||
>>>>>>	    pcp->used > ((hard limit - ctx->space_used) / cpus)) {
		ctx->space_used += pcp->used;
		pcp->used = 0;
	}

Hence, to begin with, the percpu counter is allowed to sum a large
chunk of space before it trips the per CPU summing threshold. When
summing occurs, the per-cpu threshold goes down, meaning the pcp
counters will trip sooner in the next cycle.

IOWs, the summing threshold gets closer to zero the closer the
global count gets to the hard limit. Hence when there's lots of
space available, we have little summing contention, but when we
are close to the blocking limit we essentially update the global
counter on every modification.
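
The shrinking threshold can be sketched as a small userspace model.
This is an illustration under stated assumptions, not the patch itself:
the limits are made-up constants (the kernel derives the real ones from
the log size via XLOG_CIL_SPACE_LIMIT() and
XLOG_CIL_BLOCKING_SPACE_LIMIT()), struct cil_model and cil_model_add()
are hypothetical names, and there is no atomicity or preemption
handling because the model is single threaded:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limits chosen to keep the arithmetic easy to follow. */
#define CIL_MODEL_SOFT_LIMIT	1000L	/* stands in for the push threshold */
#define CIL_MODEL_HARD_LIMIT	2000L	/* stands in for the blocking limit */
#define CIL_MODEL_NR_CPUS	4

struct cil_model {
	long ctx_space_used;				/* global counter */
	long pcp_space_used[CIL_MODEL_NR_CPUS];		/* per-cpu counters */
};

/*
 * Account len bytes against one CPU. Fold the per-cpu count into the
 * global counter when the global count is already over the soft limit,
 * or when this CPU holds more than its share of the space remaining
 * before the hard limit. Returns true if a fold happened.
 */
static bool cil_model_add(struct cil_model *cil, int cpu, long len)
{
	long global = cil->ctx_space_used;	/* sampled before the update */
	long *pcp = &cil->pcp_space_used[cpu];

	*pcp += len;
	if (global >= CIL_MODEL_SOFT_LIMIT ||
	    *pcp > (CIL_MODEL_HARD_LIMIT - global) / CIL_MODEL_NR_CPUS) {
		cil->ctx_space_used += *pcp;
		*pcp = 0;
		return true;
	}
	return false;
}

/* Usage: the fold threshold drops as the global count grows. */
static void cil_model_demo(void)
{
	struct cil_model cil = { 0 };

	assert(!cil_model_add(&cil, 0, 400));	/* 400 <= 2000 / 4 */
	assert(cil_model_add(&cil, 0, 200));	/* 600 >  2000 / 4 */
	assert(cil.ctx_space_used == 600);
}
```

With these made-up numbers the first fold threshold is 2000 / 4 = 500.
After 600 has been folded into the global counter it drops to
(2000 - 600) / 4 = 350, and once the global count reaches the soft
limit of 1000 every update folds immediately.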

As such, we get scalability when the CIL is empty by trading off
accuracy, but we get accuracy when it is nearing full by trading off
scalability. We might need to tweak it for really large CPU counts
(maybe use log2(num_online_cpus())), but fundamentally the algorithm
is designed to scale according to how close we are to the push
thresholds....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
