From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: chandan.babu@oracle.com, chandanrlinux@gmail.com,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 11/14] xfs: dynamically allocate cursors based on maxlevels
Date: Wed, 22 Sep 2021 10:38:21 -0700
Message-ID: <20210922173821.GH570615@magnolia>
In-Reply-To: <20210920230635.GM1756565@dread.disaster.area>

On Tue, Sep 21, 2021 at 09:06:35AM +1000, Dave Chinner wrote:
> On Fri, Sep 17, 2021 at 06:30:10PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Replace the statically-sized btree cursor zone with dynamically sized
> > allocations so that we can reduce the memory overhead for per-AG bt
> > cursors while handling very tall btrees for rt metadata.
> 
> Hmmmmm. We do a *lot* of btree cursor allocation and freeing under
> load. Keeping that in a single slab rather than using heap memory is
> a good idea for stuff like this for many reasons...
> 
> I mean, if we are creating a million inodes a second, a rough
> back-of-the-envelope calculation says we are doing 3-4 million btree
> cursor instantiations a second. That's a lot of short term churn on
> the heap that we don't really need to subject it to. And even a few
> extra instructions in a path called millions of times a second adds
> up to a lot of extra runtime overhead.
> 
> So....
> 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/libxfs/xfs_btree.c |   40 ++++++++++++++++++++++++++++++++--------
> >  fs/xfs/libxfs/xfs_btree.h |    2 --
> >  fs/xfs/xfs_super.c        |   11 +----------
> >  3 files changed, 33 insertions(+), 20 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index 2486ba22c01d..f9516828a847 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -23,11 +23,6 @@
> >  #include "xfs_btree_staging.h"
> >  #include "xfs_ag.h"
> >  
> > -/*
> > - * Cursor allocation zone.
> > - */
> > -kmem_zone_t	*xfs_btree_cur_zone;
> > -
> >  /*
> >   * Btree magic numbers.
> >   */
> > @@ -379,7 +374,7 @@ xfs_btree_del_cursor(
> >  		kmem_free(cur->bc_ops);
> >  	if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && cur->bc_ag.pag)
> >  		xfs_perag_put(cur->bc_ag.pag);
> > -	kmem_cache_free(xfs_btree_cur_zone, cur);
> > +	kmem_free(cur);
> >  }
> >  
> >  /*
> > @@ -4927,6 +4922,32 @@ xfs_btree_has_more_records(
> >  		return block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK);
> >  }
> >  
> > +/* Compute the maximum allowed height for a given btree type. */
> > +static unsigned int
> > +xfs_btree_maxlevels(
> > +	struct xfs_mount	*mp,
> > +	xfs_btnum_t		btnum)
> > +{
> > +	switch (btnum) {
> > +	case XFS_BTNUM_BNO:
> > +	case XFS_BTNUM_CNT:
> > +		return mp->m_ag_maxlevels;
> > +	case XFS_BTNUM_BMAP:
> > +		return max(mp->m_bm_maxlevels[XFS_DATA_FORK],
> > +			   mp->m_bm_maxlevels[XFS_ATTR_FORK]);
> > +	case XFS_BTNUM_INO:
> > +	case XFS_BTNUM_FINO:
> > +		return M_IGEO(mp)->inobt_maxlevels;
> > +	case XFS_BTNUM_RMAP:
> > +		return mp->m_rmap_maxlevels;
> > +	case XFS_BTNUM_REFC:
> > +		return mp->m_refc_maxlevels;
> > +	default:
> > +		ASSERT(0);
> > +		return XFS_BTREE_MAXLEVELS;
> > +	}
> > +}
> > +
> >  /* Allocate a new btree cursor of the appropriate size. */
> >  struct xfs_btree_cur *
> >  xfs_btree_alloc_cursor(
> > @@ -4935,13 +4956,16 @@ xfs_btree_alloc_cursor(
> >  	xfs_btnum_t		btnum)
> >  {
> >  	struct xfs_btree_cur	*cur;
> > +	unsigned int		maxlevels = xfs_btree_maxlevels(mp, btnum);
> >  
> > -	cur = kmem_cache_zalloc(xfs_btree_cur_zone, GFP_NOFS | __GFP_NOFAIL);
> > +	ASSERT(maxlevels <= XFS_BTREE_MAXLEVELS);
> > +
> > +	cur = kmem_zalloc(xfs_btree_cur_sizeof(maxlevels), KM_NOFS);
> 
> Instead of multiple dynamic runtime calculations to determine the
> size to allocate from the heap, which then has to select a slab
> based on size, why don't we just pre-calculate the max size of
> the cursor at XFS module init and use that for the btree cursor slab
> size?

As part of developing the realtime rmapbt and reflink btrees, I computed
the maximum theoretical btree height for a maximally sized realtime
volume.  For a realtime volume with 2^52 blocks and a 1k block size, I
estimate that you'd need an 11-level rtrefcount btree cursor.  The rtrmap
btree cursor would have to be 28 levels high.  Using 4k blocks instead
of 1k blocks, it's not so bad -- 8 for rtrefcount and 17 for rtrmap.

I don't recall exactly what Chandan said the maximum bmbt height would
need to be to support really large data fork mapping structures, but
based on my worst case estimate of 2^54 single-block mappings and a 1k
blocksize, you'd need a 12-level bmbt cursor.  For 4k blocks, you'd need
only 8 levels.

The current XFS_BTREE_MAXLEVELS is 9, and a 9-level cursor just so
happens to fit in 248 bytes.  I will rework this patch to make
xfs_btree_cur_zone supply 256-byte cursors, and the btree code will
continue using the zone whenever 256 bytes is enough space for the
cursor.

If we decide later on that we need a zone for larger cursors, I think
the next logical size up (512 bytes) will fit 25 levels, but let's wait
to get there first.

--D

> The memory overhead of the cursor isn't an issue because we've been
> maximally sizing it since forever, and the whole point of a slab
> cache is to minimise allocation overhead of frequently allocated
> objects. It seems to me that we really want to retain these
> properties of the cursor allocator, not give them up just as we're
> in the process of making other modifications that will hit the path
> more frequently than it's ever been hit before...
> 
> I like all the dynamic sized guards that this series places in the
> cursor, but I don't think we want to change the way we allocate the
> cursors just to support that.
> 
> FWIW, an example of avoidable runtime calculation overhead of
> constants is xlog_calc_unit_res(). These values are actually
> constant for a given transaction reservation, but at 1.6 million
> transactions a second it shows up at #20 on the flat profile of
> functions using the most CPU:
> 
> 0.71%  [kernel]  [k] xlog_calc_unit_res
> 
> 0.71% of 32 CPUs for 1.6 million calculations a second of the same
> constants is a non-trivial amount of CPU time to spend doing
> unnecessary repeated calculations.
> 
> Even though the btree cursor constant calculations are simpler than
> the log res calculations, they are more frequent. Hence on general
> principles of efficiency, I don't think we want to be replacing high
> frequency, low overhead slab/zone based allocations with heap
> allocations that require repeated constant calculations and
> size->slab redirection....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Thread overview: 48+ messages
2021-09-18  1:29 [PATCHSET RFC chandan 00/14] xfs: support dynamic btree cursor height Darrick J. Wong
2021-09-18  1:29 ` [PATCH 01/14] xfs: remove xfs_btree_cur_t typedef Darrick J. Wong
2021-09-20  9:53   ` Chandan Babu R
2021-09-21  8:36   ` Christoph Hellwig
2021-09-18  1:29 ` [PATCH 02/14] xfs: don't allocate scrub contexts on the stack Darrick J. Wong
2021-09-20  9:53   ` Chandan Babu R
2021-09-20 17:39     ` Darrick J. Wong
2021-09-21  8:39   ` Christoph Hellwig
2021-09-18  1:29 ` [PATCH 03/14] xfs: dynamically allocate btree scrub context structure Darrick J. Wong
2021-09-20  9:53   ` Chandan Babu R
2021-09-21  8:43   ` Christoph Hellwig
2021-09-22 16:17     ` Darrick J. Wong
2021-09-18  1:29 ` [PATCH 04/14] xfs: stricter btree height checking when looking for errors Darrick J. Wong
2021-09-20  9:54   ` Chandan Babu R
2021-09-18  1:29 ` [PATCH 05/14] xfs: stricter btree height checking when scanning for btree roots Darrick J. Wong
2021-09-20  9:54   ` Chandan Babu R
2021-09-18  1:29 ` [PATCH 06/14] xfs: check that bc_nlevels never overflows Darrick J. Wong
2021-09-20  9:54   ` Chandan Babu R
2021-09-21  8:44   ` Christoph Hellwig
2021-09-18  1:29 ` [PATCH 07/14] xfs: support dynamic btree cursor heights Darrick J. Wong
2021-09-20  9:55   ` Chandan Babu R
2021-09-21  8:49   ` Christoph Hellwig
2021-09-18  1:29 ` [PATCH 08/14] xfs: refactor btree cursor allocation function Darrick J. Wong
2021-09-20  9:55   ` Chandan Babu R
2021-09-21  8:53   ` Christoph Hellwig
2021-09-18  1:29 ` [PATCH 09/14] xfs: fix maxlevels comparisons in the btree staging code Darrick J. Wong
2021-09-20  9:55   ` Chandan Babu R
2021-09-21  8:56   ` Christoph Hellwig
2021-09-22 15:59     ` Darrick J. Wong
2021-09-18  1:30 ` [PATCH 10/14] xfs: encode the max btree height in the cursor Darrick J. Wong
2021-09-20  9:55   ` Chandan Babu R
2021-09-21  8:57   ` Christoph Hellwig
2021-09-18  1:30 ` [PATCH 11/14] xfs: dynamically allocate cursors based on maxlevels Darrick J. Wong
2021-09-20  9:56   ` Chandan Babu R
2021-09-20 23:06   ` Dave Chinner
2021-09-20 23:36     ` Dave Chinner
2021-09-21  9:03     ` Christoph Hellwig
2021-09-22 18:55       ` Darrick J. Wong
2021-09-22 17:38     ` Darrick J. Wong [this message]
2021-09-22 23:10       ` Dave Chinner
2021-09-23  1:58         ` Darrick J. Wong
2021-09-23  5:56           ` Chandan Babu R
2021-09-18  1:30 ` [PATCH 12/14] xfs: compute actual maximum btree height for critical reservation calculation Darrick J. Wong
2021-09-20  9:56   ` Chandan Babu R
2021-09-18  1:30 ` [PATCH 13/14] xfs: compute the maximum height of the rmap btree when reflink enabled Darrick J. Wong
2021-09-20  9:56   ` Chandan Babu R
2021-09-18  1:30 ` [PATCH 14/14] xfs: kill XFS_BTREE_MAXLEVELS Darrick J. Wong
2021-09-20  9:57   ` Chandan Babu R
