linux-xfs.vger.kernel.org archive mirror
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: chandan.babu@oracle.com, chandanrlinux@gmail.com,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 11/14] xfs: dynamically allocate cursors based on maxlevels
Date: Wed, 22 Sep 2021 10:38:21 -0700
Message-ID: <20210922173821.GH570615@magnolia>
In-Reply-To: <20210920230635.GM1756565@dread.disaster.area>

On Tue, Sep 21, 2021 at 09:06:35AM +1000, Dave Chinner wrote:
> On Fri, Sep 17, 2021 at 06:30:10PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Replace the statically-sized btree cursor zone with dynamically sized
> > allocations so that we can reduce the memory overhead for per-AG bt
> > cursors while handling very tall btrees for rt metadata.
> 
> Hmmmmm. We do a *lot* of btree cursor allocation and freeing under
> load. Keeping that in a single slab rather than using heap memory is
> a good idea for stuff like this for many reasons...
> 
> I mean, if we are creating a million inodes a second, a rough
> back-of-the-envelope calculation says we are doing 3-4 million btree
> cursor instantiations a second. That's a lot of short term churn on
> the heap that we don't really need to subject it to. And even a few
> extra instructions in a path called millions of times a second adds
> up to a lot of extra runtime overhead.
> 
> So....
> 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/libxfs/xfs_btree.c |   40 ++++++++++++++++++++++++++++++++--------
> >  fs/xfs/libxfs/xfs_btree.h |    2 --
> >  fs/xfs/xfs_super.c        |   11 +----------
> >  3 files changed, 33 insertions(+), 20 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index 2486ba22c01d..f9516828a847 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -23,11 +23,6 @@
> >  #include "xfs_btree_staging.h"
> >  #include "xfs_ag.h"
> >  
> > -/*
> > - * Cursor allocation zone.
> > - */
> > -kmem_zone_t	*xfs_btree_cur_zone;
> > -
> >  /*
> >   * Btree magic numbers.
> >   */
> > @@ -379,7 +374,7 @@ xfs_btree_del_cursor(
> >  		kmem_free(cur->bc_ops);
> >  	if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && cur->bc_ag.pag)
> >  		xfs_perag_put(cur->bc_ag.pag);
> > -	kmem_cache_free(xfs_btree_cur_zone, cur);
> > +	kmem_free(cur);
> >  }
> >  
> >  /*
> > @@ -4927,6 +4922,32 @@ xfs_btree_has_more_records(
> >  		return block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK);
> >  }
> >  
> > +/* Compute the maximum allowed height for a given btree type. */
> > +static unsigned int
> > +xfs_btree_maxlevels(
> > +	struct xfs_mount	*mp,
> > +	xfs_btnum_t		btnum)
> > +{
> > +	switch (btnum) {
> > +	case XFS_BTNUM_BNO:
> > +	case XFS_BTNUM_CNT:
> > +		return mp->m_ag_maxlevels;
> > +	case XFS_BTNUM_BMAP:
> > +		return max(mp->m_bm_maxlevels[XFS_DATA_FORK],
> > +			   mp->m_bm_maxlevels[XFS_ATTR_FORK]);
> > +	case XFS_BTNUM_INO:
> > +	case XFS_BTNUM_FINO:
> > +		return M_IGEO(mp)->inobt_maxlevels;
> > +	case XFS_BTNUM_RMAP:
> > +		return mp->m_rmap_maxlevels;
> > +	case XFS_BTNUM_REFC:
> > +		return mp->m_refc_maxlevels;
> > +	default:
> > +		ASSERT(0);
> > +		return XFS_BTREE_MAXLEVELS;
> > +	}
> > +}
> > +
> >  /* Allocate a new btree cursor of the appropriate size. */
> >  struct xfs_btree_cur *
> >  xfs_btree_alloc_cursor(
> > @@ -4935,13 +4956,16 @@ xfs_btree_alloc_cursor(
> >  	xfs_btnum_t		btnum)
> >  {
> >  	struct xfs_btree_cur	*cur;
> > +	unsigned int		maxlevels = xfs_btree_maxlevels(mp, btnum);
> >  
> > -	cur = kmem_cache_zalloc(xfs_btree_cur_zone, GFP_NOFS | __GFP_NOFAIL);
> > +	ASSERT(maxlevels <= XFS_BTREE_MAXLEVELS);
> > +
> > +	cur = kmem_zalloc(xfs_btree_cur_sizeof(maxlevels), KM_NOFS);
> 
> Instead of multiple dynamic runtime calculations to determine the
> size to allocate from the heap, which then has to select a slab
> based on size, why don't we just pre-calculate the max size of
> the cursor at XFS module init and use that for the btree cursor slab
> size?

As part of developing the realtime rmapbt and reflink btrees, I computed
the maximum theoretical btree height for a maximally sized realtime
volume.  For a realtime volume with 2^52 blocks and a 1k block size, I
estimate that you'd need an 11-level rtrefcount btree cursor.  The rtrmap
btree cursor would have to be 28 levels high.  Using 4k blocks instead
of 1k blocks, it's not so bad -- 8 for rtrefcount and 17 for rtrmap.

I don't recall exactly what Chandan said the maximum bmbt height would
need to be to support really large data fork mapping structures, but
based on my worst case estimate of 2^54 single-block mappings and a 1k
blocksize, you'd need a 12-level bmbt cursor.  For 4k blocks, you'd need
only 8 levels.
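
As a rough illustration of how worst-case height estimates of this kind
can be derived, here is a minimal sketch.  It assumes every block holds
only its minimum number of records, so the tree is as tall as possible.
The function name and the leaf_minrecs/node_minrecs parameters are
hypothetical; the real per-block record counts depend on the btree type
and block size and are not given in this message, so the sketch does not
attempt to reproduce the figures above.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division for 64-bit counts. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return n / d + (n % d != 0);
}

/*
 * Worst-case height of a btree that must index nr_records records when
 * every block is only minimally full.  leaf_minrecs and node_minrecs are
 * hypothetical inputs: the minimum number of records in a leaf block and
 * the minimum number of keys/pointers in an interior block.
 */
static unsigned int btree_worst_case_height(uint64_t nr_records,
					    unsigned int leaf_minrecs,
					    unsigned int node_minrecs)
{
	uint64_t blocks;
	unsigned int height = 1;

	/* A fan-out below 2 would never converge to a single root block. */
	assert(leaf_minrecs >= 1 && node_minrecs >= 2);

	/* Number of leaf blocks needed at the bottom level. */
	blocks = div_round_up(nr_records, leaf_minrecs);

	/* Add one level per pass until a single root block remains. */
	while (blocks > 1) {
		blocks = div_round_up(blocks, node_minrecs);
		height++;
	}
	return height;
}
```

With these assumptions, smaller blocks mean smaller minimum record
counts, which is why the 1k block size cases come out taller than the
4k ones.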

The current XFS_BTREE_MAXLEVELS is 9, which just so happens to fit in
248 bytes.  I will rework this patch to make xfs_btree_cur_zone supply
256-byte cursors, and the btree code will continue using the zone if 256
bytes is enough space for the cursor.

If we decide later on that we need a zone for larger cursors, I think
the next logical size up (512 bytes) will fit 25 levels, but let's wait
to get there first.
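
To make the size arithmetic above concrete, here is a small sketch.  The
two constants are assumptions inferred from the figures in this message
(9 levels in 248 bytes, 25 levels fitting in 512 bytes); a 104-byte
fixed part plus 16 bytes per level is one split consistent with those
numbers, not a layout copied from the kernel headers.  All names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes, inferred from the numbers quoted in this message. */
#define CUR_HEADER_BYTES	104u	/* fixed part of the cursor */
#define CUR_LEVEL_BYTES		16u	/* per-level state */

/* Bytes needed for a cursor that can track nlevels btree levels. */
static size_t cur_sizeof(unsigned int nlevels)
{
	return CUR_HEADER_BYTES + (size_t)nlevels * CUR_LEVEL_BYTES;
}

/* Tallest btree whose cursor fits in an object of object_bytes. */
static unsigned int cur_max_levels_for(size_t object_bytes)
{
	assert(object_bytes >= CUR_HEADER_BYTES);
	return (unsigned int)((object_bytes - CUR_HEADER_BYTES) /
			      CUR_LEVEL_BYTES);
}

/*
 * Nonzero if a cursor for nlevels fits in the zone's object size, i.e.
 * the zone can be used; otherwise a larger allocation is needed.
 */
static int cur_fits_in_zone(unsigned int nlevels, size_t zone_object_bytes)
{
	return cur_sizeof(nlevels) <= zone_object_bytes;
}
```

Under these assumed sizes a 256-byte object still tops out at 9 levels,
so the 12-level bmbt case estimated earlier would not fit the zone.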

--D

> The memory overhead of the cursor isn't an issue because we've been
> maximally sizing it since forever, and the whole point of a slab
> cache is to minimise allocation overhead of frequently allocated
> objects. It seems to me that we really want to retain these
> properties of the cursor allocator, not give them up just as we're
> in the process of making other modifications that will hit the path
> more frequently than it's ever been hit before...
> 
> I like all the dynamic sized guards that this series places in the
> cursor, but I don't think we want to change the way we allocate the
> cursors just to support that.
> 
> FWIW, an example of avoidable runtime calculation overhead of
> constants is xlog_calc_unit_res(). These values are actually
> constant for a given transaction reservation, but at 1.6 million
> transactions a second it shows up at #20 on the flat profile of
> functions using the most CPU:
> 
> 0.71%  [kernel]  [k] xlog_calc_unit_res
> 
> 0.71% of 32 CPUs for 1.6 million calculations a second of the same
> constants is a non-trivial amount of CPU time to spend doing
> unnecessary repeated calculations.
> 
> Even though the btree cursor constant calculations are simpler than
> the log res calculations, they are more frequent. Hence on general
> principles of efficiency, I don't think we want to be replacing high
> frequency, low overhead slab/zone based allocations with heap
> allocations that require repeated constant calculations and
> size->slab redirection....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
