All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/6] xfs: log EFIs for all btree blocks being used to stage a btree
Date: Thu, 10 Aug 2023 13:36:38 -0700	[thread overview]
Message-ID: <20230810203638.GE11352@frogsfrogsfrogs> (raw)
In-Reply-To: <20230809235234.GZ11352@frogsfrogsfrogs>

On Wed, Aug 09, 2023 at 04:52:34PM -0700, Darrick J. Wong wrote:
> On Tue, Aug 08, 2023 at 04:11:13PM +1000, Dave Chinner wrote:
> > On Mon, Aug 07, 2023 at 05:54:52PM -0700, Darrick J. Wong wrote:
> > > On Mon, Aug 07, 2023 at 06:41:39PM +1000, Dave Chinner wrote:
> > > > On Thu, Jul 27, 2023 at 03:24:32PM -0700, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > > 
> > > > > We need to log EFIs for every extent that we allocate for the purpose of
> > > > > staging a new btree so that if we fail then the blocks will be freed
> > > > > during log recovery.  Add a function to relog the EFIs, so that repair
> > > > > can relog them all every time it creates a new btree block, which will
> > > > > help us to avoid pinning the log tail.
> > > > > 
> > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > .....
> > > > > +/*
> > > > > + * Set up automatic reaping of the blocks reserved for btree reconstruction in
> > > > > + * case we crash by logging a deferred free item for each extent we allocate so
> > > > > + * that we can get all of the space back if we crash before we can commit the
> > > > > + * new btree.  This function returns a token that can be used to cancel
> > > > > + * automatic reaping if repair is successful.
> > > > > + */
> > > > > +static int
> > > > > +xrep_newbt_schedule_autoreap(
> > > > > +	struct xrep_newbt		*xnr,
> > > > > +	struct xrep_newbt_resv		*resv)
> > > > > +{
> > > > > +	struct xfs_extent_free_item	efi_item = {
> > > > > +		.xefi_blockcount	= resv->len,
> > > > > +		.xefi_owner		= xnr->oinfo.oi_owner,
> > > > > +		.xefi_flags		= XFS_EFI_SKIP_DISCARD,
> > > > > +		.xefi_pag		= resv->pag,
> > > > > +	};
> > > > > +	struct xfs_scrub		*sc = xnr->sc;
> > > > > +	struct xfs_log_item		*lip;
> > > > > +	LIST_HEAD(items);
> > > > > +
> > > > > +	ASSERT(xnr->oinfo.oi_offset == 0);
> > > > > +
> > > > > +	efi_item.xefi_startblock = XFS_AGB_TO_FSB(sc->mp, resv->pag->pag_agno,
> > > > > +			resv->agbno);
> > > > > +	if (xnr->oinfo.oi_flags & XFS_OWNER_INFO_ATTR_FORK)
> > > > > +		efi_item.xefi_flags |= XFS_EFI_ATTR_FORK;
> > > > > +	if (xnr->oinfo.oi_flags & XFS_OWNER_INFO_BMBT_BLOCK)
> > > > > +		efi_item.xefi_flags |= XFS_EFI_BMBT_BLOCK;
> > > > > +
> > > > > +	INIT_LIST_HEAD(&efi_item.xefi_list);
> > > > > +	list_add(&efi_item.xefi_list, &items);
> > > > > +
> > > > > +	xfs_perag_intent_hold(resv->pag);
> > > > > +	lip = xfs_extent_free_defer_type.create_intent(sc->tp, &items, 1,
> > > > > +			false);
> > > > 
> > > > Hmmmm.
> > > > 
> > > > That triggered flashing lights and sirens - I'm not sure I really
> > > > like the usage of the defer type arrays like this, nor the
> > > > duplication of the defer mechanisms for relogging, etc.
> > > 
> > > Yeah, I don't quite like manually tromping through the defer ops state
> > > machine here either.  Everywhere /else/ in XFS logs an EFI and finishes
> > > it to free the space.  Just to make sure we're on the same page, newbt
> > > will allocate space, log an EFI, and then:
> > > 
> > > 1. Use the space and log an EFD for the space to cancel the EFI
> > > 2. Use some of the space, log an EFD for the space we used, immediately
> > >    log a new EFI for the unused parts, and finish the new EFI manually
> > > 3. Don't use any of the space at all, and finish the EFI manually
> > > 
> > > Initially, I tried using the regular defer ops mechanism, but this got
> > > messy on account of having to extern most of xfs_defer.c so that I could
> > > manually modify the defer ops state.  It's hard to generalize this,
> > > since there's only *one* place that actually needs manual flow control.
> > 
> > *nod*
> > 
> > But I can't help but think it's a manifestation of a generic
> > optimisation that could allow us to avoid needing to use unwritten
> > extents for new data alloations...
> 
> I've thought about this usecase at various points in the lifetime of the
> newbt.c code.  The usecases are indeed very similar -- speculatively
> allocate some disk blocks, write to them, and either map them into the
> data fork / btree root if the writes actually succeed; or free them
> because it failed.
> 
> However, I think we could eliminate the overhead of the speculative
> allocation out of the bnobt/cnbt (aka step 1) by boosting all of the
> tracking to the incore data structures.  Handwaving sketch:
> 
> 1. Find the extent we want from the free space btrees, and add a record
> to the busy extent list with some new state flag that signals
> "speculative write: do not DISCARD this extent, and allocating callers
> should move on".
> 
> (Not sure what happens if the allocating caller /never/ finds space?
> Does kicking writeback make sense here?  I am not sure it does.)
> 
> 2. Create a mapping in the cow fork for the speculatively allocated
> space, along with an annotation to that effect.  Also need to absorb
> whatever space we reserved in the delalloc mapping for bmbt expansion.
> 
> 3. Write the blocks.
> 
> 4. If the write succeeds, we do the cow remap like we do now, but also
> remove the extent from the ondisk free space btrees and clear the space
> from the busy extent list.
> 
> 5. If the write fails, clear the space from busy extent list.  Maybe add
> the space to a badblocks list(??)
> 
> Under this scheme, the only ondisk metadata update is step 4.  I really
> like the idea of porting newbt.c to use a mechanism like this, since
> the only time we touch the log is if the repair succeeds.
> 
> Too bad it doesn't exist yet! :)
> 
> For now, the oddball use of EFIs is limited to newbt.c, which means it's
> self-contained inside repair.  Except for the "too many extents attached
> to an EFI" issue, I think it works well enough to put into use until you
> or I have time to figure out how to turn either of our "unwritten extent
> for new data allocation" sketches into reality.
> 
> (IOWs, I'm trying /not/ to go carving around in the allocator and the
> extent busy list when there's already so much to think about. ;))
> 
> > > ISTR that was around the time bfoster and I were reworking log intent
> > > item recovery, and it was easier to do this outside of the defer ops
> > > code than try to refactor it and keep this exceptional piece working
> > > too.
> > > 
> > > > Not that I have a better idea right now - is this the final form of
> > > > this code, or is more stuff built on top of it or around it?
> > > 
> > > That's the final form of it.  The good news is that it's been stable
> > > enough despite me tearing into the EFI code again in the rt
> > > modernization patchset.  Do you have any further suggestions?
> > 
> > Not for the patchset as it stands.
> 
> <nod> I'll add some monitoring to report the maximum extent counts that
> get added to EFIs, and work on something to constrain the number of
> extents that get added to a single EFI log item that's coming from
> repair.

My debug patch kept a per-mount maximum EFI extent count, and logged
whenever an EFI got logged with a higher extent count.  From last
night's fstests run, I saw this:

  11297 EFI MAX DEPTH bumped to 2 (RUNTIME)
      8 EFI MAX DEPTH bumped to 2 (RECOVERY)
   2598 EFI MAX DEPTH bumped to 3 (RUNTIME)
      3 EFI MAX DEPTH bumped to 3 (RECOVERY)
    216 EFI MAX DEPTH bumped to 4 (RUNTIME)
    100 EFI MAX DEPTH bumped to 5 (RUNTIME)
    862 EFI MAX DEPTH bumped to 6 (RUNTIME)
     39 EFI MAX DEPTH bumped to 7 (RUNTIME)
     22 EFI MAX DEPTH bumped to 8 (RUNTIME)
     40 EFI MAX DEPTH bumped to 9 (RUNTIME)
     26 EFI MAX DEPTH bumped to 10 (RUNTIME)
     17 EFI MAX DEPTH bumped to 11 (RUNTIME)
     14 EFI MAX DEPTH bumped to 12 (RUNTIME)
     42 EFI MAX DEPTH bumped to 13 (RUNTIME)
      5 EFI MAX DEPTH bumped to 14 (RUNTIME)
      5 EFI MAX DEPTH bumped to 15 (RUNTIME)
    509 EFI MAX DEPTH bumped to 16 (RUNTIME)

So I guess we /do/ see a healthy(?) number of EFIs with more than 4
extents attached, and more bumps to 16 than I expected.  Granted there's
a lot of mkfs action in fstests, so this still isn't capturing what
might happen if there was fragmentation ahoy.

Well ok the alwayscow=1 profile did this:

    170 EFI MAX DEPTH bumped to 2 (RUNTIME)
    133 EFI MAX DEPTH bumped to 3 (RUNTIME)
     24 EFI MAX DEPTH bumped to 4 (RUNTIME)
      9 EFI MAX DEPTH bumped to 5 (RUNTIME)
     29 EFI MAX DEPTH bumped to 6 (RUNTIME)
      1 EFI MAX DEPTH bumped to 7 (RUNTIME)
      2 EFI MAX DEPTH bumped to 9 (RUNTIME)
      1 EFI MAX DEPTH bumped to 10 (RUNTIME)
      1 EFI MAX DEPTH bumped to 11 (RUNTIME)
      1 EFI MAX DEPTH bumped to 12 (RUNTIME)
      1 EFI MAX DEPTH bumped to 16 (RUNTIME)

--D

> --D
> 
> > -Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com

  reply	other threads:[~2023-08-10 20:36 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-27 22:11 [MEGAPATCHSET v26] xfs: online repair, part of part 1 Darrick J. Wong
2023-07-27 22:18 ` [PATCHSET v26.0 0/9] xfs: fix online repair block reaping Darrick J. Wong
2023-07-27 22:21   ` [PATCH 1/9] xfs: cull repair code that will never get used Darrick J. Wong
2023-07-27 22:21   ` [PATCH 2/9] xfs: move the post-repair block reaping code to a separate file Darrick J. Wong
2023-07-27 22:22   ` [PATCH 3/9] xfs: only invalidate blocks if we're going to free them Darrick J. Wong
2023-07-27 22:22   ` [PATCH 4/9] xfs: only allow reaping of per-AG blocks in xrep_reap_extents Darrick J. Wong
2023-07-27 22:22   ` [PATCH 5/9] xfs: use deferred frees to reap old btree blocks Darrick J. Wong
2023-07-27 22:22   ` [PATCH 6/9] xfs: rearrange xrep_reap_block to make future code flow easier Darrick J. Wong
2023-07-27 22:23   ` [PATCH 7/9] xfs: allow scanning ranges of the buffer cache for live buffers Darrick J. Wong
2023-07-27 22:23   ` [PATCH 8/9] xfs: reap large AG metadata extents when possible Darrick J. Wong
2023-07-27 22:23   ` [PATCH 9/9] xfs: use per-AG bitmaps to reap unused AG metadata blocks during repair Darrick J. Wong
2023-08-07  6:19   ` [PATCHSET v26.0 0/9] xfs: fix online repair block reaping Dave Chinner
2023-08-08  0:40     ` Darrick J. Wong
2023-08-08  5:17       ` Dave Chinner
2023-08-09 23:17         ` Darrick J. Wong
2023-07-27 22:18 ` [PATCHSET v26.0 0/6] xfs: prepare repair for bulk loading Darrick J. Wong
2023-07-27 22:24   ` [PATCH 1/6] xfs: force all buffers to be written during btree bulk load Darrick J. Wong
2023-07-27 22:24   ` [PATCH 2/6] xfs: implement block reservation accounting for btrees we're staging Darrick J. Wong
2023-08-07  6:58     ` Dave Chinner
2023-08-08  1:08       ` Darrick J. Wong
2023-07-27 22:24   ` [PATCH 3/6] xfs: log EFIs for all btree blocks being used to stage a btree Darrick J. Wong
2023-08-07  8:41     ` Dave Chinner
2023-08-08  0:54       ` Darrick J. Wong
2023-08-08  6:11         ` Dave Chinner
2023-08-09 23:52           ` Darrick J. Wong
2023-08-10 20:36             ` Darrick J. Wong [this message]
2023-09-08 23:34       ` Darrick J. Wong
2023-07-27 22:24   ` [PATCH 4/6] xfs: add debug knobs to control btree bulk load slack factors Darrick J. Wong
2023-07-27 22:25   ` [PATCH 5/6] xfs: move btree bulkload record initialization to ->get_record implementations Darrick J. Wong
2023-07-27 22:25   ` [PATCH 6/6] xfs: constrain dirty buffers while formatting a staged btree Darrick J. Wong
2023-07-27 22:19 ` [PATCHSET v26.0 0/7] xfs: stage repair information in pageable memory Darrick J. Wong
2023-07-27 22:25   ` [PATCH 1/7] xfs: create a big array data structure Darrick J. Wong
2023-07-28  3:10     ` Matthew Wilcox
2023-07-28  4:39       ` Darrick J. Wong
2023-07-27 22:25   ` [PATCH 2/7] xfs: enable sorting of xfile-backed arrays Darrick J. Wong
2023-07-27 22:26   ` [PATCH 3/7] xfs: convert xfarray insertion sort to heapsort using scratchpad memory Darrick J. Wong
2023-07-27 22:26   ` [PATCH 4/7] xfs: teach xfile to pass back direct-map pages to caller Darrick J. Wong
2023-07-27 22:26   ` [PATCH 5/7] xfs: speed up xfarray sort by sorting xfile page contents directly Darrick J. Wong
2023-07-27 22:26   ` [PATCH 6/7] xfs: cache pages used for xfarray quicksort convergence Darrick J. Wong
2023-07-27 22:27   ` [PATCH 7/7] xfs: improve xfarray quicksort pivot Darrick J. Wong
2023-07-27 22:19 ` [PATCHSET v26.0 0/2] xfs: add usage counters for scrub Darrick J. Wong
2023-07-27 22:27   ` [PATCH 1/2] xfs: create scaffolding for creating debugfs entries Darrick J. Wong
2023-07-27 22:27   ` [PATCH 2/2] xfs: track usage statistics of online fsck Darrick J. Wong
2023-08-08  7:09   ` [PATCHSET v26.0 0/2] xfs: add usage counters for scrub Dave Chinner
2023-07-27 22:19 ` [PATCHSET v26.0 0/4] xfs: online scrubbing of realtime summary files Darrick J. Wong
2023-07-27 22:27   ` [PATCH 1/4] xfs: get our own reference to inodes that we want to scrub Darrick J. Wong
2023-07-27 22:28   ` [PATCH 2/4] xfs: wrap ilock/iunlock operations on sc->ip Darrick J. Wong
2023-07-27 22:28   ` [PATCH 3/4] xfs: move the realtime summary file scrubber to a separate source file Darrick J. Wong
2023-07-27 22:28   ` [PATCH 4/4] xfs: implement online scrubbing of rtsummary info Darrick J. Wong
2023-07-27 22:19 ` [PATCHSET v26.0 0/2] xfs: miscellaneous repair tweaks Darrick J. Wong
2023-07-27 22:28   ` [PATCH 1/2] xfs: always rescan allegedly healthy per-ag metadata after repair Darrick J. Wong
2023-07-27 22:29   ` [PATCH 2/2] xfs: allow the user to cancel repairs before we start writing Darrick J. Wong
2023-07-27 22:20 ` [PATCHSET v26.0 0/2] xfs: force rebuilding of metadata Darrick J. Wong
2023-07-27 22:29   ` [PATCH 1/2] xfs: don't complain about unfixed metadata when repairs were injected Darrick J. Wong
2023-07-27 22:29   ` [PATCH 2/2] xfs: allow userspace to rebuild metadata structures Darrick J. Wong
2023-07-27 22:20 ` [PATCHSET v26.0 0/2] xfs: fixes to the AGFL repair code Darrick J. Wong
2023-07-27 22:30   ` [PATCH 1/2] xfs: clear pagf_agflreset when repairing the AGFL Darrick J. Wong
2023-07-27 22:30   ` [PATCH 2/2] xfs: fix agf_fllast when repairing an empty AGFL Darrick J. Wong
2023-08-08  7:10     ` Dave Chinner
2023-07-27 22:20 ` [PATCHSET v26.0 0/5] xfs: online repair of AG btrees Darrick J. Wong
2023-07-27 22:30   ` [PATCH 1/5] xfs: repair free space btrees Darrick J. Wong
2023-07-27 22:30   ` [PATCH 2/5] xfs: hide xfs_inode_is_allocated in scrub common code Darrick J. Wong
2023-08-08  7:13     ` Dave Chinner
2023-07-27 22:31   ` [PATCH 3/5] xfs: rewrite xchk_inode_is_allocated to work properly Darrick J. Wong
2023-08-08  7:14     ` Dave Chinner
2023-07-27 22:31   ` [PATCH 4/5] xfs: repair inode btrees Darrick J. Wong
2023-07-27 22:31   ` [PATCH 5/5] xfs: repair refcount btrees Darrick J. Wong
2023-07-27 22:20 ` [PATCHSET v26.0 0/2] xfs: fixes for the block mapping checker Darrick J. Wong
2023-07-27 22:31   ` [PATCH 1/2] xfs: simplify returns in xchk_bmap Darrick J. Wong
2023-07-27 22:32   ` [PATCH 2/2] xfs: don't check reflink iflag state when checking cow fork Darrick J. Wong
2023-08-08  7:16   ` [PATCHSET v26.0 0/2] xfs: fixes for the block mapping checker Dave Chinner
2023-07-27 22:21 ` [PATCHSET v26.0 0/6] xfs: online repair of inodes and forks Darrick J. Wong
2023-07-27 22:32   ` [PATCH 1/6] xfs: disable online repair quota helpers when quota not enabled Darrick J. Wong
2023-07-27 22:32   ` [PATCH 2/6] xfs: try to attach dquots to files before repairing them Darrick J. Wong
2023-07-27 22:32   ` [PATCH 3/6] xfs: repair inode records Darrick J. Wong
2023-08-09  8:42     ` Dave Chinner
2023-08-10  0:43       ` Darrick J. Wong
2023-07-27 22:33   ` [PATCH 4/6] xfs: zap broken inode forks Darrick J. Wong
2023-07-27 22:33   ` [PATCH 5/6] xfs: abort directory parent scrub scans if we encounter a zapped directory Darrick J. Wong
2023-07-27 22:33   ` [PATCH 6/6] xfs: repair obviously broken inode modes Darrick J. Wong
2023-08-09  9:44   ` [PATCHSET v26.0 0/6] xfs: online repair of inodes and forks Dave Chinner
2023-08-10  0:45     ` Darrick J. Wong
2023-07-27 22:21 ` [PATCHSET v26.0 0/5] xfs: online repair of file fork mappings Darrick J. Wong
2023-07-27 22:33   ` [PATCH 1/5] xfs: reintroduce reaping of file metadata blocks to xrep_reap_extents Darrick J. Wong
2023-07-27 22:34   ` [PATCH 2/5] xfs: repair inode fork block mapping data structures Darrick J. Wong
2023-07-27 22:34   ` [PATCH 3/5] xfs: refactor repair forcing tests into a repair.c helper Darrick J. Wong
2023-07-27 22:34   ` [PATCH 4/5] xfs: create a ranged query function for refcount btrees Darrick J. Wong
2023-07-27 22:34   ` [PATCH 5/5] xfs: repair problems in CoW forks Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2023-05-26  0:28 [PATCHSET v25.0 0/6] xfs: prepare repair for bulk loading Darrick J. Wong
2023-05-26  0:46 ` [PATCH 3/6] xfs: log EFIs for all btree blocks being used to stage a btree Darrick J. Wong
2022-12-30 22:12 [PATCHSET v24.0 0/6] xfs: prepare repair for bulk loading Darrick J. Wong
2022-12-30 22:12 ` [PATCH 3/6] xfs: log EFIs for all btree blocks being used to stage a btree Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230810203638.GE11352@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.