All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <dchinner@redhat.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/5] xfs: repair free space btrees
Date: Tue, 28 Nov 2023 13:13:58 -0800	[thread overview]
Message-ID: <20231128211358.GB4167244@frogsfrogsfrogs> (raw)
In-Reply-To: <ZWYDASlIqLQvk9Wh@infradead.org>

On Tue, Nov 28, 2023 at 07:10:57AM -0800, Christoph Hellwig wrote:
> A highlevel question on how blocks are (re)used here.
> 
>  - xrep_abt_find_freespace accounts the old allocbt blocks as freespace
>    per the comment, although as far as I understand  the code in
>    xrep_abt_walk_rmap the allocbt blocks aren't actually added to the
>    free_records xfarray, but recorded into the old_allocbt_blocks
>    bitmap (btw, why are we using different data structures for them?)

The old free space btree blocks are tracked via old_allocbt_blocks so
that we can reap the space after committing the new btree roots.
Reaping cross-references the set regions in the bitmap against the
rmapbt records so that we don't free crosslinked blocks.

This is a big difference from xfs_repair, which constructs a free space
map by walking the directory / bmbt trees and builds completely new
indexes in the gaps.  It gets away with that because it's building
all the space metadata in the AG, not just the free space btrees.

The next question you might have is why there's old_allocbt_blocks and
not_allocbt_blocks -- this is due to us using the AGFL to supply the
bnobt, cntbt, and rmapbt's alloc_block routines.  Hence the blocks
tracked by all four data structures are all RMAP_OWN_AG, and we have to
do a bit of bitmap work to subtract the rmapbt and AGFL blocks from all
the OWN_AG records to end up with the blocks that we think are owned by
the free space btrees.

>  - xrep_abt_reserve_space seems to allocate space for the new alloc
>    btrees by popping the first entry of ->free_records until it has
>    enough space.

Correct.

>  - what happens if the AG is so full and fragmented that we do not
>    have space to create a second set of allocbts without tearing down
>    the old ones first?

xrep_abt_reserve_space returns -ENOSPC, so we throw away all the incore
records and throw the errno out to userspace.  Across all the btree
rebuilding code, the block reservation step happens as the very last
thing before we start writing btree blocks, so it's still possible to
back out cleanly.

> I've also got a whole bunch of nitpicks that mostly don't require an
> immediate action and/or about existing code that just grows more users
> here:

Heh.

> > +int
> > +xrep_allocbt(
> > +	struct xfs_scrub	*sc)
> > +{
> > +	struct xrep_abt		*ra;
> > +	struct xfs_mount	*mp = sc->mp;
> > +	char			*descr;
> > +	int			error;
> > +
> > +	/* We require the rmapbt to rebuild anything. */
> > +	if (!xfs_has_rmapbt(mp))
> > +		return -EOPNOTSUPP;
> 
> Shoudn't this be checked by the .has callback in struct xchk_meta_ops?

No.  Checking doesn't require the rmapbt because all we do is walk the
bnobt/cntbt records and cross-reference them with whatever metadata we
can find.

A stronger check would scan the AG to build a second recordset of the
free space and compare that against what's on disk.  However, that would
be much slower, and Dave wanted scans to be fast because corruptions are
supposed to be the edge case. :)

The weaker checking also means we can scrub old filesystems, even if we
still require xfs_repair to fix them.

> > +	/* Set up enough storage to handle maximally fragmented free space. */
> > +	descr = xchk_xfile_ag_descr(sc, "free space records");
> > +	error = xfarray_create(descr, mp->m_sb.sb_agblocks / 2,
> > +			sizeof(struct xfs_alloc_rec_incore),
> > +			&ra->free_records);
> > +	kfree(descr);
> 
> Commenting on a new user of old code here, but why doesn't
> xfarray_create simply take a format string so that we don't need the
> separate allocatiom and kasprintf here?

I didn't want to spend the brainpower to figure out how to make the
macro and va_args crud work to support pushing both from xrep_allocbt ->
xfarray_create -> xfile_create.  I don't know how to make that stuff
nest short of adding a kas- variant of xfile_create.

Seeing as we don't install xfiles into the file descriptor table anyway,
the labels are only visible via ftrace, and not procfs.  I decided that
cleanliness here wasn't a high enough priority.

> > +	/*
> > +	 * We must update sm_type temporarily so that the tree-to-tree cross
> > +	 * reference checks will work in the correct direction, and also so
> > +	 * that tracing will report correctly if there are more errors.
> > +	 */
> > +	sc->sm->sm_type = XFS_SCRUB_TYPE_BNOBT;
> > +	error = xchk_bnobt(sc);
> 
> So xchk_bnobt is a tiny wrapper around xchk_allocbt, which is a small
> wrapper around xchk_btree that basіally de-multiplex the argument
> pass in by xchk_bnobt again.  This is existing code not newly added,
> but the call chain looks a bit odd to me.

Yeah.  I suppose one way to clean that up would be to export
xchk_allocbt to the dispatch table in scrub.c instead of
xchk_{bno,cnt}bt and figure out which btree we want from sm_type.

Way back when I was designing scrub I thought that repair would be
separate for each btree type, but that turned out not to be the case.
Hence the awkwardness in the call chains.

> > +/*
> > + * Add an extent to the new btree reservation pool.  Callers are required to
> > + * reap this reservation manually if the repair is cancelled.  @pag must be a
> > + * passive reference.
> > + */
> > +int
> > +xrep_newbt_add_extent(
> > +	struct xrep_newbt	*xnr,
> > +	struct xfs_perag	*pag,
> > +	xfs_agblock_t		agbno,
> > +	xfs_extlen_t		len)
> > +{
> > +	struct xfs_mount	*mp = xnr->sc->mp;
> > +	struct xfs_alloc_arg	args = {
> > +		.tp		= NULL, /* no autoreap */
> > +		.oinfo		= xnr->oinfo,
> > +		.fsbno		= XFS_AGB_TO_FSB(mp, pag->pag_agno, agbno),
> > +		.len		= len,
> > +		.resv		= xnr->resv,
> > +	};
> > +
> > +	return xrep_newbt_add_blocks(xnr, pag, &args);
> > +}
> 
> I don't quite understand what this helper adds, and the _blocks vs
> _extent naming is a bit confusing.

This wrapper simplifes the interface to xrep_newbt_add_blocks so that
external callers don't have to know which magic values of xfs_alloc_arg
are actually used by xrep_newbt_add_blocks and therefore need to be set.

For all the other repair functions, they have to allocate space from the
free space btree, so xrep_newbt_add_blocks is passed the full
_alloc_args as returned by the allocator to xrep_newbt_alloc_ag_blocks.

> > +#define for_each_xrep_newbt_reservation(xnr, resv, n)	\
> > +	list_for_each_entry_safe((resv), (n), &(xnr)->resv_list, list)
> 
> I have to admit that I find the open code list_for_each_entry_safe
> easier to follow than such wrappers.

The funny part is that I don't even use it in newbt.c.  Maybe it's time
to get rid of it.

$ git grep for_each_xrep_newbt_reservation fs
fs/xfs/scrub/alloc_repair.c:568:        for_each_xrep_newbt_reservation(&ra->new_bnobt, resv, n) {
fs/xfs/scrub/alloc_repair.c:575:        for_each_xrep_newbt_reservation(&ra->new_bnobt, resv, n) {
fs/xfs/scrub/newbt.h:60:#define for_each_xrep_newbt_reservation(xnr, resv, n)   \
fs/xfs/scrub/rmap_repair.c:1126:        for_each_xrep_newbt_reservation(&rr->new_btree, resv, n) {

> > +/* Initialize all the btree cursors for an AG repair. */
> > +void
> > +xrep_ag_btcur_init(
> > +	struct xfs_scrub	*sc,
> > +	struct xchk_ag		*sa)
> > +{
> 
> As far as I can tell this basically sets up cursors for all the
> btrees except the one that we want to repair, and the one that
> goes along with for the alloc and ialloc pairs?  Maybe just spell
> that out for clarity.

Done.

--D

  reply	other threads:[~2023-11-28 21:13 UTC|newest]

Thread overview: 153+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-24 23:39 [MEGAPATCHSET v28] xfs: online repair, second part of part 1 Darrick J. Wong
2023-11-24 23:44 ` [PATCHSET v28.0 0/1] xfs: prevent livelocks in xchk_iget Darrick J. Wong
2023-11-24 23:46   ` [PATCH 1/1] xfs: make xchk_iget safer in the presence of corrupt inode btrees Darrick J. Wong
2023-11-25  4:57     ` Christoph Hellwig
2023-11-27 21:55       ` Darrick J. Wong
2023-11-24 23:45 ` [PATCHSET v28.0 0/7] xfs: reserve disk space for online repairs Darrick J. Wong
2023-11-24 23:47   ` [PATCH 1/7] xfs: don't append work items to logged xfs_defer_pending objects Darrick J. Wong
2023-11-25  5:04     ` Christoph Hellwig
2023-11-24 23:47   ` [PATCH 2/7] xfs: allow pausing of pending deferred work items Darrick J. Wong
2023-11-25  5:05     ` Christoph Hellwig
2023-11-24 23:47   ` [PATCH 3/7] xfs: remove __xfs_free_extent_later Darrick J. Wong
2023-11-25  5:05     ` Christoph Hellwig
2023-11-24 23:47   ` [PATCH 4/7] xfs: automatic freeing of freshly allocated unwritten space Darrick J. Wong
2023-11-25  5:06     ` Christoph Hellwig
2023-11-24 23:48   ` [PATCH 5/7] xfs: implement block reservation accounting for btrees we're staging Darrick J. Wong
2023-11-26 13:14     ` Christoph Hellwig
2023-11-27 22:34       ` Darrick J. Wong
2023-11-28  5:41         ` Christoph Hellwig
2023-11-28 17:02           ` Darrick J. Wong
2023-11-24 23:48   ` [PATCH 6/7] xfs: log EFIs for all btree blocks being used to stage a btree Darrick J. Wong
2023-11-26 13:15     ` Christoph Hellwig
2023-11-24 23:48   ` [PATCH 7/7] xfs: force small EFIs for reaping btree extents Darrick J. Wong
2023-11-25  5:13     ` Christoph Hellwig
2023-11-27 22:46       ` Darrick J. Wong
2023-11-24 23:45 ` [PATCHSET v28.0 0/4] xfs: prepare repair for bulk loading Darrick J. Wong
2023-11-24 23:48   ` [PATCH 1/4] xfs: force all buffers to be written during btree bulk load Darrick J. Wong
2023-11-25  5:49     ` Christoph Hellwig
2023-11-28  1:50       ` Darrick J. Wong
2023-11-28  7:13         ` Christoph Hellwig
2023-11-28 15:18           ` Christoph Hellwig
2023-11-28 17:07             ` Darrick J. Wong
2023-11-30  4:33               ` Christoph Hellwig
2023-11-24 23:49   ` [PATCH 2/4] xfs: add debug knobs to control btree bulk load slack factors Darrick J. Wong
2023-11-25  5:50     ` Christoph Hellwig
2023-11-28  1:44       ` Darrick J. Wong
2023-11-28  5:42         ` Christoph Hellwig
2023-11-28 17:07           ` Darrick J. Wong
2023-11-24 23:49   ` [PATCH 3/4] xfs: move btree bulkload record initialization to ->get_record implementations Darrick J. Wong
2023-11-25  5:51     ` Christoph Hellwig
2023-11-24 23:49   ` [PATCH 4/4] xfs: constrain dirty buffers while formatting a staged btree Darrick J. Wong
2023-11-25  5:53     ` Christoph Hellwig
2023-11-27 22:56       ` Darrick J. Wong
2023-11-28 20:11         ` Darrick J. Wong
2023-11-29  5:50           ` Christoph Hellwig
2023-11-29  5:57             ` Darrick J. Wong
2023-11-24 23:45 ` [PATCHSET v28.0 0/5] xfs: online repair of AG btrees Darrick J. Wong
2023-11-24 23:50   ` [PATCH 1/5] xfs: create separate structures and code for u32 bitmaps Darrick J. Wong
2023-11-25  5:57     ` Christoph Hellwig
2023-11-28  1:34       ` Darrick J. Wong
2023-11-28  5:43         ` Christoph Hellwig
2023-11-24 23:50   ` [PATCH 2/5] xfs: roll the scrub transaction after completing a repair Darrick J. Wong
2023-11-25  6:05     ` Christoph Hellwig
2023-11-28  1:29       ` Darrick J. Wong
2023-11-24 23:50   ` [PATCH 3/5] xfs: repair free space btrees Darrick J. Wong
2023-11-25  6:11     ` Christoph Hellwig
2023-11-28  1:05       ` Darrick J. Wong
2023-11-28 15:10     ` Christoph Hellwig
2023-11-28 21:13       ` Darrick J. Wong [this message]
2023-11-29  5:56         ` Christoph Hellwig
2023-11-29  6:18           ` Darrick J. Wong
2023-11-29  6:24             ` Christoph Hellwig
2023-11-29  6:26               ` Darrick J. Wong
2023-11-24 23:50   ` [PATCH 4/5] xfs: repair inode btrees Darrick J. Wong
2023-11-25  6:12     ` Christoph Hellwig
2023-11-28  1:09       ` Darrick J. Wong
2023-11-28 15:57     ` Christoph Hellwig
2023-11-28 21:37       ` Darrick J. Wong
2023-11-24 23:51   ` [PATCH 5/5] xfs: repair refcount btrees Darrick J. Wong
2023-11-28 16:07     ` Christoph Hellwig
2023-11-24 23:45 ` [PATCHSET v28.0 0/7] xfs: online repair of inodes and forks Darrick J. Wong
2023-11-24 23:51   ` [PATCH 1/7] xfs: disable online repair quota helpers when quota not enabled Darrick J. Wong
2023-11-25  6:13     ` Christoph Hellwig
2023-11-24 23:51   ` [PATCH 2/7] xfs: try to attach dquots to files before repairing them Darrick J. Wong
2023-11-25  6:14     ` Christoph Hellwig
2023-11-24 23:51   ` [PATCH 3/7] xfs: repair inode records Darrick J. Wong
2023-11-28 17:08     ` Christoph Hellwig
2023-11-28 23:08       ` Darrick J. Wong
2023-11-29  6:02         ` Christoph Hellwig
2023-12-05 23:08           ` Darrick J. Wong
2023-12-06  5:16             ` Christoph Hellwig
2023-11-24 23:52   ` [PATCH 4/7] xfs: zap broken inode forks Darrick J. Wong
2023-11-30  4:44     ` Christoph Hellwig
2023-11-30 21:08       ` Darrick J. Wong
2023-12-04  4:39         ` Christoph Hellwig
2023-12-04 20:43           ` Darrick J. Wong
2023-12-05  4:28             ` Christoph Hellwig
2023-11-24 23:52   ` [PATCH 5/7] xfs: abort directory parent scrub scans if we encounter a zapped directory Darrick J. Wong
2023-11-30  4:47     ` Christoph Hellwig
2023-11-30 21:37       ` Darrick J. Wong
2023-12-04  4:41         ` Christoph Hellwig
2023-12-04 20:44           ` Darrick J. Wong
2023-11-24 23:52   ` [PATCH 6/7] xfs: skip the rmapbt search on an empty attr fork unless we know it was zapped Darrick J. Wong
2023-11-24 23:52   ` [PATCH 7/7] xfs: repair obviously broken inode modes Darrick J. Wong
2023-11-30  4:49     ` Christoph Hellwig
2023-11-30 21:18       ` Darrick J. Wong
2023-12-04  4:42         ` Christoph Hellwig
2023-11-24 23:46 ` [PATCHSET v28.0 0/5] xfs: online repair of file fork mappings Darrick J. Wong
2023-11-24 23:53   ` [PATCH 1/5] xfs: reintroduce reaping of file metadata blocks to xrep_reap_extents Darrick J. Wong
2023-11-30  4:53     ` Christoph Hellwig
2023-11-30 21:48       ` Darrick J. Wong
2023-12-04  4:42         ` Christoph Hellwig
2023-11-24 23:53   ` [PATCH 2/5] xfs: repair inode fork block mapping data structures Darrick J. Wong
2023-11-30  5:07     ` Christoph Hellwig
2023-12-01  1:38       ` Darrick J. Wong
2023-11-24 23:53   ` [PATCH 3/5] xfs: refactor repair forcing tests into a repair.c helper Darrick J. Wong
2023-11-28 14:20     ` Christoph Hellwig
2023-11-29  5:42       ` Darrick J. Wong
2023-11-29  6:03         ` Christoph Hellwig
2023-11-24 23:53   ` [PATCH 4/5] xfs: create a ranged query function for refcount btrees Darrick J. Wong
2023-11-28 13:59     ` Christoph Hellwig
2023-11-24 23:54   ` [PATCH 5/5] xfs: repair problems in CoW forks Darrick J. Wong
2023-11-30  5:10     ` Christoph Hellwig
2023-11-24 23:46 ` [PATCHSET v28.0 0/6] xfs: online repair of rt bitmap file Darrick J. Wong
2023-11-24 23:54   ` [PATCH 1/6] xfs: check rt bitmap file geometry more thoroughly Darrick J. Wong
2023-11-28 14:04     ` Christoph Hellwig
2023-11-28 23:27       ` Darrick J. Wong
2023-11-29  6:05         ` Christoph Hellwig
2023-11-29  6:20           ` Darrick J. Wong
2023-11-24 23:54   ` [PATCH 2/6] xfs: check rt summary " Darrick J. Wong
2023-11-28 14:05     ` Christoph Hellwig
2023-11-28 23:30       ` Darrick J. Wong
2023-11-29  1:23         ` Darrick J. Wong
2023-11-29  6:05         ` Christoph Hellwig
2023-11-29  6:21           ` Darrick J. Wong
2023-11-29  6:23             ` Christoph Hellwig
2023-11-30  0:10               ` Darrick J. Wong
2023-11-24 23:54   ` [PATCH 3/6] xfs: always check the rtbitmap and rtsummary files Darrick J. Wong
2023-11-28 14:06     ` Christoph Hellwig
2023-11-24 23:55   ` [PATCH 4/6] xfs: repair the inode core and forks of a metadata inode Darrick J. Wong
2023-11-30  5:12     ` Christoph Hellwig
2023-11-24 23:55   ` [PATCH 5/6] xfs: create a new inode fork block unmap helper Darrick J. Wong
2023-11-25  6:17     ` Christoph Hellwig
2023-11-24 23:55   ` [PATCH 6/6] xfs: online repair of realtime bitmaps Darrick J. Wong
2023-11-30  5:16     ` Christoph Hellwig
2023-11-24 23:46 ` [PATCHSET v28.0 0/5] xfs: online repair of quota and rt metadata files Darrick J. Wong
2023-11-24 23:56   ` [PATCH 1/5] xfs: check the ondisk space mapping behind a dquot Darrick J. Wong
2023-11-30  5:17     ` Christoph Hellwig
2023-11-24 23:56   ` [PATCH 2/5] xfs: check dquot resource timers Darrick J. Wong
2023-11-30  5:17     ` Christoph Hellwig
2023-11-24 23:56   ` [PATCH 3/5] xfs: pull xfs_qm_dqiterate back into scrub Darrick J. Wong
2023-11-30  5:22     ` Christoph Hellwig
2023-11-24 23:56   ` [PATCH 4/5] xfs: improve dquot iteration for scrub Darrick J. Wong
2023-11-30  5:25     ` Christoph Hellwig
2023-11-24 23:57   ` [PATCH 5/5] xfs: repair quotas Darrick J. Wong
2023-11-30  5:33     ` Christoph Hellwig
2023-11-30 22:10       ` Darrick J. Wong
2023-12-04  4:48         ` Christoph Hellwig
2023-12-04 20:52           ` Darrick J. Wong
2023-12-05  4:27             ` Christoph Hellwig
2023-12-05  5:20               ` Darrick J. Wong
2023-12-04  4:49         ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2020-01-01  1:02 [PATCH v22 0/5] xfs: online repair of AG btrees Darrick J. Wong
2020-01-01  1:02 ` [PATCH 3/5] xfs: repair free space btrees Darrick J. Wong
2019-10-29 23:31 [PATCH v21 0/5] xfs: online repair of AG btrees Darrick J. Wong
2019-10-29 23:32 ` [PATCH 3/5] xfs: repair free space btrees Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231128211358.GB4167244@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=dchinner@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.