From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 04/22] xfs: add helpers to dispose of old btree blocks after a repair
Date: Wed, 16 May 2018 16:18:20 -0700
Message-ID: <20180516231820.GO23858@magnolia>
In-Reply-To: <20180516223225.GX23861@dastard>

On Thu, May 17, 2018 at 08:32:25AM +1000, Dave Chinner wrote:
> On Wed, May 16, 2018 at 12:34:25PM -0700, Darrick J. Wong wrote:
> > On Wed, May 16, 2018 at 06:32:32PM +1000, Dave Chinner wrote:
> > > On Tue, May 15, 2018 at 03:34:04PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Now that we've plumbed in the ability to construct a list of dead btree
> > > > blocks following a repair, add more helpers to dispose of them.  This is
> > > > done by examining the rmapbt -- if the btree was the only owner we can
> > > > free the block, otherwise it's crosslinked and we can only remove the
> > > > rmapbt record.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> 
> [...]
> 
> > > > +	struct xfs_owner_info		oinfo;
> > > > +	struct xfs_perag		*pag;
> > > > +	int				error;
> > > > +
> > > > +	/* Make sure there's space on the freelist. */
> > > > +	error = xfs_repair_fix_freelist(sc, true);
> > > > +	if (error)
> > > > +		return error;
> > > > +	pag = xfs_perag_get(sc->mp, sc->sa.agno);
> > > 
> > > Because this is how it quickly gets to silly numbers of
> > > lookups. That's two now in this function.
> > > 
> > > > +	if (pag->pagf_flcount == 0) {
> > > > +		xfs_perag_put(pag);
> > > > +		return -EFSCORRUPTED;
> > > 
> > > Why is having an empty freelist a problem here? It's an AG that is
> > > completely out of space, but it isn't corruption? And I don't see
> > > why an empty freelist prevents us from adding a block back onto the
> > > AGFL?
> 
> I think you missed a question :P

Doh, sorry.  I don't remember exactly why I put that in there; judging
from my notes I think the idea was that if the AG is completely full
we'd rather shut down with a corruption signal hoping that the admin
will run xfs_repair.

I also don't see why it's necessary now; I'll see what happens if I
remove it.

> > > > +	/* Can we find any other rmappings? */
> > > > +	error = xfs_rmap_has_other_keys(cur, agbno, 1, oinfo, &has_other_rmap);
> > > > +	if (error)
> > > > +		goto out_cur;
> > > > +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > > > +
> > > > +	/*
> > > > +	 * If there are other rmappings, this block is cross linked and must
> > > > +	 * not be freed.  Remove the reverse mapping and move on.  Otherwise,
> > > 
> > > Why do we just remove the reverse mapping if the block cannot be
> > > freed? I have my suspicions that this is removing cross-links one by
> > > one until there's only one reference left to the extent, but then I
> > > ask "how do we know which one is the correct mapping"?
> > 
> > Right.  Prior to calling this function we built a totally new btree with
> > blocks from the freespace, so now we need to remove the rmaps that
> > covered the old btree and/or free the block.  The goal is to rebuild
> > /all/ the trees that think they own this block so that we can free the
> > block and not have to care which one is correct.
> 
> Ok, so we've already rebuilt the new btree, and this is removing
> stale references to cross-linked blocks that have owners different
> to the one we are currently scanning.
> 
> What happens if the cross-linked block is cross-linked within the
> same owner context?

It won't end up on the reap list in the first place, because we scan
every block of every object with the same rmap owner to construct
sublist.  Then we subtract sublist from exlist (which we got from rmap)
and only reap the difference.
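
To make the subtraction concrete, here's a rough userspace sketch of the
idea.  It is not the kernel code: struct extent, extent_list_covers()
and reap_difference() are names made up for illustration, and it works
on plain arrays of block numbers rather than the real extent lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent: blocks [start, start + len). */
struct extent {
	uint32_t	start;
	uint32_t	len;
};

/* Return true if any extent in the list covers the given block. */
static bool
extent_list_covers(
	const struct extent	*list,
	size_t			n,
	uint32_t		block)
{
	size_t			i;

	for (i = 0; i < n; i++) {
		if (block >= list[i].start &&
		    block - list[i].start < list[i].len)
			return true;
	}
	return false;
}

/*
 * Collect every block that is in exlist but not in sublist.  Stores at
 * most max_out block numbers in out[] and returns how many blocks were
 * found in total.
 */
static size_t
reap_difference(
	const struct extent	*exlist,
	size_t			nex,
	const struct extent	*sublist,
	size_t			nsub,
	uint32_t		*out,
	size_t			max_out)
{
	size_t			found = 0;
	size_t			i;
	uint32_t		j;

	assert(out != NULL || max_out == 0);

	for (i = 0; i < nex; i++) {
		for (j = 0; j < exlist[i].len; j++) {
			uint32_t	block = exlist[i].start + j;

			/* Still referenced by a live object; keep it. */
			if (extent_list_covers(sublist, nsub, block))
				continue;
			if (found < max_out)
				out[found] = block;
			found++;
		}
	}
	return found;
}
```

With exlist = {(10, 4), (20, 2)} and sublist = {(11, 2), (21, 1)}, only
blocks 10, 13 and 20 come out as reap candidates; anything a same-owner
object still points at never reaches the reap list.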

> > > > +	struct xfs_scrub_context	*sc,
> > > > +	xfs_fsblock_t			fsbno,
> > > > +	xfs_extlen_t			len,
> > > > +	struct xfs_owner_info		*oinfo,
> > > > +	enum xfs_ag_resv_type		resv)
> > > > +{
> > > > +	struct xfs_mount		*mp = sc->mp;
> > > > +	int				error = 0;
> > > > +
> > > > +	ASSERT(xfs_sb_version_hasrmapbt(&mp->m_sb));
> > > > +	ASSERT(sc->ip != NULL || XFS_FSB_TO_AGNO(mp, fsbno) == sc->sa.agno);
> > > > +
> > > > +	trace_xfs_repair_dispose_btree_extent(mp, XFS_FSB_TO_AGNO(mp, fsbno),
> > > > +			XFS_FSB_TO_AGBNO(mp, fsbno), len);
> > > > +
> > > > +	for (; len > 0; len--, fsbno++) {
> > > > +		error = xfs_repair_dispose_btree_block(sc, fsbno, oinfo, resv);
> > > > +		if (error)
> > > > +			return error;
> > > 
> > > So why do we do this one block at a time, rather than freeing it
> > > as an entire extent in one go?
> > 
> > At the moment the xfs_rmap_has_other_keys helper can only tell you if
> > there are multiple rmap owners for any part of a given extent.  For
> > example, if the rmap records were:
> > 
> > (start = 35, len = 3, owner = rmap)
> > (start = 35, len = 1, owner = refcount)
> > (start = 37, len = 1, owner = inobt)
> > 
> > Notice how blocks 35 and 37 are crosslinked, but 36 isn't.  A call to
> > xfs_rmap_has_other_keys(35, 3) will say "yes" but doesn't have a way to
> > signal back that the yes applies to 35 but that the caller should try
> > again with block 36.  Doing so would require _has_other_keys to maintain
> > a refcount and to return to the caller any time the refcount changed,
> > and the caller would still have to loop the extent.  It's easier to have
> > a dumb loop for the initial implementation and optimize it if we start
> > taking more heat than we'd like on crosslinked filesystems.
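
The three records above can be run through a small userspace model to
show why the dumb loop gets the right answer.  This is only a sketch:
struct rmap_rec, has_other_owner() and count_freeable() are invented
for the example and merely imitate the yes/no answer that
xfs_rmap_has_other_keys gives for a range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical owner tags for the example records. */
enum owner { OWNER_RMAP, OWNER_REFCOUNT, OWNER_INOBT };

/* Simplified reverse-mapping record: blocks [start, start + len). */
struct rmap_rec {
	uint32_t	start;
	uint32_t	len;
	enum owner	owner;
};

/*
 * Does any record with a different owner overlap [start, start + len)?
 * Like the real helper, this can only answer yes or no for the range.
 */
static bool
has_other_owner(
	const struct rmap_rec	*recs,
	size_t			n,
	uint32_t		start,
	uint32_t		len,
	enum owner		owner)
{
	size_t			i;

	for (i = 0; i < n; i++) {
		if (recs[i].owner == owner)
			continue;
		if (recs[i].start < start + len &&
		    start < recs[i].start + recs[i].len)
			return true;
	}
	return false;
}

/*
 * Walk the extent one block at a time.  crosslinked[i] is set for each
 * block that has another owner; the return value is the number of
 * blocks that could be freed outright.
 */
static uint32_t
count_freeable(
	const struct rmap_rec	*recs,
	size_t			n,
	uint32_t		start,
	uint32_t		len,
	enum owner		owner,
	bool			*crosslinked)
{
	uint32_t		freeable = 0;
	uint32_t		i;

	assert(crosslinked != NULL);

	for (i = 0; i < len; i++) {
		crosslinked[i] = has_other_owner(recs, n, start + i, 1,
				owner);
		if (!crosslinked[i])
			freeable++;
	}
	return freeable;
}
```

Asking about (35, 3) as a whole just says "yes", whereas the per-block
walk flags 35 and 37 as crosslinked and leaves 36 as the one block that
can be freed.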
> 
> Well, I can see why you are doing this now, but the problems with
> multi-block metadata make me think that we really need to know more
> detail of the owner in the rmap. e.g. that it's directory or
> attribute data, not user file data and hence we can infer things
> about expected block sizes, do the correct sort of buffer lookups
> for invalidation, etc.

I'm not sure we can do that without causing a deadlocking problem, since
we lock all the AG headers to rebuild a btree and in general we can't
_iget an inode to find out if it's a dir or not.  But I have more to say
on this in a few paragraphs...

> I'm tending towards "this needs a design doc to explain all
> this stuff" right now. Code is great, but I'm struggling to understand
> (reverse engineer!) all the algorithms and decisions that have been
> made from the code...

Working on it.

> > > > +/*
> > > > + * Invalidate buffers for per-AG btree blocks we're dumping.  We assume that
> > > > + * exlist points only to metadata blocks.
> > > > + */
> > > > +int
> > > > +xfs_repair_invalidate_blocks(
> > > > +	struct xfs_scrub_context	*sc,
> > > > +	struct xfs_repair_extent_list	*exlist)
> > > > +{
> > > > +	struct xfs_repair_extent	*rex;
> > > > +	struct xfs_repair_extent	*n;
> > > > +	struct xfs_buf			*bp;
> > > > +	xfs_agnumber_t			agno;
> > > > +	xfs_agblock_t			agbno;
> > > > +	xfs_agblock_t			i;
> > > > +
> > > > +	for_each_xfs_repair_extent_safe(rex, n, exlist) {
> > > > +		agno = XFS_FSB_TO_AGNO(sc->mp, rex->fsbno);
> > > > +		agbno = XFS_FSB_TO_AGBNO(sc->mp, rex->fsbno);
> > > > +		for (i = 0; i < rex->len; i++) {
> > > > +			bp = xfs_btree_get_bufs(sc->mp, sc->tp, agno,
> > > > +					agbno + i, 0);
> > > > +			xfs_trans_binval(sc->tp, bp);
> > > > +		}
> > > 
> > > Again, this is doing things by single blocks. We do have multi-block
> > > metadata (inodes, directory blocks, remote attrs) that, if it
> > > is already in memory, needs to be treated as multi-block extents. If
> > > we don't do that, we'll cause aliasing problems in the buffer cache
> > > (see _xfs_buf_obj_cmp()) and it's all downhill from there.
> > 
> > I only recently started testing with filesystems containing multiblock
> > dir/rmt metadata, and this is an unsolved problem. :(
> 
> That needs documenting, too. Perhaps explicitly, by rejecting repair
> requests on filesystems or types that have multi-block constructs
> until we solve these problems.

Trouble is, remote attr values can have an xfs_buf that spans however
many blocks you need to store a full 64k value, and what happens if the
rmapbt collides with that?  It sorta implies that we can't do
invalidation on /any/ filesystem, which is unfortunate....

...unless we have an easy way of finding /any/ buffer that points to a
given block?  Probably not, since iirc they're indexed by the first disk
block number.  Hm.  I suppose we could use the rmap data to look for
anything within 64k of the logical offset of an attr/data rmap
overlapping the same block...

...but on second thought we only care about invalidating the buffer if
the block belonged to the ag btree we've just killed, right?  If there's
a multi-block buffer because it's part of a directory or an rmt block
then the buffer is clearly owned by someone else and we don't even have
to look for that.  Likewise, if it's a single-block buffer but the
block has some other magic then we don't own it and we should leave it
alone.
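
Roughly, the check I have in mind looks like the sketch below.  It's a
userspace model, not a patch: struct cached_buf, should_invalidate()
and the two magic values are made up for illustration, and a real
version would have to look at the actual buffer and on-disk header
(and probably the uuid too).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up magic numbers; these are NOT real on-disk XFS values. */
#define EXAMPLE_BTREE_MAGIC	0x1111u
#define EXAMPLE_DIR_MAGIC	0x2222u

/* Hypothetical summary of a buffer found in the cache for a block. */
struct cached_buf {
	uint32_t	magic;		/* magic read from the block header */
	uint32_t	nr_blocks;	/* how many fs blocks the buffer spans */
};

/*
 * Decide whether a cached buffer belongs to the btree we just tore
 * down.  Only a single-block buffer carrying that btree's magic is
 * ours to invalidate; everything else is left alone.
 */
static bool
should_invalidate(
	const struct cached_buf	*bp,
	uint32_t		btree_magic)
{
	/* Nothing cached for this block, so nothing to do. */
	if (bp == NULL)
		return false;

	/* Multi-block buffer: a dir or rmt block owned by someone else. */
	if (bp->nr_blocks != 1)
		return false;

	/* Single block, but some other magic: not ours either. */
	return bp->magic == btree_magic;
}
```

That only covers the "is this buffer ours" decision; it says nothing
about dirty buffers or anyone re-instantiating the buffer afterwards.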

> > I /think/ the solution is that we need to query the buffer cache to see
> > if it has a buffer for the given disk blocks, and if it matches the
> > btree we're discarding (correct magic/uuid/b_length) then we invalidate
> > it,
> 
> I don't think that provides any guarantees. Even ignoring all the
> problems with invalidation while the buffer is dirty and tracked in
> the AIL, there's nothing stopping the other code from attempting to
> re-instantiate the buffer due to some other access. And then we
> have aliasing problems again....

Well, we /could/ just freeze the fs while we do repairs on any ag btree.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com