From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/3] xfs: fix an inode lookup race in xchk_get_inode
Date: Tue, 15 Nov 2022 16:53:59 -0800 [thread overview]
Message-ID: <Y3Q0p6LHYUJ/L1QN@magnolia> (raw)
In-Reply-To: <20221115034954.GX3600936@dread.disaster.area>
On Tue, Nov 15, 2022 at 02:49:54PM +1100, Dave Chinner wrote:
> On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > In commit d658e, we tried to improve the robustnes of xchk_get_inode in
> > the face of EINVAL returns from iget by calling xfs_imap to see if the
> > inobt itself thinks that the inode is allocated. Unfortunately, that
> > commit didn't consider the possibility that the inode gets allocated
> > after iget but before imap. In this case, the imap call will succeed,
> > but we turn that into a corruption error and tell userspace the inode is
> > corrupt.
> >
> > Avoid this false corruption report by grabbing the AGI header and
> > retrying the iget before calling imap. If the iget succeeds, we can
> > proceed with the usual scrub-by-handle code. Fix all the incorrect
> > comments too, since unreadable/corrupt inodes no longer result in EINVAL
> > returns.
> >
> > Fixes: d658e72b4a09 ("xfs: distinguish between corrupt inode and invalid inum in xfs_scrub_get_inode")
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
>
> OK.
>
> > ---
> > fs/xfs/scrub/common.c | 203 ++++++++++++++++++++++++++++++++++++++++---------
> > fs/xfs/scrub/common.h | 4 +
> > fs/xfs/xfs_icache.c | 3 -
> > fs/xfs/xfs_icache.h | 1
> > 4 files changed, 172 insertions(+), 39 deletions(-)
> >
> >
> > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > index 42a25488bd25..9a811c5fa8f7 100644
> > --- a/fs/xfs/scrub/common.c
> > +++ b/fs/xfs/scrub/common.c
> > @@ -635,6 +635,14 @@ xchk_ag_init(
> >
> > /* Per-scrubber setup functions */
> >
> > +void
> > +xchk_trans_cancel(
> > + struct xfs_scrub *sc)
> > +{
> > + xfs_trans_cancel(sc->tp);
> > + sc->tp = NULL;
> > +}
> > +
> > /*
> > * Grab an empty transaction so that we can re-grab locked buffers if
> > * one of our btrees turns out to be cyclic.
> > @@ -720,6 +728,80 @@ xchk_iget(
> > return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp);
> > }
> >
> > +/*
> > + * Try to grab an inode. If we can't, return the AGI buffer so that the caller
> > + * can single-step the loading process to see where things went wrong.
>
> "Try to grab an inode in a manner that avoids races with physical inode
> allocation. If we can't, return the locked AGI buffer so that the
> caller can single-step the loading process to see where things went
> wrong."
Fixed.
> > + *
> > + * If the iget succeeds, return 0, a NULL AGI, and the inode.
> > + * If the iget fails, return the error, the AGI, and a NULL inode. This can
>
> "... the locked AGI, ...."
>
> > + * include -EINVAL and -ENOENT for invalid inode numbers or inodes that are no
> > + * longer allocated; or any other corruption or runtime error.
> > + * If the AGI read fails, return the error, a NULL AGI, and NULL inode.
> > + * If a fatal signal is pending, return -EINTR, a NULL AGI, and a NULL inode.
> > + */
> > +int
> > +xchk_iget_agi(
> > + struct xfs_scrub *sc,
> > + xfs_ino_t inum,
> > + struct xfs_buf **agi_bpp,
> > + struct xfs_inode **ipp)
> > +{
> > + struct xfs_mount *mp = sc->mp;
> > + struct xfs_trans *tp = sc->tp;
> > + struct xfs_perag *pag;
> > + int error;
> > +
> > +again:
> > + *agi_bpp = NULL;
> > + *ipp = NULL;
> > + error = 0;
> > +
> > + if (xchk_should_terminate(sc, &error))
> > + return error;
> > +
> > + pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, inum));
> > + error = xfs_ialloc_read_agi(pag, tp, agi_bpp);
> > + xfs_perag_put(pag);
> > + if (error)
> > + return error;
> > +
> > + error = xfs_iget(mp, tp, inum,
> > + XFS_IGET_NOWAIT | XFS_IGET_UNTRUSTED, 0, ipp);
>
> OK, IGET_NOWAIT is new....
>
> > +#define XFS_IGET_NOWAIT 0x10 /* return EAGAIN instead of blocking */
>
> But it doesn't prevent blocking. XFS_IGET_UNTRUSTED means we do a
> inobt record lookup (btree buffer locking and IO that can block) as
> well as reading the inode cluster from disk if it's not already in
> cache. Hence this isn't what I'd call a "non-blocking" or "no wait"
> operation.
>
> AFAICT, what is needed here is avoiding the -retry mechanism- of the
> lookup, not "non blocking/no wait" semantics. i.e. this seems
> reasonable to get an -EAGAIN error instead of delaying and trying
> again if we are using XFS_IGET_NORETRY semantics....
Aha, yes, thank you for the better name. :)
/* Return -EAGAIN immediately if the inode is unavailable. */
#define XFS_IGET_NORETRY (1U << 5)
> > + if (error == -EAGAIN) {
> > + /*
> > + * Cached inode awaiting inactivation. Drop the AGI buffer in
> > + * case the inactivation worker is now waiting for it, and try
> > + * the iget again.
> > + */
>
> That's not the only reason xfs_iget() could return -EAGAIN,
> right? radix_tree_preload() failing can cause -EAGAIN to be returned
> from xfs_iget_cache_miss(), as can an instantiation race inserting
> the new inode into the radix tree. The cache hit path has a bunch
> more cases, too. Perhaps:
Yes, I suppose that's possible if the incore inode gets evicted and
someone else is reinstantiating it at the same time we're looking for
it...
> /*
> * The inode may be in core but temporarily
> * unavailable and may require the AGI buffer before
> * it can be returned. Drop the AGI buffer and retry
> * the lookup.
> */
...so yes, this is a better explanation of what's going on.
> > + xfs_trans_brelse(tp, *agi_bpp);
> > + delay(1);
> > + goto again;
> > + }
> > + if (error == 0) {
> > + /* We got the inode, so we can release the AGI. */
> > + ASSERT(*ipp != NULL);
> > + xfs_trans_brelse(tp, *agi_bpp);
> > + *agi_bpp = NULL;
> > + }
> > +
> > + return error;
>
> That's better written as:
>
> if (error)
> return error;
>
> /* We got the inode, so we can release the AGI. */
> ASSERT(*ipp != NULL);
> xfs_trans_brelse(tp, *agi_bpp);
> *agi_bpp = NULL;
> return 0;
>
> Other than that, the code makes sense.
Fixed. Thanks for the review.
--D
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
prev parent reply other threads:[~2022-11-16 0:54 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-02 18:20 [PATCHSET v23.1 0/3] xfs: fix iget/irele usage in online fsck Darrick J. Wong
2022-10-02 18:20 ` [PATCH 3/3] xfs: retain the AGI when we can't iget an inode to scrub the core Darrick J. Wong
2022-11-15 4:08 ` Dave Chinner
2022-11-16 2:49 ` Darrick J. Wong
2022-11-17 1:15 ` Dave Chinner
2022-11-17 20:20 ` Darrick J. Wong
2022-10-02 18:20 ` [PATCH 1/3] xfs: manage inode DONTCACHE status at irele time Darrick J. Wong
2022-11-15 3:13 ` Dave Chinner
2022-11-15 3:34 ` Darrick J. Wong
2022-10-02 18:20 ` [PATCH 2/3] xfs: fix an inode lookup race in xchk_get_inode Darrick J. Wong
2022-11-15 3:49 ` Dave Chinner
2022-11-16 0:53 ` Darrick J. Wong [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y3Q0p6LHYUJ/L1QN@magnolia \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).