From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/3] xfs: terminate perag iteration reliably on end agno
Date: Fri, 8 Oct 2021 10:45:13 -0400
Message-ID: <YWBZef87p55+XKNh@bfoster>
In-Reply-To: <20211007230259.GG54211@dread.disaster.area>

On Fri, Oct 08, 2021 at 10:02:59AM +1100, Dave Chinner wrote:
> On Thu, Oct 07, 2021 at 08:50:53AM -0400, Brian Foster wrote:
> > The for_each_perag*() set of macros are hacky in that some (i.e. those
> > based on sb_agcount) rely on the assumption that perag iteration
> > terminates naturally with a NULL perag at the specified end agno. Others
> > allow for the final AG to have a valid perag and require the calling
> > function to clean up any potential leftover xfs_perag reference on
> > termination of the loop.
> >
> > Aside from providing a subtly inconsistent interface, the former variant
> > is racy with a potential growfs in progress because growfs can create
> > discoverable post-eofs perags before the final superblock update that
> > completes the grow operation and increases sb_agcount. This leads to
> > unexpected assert failures (reproduced by xfs/104) such as the following
> > in the superblock buffer write verifier path:
> >
> > XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22
>
> Yeah, that's a bad assert. It's not valid in the context of grow or
> shrink or any of the future advanced per-ag management things we
> want to do.
>
I think it depends on the context. I don't think it's unreasonable to
expect certain paths to not want to process post-eofs perags. I don't
really like the placement of this assert tbh (with it being in a generic
helper) and the sb write verifier is probably not ideal context for the
check in a generic sense, but it does flag unexpected behavior when you
consider the higher level iteration is based on sb_agcount.

> I'm ok with the change being proposed as an expedient bug fix, but
> I'll note that the approach taken to fix it is not compatible with
> future plans for managing shrink and perag operations. I'll comment
> on the patch first, then the rest of the email is commentary about
> how xfs_perag_get() is intended to be used...
>
> > diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> > index d05c9217c3af..edcdd4fbc225 100644
> > --- a/fs/xfs/libxfs/xfs_ag.h
> > +++ b/fs/xfs/libxfs/xfs_ag.h
> > @@ -116,34 +116,30 @@ void xfs_perag_put(struct xfs_perag *pag);
> >
> > /*
> > * Perag iteration APIs
> > - *
> > - * XXX: for_each_perag_range() usage really needs an iterator to clean up when
> > - * we terminate at end_agno because we may have taken a reference to the perag
> > - * beyond end_agno. Right now callers have to be careful to catch and clean that
> > - * up themselves. This is not necessary for the callers of for_each_perag() and
> > - * for_each_perag_from() because they terminate at sb_agcount where there are
> > - * no perag structures in tree beyond end_agno.
>
> We still really need an iterator for the range iterations so that we
> can have a consistent set of behaviours for all iterations and
> don't need a special case just for the "mid walk break" where the
> code keeps the active reference to the perag for itself...
>
Ok, but what exactly are you referring to by "an iterator" beyond what
we have here to this point? A walker function with a callback or
something? And why wouldn't we have done that in the first place instead
of introducing the API wart documented above?

> > */
> > static inline
> > struct xfs_perag *xfs_perag_next(
> > 	struct xfs_perag *pag,
> > -	xfs_agnumber_t *agno)
> > +	xfs_agnumber_t *agno,
> > +	xfs_agnumber_t end_agno)
> > {
> > 	struct xfs_mount *mp = pag->pag_mount;
> >
> > 	*agno = pag->pag_agno + 1;
> > 	xfs_perag_put(pag);
> > -	pag = xfs_perag_get(mp, *agno);
> > +	pag = NULL;
> > +	if (*agno <= end_agno)
> > +		pag = xfs_perag_get(mp, *agno);
> > 	return pag;
>
> 	*agno = pag->pag_agno + 1;
> 	xfs_perag_put(pag);
> 	if (*agno > end_agno)
> 		return NULL;
> 	return xfs_perag_get(mp, *agno);
>

Will fix.

> > }
> >
> > #define for_each_perag_range(mp, agno, end_agno, pag) \
> > 	for ((pag) = xfs_perag_get((mp), (agno)); \
> > -		(pag) != NULL && (agno) <= (end_agno); \
> > -		(pag) = xfs_perag_next((pag), &(agno)))
> > +		(pag) != NULL; \
> > +		(pag) = xfs_perag_next((pag), &(agno), (end_agno)))
> >
> > #define for_each_perag_from(mp, agno, pag) \
> > -	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount, (pag))
> > +	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
>
> Isn't this one line the entire bug fix right here? i.e. the
> factoring is largely unnecessary, the grow race bug is fixed by just
> this one-liner?
>
No, the reference count problems can still occur regardless of this
particular change.
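
To illustrate, here is a minimal userspace sketch (not kernel code) of why
adjusting the end agno alone is not sufficient. Everything in it is a
hypothetical stand-in: struct perag and struct mount model xfs_perag and
xfs_mount, perag_get()/perag_put() model xfs_perag_get()/xfs_perag_put(),
and the "present" flag models a perag that growfs has made discoverable
before sb_agcount is updated. It assumes sb_agcount >= 1.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_AGS	8

/* Hypothetical userspace stand-ins for struct xfs_perag / struct xfs_mount. */
struct perag {
	unsigned int agno;
	int ref;		/* models pag_ref */
	int present;		/* perag can be found by a lookup */
};

struct mount {
	unsigned int sb_agcount;	/* models mp->m_sb.sb_agcount */
	struct perag ags[MAX_AGS];
};

/* sb_agcount AGs in the superblock, "discoverable" perags findable by lookup. */
static void mount_init(struct mount *mp, unsigned int sb_agcount,
		       unsigned int discoverable)
{
	unsigned int i;

	mp->sb_agcount = sb_agcount;
	for (i = 0; i < MAX_AGS; i++) {
		mp->ags[i].agno = i;
		mp->ags[i].ref = 0;
		mp->ags[i].present = i < discoverable;
	}
}

static struct perag *perag_get(struct mount *mp, unsigned int agno)
{
	if (agno >= MAX_AGS || !mp->ags[agno].present)
		return NULL;
	mp->ags[agno].ref++;
	return &mp->ags[agno];
}

static void perag_put(struct perag *pag)
{
	assert(pag->ref > 0);
	pag->ref--;
}

/* References still held across all perags; zero after a clean walk. */
static int held_refs(const struct mount *mp)
{
	unsigned int i;
	int refs = 0;

	for (i = 0; i < MAX_AGS; i++)
		refs += mp->ags[i].ref;
	return refs;
}

/* Pre-patch helper shape: the next perag is looked up unconditionally. */
static struct perag *perag_next_nocheck(struct mount *mp, struct perag *pag,
					unsigned int *agno)
{
	*agno = pag->agno + 1;
	perag_put(pag);
	return perag_get(mp, *agno);
}

/* Patched helper shape: no reference is ever taken beyond end_agno. */
static struct perag *perag_next(struct mount *mp, struct perag *pag,
				unsigned int *agno, unsigned int end_agno)
{
	*agno = pag->agno + 1;
	perag_put(pag);
	if (*agno > end_agno)
		return NULL;
	return perag_get(mp, *agno);
}

/* Only the end agno is adjusted; the bound lives in the loop condition. */
static void walk_bound_only(struct mount *mp)
{
	unsigned int agno = 0, end_agno = mp->sb_agcount - 1;
	struct perag *pag;

	for (pag = perag_get(mp, agno);
	     pag != NULL && agno <= end_agno;
	     pag = perag_next_nocheck(mp, pag, &agno))
		;	/* loop body elided */
}

/* Full change: the helper enforces the bound, so exit implies pag == NULL. */
static void walk_patched(struct mount *mp)
{
	unsigned int agno = 0, end_agno = mp->sb_agcount - 1;
	struct perag *pag;

	for (pag = perag_get(mp, agno); pag != NULL;
	     pag = perag_next(mp, pag, &agno, end_agno))
		;	/* loop body elided */
}
```

With sb_agcount == 4 and a fifth perag already discoverable,
walk_bound_only() exits on the agno check while still holding the reference
it just took on AG 4, whereas walk_patched() never takes it.
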
...
>
> > The following assert failure occasionally triggers during the xfs_perag
> > free path on unmount, presumably because one of the many
> > for_each_perag() loops in the code that is expected to terminate with a
> > NULL pag raced with a growfs and actually terminated with a non-NULL
> > reference to post-eofs (at the time) perag.
> >
> > XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/libxfs/xfs_ag.c, line: 195
> >
> > Rework the lower level perag iteration logic to explicitly terminate
> > on the specified end agno, not implicitly rely on pag == NULL as a
> > termination clause and thus avoid these problems.
>
> IMO, this just hides the symptom that results from code that isn't
> handling unexpected adverse loop termination correctly. The
> iterators are going to get more complex in the near future, so we
> really need them to have a robust iterator API that does all the
> cleanup work correctly, rather than try to hide it all in an
> increasingly complex for loop construct.
>
Any loop that uses one of the sb_agcount based iteration macros in a
context that can race with growfs and doesn't check pag != NULL post
loop is not handling loop termination correctly. The sb_agcount check
effectively builds in an early termination vector to every such usage,
because we can't guarantee that pag == NULL when the sb_agcount check
causes loop termination.

IOW, the following usage pattern documented in the comment above is not
universally correct/safe:

	for_each_perag(...) {
		/* no early termination */
	}
	/* no perag check because no early termination */

... because for_each_perag() is not implemented correctly to guarantee
that pag == NULL on exit of the loop. This is a simple logic bug with a
simple fix.
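
As a rough userspace model of that pattern (a sketch only, not the kernel
implementation), the macros below mirror the shape of the pre-patch and
patched for_each_perag() variants. struct perag, struct mount, perag_get()
and the *_orig/*_fixed names are hypothetical stand-ins, and "present"
models a post-eofs perag that an in-progress growfs has already made
discoverable.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS	8

/* Hypothetical stand-in for struct xfs_perag; refs models pag_ref. */
struct perag {
	unsigned int agno;
	int refs;
	int present;
};

/* Hypothetical stand-in for struct xfs_mount. */
struct mount {
	unsigned int sb_agcount;
	struct perag slots[NR_SLOTS];
};

static void mount_setup(struct mount *mp, unsigned int sb_agcount,
			unsigned int discoverable)
{
	unsigned int i;

	mp->sb_agcount = sb_agcount;
	for (i = 0; i < NR_SLOTS; i++) {
		mp->slots[i].agno = i;
		mp->slots[i].refs = 0;
		mp->slots[i].present = i < discoverable;
	}
}

static struct perag *perag_get(struct mount *mp, unsigned int agno)
{
	if (agno >= NR_SLOTS || !mp->slots[agno].present)
		return NULL;
	mp->slots[agno].refs++;
	return &mp->slots[agno];
}

/* Pre-patch shape: drop the current reference, always look up the next. */
static struct perag *perag_next_orig(struct mount *mp, struct perag *pag,
				     unsigned int *agno)
{
	*agno = pag->agno + 1;
	pag->refs--;
	return perag_get(mp, *agno);
}

/* Patched shape: stop at end_agno without taking another reference. */
static struct perag *perag_next_fixed(struct mount *mp, struct perag *pag,
				      unsigned int *agno, unsigned int end_agno)
{
	*agno = pag->agno + 1;
	pag->refs--;
	if (*agno > end_agno)
		return NULL;
	return perag_get(mp, *agno);
}

/* Bound is sb_agcount and is only checked in the loop condition. */
#define for_each_perag_orig(mp, agno, pag) \
	for ((agno) = 0, (pag) = perag_get((mp), 0); \
	     (pag) != NULL && (agno) <= (mp)->sb_agcount; \
	     (pag) = perag_next_orig((mp), (pag), &(agno)))

/* Bound is sb_agcount - 1 and is enforced by the helper. */
#define for_each_perag_fixed(mp, agno, pag) \
	for ((agno) = 0, (pag) = perag_get((mp), 0); \
	     (pag) != NULL; \
	     (pag) = perag_next_fixed((mp), (pag), &(agno), \
				      (mp)->sb_agcount - 1))

/* No early termination in either walk; return whatever pag is left over. */
static struct perag *walk_orig(struct mount *mp, int *visited)
{
	struct perag *pag;
	unsigned int agno;

	*visited = 0;
	for_each_perag_orig(mp, agno, pag)
		(*visited)++;
	return pag;	/* the documented pattern assumes this is NULL */
}

static struct perag *walk_fixed(struct mount *mp, int *visited)
{
	struct perag *pag;
	unsigned int agno;

	*visited = 0;
	for_each_perag_fixed(mp, agno, pag)
		(*visited)++;
	return pag;
}
```

In this model, with sb_agcount == 4 and six perags discoverable, walk_orig()
runs the loop body on post-eofs AG 4 and returns a non-NULL pag for AG 5 with
its reference still held. walk_fixed() visits only AGs 0-3 and returns NULL.
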

Brian

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>