linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/3] xfs: terminate perag iteration reliably on end agno
Date: Sat, 9 Oct 2021 08:20:20 +1100	[thread overview]
Message-ID: <20211008212020.GI54211@dread.disaster.area> (raw)
In-Reply-To: <YWBZef87p55+XKNh@bfoster>

On Fri, Oct 08, 2021 at 10:45:13AM -0400, Brian Foster wrote:
> On Fri, Oct 08, 2021 at 10:02:59AM +1100, Dave Chinner wrote:
> > On Thu, Oct 07, 2021 at 08:50:53AM -0400, Brian Foster wrote:
> > > diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> > > index d05c9217c3af..edcdd4fbc225 100644
> > > --- a/fs/xfs/libxfs/xfs_ag.h
> > > +++ b/fs/xfs/libxfs/xfs_ag.h
> > > @@ -116,34 +116,30 @@ void xfs_perag_put(struct xfs_perag *pag);
> > >  
> > >  /*
> > >   * Perag iteration APIs
> > > - *
> > > - * XXX: for_each_perag_range() usage really needs an iterator to clean up when
> > > - * we terminate at end_agno because we may have taken a reference to the perag
> > > - * beyond end_agno. Right now callers have to be careful to catch and clean that
> > > - * up themselves. This is not necessary for the callers of for_each_perag() and
> > > - * for_each_perag_from() because they terminate at sb_agcount where there are
> > > - * no perag structures in tree beyond end_agno.
> > 
> > We still really need an iterator for the range iterations so that we
> > can have a consistent set of behaviours for all iterations and
> > don't need a special case just for the "mid walk break" where the
> > code keeps the active reference to the perag for itself...
> > 
> 
> Ok, but what exactly are you referring to by "an iterator" beyond what
> we have here to this point? A walker function with a callback or
> something? And why wouldn't we have done that in the first place instead
> of introducing the API wart documented above?

We didn't do it in the first place because we started with the tag
walks that only ever terminate on NULL - that was Darrick's code
used in the inode cache walk functions.. Converting the rest of
agcount based walks was not straight-forward - there were different
cases that needed to be handled and this was the least worst way to
begin the conversion needed for shrink.

Looking back at the history, it was that last conversion for
GETFSMAP that caused the problems sb_agcount based walks in commit
58d43a7e3263 ("xfs: pass perags around in fsmap data dev functions")
where for_each_perag_from() was converted to
for_each_perag_from_range(). There was no termination leak before
this, and races with growfs were simply handled with termination by
NULL lookup return. The bug iwas introduced because we moved away
from termination on NULL lookup by adding a termination via loop
index to support GETFSMAP.

We knew about the API wart and put a plan in place to address it in
future by movign to iterators. That looks something like this but
without the C99 automatic cleanup so external iterator setup and
teardown:

https://lore.kernel.org/linux-xfs/162814685996.2777088.11268635137040103857.stgit@magnolia/

So, yes, we need to fix the bug that was introduced, but asking why
the code wasn't perfect before it was merged doesn't help us make
small steps forwards towards solving the bigger problems we need to
address...

> > >  }
> > >  
> > >  #define for_each_perag_range(mp, agno, end_agno, pag) \
> > >  	for ((pag) = xfs_perag_get((mp), (agno)); \
> > > -		(pag) != NULL && (agno) <= (end_agno); \
> > > -		(pag) = xfs_perag_next((pag), &(agno)))
> > > +		(pag) != NULL; \
> > > +		(pag) = xfs_perag_next((pag), &(agno), (end_agno)))
> > >  
> > >  #define for_each_perag_from(mp, agno, pag) \
> > > -	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount, (pag))
> > > +	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
> > 
> > Isn't this one line the entire bug fix right here? i.e. the
> > factoring is largely unnecessary, the grow race bug is fixed by just
> > this one-liner?
> > 
> 
> No, the reference count problems can still occur regardless of this
> particular change.

Ok, so there are two problems then - one is an off-by one that can
result in perag lookups beyond the EOFS racing with growfs instead
of returning NULL.

The other is loop termination via loop index limits can leak a perag
when terminating the walk on the end index rather than a NULL. I
think these two bug fixes should at least be separate patches. Fix
the off-by-one in one patch, fix the termination issue in another.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

      reply	other threads:[~2021-10-08 21:20 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-07 12:50 [PATCH 0/3] xfs: fix perag iteration raciness Brian Foster
2021-10-07 12:50 ` [PATCH 1/3] xfs: fold perag loop iteration logic into helper function Brian Foster
2021-10-07 12:50 ` [PATCH 2/3] xfs: rename the next_agno perag iteration variable Brian Foster
2021-10-07 12:50 ` [PATCH 3/3] xfs: terminate perag iteration reliably on end agno Brian Foster
2021-10-07 23:02   ` Dave Chinner
2021-10-08 14:45     ` Brian Foster
2021-10-08 21:20       ` Dave Chinner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211008212020.GI54211@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).