From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>,
linux-xfs@vger.kernel.org, Nikolay Borisov <nborisov@suse.com>,
Jan Kara <jack@suse.cz>, Michal Hocko <mhocko@suse.com>
Subject: Re: Reviewing determinism of xfs_reclaim_inodes_ag()
Date: Wed, 26 Apr 2017 09:34:38 +0200 [thread overview]
Message-ID: <20170426073438.GB21218@quack2.suse.cz> (raw)
In-Reply-To: <20170426000426.GK12369@dastard>
On Wed 26-04-17 10:04:26, Dave Chinner wrote:
> On Tue, Apr 25, 2017 at 10:25:03AM +0200, Luis R. Rodriguez wrote:
> > I've been staring at xfs_reclaim_inodes_ag() for a bit now to determine
> > if and how its deterministic it is. Fortunately the code makes it relatively
> > clear it chugs on 32 xfs_inodes at at a time before cond_resched()'ing. I was
> > concerned originally if we cond_resched() even if we goto restart but it seems
> > we do.
> >
> > While reviewing this however I ran into the following question based on a
> > comment on the routine. Is such a thing as the below change needed ?
> >
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index 3531f8f72fa5..05f3c79b4f11 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -1157,7 +1157,9 @@ xfs_reclaim_inodes_ag(
> > for (i = 0; i < nr_found; i++) {
> > struct xfs_inode *ip = batch[i];
> >
> > - if (done || xfs_reclaim_inode_grab(ip, flags))
> > + if (done ||
> > + XFS_INO_TO_AGNO(mp, ip->i_ino) != pag->pag_agno ||
> > + xfs_reclaim_inode_grab(ip, flags))
> > batch[i] = NULL;
>
> If a new check is required here, it belongs in
> xfs_reclaim_inode_grab(), not as separate logic in the if statement
> here as it needs to be serialised with other checks the grab does.
>
> As it is, the grab code first checks for a zero ip->ino, which will
> catch all cases where the inode is still in the RCU grace period and
> a lookup has raced. If the inode is not under reclaim (i.e the inode
> number is non-zero) then the grab code checks and sets XFS_IRECLAIM
> under a spinlock, thereby prevent all further inode cache
> lookup+grab operations from succeeding until reclaim completes and
> the RCU grace period expires and frees the inode. i.e. if the
> XFS_IRECLAIM flag is already set, then the inode is skipped
> regardless of the inode number because it's already on it's way to
> being freed by RCU.
>
> Hence, IMO, adding this check here doesn't change lookup/reclaim
> processing correctness.
Agreed, the code looks correct as is to me.
> > This is what the code looks like before but pay special attention to the
> > comment and a race with RCU:
> >
> > if (done || xfs_reclaim_inode_grab(ip, flags))
> > batch[i] = NULL;
> >
> > /*
> > * Update the index for the next lookup.
> > * Catch overflows into the next AG range
> > * which can occur if we have inodes in the
> > * last block of the AG and we are
> > * currently pointing to the last inode.
> > *
> > * Because we may see inodes that are from
> > * the wrong AG due to RCU freeing and
> > * reallocation, only update the index if
> > * it lies in this AG. It was a race that
> > * lead us to see this inode, so another
> > * lookup from
> > * the same index will not find it again.
> > */
> > if (XFS_INO_TO_AGNO(mp, ip->i_ino) !=
> > pag->pag_agno)
> > continue;
>
> It's stale code left over from development of the RCU freeing of
> inodes where .....
>
> > I checked with Jan Kara and he believes the current code is correct but that
> > its the comment that that may be misleading. As per Jan the race is between
> > getting an inode reclaimed and grabbing it. Ie, XFS frees the inodes by RCU.
> > However it doesn't actually *reuse* the inode until RCU period passes
> > (unlike inodes allocated from slab with SLAB_RCU can be). So it can happen
>
> ..... I initially tried using SLAB_DESTROY_BY_RCU which meant the
> RCU grace period did not prevent reallocation of inodes that had
> been freed. Hence this check was (once) necessary to prevent the
> reclaim index going whacky on a reallocated inode.
>
> However, I dropped the SLAB_DESTROY_BY_RCU behaviour because I
> couldn't get it to work reliably and race free, and there was no
> measurable improvement in performance just using call_rcu() t delay
> the freeing of individual inodes to the end of the RCU grace period.
Yeah, so SLAB_DESTROY_BY_RCU is beneficial only if you have a workload
where getting CPU cache-hot objects on allocation improves performance
measurably. I remember quite a few years ago Nick Piggin had a workload
where this was advantageous for inodes (like opening & closing tons of
files so that inodes where flowing in and out of cache, I don't remember
the details anymore) but it was mostly a corner case.
> In hindsight, I forgot to remove the second comment hunk and the
> check that was eventually committed as 1a3e8f3da09c ("xfs: convert
> inode cache lookups to use RCU locking"). It also appears the commit
> message was not updated, too. It says:
>
> To avoid the read vs write contention, change the cache to use
> RCU locking on the read side. To avoid needing to RCU free every
> single inode, use the built in slab RCU freeing mechanism. This
> requires us to be able to detect lookups of freed inodes, [...]
>
> So, really, the comment+check you're confused about can simply be
> removed....
Thanks for confirmation and reference! Looking at that commit there seem to
be more comments there speaking about reallocation which probably need
updating - e.g. the one in xfs_reclaim_inode_grab(). Luis, will you take
care of that?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2017-04-26 7:34 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-25 8:25 Reviewing determinism of xfs_reclaim_inodes_ag() Luis R. Rodriguez
2017-04-26 0:04 ` Dave Chinner
2017-04-26 7:34 ` Jan Kara [this message]
2017-04-26 9:12 ` Luis R. Rodriguez
2017-04-26 10:55 ` Jan Kara
2017-05-06 17:41 ` Luis R. Rodriguez
2017-05-06 17:52 ` Luis R. Rodriguez
2017-05-09 10:41 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170426073438.GB21218@quack2.suse.cz \
--to=jack@suse.cz \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
--cc=mcgrof@kernel.org \
--cc=mhocko@suse.com \
--cc=nborisov@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).