Linux-XFS Archive on lore.kernel.org
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>,
	linux-xfs@vger.kernel.org, hch@infradead.org
Subject: Re: [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes
Date: Mon, 29 Mar 2021 21:06:25 -0700
Message-ID: <20210330040625.GL4090233@magnolia>
In-Reply-To: <20210330030747.GT63242@dread.disaster.area>

On Tue, Mar 30, 2021 at 02:07:47PM +1100, Dave Chinner wrote:
> On Mon, Mar 29, 2021 at 07:36:56PM -0700, Darrick J. Wong wrote:
> > On Tue, Mar 30, 2021 at 11:44:07AM +1100, Dave Chinner wrote:
> > > On Thu, Mar 25, 2021 at 05:21:18PM -0700, Darrick J. Wong wrote:
> > > > From: Christoph Hellwig <hch@lst.de>
> > > > 
> > > > Using xfs_inode_walk in xfs_qm_dqrele_all_inodes is complete overkill,
> > > > given that function simply wants to iterate all live inodes known
> > > > to the VFS.  Just iterate over the s_inodes list.
> > > 
> > > I'm not sure that assertion is true. We attach dquots during inode
> > > inactivation after the VFS has removed the inode from the s_inodes
> > > list and evicted the inode. Hence there is a window between the
> > > inode being removed from the sb->s_inodes lists and it being marked
> > > XFS_IRECLAIMABLE where we can attach dquots to the inode.
> > > 
> > > Indeed, an inode marked XFS_IRECLAIMABLE that has gone through
> > > evict -> destroy -> inactive -> nlink != 0 -> xfs_free_eofblocks()
> > > can have referenced dquots attached to it and require dqrele() to be
> > > called to release them.
> > 
> > Why do the dquots need to remain attached after destroy_inode?
> 
> They don't. But that's not the problem here.

Actually, they do need to remain attached nowadays, because COW blocks
are accounted as incore dquot reservations so we can't let the dquots
drop until the COW fork gets cleaned out.

Granted I guess I did have a patch that changed the dquot lifecycle so
that they would stay in memory after the refcount dropped to zero, even
if they had incore reservations.

...and now I finally see the plot twist that turns this into the
*Fourth* part of Yet Another Quota Restructuring.  This time I get to
reimplement quotaoff! :P

> > We can
> > easily reattach them during inactivation (v3 did this), and I don't know
> > why an inode needs dquots once we're through making metadata updates.
> 
> Yes, they get re-attached for truncation, attr removal, EOF block
> freeing, etc. Only on the unlinked inode path in inactivation do
> they get removed once all the work that modifies the dquots is done.
> 
> But many of the paths don't detach them again because they are
> multi-use.  e.g. xfs_free_eofblocks() will attach dquots, but doesn't
> detach them because it's called from more places than the
> inactivation path.
> 
> I'm sure this can all be cleaned up, but I *really* don't like the
> idea of a "walk all XFS inodes" scan that actually only walks the
> inodes with VFS references and not -all XFS inodes-.
> 
> And there's other problems with doing sb->s_inodes list walks -
> namely the global lock. While we are doing this walk (might be tens
> of millions of inodes!) we can hold the s_inode_list_lock for a long
> time and we cannot instantiate new inodes or evict inodes to/from
> the cache while that lock is held. The XFS inode walk is lockless
> and we don't hold off anything to do with cache instantiation and
> freeing, so it has less impact on the running system.
> 
> Even if everything is clean and doesn't block on locks anywhere, the
> s_inodes list walk needs a cond_resched() in it. Again, tens
> (hundreds) of millions of inodes on that list mean it can
> hold the CPU for a long time.

Yeah, I had wondered how good an idea it was to replace batch lookups
with a list walk...
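
To make the trade-off concrete, here is a toy userspace model, not the
actual xfs_inode_walk or s_inodes code. It assumes a hypothetical batch
size (BATCH) and only counts how many objects are handled per lock hold
and how often the walker reaches a point where it could yield the CPU
(standing in for cond_resched()). All names below are made up for
illustration.

```c
#include <stddef.h>

#define BATCH 32 /* hypothetical batch size, chosen only for this sketch */

struct walk_stats {
	size_t visited;        /* objects processed in total */
	size_t max_hold;       /* most objects handled under one lock hold */
	size_t resched_points; /* places where the walker could yield */
};

/* Model of a list walk that keeps one global lock for the whole pass. */
static struct walk_stats walk_whole_list(size_t nr_inodes)
{
	struct walk_stats st = {0};

	/* the global list lock would be taken here ... */
	for (size_t i = 0; i < nr_inodes; i++)
		st.visited++;
	/* ... and dropped only here, with no yield point in between */
	st.max_hold = nr_inodes;
	return st;
}

/* Model of a batched walk: look up a few objects, then work on them. */
static struct walk_stats walk_in_batches(size_t nr_inodes)
{
	struct walk_stats st = {0};
	size_t next = 0;

	while (next < nr_inodes) {
		size_t n = nr_inodes - next;

		if (n > BATCH)
			n = BATCH;
		/* only the lookup of these n objects is a critical section */
		if (n > st.max_hold)
			st.max_hold = n;
		/* the objects are processed outside the critical section */
		st.visited += n;
		next += n;
		st.resched_points++; /* stands in for cond_resched() */
	}
	return st;
}
```

In this model the whole-list walk's hold length grows with the number of
cached inodes, while the batched walk's is capped at BATCH.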

> Next, igrab() takes a reference to the inode which will mark them
> referenced. This walk grabs every inode in the filesystem cache,
> so marks them all referenced and makes it harder to reclaim them
> under memory pressure. This perturbs working set behaviour.
> 
> inode list walks and igrab/iput don't come for free - they perturb
> the working set, LRU orders, cause lock contention, long tail
> latencies, etc. The XFS inode cache walk might not be the prettiest
> thing, but it doesn't have any of these nasty side effects.
> 
> So, in general, I don't think we should be adding new inode list
> walks anywhere, not even deep in XFS where nobody else might care...

...but the current quotaoff behavior has /all/ of these problems too.

I think you and I hashed out on IRC that quotaoff could simply take the
ILOCK and the i_flags lock of every inode that isn't INEW, RECLAIMING,
or INACTIVATING; drop the dquots, and drop the locks, and then dqpurge
would only have to wait for the inodes that are actively being reclaimed
or inactivated.
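
A rough userspace sketch of that scan, assuming made-up flag values and a
simplified inode structure (none of these names are the real XFS
definitions, and the locking is reduced to comments):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the INEW/RECLAIMING/INACTIVATING states. */
enum {
	M_INEW         = 1u << 0,
	M_RECLAIMING   = 1u << 1,
	M_INACTIVATING = 1u << 2,
};

struct model_inode {
	unsigned int flags;
	bool has_dquots; /* true while dquots are still attached */
};

/*
 * Drop dquots from every inode that is not new, being reclaimed, or
 * being inactivated.  Returns how many skipped inodes are in reclaim
 * or inactivation with dquots still attached, i.e. the ones a later
 * purge step would have to wait for in this model.
 */
static size_t model_quotaoff_scan(struct model_inode *inodes, size_t nr)
{
	size_t must_wait = 0;

	for (size_t i = 0; i < nr; i++) {
		struct model_inode *ip = &inodes[i];

		/* real code would hold the ILOCK and i_flags lock here */
		if (ip->flags & (M_INEW | M_RECLAIMING | M_INACTIVATING)) {
			if ((ip->flags & (M_RECLAIMING | M_INACTIVATING)) &&
			    ip->has_dquots)
				must_wait++;
			continue;
		}
		ip->has_dquots = false; /* stands in for dropping the dquots */
		/* locks would be released here */
	}
	return must_wait;
}
```

The return value is only there to show what is left over for the purge
step; whether the real code would track it this way is an open question.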

I'll give that a try ... eventually, but I wouldn't be too confident
that I'll get all this turned around before I have to shut the door
next Thursday.

> > > Hence I think that xfs_qm_dqrele_all_inodes() is broken if all it is
> > > doing is walking vfs referenced inodes, because it doesn't actually
> > > release the dquots attached to reclaimable inodes. If this did
> > > actually release all dquots, then there wouldn't be a need for the
> > > xfs_qm_dqdetach() call in xfs_reclaim_inode() just before it's
> > > handed to RCU to be freed....
> > 
> > Why does it work now, then?  The current code /also/ leaves the dquots
> > attached to reclaimable inodes, and the quotaoff scan ignores
> > IRECLAIMABLE inodes.
> 
> Luck, I think.
> 
> > Has it simply been the case that the dqpurge spins
> > until reclaim runs, and reclaim gets run quickly enough (or quotaoff runs
> > infrequently enough) that nobody's complained?
> 
> Yup, that's my assumption - quotaoff is rare, inode reclaim runs
> every 5s - and so we haven't noticed it because nobody has looked
> closely at how dquots vs inode reclaim works recently...

Yes, that does explain some of the weird test duration quirks I see in
xfs/305....

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


Thread overview: 19+ messages
2021-03-26  0:21 [PATCHSET v3 0/6] xfs: clean up incore inode walk functions Darrick J. Wong
2021-03-26  0:21 ` [PATCH 1/6] xfs: use s_inodes in xfs_qm_dqrele_all_inodes Darrick J. Wong
2021-03-30  0:44   ` Dave Chinner
2021-03-30  2:36     ` Darrick J. Wong
2021-03-30  3:07       ` Dave Chinner
2021-03-30  4:06         ` Darrick J. Wong [this message]
2021-03-31  1:34           ` Dave Chinner
2021-03-26  0:21 ` [PATCH 2/6] xfs: remove iter_flags parameter from xfs_inode_walk_* Darrick J. Wong
2021-03-26  6:04   ` Christoph Hellwig
2021-03-26  0:21 ` [PATCH 3/6] xfs: remove indirect calls from xfs_inode_walk{,_ag} Darrick J. Wong
2021-03-26  6:08   ` Christoph Hellwig
2021-03-26  0:21 ` [PATCH 4/6] xfs: pass struct xfs_eofblocks to the inode scan callback Darrick J. Wong
2021-03-26  6:09   ` Christoph Hellwig
2021-03-26  0:21 ` [PATCH 5/6] xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag Darrick J. Wong
2021-03-26  6:30   ` Christoph Hellwig
2021-03-26 16:07     ` Darrick J. Wong
2021-03-26  0:21 ` [PATCH 6/6] xfs: refactor per-AG inode tagging functions Darrick J. Wong
2021-03-26  6:48   ` Christoph Hellwig
2021-03-26 16:34     ` Darrick J. Wong
