* [PATCHSET v9 00/14] xfs: deferred inode inactivation
@ 2021-08-05  2:06 Darrick J. Wong
From: Darrick J. Wong @ 2021-08-05  2:06 UTC
  To: djwong; +Cc: Dave Chinner, Christoph Hellwig, linux-xfs, david, hch

Hi all,

This patch series implements deferred inode inactivation.  Inactivation
is what happens when an open file loses its last incore reference: if
the file has speculative preallocations, they must be freed, and if the
file is unlinked, all forks must be truncated, and the inode marked
freed in the inode chunk and the inode btrees.

Currently, all of this activity is performed in frontend threads when
the last in-memory reference is lost and/or the vfs decides to drop the
inode.  Three complaints stem from this behavior: first, that the time
to unlink (in the worst case) depends on both the complexity of the
directory and the number of extents in that file; second,
that deleting a directory tree is inefficient and seeky because we free
the inodes in readdir order, not disk order; and third, the upcoming
online repair feature needs to be able to xfs_irele while scanning a
filesystem in transaction context.  It cannot perform inode inactivation
in this context because xfs does not support nested transactions.

The implementation will be familiar to those who have studied how XFS
scans for reclaimable in-core inodes -- we create a couple more inode
state flags to mark an inode as needing inactivation and being in the
middle of inactivation.  When inodes need inactivation, we set
NEED_INACTIVE in iflags and add it to a percpu work list.  Eventually, a
bounded percpu workqueue item will be scheduled to perform all the
on-disk metadata updates.  Once the inode has been inactivated, it is
left in the reclaim state and the background reclaim worker (or direct
reclaim) will get to it eventually.
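
For readers who want a concrete picture of the flow described above, here
is a minimal single-threaded userspace sketch.  It is not the code from
the series: every sk_* name, the fixed NR_CPUS value, and the flag names
other than the NEED_INACTIVE idea are assumptions made for illustration,
and the locking and real workqueue scheduling are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical fixed CPU count for this sketch */

/* Hypothetical state flags, modelled on the NEED_INACTIVE idea. */
enum {
	SK_NEED_INACTIVE = 1u << 0,	/* queued, waiting for a worker */
	SK_INACTIVATING  = 1u << 1,	/* a worker is processing it */
	SK_RECLAIMABLE   = 1u << 2,	/* inactivated, left for reclaim */
};

struct sk_inode {
	unsigned long	ino;
	unsigned int	flags;
	struct sk_inode	*next;
};

/* One work list per CPU; a real implementation would need locking. */
struct sk_gc_queue {
	struct sk_inode	*head;
	unsigned int	items;
};

static struct sk_gc_queue sk_queues[NR_CPUS];

/* Frontend side: mark the inode and push it onto this CPU's list. */
static void sk_inodegc_queue(struct sk_inode *ip, unsigned int cpu)
{
	struct sk_gc_queue *gc = &sk_queues[cpu % NR_CPUS];

	assert(!(ip->flags & (SK_NEED_INACTIVE | SK_INACTIVATING)));
	ip->flags |= SK_NEED_INACTIVE;
	ip->next = gc->head;
	gc->head = ip;
	gc->items++;
}

/*
 * Background side: detach the whole list for one CPU, then walk it.
 * Returns the number of inodes that were inactivated.
 */
static unsigned int sk_inodegc_worker(unsigned int cpu)
{
	struct sk_gc_queue *gc = &sk_queues[cpu % NR_CPUS];
	struct sk_inode *ip = gc->head;
	unsigned int done = 0;

	gc->head = NULL;
	gc->items = 0;

	while (ip) {
		struct sk_inode *next = ip->next;

		ip->flags &= ~SK_NEED_INACTIVE;
		ip->flags |= SK_INACTIVATING;
		/* The on-disk metadata updates would happen here. */
		ip->flags &= ~SK_INACTIVATING;
		ip->flags |= SK_RECLAIMABLE;
		ip->next = NULL;

		ip = next;
		done++;
	}
	return done;
}
```

The point of the sketch is the split: the queueing side only sets a flag
and links the inode, and all of the expensive work happens when the
worker drains the list.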

Doing the inactivations from kernel threads solves the first problem by
constraining the amount of work done by the unlink() call to removing
the directory entry.  It solves the third problem by moving inactivation
to a separate process.  Performing the inactivations in batches addresses
the second problem by decreasing the amount of time it takes to let go of
an inode cluster if we're deleting entire directory trees.

There are three big warts I can think of in this series: first, because
the actual freeing of nlink==0 inodes is now done in the background,
this means that the system will be busy making metadata updates for some
time after the unlink() call returns.  This temporarily reduces
available iops.  Second, in order to retain the behavior that deleting
100TB of unshared data should result in a free space gain of 100TB, the
statvfs and quota reporting ioctls wait for inactivation to finish,
which increases the long tail latency of those calls.  This behavior is,
unfortunately, key to not introducing regressions in fstests.  The third
problem is that the deferrals keep memory usage higher for longer.  The
final patch in the series (clumsily) addresses this by forcing the
inodegc workers to run when memory shrinkers get called and by
throttling the frontend xfs_inodegc_queue callers to wait for the
inactivation workers to run.
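
The second and third warts can be illustrated with a toy userspace model.
This is only a sketch under stated assumptions: struct sk_fs, the sk_*
functions, and the memory_tight argument are all hypothetical, and the
flush here runs synchronously instead of waiting on real workers.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a filesystem with deferred inactivation (hypothetical). */
struct sk_fs {
	uint64_t	free_blocks;	/* already returned to free space */
	uint64_t	pending_blocks;	/* held by queued unlinked inodes */
	unsigned int	queued;		/* inodes awaiting inactivation */
};

/* Stand-in for running the inactivation workers to completion. */
static void sk_inodegc_flush(struct sk_fs *fs)
{
	fs->free_blocks += fs->pending_blocks;
	fs->pending_blocks = 0;
	fs->queued = 0;
}

/*
 * Frontend unlink: only queue the work.  When memory is tight the
 * caller is throttled, which this model represents by flushing
 * before returning.
 */
static void sk_unlink(struct sk_fs *fs, uint64_t nblocks, int memory_tight)
{
	fs->pending_blocks += nblocks;
	fs->queued++;
	if (memory_tight)
		sk_inodegc_flush(fs);
}

/*
 * Free space reporting: flush first so that deleted data shows up as
 * free space, at the cost of extra latency for this call.
 */
static uint64_t sk_statfs_free(struct sk_fs *fs)
{
	sk_inodegc_flush(fs);
	assert(fs->pending_blocks == 0);
	return fs->free_blocks;
}
```

In this model, the free space counter does not move when sk_unlink
returns; it only catches up once something forces a flush, which is why
the reporting path has to pay for it.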

v1-v2: NYE patchbombs
v3: rebase against 5.12-rc2 for submission.
v4: combine the can/has eofblocks predicates, clean up incore inode tree
    walks, fix inobt deadlock
v5: actually freeze the inode gc threads when we freeze the filesystem,
    consolidate the code that deals with inode tagging, and use
    foreground inactivation during quotaoff to avoid cycling dquots
v6: rebase to 5.13-rc4, fix quotaoff not to require foreground inactivation,
    refactor to use inode walk goals, use atomic bitflags to control the
    scheduling of gc workers
v7: simplify the inodegc worker, which simplifies how flushes work, break
    up the patch into smaller pieces, flush inactive inodes on syncfs to
    simplify freeze/ro-remount handling, separate inode selection filtering
    in iget, refactor inode recycling further, change gc delay to 100ms,
    decrease the gc delay when space or quota are low, move most of the
    destroy_inode logic to mark_reclaimable, get rid of the fallocate flush
    scan thing, get rid of polled flush mode
v8: rebase against 5.14-rc2, hook the memory shrinkers so that we requeue
    inactivation immediately when memory starts to get tight and force
    callers queueing inodes for inactivation to wait for the inactivation
    workers to run (i.e. throttling the frontend) to reduce memory storms,
    add hch's quotaoff removal series as a dependency to shut down arguments
    about quota walks
v9: replace the entire mechanism with percpu lists and workers, clean out
    a ton of ratty code that nobody liked anyway :P

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.


kernel git tree:
 fs/xfs/scrub/common.c      |   10 +
 fs/xfs/xfs_dquot.h         |   10 +
 fs/xfs/xfs_icache.c        |  592 ++++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_icache.h        |    8 +
 fs/xfs/xfs_inode.c         |   53 ++++
 fs/xfs/xfs_inode.h         |   22 ++
 fs/xfs/xfs_itable.c        |   42 +++
 fs/xfs/xfs_iwalk.c         |   33 ++
 fs/xfs/xfs_log_recover.c   |    7 +
 fs/xfs/xfs_mount.c         |   57 +++-
 fs/xfs/xfs_mount.h         |   62 ++++-
 fs/xfs/xfs_qm.c            |   34 +++
 fs/xfs/xfs_qm_syscalls.c   |    8 +
 fs/xfs/xfs_quota.h         |    2 
 fs/xfs/xfs_super.c         |  253 ++++++++++++++-----
 fs/xfs/xfs_trace.h         |   93 +++++++
 fs/xfs/xfs_trans.c         |    5 
 include/linux/cpuhotplug.h |    1 
 18 files changed, 1164 insertions(+), 128 deletions(-)

Thread overview: 32+ messages
2021-08-05  2:06 [PATCHSET v9 00/14] xfs: deferred inode inactivation Darrick J. Wong
2021-08-05  2:06 ` [PATCH 01/14] xfs: introduce CPU hotplug infrastructure Darrick J. Wong
2021-08-05  2:06 ` [PATCH 02/14] xfs: introduce all-mounts list for cpu hotplug notifications Darrick J. Wong
2021-08-05  2:06 ` [PATCH 03/14] xfs: move xfs_inactive call to xfs_inode_mark_reclaimable Darrick J. Wong
2021-08-05  5:29   ` Dave Chinner
2021-08-05  2:06 ` [PATCH 04/14] xfs: detach dquots from inode if we don't need to inactivate it Darrick J. Wong
2021-08-05  5:30   ` Dave Chinner
2021-08-05  2:06 ` [PATCH 05/14] xfs: per-cpu deferred inode inactivation queues Darrick J. Wong
2021-08-05  6:43   ` Dave Chinner
2021-08-05  7:00     ` Darrick J. Wong
2021-08-05 22:15       ` Dave Chinner
2021-08-05 22:38         ` Darrick J. Wong
2021-08-07  0:21   ` Darrick J. Wong
2021-08-07 21:49     ` Dave Chinner
2021-08-09 23:36       ` Darrick J. Wong
2021-08-05  2:06 ` [PATCH 06/14] xfs: queue inactivation immediately when free space is tight Darrick J. Wong
2021-08-05  5:31   ` Dave Chinner
2021-08-05  2:07 ` [PATCH 07/14] xfs: queue inactivation immediately when quota is nearing enforcement Darrick J. Wong
2021-08-05  5:35   ` Dave Chinner
2021-08-05  2:07 ` [PATCH 08/14] xfs: queue inactivation immediately when free realtime extents are tight Darrick J. Wong
2021-08-05  5:36   ` Dave Chinner
2021-08-05  2:07 ` [PATCH 09/14] xfs: inactivate inodes any time we try to free speculative preallocations Darrick J. Wong
2021-08-05  5:36   ` Dave Chinner
2021-08-05  2:07 ` [PATCH 10/14] xfs: flush inode inactivation work when compiling usage statistics Darrick J. Wong
2021-08-05  5:38   ` Dave Chinner
2021-08-05  2:07 ` [PATCH 11/14] xfs: don't run speculative preallocation gc when fs is frozen Darrick J. Wong
2021-08-05  5:40   ` Dave Chinner
2021-08-05  2:07 ` [PATCH 12/14] xfs: use background worker pool when transactions can't get free space Darrick J. Wong
2021-08-05  5:42   ` Dave Chinner
2021-08-05  2:07 ` [PATCH 13/14] xfs: avoid buffer deadlocks when walking fs inodes Darrick J. Wong
2021-08-05  2:07 ` [PATCH 14/14] xfs: throttle inode inactivation queuing on memory reclaim Darrick J. Wong
2021-08-05  5:44   ` Dave Chinner
