Linux-XFS Archive on lore.kernel.org
 help / color / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: linux-xfs@vger.kernel.org, david@fromorbit.com, hch@infradead.org
Subject: Re: [PATCHSET v5 0/9] xfs: deferred inode inactivation
Date: Tue, 30 Mar 2021 18:36:54 -0700
Message-ID: <20210331013654.GA4090233@magnolia> (raw)
In-Reply-To: <161671810866.622901.16520335819131743716.stgit@magnolia>

Hi,

For the preservation of my own sanity, I am withdrawing this patchset
until the 5.15 cycle or later, since Christoph asked me to rework and
consolidate the incore inode walk functions in his review of the last
revision of this series, which further opened the can of warts that is
quotaoff walking the incore inodes to detach the dquots.  As Dave and
Brian have pointed out elsewhere, fixing quotaoff is not trivial.  The
whole series has run just fine on my testbed cluster for the last two
years without me needing to touch the inode walks or quotaoff, but such
is the nature of review.

Given the ten days left in the 5.13 development cycle before I have to
lob the second and final pile of patches into for-next, it is neither
seems possible for me to fix all those problems in a satisfactory way,
nor does it seem possible to backslide to revert to a previous
submission and get it into such a state that it would pass review.

As it stands now, during this cycle I have needed to add 13 bugfix and
refactoring patches that have to preceed this series, and I have not yet
started planning how to clean up quotaoff.  It has also become apparent
that one the changes requested by a reviewer has also destabilized the
patchset, with generic/371 now periodically failing.  I have not had
time to triage that.

In the meantime, I will work on upstreaming my fstests patch backlog,
which I estimate will take at least another three weeks.

--D

On Thu, Mar 25, 2021 at 05:21:48PM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> This patch series implements deferred inode inactivation.  Inactivation
> is what happens when an open file loses its last incore reference: if
> the file has speculative preallocations, they must be freed, and if the
> file is unlinked, all forks must be truncated, and the inode marked
> freed in the inode chunk and the inode btrees.
> 
> Currently, all of this activity is performed in frontend threads when
> the last in-memory reference is lost and/or the vfs decides to drop the
> inode.  Three complaints stem from this behavior: first, that the time
> to unlink (in the worst case) depends on both the complexity of the
> directory as well as the the number of extents in that file; second,
> that deleting a directory tree is inefficient and seeky because we free
> the inodes in readdir order, not disk order; and third, the upcoming
> online repair feature needs to be able to xfs_irele while scanning a
> filesystem in transaction context.  It cannot perform inode inactivation
> in this context because xfs does not support nested transactions.
> 
> The implementation will be familiar to those who have studied how XFS
> scans for reclaimable in-core inodes -- we create a couple more inode
> state flags to mark an inode as needing inactivation and being in the
> middle of inactivation.  When inodes need inactivation, we set
> NEED_INACTIVE in iflags, set the INACTIVE radix tree tag, and schedule a
> deferred work item.  The deferred worker runs in an unbounded workqueue,
> scanning the inode radix tree for tagged inodes to inactivate, and
> performing all the on-disk metadata updates.  Once the inode has been
> inactivated, it is left in the reclaim state and the background reclaim
> worker (or direct reclaim) will get to it eventually.
> 
> Doing the inactivations from kernel threads solves the first problem by
> constraining the amount of work done by the unlink() call to removing
> the directory entry.  It solves the third problem by moving inactivation
> to a separate process.  Because the inactivations are done in order of
> inode number, we solve the second problem by performing updates in (we
> hope) disk order.  This also decreases the amount of time it takes to
> let go of an inode cluster if we're deleting entire directory trees.
> 
> There are three big warts I can think of in this series: first, because
> the actual freeing of nlink==0 inodes is now done in the background,
> this means that the system will be busy making metadata updates for some
> time after the unlink() call returns.  This temporarily reduces
> available iops.  Second, in order to retain the behavior that deleting
> 100TB of unshared data should result in a free space gain of 100TB, the
> statvfs and quota reporting ioctls wait for inactivation to finish,
> which increases the long tail latency of those calls.  This behavior is,
> unfortunately, key to not introducing regressions in fstests.  The third
> problem is that the deferrals keep memory usage higher for longer,
> reduce opportunities to throttle the frontend when metadata load is
> heavy, and the unbounded workqueues can create transaction storms.
> 
> For v5 there are some serious changes against the older versions of this
> patchset -- we no longer cycle an inode's dquots to avoid fights with
> quotaoff, and we actually shut down the background gc threads when the
> filesystem is frozen.
> 
> v1-v2: NYE patchbombs
> v3: rebase against 5.12-rc2 for submission.
> v4: combine the can/has eofblocks predicates, clean up incore inode tree
>     walks, fix inobt deadlock
> v5: actually freeze the inode gc threads when we freeze the filesystem,
>     consolidate the code that deals with inode tagging, and use
>     foreground inactivation during quotaoff to avoid cycling dquots
> 
> If you're going to start using this mess, you probably ought to just
> pull from my git trees, which are linked below.
> 
> This is an extraordinary way to destroy everything.  Enjoy!
> Comments and questions are, as always, welcome.
> 
> --D
> 
> kernel git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=deferred-inactivation-5.13
> ---
>  Documentation/admin-guide/xfs.rst |   10 +
>  fs/xfs/libxfs/xfs_fs.h            |    7 
>  fs/xfs/scrub/common.c             |    2 
>  fs/xfs/xfs_bmap_util.c            |   44 +++
>  fs/xfs/xfs_fsops.c                |    4 
>  fs/xfs/xfs_globals.c              |    3 
>  fs/xfs/xfs_icache.c               |  550 ++++++++++++++++++++++++++++++++-----
>  fs/xfs/xfs_icache.h               |   39 ++-
>  fs/xfs/xfs_inode.c                |   60 ++++
>  fs/xfs/xfs_inode.h                |   15 +
>  fs/xfs/xfs_ioctl.c                |    9 -
>  fs/xfs/xfs_linux.h                |    2 
>  fs/xfs/xfs_log_recover.c          |    7 
>  fs/xfs/xfs_mount.c                |   29 ++
>  fs/xfs/xfs_mount.h                |   19 +
>  fs/xfs/xfs_qm_syscalls.c          |   22 +
>  fs/xfs/xfs_super.c                |   86 +++++-
>  fs/xfs/xfs_sysctl.c               |    9 +
>  fs/xfs/xfs_sysctl.h               |    1 
>  fs/xfs/xfs_trace.h                |   14 +
>  20 files changed, 832 insertions(+), 100 deletions(-)
> 

      parent reply index

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-26  0:21 Darrick J. Wong
2021-03-26  0:21 ` [PATCH 1/9] xfs: refactor the inode recycling code Darrick J. Wong
2021-03-26  0:22 ` [PATCH 2/9] xfs: deferred inode inactivation Darrick J. Wong
2021-03-26  0:22 ` [PATCH 3/9] xfs: expose sysfs knob to control inode inactivation delay Darrick J. Wong
2021-03-26  0:22 ` [PATCH 4/9] xfs: force inode inactivation and retry fs writes when there isn't space Darrick J. Wong
2021-03-26  0:22 ` [PATCH 5/9] xfs: force inode garbage collection before fallocate when space is low Darrick J. Wong
2021-03-26  0:22 ` [PATCH 6/9] xfs: parallelize inode inactivation Darrick J. Wong
2021-03-26  0:22 ` [PATCH 7/9] xfs: create a polled function to force " Darrick J. Wong
2021-03-26  0:22 ` [PATCH 8/9] xfs: add inode scan limits to the eofblocks ioctl Darrick J. Wong
2021-03-26  0:22 ` [PATCH 9/9] xfs: don't run speculative preallocation gc when fs is frozen Darrick J. Wong
2021-03-31  1:36 ` Darrick J. Wong [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210331013654.GA4090233@magnolia \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-XFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-xfs linux-xfs/ https://lore.kernel.org/linux-xfs \
		linux-xfs@vger.kernel.org
	public-inbox-index linux-xfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-xfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git