From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org, hch@infradead.org
Subject: Re: [PATCH 06/20] xfs: throttle inodegc queuing on backlog
Date: Mon, 2 Aug 2021 10:45:59 +1000
Message-ID: <20210802004559.GE2757197@dread.disaster.area>
In-Reply-To: <162758426670.332903.7504844999802581902.stgit@magnolia>

On Thu, Jul 29, 2021 at 11:44:26AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Track the number of inodes in each AG that are queued for inactivation,
> then use that information to decide if we're going to make threads that
> has queued an inode for inactivation wait for the background thread.
> The purpose of this high water mark is to establish a maximum bound on
> the backlog of work that can accumulate on a non-frozen filesystem.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_ag.c |    1 +
>  fs/xfs/libxfs/xfs_ag.h |    3 ++-
>  fs/xfs/xfs_icache.c    |   16 ++++++++++++++++
>  fs/xfs/xfs_trace.h     |   24 ++++++++++++++++++++++++
>  4 files changed, 43 insertions(+), 1 deletion(-)

Ok, this appears to cause fairly long latencies in unlink. I see it
overrun the throttle threshold and not throttle for some time:

rm-16440 [016]  5391.083568: xfs_inodegc_throttle_backlog: dev 251:0 agno 3 needs_inactive 65537
rm-16440 [016]  5391.083622: xfs_inodegc_throttle_backlog: dev 251:0 agno 3 needs_inactive 65538
rm-16440 [016]  5391.083689: xfs_inodegc_throttle_backlog: dev 251:0 agno 3 needs_inactive 65539
.....
rm-16440 [016]  5391.216007: xfs_inodegc_throttle_backlog: dev 251:0 agno 3 needs_inactive 67193
rm-16440 [016]  5391.216069: xfs_inodegc_throttle_backlog: dev 251:0 agno 3 needs_inactive 67194
rm-16440 [016]  5391.216179: xfs_inodegc_throttle_backlog: dev 251:0 agno 3 needs_inactive 67195
rm-16440 [016]  5391.231293: xfs_inodegc_throttle_backlog: dev 251:0 agno 3 needs_inactive 66807

You can see from the traces above that a typical
unlink() runs in about 60-70 microseconds. Notably, when background
inactivation kicks in, that blows out to 15ms for a single unlink.
We can also see that it ran about 150ms past the point where it first
hit the throttle threshold before background inactivation kicked in
(visible as the needs_inactive count starting to come down). The next
trace from this process is:

rm-16440 [016]  5394.335940: xfs_inodegc_throttled: dev 251:0 agno 3 caller xfs_fs_destroy_inode+0xbb

That's because it now waits on flush_work() to complete the background
inactivation before it can run again. IOWs, this user process just
got blocked for over 3 seconds (5391.23 to 5394.34 in the traces
above) waiting for internal GC to do its stuff.

This blows out the long tail latencies that userspace sees, and it
will really hurt random processes that drop the last reference to
files that are going to be reclaimed immediately (e.g. any unlink()
that is run).

There is no reason to wait for the entire backlog to be processed
here. This really needs to be watermarked, so that when we hit the
high watermark we immediately sleep until the background reclaim
brings it back down below the low watermark.

In this case, we run about 20,000 inactivations/s, so each
inactivation takes about 50us to run. We want to limit the blocking
of any given process that is throttled to something controllable and
practical, e.g. 100ms, which indicates that the high and low
watermarks should be somewhere around 2,000 operations apart.
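
Back of the envelope from the numbers above (the 20,000/s rate is what
this test measured; 100ms is just the example target):

    1s / 20,000 inactivations     ~= 50us per inactivation
    100ms budget / 50us each      ~= 2,000 inactivations of headroom
                                     between the watermarks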

So, when something hits the high watermark, it sets a "queue
throttled" bit, forces the perag gc work to run immediately, and
goes to sleep on the throttle bit. Any new operations that hit that
perag also sleep on the "queue throttled" bit. When the GC work
brings the queue down below the low watermark, it wakes all the
waiters and keeps running, allowing user processes to add to the
queue again while it is draining it.
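
In code, that would look something like the sketch below. This is only
an illustration of the shape of the mechanism - the watermark values
and every field, flag and function name used here are made up for the
example; the only things taken from the patchset are the idea of a
per-AG needs_inactive count and a background gc work item.

/* Illustrative only - none of these names are from the patch. */
#define XFS_INODEGC_THROTTLE_HIWAT	4000
#define XFS_INODEGC_THROTTLE_LOWAT	2000

/*
 * Queueing side: called after an inode has been added to the AG's
 * inactivation list and the needs_inactive count has been bumped.
 */
static void
xfs_inodegc_queue_throttle(
	struct xfs_perag	*pag)
{
	if (atomic_read(&pag->pag_nr_needs_inactive) <
	    XFS_INODEGC_THROTTLE_HIWAT)
		return;

	/* Over the high watermark: kick the gc worker and go to sleep. */
	set_bit(XFS_PAG_INODEGC_THROTTLED, &pag->pag_opstate);
	mod_delayed_work(pag->pag_mount->m_inodegc_wq,
			&pag->pag_inodegc_work, 0);
	wait_event(pag->pag_inodegc_wait,
		!test_bit(XFS_PAG_INODEGC_THROTTLED, &pag->pag_opstate));
}

/* Worker side: called as the background gc retires each batch. */
static void
xfs_inodegc_unthrottle(
	struct xfs_perag	*pag)
{
	if (atomic_read(&pag->pag_nr_needs_inactive) >
	    XFS_INODEGC_THROTTLE_LOWAT)
		return;

	if (test_and_clear_bit(XFS_PAG_INODEGC_THROTTLED, &pag->pag_opstate))
		wake_up_all(&pag->pag_inodegc_wait);
}

The important property is that sleepers only wake once the worker has
drained the queue down to the low watermark, so each throttled process
blocks for a bounded, predictable amount of work rather than a full
flush_work() of the entire backlog.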

With this sort of setup, we shouldn't need really deep queues -
maybe a few thousand inodes at most - and we guarantee that the
background GC has a period of time where it largely has exclusive
access to the AGI and inode cluster buffers to run batched
inactivation as quickly as possible. We also largely bound the length
of time that user processes block on the background GC work, and
that will be good for keeping long tail latencies under control.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 47+ messages
2021-07-29 18:43 [PATCHSET v8 00/20] xfs: deferred inode inactivation Darrick J. Wong
2021-07-29 18:43 ` [PATCH 01/20] xfs: move xfs_inactive call to xfs_inode_mark_reclaimable Darrick J. Wong
2021-07-29 18:44 ` [PATCH 02/20] xfs: detach dquots from inode if we don't need to inactivate it Darrick J. Wong
2021-07-29 18:44 ` [PATCH 03/20] xfs: defer inode inactivation to a workqueue Darrick J. Wong
2021-07-30  4:24   ` Dave Chinner
2021-07-31  4:21     ` Darrick J. Wong
2021-08-01 21:49       ` Dave Chinner
2021-08-01 23:47         ` Dave Chinner
2021-08-03  8:34   ` [PATCH, alternative] xfs: per-cpu deferred inode inactivation queues Dave Chinner
2021-08-03 20:20     ` Darrick J. Wong
2021-08-04  3:20     ` [PATCH, alternative v2] " Darrick J. Wong
2021-08-04 10:03       ` [PATCH] xfs: inodegc needs to stop before freeze Dave Chinner
2021-08-04 12:37         ` Dave Chinner
2021-08-04 10:46       ` [PATCH] xfs: don't run inodegc flushes when inodegc is not active Dave Chinner
2021-08-04 16:20         ` Darrick J. Wong
2021-08-04 11:09       ` [PATCH, alternative v2] xfs: per-cpu deferred inode inactivation queues Dave Chinner
2021-08-04 15:59         ` Darrick J. Wong
2021-08-04 21:35           ` Dave Chinner
2021-08-04 11:49       ` [PATCH, pre-03/20 #1] xfs: introduce CPU hotplug infrastructure Dave Chinner
2021-08-04 11:50       ` [PATCH, pre-03/20 #2] xfs: introduce all-mounts list for cpu hotplug notifications Dave Chinner
2021-08-04 16:06         ` Darrick J. Wong
2021-08-04 21:17           ` Dave Chinner
2021-08-04 11:52       ` [PATCH, post-03/20 1/1] xfs: hook up inodegc to CPU dead notification Dave Chinner
2021-08-04 16:19         ` Darrick J. Wong
2021-08-04 21:48           ` Dave Chinner
2021-07-29 18:44 ` [PATCH 04/20] xfs: throttle inode inactivation queuing on memory reclaim Darrick J. Wong
2021-07-29 18:44 ` [PATCH 05/20] xfs: don't throttle memory reclaim trying to queue inactive inodes Darrick J. Wong
2021-07-29 18:44 ` [PATCH 06/20] xfs: throttle inodegc queuing on backlog Darrick J. Wong
2021-08-02  0:45   ` Dave Chinner [this message]
2021-08-02  1:30     ` Dave Chinner
2021-07-29 18:44 ` [PATCH 07/20] xfs: queue inodegc worker immediately when memory is tight Darrick J. Wong
2021-07-29 18:44 ` [PATCH 08/20] xfs: expose sysfs knob to control inode inactivation delay Darrick J. Wong
2021-07-29 18:44 ` [PATCH 09/20] xfs: reduce inactivation delay when free space is tight Darrick J. Wong
2021-07-29 18:44 ` [PATCH 10/20] xfs: reduce inactivation delay when quota are tight Darrick J. Wong
2021-07-29 18:44 ` [PATCH 11/20] xfs: reduce inactivation delay when realtime extents " Darrick J. Wong
2021-07-29 18:44 ` [PATCH 12/20] xfs: inactivate inodes any time we try to free speculative preallocations Darrick J. Wong
2021-07-29 18:45 ` [PATCH 13/20] xfs: flush inode inactivation work when compiling usage statistics Darrick J. Wong
2021-07-29 18:45 ` [PATCH 14/20] xfs: parallelize inode inactivation Darrick J. Wong
2021-08-02  0:55   ` Dave Chinner
2021-08-02 21:33     ` Darrick J. Wong
2021-07-29 18:45 ` [PATCH 15/20] xfs: reduce inactivation delay when AG free space are tight Darrick J. Wong
2021-07-29 18:45 ` [PATCH 16/20] xfs: queue inodegc worker immediately on backlog Darrick J. Wong
2021-07-29 18:45 ` [PATCH 17/20] xfs: don't run speculative preallocation gc when fs is frozen Darrick J. Wong
2021-07-29 18:45 ` [PATCH 18/20] xfs: scale speculative preallocation gc delay based on free space Darrick J. Wong
2021-07-29 18:45 ` [PATCH 19/20] xfs: use background worker pool when transactions can't get " Darrick J. Wong
2021-07-29 18:45 ` [PATCH 20/20] xfs: avoid buffer deadlocks when walking fs inodes Darrick J. Wong
2021-08-02 10:35 ` [PATCHSET v8 00/20] xfs: deferred inode inactivation Dave Chinner
