From: Amir Goldstein <amir73il@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH 13/24] xfs: make inode reclaim almost non-blocking
Date: Fri, 22 May 2020 15:19:16 +0300 [thread overview]
Message-ID: <CAOQ4uxiit=kANPpQBDuT5kU4uuzO1Oi2p8HwU4wuoF1k7viXuw@mail.gmail.com> (raw)
In-Reply-To: <20200522035029.3022405-14-david@fromorbit.com>
On Fri, May 22, 2020 at 6:51 AM Dave Chinner <david@fromorbit.com> wrote:
>
> From: Dave Chinner <dchinner@redhat.com>
>
> Now that dirty inode writeback doesn't cause read-modify-write
> cycles on the inode cluster buffer under memory pressure, the need
> to throttle memory reclaim to the rate at which we can clean dirty
> inodes goes away. That is due to the fact that we no longer thrash
> inode cluster buffers under memory pressure to clean dirty inodes.
>
> This means inode writeback no longer stalls on memory allocation
> or read IO, and hence can be done asynchronously without generating
> memory pressure. As a result, blocking inode writeback in reclaim is
> no longer necessary to prevent reclaim priority windup as cleaning
> dirty inodes is no longer dependent on having memory reserves
> available for the filesystem to make progress reclaiming inodes.
>
> Hence we can convert inode reclaim to be non-blocking for shrinker
> callouts, both for direct reclaim and kswapd.
>
> On a vanilla kernel, running a 16-way fsmark create workload on a
> 4 node/16p/16GB RAM machine, I can reliably pin 14.75GB of RAM via
> userspace mlock(). The OOM killer gets invoked at 15GB of
> pinned RAM.
>
> With this patch alone, pinning memory triggers premature OOM
> killer invocation, sometimes with as much as 45% of RAM being free.
> It's trivially easy to trigger the OOM killer when reclaim does not
> block.
>
> With pinning inode clusters in RAM adn then adding this patch, I can
typo: adn
> reliably pin 14.5GB of RAM and still have the fsmark workload run to
> completion. The OOM killer gets invoked at 14.75GB of pinned RAM,
> which is only slightly less than on the vanilla kernel. It is
> much more reliable than just with async reclaim alone.
>
> simoops shows that allocation stalls go away when async reclaim is
> used. Vanilla kernel:
>
> Run time: 1924 seconds
> Read latency (p50: 3,305,472) (p95: 3,723,264) (p99: 4,001,792)
> Write latency (p50: 184,064) (p95: 553,984) (p99: 807,936)
> Allocation latency (p50: 2,641,920) (p95: 3,911,680) (p99: 4,464,640)
> work rate = 13.45/sec (avg 13.44/sec) (p50: 13.46) (p95: 13.58) (p99: 13.70)
> alloc stall rate = 3.80/sec (avg: 2.59) (p50: 2.54) (p95: 2.96) (p99: 3.02)
>
> With inode cluster pinning and async reclaim:
>
> Run time: 1924 seconds
> Read latency (p50: 3,305,472) (p95: 3,715,072) (p99: 3,977,216)
> Write latency (p50: 187,648) (p95: 553,984) (p99: 789,504)
> Allocation latency (p50: 2,748,416) (p95: 3,919,872) (p99: 4,448,256)
> work rate = 13.28/sec (avg 13.32/sec) (p50: 13.26) (p95: 13.34) (p99: 13.34)
> alloc stall rate = 0.02/sec (avg: 0.02) (p50: 0.01) (p95: 0.03) (p99: 0.03)
>
> Latencies don't really change much, nor does the work rate. However,
> allocation almost never stalls with these changes, whilst the
> vanilla kernel is sometimes reporting 20 stalls/s over a 60s sample
> period. This difference is due to inode reclaim being largely
> non-blocking now.
>
> IOWs, once we have pinned inode cluster buffers, we can make inode
> reclaim non-blocking without a major risk of premature and/or
> spurious OOM killer invocation, and without any changes to memory
> reclaim infrastructure.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
I can confirm observing these allocation stalls on production servers,
and that this patch alone fixes them.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Thanks,
Amir.