Re: [PATCH RFC] xfs: drop SYNC_WAIT from xfs_reclaim_inodes_ag during slab reclaim

From: Dave Chinner <david@fromorbit.com>
To: Chris Mason <clm@fb.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH RFC] xfs: drop SYNC_WAIT from xfs_reclaim_inodes_ag during slab reclaim
Date: Thu, 17 Nov 2016 11:47:45 +1100
Message-ID: <20161117004745.GB19783@dastard>
In-Reply-To: <20161116030344.GA7746@clm-mbp.masoncoding.com>

On Tue, Nov 15, 2016 at 10:03:52PM -0500, Chris Mason wrote:
> Moving forward, I think I can manage to carry the one line patch in
> code that hasn't measurably changed in years.  We'll get it tested
> in a variety of workloads and come back with more benchmarks for the
> great slab rework coming soon to a v5.x kernel near you.

FWIW, I just tested your one-liner against my simoops config here,
and by comparing the behaviour to my patchset that still allows
direct reclaim to block on dirty inodes, it would appear that all
the allocation latency I'm seeing here is from direct reclaim.

So I went looking at the direct reclaim throttle with the intent to
hack it to throttle earlier. It throttles based on watermarks, so
I figured I'd just hack them to be larger to trigger direct reclaim
throttling earlier. And then I found this recent addition:

https://patchwork.kernel.org/patch/8426381/

+=============================================================
+
+watermark_scale_factor:
+
+This factor controls the aggressiveness of kswapd. It defines the
+amount of memory left in a node/system before kswapd is woken up and
+how much memory needs to be free before kswapd goes back to sleep.
+
+The unit is in fractions of 10,000. The default value of 10 means the
+distances between watermarks are 0.1% of the available memory in the
+node/system. The maximum value is 1000, or 10% of memory.
+
+A high rate of threads entering direct reclaim (allocstall) or kswapd
+going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate
+that the number of free pages kswapd maintains for latency reasons is
+too small for the allocation bursts occurring in the system. This knob
+can then be used to tune kswapd aggressiveness accordingly.
+

The /exact hack/ I was thinking of was committed about 6 months
ago and added a "support for ever more" /proc file:

commit 795ae7a0de6b834a0cc202aa55c190ef81496665
Author: Johannes Weiner <hannes@cmpxchg.org>
Date:   Thu Mar 17 14:19:14 2016 -0700

    mm: scale kswapd watermarks in proportion to memory

What's painfully obvious, though, is that even when I wind it up to
its maximum setting (10% of memory), it does not prevent direct reclaim
from being entered and causing excessive latencies when it blocks.
This is despite the fact that simoops is now running with a big free
memory reserve (3-3.5GB of free memory on my machine, as the page
cache now only consumes ~4GB instead of 7-8GB).

And, while harder to trigger, kswapd still goes on the "free fucking
everything" rampages that trigger page writeback from kswapd and
empty both the page cache and the slab caches. The only difference
now is that it does this /without triggering the allocstall
counter/....

So it seems that just upping the direct reclaim throttle point
isn't a sufficient workaround for the "too much direct reclaim"
problem here...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com