From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org, Joonsoo Kim <iamjoonsoo.kim@lge.com>,
David Rientjes <rientjes@google.com>,
Michal Hocko <mhocko@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Rik van Riel <riel@redhat.com>
Subject: Re: [RFC PATCH 6/6] mm: make kcompactd more proactive
Date: Fri, 28 Jul 2017 11:58:25 +0100 [thread overview]
Message-ID: <20170728105825.kofzpchclcngdk7c@techsingularity.net> (raw)
In-Reply-To: <20170727160701.9245-7-vbabka@suse.cz>
On Thu, Jul 27, 2017 at 06:07:01PM +0200, Vlastimil Babka wrote:
> Kcompactd activity is currently tied to kswapd - it is woken up when kswapd
> goes to sleep, and compacts to make a single high-order page available, of the
> order that was used to wake up kswapd. This leaves the rest of free pages
> fragmented and results in direct compaction when the demand for fresh
> high-order pages is higher than a single page per kswapd cycle.
>
> Another extreme would be to let kcompactd compact the whole zone the same way
> as manual compaction from the /proc interface. This would be wasteful if the
> resulting high-order pages were not needed, but just split back to base pages
> for allocations.
>
> This patch aims to adjust the kcompactd effort through observed demand for
> high-order pages. This is done by hooking into alloc_pages_slowpath() and
> counting (per each order > 0) allocation attempts that would pass the order-0
> watermarks, but don't have the high-order page available. This demand is
> (currently) recorded per node and then redistributed per zones in each node
> according to their relative sizes.
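
If I follow, the redistribution amounts to a proportional split of the node
counter by zone size. A rough sketch of what I understand is intended (the
names are mine, not the patch's):

```c
/* Split a node-wide count of failed high-order attempts across the node's
 * zones in proportion to their sizes. Illustrative only; function and
 * parameter names are invented, not the patch's identifiers. */
static void distribute_attempts(unsigned long node_attempts,
				const unsigned long *zone_pages,
				unsigned long *zone_attempts, int nr_zones)
{
	unsigned long node_pages = 0;
	int i;

	for (i = 0; i < nr_zones; i++)
		node_pages += zone_pages[i];

	for (i = 0; i < nr_zones; i++)
		zone_attempts[i] = node_pages ?
			node_attempts * zone_pages[i] / node_pages : 0;
}
```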
>
> The redistribution considers the current recorded failed attempts together with
> the value used in the previous kcompactd cycle. If there were any recorded
> failed attempts for the current cycle, it means the previous kcompactd activity
> was insufficient, so the two values are added up. If there were zero failed
> attempts it means either the previous amount of activity was optimum, or that
> the demand decreased. We cannot know that without recording also successful
> attempts, which would add overhead to allocator fast paths, so we use
> exponential moving average to decay the kcompactd target in such case.
> In any case, the target is capped to high watermark worth of base pages, since
> that's the kswapd's target when balancing.
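
So per kcompactd cycle the target update is, if I'm reading it right,
something like the following sketch (invented names, alpha of 1/2 assumed
for the moving average):

```c
/* Per-cycle update of the kcompactd free-page target, as I understand the
 * description above. Names and the EMA weight are illustrative. */
static unsigned long update_compact_target(unsigned long prev_target,
					   unsigned long failed_attempts,
					   unsigned long high_wmark_pages)
{
	unsigned long target;

	if (failed_attempts)
		/* Previous effort was insufficient: add the demands up. */
		target = prev_target + failed_attempts;
	else
		/* Demand may have dropped: decay with an EMA (alpha = 1/2). */
		target = prev_target / 2;

	/* Cap at the high watermark, kswapd's balancing target. */
	return target < high_wmark_pages ? target : high_wmark_pages;
}
```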
>
> Kcompactd then uses a different termination criterion than direct compaction.
> It checks whether, for each order, the recorded number of attempted allocations
> would fit within the free pages of that order, with possible splitting of
> higher orders, assuming there would be no allocations of other orders. This
> should make kcompactd effort reflect the high-order demand.
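
For my own understanding, the criterion amounts to something like this
sketch (names invented; the real check presumably walks the zone's free
lists rather than a plain array):

```c
#define MAX_ORDER 11

/* For each order with recorded attempts, check whether those attempts could
 * be satisfied from free pages of that order or higher (splitting as
 * needed), each order considered independently of the others. */
static int compact_target_met(const unsigned long *free_pages,  /* per order */
			      const unsigned long *attempts)    /* per order */
{
	int order, o;

	for (order = 1; order < MAX_ORDER; order++) {
		unsigned long avail = 0;

		if (!attempts[order])
			continue;

		/* One order-o free page splits into 2^(o - order) pages of
		 * the requested order. */
		for (o = order; o < MAX_ORDER; o++)
			avail += free_pages[o] << (o - order);

		if (avail < attempts[order])
			return 0;
	}
	return 1;
}
```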
>
> In the worst case, the demand is so high that kcompactd will in fact compact
> the whole zone and would have to be run with higher frequency than kswapd to
> make a larger difference. That possibility can be explored later.
Very broadly speaking, I can't see a problem with the direction you are
taking. Misc comments are
o kcompactd_inc_free_target is a bit excessive without data backing it
  up. It's overkill to go through every allowed node incrementing counters
  in the page allocator slow path. It's not even necessarily a good idea
  because it's hard to reason about what impact that has on how the attempts
  get decayed and what impact it can have on remote nodes. At a first
  cut, I would have thought incrementing the preferred zone only would be
  reasonable. If there are concerns about small high zones then increment
  every zone in the local node and do not bother with the cpuset checks.
  Overall, don't worry about the remote nodes unless there is strong
  evidence it's needed.
o Similarly, it's not clear how much benefit there is to spreading
  targets across zones and the complexity in there. I would suggest
  keeping kcompactd_inc_free_target as simple as possible for as long as
  possible. While it's called from the page allocator slowpath for high-order
  allocations only, we shouldn't pay costs there unless we have to.
o The atomics seem a little overkill considering that this is just a
  heuristic hint. If lost updates happen, it's not that big a deal and
  at worst, there is a spurious compaction run just as the counters hit
  0. That corner case is marginal compared to the atomic overheads. Just
  watch for the counters going negative due to races, which is a minor fix.
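
To illustrate what I mean by the non-atomic variant (a sketch with invented
names, not proposed code): a plain signed counter where a racy update can
briefly undershoot, clamped on the consumer side.

```c
/* Plain (non-atomic) counter: lost updates are acceptable for a heuristic
 * hint, but a racy double-decrement must not be left negative, so clamp
 * it when consuming. */
static long counter_consume(long *counter, long nr)
{
	*counter -= nr;
	/* Races can drive the signed counter negative; clamp to zero. */
	if (*counter < 0)
		*counter = 0;
	return *counter;
}
```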
--
Mel Gorman
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>