From: Hillf Danton <hdanton@sina.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Vlastimil Babka <vbabka@suse.cz>, Yu Zhao <yuzhao@google.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH RESEND 0/8] hugetlb: add demote/split page functionality
Date: Thu,  9 Sep 2021 12:07:06 +0800	[thread overview]
Message-ID: <20210909040706.3876-1-hdanton@sina.com> (raw)
In-Reply-To: <6c42bed7-d4dd-e5eb-5a74-24cf64bf52d3@oracle.com>

On Wed, 8 Sep 2021 14:00:19 -0700 Mike Kravetz wrote:
>On 9/7/21 1:50 AM, Hillf Danton wrote:
>> On Mon, 6 Sep 2021 16:40:28 +0200 Vlastimil Babka wrote:
>>> On 9/2/21 20:17, Mike Kravetz wrote:
>>>>
>>>> Here is some very high level information from a long stall that was
>>>> interrupted.  This was an order 9 allocation from alloc_buddy_huge_page().
>>>>
>>>> [55269.530564] __alloc_pages_slowpath: jiffies 47329325 tries 609673 cpu_tries 1   node 0 FAIL
>>>> [55269.539893]     r_tries 25       c_tries 609647   reclaim 47325161 compact 607
>>>>
>>>> Yes, it sat in __alloc_pages_slowpath for 47329325 jiffies before being
>>>> interrupted.  should_reclaim_retry returned true 25 times and
>>>> should_compact_retry returned true 609647 times.
>>>> Almost all of that time (47325161 jiffies) was spent in
>>>> __alloc_pages_direct_reclaim, and only 607 jiffies in
>>>> __alloc_pages_direct_compact.
>>>>
>>>> Looks like both
>>>> 	reclaim retries > MAX_RECLAIM_RETRIES
>>>> and
>>>> 	compaction retries > MAX_COMPACT_RETRIES
>>>> held at the same time.
>>>>
>>> Yeah, AFAICS that's only possible in the scenario I suspected.  I guess
>>> we should put a limit on compact retries (maybe some multiple of
>>> MAX_COMPACT_RETRIES) even when compaction thinks that reclaim could
>>> help, while clearly it doesn't (e.g. because somebody else is stealing
>>> the pages, as in your test case).
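
A minimal sketch of that idea, assuming it sits near the top of
should_compact_retry() in mm/page_alloc.c (the 4x factor is an arbitrary
placeholder, not a tested value):

	/*
	 * Sketch only: hard-cap the total number of compact retries even
	 * when compaction keeps returning COMPACT_SKIPPED and deferring
	 * the decision to reclaim.
	 */
	if (*compaction_retries > 4 * MAX_COMPACT_RETRIES)
		return false;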
>> 
>> And/or clamp the reclaim retries for costly orders,
>> 
>> 	reclaim retries = MAX_RECLAIM_RETRIES - order;
>> 
>> to cut the chance of a stall as far as possible.
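
As a sketch, assuming the clamp lands where no_progress_loops is checked
against MAX_RECLAIM_RETRIES in should_reclaim_retry():

	/*
	 * Sketch of the clamp: allow fewer no-progress retries as the
	 * order grows, e.g. MAX_RECLAIM_RETRIES(16) - 9 = 7 retries for
	 * a 2MB huge page allocation.
	 */
	int max_retries = MAX_RECLAIM_RETRIES;

	if (order > PAGE_ALLOC_COSTLY_ORDER)
		max_retries -= order;

	if (*no_progress_loops > max_retries) {
		/* Before OOM, exhaust the highatomic reserve */
		return unreserve_highatomic_pageblock(ac, true);
	}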
>
>Thanks, and sorry for not replying quickly.  I can only get back to this
>as time allows.
>
>We could clamp the number of compaction and reclaim retries in
>__alloc_pages_slowpath as suggested.  However, I noticed that a single
>reclaim call can itself take a long time.  So I instrumented
>shrink_node to see what might be happening.  Here is some information
>from a long stall.  Note that I only dump stats when jiffies > 100000.
>
>[ 8136.874706] shrink_node: 507654 total jiffies,  3557110 tries
>[ 8136.881130]              130596341 reclaimed, 32 nr_to_reclaim
>[ 8136.887643]              compaction_suitable results:
>[ 8136.893276]     idx COMPACT_SKIPPED, 3557109
>[ 8672.399839] shrink_node: 522076 total jiffies,  3466228 tries
>[ 8672.406268]              124427720 reclaimed, 32 nr_to_reclaim
>[ 8672.412782]              compaction_suitable results:
>[ 8672.418421]     idx COMPACT_SKIPPED, 3466227
>[ 8908.099592] __alloc_pages_slowpath: jiffies 2939938  tries 17068 cpu_tries 1   node 0 success
>[ 8908.109120]     r_tries 11       c_tries 17056    reclaim 2939865  compact 9
>
>In this case, clamping the number of retries from should_compact_retry
>and should_reclaim_retry could help, mostly because we would no longer
>be calling back into the reclaim code?  Notice how long we spend inside
>shrink_node itself.  The 'tries' counted in shrink_node come from this:
>
>	if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
>				    sc))
>		goto again;
>
>The 'compaction_suitable results' above are the values returned from the
>calls to should_continue_reclaim -> compaction_suitable.
>
>Trying to think if there might be an intelligent way to quit early.

Given that kswapd downgrades costly orders to zero on its side, what you
found suggests the need to bridge the gap between sc->nr_to_reclaim and
compact_gap(sc->order) for direct reclaim.
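
For reference, a condensed sketch of should_continue_reclaim(), paraphrased
from this era's mm/vmscan.c rather than quoted verbatim.  As long as every
zone reports COMPACT_SKIPPED, it keeps returning true until the inactive
lists drop below compact_gap(sc->order) -- 2UL << 9 = 1024 pages for an
order 9 request, far beyond the nr_to_reclaim of 32 in your dumps:

static bool should_continue_reclaim(struct pglist_data *pgdat,
				    unsigned long nr_reclaimed,
				    struct scan_control *sc)
{
	unsigned long pages_for_compaction, inactive_lru_pages;
	int z;

	/* Stop if no progress was made at all */
	if (!nr_reclaimed)
		return false;

	/* If compaction could go ahead in any zone, stop reclaiming */
	for (z = 0; z <= sc->reclaim_idx; z++) {
		struct zone *zone = &pgdat->node_zones[z];

		if (!managed_zone(zone))
			continue;
		switch (compaction_suitable(zone, sc->order, 0,
					    sc->reclaim_idx)) {
		case COMPACT_SUCCESS:
		case COMPACT_CONTINUE:
			return false;
		default:
			/* COMPACT_SKIPPED: check the next zone */
			;
		}
	}

	/*
	 * Keep reclaiming until there is room for compaction, not merely
	 * until sc->nr_to_reclaim has been met -- this is the gap above.
	 */
	pages_for_compaction = compact_gap(sc->order);
	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
	if (get_nr_swap_pages() > 0)
		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);

	return inactive_lru_pages > pages_for_compaction;
}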

If nr_to_reclaim is the primary target for direct reclaim, one option is
to ask kswapd to do the costly-order reclaims, with stalls handled by
r_tries and c_tries in the slowpath.

+++ x/mm/vmscan.c
@@ -3220,7 +3220,11 @@ again:
 
 	if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
 				    sc))
-		goto again;
+		if (!current_is_kswapd() && sc->nr_reclaimed >=
+					    sc->nr_to_reclaim)
+			/* job done */ ;
+		else
+			goto again;
 
 	/*
 	 * Kswapd gives up on balancing particular nodes after too
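
With this change, a direct reclaimer leaves the shrink_node() retry loop
as soon as sc->nr_to_reclaim has been met, and the remaining compact_gap()
shortfall is left to kswapd or to a later pass through the slowpath,
instead of looping millions of times as in the dumps above.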

