Re: [RESEND PATCH] mm:page_alloc.c: lower the order requirement of should_reclaim_retry

From: Zhenhua Huang <quic_zhenhuah@quicinc.com>
To: Michal Hocko <mhocko@suse.com>
Cc: <akpm@linux-foundation.org>, <mgorman@techsingularity.net>,
	<vbabka@suse.cz>, <linux-mm@kvack.org>,
	<quic_tingweiz@quicinc.com>
Subject: Re: [RESEND PATCH] mm:page_alloc.c: lower the order requirement of should_reclaim_retry
Date: Tue, 20 Sep 2022 17:38:30 +0800	[thread overview]
Message-ID: <f6e03e2f-bef8-ff11-79ef-88657556bfb0@quicinc.com> (raw)
In-Reply-To: <YyhujKOF+4nacYKl@dhcp22.suse.cz>

Thanks Michal for comments!

On 2022/9/19 21:28, Michal Hocko wrote:
> On Mon 19-09-22 19:24:32, Zhenhua Huang wrote:
>> Thanks Michal for comments!
>>
>> On 2022/9/19 16:14, Michal Hocko wrote:
>>> On Mon 19-09-22 11:00:55, Zhenhua Huang wrote:
>>>> When a driver was continuously allocating order 3
>>>> pages, it would be very easily OOM even there were lots of reclaimable
>>>> pages. A test module is used to reproduce this issue,
>>>> several key ftrace events are as below:
>>>>
>>>>             insmod-6968    [005] ....   321.306007: reclaim_retry_zone: node=0 zone=Normal
>>>> order=3 reclaimable=539988 available=592856 min_wmark=21227 no_progress_loops=0
>>>> wmark_check=0
>>>>             insmod-6968    [005] ....   321.306009: compact_retry: order=3
>>>> priority=COMPACT_PRIO_SYNC_LIGHT compaction_result=withdrawn retries=0
>>>> max_retries=16 should_retry=1
>>>>             insmod-6968    [004] ....   321.308220:
>>>> mm_compaction_try_to_compact_pages: order=3 gfp_mask=GFP_KERNEL priority=0
>>>>             insmod-6968    [004] ....   321.308964: mm_compaction_end:
>>>> zone_start=0x80000 migrate_pfn=0xaa800 free_pfn=0x80800 zone_end=0x940000,
>>>> mode=sync status=complete
>>>>             insmod-6968    [004] ....   321.308971: reclaim_retry_zone: node=0
>>>> zone=Normal   order=3 reclaimable=539830 available=592776 min_wmark=21227
>>>> no_progress_loops=0 wmark_check=0
>>>>             insmod-6968    [004] ....   321.308973: compact_retry: order=3
>>>> priority=COMPACT_PRIO_SYNC_FULL compaction_result=failed retries=0
>>>> max_retries=16 should_retry=0
>>>>
>>>> There're ~2GB reclaimable pages(reclaimable=539988) but VM decides not to
>>>> reclaim any more:
>>>> insmod-6968    [005] ....   321.306007: reclaim_retry_zone: node=0 zone=Normal
>>>> order=3 reclaimable=539988 available=592856 min_wmark=21227 no_progress_loops=0
>>>> wmark_check=0
>>>>
>>>> >From meminfo when oom, there was NO qualified order >= 3 pages(CMA page not qualified)
>>>> can meet should_reclaim_retry's requirement:
>>>> Normal  : 24671*4kB (UMEC) 13807*8kB (UMEC) 8214*16kB (UEC) 190*32kB (C)
>>>> 94*64kB (C) 28*128kB (C) 16*256kB (C) 7*512kB (C) 5*1024kB (C) 7*2048kB (C)
>>>> 46*4096kB (C) = 571796kB
>>>>
>>>> The reason of should_reclaim_retry early aborting was that is based on having the order
>>>> pages in its free_list. For order 3 pages, that's easily fragmented. Considering enough free
>>>> pages are the fundamental of compaction. It may not be suitable to stop reclaiming
>>>> when lots of page cache there. Relax order by one to fix this issue.
>>>
>>> For the higher order request we rely on should_compact_retry which backs
>>> on based on the compaction feedback. I would recommend looking why the
>>> compaction fails.
>> I think the reason of compaction failure is there're not enough free pages.
>> Like in ftrace events showed, free pages(which include CMA) was only 592856
>> - 539988 = 52868 pages(reclaimable=539988 available=592856).
>>
>> There are some restrictions like suitable_migration_target() for free pages
>> and suitable_migration_source() for movable pages. Hence eligible targets is
>> fewer.
> 
> If the compaction decides the retry is not worth it then either it is
> making a wrong call or it doesn't make sense to retry.

Yeah.. got it. This case compaction already tried its best with highest 
prio in ftrace logs.

>   
>>> Also this patch doesn't really explain why it should work and honestly
>>> it doesn't really make much sense to me either.
>> Sorry, my fault. IMO, The reason it should work is, say for this case of
>> order 3 allocation: we can perform direct reclaim more times as we have only
>> order 2 pages(which *lowered* by this change) in free_list(8214*16kB (UEC)).
>> The order requirement which I have lowered is should_reclaim_retry ->
>> __zone_watermark_ok:
>>         for (o = order; o < MAX_ORDER; o++) {
>>                  struct free_area *area = &z->free_area[o];
>> ...
>>                  for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
>>                          if (!free_area_empty(area, mt))
>>                                  return true;
>>                  }
>>
>> Order 2 pages can be more easily met, hence VM has more chance to return
>> true from should_reclaim_retry.
> 
> This is a wrong approach to the problem because there is no real
> guarantee the reclaim round will do anything useful. You should be
> really looking at the compaction side of the thing.

Thanks Michal for the advice, I'll look at from compaction side also. 
But I have one further question, IMO reclaim(~2GB LRU pages can be 
reclaimed) should be more feasible compared to compaction(already tried 
with highest prio and failed) in this case? Could you please elaborate 
more...it seems I still not fully understand why it's a wrong approach 
to check from reclaim phase.

Thanks,
Zhenhua