Linux-mm Archive on
 help / color / Atom feed
From: Honglei Wang <>
To: Michal Hocko <>
Subject: Re: [PATCH v2] mm/vmscan: get number of pages on the LRU list in memcgroup base on lru_zone_size
Date: Tue, 8 Oct 2019 17:34:03 +0800
Message-ID: <> (raw)
In-Reply-To: <>

On 10/7/19 10:28 PM, Michal Hocko wrote:
> On Thu 05-09-19 15:10:34, Honglei Wang wrote:
>> lruvec_lru_size() is involving lruvec_page_state_local() to get the
>> lru_size in the current code. It's base on lruvec_stat_local.count[]
>> of mem_cgroup_per_node. This counter is updated in batch. It won't
>> do charge if the number of coming pages doesn't meet the needs of
>> MEMCG_CHARGE_BATCH who's defined as 32 now.
>> The testcase in LTP madvise09[1] fails due to small block memory is
>> not charged. It creates a new memcgroup and sets up 32 MADV_FREE
>> pages. Then it forks child who will introduce memory pressure in the
>> memcgroup. The MADV_FREE pages are expected to be released under the
>> pressure, but 32 is not more than MEMCG_CHARGE_BATCH and these pages
>> won't be charged in lruvec_stat_local.count[] until some more pages
>> come in to satisfy the needs of batch charging. So these MADV_FREE
>> pages can't be freed in memory pressure which is a bit conflicted
>> with the definition of MADV_FREE.
> The test case is simly wrong. The caching and the batch size is an
> internal implementation detail. Moreover MADV_FREE is a _hint_ so all
> you can say is that those pages will get freed at some point in time but
> you cannot make any assumptions about when that moment happens.

This is a corner case, it makes extremely memory pressure which give the 
group no chance to satisfy the batch operation. There might be small 
chance to hit such problem in real workload -- 128K memory is really 
small in current amount of memory usage. I know exactly what you mean. 
The batch size is internal implementation detail, this *test case* just 
happen hit it in black box.

>> Getting lru_size base on lru_zone_size of mem_cgroup_per_node which
>> is not updated in batch can make it a bit more accurate in similar
>> scenario.
> What does that mean? It would be more helpful to describe the code path
> which will use this more precise value and what is the effect of that.

How about we describe it like this:

Get the lru_size base on lru_zone_size of mem_cgroup_per_node which is 
not updated via batching can help any related code path get more precise 
lru size in mem_cgroup case. This makes memory reclaim code won't ignore 
small blocks of memory(say, less than MEMCG_CHARGE_BATCH pages) in the 
lru list.

For this specific MADV_FREE page case, more precise lru size helps 
release the pages less than 32 as expected.


> As I've said in the previous version, I do not object to the patch
> because a more precise lruvec_lru_size sounds like a nice thing as long
> as we are not paying a high price for that. Just look at the global case
> for mem_cgroup_disabled(). It uses node_page_state and that one is using
> per-cpu accounting with regular global value refreshing IIRC.
>> [1]
>> Signed-off-by: Honglei Wang <>
>> ---
>>   mm/vmscan.c | 9 +++++----
>>   1 file changed, 5 insertions(+), 4 deletions(-)
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index c77d1e3761a7..c28672460868 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -354,12 +354,13 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
>>    */
>>   unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
>>   {
>> -	unsigned long lru_size;
>> +	unsigned long lru_size = 0;
>>   	int zid;
>> -	if (!mem_cgroup_disabled())
>> -		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
>> -	else
>> +	if (!mem_cgroup_disabled()) {
>> +		for (zid = 0; zid < MAX_NR_ZONES; zid++)
>> +			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
>> +	} else
>>   		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
>>   	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
>> -- 
>> 2.17.0

  reply index

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-05  7:10 Honglei Wang
2019-10-06  0:10 ` Andrew Morton
2019-10-07 14:28 ` Michal Hocko
2019-10-08  9:34   ` Honglei Wang [this message]
2019-10-09 14:16     ` Michal Hocko
2019-10-10  8:40       ` Honglei Wang
2019-10-10 14:33         ` Michal Hocko
2019-10-11  1:40           ` Honglei Wang

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on

Archives are clonable:
	git clone --mirror linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ \
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone