Linux-mm Archive on lore.kernel.org
From: Andrew Morton <akpm@linux-foundation.org>
To: Honglei Wang <honglei.wang@oracle.com>
Cc: linux-mm@kvack.org, vdavydov.dev@gmail.com, hannes@cmpxchg.org,
	mhocko@kernel.org
Subject: Re: [PATCH v2] mm/vmscan: get number of pages on the LRU list in memcgroup base on lru_zone_size
Date: Sat, 5 Oct 2019 17:10:56 -0700
Message-ID: <20191005171056.f96adf25459757a907b32dd7@linux-foundation.org> (raw)
In-Reply-To: <20190905071034.16822-1-honglei.wang@oracle.com>

On Thu,  5 Sep 2019 15:10:34 +0800 Honglei Wang <honglei.wang@oracle.com> wrote:

> lruvec_lru_size() invokes lruvec_page_state_local() to get the
> lru_size in the current code. That is based on
> lruvec_stat_local.count[] of mem_cgroup_per_node, a counter that is
> updated in batches: incoming pages aren't reflected in it until their
> number exceeds MEMCG_CHARGE_BATCH, which is currently defined as 32.
> 
> The LTP testcase madvise09[1] fails because small blocks of memory
> are not reflected in that counter. It creates a new memory cgroup and
> sets up 32 MADV_FREE pages. Then it forks a child that introduces
> memory pressure in the memory cgroup. The MADV_FREE pages are
> expected to be released under that pressure, but 32 is not more than
> MEMCG_CHARGE_BATCH, so these pages won't be reflected in
> lruvec_stat_local.count[] until more pages come in to fill the
> batch. As a result, these MADV_FREE pages can't be freed under
> memory pressure, which conflicts with the definition of MADV_FREE.
> 
> Getting lru_size based on lru_zone_size of mem_cgroup_per_node, which
> is not updated in batches, makes it more accurate in scenarios like
> this one.

I redid the changelog somewhat:

: lruvec_lru_size() invokes lruvec_page_state_local() to get the
: lru_size.  It's based on lruvec_stat_local.count[] of mem_cgroup_per_node.
: This counter is updated in a batched way.  Incoming pages aren't reflected
: in it until their number exceeds MEMCG_CHARGE_BATCH, which is defined as
: 32.
: 
: The testcase in LTP madvise09[1] fails because small blocks of memory are
: not reflected in that counter.  It creates a new memory cgroup and sets up
: 32 MADV_FREE pages.  Then it forks a child that introduces memory pressure
: in the memory cgroup.  The MADV_FREE pages are expected to be released
: under the pressure, but 32 is not more than MEMCG_CHARGE_BATCH and these
: pages won't be reflected in lruvec_stat_local.count[] until more pages
: come in to fill the batch.  So these MADV_FREE pages can't be freed under
: memory pressure, which conflicts with the definition of MADV_FREE.
: 
: Getting the lru_size based on lru_zone_size of mem_cgroup_per_node, which
: is not updated via batching, makes it more accurate in this scenario.
: 
: This is effectively a partial reversion of 1a61ab8038e72 ("mm: memcontrol:
: replace zone summing with lruvec_page_state()").
: 
: [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c


> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -354,12 +354,13 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
>   */
>  unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
>  {
> -	unsigned long lru_size;
> +	unsigned long lru_size = 0;
>  	int zid;
>  
> -	if (!mem_cgroup_disabled())
> -		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
> -	else
> +	if (!mem_cgroup_disabled()) {
> +		for (zid = 0; zid < MAX_NR_ZONES; zid++)
> +			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
> +	} else
>  		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
>  
>  	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {

Do we think this problem is serious enough to warrant backporting into
earlier kernels?



Thread overview: 8+ messages
2019-09-05  7:10 Honglei Wang
2019-10-06  0:10 ` Andrew Morton [this message]
2019-10-07 14:28 ` Michal Hocko
2019-10-08  9:34   ` Honglei Wang
2019-10-09 14:16     ` Michal Hocko
2019-10-10  8:40       ` Honglei Wang
2019-10-10 14:33         ` Michal Hocko
2019-10-11  1:40           ` Honglei Wang
