Linux-mm Archive on lore.kernel.org
From: Michal Hocko <mhocko@kernel.org>
To: Honglei Wang <honglei.wang@oracle.com>
Cc: linux-mm@kvack.org, vdavydov.dev@gmail.com, hannes@cmpxchg.org
Subject: Re: [PATCH v2] mm/vmscan: get number of pages on the LRU list in memcgroup base on lru_zone_size
Date: Mon, 7 Oct 2019 16:28:05 +0200
Message-ID: <20191007142805.GM2381@dhcp22.suse.cz> (raw)
In-Reply-To: <20190905071034.16822-1-honglei.wang@oracle.com>

On Thu 05-09-19 15:10:34, Honglei Wang wrote:
> lruvec_lru_size() is involving lruvec_page_state_local() to get the
> lru_size in the current code. It's base on lruvec_stat_local.count[]
> of mem_cgroup_per_node. This counter is updated in batch. It won't
> do charge if the number of coming pages doesn't meet the needs of
> MEMCG_CHARGE_BATCH who's defined as 32 now.
> 
> The testcase in LTP madvise09[1] fails due to small block memory is
> not charged. It creates a new memcgroup and sets up 32 MADV_FREE
> pages. Then it forks child who will introduce memory pressure in the
> memcgroup. The MADV_FREE pages are expected to be released under the
> pressure, but 32 is not more than MEMCG_CHARGE_BATCH and these pages
> won't be charged in lruvec_stat_local.count[] until some more pages
> come in to satisfy the needs of batch charging. So these MADV_FREE
> pages can't be freed in memory pressure which is a bit conflicted
> with the definition of MADV_FREE.

The test case is simply wrong. The caching and the batch size are
internal implementation details. Moreover, MADV_FREE is a _hint_, so all
you can say is that those pages will get freed at some point in time, but
you cannot make any assumptions about when that moment happens.
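
For illustration only, here is a minimal user-space sketch of the
batching described in the quoted changelog. It is not the kernel code:
struct batched_counter, counter_mod() and counter_read() are hypothetical
names, a single "pending" field stands in for one CPU's local delta, and
the flush condition (pending delta strictly greater than the batch) is an
assumption based on the changelog's remark that 32 pages are not enough.

```c
#include <stdlib.h>

/* Mirrors the value of MEMCG_CHARGE_BATCH given in the changelog. */
#define CHARGE_BATCH 32

/* Hypothetical model: one shared value plus one CPU's unflushed delta. */
struct batched_counter {
	long global;   /* what readers of the counter see */
	long pending;  /* local delta not yet folded into global */
};

/* Accumulate locally; fold into the shared value only past the batch. */
static void counter_mod(struct batched_counter *c, long delta)
{
	long x = c->pending + delta;

	if (labs(x) > CHARGE_BATCH) {
		c->global += x;
		x = 0;
	}
	c->pending = x;
}

/* Readers only look at the shared value, so they can lag by a batch. */
static long counter_read(const struct batched_counter *c)
{
	return c->global;
}
```

Under these assumptions, 32 calls to counter_mod(&c, 1) on a zeroed
counter leave counter_read(&c) at 0, and the 33rd call makes it 33. That
is the lag the test case depends on, and it is exactly the kind of
internal detail a test should not rely on.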

> Getting lru_size base on lru_zone_size of mem_cgroup_per_node which
> is not updated in batch can make it a bit more accurate in similar
> scenario.

What does that mean? It would be more helpful to describe the code path
which uses this more precise value and what effect that has.

As I've said in the previous version, I do not object to the patch
because a more precise lruvec_lru_size sounds like a nice thing as long
as we are not paying a high price for that. Just look at the global case
for mem_cgroup_disabled(). It uses node_page_state(), which relies on
per-cpu accounting with the global value refreshed regularly, IIRC.
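
To make the trade-off concrete, here is a rough user-space sketch of the
two ways of reading the size. It is only a model under stated
assumptions: struct lru_model and both helpers are hypothetical names,
and NR_ZONES is a stand-in for MAX_NR_ZONES, whose real value depends on
the kernel configuration.

```c
#include <stddef.h>

/* Hypothetical stand-in for MAX_NR_ZONES (config-dependent in the kernel). */
#define NR_ZONES 4

/* Hypothetical model of the two sources of the LRU size. */
struct lru_model {
	unsigned long zone_size[NR_ZONES]; /* updated immediately, per zone */
	unsigned long batched_total;       /* may lag behind by up to a batch */
};

/* Cheap: a single read, but the value can be stale. */
static unsigned long lru_size_batched(const struct lru_model *m)
{
	return m->batched_total;
}

/* Precise: one read per zone on every call. */
static unsigned long lru_size_exact(const struct lru_model *m)
{
	unsigned long sum = 0;
	size_t zid;

	for (zid = 0; zid < NR_ZONES; zid++)
		sum += m->zone_size[zid];
	return sum;
}
```

In this model the precise variant costs NR_ZONES reads per call instead
of one, which is the price the changelog should put into perspective.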

> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c
> 
> Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
> ---
>  mm/vmscan.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c77d1e3761a7..c28672460868 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -354,12 +354,13 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
>   */
>  unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
>  {
> -	unsigned long lru_size;
> +	unsigned long lru_size = 0;
>  	int zid;
>  
> -	if (!mem_cgroup_disabled())
> -		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
> -	else
> +	if (!mem_cgroup_disabled()) {
> +		for (zid = 0; zid < MAX_NR_ZONES; zid++)
> +			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
> +	} else
>  		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
>  
>  	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {
> -- 
> 2.17.0

-- 
Michal Hocko
SUSE Labs



Thread overview: 8+ messages
2019-09-05  7:10 Honglei Wang
2019-10-06  0:10 ` Andrew Morton
2019-10-07 14:28 ` Michal Hocko [this message]
2019-10-08  9:34   ` Honglei Wang
2019-10-09 14:16     ` Michal Hocko
2019-10-10  8:40       ` Honglei Wang
2019-10-10 14:33         ` Michal Hocko
2019-10-11  1:40           ` Honglei Wang
