linux-mm.kvack.org archive mirror
From: Johannes Weiner <hannes@cmpxchg.org>
To: "zhaoyang.huang" <zhaoyang.huang@unisoc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Zhaoyang Huang <huangzhaoyang@gmail.com>,
	ke.wang@unisoc.com
Subject: Re: [PATCH] mm: deduct the number of pages reclaimed by madvise from workingset
Date: Thu, 25 May 2023 09:54:07 -0400
Message-ID: <20230525135407.GA31865@cmpxchg.org>
In-Reply-To: <1684919574-28368-1-git-send-email-zhaoyang.huang@unisoc.com>

On Wed, May 24, 2023 at 05:12:54PM +0800, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> The pages reclaimed by madvise_pageout are made inactive and forcefully dropped
> from the LRU, which gives the pages that later refault a larger refault distance
> than they should have. This could affect the accuracy of thrashing detection when
> madvise_pageout is used as a common way of reclaiming memory, as Android does now.

This alludes to, but doesn't explain, a real-world use case.

Yes, madvise_pageout() will record non-resident entries today. This
means refault and thrash detection is on for user-driven reclaim.
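To make the refault distance argument concrete, here is a toy model in C. It is a sketch under simplifying assumptions, not the kernel code: all names (toy_lruvec, toy_evict() and so on) are hypothetical, there is a single lruvec, the clock ticks once per evicted page (the kernel also ages it on activations and packs the snapshot into a shadow entry in the page cache), and workingset_size stands in for the active list size.

```c
#include <stdbool.h>

/* Hypothetical toy model of a per-lruvec non-resident age clock. */
struct toy_lruvec {
	unsigned long nonresident_age; /* advances once per evicted page */
	unsigned long workingset_size; /* stand-in for the active list size */
};

/* Eviction: snapshot the clock as the "shadow", then age the clock. */
static unsigned long toy_evict(struct toy_lruvec *lv)
{
	unsigned long shadow = lv->nonresident_age;

	lv->nonresident_age++;
	return shadow;
}

/* Refault distance: how far the clock has advanced since eviction. */
static unsigned long toy_refault_distance(const struct toy_lruvec *lv,
					  unsigned long shadow)
{
	return lv->nonresident_age - shadow;
}

/* Activate the refaulting page if the distance fits in the workingset. */
static bool toy_refault_is_workingset(const struct toy_lruvec *lv,
				      unsigned long shadow)
{
	return toy_refault_distance(lv, shadow) <= lv->workingset_size;
}
```

With workingset_size = 100, evicting a page and then 150 others (say, during one MADV_PAGEOUT pass) yields a distance of 151, so the refault is not treated as workingset. Winding the clock back by those 150 pages, which is roughly what the patch attempts with workingset_age_nonresident(lruvec, -nr_reclaimed), shrinks the distance to 1.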

So why is that undesirable?

Today we measure and report the cost of reclaim and memory pressure
for physical memory shortages, cgroup limits, and user-driven cgroup
reclaim. Why should we not do the same for madvise_pageout()? If the
userspace code that drives pageout has a bug and the result is extreme
thrashing, wouldn't you want to know that?

Please explain the idea here better.

> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
>  include/linux/swap.h | 2 +-
>  mm/madvise.c         | 4 ++--
>  mm/vmscan.c          | 8 +++++++-
>  3 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 2787b84..0312142 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -428,7 +428,7 @@ extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
>  extern int vm_swappiness;
>  long remove_mapping(struct address_space *mapping, struct folio *folio);
>  
> -extern unsigned long reclaim_pages(struct list_head *page_list);
> +extern unsigned long reclaim_pages(struct mm_struct *mm, struct list_head *page_list);
>  #ifdef CONFIG_NUMA
>  extern int node_reclaim_mode;
>  extern int sysctl_min_unmapped_ratio;
> diff --git a/mm/madvise.c b/mm/madvise.c
> index b6ea204..61c8d7b 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -420,7 +420,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  huge_unlock:
>  		spin_unlock(ptl);
>  		if (pageout)
> -			reclaim_pages(&page_list);
> +			reclaim_pages(mm, &page_list);
>  		return 0;
>  	}
>  
> @@ -516,7 +516,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  	arch_leave_lazy_mmu_mode();
>  	pte_unmap_unlock(orig_pte, ptl);
>  	if (pageout)
> -		reclaim_pages(&page_list);
> +		reclaim_pages(mm, &page_list);
>  	cond_resched();
>  
>  	return 0;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 20facec..048c10b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2741,12 +2741,14 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
>  	return nr_reclaimed;
>  }
>  
> -unsigned long reclaim_pages(struct list_head *folio_list)
> +unsigned long reclaim_pages(struct mm_struct *mm, struct list_head *folio_list)
>  {
>  	int nid;
>  	unsigned int nr_reclaimed = 0;
>  	LIST_HEAD(node_folio_list);
>  	unsigned int noreclaim_flag;
> +	struct lruvec *lruvec;
> +	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
>  
>  	if (list_empty(folio_list))
>  		return nr_reclaimed;
> @@ -2764,10 +2766,14 @@ unsigned long reclaim_pages(struct list_head *folio_list)
>  		}
>  
>  		nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> +		lruvec = &memcg->nodeinfo[nid]->lruvec;
> +		workingset_age_nonresident(lruvec, -nr_reclaimed);
>  		nid = folio_nid(lru_to_folio(folio_list));
>  	} while (!list_empty(folio_list));
>  
>  	nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> +	lruvec = &memcg->nodeinfo[nid]->lruvec;
> +	workingset_age_nonresident(lruvec, -nr_reclaimed);

The task might have moved between cgroups in the meantime; who knows what kind of
artifacts it will introduce if you wind back the wrong clock.
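A hypothetical two-clock model of this concern, in toy C (struct toy_memcg and its helpers are illustrative, not kernel APIs, and real lruvecs are per node and hierarchical). It assumes the reclaimed pages aged clock A while the deduction lands on clock B, for example because the task changed cgroups.

```c
/*
 * Hypothetical model: each cgroup owns an independent non-resident clock.
 * Names are illustrative; this is not how the kernel structures lruvecs.
 */
struct toy_memcg {
	unsigned long nonresident_age;
};

/* Evict nr pages: snapshot the clock for the batch, then age it. */
static unsigned long toy_evict(struct toy_memcg *cg, unsigned long nr)
{
	unsigned long shadow = cg->nonresident_age;

	cg->nonresident_age += nr;
	return shadow;
}

/* Modelled deduction: wind back the clock of whichever cgroup the
 * task belongs to now, not necessarily the one whose clock was aged. */
static void toy_wind_back(struct toy_memcg *cg, unsigned long nr)
{
	cg->nonresident_age -= nr;
}

/* Unsigned distance; wraps if the clock was moved behind the snapshot. */
static unsigned long toy_distance(const struct toy_memcg *cg,
				  unsigned long shadow)
{
	return cg->nonresident_age - shadow;
}
```

If B's clock stands at 20 with a shadow taken at 10, and 15 pages reclaimed from A are deducted from B, A's distance stays at 15 (the correction misses it) while B's clock drops to 5 and that shadow's distance wraps to (unsigned long)-5, so the refault looks enormously distant.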

If there are reclaim passes that shouldn't participate in non-resident
tracking, that should be plumbed through the stack to __remove_mapping
(which already has that bool reclaimed param to not record entries).
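One possible shape for that plumbing, as a hypothetical sketch: a flag in a toy scan_control is carried down to toy_remove_mapping(), a stand-in for __remove_mapping(), whose bool reclaimed argument plays the same role. Only the idea of a bool that controls shadow recording comes from the message; every name below is illustrative.

```c
#include <stdbool.h>

/* Hypothetical reclaim context carried from the caller down the stack. */
struct toy_scan_control {
	bool record_nonresident; /* false: evict without shadow entries */
};

struct toy_lru_clock {
	unsigned long nonresident_age;
};

/*
 * Stand-in for __remove_mapping(): only record a shadow entry (and age
 * the clock) when the caller says this eviction counts as reclaim.
 * Returns true if a shadow entry was recorded.
 */
static bool toy_remove_mapping(struct toy_lru_clock *clock, bool reclaimed,
			       unsigned long *shadow)
{
	if (!reclaimed)
		return false;
	*shadow = clock->nonresident_age;
	clock->nonresident_age++;
	return true;
}

/* Stand-in for the list-shrinking loop: passes the flag through. */
static unsigned long toy_shrink_list(struct toy_lru_clock *clock,
				     const struct toy_scan_control *sc,
				     unsigned long nr_pages)
{
	unsigned long nr_shadows = 0;
	unsigned long shadow;

	for (unsigned long i = 0; i < nr_pages; i++)
		if (toy_remove_mapping(clock, sc->record_nonresident, &shadow))
			nr_shadows++;
	return nr_shadows;
}
```

In this model nothing is recorded in the first place, so there is no clock to wind back afterwards and no risk of correcting the wrong one. Whether madvise-driven reclaim should opt out at all remains the open question raised earlier in the message.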


Thread overview: 6+ messages
2023-05-24  9:12 [PATCH] mm: deduct the number of pages reclaimed by madvise from workingset zhaoyang.huang
2023-05-24 20:40 ` Suren Baghdasaryan
2023-05-25  1:23   ` Zhaoyang Huang
2023-05-25 13:54 ` Johannes Weiner [this message]
2023-05-26  6:38   ` Zhaoyang Huang
2023-05-26 17:31     ` Suren Baghdasaryan
