From: "Huang, Ying" <ying.huang@intel.com>
To: Feng Tang <feng.tang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Hocko, Michal" <mhocko@suse.com>, Tejun Heo <tj@kernel.org>,
	Zefan Li <lizefan.x@bytedance.com>,
	Waiman Long <longman@redhat.com>,
	"aneesh.kumar@linux.ibm.com" <aneesh.kumar@linux.ibm.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	"Chen, Tim C" <tim.c.chen@intel.com>,
	"Yin, Fengwei" <fengwei.yin@intel.com>
Subject: Re: [PATCH] mm/vmscan: respect cpuset policy during page demotion
Date: Thu, 27 Oct 2022 14:05:19 +0800	[thread overview]
Message-ID: <871qqtls2o.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <Y1ob5XxqJzTDjBYy@feng-clx> (Feng Tang's message of "Thu, 27 Oct 2022 13:49:25 +0800")

Feng Tang <feng.tang@intel.com> writes:

> On Thu, Oct 27, 2022 at 01:13:30PM +0800, Huang, Ying wrote:
>> Feng Tang <feng.tang@intel.com> writes:
>> 
>> > In the page reclaim path, memory can be demoted from a faster memory
>> > tier to a slower one. Currently, there is no check against the
>> > cpuset's memory policy, so even if the target demotion node is not
>> > allowed by the cpuset, the demotion still happens, which breaks
>> > cpuset semantics.
>> >
>> > So add a cpuset policy check in the demotion path, and skip demotion
>> > if the demotion targets are not allowed by the cpuset.
>> >
>> > Signed-off-by: Feng Tang <feng.tang@intel.com>
>> > ---
> [...]
>> > index 18f6497994ec..c205d98283bc 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -1537,9 +1537,21 @@ static struct page *alloc_demote_page(struct page *page, unsigned long private)
>> >  {
>> >  	struct page *target_page;
>> >  	nodemask_t *allowed_mask;
>> > -	struct migration_target_control *mtc;
>> > +	struct migration_target_control *mtc = (void *)private;
>> >  
>> > -	mtc = (struct migration_target_control *)private;
>> 
>> I think we should avoid (void *) conversion here.
>
> OK, will change back.
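
(For illustration, a minimal sketch of the explicit-cast form being
restored here; this is hypothetical context, not the final patch. The
named cast documents what 'private' carries, where assigning through
(void *) hides it:)

	struct migration_target_control *mtc;

	/*
	 * 'private' smuggles a migration_target_control through the
	 * migration callback; keep the explicit, named cast.
	 */
	mtc = (struct migration_target_control *)private;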
>
>> > +#if IS_ENABLED(CONFIG_MEMCG) && IS_ENABLED(CONFIG_CPUSETS)
>> > +	struct mem_cgroup *memcg;
>> > +	nodemask_t cpuset_nmask;
>> > +
>> > +	memcg = page_memcg(page);
>> > +	cpuset_get_allowed_mem_nodes(memcg->css.cgroup, &cpuset_nmask);
>> > +
>> > +	if (!node_isset(mtc->nid, cpuset_nmask)) {
>> > +		if (mtc->nmask)
>> > +			nodes_and(*mtc->nmask, *mtc->nmask, cpuset_nmask);
>> > +		return alloc_migration_target(page, (unsigned long)mtc);
>> > +	}
>> 
>> If node_isset(mtc->nid, cpuset_nmask) == true, we should keep the
>> original 2-step allocation and apply nodes_and() on the node mask.
>
> Good catch! Yes, the nodes_and() call should be taken out of this
> check and done before calling node_isset().
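
(For illustration, a minimal sketch of the reworked ordering agreed
above; a hypothetical shape, not the posted patch. The fallback mask is
intersected with the cpuset-allowed mask unconditionally, and the normal
two-step allocation is bypassed only when the preferred node itself is
disallowed:)

	#if IS_ENABLED(CONFIG_MEMCG) && IS_ENABLED(CONFIG_CPUSETS)
		struct mem_cgroup *memcg = page_memcg(page);
		nodemask_t cpuset_nmask;

		cpuset_get_allowed_mem_nodes(memcg->css.cgroup, &cpuset_nmask);

		/* Constrain the fallback mask by the cpuset first ... */
		if (mtc->nmask)
			nodes_and(*mtc->nmask, *mtc->nmask, cpuset_nmask);

		/*
		 * ... then skip the two-step allocation only if the
		 * preferred target node is itself disallowed by the cpuset.
		 */
		if (!node_isset(mtc->nid, cpuset_nmask))
			return alloc_migration_target(page, (unsigned long)mtc);
	#endif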
>
>> > +#endif
>> >  
>> >  	allowed_mask = mtc->nmask;
>> >  	/*
>> > @@ -1649,6 +1661,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>> >  		enum folio_references references = FOLIOREF_RECLAIM;
>> >  		bool dirty, writeback;
>> >  		unsigned int nr_pages;
>> > +		bool skip_this_demotion = false;
>> >  
>> >  		cond_resched();
>> >  
>> > @@ -1658,6 +1671,22 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>> >  		if (!folio_trylock(folio))
>> >  			goto keep;
>> >  
>> > +#if IS_ENABLED(CONFIG_MEMCG) && IS_ENABLED(CONFIG_CPUSETS)
>> > +		if (do_demote_pass) {
>> > +			struct mem_cgroup *memcg;
>> > +			nodemask_t nmask, nmask1;
>> > +
>> > +			node_get_allowed_targets(pgdat, &nmask);
>> 
>> pgdat will not change in the loop, so we can move this out of the loop?
>  
> Yes
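
(For illustration, a sketch of the suggested hoist, again hypothetical:
node_get_allowed_targets() runs once before the folio loop since pgdat
is loop-invariant, and only the per-folio cpuset mask is computed inside
it. The per-folio mask starts as NODE_MASK_ALL here so a folio without a
memcg is treated as unrestricted:)

	#if IS_ENABLED(CONFIG_MEMCG) && IS_ENABLED(CONFIG_CPUSETS)
		nodemask_t demote_nmask;

		/* pgdat does not change per folio: fetch its targets once */
		node_get_allowed_targets(pgdat, &demote_nmask);
	#endif

		while (!list_empty(folio_list)) {
			/* ... existing per-folio setup ... */
	#if IS_ENABLED(CONFIG_MEMCG) && IS_ENABLED(CONFIG_CPUSETS)
			if (do_demote_pass) {
				struct mem_cgroup *memcg = folio_memcg(folio);
				nodemask_t cpuset_nmask = NODE_MASK_ALL;

				if (memcg)
					cpuset_get_allowed_mem_nodes(memcg->css.cgroup,
								     &cpuset_nmask);

				if (!nodes_intersects(demote_nmask, cpuset_nmask))
					skip_this_demotion = true;
			}
	#endif
			/* ... */
		}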
>
>> > +			memcg = folio_memcg(folio);
>> > +			if (memcg)
>> > +				cpuset_get_allowed_mem_nodes(memcg->css.cgroup,
>> > +								&nmask1);
>> > +
>> > +			if (!nodes_intersects(nmask, nmask1))
>> > +				skip_this_demotion = true;
>> > +		}
>> 
>> If nodes_intersects() == true, we will call
>> cpuset_get_allowed_mem_nodes() twice.  Better to pass the intersecting
>> mask to demote_folio_list()?
>  
> The pages in the loop may come from different memory control groups,
> and their cpusets' nodemasks could differ; I don't know how to save
> this per-page info to be used later in demote_folio_list().

Yes.  You are right.  We cannot do that.

Best Regards,
Huang, Ying

>
>> > +#endif
>> > +
>> >  		VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>> >  
>> >  		nr_pages = folio_nr_pages(folio);
>> > @@ -1799,7 +1828,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>> >  		 * Before reclaiming the folio, try to relocate
>> >  		 * its contents to another node.
>> >  		 */
>> > -		if (do_demote_pass &&
>> > +		if (do_demote_pass && !skip_this_demotion &&
>> >  		    (thp_migration_supported() || !folio_test_large(folio))) {
>> >  			list_add(&folio->lru, &demote_folios);
>> >  			folio_unlock(folio);
>> 
>> Best Regards,
>> Huang, Ying

Thread overview:
2022-10-26  7:43 [PATCH] mm/vmscan: respect cpuset policy during page demotion Feng Tang
2022-10-26  7:49 ` Aneesh Kumar K V
2022-10-26  8:00   ` Feng Tang
2022-10-26  9:19     ` Michal Hocko
2022-10-26 10:42       ` Aneesh Kumar K V
2022-10-26 11:02         ` Michal Hocko
2022-10-26 12:08           ` Aneesh Kumar K V
2022-10-26 12:21             ` Michal Hocko
2022-10-26 12:35               ` Aneesh Kumar K V
2022-10-27  9:02                 ` Michal Hocko
2022-10-27 10:16                   ` Aneesh Kumar K V
2022-10-27 13:05                     ` Michal Hocko
2022-10-26 12:20       ` Feng Tang
2022-10-26 15:59         ` Michal Hocko
2022-10-26 17:57           ` Yang Shi
2022-10-27  7:11             ` Feng Tang
2022-10-27  7:45               ` Huang, Ying
2022-10-27  7:51                 ` Feng Tang
2022-10-27 17:55               ` Yang Shi
2022-10-28  3:37                 ` Feng Tang
2022-10-28  5:54                   ` Huang, Ying
2022-10-28 17:23                     ` Yang Shi
2022-10-31  1:56                       ` Huang, Ying
2022-10-31  2:19                       ` Feng Tang
2022-10-28  5:09                 ` Aneesh Kumar K V
2022-10-28 17:16                   ` Yang Shi
2022-10-31  1:53                     ` Huang, Ying
2022-10-27  6:47           ` Huang, Ying
2022-10-27  7:10             ` Michal Hocko
2022-10-27  7:39               ` Huang, Ying
2022-10-27  8:01                 ` Michal Hocko
2022-10-27  9:31                   ` Huang, Ying
2022-10-27 12:29                     ` Michal Hocko
2022-10-27 23:22                       ` Huang, Ying
2022-10-31  8:40                         ` Michal Hocko
2022-10-31  8:51                           ` Huang, Ying
2022-10-31  9:18                             ` Michal Hocko
2022-10-31 14:09                           ` Feng Tang
2022-10-31 14:32                             ` Michal Hocko
2022-11-07  8:05                               ` Feng Tang
2022-11-07  8:17                                 ` Michal Hocko
2022-11-01  3:17                     ` Huang, Ying
2022-10-26  8:26 ` Yin, Fengwei
2022-10-26  8:37   ` Feng Tang
2022-10-26 14:36 ` Waiman Long
2022-10-27  5:57   ` Feng Tang
2022-10-27  5:13 ` Huang, Ying
2022-10-27  5:49   ` Feng Tang
2022-10-27  6:05     ` Huang, Ying [this message]