linux-kernel.vger.kernel.org archive mirror
From: "Huang, Ying" <ying.huang@intel.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Jagdish Gediya <jvgediya@linux.ibm.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
	dave.hansen@linux.intel.com, Fan Du <fan.du@intel.com>
Subject: Re: [PATCH] mm: migrate: set demotion targets differently
Date: Thu, 31 Mar 2022 15:23:16 +0800
Message-ID: <87h77ebn6j.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87ilruy5zt.fsf@linux.ibm.com> (Aneesh Kumar K. V.'s message of "Thu, 31 Mar 2022 12:15:58 +0530")

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:

> "Huang, Ying" <ying.huang@intel.com> writes:
>
>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>
>>> "Huang, Ying" <ying.huang@intel.com> writes:
>>>
>>>> Hi, Jagdish,
>>>>
>>>> Jagdish Gediya <jvgediya@linux.ibm.com> writes:
>>>>
>>>
>>> ...
>>>
>>>>> e.g. with the NUMA topology below, where nodes 0 & 1 are
>>>>> CPU + DRAM nodes, nodes 2 & 3 are equally slower memory-only
>>>>> nodes, and node 4 is the slowest memory-only node,
>>>>>
>>>>> available: 5 nodes (0-4)
>>>>> node 0 cpus: 0 1
>>>>> node 0 size: n MB
>>>>> node 0 free: n MB
>>>>> node 1 cpus: 2 3
>>>>> node 1 size: n MB
>>>>> node 1 free: n MB
>>>>> node 2 cpus:
>>>>> node 2 size: n MB
>>>>> node 2 free: n MB
>>>>> node 3 cpus:
>>>>> node 3 size: n MB
>>>>> node 3 free: n MB
>>>>> node 4 cpus:
>>>>> node 4 size: n MB
>>>>> node 4 free: n MB
>>>>> node distances:
>>>>> node   0   1   2   3   4
>>>>>   0:  10  20  40  40  80
>>>>>   1:  20  10  40  40  80
>>>>>   2:  40  40  10  40  80
>>>>>   3:  40  40  40  10  80
>>>>>   4:  80  80  80  80  10
>>>>>
>>>>> The existing implementation gives the demotion targets below,
>>>>>
>>>>> node    demotion_target
>>>>>  0              3, 2
>>>>>  1              4
>>>>>  2              X
>>>>>  3              X
>>>>>  4              X
>>>>>
>>>>> With this patch applied, below are the demotion targets,
>>>>>
>>>>> node    demotion_target
>>>>>  0              3, 2
>>>>>  1              3, 2
>>>>>  2              3
>>>>>  3              4
>>>>>  4              X
>>>>
>>>> For such a machine, I think the perfect demotion order is,
>>>>
>>>> node    demotion_target
>>>>  0              2, 3
>>>>  1              2, 3
>>>>  2              4
>>>>  3              4
>>>>  4              X
>>>
>>> I guess "equally slow nodes" is a confusing definition here. If the
>>> system consists of two 1GB memory devices that are equally slow, and the
>>> firmware doesn't want to differentiate between them, couldn't the firmware
>>> present a single NUMA node with 2GB capacity? The fact that we find two NUMA
>>> nodes is a hint that there is some difference between these two memory
>>> devices. This is also captured by the fact that the distance between 2 and 3
>>> is 40 and not 10.
>>
>> Do you have more information about this?
>
> Not sure I follow the question there. I was asking: shouldn't the firmware
> present a single NUMA node if two memory devices are of the same type? How
> would Optane present such a config? Both DIMMs would have the same
> proximity domain value, and hence dax kmem would add them to the same NUMA
> node?

Sorry for the confusion.  I just wanted to check whether you have more
information about the machine configuration above.  The machines I have
access to don't have a NUMA topology as complex as the one in the patch
description.

> If you are suggesting that firmware doesn't do that, then I agree with you
> that a demotion target like the one below is good.
>
>  node    demotion_target
>   0              2, 3
>   1              2, 3
>   2              4
>   3              4
>   4              X
>
> We can also achieve that with a simple change as below.

Glad to see the demotion order can be implemented in a simple way.

My concern is whether it is necessary to do this.  If there are real
machines with such a NUMA topology, then I think it's good to add the
support.  But if not, why make the code unnecessarily complex?

I don't have this kind of machine.  Do you have one, or will you?

> @@ -3120,7 +3120,7 @@ static void __set_migration_target_nodes(void)
>  {
>  	nodemask_t next_pass	= NODE_MASK_NONE;
>  	nodemask_t this_pass	= NODE_MASK_NONE;
> -	nodemask_t used_targets = NODE_MASK_NONE;
> +	nodemask_t this_pass_used_targets = NODE_MASK_NONE;
>  	int node, best_distance;
>  
>  	/*
> @@ -3141,17 +3141,20 @@ static void __set_migration_target_nodes(void)
>  	/*
>  	 * To avoid cycles in the migration "graph", ensure
>  	 * that migration sources are not future targets by
> -	 * setting them in 'used_targets'.  Do this only
> +	 * setting them in 'this_pass_used_targets'.  Do this only
>  	 * once per pass so that multiple source nodes can
>  	 * share a target node.
>  	 *
> -	 * 'used_targets' will become unavailable in future
> +	 * 'this_pass_used_targets' will become unavailable in future
>  	 * passes.  This limits some opportunities for
>  	 * multiple source nodes to share a destination.
>  	 */
> -	nodes_or(used_targets, used_targets, this_pass);
> +	nodes_or(this_pass_used_targets, this_pass_used_targets, this_pass);
>  
>  	for_each_node_mask(node, this_pass) {
> +
> +		nodemask_t used_targets = this_pass_used_targets;
> +
>  		best_distance = -1;
>  
>  		/*

Best Regards,
Huang, Ying


Thread overview: 19+ messages
2022-03-29 11:52 [PATCH] mm: migrate: set demotion targets differently Jagdish Gediya
2022-03-29 12:26 ` Baolin Wang
2022-03-29 14:04   ` Jagdish Gediya
2022-03-30  6:37     ` Baolin Wang
2022-03-30  6:54       ` Huang, Ying
2022-03-29 14:31 ` Dave Hansen
2022-03-29 16:46   ` Jagdish Gediya
2022-03-29 22:40     ` Dave Hansen
2022-03-30  6:46 ` Huang, Ying
2022-03-30 16:36   ` Jagdish Gediya
2022-03-31  0:27     ` Huang, Ying
2022-03-31 11:17     ` Jonathan Cameron
2022-03-30 17:17   ` Aneesh Kumar K.V
2022-03-31  0:32     ` Huang, Ying
2022-03-31  6:45       ` Aneesh Kumar K.V
2022-03-31  7:23         ` Huang, Ying [this message]
2022-03-31  8:27           ` Aneesh Kumar K.V
2022-03-31  8:58             ` Huang, Ying
2022-03-31  9:33               ` Baolin Wang
