From: "Huang, Ying" <ying.huang@intel.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
Jagdish Gediya <jvgediya@linux.ibm.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
dave.hansen@linux.intel.com, Fan Du <fan.du@intel.com>
Subject: Re: [PATCH] mm: migrate: set demotion targets differently
Date: Thu, 31 Mar 2022 15:23:16 +0800
Message-ID: <87h77ebn6j.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87ilruy5zt.fsf@linux.ibm.com> (Aneesh Kumar K. V.'s message of "Thu, 31 Mar 2022 12:15:58 +0530")

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> "Huang, Ying" <ying.huang@intel.com> writes:
>
>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>
>>> "Huang, Ying" <ying.huang@intel.com> writes:
>>>
>>>> Hi, Jagdish,
>>>>
>>>> Jagdish Gediya <jvgediya@linux.ibm.com> writes:
>>>>
>>>
>>> ...
>>>
>>>>> e.g. with below NUMA topology, where node 0 & 1 are
>>>>> cpu + dram nodes, node 2 & 3 are equally slower memory
>>>>> only nodes, and node 4 is slowest memory only node,
>>>>>
>>>>> available: 5 nodes (0-4)
>>>>> node 0 cpus: 0 1
>>>>> node 0 size: n MB
>>>>> node 0 free: n MB
>>>>> node 1 cpus: 2 3
>>>>> node 1 size: n MB
>>>>> node 1 free: n MB
>>>>> node 2 cpus:
>>>>> node 2 size: n MB
>>>>> node 2 free: n MB
>>>>> node 3 cpus:
>>>>> node 3 size: n MB
>>>>> node 3 free: n MB
>>>>> node 4 cpus:
>>>>> node 4 size: n MB
>>>>> node 4 free: n MB
>>>>> node distances:
>>>>> node   0   1   2   3   4
>>>>>   0:  10  20  40  40  80
>>>>>   1:  20  10  40  40  80
>>>>>   2:  40  40  10  40  80
>>>>>   3:  40  40  40  10  80
>>>>>   4:  80  80  80  80  10
>>>>>
>>>>> The existing implementation gives below demotion targets,
>>>>>
>>>>> node    demotion_target
>>>>>  0      3, 2
>>>>>  1      4
>>>>>  2      X
>>>>>  3      X
>>>>>  4      X
>>>>>
>>>>> With this patch applied, below are the demotion targets,
>>>>>
>>>>> node    demotion_target
>>>>>  0      3, 2
>>>>>  1      3, 2
>>>>>  2      3
>>>>>  3      4
>>>>>  4      X
>>>>
>>>> For such a machine, I think the perfect demotion order is,
>>>>
>>>> node    demotion_target
>>>>  0      2, 3
>>>>  1      2, 3
>>>>  2      4
>>>>  3      4
>>>>  4      X
>>>
>>> I guess "equally slow nodes" is a confusing definition here. If the
>>> system consists of two equally slow 1GB memory devices and the firmware
>>> doesn't want to differentiate between them, can't firmware present a
>>> single NUMA node with 2GB capacity? The fact that we are finding two NUMA
>>> nodes is a hint that there is some difference between these two memory
>>> devices. This is also captured by the fact that the distance between
>>> nodes 2 and 3 is 40 and not 10.
>>
>> Do you have more information about this?
>
> Not sure I follow the question there. I was asking whether firmware
> should present a single NUMA node if two memory devices are of the same
> type. How will Optane present such a config? Will both DIMMs have the
> same proximity domain value, so that dax kmem adds them to the same NUMA
> node?

Sorry for the confusion. I just wanted to check whether you have more
information about the machine configuration above. The machines I have
at hand do not have a complex NUMA topology like the one in the patch
description.

> If you are suggesting that firmware doesn't do that, then I agree with you
> that a demotion target like the below is good.
>
> node    demotion_target
>  0      2, 3
>  1      2, 3
>  2      4
>  3      4
>  4      X
>
> We can also achieve that with a simple change as below.

Glad to see that the demotion order can be implemented in a simple way.

My concern is whether it is necessary to do this at all. If there are real
machines with this NUMA topology, then I think it's good to add the
support. But if not, why make the code complex unnecessarily?

I don't have this kind of machine. Do you have one, or will you have one?

> @@ -3120,7 +3120,7 @@ static void __set_migration_target_nodes(void)
>  {
>  	nodemask_t next_pass = NODE_MASK_NONE;
>  	nodemask_t this_pass = NODE_MASK_NONE;
> -	nodemask_t used_targets = NODE_MASK_NONE;
> +	nodemask_t this_pass_used_targets = NODE_MASK_NONE;
>  	int node, best_distance;
>
>  	/*
> @@ -3141,17 +3141,20 @@ static void __set_migration_target_nodes(void)
>  	/*
>  	 * To avoid cycles in the migration "graph", ensure
>  	 * that migration sources are not future targets by
> -	 * setting them in 'used_targets'. Do this only
> +	 * setting them in 'this_pass_used_targets'. Do this only
>  	 * once per pass so that multiple source nodes can
>  	 * share a target node.
>  	 *
> -	 * 'used_targets' will become unavailable in future
> +	 * 'this_pass_used_targets' will become unavailable in future
>  	 * passes. This limits some opportunities for
>  	 * multiple source nodes to share a destination.
>  	 */
> -	nodes_or(used_targets, used_targets, this_pass);
> +	nodes_or(this_pass_used_targets, this_pass_used_targets, this_pass);
>
>  	for_each_node_mask(node, this_pass) {
> +
> +		nodemask_t used_targets = this_pass_used_targets;
> +
>  		best_distance = -1;
>
>  		/*
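
To make the difference between the two bookkeeping schemes concrete, here is a minimal userspace sketch of the pass-based selection, run against the five-node topology from the patch description. It is a model under stated assumptions, not kernel code: `struct topology`, `struct demotion_map`, `next_best_node()`, `build_demotion_map()` and `example_topology()` are hypothetical names, node masks are plain bitmasks, and the stand-in for find_next_best_node() uses only the raw distance (no node load or CPU penalty, ties go to the lowest node id). The `per_source_used` flag selects between one used-target mask shared by all source nodes and a fresh copy per source node, as in the diff above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 8
#define NO_NODE   (-1)

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct topology {
	int nr_nodes;
	unsigned int cpu_nodes;             /* bitmask of nodes that have CPUs */
	int distance[MAX_NODES][MAX_NODES]; /* node distance matrix */
};

struct demotion_map {
	unsigned int targets[MAX_NODES];    /* bitmask of targets per source */
};

/* Five-node topology from the patch description (nodes 0 and 1 have CPUs). */
static struct topology example_topology(void)
{
	struct topology t = {
		.nr_nodes = 5,
		.cpu_nodes = 0x3,
		.distance = {
			{ 10, 20, 40, 40, 80 },
			{ 20, 10, 40, 40, 80 },
			{ 40, 40, 10, 40, 80 },
			{ 40, 40, 40, 10, 80 },
			{ 80, 80, 80, 80, 10 },
		},
	};
	return t;
}

/* Simplified stand-in for find_next_best_node(): nearest node not in *used. */
static int next_best_node(const struct topology *t, int node, unsigned int *used)
{
	int best = NO_NODE, min = INT_MAX;

	for (int n = 0; n < t->nr_nodes; n++) {
		if (*used & (1u << n))
			continue;
		if (t->distance[node][n] < min) {
			min = t->distance[node][n];
			best = n;
		}
	}
	if (best != NO_NODE)
		*used |= 1u << best;
	return best;
}

static void build_demotion_map(const struct topology *t, bool per_source_used,
			       struct demotion_map *map)
{
	unsigned int pass_used = 0;
	unsigned int next_pass = t->cpu_nodes; /* demotion starts at CPU nodes */

	assert(t->nr_nodes <= MAX_NODES);
	memset(map, 0, sizeof(*map));

	while (next_pass) {
		unsigned int this_pass = next_pass;

		next_pass = 0;
		pass_used |= this_pass; /* sources never become later targets */

		for (int node = 0; node < t->nr_nodes; node++) {
			/* Per-source mode works on a copy, so picks do not leak. */
			unsigned int local_used = pass_used;
			unsigned int *used = per_source_used ? &local_used : &pass_used;
			int best_distance = -1;

			if (!(this_pass & (1u << node)))
				continue;

			for (;;) {
				int target = next_best_node(t, node, used);
				int d;

				if (target == NO_NODE)
					break;
				d = t->distance[node][target];
				if (best_distance != -1 && d > best_distance) {
					*used &= ~(1u << target); /* farther: give it back */
					break;
				}
				best_distance = d;
				map->targets[node] |= 1u << target;
				next_pass |= 1u << target; /* visit target next pass */
			}
		}
	}
}
```

Under these simplifications, calling `build_demotion_map()` on `example_topology()` with the shared mask gives node 0 the targets {2, 3}, node 1 the target {4}, and no targets for nodes 2 to 4. With the per-source copy it gives nodes 0 and 1 the targets {2, 3} and nodes 2 and 3 the target {4}, which is the order discussed above.
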
Best Regards,
Huang, Ying
Thread overview: 19+ messages
2022-03-29 11:52 [PATCH] mm: migrate: set demotion targets differently Jagdish Gediya
2022-03-29 12:26 ` Baolin Wang
2022-03-29 14:04 ` Jagdish Gediya
2022-03-30 6:37 ` Baolin Wang
2022-03-30 6:54 ` Huang, Ying
2022-03-29 14:31 ` Dave Hansen
2022-03-29 16:46 ` Jagdish Gediya
2022-03-29 22:40 ` Dave Hansen
2022-03-30 6:46 ` Huang, Ying
2022-03-30 16:36 ` Jagdish Gediya
2022-03-31 0:27 ` Huang, Ying
2022-03-31 11:17 ` Jonathan Cameron
2022-03-30 17:17 ` Aneesh Kumar K.V
2022-03-31 0:32 ` Huang, Ying
2022-03-31 6:45 ` Aneesh Kumar K.V
2022-03-31 7:23 ` Huang, Ying [this message]
2022-03-31 8:27 ` Aneesh Kumar K.V
2022-03-31 8:58 ` Huang, Ying
2022-03-31 9:33 ` Baolin Wang