linux-kernel.vger.kernel.org archive mirror
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Jagdish Gediya <jvgediya@linux.ibm.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
	dave.hansen@linux.intel.com, ying.huang@intel.com
Subject: Re: [PATCH] mm: migrate: set demotion targets differently
Date: Wed, 30 Mar 2022 14:37:35 +0800	[thread overview]
Message-ID: <784aee91-6a01-6e67-389e-1e1883796894@linux.alibaba.com> (raw)
In-Reply-To: <YkMR8OY779Bcri3I@li-6e1fa1cc-351b-11b2-a85c-b897023bb5f3.ibm.com>



On 3/29/2022 10:04 PM, Jagdish Gediya wrote:
> On Tue, Mar 29, 2022 at 08:26:05PM +0800, Baolin Wang wrote:
> Hi Baolin,
>> Hi Jagdish,
>>
>> On 3/29/2022 7:52 PM, Jagdish Gediya wrote:
>>> The current implementation for identifying the demotion
>>> targets limits some of the opportunities to share the
>>> demotion targets between multiple source nodes.
>>>
>>> Implement logic to identify loops in the demotion targets
>>> so that all demotion possibilities can be utilized. Don't
>>> share the used targets between all the nodes; instead,
>>> build the used targets from scratch for each individual
>>> node, based on which nodes this node is a demotion target
>>> for. This helps to share the demotion targets without
>>> missing any possible demotion path.
>>>
>>> e.g. with the below NUMA topology, where nodes 0 & 1 are
>>> cpu + dram nodes, nodes 2 & 3 are equally slow memory-only
>>> nodes, and node 4 is the slowest memory-only node,
>>>
>>> available: 5 nodes (0-4)
>>> node 0 cpus: 0 1
>>> node 0 size: n MB
>>> node 0 free: n MB
>>> node 1 cpus: 2 3
>>> node 1 size: n MB
>>> node 1 free: n MB
>>> node 2 cpus:
>>> node 2 size: n MB
>>> node 2 free: n MB
>>> node 3 cpus:
>>> node 3 size: n MB
>>> node 3 free: n MB
>>> node 4 cpus:
>>> node 4 size: n MB
>>> node 4 free: n MB
>>> node distances:
>>> node   0   1   2   3   4
>>>     0:  10  20  40  40  80
>>>     1:  20  10  40  40  80
>>>     2:  40  40  10  40  80
>>>     3:  40  40  40  10  80
>>>     4:  80  80  80  80  10
>>>
>>> The existing implementation gives below demotion targets,
>>>
>>> node    demotion_target
>>>    0              3, 2
>>>    1              4
>>>    2              X
>>>    3              X
>>>    4		X
>>>
>>> With this patch applied, below are the demotion targets,
>>>
>>> node    demotion_target
>>>    0              3, 2
>>>    1              3, 2
>>>    2              3
>>>    3              4
>>>    4		X
>>
>> Node 2 and node 3 both are slow memory and have same distance, why node 2
>> should demote cold memory to node 3? They should have the same target
>> demotion node 4, which is the slowest memory node, right?
>>
> The current demotion-target finding algorithm works on best distance: since the distance between nodes 2 & 3 is 40 while the distance between nodes 2 & 4 is 80, node 2 demotes to node 3.

If node 2 can demote to node 3, that means node 3's memory is colder 
than node 2's, right? Node 3's access time should be larger than node 
2's, so that we can demote colder memory from node 2 to node 3.

But node 2 and node 3 are the same memory type and have the same 
distance, so the access times of node 2 and node 3 should be the same 
too. Why add so much page migration between node 2 and node 3? I'm 
still not sure about the benefits.

Huang Ying and Dave, what do you think about these demotion targets?

>>>
>>> e.g. with the below NUMA topology, where nodes 0, 1 & 2 are
>>> cpu + dram nodes and node 3 is a slow memory node,
>>>
>>> available: 4 nodes (0-3)
>>> node 0 cpus: 0 1
>>> node 0 size: n MB
>>> node 0 free: n MB
>>> node 1 cpus: 2 3
>>> node 1 size: n MB
>>> node 1 free: n MB
>>> node 2 cpus: 4 5
>>> node 2 size: n MB
>>> node 2 free: n MB
>>> node 3 cpus:
>>> node 3 size: n MB
>>> node 3 free: n MB
>>> node distances:
>>> node   0   1   2   3
>>>     0:  10  20  20  40
>>>     1:  20  10  20  40
>>>     2:  20  20  10  40
>>>     3:  40  40  40  10
>>>
>>> The existing implementation gives below demotion targets,
>>>
>>> node    demotion_target
>>>    0              3
>>>    1              X
>>>    2              X
>>>    3              X
>>>
>>> With this patch applied, below are the demotion targets,
>>>
>>> node    demotion_target
>>>    0              3
>>>    1              3
>>>    2              3
>>>    3              X
>>
>> Sounds reasonable.
>>
>>>
>>> with the below NUMA topology, where nodes 0 & 2 are cpu + dram
>>> nodes and nodes 1 & 3 are slow memory nodes,
>>>
>>> available: 4 nodes (0-3)
>>> node 0 cpus: 0 1
>>> node 0 size: n MB
>>> node 0 free: n MB
>>> node 1 cpus:
>>> node 1 size: n MB
>>> node 1 free: n MB
>>> node 2 cpus: 2 3
>>> node 2 size: n MB
>>> node 2 free: n MB
>>> node 3 cpus:
>>> node 3 size: n MB
>>> node 3 free: n MB
>>> node distances:
>>> node   0   1   2   3
>>>     0:  10  40  20  80
>>>     1:  40  10  80  80
>>>     2:  20  80  10  40
>>>     3:  80  80  40  10
>>>
>>> The existing implementation gives below demotion targets,
>>>
>>> node    demotion_target
>>>    0              3
>>>    1              X
>>>    2              3
>>>    3              X
>>
>> If I understand correctly, this is not true. The demotion route should be
>> as below with the existing implementation:
>> node 0 ---> node 1
>> node 1 ---> X
>> node 2 ---> node 3
>> node 3 ---> X
>>
> It's a typo; it should be 0 -> 1. Will correct it in v2.
>>>
>>> With this patch applied, below are the demotion targets,
>>>
>>> node    demotion_target
>>>    0              1
>>>    1              3
>>>    2              3
>>>    3              X
>>>
>>> As can be seen above, node 3 can be a demotion target for node
>>> 1, but the existing implementation doesn't configure it that
>>> way. It is better to move pages from node 1 to node 3 than to
>>> move them from node 1 to swap.
>>
>> Which means node 3 is the slowest memory node?
>>
> Nodes 1 and 3 are equally slow, but node 1 is near node 0 and node 3 is near node 2. Basically, you can think of node 1 as a slow-memory logical node near node 0, and node 3 as a slow-memory logical node near node 2.

OK.


Thread overview: 19+ messages
2022-03-29 11:52 [PATCH] mm: migrate: set demotion targets differently Jagdish Gediya
2022-03-29 12:26 ` Baolin Wang
2022-03-29 14:04   ` Jagdish Gediya
2022-03-30  6:37     ` Baolin Wang [this message]
2022-03-30  6:54       ` Huang, Ying
2022-03-29 14:31 ` Dave Hansen
2022-03-29 16:46   ` Jagdish Gediya
2022-03-29 22:40     ` Dave Hansen
2022-03-30  6:46 ` Huang, Ying
2022-03-30 16:36   ` Jagdish Gediya
2022-03-31  0:27     ` Huang, Ying
2022-03-31 11:17     ` Jonathan Cameron
2022-03-30 17:17   ` Aneesh Kumar K.V
2022-03-31  0:32     ` Huang, Ying
2022-03-31  6:45       ` Aneesh Kumar K.V
2022-03-31  7:23         ` Huang, Ying
2022-03-31  8:27           ` Aneesh Kumar K.V
2022-03-31  8:58             ` Huang, Ying
2022-03-31  9:33               ` Baolin Wang
