All of lore.kernel.org
 help / color / mirror / Atom feed
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Jagdish Gediya <jvgediya@linux.ibm.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
	dave.hansen@linux.intel.com, ying.huang@intel.com
Subject: Re: [PATCH] mm: migrate: set demotion targets differently
Date: Wed, 30 Mar 2022 14:37:35 +0800	[thread overview]
Message-ID: <784aee91-6a01-6e67-389e-1e1883796894@linux.alibaba.com> (raw)
In-Reply-To: <YkMR8OY779Bcri3I@li-6e1fa1cc-351b-11b2-a85c-b897023bb5f3.ibm.com>



On 3/29/2022 10:04 PM, Jagdish Gediya wrote:
> On Tue, Mar 29, 2022 at 08:26:05PM +0800, Baolin Wang wrote:
> Hi Baolin,
>> Hi Jagdish,
>>
>> On 3/29/2022 7:52 PM, Jagdish Gediya wrote:
>>> The current implementation to identify the demotion
>>> targets limits some of the opportunities to share
>>> the demotion targets between multiple source nodes.
>>>
>>> Implement a logic to identify the loop in the demotion
>>> targets such that all the possibilities of demotion can
>>> be utilized. Don't share the used targets between all
>>> the nodes, instead create the used targets from scratch
>>> for each individual node based on for what all node this
>>> node is a demotion target. This helps to share the demotion
>>> targets without missing any possible way of demotion.
>>>
>>> e.g. with below NUMA topology, where node 0 & 1 are
>>> cpu + dram nodes, node 2 & 3 are equally slower memory
>>> only nodes, and node 4 is slowest memory only node,
>>>
>>> available: 5 nodes (0-4)
>>> node 0 cpus: 0 1
>>> node 0 size: n MB
>>> node 0 free: n MB
>>> node 1 cpus: 2 3
>>> node 1 size: n MB
>>> node 1 free: n MB
>>> node 2 cpus:
>>> node 2 size: n MB
>>> node 2 free: n MB
>>> node 3 cpus:
>>> node 3 size: n MB
>>> node 3 free: n MB
>>> node 4 cpus:
>>> node 4 size: n MB
>>> node 4 free: n MB
>>> node distances:
>>> node   0   1   2   3   4
>>>     0:  10  20  40  40  80
>>>     1:  20  10  40  40  80
>>>     2:  40  40  10  40  80
>>>     3:  40  40  40  10  80
>>>     4:  80  80  80  80  10
>>>
>>> The existing implementation gives below demotion targets,
>>>
>>> node    demotion_target
>>>    0              3, 2
>>>    1              4
>>>    2              X
>>>    3              X
>>>    4		X
>>>
>>> With this patch applied, below are the demotion targets,
>>>
>>> node    demotion_target
>>>    0              3, 2
>>>    1              3, 2
>>>    2              3
>>>    3              4
>>>    4		X
>>
>> Node 2 and node 3 both are slow memory and have same distance, why node 2
>> should demote cold memory to node 3? They should have the same target
>> demotion node 4, which is the slowest memory node, right?
>>
> Current demotion target finding algorithm works based on best distance, as distance between node 2 & 3 is 40 and distance between node 2 & 4 is 80, node 2 demotes to node 3.

If node 2 can demote to node 3, which means node 3's memory is colder 
than node 2, right? The accessing time of node 3 should be larger than 
node 2, then we can demote colder memory to node 3 from node 2.

But node 2 and node 3 are same memory type and have same distance, the 
accessing time of node 2 and node 3 should be same too, so why add so 
many page migration between node 2 and node 3? I'm still not sure the 
benefits.

Huang Ying and Dave, how do you think about this demotion targets?

>>>
>>> e.g. with below NUMA topology, where node 0, 1 & 2 are
>>> cpu + dram nodes and node 3 is slow memory node,
>>>
>>> available: 4 nodes (0-3)
>>> node 0 cpus: 0 1
>>> node 0 size: n MB
>>> node 0 free: n MB
>>> node 1 cpus: 2 3
>>> node 1 size: n MB
>>> node 1 free: n MB
>>> node 2 cpus: 4 5
>>> node 2 size: n MB
>>> node 2 free: n MB
>>> node 3 cpus:
>>> node 3 size: n MB
>>> node 3 free: n MB
>>> node distances:
>>> node   0   1   2   3
>>>     0:  10  20  20  40
>>>     1:  20  10  20  40
>>>     2:  20  20  10  40
>>>     3:  40  40  40  10
>>>
>>> The existing implementation gives below demotion targets,
>>>
>>> node    demotion_target
>>>    0              3
>>>    1              X
>>>    2              X
>>>    3              X
>>>
>>> With this patch applied, below are the demotion targets,
>>>
>>> node    demotion_target
>>>    0              3
>>>    1              3
>>>    2              3
>>>    3              X
>>
>> Sounds reasonable.
>>
>>>
>>> with below NUMA topology, where node 0 & 2 are cpu + dram
>>> nodes and node 1 & 3 are slow memory nodes,
>>>
>>> available: 4 nodes (0-3)
>>> node 0 cpus: 0 1
>>> node 0 size: n MB
>>> node 0 free: n MB
>>> node 1 cpus:
>>> node 1 size: n MB
>>> node 1 free: n MB
>>> node 2 cpus: 2 3
>>> node 2 size: n MB
>>> node 2 free: n MB
>>> node 3 cpus:
>>> node 3 size: n MB
>>> node 3 free: n MB
>>> node distances:
>>> node   0   1   2   3
>>>     0:  10  40  20  80
>>>     1:  40  10  80  80
>>>     2:  20  80  10  40
>>>     3:  80  80  40  10
>>>
>>> The existing implementation gives below demotion targets,
>>>
>>> node    demotion_target
>>>    0              3
>>>    1              X
>>>    2              3
>>>    3              X
>>
>> If I understand correctly, this is not true. The demotion route should be as
>> below with existing implementation:
>> node 0 ---> node 1
>> node 1 ---> X
>> node 2 ---> node 3
>> node 3 ---> X
>>
> Its typo, It should be 0 -> 1, Will correct it in v2.
>>>
>>> With this patch applied, below are the demotion targets,
>>>
>>> node    demotion_target
>>>    0              1
>>>    1              3
>>>    2              3
>>>    3              X
>>>
>>> As it can be seen above, node 3 can be demotion target for node
>>> 1 but existing implementation doesn't configure it that way. It
>>> is better to move pages from node 1 to node 3 instead of moving
>>> it from node 1 to swap.
>>
>> Which means node 3 is the slowest memory node?
>>
> Node 1 and 3 are equally slower but 1 is near to 0 and 3 is near to 2. Basically you can think of it like node 1 is slow memory logical node near to node 0 and node 3 is slow memory logical node near to node 2.

OK.

  reply	other threads:[~2022-03-30  6:37 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-29 11:52 [PATCH] mm: migrate: set demotion targets differently Jagdish Gediya
2022-03-29 12:26 ` Baolin Wang
2022-03-29 14:04   ` Jagdish Gediya
2022-03-30  6:37     ` Baolin Wang [this message]
2022-03-30  6:54       ` Huang, Ying
2022-03-29 14:31 ` Dave Hansen
2022-03-29 16:46   ` Jagdish Gediya
2022-03-29 22:40     ` Dave Hansen
2022-03-30  6:46 ` Huang, Ying
2022-03-30 16:36   ` Jagdish Gediya
2022-03-31  0:27     ` Huang, Ying
2022-03-31 11:17     ` Jonathan Cameron
2022-03-30 17:17   ` Aneesh Kumar K.V
2022-03-31  0:32     ` Huang, Ying
2022-03-31  6:45       ` Aneesh Kumar K.V
2022-03-31  7:23         ` Huang, Ying
2022-03-31  8:27           ` Aneesh Kumar K.V
2022-03-31  8:58             ` Huang, Ying
2022-03-31  9:33               ` Baolin Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=784aee91-6a01-6e67-389e-1e1883796894@linux.alibaba.com \
    --to=baolin.wang@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=jvgediya@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.