From: "Huang, Ying" <ying.huang@intel.com>
To: Jagdish Gediya <jvgediya@linux.ibm.com>
Cc: linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org,  aneesh.kumar@linux.ibm.com,
	baolin.wang@linux.alibaba.com,  dave.hansen@linux.intel.com
Subject: Re: [PATCH] mm: migrate: set demotion targets differently
Date: Wed, 30 Mar 2022 14:46:51 +0800	[thread overview]
Message-ID: <87pmm4c4ys.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <20220329115222.8923-1-jvgediya@linux.ibm.com> (Jagdish Gediya's message of "Tue, 29 Mar 2022 17:22:22 +0530")

Hi, Jagdish,

Jagdish Gediya <jvgediya@linux.ibm.com> writes:

> The current implementation for identifying the demotion
> targets limits some of the opportunities to share the
> demotion targets between multiple source nodes.

Yes.  It sounds reasonable to share demotion targets among multiple
source nodes.

One question: are the example machines below real hardware now or
expected in the near future?  Or do you just think they are possible?

And before going into the implementation details, I think we can
discuss the perfect demotion order first.

> Implement logic to identify loops in the demotion
> targets so that all the possibilities of demotion can
> be utilized. Don't share one set of used targets between
> all the nodes; instead, build the used-target set from
> scratch for each individual node, based on which nodes
> this node is a demotion target for. This helps to share
> the demotion targets without missing any possible way
> of demotion.
>
> e.g. with the below NUMA topology, where node 0 & 1 are
> cpu + dram nodes, node 2 & 3 are equally slower memory
> only nodes, and node 4 is the slowest memory only node,
>
> available: 5 nodes (0-4)
> node 0 cpus: 0 1
> node 0 size: n MB
> node 0 free: n MB
> node 1 cpus: 2 3
> node 1 size: n MB
> node 1 free: n MB
> node 2 cpus:
> node 2 size: n MB
> node 2 free: n MB
> node 3 cpus:
> node 3 size: n MB
> node 3 free: n MB
> node 4 cpus:
> node 4 size: n MB
> node 4 free: n MB
> node distances:
> node   0   1   2   3   4
>   0:  10  20  40  40  80
>   1:  20  10  40  40  80
>   2:  40  40  10  40  80
>   3:  40  40  40  10  80
>   4:  80  80  80  80  10
>
> The existing implementation gives the following demotion targets,
>
> node    demotion_target
>  0              3, 2
>  1              4
>  2              X
>  3              X
>  4              X
>
> With this patch applied, below are the demotion targets,
>
> node    demotion_target
>  0              3, 2
>  1              3, 2
>  2              3
>  3              4
>  4              X

For such a machine, I think the perfect demotion order is,

node    demotion_target
 0              2, 3
 1              2, 3
 2              4
 3              4
 4              X
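
(Reading this off the distance table: nodes 2 and 3 are both at
distance 40 from nodes 0 and 1, so the two cpu nodes can share them as
a first demotion tier; node 4 is at distance 80 from every other node,
so it is only worth using as the tier below nodes 2 and 3.)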

> e.g. with the below NUMA topology, where node 0, 1 & 2 are
> cpu + dram nodes and node 3 is a slow memory node,
>
> available: 4 nodes (0-3)
> node 0 cpus: 0 1
> node 0 size: n MB
> node 0 free: n MB
> node 1 cpus: 2 3
> node 1 size: n MB
> node 1 free: n MB
> node 2 cpus: 4 5
> node 2 size: n MB
> node 2 free: n MB
> node 3 cpus:
> node 3 size: n MB
> node 3 free: n MB
> node distances:
> node   0   1   2   3
>   0:  10  20  20  40
>   1:  20  10  20  40
>   2:  20  20  10  40
>   3:  40  40  40  10
>
> The existing implementation gives the following demotion targets,
>
> node    demotion_target
>  0              3
>  1              X
>  2              X
>  3              X
>
> With this patch applied, below are the demotion targets,
>
> node    demotion_target
>  0              3
>  1              3
>  2              3
>  3              X

I think this is perfect already.

> with the below NUMA topology, where node 0 & 2 are cpu + dram
> nodes and node 1 & 3 are slow memory nodes,
>
> available: 4 nodes (0-3)
> node 0 cpus: 0 1
> node 0 size: n MB
> node 0 free: n MB
> node 1 cpus:
> node 1 size: n MB
> node 1 free: n MB
> node 2 cpus: 2 3
> node 2 size: n MB
> node 2 free: n MB
> node 3 cpus:
> node 3 size: n MB
> node 3 free: n MB
> node distances:
> node   0   1   2   3
>   0:  10  40  20  80
>   1:  40  10  80  80
>   2:  20  80  10  40
>   3:  80  80  40  10
>
> The existing implementation gives the following demotion targets,
>
> node    demotion_target
>  0              3
>  1              X
>  2              3
>  3              X

It should be as below, as you said in another email in the thread.

node    demotion_target
 0              1
 1              X
 2              3
 3              X

> With this patch applied, below are the demotion targets,
>
> node    demotion_target
>  0              1
>  1              3
>  2              3
>  3              X

The original demotion order looks better to me.  Nodes 1 and 3 are at
the same level from the perspective of the whole system: both are
terminal slow-memory nodes, so demoting from node 1 to node 3 doesn't
move pages to a slower tier, only to more distant memory (distance 80).

Another example: node 0 & 2 are cpu + dram nodes and node 1 is a slow
memory node near node 0,

available: 3 nodes (0-2)
node 0 cpus: 0 1
node 0 size: n MB
node 0 free: n MB
node 1 cpus:
node 1 size: n MB
node 1 free: n MB
node 2 cpus: 2 3
node 2 size: n MB
node 2 free: n MB
node distances:
node   0   1   2
  0:  10  40  20
  1:  40  10  80
  2:  20  80  10


Demotion order 1:

node    demotion_target
 0              1
 1              X
 2              X

Demotion order 2:

node    demotion_target
 0              1
 1              X
 2              1

Demotion order 2 looks better to me.  But I think that demotion order 1
makes some sense too (in the spirit of node reclaim mode, which prefers
to keep memory local to the node).

It seems that, if a demotion target is at the same distance from
several current demotion sources, the target should be shared among
those sources.
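
To make that concrete, here is a minimal user-space sketch of one
heuristic along these lines (it is not the kernel implementation;
dist[][], has_cpu[] and the other names are just made up from the
first example topology above).  Starting from the cpu + dram nodes,
each round gives every current source all not-yet-used memory-only
nodes at its nearest distance, so candidates seen at the same distance
are shared, and the chosen targets become the sources of the next
round.

/*
 * Minimal user-space sketch of the heuristic above -- not the kernel
 * implementation.  dist[][] and has_cpu[] are copied from the first
 * example topology in the patch description; everything else is made
 * up for illustration.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_NODES	5

static const int dist[NR_NODES][NR_NODES] = {
	{ 10, 20, 40, 40, 80 },
	{ 20, 10, 40, 40, 80 },
	{ 40, 40, 10, 40, 80 },
	{ 40, 40, 40, 10, 80 },
	{ 80, 80, 80, 80, 10 },
};
static const bool has_cpu[NR_NODES] = { true, true, false, false, false };

int main(void)
{
	bool source[NR_NODES];	/* nodes that demote in the current round */
	bool unused[NR_NODES];	/* memory-only nodes not yet used as targets */
	int n, s, t;

	for (n = 0; n < NR_NODES; n++) {
		source[n] = has_cpu[n];		/* round 0: cpu + dram nodes */
		unused[n] = !has_cpu[n];	/* candidates: memory-only nodes */
	}

	for (;;) {
		bool next[NR_NODES] = { false };
		bool progress = false;

		for (s = 0; s < NR_NODES; s++) {
			int min = -1;

			if (!source[s])
				continue;

			/* nearest distance from s to any unused candidate */
			for (t = 0; t < NR_NODES; t++)
				if (unused[t] && (min < 0 || dist[s][t] < min))
					min = dist[s][t];

			if (min < 0) {
				printf("node %d -> X\n", s);
				continue;
			}

			/*
			 * All unused candidates at that nearest distance
			 * become targets, so sources that see a candidate
			 * at the same distance share it.
			 */
			printf("node %d ->", s);
			for (t = 0; t < NR_NODES; t++) {
				if (unused[t] && dist[s][t] == min) {
					printf(" %d", t);
					next[t] = true;
					progress = true;
				}
			}
			printf("\n");
		}

		if (!progress)
			break;

		/* this round's targets become the next round's sources */
		for (n = 0; n < NR_NODES; n++) {
			source[n] = next[n];
			if (next[n])
				unused[n] = false;
		}
	}

	return 0;
}

For the topology above this prints 0 -> 2 3, 1 -> 2 3, 2 -> 4, 3 -> 4
and 4 -> X.  Plugging in the distance tables of the other examples
gives the orders preferred earlier in this mail, including demotion
order 2 for the last machine.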

And as Dave pointed out, we may eventually need a mechanism to override
the automatically generated default demotion order.  So the kernel can
just use some simple policy that makes sense in most cases, and leave
the best demotion order for users to define via some customization
mechanism.

> As can be seen above, node 3 can be a demotion target for node
> 1, but the existing implementation doesn't configure it that
> way. It is better to move pages from node 1 to node 3 than to
> move them from node 1 to swap.
>
> Signed-off-by: Jagdish Gediya <jvgediya@linux.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Best Regards,
Huang, Ying

[snip]

