From: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
To: Wei Xu <weixugc@google.com>,
"ying.huang@intel.com" <ying.huang@intel.com>
Cc: Jagdish Gediya <jvgediya@linux.ibm.com>,
Yang Shi <shy828301@gmail.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Linux MM <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Greg Thelen <gthelen@google.com>, Michal Hocko <mhocko@kernel.org>,
Brice Goglin <brice.goglin@gmail.com>
Subject: Re: [PATCH v2 0/5] mm: demotion: Introduce new node state N_DEMOTION_TARGETS
Date: Wed, 27 Apr 2022 10:36:10 +0530
Message-ID: <b1f58fd4-e23b-f617-b4a7-b80b1ffbe13f@linux.ibm.com>
In-Reply-To: <CAAPL-u94H9FLjVtYLhi_A2AqLTOCTMRh6=Sx9cX8A3WGNM-OdA@mail.gmail.com>
On 4/25/22 10:26 PM, Wei Xu wrote:
> On Sat, Apr 23, 2022 at 8:02 PM ying.huang@intel.com
> <ying.huang@intel.com> wrote:
>>
....
>> 2. For machines with PMEM installed in only 1 of 2 sockets, for example,
>>
>> Node 0 & 2 are cpu + dram nodes and node 1 is a slow
>> memory node near node 0,
>>
>> available: 3 nodes (0-2)
>> node 0 cpus: 0 1
>> node 0 size: n MB
>> node 0 free: n MB
>> node 1 cpus:
>> node 1 size: n MB
>> node 1 free: n MB
>> node 2 cpus: 2 3
>> node 2 size: n MB
>> node 2 free: n MB
>> node distances:
>> node   0   1   2
>>   0:  10  40  20
>>   1:  40  10  80
>>   2:  20  80  10
>>
>> We have 2 choices,
>>
>> a)
>> node    demotion targets
>> 0       1
>> 2       1
>>
>> b)
>> node    demotion targets
>> 0       1
>> 2       X
>>
>> a) is good to take advantage of PMEM. b) is good to reduce cross-socket
>> traffic. Both are OK as default configuration. But some users may
>> prefer the other one. So we need a user space ABI to override the
>> default configuration.
>
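To make the two choices above concrete, here is a minimal sketch (plain
Python, not kernel code) that derives both tables from the quoted
distance matrix. The helper name demotion_targets() and the
local_max_distance cutoff of 40 are assumptions made only for this
illustration; nothing in the patch series defines them.

```python
# Distance matrix copied from the numactl output quoted above.
DISTANCE = {
    0: {0: 10, 1: 40, 2: 20},
    1: {0: 40, 1: 10, 2: 80},
    2: {0: 20, 1: 80, 2: 10},
}
CPU_NODES = [0, 2]   # cpu + dram nodes
SLOW_NODES = [1]     # slow memory node

def demotion_targets(allow_cross_socket, local_max_distance=40):
    """Hypothetical helper: map each CPU node to its demotion target.

    With allow_cross_socket=False, a slow node farther away than
    local_max_distance (an assumed cutoff) is treated as cross-socket
    and the source node gets no target (None, the 'X' in choice b).
    """
    targets = {}
    for src in CPU_NODES:
        # Nearest slow node as seen from this source node.
        best = min(SLOW_NODES, key=lambda dst: DISTANCE[src][dst])
        if not allow_cross_socket and DISTANCE[src][best] > local_max_distance:
            targets[src] = None
        else:
            targets[src] = best
    return targets

choice_a = demotion_targets(allow_cross_socket=True)
choice_b = demotion_targets(allow_cross_socket=False)
```

Here choice_a corresponds to table a) and choice_b to table b); the
only difference is whether node 2 may demote across the socket to
node 1.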
> I think 2(a) should be the system-wide configuration and 2(b) can be
> achieved with NUMA mempolicy (which needs to be added to demotion).
>
> In general, we can view the demotion order in a way similar to
> allocation fallback order (after all, if we don't demote or demotion
> lags behind, the allocations will go to these demotion target nodes
> according to the allocation fallback order anyway). If we initialize
> the demotion order in that way (i.e. every node can demote to any node
> in the next tier, and the priority of the target nodes is sorted for
> each source node), we don't need per-node demotion order override from
> the userspace. What we need is to specify what nodes should be in
> each tier and support NUMA mempolicy in demotion.
>
I have been wondering how we would handle this. For example, if an
application has specified an MPOL_BIND policy that restricts
allocations to Node0 and Node1, should we demote pages allocated by
that application to Node10? The alternative to demotion is swapping, so
from the page's point of view we either demote to slow memory or page
out to swap. But if we demote, we also break the MPOL_BIND rule.

The above says we would need some kind of mempolicy interaction, but
what I am not sure about is how to find the memory policy in the
demotion path.
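
As a rough illustration of the interaction I have in mind, the sketch
below (plain Python, not kernel code) picks a demotion target only if
the policy allows it and otherwise falls back to swap. It assumes the
allowed node set of the owning task is already known, which is exactly
the part I do not know how to obtain in the demotion path. The function
name pick_demotion_target() and the example distance value are
hypothetical.

```python
def pick_demotion_target(src, next_tier_nodes, distance, bind_nodes=None):
    """Hypothetical helper: choose where to demote a page from node src.

    next_tier_nodes: candidate nodes in the next (slower) tier.
    distance:        distance[src][dst] lookup, as in the NUMA distance table.
    bind_nodes:      nodes allowed by an MPOL_BIND-style policy, or None
                     if the page is not restricted (assumed to be known).

    Returns the nearest allowed node, or None meaning "do not demote,
    page out to swap instead".
    """
    # Nearest candidates first, mirroring an allocation fallback order.
    for node in sorted(next_tier_nodes, key=lambda n: distance[src][n]):
        if bind_nodes is None or node in bind_nodes:
            return node
    return None

# Example from above: allocations bound to Node0 and Node1, while the
# only slow memory node is Node10 (the distance of 40 is made up).
distance = {0: {10: 40}}
bound = pick_demotion_target(0, [10], distance, bind_nodes={0, 1})
unbound = pick_demotion_target(0, [10], distance)
```

With the binding in place the result is None, so the page would go to
swap and the MPOL_BIND rule is kept; without a policy the page is
demoted to Node10.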
> Cross-socket demotion should not be too big a problem in practice
> because we can optimize the code to do the demotion from the local CPU
> node (i.e. local writes to the target node and remote read from the
> source node). The bigger issue is cross-socket memory access onto the
> demoted pages from the applications, which is why NUMA mempolicy is
> important here.
>
>
-aneesh