From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>,
linux-mm@kvack.org, akpm@linux-foundation.org,
Ben Widawsky <ben.widawsky@intel.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Feng Tang <feng.tang@intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Mel Gorman <mgorman@techsingularity.net>,
Mike Kravetz <mike.kravetz@oracle.com>,
Randy Dunlap <rdunlap@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>,
Dan Williams <dan.j.williams@intel.com>,
Huang Ying <ying.huang@intel.com>
Subject: Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy
Date: Thu, 14 Oct 2021 21:20:51 +0530
Message-ID: <249414f6-1bb7-b76c-5b5b-2b3ed8937d7b@linux.ibm.com>
In-Reply-To: <YWhFFOtyVQ8Mespc@dhcp22.suse.cz>
On 10/14/21 20:26, Michal Hocko wrote:
> On Thu 14-10-21 18:59:14, Aneesh Kumar K.V wrote:
>> On 10/14/21 17:11, Michal Hocko wrote:
>>> On Thu 14-10-21 15:58:29, Aneesh Kumar K.V wrote:
>>>> On 10/14/21 15:08, Michal Hocko wrote:
>>> [...]
>>>>> Besides that it would be really great to finish the discussion about the
>>>>> usecase before suggesting a new userspace API.
>>>>>
>>>>
>>>> An application would like to hint a preferred node for allocating memory
>>>> backing a VA range and, at the same time, avoid falling back to some set
>>>> of nodes (in the use case I am interested in, it should not fall back to
>>>> slow memory nodes).
>>>
>>> We do have means for that, right? You can set your memory policy and
>>> then set the cpu affinity to the node you want to allocate from
>>> initially. You can migrate to a different cpu/node if this is not the
>>> preferred affinity. Why is that not usable?
>>
>> For the same reason you mentioned earlier, these nodes can be CPU-less
>> nodes.
>
> It would have been easier if you were explicit about the usecase rather
> than letting others guess.
>
>>> Also think about extensibility. Say I want to allocate from a set of
>>> nodes first before falling back to the rest of the nodemask? If you want
>>> to add a new API then think of other potential usecases.
>>>
>>
>> Describing the specific allocation details becomes hard when the preferred
>> node is a nodemask. With the interface below:
>>
>> SYSCALL_DEFINE5(preferred_mbind, unsigned long, start, unsigned long, len,
>> 		const unsigned long __user *, preferred_nmask,
>> 		const unsigned long __user *, fallback_nmask,
>> 		unsigned long, maxnode)
>>
>> 1. The preferred node is the first node in the preferred node mask.
>> 2. Then we try to allocate from the nodes in the preferred node mask,
>> closest to the first node in the preferred node mask first.
>> 3. If that fails, we try to allocate from the nodes in the fallback node
>> mask, again closest to the first node in the preferred node mask first.
>>
>> Isn't that too complicated? Do we have a real usecase for that?
>
> No, I think this is a suboptimal interface. AFAIU you really want to
> define "home" node(s) rather than any policy. A home node would
> effectively override the default local node, whatever policy you have,
> as it makes sense for both MPOL_PREFERRED_MANY and MPOL_BIND.
>
Yes. I described it as below in an earlier email:
"We could do
set_mempolicy(MPOL_PREFERRED, nodemask(nodeX))
set_mempolicy(MPOL_PREFERRED_EXTEND, nodemask(fallback nodemask for
the above PREFERRED policy))"
But I agree that restricting this to a virtual address range is much
better. Now I am wondering whether a nodemask is any better than a
single node ID. Multiple home nodes are a confusing concept compared to
a single home node: what would it mean to have several home nodes?
Should we do:
SYSCALL_DEFINE4(home_node_mbind, unsigned long, start, unsigned long, len,
		unsigned long, home_node, unsigned long, flags)
The flags argument is kept for future extensions, if any.
I guess this home node would only apply to the MPOL_BIND and
MPOL_PREFERRED_MANY policies for now?
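A minimal sketch of what a single home node could mean, assuming the kernel would simply try the nodes of the MPOL_BIND or MPOL_PREFERRED_MANY nodemask in order of distance from the home node. NR_NODES, the distance table and home_node_order() are hypothetical illustrations, not existing kernel code:

```c
#define NR_NODES 4

/*
 * Hypothetical SLIT-style distance table: nodes 0 and 1 are DRAM nodes,
 * nodes 2 and 3 are CPU-less slow-memory nodes. Values are made up.
 */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 40, 80 },
	{ 20, 10, 80, 40 },
	{ 40, 80, 10, 20 },
	{ 80, 40, 20, 10 },
};

/*
 * Fill 'order' with the nodes set in the policy 'mask' (bit n == node n),
 * sorted by distance from 'home'; ties keep the lower node id first.
 * The home node need not be in the mask: the policy mask still decides
 * which nodes are allowed, the home node only decides the order.
 * Returns the number of nodes written.
 */
static int home_node_order(int home, unsigned long mask, int order[NR_NODES])
{
	int n = 0;

	for (int node = 0; node < NR_NODES; node++) {
		if (!(mask & (1UL << node)))
			continue;
		/* insertion sort by distance from the home node */
		int i = n++;
		while (i > 0 && node_distance[home][order[i - 1]] >
				node_distance[home][node]) {
			order[i] = order[i - 1];
			i--;
		}
		order[i] = node;
	}
	return n;
}
```

For example, with home node 3 and a bind mask of nodes {0, 2, 3}, the sketch yields the order 3, 2, 0; with home node 1 and a mask of {2, 3} it yields 3, 2.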
> Another potential interface would be set_nodeorder which would
> explicitly set the allocation fallback ordering. Again agnostic of the
> underlying memory policy. This would be more generic but the question is
> whether this is not too generic and whether there are usecases for that.
>
I would suggest we wait until applications actually need a fallback
order other than the distance-based one before adding this. A
distance-based fallback order from a preferred node is well understood
from an application's point of view.
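For comparison, the three-step preferred_mbind() ordering quoted earlier can be sketched as two distance-sorted tiers. Taking the lowest set bit as "the first node" of the preferred mask, and not repeating fallback nodes that are already in the preferred mask, are assumptions; the distance table and helper names are hypothetical:

```c
#define NR_NODES 4

/* Hypothetical distances: nodes 0-1 DRAM, nodes 2-3 CPU-less slow memory. */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 40, 80 },
	{ 20, 10, 80, 40 },
	{ 40, 80, 10, 20 },
	{ 80, 40, 20, 10 },
};

/*
 * Append the nodes set in 'mask' to order[n...], sorted by distance
 * from 'origin' (ties keep the lower node id first).
 * Returns the new number of entries in 'order'.
 */
static int append_by_distance(int origin, unsigned long mask,
			      int order[NR_NODES], int n)
{
	int start = n;

	for (int node = 0; node < NR_NODES; node++) {
		if (!(mask & (1UL << node)))
			continue;
		int i = n++;
		while (i > start && node_distance[origin][order[i - 1]] >
				    node_distance[origin][node]) {
			order[i] = order[i - 1];
			i--;
		}
		order[i] = node;
	}
	return n;
}

static int preferred_fallback_order(unsigned long preferred,
				    unsigned long fallback, int order[NR_NODES])
{
	int first = -1, n = 0;

	/* Assumption: "first node" is the lowest-numbered set bit. */
	for (int node = 0; node < NR_NODES; node++) {
		if (preferred & (1UL << node)) {
			first = node;
			break;
		}
	}
	if (first < 0)
		return 0;

	/* step 1: the first preferred node */
	order[n++] = first;
	/* step 2: the rest of the preferred mask, closest to 'first' first */
	n = append_by_distance(first, preferred & ~(1UL << first), order, n);
	/* step 3: the fallback mask (minus preferred nodes), same ordering */
	n = append_by_distance(first, fallback & ~preferred, order, n);
	return n;
}
```

With preferred nodes {1, 3} and fallback nodes {0, 2}, the order is 1, 3, 0, 2: node 3 comes before node 0 even though node 0 is closer to node 1, which is the extra complexity compared with a plain distance-based order.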
-aneesh