linux-mm.kvack.org archive mirror
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	Ben Widawsky <ben.widawsky@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Feng Tang <feng.tang@intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Dan Williams <dan.j.williams@intel.com>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy
Date: Thu, 14 Oct 2021 21:20:51 +0530
Message-ID: <249414f6-1bb7-b76c-5b5b-2b3ed8937d7b@linux.ibm.com>
In-Reply-To: <YWhFFOtyVQ8Mespc@dhcp22.suse.cz>

On 10/14/21 20:26, Michal Hocko wrote:
> On Thu 14-10-21 18:59:14, Aneesh Kumar K.V wrote:
>> On 10/14/21 17:11, Michal Hocko wrote:
>>> On Thu 14-10-21 15:58:29, Aneesh Kumar K.V wrote:
>>>> On 10/14/21 15:08, Michal Hocko wrote:
>>> [...]
>>>>> Besides that it would be really great to finish the discussion about the
>>>>> usecase before suggesting a new userspace API.
>>>>>
>>>>
>>>> An application would like to hint a preferred node for allocating the
>>>> memory backing a VA range and, at the same time, avoid fallback to some
>>>> set of nodes (in the use case I am interested in, don't fall back to slow
>>>> memory nodes).
>>>
>>> We do have means for that, right? You can set your memory policy and
>>> then set the CPU affinity to the node you want to allocate from
>>> initially. You can migrate to a different CPU/node if this is not the
>>> preferred affinity. Why is that not usable?
>>
>> For the same reason you mentioned earlier: these nodes can be CPU-less
>> nodes.
> 
> It would have been easier if you were explicit about the usecase rather
> than letting others guess.
> 
>>> Also think about extensibility. Say I want to allocate from a set of
>>> nodes first before falling back to the rest of the nodemask? If you want
>>> to add a new API then think of other potential usecases.
>>>
>>
>> Describing the specific allocation details becomes hard when the preferred
>> node is a nodemask. With the below interface
>>
>> SYSCALL_DEFINE5(preferred_mbind, unsigned long, start, unsigned long, len,
>> 		const unsigned long __user *, preferred_nmask,
>> 		const unsigned long __user *, fallback_nmask,
>> 		unsigned long, maxnode)
>> {
>>
>>
>> 1. The preferred node is the first node in the preferred node mask.
>> 2. Then we try to allocate from the nodes in the preferred node mask
>> that are closer to the first node in the preferred node mask.
>> 3. If the above fails, we try to allocate from the nodes in the fallback
>> node mask that are closer to the first node in the preferred node mask.
>>
>> Isn't that too complicated? Do we have a real usecase for that?
> 
> No, I think this is a suboptimal interface. AFAIU you really want to
> define a "home" node(s) rather than any policy. Home node would
> effectively override the default local node whatever policy you have as
> it makes sense whether you have MPOL_PREFERRED_MANY or MPOL_BIND.
> 


Yes. I described it as below in an earlier email:

"We could do
set_mempolicy(MPOL_PREFERRED, nodemask(nodeX))
set_mempolicy(MPOL_PREFERRED_EXTEND, nodemask(fallback nodemask for
above PREFERRED policy)) "

But I agree that restricting this to a virtual address range is much
better. Now I am wondering whether a nodemask is any better than a
node id. The concept of multiple home nodes is confusing compared to a
single home node. What would multiple nodes mean in a home-node concept?

Should we do

SYSCALL_DEFINE4(home_node_mbind, unsigned long, start, unsigned long, len,
		unsigned long, home_node, unsigned long, flags)


The flags argument is kept for future extension, if any.
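
A minimal userspace sketch of how the arguments of such a call could be
validated, so that the flags argument stays usable for later extension.
This is only a model under stated assumptions, not kernel code: the function
name home_node_mbind_check, the 4096-byte page size and the 64-node limit
are all hypothetical. Rejecting any non-zero flags value is a common way to
keep unused bits available for future meaning.

```c
#include <errno.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */
#define SKETCH_MAX_NODES 64UL   /* assumption: at most 64 NUMA nodes */

/*
 * Hypothetical model of the argument checks for the proposed
 * home_node_mbind(start, len, home_node, flags) call.
 * Returns 0 if the arguments are acceptable, -EINVAL otherwise.
 */
static int home_node_mbind_check(unsigned long start, unsigned long len,
                                 unsigned long home_node, unsigned long flags)
{
    /* No flag bits are defined yet, so any set bit is rejected. */
    if (flags != 0)
        return -EINVAL;

    /* The range is assumed to start on a page boundary. */
    if (start & (SKETCH_PAGE_SIZE - 1))
        return -EINVAL;

    /* The home node is a single node id, not a mask. */
    if (home_node >= SKETCH_MAX_NODES)
        return -EINVAL;

    /* Reject ranges whose end wraps around the address space. */
    if (start + len < start)
        return -EINVAL;

    return 0;
}
```

With these assumptions, home_node_mbind_check(0x10000, 0x2000, 1, 0)
is accepted, while the same call with flags set to 1 is rejected.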


I guess this home node will only apply to the MPOL_BIND and
MPOL_PREFERRED_MANY policies for now?

> Another potential interface would be set_nodeorder which would
> explicitly set the allocation fallback ordering. Again agnostic of the
> underlying memory policy. This would be more generic but the question is
> whether this is not too generic and whether there are usecases for that.
> 

I would suggest we wait for applications that really want a fallback order
other than a distance-based one before adding this. A distance-based
fallback order from a preferred node is well understood from an
application's point of view.
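
To illustrate what a distance-based fallback order from a preferred (home)
node means, here is a small userspace sketch. It is not how the kernel
builds its zonelists; the 4-node distance table, the names sketch_distance
and fallback_order, and the tie-break by lower node id are assumptions made
for the example. The table uses the usual convention that 10 means local.

```c
#include <stddef.h>

#define SKETCH_NR_NODES 4

/* Hypothetical node distance table; node 3 stands in for a slow node. */
static const int sketch_distance[SKETCH_NR_NODES][SKETCH_NR_NODES] = {
    { 10, 20, 40, 80 },
    { 20, 10, 40, 80 },
    { 40, 40, 10, 80 },
    { 80, 80, 80, 10 },
};

/*
 * Fill order[] with the nodes set in allowed_mask, nearest to the home
 * node first. Nodes at equal distance keep ascending node-id order.
 * The caller must pass home < SKETCH_NR_NODES.
 * Returns the number of entries written to order[].
 */
static size_t fallback_order(int home, unsigned long allowed_mask,
                             int order[SKETCH_NR_NODES])
{
    size_t n = 0;

    for (int node = 0; node < SKETCH_NR_NODES; node++) {
        size_t pos = n;

        if (!(allowed_mask & (1UL << node)))
            continue;

        /* Insertion sort keyed on distance from the home node. */
        while (pos > 0 &&
               sketch_distance[home][order[pos - 1]] >
               sketch_distance[home][node]) {
            order[pos] = order[pos - 1];
            pos--;
        }
        order[pos] = node;
        n++;
    }
    return n;
}
```

With home node 1 and an allowed mask covering nodes 0, 2 and 3, the
resulting order is 0, 2, 3. Leaving the slow node 3 out of the mask is what
keeps allocations from falling back to it.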

-aneesh


