[-V6,RESEND,2/3] NOT kernel/man-pages: man2/set_mempolicy.2: Add mode flag MPOL_F_NUMA_BALANCING
diff mbox series

Message ID 20201202084234.15797-3-ying.huang@intel.com
State New, archived
Headers show
Series
  • numa balancing: Migrate on fault among multiple bound nodes
Related show

Commit Message

Huang\, Ying Dec. 2, 2020, 8:42 a.m. UTC
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
---
 man2/set_mempolicy.2 | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Mel Gorman Dec. 2, 2020, 11:43 a.m. UTC | #1
On Wed, Dec 02, 2020 at 04:42:33PM +0800, Huang Ying wrote:
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> ---
>  man2/set_mempolicy.2 | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
> index 68011eecb..3754b3e12 100644
> --- a/man2/set_mempolicy.2
> +++ b/man2/set_mempolicy.2
> @@ -113,6 +113,12 @@ A nonempty
>  .I nodemask
>  specifies node IDs that are relative to the set of
>  node IDs allowed by the process's current cpuset.
> +.TP
> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)"
> +Enable the Linux kernel NUMA balancing for the task if it is supported
> +by kernel.
> +If the flag isn't supported by Linux kernel, return -1 and errno is
> +set to EINVAL.
>  .PP
>  .I nodemask
>  points to a bit mask of node IDs that contains up to
> @@ -293,6 +299,9 @@ argument specified both

Should this be expanded more to clarify it applies to MPOL_BIND
specifically?

Maybe the first patch should be expanded more and explictly fail if
MPOL_F_NUMA_BALANCING is used with anything other than MPOL_BIND?

>  .B MPOL_F_STATIC_NODES
>  and
>  .BR MPOL_F_RELATIVE_NODES .
> +Or, the
> +.B MPOL_F_NUMA_BALANCING
> +isn't supported by the Linux kernel.

This will be difficult for an app to distinguish but we can't go back in
time and make this ENOSYS :(

The linux-api people might have more guidance but it may go to the
extent of including a small test program in the manual page for a
sequence that tests whether MPOL_F_NUMA_BALANCING works. They might have
a better recommendation on how it should be handled.
Alejandro Colomar (mailing lists; readonly) Dec. 2, 2020, 12:33 p.m. UTC | #2
Hi Huang Ying,

Please see a few fixes below.

Michael, as always, some question for you too ;)

Thanks,

Alex

On 12/2/20 9:42 AM, Huang Ying wrote:
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> ---
>  man2/set_mempolicy.2 | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
> index 68011eecb..3754b3e12 100644
> --- a/man2/set_mempolicy.2
> +++ b/man2/set_mempolicy.2
> @@ -113,6 +113,12 @@ A nonempty
>  .I nodemask
>  specifies node IDs that are relative to the set of
>  node IDs allowed by the process's current cpuset.
> +.TP
> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)"

I'd prefer it to be in alphabetical order (rather than just adding at
the bottom).

That way, when lists grow, it's easier to find things.

> +Enable the Linux kernel NUMA balancing for the task if it is supported
> +by kernel.

I'd s/Linux kernel/kernel/ when it doesn't specifically refer to the
Linux kernel to differentiate it from other kernels.  It only adds noise
(IMHO).  mtk?

wfix:

... supported by _the_ kernel.

> +If the flag isn't supported by Linux kernel, return -1 and errno is

wfix:

If the flag isn't supported by _the_ kernel, ...

> +set to EINVAL.

errno and EINVAL should use .I and .B respectively

>  .PP
>  .I nodemask
>  points to a bit mask of node IDs that contains up to
> @@ -293,6 +299,9 @@ argument specified both
>  .B MPOL_F_STATIC_NODES
>  and
>  .BR MPOL_F_RELATIVE_NODES .
> +Or, the
> +.B MPOL_F_NUMA_BALANCING
> +isn't supported by the Linux kernel.
>  .TP
>  .B ENOMEM
>  Insufficient kernel memory was available.
>
Huang\, Ying Dec. 3, 2020, 1:49 a.m. UTC | #3
Mel Gorman <mgorman@suse.de> writes:

> On Wed, Dec 02, 2020 at 04:42:33PM +0800, Huang Ying wrote:
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> ---
>>  man2/set_mempolicy.2 | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>> 
>> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
>> index 68011eecb..3754b3e12 100644
>> --- a/man2/set_mempolicy.2
>> +++ b/man2/set_mempolicy.2
>> @@ -113,6 +113,12 @@ A nonempty
>>  .I nodemask
>>  specifies node IDs that are relative to the set of
>>  node IDs allowed by the process's current cpuset.
>> +.TP
>> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)"
>> +Enable the Linux kernel NUMA balancing for the task if it is supported
>> +by kernel.
>> +If the flag isn't supported by Linux kernel, return -1 and errno is
>> +set to EINVAL.
>>  .PP
>>  .I nodemask
>>  points to a bit mask of node IDs that contains up to
>> @@ -293,6 +299,9 @@ argument specified both
>
> Should this be expanded more to clarify it applies to MPOL_BIND
> specifically?
>
> Maybe the first patch should be expanded more and explictly fail if
> MPOL_F_NUMA_BALANCING is used with anything other than MPOL_BIND?

For MPOL_PREFERRED, why could we not use NUMA balancing to migrate pages
to the accessing local node if it is same as the preferred node?  We
have a way to turn off NUMA balancing already, why could we not provide
a way to enable it if that's intended?

Even for MPOL_INTERLEAVE, if the target node is the same as the
accessing local node, can we use NUMA balancing to migrate pages?

So, I prefer to make MPOL_F_NUMA_BALANCING to be

  Optimizing with NUMA balancing if possible, and we may add more
  optimization in the future.

Do you agree?

Best Regards,
Huang, Ying

>>  .B MPOL_F_STATIC_NODES
>>  and
>>  .BR MPOL_F_RELATIVE_NODES .
>> +Or, the
>> +.B MPOL_F_NUMA_BALANCING
>> +isn't supported by the Linux kernel.
>
> This will be difficult for an app to distinguish but we can't go back in
> time and make this ENOSYS :(
>
> The linux-api people might have more guidance but it may go to the
> extent of including a small test program in the manual page for a
> sequence that tests whether MPOL_F_NUMA_BALANCING works. They might have
> a better recommendation on how it should be handled.
Mel Gorman Dec. 3, 2020, 9:37 a.m. UTC | #4
On Thu, Dec 03, 2020 at 09:49:02AM +0800, Huang, Ying wrote:
> >> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
> >> index 68011eecb..3754b3e12 100644
> >> --- a/man2/set_mempolicy.2
> >> +++ b/man2/set_mempolicy.2
> >> @@ -113,6 +113,12 @@ A nonempty
> >>  .I nodemask
> >>  specifies node IDs that are relative to the set of
> >>  node IDs allowed by the process's current cpuset.
> >> +.TP
> >> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)"
> >> +Enable the Linux kernel NUMA balancing for the task if it is supported
> >> +by kernel.
> >> +If the flag isn't supported by Linux kernel, return -1 and errno is
> >> +set to EINVAL.
> >>  .PP
> >>  .I nodemask
> >>  points to a bit mask of node IDs that contains up to
> >> @@ -293,6 +299,9 @@ argument specified both
> >
> > Should this be expanded more to clarify it applies to MPOL_BIND
> > specifically?
> >
> > Maybe the first patch should be expanded more and explictly fail if
> > MPOL_F_NUMA_BALANCING is used with anything other than MPOL_BIND?
> 
> For MPOL_PREFERRED, why could we not use NUMA balancing to migrate pages
> to the accessing local node if it is same as the preferred node? 

You could but the kernel patch does not do that by making preferred_nid
stick to the preferred node when hinting faults are trapped on that VMA.
It would have to be a separate patch coupled with a man page update. If
you wanted to go in this direction in the future, then the patch should
explicitly return an error *now* if MPOL_PREFERRED is or'd with
MPOL_F_NUMA_BALANCING so that an application becomes aware of
MPOL_F_NUMA_BALANCING then it can detect if support exists in the
current running kernel.

> Even for MPOL_INTERLEAVE, if the target node is the same as the
> accessing local node, can we use NUMA balancing to migrate pages?
> 

The intent of MPOL_INTERLEAVE is to average the costs of the memory
access so the average cost across the VMA is roughly similar across the
entire range. This may be particularly important if the VMA is shared
between multiple threads that are spread out on multiple nodes. A change
in semantics there should be clearly documented.

Similar, if you want to go in this direction, MPOL_F_NUMA_BALANCING
should be chcked against MPOL_INTERLEAVE and explicitly fail now so
suport can be detected at runtime.

> So, I prefer to make MPOL_F_NUMA_BALANCING to be
> 
>   Optimizing with NUMA balancing if possible, and we may add more
>   optimization in the future.
> 

Maybe, but I think it's best that the actual behaviour of the kernel is
documented instead of desired behaviour or future planning.
Huang\, Ying Dec. 8, 2020, 8:13 a.m. UTC | #5
Hi, Alex,

Sorry for late, I just notice this email today.

"Alejandro Colomar (mailing lists; readonly)"
<alx.mailinglists@gmail.com> writes:

> Hi Huang Ying,
>
> Please see a few fixes below.
>
> Michael, as always, some question for you too ;)
>
> Thanks,
>
> Alex
>
> On 12/2/20 9:42 AM, Huang Ying wrote:
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> ---
>>  man2/set_mempolicy.2 | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>> 
>> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
>> index 68011eecb..3754b3e12 100644
>> --- a/man2/set_mempolicy.2
>> +++ b/man2/set_mempolicy.2
>> @@ -113,6 +113,12 @@ A nonempty
>>  .I nodemask
>>  specifies node IDs that are relative to the set of
>>  node IDs allowed by the process's current cpuset.
>> +.TP
>> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)"
>
> I'd prefer it to be in alphabetical order (rather than just adding at
> the bottom).

That's OK for me.  But it's better to be done in another patch to
distinguish contents from pure order change?

> That way, when lists grow, it's easier to find things.
>
>> +Enable the Linux kernel NUMA balancing for the task if it is supported
>> +by kernel.
>
> I'd s/Linux kernel/kernel/ when it doesn't specifically refer to the
> Linux kernel to differentiate it from other kernels.  It only adds noise
> (IMHO).  mtk?

Sure.  Will fix this and all following comments below.  Thanks a lot for
your help!  I am new to man pages.

Best Regards,
Huang, Ying
Alejandro Colomar (mailing lists; readonly) Dec. 18, 2020, 10:21 a.m. UTC | #6
Hi Huang, Ying,

Sorry I forgot to answer.
See below.

BTW, Linux 5.10 has been released recently;
is this series already merged for 5.11?
If not yet, could you just write '5.??' and we'll fix it (and add a
commit number in a comment) when we know the definitive version?

Thanks,

Alex

On 12/8/20 9:13 AM, Huang, Ying wrote:
> Hi, Alex,
> 
> Sorry for late, I just notice this email today.
> 
> "Alejandro Colomar (mailing lists; readonly)"
> <alx.mailinglists@gmail.com> writes:
> 
>> Hi Huang Ying,
>>
>> Please see a few fixes below.
>>
>> Michael, as always, some question for you too ;)
>>
>> Thanks,
>>
>> Alex
>>
>> On 12/2/20 9:42 AM, Huang Ying wrote:
>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>> ---
>>>  man2/set_mempolicy.2 | 9 +++++++++
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
>>> index 68011eecb..3754b3e12 100644
>>> --- a/man2/set_mempolicy.2
>>> +++ b/man2/set_mempolicy.2
>>> @@ -113,6 +113,12 @@ A nonempty
>>>  .I nodemask
>>>  specifies node IDs that are relative to the set of
>>>  node IDs allowed by the process's current cpuset.
>>> +.TP
>>> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)"
>>
>> I'd prefer it to be in alphabetical order (rather than just adding at
>> the bottom).
> 
> That's OK for me.  But it's better to be done in another patch to
> distinguish contents from pure order change?

Yes, if you could do a series of 2 patches with a reordering first, it
would be great.

> 
>> That way, when lists grow, it's easier to find things.
>>
>>> +Enable the Linux kernel NUMA balancing for the task if it is supported
>>> +by kernel.
>>
>> I'd s/Linux kernel/kernel/ when it doesn't specifically refer to the
>> Linux kernel to differentiate it from other kernels.  It only adds noise
>> (IMHO).  mtk?
> 
> Sure.  Will fix this and all following comments below.  Thanks a lot for
> your help!  I am new to man pages.

Thank you!

> 
> Best Regards,
> Huang, Ying
>
Huang\, Ying Dec. 21, 2020, 1:31 a.m. UTC | #7
"Alejandro Colomar (mailing lists; readonly)"
<alx.mailinglists@gmail.com> writes:

> Hi Huang, Ying,
>
> Sorry I forgot to answer.
> See below.
>
> BTW, Linux 5.10 has been released recently;
> is this series already merged for 5.11?
> If not yet, could you just write '5.??' and we'll fix it (and add a
> commit number in a comment) when we know the definitive version?

Sure.  Will replace it with 5.12.  Thanks for reminding!

Best Regards,
Huang, Ying

> Thanks,
>
> Alex
>
> On 12/8/20 9:13 AM, Huang, Ying wrote:
>> Hi, Alex,
>> 
>> Sorry for late, I just notice this email today.
>> 
>> "Alejandro Colomar (mailing lists; readonly)"
>> <alx.mailinglists@gmail.com> writes:
>> 
>>> Hi Huang Ying,
>>>
>>> Please see a few fixes below.
>>>
>>> Michael, as always, some question for you too ;)
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>> On 12/2/20 9:42 AM, Huang Ying wrote:
>>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>>> ---
>>>>  man2/set_mempolicy.2 | 9 +++++++++
>>>>  1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
>>>> index 68011eecb..3754b3e12 100644
>>>> --- a/man2/set_mempolicy.2
>>>> +++ b/man2/set_mempolicy.2
>>>> @@ -113,6 +113,12 @@ A nonempty
>>>>  .I nodemask
>>>>  specifies node IDs that are relative to the set of
>>>>  node IDs allowed by the process's current cpuset.
>>>> +.TP
>>>> +.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)"
>>>
>>> I'd prefer it to be in alphabetical order (rather than just adding at
>>> the bottom).
>> 
>> That's OK for me.  But it's better to be done in another patch to
>> distinguish contents from pure order change?
>
> Yes, if you could do a series of 2 patches with a reordering first, it
> would be great.
>
>> 
>>> That way, when lists grow, it's easier to find things.
>>>
>>>> +Enable the Linux kernel NUMA balancing for the task if it is supported
>>>> +by kernel.
>>>
>>> I'd s/Linux kernel/kernel/ when it doesn't specifically refer to the
>>> Linux kernel to differentiate it from other kernels.  It only adds noise
>>> (IMHO).  mtk?
>> 
>> Sure.  Will fix this and all following comments below.  Thanks a lot for
>> your help!  I am new to man pages.
>
> Thank you!
>
>> 
>> Best Regards,
>> Huang, Ying
>>

Patch
diff mbox series

diff --git a/man2/set_mempolicy.2 b/man2/set_mempolicy.2
index 68011eecb..3754b3e12 100644
--- a/man2/set_mempolicy.2
+++ b/man2/set_mempolicy.2
@@ -113,6 +113,12 @@  A nonempty
 .I nodemask
 specifies node IDs that are relative to the set of
 node IDs allowed by the process's current cpuset.
+.TP
+.BR MPOL_F_NUMA_BALANCING " (since Linux 5.11)"
+Enable the Linux kernel NUMA balancing for the task if it is supported
+by kernel.
+If the flag isn't supported by Linux kernel, return -1 and errno is
+set to EINVAL.
 .PP
 .I nodemask
 points to a bit mask of node IDs that contains up to
@@ -293,6 +299,9 @@  argument specified both
 .B MPOL_F_STATIC_NODES
 and
 .BR MPOL_F_RELATIVE_NODES .
+Or, the
+.B MPOL_F_NUMA_BALANCING
+isn't supported by the Linux kernel.
 .TP
 .B ENOMEM
 Insufficient kernel memory was available.