* [PATCH v2] hostmem: Honor multiple preferred nodes if possible
@ 2022-12-15  9:55 Michal Privoznik
  2022-12-16 13:41 ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Privoznik @ 2022-12-15  9:55 UTC (permalink / raw)
  To: qemu-devel; +Cc: david, imammedo, marcandre.lureau, berrange

If a memory-backend is configured with mode
HOST_MEM_POLICY_PREFERRED then
host_memory_backend_memory_complete() calls mbind() as:

  mbind(..., MPOL_PREFERRED, nodemask, ...);

Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
to the .host-nodes attribute. Therefore, there can be multiple
nodes specified. However, the documentation to MPOL_PREFERRED
says:

  MPOL_PREFERRED
    This mode sets the preferred node for allocation. ...
    If nodemask specifies more than one node ID, the first node
    in the mask will be selected as the preferred node.

Therefore, only the first node is honored and the rest are
silently ignored. Well, with recent changes to the kernel and
numactl we can do better.

Linux kernel v5.15 added support for MPOL_PREFERRED_MANY via
commit cfcaa66f8032 ("mm/hugetlb: add support for mempolicy
MPOL_PREFERRED_MANY"); this mode accepts multiple preferred
NUMA nodes.

Then, the numa_has_preferred_many() API was introduced to numactl
(v2.0.15~26), allowing applications to query kernel support.

Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
mbind() call instead and stop silently ignoring multiple nodes.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---

v2 of:

https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg01354.html

diff to v1 (thanks to David for his review):
- Don't cache numa_has_preferred_many() retval
- Reword comments and commit message
- Switch compile time detection from numa_set_preferred_many() to
  numa_has_preferred_many()

 backends/hostmem.c | 19 +++++++++++++++++--
 meson.build        |  5 +++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 8640294c10..163ea9af04 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -23,7 +23,12 @@
 
 #ifdef CONFIG_NUMA
 #include <numaif.h>
+#include <numa.h>
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_DEFAULT != MPOL_DEFAULT);
+/*
+ * HOST_MEM_POLICY_PREFERRED may either translate to MPOL_PREFERRED or
+ * MPOL_PREFERRED_MANY, see comments further below.
+ */
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_PREFERRED != MPOL_PREFERRED);
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_BIND != MPOL_BIND);
 QEMU_BUILD_BUG_ON(HOST_MEM_POLICY_INTERLEAVE != MPOL_INTERLEAVE);
@@ -346,6 +351,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
          * before mbind(). note: MPOL_MF_STRICT is ignored on hugepages so
          * this doesn't catch hugepage case. */
         unsigned flags = MPOL_MF_STRICT | MPOL_MF_MOVE;
+        int mode = backend->policy;
 
         /* check for invalid host-nodes and policies and give more verbose
          * error messages than mbind(). */
@@ -369,9 +375,18 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
                BITS_TO_LONGS(MAX_NODES + 1) * sizeof(unsigned long));
         assert(maxnode <= MAX_NODES);
 
+#ifdef HAVE_NUMA_SET_PREFERRED_MANY
+        if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
+            /*
+             * Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
+             * silently picks the first node.
+             */
+            mode = MPOL_PREFERRED_MANY;
+        }
+#endif
+
         if (maxnode &&
-            mbind(ptr, sz, backend->policy, backend->host_nodes, maxnode + 1,
-                  flags)) {
+            mbind(ptr, sz, mode, backend->host_nodes, maxnode + 1, flags)) {
             if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
                 error_setg_errno(errp, errno,
                                  "cannot bind memory to host NUMA nodes");
diff --git a/meson.build b/meson.build
index 5c6b5a1c75..fb6979349d 100644
--- a/meson.build
+++ b/meson.build
@@ -1858,6 +1858,11 @@ config_host_data.set('CONFIG_LINUX_AIO', libaio.found())
 config_host_data.set('CONFIG_LINUX_IO_URING', linux_io_uring.found())
 config_host_data.set('CONFIG_LIBPMEM', libpmem.found())
 config_host_data.set('CONFIG_NUMA', numa.found())
+if numa.found()
+  config_host_data.set('HAVE_NUMA_HAS_PREFERRED_MANY',
+                       cc.has_function('numa_has_preferred_many',
+                                       dependencies: numa))
+endif
 config_host_data.set('CONFIG_OPENGL', opengl.found())
 config_host_data.set('CONFIG_PROFILER', get_option('profiler'))
 config_host_data.set('CONFIG_RBD', rbd.found())
-- 
2.37.4



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] hostmem: Honor multiple preferred nodes if possible
  2022-12-15  9:55 [PATCH v2] hostmem: Honor multiple preferred nodes if possible Michal Privoznik
@ 2022-12-16 13:41 ` David Hildenbrand
  2022-12-16 13:47   ` Michal Prívozník
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2022-12-16 13:41 UTC (permalink / raw)
  To: Michal Privoznik, qemu-devel; +Cc: imammedo, marcandre.lureau, berrange

On 15.12.22 10:55, Michal Privoznik wrote:
> If a memory-backend is configured with mode
> HOST_MEM_POLICY_PREFERRED then
> host_memory_backend_memory_complete() calls mbind() as:
> 
>    mbind(..., MPOL_PREFERRED, nodemask, ...);
> 
> Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
> to the .host-nodes attribute. Therefore, there can be multiple
> nodes specified. However, the documentation to MPOL_PREFERRED
> says:
> 
>    MPOL_PREFERRED
>      This mode sets the preferred node for allocation. ...
>      If nodemask specifies more than one node ID, the first node
>      in the mask will be selected as the preferred node.
> 
> Therefore, only the first node is honored and the rest is
> silently ignored. Well, with recent changes to the kernel and
> numactl we can do better.
> 
> The Linux kernel added in v5.15 via commit cfcaa66f8032
> ("mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY")
> support for MPOL_PREFERRED_MANY, which accepts multiple preferred
> NUMA nodes instead.
> 
> Then, numa_has_preferred_many() API was introduced to numactl
> (v2.0.15~26) allowing applications to query kernel support.
> 
> Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
> mbind() call instead and stop ignoring multiple nodes, silently.
> 
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---

[...]

> +#ifdef HAVE_NUMA_SET_PREFERRED_MANY
> +        if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
> +            /*
> +             * Replace with MPOL_PREFERRED_MANY otherwise the mbind() below
> +             * silently picks the first node.
> +             */
> +            mode = MPOL_PREFERRED_MANY;
> +        }
> +#endif
> +


I was curious if we want to warn the user if "mode ==
MPOL_PREFERRED_MANY" and we were given more than one node.


Apart from that LGTM, thanks

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb




* Re: [PATCH v2] hostmem: Honor multiple preferred nodes if possible
  2022-12-16 13:41 ` David Hildenbrand
@ 2022-12-16 13:47   ` Michal Prívozník
  2022-12-19  9:55     ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Prívozník @ 2022-12-16 13:47 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: imammedo, marcandre.lureau, berrange

On 12/16/22 14:41, David Hildenbrand wrote:
> On 15.12.22 10:55, Michal Privoznik wrote:
>> If a memory-backend is configured with mode
>> HOST_MEM_POLICY_PREFERRED then
>> host_memory_backend_memory_complete() calls mbind() as:
>>
>>    mbind(..., MPOL_PREFERRED, nodemask, ...);
>>
>> Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
>> to the .host-nodes attribute. Therefore, there can be multiple
>> nodes specified. However, the documentation to MPOL_PREFERRED
>> says:
>>
>>    MPOL_PREFERRED
>>      This mode sets the preferred node for allocation. ...
>>      If nodemask specifies more than one node ID, the first node
>>      in the mask will be selected as the preferred node.
>>
>> Therefore, only the first node is honored and the rest is
>> silently ignored. Well, with recent changes to the kernel and
>> numactl we can do better.
>>
>> The Linux kernel added in v5.15 via commit cfcaa66f8032
>> ("mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY")
>> support for MPOL_PREFERRED_MANY, which accepts multiple preferred
>> NUMA nodes instead.
>>
>> Then, numa_has_preferred_many() API was introduced to numactl
>> (v2.0.15~26) allowing applications to query kernel support.
>>
>> Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
>> mbind() call instead and stop ignoring multiple nodes, silently.
>>
>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>> ---
> 
> [...]
> 
>> +#ifdef HAVE_NUMA_SET_PREFERRED_MANY
>> +        if (mode == MPOL_PREFERRED && numa_has_preferred_many() > 0) {
>> +            /*
>> +             * Replace with MPOL_PREFERRED_MANY otherwise the mbind()
>> below
>> +             * silently picks the first node.
>> +             */
>> +            mode = MPOL_PREFERRED_MANY;
>> +        }
>> +#endif
>> +
> 
> 
> I was curios if we want to warn the user if "mode ==
> MPOL_PREFERRED_MANY" and we were given more than one node.

I was wondering about that, but given that we currently silently ignore
the other nodes, I think it's safe to skip the warning. Besides, as
users upgrade to newer kernels this is going to become the new norm and
the warning won't be necessary anyway.

> 
> 
> Apart from that LGTM, thanks
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
> 

Thanks,
Michal




* Re: [PATCH v2] hostmem: Honor multiple preferred nodes if possible
  2022-12-16 13:47   ` Michal Prívozník
@ 2022-12-19  9:55     ` David Hildenbrand
  2022-12-19  9:57       ` Michal Prívozník
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2022-12-19  9:55 UTC (permalink / raw)
  To: Michal Prívozník, qemu-devel
  Cc: imammedo, marcandre.lureau, berrange

On 16.12.22 14:47, Michal Prívozník wrote:
> On 12/16/22 14:41, David Hildenbrand wrote:
>> On 15.12.22 10:55, Michal Privoznik wrote:
>>> If a memory-backend is configured with mode
>>> HOST_MEM_POLICY_PREFERRED then
>>> host_memory_backend_memory_complete() calls mbind() as:
>>>
>>>     mbind(..., MPOL_PREFERRED, nodemask, ...);
>>>
>>> Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
>>> to the .host-nodes attribute. Therefore, there can be multiple
>>> nodes specified. However, the documentation to MPOL_PREFERRED
>>> says:
>>>
>>>     MPOL_PREFERRED
>>>       This mode sets the preferred node for allocation. ...
>>>       If nodemask specifies more than one node ID, the first node
>>>       in the mask will be selected as the preferred node.
>>>
>>> Therefore, only the first node is honored and the rest is
>>> silently ignored. Well, with recent changes to the kernel and
>>> numactl we can do better.
>>>
>>> The Linux kernel added in v5.15 via commit cfcaa66f8032
>>> ("mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY")
>>> support for MPOL_PREFERRED_MANY, which accepts multiple preferred
>>> NUMA nodes instead.
>>>
>>> Then, numa_has_preferred_many() API was introduced to numactl
>>> (v2.0.15~26) allowing applications to query kernel support.
>>>
>>> Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
>>> mbind() call instead and stop ignoring multiple nodes, silently.
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>>> ---
>>
>> [...]
>>
>>> +#ifdef HAVE_NUMA_SET_PREFERRED_MANY

That should be HAVE_NUMA_HAS_PREFERRED_MANY, right?

-- 
Thanks,

David / dhildenb




* Re: [PATCH v2] hostmem: Honor multiple preferred nodes if possible
  2022-12-19  9:55     ` David Hildenbrand
@ 2022-12-19  9:57       ` Michal Prívozník
  2022-12-19  9:58         ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Prívozník @ 2022-12-19  9:57 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: imammedo, marcandre.lureau, berrange

On 12/19/22 10:55, David Hildenbrand wrote:
> On 16.12.22 14:47, Michal Prívozník wrote:
>> On 12/16/22 14:41, David Hildenbrand wrote:
>>> On 15.12.22 10:55, Michal Privoznik wrote:
>>>> If a memory-backend is configured with mode
>>>> HOST_MEM_POLICY_PREFERRED then
>>>> host_memory_backend_memory_complete() calls mbind() as:
>>>>
>>>>     mbind(..., MPOL_PREFERRED, nodemask, ...);
>>>>
>>>> Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
>>>> to the .host-nodes attribute. Therefore, there can be multiple
>>>> nodes specified. However, the documentation to MPOL_PREFERRED
>>>> says:
>>>>
>>>>     MPOL_PREFERRED
>>>>       This mode sets the preferred node for allocation. ...
>>>>       If nodemask specifies more than one node ID, the first node
>>>>       in the mask will be selected as the preferred node.
>>>>
>>>> Therefore, only the first node is honored and the rest is
>>>> silently ignored. Well, with recent changes to the kernel and
>>>> numactl we can do better.
>>>>
>>>> The Linux kernel added in v5.15 via commit cfcaa66f8032
>>>> ("mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY")
>>>> support for MPOL_PREFERRED_MANY, which accepts multiple preferred
>>>> NUMA nodes instead.
>>>>
>>>> Then, numa_has_preferred_many() API was introduced to numactl
>>>> (v2.0.15~26) allowing applications to query kernel support.
>>>>
>>>> Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
>>>> mbind() call instead and stop ignoring multiple nodes, silently.
>>>>
>>>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>>>> ---
>>>
>>> [...]
>>>
>>>> +#ifdef HAVE_NUMA_SET_PREFERRED_MANY
> 
> That should be HAVE_NUMA_HAS_PREFERRED_MANY, right?
> 

Oops, yes. Do you want me to send v3?

Michal




* Re: [PATCH v2] hostmem: Honor multiple preferred nodes if possible
  2022-12-19  9:57       ` Michal Prívozník
@ 2022-12-19  9:58         ` David Hildenbrand
  2022-12-19 11:10           ` Michal Prívozník
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2022-12-19  9:58 UTC (permalink / raw)
  To: Michal Prívozník, qemu-devel
  Cc: imammedo, marcandre.lureau, berrange

On 19.12.22 10:57, Michal Prívozník wrote:
> On 12/19/22 10:55, David Hildenbrand wrote:
>> On 16.12.22 14:47, Michal Prívozník wrote:
>>> On 12/16/22 14:41, David Hildenbrand wrote:
>>>> On 15.12.22 10:55, Michal Privoznik wrote:
>>>>> If a memory-backend is configured with mode
>>>>> HOST_MEM_POLICY_PREFERRED then
>>>>> host_memory_backend_memory_complete() calls mbind() as:
>>>>>
>>>>>      mbind(..., MPOL_PREFERRED, nodemask, ...);
>>>>>
>>>>> Here, 'nodemask' is a bitmap of host NUMA nodes and corresponds
>>>>> to the .host-nodes attribute. Therefore, there can be multiple
>>>>> nodes specified. However, the documentation to MPOL_PREFERRED
>>>>> says:
>>>>>
>>>>>      MPOL_PREFERRED
>>>>>        This mode sets the preferred node for allocation. ...
>>>>>        If nodemask specifies more than one node ID, the first node
>>>>>        in the mask will be selected as the preferred node.
>>>>>
>>>>> Therefore, only the first node is honored and the rest is
>>>>> silently ignored. Well, with recent changes to the kernel and
>>>>> numactl we can do better.
>>>>>
>>>>> The Linux kernel added in v5.15 via commit cfcaa66f8032
>>>>> ("mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY")
>>>>> support for MPOL_PREFERRED_MANY, which accepts multiple preferred
>>>>> NUMA nodes instead.
>>>>>
>>>>> Then, numa_has_preferred_many() API was introduced to numactl
>>>>> (v2.0.15~26) allowing applications to query kernel support.
>>>>>
>>>>> Wiring this all together, we can pass MPOL_PREFERRED_MANY to the
>>>>> mbind() call instead and stop ignoring multiple nodes, silently.
>>>>>
>>>>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>>>>> ---
>>>>
>>>> [...]
>>>>
>>>>> +#ifdef HAVE_NUMA_SET_PREFERRED_MANY
>>
>> That should be HAVE_NUMA_HAS_PREFERRED_MANY, right?
>>
> 
> Oops, yes. Do you want me to send v3?

I'll fixup. I just queued the fixed-up patch to

https://github.com/davidhildenbrand/qemu.git mem-next

Please double-check. Thanks!

-- 
Thanks,

David / dhildenb




* Re: [PATCH v2] hostmem: Honor multiple preferred nodes if possible
  2022-12-19  9:58         ` David Hildenbrand
@ 2022-12-19 11:10           ` Michal Prívozník
  2022-12-19 15:23             ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Prívozník @ 2022-12-19 11:10 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel; +Cc: imammedo, marcandre.lureau, berrange

On 12/19/22 10:58, David Hildenbrand wrote:
> 
> I'll fixup. I just queued the fixed-up patch to
> 
> https://github.com/davidhildenbrand/qemu.git mem-next
> 
> Please double-check. Thanks!
> 

Looks good. Sorry for not doing it properly the first time.

Michal




* Re: [PATCH v2] hostmem: Honor multiple preferred nodes if possible
  2022-12-19 11:10           ` Michal Prívozník
@ 2022-12-19 15:23             ` David Hildenbrand
  0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2022-12-19 15:23 UTC (permalink / raw)
  To: Michal Prívozník, qemu-devel
  Cc: imammedo, marcandre.lureau, berrange

On 19.12.22 12:10, Michal Prívozník wrote:
> On 12/19/22 10:58, David Hildenbrand wrote:
>>
>> I'll fixup. I just queued the fixed-up patch to
>>
>> https://github.com/davidhildenbrand/qemu.git mem-next
>>
>> Please double-check. Thanks!
>>
> 
> Looks good. Sorry for not doing it properly the first time.

No worries, I caught it while testing whether the checks (including
cc.has_function()) work as expected :)

-- 
Thanks,

David / dhildenb


