* Error "cannot bind memory to host NUMA nodes: Operation not permitted" running inside docker
@ 2020-04-29 20:40 Manuel Hohmann
  2020-04-30  8:52 ` Daniel P. Berrangé
  0 siblings, 1 reply; 5+ messages in thread
From: Manuel Hohmann @ 2020-04-29 20:40 UTC
  To: qemu-devel; +Cc: imammedo


Hi,

I encountered the following error message on the QEMU 5.0.0 release, compiled and run inside a docker image:

"cannot bind memory to host NUMA nodes: Operation not permitted"

The QEMU command line to reproduce this behavior (it happens also on -x86_64, -arm, -aarch64 with similar command line):

qemu-system-i386 -m 64 -M pc -smp 1 -display none -monitor stdio -drive file=mp-acpi/NOS.iso,media=cdrom,id=d -boot order=d -d cpu_reset

The docker image which shows the error is available here:

https://hub.docker.com/repository/docker/xenos1984/test-qemu

It was built on Ubuntu 20.04 with NUMA support (libnuma-dev package installed) from the following sources:

https://github.com/xenos1984/cross-toolchain/tree/master/tools-qemu
https://github.com/xenos1984/cross-toolchain/tree/master/test-qemu

The iso image used can be obtained here, but should not be relevant:

https://github.com/xenos1984/NOS/releases/download/latest/nos-i686.iso.bz2

The command fails when the image is used in a CI environment:

https://circleci.com/gh/xenos1984/NOS/953

On the recommendation of @imammedo I am posting the issue to qemu-devel. I also tried the following patch:

--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -384,3 +384,3 @@
           if (mbind(ptr, sz, backend->policy,
-                  maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
+                  maxnode ? backend->host_nodes : NULL, 0, flags)) {
               if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {

But no success, the same error occurs. It happens only within docker - the same command runs fine on my desktop (also Ubuntu 20.04) system.

Best regards,
xenos1984 / Manuel Hohmann

PS: I apologize if this mail is sent / received more than once; there was a problem with my outgoing mails.




* Re: Error "cannot bind memory to host NUMA nodes: Operation not permitted" running inside docker
  2020-04-29 20:40 Error "cannot bind memory to host NUMA nodes: Operation not permitted" running inside docker Manuel Hohmann
@ 2020-04-30  8:52 ` Daniel P. Berrangé
  2020-04-30 11:45   ` Igor Mammedov
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel P. Berrangé @ 2020-04-30  8:52 UTC
  To: Manuel Hohmann; +Cc: imammedo, qemu-devel

On Wed, Apr 29, 2020 at 11:40:32PM +0300, Manuel Hohmann wrote:
> Hi,
> 
> I encountered the following error message on the QEMU 5.0.0 release, compiled and run inside a docker image:
> 
> "cannot bind memory to host NUMA nodes: Operation not permitted"

The error is reporting that mbind() failed.

mbind() man page says it gives EPERM when

  "The  flags argument included the MPOL_MF_MOVE_ALL flag and
   the caller does not have the CAP_SYS_NICE privilege."

QEMU always uses the MPOL_MF_MOVE flag though.

Looking at the kernel source,  mbind can also return EPERM if the
process is not permitted to access the requested nodes which seems
more plausible as a cause.

I guess the container is bound to some subset of nodes and QEMU is
trying to place the VM on different nodes that the container isn't
allowed to access.
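
To check whether mbind() itself is rejected in a given environment, independent of QEMU, a small standalone probe may help. The following is only a minimal sketch, not QEMU code: `probe_mbind_default` is a hypothetical helper, the `PROBE_*` constants are local copies of the values in <linux/mempolicy.h>, and the raw syscall is used so that libnuma is not needed to build it. It applies the default policy with an empty node mask to a fresh anonymous mapping and returns the errno of the attempt.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local copies of the <linux/mempolicy.h> values, so the probe
 * builds without libnuma's <numaif.h>. */
#define PROBE_MPOL_DEFAULT   0L
#define PROBE_MPOL_MF_STRICT (1UL << 0)
#define PROBE_MPOL_MF_MOVE   (1UL << 1)

/* Hypothetical helper: returns 0 if mbind() succeeded, otherwise the
 * errno it failed with (or the errno from mmap() if mapping failed). */
int probe_mbind_default(void)
{
    size_t sz = 1UL << 20; /* 1 MiB scratch mapping */
    void *ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) {
        return errno;
    }

    int err = 0;
    /* Default policy, no node mask: no specific host nodes are requested. */
    if (syscall(SYS_mbind, ptr, (unsigned long)sz, PROBE_MPOL_DEFAULT,
                (const unsigned long *)NULL, 0UL,
                PROBE_MPOL_MF_STRICT | PROBE_MPOL_MF_MOVE) != 0) {
        err = errno;
    }

    munmap(ptr, sz);
    return err;
}
```

If this returns EPERM even though no nodes are requested, the failure is probably not caused by the choice of nodes. Printing strerror() of the return value from a small main() is enough to use it.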

> 
> The QEMU command line to reproduce this behavior (it happens also on -x86_64, -arm, -aarch64 with similar command line):
> 
> qemu-system-i386 -m 64 -M pc -smp 1 -display none -monitor stdio -drive file=mp-acpi/NOS.iso,media=cdrom,id=d -boot order=d -d cpu_reset

There is no reference to host mem backend or NUMA binding, so I'm
puzzled why QEMU would be doing an mbind() at all. That seems bad.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: Error "cannot bind memory to host NUMA nodes: Operation not permitted" running inside docker
  2020-04-30  8:52 ` Daniel P. Berrangé
@ 2020-04-30 11:45   ` Igor Mammedov
  2020-04-30 11:49     ` Daniel P. Berrangé
  0 siblings, 1 reply; 5+ messages in thread
From: Igor Mammedov @ 2020-04-30 11:45 UTC
  To: Daniel P. Berrangé; +Cc: Manuel Hohmann, qemu-devel

On Thu, 30 Apr 2020 09:52:15 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Wed, Apr 29, 2020 at 11:40:32PM +0300, Manuel Hohmann wrote:
> > Hi,
> > 
> > I encountered the following error message on the QEMU 5.0.0 release, compiled and run inside a docker image:
> > 
> > "cannot bind memory to host NUMA nodes: Operation not permitted"  
> 
> The error is reporting that mbind() failed.
> 
> mbind() man page says it gives EPERM when
> 
>   "The  flags argument included the MPOL_MF_MOVE_ALL flag and
>    the caller does not have the CAP_SYS_NICE privilege."
> 
> QEMU always uses the MPOL_MF_MOVE flag though.
> 
> Looking at the kernel source,  mbind can also return EPERM if the
> process is not permitted to access the requested nodes which seems
> more plausible as a cause.
> 
> I guess the container is bound to some subset of nodes and QEMU is
> trying to place the VM on different nodes that the container isn't
> allowed to access.


The mbind() call in this case should be a no-op, since it is 'reapplying' the same default policy
the RAM was allocated with (modulo flags, which are not an issue in this default case).

It looks like there is a configuration issue with the container (blacklisted mbind) [2].
Is it possible to try running the container with '--security-opt seccomp=unconfined'
to see if that is the issue?
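
As a quick way to tell whether a seccomp filter is active for a process inside the container, the "Seccomp:" field of /proc/self/status can be read. The following is a minimal sketch under the assumption that /proc is mounted; `read_seccomp_mode` is a hypothetical helper name. The field values are 0 (disabled), 1 (strict) and 2 (filter).

```c
#include <stdio.h>

/* Hypothetical helper: returns the "Seccomp:" value from
 * /proc/self/status (0 = disabled, 1 = strict, 2 = filter),
 * or -1 if the file or the field is not available. */
int read_seccomp_mode(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL) {
        return -1;
    }

    char line[256];
    int mode = -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        /* Matches "Seccomp:<whitespace><number>" only; other lines,
         * including "Seccomp_filters:", leave mode untouched. */
        if (sscanf(line, "Seccomp: %d", &mode) == 1) {
            break;
        }
    }

    fclose(f);
    return mode;
}
```

A result of 2 would be consistent with a filtering profile being applied, while a container started with seccomp=unconfined would typically report 0. This does not show which syscalls are filtered, only that a filter is in place.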

From the QEMU side, we may skip mbind if the host-nodes bitmap is empty, to work around
the issue.
But I'm not sure if that should be done instead of whitelisting mbind in the container,
since use cases that do use host-nodes will still be broken due to the blacklisted mbind.
(It looks like mysql has the same problem [1], but it is just a warning for them, so it is a less severe issue,
and they were inclined towards fixing the container config.)
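
The idea of skipping mbind when the host-nodes bitmap is empty could be sketched roughly as below. This is an illustration of the decision only, not the actual hostmem.c code: `SKETCH_MAX_NODES`, `host_nodes_empty` and `should_call_mbind` are hypothetical names, and the bitmap layout is assumed to be a plain array of unsigned long words.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on host NUMA nodes for this sketch. */
#define SKETCH_MAX_NODES 128
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NODE_WORDS \
    ((SKETCH_MAX_NODES + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

/* True if no node bit is set anywhere in the bitmap. */
bool host_nodes_empty(const unsigned long *host_nodes)
{
    for (size_t i = 0; i < SKETCH_NODE_WORDS; i++) {
        if (host_nodes[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Only bind when the user actually requested specific host nodes;
 * with an empty bitmap the default policy is already in effect,
 * so the syscall can be skipped entirely. */
bool should_call_mbind(const unsigned long *host_nodes)
{
    return !host_nodes_empty(host_nodes);
}
```

With this shape, a default configuration never reaches the syscall, so a blocked mbind only affects setups that explicitly ask for host-nodes binding.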


> > The QEMU command line to reproduce this behavior (it happens also on -x86_64, -arm, -aarch64 with similar command line):
> > 
> > qemu-system-i386 -m 64 -M pc -smp 1 -display none -monitor stdio -drive file=mp-acpi/NOS.iso,media=cdrom,id=d -boot order=d -d cpu_reset  
> 
> There is no reference to host mem backend or NUMA binding, so I'm
> puzzled why QEMU would be doing an mbind() at all. That seems bad.

Since 5.0, all guest RAM allocation has been consolidated around hostmem.

> 
> 
> Regards,
> Daniel

1)
     https://github.com/docker-library/mysql/issues/303
2)
     https://docs.docker.com/engine/security/seccomp/




* Re: Error "cannot bind memory to host NUMA nodes: Operation not permitted" running inside docker
  2020-04-30 11:45   ` Igor Mammedov
@ 2020-04-30 11:49     ` Daniel P. Berrangé
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel P. Berrangé @ 2020-04-30 11:49 UTC
  To: Igor Mammedov; +Cc: Manuel Hohmann, qemu-devel

On Thu, Apr 30, 2020 at 01:45:58PM +0200, Igor Mammedov wrote:
> On Thu, 30 Apr 2020 09:52:15 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Wed, Apr 29, 2020 at 11:40:32PM +0300, Manuel Hohmann wrote:
> > > Hi,
> > > 
> > > I encountered the following error message on the QEMU 5.0.0 release, compiled and run inside a docker image:
> > > 
> > > "cannot bind memory to host NUMA nodes: Operation not permitted"  
> > 
> > The error is reporting that mbind() failed.
> > 
> > mbind() man page says it gives EPERM when
> > 
> >   "The  flags argument included the MPOL_MF_MOVE_ALL flag and
> >    the caller does not have the CAP_SYS_NICE privilege."
> > 
> > QEMU always uses the MPOL_MF_MOVE flag though.
> > 
> > Looking at the kernel source,  mbind can also return EPERM if the
> > process is not permitted to access the requested nodes which seems
> > more plausible as a cause.
> > 
> > I guess the container is bound to some subset of nodes and QEMU is
> > trying to place the VM on different nodes that the container isn't
> > allowed to access.
> 
> 
> The mbind() call in this case should be a no-op, since it is 'reapplying' the
> same default policy the RAM was allocated with (modulo flags, which are not
> an issue in this default case).
> 
> It looks like there is a configuration issue with the container (blacklisted mbind) [2].
> Is it possible to try running the container with '--security-opt seccomp=unconfined'
> to see if that is the issue?

Oh, yes, I forgot about seccomp - that is almost certainly the problem,
given that this is a public CI system with locked down container config.

> 
> From the QEMU side, we may skip mbind if the host-nodes bitmap is empty, to work around
> the issue.
> But I'm not sure if that should be done instead of whitelisting mbind in the container,
> since use cases that do use host-nodes will still be broken due to the blacklisted mbind.
> (It looks like mysql has the same problem [1], but it is just a warning for them, so it is a less severe issue,
> and they were inclined towards fixing the container config.)

Telling users to reconfigure the container to allow mbind() is not a viable
approach. In a public cloud scenario the users will not have any direct
control over the container, and it is entirely unsurprising for mbind to
be blocked.


> > > The QEMU command line to reproduce this behavior (it happens also on -x86_64, -arm, -aarch64 with similar command line):
> > > 
> > > qemu-system-i386 -m 64 -M pc -smp 1 -display none -monitor stdio -drive file=mp-acpi/NOS.iso,media=cdrom,id=d -boot order=d -d cpu_reset  
> > 
> > There is no reference to host mem backend or NUMA binding, so I'm
> > puzzled why QEMU would be doing an mbind() at all. That seems bad.
> 
> Since 5.0, all guest RAM allocation has been consolidated around hostmem.

Ok, so QEMU shouldn't be calling mbind() at all unless there's some config
on the CLI that requests us to change the default binding, which there
is not in this case.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



