All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time
@ 2017-12-27  5:54 LukeShu
  2017-12-27  6:18 ` [Qemu-devel] [Bug 1740219] " LukeShu
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: LukeShu @ 2017-12-27  5:54 UTC (permalink / raw)
  To: qemu-devel

Public bug reported:

static linux-user emulation has several-second startup time

My problem: I'm a Parabola packager, and I'm updating our
qemu-user-static package from 2.8 to 2.11.  With my new
statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
went from taking 0.006s to 3s!  This does not happen with the normal
dynamically linked 2.11, or the old static 2.8.

What happens is it gets stuck in
`linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
does is map 2 parts of the address space: `[base, base+guest_size]`
and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
...)` decide where the first range is, and then check if that
+0xffff0000 is also available.  If it isn't, then it starts trying
`mmap(base, ...)` for the entire address space from low-address to
high-address.

"Normally," it finds an accaptable `base` within the first 2 tries.
With a static 2.11, it's taking thousands of tries.

----

Now, from my understanding, there are 2 factors working together to
cause that in static 2.11 but not the other builds:

 - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
 - PIE (and thus ASLR) is disabled for static builds

For some reason that I don't understand, with the smaller
`guest_size` the initial `mmap(NULL, guest_size, ...)` usually
returns an acceptable address range; but larger `guest_size` makes it
consistently return a block of memory that butts right up against
another already mapped chunk of memory.  This isn't just true on the
older builds, it's true with the 2.11 builds if I use the `-R` flag to
shrink the `guest_size` back down to 0xf7000000.  That is with
linux-hardened 4.13.13 on x86-64.

So then, it it falls back to crawling the entire address space; so it
tries base=0x00001000.  With ASLR, that probably succeeds.  But with
ASLR being disabled on static builds, the text segment is at
0x60000000; which is does not leave room for the needed
0xffff1000-size block before it.  So then it tries base=0x00002000.
And so on, more than 6000 times until it finally gets to and passes
the text segment; calling mmap more than 12000 times.

----

I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
actually need.  The disadvantage to that is that it does not support
the sparse address space that the current algorithm supports for
`guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
mmap fails, then it could fall back to a sparse search; though I'm not
sure the current algorithm is a good choice for it, as we see in this
bug.  Perhaps it should inspect /proc/self/maps to try to find a
suitable range before ever calling mmap?

** Affects: qemu
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user emulation has several-second startup time

Status in QEMU:
  New

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
@ 2017-12-27  6:18 ` LukeShu
  2018-01-03 23:00 ` [Qemu-devel] [Bug 1740219] Re: static linux-user ARM " LukeShu
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: LukeShu @ 2017-12-27  6:18 UTC (permalink / raw)
  To: qemu-devel

Actually, it seems that the `[base+0xffff0000,
base+0xffff0000+page_size]` segment is only mapped on 32-bit ARM.  So
this is 32-bit ARM-specific.

** Tags added: arm linux-user

** Summary changed:

- static linux-user emulation has several-second startup time
+ static linux-user ARM emulation has several-second startup time

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  New

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
  2017-12-27  6:18 ` [Qemu-devel] [Bug 1740219] " LukeShu
@ 2018-01-03 23:00 ` LukeShu
  2018-03-20  8:36 ` ChristianEhrhardt
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: LukeShu @ 2018-01-03 23:00 UTC (permalink / raw)
  To: qemu-devel

To have a link to it from here, on the 28th I submitted a patchset to
fix this: https://lists.nongnu.org/archive/html/qemu-
devel/2017-12/msg05237.html

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  New

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
  2017-12-27  6:18 ` [Qemu-devel] [Bug 1740219] " LukeShu
  2018-01-03 23:00 ` [Qemu-devel] [Bug 1740219] Re: static linux-user ARM " LukeShu
@ 2018-03-20  8:36 ` ChristianEhrhardt
  2018-03-20  8:37 ` ChristianEhrhardt
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: ChristianEhrhardt @ 2018-03-20  8:36 UTC (permalink / raw)
  To: qemu-devel

>From Alistair Buxton (a-j-buxton) on bug 1756807:
I just tested the patch from https://bugs.launchpad.net/qemu/+bug/1740219 and it fixes the problem for me. Specifically I only tried the final patch of the series.

I duped the bugs onto this one since it is older and has a suggested
patch on the ML.

** Also affects: qemu (Ubuntu)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (2 preceding siblings ...)
  2018-03-20  8:36 ` ChristianEhrhardt
@ 2018-03-20  8:37 ` ChristianEhrhardt
  2018-03-20 18:19 ` LukeShu
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: ChristianEhrhardt @ 2018-03-20  8:37 UTC (permalink / raw)
  To: qemu-devel

Added an qemu(Ubuntu) task to further track this, keeping it incomplete
there until this is resolved upstream.

** Changed in: qemu (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (3 preceding siblings ...)
  2018-03-20  8:37 ` ChristianEhrhardt
@ 2018-03-20 18:19 ` LukeShu
  2018-03-22 20:16 ` LukeShu
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: LukeShu @ 2018-03-20 18:19 UTC (permalink / raw)
  To: qemu-devel

Everything except for the final patch (which has the actual fix) is now
applied on the master branch.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (4 preceding siblings ...)
  2018-03-20 18:19 ` LukeShu
@ 2018-03-22 20:16 ` LukeShu
  2018-03-23  7:13 ` ChristianEhrhardt
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: LukeShu @ 2018-03-22 20:16 UTC (permalink / raw)
  To: qemu-devel

This is now fixed on master, as of
3be2e41b3323169852dca11ffe6ff772c33e5aaa.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (5 preceding siblings ...)
  2018-03-22 20:16 ` LukeShu
@ 2018-03-23  7:13 ` ChristianEhrhardt
  2018-03-23 16:19 ` LukeShu
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: ChristianEhrhardt @ 2018-03-23  7:13 UTC (permalink / raw)
  To: qemu-devel

The sha above is the merge, thanks Luke.

The actual change by you is
commit 2a53535af471f4bee9d6cb5b363746b8d5ed21dd
Author: Luke Shumaker <lukeshu@parabola.nu>
Date:   Thu Dec 28 13:08:13 2017 -0500

    linux-user: init_guest_space: Try to make ARM space+commpage
continuous

I'll be away a week but then look at taking this fix in.

@Luke - to check in advance, are there depending changes post 2.11.1
that are needed for this that you know of?

** Changed in: qemu (Ubuntu)
       Status: Incomplete => Triaged

** Changed in: qemu (Ubuntu)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Triaged

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (6 preceding siblings ...)
  2018-03-23  7:13 ` ChristianEhrhardt
@ 2018-03-23 16:19 ` LukeShu
  2018-03-23 16:30 ` LukeShu
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: LukeShu @ 2018-03-23 16:19 UTC (permalink / raw)
  To: qemu-devel

I don't believe so.  The patchset applies cleanly on 2.11.0, and fixes
the issue there.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Triaged

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (7 preceding siblings ...)
  2018-03-23 16:19 ` LukeShu
@ 2018-03-23 16:30 ` LukeShu
  2018-04-03  9:49 ` ChristianEhrhardt
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: LukeShu @ 2018-03-23 16:30 UTC (permalink / raw)
  To: qemu-devel

Oh, but it's worth noting that patch 1/10 had a mistake in it, which was
corrected when applied as 8756e1361d177e91dc6d88f37749b809fd2407fb.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Triaged

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (8 preceding siblings ...)
  2018-03-23 16:30 ` LukeShu
@ 2018-04-03  9:49 ` ChristianEhrhardt
  2018-04-03 20:17 ` LukeShu
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: ChristianEhrhardt @ 2018-04-03  9:49 UTC (permalink / raw)
  To: qemu-devel

Back again,
my question was more about if we are able to JUST take 2a53535af471f4bee9d6cb5b363746b8d5ed21dd without the rest.

We are already in Feature Freeze for Ubuntu 18.04, so we can either

a) wait for the next release and pick it up in full by the new qemu
version (well we will do that anyway)

b) identify a fix only (not all the cleanup and reworks) patch that will
be good for the 2.11.1 in Bionic

Especially being "just slow" but not broken makes it harder to consider the closer we get to release (I hate that as well being a performance engineer, but minimizing regressions is a target as well :-) ).
Essentially to some extend being in feature freeze is as if we are under [1] already.

So will 2a53535af471f4bee9d6cb5b363746b8d5ed21dd alone be good in your opinion?
Or will it need more and if so what would be the minimal set of your changes.


[1]: https://wiki.ubuntu.com/StableReleaseUpdates

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Triaged

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (9 preceding siblings ...)
  2018-04-03  9:49 ` ChristianEhrhardt
@ 2018-04-03 20:17 ` LukeShu
  2018-04-04 13:33 ` ChristianEhrhardt
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: LukeShu @ 2018-04-03 20:17 UTC (permalink / raw)
  To: qemu-devel

Yes, I believe that 2a53535af471f4bee9d6cb5b363746b8d5ed21dd alone is
good.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Triaged

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (10 preceding siblings ...)
  2018-04-03 20:17 ` LukeShu
@ 2018-04-04 13:33 ` ChristianEhrhardt
  2018-04-04 13:54 ` Peter Maydell
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: ChristianEhrhardt @ 2018-04-04 13:33 UTC (permalink / raw)
  To: qemu-devel

Considering 2.12-rcX a release set the upstream status to that

** Changed in: qemu
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Triaged

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (11 preceding siblings ...)
  2018-04-04 13:33 ` ChristianEhrhardt
@ 2018-04-04 13:54 ` Peter Maydell
  2018-04-04 14:23 ` ChristianEhrhardt
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2018-04-04 13:54 UTC (permalink / raw)
  To: qemu-devel

We don't generally mark bugs 'fix released' until the final (non-rc)
release is made.


** Changed in: qemu
       Status: Fix Released => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  Triaged

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (12 preceding siblings ...)
  2018-04-04 13:54 ` Peter Maydell
@ 2018-04-04 14:23 ` ChristianEhrhardt
  2018-04-05  8:48 ` ChristianEhrhardt
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: ChristianEhrhardt @ 2018-04-04 14:23 UTC (permalink / raw)
  To: qemu-devel

I wasn't sure if you'd usually take the interim step to "Fix Committed",
thanks Peter.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  Triaged

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (13 preceding siblings ...)
  2018-04-04 14:23 ` ChristianEhrhardt
@ 2018-04-05  8:48 ` ChristianEhrhardt
  2018-04-05 21:52 ` LukeShu
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: ChristianEhrhardt @ 2018-04-05  8:48 UTC (permalink / raw)
  To: qemu-devel

For Ubuntu: PPA: https://launchpad.net/~ci-train-ppa-
service/+archive/ubuntu/3225

Regression test against ppa looked good tonight.

There are new changes which I need to add for two more bugs.
But testing from the ppa is ok right now already.

@Luke: Please test against this PPA, as I want to ensure it is working
for your case before pushing to Bionic.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  Triaged

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (14 preceding siblings ...)
  2018-04-05  8:48 ` ChristianEhrhardt
@ 2018-04-05 21:52 ` LukeShu
  2018-04-06  6:12 ` ChristianEhrhardt
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: LukeShu @ 2018-04-05 21:52 UTC (permalink / raw)
  To: qemu-devel

I'm not on a Debian/Ubuntu-ish system, but extracting

    qemu-user-static_2.11+dfsg-1ubuntu6~ppa3_amd64.deb : data.tar.xz :
usr/bin/qemu-arm-static

and testing with that binary:

    $ time usr/bin/qemu-arm-static /var/lib/archbuild/dbscripts@armv7h/luke/usr/bin/ldconfig --help
    Usage: ldconfig [OPTION...]
  ...
    <https://github.com/archlinuxarm/PKGBUILDs/issues>.

    real	0m0.068s
    user	0m0.067s
    sys	0m0.000s

That is: LGTM.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  Triaged

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (15 preceding siblings ...)
  2018-04-05 21:52 ` LukeShu
@ 2018-04-06  6:12 ` ChristianEhrhardt
  2018-04-09 18:07 ` Launchpad Bug Tracker
  2018-04-26 10:36 ` Thomas Huth
  18 siblings, 0 replies; 20+ messages in thread
From: ChristianEhrhardt @ 2018-04-06  6:12 UTC (permalink / raw)
  To: qemu-devel

Thanks Luke.
I tried the same from the deb of libc for arm in bionic.

Down from
real    0m2.031s
to
real    0m0.002s

So confirmed as well.

** Changed in: qemu (Ubuntu)
       Status: Triaged => In Progress

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  In Progress

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (16 preceding siblings ...)
  2018-04-06  6:12 ` ChristianEhrhardt
@ 2018-04-09 18:07 ` Launchpad Bug Tracker
  2018-04-26 10:36 ` Thomas Huth
  18 siblings, 0 replies; 20+ messages in thread
From: Launchpad Bug Tracker @ 2018-04-09 18:07 UTC (permalink / raw)
  To: qemu-devel

This bug was fixed in the package qemu - 1:2.11+dfsg-1ubuntu6

---------------
qemu (1:2.11+dfsg-1ubuntu6) bionic; urgency=medium

  * Remove LP: 1752026 changes to d/p/ubuntu/define-ubuntu-machine-types.patch.
    The Kernel fixes are preferred and already committed to the kernel.
    Therefore remove the default disabling of the HTM feature (LP: #1761175)
  * d/p/ubuntu/lp1739665-SSE-AVX-AVX512-cpu-features.patch: Enable new
    SSE/AVX/AVX512 cpu features (LP: #1739665)
  * d/p/ubuntu/lp1740219-continuous-space-commpage.patch: make Arm
    space+commpage continuous which avoids long startup times on
    qemu-user-static (LP: #1740219)
  * d/p/ubuntu/lp-1761372-*: provide pseries-bionic-2.11-sxxm type as
    convenience with all meltdown/spectre workarounds enabled by default.
    This is not the default type following upstream and x86 on that.
    (LP: #1761372).
  * d/p/ubuntu/lp-1704312-1-* provide means to manually handle filesystem-dax
    with pmem by backporting align and unarmed options (LP: #1704312).
  * d/p/ubuntu/lp-1762315-slirp-Add-domainname.patch: slirp: Add domainname
    option to slirp's DHCP server (LP: #1762315)

 -- Christian Ehrhardt <christian.ehrhardt@canonical.com>  Wed, 04 Apr
2018 15:16:07 +0200

** Changed in: qemu (Ubuntu)
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  Fix Released

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
  2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
                   ` (17 preceding siblings ...)
  2018-04-09 18:07 ` Launchpad Bug Tracker
@ 2018-04-26 10:36 ` Thomas Huth
  18 siblings, 0 replies; 20+ messages in thread
From: Thomas Huth @ 2018-04-26 10:36 UTC (permalink / raw)
  To: qemu-devel

** Changed in: qemu
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  Fix Released
Status in qemu package in Ubuntu:
  Fix Released

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0xffff0000 is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  ----

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf7000000.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x00001000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x60000000; which is does not leave room for the needed
  0xffff1000-size block before it.  So then it tries base=0x00002000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  ----

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0xffff0000`.  If `guest_size < 0xffff0000` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-04-26 10:51 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-27  5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
2017-12-27  6:18 ` [Qemu-devel] [Bug 1740219] " LukeShu
2018-01-03 23:00 ` [Qemu-devel] [Bug 1740219] Re: static linux-user ARM " LukeShu
2018-03-20  8:36 ` ChristianEhrhardt
2018-03-20  8:37 ` ChristianEhrhardt
2018-03-20 18:19 ` LukeShu
2018-03-22 20:16 ` LukeShu
2018-03-23  7:13 ` ChristianEhrhardt
2018-03-23 16:19 ` LukeShu
2018-03-23 16:30 ` LukeShu
2018-04-03  9:49 ` ChristianEhrhardt
2018-04-03 20:17 ` LukeShu
2018-04-04 13:33 ` ChristianEhrhardt
2018-04-04 13:54 ` Peter Maydell
2018-04-04 14:23 ` ChristianEhrhardt
2018-04-05  8:48 ` ChristianEhrhardt
2018-04-05 21:52 ` LukeShu
2018-04-06  6:12 ` ChristianEhrhardt
2018-04-09 18:07 ` Launchpad Bug Tracker
2018-04-26 10:36 ` Thomas Huth

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.