* [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time
@ 2017-12-27 5:54 LukeShu
2017-12-27 6:18 ` [Qemu-devel] [Bug 1740219] " LukeShu
` (18 more replies)
0 siblings, 19 replies; 20+ messages in thread
From: LukeShu @ 2017-12-27 5:54 UTC (permalink / raw)
To: qemu-devel
Public bug reported:
static linux-user emulation has several-second startup time
My problem: I'm a Parabola packager, and I'm updating our
qemu-user-static package from 2.8 to 2.11. With my new
statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
went from taking 0.006s to 3s! This does not happen with the normal
dynamically linked 2.11, or the old static 2.8.
What happens is that it gets stuck in
`linux-user/elfload.c:init_guest_space()`. `init_guest_space` maps 2
parts of the address space, `[base, base+guest_size]` and
`[base+0xffff0000, base+0xffff0000+page_size]`, for which it must find
an acceptable `base`. Its strategy is to call `mmap(NULL, guest_size,
...)` to decide where the first range goes, and then check whether the
range at +0xffff0000 is also available. If it isn't, it starts trying
`mmap(base, ...)` across the entire address space, from low addresses
to high addresses.
"Normally," it finds an acceptable `base` within the first 2 tries.
With a static 2.11, it's taking thousands of tries.
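To make the cost of that fallback concrete, here is a simplified, scaled-down model of the acceptability test and the low-to-high crawl just described. It is a hypothetical sketch, not QEMU's code: the `struct range` list stands in for whatever the host already has mapped, and `linear_search` counts candidate bases instead of calling mmap (the real loop issues system calls for every candidate).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of host mappings that already exist: [start, end). */
struct range {
    uint64_t start, end;
};

/* True if the half-open interval [lo, hi) intersects *r. */
static bool overlaps(uint64_t lo, uint64_t hi, const struct range *r)
{
    return lo < r->end && r->start < hi;
}

/* A candidate base is acceptable only if both the guest space
 * [base, base + guest_size) and the one-page commpage at
 * base + commpage_off are free of every busy range. */
static bool base_ok(uint64_t base, uint64_t guest_size, uint64_t commpage_off,
                    uint64_t page, const struct range *busy, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (overlaps(base, base + guest_size, &busy[i]) ||
            overlaps(base + commpage_off, base + commpage_off + page, &busy[i]))
            return false;
    }
    return true;
}

/* The linear fallback: try base = start, start + page, ... until one is
 * acceptable. *tries counts the candidates examined. Returns UINT64_MAX
 * if nothing below `limit` works. */
static uint64_t linear_search(uint64_t start, uint64_t limit,
                              uint64_t guest_size, uint64_t commpage_off,
                              uint64_t page, const struct range *busy,
                              size_t n, unsigned long *tries)
{
    assert(page > 0);
    *tries = 0;
    for (uint64_t base = start; base < limit; base += page) {
        ++*tries;
        if (base_ok(base, guest_size, commpage_off, page, busy, n))
            return base;
    }
    return UINT64_MAX;
}
```

With a single busy range at 0x60000-0x70000 (standing in for the static binary's text segment), 4 KiB steps, and a guest space that ends exactly where the commpage begins (guest_size == commpage_off == 0x80000), a crawl from 0x1000 examines 112 candidates before landing on 0x70000. Shrinking guest_size to 0x50000, so that the busy range fits in the slack before the commpage, lets the very first candidate succeed.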
----
Now, from my understanding, there are 2 factors working together to
cause that in static 2.11 but not the other builds:
- 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
- PIE (and thus ASLR) is disabled for static builds
For some reason that I don't understand, with the smaller
`guest_size` the initial `mmap(NULL, guest_size, ...)` usually
returns an acceptable address range, but a larger `guest_size` makes it
consistently return a block of memory that butts right up against
another already-mapped chunk of memory. This isn't tied to the older
builds: if I use the `-R` flag to shrink `guest_size` back down to
0xf7000000, the 2.11 builds get an acceptable range too. This is with
linux-hardened 4.13.13 on x86-64.
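For the two default sizes above, the room between the end of the guest space and the commpage (room the current algorithm allows to be occupied) works out as follows. This is only arithmetic on the numbers in this report, assuming 4 KiB host pages; it does not explain why `mmap(NULL, ...)` picks the addresses it does.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the ARM commpage from the guest base, as described above. */
#define COMMPAGE_OFF UINT64_C(0xffff0000)
/* Assumption: 4 KiB host pages. */
#define HOST_PAGE UINT64_C(0x1000)

static_assert(COMMPAGE_OFF % HOST_PAGE == 0, "commpage offset must be page aligned");

/* Bytes between the end of the guest space and the start of the commpage.
 * Anything already mapped there is harmless; 0 means the two mappings
 * touch and need one unbroken hole. */
static uint64_t slack_before_commpage(uint64_t guest_size)
{
    return guest_size < COMMPAGE_OFF ? COMMPAGE_OFF - guest_size : 0;
}

/* Span from base to the end of the commpage; the same for both defaults. */
static uint64_t total_span(void)
{
    return COMMPAGE_OFF + HOST_PAGE;
}
```

With the 2.8 default (0xf7000000) the slack is 0x08ff0000 bytes, about 144 MiB; with the 2.11 default (0xffff0000) it is 0, so both mappings together need one unbroken 0xffff1000-byte hole.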
So it falls back to crawling the entire address space, starting with
base=0x00001000. With ASLR, that probably succeeds. But because ASLR
is disabled on static builds, the text segment is at 0x60000000, which
does not leave room for the needed 0xffff1000-byte block below it. So
it then tries base=0x00002000, and so on, more than 6000 times until it
finally reaches and passes the text segment, calling mmap more than
12000 times.
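Each candidate in that crawl costs system calls. For contrast, here is a sketch of the alternative the reporter raises at the end of the next paragraph: read the existing mappings once, in the `start-end perms ...` format of /proc/self/maps, and take the lowest hole that is large enough (for example `need` = 0xffff1000 for the guest space plus commpage). `find_hole` is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find the lowest address >= min_addr at which `need` bytes fit below
 * max_addr without touching any mapping listed in `maps`: text in the
 * /proc/self/maps format ("start-end perms offset dev inode path",
 * hex addresses, one mapping per line, sorted by address).
 * Stores the result in *out and returns true on success. */
static bool find_hole(const char *maps, uintptr_t min_addr, uintptr_t max_addr,
                      uintptr_t need, uintptr_t *out)
{
    uintptr_t cursor = min_addr;  /* lowest address not known to be busy */

    assert(need > 0);
    for (const char *line = maps; line != NULL && *line != '\0';) {
        uintptr_t start, end;
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &start, &end) == 2) {
            if (start >= cursor && start - cursor >= need) {
                *out = cursor;    /* the hole before this mapping fits */
                return true;
            }
            if (end > cursor)
                cursor = end;     /* skip past this mapping */
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    if (cursor < max_addr && max_addr - cursor >= need) {
        *out = cursor;            /* the hole above the last mapping fits */
        return true;
    }
    return false;
}
```

In real use `maps` would be the contents of /proc/self/maps, `min_addr` would be at least the kernel's vm.mmap_min_addr, and `max_addr` the top of the user address space. Mappings can change between reading the file and calling mmap, so the chosen address would still have to be confirmed by the mmap itself.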
----
I'm not sure what the fix is. Perhaps try to mmap a contiguous chunk
of size 0xffff1000, then munmap it and mmap the 2 chunks that we
actually need. The disadvantage is that this does not support the
sparse address space that the current algorithm supports for
`guest_size < 0xffff0000`. If `guest_size < 0xffff0000` *and* the big
mmap fails, it could fall back to a sparse search, though as this bug
shows, I'm not sure the current algorithm is a good choice for that.
Perhaps it should inspect /proc/self/maps to find a suitable range
before ever calling mmap?
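A minimal sketch of the first proposal above (one big mmap to locate a hole, munmap, then map the 2 pieces), assuming Linux, a single-threaded caller, and sizes that are multiples of the host page size. `reserve_guest_and_commpage` is a hypothetical helper for illustration, not QEMU's implementation.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve the guest space [base, base + guest_size) and the one-page
 * commpage at base + commpage_off, using one big mmap to find the hole.
 * Returns base, or NULL on failure. Sizes must be nonzero multiples of
 * the host page size, and guest_size must not run into the commpage. */
static void *reserve_guest_and_commpage(size_t guest_size, size_t commpage_off)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t span = commpage_off + page;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;

    assert(guest_size <= commpage_off);

    /* 1. Let the kernel choose one hole big enough for the whole span. */
    void *base = mmap(NULL, span, PROT_NONE, flags, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* 2. Give it back; we now know a hole of that size exists at base. */
    munmap(base, span);

    /* 3. Map the two pieces we actually need. The addresses are hints only
     *    (no MAP_FIXED, so nothing can be clobbered), so check that the
     *    kernel honoured them. */
    char *b = base;
    void *guest = mmap(b, guest_size, PROT_NONE, flags, -1, 0);
    void *comm = mmap(b + commpage_off, page, PROT_NONE, flags, -1, 0);
    if (guest != (void *)b || comm != (void *)(b + commpage_off)) {
        if (guest != MAP_FAILED)
            munmap(guest, guest_size);
        if (comm != MAP_FAILED)
            munmap(comm, page);
        return NULL;  /* e.g. another thread took part of the hole */
    }
    return b;
}
```

Because step 3 passes the addresses only as hints, another thread could take part of the hole between steps 2 and 3; the sketch detects that and fails instead of retrying. A common alternative is to skip the munmap and map the pieces with MAP_FIXED on top of the caller's own PROT_NONE reservation, which cannot clobber anything else. Either way, as noted above, the whole span must be free even when `guest_size < 0xffff0000` leaves slack before the commpage.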
** Affects: qemu
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219
Title:
static linux-user emulation has several-second startup time
Status in QEMU:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Qemu-devel] [Bug 1740219] Re: static linux-user emulation has several-second startup time
From: LukeShu @ 2017-12-27 6:18 UTC (permalink / raw)
To: qemu-devel
Actually, it seems that the `[base+0xffff0000,
base+0xffff0000+page_size]` segment is only mapped on 32-bit ARM. So
this is 32-bit ARM-specific.
** Tags added: arm linux-user
** Summary changed:
- static linux-user emulation has several-second startup time
+ static linux-user ARM emulation has several-second startup time
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
From: LukeShu @ 2018-01-03 23:00 UTC (permalink / raw)
To: qemu-devel
For reference, on the 28th I submitted a patchset to fix this:
https://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg05237.html
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
From: ChristianEhrhardt @ 2018-03-20 8:36 UTC (permalink / raw)
To: qemu-devel
From Alistair Buxton (a-j-buxton) on bug 1756807:
I just tested the patch from https://bugs.launchpad.net/qemu/+bug/1740219 and it fixes the problem for me. Specifically, I only tried the final patch of the series.
I duped the bugs onto this one since it is older and has a suggested
patch on the ML.
** Also affects: qemu (Ubuntu)
Importance: Undecided
Status: New
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
From: ChristianEhrhardt @ 2018-03-20 8:37 UTC (permalink / raw)
To: qemu-devel
Added a qemu (Ubuntu) task to track this further, keeping it Incomplete
there until this is resolved upstream.
** Changed in: qemu (Ubuntu)
Status: New => Incomplete
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
From: LukeShu @ 2018-03-20 18:19 UTC (permalink / raw)
To: qemu-devel
Everything except for the final patch (which has the actual fix) is now
applied on the master branch.
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
From: LukeShu @ 2018-03-22 20:16 UTC (permalink / raw)
To: qemu-devel
This is now fixed on master, as of
3be2e41b3323169852dca11ffe6ff772c33e5aaa.
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
From: ChristianEhrhardt @ 2018-03-23 7:13 UTC (permalink / raw)
To: qemu-devel
The SHA above is the merge commit; thanks, Luke.
The actual change by you is
commit 2a53535af471f4bee9d6cb5b363746b8d5ed21dd
Author: Luke Shumaker <lukeshu@parabola.nu>
Date: Thu Dec 28 13:08:13 2017 -0500
linux-user: init_guest_space: Try to make ARM space+commpage
continuous
I'll be away a week but then look at taking this fix in.
@Luke: to check in advance, do you know of any changes after 2.11.1
that this fix depends on?
** Changed in: qemu (Ubuntu)
Status: Incomplete => Triaged
** Changed in: qemu (Ubuntu)
Importance: Undecided => High
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219
Title:
static linux-user ARM emulation has several-second startup time
Status in QEMU:
New
Status in qemu package in Ubuntu:
Triaged
Bug description:
static linux-user emulation has several-second startup time
My problem: I'm a Parabola packager, and I'm updating our
qemu-user-static package from 2.8 to 2.11. With my new
statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
went from taking 0.006s to 3s! This does not happen with the normal
dynamically linked 2.11, or the old static 2.8.
What happens is it gets stuck in
`linux-user/elfload.c:init_guest_space()`. What `init_guest_space`
does is map 2 parts of the address space: `[base, base+guest_size]`
and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
an acceptable `base`. Its strategy is to `mmap(NULL, guest_size,
...)` decide where the first range is, and then check if that
+0xffff0000 is also available. If it isn't, then it starts trying
`mmap(base, ...)` for the entire address space from low-address to
high-address.
"Normally," it finds an acceptable `base` within the first 2 tries.
With a static 2.11, it's taking thousands of tries.
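A minimal C sketch of the acceptance test described above, assuming host mappings are modelled as a plain array of half-open [start, end) ranges instead of being probed with real mmap calls. The names `struct range`, `overlaps()` and `base_ok()` are hypothetical and are not QEMU's code; only the 0xffff0000 commpage offset comes from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An already-occupied host mapping, [start, end). */
struct range { uint64_t start, end; };

/* True if [a, a + len) overlaps any occupied range. */
static bool overlaps(uint64_t a, uint64_t len,
                     const struct range *used, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a < used[i].end && used[i].start < a + len) {
            return true;
        }
    }
    return false;
}

/* Both the guest region [base, base + guest_size) and the commpage
 * [base + 0xffff0000, base + 0xffff0000 + page_size) must be free. */
static bool base_ok(uint64_t base, uint64_t guest_size, uint64_t page_size,
                    const struct range *used, size_t n)
{
    const uint64_t commpage_off = 0xffff0000u;
    assert(page_size > 0);
    return !overlaps(base, guest_size, used, n) &&
           !overlaps(base + commpage_off, page_size, used, n);
}
```

For example, with an existing mapping at [0xf8000000, 0xf9000000), base 0x1000 passes with guest_size 0xf7000000 (the mapping sits in the unused gap before the commpage) but fails with guest_size 0xffff0000.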
----
Now, from my understanding, there are 2 factors working together to
cause that in static 2.11 but not the other builds:
- 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
- PIE (and thus ASLR) is disabled for static builds
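The arithmetic behind the first factor can be sketched as below (a hypothetical helper, assuming 32-bit guest offsets). In both cases the commpage ends at base + 0xffff1000, but only the smaller guest_size leaves a hole between the end of the guest region and the commpage that does not need to be free. That may be related to why the smaller size tolerates an existing mapping there, though the report does not confirm the cause.

```c
#include <assert.h>
#include <stdint.h>

/* Commpage offset from base, as described in the report. */
#define COMMPAGE_OFF 0xffff0000u

/* Bytes between the end of [base, base + guest_size) and the commpage
 * at base + COMMPAGE_OFF; this range is not required to be free. */
static uint32_t commpage_gap(uint32_t guest_size)
{
    assert(guest_size <= COMMPAGE_OFF);
    return COMMPAGE_OFF - guest_size;
}
```

With the 2.8 default, commpage_gap(0xf7000000) is 0x08ff0000 (about 144 MiB); with the 2.11 default, commpage_gap(0xffff0000) is 0, so the whole 0xffff1000-byte span must be free at once.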
For some reason that I don't understand, with the smaller
`guest_size` the initial `mmap(NULL, guest_size, ...)` usually
returns an acceptable address range; but larger `guest_size` makes it
consistently return a block of memory that butts right up against
another already mapped chunk of memory. This isn't just true of the
older builds; it's also true of the 2.11 builds if I use the `-R` flag to
shrink the `guest_size` back down to 0xf7000000. This is with
linux-hardened 4.13.13 on x86-64.
So then it falls back to crawling the entire address space, so it
tries base=0x00001000. With ASLR, that probably succeeds. But with
ASLR disabled on static builds, the text segment is at
0x60000000, which does not leave room for the needed
0xffff1000-byte block before it. So then it tries base=0x00002000,
and so on, more than 6000 times until it finally gets to and passes
the text segment; calling mmap more than 12000 times.
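The fallback crawl can be modelled in a few lines of C, again assuming occupied host mappings are given as an array of [start, end) ranges rather than discovered through real mmap/munmap calls. `crawl()` is a hypothetical helper; the starting address, step and span are parameters, and the report's description corresponds roughly to starting at 0x1000, stepping by 0x1000 and needing a 0xffff1000-byte block. Each rejected candidate in the model stands for at least one real mmap call in QEMU's loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An already-occupied host mapping, [start, end). */
struct range { uint64_t start, end; };

/* True if [a, a + len) does not touch any occupied range. */
static bool is_free(uint64_t a, uint64_t len,
                    const struct range *used, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a < used[i].end && used[i].start < a + len) {
            return false;
        }
    }
    return true;
}

/* Linear search: try first, first + step, ... while base < limit.
 * Returns how many candidates were tried and stores the first base whose
 * [base, base + span) is free in *found (0 if none fit below limit). */
static uint64_t crawl(uint64_t first, uint64_t step, uint64_t span,
                      uint64_t limit, const struct range *used, size_t n,
                      uint64_t *found)
{
    uint64_t tries = 0;
    assert(step > 0);
    *found = 0;
    for (uint64_t base = first; base < limit; base += step) {
        tries++;
        if (is_free(base, span, used, n)) {
            *found = base;
            break;
        }
    }
    return tries;
}
```

The number of tries grows linearly with how far the first obstacle (here, the non-randomized text segment) sits from the starting address, which is what makes the crawl so expensive when ASLR is off.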
----
I'm not sure what the fix is. Perhaps try to mmap a contiguous chunk
of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
actually need. The disadvantage to that is that it does not support
the sparse address space that the current algorithm supports for
`guest_size < 0xffff0000`. If `guest_size < 0xffff0000` *and* the big
mmap fails, then it could fall back to a sparse search; though I'm not
sure the current algorithm is a good choice for it, as we see in this
bug. Perhaps it should inspect /proc/self/maps to try to find a
suitable range before ever calling mmap?
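The first idea above can be sketched in C as follows. `reserve_contiguous()` is a hypothetical helper, not QEMU's implementation, and it assumes a Linux host and page-aligned sizes. One deliberate deviation from the text: instead of munmap-ing the big reservation and then mmap-ing the two chunks (which leaves a window in which another thread could take the range), it keeps the reservation and remaps the commpage inside it with MAP_FIXED, which is safe only because those pages already belong to the process's own reservation. Sizes are parameters so the idea can be tried with small values; with the report's numbers, commpage_off would be 0xffff0000, giving a 0xffff1000-byte span on 4 KiB pages.

```c
#define _GNU_SOURCE /* MAP_ANONYMOUS / MAP_NORESERVE under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve [base, base + commpage_off + page) with a single mmap, make the
 * commpage read/write, and release the unused middle when
 * guest_size < commpage_off. Returns base, or NULL on failure. */
static uint8_t *reserve_contiguous(size_t guest_size, size_t commpage_off)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(guest_size % page == 0 && commpage_off % page == 0);
    assert(guest_size <= commpage_off);
    size_t span = commpage_off + page;

    /* Let the kernel pick any hole big enough for the whole span. */
    void *p = mmap(NULL, span, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    uint8_t *base = p;

    /* Replace the last page of our own reservation with a usable commpage. */
    if (mmap(base + commpage_off, page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {
        munmap(base, span);
        return NULL;
    }

    /* The guest region stays reserved as PROT_NONE; the gap between it
     * and the commpage need not stay reserved, so give it back. */
    if (guest_size < commpage_off) {
        munmap(base + guest_size, commpage_off - guest_size);
    }
    return base;
}
```

Because the kernel is asked for the whole span in one call, a single mmap either finds a suitable hole or fails outright; the sparse fallback the text mentions would still be needed for the failure case.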
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
2017-12-27 5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
` (6 preceding siblings ...)
2018-03-23 7:13 ` ChristianEhrhardt
@ 2018-03-23 16:19 ` LukeShu
2018-03-23 16:30 ` LukeShu
` (10 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: LukeShu @ 2018-03-23 16:19 UTC (permalink / raw)
To: qemu-devel
I don't believe so. The patchset applies cleanly on 2.11.0, and fixes
the issue there.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
2017-12-27 5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
` (7 preceding siblings ...)
2018-03-23 16:19 ` LukeShu
@ 2018-03-23 16:30 ` LukeShu
2018-04-03 9:49 ` ChristianEhrhardt
` (9 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: LukeShu @ 2018-03-23 16:30 UTC (permalink / raw)
To: qemu-devel
Oh, but it's worth noting that patch 1/10 had a mistake in it, which was
corrected when applied as 8756e1361d177e91dc6d88f37749b809fd2407fb.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
2017-12-27 5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
` (8 preceding siblings ...)
2018-03-23 16:30 ` LukeShu
@ 2018-04-03 9:49 ` ChristianEhrhardt
2018-04-03 20:17 ` LukeShu
` (8 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: ChristianEhrhardt @ 2018-04-03 9:49 UTC (permalink / raw)
To: qemu-devel
Back again,
my question was more about whether we can JUST take 2a53535af471f4bee9d6cb5b363746b8d5ed21dd without the rest.
We are already in Feature Freeze for Ubuntu 18.04, so we can either
a) wait for the next release and pick it up in full with the new qemu
version (we will do that anyway), or
b) identify a fix-only patch (without all the cleanup and reworks) that
will be good for 2.11.1 in Bionic.
Since this is "just slow" rather than broken, it gets harder to justify the closer we get to release (I hate that as well, being a performance engineer, but minimizing regressions is a goal too :-) ).
Essentially, to some extent, being in feature freeze is as if we were already under [1].
So will 2a53535af471f4bee9d6cb5b363746b8d5ed21dd alone be good in your opinion?
Or will it need more, and if so, what would be the minimal set of your changes?
[1]: https://wiki.ubuntu.com/StableReleaseUpdates
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
2017-12-27 5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
` (9 preceding siblings ...)
2018-04-03 9:49 ` ChristianEhrhardt
@ 2018-04-03 20:17 ` LukeShu
2018-04-04 13:33 ` ChristianEhrhardt
` (7 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: LukeShu @ 2018-04-03 20:17 UTC (permalink / raw)
To: qemu-devel
Yes, I believe that 2a53535af471f4bee9d6cb5b363746b8d5ed21dd alone is
good.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
2017-12-27 5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
` (10 preceding siblings ...)
2018-04-03 20:17 ` LukeShu
@ 2018-04-04 13:33 ` ChristianEhrhardt
2018-04-04 13:54 ` Peter Maydell
` (6 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: ChristianEhrhardt @ 2018-04-04 13:33 UTC (permalink / raw)
To: qemu-devel
Considering 2.12-rcX a release, I set the upstream status accordingly.
** Changed in: qemu
Status: New => Fix Released
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
2017-12-27 5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
` (11 preceding siblings ...)
2018-04-04 13:33 ` ChristianEhrhardt
@ 2018-04-04 13:54 ` Peter Maydell
2018-04-04 14:23 ` ChristianEhrhardt
` (5 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Peter Maydell @ 2018-04-04 13:54 UTC (permalink / raw)
To: qemu-devel
We don't generally mark bugs 'fix released' until the final (non-rc)
release is made.
** Changed in: qemu
Status: Fix Released => Fix Committed
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
2017-12-27 5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
` (12 preceding siblings ...)
2018-04-04 13:54 ` Peter Maydell
@ 2018-04-04 14:23 ` ChristianEhrhardt
2018-04-05 8:48 ` ChristianEhrhardt
` (4 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: ChristianEhrhardt @ 2018-04-04 14:23 UTC (permalink / raw)
To: qemu-devel
I wasn't sure whether you usually take the interim step to "Fix Committed";
thanks, Peter.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
2017-12-27 5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
` (13 preceding siblings ...)
2018-04-04 14:23 ` ChristianEhrhardt
@ 2018-04-05 8:48 ` ChristianEhrhardt
2018-04-05 21:52 ` LukeShu
` (3 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: ChristianEhrhardt @ 2018-04-05 8:48 UTC (permalink / raw)
To: qemu-devel
For Ubuntu: PPA: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3225
Regression tests against the PPA looked good tonight.
There are new changes I still need to add for two more bugs,
but testing from the PPA is already fine right now.
@Luke: Please test against this PPA, as I want to ensure it works
for your case before pushing to Bionic.
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219
Title:
static linux-user ARM emulation has several-second startup time
Status in QEMU:
Fix Committed
Status in qemu package in Ubuntu:
Triaged
Bug description:
static linux-user emulation has several-second startup time
My problem: I'm a Parabola packager, and I'm updating our
qemu-user-static package from 2.8 to 2.11. With my new
statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
went from taking 0.006s to 3s! This does not happen with the normal
dynamically linked 2.11, or the old static 2.8.
What happens is it gets stuck in
`linux-user/elfload.c:init_guest_space()`. What `init_guest_space`
does is map 2 parts of the address space: `[base, base+guest_size]`
and `[base+0xffff0000, base+0xffff0000+page_size]`; where it must find
an acceptable `base`. Its strategy is to `mmap(NULL, guest_size,
...)` decide where the first range is, and then check if that
+0xffff0000 is also available. If it isn't, then it starts trying
`mmap(base, ...)` for the entire address space from low-address to
high-address.
"Normally," it finds an accaptable `base` within the first 2 tries.
With a static 2.11, it's taking thousands of tries.
----
Now, from my understanding, there are 2 factors working together to
cause that in static 2.11 but not the other builds:
- 2.11 increased the default `guest_size` from 0xf7000000 to 0xffff0000
- PIE (and thus ASLR) is disabled for static builds
For some reason that I don't understand, with the smaller
`guest_size` the initial `mmap(NULL, guest_size, ...)` usually
returns an acceptable address range; but larger `guest_size` makes it
consistently return a block of memory that butts right up against
another already mapped chunk of memory. This isn't just true of the
older builds; it's also true of the 2.11 builds if I use the `-R` flag to
shrink the `guest_size` back down to 0xf7000000. This is with
linux-hardened 4.13.13 on x86-64.
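One concrete consequence of the larger default can be checked with plain arithmetic. The short Python sketch below (helper name is hypothetical) computes the space left between the end of the guest range and the commpage at base+0xffff0000: with 0xf7000000 there are roughly 144 MiB in between, which other mappings may occupy, whereas with 0xffff0000 the two regions touch and can only fit inside a single contiguous 0xffff1000-byte hole. This does not by itself explain which address the kernel picks first.

```python
# Arithmetic from the report: where the guest range ends relative to the commpage.
COMMPAGE_OFFSET = 0xFFFF0000  # commpage at base + 0xffff0000
PAGE = 0x1000                 # assumed host page size


def layout(guest_size):
    """Return (gap, touching) for the guest range [base, base + guest_size).

    gap is the number of unused bytes between the end of the guest range and
    the commpage; touching is True when the two regions are adjacent, so they
    can only be placed inside one free hole of COMMPAGE_OFFSET + PAGE bytes.
    """
    gap = COMMPAGE_OFFSET - guest_size
    return gap, gap == 0


for size in (0xF7000000, 0xFFFF0000):  # old default vs. 2.11 default
    gap, touching = layout(size)
    print(f"guest_size={size:#x}: gap={gap:#x} ({gap / 2**20:.2f} MiB), "
          f"adjacent to commpage: {touching}")
```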
So then it falls back to crawling the entire address space, starting
with base=0x00001000. With ASLR, that probably succeeds. But with
ASLR disabled on static builds, the text segment is at
0x60000000, which does not leave room for the needed
0xffff1000-size block before it. So then it tries base=0x00002000.
And so on, more than 6000 times, until it finally reaches and passes
the text segment, calling mmap more than 12000 times.
----
I'm not sure what the fix is. Perhaps try to mmap a contiguous chunk
of size 0xffff1000, then munmap it and then mmap the 2 chunks that we
actually need. The disadvantage to that is that it does not support
the sparse address space that the current algorithm supports for
`guest_size < 0xffff0000`. If `guest_size < 0xffff0000` *and* the big
mmap fails, then it could fall back to a sparse search; though I'm not
sure the current algorithm is a good choice for it, as we see in this
bug. Perhaps it should inspect /proc/self/maps to try to find a
suitable range before ever calling mmap?
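The /proc/self/maps idea can be sketched in Python as a first-fit search for one free hole large enough for the guest range plus the commpage (0xffff1000 bytes). The helper names, the `low` floor standing in for vm.mmap_min_addr, and the 47-bit user-space limit are all assumptions for illustration; this is not claimed to be the fix that was eventually merged.

```python
# Sketch (hypothetical helpers, not the merged QEMU fix) of the /proc/self/maps
# idea: look for one free hole big enough for guest range + commpage.
import os

PAGE = 0x1000                    # assumed host page size
COMMPAGE_OFFSET = 0xFFFF0000     # commpage at base + 0xffff0000
NEEDED = COMMPAGE_OFFSET + PAGE  # 0xffff1000 contiguous bytes


def parse_maps(text):
    """Parse /proc/<pid>/maps text into sorted (start, end) tuples."""
    ranges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        start, end = line.split()[0].split("-")   # e.g. "60000000-60200000"
        ranges.append((int(start, 16), int(end, 16)))
    return sorted(ranges)


def find_hole(ranges, size, low=0x10000, limit=1 << 47):
    """Lowest page-aligned address >= low with `size` free bytes below limit.

    low stands in for vm.mmap_min_addr and limit for the top of user space;
    both defaults are assumptions. Returns None if no hole is large enough.
    """
    candidate = low
    for start, end in ranges:
        if start >= limit:
            break                     # e.g. [vsyscall], above user space
        if end <= candidate:
            continue                  # mapping lies entirely below candidate
        if start - candidate >= size:
            return candidate          # gap before this mapping is big enough
        # Skip past this mapping, rounded up to a page boundary.
        candidate = max(candidate, (end + PAGE - 1) & ~(PAGE - 1))
    return candidate if limit - candidate >= size else None


if os.path.exists("/proc/self/maps"):   # Linux only
    with open("/proc/self/maps") as maps:
        base = find_hole(parse_maps(maps.read()), NEEDED)
    print("first suitable base:", hex(base) if base is not None else None)
```

Reading the map and then calling mmap() is racy if another thread maps memory in between, so a real implementation would still have to check where mmap() actually placed the mapping.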
To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
From: LukeShu @ 2018-04-05 21:52 UTC (permalink / raw)
To: qemu-devel
I'm not on a Debian/Ubuntu-like system, but I extracted
usr/bin/qemu-arm-static from data.tar.xz in
qemu-user-static_2.11+dfsg-1ubuntu6~ppa3_amd64.deb
and tested with that binary:
$ time usr/bin/qemu-arm-static /var/lib/archbuild/dbscripts@armv7h/luke/usr/bin/ldconfig --help
Usage: ldconfig [OPTION...]
...
<https://github.com/archlinuxarm/PKGBUILDs/issues>.
real 0m0.068s
user 0m0.067s
sys 0m0.000s
That is: LGTM.
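For packagers who want to repeat this kind of before/after check, a small Python helper can run a command several times and report the median wall-clock time. The helper name and the paths in the usage line are placeholders, not part of any existing tool.

```python
# Hypothetical helper for repeating the startup-time check above.
import statistics
import subprocess
import time


def median_runtime(cmd, runs=5):
    """Run cmd `runs` times and return the median wall-clock time in seconds.

    Output is discarded; check=True raises if the command exits non-zero.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

For example, `median_runtime(["usr/bin/qemu-arm-static", "/path/to/arm-root/usr/bin/ldconfig", "--help"])`. Taking the median of a few runs smooths out one-off noise such as a cold page cache; a multi-second regression like the one in this bug stands out either way.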
--
Title:
static linux-user ARM emulation has several-second startup time
Status in QEMU:
Fix Committed
Status in qemu package in Ubuntu:
Triaged
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
From: ChristianEhrhardt @ 2018-04-06 6:12 UTC (permalink / raw)
To: qemu-devel
Thanks Luke.
I tried the same with a binary from the Bionic libc .deb for ARM.
Down from
real 0m2.031s
to
real 0m0.002s
So confirmed as well.
** Changed in: qemu (Ubuntu)
Status: Triaged => In Progress
--
Title:
static linux-user ARM emulation has several-second startup time
Status in QEMU:
Fix Committed
Status in qemu package in Ubuntu:
In Progress
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
From: Launchpad Bug Tracker @ 2018-04-09 18:07 UTC (permalink / raw)
To: qemu-devel
This bug was fixed in the package qemu - 1:2.11+dfsg-1ubuntu6
---------------
qemu (1:2.11+dfsg-1ubuntu6) bionic; urgency=medium
* Remove LP: 1752026 changes to d/p/ubuntu/define-ubuntu-machine-types.patch.
The Kernel fixes are preferred and already committed to the kernel.
Therefore remove the default disabling of the HTM feature (LP: #1761175)
* d/p/ubuntu/lp1739665-SSE-AVX-AVX512-cpu-features.patch: Enable new
SSE/AVX/AVX512 cpu features (LP: #1739665)
* d/p/ubuntu/lp1740219-continuous-space-commpage.patch: make Arm
space+commpage continuous which avoids long startup times on
qemu-user-static (LP: #1740219)
* d/p/ubuntu/lp-1761372-*: provide pseries-bionic-2.11-sxxm type as
convenience with all meltdown/spectre workarounds enabled by default.
This is not the default type following upstream and x86 on that.
(LP: #1761372).
* d/p/ubuntu/lp-1704312-1-* provide means to manually handle filesystem-dax
with pmem by backporting align and unarmed options (LP: #1704312).
* d/p/ubuntu/lp-1762315-slirp-Add-domainname.patch: slirp: Add domainname
option to slirp's DHCP server (LP: #1762315)
-- Christian Ehrhardt <christian.ehrhardt@canonical.com>  Wed, 04 Apr 2018 15:16:07 +0200
** Changed in: qemu (Ubuntu)
Status: In Progress => Fix Released
--
Title:
static linux-user ARM emulation has several-second startup time
Status in QEMU:
Fix Committed
Status in qemu package in Ubuntu:
Fix Released
* [Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time
From: Thomas Huth @ 2018-04-26 10:36 UTC (permalink / raw)
To: qemu-devel
** Changed in: qemu
Status: Fix Committed => Fix Released
--
Title:
static linux-user ARM emulation has several-second startup time
Status in QEMU:
Fix Released
Status in qemu package in Ubuntu:
Fix Released
Thread overview: 20+ messages
2017-12-27 5:54 [Qemu-devel] [Bug 1740219] [NEW] static linux-user emulation has several-second startup time LukeShu
2017-12-27 6:18 ` [Qemu-devel] [Bug 1740219] " LukeShu
2018-01-03 23:00 ` [Qemu-devel] [Bug 1740219] Re: static linux-user ARM " LukeShu
2018-03-20 8:36 ` ChristianEhrhardt
2018-03-20 8:37 ` ChristianEhrhardt
2018-03-20 18:19 ` LukeShu
2018-03-22 20:16 ` LukeShu
2018-03-23 7:13 ` ChristianEhrhardt
2018-03-23 16:19 ` LukeShu
2018-03-23 16:30 ` LukeShu
2018-04-03 9:49 ` ChristianEhrhardt
2018-04-03 20:17 ` LukeShu
2018-04-04 13:33 ` ChristianEhrhardt
2018-04-04 13:54 ` Peter Maydell
2018-04-04 14:23 ` ChristianEhrhardt
2018-04-05 8:48 ` ChristianEhrhardt
2018-04-05 21:52 ` LukeShu
2018-04-06 6:12 ` ChristianEhrhardt
2018-04-09 18:07 ` Launchpad Bug Tracker
2018-04-26 10:36 ` Thomas Huth