* Windows slow boot: contractor wanted
@ 2012-08-16 10:47 ` Richard Davies
0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-16 10:47 UTC (permalink / raw)
To: qemu-devel, kvm
Hi,
We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a
contractor to track down and fix problems we have with large memory Windows
guests booting very slowly - they can take several hours.
We previously reported these problems in July (copied below) and they are
still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1.
This is a serious issue for us which is causing significant pain to our
larger Windows VM customers when their servers are offline for many hours
during boot.
If anyone knowledgeable in the area would be interested in being paid to
work on this, or if you know someone who might be, I would be delighted to
hear from you.
Cheers,
Richard.
===== Previous bug report
http://marc.info/?l=qemu-devel&m=134304194329745
We have been experiencing this problem for a while now too, using qemu-kvm
(currently at 1.1.1).
Unfortunately, hv_relaxed doesn't seem to fix it. The following command line
produces the issue:
qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test.img
The hardware consists of dual AMD Opteron 6128 processors (16 cores in
total) and 64GB of memory. This command line was tested on kernel 3.1.4.
I've also tested with -no-hpet.
What I have seen is much as described: the memory fills up slowly, and top
on the host shows the process using 100% on all allocated CPU cores. The
most extreme case was a machine which took between 6 and 8 hours to boot.
This seems to be related to the assigned memory, as described, but also to
the number of processor cores (which makes sense if we believe it's a timing
issue). I have seen slow-booting guests improve after switching down to two
or even a single core.
Matthew, I agree that this seems to be linked to the number of VMs running -
in fact, shutting down other VMs on a dedicated test host caused the machine
to start booting at a normal speed (with no reboot required).
However, the level of contention is never such that this could be explained
by the host simply being overcommitted.
If it helps anyone, there's an image of the hard drive I've been using to
test at:
http://46.20.114.253/
It's a 5GB gzip file containing a fairly standard Windows 2008 trial
installation. Since it's in the trial period, anyone who wants to use it may
have to re-arm the trial: http://support.microsoft.com/kb/948472
Please let me know if I can provide any more information, or test anything.
Best wishes,
Owen Tuz
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-16 10:47 ` [Qemu-devel] " Richard Davies
@ 2012-08-16 11:39 ` Avi Kivity
From: Avi Kivity @ 2012-08-16 11:39 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm
On 08/16/2012 01:47 PM, Richard Davies wrote:
> Hi,
>
> We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a
> contractor to track down and fix problems we have with large memory Windows
> guests booting very slowly - they can take several hours.
>
> We previously reported these problems in July (copied below) and they are
> still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1.
>
> This is a serious issue for us which is causing significant pain to our
> larger Windows VM customers when their servers are offline for many hours
> during boot.
>
> If anyone knowledgeable in the area would be interested in being paid to
> work on this, or if you know someone who might be, I would be delighted to
> hear from you.
>
I happen to be gainfully employed but maybe I can help. Can you collect
a trace during the slow boot period and post it somewhere? See
http://www.linux-kvm.org/page/Tracing for instructions.
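For anyone following along, the capture described on that page boils down to something like the sketch below. The flag values are assumptions taken from trace-cmd(1), recording KVM tracepoints needs root, and the script skips gracefully where it cannot run:

```shell
# Sketch of a KVM tracepoint capture along the lines of the Tracing page.
# Bail out harmlessly on hosts without trace-cmd or root access.
command -v trace-cmd >/dev/null 2>&1 || { echo "trace-cmd not installed"; exit 0; }
[ "$(id -u)" -eq 0 ] || { echo "need root to record tracepoints"; exit 0; }

# Record all kvm:* events for ~60s while the guest is booting slowly.
# -b is the per-CPU buffer size in KB (an assumed value, tune as needed).
trace-cmd record -b 20000 -e kvm -o trace.dat sleep 60

# Convert the binary trace into a text report that can be shared.
trace-cmd report -i trace.dat > trace-report.txt
echo "wrote trace-report.txt"
```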
4G/8way is not a particularly large guest. What is the host
configuration (memory, core count)?
--
error compiling committee.c: too many arguments to function
* Re: Windows slow boot: contractor wanted
2012-08-16 10:47 ` [Qemu-devel] " Richard Davies
@ 2012-08-16 14:10 ` Benoît Canet
From: Benoît Canet @ 2012-08-16 14:10 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm
Le Thursday 16 Aug 2012 à 11:47:27 (+0100), Richard Davies a écrit :
> Hi,
>
> We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a
> contractor to track down and fix problems we have with large memory Windows
> guests booting very slowly - they can take several hours.
>
> We previously reported these problems in July (copied below) and they are
> still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1.
>
> This is a serious issue for us which is causing significant pain to our
> larger Windows VM customers when their servers are offline for many hours
> during boot.
>
> If anyone knowledgeable in the area would be interested in being paid to
> work on this, or if you know someone who might be, I would be delighted to
> hear from you.
>
> Cheers,
>
> Richard.
>
>
> ===== Previous bug report
>
> http://marc.info/?l=qemu-devel&m=134304194329745
>
>
> We have been experiencing this problem for a while now too, using qemu-kvm
> (currently at 1.1.1).
>
> Unfortunately, hv_relaxed doesn't seem to fix it. The following command line
> produces the issue:
>
> qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test.img
>
> The hardware consists of dual AMD Opteron 6128 processors (16 cores in
> total) and 64GB of memory. This command line was tested on kernel 3.1.4.
>
> I've also tested with -no-hpet.
>
> What I have seen is much as described: the memory fills out slowly, and top
> on the host will show the process using 100% on all allocated CPU cores. The
> most extreme case was a machine which took something between 6 and 8 hours
> to boot.
>
> This seems to be related to the assigned memory, as described, but also the
> number of processor cores (which makes sense if we believe it's a timing
> issue?). I have seen slow-booting guests improved by switching down to a
> single or even two cores.
>
> Matthew, I agree that this seems to be linked to the number of VMs running -
> in fact, shutting down other VMs on a dedicated test host caused the machine
> to start booting at a normal speed (with no reboot required).
>
> However, the level of contention is never such that this could be explained
> by the host simply being overcommitted.
>
> If it helps anyone, there's an image of the hard drive I've been using to
> test at:
>
> http://46.20.114.253/
>
> It's 5G of gzip file containing a fairly standard Windows 2008 trial
> installation. Since it's in the trial period, anyone who wants to use it may
> have to re-arm the trial: http://support.microsoft.com/kb/948472
>
> Please let me know if I can provide any more information, or test anything.
For info, the image boots pretty fast with qemu-kvm 1.1.1 and a 3.2.0-29
Ubuntu kernel on a Core i7 with these parameters.
Benoît
>
> Best wishes,
>
> Owen Tuz
>
* Re: [Qemu-devel] Windows slow boot: contractor wanted
2012-08-16 10:47 ` [Qemu-devel] " Richard Davies
@ 2012-08-16 15:53 ` Troy Benjegerdes
From: Troy Benjegerdes @ 2012-08-16 15:53 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm
I'd be interested in working on this.. What I'd like to propose is to write
an automated regression test harness that will reboot the host hardware, and
start booting up guest VMs and report the time-to-boot, as well as relative
performance of the running VMs.
For best results, I'd need access to the specific hardware you are using.
I'd also like to release the test harness back to the community, so I would
like some feedback from the mailing list on what kinds of tests should be
written that would provide the best information for the KVM developers.
What do you want to know, and what is the most useful data to record to
debug this and future performance regressions?
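A minimal sketch of the timing loop such a harness would need is below. The readiness probe is a placeholder: a real harness might ping the guest or poke RDP on port 3389, and the qemu invocation shown in the comment is only illustrative:

```shell
# Hypothetical core of the proposed harness: record a start time, poll a
# readiness check until it succeeds, and report elapsed seconds as the
# guest's time-to-boot.
time_until_ready() {
    start=$(date +%s)
    until "$@"; do
        sleep 1
    done
    echo $(( $(date +%s) - start ))
}

# Usage sketch (guest IP and qemu arguments are placeholders):
#   qemu-kvm -m 40960 -smp 8 -hda test1.raw ... &
#   time_until_ready ping -c1 -W1 GUEST_IP
# Demo with a probe that succeeds immediately:
elapsed=$(time_until_ready true)
echo "time-to-boot: ${elapsed}s"
```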
On Thu, Aug 16, 2012 at 11:47:27AM +0100, Richard Davies wrote:
> Hi,
>
> We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a
> contractor to track down and fix problems we have with large memory Windows
> guests booting very slowly - they can take several hours.
>
> We previously reported these problems in July (copied below) and they are
> still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1.
>
> This is a serious issue for us which is causing significant pain to our
> larger Windows VM customers when their servers are offline for many hours
> during boot.
>
> If anyone knowledgeable in the area would be interested in being paid to
> work on this, or if you know someone who might be, I would be delighted to
> hear from you.
>
> Cheers,
>
> Richard.
>
>
> ===== Previous bug report
>
> http://marc.info/?l=qemu-devel&m=134304194329745
>
>
> We have been experiencing this problem for a while now too, using qemu-kvm
> (currently at 1.1.1).
>
> Unfortunately, hv_relaxed doesn't seem to fix it. The following command line
> produces the issue:
>
> qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test.img
>
> The hardware consists of dual AMD Opteron 6128 processors (16 cores in
> total) and 64GB of memory. This command line was tested on kernel 3.1.4.
>
> I've also tested with -no-hpet.
>
> What I have seen is much as described: the memory fills out slowly, and top
> on the host will show the process using 100% on all allocated CPU cores. The
> most extreme case was a machine which took something between 6 and 8 hours
> to boot.
>
> This seems to be related to the assigned memory, as described, but also the
> number of processor cores (which makes sense if we believe it's a timing
> issue?). I have seen slow-booting guests improved by switching down to a
> single or even two cores.
>
> Matthew, I agree that this seems to be linked to the number of VMs running -
> in fact, shutting down other VMs on a dedicated test host caused the machine
> to start booting at a normal speed (with no reboot required).
>
> However, the level of contention is never such that this could be explained
> by the host simply being overcommitted.
>
> If it helps anyone, there's an image of the hard drive I've been using to
> test at:
>
> http://46.20.114.253/
>
> It's 5G of gzip file containing a fairly standard Windows 2008 trial
> installation. Since it's in the trial period, anyone who wants to use it may
> have to re-arm the trial: http://support.microsoft.com/kb/948472
>
> Please let me know if I can provide any more information, or test anything.
>
> Best wishes,
>
> Owen Tuz
>
* Re: Windows slow boot: contractor wanted
2012-08-16 11:39 ` [Qemu-devel] " Avi Kivity
@ 2012-08-17 12:36 ` Richard Davies
From: Richard Davies @ 2012-08-17 12:36 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm
Hi Avi,
Thanks to you and several others for offering help. We will work with Avi at
first, but are grateful for all the other offers of help. We have a number
of other qemu-related projects which we'd be interested in getting done, and
will get in touch with these names (and anyone else who comes forward) to
see if any are of interest to you.
This slow boot problem is intermittent and varies in how slow the boots are,
but I managed to trigger it this morning with moderately slow booting (5-10
minutes) and link to the requested traces below.
The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
In this morning's test, we have 3 guests, all booting Windows with 40GB RAM
and 8 cores each (we have seen small VMs go slow as I originally said, but
it is easier to trigger with big VMs):
pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
-vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
-vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw
pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
-vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw
We are running with hv_relaxed since this was suggested in the previous
thread, but we see intermittent slow boots with and without this flag.
All 3 VMs are booting slowly for most of the attached capture, which I
started after confirming the slow boots and stopped as soon as the first of
them (15665) had booted. In terms of visible symptoms, the VMs are showing
the Windows boot progress bar, which is moving very slowly. In top, the VMs
are at 400% CPU and their resident set size (RES) is slowly counting up
until it reaches the full VM size, at which point they finish booting.
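The RES counter described above can also be sampled directly from /proc for logging. A small helper along these lines (the pid is whatever ps reports for the guest; the current shell stands in as a demo) shows how fast guest memory is being faulted in:

```shell
# Print the resident set size of a pid in kB (the same counter top shows
# as RES) by reading the VmRSS line from /proc/<pid>/status.
sample_rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Demo on the current shell. For the slow-boot case you would pass a
# qemu-kvm pid (e.g. 15665 above) and sample in a loop:
#   while sleep 5; do date +%T; sample_rss_kb 15665; done
echo "pid $$ RES: $(sample_rss_kb $$) kB"
```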
Here are the trace files:
http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root)
http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow)
http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd)
http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file)
http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report)
Please let me know if there is anything else I can provide.
Thank you,
Richard.
* Re: Windows slow boot: contractor wanted
2012-08-17 12:36 ` [Qemu-devel] " Richard Davies
@ 2012-08-17 13:02 ` Robert Vineyard
From: Robert Vineyard @ 2012-08-17 13:02 UTC (permalink / raw)
To: Richard Davies; +Cc: Avi Kivity, qemu-devel, kvm
Richard,
Not sure if you've tried this, but I noticed massive performance gains
(easily booting 2-3 times as fast) by converting from RAW disk images to
direct-mapped raw partitions and making sure that IOMMU support was
enabled in the BIOS and in the kernel at boot time. The obvious downside
to using raw partitions is a loss of flexibility and portability across
physical machines, but in some cases the trade-offs may be worth it.
I never ran any formal benchmarks, but it "felt" like about a 50%
performance boost going from RAW disk images to raw partitions (don't
even think about using QCOW2 disk images for Windows, your VM's will
still be booting next week...). The real gains, which I can't yet fully
explain, came from passing "iommu=on intel_iommu=on" to the host kernel
on bootup. I believe the boot option to enable IOMMU support may be
different on AMD hardware.
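As a quick check (the grep patterns are assumptions about the usual Intel VT-d and AMD-Vi kernel messages; on AMD hardware the boot option is amd_iommu rather than intel_iommu, and dmesg may need root), the host's IOMMU state can be inspected like so:

```shell
# Show the boot options the host kernel actually received.
cat /proc/cmdline

# Look for IOMMU initialisation messages; fall back to a note if the
# kernel log is unreadable or contains no matches.
dmesg 2>/dev/null | grep -iE 'iommu|amd-vi|dmar' || echo "no IOMMU messages found"
```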
Granted, this is on a much smaller VM than you're using (Windows 7 x64
with two vCPUs and 4gb of vRAM), but might be worth investigating.
Good luck!
-- Robert Vineyard
On 08/17/2012 08:36 AM, Richard Davies wrote:
> Hi Avi,
>
> Thanks to you and several others for offering help. We will work with Avi at
> first, but are grateful for all the other offers of help. We have a number
> of other qemu-related projects which we'd be interested in getting done, and
> will get in touch with these names (and anyone else who comes forward) to
> see if any are of interest to you.
>
>
> This slow boot problem is intermittent and varys in how slow the boots are,
> but I managed to trigger it this morning with medium slow booting (5-10
> minutes) and link to the requested traces below.
>
> The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
>
> In this morning's test, we have 3 guests, all booting Windows with 40GB RAM
> and 8 cores each (we have seen small VMs go slow as I originally said, but
> it is easier to trigger with big VMs):
>
> pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
> pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw
> pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw
>
> We are running with hv_relaxed since this was suggested in the previous
> thread, but we see intermittent slow boots with and without this flag.
>
>
> All 3 VMs are booting slowly for most of the attached capture, which I
> started after confirming the slow boots and stopped as soon as the first of
> them (15665) had booted. In terms of visible symptoms, the VMs are showing
> the Windows boot progress bar, which is moving very slowly. In top, the VMs
> are at 400% CPU and their resident state size (RES) memory is slowly
> counting up until it reaches the full VM size, at which point they finish
> booting.
>
>
> Here are the trace files:
>
> http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root)
> http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow)
> http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd)
> http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file)
> http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report)
>
>
> Please let me know if there is anything else which I can provide?
>
> Thank you,
>
> Richard.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
* Re: Windows slow boot: contractor wanted
2012-08-17 13:02 ` [Qemu-devel] " Robert Vineyard
@ 2012-08-18 14:44 ` Richard Davies
From: Richard Davies @ 2012-08-18 14:44 UTC (permalink / raw)
To: Robert Vineyard; +Cc: Avi Kivity, kvm, qemu-devel
Hi Robert,
Robert Vineyard wrote:
> Not sure if you've tried this, but I noticed massive performance
> gains (easily booting 2-3 times as fast) by converting from RAW disk
> images to direct-mapped raw partitions and making sure that IOMMU
> support was enabled in the BIOS and in the kernel at boot time.
Thanks for the suggestions, but unfortunately we do have IOMMU support
enabled, and in production (rather than this test case) we run from LVM
LVs, which are effectively direct raw partitions and still have this slow
boot problem.
Thanks anyway,
Richard.
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-17 12:36 ` [Qemu-devel] " Richard Davies
@ 2012-08-19 5:02 ` Brian Jackson
-1 siblings, 0 replies; 101+ messages in thread
From: Brian Jackson @ 2012-08-19 5:02 UTC (permalink / raw)
To: Richard Davies; +Cc: Avi Kivity, kvm, qemu-devel
[-- Attachment #1: Type: text/plain, Size: 2897 bytes --]
On Friday 17 August 2012 07:36:42 Richard Davies wrote:
> Hi Avi,
>
> Thanks to you and several others for offering help. We will work with Avi
> at first, but are grateful for all the other offers of help. We have a
> number of other qemu-related projects which we'd be interested in getting
> done, and will get in touch with these names (and anyone else who comes
> forward) to see if any are of interest to you.
>
>
> This slow boot problem is intermittent and varies in how slow the boots are,
> but I managed to trigger it this morning with medium slow booting (5-10
> minutes) and link to the requested traces below.
>
> The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
>
> In this morning's test, we have 3 guests, all booting Windows with 40GB RAM
> and 8 cores each (we have seen small VMs go slow as I originally said, but
> it is easier to trigger with big VMs):
>
> pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
> pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw
> pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw
>
> We are running with hv_relaxed since this was suggested in the previous
> thread, but we see intermittent slow boots with and without this flag.
>
>
> All 3 VMs are booting slowly for most of the attached capture, which I
> started after confirming the slow boots and stopped as soon as the first of
> them (15665) had booted. In terms of visible symptoms, the VMs are showing
> the Windows boot progress bar, which is moving very slowly. In top, the VMs
> are at 400% CPU and their resident set size (RES) memory is slowly
> counting up until it reaches the full VM size, at which point they finish
> booting.
What memory options have you tried? (KSM, hugepages, -mem-prealloc)?
Is this only with 2008? (is that regular? R2?)
Have you tried any of the hyperv features/hints?
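The KSM and hugepage questions above can be answered directly from sysfs. A minimal sketch, assuming a mainline kernel layout (the paths are the standard locations; the values printed differ per host):

```shell
#!/bin/sh
# Read the standard sysfs knobs for KSM and transparent hugepages.
checked=0
for f in /sys/kernel/mm/ksm/run \
         /sys/kernel/mm/transparent_hugepage/enabled; do
    if [ -r "$f" ]; then
        printf '%s: %s\n' "$f" "$(cat "$f")"
    else
        printf '%s: not present on this kernel\n' "$f"
    fi
    checked=$((checked + 1))
done
```

For KSM, `1` means the merge daemon is running; for transparent hugepages, the bracketed word in the output marks the active mode.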
>
>
> Here are the trace files:
>
> http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root)
> http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow)
> http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd)
> http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file)
> http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report)
>
>
> Please let me know if there is anything else which I can provide?
>
> Thank you,
>
> Richard.
[-- Attachment #2: Type: text/html, Size: 13237 bytes --]
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-17 12:36 ` [Qemu-devel] " Richard Davies
@ 2012-08-19 8:40 ` Avi Kivity
-1 siblings, 0 replies; 101+ messages in thread
From: Avi Kivity @ 2012-08-19 8:40 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm
On 08/17/2012 03:36 PM, Richard Davies wrote:
> Hi Avi,
>
> Thanks to you and several others for offering help. We will work with Avi at
> first, but are grateful for all the other offers of help. We have a number
> of other qemu-related projects which we'd be interested in getting done, and
> will get in touch with these names (and anyone else who comes forward) to
> see if any are of interest to you.
>
>
> This slow boot problem is intermittent and varies in how slow the boots are,
> but I managed to trigger it this morning with medium slow booting (5-10
> minutes) and link to the requested traces below.
>
> The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
>
> In this morning's test, we have 3 guests, all booting Windows with 40GB RAM
> and 8 cores each (we have seen small VMs go slow as I originally said, but
> it is easier to trigger with big VMs):
>
> pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
> pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw
> pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw
>
40+40+40=120, pretty close to your server specs. Are you swapping?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-19 8:40 ` [Qemu-devel] " Avi Kivity
@ 2012-08-19 8:51 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-19 8:51 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm
Avi Kivity wrote:
> Richard Davies wrote:
> > The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> > total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
> >
> > In this morning's test, we have 3 guests, all booting Windows with 40GB RAM
> > and 8 cores each (we have seen small VMs go slow as I originally said, but
> > it is easier to trigger with big VMs):
>
> 40+40+40=120, pretty close to your server specs. Are you swapping?
No - you can see on the "top" screenshot that there's no swap in use.
Richard.
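The no-swap observation can also be confirmed straight from /proc rather than from a top screenshot. A small sketch (standard /proc/meminfo fields, nothing thread-specific assumed):

```shell
#!/bin/sh
# Print swap totals directly from the kernel's memory accounting.
if [ -r /proc/meminfo ]; then
    grep -E '^Swap(Total|Free):' /proc/meminfo
else
    echo "/proc/meminfo not available on this system"
fi
```

If SwapFree equals SwapTotal while the guests are booting, no swapping is taking place.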
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-17 12:36 ` [Qemu-devel] " Richard Davies
@ 2012-08-19 14:04 ` Avi Kivity
-1 siblings, 0 replies; 101+ messages in thread
From: Avi Kivity @ 2012-08-19 14:04 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm
On 08/17/2012 03:36 PM, Richard Davies wrote:
> Hi Avi,
>
> Thanks to you and several others for offering help. We will work with Avi at
> first, but are grateful for all the other offers of help. We have a number
> of other qemu-related projects which we'd be interested in getting done, and
> will get in touch with these names (and anyone else who comes forward) to
> see if any are of interest to you.
>
>
> This slow boot problem is intermittent and varies in how slow the boots are,
> but I managed to trigger it this morning with medium slow booting (5-10
> minutes) and link to the requested traces below.
>
> The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
>
> In this morning's test, we have 3 guests, all booting Windows with 40GB RAM
> and 8 cores each (we have seen small VMs go slow as I originally said, but
> it is easier to trigger with big VMs):
>
> pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
> pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw
> pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw
>
> We are running with hv_relaxed since this was suggested in the previous
> thread, but we see intermittent slow boots with and without this flag.
>
>
> All 3 VMs are booting slowly for most of the attached capture, which I
> started after confirming the slow boots and stopped as soon as the first of
> them (15665) had booted. In terms of visible symptoms, the VMs are showing
> the Windows boot progress bar, which is moving very slowly. In top, the VMs
> are at 400% CPU and their resident set size (RES) memory is slowly
> counting up until it reaches the full VM size, at which point they finish
> booting.
>
>
> Here are the trace files:
>
> http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root)
> http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow)
> http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd)
> http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file)
> http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report)
>
>
> Please let me know if there is anything else which I can provide?
There are tons of PAUSE exits indicating cpu overcommit (and indeed you
are overcommitted by about 50%).
What host kernel version are you running?
Does this reproduce without overcommit?
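The PAUSE-exit observation above comes from counting exit reasons in the trace report. A hedged sketch of that kind of check — the sample lines below are fabricated for illustration, since the exact kvm_exit format varies by kernel version and CPU vendor (SVM hosts like this Opteron report pause exits):

```shell
#!/bin/sh
# Count PAUSE exits relative to all VM exits in a (mock) trace report.
cat > sample-report.txt <<'EOF'
qemu-kvm-15665 [003] 1000.000001: kvm_exit: reason pause
qemu-kvm-15665 [003] 1000.000002: kvm_exit: reason io
qemu-kvm-15676 [005] 1000.000003: kvm_exit: reason pause
EOF
pause=$(grep -c 'kvm_exit: reason pause$' sample-report.txt)
total=$(grep -c 'kvm_exit:' sample-report.txt)
echo "$pause of $total exits are PAUSE exits"   # prints: 2 of 3 exits are PAUSE exits
```

Against a real multi-gigabyte trace-report.txt the same greps reveal how dominant spin-loop exits are.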
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-19 5:02 ` [Qemu-devel] " Brian Jackson
@ 2012-08-20 8:16 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-20 8:16 UTC (permalink / raw)
To: Brian Jackson; +Cc: Avi Kivity, kvm, qemu-devel
Brian Jackson wrote:
> Richard Davies wrote:
> > The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> > total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
> >
> > In this morning's test, we have 3 guests, all booting Windows with 40GB RAM
> > and 8 cores each (we have seen small VMs go slow as I originally said, but
> > it is easier to trigger with big VMs):
> >
> > pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> > -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
> > pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> > -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw
> > pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> > -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw
>
> What memory options have you tried? (KSM, hugepages, -mem-prealloc)?
The host kernel has KSM and CONFIG_TRANSPARENT_HUGEPAGE=y and
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y.
Our qemu-kvm command lines are as above, so we aren't using -mem-prealloc.
We'll try that.
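For reference, a hedged sketch of what the first guest's command line might look like with preallocated, hugepage-backed memory — the mount point and hugepage reservation are assumptions, not taken from the thread, and in qemu-kvm of this era -mem-prealloc is used together with -mem-path:

```shell
# Assumed host setup first (20480 x 2MB pages covers the 40GB guest):
#   echo 20480 > /proc/sys/vm/nr_hugepages
#   mount -t hugetlbfs none /dev/hugepages
qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
    -mem-path /dev/hugepages -mem-prealloc \
    -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
```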
> Is this only with 2008? (is that regular? R2?)
It is intermittent. We definitely see it with 2008 R2, and I believe with
2008 as well. We don't have many customers running earlier versions of
Windows.
> Have you tried any of the hyperv features/hints?
We have tried "-cpu host" and "-cpu host,hv_relaxed" as above, which both
exhibit the bug.
What other hyperv options do you think we should try?
Richard.
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-19 14:04 ` [Qemu-devel] " Avi Kivity
@ 2012-08-20 13:56 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-20 13:56 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm
Avi Kivity wrote:
> Richard Davies wrote:
> > Hi Avi,
> >
> > Thanks to you and several others for offering help. We will work with Avi at
> > first, but are grateful for all the other offers of help. We have a number
> > of other qemu-related projects which we'd be interested in getting done, and
> > will get in touch with these names (and anyone else who comes forward) to
> > see if any are of interest to you.
> >
> >
> > This slow boot problem is intermittent and varies in how slow the boots are,
> > but I managed to trigger it this morning with medium slow booting (5-10
> > minutes) and link to the requested traces below.
> >
> > The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores
> > total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
> >
> > In this morning's test, we have 3 guests, all booting Windows with 40GB RAM
> > and 8 cores each (we have seen small VMs go slow as I originally said, but
> > it is easier to trigger with big VMs):
> >
> > pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> > -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw
> > pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> > -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw
> > pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \
> > -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw
> >
> > We are running with hv_relaxed since this was suggested in the previous
> > thread, but we see intermittent slow boots with and without this flag.
> >
> >
> > All 3 VMs are booting slowly for most of the attached capture, which I
> > started after confirming the slow boots and stopped as soon as the first of
> > them (15665) had booted. In terms of visible symptoms, the VMs are showing
> > the Windows boot progress bar, which is moving very slowly. In top, the VMs
> > are at 400% CPU and their resident set size (RES) memory is slowly
> > counting up until it reaches the full VM size, at which point they finish
> > booting.
> >
> >
> > Here are the trace files:
> >
> > http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root)
> > http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow)
> > http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd)
> > http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file)
> > http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report)
> >
> >
> > Please let me know if there is anything else which I can provide?
>
>
> There are tons of PAUSE exits indicating cpu overcommit (and indeed you
> are overcommitted by about 50%).
>
> What host kernel version are you running?
>
> Does this reproduce without overcommit?
We're running host kernel 3.5.1 and qemu-kvm 1.1.1.
I hadn't thought about it, but I agree this is related to cpu overcommit. The
slow boots are intermittent (and infrequent) with cpu overcommit whereas I
don't think it occurs without cpu overcommit.
In addition, if there is a slow boot ongoing, and you kill some other VMs to
reduce cpu overcommit then this will sometimes speed it up.
I guess the question is why even with overcommit most boots are fine, but
some small fraction then go slow?
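The overcommit figure quoted earlier follows directly from this thread's numbers (3 guests with 8 vCPUs each on a 16-core host); a quick sketch of the arithmetic:

```shell
#!/bin/sh
# CPU overcommit from the test setup in this thread.
guests=3; vcpus_per_guest=8; host_cores=16
total_vcpus=$((guests * vcpus_per_guest))
echo "total vCPUs: $total_vcpus"                                       # prints: total vCPUs: 24
echo "overcommit: $((100 * (total_vcpus - host_cores) / host_cores))%" # prints: overcommit: 50%
```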
Richard.
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-20 13:56 ` [Qemu-devel] " Richard Davies
@ 2012-08-21 9:00 ` Avi Kivity
-1 siblings, 0 replies; 101+ messages in thread
From: Avi Kivity @ 2012-08-21 9:00 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm, Rik van Riel
On 08/20/2012 04:56 PM, Richard Davies wrote:
> We're running host kernel 3.5.1 and qemu-kvm 1.1.1.
>
> I hadn't thought about it, but I agree this is related to cpu overcommit. The
> slow boots are intermittent (and infrequent) with cpu overcommit whereas I
> don't think it occurs without cpu overcommit.
>
> In addition, if there is a slow boot ongoing, and you kill some other VMs to
> reduce cpu overcommit then this will sometimes speed it up.
>
> I guess the question is why even with overcommit most boots are fine, but
> some small fraction then go slow?
Could be a bug. The scheduler and the spin-loop handling code fight
each other instead of working together.
Please provide snapshots of 'perf top' while a slow boot is in progress.
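One way to capture such snapshots non-interactively is with perf record/report rather than the interactive perf top screen. A hedged sketch — the flags are standard perf options, but the sample count and durations are assumptions:

```shell
#!/bin/sh
# Gather three 10-second system-wide profiles and save text reports.
snapshots=0
if command -v perf >/dev/null 2>&1; then
    for i in 1 2 3; do
        perf record -a -o perf-$i.data -- sleep 10 2>/dev/null
        perf report -i perf-$i.data --stdio > perf-top-$i.txt 2>/dev/null
        snapshots=$((snapshots + 1))
    done
else
    echo "perf not installed; skipping"
fi
```

Run as root on the host while the slow boot is in progress; the resulting perf-top-*.txt files show which kernel symbols dominate.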
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-21 9:00 ` [Qemu-devel] " Avi Kivity
@ 2012-08-21 15:21 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-21 15:21 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm, Rik van Riel
Avi Kivity wrote:
> Richard Davies wrote:
> > We're running host kernel 3.5.1 and qemu-kvm 1.1.1.
> >
> > I hadn't thought about it, but I agree this is related to cpu overcommit. The
> > slow boots are intermittent (and infrequent) with cpu overcommit whereas I
> > don't think it occurs without cpu overcommit.
> >
> > In addition, if there is a slow boot ongoing, and you kill some other VMs to
> > reduce cpu overcommit then this will sometimes speed it up.
> >
> > I guess the question is why even with overcommit most boots are fine, but
> > some small fraction then go slow?
>
> Could be a bug. The scheduler and the spin-loop handling code fight
> each other instead of working well.
>
> Please provide snapshots of 'perf top' while a slow boot is in progress.
Below are two 'perf top' snapshots during a slow boot, which appear to me to
support your idea of a spin-lock problem.
There are a lot more "unprocessable samples recorded" messages at the end of
each snapshot which I haven't included. I think these may be from the guest
OS - the kernel is listed, and qemu-kvm itself is listed on some other
traces which I did, although not these.
Richard.
PerfTop: 62249 irqs/sec kernel:96.9% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------------------------------------------------------
35.80% [kernel] [k] _raw_spin_lock_irqsave
21.64% [kernel] [k] isolate_freepages_block
5.91% [kernel] [k] yield_to
4.95% [kernel] [k] _raw_spin_lock
3.37% [kernel] [k] kvm_vcpu_on_spin
2.74% [kernel] [k] add_preempt_count
2.45% [kernel] [k] _raw_spin_unlock
2.33% [kernel] [k] sub_preempt_count
2.18% [kernel] [k] svm_vcpu_run
2.17% [kernel] [k] kvm_vcpu_yield_to
1.89% [kernel] [k] memcmp
1.50% [kernel] [k] get_pid_task
1.26% [kernel] [k] kvm_arch_vcpu_ioctl_run
1.16% [kernel] [k] pid_task
0.70% [kernel] [k] rcu_note_context_switch
0.70% [kernel] [k] trace_hardirqs_on
0.52% [kernel] [k] __rcu_read_unlock
0.51% [kernel] [k] trace_preempt_on
0.47% [kernel] [k] __srcu_read_lock
0.43% [kernel] [k] get_parent_ip
0.42% [kernel] [k] get_pageblock_flags_group
0.38% [kernel] [k] in_lock_functions
0.34% [kernel] [k] trace_preempt_off
0.34% [kernel] [k] trace_hardirqs_off
0.29% [kernel] [k] clear_page_c
0.23% [kernel] [k] __srcu_read_unlock
0.20% [kernel] [k] __rcu_read_lock
0.14% [kernel] [k] handle_exit
0.11% libc-2.10.1.so [.] strcmp
0.11% [kernel] [k] _raw_spin_unlock_irqrestore
0.11% [kernel] [k] _raw_spin_lock_irq
0.11% [kernel] [k] find_highest_vector
0.09% [kernel] [k] ktime_get
0.08% [kernel] [k] copy_page_c
0.08% [kernel] [k] pause_interception
0.08% [kernel] [k] kmem_cache_alloc
0.08% [kernel] [k] resched_task
0.08% perf [.] dso__find_symbol
0.06% [kernel] [k] compaction_alloc
0.06% libc-2.10.1.so [.] 0x0000000000076dab
0.06% [kernel] [k] read_tsc
0.06% perf [.] add_hist_entry
0.05% [kernel] [k] svm_read_l1_tsc
0.05% [kernel] [k] native_read_tsc
0.05% perf [.] sort__dso_cmp
0.05% [kernel] [k] copy_user_generic_string
0.05% [kernel] [k] ktime_get_update_offsets
0.04% [kernel] [k] kvm_check_async_pf_completion
0.04% [kernel] [k] __schedule
0.04% [kernel] [k] __rcu_pending
0.04% [kernel] [k] svm_complete_interrupts
0.04% [kernel] [k] perf_pmu_disable
0.04% [kernel] [k] isolate_migratepages_range
0.04% [kernel] [k] sched_clock_cpu
0.04% [kernel] [k] kvm_cpu_has_pending_timer
0.04% [kernel] [k] apic_timer_interrupt
0.04% [vdso] [.] 0x00007fff2e1ff607
0.04% [kernel] [k] apic_update_ppr
0.04% [kernel] [k] do_select
0.04% [kernel] [k] svm_scale_tsc
0.04% [kernel] [k] system_call_after_swapgs
0.03% [kernel] [k] kvm_lapic_get_cr8
0.03% perf [.] sort__sym_cmp
0.03% [kernel] [k] find_next_bit
0.03% [kernel] [k] kvm_set_cr8
0.03% [kernel] [k] rcu_check_callbacks
9763 unprocessable samples recorded.
PerfTop: 61584 irqs/sec kernel:97.4% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------------------------------------------------------
36.73% [kernel] [k] _raw_spin_lock_irqsave
19.00% [kernel] [k] isolate_freepages_block
5.80% [kernel] [k] yield_to
5.23% [kernel] [k] _raw_spin_lock
3.97% [kernel] [k] kvm_vcpu_on_spin
2.98% [kernel] [k] add_preempt_count
2.45% [kernel] [k] sub_preempt_count
2.37% [kernel] [k] _raw_spin_unlock
2.22% [kernel] [k] svm_vcpu_run
2.19% [kernel] [k] kvm_vcpu_yield_to
1.90% [kernel] [k] memcmp
1.54% [kernel] [k] get_pid_task
1.39% [kernel] [k] kvm_arch_vcpu_ioctl_run
1.30% [kernel] [k] pid_task
0.75% [kernel] [k] rcu_note_context_switch
0.74% [kernel] [k] trace_hardirqs_on
0.58% [kernel] [k] __rcu_read_unlock
0.55% [kernel] [k] trace_preempt_on
0.47% [kernel] [k] __srcu_read_lock
0.44% [kernel] [k] get_parent_ip
0.41% [kernel] [k] clear_page_c
0.40% [kernel] [k] get_pageblock_flags_group
0.39% [kernel] [k] in_lock_functions
0.36% [kernel] [k] trace_preempt_off
0.35% [kernel] [k] trace_hardirqs_off
0.23% [kernel] [k] __srcu_read_unlock
0.20% [kernel] [k] __rcu_read_lock
0.15% [kernel] [k] _raw_spin_lock_irq
0.14% [kernel] [k] handle_exit
0.12% [kernel] [k] find_highest_vector
0.11% [kernel] [k] resched_task
0.10% libc-2.10.1.so [.] strcmp
0.09% [kernel] [k] _raw_spin_unlock_irqrestore
0.09% [kernel] [k] ktime_get
0.08% [kernel] [k] pause_interception
0.08% [kernel] [k] copy_page_c
0.07% [kernel] [k] __schedule
0.07% [kernel] [k] compact_zone
0.07% perf [.] dso__find_symbol
0.06% perf [.] add_hist_entry
0.06% [kernel] [k] read_tsc
0.06% [kernel] [k] svm_read_l1_tsc
0.05% [kernel] [k] native_read_tsc
0.05% [kernel] [k] ktime_get_update_offsets
0.05% [kernel] [k] compaction_alloc
0.05% libc-2.10.1.so [.] 0x0000000000073ae0
0.05% [kernel] [k] kmem_cache_alloc
0.05% [kernel] [k] svm_complete_interrupts
0.05% [kernel] [k] kvm_check_async_pf_completion
0.05% [kernel] [k] apic_timer_interrupt
0.05% perf [.] sort__dso_cmp
0.05% [kernel] [k] kvm_cpu_has_pending_timer
0.04% [kernel] [k] svm_scale_tsc
0.04% [kernel] [k] isolate_migratepages_range
0.04% [kernel] [k] sched_clock_cpu
0.04% [kernel] [k] __rcu_pending
0.04% [kernel] [k] apic_update_ppr
0.04% [kernel] [k] do_select
0.04% [kernel] [k] perf_pmu_disable
0.04% [kernel] [k] kvm_set_cr8
0.04% [kernel] [k] update_curr
0.04% [kernel] [k] reschedule_interrupt
0.03% [kernel] [k] kvm_lapic_get_cr8
0.03% libc-2.10.1.so [.] strstr
0.03% [kernel] [k] apic_has_pending_timer
0.03% perf [.] sort__sym_cmp
4975 unprocessable samples recorded.
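As a rough aid to reading these snapshots, the lock- and yield-related share of cycles can be totalled with a one-liner (a sketch; `snapshot.txt` is an assumed file holding the saved symbol lines, and the symbol names are taken from the snapshots above):

```shell
# Sum the sample percentages of spin-lock and yield symbols; in the
# snapshots above these paths account for over half of all cycles.
grep -E '_raw_spin_|yield_to|kvm_vcpu_on_spin' snapshot.txt |
    awk '{total += $1} END {printf "%.2f%% in lock/yield paths\n", total}'
```

(awk's numeric coercion ignores the trailing '%' on each sample field, so the percentages sum directly.)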
* Re: [Qemu-devel] Windows slow boot: contractor wanted
2012-08-21 15:21 ` [Qemu-devel] " Richard Davies
@ 2012-08-21 15:39 ` Troy Benjegerdes
-1 siblings, 0 replies; 101+ messages in thread
From: Troy Benjegerdes @ 2012-08-21 15:39 UTC (permalink / raw)
To: Richard Davies; +Cc: Avi Kivity, qemu-devel, kvm
Do you have any way to determine what CPU groups the different VMs
are running on?
If you end up in an overcommit situation where half the 'virtual'
CPUs are on one AMD socket and the other half are on a different
AMD socket, then you'll be thrashing the HyperTransport link.
At Cray we were very careful never to overcommit runnable processes
to CPUs, and generally locked processes to a single CPU.
Have a read of
http://berrange.com/posts/2010/02/12/controlling-guest-cpu-numa-affinity-in-libvirt-with-qemu-kvm-xen/
I'm going to speculate that when things don't work very well, you end up
with memory from a booting guest scattered across many different NUMA
nodes/CPUs. Then it won't matter how good the spin-loop/scheduler code is,
because you are bound by the additional latency and bandwidth limitations
of running on one socket while accessing half the memory resident on a
different socket.
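The pinning approach described above can be sketched with numactl (a hypothetical command line, not from the thread; it reuses the qemu-kvm flags quoted earlier, and assumes numactl is installed and node 0 has enough free memory):

```shell
# Hypothetical sketch: keep all vCPU threads and guest RAM on NUMA node 0,
# so the guest never pays cross-socket (HyperTransport) latency.
numactl --cpunodebind=0 --membind=0 \
    qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed \
             -vga cirrus -usbdevice tablet -vnc :99 -hda test.img
```

Equivalently, libvirt users can express the same constraint with `<vcpu cpuset='...'>` / `<numatune>` in the domain XML, as described in the blog post linked above.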
On Tue, Aug 21, 2012 at 04:21:07PM +0100, Richard Davies wrote:
> Avi Kivity wrote:
> > Richard Davies wrote:
> > > We're running host kernel 3.5.1 and qemu-kvm 1.1.1.
> > >
> > > I hadn't thought about it, but I agree this is related to cpu overcommit. The
> > > slow boots are intermittent (and infrequent) with cpu overcommit whereas I
> > > don't think it occurs without cpu overcommit.
> > >
> > > In addition, if there is a slow boot ongoing, and you kill some other VMs to
> > > reduce cpu overcommit then this will sometimes speed it up.
> > >
> > > I guess the question is why even with overcommit most boots are fine, but
> > > some small fraction then go slow?
> >
> > Could be a bug. The scheduler and the spin-loop handling code fight
> > each other instead of working well.
> >
> > Please provide snapshots of 'perf top' while a slow boot is in progress.
>
> Below are two 'perf top' snapshots during a slow boot, which appear to me to
> support your idea of a spin-lock problem.
>
> There are a lot more "unprocessable samples recorded" messages at the end of
> each snapshot which I haven't included. I think these may be from the guest
> OS - the kernel is listed, and qemu-kvm itself is listed on some other
> traces which I did, although not these.
>
> Richard.
>
>
>
> PerfTop: 62249 irqs/sec kernel:96.9% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
> --------------------------------------------------------------------------------------------------------------------------------
>
> 35.80% [kernel] [k] _raw_spin_lock_irqsave
> 21.64% [kernel] [k] isolate_freepages_block
> 5.91% [kernel] [k] yield_to
> 4.95% [kernel] [k] _raw_spin_lock
> 3.37% [kernel] [k] kvm_vcpu_on_spin
> 2.74% [kernel] [k] add_preempt_count
> 2.45% [kernel] [k] _raw_spin_unlock
> 2.33% [kernel] [k] sub_preempt_count
> 2.18% [kernel] [k] svm_vcpu_run
> 2.17% [kernel] [k] kvm_vcpu_yield_to
> 1.89% [kernel] [k] memcmp
> 1.50% [kernel] [k] get_pid_task
> 1.26% [kernel] [k] kvm_arch_vcpu_ioctl_run
> 1.16% [kernel] [k] pid_task
> 0.70% [kernel] [k] rcu_note_context_switch
> 0.70% [kernel] [k] trace_hardirqs_on
> 0.52% [kernel] [k] __rcu_read_unlock
> 0.51% [kernel] [k] trace_preempt_on
> 0.47% [kernel] [k] __srcu_read_lock
> 0.43% [kernel] [k] get_parent_ip
> 0.42% [kernel] [k] get_pageblock_flags_group
> 0.38% [kernel] [k] in_lock_functions
> 0.34% [kernel] [k] trace_preempt_off
> 0.34% [kernel] [k] trace_hardirqs_off
> 0.29% [kernel] [k] clear_page_c
> 0.23% [kernel] [k] __srcu_read_unlock
> 0.20% [kernel] [k] __rcu_read_lock
> 0.14% [kernel] [k] handle_exit
> 0.11% libc-2.10.1.so [.] strcmp
> 0.11% [kernel] [k] _raw_spin_unlock_irqrestore
> 0.11% [kernel] [k] _raw_spin_lock_irq
> 0.11% [kernel] [k] find_highest_vector
> 0.09% [kernel] [k] ktime_get
> 0.08% [kernel] [k] copy_page_c
> 0.08% [kernel] [k] pause_interception
> 0.08% [kernel] [k] kmem_cache_alloc
> 0.08% [kernel] [k] resched_task
> 0.08% perf [.] dso__find_symbol
> 0.06% [kernel] [k] compaction_alloc
> 0.06% libc-2.10.1.so [.] 0x0000000000076dab
> 0.06% [kernel] [k] read_tsc
> 0.06% perf [.] add_hist_entry
> 0.05% [kernel] [k] svm_read_l1_tsc
> 0.05% [kernel] [k] native_read_tsc
> 0.05% perf [.] sort__dso_cmp
> 0.05% [kernel] [k] copy_user_generic_string
> 0.05% [kernel] [k] ktime_get_update_offsets
> 0.04% [kernel] [k] kvm_check_async_pf_completion
> 0.04% [kernel] [k] __schedule
> 0.04% [kernel] [k] __rcu_pending
> 0.04% [kernel] [k] svm_complete_interrupts
> 0.04% [kernel] [k] perf_pmu_disable
> 0.04% [kernel] [k] isolate_migratepages_range
> 0.04% [kernel] [k] sched_clock_cpu
> 0.04% [kernel] [k] kvm_cpu_has_pending_timer
> 0.04% [kernel] [k] apic_timer_interrupt
> 0.04% [vdso] [.] 0x00007fff2e1ff607
> 0.04% [kernel] [k] apic_update_ppr
> 0.04% [kernel] [k] do_select
> 0.04% [kernel] [k] svm_scale_tsc
> 0.04% [kernel] [k] system_call_after_swapgs
> 0.03% [kernel] [k] kvm_lapic_get_cr8
> 0.03% perf [.] sort__sym_cmp
> 0.03% [kernel] [k] find_next_bit
> 0.03% [kernel] [k] kvm_set_cr8
> 0.03% [kernel] [k] rcu_check_callbacks
> 9763 unprocessable samples recorded.
>
>
>
> PerfTop: 61584 irqs/sec kernel:97.4% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
> --------------------------------------------------------------------------------------------------------------------------------
>
> 36.73% [kernel] [k] _raw_spin_lock_irqsave
> 19.00% [kernel] [k] isolate_freepages_block
> 5.80% [kernel] [k] yield_to
> 5.23% [kernel] [k] _raw_spin_lock
> 3.97% [kernel] [k] kvm_vcpu_on_spin
> 2.98% [kernel] [k] add_preempt_count
> 2.45% [kernel] [k] sub_preempt_count
> 2.37% [kernel] [k] _raw_spin_unlock
> 2.22% [kernel] [k] svm_vcpu_run
> 2.19% [kernel] [k] kvm_vcpu_yield_to
> 1.90% [kernel] [k] memcmp
> 1.54% [kernel] [k] get_pid_task
> 1.39% [kernel] [k] kvm_arch_vcpu_ioctl_run
> 1.30% [kernel] [k] pid_task
> 0.75% [kernel] [k] rcu_note_context_switch
> 0.74% [kernel] [k] trace_hardirqs_on
> 0.58% [kernel] [k] __rcu_read_unlock
> 0.55% [kernel] [k] trace_preempt_on
> 0.47% [kernel] [k] __srcu_read_lock
> 0.44% [kernel] [k] get_parent_ip
> 0.41% [kernel] [k] clear_page_c
> 0.40% [kernel] [k] get_pageblock_flags_group
> 0.39% [kernel] [k] in_lock_functions
> 0.36% [kernel] [k] trace_preempt_off
> 0.35% [kernel] [k] trace_hardirqs_off
> 0.23% [kernel] [k] __srcu_read_unlock
> 0.20% [kernel] [k] __rcu_read_lock
> 0.15% [kernel] [k] _raw_spin_lock_irq
> 0.14% [kernel] [k] handle_exit
> 0.12% [kernel] [k] find_highest_vector
> 0.11% [kernel] [k] resched_task
> 0.10% libc-2.10.1.so [.] strcmp
> 0.09% [kernel] [k] _raw_spin_unlock_irqrestore
> 0.09% [kernel] [k] ktime_get
> 0.08% [kernel] [k] pause_interception
> 0.08% [kernel] [k] copy_page_c
> 0.07% [kernel] [k] __schedule
> 0.07% [kernel] [k] compact_zone
> 0.07% perf [.] dso__find_symbol
> 0.06% perf [.] add_hist_entry
> 0.06% [kernel] [k] read_tsc
> 0.06% [kernel] [k] svm_read_l1_tsc
> 0.05% [kernel] [k] native_read_tsc
> 0.05% [kernel] [k] ktime_get_update_offsets
> 0.05% [kernel] [k] compaction_alloc
> 0.05% libc-2.10.1.so [.] 0x0000000000073ae0
> 0.05% [kernel] [k] kmem_cache_alloc
> 0.05% [kernel] [k] svm_complete_interrupts
> 0.05% [kernel] [k] kvm_check_async_pf_completion
> 0.05% [kernel] [k] apic_timer_interrupt
> 0.05% perf [.] sort__dso_cmp
> 0.05% [kernel] [k] kvm_cpu_has_pending_timer
> 0.04% [kernel] [k] svm_scale_tsc
> 0.04% [kernel] [k] isolate_migratepages_range
> 0.04% [kernel] [k] sched_clock_cpu
> 0.04% [kernel] [k] __rcu_pending
> 0.04% [kernel] [k] apic_update_ppr
> 0.04% [kernel] [k] do_select
> 0.04% [kernel] [k] perf_pmu_disable
> 0.04% [kernel] [k] kvm_set_cr8
> 0.04% [kernel] [k] update_curr
> 0.04% [kernel] [k] reschedule_interrupt
> 0.03% [kernel] [k] kvm_lapic_get_cr8
> 0.03% libc-2.10.1.so [.] strstr
> 0.03% [kernel] [k] apic_has_pending_timer
> 0.03% perf [.] sort__sym_cmp
> 4975 unprocessable samples recorded.
>
* Re: [Qemu-devel] Windows slow boot: contractor wanted
@ 2012-08-21 15:39 ` Troy Benjegerdes
0 siblings, 0 replies; 101+ messages in thread
From: Troy Benjegerdes @ 2012-08-21 15:39 UTC (permalink / raw)
To: Richard Davies; +Cc: Avi Kivity, kvm, qemu-devel
Do you have any way to determine what CPU groups the different VMs
are running on?
If you end up in an overcommit situation where half the 'virtual'
cpus are on one AMD socket, and the other half are on a different
AMD socket, then you'll be thrashing the hypertransport link.
At Cray we were very carefull to never overcommit runnable processes
to CPUS, and generally locked processes to a single cpu.
Have a read of
http://berrange.com/posts/2010/02/12/controlling-guest-cpu-numa-affinity-in-libvirt-with-qemu-kvm-xen/
I'm going to speculate that when things don't work very well you end up with
memory from a booting guest scattered across many different NUMA nodes/cpus,
and then it really won't matter how good the spin loop/scheduler code is
because you are bound by the additional latency and bandwidth limitations of
running on one socekt and accessing half the memory that's resident on a
different socket.
On Tue, Aug 21, 2012 at 04:21:07PM +0100, Richard Davies wrote:
> Avi Kivity wrote:
> > Richard Davies wrote:
> > > We're running host kernel 3.5.1 and qemu-kvm 1.1.1.
> > >
> > > I hadn't though about it, but I agree this is related to cpu overcommit. The
> > > slow boots are intermittent (and infrequent) with cpu overcommit whereas I
> > > don't think it occurs without cpu overcommit.
> > >
> > > In addition, if there is a slow boot ongoing, and you kill some other VMs to
> > > reduce cpu overcommit then this will sometimes speed it up.
> > >
> > > I guess the question is why even with overcommit most boots are fine, but
> > > some small fraction then go slow?
> >
> > Could be a bug. The scheduler and the spin-loop handling code fight
> > each other instead of working well.
> >
> > Please provide snapshots of 'perf top' while a slow boot is in progress.
>
> Below are two 'perf top' snapshots during a slow boot, which appear to me to
> support your idea of a spin-lock problem.
>
> There are a lot more "unprocessable samples recorded" messages at the end of
> each snapshot which I haven't included. I think these may be from the guest
> OS - the kernel is listed, and qemu-kvm itself is listed on some other
> traces which I did, although not these.
>
> Richard.
>
>
>
> PerfTop: 62249 irqs/sec kernel:96.9% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
> --------------------------------------------------------------------------------------------------------------------------------
>
> 35.80% [kernel] [k] _raw_spin_lock_irqsave
> 21.64% [kernel] [k] isolate_freepages_block
> 5.91% [kernel] [k] yield_to
> 4.95% [kernel] [k] _raw_spin_lock
> 3.37% [kernel] [k] kvm_vcpu_on_spin
> 2.74% [kernel] [k] add_preempt_count
> 2.45% [kernel] [k] _raw_spin_unlock
> 2.33% [kernel] [k] sub_preempt_count
> 2.18% [kernel] [k] svm_vcpu_run
> 2.17% [kernel] [k] kvm_vcpu_yield_to
> 1.89% [kernel] [k] memcmp
> 1.50% [kernel] [k] get_pid_task
> 1.26% [kernel] [k] kvm_arch_vcpu_ioctl_run
> 1.16% [kernel] [k] pid_task
> 0.70% [kernel] [k] rcu_note_context_switch
> 0.70% [kernel] [k] trace_hardirqs_on
> 0.52% [kernel] [k] __rcu_read_unlock
> 0.51% [kernel] [k] trace_preempt_on
> 0.47% [kernel] [k] __srcu_read_lock
> 0.43% [kernel] [k] get_parent_ip
> 0.42% [kernel] [k] get_pageblock_flags_group
> 0.38% [kernel] [k] in_lock_functions
> 0.34% [kernel] [k] trace_preempt_off
> 0.34% [kernel] [k] trace_hardirqs_off
> 0.29% [kernel] [k] clear_page_c
> 0.23% [kernel] [k] __srcu_read_unlock
> 0.20% [kernel] [k] __rcu_read_lock
> 0.14% [kernel] [k] handle_exit
> 0.11% libc-2.10.1.so [.] strcmp
> 0.11% [kernel] [k] _raw_spin_unlock_irqrestore
> 0.11% [kernel] [k] _raw_spin_lock_irq
> 0.11% [kernel] [k] find_highest_vector
> 0.09% [kernel] [k] ktime_get
> 0.08% [kernel] [k] copy_page_c
> 0.08% [kernel] [k] pause_interception
> 0.08% [kernel] [k] kmem_cache_alloc
> 0.08% [kernel] [k] resched_task
> 0.08% perf [.] dso__find_symbol
> 0.06% [kernel] [k] compaction_alloc
> 0.06% libc-2.10.1.so [.] 0x0000000000076dab
> 0.06% [kernel] [k] read_tsc
> 0.06% perf [.] add_hist_entry
> 0.05% [kernel] [k] svm_read_l1_tsc
> 0.05% [kernel] [k] native_read_tsc
> 0.05% perf [.] sort__dso_cmp
> 0.05% [kernel] [k] copy_user_generic_string
> 0.05% [kernel] [k] ktime_get_update_offsets
> 0.04% [kernel] [k] kvm_check_async_pf_completion
> 0.04% [kernel] [k] __schedule
> 0.04% [kernel] [k] __rcu_pending
> 0.04% [kernel] [k] svm_complete_interrupts
> 0.04% [kernel] [k] perf_pmu_disable
> 0.04% [kernel] [k] isolate_migratepages_range
> 0.04% [kernel] [k] sched_clock_cpu
> 0.04% [kernel] [k] kvm_cpu_has_pending_timer
> 0.04% [kernel] [k] apic_timer_interrupt
> 0.04% [vdso] [.] 0x00007fff2e1ff607
> 0.04% [kernel] [k] apic_update_ppr
> 0.04% [kernel] [k] do_select
> 0.04% [kernel] [k] svm_scale_tsc
> 0.04% [kernel] [k] system_call_after_swapgs
> 0.03% [kernel] [k] kvm_lapic_get_cr8
> 0.03% perf [.] sort__sym_cmp
> 0.03% [kernel] [k] find_next_bit
> 0.03% [kernel] [k] kvm_set_cr8
> 0.03% [kernel] [k] rcu_check_callbacks
> 9750 unprocessable samples recorded.9751 unprocessable samples recorded.9752 unprocessable samples recorded.9753 unprocessable samples recorded.9754 unprocessable samples recorded.9755 unprocessable samples recorded.9756 unprocessable samples recorded.9757 u nprocessable samples recorded.9758 unprocessable samples recorded.9759 unprocessable samples recorded.9760 unprocessable samples recorded.9761 unprocessable samples recorded.9762 unprocessable samples recorded.9763 unprocessable samples recorded.
>
>
>
> PerfTop: 61584 irqs/sec kernel:97.4% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
> --------------------------------------------------------------------------------------------------------------------------------
>
> 36.73% [kernel] [k] _raw_spin_lock_irqsave
> 19.00% [kernel] [k] isolate_freepages_block
> 5.80% [kernel] [k] yield_to
> 5.23% [kernel] [k] _raw_spin_lock
> 3.97% [kernel] [k] kvm_vcpu_on_spin
> 2.98% [kernel] [k] add_preempt_count
> 2.45% [kernel] [k] sub_preempt_count
> 2.37% [kernel] [k] _raw_spin_unlock
> 2.22% [kernel] [k] svm_vcpu_run
> 2.19% [kernel] [k] kvm_vcpu_yield_to
> 1.90% [kernel] [k] memcmp
> 1.54% [kernel] [k] get_pid_task
> 1.39% [kernel] [k] kvm_arch_vcpu_ioctl_run
> 1.30% [kernel] [k] pid_task
> 0.75% [kernel] [k] rcu_note_context_switch
> 0.74% [kernel] [k] trace_hardirqs_on
> 0.58% [kernel] [k] __rcu_read_unlock
> 0.55% [kernel] [k] trace_preempt_on
> 0.47% [kernel] [k] __srcu_read_lock
> 0.44% [kernel] [k] get_parent_ip
> 0.41% [kernel] [k] clear_page_c
> 0.40% [kernel] [k] get_pageblock_flags_group
> 0.39% [kernel] [k] in_lock_functions
> 0.36% [kernel] [k] trace_preempt_off
> 0.35% [kernel] [k] trace_hardirqs_off
> 0.23% [kernel] [k] __srcu_read_unlock
> 0.20% [kernel] [k] __rcu_read_lock
> 0.15% [kernel] [k] _raw_spin_lock_irq
> 0.14% [kernel] [k] handle_exit
> 0.12% [kernel] [k] find_highest_vector
> 0.11% [kernel] [k] resched_task
> 0.10% libc-2.10.1.so [.] strcmp
> 0.09% [kernel] [k] _raw_spin_unlock_irqrestore
> 0.09% [kernel] [k] ktime_get
> 0.08% [kernel] [k] pause_interception
> 0.08% [kernel] [k] copy_page_c
> 0.07% [kernel] [k] __schedule
> 0.07% [kernel] [k] compact_zone
> 0.07% perf [.] dso__find_symbol
> 0.06% perf [.] add_hist_entry
> 0.06% [kernel] [k] read_tsc
> 0.06% [kernel] [k] svm_read_l1_tsc
> 0.05% [kernel] [k] native_read_tsc
> 0.05% [kernel] [k] ktime_get_update_offsets
> 0.05% [kernel] [k] compaction_alloc
> 0.05% libc-2.10.1.so [.] 0x0000000000073ae0
> 0.05% [kernel] [k] kmem_cache_alloc
> 0.05% [kernel] [k] svm_complete_interrupts
> 0.05% [kernel] [k] kvm_check_async_pf_completion
> 0.05% [kernel] [k] apic_timer_interrupt
> 0.05% perf [.] sort__dso_cmp
> 0.05% [kernel] [k] kvm_cpu_has_pending_timer
> 0.04% [kernel] [k] svm_scale_tsc
> 0.04% [kernel] [k] isolate_migratepages_range
> 0.04% [kernel] [k] sched_clock_cpu
> 0.04% [kernel] [k] __rcu_pending
> 0.04% [kernel] [k] apic_update_ppr
> 0.04% [kernel] [k] do_select
> 0.04% [kernel] [k] perf_pmu_disable
> 0.04% [kernel] [k] kvm_set_cr8
> 0.04% [kernel] [k] update_curr
> 0.04% [kernel] [k] reschedule_interrupt
> 0.03% [kernel] [k] kvm_lapic_get_cr8
> 0.03% libc-2.10.1.so [.] strstr
> 0.03% [kernel] [k] apic_has_pending_timer
> 0.03% perf [.] sort__sym_cmp
> [counter truncated: ~4,975 unprocessable samples recorded]
>
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-21 15:21 ` [Qemu-devel] " Richard Davies
@ 2012-08-22 9:08 ` Avi Kivity
-1 siblings, 0 replies; 101+ messages in thread
From: Avi Kivity @ 2012-08-22 9:08 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm, Rik van Riel
On 08/21/2012 06:21 PM, Richard Davies wrote:
> Avi Kivity wrote:
>> Richard Davies wrote:
>> > We're running host kernel 3.5.1 and qemu-kvm 1.1.1.
>> >
>> > I hadn't thought about it, but I agree this is related to cpu overcommit. The
>> > slow boots are intermittent (and infrequent) with cpu overcommit whereas I
>> > don't think it occurs without cpu overcommit.
>> >
>> > In addition, if there is a slow boot ongoing, and you kill some other VMs to
>> > reduce cpu overcommit then this will sometimes speed it up.
>> >
>> > I guess the question is why even with overcommit most boots are fine, but
>> > some small fraction then go slow?
>>
>> Could be a bug. The scheduler and the spin-loop handling code fight
>> each other instead of working well.
>>
>> Please provide snapshots of 'perf top' while a slow boot is in progress.
>
> Below are two 'perf top' snapshots during a slow boot, which appear to me to
> support your idea of a spin-lock problem.
>
> There are a lot more "unprocessable samples recorded" messages at the end of
> each snapshot which I haven't included. I think these may be from the guest
> OS - the kernel is listed, and qemu-kvm itself is listed on some other
> traces which I did, although not these.
>
> Richard.
>
>
>
> PerfTop: 62249 irqs/sec kernel:96.9% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
> --------------------------------------------------------------------------------------------------------------------------------
>
> 35.80% [kernel] [k] _raw_spin_lock_irqsave
> 21.64% [kernel] [k] isolate_freepages_block
Please disable ksm, and if this function persists in the profile, reduce
some memory from the guests.
> 5.91% [kernel] [k] yield_to
> 4.95% [kernel] [k] _raw_spin_lock
> 3.37% [kernel] [k] kvm_vcpu_on_spin
Except for isolate_freepages_block, all functions up to here have to do
with dealing with cpu overcommit. But let's deal with them after we see
a profile with isolate_freepages_block removed.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 101+ messages in thread
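For reference, KSM can be toggled through sysfs. This is a sketch assuming the standard sysfs layout described in the kernel's KSM documentation; run as root on the host:

```shell
# Check whether KSM is currently running (1 = running, 0 = stopped)
cat /sys/kernel/mm/ksm/run

# Stop KSM page merging; already-merged pages stay merged
echo 0 > /sys/kernel/mm/ksm/run

# Or stop it and unmerge all previously merged pages as well
echo 2 > /sys/kernel/mm/ksm/run
```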
* Re: Windows slow boot: contractor wanted
2012-08-22 9:08 ` [Qemu-devel] " Avi Kivity
@ 2012-08-22 12:40 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-22 12:40 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm, Rik van Riel
Avi Kivity wrote:
> Richard Davies wrote:
> > Below are two 'perf top' snapshots during a slow boot, which appear to
> > me to support your idea of a spin-lock problem.
...
> > PerfTop: 62249 irqs/sec kernel:96.9% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
> > --------------------------------------------------------------------------------------------------------------------------------
> >
> > 35.80% [kernel] [k] _raw_spin_lock_irqsave
> > 21.64% [kernel] [k] isolate_freepages_block
>
> Please disable ksm, and if this function persists in the profile, reduce
> some memory from the guests.
>
> > 5.91% [kernel] [k] yield_to
> > 4.95% [kernel] [k] _raw_spin_lock
> > 3.37% [kernel] [k] kvm_vcpu_on_spin
>
> Except for isolate_freepages_block, all functions up to here have to do
> with dealing with cpu overcommit. But let's deal with them after we see
> a profile with isolate_freepages_block removed.
I can trigger the slow boots without KSM and they have the same profile,
with _raw_spin_lock_irqsave and isolate_freepages_block at the top.
I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core
VMs), and haven't managed to get a really slow boot yet (>5 minutes). I'll
post again when I get one.
In the slowest boot that I have so far (1-2 minutes), this is the perf top
output:
PerfTop: 26741 irqs/sec kernel:97.5% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
53.94% [kernel] [k] clear_page_c
2.77% [kernel] [k] svm_vcpu_put
2.60% [kernel] [k] svm_vcpu_run
1.79% [kernel] [k] sub_preempt_count
1.56% [kernel] [k] svm_vcpu_load
1.44% [kernel] [k] __schedule
1.36% [kernel] [k] kvm_arch_vcpu_ioctl_run
1.34% [kernel] [k] resched_task
1.32% [kernel] [k] _raw_spin_lock
0.98% [kernel] [k] trace_preempt_on
0.95% [kernel] [k] get_parent_ip
0.94% [kernel] [k] yield_to
0.88% [kernel] [k] __switch_to
0.87% [kernel] [k] get_page_from_freelist
0.81% [kernel] [k] in_lock_functions
0.76% [kernel] [k] add_preempt_count
0.72% [kernel] [k] kvm_vcpu_on_spin
0.69% [kernel] [k] free_pages_prepare
0.59% [kernel] [k] find_highest_vector
0.57% [kernel] [k] rcu_note_context_switch
0.55% [kernel] [k] paging64_walk_addr_generic
0.54% [kernel] [k] __srcu_read_lock
0.49% [kernel] [k] trace_preempt_off
0.47% [kernel] [k] reschedule_interrupt
0.45% [kernel] [k] sched_clock_cpu
0.40% [kernel] [k] trace_hardirqs_on
0.38% [kernel] [k] clear_huge_page
0.37% [kernel] [k] prep_compound_page
0.32% [kernel] [k] x86_emulate_instruction
0.32% [kernel] [k] _raw_spin_lock_irq
0.31% [kernel] [k] __srcu_read_unlock
0.31% [kernel] [k] trace_hardirqs_off
0.30% [kernel] [k] pick_next_task_fair
0.29% [kernel] [k] kvm_find_cpuid_entry
0.28% [kernel] [k] x86_decode_insn
0.26% [kernel] [k] kvm_cpu_has_pending_timer
0.26% [kernel] [k] init_emulate_ctxt
0.25% [kernel] [k] kvm_vcpu_yield_to
0.24% [kernel] [k] clear_buddies
0.24% [kernel] [k] gs_change
0.23% [kernel] [k] handle_exit
0.22% qemu-kvm [.] vnc_refresh_server_surface
0.22% [kernel] [k] update_min_vruntime
0.22% [kernel] [k] gfn_to_memslot
0.22% [kernel] [k] x86_emulate_insn
0.19% [kernel] [k] kvm_sched_out
0.19% [kernel] [k] pid_task
0.18% [kernel] [k] _raw_spin_unlock
0.18% libc-2.10.1.so [.] strcmp
0.17% [kernel] [k] get_pid_task
0.17% [kernel] [k] yield_task_fair
0.17% [kernel] [k] default_send_IPI_mask_sequence_phys
0.16% [kernel] [k] __rcu_read_unlock
0.16% [kernel] [k] kvm_get_cr8
0.16% [kernel] [k] native_sched_clock
0.16% [kernel] [k] do_insn_fetch
0.15% [kernel] [k] set_next_entity
0.14% [kernel] [k] update_rq_clock
0.14% [kernel] [k] __enqueue_entity
0.14% [kernel] [k] kvm_read_guest
0.13% qemu-kvm [.] g_hash_table_lookup
0.13% [kernel] [k] rb_erase
0.12% [kernel] [k] decode_operand
0.12% libz.so.1.2.3 [.] 0x0000000000006451
0.12% [kernel] [k] update_curr
0.12% [kernel] [k] apic_update_ppr
0.12% [kernel] [k] ktime_get
[counter truncated: ~5,224 unprocessable samples recorded]
Thanks,
Richard.
^ permalink raw reply [flat|nested] 101+ messages in thread
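To quantify how much of a profile like the ones above sits in spin-lock/yield paths versus compaction paths, the sample percentages can be summed per symbol group. A small illustrative helper (the symbol groupings below are editorial choices, not a kernel classification):

```python
# Sum perf-top sample percentages for chosen kernel symbols, given
# lines in the "PCT%  dso  [k] symbol" format shown in the snapshots.

LOCK_SYMS = {"_raw_spin_lock_irqsave", "_raw_spin_lock", "_raw_spin_unlock",
             "yield_to", "kvm_vcpu_on_spin", "kvm_vcpu_yield_to"}
COMPACTION_SYMS = {"isolate_freepages_block", "compact_zone",
                   "compaction_alloc", "isolate_migratepages_range"}

def share(perf_lines, symbols):
    total = 0.0
    for line in perf_lines:
        parts = line.split()
        if len(parts) >= 4 and parts[0].endswith("%") and parts[3] in symbols:
            total += float(parts[0].rstrip("%"))
    return total

sample = """\
 36.73%  [kernel]  [k] _raw_spin_lock_irqsave
 19.00%  [kernel]  [k] isolate_freepages_block
  5.80%  [kernel]  [k] yield_to
  5.23%  [kernel]  [k] _raw_spin_lock
  3.97%  [kernel]  [k] kvm_vcpu_on_spin
""".splitlines()

print(round(share(sample, LOCK_SYMS), 2))        # spin/yield share
print(round(share(sample, COMPACTION_SYMS), 2))  # compaction share
```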
* Re: Windows slow boot: contractor wanted
2012-08-22 12:40 ` [Qemu-devel] " Richard Davies
@ 2012-08-22 12:44 ` Avi Kivity
-1 siblings, 0 replies; 101+ messages in thread
From: Avi Kivity @ 2012-08-22 12:44 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm, Rik van Riel
On 08/22/2012 03:40 PM, Richard Davies wrote:
>
> I can trigger the slow boots without KSM and they have the same profile,
> with _raw_spin_lock_irqsave and isolate_freepages_block at the top.
>
> I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core
> VMs), and haven't managed to get a really slow boot yet (>5 minutes). I'll
> > post again when I get one.
I think you can go higher than that. But 120GB on a 128GB host is
pushing it.
>
> In the slowest boot that I have so far (1-2 minutes), this is the perf top
> output:
>
>
> PerfTop: 26741 irqs/sec kernel:97.5% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 53.94% [kernel] [k] clear_page_c
> 2.77% [kernel] [k] svm_vcpu_put
> 2.60% [kernel] [k] svm_vcpu_run
> 1.79% [kernel] [k] sub_preempt_count
> 1.56% [kernel] [k] svm_vcpu_load
> 1.44% [kernel] [k] __schedule
> 1.36% [kernel] [k] kvm_arch_vcpu_ioctl_run
> 1.34% [kernel] [k] resched_task
> 1.32% [kernel] [k] _raw_spin_lock
> 0.98% [kernel] [k] trace_preempt_on
> 0.95% [kernel] [k] get_parent_ip
> 0.94% [kernel] [k] yield_to
This is pretty normal, Windows is touching memory so clear_page_c() is
called to scrub it.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-22 12:44 ` [Qemu-devel] " Avi Kivity
@ 2012-08-22 14:41 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-22 14:41 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm, Rik van Riel
Avi Kivity wrote:
> Richard Davies wrote:
> > I can trigger the slow boots without KSM and they have the same profile,
> > with _raw_spin_lock_irqsave and isolate_freepages_block at the top.
> >
> > I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core
> > VMs), and haven't managed to get a really slow boot yet (>5 minutes). I'll
> > post again when I get one.
>
> I think you can go higher than that. But 120GB on a 128GB host is
> pushing it.
I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host
(i.e. 108GB on a 128GB host).
It has the same profile with _raw_spin_lock_irqsave and
isolate_freepages_block at the top.
Richard.
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-22 14:41 ` [Qemu-devel] " Richard Davies
@ 2012-08-22 14:53 ` Avi Kivity
-1 siblings, 0 replies; 101+ messages in thread
From: Avi Kivity @ 2012-08-22 14:53 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm, Rik van Riel
On 08/22/2012 05:41 PM, Richard Davies wrote:
> Avi Kivity wrote:
>> Richard Davies wrote:
>> > I can trigger the slow boots without KSM and they have the same profile,
>> > with _raw_spin_lock_irqsave and isolate_freepages_block at the top.
>> >
>> > I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core
>> > VMs), and haven't managed to get a really slow boot yet (>5 minutes). I'll
>> > post again when I get one.
>>
>> I think you can go higher than that. But 120GB on a 128GB host is
>> pushing it.
>
> I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host
> (i.e. 108GB on a 128GB host).
>
> It has the same profile with _raw_spin_lock_irqsave and
> isolate_freepages_block at the top.
Then it's still memory starved.
Please provide /proc/zoneinfo while this is happening.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 101+ messages in thread
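A /proc/zoneinfo dump follows in the next message. A minimal sketch of how to pull the interesting numbers out of it: each zone's free page count against its min/low/high watermarks (field names follow the format shown in the reply; this is a convenience script, not part of any kernel tooling):

```python
import re

def parse_zoneinfo(text):
    """Collect free pages and min/low/high watermarks per zone from
    /proc/zoneinfo-style text."""
    zones, cur = {}, None
    for line in text.splitlines():
        m = re.match(r"\s*Node (\d+), zone\s+(\S+)", line)
        if m:
            cur = "node%s/%s" % m.groups()
            zones[cur] = {}
            continue
        if cur is None:
            continue
        m = re.match(r"\s*pages free\s+(\d+)", line)
        if m:
            zones[cur].setdefault("free", int(m.group(1)))
            continue
        # 'high: 186' lines inside the pagesets section have a colon,
        # so this pattern only matches the zone watermark lines
        m = re.match(r"\s*(min|low|high)\s+(\d+)", line)
        if m:
            zones[cur].setdefault(m.group(1), int(m.group(2)))
    return zones

sample = """\
Node 1, zone Normal
pages free 23288
min 8448
low 10560
high 12672
"""

for name, z in parse_zoneinfo(sample).items():
    status = "below low watermark" if z["free"] < z["low"] else "ok"
    print(name, z["free"], status)
```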
* Re: Windows slow boot: contractor wanted
2012-08-22 14:41 ` [Qemu-devel] " Richard Davies
@ 2012-08-22 15:21 ` Rik van Riel
-1 siblings, 0 replies; 101+ messages in thread
From: Rik van Riel @ 2012-08-22 15:21 UTC (permalink / raw)
To: Richard Davies; +Cc: Avi Kivity, qemu-devel, kvm
On 08/22/2012 10:41 AM, Richard Davies wrote:
> Avi Kivity wrote:
>> Richard Davies wrote:
>>> I can trigger the slow boots without KSM and they have the same profile,
>>> with _raw_spin_lock_irqsave and isolate_freepages_block at the top.
>>>
>>> I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core
>>> VMs), and haven't managed to get a really slow boot yet (>5 minutes). I'll
>>> post again when I get one.
>>
>> I think you can go higher than that. But 120GB on a 128GB host is
>> pushing it.
>
> I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host
> (i.e. 108GB on a 128GB host).
>
> It has the same profile with _raw_spin_lock_irqsave and
> isolate_freepages_block at the top.
That's the page compaction code.
Mel Gorman and I have been working to fix that;
the latest fixes and improvements are in the -mm
kernel already.
^ permalink raw reply [flat|nested] 101+ messages in thread
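isolate_freepages_block at the top of the profile points at memory compaction, which on kernels of this era is triggered mainly by transparent hugepage allocations as the guest touches its memory. A sketch of how to inspect and relax that on the host (sysfs paths from the kernel THP documentation; available option names vary by kernel version):

```shell
# Show the active THP policies (the bracketed value is the one in effect)
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

# Watch compaction counters while a guest boots
grep '^compact_' /proc/vmstat

# Avoid synchronous compaction on every THP page fault (requires root);
# "madvise" limits THP to regions that explicitly request it
echo madvise > /sys/kernel/mm/transparent_hugepage/defrag
```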
* Re: Windows slow boot: contractor wanted
2012-08-22 14:53 ` [Qemu-devel] " Avi Kivity
@ 2012-08-22 15:26 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-22 15:26 UTC (permalink / raw)
To: Avi Kivity; +Cc: qemu-devel, kvm, Rik van Riel
Avi Kivity wrote:
> Richard Davies wrote:
> > Avi Kivity wrote:
> > > Richard Davies wrote:
> > > > I can trigger the slow boots without KSM and they have the same
> > > > profile, with _raw_spin_lock_irqsave and isolate_freepages_block at
> > > > the top.
> > > >
> > > > I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB
> > > > 8-core VMs), and haven't managed to get a really slow boot yet (>5
> > > > minutes). I'll post agan when I get one.
> > >
> > > I think you can go higher than that. But 120GB on a 128GB host is
> > > pushing it.
> >
> > I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB
> > host (i.e. 108GB on a 128GB host).
> >
> > It has the same profile with _raw_spin_lock_irqsave and
> > isolate_freepages_block at the top.
>
> Then it's still memory starved.
>
> Please provide /proc/zoneinfo while this is happening.
Here are two copies of /proc/zoneinfo taken a minute or so apart during a
situation where there are 3x 36GB 8-core VMs on a 128GB host, with two of
the three VMs slow booting.
Node 0, zone DMA
pages free 3968
min 3
low 3
high 4
scanned 0
spanned 4080
present 3904
nr_free_pages 3968
nr_inactive_anon 0
nr_active_anon 0
nr_inactive_file 0
nr_active_file 0
nr_unevictable 0
nr_mlock 0
nr_anon_pages 0
nr_mapped 0
nr_file_pages 0
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 0
nr_slab_unreclaimable 0
nr_page_table_pages 0
nr_kernel_stack 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 0
nr_dirtied 0
nr_written 0
numa_hit 0
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 0
numa_other 0
nr_anon_transparent_hugepages 0
protection: (0, 3502, 32230, 32230)
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 2
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 3
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 4
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 5
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 6
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 7
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 8
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 9
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 10
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 11
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 12
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 13
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 14
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 15
count: 0
high: 0
batch: 1
vm stats threshold: 10
all_unreclaimable: 1
start_pfn: 16
inactive_ratio: 1
Node 0, zone DMA32
pages free 29798
min 917
low 1146
high 1375
scanned 0
spanned 1044480
present 896720
nr_free_pages 29798
nr_inactive_anon 0
nr_active_anon 817152
nr_inactive_file 29243
nr_active_file 574
nr_unevictable 0
nr_mlock 0
nr_anon_pages 0
nr_mapped 1
nr_file_pages 29817
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 26
nr_slab_unreclaimable 2
nr_page_table_pages 244
nr_kernel_stack 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 0
nr_dirtied 30546
nr_written 30546
numa_hit 42617
numa_miss 124755
numa_foreign 0
numa_interleave 0
numa_local 42023
numa_other 125349
nr_anon_transparent_hugepages 1596
protection: (0, 0, 28728, 28728)
pagesets
cpu: 0
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 1
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 2
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 3
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 4
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 5
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 6
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 8
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 9
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 10
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 12
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 13
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 14
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 60
all_unreclaimable: 0
start_pfn: 4096
inactive_ratio: 5
Node 0, zone Normal
pages free 292707
min 7524
low 9405
high 11286
scanned 0
spanned 7471104
present 7354368
nr_free_pages 292707
nr_inactive_anon 281
nr_active_anon 3024092
nr_inactive_file 1824853
nr_active_file 2050217
nr_unevictable 22
nr_mlock 22
nr_anon_pages 5103
nr_mapped 570
nr_file_pages 3875107
nr_dirty 1
nr_writeback 0
nr_slab_reclaimable 99328
nr_slab_unreclaimable 2701
nr_page_table_pages 8153
nr_kernel_stack 127
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 8
nr_dirtied 4910752
nr_written 4910735
numa_hit 11010852
numa_miss 973848
numa_foreign 6137099
numa_interleave 14102
numa_local 11003048
numa_other 981652
nr_anon_transparent_hugepages 5898
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 29
high: 186
batch: 31
vm stats threshold: 90
cpu: 1
count: 2
high: 186
batch: 31
vm stats threshold: 90
cpu: 2
count: 46
high: 186
batch: 31
vm stats threshold: 90
cpu: 3
count: 26
high: 186
batch: 31
vm stats threshold: 90
cpu: 4
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 5
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 6
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 8
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 9
count: 18
high: 186
batch: 31
vm stats threshold: 90
cpu: 10
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 12
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 13
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 14
count: 1
high: 186
batch: 31
vm stats threshold: 90
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 90
all_unreclaimable: 0
start_pfn: 1048576
inactive_ratio: 16
Node 1, zone Normal
pages free 23288
min 8448
low 10560
high 12672
scanned 0
spanned 8388608
present 8257536
nr_free_pages 23288
nr_inactive_anon 361430
nr_active_anon 5925377
nr_inactive_file 1779378
nr_active_file 76158
nr_unevictable 444
nr_mlock 444
nr_anon_pages 603
nr_mapped 990
nr_file_pages 1855911
nr_dirty 3
nr_writeback 0
nr_slab_reclaimable 60961
nr_slab_unreclaimable 1404
nr_page_table_pages 10197
nr_kernel_stack 22
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 97
nr_shmem 5
nr_dirtied 5000958
nr_written 5000955
numa_hit 4879358
numa_miss 4315336
numa_foreign 1710349
numa_interleave 14052
numa_local 4860081
numa_other 4334613
nr_anon_transparent_hugepages 12277
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 1
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 2
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 3
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 4
count: 30
high: 186
batch: 31
vm stats threshold: 90
cpu: 5
count: 88
high: 186
batch: 31
vm stats threshold: 90
cpu: 6
count: 176
high: 186
batch: 31
vm stats threshold: 90
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 8
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 9
count: 179
high: 186
batch: 31
vm stats threshold: 90
cpu: 10
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 12
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 13
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 14
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 90
all_unreclaimable: 0
start_pfn: 8519680
inactive_ratio: 17
Node 2, zone Normal
pages free 11632
min 8448
low 10560
high 12672
scanned 3
spanned 8388608
present 8257536
nr_free_pages 11632
nr_inactive_anon 368719
nr_active_anon 6009871
nr_inactive_file 1721022
nr_active_file 47969
nr_unevictable 74
nr_mlock 74
nr_anon_pages 6741
nr_mapped 1678
nr_file_pages 1769100
nr_dirty 3
nr_writeback 0
nr_slab_reclaimable 31690
nr_slab_unreclaimable 1547
nr_page_table_pages 13178
nr_kernel_stack 52
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 5
nr_dirtied 3264512
nr_written 3264506
numa_hit 3701723
numa_miss 3141775
numa_foreign 768925
numa_interleave 14093
numa_local 3685078
numa_other 3158420
nr_anon_transparent_hugepages 12446
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 1
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 2
count: 2
high: 186
batch: 31
vm stats threshold: 90
cpu: 3
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 4
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 5
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 6
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 8
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 9
count: 172
high: 186
batch: 31
vm stats threshold: 90
cpu: 10
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 12
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 13
count: 30
high: 186
batch: 31
vm stats threshold: 90
cpu: 14
count: 47
high: 186
batch: 31
vm stats threshold: 90
cpu: 15
count: 30
high: 186
batch: 31
vm stats threshold: 90
all_unreclaimable: 0
start_pfn: 16908288
inactive_ratio: 17
Node 3, zone Normal
pages free 42611
min 8448
low 10560
high 12672
scanned 0
spanned 8388608
present 8257536
nr_free_pages 42611
nr_inactive_anon 273
nr_active_anon 5728983
nr_inactive_file 1787163
nr_active_file 638839
nr_unevictable 79
nr_mlock 79
nr_anon_pages 2091
nr_mapped 670
nr_file_pages 2426028
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 27949
nr_slab_unreclaimable 1417
nr_page_table_pages 12372
nr_kernel_stack 28
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 1
nr_dirtied 2734460
nr_written 2734448
numa_hit 5026640
numa_miss 1501721
numa_foreign 1441062
numa_interleave 14050
numa_local 5005951
numa_other 1522410
nr_anon_transparent_hugepages 11186
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 1
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 2
count: 14
high: 186
batch: 31
vm stats threshold: 90
cpu: 3
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 4
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 5
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 6
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 8
count: 31
high: 186
batch: 31
vm stats threshold: 90
cpu: 9
count: 38
high: 186
batch: 31
vm stats threshold: 90
cpu: 10
count: 30
high: 186
batch: 31
vm stats threshold: 90
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 12
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 13
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 14
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 90
all_unreclaimable: 0
start_pfn: 25296896
inactive_ratio: 17
==========================================================================
Node 0, zone DMA
pages free 3968
min 3
low 3
high 4
scanned 0
spanned 4080
present 3904
nr_free_pages 3968
nr_inactive_anon 0
nr_active_anon 0
nr_inactive_file 0
nr_active_file 0
nr_unevictable 0
nr_mlock 0
nr_anon_pages 0
nr_mapped 0
nr_file_pages 0
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 0
nr_slab_unreclaimable 0
nr_page_table_pages 0
nr_kernel_stack 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 0
nr_dirtied 0
nr_written 0
numa_hit 0
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 0
numa_other 0
nr_anon_transparent_hugepages 0
protection: (0, 3502, 32230, 32230)
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 2
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 3
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 4
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 5
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 6
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 7
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 8
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 9
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 10
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 11
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 12
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 13
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 14
count: 0
high: 0
batch: 1
vm stats threshold: 10
cpu: 15
count: 0
high: 0
batch: 1
vm stats threshold: 10
all_unreclaimable: 1
start_pfn: 16
inactive_ratio: 1
Node 0, zone DMA32
pages free 29798
min 917
low 1146
high 1375
scanned 0
spanned 1044480
present 896720
nr_free_pages 29798
nr_inactive_anon 0
nr_active_anon 817152
nr_inactive_file 29243
nr_active_file 574
nr_unevictable 0
nr_mlock 0
nr_anon_pages 0
nr_mapped 1
nr_file_pages 29817
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 26
nr_slab_unreclaimable 2
nr_page_table_pages 244
nr_kernel_stack 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 0
nr_dirtied 30546
nr_written 30546
numa_hit 42617
numa_miss 124755
numa_foreign 0
numa_interleave 0
numa_local 42023
numa_other 125349
nr_anon_transparent_hugepages 1596
protection: (0, 0, 28728, 28728)
pagesets
cpu: 0
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 1
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 2
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 3
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 4
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 5
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 6
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 8
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 9
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 10
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 12
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 13
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 14
count: 0
high: 186
batch: 31
vm stats threshold: 60
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 60
all_unreclaimable: 0
start_pfn: 4096
inactive_ratio: 5
Node 0, zone Normal
pages free 140658
min 7524
low 9405
high 11286
scanned 0
spanned 7471104
present 7354368
nr_free_pages 140658
nr_inactive_anon 281
nr_active_anon 3178381
nr_inactive_file 1824810
nr_active_file 2050331
nr_unevictable 22
nr_mlock 22
nr_anon_pages 5790
nr_mapped 570
nr_file_pages 3875179
nr_dirty 1
nr_writeback 0
nr_slab_reclaimable 97265
nr_slab_unreclaimable 2756
nr_page_table_pages 8369
nr_kernel_stack 127
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 5
nr_shmem 8
nr_dirtied 4911092
nr_written 4911074
numa_hit 11018781
numa_miss 975761
numa_foreign 6137358
numa_interleave 14102
numa_local 11009945
numa_other 984597
nr_anon_transparent_hugepages 6197
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 31
high: 186
batch: 31
vm stats threshold: 90
cpu: 1
count: 30
high: 186
batch: 31
vm stats threshold: 90
cpu: 2
count: 48
high: 186
batch: 31
vm stats threshold: 90
cpu: 3
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 4
count: 1
high: 186
batch: 31
vm stats threshold: 90
cpu: 5
count: 17
high: 186
batch: 31
vm stats threshold: 90
cpu: 6
count: 3
high: 186
batch: 31
vm stats threshold: 90
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 8
count: 1
high: 186
batch: 31
vm stats threshold: 90
cpu: 9
count: 1
high: 186
batch: 31
vm stats threshold: 90
cpu: 10
count: 11
high: 186
batch: 31
vm stats threshold: 90
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 12
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 13
count: 30
high: 186
batch: 31
vm stats threshold: 90
cpu: 14
count: 1
high: 186
batch: 31
vm stats threshold: 90
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 90
all_unreclaimable: 0
start_pfn: 1048576
inactive_ratio: 16
Node 1, zone Normal
pages free 25982
min 8448
low 10560
high 12672
scanned 0
spanned 8388608
present 8257536
nr_free_pages 25982
nr_inactive_anon 361430
nr_active_anon 5948303
nr_inactive_file 1757767
nr_active_file 76240
nr_unevictable 444
nr_mlock 444
nr_anon_pages 1001
nr_mapped 990
nr_file_pages 1834319
nr_dirty 2
nr_writeback 0
nr_slab_reclaimable 56778
nr_slab_unreclaimable 1404
nr_page_table_pages 10464
nr_kernel_stack 22
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 5
nr_dirtied 5001855
nr_written 5001853
numa_hit 4882365
numa_miss 4315400
numa_foreign 1711246
numa_interleave 14052
numa_local 4861540
numa_other 4336225
nr_anon_transparent_hugepages 12322
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 1
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 2
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 3
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 4
count: 29
high: 186
batch: 31
vm stats threshold: 90
cpu: 5
count: 74
high: 186
batch: 31
vm stats threshold: 90
cpu: 6
count: 120
high: 186
batch: 31
vm stats threshold: 90
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 8
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 9
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 10
count: 27
high: 186
batch: 31
vm stats threshold: 90
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 12
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 13
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 14
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 90
all_unreclaimable: 0
start_pfn: 8519680
inactive_ratio: 17
Node 2, zone Normal
pages free 8514
min 8448
low 10560
high 12672
scanned 0
spanned 8388608
present 8257536
nr_free_pages 8514
nr_inactive_anon 385103
nr_active_anon 6307975
nr_inactive_file 1409493
nr_active_file 48031
nr_unevictable 74
nr_mlock 74
nr_anon_pages 6866
nr_mapped 1678
nr_file_pages 1457589
nr_dirty 3
nr_writeback 0
nr_slab_reclaimable 31690
nr_slab_unreclaimable 1537
nr_page_table_pages 13296
nr_kernel_stack 52
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 5
nr_dirtied 3264794
nr_written 3264788
numa_hit 3704905
numa_miss 3143298
numa_foreign 774847
numa_interleave 14093
numa_local 3688103
numa_other 3160100
nr_anon_transparent_hugepages 13051
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 1
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 2
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 3
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 4
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 5
count: 175
high: 186
batch: 31
vm stats threshold: 90
cpu: 6
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 8
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 9
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 10
count: 170
high: 186
batch: 31
vm stats threshold: 90
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 12
count: 8
high: 186
batch: 31
vm stats threshold: 90
cpu: 13
count: 30
high: 186
batch: 31
vm stats threshold: 90
cpu: 14
count: 4
high: 186
batch: 31
vm stats threshold: 90
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 90
all_unreclaimable: 0
start_pfn: 16908288
inactive_ratio: 17
Node 3, zone Normal
pages free 42068
min 8448
low 10560
high 12672
scanned 0
spanned 8388608
present 8257536
nr_free_pages 42068
nr_inactive_anon 273
nr_active_anon 5729807
nr_inactive_file 1787193
nr_active_file 638901
nr_unevictable 79
nr_mlock 79
nr_anon_pages 2930
nr_mapped 670
nr_file_pages 2426099
nr_dirty 1
nr_writeback 0
nr_slab_reclaimable 27153
nr_slab_unreclaimable 1453
nr_page_table_pages 12710
nr_kernel_stack 27
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 1
nr_dirtied 2734473
nr_written 2734460
numa_hit 5030446
numa_miss 1506319
numa_foreign 1442082
numa_interleave 14050
numa_local 5008209
numa_other 1528556
nr_anon_transparent_hugepages 11186
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 1
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 2
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 3
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 4
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 5
count: 9
high: 186
batch: 31
vm stats threshold: 90
cpu: 6
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 8
count: 29
high: 186
batch: 31
vm stats threshold: 90
cpu: 9
count: 31
high: 186
batch: 31
vm stats threshold: 90
cpu: 10
count: 33
high: 186
batch: 31
vm stats threshold: 90
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 12
count: 31
high: 186
batch: 31
vm stats threshold: 90
cpu: 13
count: 25
high: 186
batch: 31
vm stats threshold: 90
cpu: 14
count: 50
high: 186
batch: 31
vm stats threshold: 90
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 90
all_unreclaimable: 0
start_pfn: 25296896
inactive_ratio: 17
^ permalink raw reply [flat|nested] 101+ messages in thread
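When working with paired snapshots like the two above, it is easier to diff the counters programmatically than by eye. A minimal sketch (the file names and helper functions are my own illustration, not something from this thread) that parses saved /proc/zoneinfo snapshots and prints the per-zone nr_*/numa_* counters that changed between them:

```python
import re

def parse(path):
    """Parse a saved /proc/zoneinfo snapshot into {(node, zone): {counter: value}}."""
    zones = {}
    current = None
    for line in open(path):
        m = re.match(r'Node (\d+), zone\s+(\S+)', line)
        if m:
            current = (int(m.group(1)), m.group(2))
            zones[current] = {}
            continue
        # only track the nr_* / numa_* counters; skip min/low/high/pagesets etc.
        m = re.match(r'\s*(nr_\w+|numa_\w+)\s+(\d+)\s*$', line)
        if m and current is not None:
            zones[current][m.group(1)] = int(m.group(2))
    return zones

def diff(before, after):
    """Print the counters that changed between two snapshot files."""
    a, b = parse(before), parse(after)
    for zone in sorted(set(a) & set(b)):
        for key in sorted(set(a[zone]) & set(b[zone])):
            delta = b[zone][key] - a[zone][key]
            if delta:
                print(f"Node {zone[0]} zone {zone[1]:<8} {key:<35} {delta:+d}")

# usage (file names are placeholders for wherever the snapshots were saved):
# diff("zoneinfo.before", "zoneinfo.after")
```

For dumps like the ones in this thread, the per-node swings in nr_free_pages and numa_miss between the two captures are the numbers most worth watching.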
batch: 31
vm stats threshold: 90
cpu: 4
count: 1
high: 186
batch: 31
vm stats threshold: 90
cpu: 5
count: 17
high: 186
batch: 31
vm stats threshold: 90
cpu: 6
count: 3
high: 186
batch: 31
vm stats threshold: 90
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 8
count: 1
high: 186
batch: 31
vm stats threshold: 90
cpu: 9
count: 1
high: 186
batch: 31
vm stats threshold: 90
cpu: 10
count: 11
high: 186
batch: 31
vm stats threshold: 90
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 12
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 13
count: 30
high: 186
batch: 31
vm stats threshold: 90
cpu: 14
count: 1
high: 186
batch: 31
vm stats threshold: 90
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 90
all_unreclaimable: 0
start_pfn: 1048576
inactive_ratio: 16
Node 1, zone Normal
pages free 25982
min 8448
low 10560
high 12672
scanned 0
spanned 8388608
present 8257536
nr_free_pages 25982
nr_inactive_anon 361430
nr_active_anon 5948303
nr_inactive_file 1757767
nr_active_file 76240
nr_unevictable 444
nr_mlock 444
nr_anon_pages 1001
nr_mapped 990
nr_file_pages 1834319
nr_dirty 2
nr_writeback 0
nr_slab_reclaimable 56778
nr_slab_unreclaimable 1404
nr_page_table_pages 10464
nr_kernel_stack 22
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 5
nr_dirtied 5001855
nr_written 5001853
numa_hit 4882365
numa_miss 4315400
numa_foreign 1711246
numa_interleave 14052
numa_local 4861540
numa_other 4336225
nr_anon_transparent_hugepages 12322
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 1
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 2
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 3
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 4
count: 29
high: 186
batch: 31
vm stats threshold: 90
cpu: 5
count: 74
high: 186
batch: 31
vm stats threshold: 90
cpu: 6
count: 120
high: 186
batch: 31
vm stats threshold: 90
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 8
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 9
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 10
count: 27
high: 186
batch: 31
vm stats threshold: 90
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 12
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 13
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 14
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 90
all_unreclaimable: 0
start_pfn: 8519680
inactive_ratio: 17
Node 2, zone Normal
pages free 8514
min 8448
low 10560
high 12672
scanned 0
spanned 8388608
present 8257536
nr_free_pages 8514
nr_inactive_anon 385103
nr_active_anon 6307975
nr_inactive_file 1409493
nr_active_file 48031
nr_unevictable 74
nr_mlock 74
nr_anon_pages 6866
nr_mapped 1678
nr_file_pages 1457589
nr_dirty 3
nr_writeback 0
nr_slab_reclaimable 31690
nr_slab_unreclaimable 1537
nr_page_table_pages 13296
nr_kernel_stack 52
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 5
nr_dirtied 3264794
nr_written 3264788
numa_hit 3704905
numa_miss 3143298
numa_foreign 774847
numa_interleave 14093
numa_local 3688103
numa_other 3160100
nr_anon_transparent_hugepages 13051
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 1
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 2
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 3
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 4
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 5
count: 175
high: 186
batch: 31
vm stats threshold: 90
cpu: 6
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 8
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 9
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 10
count: 170
high: 186
batch: 31
vm stats threshold: 90
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 12
count: 8
high: 186
batch: 31
vm stats threshold: 90
cpu: 13
count: 30
high: 186
batch: 31
vm stats threshold: 90
cpu: 14
count: 4
high: 186
batch: 31
vm stats threshold: 90
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 90
all_unreclaimable: 0
start_pfn: 16908288
inactive_ratio: 17
Node 3, zone Normal
pages free 42068
min 8448
low 10560
high 12672
scanned 0
spanned 8388608
present 8257536
nr_free_pages 42068
nr_inactive_anon 273
nr_active_anon 5729807
nr_inactive_file 1787193
nr_active_file 638901
nr_unevictable 79
nr_mlock 79
nr_anon_pages 2930
nr_mapped 670
nr_file_pages 2426099
nr_dirty 1
nr_writeback 0
nr_slab_reclaimable 27153
nr_slab_unreclaimable 1453
nr_page_table_pages 12710
nr_kernel_stack 27
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 1
nr_dirtied 2734473
nr_written 2734460
numa_hit 5030446
numa_miss 1506319
numa_foreign 1442082
numa_interleave 14050
numa_local 5008209
numa_other 1528556
nr_anon_transparent_hugepages 11186
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 1
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 2
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 3
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 4
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 5
count: 9
high: 186
batch: 31
vm stats threshold: 90
cpu: 6
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 7
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 8
count: 29
high: 186
batch: 31
vm stats threshold: 90
cpu: 9
count: 31
high: 186
batch: 31
vm stats threshold: 90
cpu: 10
count: 33
high: 186
batch: 31
vm stats threshold: 90
cpu: 11
count: 0
high: 186
batch: 31
vm stats threshold: 90
cpu: 12
count: 31
high: 186
batch: 31
vm stats threshold: 90
cpu: 13
count: 25
high: 186
batch: 31
vm stats threshold: 90
cpu: 14
count: 50
high: 186
batch: 31
vm stats threshold: 90
cpu: 15
count: 0
high: 186
batch: 31
vm stats threshold: 90
all_unreclaimable: 0
start_pfn: 25296896
inactive_ratio: 17
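A dump like the one above is easier to eyeball when reduced to per-zone free pages against the min/low/high watermarks (for instance, Node 2's Normal zone above is sitting barely over its min watermark: 8514 free vs min 8448). The following is a hypothetical Python sketch for that reduction, not part of the original report; field names are as they appear in the dump above:

```python
import re

def zone_watermarks(text):
    """Parse /proc/zoneinfo-style text into {(node, zone): {field: pages}}."""
    zones, current = {}, None
    for line in text.splitlines():
        m = re.match(r"\s*Node (\d+), zone\s+(\w+)", line)
        if m:
            current = (int(m.group(1)), m.group(2))
            zones[current] = {}
            continue
        # Matches 'pages free N', 'min N', 'low N', 'high N', 'present N'.
        # The colon in the per-cpu 'high: N' pageset lines keeps them out.
        m = re.match(r"\s*(?:pages\s+)?(free|min|low|high|present)\s+(\d+)\s*$", line)
        if m and current is not None:
            zones[current].setdefault(m.group(1), int(m.group(2)))
    return zones

# Small sample in the same layout as the dump above; on a live host you
# would feed it open('/proc/zoneinfo').read() instead.
sample = """Node 2, zone   Normal
  pages free     8514
        min      8448
        low      10560
        high     12672
        present  8257536
"""
info = zone_watermarks(sample)
free, wmin = info[(2, "Normal")]["free"], info[(2, "Normal")]["min"]
print(f"Node 2 Normal: {free} free vs min watermark {wmin}")
```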
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows slow boot: contractor wanted
2012-08-22 15:21 ` [Qemu-devel] " Rik van Riel
@ 2012-08-22 15:34 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-22 15:34 UTC (permalink / raw)
To: Rik van Riel; +Cc: Avi Kivity, qemu-devel, kvm
Rik van Riel wrote:
> Richard Davies wrote:
> > I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB
> > host (i.e. 108GB on a 128GB host).
> >
> > It has the same profile with _raw_spin_lock_irqsave and
> > isolate_freepages_block at the top.
>
> That's the page compaction code.
>
> Mel Gorman and I have been working to fix that, the latest fixes and
> improvements are in the -mm kernel already.
Hi Rik,
That's good news.
Can you point me to specific patches which we can backport to a 3.5.2 kernel
to test whether they fix our problem?
Thanks,
Richard.
* Re: [Qemu-devel] Windows slow boot: contractor wanted
2012-08-22 14:53 ` [Qemu-devel] " Avi Kivity
@ 2012-08-22 17:22 ` Troy Benjegerdes
-1 siblings, 0 replies; 101+ messages in thread
From: Troy Benjegerdes @ 2012-08-22 17:22 UTC (permalink / raw)
To: Avi Kivity; +Cc: Richard Davies, qemu-devel, kvm
> > I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host
> > (i.e. 108GB on a 128GB host).
> >
> > It has the same profile with _raw_spin_lock_irqsave and
> > isolate_freepages_block at the top.
>
> Then it's still memory starved.
>
> Please provide /proc/zoneinfo while this is happening.
Is there a way to capture/reproduce this 'slow boot' behavior with
a simple regression test? I'd like to know if it happens on a
single-physical CPU socket machine, or just on dual-sockets.
I'm also observing an interesting phenomenon here... Kernel development
can move so fast as to make regression testing pointless. ;)
* Re: Windows slow boot: contractor wanted
2012-08-22 15:21 ` [Qemu-devel] " Rik van Riel
@ 2012-08-25 17:45 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-25 17:45 UTC (permalink / raw)
To: Rik van Riel; +Cc: Avi Kivity, qemu-devel, kvm
Rik van Riel wrote:
> Richard Davies wrote:
> > Avi Kivity wrote:
> > > Richard Davies wrote:
> > > > I can trigger the slow boots without KSM and they have the same
> > > > profile, with _raw_spin_lock_irqsave and isolate_freepages_block at
> > > > the top.
> > > >
> > > > I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB
> > > > 8-core VMs), and haven't managed to get a really slow boot yet (>5
> > > > minutes). I'll post agan when I get one.
> > >
> > > I think you can go higher than that. But 120GB on a 128GB host is
> > > pushing it.
> >
> > I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host
> > (i.e. 108GB on a 128GB host).
> >
> > It has the same profile with _raw_spin_lock_irqsave and
> > isolate_freepages_block at the top.
>
> That's the page compaction code.
>
> Mel Gorman and I have been working to fix that,
> the latest fixes and improvements are in the -mm
> kernel already.
Hi Rik,
Are you talking about these patches?
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c67fe3752abe6ab47639e2f9b836900c3dc3da84
http://marc.info/?l=linux-mm&m=134521289221259
If so, I believe those are in 3.6.0-rc3, so I tested with that.
Unfortunately, I can still get the slow boots and perf top showing
_raw_spin_lock_irqsave.
Here are two perf top traces on 3.6.0-rc3. They do look a bit different from
3.5.2, but _raw_spin_lock_irqsave is still at the top:
PerfTop: 35272 irqs/sec kernel:98.1% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
------------------------------------------------------------------------------------------------------------------
61.85% [kernel] [k] _raw_spin_lock_irqsave
7.18% [kernel] [k] sub_preempt_count
5.03% [kernel] [k] isolate_freepages_block
2.49% [kernel] [k] yield_to
2.05% [kernel] [k] memcmp
2.01% [kernel] [k] compact_zone
1.76% [kernel] [k] add_preempt_count
1.52% [kernel] [k] _raw_spin_lock
1.31% [kernel] [k] kvm_vcpu_on_spin
0.92% [kernel] [k] svm_vcpu_run
0.78% [kernel] [k] __rcu_read_unlock
0.76% [kernel] [k] migrate_pages
0.68% [kernel] [k] kvm_vcpu_yield_to
0.46% [kernel] [k] pid_task
0.42% [kernel] [k] isolate_migratepages_range
0.41% [kernel] [k] kvm_arch_vcpu_ioctl_run
0.40% [kernel] [k] clear_page_c
0.40% [kernel] [k] get_pid_task
0.40% [kernel] [k] get_parent_ip
0.39% [kernel] [k] __zone_watermark_ok
0.34% [kernel] [k] trace_hardirqs_off
0.34% [kernel] [k] trace_hardirqs_on
0.32% [kernel] [k] _raw_spin_unlock_irqrestore
0.27% [kernel] [k] _raw_spin_unlock
0.22% [kernel] [k] mod_zone_page_state
0.21% [kernel] [k] rcu_note_context_switch
0.21% [kernel] [k] trace_preempt_on
0.21% [kernel] [k] trace_preempt_off
0.19% [kernel] [k] in_lock_functions
0.16% [kernel] [k] __srcu_read_lock
0.14% [kernel] [k] ktime_get
0.11% [kernel] [k] get_pageblock_flags_group
0.11% [kernel] [k] compact_checklock_irqsave
0.11% [kernel] [k] find_busiest_group
0.10% [kernel] [k] __srcu_read_unlock
0.09% [kernel] [k] __rcu_read_lock
0.09% libc-2.10.1.so [.] 0x0000000000072c9d
0.09% [kernel] [k] cpumask_next_and
0.08% [kernel] [k] smp_call_function_many
0.08% [kernel] [k] read_tsc
0.08% [kernel] [k] kmem_cache_alloc
0.08% libc-2.10.1.so [.] strcmp
0.08% [kernel] [k] generic_smp_call_function_interrupt
0.07% [kernel] [k] __schedule
0.07% qemu-kvm [.] main_loop_wait
0.07% [kernel] [k] __hrtimer_start_range_ns
0.06% qemu-kvm [.] qemu_iohandler_poll
0.06% [kernel] [k] ktime_get_update_offsets
0.06% [kernel] [k] ktime_add_safe
0.06% [kernel] [k] find_next_bit
0.06% [kernel] [k] irq_exit
0.06% [kernel] [k] select_task_rq_fair
0.06% [kernel] [k] handle_exit
0.05% [kernel] [k] update_curr
0.05% [kernel] [k] flush_tlb_func
0.05% perf [.] dso__find_symbol
0.05% [kernel] [k] kvm_check_async_pf_completion
0.05% [kernel] [k] rcu_check_callbacks
0.05% [kernel] [k] apic_update_ppr
0.05% [kernel] [k] irq_enter
0.04% [kernel] [k] copy_user_generic_string
0.04% [kernel] [k] copy_page_c
0.04% [kernel] [k] rcu_idle_exit_common.isra.34
0.04% [kernel] [k] load_balance
0.04% [kernel] [k] rb_erase
0.04% libc-2.10.1.so [.] __select
1905 unprocessable samples recorded.
PerfTop: 49639 irqs/sec kernel:98.8% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
------------------------------------------------------------------------------------------------------------------
81.43% [kernel] [k] _raw_spin_lock_irqsave
6.19% [kernel] [k] sub_preempt_count
1.21% [kernel] [k] memcmp
1.03% [kernel] [k] compact_zone
0.72% [kernel] [k] smp_call_function_many
0.50% [kernel] [k] yield_to
0.49% [kernel] [k] add_preempt_count
0.43% [kernel] [k] svm_vcpu_run
0.41% [kernel] [k] _raw_spin_unlock_irqrestore
0.40% [kernel] [k] clear_page_c
0.40% [kernel] [k] migrate_pages
0.38% [kernel] [k] __zone_watermark_ok
0.34% [kernel] [k] isolate_migratepages_range
0.34% [kernel] [k] isolate_freepages_block
0.27% [kernel] [k] kvm_vcpu_on_spin
0.23% [kernel] [k] trace_hardirqs_off
0.21% [kernel] [k] mod_zone_page_state
0.20% [kernel] [k] __rcu_read_unlock
0.18% [kernel] [k] get_parent_ip
0.17% [kernel] [k] _raw_spin_lock
0.14% [kernel] [k] flush_tlb_func
0.14% [kernel] [k] trace_preempt_on
0.14% [kernel] [k] trace_preempt_off
0.14% [kernel] [k] kvm_arch_vcpu_ioctl_run
0.14% [kernel] [k] trace_hardirqs_on
0.10% [kernel] [k] compact_checklock_irqsave
0.09% [kernel] [k] _raw_spin_lock_irq
0.09% [kernel] [k] __srcu_read_lock
0.07% [kernel] [k] in_lock_functions
0.07% [kernel] [k] copy_page_c
0.07% [kernel] [k] kmem_cache_alloc
0.07% libc-2.10.1.so [.] strcmp
0.06% [kernel] [k] _raw_spin_unlock
0.06% [kernel] [k] kvm_vcpu_yield_to
0.06% [kernel] [k] get_pid_task
0.06% [kernel] [k] ktime_get
0.06% [kernel] [k] call_function_interrupt
0.05% [kernel] [k] generic_smp_call_function_interrupt
0.05% [kernel] [k] ktime_get_update_offsets
0.05% [kernel] [k] pid_task
0.05% [kernel] [k] copy_user_generic_string
0.04% [kernel] [k] __srcu_read_unlock
0.04% [kernel] [k] get_pageblock_flags_group
0.04% [kernel] [k] rcu_note_context_switch
0.04% libc-2.10.1.so [.] 0x00000000000743ee
0.04% perf [.] dso__find_symbol
0.04% [kernel] [k] zone_watermark_ok
0.04% [vdso] [.] 0x00007fff9afff85d
0.03% [kernel] [k] __mod_zone_page_state
0.03% [kernel] [k] smp_call_function_interrupt
0.03% [kernel] [k] _cond_resched
0.03% [kernel] [k] read_tsc
0.03% [kernel] [k] sysret_check
0.03% [kernel] [k] system_call_after_swapgs
0.03% [kernel] [k] default_send_IPI_mask_sequence_phys
0.03% perf [.] add_hist_entry
0.03% [kernel] [k] __schedule
0.03% perf [.] sort__dso_cmp
0.02% [kernel] [k] mutex_spin_on_owner
0.02% [kernel] [k] do_select
0.02% [kernel] [k] __rcu_read_lock
0.02% [kernel] [k] rcu_check_callbacks
0.02% [kernel] [k] handle_exit
0.02% [kernel] [k] apic_timer_interrupt
0.02% [kernel] [k] perf_pmu_disable
0.02% [kernel] [k] find_busiest_group
3666 unprocessable samples recorded.
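The traces above put compact_zone and isolate_freepages_block high in the profile. One quick cross-check (a hedged sketch, not something done in the thread) is to diff the page-compaction counters in /proc/vmstat across a guest boot; compact_stall counts allocations that stalled in direct compaction. The counter names here are assumed from kernels of this era:

```python
def compaction_counters(vmstat_text):
    """Extract page-compaction counters from /proc/vmstat-style text."""
    wanted = {"compact_stall", "compact_fail", "compact_success"}
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted:
            counters[parts[0]] = int(parts[1])
    return counters

def delta(before, after):
    """Per-counter change between two samples taken around a guest boot."""
    return {k: after[k] - before[k] for k in before if k in after}

# Hypothetical sample values; on a live host you would call
# compaction_counters(open('/proc/vmstat').read()) before and after the boot.
before = compaction_counters("compact_stall 10\ncompact_fail 4\ncompact_success 6\n")
after = compaction_counters("compact_stall 250\ncompact_fail 200\ncompact_success 50\n")
d = delta(before, after)
print(d)
```

A large compact_stall delta with mostly compact_fail would point the same way as the profiles above: allocations spending their time in compaction.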
* Re: [Qemu-devel] Windows slow boot: contractor wanted
2012-08-22 17:22 ` Troy Benjegerdes
@ 2012-08-25 17:51 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-25 17:51 UTC (permalink / raw)
To: Troy Benjegerdes; +Cc: Avi Kivity, qemu-devel, kvm
Troy Benjegerdes wrote:
> Is there a way to capture/reproduce this 'slow boot' behavior with
> a simple regression test? I'd like to know if it happens on a
> single-physical CPU socket machine, or just on dual-sockets.
Yes, definitely.
These two emails earlier in the thread give a fairly complete description of
what I am doing - please do ask if you have any further questions.
http://marc.info/?l=qemu-devel&m=134511429415347
http://marc.info/?l=qemu-devel&m=134520701317153
Richard.
* Re: Windows slow boot: contractor wanted
2012-08-25 17:45 ` [Qemu-devel] " Richard Davies
@ 2012-08-25 18:11 ` Rik van Riel
-1 siblings, 0 replies; 101+ messages in thread
From: Rik van Riel @ 2012-08-25 18:11 UTC (permalink / raw)
To: Richard Davies; +Cc: Avi Kivity, qemu-devel, kvm
On 08/25/2012 01:45 PM, Richard Davies wrote:
> Are you talking about these patches?
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c67fe3752abe6ab47639e2f9b836900c3dc3da84
> http://marc.info/?l=linux-mm&m=134521289221259
>
> If so, I believe those are in 3.6.0-rc3, so I tested with that.
>
> Unfortunately, I can still get the slow boots and perf top showing
> _raw_spin_lock_irqsave.
>
>
> Here are two perf top traces on 3.6.0-rc3. They do look a bit different from
> 3.5.2, but _raw_spin_lock_irqsave is still at the top:
>
> PerfTop: 35272 irqs/sec kernel:98.1% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
> ------------------------------------------------------------------------------------------------------------------
>
> 61.85% [kernel] [k] _raw_spin_lock_irqsave
> 7.18% [kernel] [k] sub_preempt_count
> 5.03% [kernel] [k] isolate_freepages_block
> 2.49% [kernel] [k] yield_to
> 2.05% [kernel] [k] memcmp
> 2.01% [kernel] [k] compact_zone
> 1.76% [kernel] [k] add_preempt_count
> 1.52% [kernel] [k] _raw_spin_lock
> 1.31% [kernel] [k] kvm_vcpu_on_spin
> 0.92% [kernel] [k] svm_vcpu_run
However, the compaction code is not as prominent as before.
Can you get a backtrace to that _raw_spin_lock_irqsave, to see
from where it is running into lock contention?
It would be good to know whether it is isolate_freepages_block,
yield_to, kvm_vcpu_on_spin or something else...
--
All rights reversed
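For anyone wanting to gather the backtrace Rik asks for here, the capture used later in the thread is `perf record -g -a`. A small illustrative Python helper (the perf flags are real; the wrapper itself is hypothetical):

```python
def perf_capture_cmds(seconds=30, outfile="perf.data"):
    """Build the commands for a system-wide, call-graph profile
    ('perf record -g -a', as used later in this thread) plus a text report."""
    record = ["perf", "record", "-g", "-a", "-o", outfile,
              "--", "sleep", str(seconds)]
    report = ["perf", "report", "--stdio", "-i", outfile]
    return record, report

# Intended use (requires perf installed and root):
#   import subprocess
#   rec, rep = perf_capture_cmds(seconds=60)
#   subprocess.run(rec, check=True)   # profile the host for 60s
#   subprocess.run(rep, check=True)   # print callers of the hot symbols
rec, rep = perf_capture_cmds()
print(" ".join(rec))
```

The `report` step is what attributes _raw_spin_lock_irqsave samples to their callers, as in the backtrace Richard posts below.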
* Re: Windows slow boot: contractor wanted
2012-08-25 18:11 ` [Qemu-devel] " Rik van Riel
@ 2012-08-26 10:58 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-08-26 10:58 UTC (permalink / raw)
To: Rik van Riel; +Cc: Avi Kivity, kvm, qemu-devel
Rik van Riel wrote:
> Can you get a backtrace to that _raw_spin_lock_irqsave, to see
> from where it is running into lock contention?
>
> It would be good to know whether it is isolate_freepages_block,
> yield_to, kvm_vcpu_on_spin or something else...
Hi Rik,
I got into a slow boot situation on 3.6.0-rc3, ran "perf record -g -a" for a
while, then ran perf report with the output below.
This trace looks more like the second of the two perf top traces I sent on
Saturday (the two in that email differed from each other, as well as from
the traces on 3.5.2).
The symptoms were a bit different too - the VM boots appeared to be
completely locked up rather than just slow, and I couldn't quit qemu-kvm at
the monitor - I had to restart the host.
So perhaps this one is actually a deadlock rather than just slow?
Cheers,
Richard.
# ========
# captured on: Sun Aug 26 10:08:28 2012
# os release : 3.6.0-rc3-elastic
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131971760 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 2M of event 'cycles'
# Event count (approx.): 1040676441385
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
#
90.01% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|
--- _raw_spin_lock_irqsave
|
|--99.99%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.33
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--54.91%-- 0x10100000002
| |
| --45.09%-- 0x10100000006
--0.01%-- [...]
4.66% qemu-kvm [kernel.kallsyms] [k] sub_preempt_count
|
--- sub_preempt_count
|
|--99.77%-- _raw_spin_unlock_irqrestore
| |
| |--99.99%-- compact_checklock_irqsave
| | isolate_migratepages_range
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.33
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--51.94%-- 0x10100000002
| | |
| | --48.06%-- 0x10100000006
| --0.01%-- [...]
--0.23%-- [...]
1.23% ksmd [kernel.kallsyms] [k] memcmp
|
--- memcmp
|
|--99.83%-- memcmp_pages
| |
| |--78.46%-- ksm_scan_thread
| | kthread
| | kernel_thread_helper
| |
| --21.54%-- try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
--0.17%-- [...]
0.91% ksmd [kernel.kallsyms] [k] smp_call_function_many
|
--- smp_call_function_many
|
|--99.98%-- native_flush_tlb_others
| |
| |--99.86%-- flush_tlb_page
| | ptep_clear_flush
| | try_to_merge_with_ksm_page
| | ksm_scan_thread
| | kthread
| | kernel_thread_helper
| --0.14%-- [...]
--0.02%-- [...]
0.34% qemu-kvm [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
|
--- _raw_spin_unlock_irqrestore
|
|--96.08%-- compact_checklock_irqsave
| isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.33
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--65.19%-- 0x10100000006
| |
| --34.81%-- 0x10100000002
|
|--2.68%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.33
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--52.08%-- 0x10100000002
| |
| --47.92%-- 0x10100000006
|
|--0.56%-- ntp_tick_length
| do_timer
| tick_do_update_jiffies64
| tick_sched_timer
| __run_hrtimer
| hrtimer_interrupt
| smp_apic_timer_interrupt
| apic_timer_interrupt
| compact_checklock_irqsave
| isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.33
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| 0x10100000002
--0.68%-- [...]
0.30% swapper [kernel.kallsyms] [k] default_idle
|
--- default_idle
|
|--99.95%-- cpu_idle
| start_secondary
--0.05%-- [...]
0.15% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range
|
--- isolate_migratepages_range
|
|--97.41%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.33
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--54.02%-- 0x10100000002
| |
| --45.98%-- 0x10100000006
|
--2.59%-- compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.33
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--56.10%-- 0x10100000002
|
--43.90%-- 0x10100000006
0.12% qemu-kvm [kernel.kallsyms] [k] compact_zone
|
--- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.33
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--52.09%-- 0x10100000002
|
--47.91%-- 0x10100000006
0.11% qemu-kvm [kernel.kallsyms] [k] flush_tlb_func
|
--- flush_tlb_func
|
|--99.58%-- generic_smp_call_function_interrupt
| smp_call_function_interrupt
| call_function_interrupt
| |
| |--94.65%-- compact_checklock_irqsave
| | isolate_migratepages_range
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.33
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--78.04%-- 0x10100000006
| | |
| | --21.96%-- 0x10100000002
| |
| |--4.67%-- sub_preempt_count
| | _raw_spin_unlock_irqrestore
| | compact_checklock_irqsave
| | isolate_migratepages_range
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.33
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--78.18%-- 0x10100000006
| | |
| | --21.82%-- 0x10100000002
| --0.68%-- [...]
--0.42%-- [...]
0.09% qemu-kvm [kernel.kallsyms] [k] mod_zone_page_state
|
--- mod_zone_page_state
|
|--80.84%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.33
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--53.90%-- 0x10100000002
| |
| --46.10%-- 0x10100000006
|
--19.16%-- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.33
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--55.04%-- 0x10100000002
|
--44.96%-- 0x10100000006
0.09% qemu-kvm [kernel.kallsyms] [k] migrate_pages
|
--- migrate_pages
|
|--96.21%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.33
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--52.94%-- 0x10100000002
| |
| --47.06%-- 0x10100000006
|
--3.79%-- compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.33
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--50.72%-- 0x10100000002
|
--49.28%-- 0x10100000006
0.09% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok
|
--- __zone_watermark_ok
|
|--95.81%-- zone_watermark_ok
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.33
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--51.21%-- 0x10100000002
| |
| --48.79%-- 0x10100000006
|
--4.19%-- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.33
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--50.00%-- 0x10100000006
|
--50.00%-- 0x10100000002
0.06% perf [kernel.kallsyms] [k] copy_user_generic_string
|
--- copy_user_generic_string
generic_file_buffered_write
__generic_file_aio_write
generic_file_aio_write
ext4_file_write
do_sync_write
vfs_write
sys_write
system_call_fastpath
write
run_builtin
main
__libc_start_main
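Every hot call chain in the report above funnels through do_huge_pmd_anonymous_page -> try_to_compact_pages, i.e. transparent hugepage faults doing direct compaction while the guest touches its memory. One way to test that hypothesis (my suggestion, not something tried in this thread) is to stop THP faults from compacting on the host and see whether the stalls disappear:

```shell
# Read the current THP policies (read-only, safe anywhere). The files
# exist on Linux hosts with CONFIG_TRANSPARENT_HUGEPAGE.
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || true
cat /sys/kernel/mm/transparent_hugepage/defrag  2>/dev/null || true

# As root, stop THP page faults from triggering direct compaction
# (shown commented out so the read-only check above stays safe to run):
#   echo never > /sys/kernel/mm/transparent_hugepage/defrag
```

If the slow boots vanish with defrag set to never, the contention is confirmed to come from compaction on the THP fault path rather than from the KVM spin-loop handling (yield_to / kvm_vcpu_on_spin) that Rik also asked about.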
* Re: Windows slow boot: contractor wanted
2012-08-26 10:58 ` [Qemu-devel] " Richard Davies
@ 2012-09-06 9:20 ` Richard Davies
0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-06 9:20 UTC (permalink / raw)
To: Rik van Riel; +Cc: Avi Kivity, qemu-devel, kvm
Hi Rik,
Are there any more tests which I can usefully do for you?
I notice that 3.6.0-rc4 is out - are there changes from rc3 which are worth
me retesting?
Cheers,
Richard.
Richard Davies wrote:
> Rik van Riel wrote:
> > Can you get a backtrace to that _raw_spin_lock_irqsave, to see
> > from where it is running into lock contention?
> >
> > It would be good to know whether it is isolate_freepages_block,
> > yield_to, kvm_vcpu_on_spin or something else...
>
> Hi Rik,
>
> I got into a slow boot situation on 3.6.0-rc3, ran "perf record -g -a" for a
> while, then ran perf report with the output below.
>
> This trace looks more like the second perf top trace that I sent on Saturday
> (there were two in my email and they were different from each other as well
> as different from on 3.5.2).
>
> The symptoms were a bit different too - the VM boots appeared to be
> completely locked up rather than just slow, and I couldn't quit qemu-kvm at
> the monitor - I had to restart the host.
>
> So perhaps this one is actually a deadlock rather than just slow?
>
> Cheers,
>
> Richard.
>
>
> # ========
> # captured on: Sun Aug 26 10:08:28 2012
> # os release : 3.6.0-rc3-elastic
> # perf version : 3.5.2
> # arch : x86_64
> # nrcpus online : 16
> # nrcpus avail : 16
> # cpudesc : AMD Opteron(tm) Processor 6128
> # cpuid : AuthenticAMD,16,9,1
> # total memory : 131971760 kB
> # cmdline : /home/root/bin/perf record -g -a
> # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 }
> # HEADER_CPU_TOPOLOGY info available, use -I to display
> # HEADER_NUMA_TOPOLOGY info available, use -I to display
> # ========
> #
> # Samples: 2M of event 'cycles'
> # Event count (approx.): 1040676441385
> #
> # Overhead Command Shared Object Symbol
> # ........ ............... .................... ..............................................
> #
> 90.01% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> |
> --- _raw_spin_lock_irqsave
> |
> |--99.99%-- isolate_migratepages_range
> | compact_zone
> | compact_zone_order
> | try_to_compact_pages
> | __alloc_pages_direct_compact
> | __alloc_pages_nodemask
> | alloc_pages_vma
> | do_huge_pmd_anonymous_page
> | handle_mm_fault
> | __get_user_pages
> | get_user_page_nowait
> | hva_to_pfn.isra.33
> | __gfn_to_pfn
> | gfn_to_pfn_async
> | try_async_pf
> | tdp_page_fault
> | kvm_mmu_page_fault
> | pf_interception
> | handle_exit
> | kvm_arch_vcpu_ioctl_run
> | kvm_vcpu_ioctl
> | do_vfs_ioctl
> | sys_ioctl
> | system_call_fastpath
> | ioctl
> | |
> | |--54.91%-- 0x10100000002
> | |
> | --45.09%-- 0x10100000006
> --0.01%-- [...]
> 4.66% qemu-kvm [kernel.kallsyms] [k] sub_preempt_count
> |
> --- sub_preempt_count
> |
> |--99.77%-- _raw_spin_unlock_irqrestore
> | |
> | |--99.99%-- compact_checklock_irqsave
> | | isolate_migratepages_range
> | | compact_zone
> | | compact_zone_order
> | | try_to_compact_pages
> | | __alloc_pages_direct_compact
> | | __alloc_pages_nodemask
> | | alloc_pages_vma
> | | do_huge_pmd_anonymous_page
> | | handle_mm_fault
> | | __get_user_pages
> | | get_user_page_nowait
> | | hva_to_pfn.isra.33
> | | __gfn_to_pfn
> | | gfn_to_pfn_async
> | | try_async_pf
> | | tdp_page_fault
> | | kvm_mmu_page_fault
> | | pf_interception
> | | handle_exit
> | | kvm_arch_vcpu_ioctl_run
> | | kvm_vcpu_ioctl
> | | do_vfs_ioctl
> | | sys_ioctl
> | | system_call_fastpath
> | | ioctl
> | | |
> | | |--51.94%-- 0x10100000002
> | | |
> | | --48.06%-- 0x10100000006
> | --0.01%-- [...]
> --0.23%-- [...]
> 1.23% ksmd [kernel.kallsyms] [k] memcmp
> |
> --- memcmp
> |
> |--99.83%-- memcmp_pages
> | |
> | |--78.46%-- ksm_scan_thread
> | | kthread
> | | kernel_thread_helper
> | |
> | --21.54%-- try_to_merge_with_ksm_page
> | ksm_scan_thread
> | kthread
> | kernel_thread_helper
> --0.17%-- [...]
> 0.91% ksmd [kernel.kallsyms] [k] smp_call_function_many
> |
> --- smp_call_function_many
> |
> |--99.98%-- native_flush_tlb_others
> | |
> | |--99.86%-- flush_tlb_page
> | | ptep_clear_flush
> | | try_to_merge_with_ksm_page
> | | ksm_scan_thread
> | | kthread
> | | kernel_thread_helper
> | --0.14%-- [...]
> --0.02%-- [...]
> 0.34% qemu-kvm [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
> |
> --- _raw_spin_unlock_irqrestore
> |
> |--96.08%-- compact_checklock_irqsave
> | isolate_migratepages_range
> | compact_zone
> | compact_zone_order
> | try_to_compact_pages
> | __alloc_pages_direct_compact
> | __alloc_pages_nodemask
> | alloc_pages_vma
> | do_huge_pmd_anonymous_page
> | handle_mm_fault
> | __get_user_pages
> | get_user_page_nowait
> | hva_to_pfn.isra.33
> | __gfn_to_pfn
> | gfn_to_pfn_async
> | try_async_pf
> | tdp_page_fault
> | kvm_mmu_page_fault
> | pf_interception
> | handle_exit
> | kvm_arch_vcpu_ioctl_run
> | kvm_vcpu_ioctl
> | do_vfs_ioctl
> | sys_ioctl
> | system_call_fastpath
> | ioctl
> | |
> | |--65.19%-- 0x10100000006
> | |
> | --34.81%-- 0x10100000002
> |
> |--2.68%-- isolate_migratepages_range
> | compact_zone
> | compact_zone_order
> | try_to_compact_pages
> | __alloc_pages_direct_compact
> | __alloc_pages_nodemask
> | alloc_pages_vma
> | do_huge_pmd_anonymous_page
> | handle_mm_fault
> | __get_user_pages
> | get_user_page_nowait
> | hva_to_pfn.isra.33
> | __gfn_to_pfn
> | gfn_to_pfn_async
> | try_async_pf
> | tdp_page_fault
> | kvm_mmu_page_fault
> | pf_interception
> | handle_exit
> | kvm_arch_vcpu_ioctl_run
> | kvm_vcpu_ioctl
> | do_vfs_ioctl
> | sys_ioctl
> | system_call_fastpath
> | ioctl
> | |
> | |--52.08%-- 0x10100000002
> | |
> | --47.92%-- 0x10100000006
> |
> |--0.56%-- ntp_tick_length
> | do_timer
> | tick_do_update_jiffies64
> | tick_sched_timer
> | __run_hrtimer
> | hrtimer_interrupt
> | smp_apic_timer_interrupt
> | apic_timer_interrupt
> | compact_checklock_irqsave
> | isolate_migratepages_range
> | compact_zone
> | compact_zone_order
> | try_to_compact_pages
> | __alloc_pages_direct_compact
> | __alloc_pages_nodemask
> | alloc_pages_vma
> | do_huge_pmd_anonymous_page
> | handle_mm_fault
> | __get_user_pages
> | get_user_page_nowait
> | hva_to_pfn.isra.33
> | __gfn_to_pfn
> | gfn_to_pfn_async
> | try_async_pf
> | tdp_page_fault
> | kvm_mmu_page_fault
> | pf_interception
> | handle_exit
> | kvm_arch_vcpu_ioctl_run
> | kvm_vcpu_ioctl
> | do_vfs_ioctl
> | sys_ioctl
> | system_call_fastpath
> | ioctl
> | 0x10100000002
> --0.68%-- [...]
> 0.30% swapper [kernel.kallsyms] [k] default_idle
> |
> --- default_idle
> |
> |--99.95%-- cpu_idle
> | start_secondary
> --0.05%-- [...]
> 0.15% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range
> |
> --- isolate_migratepages_range
> |
> |--97.41%-- compact_zone
> | compact_zone_order
> | try_to_compact_pages
> | __alloc_pages_direct_compact
> | __alloc_pages_nodemask
> | alloc_pages_vma
> | do_huge_pmd_anonymous_page
> | handle_mm_fault
> | __get_user_pages
> | get_user_page_nowait
> | hva_to_pfn.isra.33
> | __gfn_to_pfn
> | gfn_to_pfn_async
> | try_async_pf
> | tdp_page_fault
> | kvm_mmu_page_fault
> | pf_interception
> | handle_exit
> | kvm_arch_vcpu_ioctl_run
> | kvm_vcpu_ioctl
> | do_vfs_ioctl
> | sys_ioctl
> | system_call_fastpath
> | ioctl
> | |
> | |--54.02%-- 0x10100000002
> | |
> | --45.98%-- 0x10100000006
> |
> --2.59%-- compact_zone_order
> try_to_compact_pages
> __alloc_pages_direct_compact
> __alloc_pages_nodemask
> alloc_pages_vma
> do_huge_pmd_anonymous_page
> handle_mm_fault
> __get_user_pages
> get_user_page_nowait
> hva_to_pfn.isra.33
> __gfn_to_pfn
> gfn_to_pfn_async
> try_async_pf
> tdp_page_fault
> kvm_mmu_page_fault
> pf_interception
> handle_exit
> kvm_arch_vcpu_ioctl_run
> kvm_vcpu_ioctl
> do_vfs_ioctl
> sys_ioctl
> system_call_fastpath
> ioctl
> |
> |--56.10%-- 0x10100000002
> |
> --43.90%-- 0x10100000006
> 0.12% qemu-kvm [kernel.kallsyms] [k] compact_zone
> |
> --- compact_zone
> compact_zone_order
> try_to_compact_pages
> __alloc_pages_direct_compact
> __alloc_pages_nodemask
> alloc_pages_vma
> do_huge_pmd_anonymous_page
> handle_mm_fault
> __get_user_pages
> get_user_page_nowait
> hva_to_pfn.isra.33
> __gfn_to_pfn
> gfn_to_pfn_async
> try_async_pf
> tdp_page_fault
> kvm_mmu_page_fault
> pf_interception
> handle_exit
> kvm_arch_vcpu_ioctl_run
> kvm_vcpu_ioctl
> do_vfs_ioctl
> sys_ioctl
> system_call_fastpath
> ioctl
> |
> |--52.09%-- 0x10100000002
> |
> --47.91%-- 0x10100000006
> 0.11% qemu-kvm [kernel.kallsyms] [k] flush_tlb_func
> |
> --- flush_tlb_func
> |
> |--99.58%-- generic_smp_call_function_interrupt
> | smp_call_function_interrupt
> | call_function_interrupt
> | |
> | |--94.65%-- compact_checklock_irqsave
> | | isolate_migratepages_range
> | | compact_zone
> | | compact_zone_order
> | | try_to_compact_pages
> | | __alloc_pages_direct_compact
> | | __alloc_pages_nodemask
> | | alloc_pages_vma
> | | do_huge_pmd_anonymous_page
> | | handle_mm_fault
> | | __get_user_pages
> | | get_user_page_nowait
> | | hva_to_pfn.isra.33
> | | __gfn_to_pfn
> | | gfn_to_pfn_async
> | | try_async_pf
> | | tdp_page_fault
> | | kvm_mmu_page_fault
> | | pf_interception
> | | handle_exit
> | | kvm_arch_vcpu_ioctl_run
> | | kvm_vcpu_ioctl
> | | do_vfs_ioctl
> | | sys_ioctl
> | | system_call_fastpath
> | | ioctl
> | | |
> | | |--78.04%-- 0x10100000006
> | | |
> | | --21.96%-- 0x10100000002
> | |
> | |--4.67%-- sub_preempt_count
> | | _raw_spin_unlock_irqrestore
> | | compact_checklock_irqsave
> | | isolate_migratepages_range
> | | compact_zone
> | | compact_zone_order
> | | try_to_compact_pages
> | | __alloc_pages_direct_compact
> | | __alloc_pages_nodemask
> | | alloc_pages_vma
> | | do_huge_pmd_anonymous_page
> | | handle_mm_fault
> | | __get_user_pages
> | | get_user_page_nowait
> | | hva_to_pfn.isra.33
> | | __gfn_to_pfn
> | | gfn_to_pfn_async
> | | try_async_pf
> | | tdp_page_fault
> | | kvm_mmu_page_fault
> | | pf_interception
> | | handle_exit
> | | kvm_arch_vcpu_ioctl_run
> | | kvm_vcpu_ioctl
> | | do_vfs_ioctl
> | | sys_ioctl
> | | system_call_fastpath
> | | ioctl
> | | |
> | | |--78.18%-- 0x10100000006
> | | |
> | | --21.82%-- 0x10100000002
> | --0.68%-- [...]
> --0.42%-- [...]
> 0.09% qemu-kvm [kernel.kallsyms] [k] mod_zone_page_state
> |
> --- mod_zone_page_state
> |
> |--80.84%-- isolate_migratepages_range
> | compact_zone
> | compact_zone_order
> | try_to_compact_pages
> | __alloc_pages_direct_compact
> | __alloc_pages_nodemask
> | alloc_pages_vma
> | do_huge_pmd_anonymous_page
> | handle_mm_fault
> | __get_user_pages
> | get_user_page_nowait
> | hva_to_pfn.isra.33
> | __gfn_to_pfn
> | gfn_to_pfn_async
> | try_async_pf
> | tdp_page_fault
> | kvm_mmu_page_fault
> | pf_interception
> | handle_exit
> | kvm_arch_vcpu_ioctl_run
> | kvm_vcpu_ioctl
> | do_vfs_ioctl
> | sys_ioctl
> | system_call_fastpath
> | ioctl
> | |
> | |--53.90%-- 0x10100000002
> | |
> | --46.10%-- 0x10100000006
> |
> --19.16%-- compact_zone
> compact_zone_order
> try_to_compact_pages
> __alloc_pages_direct_compact
> __alloc_pages_nodemask
> alloc_pages_vma
> do_huge_pmd_anonymous_page
> handle_mm_fault
> __get_user_pages
> get_user_page_nowait
> hva_to_pfn.isra.33
> __gfn_to_pfn
> gfn_to_pfn_async
> try_async_pf
> tdp_page_fault
> kvm_mmu_page_fault
> pf_interception
> handle_exit
> kvm_arch_vcpu_ioctl_run
> kvm_vcpu_ioctl
> do_vfs_ioctl
> sys_ioctl
> system_call_fastpath
> ioctl
> |
> |--55.04%-- 0x10100000002
> |
> --44.96%-- 0x10100000006
> 0.09% qemu-kvm [kernel.kallsyms] [k] migrate_pages
> |
> --- migrate_pages
> |
> |--96.21%-- compact_zone
> | compact_zone_order
> | try_to_compact_pages
> | __alloc_pages_direct_compact
> | __alloc_pages_nodemask
> | alloc_pages_vma
> | do_huge_pmd_anonymous_page
> | handle_mm_fault
> | __get_user_pages
> | get_user_page_nowait
> | hva_to_pfn.isra.33
> | __gfn_to_pfn
> | gfn_to_pfn_async
> | try_async_pf
> | tdp_page_fault
> | kvm_mmu_page_fault
> | pf_interception
> | handle_exit
> | kvm_arch_vcpu_ioctl_run
> | kvm_vcpu_ioctl
> | do_vfs_ioctl
> | sys_ioctl
> | system_call_fastpath
> | ioctl
> | |
> | |--52.94%-- 0x10100000002
> | |
> | --47.06%-- 0x10100000006
> |
> --3.79%-- compact_zone_order
> try_to_compact_pages
> __alloc_pages_direct_compact
> __alloc_pages_nodemask
> alloc_pages_vma
> do_huge_pmd_anonymous_page
> handle_mm_fault
> __get_user_pages
> get_user_page_nowait
> hva_to_pfn.isra.33
> __gfn_to_pfn
> gfn_to_pfn_async
> try_async_pf
> tdp_page_fault
> kvm_mmu_page_fault
> pf_interception
> handle_exit
> kvm_arch_vcpu_ioctl_run
> kvm_vcpu_ioctl
> do_vfs_ioctl
> sys_ioctl
> system_call_fastpath
> ioctl
> |
> |--50.72%-- 0x10100000002
> |
> --49.28%-- 0x10100000006
> 0.09% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok
> |
> --- __zone_watermark_ok
> |
> |--95.81%-- zone_watermark_ok
> | compact_zone
> | compact_zone_order
> | try_to_compact_pages
> | __alloc_pages_direct_compact
> | __alloc_pages_nodemask
> | alloc_pages_vma
> | do_huge_pmd_anonymous_page
> | handle_mm_fault
> | __get_user_pages
> | get_user_page_nowait
> | hva_to_pfn.isra.33
> | __gfn_to_pfn
> | gfn_to_pfn_async
> | try_async_pf
> | tdp_page_fault
> | kvm_mmu_page_fault
> | pf_interception
> | handle_exit
> | kvm_arch_vcpu_ioctl_run
> | kvm_vcpu_ioctl
> | do_vfs_ioctl
> | sys_ioctl
> | system_call_fastpath
> | ioctl
> | |
> | |--51.21%-- 0x10100000002
> | |
> | --48.79%-- 0x10100000006
> |
> --4.19%-- compact_zone
> compact_zone_order
> try_to_compact_pages
> __alloc_pages_direct_compact
> __alloc_pages_nodemask
> alloc_pages_vma
> do_huge_pmd_anonymous_page
> handle_mm_fault
> __get_user_pages
> get_user_page_nowait
> hva_to_pfn.isra.33
> __gfn_to_pfn
> gfn_to_pfn_async
> try_async_pf
> tdp_page_fault
> kvm_mmu_page_fault
> pf_interception
> handle_exit
> kvm_arch_vcpu_ioctl_run
> kvm_vcpu_ioctl
> do_vfs_ioctl
> sys_ioctl
> system_call_fastpath
> ioctl
> |
> |--50.00%-- 0x10100000006
> |
> --50.00%-- 0x10100000002
> 0.06% perf [kernel.kallsyms] [k] copy_user_generic_string
> |
> --- copy_user_generic_string
> generic_file_buffered_write
> __generic_file_aio_write
> generic_file_aio_write
> ext4_file_write
> do_sync_write
> vfs_write
> sys_write
> system_call_fastpath
> write
> run_builtin
> main
> __libc_start_main
* Re: [Qemu-devel] Windows slow boot: contractor wanted
@ 2012-09-06 9:20 ` Richard Davies
0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-06 9:20 UTC (permalink / raw)
To: Rik van Riel; +Cc: Avi Kivity, kvm, qemu-devel
Hi Rik,
Are there any more tests which I can usefully do for you?
I notice that 3.6.0-rc4 is out - are there changes from rc3 which are worth
me retesting?
Cheers,
Richard.
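[Editorial aside, not part of the original mail: every hot call chain in the quoted traces runs through do_huge_pmd_anonymous_page, i.e. transparent hugepage allocation driving memory compaction. Not a fix, but a stopgap worth checking on an affected host is whether turning off THP defragmentation avoids the contention. These are the standard sysfs knobs, assuming THP is compiled into the kernel:]

```shell
# Inspect the current THP settings on the host
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

# Stopgap: stop THP allocations from triggering direct compaction
# on page fault (khugepaged background collapsing has its own knob)
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```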
Richard Davies wrote:
> Rik van Riel wrote:
> > Can you get a backtrace to that _raw_spin_lock_irqsave, to see
> > from where it is running into lock contention?
> >
> > It would be good to know whether it is isolate_freepages_block,
> > yield_to, kvm_vcpu_on_spin or something else...
>
> Hi Rik,
>
> I got into a slow boot situation on 3.6.0-rc3, ran "perf record -g -a" for a
> while, then ran perf report with the output below.
>
> This trace looks more like the second perf top trace that I sent on Saturday
> (there were two in my email and they were different from each other as well
> as different from on 3.5.2).
>
> The symptoms were a bit different too - the VM boots appeared to be
> completely locked up rather than just slow, and I couldn't quit qemu-kvm at
> the monitor - I had to restart the host.
>
> So perhaps this one is actually a deadlock rather than just slow?
>
> Cheers,
>
> Richard.
>
>
> # ========
> # captured on: Sun Aug 26 10:08:28 2012
> # os release : 3.6.0-rc3-elastic
> # perf version : 3.5.2
> # arch : x86_64
> # nrcpus online : 16
> # nrcpus avail : 16
> # cpudesc : AMD Opteron(tm) Processor 6128
> # cpuid : AuthenticAMD,16,9,1
> # total memory : 131971760 kB
> # cmdline : /home/root/bin/perf record -g -a
> # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 }
> # HEADER_CPU_TOPOLOGY info available, use -I to display
> # HEADER_NUMA_TOPOLOGY info available, use -I to display
> # ========
> #
> # Samples: 2M of event 'cycles'
> # Event count (approx.): 1040676441385
> #
> # Overhead Command Shared Object Symbol
> # ........ ............... .................... ..............................................
> #
> 90.01% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> |
> --- _raw_spin_lock_irqsave
> |
> |--99.99%-- isolate_migratepages_range
> | compact_zone
> | compact_zone_order
> | try_to_compact_pages
> | __alloc_pages_direct_compact
> | __alloc_pages_nodemask
> | alloc_pages_vma
> | do_huge_pmd_anonymous_page
> | handle_mm_fault
> | __get_user_pages
> | get_user_page_nowait
> | hva_to_pfn.isra.33
> | __gfn_to_pfn
> | gfn_to_pfn_async
> | try_async_pf
> | tdp_page_fault
> | kvm_mmu_page_fault
> | pf_interception
> | handle_exit
> | kvm_arch_vcpu_ioctl_run
> | kvm_vcpu_ioctl
> | do_vfs_ioctl
> | sys_ioctl
> | system_call_fastpath
> | ioctl
> | |
> | |--54.91%-- 0x10100000002
> | |
> | --45.09%-- 0x10100000006
> --0.01%-- [...]
> 4.66% qemu-kvm [kernel.kallsyms] [k] sub_preempt_count
> |
> --- sub_preempt_count
> |
> |--99.77%-- _raw_spin_unlock_irqrestore
> | |
> | |--99.99%-- compact_checklock_irqsave
> | | isolate_migratepages_range
> | | compact_zone
> | | compact_zone_order
> | | try_to_compact_pages
> | | __alloc_pages_direct_compact
> | | __alloc_pages_nodemask
> | | alloc_pages_vma
> | | do_huge_pmd_anonymous_page
> | | handle_mm_fault
> | | __get_user_pages
> | | get_user_page_nowait
> | | hva_to_pfn.isra.33
> | | __gfn_to_pfn
> | | gfn_to_pfn_async
> | | try_async_pf
> | | tdp_page_fault
> | | kvm_mmu_page_fault
> | | pf_interception
> | | handle_exit
> | | kvm_arch_vcpu_ioctl_run
> | | kvm_vcpu_ioctl
> | | do_vfs_ioctl
> | | sys_ioctl
> | | system_call_fastpath
> | | ioctl
> | | |
> | | |--51.94%-- 0x10100000002
> | | |
> | | --48.06%-- 0x10100000006
> | --0.01%-- [...]
> --0.23%-- [...]
> 1.23% ksmd [kernel.kallsyms] [k] memcmp
> |
> --- memcmp
> |
> |--99.83%-- memcmp_pages
> | |
> | |--78.46%-- ksm_scan_thread
> | | kthread
> | | kernel_thread_helper
> | |
> | --21.54%-- try_to_merge_with_ksm_page
> | ksm_scan_thread
> | kthread
> | kernel_thread_helper
> --0.17%-- [...]
> 0.91% ksmd [kernel.kallsyms] [k] smp_call_function_many
> |
> --- smp_call_function_many
> |
> |--99.98%-- native_flush_tlb_others
> | |
> | |--99.86%-- flush_tlb_page
> | | ptep_clear_flush
> | | try_to_merge_with_ksm_page
> | | ksm_scan_thread
> | | kthread
> | | kernel_thread_helper
> | --0.14%-- [...]
> --0.02%-- [...]
> [... remainder of quoted perf report trimmed - identical to the trace quoted above ...]
* Re: Windows VM slow boot
2012-09-06 9:20 ` [Qemu-devel] " Richard Davies
@ 2012-09-12 10:56 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-12 10:56 UTC (permalink / raw)
To: Rik van Riel; +Cc: Avi Kivity, qemu-devel, kvm, linux-mm
[ adding linux-mm - previously at http://marc.info/?t=134511509400003 ]
Hi Rik,
Since qemu-kvm 1.2.0 and Linux 3.6.0-rc5 came out, I thought that I would
retest with these.
The typical symptom now appears to be that the Windows VMs boot reasonably
fast, but there is then high CPU use and load for many minutes afterwards -
high CPU use both in the qemu-kvm processes themselves and in %sys.
I attach a perf report which seems to show that the high CPU use is in the
memory manager.
Cheers,
Richard.
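[Editorial aside, not part of the original mail: reports like the one below are easier to compare across kernel versions after folding them into per-symbol totals. A minimal sketch of such a helper - not part of perf itself, and assuming the `perf report --stdio` line format shown here:]

```python
import re
from collections import defaultdict

# Matches top-level overhead lines such as:
#   89.14% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
# Call-chain continuation lines (starting with '|', '---' etc.) do not
# match and are skipped.
OVERHEAD = re.compile(r'^\s*(\d+\.\d+)%\s+(\S+)\s+\S+\s+\[[k.]\]\s+(\S+)')

def sum_overheads(report_text):
    """Fold perf-report stdio output into a (command, symbol) -> total % map."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        m = OVERHEAD.match(line)
        if m:
            pct, command, symbol = m.groups()
            totals[(command, symbol)] += float(pct)
    return dict(totals)
```

[Diffing the resulting dictionaries from two runs makes a jump like the _raw_spin_lock_irqsave one below stand out immediately.]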
# ========
# captured on: Wed Sep 12 10:25:43 2012
# os release : 3.6.0-rc5-elastic
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 870K of event 'cycles'
# Event count (approx.): 432968175910
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
#
89.14% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|
--- _raw_spin_lock_irqsave
|
|--95.47%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--55.64%-- 0x10100000002
| |
| --44.36%-- 0x10100000006
|
|--4.53%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--55.36%-- 0x10100000002
| |
| --44.64%-- 0x10100000006
--0.00%-- [...]
4.92% qemu-kvm [kernel.kallsyms] [k] migrate_pages
|
--- migrate_pages
|
|--99.74%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--55.80%-- 0x10100000002
| |
| --44.20%-- 0x10100000006
--0.26%-- [...]
1.59% ksmd [kernel.kallsyms] [k] memcmp
|
--- memcmp
|
|--99.69%-- memcmp_pages
| |
| |--78.86%-- ksm_scan_thread
| | kthread
| | kernel_thread_helper
| |
| --21.14%-- try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
--0.31%-- [...]
0.85% ksmd [kernel.kallsyms] [k] smp_call_function_many
|
--- smp_call_function_many
native_flush_tlb_others
|
|--99.81%-- flush_tlb_page
| ptep_clear_flush
| try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
--0.19%-- [...]
0.38% swapper [kernel.kallsyms] [k] default_idle
|
--- default_idle
|
|--99.80%-- cpu_idle
| |
| |--90.53%-- start_secondary
| |
| --9.47%-- rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
--0.20%-- [...]
0.38% qemu-kvm [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
|
--- _raw_spin_unlock_irqrestore
|
|--94.31%-- compact_checklock_irqsave
| isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--59.74%-- 0x10100000006
| |
| --40.26%-- 0x10100000002
|
|--3.41%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--53.57%-- 0x10100000006
| |
| --46.43%-- 0x10100000002
|
|--0.82%-- ntp_tick_length
| do_timer
| tick_do_update_jiffies64
| tick_sched_timer
| __run_hrtimer.isra.28
| hrtimer_interrupt
| smp_apic_timer_interrupt
| apic_timer_interrupt
| compact_checklock_irqsave
| isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| 0x10100000002
|
|--0.76%-- __page_cache_release.part.11
| __put_compound_page
| put_compound_page
| release_pages
| free_pages_and_swap_cache
| tlb_flush_mmu
| tlb_finish_mmu
| exit_mmap
| mmput
| exit_mm
| do_exit
| do_group_exit
| get_signal_to_deliver
| do_signal
| do_notify_resume
| int_signal
--0.70%-- [...]
0.26% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range
|
--- isolate_migratepages_range
|
|--95.44%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--52.46%-- 0x10100000002
| |
| --47.54%-- 0x10100000006
|
--4.56%-- compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--53.84%-- 0x10100000006
|
--46.16%-- 0x10100000002
0.21% qemu-kvm [kernel.kallsyms] [k] compact_zone
|
--- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--53.46%-- 0x10100000002
|
--46.54%-- 0x10100000006
0.14% qemu-kvm [kernel.kallsyms] [k] mod_zone_page_state
|
--- mod_zone_page_state
|
|--70.21%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--55.97%-- 0x10100000002
| |
| --44.03%-- 0x10100000006
|
|--29.71%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--61.19%-- 0x10100000002
| |
| --38.81%-- 0x10100000006
--0.08%-- [...]
0.13% qemu-kvm [kernel.kallsyms] [k] flush_tlb_func
|
--- flush_tlb_func
|
|--99.47%-- generic_smp_call_function_interrupt
| smp_call_function_interrupt
| call_function_interrupt
| |
| |--91.76%-- compact_checklock_irqsave
| | isolate_migratepages_range
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--76.39%-- 0x10100000006
| | |
| | --23.61%-- 0x10100000002
| |
| |--7.61%-- compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--70.59%-- 0x10100000006
| | |
| | --29.41%-- 0x10100000002
| --0.63%-- [...]
|
--0.53%-- smp_call_function_interrupt
call_function_interrupt
|
|--83.32%-- compact_checklock_irqsave
| isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--79.99%-- 0x10100000006
| |
| --20.01%-- 0x10100000002
|
--16.68%-- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
0x10100000002
0.09% qemu-kvm [kernel.kallsyms] [k] free_pages_prepare
|
--- free_pages_prepare
|
|--99.75%-- __free_pages_ok
| |
| |--99.84%-- free_compound_page
| | __put_compound_page
| | put_compound_page
| | release_pages
| | free_pages_and_swap_cache
| | tlb_flush_mmu
| | tlb_finish_mmu
| | exit_mmap
| | mmput
| | exit_mm
| | do_exit
| | do_group_exit
| | get_signal_to_deliver
| | do_signal
| | do_notify_resume
| | int_signal
| --0.16%-- [...]
--0.25%-- [...]
0.08% :2585 [kernel.kallsyms] [k] free_pages_prepare
|
--- free_pages_prepare
|
|--99.47%-- __free_pages_ok
| free_compound_page
| __put_compound_page
| put_compound_page
| release_pages
| free_pages_and_swap_cache
| tlb_flush_mmu
| tlb_finish_mmu
| exit_mmap
| mmput
| exit_mm
| do_exit
| do_group_exit
| get_signal_to_deliver
| do_signal
| do_notify_resume
| int_signal
|
--0.53%-- free_hot_cold_page
__free_pages
|
|--50.65%-- zap_huge_pmd
| unmap_single_vma
| unmap_vmas
| exit_mmap
| mmput
| exit_mm
| do_exit
| do_group_exit
| get_signal_to_deliver
| do_signal
| do_notify_resume
| int_signal
|
--49.35%-- __vunmap
vfree
kvm_free_physmem_slot
kvm_free_physmem
kvm_put_kvm
kvm_vcpu_release
__fput
____fput
task_work_run
do_exit
do_group_exit
get_signal_to_deliver
do_signal
do_notify_resume
int_signal
0.07% :2561 [kernel.kallsyms] [k] free_pages_prepare
|
--- free_pages_prepare
|
|--99.55%-- __free_pages_ok
| free_compound_page
| __put_compound_page
| put_compound_page
| release_pages
| free_pages_and_swap_cache
| tlb_flush_mmu
| tlb_finish_mmu
| exit_mmap
| mmput
| exit_mm
| do_exit
| do_group_exit
| get_signal_to_deliver
| do_signal
| do_notify_resume
| int_signal
--0.45%-- [...]
0.07% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok
|
--- __zone_watermark_ok
|
|--56.52%-- zone_watermark_ok
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--59.67%-- 0x10100000002
| |
| --40.33%-- 0x10100000006
|
--43.48%-- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--58.50%-- 0x10100000002
|
--41.50%-- 0x10100000006
0.06% perf [kernel.kallsyms] [k] copy_user_generic_string
|
--- copy_user_generic_string
|
|--99.82%-- generic_file_buffered_write
| __generic_file_aio_write
| generic_file_aio_write
| ext4_file_write
| do_sync_write
| vfs_write
| sys_write
| system_call_fastpath
| write
| run_builtin
| main
| __libc_start_main
--0.18%-- [...]
0.05% qemu-kvm [kernel.kallsyms] [k] compact_checklock_irqsave
|
--- compact_checklock_irqsave
|
|--82.09%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--54.69%-- 0x10100000002
| |
| --45.31%-- 0x10100000006
|
--17.91%-- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--59.49%-- 0x10100000002
|
--40.51%-- 0x10100000006
0.04% qemu-kvm [kernel.kallsyms] [k] call_function_interrupt
|
--- call_function_interrupt
|
|--91.95%-- compact_checklock_irqsave
| isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--72.81%-- 0x10100000006
| |
| --27.19%-- 0x10100000002
|
|--7.50%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--55.56%-- 0x10100000006
| |
| --44.44%-- 0x10100000002
--0.56%-- [...]
0.04% ksmd [kernel.kallsyms] [k] default_send_IPI_mask_sequence_phys
|
--- default_send_IPI_mask_sequence_phys
|
|--99.44%-- physflat_send_IPI_mask
| native_send_call_func_ipi
| smp_call_function_many
| native_flush_tlb_others
| flush_tlb_page
| ptep_clear_flush
| try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
|
--0.56%-- native_send_call_func_ipi
smp_call_function_many
native_flush_tlb_others
flush_tlb_page
ptep_clear_flush
try_to_merge_with_ksm_page
ksm_scan_thread
kthread
kernel_thread_helper
0.03% qemu-kvm [kernel.kallsyms] [k] generic_smp_call_function_interrupt
|
--- generic_smp_call_function_interrupt
|
|--96.97%-- smp_call_function_interrupt
| call_function_interrupt
| |
| |--97.39%-- compact_checklock_irqsave
| | isolate_migratepages_range
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--78.65%-- 0x10100000006
| | |
| | --21.35%-- 0x10100000002
| |
| |--2.43%-- compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--57.14%-- 0x10100000002
| | |
| | --42.86%-- 0x10100000006
| --0.19%-- [...]
|
--3.03%-- call_function_interrupt
|
|--77.79%-- compact_checklock_irqsave
| isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--71.42%-- 0x10100000006
| |
| --28.58%-- 0x10100000002
|
--22.21%-- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
^ permalink raw reply [flat|nested] 101+ messages in thread
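For readers skimming the reports above, the key signal is in the per-symbol overhead lines: almost all cycles sit under the page-compaction path reached from THP faults. A small, hypothetical helper (not part of the original mails; the sample data is abbreviated from the report above) that extracts and ranks those overhead lines from `perf report` text:

```python
import re

def top_symbols(report: str, n: int = 3):
    """Return the n highest-overhead kernel symbols from perf report text.

    Matches lines of the form:
        <overhead>% <command> <dso> [k] <symbol>
    and ignores the indented call-chain lines beneath them.
    """
    rows = []
    for line in report.splitlines():
        m = re.match(r"\s*([\d.]+)%\s+(\S+)\s+(\S+)\s+\[k\]\s+(\S+)", line)
        if m:
            rows.append((float(m.group(1)), m.group(4)))
    return sorted(rows, reverse=True)[:n]

# Abbreviated from the perf report in this thread:
sample = """\
89.14% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
 4.92% qemu-kvm [kernel.kallsyms] [k] migrate_pages
 1.59% ksmd     [kernel.kallsyms] [k] memcmp
"""

print(top_symbols(sample))
# -> [(89.14, '_raw_spin_lock_irqsave'), (4.92, 'migrate_pages'), (1.59, 'memcmp')]
```

With the full report as input, the same ranking shows the zone-lock spin under `isolate_migratepages_range` dwarfing everything else, which is what points the discussion at memory compaction rather than at KVM itself.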
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--55.80%-- 0x10100000002
| |
| --44.20%-- 0x10100000006
--0.26%-- [...]
1.59% ksmd [kernel.kallsyms] [k] memcmp
|
--- memcmp
|
|--99.69%-- memcmp_pages
| |
| |--78.86%-- ksm_scan_thread
| | kthread
| | kernel_thread_helper
| |
| --21.14%-- try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
--0.31%-- [...]
0.85% ksmd [kernel.kallsyms] [k] smp_call_function_many
|
--- smp_call_function_many
native_flush_tlb_others
|
|--99.81%-- flush_tlb_page
| ptep_clear_flush
| try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
--0.19%-- [...]
0.38% swapper [kernel.kallsyms] [k] default_idle
|
--- default_idle
|
|--99.80%-- cpu_idle
| |
| |--90.53%-- start_secondary
| |
| --9.47%-- rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
--0.20%-- [...]
0.38% qemu-kvm [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
|
--- _raw_spin_unlock_irqrestore
|
|--94.31%-- compact_checklock_irqsave
| isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--59.74%-- 0x10100000006
| |
| --40.26%-- 0x10100000002
|
|--3.41%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--53.57%-- 0x10100000006
| |
| --46.43%-- 0x10100000002
|
|--0.82%-- ntp_tick_length
| do_timer
| tick_do_update_jiffies64
| tick_sched_timer
| __run_hrtimer.isra.28
| hrtimer_interrupt
| smp_apic_timer_interrupt
| apic_timer_interrupt
| compact_checklock_irqsave
| isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| 0x10100000002
|
|--0.76%-- __page_cache_release.part.11
| __put_compound_page
| put_compound_page
| release_pages
| free_pages_and_swap_cache
| tlb_flush_mmu
| tlb_finish_mmu
| exit_mmap
| mmput
| exit_mm
| do_exit
| do_group_exit
| get_signal_to_deliver
| do_signal
| do_notify_resume
| int_signal
--0.70%-- [...]
0.26% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range
|
--- isolate_migratepages_range
|
|--95.44%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--52.46%-- 0x10100000002
| |
| --47.54%-- 0x10100000006
|
--4.56%-- compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--53.84%-- 0x10100000006
|
--46.16%-- 0x10100000002
0.21% qemu-kvm [kernel.kallsyms] [k] compact_zone
|
--- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--53.46%-- 0x10100000002
|
--46.54%-- 0x10100000006
0.14% qemu-kvm [kernel.kallsyms] [k] mod_zone_page_state
|
--- mod_zone_page_state
|
|--70.21%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--55.97%-- 0x10100000002
| |
| --44.03%-- 0x10100000006
|
|--29.71%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--61.19%-- 0x10100000002
| |
| --38.81%-- 0x10100000006
--0.08%-- [...]
0.13% qemu-kvm [kernel.kallsyms] [k] flush_tlb_func
|
--- flush_tlb_func
|
|--99.47%-- generic_smp_call_function_interrupt
| smp_call_function_interrupt
| call_function_interrupt
| |
| |--91.76%-- compact_checklock_irqsave
| | isolate_migratepages_range
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--76.39%-- 0x10100000006
| | |
| | --23.61%-- 0x10100000002
| |
| |--7.61%-- compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--70.59%-- 0x10100000006
| | |
| | --29.41%-- 0x10100000002
| --0.63%-- [...]
|
--0.53%-- smp_call_function_interrupt
call_function_interrupt
|
|--83.32%-- compact_checklock_irqsave
| isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--79.99%-- 0x10100000006
| |
| --20.01%-- 0x10100000002
|
--16.68%-- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
0x10100000002
0.09% qemu-kvm [kernel.kallsyms] [k] free_pages_prepare
|
--- free_pages_prepare
|
|--99.75%-- __free_pages_ok
| |
| |--99.84%-- free_compound_page
| | __put_compound_page
| | put_compound_page
| | release_pages
| | free_pages_and_swap_cache
| | tlb_flush_mmu
| | tlb_finish_mmu
| | exit_mmap
| | mmput
| | exit_mm
| | do_exit
| | do_group_exit
| | get_signal_to_deliver
| | do_signal
| | do_notify_resume
| | int_signal
| --0.16%-- [...]
--0.25%-- [...]
0.08% :2585 [kernel.kallsyms] [k] free_pages_prepare
|
--- free_pages_prepare
|
|--99.47%-- __free_pages_ok
| free_compound_page
| __put_compound_page
| put_compound_page
| release_pages
| free_pages_and_swap_cache
| tlb_flush_mmu
| tlb_finish_mmu
| exit_mmap
| mmput
| exit_mm
| do_exit
| do_group_exit
| get_signal_to_deliver
| do_signal
| do_notify_resume
| int_signal
|
--0.53%-- free_hot_cold_page
__free_pages
|
|--50.65%-- zap_huge_pmd
| unmap_single_vma
| unmap_vmas
| exit_mmap
| mmput
| exit_mm
| do_exit
| do_group_exit
| get_signal_to_deliver
| do_signal
| do_notify_resume
| int_signal
|
--49.35%-- __vunmap
vfree
kvm_free_physmem_slot
kvm_free_physmem
kvm_put_kvm
kvm_vcpu_release
__fput
____fput
task_work_run
do_exit
do_group_exit
get_signal_to_deliver
do_signal
do_notify_resume
int_signal
0.07% :2561 [kernel.kallsyms] [k] free_pages_prepare
|
--- free_pages_prepare
|
|--99.55%-- __free_pages_ok
| free_compound_page
| __put_compound_page
| put_compound_page
| release_pages
| free_pages_and_swap_cache
| tlb_flush_mmu
| tlb_finish_mmu
| exit_mmap
| mmput
| exit_mm
| do_exit
| do_group_exit
| get_signal_to_deliver
| do_signal
| do_notify_resume
| int_signal
--0.45%-- [...]
0.07% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok
|
--- __zone_watermark_ok
|
|--56.52%-- zone_watermark_ok
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--59.67%-- 0x10100000002
| |
| --40.33%-- 0x10100000006
|
--43.48%-- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--58.50%-- 0x10100000002
|
--41.50%-- 0x10100000006
0.06% perf [kernel.kallsyms] [k] copy_user_generic_string
|
--- copy_user_generic_string
|
|--99.82%-- generic_file_buffered_write
| __generic_file_aio_write
| generic_file_aio_write
| ext4_file_write
| do_sync_write
| vfs_write
| sys_write
| system_call_fastpath
| write
| run_builtin
| main
| __libc_start_main
--0.18%-- [...]
0.05% qemu-kvm [kernel.kallsyms] [k] compact_checklock_irqsave
|
--- compact_checklock_irqsave
|
|--82.09%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--54.69%-- 0x10100000002
| |
| --45.31%-- 0x10100000006
|
--17.91%-- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--59.49%-- 0x10100000002
|
--40.51%-- 0x10100000006
0.04% qemu-kvm [kernel.kallsyms] [k] call_function_interrupt
|
--- call_function_interrupt
|
|--91.95%-- compact_checklock_irqsave
| isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--72.81%-- 0x10100000006
| |
| --27.19%-- 0x10100000002
|
|--7.50%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--55.56%-- 0x10100000006
| |
| --44.44%-- 0x10100000002
--0.56%-- [...]
0.04% ksmd [kernel.kallsyms] [k] default_send_IPI_mask_sequence_phys
|
--- default_send_IPI_mask_sequence_phys
|
|--99.44%-- physflat_send_IPI_mask
| native_send_call_func_ipi
| smp_call_function_many
| native_flush_tlb_others
| flush_tlb_page
| ptep_clear_flush
| try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
|
--0.56%-- native_send_call_func_ipi
smp_call_function_many
native_flush_tlb_others
flush_tlb_page
ptep_clear_flush
try_to_merge_with_ksm_page
ksm_scan_thread
kthread
kernel_thread_helper
0.03% qemu-kvm [kernel.kallsyms] [k] generic_smp_call_function_interrupt
|
--- generic_smp_call_function_interrupt
|
|--96.97%-- smp_call_function_interrupt
| call_function_interrupt
| |
| |--97.39%-- compact_checklock_irqsave
| | isolate_migratepages_range
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--78.65%-- 0x10100000006
| | |
| | --21.35%-- 0x10100000002
| |
| |--2.43%-- compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--57.14%-- 0x10100000002
| | |
| | --42.86%-- 0x10100000006
| --0.19%-- [...]
|
--3.03%-- call_function_interrupt
|
|--77.79%-- compact_checklock_irqsave
| isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--71.42%-- 0x10100000006
| |
| --28.58%-- 0x10100000002
|
--22.21%-- compact_zone
compact_zone_order
try_to_compact_pages
__alloc_pages_direct_compact
__alloc_pages_nodemask
alloc_pages_vma
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows VM slow boot
2012-09-12 10:56 ` Richard Davies
@ 2012-09-12 12:25 ` Mel Gorman
-1 siblings, 0 replies; 101+ messages in thread
From: Mel Gorman @ 2012-09-12 12:25 UTC (permalink / raw)
To: Richard Davies
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
On Wed, Sep 12, 2012 at 11:56:59AM +0100, Richard Davies wrote:
> [ adding linux-mm - previously at http://marc.info/?t=134511509400003 ]
>
> Hi Rik,
>
I'm not Rik but hi anyway.
> Since qemu-kvm 1.2.0 and Linux 3.6.0-rc5 came out, I thought that I would
> retest with these.
>
Ok. 3.6.0-rc5 contains [c67fe375: mm: compaction: Abort async compaction
if locks are contended or taking too long], which should have mitigated
some of the lock contention problem, but not all of it, as we'll see later.
> The typical symptom now appears to be that the Windows VMs boot reasonably
> fast,
I see that this is an old-ish bug but I did not read the full history.
Is it now booting faster than 3.5.0 was? I'm asking because I'm
interested to see if commit c67fe375 helped your particular case.
> but then there is high CPU use and load for many minutes afterwards -
> the high CPU use is both for the qemu-kvm processes themselves and also for
> % sys.
>
Ok, I cannot comment on the userspace portion of things, but the kernel
portion still indicates that a high percentage of time is spent on what
appears to be lock contention.
> I attach a perf report which seems to show that the high CPU use is in the
> memory manager.
>
A follow-on from commit c67fe375 was the following patch (author cc'd)
which addresses lock contention in isolate_migratepages_range where your
perf report indicates that we're spending 95% of the time. Would you be
willing to test it please?
---8<---
From: Shaohua Li <shli@kernel.org>
Subject: mm: compaction: check lock contention first before taking lock
isolate_migratepages_range takes zone->lru_lock first and then checks
whether the lock is contended; if so, it releases the lock. This isn't
efficient. If the lock is truly contended, a lock/unlock pair only
increases the contention. We'd better check whether the lock is contended
first. compact_trylock_irqsave meets this requirement exactly.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/compaction.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff -puN mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock mm/compaction.c
--- a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
+++ a/mm/compaction.c
@@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *
/* Time to isolate some pages for migration */
cond_resched();
- spin_lock_irqsave(&zone->lru_lock, flags);
- locked = true;
+ locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
+ if (!locked)
+ return 0;
for (; low_pfn < end_pfn; low_pfn++) {
struct page *page;
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows VM slow boot
2012-09-12 12:25 ` Mel Gorman
@ 2012-09-12 16:46 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-12 16:46 UTC (permalink / raw)
To: Mel Gorman
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
Hi Mel - thanks for replying to my underhand bcc!
Mel Gorman wrote:
> I see that this is an old-ish bug but I did not read the full history.
> Is it now booting faster than 3.5.0 was? I'm asking because I'm
> interested to see if commit c67fe375 helped your particular case.
Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be
improved, as discussed.
> A follow-on from commit c67fe375 was the following patch (author cc'd)
> which addresses lock contention in isolate_migratepages_range where your
> perf report indicates that we're spending 95% of the time. Would you be
> willing to test it please?
>
> ---8<---
> From: Shaohua Li <shli@kernel.org>
> Subject: mm: compaction: check lock contention first before taking lock
>
> isolate_migratepages_range takes zone->lru_lock first and then checks
> whether the lock is contended; if so, it releases the lock. This isn't
> efficient. If the lock is truly contended, a lock/unlock pair only
> increases the contention. We'd better check whether the lock is contended
> first. compact_trylock_irqsave meets this requirement exactly.
>
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> Acked-by: Mel Gorman <mgorman@suse.de>
> Acked-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> mm/compaction.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff -puN mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock mm/compaction.c
> --- a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
> +++ a/mm/compaction.c
> @@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *
>
> /* Time to isolate some pages for migration */
> cond_resched();
> - spin_lock_irqsave(&zone->lru_lock, flags);
> - locked = true;
> + locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
> + if (!locked)
> + return 0;
> for (; low_pfn < end_pfn; low_pfn++) {
> struct page *page;
I have applied and tested again - perf results below.
isolate_migratepages_range is indeed much reduced.
There is now a lot of time in isolate_freepages_block and still quite a lot
of lock contention, although in a different place.
# ========
# captured on: Wed Sep 12 16:00:52 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 1M of event 'cycles'
# Event count (approx.): 560365005583
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
#
43.95% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
|
--- isolate_freepages_block
|
|--99.99%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--95.17%-- 0x10100000006
| |
| --4.83%-- 0x10100000002
--0.01%-- [...]
15.98% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|
--- _raw_spin_lock_irqsave
|
|--97.18%-- compact_checklock_irqsave
| |
| |--98.61%-- compaction_alloc
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--94.94%-- 0x10100000006
| | |
| | --5.06%-- 0x10100000002
| |
| --1.39%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--95.04%-- 0x10100000006
| |
| --4.96%-- 0x10100000002
|
|--1.94%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--95.19%-- 0x10100000006
| |
| --4.81%-- 0x10100000002
--0.88%-- [...]
5.73% ksmd [kernel.kallsyms] [k] memcmp
|
--- memcmp
|
|--99.79%-- memcmp_pages
| |
| |--81.64%-- ksm_scan_thread
| | kthread
| | kernel_thread_helper
| |
| --18.36%-- try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
--0.21%-- [...]
5.52% swapper [kernel.kallsyms] [k] default_idle
|
--- default_idle
|
|--99.51%-- cpu_idle
| |
| |--86.19%-- start_secondary
| |
| --13.81%-- rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
--0.49%-- [...]
2.90% qemu-kvm [kernel.kallsyms] [k] yield_to
|
--- yield_to
|
|--99.70%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--96.09%-- 0x10100000006
| |
| --3.91%-- 0x10100000002
--0.30%-- [...]
1.86% qemu-kvm [kernel.kallsyms] [k] clear_page_c
|
--- clear_page_c
|
|--99.15%-- do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--96.03%-- 0x10100000006
| |
| --3.97%-- 0x10100000002
|
--0.85%-- __alloc_pages_nodemask
|
|--78.22%-- alloc_pages_vma
| handle_pte_fault
| |
| |--99.76%-- handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--91.60%-- 0x10100000006
| | |
| | --8.40%-- 0x10100000002
| --0.24%-- [...]
|
--21.78%-- alloc_pages_current
pte_alloc_one
|
|--97.40%-- do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--93.12%-- 0x10100000006
| |
| --6.88%-- 0x10100000002
|
--2.60%-- __pte_alloc
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
0x10100000006
1.83% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group
|
--- get_pageblock_flags_group
|
|--51.38%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--95.32%-- 0x10100000006
| |
| --4.68%-- 0x10100000002
|
|--43.05%-- suitable_migration_target
| compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--95.52%-- 0x10100000006
| |
| --4.48%-- 0x10100000002
|
|--3.62%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--96.78%-- 0x10100000006
| |
| --3.22%-- 0x10100000002
|
|--1.20%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--96.33%-- 0x10100000006
| |
| --3.67%-- 0x10100000002
|
|--0.61%-- free_hot_cold_page
| |
| |--77.99%-- free_hot_cold_page_list
| | |
| | |--95.93%-- release_pages
| | | pagevec_lru_move_fn
| | | __pagevec_lru_add
| | | |
| | | |--98.44%-- __lru_cache_add
| | | | lru_cache_add_lru
| | | | putback_lru_page
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--96.77%-- 0x10100000006
| | | | |
| | | | --3.23%-- 0x10100000002
| | | |
| | | --1.56%-- lru_add_drain_cpu
| | | lru_add_drain
| | | migrate_prep_local
| | | compact_zone
| | | compact_zone_order
| | | try_to_compact_pages
| | | __alloc_pages_direct_compact
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | 0x10100000006
| | |
| | --4.07%-- shrink_page_list
| | shrink_inactive_list
| | shrink_lruvec
| | try_to_free_pages
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | 0x10100000006
| |
| |--19.40%-- __free_pages
| | |
| | |--85.71%-- release_freepages
| | | compact_zone
| | | compact_zone_order
| | | try_to_compact_pages
| | | __alloc_pages_direct_compact
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--90.47%-- 0x10100000006
| | | |
| | | --9.53%-- 0x10100000002
| | |
| | |--10.21%-- do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | 0x10100000006
| | |
| | --4.08%-- __free_slab
| | discard_slab
| | __slab_free
| | kmem_cache_free
| | free_buffer_head
| | try_to_free_buffers
| | jbd2_journal_try_to_free_buffers
| | bdev_try_to_free_page
| | blkdev_releasepage
| | try_to_release_page
| | move_to_new_page
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | 0x10100000006
| |
| --2.61%-- __put_single_page
| put_page
| |
| |--91.27%-- putback_lru_page
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | 0x10100000006
| |
| --8.73%-- skb_free_head.part.34
| skb_release_data
| __kfree_skb
| tcp_recvmsg
| inet_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| recv
| 0x0
--0.14%-- [...]
1.54% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
|
--- svm_vcpu_run
|
|--99.52%-- kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--94.70%-- 0x10100000006
| |
| --5.30%-- 0x10100000002
--0.48%-- [...]
1.30% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin
|
--- kvm_vcpu_on_spin
|
|--99.45%-- pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--96.06%-- 0x10100000006
| |
| --3.94%-- 0x10100000002
|
--0.55%-- handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--97.59%-- 0x10100000006
|
--2.41%-- 0x10100000002
1.00% qemu-kvm qemu-kvm [.] 0x0000000000254bc2
|
|--1.63%-- 0x4eec20
| |
| |--47.60%-- 0x2274280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--26.98%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --25.42%-- 0x16b5280
| 0x0
| 0xa0
| 0x696368752d62
|
|--1.63%-- 0x4eec6e
| |
| |--52.41%-- 0x2274280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--38.99%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --8.60%-- 0x309c280
| 0x0
| 0xa0
| 0x696368752d62
|
|--1.44%-- 0x5b4cb4
| 0x0
| |
| --100.00%-- 0x822ee8fff96873e9
|
|--1.32%-- 0x503457
| 0x0
|
|--1.30%-- 0x65a186
| 0x0
|
|--1.22%-- 0x541422
| 0x0
|
|--1.08%-- 0x568f04
| |
| |--93.81%-- 0x0
| |
| |--6.01%-- 0x10100000006
| --0.19%-- [...]
|
|--1.06%-- 0x56a08e
| |
| |--55.97%-- 0x2fa1410
| | 0x0
| |
| |--24.12%-- 0x2179410
| | 0x0
| |
| --19.92%-- 0x15ba410
| 0x0
|
|--1.05%-- 0x4eeeac
| |
| |--66.23%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--19.06%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --14.71%-- 0x2274280
| 0x0
| 0xa0
| 0x696368752d62
|
|--1.01%-- 0x6578d7
| |
| --100.00%-- 0x0
|
|--0.96%-- 0x52fb44
| |
| |--91.88%-- 0x0
| |
| --8.12%-- 0x10100000006
|
|--0.95%-- 0x65a102
|
|--0.94%-- 0x541aac
| 0x0
|
|--0.93%-- 0x525261
| 0x0
| |
| --100.00%-- 0x822ee8fff96873e9
|
|--0.89%-- 0x540e24
|
|--0.88%-- 0x477a32
| 0x0
|
|--0.87%-- 0x4eee03
| |
| |--47.23%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--32.15%-- 0x2274280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --20.62%-- 0x16b5280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.84%-- 0x530421
| |
| --100.00%-- 0x0
|
|--0.83%-- 0x4eeb52
|
|--0.82%-- 0x40a6a9
|
|--0.79%-- 0x672601
| 0x1
|
|--0.78%-- 0x564e00
| |
| --100.00%-- 0x0
|
|--0.78%-- 0x568e38
| |
| |--95.83%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--2.15%-- 0x10100000006
| |
| --2.02%-- 0x16b5280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.74%-- 0x56e704
| |
| |--47.84%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--38.61%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--10.72%-- 0x2274280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --2.83%-- 0x10100000006
|
|--0.73%-- 0x5308c3
|
|--0.72%-- 0x654b22
| 0x0
|
|--0.71%-- 0x530094
|
|--0.71%-- 0x564e04
| |
| |--87.21%-- 0x0
| |
| |--12.59%-- 0x46b47b
| | 0xdffebc0000a88169
| --0.20%-- [...]
|
|--0.71%-- 0x568e5f
| |
| |--98.58%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --1.42%-- 0x16b5280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.70%-- 0x4ef092
|
|--0.70%-- 0x52fac2
| |
| |--99.12%-- 0x0
| |
| --0.88%-- 0x10100000006
|
|--0.68%-- 0x541ac1
|
|--0.66%-- 0x4eec22
| |
| |--44.90%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--30.11%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --25.00%-- 0x2274280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.65%-- 0x5afab4
| |
| |--48.10%-- 0x2179410
| | 0x0
| |
| |--41.94%-- 0x15ba410
| | 0x0
| |
| |--5.05%-- 0x0
| | |
| | |--39.43%-- 0x3099550
| | | 0x5699c0
| | | 0x24448948004b4154
| | |
| | |--35.76%-- 0x23c0e90
| | | 0x5699c0
| | | 0x24448948004b4154
| | |
| | --24.81%-- 0x16b2130
| | 0x5699c0
| | 0x24448948004b4154
| |
| |--4.00%-- 0x2fa1410
| | 0x0
| |
| --0.92%-- 0x6
|
|--0.63%-- 0x65a3f6
| 0x1
|
|--0.63%-- 0x659d12
| 0x0
|
|--0.62%-- 0x530764
| 0x0
|
|--0.62%-- 0x46e803
| 0x46b47b
| |
| |--72.15%-- 0xdffebc0000a88169
| |
| |--16.88%-- 0xdffebec000a08169
| |
| --10.97%-- 0xdffeb1d000a88169
|
|--0.61%-- 0x4eeba0
| |
| |--45.41%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--36.19%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --18.40%-- 0x2274280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.60%-- 0x659d61
|
|--0.60%-- 0x4ff496
|
|--0.59%-- 0x5030db
|
|--0.58%-- 0x477822
|
^ permalink raw reply [flat|nested] 101+ messages in thread
|
|--1.06%-- 0x56a08e
| |
| |--55.97%-- 0x2fa1410
| | 0x0
| |
| |--24.12%-- 0x2179410
| | 0x0
| |
| --19.92%-- 0x15ba410
| 0x0
|
|--1.05%-- 0x4eeeac
| |
| |--66.23%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--19.06%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --14.71%-- 0x2274280
| 0x0
| 0xa0
| 0x696368752d62
|
|--1.01%-- 0x6578d7
| |
| --100.00%-- 0x0
|
|--0.96%-- 0x52fb44
| |
| |--91.88%-- 0x0
| |
| --8.12%-- 0x10100000006
|
|--0.95%-- 0x65a102
|
|--0.94%-- 0x541aac
| 0x0
|
|--0.93%-- 0x525261
| 0x0
| |
| --100.00%-- 0x822ee8fff96873e9
|
|--0.89%-- 0x540e24
|
|--0.88%-- 0x477a32
| 0x0
|
|--0.87%-- 0x4eee03
| |
| |--47.23%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--32.15%-- 0x2274280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --20.62%-- 0x16b5280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.84%-- 0x530421
| |
| --100.00%-- 0x0
|
|--0.83%-- 0x4eeb52
|
|--0.82%-- 0x40a6a9
|
|--0.79%-- 0x672601
| 0x1
|
|--0.78%-- 0x564e00
| |
| --100.00%-- 0x0
|
|--0.78%-- 0x568e38
| |
| |--95.83%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--2.15%-- 0x10100000006
| |
| --2.02%-- 0x16b5280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.74%-- 0x56e704
| |
| |--47.84%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--38.61%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--10.72%-- 0x2274280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --2.83%-- 0x10100000006
|
|--0.73%-- 0x5308c3
|
|--0.72%-- 0x654b22
| 0x0
|
|--0.71%-- 0x530094
|
|--0.71%-- 0x564e04
| |
| |--87.21%-- 0x0
| |
| |--12.59%-- 0x46b47b
| | 0xdffebc0000a88169
| --0.20%-- [...]
|
|--0.71%-- 0x568e5f
| |
| |--98.58%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --1.42%-- 0x16b5280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.70%-- 0x4ef092
|
|--0.70%-- 0x52fac2
| |
| |--99.12%-- 0x0
| |
| --0.88%-- 0x10100000006
|
|--0.68%-- 0x541ac1
|
|--0.66%-- 0x4eec22
| |
| |--44.90%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--30.11%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --25.00%-- 0x2274280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.65%-- 0x5afab4
| |
| |--48.10%-- 0x2179410
| | 0x0
| |
| |--41.94%-- 0x15ba410
| | 0x0
| |
| |--5.05%-- 0x0
| | |
| | |--39.43%-- 0x3099550
| | | 0x5699c0
| | | 0x24448948004b4154
| | |
| | |--35.76%-- 0x23c0e90
| | | 0x5699c0
| | | 0x24448948004b4154
| | |
| | --24.81%-- 0x16b2130
| | 0x5699c0
| | 0x24448948004b4154
| |
| |--4.00%-- 0x2fa1410
| | 0x0
| |
| --0.92%-- 0x6
|
|--0.63%-- 0x65a3f6
| 0x1
|
|--0.63%-- 0x659d12
| 0x0
|
|--0.62%-- 0x530764
| 0x0
|
|--0.62%-- 0x46e803
| 0x46b47b
| |
| |--72.15%-- 0xdffebc0000a88169
| |
| |--16.88%-- 0xdffebec000a08169
| |
| --10.97%-- 0xdffeb1d000a88169
|
|--0.61%-- 0x4eeba0
| |
| |--45.41%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--36.19%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --18.40%-- 0x2274280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.60%-- 0x659d61
|
|--0.60%-- 0x4ff496
|
|--0.59%-- 0x5030db
|
|--0.58%-- 0x477822
|
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: [Qemu-devel] Windows VM slow boot
@ 2012-09-12 16:46 ` Richard Davies
0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-12 16:46 UTC (permalink / raw)
To: Mel Gorman; +Cc: kvm, qemu-devel, linux-mm, Avi Kivity, Shaohua Li
Hi Mel - thanks for replying to my underhand bcc!
Mel Gorman wrote:
> I see that this is an old-ish bug but I did not read the full history.
> Is it now booting faster than 3.5.0 was? I'm asking because I'm
> interested to see if commit c67fe375 helped your particular case.
Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be
improved, as discussed.
> A follow-on from commit c67fe375 was the following patch (author cc'd)
> which addresses lock contention in isolate_migratepages_range where your
> perf report indicates that we're spending 95% of the time. Would you be
> willing to test it please?
>
> ---8<---
> From: Shaohua Li <shli@kernel.org>
> Subject: mm: compaction: check lock contention first before taking lock
>
> isolate_migratepages_range() takes zone->lru_lock first and then checks
> whether the lock is contended; if so, it releases the lock. This isn't
> efficient: if the lock is truly contended, a lock/unlock pair only
> increases the lock contention. We'd better check whether the lock is
> contended before taking it. compact_trylock_irqsave() perfectly meets the
> requirement.
>
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> Acked-by: Mel Gorman <mgorman@suse.de>
> Acked-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> mm/compaction.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff -puN mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock mm/compaction.c
> --- a/mm/compaction.c~mm-compaction-check-lock-contention-first-before-taking-lock
> +++ a/mm/compaction.c
> @@ -349,8 +349,9 @@ isolate_migratepages_range(struct zone *
>
> /* Time to isolate some pages for migration */
> cond_resched();
> - spin_lock_irqsave(&zone->lru_lock, flags);
> - locked = true;
> + locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
> + if (!locked)
> + return 0;
> for (; low_pfn < end_pfn; low_pfn++) {
> struct page *page;
I have applied and tested again - perf results below.
isolate_migratepages_range is indeed much reduced.
There is now a lot of time in isolate_freepages_block and still quite a lot
of lock contention, although in a different place.
# ========
# captured on: Wed Sep 12 16:00:52 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 1M of event 'cycles'
# Event count (approx.): 560365005583
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
#
43.95% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
|
--- isolate_freepages_block
|
|--99.99%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--95.17%-- 0x10100000006
| |
| --4.83%-- 0x10100000002
--0.01%-- [...]
15.98% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|
--- _raw_spin_lock_irqsave
|
|--97.18%-- compact_checklock_irqsave
| |
| |--98.61%-- compaction_alloc
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--94.94%-- 0x10100000006
| | |
| | --5.06%-- 0x10100000002
| |
| --1.39%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--95.04%-- 0x10100000006
| |
| --4.96%-- 0x10100000002
|
|--1.94%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--95.19%-- 0x10100000006
| |
| --4.81%-- 0x10100000002
--0.88%-- [...]
5.73% ksmd [kernel.kallsyms] [k] memcmp
|
--- memcmp
|
|--99.79%-- memcmp_pages
| |
| |--81.64%-- ksm_scan_thread
| | kthread
| | kernel_thread_helper
| |
| --18.36%-- try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
--0.21%-- [...]
5.52% swapper [kernel.kallsyms] [k] default_idle
|
--- default_idle
|
|--99.51%-- cpu_idle
| |
| |--86.19%-- start_secondary
| |
| --13.81%-- rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
--0.49%-- [...]
2.90% qemu-kvm [kernel.kallsyms] [k] yield_to
|
--- yield_to
|
|--99.70%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--96.09%-- 0x10100000006
| |
| --3.91%-- 0x10100000002
--0.30%-- [...]
1.86% qemu-kvm [kernel.kallsyms] [k] clear_page_c
|
--- clear_page_c
|
|--99.15%-- do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--96.03%-- 0x10100000006
| |
| --3.97%-- 0x10100000002
|
--0.85%-- __alloc_pages_nodemask
|
|--78.22%-- alloc_pages_vma
| handle_pte_fault
| |
| |--99.76%-- handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--91.60%-- 0x10100000006
| | |
| | --8.40%-- 0x10100000002
| --0.24%-- [...]
|
--21.78%-- alloc_pages_current
pte_alloc_one
|
|--97.40%-- do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--93.12%-- 0x10100000006
| |
| --6.88%-- 0x10100000002
|
--2.60%-- __pte_alloc
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
0x10100000006
1.83% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group
|
--- get_pageblock_flags_group
|
|--51.38%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--95.32%-- 0x10100000006
| |
| --4.68%-- 0x10100000002
|
|--43.05%-- suitable_migration_target
| compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--95.52%-- 0x10100000006
| |
| --4.48%-- 0x10100000002
|
|--3.62%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--96.78%-- 0x10100000006
| |
| --3.22%-- 0x10100000002
|
|--1.20%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--96.33%-- 0x10100000006
| |
| --3.67%-- 0x10100000002
|
|--0.61%-- free_hot_cold_page
| |
| |--77.99%-- free_hot_cold_page_list
| | |
| | |--95.93%-- release_pages
| | | pagevec_lru_move_fn
| | | __pagevec_lru_add
| | | |
| | | |--98.44%-- __lru_cache_add
| | | | lru_cache_add_lru
| | | | putback_lru_page
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--96.77%-- 0x10100000006
| | | | |
| | | | --3.23%-- 0x10100000002
| | | |
| | | --1.56%-- lru_add_drain_cpu
| | | lru_add_drain
| | | migrate_prep_local
| | | compact_zone
| | | compact_zone_order
| | | try_to_compact_pages
| | | __alloc_pages_direct_compact
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | 0x10100000006
| | |
| | --4.07%-- shrink_page_list
| | shrink_inactive_list
| | shrink_lruvec
| | try_to_free_pages
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | 0x10100000006
| |
| |--19.40%-- __free_pages
| | |
| | |--85.71%-- release_freepages
| | | compact_zone
| | | compact_zone_order
| | | try_to_compact_pages
| | | __alloc_pages_direct_compact
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--90.47%-- 0x10100000006
| | | |
| | | --9.53%-- 0x10100000002
| | |
| | |--10.21%-- do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | 0x10100000006
| | |
| | --4.08%-- __free_slab
| | discard_slab
| | __slab_free
| | kmem_cache_free
| | free_buffer_head
| | try_to_free_buffers
| | jbd2_journal_try_to_free_buffers
| | bdev_try_to_free_page
| | blkdev_releasepage
| | try_to_release_page
| | move_to_new_page
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | 0x10100000006
| |
| --2.61%-- __put_single_page
| put_page
| |
| |--91.27%-- putback_lru_page
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | 0x10100000006
| |
| --8.73%-- skb_free_head.part.34
| skb_release_data
| __kfree_skb
| tcp_recvmsg
| inet_recvmsg
| sock_recvmsg
| sys_recvfrom
| system_call_fastpath
| recv
| 0x0
--0.14%-- [...]
1.54% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
|
--- svm_vcpu_run
|
|--99.52%-- kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--94.70%-- 0x10100000006
| |
| --5.30%-- 0x10100000002
--0.48%-- [...]
1.30% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin
|
--- kvm_vcpu_on_spin
|
|--99.45%-- pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--96.06%-- 0x10100000006
| |
| --3.94%-- 0x10100000002
|
--0.55%-- handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--97.59%-- 0x10100000006
|
--2.41%-- 0x10100000002
1.00% qemu-kvm qemu-kvm [.] 0x0000000000254bc2
|
|--1.63%-- 0x4eec20
| |
| |--47.60%-- 0x2274280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--26.98%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --25.42%-- 0x16b5280
| 0x0
| 0xa0
| 0x696368752d62
|
|--1.63%-- 0x4eec6e
| |
| |--52.41%-- 0x2274280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--38.99%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --8.60%-- 0x309c280
| 0x0
| 0xa0
| 0x696368752d62
|
|--1.44%-- 0x5b4cb4
| 0x0
| |
| --100.00%-- 0x822ee8fff96873e9
|
|--1.32%-- 0x503457
| 0x0
|
|--1.30%-- 0x65a186
| 0x0
|
|--1.22%-- 0x541422
| 0x0
|
|--1.08%-- 0x568f04
| |
| |--93.81%-- 0x0
| |
| |--6.01%-- 0x10100000006
| --0.19%-- [...]
|
|--1.06%-- 0x56a08e
| |
| |--55.97%-- 0x2fa1410
| | 0x0
| |
| |--24.12%-- 0x2179410
| | 0x0
| |
| --19.92%-- 0x15ba410
| 0x0
|
|--1.05%-- 0x4eeeac
| |
| |--66.23%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--19.06%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --14.71%-- 0x2274280
| 0x0
| 0xa0
| 0x696368752d62
|
|--1.01%-- 0x6578d7
| |
| --100.00%-- 0x0
|
|--0.96%-- 0x52fb44
| |
| |--91.88%-- 0x0
| |
| --8.12%-- 0x10100000006
|
|--0.95%-- 0x65a102
|
|--0.94%-- 0x541aac
| 0x0
|
|--0.93%-- 0x525261
| 0x0
| |
| --100.00%-- 0x822ee8fff96873e9
|
|--0.89%-- 0x540e24
|
|--0.88%-- 0x477a32
| 0x0
|
|--0.87%-- 0x4eee03
| |
| |--47.23%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--32.15%-- 0x2274280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --20.62%-- 0x16b5280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.84%-- 0x530421
| |
| --100.00%-- 0x0
|
|--0.83%-- 0x4eeb52
|
|--0.82%-- 0x40a6a9
|
|--0.79%-- 0x672601
| 0x1
|
|--0.78%-- 0x564e00
| |
| --100.00%-- 0x0
|
|--0.78%-- 0x568e38
| |
| |--95.83%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--2.15%-- 0x10100000006
| |
| --2.02%-- 0x16b5280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.74%-- 0x56e704
| |
| |--47.84%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--38.61%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--10.72%-- 0x2274280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --2.83%-- 0x10100000006
|
|--0.73%-- 0x5308c3
|
|--0.72%-- 0x654b22
| 0x0
|
|--0.71%-- 0x530094
|
|--0.71%-- 0x564e04
| |
| |--87.21%-- 0x0
| |
| |--12.59%-- 0x46b47b
| | 0xdffebc0000a88169
| --0.20%-- [...]
|
|--0.71%-- 0x568e5f
| |
| |--98.58%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --1.42%-- 0x16b5280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.70%-- 0x4ef092
|
|--0.70%-- 0x52fac2
| |
| |--99.12%-- 0x0
| |
| --0.88%-- 0x10100000006
|
|--0.68%-- 0x541ac1
|
|--0.66%-- 0x4eec22
| |
| |--44.90%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--30.11%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --25.00%-- 0x2274280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.65%-- 0x5afab4
| |
| |--48.10%-- 0x2179410
| | 0x0
| |
| |--41.94%-- 0x15ba410
| | 0x0
| |
| |--5.05%-- 0x0
| | |
| | |--39.43%-- 0x3099550
| | | 0x5699c0
| | | 0x24448948004b4154
| | |
| | |--35.76%-- 0x23c0e90
| | | 0x5699c0
| | | 0x24448948004b4154
| | |
| | --24.81%-- 0x16b2130
| | 0x5699c0
| | 0x24448948004b4154
| |
| |--4.00%-- 0x2fa1410
| | 0x0
| |
| --0.92%-- 0x6
|
|--0.63%-- 0x65a3f6
| 0x1
|
|--0.63%-- 0x659d12
| 0x0
|
|--0.62%-- 0x530764
| 0x0
|
|--0.62%-- 0x46e803
| 0x46b47b
| |
| |--72.15%-- 0xdffebc0000a88169
| |
| |--16.88%-- 0xdffebec000a08169
| |
| --10.97%-- 0xdffeb1d000a88169
|
|--0.61%-- 0x4eeba0
| |
| |--45.41%-- 0x309c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--36.19%-- 0x16b5280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --18.40%-- 0x2274280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.60%-- 0x659d61
|
|--0.60%-- 0x4ff496
|
|--0.59%-- 0x5030db
|
|--0.58%-- 0x477822
|
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: Windows VM slow boot
2012-09-12 16:46 ` Richard Davies
@ 2012-09-13 9:50 ` Mel Gorman
-1 siblings, 0 replies; 101+ messages in thread
From: Mel Gorman @ 2012-09-13 9:50 UTC (permalink / raw)
To: Richard Davies
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
On Wed, Sep 12, 2012 at 05:46:15PM +0100, Richard Davies wrote:
> Hi Mel - thanks for replying to my underhand bcc!
>
> Mel Gorman wrote:
> > I see that this is an old-ish bug but I did not read the full history.
> > Is it now booting faster than 3.5.0 was? I'm asking because I'm
> > interested to see if commit c67fe375 helped your particular case.
>
> Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be
> improved, as discussed.
>
What are the boot times for each kernel?
> <PATCH SNIPPED>
>
> I have applied and tested again - perf results below.
>
> isolate_migratepages_range is indeed much reduced.
>
> There is now a lot of time in isolate_freepages_block and still quite a lot
> of lock contention, although in a different place.
>
This on top please.
---8<---
From: Shaohua Li <shli@fusionio.com>
compaction: abort compaction loop if lock is contended or run too long
isolate_migratepages_range() might isolate no pages at all, for example when
zone->lru_lock is contended and the compaction is async. In this case we should
abort compaction; otherwise compact_zone will run a useless loop and make
zone->lru_lock even more contended.
V2:
only abort the compaction if the lock is contended or it has run too long.
Code rearranged by Andrea Arcangeli.
[minchan@kernel.org: Putback pages isolated for migration if aborting]
[akpm@linux-foundation.org: Fixup one contended usage site]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/compaction.c | 17 ++++++++++++-----
mm/internal.h | 2 +-
2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 7fcd3a5..a8de20d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -70,8 +70,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
/* async aborts if taking too long or contended */
if (!cc->sync) {
- if (cc->contended)
- *cc->contended = true;
+ cc->contended = true;
return false;
}
@@ -634,7 +633,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
/* Perform the isolation */
low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
- if (!low_pfn)
+ if (!low_pfn || cc->contended)
return ISOLATE_ABORT;
cc->migrate_pfn = low_pfn;
@@ -787,6 +786,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
switch (isolate_migratepages(zone, cc)) {
case ISOLATE_ABORT:
ret = COMPACT_PARTIAL;
+ putback_lru_pages(&cc->migratepages);
+ cc->nr_migratepages = 0;
goto out;
case ISOLATE_NONE:
continue;
@@ -831,6 +832,7 @@ static unsigned long compact_zone_order(struct zone *zone,
int order, gfp_t gfp_mask,
bool sync, bool *contended)
{
+ unsigned long ret;
struct compact_control cc = {
.nr_freepages = 0,
.nr_migratepages = 0,
@@ -838,12 +840,17 @@ static unsigned long compact_zone_order(struct zone *zone,
.migratetype = allocflags_to_migratetype(gfp_mask),
.zone = zone,
.sync = sync,
- .contended = contended,
};
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
- return compact_zone(zone, &cc);
+ ret = compact_zone(zone, &cc);
+
+ VM_BUG_ON(!list_empty(&cc.freepages));
+ VM_BUG_ON(!list_empty(&cc.migratepages));
+
+ *contended = cc.contended;
+ return ret;
}
int sysctl_extfrag_threshold = 500;
diff --git a/mm/internal.h b/mm/internal.h
index b8c91b3..4bd7c0e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -130,7 +130,7 @@ struct compact_control {
int order; /* order a direct compactor needs */
int migratetype; /* MOVABLE, RECLAIMABLE etc */
struct zone *zone;
- bool *contended; /* True if a lock was contended */
+ bool contended; /* True if a lock was contended */
};
unsigned long
^ permalink raw reply related [flat|nested] 101+ messages in thread
* Re: [Qemu-devel] Windows VM slow boot
@ 2012-09-13 9:50 ` Mel Gorman
0 siblings, 0 replies; 101+ messages in thread
From: Mel Gorman @ 2012-09-13 9:50 UTC (permalink / raw)
To: Richard Davies; +Cc: kvm, qemu-devel, linux-mm, Avi Kivity, Shaohua Li
On Wed, Sep 12, 2012 at 05:46:15PM +0100, Richard Davies wrote:
> Hi Mel - thanks for replying to my underhand bcc!
>
> Mel Gorman wrote:
> > I see that this is an old-ish bug but I did not read the full history.
> > Is it now booting faster than 3.5.0 was? I'm asking because I'm
> > interested to see if commit c67fe375 helped your particular case.
>
> Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be
> improved, as discussed.
>
What are the boot times for each kernel?
> <PATCH SNIPPED>
>
> I have applied and tested again - perf results below.
>
> isolate_migratepages_range is indeed much reduced.
>
> There is now a lot of time in isolate_freepages_block and still quite a lot
> of lock contention, although in a different place.
>
This on top please.
---8<---
From: Shaohua Li <shli@fusionio.com>
compaction: abort compaction loop if lock is contended or run too long
isolate_migratepages_range() might isolate no pages, for example when
zone->lru_lock is contended and compaction is async. In this case we should
abort compaction; otherwise compact_zone will run a useless loop and make
zone->lru_lock even more contended.
V2:
only abort the compaction if the lock is contended or it has run too long
Rearranged the code by Andrea Arcangeli.
[minchan@kernel.org: Putback pages isolated for migration if aborting]
[akpm@linux-foundation.org: Fixup one contended usage site]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/compaction.c | 17 ++++++++++++-----
mm/internal.h | 2 +-
2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 7fcd3a5..a8de20d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -70,8 +70,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
/* async aborts if taking too long or contended */
if (!cc->sync) {
- if (cc->contended)
- *cc->contended = true;
+ cc->contended = true;
return false;
}
@@ -634,7 +633,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
/* Perform the isolation */
low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
- if (!low_pfn)
+ if (!low_pfn || cc->contended)
return ISOLATE_ABORT;
cc->migrate_pfn = low_pfn;
@@ -787,6 +786,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
switch (isolate_migratepages(zone, cc)) {
case ISOLATE_ABORT:
ret = COMPACT_PARTIAL;
+ putback_lru_pages(&cc->migratepages);
+ cc->nr_migratepages = 0;
goto out;
case ISOLATE_NONE:
continue;
@@ -831,6 +832,7 @@ static unsigned long compact_zone_order(struct zone *zone,
int order, gfp_t gfp_mask,
bool sync, bool *contended)
{
+ unsigned long ret;
struct compact_control cc = {
.nr_freepages = 0,
.nr_migratepages = 0,
@@ -838,12 +840,17 @@ static unsigned long compact_zone_order(struct zone *zone,
.migratetype = allocflags_to_migratetype(gfp_mask),
.zone = zone,
.sync = sync,
- .contended = contended,
};
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
- return compact_zone(zone, &cc);
+ ret = compact_zone(zone, &cc);
+
+ VM_BUG_ON(!list_empty(&cc.freepages));
+ VM_BUG_ON(!list_empty(&cc.migratepages));
+
+ *contended = cc.contended;
+ return ret;
}
int sysctl_extfrag_threshold = 500;
diff --git a/mm/internal.h b/mm/internal.h
index b8c91b3..4bd7c0e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -130,7 +130,7 @@ struct compact_control {
int order; /* order a direct compactor needs */
int migratetype; /* MOVABLE, RECLAIMABLE etc */
struct zone *zone;
- bool *contended; /* True if a lock was contended */
+ bool contended; /* True if a lock was contended */
};
unsigned long
^ permalink raw reply related [flat|nested] 101+ messages in thread
* [PATCH 1/2] Revert "mm: have order > 0 compaction start near a pageblock with free pages"
2012-09-12 16:46 ` Richard Davies
(?)
@ 2012-09-13 19:47 ` Rik van Riel
-1 siblings, 0 replies; 101+ messages in thread
From: Rik van Riel @ 2012-09-13 19:47 UTC (permalink / raw)
To: Richard Davies
Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
On Wed, 12 Sep 2012 17:46:15 +0100
Richard Davies <richard@arachsys.com> wrote:
> Mel Gorman wrote:
> > I see that this is an old-ish bug but I did not read the full history.
> > Is it now booting faster than 3.5.0 was? I'm asking because I'm
> > interested to see if commit c67fe375 helped your particular case.
>
> Yes, I think 3.6.0-rc5 is already better than 3.5.x but can still be
> improved, as discussed.
Re-reading Mel's commit de74f1cc3b1e9730d9b58580cd11361d30cd182d,
I believe it re-introduces the quadratic behaviour that the code
was suffering from before, by not moving zone->compact_cached_free_pfn
down when no more free pfns are found in a page block.
This mail reverts that changeset; the next introduces what I hope to
be the proper fix. Richard, would you be willing to give these patches
a try, since your system seems to reproduce this bug easily?
---8<---
Revert "mm: have order > 0 compaction start near a pageblock with free pages"
This reverts commit de74f1cc3b1e9730d9b58580cd11361d30cd182d.
Mel found a real issue with my "skip ahead" logic in the
compaction code, but unfortunately his approach appears to
have re-introduced quadratic behaviour in that the value
of zone->compact_cached_free_pfn is never advanced until
the compaction run wraps around the start of the zone.
This merely moved the starting point for the quadratic behaviour
further into the zone, but the behaviour has still been observed.
It looks like another fix is required.
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Richard Davies <richard@daviesmail.org>
diff --git a/mm/compaction.c b/mm/compaction.c
index 7fcd3a5..771775d 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -431,20 +431,6 @@ static bool suitable_migration_target(struct page *page)
}
/*
- * Returns the start pfn of the last page block in a zone. This is the starting
- * point for full compaction of a zone. Compaction searches for free pages from
- * the end of each zone, while isolate_freepages_block scans forward inside each
- * page block.
- */
-static unsigned long start_free_pfn(struct zone *zone)
-{
- unsigned long free_pfn;
- free_pfn = zone->zone_start_pfn + zone->spanned_pages;
- free_pfn &= ~(pageblock_nr_pages-1);
- return free_pfn;
-}
-
-/*
* Based on information in the current compact_control, find blocks
* suitable for isolating free pages from and then isolate them.
*/
@@ -483,6 +469,17 @@ static void isolate_freepages(struct zone *zone,
pfn -= pageblock_nr_pages) {
unsigned long isolated;
+ /*
+ * Skip ahead if another thread is compacting in the area
+ * simultaneously. If we wrapped around, we can only skip
+ * ahead if zone->compact_cached_free_pfn also wrapped to
+ * above our starting point.
+ */
+ if (cc->order > 0 && (!cc->wrapped ||
+ zone->compact_cached_free_pfn >
+ cc->start_free_pfn))
+ pfn = min(pfn, zone->compact_cached_free_pfn);
+
if (!pfn_valid(pfn))
continue;
@@ -533,15 +530,7 @@ static void isolate_freepages(struct zone *zone,
*/
if (isolated) {
high_pfn = max(high_pfn, pfn);
-
- /*
- * If the free scanner has wrapped, update
- * compact_cached_free_pfn to point to the highest
- * pageblock with free pages. This reduces excessive
- * scanning of full pageblocks near the end of the
- * zone
- */
- if (cc->order > 0 && cc->wrapped)
+ if (cc->order > 0)
zone->compact_cached_free_pfn = high_pfn;
}
}
@@ -551,11 +540,6 @@ static void isolate_freepages(struct zone *zone,
cc->free_pfn = high_pfn;
cc->nr_freepages = nr_freepages;
-
- /* If compact_cached_free_pfn is reset then set it now */
- if (cc->order > 0 && !cc->wrapped &&
- zone->compact_cached_free_pfn == start_free_pfn(zone))
- zone->compact_cached_free_pfn = high_pfn;
}
/*
@@ -642,6 +626,20 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
return ISOLATE_SUCCESS;
}
+/*
+ * Returns the start pfn of the last page block in a zone. This is the starting
+ * point for full compaction of a zone. Compaction searches for free pages from
+ * the end of each zone, while isolate_freepages_block scans forward inside each
+ * page block.
+ */
+static unsigned long start_free_pfn(struct zone *zone)
+{
+ unsigned long free_pfn;
+ free_pfn = zone->zone_start_pfn + zone->spanned_pages;
+ free_pfn &= ~(pageblock_nr_pages-1);
+ return free_pfn;
+}
+
static int compact_finished(struct zone *zone,
struct compact_control *cc)
{
^ permalink raw reply related [flat|nested] 101+ messages in thread
* [PATCH 2/2] make the compaction "skip ahead" logic robust
2012-09-12 16:46 ` Richard Davies
(?)
@ 2012-09-13 19:48 ` Rik van Riel
-1 siblings, 0 replies; 101+ messages in thread
From: Rik van Riel @ 2012-09-13 19:48 UTC (permalink / raw)
To: Richard Davies
Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
Make the "skip ahead" logic in compaction resistant to compaction
wrapping around to the end of the zone. This can lead to less
efficient compaction when one thread has wrapped around to the
end of the zone, and another simultaneous compactor has not done
so yet. However, it should ensure that we do not suffer quadratic
behaviour any more.
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Richard Davies <richard@daviesmail.org>
diff --git a/mm/compaction.c b/mm/compaction.c
index 771775d..0656759 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -431,6 +431,24 @@ static bool suitable_migration_target(struct page *page)
}
/*
+ * We scan the zone in a circular fashion, starting at
+ * zone->compact_cached_free_pfn. Be careful not to skip if
+ * one compacting thread has just wrapped back to the end of the
+ * zone, but another thread has not.
+ */
+static bool compaction_may_skip(struct zone *zone,
+ struct compact_control *cc)
+{
+ if (!cc->wrapped && zone->compact_free_pfn < cc->start_pfn)
+ return true;
+
+ if (cc->wrapped && zone_compact_free_pfn > cc->start_pfn)
+ return true;
+
+ return false;
+}
+
+/*
* Based on information in the current compact_control, find blocks
* suitable for isolating free pages from and then isolate them.
*/
@@ -471,13 +489,9 @@ static void isolate_freepages(struct zone *zone,
/*
* Skip ahead if another thread is compacting in the area
- * simultaneously. If we wrapped around, we can only skip
- * ahead if zone->compact_cached_free_pfn also wrapped to
- * above our starting point.
+ * simultaneously, and has finished with this page block.
*/
- if (cc->order > 0 && (!cc->wrapped ||
- zone->compact_cached_free_pfn >
- cc->start_free_pfn))
+ if (cc->order > 0 && compaction_may_skip(zone, cc))
pfn = min(pfn, zone->compact_cached_free_pfn);
if (!pfn_valid(pfn))
^ permalink raw reply related [flat|nested] 101+ messages in thread
* [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
2012-09-13 19:48 ` Rik van Riel
(?)
@ 2012-09-13 19:54 ` Rik van Riel
-1 siblings, 0 replies; 101+ messages in thread
From: Rik van Riel @ 2012-09-13 19:54 UTC (permalink / raw)
To: Richard Davies
Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
Argh. And of course I send out the version from _before_ the compile test,
instead of the one after! I am not used to caffeine any more and have had
way too much tea...
---8<---
Make the "skip ahead" logic in compaction resistant to compaction
wrapping around to the end of the zone. This can lead to less
efficient compaction when one thread has wrapped around to the
end of the zone, and another simultaneous compactor has not done
so yet. However, it should ensure that we do not suffer quadratic
behaviour any more.
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Richard Davies <richard@daviesmail.org>
diff --git a/mm/compaction.c b/mm/compaction.c
index 771775d..0656759 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -431,6 +431,24 @@ static bool suitable_migration_target(struct page *page)
}
/*
+ * We scan the zone in a circular fashion, starting at
+ * zone->compact_cached_free_pfn. Be careful not to skip if
+ * one compacting thread has just wrapped back to the end of the
+ * zone, but another thread has not.
+ */
+static bool compaction_may_skip(struct zone *zone,
+ struct compact_control *cc)
+{
+ if (!cc->wrapped && zone->compact_cached_free_pfn < cc->start_free_pfn)
+ return true;
+
+ if (cc->wrapped && zone->compact_cached_free_pfn > cc->start_free_pfn)
+ return true;
+
+ return false;
+}
+
+/*
* Based on information in the current compact_control, find blocks
* suitable for isolating free pages from and then isolate them.
*/
@@ -471,13 +489,9 @@ static void isolate_freepages(struct zone *zone,
/*
* Skip ahead if another thread is compacting in the area
- * simultaneously. If we wrapped around, we can only skip
- * ahead if zone->compact_cached_free_pfn also wrapped to
- * above our starting point.
+ * simultaneously, and has finished with this page block.
*/
- if (cc->order > 0 && (!cc->wrapped ||
- zone->compact_cached_free_pfn >
- cc->start_free_pfn))
+ if (cc->order > 0 && compaction_may_skip(zone, cc))
pfn = min(pfn, zone->compact_cached_free_pfn);
if (!pfn_valid(pfn))
^ permalink raw reply related [flat|nested] 101+ messages in thread
* [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
@ 2012-09-13 19:54 ` Rik van Riel
0 siblings, 0 replies; 101+ messages in thread
From: Rik van Riel @ 2012-09-13 19:54 UTC (permalink / raw)
To: Richard Davies
Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
Argh. And of course I send out the version from _before_ the compile test,
instead of the one after! I am not used to caffeine any more and have had
way too much tea...
---8<---
Make the "skip ahead" logic in compaction resistant to compaction
wrapping around to the end of the zone. This can lead to less
efficient compaction when one thread has wrapped around to the
end of the zone, and another simultaneous compactor has not done
so yet. However, it should ensure that we do not suffer quadratic
behaviour any more.
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Richard Davies <richard@daviesmail.org>
diff --git a/mm/compaction.c b/mm/compaction.c
index 771775d..0656759 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -431,6 +431,24 @@ static bool suitable_migration_target(struct page *page)
}
/*
+ * We scan the zone in a circular fashion, starting at
+ * zone->compact_cached_free_pfn. Be careful not to skip if
+ * one compacting thread has just wrapped back to the end of the
+ * zone, but another thread has not.
+ */
+static bool compaction_may_skip(struct zone *zone,
+ struct compact_control *cc)
+{
+ if (!cc->wrapped && zone->compact_cached_free_pfn < cc->start_free_pfn)
+ return true;
+
+ if (cc->wrapped && zone->compact_cached_free_pfn > cc->start_free_pfn)
+ return true;
+
+ return false;
+}
+
+/*
* Based on information in the current compact_control, find blocks
* suitable for isolating free pages from and then isolate them.
*/
@@ -471,13 +489,9 @@ static void isolate_freepages(struct zone *zone,
/*
* Skip ahead if another thread is compacting in the area
- * simultaneously. If we wrapped around, we can only skip
- * ahead if zone->compact_cached_free_pfn also wrapped to
- * above our starting point.
+ * simultaneously, and has finished with this page block.
*/
- if (cc->order > 0 && (!cc->wrapped ||
- zone->compact_cached_free_pfn >
- cc->start_free_pfn))
+ if (cc->order > 0 && compaction_may_skip(zone, cc))
pfn = min(pfn, zone->compact_cached_free_pfn);
if (!pfn_valid(pfn))
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [Qemu-devel] [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
@ 2012-09-13 19:54 ` Rik van Riel
0 siblings, 0 replies; 101+ messages in thread
From: Rik van Riel @ 2012-09-13 19:54 UTC (permalink / raw)
To: Richard Davies
Cc: kvm, qemu-devel, linux-mm, Mel Gorman, Shaohua Li, Avi Kivity
Argh. And of course I send out the version from _before_ the compile test,
instead of the one after! I am not used to caffeine any more and have had
way too much tea...
---8<---
Make the "skip ahead" logic in compaction resistant to compaction
wrapping around to the end of the zone. This can lead to less
efficient compaction when one thread has wrapped around to the
end of the zone, and another simultaneous compactor has not done
so yet. However, it should ensure that we do not suffer quadratic
behaviour any more.
Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Richard Davies <richard@daviesmail.org>
diff --git a/mm/compaction.c b/mm/compaction.c
index 771775d..0656759 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -431,6 +431,24 @@ static bool suitable_migration_target(struct page *page)
}
/*
+ * We scan the zone in a circular fashion, starting at
+ * zone->compact_cached_free_pfn. Be careful not to skip if
+ * one compacting thread has just wrapped back to the end of the
+ * zone, but another thread has not.
+ */
+static bool compaction_may_skip(struct zone *zone,
+ struct compact_control *cc)
+{
+ if (!cc->wrapped && zone->compact_cached_free_pfn < cc->start_free_pfn)
+ return true;
+
+ if (cc->wrapped && zone->compact_cached_free_pfn > cc->start_free_pfn)
+ return true;
+
+ return false;
+}
+
+/*
* Based on information in the current compact_control, find blocks
* suitable for isolating free pages from and then isolate them.
*/
@@ -471,13 +489,9 @@ static void isolate_freepages(struct zone *zone,
/*
* Skip ahead if another thread is compacting in the area
- * simultaneously. If we wrapped around, we can only skip
- * ahead if zone->compact_cached_free_pfn also wrapped to
- * above our starting point.
+ * simultaneously, and has finished with this page block.
*/
- if (cc->order > 0 && (!cc->wrapped ||
- zone->compact_cached_free_pfn >
- cc->start_free_pfn))
+ if (cc->order > 0 && compaction_may_skip(zone, cc))
pfn = min(pfn, zone->compact_cached_free_pfn);
if (!pfn_valid(pfn))
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
2012-09-13 19:54 ` Rik van Riel
@ 2012-09-15 15:55 ` Richard Davies
0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-15 15:55 UTC (permalink / raw)
To: Rik van Riel
Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
Hi Rik, Mel and Shaohua,
Thank you for your latest patches. I attach my latest perf report for a slow
boot with all of these applied.
Mel asked for timings of the slow boots. It's very hard to give anything
useful here! A normal boot would be a minute or so, and many are like that,
but the slowest that I have seen (on 3.5.x) was several hours. Basically, I
just test many times until I get one which is noticeably slower than normal
and then run perf record on that one.
The latest perf report for a slow boot is below. For the fast boots, most of
the time is in clear_page_c in do_huge_pmd_anonymous_page, but for this slow
one there is a lot of lock contention above that.
Thanks,
Richard.
# ========
# captured on: Sat Sep 15 15:40:54 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 3M of event 'cycles'
# Event count (approx.): 1457256240581
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
#
58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|
--- _raw_spin_lock_irqsave
|
|--95.07%-- compact_checklock_irqsave
| |
| |--70.03%-- isolate_migratepages_range
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--92.76%-- 0x10100000006
| | |
| | --7.24%-- 0x10100000002
| |
| --29.97%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--90.69%-- 0x10100000006
| |
| --9.31%-- 0x10100000002
|
|--4.53%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--92.22%-- 0x10100000006
| |
| --7.78%-- 0x10100000002
--0.40%-- [...]
13.14% qemu-kvm [kernel.kallsyms] [k] clear_page_c
|
--- clear_page_c
|
|--99.38%-- do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--51.86%-- 0x10100000006
| |
| |--48.14%-- 0x10100000002
| --0.01%-- [...]
|
--0.62%-- __alloc_pages_nodemask
|
|--76.27%-- alloc_pages_vma
| handle_pte_fault
| |
| |--99.57%-- handle_mm_fault
| | |
| | |--99.65%-- __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--91.77%-- 0x10100000006
| | | |
| | | --8.23%-- 0x10100000002
| | --0.35%-- [...]
| --0.43%-- [...]
|
--23.73%-- alloc_pages_current
|
|--99.20%-- pte_alloc_one
| |
| |--98.68%-- do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--58.61%-- 0x10100000002
| | |
| | --41.39%-- 0x10100000006
| |
| --1.32%-- __pte_alloc
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| 0x10100000006
|
|--0.69%-- __vmalloc_node_range
| __vmalloc_node
| vzalloc
| __kvm_set_memory_region
| kvm_set_memory_region
| kvm_vm_ioctl_set_memory_region
| kvm_vm_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
--0.12%-- [...]
6.31% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
|
--- isolate_freepages_block
|
|--99.98%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--91.13%-- 0x10100000006
| |
| --8.87%-- 0x10100000002
--0.02%-- [...]
1.68% qemu-kvm [kernel.kallsyms] [k] yield_to
|
--- yield_to
|
|--99.65%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--88.78%-- 0x10100000006
| |
| --11.22%-- 0x10100000002
--0.35%-- [...]
1.24% ksmd [kernel.kallsyms] [k] memcmp
|
--- memcmp
|
|--99.78%-- memcmp_pages
| |
| |--77.17%-- ksm_scan_thread
| | kthread
| | kernel_thread_helper
| |
| --22.83%-- try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
--0.22%-- [...]
1.09% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
|
--- svm_vcpu_run
|
|--99.44%-- kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--82.15%-- 0x10100000006
| |
| |--17.85%-- 0x10100000002
| --0.00%-- [...]
|
--0.56%-- kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--75.21%-- 0x10100000006
|
--24.79%-- 0x10100000002
1.09% swapper [kernel.kallsyms] [k] default_idle
|
--- default_idle
|
|--99.74%-- cpu_idle
| |
| |--76.31%-- start_secondary
| |
| --23.69%-- rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
--0.26%-- [...]
1.08% ksmd [kernel.kallsyms] [k] smp_call_function_many
|
--- smp_call_function_many
|
|--99.97%-- native_flush_tlb_others
| |
| |--99.78%-- flush_tlb_page
| | ptep_clear_flush
| | try_to_merge_with_ksm_page
| | ksm_scan_thread
| | kthread
| | kernel_thread_helper
| --0.22%-- [...]
--0.03%-- [...]
0.77% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin
|
--- kvm_vcpu_on_spin
|
|--99.36%-- pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--90.08%-- 0x10100000006
| |
| |--9.92%-- 0x10100000002
| --0.00%-- [...]
|
--0.64%-- handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--87.37%-- 0x10100000006
|
--12.63%-- 0x10100000002
0.75% qemu-kvm [kernel.kallsyms] [k] compact_zone
|
--- compact_zone
|
|--99.98%-- compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--91.29%-- 0x10100000006
| |
| --8.71%-- 0x10100000002
--0.02%-- [...]
0.68% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|--39.71%-- yield_to
| kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--90.52%-- 0x10100000006
| |
| --9.48%-- 0x10100000002
|
|--15.63%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--90.96%-- 0x10100000006
| |
| --9.04%-- 0x10100000002
|
|--6.55%-- tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--78.78%-- 0x10100000006
| |
| --21.22%-- 0x10100000002
|
|--4.87%-- free_pcppages_bulk
| |
| |--51.10%-- free_hot_cold_page
| | |
| | |--83.60%-- free_hot_cold_page_list
| | | |
| | | |--62.17%-- release_pages
| | | | pagevec_lru_move_fn
| | | | __pagevec_lru_add
| | | | |
| | | | |--99.22%-- __lru_cache_add
| | | | | lru_cache_add_lru
| | | | | putback_lru_page
| | | | | |
| | | | | |--99.61%-- migrate_pages
| | | | | | compact_zone
| | | | | | compact_zone_order
| | | | | | try_to_compact_pages
| | | | | | __alloc_pages_direct_compact
| | | | | | __alloc_pages_nodemask
| | | | | | alloc_pages_vma
| | | | | | do_huge_pmd_anonymous_page
| | | | | | handle_mm_fault
| | | | | | __get_user_pages
| | | | | | get_user_page_nowait
| | | | | | hva_to_pfn.isra.17
| | | | | | __gfn_to_pfn
| | | | | | gfn_to_pfn_async
| | | | | | try_async_pf
| | | | | | tdp_page_fault
| | | | | | kvm_mmu_page_fault
| | | | | | pf_interception
| | | | | | handle_exit
| | | | | | kvm_arch_vcpu_ioctl_run
| | | | | | kvm_vcpu_ioctl
| | | | | | do_vfs_ioctl
| | | | | | sys_ioctl
| | | | | | system_call_fastpath
| | | | | | ioctl
| | | | | | |
| | | | | | |--88.98%-- 0x10100000006
| | | | | | |
| | | | | | --11.02%-- 0x10100000002
| | | | | --0.39%-- [...]
| | | | |
| | | | --0.78%-- lru_add_drain_cpu
| | | | lru_add_drain
| | | | migrate_prep_local
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | 0x10100000006
| | | |
| | | --37.83%-- shrink_page_list
| | | shrink_inactive_list
| | | shrink_lruvec
| | | try_to_free_pages
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--86.38%-- 0x10100000006
| | | |
| | | --13.62%-- 0x10100000002
| | |
| | |--12.96%-- __free_pages
| | | |
| | | |--98.43%-- release_freepages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--90.49%-- 0x10100000006
| | | | |
| | | | --9.51%-- 0x10100000002
| | | |
| | | --1.57%-- __free_slab
| | | discard_slab
| | | unfreeze_partials
| | | put_cpu_partial
| | | __slab_free
| | | kmem_cache_free
| | | free_buffer_head
| | | try_to_free_buffers
| | | jbd2_journal_try_to_free_buffers
| | | bdev_try_to_free_page
| | | blkdev_releasepage
| | | try_to_release_page
| | | move_to_new_page
| | | migrate_pages
| | | compact_zone
| | | compact_zone_order
| | | try_to_compact_pages
| | | __alloc_pages_direct_compact
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | 0x10100000006
| | |
| | --3.44%-- __put_single_page
| | put_page
| | putback_lru_page
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--88.25%-- 0x10100000006
| | |
| | --11.75%-- 0x10100000002
| |
| --48.90%-- drain_pages
| |
| |--88.65%-- drain_local_pages
| | |
| | |--96.33%-- generic_smp_call_function_interrupt
| | | smp_call_function_interrupt
| | | call_function_interrupt
| | | |
| | | |--23.46%-- __remove_mapping
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--93.81%-- 0x10100000006
| | | | |
| | | | --6.19%-- 0x10100000002
| | | |
| | | |--19.93%-- kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--93.65%-- 0x10100000006
| | | | |
| | | | --6.35%-- 0x10100000002
| | | |
| | | |--14.19%-- compaction_alloc
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--89.88%-- 0x10100000006
| | | | |
| | | | --10.12%-- 0x10100000002
| | | |
| | | |--8.57%-- isolate_migratepages_range
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--92.14%-- 0x10100000006
| | | | |
| | | | --7.86%-- 0x10100000002
| | | |
| | | |--5.05%-- do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--92.53%-- 0x10100000006
| | | | |
| | | | --7.47%-- 0x10100000002
| | | |
| | | |--4.49%-- shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--94.61%-- 0x10100000006
| | | | |
| | | | --5.39%-- 0x10100000002
| | | |
| | | |--2.80%-- free_hot_cold_page_list
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--91.24%-- 0x10100000006
| | | | |
| | | | --8.76%-- 0x10100000002
| | | |
| | | |--1.96%-- buffer_migrate_page
| | | | move_to_new_page
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--63.14%-- 0x10100000006
| | | | |
| | | | --36.86%-- 0x10100000002
| | | |
| | | |--1.62%-- try_to_free_buffers
| | | | jbd2_journal_try_to_free_buffers
| | | | ext4_releasepage
| | | | try_to_release_page
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | 0x10100000006
| | | |
| | | |--1.49%-- compact_checklock_irqsave
| | | | isolate_migratepages_range
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | 0x10100000006
| | | |
| | | |--1.46%-- __mutex_lock_slowpath
| | | | mutex_lock
| | | | page_lock_anon_vma
| | | | page_referenced
| | | | shrink_active_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | 0x10100000006
| | | |
| | | |--1.41%-- native_flush_tlb_others
| | | | flush_tlb_page
| | | | |
| | | | |--67.10%-- ptep_clear_flush
| | | | | try_to_unmap_one
| | | | | try_to_unmap_anon
| | | | | try_to_unmap
| | | | | migrate_pages
| | | | | compact_zone
| | | | | compact_zone_order
| | | | | try_to_compact_pages
| | | | | __alloc_pages_direct_compact
| | | | | __alloc_pages_nodemask
|
--12.63%-- 0x10100000002
0.75% qemu-kvm [kernel.kallsyms] [k] compact_zone
|
--- compact_zone
|
|--99.98%-- compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--91.29%-- 0x10100000006
| |
| --8.71%-- 0x10100000002
--0.02%-- [...]
0.68% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|--39.71%-- yield_to
| kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--90.52%-- 0x10100000006
| |
| --9.48%-- 0x10100000002
|
|--15.63%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--90.96%-- 0x10100000006
| |
| --9.04%-- 0x10100000002
|
|--6.55%-- tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--78.78%-- 0x10100000006
| |
| --21.22%-- 0x10100000002
|
|--4.87%-- free_pcppages_bulk
| |
| |--51.10%-- free_hot_cold_page
| | |
| | |--83.60%-- free_hot_cold_page_list
| | | |
| | | |--62.17%-- release_pages
| | | | pagevec_lru_move_fn
| | | | __pagevec_lru_add
| | | | |
| | | | |--99.22%-- __lru_cache_add
| | | | | lru_cache_add_lru
| | | | | putback_lru_page
| | | | | |
| | | | | |--99.61%-- migrate_pages
| | | | | | compact_zone
| | | | | | compact_zone_order
| | | | | | try_to_compact_pages
| | | | | | __alloc_pages_direct_compact
| | | | | | __alloc_pages_nodemask
| | | | | | alloc_pages_vma
| | | | | | do_huge_pmd_anonymous_page
| | | | | | handle_mm_fault
| | | | | | __get_user_pages
| | | | | | get_user_page_nowait
| | | | | | hva_to_pfn.isra.17
| | | | | | __gfn_to_pfn
| | | | | | gfn_to_pfn_async
| | | | | | try_async_pf
| | | | | | tdp_page_fault
| | | | | | kvm_mmu_page_fault
| | | | | | pf_interception
| | | | | | handle_exit
| | | | | | kvm_arch_vcpu_ioctl_run
| | | | | | kvm_vcpu_ioctl
| | | | | | do_vfs_ioctl
| | | | | | sys_ioctl
| | | | | | system_call_fastpath
| | | | | | ioctl
| | | | | | |
| | | | | | |--88.98%-- 0x10100000006
| | | | | | |
| | | | | | --11.02%-- 0x10100000002
| | | | | --0.39%-- [...]
| | | | |
| | | | --0.78%-- lru_add_drain_cpu
| | | | lru_add_drain
| | | | migrate_prep_local
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | 0x10100000006
| | | |
| | | --37.83%-- shrink_page_list
| | | shrink_inactive_list
| | | shrink_lruvec
| | | try_to_free_pages
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--86.38%-- 0x10100000006
| | | |
| | | --13.62%-- 0x10100000002
| | |
| | |--12.96%-- __free_pages
| | | |
| | | |--98.43%-- release_freepages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--90.49%-- 0x10100000006
| | | | |
| | | | --9.51%-- 0x10100000002
| | | |
| | | --1.57%-- __free_slab
| | | discard_slab
| | | unfreeze_partials
| | | put_cpu_partial
| | | __slab_free
| | | kmem_cache_free
| | | free_buffer_head
| | | try_to_free_buffers
| | | jbd2_journal_try_to_free_buffers
| | | bdev_try_to_free_page
| | | blkdev_releasepage
| | | try_to_release_page
| | | move_to_new_page
| | | migrate_pages
| | | compact_zone
| | | compact_zone_order
| | | try_to_compact_pages
| | | __alloc_pages_direct_compact
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | 0x10100000006
| | |
| | --3.44%-- __put_single_page
| | put_page
| | putback_lru_page
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--88.25%-- 0x10100000006
| | |
| | --11.75%-- 0x10100000002
| |
| --48.90%-- drain_pages
| |
| |--88.65%-- drain_local_pages
| | |
| | |--96.33%-- generic_smp_call_function_interrupt
| | | smp_call_function_interrupt
| | | call_function_interrupt
| | | |
| | | |--23.46%-- __remove_mapping
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--93.81%-- 0x10100000006
| | | | |
| | | | --6.19%-- 0x10100000002
| | | |
| | | |--19.93%-- kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--93.65%-- 0x10100000006
| | | | |
| | | | --6.35%-- 0x10100000002
| | | |
| | | |--14.19%-- compaction_alloc
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--89.88%-- 0x10100000006
| | | | |
| | | | --10.12%-- 0x10100000002
| | | |
| | | |--8.57%-- isolate_migratepages_range
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--92.14%-- 0x10100000006
| | | | |
| | | | --7.86%-- 0x10100000002
| | | |
| | | |--5.05%-- do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--92.53%-- 0x10100000006
| | | | |
| | | | --7.47%-- 0x10100000002
| | | |
| | | |--4.49%-- shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--94.61%-- 0x10100000006
| | | | |
| | | | --5.39%-- 0x10100000002
| | | |
| | | |--2.80%-- free_hot_cold_page_list
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--91.24%-- 0x10100000006
| | | | |
| | | | --8.76%-- 0x10100000002
| | | |
| | | |--1.96%-- buffer_migrate_page
| | | | move_to_new_page
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--63.14%-- 0x10100000006
| | | | |
| | | | --36.86%-- 0x10100000002
| | | |
| | | |--1.62%-- try_to_free_buffers
| | | | jbd2_journal_try_to_free_buffers
| | | | ext4_releasepage
| | | | try_to_release_page
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | 0x10100000006
| | | |
| | | |--1.49%-- compact_checklock_irqsave
| | | | isolate_migratepages_range
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | 0x10100000006
| | | |
| | | |--1.46%-- __mutex_lock_slowpath
| | | | mutex_lock
| | | | page_lock_anon_vma
| | | | page_referenced
| | | | shrink_active_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | 0x10100000006
| | | |
| | | |--1.41%-- native_flush_tlb_others
| | | | flush_tlb_page
| | | | |
| | | | |--67.10%-- ptep_clear_flush
| | | | | try_to_unmap_one
| | | | | try_to_unmap_anon
| | | | | try_to_unmap
| | | | | migrate_pages
| | | | | compact_zone
| | | | | compact_zone_order
| | | | | try_to_compact_pages
| | | | | __alloc_pages_direct_compact
| | | | | __alloc_pages_nodemask
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: [Qemu-devel] [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
@ 2012-09-15 15:55 ` Richard Davies
0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-15 15:55 UTC (permalink / raw)
To: Rik van Riel
Cc: kvm, qemu-devel, linux-mm, Mel Gorman, Shaohua Li, Avi Kivity
Hi Rik, Mel and Shaohua,
Thank you for your latest patches. I attach my latest perf report for a slow
boot with all of these applied.
Mel asked for timings of the slow boots. It's very hard to give anything
useful here! A normal boot takes a minute or so, and many are like that,
but the slowest that I have seen (on 3.5.x) was several hours. Basically, I
just test many times until I get one which is noticeably slower than normal
and then run perf record on that one.
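The capture itself is just system-wide sampling with call graphs (the exact
record command line is visible in the report header below; the Ctrl-C step
and the report invocation are my description of the usual workflow, not
quoted from the header):

```shell
# Sample all CPUs with call-graph (stack) recording while the slow
# guest boots; stop with Ctrl-C once the guest has finished booting.
# This matches the "perf record -g -a" cmdline in the report header.
perf record -g -a

# Render the callchain breakdown (as pasted below) from perf.data:
perf report
```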
The latest perf report for a slow boot is below. For the fast boots, most of
the time is in clear_page_c under do_huge_pmd_anonymous_page, but for this
slow one there is a lot of lock contention above that.
Thanks,
Richard.
# ========
# captured on: Sat Sep 15 15:40:54 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 3M of event 'cycles'
# Event count (approx.): 1457256240581
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
#
58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|
--- _raw_spin_lock_irqsave
|
|--95.07%-- compact_checklock_irqsave
| |
| |--70.03%-- isolate_migratepages_range
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--92.76%-- 0x10100000006
| | |
| | --7.24%-- 0x10100000002
| |
| --29.97%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--90.69%-- 0x10100000006
| |
| --9.31%-- 0x10100000002
|
|--4.53%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--92.22%-- 0x10100000006
| |
| --7.78%-- 0x10100000002
--0.40%-- [...]
13.14% qemu-kvm [kernel.kallsyms] [k] clear_page_c
|
--- clear_page_c
|
|--99.38%-- do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--51.86%-- 0x10100000006
| |
| |--48.14%-- 0x10100000002
| --0.01%-- [...]
|
--0.62%-- __alloc_pages_nodemask
|
|--76.27%-- alloc_pages_vma
| handle_pte_fault
| |
| |--99.57%-- handle_mm_fault
| | |
| | |--99.65%-- __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--91.77%-- 0x10100000006
| | | |
| | | --8.23%-- 0x10100000002
| | --0.35%-- [...]
| --0.43%-- [...]
|
--23.73%-- alloc_pages_current
|
|--99.20%-- pte_alloc_one
| |
| |--98.68%-- do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--58.61%-- 0x10100000002
| | |
| | --41.39%-- 0x10100000006
| |
| --1.32%-- __pte_alloc
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| 0x10100000006
|
|--0.69%-- __vmalloc_node_range
| __vmalloc_node
| vzalloc
| __kvm_set_memory_region
| kvm_set_memory_region
| kvm_vm_ioctl_set_memory_region
| kvm_vm_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
--0.12%-- [...]
6.31% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
|
--- isolate_freepages_block
|
|--99.98%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--91.13%-- 0x10100000006
| |
| --8.87%-- 0x10100000002
--0.02%-- [...]
1.68% qemu-kvm [kernel.kallsyms] [k] yield_to
|
--- yield_to
|
|--99.65%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--88.78%-- 0x10100000006
| |
| --11.22%-- 0x10100000002
--0.35%-- [...]
1.24% ksmd [kernel.kallsyms] [k] memcmp
|
--- memcmp
|
|--99.78%-- memcmp_pages
| |
| |--77.17%-- ksm_scan_thread
| | kthread
| | kernel_thread_helper
| |
| --22.83%-- try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
--0.22%-- [...]
1.09% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
|
--- svm_vcpu_run
|
|--99.44%-- kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--82.15%-- 0x10100000006
| |
| |--17.85%-- 0x10100000002
| --0.00%-- [...]
|
--0.56%-- kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--75.21%-- 0x10100000006
|
--24.79%-- 0x10100000002
1.09% swapper [kernel.kallsyms] [k] default_idle
|
--- default_idle
|
|--99.74%-- cpu_idle
| |
| |--76.31%-- start_secondary
| |
| --23.69%-- rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
--0.26%-- [...]
1.08% ksmd [kernel.kallsyms] [k] smp_call_function_many
|
--- smp_call_function_many
|
|--99.97%-- native_flush_tlb_others
| |
| |--99.78%-- flush_tlb_page
| | ptep_clear_flush
| | try_to_merge_with_ksm_page
| | ksm_scan_thread
| | kthread
| | kernel_thread_helper
| --0.22%-- [...]
--0.03%-- [...]
0.77% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin
|
--- kvm_vcpu_on_spin
|
|--99.36%-- pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--90.08%-- 0x10100000006
| |
| |--9.92%-- 0x10100000002
| --0.00%-- [...]
|
--0.64%-- handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--87.37%-- 0x10100000006
|
--12.63%-- 0x10100000002
0.75% qemu-kvm [kernel.kallsyms] [k] compact_zone
|
--- compact_zone
|
|--99.98%-- compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--91.29%-- 0x10100000006
| |
| --8.71%-- 0x10100000002
--0.02%-- [...]
0.68% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|--39.71%-- yield_to
| kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--90.52%-- 0x10100000006
| |
| --9.48%-- 0x10100000002
|
|--15.63%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--90.96%-- 0x10100000006
| |
| --9.04%-- 0x10100000002
|
|--6.55%-- tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--78.78%-- 0x10100000006
| |
| --21.22%-- 0x10100000002
|
|--4.87%-- free_pcppages_bulk
| |
| |--51.10%-- free_hot_cold_page
| | |
| | |--83.60%-- free_hot_cold_page_list
| | | |
| | | |--62.17%-- release_pages
| | | | pagevec_lru_move_fn
| | | | __pagevec_lru_add
| | | | |
| | | | |--99.22%-- __lru_cache_add
| | | | | lru_cache_add_lru
| | | | | putback_lru_page
| | | | | |
| | | | | |--99.61%-- migrate_pages
| | | | | | compact_zone
| | | | | | compact_zone_order
| | | | | | try_to_compact_pages
| | | | | | __alloc_pages_direct_compact
| | | | | | __alloc_pages_nodemask
| | | | | | alloc_pages_vma
| | | | | | do_huge_pmd_anonymous_page
| | | | | | handle_mm_fault
| | | | | | __get_user_pages
| | | | | | get_user_page_nowait
| | | | | | hva_to_pfn.isra.17
| | | | | | __gfn_to_pfn
| | | | | | gfn_to_pfn_async
| | | | | | try_async_pf
| | | | | | tdp_page_fault
| | | | | | kvm_mmu_page_fault
| | | | | | pf_interception
| | | | | | handle_exit
| | | | | | kvm_arch_vcpu_ioctl_run
| | | | | | kvm_vcpu_ioctl
| | | | | | do_vfs_ioctl
| | | | | | sys_ioctl
| | | | | | system_call_fastpath
| | | | | | ioctl
| | | | | | |
| | | | | | |--88.98%-- 0x10100000006
| | | | | | |
| | | | | | --11.02%-- 0x10100000002
| | | | | --0.39%-- [...]
| | | | |
| | | | --0.78%-- lru_add_drain_cpu
| | | | lru_add_drain
| | | | migrate_prep_local
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | 0x10100000006
| | | |
| | | --37.83%-- shrink_page_list
| | | shrink_inactive_list
| | | shrink_lruvec
| | | try_to_free_pages
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--86.38%-- 0x10100000006
| | | |
| | | --13.62%-- 0x10100000002
| | |
| | |--12.96%-- __free_pages
| | | |
| | | |--98.43%-- release_freepages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--90.49%-- 0x10100000006
| | | | |
| | | | --9.51%-- 0x10100000002
| | | |
| | | --1.57%-- __free_slab
| | | discard_slab
| | | unfreeze_partials
| | | put_cpu_partial
| | | __slab_free
| | | kmem_cache_free
| | | free_buffer_head
| | | try_to_free_buffers
| | | jbd2_journal_try_to_free_buffers
| | | bdev_try_to_free_page
| | | blkdev_releasepage
| | | try_to_release_page
| | | move_to_new_page
| | | migrate_pages
| | | compact_zone
| | | compact_zone_order
| | | try_to_compact_pages
| | | __alloc_pages_direct_compact
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | 0x10100000006
| | |
| | --3.44%-- __put_single_page
| | put_page
| | putback_lru_page
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--88.25%-- 0x10100000006
| | |
| | --11.75%-- 0x10100000002
| |
| --48.90%-- drain_pages
| |
| |--88.65%-- drain_local_pages
| | |
| | |--96.33%-- generic_smp_call_function_interrupt
| | | smp_call_function_interrupt
| | | call_function_interrupt
| | | |
| | | |--23.46%-- __remove_mapping
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--93.81%-- 0x10100000006
| | | | |
| | | | --6.19%-- 0x10100000002
| | | |
| | | |--19.93%-- kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--93.65%-- 0x10100000006
| | | | |
| | | | --6.35%-- 0x10100000002
| | | |
| | | |--14.19%-- compaction_alloc
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--89.88%-- 0x10100000006
| | | | |
| | | | --10.12%-- 0x10100000002
| | | |
| | | |--8.57%-- isolate_migratepages_range
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--92.14%-- 0x10100000006
| | | | |
| | | | --7.86%-- 0x10100000002
| | | |
| | | |--5.05%-- do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--92.53%-- 0x10100000006
| | | | |
| | | | --7.47%-- 0x10100000002
| | | |
| | | |--4.49%-- shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--94.61%-- 0x10100000006
| | | | |
| | | | --5.39%-- 0x10100000002
| | | |
| | | |--2.80%-- free_hot_cold_page_list
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--91.24%-- 0x10100000006
| | | | |
| | | | --8.76%-- 0x10100000002
| | | |
| | | |--1.96%-- buffer_migrate_page
| | | | move_to_new_page
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--63.14%-- 0x10100000006
| | | | |
| | | | --36.86%-- 0x10100000002
| | | |
| | | |--1.62%-- try_to_free_buffers
| | | | jbd2_journal_try_to_free_buffers
| | | | ext4_releasepage
| | | | try_to_release_page
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | 0x10100000006
| | | |
| | | |--1.49%-- compact_checklock_irqsave
| | | | isolate_migratepages_range
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | 0x10100000006
| | | |
| | | |--1.46%-- __mutex_lock_slowpath
| | | | mutex_lock
| | | | page_lock_anon_vma
| | | | page_referenced
| | | | shrink_active_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | 0x10100000006
| | | |
| | | |--1.41%-- native_flush_tlb_others
| | | | flush_tlb_page
| | | | |
| | | | |--67.10%-- ptep_clear_flush
| | | | | try_to_unmap_one
| | | | | try_to_unmap_anon
| | | | | try_to_unmap
| | | | | migrate_pages
| | | | | compact_zone
| | | | | compact_zone_order
| | | | | try_to_compact_pages
| | | | | __alloc_pages_direct_compact
| | | | | __alloc_pages_nodemask
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
2012-09-15 15:55 ` Richard Davies
@ 2012-09-16 19:12 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-16 19:12 UTC (permalink / raw)
To: Rik van Riel
Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
Richard Davies wrote:
> Thank you for your latest patches. I attach my latest perf report for a slow
> boot with all of these applied.
For the avoidance of any doubt, here is the combined diff versus 3.6.0-rc5
which I tested:
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 38b42e7..090405d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1383,10 +1383,8 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
qgroup_dirty(fs_info, srcgroup);
}
- if (!inherit) {
- ret = -EINVAL;
+ if (!inherit)
goto unlock;
- }
i_qgroups = (u64 *)(inherit + 1);
for (i = 0; i < inherit->num_qgroups; ++i) {
diff --git a/mm/compaction.c b/mm/compaction.c
index 7fcd3a5..92bae88 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -70,8 +70,7 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
/* async aborts if taking too long or contended */
if (!cc->sync) {
- if (cc->contended)
- *cc->contended = true;
+ cc->contended = true;
return false;
}
@@ -296,8 +295,9 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
/* Time to isolate some pages for migration */
cond_resched();
- spin_lock_irqsave(&zone->lru_lock, flags);
- locked = true;
+ locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
+ if (!locked)
+ return 0;
for (; low_pfn < end_pfn; low_pfn++) {
struct page *page;
@@ -431,17 +431,21 @@ static bool suitable_migration_target(struct page *page)
}
/*
- * Returns the start pfn of the last page block in a zone. This is the starting
- * point for full compaction of a zone. Compaction searches for free pages from
- * the end of each zone, while isolate_freepages_block scans forward inside each
- * page block.
+ * We scan the zone in a circular fashion, starting at
+ * zone->compact_cached_free_pfn. Be careful not to skip if
+ * one compacting thread has just wrapped back to the end of the
+ * zone, but another thread has not.
*/
-static unsigned long start_free_pfn(struct zone *zone)
+static bool compaction_may_skip(struct zone *zone,
+ struct compact_control *cc)
{
- unsigned long free_pfn;
- free_pfn = zone->zone_start_pfn + zone->spanned_pages;
- free_pfn &= ~(pageblock_nr_pages-1);
- return free_pfn;
+ if (!cc->wrapped && zone->compact_cached_free_pfn < cc->start_free_pfn)
+ return true;
+
+ if (cc->wrapped && zone->compact_cached_free_pfn > cc->start_free_pfn)
+ return true;
+
+ return false;
}
/*
@@ -483,6 +487,13 @@ static void isolate_freepages(struct zone *zone,
pfn -= pageblock_nr_pages) {
unsigned long isolated;
+ /*
+ * Skip ahead if another thread is compacting in the area
+ * simultaneously, and has finished with this page block.
+ */
+ if (cc->order > 0 && compaction_may_skip(zone, cc))
+ pfn = min(pfn, zone->compact_cached_free_pfn);
+
if (!pfn_valid(pfn))
continue;
@@ -533,15 +544,7 @@ static void isolate_freepages(struct zone *zone,
*/
if (isolated) {
high_pfn = max(high_pfn, pfn);
-
- /*
- * If the free scanner has wrapped, update
- * compact_cached_free_pfn to point to the highest
- * pageblock with free pages. This reduces excessive
- * scanning of full pageblocks near the end of the
- * zone
- */
- if (cc->order > 0 && cc->wrapped)
+ if (cc->order > 0)
zone->compact_cached_free_pfn = high_pfn;
}
}
@@ -551,11 +554,6 @@ static void isolate_freepages(struct zone *zone,
cc->free_pfn = high_pfn;
cc->nr_freepages = nr_freepages;
-
- /* If compact_cached_free_pfn is reset then set it now */
- if (cc->order > 0 && !cc->wrapped &&
- zone->compact_cached_free_pfn == start_free_pfn(zone))
- zone->compact_cached_free_pfn = high_pfn;
}
/*
@@ -634,7 +632,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
/* Perform the isolation */
low_pfn = isolate_migratepages_range(zone, cc, low_pfn, end_pfn);
- if (!low_pfn)
+ if (!low_pfn || cc->contended)
return ISOLATE_ABORT;
cc->migrate_pfn = low_pfn;
@@ -642,6 +640,20 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
return ISOLATE_SUCCESS;
}
+/*
+ * Returns the start pfn of the last page block in a zone. This is the starting
+ * point for full compaction of a zone. Compaction searches for free pages from
+ * the end of each zone, while isolate_freepages_block scans forward inside each
+ * page block.
+ */
+static unsigned long start_free_pfn(struct zone *zone)
+{
+ unsigned long free_pfn;
+ free_pfn = zone->zone_start_pfn + zone->spanned_pages;
+ free_pfn &= ~(pageblock_nr_pages-1);
+ return free_pfn;
+}
+
static int compact_finished(struct zone *zone,
struct compact_control *cc)
{
@@ -787,6 +799,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
switch (isolate_migratepages(zone, cc)) {
case ISOLATE_ABORT:
ret = COMPACT_PARTIAL;
+ putback_lru_pages(&cc->migratepages);
+ cc->nr_migratepages = 0;
goto out;
case ISOLATE_NONE:
continue;
@@ -831,6 +845,7 @@ static unsigned long compact_zone_order(struct zone *zone,
int order, gfp_t gfp_mask,
bool sync, bool *contended)
{
+ unsigned long ret;
struct compact_control cc = {
.nr_freepages = 0,
.nr_migratepages = 0,
@@ -838,12 +853,17 @@ static unsigned long compact_zone_order(struct zone *zone,
.migratetype = allocflags_to_migratetype(gfp_mask),
.zone = zone,
.sync = sync,
- .contended = contended,
};
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
- return compact_zone(zone, &cc);
+ ret = compact_zone(zone, &cc);
+
+ VM_BUG_ON(!list_empty(&cc.freepages));
+ VM_BUG_ON(!list_empty(&cc.migratepages));
+
+ *contended = cc.contended;
+ return ret;
}
int sysctl_extfrag_threshold = 500;
diff --git a/mm/internal.h b/mm/internal.h
index b8c91b3..4bd7c0e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -130,7 +130,7 @@ struct compact_control {
int order; /* order a direct compactor needs */
int migratetype; /* MOVABLE, RECLAIMABLE etc */
struct zone *zone;
- bool *contended; /* True if a lock was contended */
+ bool contended; /* True if a lock was contended */
};
unsigned long
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 101+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
2012-09-15 15:55 ` Richard Davies
@ 2012-09-17 12:26 ` Mel Gorman
-1 siblings, 0 replies; 101+ messages in thread
From: Mel Gorman @ 2012-09-17 12:26 UTC (permalink / raw)
To: Richard Davies
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
On Sat, Sep 15, 2012 at 04:55:24PM +0100, Richard Davies wrote:
> Hi Rik, Mel and Shaohua,
>
> Thank you for your latest patches. I attach my latest perf report for a slow
> boot with all of these applied.
>
Thanks for testing.
> Mel asked for timings of the slow boots. It's very hard to give anything
> useful here! A normal boot would be a minute or so, and many are like that,
> but the slowest that I have seen (on 3.5.x) was several hours. Basically, I
> just test many times until I get one which is noticeably slower than normal
> and then run perf record on that one.
>
Ok.
> The latest perf report for a slow boot is below. For the fast boots, most of
> the time is in clean_page_c in do_huge_pmd_anonymous_page, but for this slow
> one there is a lot of lock contention above that.
>
> <SNIP>
> 58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> |
> --- _raw_spin_lock_irqsave
> |
> |--95.07%-- compact_checklock_irqsave
> | |
> | |--70.03%-- isolate_migratepages_range
> <SNIP>
> | --29.97%-- compaction_alloc
> |
> |--4.53%-- isolate_migratepages_range
> <SNIP>
This is going in the right direction, but usage due to contention is still
obviously stupidly high. Compaction features throughout the profile, but
let's stay focused on the lock contention for the moment. Can you try the
following patch? So far I'm not having much luck reproducing this locally.
---8<---
mm: compaction: Only release lru_lock every SWAP_CLUSTER_MAX pages if necessary
Commit b2eef8c0 (mm: compaction: minimise the time IRQs are disabled while
isolating pages for migration) releases the lru_lock every SWAP_CLUSTER_MAX
pages that are scanned as it was found at the time that compaction could
contend badly with page reclaim. This can lead to a situation where
compaction contends heavily with itself as it releases and reacquires
the LRU lock.
This patch makes two changes to how the migrate scanner acquires the LRU
lock. First, it only releases the LRU lock every SWAP_CLUSTER_MAX pages if
the lock is contended. This reduces the number of times it unnecessarily
disables and re-enables IRQs. The second is that it defers acquiring the
LRU lock for as long as possible. In cases where transparent hugepages
are encountered the LRU lock will not be acquired at all.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/compaction.c | 65 +++++++++++++++++++++++++++++++++++++------------------
1 file changed, 44 insertions(+), 21 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 39342ee..1874f23 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -50,6 +50,11 @@ static inline bool migrate_async_suitable(int migratetype)
return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
}
+static inline bool should_release_lock(spinlock_t *lock)
+{
+ return need_resched() || spin_is_contended(lock);
+}
+
/*
* Compaction requires the taking of some coarse locks that are potentially
* very heavily contended. Check if the process needs to be scheduled or
@@ -62,7 +67,7 @@ static inline bool migrate_async_suitable(int migratetype)
static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
bool locked, struct compact_control *cc)
{
- if (need_resched() || spin_is_contended(lock)) {
+ if (should_release_lock(lock)) {
if (locked) {
spin_unlock_irqrestore(lock, *flags);
locked = false;
@@ -275,7 +280,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
isolate_mode_t mode = 0;
struct lruvec *lruvec;
unsigned long flags;
- bool locked;
+ bool locked = false;
/*
* Ensure that there are not too many pages isolated from the LRU
@@ -295,24 +300,17 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
/* Time to isolate some pages for migration */
cond_resched();
- locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
- if (!locked)
- return 0;
for (; low_pfn < end_pfn; low_pfn++) {
struct page *page;
/* give a chance to irqs before checking need_resched() */
- if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
- spin_unlock_irqrestore(&zone->lru_lock, flags);
- locked = false;
+ if (locked && !((low_pfn+1) % SWAP_CLUSTER_MAX)) {
+ if (should_release_lock(&zone->lru_lock)) {
+ spin_unlock_irqrestore(&zone->lru_lock, flags);
+ locked = false;
+ }
}
- /* Check if it is ok to still hold the lock */
- locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
- locked, cc);
- if (!locked)
- break;
-
/*
* migrate_pfn does not necessarily start aligned to a
* pageblock. Ensure that pfn_valid is called when moving
@@ -352,21 +350,38 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
pageblock_nr = low_pfn >> pageblock_order;
if (!cc->sync && last_pageblock_nr != pageblock_nr &&
!migrate_async_suitable(get_pageblock_migratetype(page))) {
- low_pfn += pageblock_nr_pages;
- low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
- last_pageblock_nr = pageblock_nr;
- continue;
+ goto next_pageblock;
}
+ /* Check may be lockless but that's ok as we recheck later */
if (!PageLRU(page))
continue;
/*
- * PageLRU is set, and lru_lock excludes isolation,
- * splitting and collapsing (collapsing has already
- * happened if PageLRU is set).
+ * PageLRU is set. lru_lock normally excludes isolation
+ * splitting and collapsing (collapsing has already happened
+ * if PageLRU is set) but the lock is not necessarily taken
+ * here and it is wasteful to take it just to check transhuge.
+ * Check transhuge without lock and skip if it's either a
+ * transhuge or hugetlbfs page.
*/
if (PageTransHuge(page)) {
+ if (!locked)
+ goto next_pageblock;
+ low_pfn += (1 << compound_order(page)) - 1;
+ continue;
+ }
+
+ /* Check if it is ok to still hold the lock */
+ locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
+ locked, cc);
+ if (!locked)
+ break;
+
+ /* Recheck PageLRU and PageTransHuge under lock */
+ if (!PageLRU(page))
+ continue;
+ if (PageTransHuge(page)) {
low_pfn += (1 << compound_order(page)) - 1;
continue;
}
@@ -393,6 +408,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
++low_pfn;
break;
}
+
+ continue;
+
+next_pageblock:
+ low_pfn += pageblock_nr_pages;
+ low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
+ last_pageblock_nr = pageblock_nr;
+
}
acct_isolated(zone, locked, cc);
--
^ permalink raw reply related [flat|nested] 101+ messages in thread
* Re: [Qemu-devel] [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
@ 2012-09-17 12:26 ` Mel Gorman
0 siblings, 0 replies; 101+ messages in thread
From: Mel Gorman @ 2012-09-17 12:26 UTC (permalink / raw)
To: Richard Davies; +Cc: kvm, qemu-devel, linux-mm, Avi Kivity, Shaohua Li
On Sat, Sep 15, 2012 at 04:55:24PM +0100, Richard Davies wrote:
> Hi Rik, Mel and Shaohua,
>
> Thank you for your latest patches. I attach my latest perf report for a slow
> boot with all of these applied.
>
Thanks for testing.
> Mel asked for timings of the slow boots. It's very hard to give anything
> useful here! A normal boot would be a minute or so, and many are like that,
> but the slowest that I have seen (on 3.5.x) was several hours. Basically, I
> just test many times until I get one which is noticeably slow than normal
> and then run perf record on that one.
>
Ok.
> The latest perf report for a slow boot is below. For the fast boots, most of
> the time is in clean_page_c in do_huge_pmd_anonymous_page, but for this slow
> one there is a lot of lock contention above that.
>
> <SNIP>
> 58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> |
> --- _raw_spin_lock_irqsave
> |
> |--95.07%-- compact_checklock_irqsave
> | |
> | |--70.03%-- isolate_migratepages_range
> <SNIP>
> | --29.97%-- compaction_alloc
> |
> |--4.53%-- isolate_migratepages_range
> <SNIP>
This is going the right direction but usage due to contentions is still
obviously stupidly high. Compaction features throughout the profile but
staying focused on the lock contention for the moment. Can you try the
following patch? So far I'm not having much luck reproducing this locally.
---8<---
mm: compaction: Only release lru_lock every SWAP_CLUSTER_MAX pages if necessary
Commit b2eef8c0 (mm: compaction: minimise the time IRQs are disabled while
isolating pages for migration) releases the lru_lock every SWAP_CLUSTER_MAX
pages that are scanned as it was found at the time that compaction could
contend badly with page reclaim. This can lead to a situation where
compaction contends heavily with itself as it releases and reacquires
the LRU lock.
This patch makes two changes to how the migrate scanner acquires the LRU
lock. First, it only releases the LRU lock every SWAP_CLUSTER_MAX pages if
the lock is contended. This reduces the number of times it unnnecessarily
disables and reenables IRQs. The second is that it defers acquiring the
LRU lock for as long as possible. In cases where transparent hugepages
are encountered the LRU lock will not be acquired at all.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/compaction.c | 65 +++++++++++++++++++++++++++++++++++++------------------
1 file changed, 44 insertions(+), 21 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 39342ee..1874f23 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -50,6 +50,11 @@ static inline bool migrate_async_suitable(int migratetype)
return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
}
+static inline bool should_release_lock(spinlock_t *lock)
+{
+ return need_resched() || spin_is_contended(lock);
+}
+
/*
* Compaction requires the taking of some coarse locks that are potentially
* very heavily contended. Check if the process needs to be scheduled or
@@ -62,7 +67,7 @@ static inline bool migrate_async_suitable(int migratetype)
static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
bool locked, struct compact_control *cc)
{
- if (need_resched() || spin_is_contended(lock)) {
+ if (should_release_lock(lock)) {
if (locked) {
spin_unlock_irqrestore(lock, *flags);
locked = false;
@@ -275,7 +280,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
isolate_mode_t mode = 0;
struct lruvec *lruvec;
unsigned long flags;
- bool locked;
+ bool locked = false;
/*
* Ensure that there are not too many pages isolated from the LRU
@@ -295,24 +300,17 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
/* Time to isolate some pages for migration */
cond_resched();
- locked = compact_trylock_irqsave(&zone->lru_lock, &flags, cc);
- if (!locked)
- return 0;
for (; low_pfn < end_pfn; low_pfn++) {
struct page *page;
/* give a chance to irqs before checking need_resched() */
- if (!((low_pfn+1) % SWAP_CLUSTER_MAX)) {
- spin_unlock_irqrestore(&zone->lru_lock, flags);
- locked = false;
+ if (locked && !((low_pfn+1) % SWAP_CLUSTER_MAX)) {
+ if (should_release_lock(&zone->lru_lock)) {
+ spin_unlock_irqrestore(&zone->lru_lock, flags);
+ locked = false;
+ }
}
- /* Check if it is ok to still hold the lock */
- locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
- locked, cc);
- if (!locked)
- break;
-
/*
* migrate_pfn does not necessarily start aligned to a
* pageblock. Ensure that pfn_valid is called when moving
@@ -352,21 +350,38 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
pageblock_nr = low_pfn >> pageblock_order;
if (!cc->sync && last_pageblock_nr != pageblock_nr &&
!migrate_async_suitable(get_pageblock_migratetype(page))) {
- low_pfn += pageblock_nr_pages;
- low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
- last_pageblock_nr = pageblock_nr;
- continue;
+ goto next_pageblock;
}
+ /* Check may be lockless but that's ok as we recheck later */
if (!PageLRU(page))
continue;
/*
- * PageLRU is set, and lru_lock excludes isolation,
- * splitting and collapsing (collapsing has already
- * happened if PageLRU is set).
+ * PageLRU is set. lru_lock normally excludes isolation,
+ * splitting and collapsing (collapsing has already happened
+ * if PageLRU is set) but the lock is not necessarily taken
+ * here and it is wasteful to take it just to check transhuge.
+ * Check transhuge without lock and skip if it's either a
+ * transhuge or hugetlbfs page.
*/
if (PageTransHuge(page)) {
+ if (!locked)
+ goto next_pageblock;
+ low_pfn += (1 << compound_order(page)) - 1;
+ continue;
+ }
+
+ /* Check if it is ok to still hold the lock */
+ locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
+ locked, cc);
+ if (!locked)
+ break;
+
+ /* Recheck PageLRU and PageTransHuge under lock */
+ if (!PageLRU(page))
+ continue;
+ if (PageTransHuge(page)) {
low_pfn += (1 << compound_order(page)) - 1;
continue;
}
@@ -393,6 +408,14 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
++low_pfn;
break;
}
+
+ continue;
+
+next_pageblock:
+ low_pfn += pageblock_nr_pages;
+ low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
+ last_pageblock_nr = pageblock_nr;
+
}
acct_isolated(zone, locked, cc);
^ permalink raw reply related [flat|nested] 101+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
2012-09-15 15:55 ` Richard Davies
@ 2012-09-17 13:50 ` Rik van Riel
-1 siblings, 0 replies; 101+ messages in thread
From: Rik van Riel @ 2012-09-17 13:50 UTC (permalink / raw)
To: Richard Davies
Cc: Mel Gorman, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
On 09/15/2012 11:55 AM, Richard Davies wrote:
> Hi Rik, Mel and Shaohua,
>
> Thank you for your latest patches. I attach my latest perf report for a slow
> boot with all of these applied.
>
> Mel asked for timings of the slow boots. It's very hard to give anything
> useful here! A normal boot would be a minute or so, and many are like that,
> but the slowest that I have seen (on 3.5.x) was several hours. Basically, I
> just test many times until I get one which is noticeably slower than normal
> and then run perf record on that one.
>
> The latest perf report for a slow boot is below. For the fast boots, most of
> the time is in clear_page_c in do_huge_pmd_anonymous_page, but for this slow
> one there is a lot of lock contention above that.
How often do you run into slow boots, vs. fast ones?
> # Overhead Command Shared Object Symbol
> # ........ ............... .................... ..............................................
> #
> 58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> |
> --- _raw_spin_lock_irqsave
> |
> |--95.07%-- compact_checklock_irqsave
> | |
> | |--70.03%-- isolate_migratepages_range
> | | compact_zone
> | | compact_zone_order
> | | try_to_compact_pages
> | | __alloc_pages_direct_compact
> | | __alloc_pages_nodemask
Looks like the contention moved from isolate_freepages_block in your
last trace to isolate_migratepages_range?
Mel, I wonder if we have any quadratic complexity problems
in this part of the code, too?
The isolate_freepages_block CPU use can be fixed by simply
restarting where the last invocation left off, instead of
always starting at the end of the zone. Could we need
something similar for isolate_migratepages_range?
After all, Richard has a 128GB system, and runs 108GB worth
of KVM guests on it...
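The "restart where the last invocation left off" idea can be sketched with a toy model: cache the pfn where the previous free-page scan stopped and resume from there, instead of rescanning the whole zone from the top every time. This is a minimal illustration under stated assumptions, not the kernel's actual implementation; `zone_free`, `free_scan_cursor`, and `isolate_one_free_page` are invented names for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model only: a "zone" of 64 pages, scanned downwards for free
 * pages. The cached cursor is what removes the quadratic rescanning:
 * each call resumes where the previous one stopped.
 */
#define ZONE_PAGES 64

static bool zone_free[ZONE_PAGES];            /* which toy pages are free */
static size_t free_scan_cursor = ZONE_PAGES;  /* cached resume point */
static size_t pages_scanned;                  /* instrumentation counter */

/* Scan downwards for one free page, resuming from the cached cursor. */
static long isolate_one_free_page(void)
{
    while (free_scan_cursor > 0) {
        free_scan_cursor--;
        pages_scanned++;
        if (zone_free[free_scan_cursor]) {
            zone_free[free_scan_cursor] = false;  /* "isolate" the page */
            return (long)free_scan_cursor;
        }
    }
    return -1;  /* zone exhausted */
}
```

Without the cached cursor, every invocation would restart at `ZONE_PAGES` and re-walk the already-scanned (and already-emptied) region, which is where the quadratic behaviour comes from.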
--
All rights reversed
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
2012-09-17 13:50 ` [Qemu-devel] " Rik van Riel
(?)
@ 2012-09-17 14:07 ` Mel Gorman
-1 siblings, 0 replies; 101+ messages in thread
From: Mel Gorman @ 2012-09-17 14:07 UTC (permalink / raw)
To: Rik van Riel
Cc: Richard Davies, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
On Mon, Sep 17, 2012 at 09:50:08AM -0400, Rik van Riel wrote:
> On 09/15/2012 11:55 AM, Richard Davies wrote:
> >Hi Rik, Mel and Shaohua,
> >
> >Thank you for your latest patches. I attach my latest perf report for a slow
> >boot with all of these applied.
> >
> >Mel asked for timings of the slow boots. It's very hard to give anything
> >useful here! A normal boot would be a minute or so, and many are like that,
> >but the slowest that I have seen (on 3.5.x) was several hours. Basically, I
> >just test many times until I get one which is noticeably slower than normal
> >and then run perf record on that one.
> >
> >The latest perf report for a slow boot is below. For the fast boots, most of
> >the time is in clear_page_c in do_huge_pmd_anonymous_page, but for this slow
> >one there is a lot of lock contention above that.
>
> How often do you run into slow boots, vs. fast ones?
>
> ># Overhead Command Shared Object Symbol
> ># ........ ............... .................... ..............................................
> >#
> > 58.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> > |
> > --- _raw_spin_lock_irqsave
> > |
> > |--95.07%-- compact_checklock_irqsave
> > | |
> > | |--70.03%-- isolate_migratepages_range
> > | | compact_zone
> > | | compact_zone_order
> > | | try_to_compact_pages
> > | | __alloc_pages_direct_compact
> > | | __alloc_pages_nodemask
>
> Looks like it moved from isolate_freepages_block in your last
> trace, to isolate_migratepages_range?
>
> Mel, I wonder if we have any quadratic complexity problems
> in this part of the code, too?
>
Possibly but right now I'm focusing on the contention even though I recognise
that reducing the amount of scanning implicitly reduces the amount of
contention. I'm running a test at the moment with an additional patch
to record the pageblock being scanned by either the free or migrate page
scanner. This should be enough to both calculate the scanning efficiency
and how many useless blocks are scanned to determine if your "skip"
patches are behaving as expected and from there decide if the migrate
scanner needs similar logic.
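The instrumentation Mel describes, recording pageblocks visited by each scanner to derive scanning efficiency and the count of useless blocks, can be sketched as a small accounting structure. This is an illustrative toy under stated assumptions, not the patch Mel was testing; `scan_stats`, `account_block`, and `scan_efficiency_pct` are invented names.

```c
#include <stddef.h>

/*
 * Toy accounting: one record per scanner (free or migrate). A block
 * that yields zero isolated pages is "useless"; efficiency is pages
 * isolated per page visited, as a percentage.
 */
struct scan_stats {
    unsigned long blocks_scanned;
    unsigned long blocks_useless;   /* blocks yielding no isolations */
    unsigned long pages_isolated;
};

/* Record the outcome of scanning one pageblock. */
static void account_block(struct scan_stats *st, unsigned long isolated)
{
    st->blocks_scanned++;
    if (isolated == 0)
        st->blocks_useless++;
    st->pages_isolated += isolated;
}

/* Scanning efficiency in percent: isolated pages per page visited. */
static unsigned long scan_efficiency_pct(const struct scan_stats *st,
                                         unsigned long pages_per_block)
{
    if (st->blocks_scanned == 0)
        return 0;
    return (st->pages_isolated * 100) /
           (st->blocks_scanned * pages_per_block);
}
```

A low efficiency figure combined with a high useless-block count would indicate that the "skip" patches are not avoiding enough unsuitable pageblocks.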
--
Mel Gorman
SUSE Labs
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
2012-09-17 12:26 ` [Qemu-devel] " Mel Gorman
@ 2012-09-18 8:14 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-18 8:14 UTC (permalink / raw)
To: Mel Gorman
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
Hi Mel,
Thanks for your latest patch, I attach a perf report below with this on top
of all previous patches. There is still lock contention, though in a
different place.
Regarding Rik's question:
> > Mel asked for timings of the slow boots. It's very hard to give anything
> > useful here! A normal boot would be a minute or so, and many are like
> > that, but the slowest that I have seen (on 3.5.x) was several hours.
> > Basically, I just test many times until I get one which is noticeably
> > slower than normal and then run perf record on that one.
> >
> > The latest perf report for a slow boot is below. For the fast boots,
> > most of the time is in clear_page_c in do_huge_pmd_anonymous_page, but
> > for this slow one there is a lot of lock contention above that.
>
> How often do you run into slow boots, vs. fast ones?
It is about 1/3rd slow boots, some of which are slower than others. I do
about ten and send you the trace of the worst.
Experimentally, copying large files (the VM image files) immediately before
booting the VM seems to make a slow boot more likely.
Thanks,
Richard.
# ========
# captured on: Mon Sep 17 20:09:33 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 4M of event 'cycles'
# Event count (approx.): 1616311320818
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
#
59.97% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|
--- _raw_spin_lock_irqsave
|
|--99.30%-- compact_checklock_irqsave
| |
| |--99.98%-- compaction_alloc
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--84.28%-- 0x10100000006
| | |
| | --15.72%-- 0x10100000002
| --0.02%-- [...]
|
|--0.65%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--83.37%-- 0x10100000006
| |
| --16.63%-- 0x10100000002
--0.05%-- [...]
12.27% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
|
--- isolate_freepages_block
|
|--99.99%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--82.90%-- 0x10100000006
| |
| --17.10%-- 0x10100000002
--0.01%-- [...]
7.90% qemu-kvm [kernel.kallsyms] [k] clear_page_c
|
--- clear_page_c
|
|--99.19%-- do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--64.93%-- 0x10100000006
| |
| --35.07%-- 0x10100000002
|
--0.81%-- __alloc_pages_nodemask
|
|--84.23%-- alloc_pages_vma
| handle_pte_fault
| |
| |--99.62%-- handle_mm_fault
| | |
| | |--99.74%-- __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--76.24%-- 0x10100000006
| | | |
| | | --23.76%-- 0x10100000002
| | --0.26%-- [...]
| --0.38%-- [...]
|
--15.77%-- alloc_pages_current
pte_alloc_one
|
|--97.49%-- do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--57.31%-- 0x10100000006
| |
| --42.69%-- 0x10100000002
|
--2.51%-- __pte_alloc
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--61.90%-- 0x10100000006
|
--38.10%-- 0x10100000002
2.66% ksmd [kernel.kallsyms] [k] smp_call_function_many
|
--- smp_call_function_many
|
|--99.99%-- native_flush_tlb_others
| |
| |--99.79%-- flush_tlb_page
| | ptep_clear_flush
| | try_to_merge_with_ksm_page
| | ksm_scan_thread
| | kthread
| | kernel_thread_helper
| --0.21%-- [...]
--0.01%-- [...]
1.62% qemu-kvm [kernel.kallsyms] [k] yield_to
|
--- yield_to
|
|--99.58%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--77.42%-- 0x10100000006
| |
| --22.58%-- 0x10100000002
--0.42%-- [...]
1.17% ksmd [kernel.kallsyms] [k] memcmp
|
--- memcmp
|
|--99.65%-- memcmp_pages
| |
| |--78.67%-- ksm_scan_thread
| | kthread
| | kernel_thread_helper
| |
| --21.33%-- try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
--0.35%-- [...]
1.16% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
|
--- svm_vcpu_run
|
|--99.47%-- kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--74.69%-- 0x10100000006
| |
| --25.31%-- 0x10100000002
|
--0.53%-- kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--72.19%-- 0x10100000006
|
--27.81%-- 0x10100000002
1.09% swapper [kernel.kallsyms] [k] default_idle
|
--- default_idle
|
|--99.73%-- cpu_idle
| |
| |--84.39%-- start_secondary
| |
| --15.61%-- rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
--0.27%-- [...]
0.85% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin
|
--- kvm_vcpu_on_spin
|
|--99.40%-- pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--76.92%-- 0x10100000006
| |
| --23.08%-- 0x10100000002
|
--0.60%-- handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--75.02%-- 0x10100000006
|
--24.98%-- 0x10100000002
0.60% qemu-kvm [kernel.kallsyms] [k] __srcu_read_lock
|
--- __srcu_read_lock
|
|--92.87%-- kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--76.37%-- 0x10100000006
| |
| --23.63%-- 0x10100000002
|
|--6.18%-- kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--74.92%-- 0x10100000006
| |
| --25.08%-- 0x10100000002
--0.95%-- [...]
0.60% qemu-kvm [kernel.kallsyms] [k] __rcu_read_unlock
|
--- __rcu_read_unlock
|
|--79.70%-- get_pid_task
| kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--75.95%-- 0x10100000006
| |
| --24.05%-- 0x10100000002
|
|--11.44%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--75.32%-- 0x10100000006
| |
| --24.68%-- 0x10100000002
|
|--3.51%-- kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--76.56%-- 0x10100000006
| |
| --23.44%-- 0x10100000002
|
|--1.88%-- do_select
| core_sys_select
| sys_select
| system_call_fastpath
| __select
| 0x0
|
|--1.30%-- fget_light
| |
| |--71.87%-- do_select
| | core_sys_select
| | sys_select
| | system_call_fastpath
| | __select
| | 0x0
| |
| |--15.50%-- sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--50.94%-- 0x10100000002
| | |
| | |--17.13%-- 0x2740310
| | | 0x0
| | |
| | |--13.07%-- 0x225c310
| | | 0x0
| | |
| | |--9.95%-- 0x2792310
| | | 0x0
| | |
| | |--3.64%-- 0x75ed8548202c4b83
| | |
| | |--1.87%-- 0x8800000
| | | 0x26433c0
| | |
| | |--1.79%-- 0x10100000006
| | |
| | |--0.95%-- 0x19800000
| | | 0x26953c0
| | |
| | --0.67%-- 0x24bc8b4400000098
| |
| |--7.32%-- sys_read
| | system_call_fastpath
| | read
| | |
| | --100.00%-- pthread_mutex_lock@plt
| |
| |--4.03%-- sys_write
| | system_call_fastpath
| | write
| | |
| | --100.00%-- 0x0
| |
| |--0.69%-- sys_pread64
| | system_call_fastpath
| | pread64
| | 0x269d260
| | 0x80
| | 0x480050b9e1058b48
| --0.59%-- [...]
--2.18%-- [...]
0.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|--50.00%-- yield_to
| kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--77.93%-- 0x10100000006
| |
| --22.07%-- 0x10100000002
|
|--11.97%-- free_pcppages_bulk
| |
| |--67.09%-- free_hot_cold_page
| | |
| | |--87.14%-- free_hot_cold_page_list
| | | |
| | | |--62.82%-- shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--77.85%-- 0x10100000006
| | | | |
| | | | --22.15%-- 0x10100000002
| | | |
| | | --37.18%-- release_pages
| | | pagevec_lru_move_fn
| | | __pagevec_lru_add
| | | |
| | | |--99.76%-- __lru_cache_add
| | | | lru_cache_add_lru
| | | | putback_lru_page
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--80.37%-- 0x10100000006
| | | | |
| | | | --19.63%-- 0x10100000002
| | | --0.24%-- [...]
| | |
| | |--10.98%-- __free_pages
| | | |
| | | |--98.77%-- release_freepages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--80.81%-- 0x10100000006
| | | | |
| | | | --19.19%-- 0x10100000002
| | | |
| | | --1.23%-- __free_slab
| | | discard_slab
| | | unfreeze_partials
| | | put_cpu_partial
| | | __slab_free
| | | kmem_cache_free
| | | free_buffer_head
| | | try_to_free_buffers
| | | jbd2_journal_try_to_free_buffers
| | | ext4_releasepage
| | | try_to_release_page
| | | shrink_page_list
| | | shrink_inactive_list
| | | shrink_lruvec
| | | try_to_free_pages
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--57.92%-- 0x10100000006
| | | |
| | | --42.08%-- 0x10100000002
| | |
| | --1.88%-- __put_single_page
| | put_page
| | putback_lru_page
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--62.44%-- 0x10100000006
| | |
| | --37.56%-- 0x10100000002
| |
| --32.91%-- drain_pages
| |
| |--75.89%-- drain_local_pages
| | |
| | |--89.98%-- generic_smp_call_function_interrupt
| | | smp_call_function_interrupt
| | | call_function_interrupt
| | | |
| | | |--44.57%-- compaction_alloc
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--79.27%-- 0x10100000006
| | | | |
| | | | --20.73%-- 0x10100000002
| | | |
| | | |--16.92%-- kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--86.24%-- 0x10100000006
| | | | |
| | | | --13.76%-- 0x10100000002
| | | |
| | | |--5.39%-- do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--75.62%-- 0x10100000006
| | | | |
| | | | --24.38%-- 0x10100000002
| | | |
| | | |--3.26%-- buffer_migrate_page
| | | | move_to_new_page
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--85.62%-- 0x10100000006
| | | | |
| | | | --14.38%-- 0x10100000002
| | | |
| | | |--3.21%-- __remove_mapping
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--78.75%-- 0x10100000006
| | | | |
| | | | --21.25%-- 0x10100000002
| | | |
| | | |--3.01%-- free_hot_cold_page_list
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--84.48%-- 0x10100000006
| | | | |
| | | | --15.52%-- 0x10100000002
| | | |
| | | |--2.25%-- try_to_free_buffers
| | | | jbd2_journal_try_to_free_buffers
| | | | ext4_releasepage
| | | | try_to_release_page
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--58.91%-- 0x10100000006
| | | | |
| | | | --41.09%-- 0x10100000002
| | | |
| | | |--2.07%-- compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--67.59%-- 0x10100000006
| | | | |
| | | | --32.41%-- 0x10100000002
| | | |
| | | |--1.80%-- native_flush_tlb_others
| | | | |
| | | | |--75.08%-- flush_tlb_page
| | | | | |
| | | | | |--82.69%-- ptep_clear_flush_young
| | | | | | page_referenced_one
| | | | | | page_referenced
| | | | | | shrink_active_list
| | | | | | shrink_lruvec
| | | | | | try_to_free_pages
| | | | | | __alloc_pages_nodemask
| | | | | | alloc_pages_vma
| | | | | | do_huge_pmd_anonymous_page
| | | | | | handle_mm_fault
| | | | | | __get_user_pages
| | | | | | get_user_page_nowait
| | | | | | hva_to_pfn.isra.17
| | | | | | __gfn_to_pfn
| | | | | | gfn_to_pfn_async
| | | | | | try_async_pf
| | | | | | tdp_page_fault
| | | | | | kvm_mmu_page_fault
| | | | | | pf_interception
| | | | | | handle_exit
| | | | | | kvm_arch_vcpu_ioctl_run
| | | | | | kvm_vcpu_ioctl
| | | | | | do_vfs_ioctl
| | | | | | sys_ioctl
| | | | | | system_call_fastpath
| | | | | | ioctl
| | | | | | |
| | | | | | |--78.99%-- 0x10100000006
| | | | | | |
| | | | | | --21.01%-- 0x10100000002
| | | | | |
| | | | | --17.31%-- ptep_clear_flush
| | | | | try_to_unmap_one
| | | | | try_to_unmap_anon
| | | | | try_to_unmap
| | | | | migrate_pages
| | | | | compact_zone
| | | | | compact_zone_order
| | | | | try_to_compact_pages
| | | | | __alloc_pages_direct_compact
| | | | | __alloc_pages_nodemask
| | | | | alloc_pages_vma
| | | | | do_huge_pmd_anonymous_page
| | | | | handle_mm_fault
| | | | | __get_user_pages
| | | | | get_user_page_nowait
| | | | | hva_to_pfn.isra.17
| | | | | __gfn_to_pfn
| | | | | gfn_to_pfn_async
| | | | | try_async_pf
| | | | | tdp_page_fault
| | | | | kvm_mmu_page_fault
| | | | | pf_interception
| | | | | handle_exit
| | | | | kvm_arch_vcpu_ioctl_run
| | | | | kvm_vcpu_ioctl
| | | | | do_vfs_ioctl
| | | | | sys_ioctl
| | | | | system_call_fastpath
| | | | | ioctl
| | | | | 0x10100000006
| | | | |
| | | | --24.92%-- flush_tlb_mm_range
| | | | pmdp_clear_flush_young
| | | | page_referenced_one
| | | | page_referenced
| | | | shrink_active_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* Re: [Qemu-devel] [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
@ 2012-09-18 8:14 ` Richard Davies
0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-18 8:14 UTC (permalink / raw)
To: Mel Gorman; +Cc: kvm, qemu-devel, linux-mm, Avi Kivity, Shaohua Li
Hi Mel,
Thanks for your latest patch. I attach a perf report below, taken with it
on top of all the previous patches. There is still lock contention, though
in a different place.
Regarding Rik's question:
> > Mel asked for timings of the slow boots. It's very hard to give anything
> > useful here! A normal boot would be a minute or so, and many are like
> > that, but the slowest that I have seen (on 3.5.x) was several hours.
> > Basically, I just test many times until I get one which is noticeably
> > slower than normal and then run perf record on that one.
> >
> > The latest perf report for a slow boot is below. For the fast boots,
> > most of the time is in clear_page_c in do_huge_pmd_anonymous_page, but
> > for this slow one there is a lot of lock contention above that.
>
> How often do you run into slow boots, vs. fast ones?
About one boot in three is slow, some much more so than others. I typically
run about ten boots and send you the trace of the worst.
Experimentally, copying large files (the VM image files) immediately before
booting the VM seems to make a slow boot more likely.
Thanks,
Richard.
# ========
# captured on: Mon Sep 17 20:09:33 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 4M of event 'cycles'
# Event count (approx.): 1616311320818
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
#
59.97% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|
--- _raw_spin_lock_irqsave
|
|--99.30%-- compact_checklock_irqsave
| |
| |--99.98%-- compaction_alloc
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--84.28%-- 0x10100000006
| | |
| | --15.72%-- 0x10100000002
| --0.02%-- [...]
|
|--0.65%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--83.37%-- 0x10100000006
| |
| --16.63%-- 0x10100000002
--0.05%-- [...]
12.27% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
|
--- isolate_freepages_block
|
|--99.99%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--82.90%-- 0x10100000006
| |
| --17.10%-- 0x10100000002
--0.01%-- [...]
7.90% qemu-kvm [kernel.kallsyms] [k] clear_page_c
|
--- clear_page_c
|
|--99.19%-- do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--64.93%-- 0x10100000006
| |
| --35.07%-- 0x10100000002
|
--0.81%-- __alloc_pages_nodemask
|
|--84.23%-- alloc_pages_vma
| handle_pte_fault
| |
| |--99.62%-- handle_mm_fault
| | |
| | |--99.74%-- __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--76.24%-- 0x10100000006
| | | |
| | | --23.76%-- 0x10100000002
| | --0.26%-- [...]
| --0.38%-- [...]
|
--15.77%-- alloc_pages_current
pte_alloc_one
|
|--97.49%-- do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--57.31%-- 0x10100000006
| |
| --42.69%-- 0x10100000002
|
--2.51%-- __pte_alloc
do_huge_pmd_anonymous_page
handle_mm_fault
__get_user_pages
get_user_page_nowait
hva_to_pfn.isra.17
__gfn_to_pfn
gfn_to_pfn_async
try_async_pf
tdp_page_fault
kvm_mmu_page_fault
pf_interception
handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--61.90%-- 0x10100000006
|
--38.10%-- 0x10100000002
2.66% ksmd [kernel.kallsyms] [k] smp_call_function_many
|
--- smp_call_function_many
|
|--99.99%-- native_flush_tlb_others
| |
| |--99.79%-- flush_tlb_page
| | ptep_clear_flush
| | try_to_merge_with_ksm_page
| | ksm_scan_thread
| | kthread
| | kernel_thread_helper
| --0.21%-- [...]
--0.01%-- [...]
1.62% qemu-kvm [kernel.kallsyms] [k] yield_to
|
--- yield_to
|
|--99.58%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--77.42%-- 0x10100000006
| |
| --22.58%-- 0x10100000002
--0.42%-- [...]
1.17% ksmd [kernel.kallsyms] [k] memcmp
|
--- memcmp
|
|--99.65%-- memcmp_pages
| |
| |--78.67%-- ksm_scan_thread
| | kthread
| | kernel_thread_helper
| |
| --21.33%-- try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
--0.35%-- [...]
1.16% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
|
--- svm_vcpu_run
|
|--99.47%-- kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--74.69%-- 0x10100000006
| |
| --25.31%-- 0x10100000002
|
--0.53%-- kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--72.19%-- 0x10100000006
|
--27.81%-- 0x10100000002
1.09% swapper [kernel.kallsyms] [k] default_idle
|
--- default_idle
|
|--99.73%-- cpu_idle
| |
| |--84.39%-- start_secondary
| |
| --15.61%-- rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
--0.27%-- [...]
0.85% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin
|
--- kvm_vcpu_on_spin
|
|--99.40%-- pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--76.92%-- 0x10100000006
| |
| --23.08%-- 0x10100000002
|
--0.60%-- handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--75.02%-- 0x10100000006
|
--24.98%-- 0x10100000002
0.60% qemu-kvm [kernel.kallsyms] [k] __srcu_read_lock
|
--- __srcu_read_lock
|
|--92.87%-- kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--76.37%-- 0x10100000006
| |
| --23.63%-- 0x10100000002
|
|--6.18%-- kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--74.92%-- 0x10100000006
| |
| --25.08%-- 0x10100000002
--0.95%-- [...]
0.60% qemu-kvm [kernel.kallsyms] [k] __rcu_read_unlock
|
--- __rcu_read_unlock
|
|--79.70%-- get_pid_task
| kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--75.95%-- 0x10100000006
| |
| --24.05%-- 0x10100000002
|
|--11.44%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--75.32%-- 0x10100000006
| |
| --24.68%-- 0x10100000002
|
|--3.51%-- kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--76.56%-- 0x10100000006
| |
| --23.44%-- 0x10100000002
|
|--1.88%-- do_select
| core_sys_select
| sys_select
| system_call_fastpath
| __select
| 0x0
|
|--1.30%-- fget_light
| |
| |--71.87%-- do_select
| | core_sys_select
| | sys_select
| | system_call_fastpath
| | __select
| | 0x0
| |
| |--15.50%-- sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--50.94%-- 0x10100000002
| | |
| | |--17.13%-- 0x2740310
| | | 0x0
| | |
| | |--13.07%-- 0x225c310
| | | 0x0
| | |
| | |--9.95%-- 0x2792310
| | | 0x0
| | |
| | |--3.64%-- 0x75ed8548202c4b83
| | |
| | |--1.87%-- 0x8800000
| | | 0x26433c0
| | |
| | |--1.79%-- 0x10100000006
| | |
| | |--0.95%-- 0x19800000
| | | 0x26953c0
| | |
| | --0.67%-- 0x24bc8b4400000098
| |
| |--7.32%-- sys_read
| | system_call_fastpath
| | read
| | |
| | --100.00%-- pthread_mutex_lock@plt
| |
| |--4.03%-- sys_write
| | system_call_fastpath
| | write
| | |
| | --100.00%-- 0x0
| |
| |--0.69%-- sys_pread64
| | system_call_fastpath
| | pread64
| | 0x269d260
| | 0x80
| | 0x480050b9e1058b48
| --0.59%-- [...]
--2.18%-- [...]
0.49% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|--50.00%-- yield_to
| kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--77.93%-- 0x10100000006
| |
| --22.07%-- 0x10100000002
|
|--11.97%-- free_pcppages_bulk
| |
| |--67.09%-- free_hot_cold_page
| | |
| | |--87.14%-- free_hot_cold_page_list
| | | |
| | | |--62.82%-- shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--77.85%-- 0x10100000006
| | | | |
| | | | --22.15%-- 0x10100000002
| | | |
| | | --37.18%-- release_pages
| | | pagevec_lru_move_fn
| | | __pagevec_lru_add
| | | |
| | | |--99.76%-- __lru_cache_add
| | | | lru_cache_add_lru
| | | | putback_lru_page
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--80.37%-- 0x10100000006
| | | | |
| | | | --19.63%-- 0x10100000002
| | | --0.24%-- [...]
| | |
| | |--10.98%-- __free_pages
| | | |
| | | |--98.77%-- release_freepages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--80.81%-- 0x10100000006
| | | | |
| | | | --19.19%-- 0x10100000002
| | | |
| | | --1.23%-- __free_slab
| | | discard_slab
| | | unfreeze_partials
| | | put_cpu_partial
| | | __slab_free
| | | kmem_cache_free
| | | free_buffer_head
| | | try_to_free_buffers
| | | jbd2_journal_try_to_free_buffers
| | | ext4_releasepage
| | | try_to_release_page
| | | shrink_page_list
| | | shrink_inactive_list
| | | shrink_lruvec
| | | try_to_free_pages
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--57.92%-- 0x10100000006
| | | |
| | | --42.08%-- 0x10100000002
| | |
| | --1.88%-- __put_single_page
| | put_page
| | putback_lru_page
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
| | |--62.44%-- 0x10100000006
| | |
| | --37.56%-- 0x10100000002
| |
| --32.91%-- drain_pages
| |
| |--75.89%-- drain_local_pages
| | |
| | |--89.98%-- generic_smp_call_function_interrupt
| | | smp_call_function_interrupt
| | | call_function_interrupt
| | | |
| | | |--44.57%-- compaction_alloc
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--79.27%-- 0x10100000006
| | | | |
| | | | --20.73%-- 0x10100000002
| | | |
| | | |--16.92%-- kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--86.24%-- 0x10100000006
| | | | |
| | | | --13.76%-- 0x10100000002
| | | |
| | | |--5.39%-- do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--75.62%-- 0x10100000006
| | | | |
| | | | --24.38%-- 0x10100000002
| | | |
| | | |--3.26%-- buffer_migrate_page
| | | | move_to_new_page
| | | | migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--85.62%-- 0x10100000006
| | | | |
| | | | --14.38%-- 0x10100000002
| | | |
| | | |--3.21%-- __remove_mapping
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--78.75%-- 0x10100000006
| | | | |
| | | | --21.25%-- 0x10100000002
| | | |
| | | |--3.01%-- free_hot_cold_page_list
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--84.48%-- 0x10100000006
| | | | |
| | | | --15.52%-- 0x10100000002
| | | |
| | | |--2.25%-- try_to_free_buffers
| | | | jbd2_journal_try_to_free_buffers
| | | | ext4_releasepage
| | | | try_to_release_page
| | | | shrink_page_list
| | | | shrink_inactive_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--58.91%-- 0x10100000006
| | | | |
| | | | --41.09%-- 0x10100000002
| | | |
| | | |--2.07%-- compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--67.59%-- 0x10100000006
| | | | |
| | | | --32.41%-- 0x10100000002
| | | |
| | | |--1.80%-- native_flush_tlb_others
| | | | |
| | | | |--75.08%-- flush_tlb_page
| | | | | |
| | | | | |--82.69%-- ptep_clear_flush_young
| | | | | | page_referenced_one
| | | | | | page_referenced
| | | | | | shrink_active_list
| | | | | | shrink_lruvec
| | | | | | try_to_free_pages
| | | | | | __alloc_pages_nodemask
| | | | | | alloc_pages_vma
| | | | | | do_huge_pmd_anonymous_page
| | | | | | handle_mm_fault
| | | | | | __get_user_pages
| | | | | | get_user_page_nowait
| | | | | | hva_to_pfn.isra.17
| | | | | | __gfn_to_pfn
| | | | | | gfn_to_pfn_async
| | | | | | try_async_pf
| | | | | | tdp_page_fault
| | | | | | kvm_mmu_page_fault
| | | | | | pf_interception
| | | | | | handle_exit
| | | | | | kvm_arch_vcpu_ioctl_run
| | | | | | kvm_vcpu_ioctl
| | | | | | do_vfs_ioctl
| | | | | | sys_ioctl
| | | | | | system_call_fastpath
| | | | | | ioctl
| | | | | | |
| | | | | | |--78.99%-- 0x10100000006
| | | | | | |
| | | | | | --21.01%-- 0x10100000002
| | | | | |
| | | | | --17.31%-- ptep_clear_flush
| | | | | try_to_unmap_one
| | | | | try_to_unmap_anon
| | | | | try_to_unmap
| | | | | migrate_pages
| | | | | compact_zone
| | | | | compact_zone_order
| | | | | try_to_compact_pages
| | | | | __alloc_pages_direct_compact
| | | | | __alloc_pages_nodemask
| | | | | alloc_pages_vma
| | | | | do_huge_pmd_anonymous_page
| | | | | handle_mm_fault
| | | | | __get_user_pages
| | | | | get_user_page_nowait
| | | | | hva_to_pfn.isra.17
| | | | | __gfn_to_pfn
| | | | | gfn_to_pfn_async
| | | | | try_async_pf
| | | | | tdp_page_fault
| | | | | kvm_mmu_page_fault
| | | | | pf_interception
| | | | | handle_exit
| | | | | kvm_arch_vcpu_ioctl_run
| | | | | kvm_vcpu_ioctl
| | | | | do_vfs_ioctl
| | | | | sys_ioctl
| | | | | system_call_fastpath
| | | | | ioctl
| | | | | 0x10100000006
| | | | |
| | | | --24.92%-- flush_tlb_mm_range
| | | | pmdp_clear_flush_young
| | | | page_referenced_one
| | | | page_referenced
| | | | shrink_active_list
| | | | shrink_lruvec
| | | | try_to_free_pages
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
2012-09-18 8:14 ` [Qemu-devel] " Richard Davies
@ 2012-09-18 11:21 ` Mel Gorman
-1 siblings, 0 replies; 101+ messages in thread
From: Mel Gorman @ 2012-09-18 11:21 UTC (permalink / raw)
To: Richard Davies
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
On Tue, Sep 18, 2012 at 09:14:55AM +0100, Richard Davies wrote:
> Hi Mel,
>
> Thanks for your latest patch, I attach a perf report below with this on top
> of all previous patches. There is still lock contention, though in a
> different place.
>
> 59.97% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> |
> --- _raw_spin_lock_irqsave
> |
> |--99.30%-- compact_checklock_irqsave
> | |
> | |--99.98%-- compaction_alloc
Ok, this just means the focus has moved to the zone->lock instead of the
zone->lru_lock. This was expected to some extent. This is an additional
patch that defers acquisition of the zone->lock for as long as possible.
Incidentally, I checked the efficiency of compaction - i.e. how many pages
are scanned versus how many pages are isolated - and the efficiency
completely sucks. That must be addressed, but addressing the lock
contention should happen first.
---8<---
mm: compaction: Acquire the zone->lock as late as possible
The zone lock is required when isolating pages to allocate and for checking
PageBuddy. It is a coarse-grained lock but the current implementation
acquires the lock when examining each pageblock before it is known if there
are free pages to isolate. This patch defers acquiring the zone lock for
as long as possible. In the event there are no free pages in the pageblock
then the lock will not be acquired at all.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/compaction.c | 80 ++++++++++++++++++++++++++++++++-----------------------
1 file changed, 47 insertions(+), 33 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index a5d698f..57ff9ef 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -89,19 +89,14 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
return true;
}
-static inline bool compact_trylock_irqsave(spinlock_t *lock,
- unsigned long *flags, struct compact_control *cc)
-{
- return compact_checklock_irqsave(lock, flags, false, cc);
-}
-
/*
* Isolate free pages onto a private freelist. Caller must hold zone->lock.
* If @strict is true, will abort returning 0 on any invalid PFNs or non-free
* pages inside of the pageblock (even though it may still end up isolating
* some pages).
*/
-static unsigned long isolate_freepages_block(unsigned long start_pfn,
+static unsigned long isolate_freepages_block(struct compact_control *cc,
+ unsigned long start_pfn,
unsigned long end_pfn,
struct list_head *freelist,
bool strict)
@@ -109,6 +104,8 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn,
int nr_scanned = 0, total_isolated = 0;
unsigned long blockpfn = start_pfn;
struct page *cursor;
+ unsigned long flags;
+ bool locked = false;
cursor = pfn_to_page(blockpfn);
@@ -117,18 +114,29 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn,
int isolated, i;
struct page *page = cursor;
- if (!pfn_valid_within(blockpfn)) {
- if (strict)
- return 0;
- continue;
- }
+ if (!pfn_valid_within(blockpfn))
+ goto strict_check;
nr_scanned++;
- if (!PageBuddy(page)) {
- if (strict)
- return 0;
- continue;
- }
+ if (!PageBuddy(page))
+ goto strict_check;
+
+ /*
+ * The zone lock must be held to isolate freepages.
+ * Unfortunately this is a very coarse lock and can be
+ * heavily contended if there are parallel allocations
+ * or parallel compactions. For async compaction we do
+ * not spin on the lock, and we acquire the lock as late
+ * as possible.
+ */
+ locked = compact_checklock_irqsave(&cc->zone->lock, &flags,
+ locked, cc);
+ if (!locked)
+ break;
+
+ /* Recheck this is a buddy page under lock */
+ if (!PageBuddy(page))
+ goto strict_check;
/* Found a free page, break it into order-0 pages */
isolated = split_free_page(page);
@@ -145,10 +153,24 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn,
blockpfn += isolated - 1;
cursor += isolated - 1;
}
+
+ continue;
+
+strict_check:
+ /* Abort isolation if the caller requested strict isolation */
+ if (strict) {
+ total_isolated = 0;
+ goto out;
+ }
}
trace_mm_compaction_isolate_freepages(start_pfn, nr_scanned,
total_isolated);
+
+out:
+ if (locked)
+ spin_unlock_irqrestore(&cc->zone->lock, flags);
+
return total_isolated;
}
@@ -168,13 +190,18 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn,
unsigned long
isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
{
- unsigned long isolated, pfn, block_end_pfn, flags;
+ unsigned long isolated, pfn, block_end_pfn;
struct zone *zone = NULL;
LIST_HEAD(freelist);
+ struct compact_control cc;
if (pfn_valid(start_pfn))
zone = page_zone(pfn_to_page(start_pfn));
+ /* cc needed for isolate_freepages_block to acquire zone->lock */
+ cc.zone = zone;
+ cc.sync = true;
+
for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
if (!pfn_valid(pfn) || zone != page_zone(pfn_to_page(pfn)))
break;
@@ -186,10 +213,8 @@ isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
block_end_pfn = min(block_end_pfn, end_pfn);
- spin_lock_irqsave(&zone->lock, flags);
- isolated = isolate_freepages_block(pfn, block_end_pfn,
+ isolated = isolate_freepages_block(&cc, pfn, block_end_pfn,
&freelist, true);
- spin_unlock_irqrestore(&zone->lock, flags);
/*
* In strict mode, isolate_freepages_block() returns 0 if
@@ -480,7 +505,6 @@ static void isolate_freepages(struct zone *zone,
{
struct page *page;
unsigned long high_pfn, low_pfn, pfn, zone_end_pfn, end_pfn;
- unsigned long flags;
int nr_freepages = cc->nr_freepages;
struct list_head *freelist = &cc->freepages;
@@ -536,22 +560,12 @@ static void isolate_freepages(struct zone *zone,
*/
isolated = 0;
- /*
- * The zone lock must be held to isolate freepages. This
- * unfortunately this is a very coarse lock and can be
- * heavily contended if there are parallel allocations
- * or parallel compactions. For async compaction do not
- * spin on the lock
- */
- if (!compact_trylock_irqsave(&zone->lock, &flags, cc))
- break;
if (suitable_migration_target(page)) {
end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
- isolated = isolate_freepages_block(pfn, end_pfn,
+ isolated = isolate_freepages_block(cc, pfn, end_pfn,
freelist, false);
nr_freepages += isolated;
}
- spin_unlock_irqrestore(&zone->lock, flags);
/*
* Record the highest PFN we isolated pages from. When next
^ permalink raw reply related [flat|nested] 101+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
@ 2012-09-18 11:21 ` Mel Gorman
0 siblings, 0 replies; 101+ messages in thread
From: Mel Gorman @ 2012-09-18 11:21 UTC (permalink / raw)
To: Richard Davies
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
On Tue, Sep 18, 2012 at 09:14:55AM +0100, Richard Davies wrote:
> Hi Mel,
>
> Thanks for your latest patch, I attach a perf report below with this on top
> of all previous patches. There is still lock contention, though in a
> different place.
>
> 59.97% qemu-kvm [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> |
> --- _raw_spin_lock_irqsave
> |
> |--99.30%-- compact_checklock_irqsave
> | |
> | |--99.98%-- compaction_alloc
Ok, this just means the focus has moved to the zone->lock instead of the
zone->lru_lock. This was expected to some extent. This is an additional
patch that defers acquisition of the zone->lock for as long as possible.
Incidentally, I checked the efficiency of compaction - i.e. how many
pages are scanned versus how many pages are isolated - and the
efficiency completely sucks. It must be addressed, but addressing the
lock contention should happen first.
---8<---
mm: compaction: Acquire the zone->lock as late as possible
The zone lock is required when isolating pages to allocate and for checking
PageBuddy. It is a coarse-grained lock but the current implementation
acquires the lock when examining each pageblock before it is known if there
are free pages to isolate. This patch defers acquiring the zone lock for
as long as possible. In the event there are no free pages in the pageblock
then the lock will not be acquired at all.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/compaction.c | 80 ++++++++++++++++++++++++++++++++-----------------------
1 file changed, 47 insertions(+), 33 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index a5d698f..57ff9ef 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -89,19 +89,14 @@ static bool compact_checklock_irqsave(spinlock_t *lock, unsigned long *flags,
return true;
}
-static inline bool compact_trylock_irqsave(spinlock_t *lock,
- unsigned long *flags, struct compact_control *cc)
-{
- return compact_checklock_irqsave(lock, flags, false, cc);
-}
-
/*
* Isolate free pages onto a private freelist. Caller must hold zone->lock.
* If @strict is true, will abort returning 0 on any invalid PFNs or non-free
* pages inside of the pageblock (even though it may still end up isolating
* some pages).
*/
-static unsigned long isolate_freepages_block(unsigned long start_pfn,
+static unsigned long isolate_freepages_block(struct compact_control *cc,
+ unsigned long start_pfn,
unsigned long end_pfn,
struct list_head *freelist,
bool strict)
@@ -109,6 +104,8 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn,
int nr_scanned = 0, total_isolated = 0;
unsigned long blockpfn = start_pfn;
struct page *cursor;
+ unsigned long flags;
+ bool locked = false;
cursor = pfn_to_page(blockpfn);
@@ -117,18 +114,29 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn,
int isolated, i;
struct page *page = cursor;
- if (!pfn_valid_within(blockpfn)) {
- if (strict)
- return 0;
- continue;
- }
+ if (!pfn_valid_within(blockpfn))
+ goto strict_check;
nr_scanned++;
- if (!PageBuddy(page)) {
- if (strict)
- return 0;
- continue;
- }
+ if (!PageBuddy(page))
+ goto strict_check;
+
+ /*
+ * The zone lock must be held to isolate freepages.
+ * Unfortunately this is a very coarse lock and can be
+ * heavily contended if there are parallel allocations
+ * or parallel compactions. For async compaction, do not
+ * spin on the lock; acquire the lock as late as
+ * possible.
+ */
+ locked = compact_checklock_irqsave(&cc->zone->lock, &flags,
+ locked, cc);
+ if (!locked)
+ break;
+
+ /* Recheck this is a buddy page under lock */
+ if (!PageBuddy(page))
+ goto strict_check;
/* Found a free page, break it into order-0 pages */
isolated = split_free_page(page);
@@ -145,10 +153,24 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn,
blockpfn += isolated - 1;
cursor += isolated - 1;
}
+
+ continue;
+
+strict_check:
+ /* Abort isolation if the caller requested strict isolation */
+ if (strict) {
+ total_isolated = 0;
+ goto out;
+ }
}
trace_mm_compaction_isolate_freepages(start_pfn, nr_scanned,
total_isolated);
+
+out:
+ if (locked)
+ spin_unlock_irqrestore(&cc->zone->lock, flags);
+
return total_isolated;
}
@@ -168,13 +190,18 @@ static unsigned long isolate_freepages_block(unsigned long start_pfn,
unsigned long
isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
{
- unsigned long isolated, pfn, block_end_pfn, flags;
+ unsigned long isolated, pfn, block_end_pfn;
struct zone *zone = NULL;
LIST_HEAD(freelist);
+ struct compact_control cc;
if (pfn_valid(start_pfn))
zone = page_zone(pfn_to_page(start_pfn));
+ /* cc needed for isolate_freepages_block to acquire zone->lock */
+ cc.zone = zone;
+ cc.sync = true;
+
for (pfn = start_pfn; pfn < end_pfn; pfn += isolated) {
if (!pfn_valid(pfn) || zone != page_zone(pfn_to_page(pfn)))
break;
@@ -186,10 +213,8 @@ isolate_freepages_range(unsigned long start_pfn, unsigned long end_pfn)
block_end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
block_end_pfn = min(block_end_pfn, end_pfn);
- spin_lock_irqsave(&zone->lock, flags);
- isolated = isolate_freepages_block(pfn, block_end_pfn,
+ isolated = isolate_freepages_block(&cc, pfn, block_end_pfn,
&freelist, true);
- spin_unlock_irqrestore(&zone->lock, flags);
/*
* In strict mode, isolate_freepages_block() returns 0 if
@@ -480,7 +505,6 @@ static void isolate_freepages(struct zone *zone,
{
struct page *page;
unsigned long high_pfn, low_pfn, pfn, zone_end_pfn, end_pfn;
- unsigned long flags;
int nr_freepages = cc->nr_freepages;
struct list_head *freelist = &cc->freepages;
@@ -536,22 +560,12 @@ static void isolate_freepages(struct zone *zone,
*/
isolated = 0;
- /*
- * The zone lock must be held to isolate freepages. This
- * unfortunately this is a very coarse lock and can be
- * heavily contended if there are parallel allocations
- * or parallel compactions. For async compaction do not
- * spin on the lock
- */
- if (!compact_trylock_irqsave(&zone->lock, &flags, cc))
- break;
if (suitable_migration_target(page)) {
end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
- isolated = isolate_freepages_block(pfn, end_pfn,
+ isolated = isolate_freepages_block(cc, pfn, end_pfn,
freelist, false);
nr_freepages += isolated;
}
- spin_unlock_irqrestore(&zone->lock, flags);
/*
* Record the highest PFN we isolated pages from. When next
--
^ permalink raw reply related [flat|nested] 101+ messages in thread
* Re: Windows slow boot
2012-08-16 10:47 ` [Qemu-devel] " Richard Davies
@ 2012-09-18 15:12 ` Michael Tokarev
-1 siblings, 0 replies; 101+ messages in thread
From: Michael Tokarev @ 2012-09-18 15:12 UTC (permalink / raw)
To: Richard Davies; +Cc: qemu-devel, kvm
On 16.08.2012 14:47, Richard Davies wrote:
> http://marc.info/?l=qemu-devel&m=134304194329745
>
>
> We have been experiencing this problem for a while now too, using qemu-kvm
> (currently at 1.1.1).
>
> Unfortunately, hv_relaxed doesn't seem to fix it. The following command line
> produces the issue:
>
> qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test.img
Just one question: did you try explicitly using hugepages?
For that,
- reserve some hugepages (echo the desired page count > /proc/sys/vm/nr_hugepages),
- mount hugetlbfs somewhere, e.g. /dev/hugetlbfs,
- use the -mem-path /dev/hugetlbfs qemu option.
This may also reduce your lock contention. Sure, hugepages have some
downsides too, but I think it is worth trying anyway - for a single VM
or for a whole lot of VMs (in the latter case you'll have to reserve
much more memory after host boot).
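The steps above can be sketched as a command sequence. The page count of 2048 is an assumption, sized for a single 4GB guest on 2MB hugepages; the mount point is arbitrary:

```shell
# Reserve hugepages: 4096MB guest / 2MB per page = 2048 pages (assumed sizing)
echo 2048 > /proc/sys/vm/nr_hugepages

# Mount hugetlbfs at an arbitrary mount point
mkdir -p /dev/hugetlbfs
mount -t hugetlbfs none /dev/hugetlbfs

# Back guest RAM with hugepages; -mem-path takes the path as a
# separate argument
qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed \
    -mem-path /dev/hugetlbfs -hda test.img
```

With guest memory backed by hugepages up front, the guest's boot-time faulting no longer triggers THP allocation and compaction on the host, which is why this can sidestep the contention entirely.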
/mjt
^ permalink raw reply [flat|nested] 101+ messages in thread
* Re: [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
2012-09-18 11:21 ` Mel Gorman
@ 2012-09-18 17:58 ` Richard Davies
-1 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-18 17:58 UTC (permalink / raw)
To: Mel Gorman
Cc: Rik van Riel, Avi Kivity, Shaohua Li, qemu-devel, kvm, linux-mm
Mel Gorman wrote:
> Ok, this just means the focus has moved to the zone->lock instead of the
> zone->lru_lock. This was expected to some extent. This is an additional
> patch that defers acquisition of the zone->lock for as long as possible.
And I believe you have now beaten the lock contention - congratulations!
> Incidentally, I checked the efficiency of compaction - i.e. how many
> pages scanned versus how many pages isolated and the efficiency
> completely sucks. It must be addressed but addressing the lock
> contention should happen first.
Yes, compaction is now definitely at the top of the profile.
Interestingly, some boots still seem "slow" and some "fast", even without
any lock contention issues. Here are traces from a few different runs, and I
attach the detailed report for the first of these, which was one of the slow
ones.
# grep -F '[k]' report.1 | head -8
55.86% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
14.98% qemu-kvm [kernel.kallsyms] [k] clear_page_c
2.18% qemu-kvm [kernel.kallsyms] [k] yield_to
1.67% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group
1.66% qemu-kvm [kernel.kallsyms] [k] compact_zone
1.56% ksmd [kernel.kallsyms] [k] memcmp
1.48% swapper [kernel.kallsyms] [k] default_idle
1.33% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
#
# grep -F '[k]' report.2 | head -8
38.28% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
7.58% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group
7.03% qemu-kvm [kernel.kallsyms] [k] clear_page_c
4.72% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range
4.31% qemu-kvm [kernel.kallsyms] [k] copy_page_c
4.15% qemu-kvm [kernel.kallsyms] [k] compact_zone
2.68% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok
2.65% qemu-kvm [kernel.kallsyms] [k] yield_to
#
# grep -F '[k]' report.3 | head -8
75.18% qemu-kvm [kernel.kallsyms] [k] clear_page_c
1.82% swapper [kernel.kallsyms] [k] default_idle
1.29% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
1.27% qemu-kvm [kernel.kallsyms] [k] get_page_from_freelist
1.20% ksmd [kernel.kallsyms] [k] memcmp
0.83% qemu-kvm [kernel.kallsyms] [k] free_pages_prepare
0.78% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
0.59% qemu-kvm [kernel.kallsyms] [k] prep_compound_page
#
# grep -F '[k]' report.4 | head -8
41.02% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
32.20% qemu-kvm [kernel.kallsyms] [k] clear_page_c
1.76% qemu-kvm [kernel.kallsyms] [k] yield_to
1.37% swapper [kernel.kallsyms] [k] default_idle
1.35% ksmd [kernel.kallsyms] [k] memcmp
1.27% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
1.23% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group
0.88% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin
#
# grep -F '[k]' report.5 | head -8
61.18% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
14.55% qemu-kvm [kernel.kallsyms] [k] clear_page_c
1.75% qemu-kvm [kernel.kallsyms] [k] yield_to
1.31% ksmd [kernel.kallsyms] [k] memcmp
1.21% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
1.20% swapper [kernel.kallsyms] [k] default_idle
1.14% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group
0.94% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin
Here is the detailed report for the first of these:
# ========
# captured on: Tue Sep 18 17:03:40 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 3M of event 'cycles'
# Event count (approx.): 1184064513533
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
#
55.86% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
|
--- isolate_freepages_block
|
|--99.99%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--88.73%-- 0x10100000006
| |
| --11.27%-- 0x10100000002
--0.01%-- [...]
14.98% qemu-kvm [kernel.kallsyms] [k] clear_page_c
|
--- clear_page_c
|
|--99.84%-- do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--55.15%-- 0x10100000006
| |
| --44.85%-- 0x10100000002
--0.16%-- [...]
2.18% qemu-kvm [kernel.kallsyms] [k] yield_to
|
--- yield_to
|
|--99.62%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--83.34%-- 0x10100000006
| |
| --16.66%-- 0x10100000002
--0.38%-- [...]
1.67% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group
|
--- get_pageblock_flags_group
|
|--57.67%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--86.10%-- 0x10100000006
| |
| --13.90%-- 0x10100000002
|
|--38.10%-- suitable_migration_target
| compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--88.50%-- 0x10100000006
| |
| --11.50%-- 0x10100000002
|
|--2.23%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--85.85%-- 0x10100000006
| |
| --14.15%-- 0x10100000002
|
|--0.88%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--87.75%-- 0x10100000006
| |
| --12.25%-- 0x10100000002
|
|--0.75%-- free_hot_cold_page
| |
| |--74.93%-- free_hot_cold_page_list
| | |
| | |--53.13%-- shrink_page_list
| | | shrink_inactive_list
| | | shrink_lruvec
| | | try_to_free_pages
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--82.85%-- 0x10100000006
| | | |
| | | --17.15%-- 0x10100000002
| | |
| | --46.87%-- release_pages
| | pagevec_lru_move_fn
| | __pagevec_lru_add
| | |
| | |--98.13%-- __lru_cache_add
| | | lru_cache_add_lru
| | | putback_lru_page
| | | |
| | | |--99.02%-- migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--88.56%-- 0x10100000006
| | | | |
| | | | --11.44%-- 0x10100000002
| | | |
| | | --0.98%-- putback_lru_pages
| | | compact_zone
| | | compact_zone_order
| | | try_to_compact_pages
| | | __alloc_pages_direct_compact
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | 0x10100000002
| | |
| | --1.87%-- lru_add_drain_cpu
| | lru_add_drain
| | |
| | |--51.26%-- shrink_inactive_list
| | | shrink_lruvec
| | | try_to_free_pages
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | 0x10100000002
| | |
| | --48.74%-- migrate_prep_local
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
* Re: [Qemu-devel] [PATCH -v2 2/2] make the compaction "skip ahead" logic robust
@ 2012-09-18 17:58 ` Richard Davies
0 siblings, 0 replies; 101+ messages in thread
From: Richard Davies @ 2012-09-18 17:58 UTC (permalink / raw)
To: Mel Gorman; +Cc: kvm, qemu-devel, linux-mm, Avi Kivity, Shaohua Li
Mel Gorman wrote:
> Ok, this just means the focus has moved to the zone->lock instead of the
> zone->lru_lock. This was expected to some extent. This is an additional
> patch that defers acquisition of the zone->lock for as long as possible.
And I believe you have now beaten the lock contention - congratulations!
> Incidentally, I checked the efficiency of compaction - i.e. how many
> pages scanned versus how many pages isolated and the efficiency
> completely sucks. It must be addressed but addressing the lock
> contention should happen first.
Yes, compaction is now clearly at the top of the profile.
Interestingly, some boots still seem "slow" and some "fast", even without
any lock contention issues. Here are traces from a few different runs, and I
attach the detailed report for the first of these, which was one of the slow
ones.
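For reference, each summary below was pulled out of the corresponding full
`perf report` output by grepping for kernel-space symbols. A minimal,
self-contained sketch of that extraction (with a few sample lines inlined in
place of a full report, since those run to thousands of lines) is:

```shell
#!/bin/sh
# Sample lines standing in for a full "perf report --stdio" dump
# (these three are taken verbatim from report.1 below).
cat > /tmp/report.sample <<'EOF'
55.86% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
14.98% qemu-kvm [kernel.kallsyms] [k] clear_page_c
0.79% qemu-kvm qemu-kvm [.] 0x00000000000ae282
EOF
# Keep only kernel symbols ([k]), dropping userspace ([.]) entries,
# and show the top 8 - exactly the filter used for the summaries below:
grep -F '[k]' /tmp/report.sample | head -8
```

The capture itself was simply `perf record -g -a` on the host while the guest
booted, as shown in the report header further down.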
# grep -F '[k]' report.1 | head -8
55.86% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
14.98% qemu-kvm [kernel.kallsyms] [k] clear_page_c
2.18% qemu-kvm [kernel.kallsyms] [k] yield_to
1.67% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group
1.66% qemu-kvm [kernel.kallsyms] [k] compact_zone
1.56% ksmd [kernel.kallsyms] [k] memcmp
1.48% swapper [kernel.kallsyms] [k] default_idle
1.33% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
#
# grep -F '[k]' report.2 | head -8
38.28% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
7.58% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group
7.03% qemu-kvm [kernel.kallsyms] [k] clear_page_c
4.72% qemu-kvm [kernel.kallsyms] [k] isolate_migratepages_range
4.31% qemu-kvm [kernel.kallsyms] [k] copy_page_c
4.15% qemu-kvm [kernel.kallsyms] [k] compact_zone
2.68% qemu-kvm [kernel.kallsyms] [k] __zone_watermark_ok
2.65% qemu-kvm [kernel.kallsyms] [k] yield_to
#
# grep -F '[k]' report.3 | head -8
75.18% qemu-kvm [kernel.kallsyms] [k] clear_page_c
1.82% swapper [kernel.kallsyms] [k] default_idle
1.29% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
1.27% qemu-kvm [kernel.kallsyms] [k] get_page_from_freelist
1.20% ksmd [kernel.kallsyms] [k] memcmp
0.83% qemu-kvm [kernel.kallsyms] [k] free_pages_prepare
0.78% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
0.59% qemu-kvm [kernel.kallsyms] [k] prep_compound_page
#
# grep -F '[k]' report.4 | head -8
41.02% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
32.20% qemu-kvm [kernel.kallsyms] [k] clear_page_c
1.76% qemu-kvm [kernel.kallsyms] [k] yield_to
1.37% swapper [kernel.kallsyms] [k] default_idle
1.35% ksmd [kernel.kallsyms] [k] memcmp
1.27% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
1.23% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group
0.88% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin
#
# grep -F '[k]' report.5 | head -8
61.18% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
14.55% qemu-kvm [kernel.kallsyms] [k] clear_page_c
1.75% qemu-kvm [kernel.kallsyms] [k] yield_to
1.31% ksmd [kernel.kallsyms] [k] memcmp
1.21% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
1.20% swapper [kernel.kallsyms] [k] default_idle
1.14% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group
0.94% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin
Here is the detailed report for the first of these:
# ========
# captured on: Tue Sep 18 17:03:40 2012
# os release : 3.6.0-rc5-elastic+
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131973280 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 3M of event 'cycles'
# Event count (approx.): 1184064513533
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
#
55.86% qemu-kvm [kernel.kallsyms] [k] isolate_freepages_block
|
--- isolate_freepages_block
|
|--99.99%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--88.73%-- 0x10100000006
| |
| --11.27%-- 0x10100000002
--0.01%-- [...]
14.98% qemu-kvm [kernel.kallsyms] [k] clear_page_c
|
--- clear_page_c
|
|--99.84%-- do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--55.15%-- 0x10100000006
| |
| --44.85%-- 0x10100000002
--0.16%-- [...]
2.18% qemu-kvm [kernel.kallsyms] [k] yield_to
|
--- yield_to
|
|--99.62%-- kvm_vcpu_yield_to
| kvm_vcpu_on_spin
| pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--83.34%-- 0x10100000006
| |
| --16.66%-- 0x10100000002
--0.38%-- [...]
1.67% qemu-kvm [kernel.kallsyms] [k] get_pageblock_flags_group
|
--- get_pageblock_flags_group
|
|--57.67%-- isolate_migratepages_range
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--86.10%-- 0x10100000006
| |
| --13.90%-- 0x10100000002
|
|--38.10%-- suitable_migration_target
| compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--88.50%-- 0x10100000006
| |
| --11.50%-- 0x10100000002
|
|--2.23%-- compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--85.85%-- 0x10100000006
| |
| --14.15%-- 0x10100000002
|
|--0.88%-- compaction_alloc
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--87.75%-- 0x10100000006
| |
| --12.25%-- 0x10100000002
|
|--0.75%-- free_hot_cold_page
| |
| |--74.93%-- free_hot_cold_page_list
| | |
| | |--53.13%-- shrink_page_list
| | | shrink_inactive_list
| | | shrink_lruvec
| | | try_to_free_pages
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--82.85%-- 0x10100000006
| | | |
| | | --17.15%-- 0x10100000002
| | |
| | --46.87%-- release_pages
| | pagevec_lru_move_fn
| | __pagevec_lru_add
| | |
| | |--98.13%-- __lru_cache_add
| | | lru_cache_add_lru
| | | putback_lru_page
| | | |
| | | |--99.02%-- migrate_pages
| | | | compact_zone
| | | | compact_zone_order
| | | | try_to_compact_pages
| | | | __alloc_pages_direct_compact
| | | | __alloc_pages_nodemask
| | | | alloc_pages_vma
| | | | do_huge_pmd_anonymous_page
| | | | handle_mm_fault
| | | | __get_user_pages
| | | | get_user_page_nowait
| | | | hva_to_pfn.isra.17
| | | | __gfn_to_pfn
| | | | gfn_to_pfn_async
| | | | try_async_pf
| | | | tdp_page_fault
| | | | kvm_mmu_page_fault
| | | | pf_interception
| | | | handle_exit
| | | | kvm_arch_vcpu_ioctl_run
| | | | kvm_vcpu_ioctl
| | | | do_vfs_ioctl
| | | | sys_ioctl
| | | | system_call_fastpath
| | | | ioctl
| | | | |
| | | | |--88.56%-- 0x10100000006
| | | | |
| | | | --11.44%-- 0x10100000002
| | | |
| | | --0.98%-- putback_lru_pages
| | | compact_zone
| | | compact_zone_order
| | | try_to_compact_pages
| | | __alloc_pages_direct_compact
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | 0x10100000002
| | |
| | --1.87%-- lru_add_drain_cpu
| | lru_add_drain
| | |
| | |--51.26%-- shrink_inactive_list
| | | shrink_lruvec
| | | try_to_free_pages
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | 0x10100000002
| | |
| | --48.74%-- migrate_prep_local
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | 0x10100000006
| |
| |--23.04%-- __free_pages
| | |
| | |--59.57%-- release_freepages
| | | compact_zone
| | | compact_zone_order
| | | try_to_compact_pages
| | | __alloc_pages_direct_compact
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--89.08%-- 0x10100000006
| | | |
| | | --10.92%-- 0x10100000002
| | |
| | |--30.57%-- do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | |
| | | |--60.91%-- 0x10100000006
| | | |
| | | --39.09%-- 0x10100000002
| | |
| | --9.86%-- __free_slab
| | discard_slab
| | |
| | |--55.43%-- unfreeze_partials
| | | put_cpu_partial
| | | __slab_free
| | | kmem_cache_free
| | | free_buffer_head
| | | try_to_free_buffers
| | | jbd2_journal_try_to_free_buffers
| | | ext4_releasepage
| | | try_to_release_page
| | | shrink_page_list
| | | shrink_inactive_list
| | | shrink_lruvec
| | | try_to_free_pages
| | | __alloc_pages_nodemask
| | | alloc_pages_vma
| | | do_huge_pmd_anonymous_page
| | | handle_mm_fault
| | | __get_user_pages
| | | get_user_page_nowait
| | | hva_to_pfn.isra.17
| | | __gfn_to_pfn
| | | gfn_to_pfn_async
| | | try_async_pf
| | | tdp_page_fault
| | | kvm_mmu_page_fault
| | | pf_interception
| | | handle_exit
| | | kvm_arch_vcpu_ioctl_run
| | | kvm_vcpu_ioctl
| | | do_vfs_ioctl
| | | sys_ioctl
| | | system_call_fastpath
| | | ioctl
| | | 0x10100000006
| | |
| | --44.57%-- __slab_free
| | kmem_cache_free
| | free_buffer_head
| | try_to_free_buffers
| | jbd2_journal_try_to_free_buffers
| | ext4_releasepage
| | try_to_release_page
| | shrink_page_list
| | shrink_inactive_list
| | shrink_lruvec
| | try_to_free_pages
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | 0x10100000006
| |
| --2.02%-- __put_single_page
| put_page
| putback_lru_page
| migrate_pages
| compact_zone
| compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--83.36%-- 0x10100000006
| |
| --16.64%-- 0x10100000002
--0.37%-- [...]
1.66% qemu-kvm [kernel.kallsyms] [k] compact_zone
|
--- compact_zone
|
|--99.99%-- compact_zone_order
| try_to_compact_pages
| __alloc_pages_direct_compact
| __alloc_pages_nodemask
| alloc_pages_vma
| do_huge_pmd_anonymous_page
| handle_mm_fault
| __get_user_pages
| get_user_page_nowait
| hva_to_pfn.isra.17
| __gfn_to_pfn
| gfn_to_pfn_async
| try_async_pf
| tdp_page_fault
| kvm_mmu_page_fault
| pf_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--85.25%-- 0x10100000006
| |
| --14.75%-- 0x10100000002
--0.01%-- [...]
1.56% ksmd [kernel.kallsyms] [k] memcmp
|
--- memcmp
|
|--99.67%-- memcmp_pages
| |
| |--77.39%-- ksm_scan_thread
| | kthread
| | kernel_thread_helper
| |
| --22.61%-- try_to_merge_with_ksm_page
| ksm_scan_thread
| kthread
| kernel_thread_helper
--0.33%-- [...]
1.48% swapper [kernel.kallsyms] [k] default_idle
|
--- default_idle
|
|--99.55%-- cpu_idle
| |
| |--92.95%-- start_secondary
| |
| --7.05%-- rest_init
| start_kernel
| x86_64_start_reservations
| x86_64_start_kernel
--0.45%-- [...]
1.33% qemu-kvm [kernel.kallsyms] [k] svm_vcpu_run
|
--- svm_vcpu_run
|
|--99.34%-- kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--77.65%-- 0x10100000006
| |
| --22.35%-- 0x10100000002
|
--0.66%-- kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--73.97%-- 0x10100000006
|
--26.03%-- 0x10100000002
1.08% qemu-kvm [kernel.kallsyms] [k] kvm_vcpu_on_spin
|
--- kvm_vcpu_on_spin
|
|--99.27%-- pause_interception
| handle_exit
| kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--83.21%-- 0x10100000006
| |
| --16.79%-- 0x10100000002
|
--0.73%-- handle_exit
kvm_arch_vcpu_ioctl_run
kvm_vcpu_ioctl
do_vfs_ioctl
sys_ioctl
system_call_fastpath
ioctl
|
|--80.89%-- 0x10100000006
|
--19.11%-- 0x10100000002
0.79% qemu-kvm qemu-kvm [.] 0x00000000000ae282
|
|--1.27%-- 0x4eec6e
| |
| |--38.48%-- 0x1491280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--32.35%-- 0x3195280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --29.16%-- 0x200c280
| 0x0
| 0xa0
| 0x696368752d62
|
|--1.24%-- 0x503457
| 0x0
|
|--1.02%-- 0x4eec20
| |
| |--46.48%-- 0x200c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--28.52%-- 0x3195280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --24.99%-- 0x1491280
| 0x0
| 0xa0
| 0x696368752d62
|
|--1.00%-- 0x4eec2a
| |
| |--77.52%-- 0x200c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--12.67%-- 0x3195280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --9.80%-- 0x1491280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.99%-- 0x4ef092
|
|--0.94%-- 0x568f04
| |
| |--89.85%-- 0x0
| |
| |--7.89%-- 0x10100000002
| |
| --2.26%-- 0x10100000006
|
|--0.93%-- 0x5afab4
| |
| |--40.39%-- 0x309a410
| | 0x0
| |
| |--31.80%-- 0x1f11410
| | 0x0
| |
| |--20.88%-- 0x1396410
| | 0x0
| |
| |--4.58%-- 0x0
| | |
| | |--52.36%-- 0x148ea00
| | | 0x5699c0
| | | 0x24448948004b4154
| | |
| | |--31.49%-- 0x2009a00
| | | 0x5699c0
| | | 0x24448948004b4154
| | |
| | --16.15%-- 0x3192a00
| | 0x5699c0
| | 0x24448948004b4154
| |
| |--1.31%-- 0x1000
| |
| --1.03%-- 0x6
|
|--0.92%-- 0x4eeba0
| |
| |--35.54%-- 0x3195280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--32.33%-- 0x200c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --32.12%-- 0x1491280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.91%-- 0x652b11
|
|--0.83%-- 0x65a102
|
|--0.82%-- 0x40a6a9
|
|--0.81%-- 0x530421
| |
| |--94.43%-- 0x0
| |
| --5.57%-- 0x46b47b
| |
| |--51.32%-- 0xdffec96000a08169
| |
| --48.68%-- 0xdffec90000a08169
|
|--0.80%-- 0x569fc4
| |
| |--41.34%-- 0x1396410
| | 0x0
| |
| |--29.46%-- 0x1f11410
| | 0x0
| |
| --29.21%-- 0x309a410
| 0x0
|
|--0.73%-- 0x541422
| 0x0
|
|--0.70%-- 0x56b990
| |
| |--72.77%-- 0x100000008
| |
| |--26.00%-- 0xfed00000
| | |
| | --100.00%-- 0x0
| |
| |--0.73%-- 0x100000004
| --0.50%-- [...]
|
|--0.69%-- 0x525261
| 0x0
| 0x822ee8fff96873e9
|
|--0.69%-- 0x6578d7
| |
| --100.00%-- 0x0
|
|--0.67%-- 0x52fb44
| |
| |--75.44%-- 0x0
| |
| |--17.16%-- 0x10100000002
| |
| --7.41%-- 0x10100000006
|
|--0.66%-- 0x568e29
| |
| |--50.87%-- 0x200c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--33.04%-- 0x3195280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--13.60%-- 0x1491280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--1.40%-- 0x1000
| |
| |--0.65%-- 0x3000
| --0.43%-- [...]
|
|--0.65%-- 0x5b4cb4
| 0x0
| 0x822ee8fff96873e9
|
|--0.62%-- 0x55b9ba
| |
| |--50.14%-- 0x0
| |
| --49.86%-- 0x2000000
|
|--0.61%-- 0x4ff496
|
|--0.60%-- 0x672601
| 0x1
|
|--0.58%-- 0x4eec06
| |
| |--75.93%-- 0x200c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--15.91%-- 0x3195280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --8.15%-- 0x1491280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.58%-- 0x477a32
| 0x0
|
|--0.56%-- 0x477b27
| 0x0
|
|--0.56%-- 0x540e24
|
|--0.56%-- 0x40a4f4
|
|--0.55%-- 0x659d12
| 0x0
|
|--0.55%-- 0x4eec22
| |
| |--44.24%-- 0x200c280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| |--32.08%-- 0x1491280
| | 0x0
| | 0xa0
| | 0x696368752d62
| |
| --23.68%-- 0x3195280
| 0x0
| 0xa0
| 0x696368752d62
|
|--0.53%-- 0x564394
| |
| |--69.75%-- 0x0
| |
| |--23.87%-- 0x10100000002
| |
| --6.38%-- 0x10100000006
|
|--0.52%-- 0x4eeb52
|
|--0.51%-- 0x530094
|
|--0.50%-- 0x477a9e
| 0x0
--74.90%-- [...]
0.77% qemu-kvm [kernel.kallsyms] [k] __srcu_read_lock
|
--- __srcu_read_lock
|
|--91.98%-- kvm_arch_vcpu_ioctl_run
| kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--81.72%-- 0x10100000006
| |
| --18.28%-- 0x10100000002
|
|--5.81%-- kvm_vcpu_ioctl
| do_vfs_ioctl
| sys_ioctl
| system_call_fastpath
| ioctl
| |
| |--78.63%-- 0x10100000006
| |
| --21.37%-- 0x10100000002
|
|--1.06%-- fsnotify
| vfs_write
| |
| |--98.29%-- sys_write
| | system_call_fastpath
| | write
| | |
| | --100.00%-- 0x0
| |
| --1.71%-- sys_pwrite64
| system_call_fastpath
| pwrite64
| |
| |--55.68%-- 0x1f12260
| | 0x80
| | 0x480050b9e1058b48
| |
| --44.32%-- 0x309b260
| 0x80
| 0x480050b9e1058b48
|
|--0.91%-- kvm_mmu_notifier_invalidate_page
| __mmu_notifier_invalidate_page
| try_to_unmap_one
| |
| |--98.79%-- try_to_unmap_anon
| | try_to_unmap
| | migrate_pages
| | compact_zone
| | compact_zone_order
| | try_to_compact_pages
| | __alloc_pages_direct_compact
| | __alloc_pages_nodemask
| | alloc_pages_vma
| | do_huge_pmd_anonymous_page
| | handle_mm_fault
| | __get_user_pages
| | get_user_page_nowait
| | hva_to_pfn.isra.17
| | __gfn_to_pfn
| | gfn_to_pfn_async
| | try_async_pf
| | tdp_page_fault
| | kvm_mmu_page_fault
| | pf_interception
| | handle_exit
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call_fastpath
| | ioctl
| | |
^ permalink raw reply [flat|nested] 101+ messages in thread
end of thread, other threads:[~2012-09-18 17:59 UTC | newest]
Thread overview: 101+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-16 10:47 Windows slow boot: contractor wanted Richard Davies
2012-08-16 11:39 ` Avi Kivity
2012-08-17 12:36 ` Richard Davies
2012-08-17 13:02 ` Robert Vineyard
2012-08-18 14:44 ` Richard Davies
2012-08-19 5:02 ` Brian Jackson
2012-08-20 8:16 ` Richard Davies
2012-08-19 8:40 ` Avi Kivity
2012-08-19 8:51 ` Richard Davies
2012-08-19 14:04 ` Avi Kivity
2012-08-20 13:56 ` Richard Davies
2012-08-21 9:00 ` Avi Kivity
2012-08-21 15:21 ` Richard Davies
2012-08-21 15:39 ` Troy Benjegerdes
2012-08-22 9:08 ` Avi Kivity
2012-08-22 12:40 ` Richard Davies
2012-08-22 12:44 ` Avi Kivity
2012-08-22 14:41 ` Richard Davies
2012-08-22 14:53 ` Avi Kivity
2012-08-22 15:26 ` Richard Davies
2012-08-22 17:22 ` Troy Benjegerdes
2012-08-25 17:51 ` Richard Davies
2012-08-22 15:21 ` Rik van Riel
2012-08-22 15:34 ` Richard Davies
2012-08-25 17:45 ` Richard Davies
2012-08-25 18:11 ` Rik van Riel
2012-08-26 10:58 ` Richard Davies
2012-09-06 9:20 ` Richard Davies
2012-09-12 10:56 ` Windows VM slow boot Richard Davies
2012-09-12 12:25 ` Mel Gorman
2012-09-12 16:46 ` Richard Davies
2012-09-13 9:50 ` Mel Gorman
2012-09-13 19:47 ` [PATCH 1/2] Revert "mm: have order > 0 compaction start near a pageblock with free pages" Rik van Riel
2012-09-13 19:48 ` [PATCH 2/2] make the compaction "skip ahead" logic robust Rik van Riel
2012-09-13 19:54 ` [PATCH -v2 " Rik van Riel
2012-09-15 15:55 ` Richard Davies
2012-09-16 19:12 ` Richard Davies
2012-09-17 12:26 ` Mel Gorman
2012-09-18 8:14 ` Richard Davies
2012-09-18 11:21 ` Mel Gorman
2012-09-18 17:58 ` Richard Davies
2012-09-17 13:50 ` Rik van Riel
2012-09-17 14:07 ` Mel Gorman
2012-08-16 14:10 ` Windows slow boot: contractor wanted Benoît Canet
2012-08-16 15:53 ` Troy Benjegerdes
2012-09-18 15:12 ` Windows slow boot Michael Tokarev